An Open Source C# web crawler with Lucene.NET search using SQL Server 2008/2012/2014/2016/CE
An Open Source C# web crawler with Lucene.NET search using MongoDB/RavenDB/Hadoop

Completely Open Source @ GitHub


Browse Forum Posts by Tags

Showing related tags and posts for the Forums application.
  • Re: Is it possible to create many instances of the crawler running at the same time?

    I am making a change to the core that will let you adjust the CrawlActions, CrawlRules and EngineActions before crawling, per Crawl instance, just as you can with ApplicationSettings and WebSettings.
    Posted to Feature Requests (Forum) on Fri, May 7 2010
  • passing state into AN

    Good morning. I need to pass some data into AN for processing by a plugin. I could obviously write that data into the DB and pick it up in the CrawlAction later, but I'm already retrieving that data before calling into AN as a matter of necessity and I'd rather not be more redundant than I have...
    Posted to General Questions (Forum) by offbored on Wed, Feb 10 2010
  • Re: Extracting words: How to

    Hi there! You are right. This can be accomplished with a custom crawl action and isn't difficult... There isn't an existing CrawlAction that does what you want, but it would not be difficult to implement. Glad to hear you are hooked; I am too! Which version are you using, by...
    Posted to Anonymous Forum (Forum) on Fri, Nov 20 2009
  • Re: Crawl pages created or modified 30 day ago

    Let's go with what I communicated over IM. There are several ways to achieve what I think (and hope) I understand your needs to be. The switches and modifications from the post were meant to support a batch-style analysis, but we actually need to implement a continuous crawling mechanism, which is what...
    Posted to General Questions (Forum) on Tue, Aug 11 2009
  • Re: Plugin help

    Templater is a piece of code that can look at a webpage and extract the 'meat' of the page. On a blog site, it can tell you which XPath will select the main post or the titles; on a forum site, it can tell you which elements are the forum posts. It basically solves a tough problem in web scraping...
    Posted to General Questions (Forum) on Sun, Aug 2 2009
  • Re: Crawling several sites with 1.2 version

    1) You will need to write separate rules for each site, but one plugin will work. Otherwise, how would the plugin know what information you want to pull? You can use UserDefinedFunctions.ExtractDomain or UserDefinedFunctions.ExtractHost to perform the filtering/switching. 2) The easiest would be to create...
    Posted to General Questions (Forum) on Sun, Jul 26 2009
  • Re: Making Progress...but...

    Check the cfg.CrawlActions table. -Mike
    Posted to General Questions (Forum) on Mon, Jul 13 2009
  • Re: Anonymous Crawling

    I added an Anonymizer plugin to the branch so you can see how this would be implemented. Don't forget to check out the DB too. (The branch is a branch, but it is quite viable.) This code is now checked into the trunk. -Mike
    Posted to Feature Requests (Forum) on Mon, Jun 1 2009

Copyright 2004-2017, LLC