arachnode.net
An Open Source C# web crawler with Lucene.NET search using SQL 2005/2008/CE
Does arachnode.net scale? | VS2008/2010/2012 & SQL2008/2012 | Download the latest release

Browse Forum Posts by Tags

Showing related tags and posts for the General Questions forum.
  • Re: filter by URL

    Check out the CrawlRule AbsoluteUri.cs. Find all references to it and read the code. This rule shows how arachnode.net can parse and filter AbsoluteUris before and after crawling, and how to completely ignore those that don't conform to your rules. AN does use regular expressions, but a better way...
    Posted to General Questions (Forum) by arachnode.net on Thu, Aug 6 2009
  • Re: Web Shopping Pricing Bots solution...

    Address has been merged with AbsoluteUri. The AbsoluteUri CrawlRule references the following tables: If you wanted to crawl msn.com, you would delete all rows in the tables above, insert msn.com into DisallowedDomains, set IsDisallowed=true for msn.com, and set the following setting in CrawlRules.config: negateIsDisallowedForAbsoluteUri...
    Posted to General Questions (Forum) by arachnode.net on Sun, Apr 12 2009
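The msn.com recipe above works because negating the disallow check turns the DisallowedDomains list into an allow list: only listed domains get crawled. Here is a minimal C# sketch of that logic, assuming a host matches a listed domain when it equals it or is a subdomain of it. The DomainFilter class and its members are hypothetical illustrations, not arachnode.net's actual AbsoluteUri CrawlRule API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch only: this is NOT arachnode.net's AbsoluteUri CrawlRule.
// It illustrates how a "negate IsDisallowed" flag flips a block list into an allow list.
public sealed class DomainFilter
{
    private readonly HashSet<string> _disallowedDomains;
    private readonly bool _negateIsDisallowed;

    public DomainFilter(IEnumerable<string> disallowedDomains, bool negateIsDisallowed)
    {
        _disallowedDomains = new HashSet<string>(disallowedDomains, StringComparer.OrdinalIgnoreCase);
        _negateIsDisallowed = negateIsDisallowed;
    }

    // Assumption: a host is "listed" if it equals a listed domain or is a subdomain of one.
    private bool IsListed(Uri absoluteUri)
    {
        string host = absoluteUri.Host;
        return _disallowedDomains.Any(domain =>
            host.Equals(domain, StringComparison.OrdinalIgnoreCase) ||
            host.EndsWith("." + domain, StringComparison.OrdinalIgnoreCase));
    }

    // Negation off: listed domains are skipped (block list).
    // Negation on: everything except listed domains is skipped (allow list).
    public bool IsDisallowed(Uri absoluteUri) => IsListed(absoluteUri) != _negateIsDisallowed;
}

public static class Program
{
    public static void Main()
    {
        // Mirrors the msn.com example: msn.com listed, negation enabled.
        var filter = new DomainFilter(new[] { "msn.com" }, negateIsDisallowed: true);

        Console.WriteLine(filter.IsDisallowed(new Uri("http://www.msn.com/news"))); // False: crawled
        Console.WriteLine(filter.IsDisallowed(new Uri("http://example.com/")));     // True: skipped
    }
}
```

Expressing the check as a single inequality means one configuration flag switches between block-list and allow-list behavior without changing the domain data itself.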

copyright 2004-2013, arachnode.net LLC