arachnode.net
An Open Source C# web crawler with Lucene.NET search using SQL Server 2008/2012/2014/2016/CE
An Open Source C# web crawler with Lucene.NET search using MongoDB/RavenDB/Hadoop

Browse Forum Posts by Tags

Showing related tags and posts for the arachnode.net group.
  • Re: How do I disallow certain parts of a site from a crawl

    You can modify that table. The better way to filter your crawl would be to perform these steps: Add your DisallowedDirectories as 'Words' in the 'cfg.DisallowedWords' database table, and ensure that the CrawlRule AbsoluteUri.cs is enabled. You will know if a CrawlAction, CrawlRule or...
    Posted to General Questions (Forum) by arachnode.net on Sat, Jul 17 2010
  • Re: Getting started: crawling multiple sites

    Looks like you have been registered for quite some time, mopiola. Which version of AN are you using? I wouldn't add CRs to the CR table, but rather add them as shown in Program.cs. This table is used by the Cache/Engine. Read this post about restricting a crawl: http://arachnode.net/forums/t...
    Posted to General Questions (Forum) by arachnode.net on Wed, Apr 28 2010
  • Re: Site with session ids / anchors

    Hmmm... I will double-check the cookie handling and get back to you. Question though: Are you using the latest and greatest from SVN? To make exceptions for named anchors, check AbsoluteUri.cs. You can use one of the existing rules (or create a new one which is executed before AbsoluteUri.cs) and modify...
    Posted to General Questions (Forum) by arachnode.net on Tue, Apr 6 2010
  • Re: Web Shopping Pricing Bots solution...

    Address has been merged with AbsoluteUri. The AbsoluteUri CrawlRule references the following tables: If you wanted to crawl msn.com, delete all rows in the tables above, insert msn.com into DisallowedDomains, set IsDisallowed=true for msn.com, and set the following setting in CrawlRules.config: negateIsDisallowedForAbsoluteUri...
    Posted to General Questions (Forum) by arachnode.net on Sun, Apr 12 2009

copyright 2004-2017, arachnode.net LLC