arachnode.net
An Open Source C# web crawler with Lucene.NET search using SQL Server 2008/2012/2014/2016/CE or MongoDB/RavenDB/Hadoop

Completely Open Source @ GitHub

Browse Site by Tags

Showing related tags and posts across the entire site.
  • Re: How do I disallow certain parts of a site from a crawl

    You can modify that table. The better way to filter your crawl would be to perform these steps: Add your DisallowedDirectories as 'Words' in the 'cfg.DisallowedWords' database table, and ensure that the CrawlRule AbsoluteUri.cs is enabled. You will know if a CrawlAction, CrawlRule or...
    Posted to General Questions by arachnode.net on Sat, Jul 17 2010
  • Re: Problem downloading from php sites

    It doesn't look like your page 'http://www.napoli2nord.it/aziende.php#bandi?bandi.php' is experiencing any errors that aren't caused by your configuration. In the DisallowedAbsoluteUris table, 'http://www.napoli2nord.it/aziende.php#bandi?bandi.php' isn't listed. Unless...
    Posted to General Questions by arachnode.net on Sat, Mar 27 2010
  • Re: rewrite URL

    megetron wrote: "So what you are saying is that I will have to manually detect the fake absolute URIs? Does arachnode.net hold the original URIs, from before the URL rewrite action?" No. Check out AbsoluteUri.cs. This CrawlRule takes care of QueryStrings and NamedAnchors. You just...
    Posted to General Questions by arachnode.net on Sun, Aug 9 2009

copyright 2004-2017, arachnode.net LLC