arachnode.net
An Open Source C# web crawler with Lucene.NET search using SQL Server 2008/2012/2014/2016/CE
An Open Source C# web crawler with Lucene.NET search using MongoDB/RavenDB/Hadoop

Completely Open Source @ GitHub

Browse Site by Tags

Showing related tags and posts across the entire site.
  • Re: Possible Robots.txt Bug - Release 2.5

    Ok, I updated to the latest version of Arachnode from the trunk. The previously described issue is resolved in this version, as you said. The robots.txt is still not being handled properly, though. I walked through the code in RobotsDotTextManager.cs, and discovered what the problem is. It appears that...
    Posted to Bug Reports by bscott on Fri, May 13 2011
  • Possible Robots.txt Bug - Release 2.5

    Am I doing something wrong or is this a bug? What would you recommend I do? My install is from the release-2.5 tag. I realized that robots.txt wasn't working in my installation, so I walked through the code. When it's trying to read the robots.txt file, I found that on line 375 of SiteCrawler...
    Posted to Bug Reports by bscott on Wed, May 11 2011
  • Re: The remote server returned an error: (404) Not Found

    404 errors don't come from robots.txt files. Check out 'UserAgent' in the 'Configuration' database table. You are welcome.
    Posted to Bug Reports by arachnode.net on Fri, Aug 7 2009

copyright 2004-2017, arachnode.net LLC