arachnode.net
An Open Source C# web crawler with Lucene.NET search using SQL Server 2008/2012/2014/2016/CE
An Open Source C# web crawler with Lucene.NET search using MongoDB/RavenDB/Hadoop

Browse Forum Posts by Tags

  • Re: Can't turn off robots.txt

    That isn't the answer. I tried deleting all values from that table right before beginning my test crawl, and I still get the same "Prohibited by robots.txt". I have no data in my disallowed....(anything) that would stop me from crawling except for the robots.txt. Thanks for your...
    Posted to General Questions (Forum) by David Rodecker on Mon, Sep 27 2010
  • Can't turn off robots.txt

    I am using version 2.5.3916.23112 and I have set robotsdottext = 0 (not enabled), but after the crawl my "disallowedUri" table says "disallowed by robots.txt". I also tried setting it to "=1", and that doesn't work either. Therefore I am unable to turn it off...
    Posted to General Questions (Forum) by David Rodecker on Sun, Sep 26 2010
  • Help Needed

    Hi, I would greatly appreciate any help. I am new to AN. I managed to get it up and running; it was actually not bad, just 4 easy steps. Now here is what I am trying to achieve: 1) I have a list of approximately 200 web sites (these are job sites, job aggregators, companies...
    Posted to General Questions (Forum) by vishal on Tue, Oct 6 2009
  • Re: crawling specific web sites for tag words

    Hey - I'm back from my mini-vacation to the Washington coast. The site was disallowed as, by default, arachnode.net follows robots.txt rules. If you want to turn off the robots.txt behavior, check the 'CrawlRules' table, find the robots.txt rule and turn it off. No worries on being new to...
    Posted to General Questions (Forum) by arachnode.net on Thu, Aug 6 2009
  • Re: no robots.txt

    arachnode.net will crawl the site if no robots.txt file is present. We have a build of 1.1 in review right now. The latest check-in should be a viable check-in if you're running a release from SourceForge or CodePlex.
    Posted to General Questions (Forum) by arachnode.net on Mon, Mar 16 2009
  • no robots.txt

    Hi, when there is no robots.txt, will arachnode crawl the page? I got a 'no robots.txt' exception and no URLs are showing up. Thanks! Roel
    Posted to General Questions (Forum) by Roel on Mon, Mar 16 2009
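
The posts above turn on two behaviors: a site with no robots.txt is still crawled, and robots.txt checking can be switched off by a rule. The following is a minimal Python sketch of that general logic using the standard library's urllib.robotparser. It is not arachnode.net's code and does not touch its 'CrawlRules' table; the function name is_allowed and the obey_robots flag are hypothetical names chosen for illustration.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, url, user_agent="*", obey_robots=True):
    """Decide whether `url` may be crawled.

    robots_txt  -- text of the site's robots.txt, or None if the site has none
    obey_robots -- hypothetical switch standing in for a crawler's
                   "follow robots.txt" rule; False skips the check entirely
    """
    if not obey_robots:
        # Rule turned off: robots.txt is never consulted.
        return True
    if robots_txt is None:
        # No robots.txt present: nothing is disallowed.
        return True
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

With a robots.txt of "User-agent: *" followed by "Disallow: /", the sketch rejects every URL on the site while obey_robots is True and accepts them once it is False. If a crawler keeps reporting a robots.txt prohibition after the rule is disabled, the setting may not be reaching the code path that performs this check.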

copyright 2004-2017, arachnode.net LLC