arachnode.net
An Open Source C# web crawler with Lucene.NET search using SQL Server 2008/2012/2014/2016/CE or MongoDB/RavenDB/Hadoop

Browse Site by Tags

Showing related tags and posts across the entire site.
  • Handling FTP links.

    Does Arachnode handle links in the form <A href="ftp://www.xxx.com/dir/filename.zip">? I modified the tables to accept ".zip" as a file type, but there appears to be a problem with the "ftp:" scheme, since it is not HTTP.
    Posted to General Questions by ucg on Mon, Jan 10 2011
  • Re: Setting arachnode up for rss-collection

    Just for others who read this post... when troubleshooting AN, start by looking in the database tables 'Exceptions' and 'DisallowedAbsoluteUris'. For your question: take a look at cfg.AllowedDataTypes. This table controls what you are allowed to crawl. The first step is to make sure that...
    Posted to General Questions by arachnode.net on Thu, Aug 20 2009
  • Re: Exception: The remote server returned an error: (406) Not Acceptable.

    Another condition that can throw a 406 is when you ask for a file such as a GIF, but actually get JavaScript back, which, by default, isn't allowed. The reason for the bait and switch could be tracking, or to save bandwidth for relatively unfamiliar crawlers, or a coding error...
    Posted to Bug Reports by arachnode.net on Tue, Aug 18 2009
  • Re: Exception: The remote server returned an error: (406) Not Acceptable.

    This is due to certain web servers not handling the HTTP request header 'Accept' properly. You really are going to make me fix all of my little things, huh? I'll take a look... Just for a little background on why you may get this error: arachnode.net uses a very specific set of configurables...
    Posted to Bug Reports by arachnode.net on Mon, Aug 17 2009
  • Re: Partial crawling

    You are very welcome. It can, but indexing is currently limited (from the Lucene.NET side, not the SQL FTI side) to text. Check the table AllowedDataTypes.
    Posted to General Questions by arachnode.net on Thu, May 7 2009
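
The 406 posts above describe a request for one content type (a GIF) being answered with another (JavaScript) that is not on the allow-list. A minimal sketch of that check follows, in Python for brevity. It is not arachnode.net's code: `ALLOWED_CONTENT_TYPES`, `media_type`, `is_allowed`, and `accept_header` are hypothetical names, the set merely stands in for rows of cfg.AllowedDataTypes, and building the `Accept` header from the allow-list is an assumption about how such a crawler might behave.

```python
# Hypothetical allow-list standing in for the rows of cfg.AllowedDataTypes.
ALLOWED_CONTENT_TYPES = {"text/html", "image/gif"}


def media_type(content_type_header: str) -> str:
    """Return the bare media type, dropping parameters such as charset."""
    return content_type_header.split(";", 1)[0].strip().lower()


def is_allowed(content_type_header: str, allowed=ALLOWED_CONTENT_TYPES) -> bool:
    """True when the media type the server actually returned is on the allow-list."""
    return media_type(content_type_header) in allowed


def accept_header(allowed=ALLOWED_CONTENT_TYPES) -> str:
    """Build an Accept request header value from the allow-list (an assumption)."""
    return ", ".join(sorted(allowed))
```

With this allow-list, `is_allowed("image/gif")` is true, while `is_allowed("application/javascript; charset=utf-8")` is false, which mirrors the GIF-for-JavaScript swap described above. A server that cannot satisfy a narrow `Accept` value may answer with 406 on its own, which is a separate path to the same error.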

copyright 2004-2017, arachnode.net LLC