arachnode.net
An Open Source C# web crawler with Lucene.NET search using SQL Server 2008/2012/2014/2016/CE
An Open Source C# web crawler with Lucene.NET search using MongoDB/RavenDB/Hadoop

Completely Open Source @ GitHub

Browse Forum Posts by Tags

Showing related tags and posts for the General Questions forum.
  • Ignoring Images and CSS Files

    I want the crawler to ignore these kinds of files altogether, so I marked them as IsDisallowed in the DisallowedFileExtensions table. Should that be enough to stop them from getting crawled? As is, they show up in the console when I'm crawling. I don't know if that means they're actually...
    Posted to General Questions (Forum) by bscott on Thu, May 19 2011
  • Re: Plugin help

    Templater is a piece of code that can look at a webpage and extract the 'meat' of the page. Given a blog site, it can tell you which XPath will select the main post and the titles; given a forum site, it can tell you which elements are the forum posts. It basically solves a tough problem in web scraping...
    Posted to General Questions (Forum) by arachnode.net on Sun, Aug 2 2009
copyright 2004-2017, arachnode.net LLC