arachnode.net
An Open Source C# web crawler with Lucene.NET search using SQL Server 2008/2012/2014/2016/CE or MongoDB/RavenDB/Hadoop

Browse Forum Posts by Tags

Showing related tags and posts for the General Questions forum.
  • Exceptions query, do my results look typical

    When I run the following query: SELECT LEFT(Message, 39) AS Expr1, COUNT(ID) AS Expr2 FROM Exceptions AS Exceptions_1 GROUP BY LEFT(Message, 39) HAVING (LEFT(Message, 39) LIKE 'The remote name could not be resolved:%') UNION SELECT Message AS Expr1, COUNT(ID) AS Expr2 FROM Exceptions GROUP BY...
    Posted to General Questions (Forum) by DataMan on Fri, Apr 9 2010
  • Help Needed

    Hi, I would greatly appreciate any help. I am new to AN. I managed to get it up and running; it was actually not bad, just 4 easy steps. Now here is what I am trying to achieve: 1) I have a list of approximately 200 web sites (these are job sites, job aggregators, companies...
    Posted to General Questions (Forum) by vishal on Tue, Oct 6 2009
  • Re: How to Begin my web crawler and build the Index?

    First, I am very moved that you were kind enough to help me. Thank you very much. The good news is that I built the whole project successfully after setting the console as the startup project. I tried to run the console program against the Internet, and I changed the target Uri to "www.sohu.com"...
    Posted to General Questions (Forum) by wumengge on Thu, Apr 16 2009
  • Re: New Issue "Deploying" and "Testing" with console.

    After my post about the installation instructions, I managed to deploy the project on VS2008 & SQL2008. I don't think I have any problems with SQL Server, and I can build the solution and run the test program under the Console project with no errors. But I have 2 questions: As it is commented on the test program...
    Posted to General Questions (Forum) by Ozan on Thu, Apr 2 2009
  • Crawl specific domains, take screenshot, run javascript, ID ads, write to DB ... any ideas?

    We want to capture info about AD SPOTS; - on specific list of approx 10,000 domains - capture a screenshot png / jpg - run javascript on each page (browser specific?) - read js and identify ad spot SIZES - identify PLACE on page that the adspot is located - relate this location back to the screenshot...
    Posted to General Questions (Forum) by jpntol on Sat, Feb 14 2009
  • Re: How to restrict crawl to single domain?

    OK, how about this: If you want to crawl 500 domains, you would configure arachnode.net to restrict Crawls to those 500 domains only, as the posts above describe. Then, make sure your settings in Application.config are set as shown. The Crawl process works like this if you have the settings...
    Posted to General Questions (Forum) by arachnode.net on Fri, Feb 13 2009

copyright 2004-2017, arachnode.net LLC