arachnode.net
An Open Source C# web crawler with Lucene.NET search using SQL Server 2008/2012/2014/2016/CE
An Open Source C# web crawler with Lucene.NET search using MongoDB/RavenDB/Hadoop

Completely Open Source @ GitHub


Arachnode evaluation for change detection


Top 150 Contributor
2 Posts
Roberto Ricco posted on Wed, Oct 14 2009 7:17 AM

Good afternoon,

My name is Ing. Roberto Riccò and I'm a senior developer at Software Technologies S.r.l. (Italy).

Our company is a software development company with three main service areas:

- design and development of high-quality software systems

- data integration

- reporting / business intelligence

We are evaluating arachnode.net for a crawling project, and we need to implement these three features:

1) Crawl about 10,000 URLs daily, plus all linked pages and documents on the internet down to a defined depth, starting from a static URL list and possibly a set of additional keywords.

2) Categorize and store the downloaded data (URLs, documents, etc.) in a database.

3) Detect changes between the previously stored copy of each page and document and the newly downloaded version.

Can we use arachnode.net's features to implement exactly these steps?

We hope for a positive reply; if the answer is yes, please contact me at [email protected]

Thanks in advance.

Best regards

Roberto

Verified Answer

Top 10 Contributor
1,905 Posts
arachnode.net replied on Wed, Oct 14 2009 8:25 AM
Verified by arachnode.net

Roberto -

The answer is a resounding "YES".  arachnode.net supports everything you want to do right out of the box.

1.) AN can crawl over a million pages a day on a modest system.  Be sure you have a fast disk array.

2.) Yes.  Yes.  Yes. This is one of AN's strongest points.  All data is split and stored by DiscoveryType and associated by foreign keys.

3.) Yes.  All WebPages have datechanged fields.
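For anyone evaluating point 3 outside the database, here is a minimal sketch, assuming change detection is done by fingerprinting each download and advancing a datechanged-style timestamp only when the fingerprint differs. It is written in Python for brevity; `StoredPage`, `record_download` and the SHA-256 comparison are hypothetical illustrations, not arachnode.net's actual schema or API.

```python
import hashlib
from dataclasses import dataclass, replace
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class StoredPage:
    """Hypothetical stand-in for a stored page row (not arachnode.net's schema)."""
    absolute_uri: str
    content_sha256: str
    date_changed: datetime


def sha256_hex(content: bytes) -> str:
    """Fingerprint the downloaded bytes so two copies can be compared cheaply."""
    return hashlib.sha256(content).hexdigest()


def record_download(
    stored: Optional[StoredPage], absolute_uri: str, content: bytes, now: datetime
) -> tuple[StoredPage, bool]:
    """Return the (possibly updated) record and whether the content changed.

    A page seen for the first time counts as changed. Otherwise date_changed
    is advanced only when the new fingerprint differs from the stored one.
    """
    digest = sha256_hex(content)
    if stored is None:
        return StoredPage(absolute_uri, digest, now), True
    if stored.content_sha256 == digest:
        return stored, False
    return replace(stored, content_sha256=digest, date_changed=now), True


if __name__ == "__main__":
    t0 = datetime(2009, 10, 14, tzinfo=timezone.utc)
    t1 = datetime(2009, 10, 15, tzinfo=timezone.utc)
    page, changed = record_download(None, "http://example.com/", b"v1", t0)
    # Same bytes on the next day's crawl: no change, timestamp stays at t0.
    page, changed = record_download(page, "http://example.com/", b"v1", t1)
    print(changed, page.date_changed.date())  # False 2009-10-14
```

An exact hash flags any byte difference, including rotating ads or embedded timestamps, so in practice you may want to normalize the content (strip markup or volatile sections) before hashing.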

-Mike

For best service when you require assistance:

  1. Check the DisallowedAbsoluteUris and Exceptions tables first.
  2. Cut and paste actual exceptions from the Exceptions table.
  3. Include screenshots.

Skype: arachnodedotnet

All Replies

Top 25 Contributor
16 Posts

Why has nobody in this forum shown a website built on AN? :(

 

copyright 2004-2017, arachnode.net LLC