arachnode.net
An open source .NET web crawler written in C# using SQL 2005/2008

Rookie's question


sagie.shamay posted on 1 Jun 2009 7:50 AM

Hi.

I've tried to figure it out myself, but I think it would be better to ask here:

What is the difference between a table and its discoveries table? For example, what is the difference between the Images table and the Images_Discoveries table?

More generally, what is the purpose of a discovery in the crawler?

Thanks, Sagie

 

Verified Answer


A Discovery is anything the crawler can discover.  Check the database table 'DiscoveryTypes' for the full list.

The [Discovery]_Discoveries tables store those 'me too' references. A billion pages may point to 'http://google.com', but we store the string 'http://google.com' only once, plus a billion integer references to that Discovery.
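To illustrate the store-once, reference-by-integer idea described above, here is a minimal sketch in Python using the standard-library sqlite3 module. The table and column names (`discoveries`, `discovery_references`, `absolute_uri`, `page_id`) and the function `record_discovery` are hypothetical; they do not reflect arachnode.net's actual SQL Server schema or its C# code. The sketch only shows the general pattern.

```python
import sqlite3

# Hypothetical, simplified schema (NOT arachnode.net's actual tables):
# each distinct URI is stored once in `discoveries`; every page that
# points at it adds only a row of two integers in `discovery_references`.
SCHEMA = """
CREATE TABLE discoveries (
    id INTEGER PRIMARY KEY,
    absolute_uri TEXT NOT NULL UNIQUE
);
CREATE TABLE discovery_references (
    page_id INTEGER NOT NULL,
    discovery_id INTEGER NOT NULL REFERENCES discoveries(id)
);
"""


def record_discovery(conn: sqlite3.Connection, page_id: int, uri: str) -> int:
    """Store `uri` once (if new) and add an integer reference from `page_id`."""
    # INSERT OR IGNORE skips the insert when the URI already exists (UNIQUE).
    conn.execute("INSERT OR IGNORE INTO discoveries (absolute_uri) VALUES (?)", (uri,))
    (discovery_id,) = conn.execute(
        "SELECT id FROM discoveries WHERE absolute_uri = ?", (uri,)
    ).fetchone()
    # The reference row holds only integers, never the URI string itself.
    conn.execute(
        "INSERT INTO discovery_references (page_id, discovery_id) VALUES (?, ?)",
        (page_id, discovery_id),
    )
    return discovery_id


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # 1,000 hypothetical pages all linking to the same URI.
    for page_id in range(1, 1001):
        record_discovery(conn, page_id, "http://google.com")
    uris = conn.execute("SELECT COUNT(*) FROM discoveries").fetchone()[0]
    refs = conn.execute("SELECT COUNT(*) FROM discovery_references").fetchone()[0]
    print(uris, refs)  # 1 1000
```

Running it prints `1 1000`: the URI string is stored once, while each of the 1,000 pages costs only one small integer row.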

Always glad to help!
Mike

For best service when you require assistance:

  1. Check the DisallowedAbsoluteUris and Exceptions tables first.
  2. Cut and paste actual exceptions from the Exceptions table.
  3. Include screenshots.


Twitter: http://twitter.com/arachnode_net

arachnode.net provides custom crawling and contracting resources.  Please ask.



copyright 2004-2010, arachnode.net LLC
