arachnode.net v2.0
An open source .NET web crawler written in C# using SQL 2005/2008

Rookie's question

sagie.shamay posted on 06-01-2009 7:50 AM

Hi.

I've tried to figure it out, but I think it will be better to ask here:

What is the difference between a table and its discovery table (e.g., between the Images table and the Images_Discoveries table)?

More generally, what is the purpose of a discovery in the crawler?

Thanks, Sagie

Verified Answer

A Discovery is anything the crawler can discover; Images, from your example, is one type. Check the database table 'DiscoveryTypes' for the full list.

A [Discovery]_Discoveries table (e.g. Images_Discoveries) stores the 'me too' references to a Discovery. A billion pages may point to 'http://google.com', but we store the string 'http://google.com' only once, plus a billion integer references to that Discovery.
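
To make the "store the string once, reference it by integer" idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is not the actual arachnode.net schema: the column names (ID, AbsoluteUri, WebPageID, ImageID) and the helper functions are assumptions made up for illustration, and the real product runs on SQL Server rather than SQLite.

```python
import sqlite3


def create_schema(conn):
    # Hypothetical two-table layout, NOT the real arachnode.net schema.
    conn.executescript("""
        CREATE TABLE Images (
            ID          INTEGER PRIMARY KEY,
            AbsoluteUri TEXT NOT NULL UNIQUE      -- the string, stored once
        );
        CREATE TABLE Images_Discoveries (
            WebPageID INTEGER NOT NULL,           -- page that referenced it
            ImageID   INTEGER NOT NULL REFERENCES Images(ID)
        );
    """)


def record_discovery(conn, web_page_id, absolute_uri):
    """Store the URI only if it is new, then add one integer reference."""
    conn.execute(
        "INSERT OR IGNORE INTO Images (AbsoluteUri) VALUES (?)",
        (absolute_uri,),
    )
    image_id = conn.execute(
        "SELECT ID FROM Images WHERE AbsoluteUri = ?",
        (absolute_uri,),
    ).fetchone()[0]
    conn.execute(
        "INSERT INTO Images_Discoveries (WebPageID, ImageID) VALUES (?, ?)",
        (web_page_id, image_id),
    )
    return image_id


def counts(conn):
    """Return (rows in Images, rows in Images_Discoveries)."""
    images = conn.execute("SELECT COUNT(*) FROM Images").fetchone()[0]
    refs = conn.execute("SELECT COUNT(*) FROM Images_Discoveries").fetchone()[0]
    return images, refs


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    # Three different pages all point at the same image URI.
    for page_id in (1, 2, 3):
        record_discovery(conn, page_id, "http://google.com/logo.gif")
    print(counts(conn))  # (1, 3): one stored string, three references
```

In this sketch the long string lives in a single row, and every further page that points at it costs only a small integer row in the _Discoveries table.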

Always glad to help!
Mike

copyright 2009, arachnode.net LLC
