arachnode.net
An open source .NET web crawler written in C# using SQL 2005/2008

Answered (Verified) This post has 1 verified answer | 6 Replies | 2 Followers

posted on 25 Jan 2010 11:37 AM

Hi, this is solid, professional work :)

I have a bunch of questions: if I want to deploy it on multiple servers, where should I start, how many servers can I use to maximize its performance, and which components should be on each server? Also, do you know of a place, such as a website directory, where we can find servers and domains to add to the crawler database?

Thank you :)

All Replies

Top 10 Contributor
1,202 Posts

Thank you!

Would you register so I know who you are, please?  This question is a bit involved, and if you register you will be notified when the thread is updated.

An open source .NET web crawler written in C# using SQL 2005/2008.

Join the arachnode.net group on Facebook: http://www.facebook.com/groups.php?ref=sb#/group.php?gid=166721755872

Twitter: http://twitter.com/arachnode_net

arachnode.net provides custom crawling and contracting resources.  Please ask.

http://bit.ly/TOFX4

C# crawler, C# web crawler, C# site crawler

Top 150 Contributor
1 Posts

Hi,

I registered on the forums and, as you requested, I am asking my questions again :)

If I want to deploy it on multiple servers, where should I start, how many servers can I use to maximize its performance, and which components should be on each server? Also, do you know of a place, such as a website directory, where we can find servers and domains to add to the crawler database? Thank you :)

Top 10 Contributor
1,202 Posts

Important: The biggest limiting factor in AN, using the default configuration, is the speed of your database disks.

That said, how AN performs depends on which features you have turned on.  :D

If you aren't taxing the DB, the biggest limiting factor may well be your internet connection and connection hardware, specifically the number of simultaneous connections you can make.

AN currently supports one DB machine but multiple crawlers, and it can distribute each of the DownloadedImages/DownloadedFiles/DownloadedWebPages directories across any number of servers, provided you use DFS or another file system clustering technology.

So, the crawl code (the solution files) goes on the crawling machines, and the DB is restored to the DB server.

You could have three additional machines that do nothing other than provide file shares for the Discoveries (Files, Images, WebPages), thereby offloading this work from, say, the DB server.

Again, the balance of resources will depend on what you want to crawl... (wouldn't make sense to have a killer DB machine if you aren't storing tons of data...)
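
As a rough illustration of the layout described above, here is a minimal Python sketch. It is not AN's actual configuration format: the `Deployment` class, its methods, and all host names are hypothetical and exist only to make the topology concrete (one DB machine, any number of crawlers, one file share per Discoveries directory). Only the three directory names come from this thread.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Directory names as mentioned in this thread; everything else is hypothetical.
DISCOVERY_DIRECTORIES = ("DownloadedFiles", "DownloadedImages", "DownloadedWebPages")


@dataclass(frozen=True)
class Deployment:
    db_server: str               # exactly one DB machine
    crawlers: Tuple[str, ...]    # any number of crawling machines
    share_hosts: Dict[str, str]  # discovery directory -> file server host

    def validate(self) -> None:
        """Raise ValueError if the layout is missing a crawler or a file share."""
        if not self.crawlers:
            raise ValueError("at least one crawling machine is required")
        missing = [d for d in DISCOVERY_DIRECTORIES if d not in self.share_hosts]
        if missing:
            raise ValueError("no file share assigned for: " + ", ".join(missing))

    def share_path(self, directory: str) -> str:
        """Return the UNC path that crawlers would write one discovery type to."""
        return "\\\\{}\\{}".format(self.share_hosts[directory], directory)


# Hypothetical host names, for illustration only.
deployment = Deployment(
    db_server="db01",
    crawlers=("crawl01", "crawl02"),
    share_hosts={
        "DownloadedFiles": "files01",
        "DownloadedImages": "images01",
        "DownloadedWebPages": "pages01",
    },
)
deployment.validate()
print(deployment.share_path("DownloadedImages"))
```

Pointing all three directories at the same host would model a single shared file server; using three hosts, as here, matches the "three additional machines" suggestion.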

Does this answer your question?

You can check http://directory.google.com/ for sites to crawl.  AN comes pre-configured with about 1 million Priorities for WebPages, to crawl by priority, of course.

If you purchase a license (or licenses), I would be more than happy to help you set AN up across multiple machines.

 

Thank you for your great reply. How can I purchase AN 1.4? BTW, the buy link doesn't work! Should I browse the site using Firefox? And how much will it cost?

Top 10 Contributor
1,202 Posts

You are very welcome! 

Try this direct link: https://checkout.google.com/view/buy?o=shoppingcart&shoppingcart=973929896308267

Try using Firefox.  (Really surprising that the Google checkout link doesn't show up...)

Which version you need depends on how you will use it: commercial or personal.

Question: Which browser/version are you using?  (Thanks so much for telling me...)

Top 10 Contributor
1,202 Posts

Also, v2.0 should be out today.

copyright 2004-2010, arachnode.net LLC
