arachnode.net
An open source .NET web crawler written in C# using SQL 2005/2008

Getting Started


mcnisiv posted on 31 May 2009 9:32 PM

First off, thanks for this solution, and nice work.  I've read through the documentation and forums and would like some advice on the best way to crawl a commercial website such as www.homedepot.com to gather product info such as SKUs and landing pages.  Here's what I've done so far, but it doesn't seem to be working: the crawl stops prematurely.

1. I created a new entry in the CrawlRequests table specifying the URI (www.homedepot.com), level 4, and crawling restricted to the starting domain only.

I only want to gather product info and not any images or anything else.

 

Any advice would be most useful.

 

Thanks!

All Replies


Thanks!  (We are very, very close on a new build as well... :))

This sounds correct.

1.) When you say the crawl is ending prematurely, what do you mean?

2.) Are there any exceptions in the Exceptions table?

3.) For restricting what you are crawling, have you found the Configuration table?
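
To make the restrictions in the question concrete (stay on the starting domain, stop at level 4, skip images), here is a minimal, language-neutral sketch in Python. It is not arachnode.net code and does not reflect how the Configuration table works; the function name should_crawl, the constants, and the extension list are all hypothetical, and it assumes "level 4" means link depth.

```python
import posixpath
from urllib.parse import urlparse

# Hypothetical settings mirroring the question: level 4, starting domain only, no images.
MAX_DEPTH = 4
SKIPPED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".bmp", ".ico"}


def should_crawl(start_url: str, candidate_url: str, depth: int) -> bool:
    """Return True only if candidate_url passes all three restrictions."""
    # Restriction 1: do not follow links deeper than the configured level.
    if depth > MAX_DEPTH:
        return False

    start_host = urlparse(start_url).hostname
    candidate = urlparse(candidate_url)

    # Ignore non-web links such as mailto: or javascript:.
    if candidate.scheme not in ("http", "https"):
        return False

    # Restriction 2: stay on the exact host the crawl started from.
    if candidate.hostname != start_host:
        return False

    # Restriction 3: skip image files, judged by the path's file extension.
    extension = posixpath.splitext(candidate.path)[1].lower()
    return extension not in SKIPPED_EXTENSIONS
```

Note that the start URL needs a scheme (for example http://www.homedepot.com/) for the host comparison to work. A crawl that "stops prematurely" is sometimes just a filter like this rejecting more links than expected, which is worth ruling out alongside the Exceptions table.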

- Mike

 



copyright 2004-2010, arachnode.net LLC
