arachnode.net v2.0
An open source .NET web crawler written in C# using SQL 2005/2008

Web Shopping Pricing Bots solution...

Top 50 Contributor
5 Posts
demol posted on 01-28-2009 9:57 AM

Hello guys!

Congratulations, your project is incredible...

I'd like to know if it is possible to build a web shopping price-comparison site using arachnode, like http://www.pricegrabber.com/. Is it recommended?

 

Thanks!

Verified Answer

Top 10 Contributor
Male
920 Posts
arachnode.net replied on 01-28-2009 5:41 PM
Verified by arachnode.net

The answer is yes.

If you want to crawl a specific list of domains here's what you need to do:

1.) Insert your intended Domains into the DisallowedDomains table and set the column value for 'IsDisallowed' to True.

2.) Delete all rows from the DisallowedWords table. The words in this table are used to filter out adults-only content. Since you already know which sites you want to crawl, you can remove them. Also, because we're going to negate the Address CrawlRule, these words must be deleted; otherwise you'll only get the adults-only content from PriceGrabber.com, which will likely be 2 pages. (Yes, it's possible to crawl only adults-only content...)

3.) Set the value for negateIsDisallowed in the Address CrawlRule in CrawlActions.config to True.

4.) Insert your starting domains into the CrawlRequests table.

5.) Start crawling.
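
To make the reasoning in steps 2 and 3 concrete, here is a toy C# model of a negated rule turning the DisallowedDomains list into an allow-list. It is not arachnode.net's actual Address CrawlRule code: the IsDisallowed function, the sample URIs and the placeholder word are hypothetical, and the assumption that a negated rule keeps only URIs matching every non-empty list was chosen simply to reproduce the behavior described in step 2.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// HYPOTHETICAL toy model, not arachnode.net's CrawlRule implementation.
// Normal rule: reject a URI if it matches ANY non-empty list.
// Negated rule (assumed): keep a URI only if it matches EVERY non-empty list.
static bool IsDisallowed(Uri uri, ICollection<string> domains, ICollection<string> words, bool negate)
{
    bool domainHit = domains.Contains(uri.Host);
    bool wordHit = words.Any(w => uri.AbsoluteUri.IndexOf(w, StringComparison.OrdinalIgnoreCase) >= 0);

    if (!negate)
    {
        return domainHit || wordHit;
    }

    // Empty lists abstain, so deleting every DisallowedWords row removes the word check entirely.
    bool keep = (domains.Count == 0 || domainHit) && (words.Count == 0 || wordHit);
    return !keep;
}

// Stand-ins for the DisallowedDomains and DisallowedWords tables.
var domains = new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "www.pricegrabber.com" };
var noWords = new List<string>();
var someWords = new List<string> { "adult" }; // placeholder word

var product = new Uri("http://www.pricegrabber.com/some-product");
var offSite = new Uri("http://www.example.com/");

Console.WriteLine(IsDisallowed(product, domains, noWords, negate: true));   // False: listed domain is crawled
Console.WriteLine(IsDisallowed(offSite, domains, noWords, negate: true));   // True: everything else is skipped
Console.WriteLine(IsDisallowed(product, domains, someWords, negate: true)); // True: leftover words narrow the crawl
```

Under this model, once DisallowedWords is empty only the listed domains survive; leaving even one word behind narrows the crawl to pages that match both lists, which is the reason step 2 is needed.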

Then, slice and dice the incoming data however you please. Do you need additional information?

An open source .NET web crawler written in C# using SQL 2008.

Join the arachnode.net group on Facebook: http://www.facebook.com/groups.php?ref=sb#/group.php?gid=166721755872

Twitter: http://twitter.com/arachnode_net

arachnode.net provides custom crawling and contracting resources. Please ask.

http://bit.ly/TOFX4

All Replies

Top 10 Contributor
Male
920 Posts
arachnode.net replied on 01-28-2009 11:33 AM

Thanks!  We're always looking for people to use our code and help make it better, so, please do!!!

I'm swamped with work, but I will answer your question as best I can.

In the meantime, before I can break away from work to answer properly, tell me: what exactly do you want to do with a web shopping pricing site?

-Mike

Top 50 Contributor
5 Posts

I'd like to build a site that compares product prices, so I need a bot to collect products and prices from several sites.

Something similar to http://www.pricegrabber.com

Thanks for the help

 

Top 50 Contributor
5 Posts

WOW

That's amazing!

Thanks, man!!!

I'll work on that... If I need anything, I'll ask you...

:)

 

Top 10 Contributor
Male
920 Posts
arachnode.net replied on 01-29-2009 9:08 AM

Please do.  I'm working on a few enhancements as requested by other users, so please report any issues you find, etc.

You're welcome,
Mike

Top 50 Contributor
5 Posts

Hello...

I'm starting to use arachnode... Just one initial question: is it possible to use VS2008? Any major problems?

thx

Top 10 Contributor
Male
920 Posts
arachnode.net replied on 02-04-2009 9:35 AM

Great! 

There shouldn't be any problems using the complete solution if you also install SQL 2008, which is required for the Analysis Services and Integration Services projects. Let me know what you find, for example whether you can use the Functions project.

Heads up: If you're planning on using the lucene.net indexing functionality, be sure to get the latest from the SVN repository. I've been working on this code and optimizing it a great deal, so it's worth your while to get the latest rather than the tag-1.0 version.

Top 50 Contributor
5 Posts

Thanks, man!

But I'm a bit confused by arachnode...

I'm able to run the Console example and create Lucene indexes...

But I don't know how to start solving my problem...

If you could help me with some samples, that would be great. Maybe a little test that gets the product NAMES and PRICES from www.buy.com and shows them on another page...

Any help is appreciated... Thanks!

Top 10 Contributor
Male
920 Posts

Gotcha.

1.) How much coding experience do you have?

2.) How much experience do you have with regular expressions?

3.) Have you taken a look at ManageLuceneDotNetIndexes.cs?  You'll want to write a Plugin for arachnode.net that will strip out the information you require.

Take a look and get back to me.
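
On question 3: whatever the plugin plumbing looks like, its core is usually a regular expression (or an HTML parser) that pulls names and prices out of each crawled page. The C# sketch below covers only that extraction step. The markup it expects (span elements with product-name and price classes), the sample page and the ExtractProducts helper are hypothetical; it does not use arachnode.net's plugin interfaces, and a real site such as www.buy.com would need its own patterns.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

// HYPOTHETICAL extraction step for a price-comparison plugin.
// Assumes markup like: <span class="product-name">...</span> ... <span class="price">$12.34</span>
static List<(string Name, decimal Price)> ExtractProducts(string html)
{
    var pattern = new Regex(
        @"<span class=""product-name"">(?<name>[^<]+)</span>.*?<span class=""price"">\$(?<price>[0-9,]+\.[0-9]{2})</span>",
        RegexOptions.Singleline | RegexOptions.IgnoreCase);

    var results = new List<(string Name, decimal Price)>();
    foreach (Match m in pattern.Matches(html))
    {
        // Drop thousands separators and parse with a fixed culture so "1,299.00" parses on any machine.
        decimal price = decimal.Parse(m.Groups["price"].Value.Replace(",", ""), CultureInfo.InvariantCulture);
        results.Add((m.Groups["name"].Value.Trim(), price));
    }
    return results;
}

string samplePage =
    "<div><span class=\"product-name\">USB Cable</span> <span class=\"price\">$4.99</span></div>" +
    "<div><span class=\"product-name\">24\" Monitor</span> <span class=\"price\">$1,299.00</span></div>";

foreach (var (name, price) in ExtractProducts(samplePage))
{
    Console.WriteLine($"{name}: {price.ToString("0.00", CultureInfo.InvariantCulture)}");
}
```

Patterns like this are brittle: a single markup change on the target site breaks them, so it helps to keep one pattern per site and to log pages where nothing matched.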

Top 50 Contributor
5 Posts

Hi,

I'm brand new to arachnode and have a few questions that I thought I could post in this thread. I have followed this guide and http://arachnode.net/forums/t/44.aspx, but I receive an error when trying to run the application (compiling works fine):

Error: Cannot deploy. There is no database connection specified. To correct this error, add a database connection using the project properties.

In the function properties I have added the database and tested that the connection works properly. What have I missed?

Thanks in advance

Top 10 Contributor
Male
920 Posts

Hi there!

http://arachnode.net/forums/t/94.aspx

http://arachnode.net/forums/t/44.aspx

Do either of these threads help?

Are there connection strings present for the DataSource project in Properties > Settings...?

-Mike

Top 50 Contributor
5 Posts

I found out that I had forgotten to configure the database for the Types project. But now I receive a different error in the Types project:

  The type 'Domain' already exists, or you do not have permission to create it.    Types

One thing I don't understand is why you have different SqlConnection objects. Why not have a single global one that you reuse in the different objects? The current approach seems like duplicated work to me.

Top 10 Contributor
Male
920 Posts

You can safely remove the Types project from the Solution.

Multiple connection strings are an oversight on my part.  There will be one ConnectionString location in the next release.

Thanks!

Top 50 Contributor
5 Posts

Thanks, but that just created a new error:

(ArgumentNullException)

"Value cannot be null.\r\nParameter name: value"

The stack trace is:

   at System.Boolean.Parse(String value)
   at Arachnode.Configuration.ApplicationSettings.get_ClassifyAbsoluteUris() in I:\Archanode\source\Configuration\ApplicationSettings.cs:line 122
   at Arachnode.Console.Program..cctor() in I:\Archanode\source\Console\Program.cs:line 24
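
For reference, the trace itself points at the cause: System.Boolean.Parse throws ArgumentNullException ("Parameter name: value") when it is given null, and a settings getter typically passes null when the key it reads is missing from the running application's config file. So a likely first check is whether the setting read at ApplicationSettings.cs line 122 (presumably a ClassifyAbsoluteUris entry) is present in the Console project's configuration. The C# sketch below reproduces the exception and shows a null-tolerant alternative; ParseBoolSetting and its default value are hypothetical, not arachnode.net code.

```csharp
using System;

// Reproduces the exception from the trace: bool.Parse rejects null outright.
string? missingSetting = null; // what a config lookup typically returns for an absent key
try
{
    bool.Parse(missingSetting!);
}
catch (ArgumentNullException ex)
{
    Console.WriteLine($"bool.Parse(null) threw ArgumentNullException for parameter '{ex.ParamName}'");
}

// HYPOTHETICAL null-tolerant helper: falls back to a default instead of throwing
// when the setting is missing or is not a valid boolean string.
static bool ParseBoolSetting(string? raw, bool defaultValue)
{
    return bool.TryParse(raw, out bool value) ? value : defaultValue;
}

Console.WriteLine(ParseBoolSetting(null, defaultValue: false));   // False
Console.WriteLine(ParseBoolSetting("True", defaultValue: false)); // True
```

Falling back silently can hide a misconfigured install, so adding the missing key to the config file is usually the better fix; the helper just keeps a missing optional setting from crashing the static constructor.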

copyright 2009, arachnode.net LLC
