arachnode.net
An Open Source C# web crawler with Lucene.NET search using SQL Server 2008/2012/2014/2016/CE
An Open Source C# web crawler with Lucene.NET search using MongoDB/RavenDB/Hadoop

Completely Open Source @ GitHub


Help Needed

Answered (Verified) This post has 1 verified answer | 3 Replies | 2 Followers

Top 25 Contributor
Male
20 Posts
vishal posted on Tue, Oct 6 2009 3:37 PM

Hi,

I would greatly appreciate any help. I am new to AN, but I managed to get it up and running; it was actually not bad, just 4 easy steps. Now, here is what I am trying to achieve:

1) I have a list of approximately 200 web sites (job sites, job aggregators, companies with job posting pages, etc.)

2) I want to use AN to crawl each site, extract the crawled content, and store it in the database. Because each job should (hopefully) have a title, I want to tag the results appropriately; for example, a software engineer job with a C#/ASP.NET skillset might be tagged "Software Engineer", "ASP.NET", "Developer", "C#", and maybe the job location

3) Repeat this process every 2 days and update the database

4) While crawling, if the job posts contain email addresses, phone numbers, or web addresses, store them separately but link them to the crawled content

5) Run AN as a service

6) Afterwards, I want to build a web app that shows results on a web page based on user-entered criteria run over the crawled results.
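For step 4 above, standard .NET regular expressions (System.Text.RegularExpressions) can pull email addresses and phone numbers out of crawled text before the rows are stored and linked. A minimal sketch; the patterns below are deliberately simplistic and only illustrative, and the sample text is made up:

```csharp
using System;
using System.Text.RegularExpressions;

class ContactExtractor
{
    // Illustrative patterns only - production email/phone matching
    // needs more robust expressions than these.
    static readonly Regex Email = new Regex(@"[\w.+-]+@[\w-]+\.[\w.-]+\w");
    static readonly Regex Phone = new Regex(@"\(?\d{3}\)?[-. ]?\d{3}[-. ]?\d{4}");

    static void Main()
    {
        // Hypothetical snippet of crawled page text.
        string crawledText = "Contact jobs@example.com or call (555) 123-4567.";

        // Each match would be stored in its own table and linked back
        // to the crawled page it came from.
        foreach (Match m in Email.Matches(crawledText))
            Console.WriteLine("Email: " + m.Value);

        foreach (Match m in Phone.Matches(crawledText))
            Console.WriteLine("Phone: " + m.Value);
    }
}
```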

I know this is quite a lot to ask. I am also trying to get it going myself, but it would be much quicker if I could get a helping hand on this.

With Regards,

Vishal

[email protected]

Verified Answer

Top 10 Contributor
1,905 Posts
Verified by arachnode.net

Do you see the code in Program.cs where you submit CrawlRequests?  Create a CrawlRequest for each site.
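That loop might look roughly like the sketch below. The Crawler and CrawlRequest constructor overloads shown here are assumptions modeled on the sample Program.cs and vary between arachnode.net releases, and "JobSites.txt" is a hypothetical input file; mirror the exact calls your own Program.cs contains:

```csharp
using System.IO;

// Hypothetical sketch - the constructor signatures below are assumptions
// and differ between arachnode.net releases; copy the overloads already
// used in your release's Program.cs.
Crawler crawler = new Crawler(CrawlMode.BreadthFirstByPriority, false);

// "JobSites.txt" is a hypothetical file listing the ~200 sites, one
// AbsoluteUri per line.
foreach (string absoluteUri in File.ReadAllLines("JobSites.txt"))
{
    // Depth 2, restricted to the same Host - illustrative values only.
    crawler.Crawl(new CrawlRequest(new Discovery(absoluteUri), 2,
        UriClassificationType.Host, UriClassificationType.Host, 1));
}
```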

For best service when you require assistance:

  1. Check the DisallowedAbsoluteUris and Exceptions tables first.
  2. Cut and paste actual exceptions from the Exceptions table.
  3. Include screenshots.

Skype: arachnodedotnet

All Replies

Top 10 Contributor
1,905 Posts

Vishal -

This IS a lot to ask.  This is basically asking me to build an application for you.  If you would like to contract me to do the work, we can talk.

Mike


Top 25 Contributor
Male
20 Posts
vishal replied on Wed, Oct 7 2009 11:40 AM

Please help me set it up to crawl multiple sites.



copyright 2004-2017, arachnode.net LLC