arachnode.net
An Open Source C# web crawler with Lucene.NET search using SQL Server 2008/2012/2014/2016/CE or MongoDB/RavenDB/Hadoop


Switching from developer to commercial license? Pros?


posted on Sun, Nov 21 2010 10:04 AM

Currently I am using the developer license and am looking for a way to store our data directly in a SQL database. While searching, I saw that the commercial license does exactly this. However, before I switch, I would like to know how the data gets stored. More specifically, what schema does the page HTML go into? Does arachnode simply grab all of the HTML and throw it into a "text:field", or does it do some parsing and look for phone numbers, email addresses, business names, etc.? What I really want to know is what exact benefits the commercial license would give me over the current database + Lucene index. What I need is a way to parse specific tags from my crawled web pages. Please advise specifically on how web page data gets stored in the SQL database under the commercial license. Thanks

 

PS: This is a repeat of a question I posted in the forums. I didn't know whether that mattered.

Verified Answer

Verified by arachnode.net

Which forum post were you referring to regarding automatic feature extraction?

AN doesn't perform automatic feature extraction, although you can easily customize the crawl pipeline with a plugin.

If you were to upgrade to the Commercial license, I would be more than happy to show you how to parse and insert results according to your business requirements.

Thanks!
Mike

For best service when you require assistance:

  1. Check the DisallowedAbsoluteUris and Exceptions tables first.
  2. Cut and paste actual exceptions from the Exceptions table.
  3. Include screenshots.

Skype: arachnodedotnet

All Replies


replied on Tue, Nov 23 2010 11:12 PM

I don't know why this is labeled as anonymous. I am user "drodecker". A couple of questions: how would you be able to help us with the parsing and with getting the correct data? Would that be through email, conference call, or video? Also, I have seen on other forums that you can be hired, and I am wondering what your rates are and what you would need from us. Let me know if you need more detail on our requirements. Basically, we need to get phone numbers, business names, addresses, and categories from a URL and have them stored in an MS SQL database. The URL is similar to this one: http://www.merchantcircle.com/business/A.000.Bail.Bonds.410-560-0777 but the crawl starts from this page: http://www.merchantcircle.com/featured/ We have been able to scrape all the information but haven't parsed it at all, and we are looking for a fast solution, which is why I am reaching out to you. Thanks for all your help so far, and please advise on the best way forward.

replied on Wed, Nov 24 2010 9:03 AM

I don't know why this is anonymous; I keep trying to log in. I am user "drodecker". I am wondering how you would be able to help us: through email, conference call, video, etc.? I also saw on your website and in the forums that you will do contract work for people. I am wondering what your rates are and how much it would cost to meet our requirements. What we are trying to get is: 1) business name, 2) address, 3) phone number, 4) categories, from pages such as "http://www.merchantcircle.com/business/A.000.Bail.Bonds.410-560-0777". All of these pages are linked from a "site map" style page here: "http://www.merchantcircle.com/featured/". We would need to crawl all of the pages starting from the sitemap page and insert all of our required fields into an MS SQL database. Please advise on our situation and how you can help. Thanks
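To make the requirement above concrete, here is a minimal sketch of the "parse specific fields, then insert them into a SQL table" step. It is not arachnode.net's plugin API and not the commercial license's schema. The HTML markup, the class names (`business-name`, `address`, `phone`, `category`), the sample values, the `Businesses` table, and the `example.com` URL are all assumptions made up for illustration. It is written in Python using only the standard library, with an in-memory SQLite database standing in for MS SQL.

```python
import re
import sqlite3
from html.parser import HTMLParser

# Hypothetical markup with placeholder values: the real pages' tags,
# class names, and contents are not known here.
SAMPLE_HTML = """
<html><body>
  <h1 class="business-name">Example Bail Bonds</h1>
  <span class="address">123 Example St, Exampleville</span>
  <span class="phone">555-010-0100</span>
  <a class="category">Bail Bonds</a>
  <a class="category">Legal Services</a>
</body></html>
"""

# Loose pattern for 10-digit phone numbers such as 555-010-0100.
PHONE_RE = re.compile(r"\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}")


class FieldExtractor(HTMLParser):
    """Collects the text of elements whose class attribute is of interest."""

    WANTED = {"business-name", "address", "phone", "category"}

    def __init__(self):
        super().__init__()
        self.fields = {name: [] for name in self.WANTED}
        self._current = None  # class of the element currently being read

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if css_class in self.WANTED:
            self._current = css_class

    def handle_data(self, data):
        if self._current and data.strip():
            self.fields[self._current].append(data.strip())

    def handle_endtag(self, tag):
        # Simplification: any closing tag ends capture, so nested markup
        # inside a wanted element is not handled.
        self._current = None


def extract_record(html, url):
    """Turns one crawled page into a flat record of the required fields."""
    parser = FieldExtractor()
    parser.feed(html)
    f = parser.fields
    phone = PHONE_RE.search(" ".join(f["phone"]))
    return {
        "url": url,
        "name": " ".join(f["business-name"]),
        "address": " ".join(f["address"]),
        "phone": phone.group(0) if phone else None,
        "categories": ", ".join(f["category"]),
    }


def save_record(conn, record):
    """Stores a record with a parameterized statement (hypothetical table)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS Businesses ("
        "Url TEXT PRIMARY KEY, Name TEXT, Address TEXT, "
        "Phone TEXT, Categories TEXT)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO Businesses "
        "VALUES (:url, :name, :address, :phone, :categories)",
        record,
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for an MS SQL connection
record = extract_record(SAMPLE_HTML, "http://example.com/business/example")
save_record(conn, record)
row = conn.execute("SELECT Name, Phone, Categories FROM Businesses").fetchone()
```

The point of the sketch is the split between storing raw HTML in a single text column and storing parsed fields in their own columns. The same two functions could be called once per crawled page, whatever crawler produces the HTML.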



copyright 2004-2017, arachnode.net LLC