I am currently using the developer license and am looking for a way to store our data directly in a SQL database. While searching, I saw that the commercial license does exactly this. Before I switch, however, I would like to know how the data gets stored. More specifically, what page schema does the HTML go into? Does arachnode simply grab all of the HTML and put it into a single text field, or does it do some parsing and look for phone numbers, email addresses, business names, etc.?
Mostly I am wondering what exact benefits the commercial license would give me over the current database + Lucene index. What I need is a way to parse specific tags from my crawled web pages. Please advise specifically on how the web page data gets stored in the SQL database under the commercial license. Thanks
PS: This is a repeat of a question I posted on the forums. I didn't know if that mattered.
Which forum post were you referring to regarding automatic feature extraction?
AN doesn't perform automatic feature extraction, although you can easily customize the crawl pipeline with a plugin.
If you were to upgrade to the Commercial license, I would be more than happy to show you how to parse and insert results according to your business requirements.
I don't know why this is labeled as anonymous. I am user "drodecker". A couple of questions: how would you be able to help us with the parsing and with getting the correct data? Would that be through email, a conference call, or video? I have also seen on other forums that you can be hired, and I am wondering what your rates are and what you would need from us. Let me know if you need more details on our requirements. Basically, we need to get phone numbers, business names, addresses, and categories from a URL and store them in an MS SQL database. The URL is similar to this one: http://www.merchantcircle.com/business/A.000.Bail.Bonds.410-560-0777 but the crawl starts from this page: http://www.merchantcircle.com/featured/ We have been able to scrape all the information but haven't parsed any of it, and we are looking for a fast solution, which is why I am reaching out to you. Thanks for all your help so far, and please advise on the best approach.
I don't know why this shows as anonymous; I keep trying to log in. I am user "drodecker". I am wondering how you would be able to help us: through email, a conference call, video, etc.? I also saw on your website and in the forums that you will do contract work for people. I am wondering what your rates are and how much our requirements would cost. What we are trying to get is: 1) Business name 2) Address 3) Phone number 4) Categories from pages such as this: "http://www.merchantcircle.com/business/A.000.Bail.Bonds.410-560-0777" All of these pages are linked from a similar "site map" page, which is here: "http://www.merchantcircle.com/featured/". What we need is to crawl all of the pages starting from the site map page and insert all of our required fields into an MS SQL database. Please advise on our situation and how you can help. Thanks
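To make the requirement concrete, here is a minimal sketch in Python of the parse-and-insert step for a single listing URL. It is not arachnode.net code and does not use its plugin API. The function names, the `listings` table, and its columns are hypothetical. It assumes the last path segment ends in a `NNN-NNN-NNNN` phone number and that the dots before it separate the words of the business name, which is only a guess based on the one example URL. SQLite stands in for MS SQL so the sketch is self-contained, and `INSERT OR REPLACE` is SQLite syntax. Address and categories would have to come from the page HTML, which is not shown here.

```python
import re
import sqlite3
from urllib.parse import urlparse

# Assumed slug shape: "<name words joined by dots>.<NNN-NNN-NNNN>"
LISTING_RE = re.compile(r"^(?P<slug>.+?)\.(?P<phone>\d{3}-\d{3}-\d{4})$")


def parse_listing_url(url):
    """Return name/phone/url for a listing URL, or None if it does not match."""
    segment = urlparse(url).path.rstrip("/").rsplit("/", 1)[-1]
    match = LISTING_RE.match(segment)
    if match is None:
        return None
    return {
        "url": url,
        # Assumption: dots in the slug stand for spaces in the business name.
        "name": match.group("slug").replace(".", " "),
        "phone": match.group("phone"),
    }


def store_listing(conn, listing):
    """Insert one parsed listing; the URL is the key, so re-crawls overwrite."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS listings "
        "(url TEXT PRIMARY KEY, name TEXT, phone TEXT)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO listings (url, name, phone) "
        "VALUES (:url, :name, :phone)",
        listing,
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stand-in for the MS SQL database
    listing = parse_listing_url(
        "http://www.merchantcircle.com/business/A.000.Bail.Bonds.410-560-0777"
    )
    if listing is not None:
        store_listing(conn, listing)
    print(conn.execute("SELECT name, phone FROM listings").fetchall())
```

Pages that do not look like a listing, such as the /featured/ start page, return None and are skipped, so the same function could be called on every crawled URL.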