Basically, I am looking to extract web data from some publicly available websites. By web data I mean all the data available in a portal (e.g., I want to extract all the digital camera makes, models, features and prices from Amazon.com, including the dynamic data that is generated when a search is submitted). Can I use Arachnode for this purpose?
Absolutely. I crawl several major sites with this exact purpose.
We are using the trial version of Arachnode.Net to see whether it meets the requirements described above. With the trial version we are able to crawl and download the static HTML pages of a particular website, but it fails to gather dynamic data (typically the search results on a product page of an e-commerce website).
Is the algorithm for discovering dynamic pages part of the Arachnode code (licensed version)?
If yes, do you have a write-up or demo to guide us in using Arachnode.Net to extract dynamic data?
We are in the final stages of evaluating your software for purchase.
I haven't made an official write-up since you are the second person to seriously inquire about dynamic rendering.
Here's how it works:
Enable the Renderers in the Crawler constructor, second parameter.
Then, when you create a CrawlRequest, enable Dynamic rendering for the CR and its children.
Note that you probably won't be able to hover over properties of CrawlRequest.HtmlDocument in the debugger, due to the work I have done to allow multiple threads to independently use the IE rendering engine. (You're not really supposed to be able to do this...) But you can assign variables from the HtmlDocument property and this will work. (string innerHtml = crawlRequest.HtmlDocument.Body.InnerHtml, etc.)
When we set the enableRenderers parameter to true and set the RenderType to Dynamic, the crawler keeps on running (till I stopped it manually after 20 mins). But, the Files, Images and WebPages folders are empty. Could you please let us know what we are doing wrong after analysing the code snippets pasted below?
Check your ApplicationSettings class, or cfg.Configuration. You likely switched build flavors and your Files, etc. are in the \Demo folder.
We have not made any changes to either the ApplicationSettings class or the cfg.Configuration table. Would it be possible for you to assist us over Skype in setting up Arachnode for crawling dynamic pages? If yes, please let us know your available time and Skype ID by sending an email to [email protected]. This will help us decide quickly on purchasing your product.
I will make a video tonight using the demo code showing you how to crawl Dynamically. I do not have Skype.
Please provide me with an AbsoluteUri that contains dynamic content. (e.g. http://amazon.com/etc.)
Thank you for the reply.
You can show us a demo on this Website http://www.carsales.com.au/
I wouldn't use Dynamic (AJAX and forms submission) for this webpage. The Dynamic modes for AN are for use in 1.) submitting form variables to a website and 2.) rendering AJAX content.
I would learn how to create query strings and submit direct requests for data.
Like this: http://www.carsales.com.au/all-cars/results.aspx?PriceTo=442&Ntt=red&tsrc=allcarhome&keywords=red&N=1216+1246+1247+1252+1282+4294967249+461+442&PriceFrom=461&Ntk=CarAll&Dx=mode+matchany&Nne=15&Ntx=mode+matchallpartial&D=red
You can change a portion of the query string to the correct parameters, and vary your searches this way.
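To illustrate varying the query string, here is a minimal sketch in Python (not Arachnode.Net code). It takes the results URL quoted above and swaps selected parameters. The helper names `with_params` and `search_urls` are hypothetical, and the guess that `keywords`, `Ntt` and `D` all carry the search term is an assumption based only on the value `red` appearing in each of them; check it against the site before relying on it.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Results URL taken from the post above.
BASE_URL = (
    "http://www.carsales.com.au/all-cars/results.aspx"
    "?PriceTo=442&Ntt=red&tsrc=allcarhome&keywords=red"
    "&N=1216+1246+1247+1252+1282+4294967249+461+442&PriceFrom=461"
    "&Ntk=CarAll&Dx=mode+matchany&Nne=15&Ntx=mode+matchallpartial&D=red"
)


def with_params(url, **overrides):
    """Return url with the given query parameters replaced or added."""
    parts = urlsplit(url)
    # parse_qsl decodes '+' to a space; urlencode encodes it back to '+'.
    params = dict(parse_qsl(parts.query, keep_blank_values=True))
    params.update(overrides)
    return urlunsplit(parts._replace(query=urlencode(params)))


def search_urls(base_url, terms):
    """Yield one results URL per search term.

    Assumption: keywords, Ntt and D all hold the search term.
    """
    for term in terms:
        yield with_params(base_url, keywords=term, Ntt=term, D=term)


if __name__ == "__main__":
    for url in search_urls(BASE_URL, ["blue", "white"]):
        print(url)
```

Each generated URL can then be requested directly as an ordinary static page, so no dynamic rendering is involved.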
So, if you have the query strings available, don't use the Dynamic mode. Unless, of course, you can't figure it out, but this scheme doesn't look that difficult.
Been an exhausting day, today. I do recommend that you do not use the IE DOM for this site, but if you do have another one that may be more appropriate I can take a look and give you my recommendations, likely Saturday morning.
You can try any of the websites given below. Also, I did not understand your statement "I do recommend that you do not use the IE DOM for this site", or the context in which you suggested it. Sorry for my lack of knowledge.