I'm using RenderType.Dynamic, which parses sites using the built-in renderer (thanks!). It is, however, very slow: on the order of 20-30 seconds. In particular, the delay occurs on the following lines in Arachnode.SiteCrawler.Managers.DataManager.ProcessCrawlRequest:
crawlRequest.Data = crawlRequest.Encoding.GetBytes(crawlRequest.HtmlDocument.Body.OuterHtml);
crawlRequest.DecodedHtml = crawlRequest.HtmlDocument.Body.OuterHtml;
These lines run after the page has loaded. A diagnostic that runs before them, which gets and displays the document in Arachnode.Renderer.Renderer.Render2 using webBrowser1.Document.Body.InnerHtml, takes only milliseconds.
Any ideas of what could be slowing it down?
Also note that changing the above lines to:
crawlRequest.DecodedHtml = crawlRequest.HtmlDocument.Body.OuterHtml;
crawlRequest.Data = crawlRequest.Encoding.GetBytes(crawlRequest.DecodedHtml);
cuts the delay in half.
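The halving is consistent with the OuterHtml getter being re-evaluated on every access, so the original ordering pays for it twice. Here is a minimal sketch of that effect, assuming the getter is the expensive part. SlowElement and its 200 ms delay are hypothetical stand-ins for illustration, not the real HtmlElement or any arachnode.net type:

```csharp
using System;
using System.Diagnostics;
using System.Text;
using System.Threading;

// Hypothetical stand-in for an element whose OuterHtml getter is expensive.
// This is NOT the real HtmlElement or an arachnode.net class.
class SlowElement
{
    // Counts how many times the getter has been evaluated.
    public int Reads { get; private set; }

    public string OuterHtml
    {
        get
        {
            Reads++;
            Thread.Sleep(200); // assumed cost of serializing the DOM
            return "<body>hello</body>";
        }
    }
}

static class Program
{
    static void Main()
    {
        var body = new SlowElement();
        var stopwatch = Stopwatch.StartNew();

        // Original ordering: the getter is evaluated twice.
        byte[] data = Encoding.UTF8.GetBytes(body.OuterHtml);
        string decodedHtml = body.OuterHtml;
        Console.WriteLine("two reads: {0} ms, getter calls: {1}, bytes: {2}",
            stopwatch.ElapsedMilliseconds, body.Reads, data.Length);

        body = new SlowElement();
        stopwatch.Restart();

        // Reordered: evaluate the getter once and reuse the string.
        decodedHtml = body.OuterHtml;
        data = Encoding.UTF8.GetBytes(decodedHtml);
        Console.WriteLine("one read: {0} ms, getter calls: {1}, bytes: {2}",
            stopwatch.ElapsedMilliseconds, body.Reads, data.Length);
    }
}
```

If the real getter behaves like this, the remaining half of the delay would be the cost of a single OuterHtml evaluation, which is where further profiling would need to look.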
First question: Who are you?
For best service when you require assistance:
Rob ([email protected]
Using the latest from SVN with no custom code, set the crawler to 1 thread and crawl Google's homepage.
How long does it take? I have a project that crawls using the Renderers, and render speed is what you would expect from browsing.