arachnode.net
An Open Source C# web crawler with Lucene.NET search using SQL Server 2008/2012/2014/2016/CE and MongoDB/RavenDB/Hadoop

Renderer response speed.

Answered (Not Verified) | 3 Replies | 1 Follower

posted on Mon, Nov 1 2010 8:12 AM

I'm using RenderType.Dynamic, which parses sites using the built-in renderer (thanks!). It is, however, very slow, on the order of 20-30 seconds. In particular, the delay occurs at the following lines in Arachnode.SiteCrawler.Managers.DataManager.ProcessCrawlRequest:

crawlRequest.Data = crawlRequest.Encoding.GetBytes(crawlRequest.HtmlDocument.Body.OuterHtml);
crawlRequest.DecodedHtml = crawlRequest.HtmlDocument.Body.OuterHtml;

These lines execute after the page has loaded, yet a diagnostic placed just before them, which gets and displays the document in Arachnode.Renderer.Renderer.Render2 using webBrowser1.Document.Body.InnerHtml, takes only milliseconds.

Any ideas of what could be slowing it down?

Also note that reordering the lines as follows:

crawlRequest.DecodedHtml = crawlRequest.HtmlDocument.Body.OuterHtml;
crawlRequest.Data = crawlRequest.Encoding.GetBytes(crawlRequest.DecodedHtml);

cuts the delay in half.
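A plausible explanation for the speed-up (my assumption, not confirmed against the arachnode.net source): each read of an mshtml property such as Body.OuterHtml crosses the COM interop boundary and re-serializes the DOM, so the original ordering pays that cost twice. The reordered version reads the property into a managed string once and reuses it, which is why it roughly halves the delay. A minimal sketch of that idea, assuming the crawlRequest fields shown above:

```csharp
// Hypothesis: Body.OuterHtml is a COM interop call that re-serializes the
// entire DOM on every access, so it should be read exactly once.
string outerHtml = crawlRequest.HtmlDocument.Body.OuterHtml; // single COM call

crawlRequest.DecodedHtml = outerHtml;                          // reuse managed string
crawlRequest.Data = crawlRequest.Encoding.GetBytes(outerHtml); // encode from the cache
```

Caching into a local also makes it explicit that DecodedHtml and Data are derived from the same snapshot of the document.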

 

All Replies


First question: Who are you?

For best service when you require assistance:

  1. Check the DisallowedAbsoluteUris and Exceptions tables first.
  2. Cut and paste actual exceptions from the Exceptions table.
  3. Include screenshots.

Skype: arachnodedotnet

replied on Mon, Nov 1 2010 9:49 AM

Rob ([email protected]). We are contracted by a customer using Arachnode (licensed, of course) to harvest commercial data from a list of specific sites. The data is public, though not always easy to crawl: it is buried in menus and maps and occasionally populated via Ajax.


Using the latest from SVN, set the crawler to 1 thread and crawl Google's homepage (no custom code).

How long does that take? I have a project that crawls using the Renderers, and render speed is what you would expect from normal browsing.



copyright 2004-2017, arachnode.net LLC