arachnode.net
An Open Source C# web crawler with Lucene.NET search using SQL 2005/2008/CE

How do I keep AN.Next alive and ready for new CrawlRequests after a Crawl has completed...


arachnode.net posted on Fri, Mar 23 2012 9:49 AM

Let's say you'd like to keep AN.Next alive and waiting for new CrawlRequests, but don't want to instantiate a new Crawler each time your calling application has something for AN.Next to do...

Look in CoreConfiguration.xml.

  <KeepCrawlerAlive value="true" />

 

Look in Crawler.cs.

        /// <summary>
        /// Keeps the Crawler threads active even when there are no CrawlRequests to crawl.
        /// </summary>
        public bool KeepCrawlerAlive { get; set; }

 

                    if (!areAnyThreadsCrawling)
                    {
                        if (CrawlRequestsToCrawl.Count == 0 && !KeepCrawlerAlive)
                        {
                            break;
                        }
                    }
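
To make the effect of that exit test concrete, here is a minimal, self-contained sketch of the same idea. It is not arachnode.net code: MiniCrawler, Enqueue, Stop, CrawledCount and KeepAliveDemo are hypothetical names invented for this illustration, and the real Crawler manages multiple threads and far more state. The only thing borrowed from the excerpt is the rule that an empty queue ends the loop unless KeepCrawlerAlive is true.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical stand-in for a crawl loop; NOT arachnode.net's Crawler class.
public sealed class MiniCrawler
{
    private readonly ConcurrentQueue<string> _crawlRequestsToCrawl = new ConcurrentQueue<string>();
    private volatile bool _stopRequested;
    private int _crawledCount;

    // Plays the role of KeepCrawlerAlive: when true, an empty queue does not end the loop.
    public bool KeepCrawlerAlive { get; set; }

    public int CrawledCount { get { return Volatile.Read(ref _crawledCount); } }

    public void Enqueue(string absoluteUri) { _crawlRequestsToCrawl.Enqueue(absoluteUri); }

    public void Stop() { _stopRequested = true; }

    public void Run()
    {
        while (!_stopRequested)
        {
            string absoluteUri;
            if (_crawlRequestsToCrawl.TryDequeue(out absoluteUri))
            {
                // A real crawler would download and process the page here.
                Interlocked.Increment(ref _crawledCount);
                continue;
            }

            // Queue is empty: leave the loop only when we were not asked to stay alive.
            if (!KeepCrawlerAlive)
            {
                break;
            }

            Thread.Sleep(10); // idle until the caller enqueues more work or calls Stop()
        }
    }
}

public static class KeepAliveDemo
{
    // Returns the number of requests processed by a single long-lived crawler instance.
    public static int Run()
    {
        var crawler = new MiniCrawler { KeepCrawlerAlive = true };
        var worker = new Thread(crawler.Run);

        crawler.Enqueue("http://example.com/a");
        worker.Start();
        SpinWait.SpinUntil(() => crawler.CrawledCount == 1, 2000);

        // The queue is drained, but the worker is still waiting for more work.
        Console.WriteLine("Alive while idle: " + worker.IsAlive);

        // A later request is picked up without creating a new crawler.
        crawler.Enqueue("http://example.com/b");
        SpinWait.SpinUntil(() => crawler.CrawledCount == 2, 2000);

        crawler.Stop();
        worker.Join();
        return crawler.CrawledCount;
    }
}
```

Calling KeepAliveDemo.Run() from a console application's Main exercises the sketch. With KeepCrawlerAlive left at false, Run() on the crawler would return as soon as the queue drained, which is the behaviour the break in the excerpt produces.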

 

Also, if you elect to use the Renderers, as in AN 2.6, by enabling the following Plugin...

            //_crawler.CrawlRequestPlugins += crawlRequestPlugins.DownloadDataAndDecodeHtml;
            _crawler.CrawlRequestPlugins += crawlRequestPlugins.DownloadDataRenderAndDecodeHtml;

...look in Program.cs for the following code:

 

            /**/
            //necessary for the Rendering functionality if you have enabled the Plugin 'DownloadDataRenderAndDecodeHtml'
            //if (_crawler != null)
            //{
            //    //may be null if all configuration settings are not initialized in the database
            //    while (!_hasCrawlCompleted)
            //    {
            //        Application.DoEvents();
            //    }
            //}
            /**/

 

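You'll need to uncomment this block to prevent a COM context-switching exception that may occur on longer-running crawls. The loop keeps the main thread processing Windows messages via Application.DoEvents() until the crawl has completed.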

 



copyright 2004-2013, arachnode.net LLC