arachnode.net
An Open Source C# web crawler with Lucene.NET search using SQL 2005/2008/CE

Crawlrequests not being crawled, saved in dbo.CrawlRequests instead


sebastian_h posted on Wed, Feb 2 2011 6:40 AM

Hi Mike,

I've been trying to automate the spider program and ran into a problem: after I inject two URLs to be crawled and start the engine, the CrawlRequests are not processed by the crawl threads. Instead they are saved to the dbo.CrawlRequests table, and arachnode.net then exits. I've saved the output to a text file:

8468.x.txt

This was caused by my commenting out a Console.ReadLine() in the following block of code:

----------------------------------------------------------------------------------------------------------------------------------

                    _stopwatch.Start();

                    //add all CrawlRequests before starting the Engine...
                    _crawler.Engine.Start();
                }
            }
            catch (System.Exception exception)
            {
                System.Console.WriteLine(exception.Message);
                System.Console.WriteLine(exception.StackTrace);
            }

            //necessary for the Rendering functionality.
            //if you have instantiated the Crawler using: _crawler = new Crawler(false);, then this section may be commented out.
            //while (!_hasCrawlCompleted)
            //{
            //    Application.DoEvents();
            //}

            System.Console.ReadLine();

            if (_crawler != null && _crawler.Engine != null)
            {
                _crawler.Engine.Stop();
            }


            //if you would like to view Files and Images when running the Web project, see here: http://arachnode.net/forums/p/1027/12031.aspx
        }

--------------------------------------------------------------------------------------------------------------------------------------

 

I'm wondering whether the ReadLine is necessary, and why commenting it out stops the crawler from processing CrawlRequests. I'd like to remove it so that crawls run with no user input at all and are driven entirely by config entries in a database.

Thanks.

Sebastian

All Replies


OK. The ReadLine isn't an error; it pauses the console so you can read the crawler output at the end of the crawl.

Create a variable such as 'bool IsCrawling' and set it to true when you start the Crawler. In the OnCrawlComplete method, set it to false. In place of the ReadLine, loop on while (IsCrawling) and sleep inside the loop.
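
For reference, here is a minimal sketch of that flag-and-sleep pattern. In the code quoted in the question, once the ReadLine is removed nothing stands between Engine.Start() and Engine.Stop(), so the main thread needs something else to wait on. FakeEngine and its CrawlCompleted event are hypothetical stand-ins used only to make the sample self-contained; they are not the arachnode.net API. In the real Program.cs the flag would be cleared from whatever crawl-completed handler your version provides.

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for the crawler engine; NOT the arachnode.net API.
// It only simulates "work on a background thread, then a completion callback".
public sealed class FakeEngine
{
    public event Action CrawlCompleted;

    public void Start()
    {
        var worker = new Thread(() =>
        {
            Thread.Sleep(200);               // pretend to crawl
            Action handler = CrawlCompleted;
            if (handler != null) handler();  // signal completion
        });
        worker.IsBackground = true;          // does not keep the process alive by itself
        worker.Start();
    }

    public void Stop()
    {
        Console.WriteLine("Engine stopped.");
    }
}

public static class Program
{
    // volatile so the main thread sees the write made by the worker thread.
    private static volatile bool _isCrawling;

    public static bool RunCrawl()
    {
        var engine = new FakeEngine();
        engine.CrawlCompleted += OnCrawlCompleted;

        _isCrawling = true;   // set before Start() so a fast completion is not missed
        engine.Start();

        // Replaces Console.ReadLine(): block until the completion handler runs.
        while (_isCrawling)
        {
            Thread.Sleep(100);
        }

        engine.Stop();
        return !_isCrawling;  // true once the crawl has completed
    }

    private static void OnCrawlCompleted()
    {
        _isCrawling = false;
    }

    public static void Main()
    {
        RunCrawl();
    }
}
```

The sleep interval is arbitrary. A ManualResetEvent would avoid polling, but the flag matches the suggestion above and is easy to drop into the existing console program.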

Thanks,
Mike

For best service when you require assistance:

  1. Check the DisallowedAbsoluteUris and Exceptions tables first.
  2. Cut and paste actual exceptions from the Exceptions table.
  3. Include screenshots.

Skype: arachnodedotnet


copyright 2004-2013, arachnode.net LLC