Hi Mike,
I've been trying to automate the spider program and ran into a problem: after I inject two URLs to be crawled and the engine starts, the CrawlRequests are not processed by the crawl threads. Instead they are saved to the dbo.CrawlRequest table, and arachnode then exits. I've saved the output to a text file:
8468.x.txt
This was caused by me commenting out a Console.ReadLine() in the following block of code:
----------------------------------------------------------------------------------------------------------------------------------
            _stopwatch.Start();

            //add all CrawlRequests before starting the Engine...
            _crawler.Engine.Start();
        }
    }
    catch (System.Exception exception)
    {
        System.Console.WriteLine(exception.Message);
        System.Console.WriteLine(exception.StackTrace);
    }

    //necessary for the Rendering functionality.
    //if you have instantiated the Crawler using: _crawler = new Crawler(false);, then this section may be commented.
    //while (!_hasCrawlCompleted)
    //{
    //    Application.DoEvents();
    //}

    System.Console.ReadLine();

    if (_crawler != null && _crawler.Engine != null)
    {
        _crawler.Engine.Stop();
    }

    //if you would like to view Files and Images when running the Web project, see here: http://arachnode.net/forums/p/1027/12031.aspx
}
--------------------------------------------------------------------------------------------------------------------------------------
I'm wondering whether the ReadLine is necessary, and why commenting it out stops the crawler from processing CrawlRequests. I'd like to remove it so that crawls run with no user input at all and depend entirely on config entries in a database.
Thanks.
Sebastian
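OK. This isn't an error: the ReadLine pauses the console so you can read the crawler output at the end of the crawl. Judging from your snippet, once it is commented out, execution falls straight through from _crawler.Engine.Start() to _crawler.Engine.Stop(), so the Engine is stopped and the program exits before the crawl threads get to the CrawlRequests.
Create a variable such as 'bool IsCrawling' and set it to true when you start the Crawler. In the OnCrawlComplete method, set it to false. In place of the ReadLine, loop on while (IsCrawling) and sleep.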
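A rough sketch of that wait loop, in case it helps. This is not arachnode code: FakeEngine, its CrawlCompleted event, OnCrawlCompleted and _isCrawling are hypothetical stand-ins so the example runs on its own. In the real project you would set the flag from whatever completion callback your version of arachnode.net exposes (the OnCrawlComplete method mentioned above).

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the crawler engine (NOT the arachnode.net API):
// Start() returns immediately and the work finishes later on another thread.
class FakeEngine
{
    public event EventHandler CrawlCompleted;

    public void Start()
    {
        Task.Run(() =>
        {
            Thread.Sleep(500); // pretend to crawl for half a second
            CrawlCompleted?.Invoke(this, EventArgs.Empty);
        });
    }

    public void Stop()
    {
        Console.WriteLine("Engine stopped.");
    }
}

static class Program
{
    // volatile so the main thread sees the write made on the callback thread.
    private static volatile bool _isCrawling;

    // Hypothetical completion handler; plays the role of OnCrawlComplete.
    private static void OnCrawlCompleted(object sender, EventArgs e)
    {
        _isCrawling = false;
    }

    static void Main()
    {
        var engine = new FakeEngine();
        engine.CrawlCompleted += OnCrawlCompleted;

        // Set the flag before Start() so a very fast completion is not missed.
        _isCrawling = true;
        engine.Start();

        // Takes the place of System.Console.ReadLine(): wait, with no user
        // input, until the completion handler clears the flag.
        while (_isCrawling)
        {
            Thread.Sleep(100);
        }

        engine.Stop();
    }
}
```

Run as a console app, this waits about half a second and then prints "Engine stopped." with no key press needed. A ManualResetEventSlim would avoid polling, but the flag-and-sleep version is the closest match to the suggestion above.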
Thanks,
Mike