arachnode.net
An Open Source C# web crawler with Lucene.NET search using SQL Server 2008/2012/2014/2016/CE
An Open Source C# web crawler with Lucene.NET search using MongoDB/RavenDB/Hadoop

Completely Open Source @ GitHub

Crawl Freezes In the Middle of Crawl

Top 10 Contributor
35 Posts
flash posted on Sat, Jan 7 2012 11:48 PM

Hi,

I have been crawling 4 sites for a year now.

Recently, the crawls seem to freeze partway through, on all sites.
This issue first appeared on one of the sites, and now all of them are "infected".
I have tried running the crawler from the Console application for testing, but that did not resolve the issue.

I get no exceptions or errors. The window just stops updating (no new lines of data appear), while the console window is still alive (I can send "Enter" manually).

Thank you in advance,
Runny

Answered (Verified) Verified Answer

Top 10 Contributor
1,905 Posts
Verified by arachnode.net

My instincts tell me you are being throttled.

Your version lacks much of the auto-throttling functionality that the later versions have. AN 2.6 detects how fast the web servers want to serve you data and slows itself down so you don't get blocked, which sounds like what is happening with your crawls.
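The auto-throttling described above can be pictured as a per-host delay that grows when a server pushes back and shrinks slowly while it stays healthy. The Python sketch below is a minimal illustration under stated assumptions, not arachnode.net's actual code: the class name `AdaptiveThrottle`, the choice to treat HTTP 429/503 or a response slower than `slow_response` seconds as a "back off" signal, and all default numbers are hypothetical.

```python
import time
from dataclasses import dataclass, field


@dataclass
class AdaptiveThrottle:
    """Hypothetical per-host throttle (NOT arachnode.net's implementation).

    Doubles the delay for a host when it signals overload (HTTP 429/503 or a
    slow response) and lowers it in small steps while responses are healthy.
    """
    min_delay: float = 0.5        # assumed floor, seconds between requests to one host
    max_delay: float = 60.0       # assumed ceiling for the back-off
    slow_response: float = 5.0    # assumed response time that counts as "server struggling"
    delays: dict = field(default_factory=dict)  # host -> current delay in seconds

    def delay_for(self, host: str) -> float:
        # Hosts we have not heard from yet start at the floor.
        return self.delays.get(host, self.min_delay)

    def record(self, host: str, status: int, elapsed: float) -> float:
        """Update and return the delay for `host` after one response."""
        delay = self.delay_for(host)
        if status in (429, 503) or elapsed > self.slow_response:
            delay = min(delay * 2, self.max_delay)    # back off quickly
        else:
            delay = max(delay - 0.1, self.min_delay)  # recover slowly
        self.delays[host] = delay
        return delay

    def wait(self, host: str) -> None:
        # Sleep for the host's current delay before issuing the next request.
        time.sleep(self.delay_for(host))
```

A crawl loop would call `throttle.wait(host)` before each request and `throttle.record(host, status, elapsed)` after it. Doubling the delay on trouble and recovering in small steps (similar in spirit to TCP's additive-increase/multiplicative-decrease) backs off fast once a server complains, without jumping straight back to the speed that got the crawler blocked.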

Another thing to look at: how much data do you have in your DBs? Is there any chance you are exceeding what your machine(s) can handle?

Make sure you aren't running into any of the conditions here: http://arachnode.net/blogs/arachnode_net/archive/2010/08/27/memory-conditions-to-avoid.aspx

Mike

For best service when you require assistance:

  1. Check the DisallowedAbsoluteUris and Exceptions tables first.
  2. Cut and paste actual exceptions from the Exceptions table.
  3. Include screenshots.

Skype: arachnodedotnet

All Replies

Top 10 Contributor
1,905 Posts

When was the last time you updated your source? 

I haven't seen AN freezing or heard of it freezing in two years.  ???

Any logging/specific information you could provide would help a lot.
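A freeze with no exceptions often means every worker thread is blocked somewhere (for example, waiting on a network read or a lock), so the most useful log is a stack trace captured at the moment progress stops. The sketch below shows one way to get that in Python. `StallWatchdog`, its `progress()` hook and the 120-second default are hypothetical and not part of arachnode.net; the C# crawler would need equivalent hooks of its own.

```python
import faulthandler
import sys
import threading
import time


class StallWatchdog:
    """Hypothetical helper (not part of arachnode.net): reports when the crawl
    makes no progress for `stall_seconds`, turning a silent freeze into a
    log entry that includes every thread's stack trace."""

    def __init__(self, stall_seconds=120.0, on_stall=None):
        self.stall_seconds = stall_seconds
        self.on_stall = on_stall or self._dump_stacks
        self._last_progress = time.monotonic()
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def progress(self):
        """Call this each time a page finishes crawling."""
        with self._lock:
            self._last_progress = time.monotonic()

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

    def _run(self):
        # Poll a few times per stall window; Event.wait is an interruptible sleep.
        interval = max(self.stall_seconds / 4, 0.01)
        while not self._stop.wait(interval):
            with self._lock:
                idle = time.monotonic() - self._last_progress
            if idle >= self.stall_seconds:
                self.on_stall(idle)
                self.progress()  # reset so one freeze is not reported on every poll

    @staticmethod
    def _dump_stacks(idle):
        print(f"No progress for {idle:.0f}s; thread stacks follow:", file=sys.stderr)
        faulthandler.dump_traceback(file=sys.stderr, all_threads=True)
```

Call `watchdog.start()` when the crawl begins and `watchdog.progress()` each time a page is processed. If the dumped stacks show threads parked inside a socket read, a request with no timeout is a likely suspect, since such a read can wait indefinitely without raising anything.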

Top 10 Contributor
1,905 Posts

Also, depending on how aggressively you are crawling, you may be blocked by the websites, your ISP, or an intermediary.

Some home routers also cap the number of simultaneous outbound connections.
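If a router or an intermediary caps simultaneous outbound connections, a crawler running many threads can silently exhaust that cap. One way to stay under it is to bound concurrency both overall and per host. The sketch below is a hypothetical Python illustration: the `ConnectionLimiter` name and the 16/2 defaults are assumptions, not arachnode.net settings.

```python
import threading
from collections import defaultdict
from contextlib import contextmanager


class ConnectionLimiter:
    """Hypothetical limiter (not an arachnode.net API): caps simultaneous
    outbound connections overall and per host."""

    def __init__(self, max_total=16, max_per_host=2):
        self._total = threading.BoundedSemaphore(max_total)
        # One semaphore per host, created lazily on first use.
        self._per_host = defaultdict(lambda: threading.BoundedSemaphore(max_per_host))
        self._lock = threading.Lock()  # guards creation of per-host semaphores

    @contextmanager
    def slot(self, host):
        with self._lock:
            host_sem = self._per_host[host]
        # Take the per-host slot first, then a global one; both are released
        # on exit, even if the request raises.
        with host_sem, self._total:
            yield
```

Each request would be wrapped in `with limiter.slot(host): ...`. The per-host semaphore is acquired before the global one so that threads queued behind a busy host do not hold global slots that requests to other hosts could use.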

Top 10 Contributor
35 Posts
flash replied on Tue, Jan 10 2012 5:03 AM

Thank you for your response.

The issue started two months ago on one of the 4 sites we are crawling and has been escalating ever since.

Maybe this will help:
We disabled crawling on the site where the issue started for about a month. Then we reran it. Strangely enough, a few new WebPages were found before it froze again.

Anyway, are there any configuration changes you would suggest to help us debug this issue?

Thanks,
Runny

Top 10 Contributor
35 Posts
flash replied on Tue, Jan 10 2012 5:41 AM

I have decided to try upgrading the version we are currently using.
According to the [cfg].[Version] table, our installed version is 2.0.0.0.

For some reason, I cannot download the newest version from the Downloads (Licensed) section (I get a page error and the page refreshes), and the username/password given for Subversion does not work.

Please help,
Runny

Top 10 Contributor
35 Posts
flash replied on Wed, Jan 11 2012 5:33 AM

I will look into the conditions, Thanks.

I can't download a new version from the Licensed section. Whenever I click "download", the page refreshes and nothing happens (in both Firefox and IE).
Am I doing something wrong? Is it a license issue?

Thanks,
Runny

Top 10 Contributor
1,905 Posts

There is only SVN access now.

Top 10 Contributor
35 Posts
flash replied on Thu, Jan 12 2012 12:38 AM

That doesn't seem to be working either.

I can access the repository, but the username and password seem to be faulty.

Top 10 Contributor
1,905 Posts

Try again?  I reset your password.

Top 10 Contributor
35 Posts
flash replied on Sat, Jan 14 2012 11:17 PM

It still isn't working.

I have tried using IE, Firefox, Chrome and TortoiseSVN.

All fail.

Top 10 Contributor
1,905 Posts

I reset your password again. I don't know what to tell you.

Could you post a screenshot of the error, please?

Top 10 Contributor
35 Posts
flash replied on Tue, Jan 17 2012 10:46 PM

Grrrrrrrrr...

I get no error; the username and password prompt keeps popping back up no matter what I enter.
I tried my own username/password and the combination mentioned on the download page.

I have an FTP server at ftp.spock.info. Could you upload it there?

Thanks

Top 10 Contributor
1,905 Posts

Yes.  Email me the creds.

Top 10 Contributor
35 Posts
flash replied on Mon, Jan 23 2012 4:28 AM

I have emailed you the credentials.

In any case, the FTP details are:

Host: ftp.spock.info
User/Password: 

Just put the files wherever you like.
We have a 1MB upload limit in our office, so it might take a while to upload.

Thanks,
Runny

copyright 2004-2017, arachnode.net LLC