arachnode.net
An Open Source C# web crawler with Lucene.NET search using SQL Server 2008/2012/2014/2016/CE
An Open Source C# web crawler with Lucene.NET search using MongoDB/RavenDB/Hadoop

Completely Open Source @ GitHub


Index was out of range. Must be non-negative and less than the size of the collection. Parameter name: startIndex

Answered (Verified). This post has 1 verified answer and 3 replies.

Top 10 Contributor
229 Posts
megetron posted on Sat, May 1 2010 10:22 AM

Since I set the crawler not to save WebPages to the database, I get lots of errors...

Maybe this has something to do with the fact that the crawler has sped up and there are no delays between requests, which causes some errors.

I will post them one by one:
Created: 2010-05-01 19:05:33.967
ID: 171263
AbsoluteUri1: http://www.lyricsoncall.com/lyrics/widespread-panic/jaded-tourist-lyrics.html
AbsoluteUri2: http://www.lyricsoncall.com/lyrics/widespread-panic/jaded-tourist-lyrics.html
HelpLink: NULL
Message: Index was out of range. Must be non-negative and less than the size of the collection. Parameter name: startIndex
Source: mscorlib
StackTrace:
   at System.Globalization.CompareInfo.IndexOf(String source, String value, Int32 startIndex, Int32 count, CompareOptions options)
   at System.Globalization.CompareInfo.IndexOf(String source, String value, Int32 startIndex)
   at System.String.IndexOf(String value, Int32 startIndex)
   at Arachnode.Plugins.CrawlActions.MasterPagesMusicSela.PerformAction(CrawlRequest crawlRequest, ArachnodeDAO arachnodeDAO)
   at Arachnode.SiteCrawler.Managers.ActionManager.PerformCrawlActions(CrawlRequest crawlRequest, CrawlActionType crawlActionType, ArachnodeDAO arachnodeDAO)

Verified Answer

Top 10 Contributor
1,905 Posts

You need to store a reference in the WebPages table so that the foreign key references will resolve.

Since you have already downloaded the stream, you have the data to process. You have to insert the WebPage row, but you don't have to insert the Source. And, if you know what you are doing (which you do), you don't have to save the source to disk. But you do need the record in the DB so that the HyperLink has a source WebPage to reference.
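To illustrate the foreign-key point above, here is a minimal in-memory sketch, assuming a link row stores the ID of the WebPage it was found on. `CrawlStore`, `WebPageRow`, `HyperLinkRow`, `InsertWebPage` and `InsertHyperLink` are hypothetical names, not arachnode.net's `ArachnodeDAO` API or its actual schema. The sketch shows that a WebPage row with a null Source is enough to let a HyperLink insert succeed, while a link pointing at a missing page is rejected the way a database foreign key would reject it.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical row types; the real arachnode.net tables differ.
public sealed class WebPageRow
{
    public long ID;
    public string AbsoluteUri;
    public byte[] Source; // may stay null: the row is required, the Source is not
}

public sealed class HyperLinkRow
{
    public long WebPageID; // assumed foreign key to WebPageRow.ID (the page the link was found on)
    public string AbsoluteUri;
}

// Hypothetical in-memory store that mimics a foreign key constraint.
public sealed class CrawlStore
{
    private readonly Dictionary<long, WebPageRow> _webPages = new Dictionary<long, WebPageRow>();
    private readonly Dictionary<string, long> _idsByUri = new Dictionary<string, long>(StringComparer.Ordinal);
    private readonly List<HyperLinkRow> _hyperLinks = new List<HyperLinkRow>();
    private long _nextId = 1;

    // Inserts the WebPage row (or returns the existing ID); the source bytes are optional.
    public long InsertWebPage(string absoluteUri, byte[] source = null)
    {
        long id;
        if (_idsByUri.TryGetValue(absoluteUri, out id))
        {
            return id;
        }
        id = _nextId++;
        _webPages[id] = new WebPageRow { ID = id, AbsoluteUri = absoluteUri, Source = source };
        _idsByUri[absoluteUri] = id;
        return id;
    }

    // Rejects a link whose source WebPage row does not exist, like a FK violation.
    public void InsertHyperLink(long webPageId, string absoluteUri)
    {
        if (!_webPages.ContainsKey(webPageId))
        {
            throw new InvalidOperationException("Foreign key violation: WebPage " + webPageId + " does not exist.");
        }
        _hyperLinks.Add(new HyperLinkRow { WebPageID = webPageId, AbsoluteUri = absoluteUri });
    }

    public int HyperLinkCount { get { return _hyperLinks.Count; } }

    public WebPageRow GetWebPage(long id) { return _webPages[id]; }
}
```

In this model, saving "only one link" still means calling `InsertWebPage` first (without a source) and then `InsertHyperLink` with the returned ID.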

Does this make things clear?

Also, it appears that you are using an old version.

-Mike

For best service when you require assistance:

  1. Check the DisallowedAbsoluteUris and Exceptions tables first.
  2. Cut and paste actual exceptions from the Exceptions table.
  3. Include screenshots.

Skype: arachnodedotnet

All Replies

Top 10 Contributor
1,905 Posts

You have to save the WebPages if you save any other data.

Also, the error is coming from your plugin (Arachnode.Plugins.CrawlActions.MasterPagesMusicSela in the stack trace), so some of these errors may be caused by your own code.
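The stack trace shows `String.IndexOf(String, Int32)` failing inside the plugin's `PerformAction`. In .NET that overload throws `ArgumentOutOfRangeException` when `startIndex` is negative or greater than the string's length. One common way to hit this is passing the -1 returned by an earlier `IndexOf` (marker not found) back in as a start offset. The plugin's source was not posted, so the following is only a sketch, assuming the plugin extracts text between two markers; `TextExtraction.TryExtractBetween` and its parameters are hypothetical.

```csharp
using System;

// Hypothetical helper: illustrates guarding chained String.IndexOf calls.
// It is not the actual MasterPagesMusicSela plugin code.
public static class TextExtraction
{
    // Returns true and sets 'value' to the text between the first 'startMarker'
    // and the following 'endMarker'; returns false if either marker is missing.
    public static bool TryExtractBetween(string text, string startMarker, string endMarker, out string value)
    {
        value = null;
        if (string.IsNullOrEmpty(text) || string.IsNullOrEmpty(startMarker) || string.IsNullOrEmpty(endMarker))
        {
            return false;
        }

        // IndexOf returns -1 when the marker is absent; reusing -1 as a
        // startIndex is what raises ArgumentOutOfRangeException.
        int start = text.IndexOf(startMarker, StringComparison.Ordinal);
        if (start < 0)
        {
            return false;
        }

        // start + startMarker.Length is at most text.Length, which is a valid startIndex.
        int contentStart = start + startMarker.Length;
        int end = text.IndexOf(endMarker, contentStart, StringComparison.Ordinal);
        if (end < 0)
        {
            return false;
        }

        value = text.Substring(contentStart, end - contentStart);
        return true;
    }
}
```

Returning `false` instead of throwing lets a crawl action skip pages that lack the expected markup, such as error pages or pages whose text is empty. Ordinal comparison is a deliberate choice here because HTML markers are not culture-sensitive text; the call in the stack trace uses the culture-sensitive overload.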


Top 10 Contributor
229 Posts

Yes, I know that; I hoped you had changed this logic.

Why should I save the complete page if all I want to save is one link inside it?

Performance really goes down with that. Can't we save it to a cache instead of the database?

Thanks.



copyright 2004-2017, arachnode.net LLC