arachnode.net
An Open Source C# web crawler with Lucene.NET search using SQL 2005/2008/CE

Storing multiple versions of the same page


osasson posted on Thu, Feb 24 2011 8:13 AM

Hi,

Is there any way to store multiple versions of the same page across multiple crawls? In other words, if I crawl http://xxx/a.html today and I run a new crawl again tomorrow, is there any straightforward way to keep both copies?

 

Thanks,

Ori

All Replies


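Create a copy of the WebPages table and populate it either from a trigger on the WebPages table or from inside the 'WebPages_INSERT' SP. (A trigger attaches to a table, not to a stored procedure.)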
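
The copy-table-plus-trigger idea can be sketched as follows. This is only a minimal illustration and not arachnode.net's actual schema: it uses Python's built-in sqlite3 module as a stand-in for SQL Server, and the column names, the WebPages_History table, and the helper functions store_page and page_versions are all assumptions made for the example. The triggers sit on the WebPages table and fire on both insert and update, so a re-crawl that overwrites the row still leaves the earlier copy in the history table.

```python
import sqlite3


def create_schema(conn):
    """Create a hypothetical WebPages table, its history copy, and the triggers."""
    conn.executescript("""
        CREATE TABLE WebPages (
            AbsoluteUri TEXT PRIMARY KEY,
            Source      TEXT NOT NULL
        );

        -- Assumed history table: one row per stored version of a page.
        CREATE TABLE WebPages_History (
            ID          INTEGER PRIMARY KEY AUTOINCREMENT,
            AbsoluteUri TEXT NOT NULL,
            Source      TEXT NOT NULL,
            CrawledOn   TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
        );

        -- First crawl of a page: copy the new row into the history table.
        CREATE TRIGGER WebPages_AfterInsert AFTER INSERT ON WebPages
        BEGIN
            INSERT INTO WebPages_History (AbsoluteUri, Source)
            VALUES (NEW.AbsoluteUri, NEW.Source);
        END;

        -- Later crawls overwrite the row: copy the new version as well.
        CREATE TRIGGER WebPages_AfterUpdate AFTER UPDATE ON WebPages
        BEGIN
            INSERT INTO WebPages_History (AbsoluteUri, Source)
            VALUES (NEW.AbsoluteUri, NEW.Source);
        END;
    """)


def store_page(conn, absolute_uri, source):
    """Hypothetical stand-in for the crawler's save step: update, else insert."""
    cur = conn.execute(
        "UPDATE WebPages SET Source = ? WHERE AbsoluteUri = ?",
        (source, absolute_uri),
    )
    if cur.rowcount == 0:
        conn.execute(
            "INSERT INTO WebPages (AbsoluteUri, Source) VALUES (?, ?)",
            (absolute_uri, source),
        )
    conn.commit()


def page_versions(conn, absolute_uri):
    """Return every stored version of a page, oldest first."""
    rows = conn.execute(
        "SELECT Source FROM WebPages_History WHERE AbsoluteUri = ? ORDER BY ID",
        (absolute_uri,),
    )
    return [row[0] for row in rows]
```

Calling store_page twice for http://xxx/a.html with different content leaves a single current row in WebPages and two rows in WebPages_History, which page_versions returns in crawl order. On SQL Server the same pattern would be a single AFTER INSERT, UPDATE trigger on the WebPages table that selects from the inserted pseudo-table.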



copyright 2004-2013, arachnode.net LLC