arachnode.net
An open source .NET web crawler written in C# using SQL 2005/2008

Differences between versions of pages

Top 50 Contributor
7 Posts
orozcoc posted on 4 Feb 2010 4:21 PM

My understanding is that AN re-downloads a page when it changes. Is that right?

If so, how does AN know when a page has changed? Can we see the changes between versions of a page as a delta?

Thanks a lot

Camilo Orozco

GNF

Verified Answer

Top 10 Contributor
1,244 Posts

AN uses the 'Last-Modified' HTTP header to determine when a page has changed.

Also, the WebPages table has a 'LastUpdated' column, which tracks when the content changes.
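
As a rough illustration of the date comparison described above, here is a small Python sketch (not arachnode.net code). It parses a Last-Modified header value and compares it with a stored timestamp. The name `has_changed` and the choice to treat a missing or unparseable header as "changed" are assumptions made for the example.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def has_changed(last_modified_header: Optional[str],
                stored_last_updated: datetime) -> bool:
    """Return True if the server reports a newer version than the stored copy.

    stored_last_updated must be timezone-aware (for example, UTC).
    Assumption: a missing or unparseable header counts as "changed",
    so the page is fetched again rather than silently skipped.
    """
    if not last_modified_header:
        return True
    try:
        server_time = parsedate_to_datetime(last_modified_header)
    except (TypeError, ValueError):
        # Older Python versions raise TypeError, newer ones ValueError.
        return True
    if server_time.tzinfo is None:
        # Headers without usable zone information are interpreted as UTC.
        server_time = server_time.replace(tzinfo=timezone.utc)
    return server_time > stored_last_updated


stored = datetime(2010, 2, 4, 12, 0, tzinfo=timezone.utc)
print(has_changed("Thu, 04 Feb 2010 16:21:00 GMT", stored))
```

Keep in mind that not every server sends Last-Modified, which is why the sketch falls back to "changed" when the header is absent.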

I do get a fair number of requests for delta technology... it would make a nice feature request. :D
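
In the meantime, a delta between two versions of a page can be computed outside the crawler. The following is a minimal Python sketch using the standard difflib module, not an arachnode.net feature. It assumes you keep your own copy of the previous page source, and `page_delta` is a hypothetical helper name.

```python
import difflib
from typing import List


def page_delta(old_source: str, new_source: str) -> List[str]:
    """Return unified-diff lines describing how old_source became new_source."""
    return list(
        difflib.unified_diff(
            old_source.splitlines(),
            new_source.splitlines(),
            fromfile="previous",
            tofile="current",
            lineterm="",
        )
    )


old = "<h1>News</h1>\n<p>Price: 10</p>"
new = "<h1>News</h1>\n<p>Price: 12</p>"
for line in page_delta(old, new):
    print(line)
```

Lines prefixed with "-" were removed and lines prefixed with "+" were added. An empty result means the two versions are identical.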

For best service when you require assistance:

  1. Check the DisallowedAbsoluteUris and Exceptions tables first.
  2. Cut and paste actual exceptions from the Exceptions table.
  3. Include screenshots.

An open source .NET web crawler written in C# using SQL 2005/2008.

Twitter: http://twitter.com/arachnode_net

arachnode.net provides custom crawling and contracting resources.  Please ask.

C# crawler, C# web crawler, C# site crawler

All Replies

Top 50 Contributor
7 Posts

I haven't read all of the AN documentation, so this question may already be answered there:

Is there a way to trigger a plugin when a page changes, that is, when its LastUpdated column changes?

Thanks a lot

Camilo Orozco

Top 10 Contributor
1,244 Posts

You bet. You could always get the WebPage from the database using ArachnodeDAO.GetWebPage(...) and compare the dates. Check CrawlRequest.WebClient.HttpResponse.LastModified (or something close to that...).
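
The compare-and-trigger idea could look roughly like the following. This is a Python sketch of the pattern only, not the arachnode.net plugin API. `ChangeNotifier`, `register`, and `process` are hypothetical names, and the two timestamps stand in for the date stored with the WebPage and the last-modified date of the fresh response.

```python
from datetime import datetime
from typing import Callable, List, Optional

# A handler receives the URL, the stored date (None if never seen), and the new date.
ChangeHandler = Callable[[str, Optional[datetime], datetime], None]


class ChangeNotifier:
    """Runs registered handlers only when a page is new or newer than the stored copy."""

    def __init__(self) -> None:
        self._handlers: List[ChangeHandler] = []

    def register(self, handler: ChangeHandler) -> None:
        self._handlers.append(handler)

    def process(self, url: str, stored: Optional[datetime], fetched: datetime) -> bool:
        # A page with no stored date is treated as changed (first sighting).
        changed = stored is None or fetched > stored
        if changed:
            for handler in self._handlers:
                handler(url, stored, fetched)
        return changed


notifier = ChangeNotifier()
notifier.register(lambda url, old, new: print(f"changed: {url} ({old} -> {new})"))
notifier.process("http://example.com/", datetime(2010, 2, 3), datetime(2010, 2, 4))
```

The comparison happens before the stored copy is replaced, so a handler still sees the old date alongside the new one.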

Top 50 Contributor
7 Posts

Another question:

When AN downloads a new version of a page, does the existing version in AN get overwritten?

Thanks a lot

Camilo Orozco

Top 10 Contributor
1,244 Posts

Yes.  It will.


copyright 2004-2010, arachnode.net LLC