My understanding is that AN downloads a page when it changes, am I right?
If so, how does AN know when it changes? Can we see the changes between pages as a delta?
Thanks a lot
Camilo Orozco
GNF
AN uses the 'Last-Modified' HTTP header to determine when a page has changed.
Also, the WebPages table has a 'LastUpdated' column, which tracks when the content changes.
I do get a fair number of requests for delta support... it would make a nice feature request.
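For anyone wanting to see what a 'Last-Modified' check looks like in general, here is a rough sketch in Python. It is not AN's code. The function name has_changed and the policy of treating a missing or unparseable header as "changed" are my own assumptions.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def has_changed(last_modified_header, last_seen):
    """Return True if the header reports a modification newer than last_seen.

    last_modified_header: raw 'Last-Modified' value, or None if absent.
    last_seen: timezone-aware datetime recorded at the previous crawl.
    Assumption: when the header is missing or malformed we cannot tell,
    so the page is treated as changed.
    """
    if last_modified_header is None:
        return True
    try:
        modified = parsedate_to_datetime(last_modified_header)
    except (TypeError, ValueError):
        return True
    if modified.tzinfo is None:
        # HTTP dates are defined in GMT, so read a naive result as UTC.
        modified = modified.replace(tzinfo=timezone.utc)
    return modified > last_seen


previous_crawl = datetime(2015, 10, 21, 0, 0, tzinfo=timezone.utc)
newer = has_changed("Wed, 21 Oct 2015 07:28:00 GMT", previous_crawl)
older = has_changed("Tue, 20 Oct 2015 07:28:00 GMT", previous_crawl)
```

Keep in mind that many servers omit 'Last-Modified' or set it to the current time on every response, so a header check like this is only a hint.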
I haven't read the whole AN documentation so this question might be answered in it:
Is there a way to trigger a plugin when a page changes? Basically, when the LastUpdated column changes?
You bet. You could always get the stored WebPage from the database using ArachnodeDAO.GetWebPage(...) and compare the dates. Check it against CrawlRequest.WebClient.HttpResponse.LastModified (or something close to that...).
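The compare-and-trigger idea above can be sketched roughly like this in Python. Everything here is hypothetical and none of it is AN's API: the stored_pages dict stands in for the database lookup, and check_page and on_change are names I made up for illustration.

```python
from datetime import datetime, timezone

# Hypothetical stand-in for the stored pages: URL -> last-updated timestamp.
stored_pages = {
    "http://example.com/": datetime(2010, 1, 1, tzinfo=timezone.utc),
}


def check_page(url, response_last_modified, on_change):
    """Run on_change(url) if the page is new or newer than the stored copy.

    Returns True when the callback ran, False otherwise.
    """
    stored = stored_pages.get(url)
    if stored is None or response_last_modified > stored:
        on_change(url)
        # Remember the new timestamp so the same version does not fire twice.
        stored_pages[url] = response_last_modified
        return True
    return False


changed = []
fresh = datetime(2010, 1, 2, tzinfo=timezone.utc)
first = check_page("http://example.com/", fresh, changed.append)
second = check_page("http://example.com/", fresh, changed.append)
```

The first call fires the callback because the response is newer than the stored timestamp. The second call does nothing, since the stored timestamp was already updated.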
Another question:
When AN downloads a new version of a page, does the existing version in AN get overwritten?
Yes. It will.
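Since the stored version gets overwritten, anyone who wants a delta would need to keep a copy of the previous content before that happens. As a rough sketch of the comparison step only (Python's standard difflib, not an AN feature; page_delta is a made-up name), assuming both versions are available as text:

```python
import difflib


def page_delta(old_text, new_text):
    """Return a unified diff between two versions of a page as one string.

    An empty string means the two versions are identical line for line.
    """
    diff = difflib.unified_diff(
        old_text.splitlines(),
        new_text.splitlines(),
        fromfile="previous",
        tofile="current",
        lineterm="",
    )
    return "\n".join(diff)


delta = page_delta("<p>Hello</p>\n<p>World</p>\n", "<p>Hello</p>\n<p>There</p>\n")
```

A line-based diff like this works best on content with meaningful line breaks. Minified HTML on a single line would show up as one big changed line.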