arachnode.net
An Open Source C# web crawler with Lucene.NET search using SQL Server 2008/2012/2014/2016/CE
An Open Source C# web crawler with Lucene.NET search using MongoDB/RavenDB/Hadoop

Completely Open Source @ GitHub

Can't override WebClient.DownloadData...

I'm investigating explicit support for the 'Last-Modified' HTTP response header.

I did a quick search on Google for 'c# WebClient only crawl if content change LastModified' and got this:

How's that for a dead end? It's a prime opportunity for some traffic once this is implemented.


Posted Sat, Aug 8 2009 9:07 AM by arachnode.net

Comments

arachnode.net wrote re: Can't override WebClient.DownloadData...
on Sat, Aug 15 2009 8:54 AM

I ended up implementing WebClientManager.cs, which sends a 'HEAD' request, checks the 'Last-Modified' value, and then sends the 'GET' request only where appropriate.
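
The HEAD-then-GET flow described above could look roughly like the sketch below. This is not the actual WebClientManager.cs; the class name ConditionalDownloader, its method names, and the lastCrawledUtc parameter are all hypothetical, and the rule "download whenever either timestamp is unknown" is an assumption rather than arachnode.net's documented behavior. It uses HttpWebRequest and WebClient, which newer .NET versions mark as obsolete but still ship.

```csharp
using System;
using System.Globalization;
using System.Net;

// Hypothetical helper for illustration; not the real WebClientManager.cs.
public static class ConditionalDownloader
{
    // Pure decision step: download when either timestamp is unknown (assumed policy),
    // or when the page changed after the last crawl.
    public static bool ShouldDownload(DateTime? remoteLastModifiedUtc, DateTime? lastCrawledUtc)
    {
        if (remoteLastModifiedUtc == null || lastCrawledUtc == null)
        {
            return true;
        }
        return remoteLastModifiedUtc.Value > lastCrawledUtc.Value;
    }

    // Parses a raw Last-Modified header value into UTC; returns null if absent or unparsable.
    public static DateTime? ParseLastModified(string headerValue)
    {
        if (string.IsNullOrEmpty(headerValue))
        {
            return null;
        }
        DateTime parsed;
        bool ok = DateTime.TryParse(
            headerValue,
            CultureInfo.InvariantCulture,
            DateTimeStyles.AssumeUniversal | DateTimeStyles.AdjustToUniversal,
            out parsed);
        return ok ? parsed : (DateTime?)null;
    }

    // Sends a HEAD request and reads the raw header. The raw header is used because
    // HttpWebResponse.LastModified falls back to the current time when the header is missing.
    public static DateTime? GetLastModified(Uri uri)
    {
        var request = (HttpWebRequest)WebRequest.Create(uri);
        request.Method = "HEAD";
        using (var response = (HttpWebResponse)request.GetResponse())
        {
            return ParseLastModified(response.Headers["Last-Modified"]);
        }
    }

    // Sends the GET only where appropriate; returns null when the page is unchanged.
    public static byte[] DownloadIfModified(Uri uri, DateTime? lastCrawledUtc)
    {
        if (!ShouldDownload(GetLastModified(uri), lastCrawledUtc))
        {
            return null;
        }
        using (var client = new WebClient())
        {
            return client.DownloadData(uri);
        }
    }
}
```

A caller would pass the time of the previous crawl, for example ConditionalDownloader.DownloadIfModified(new Uri("http://example.com/"), lastCrawledUtc), and treat a null result as "unchanged, skip". Keeping the comparison in ShouldDownload separate from the network calls makes the policy easy to change and to test without a server.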


copyright 2004-2017, arachnode.net LLC