arachnode.net
An Open Source C# web crawler with Lucene.NET search using SQL Server 2008/2012/2014/2016/CE
An Open Source C# web crawler with Lucene.NET search using MongoDB/RavenDB/Hadoop

Browse.aspx and Cache.aspx

I was recently asked whether arachnode.net (AN) can cache a webpage and then use the local content to browse a site offline or semi-offline.

The demo crawl presented at http://arachnode.net/content/Search.aspx downloaded the following page: (Link). Here is what was and was not cached:

  • Images were downloaded. Example: (Link) (Directory Browsing is enabled)
  • Files were partially downloaded, but scripts were not. Example: (Link) (Directory Browsing is enabled)
  • Script code was referenced by AbsoluteUri. Example: <script type="text/javascript" src="http://b.static.ak.fbcdn.net/rsrc.php/yK/r/NK-XVT6bZ0B.js"></script>
  • As the Browse functionality illustrates, items boxed in green are found locally, and items boxed in red are not. Links in the center column are not marked at all, because that content is rendered by script.
  • Scripts that require local script access, or access to server resources that cannot be downloaded, may not function properly and may need to be hot-linked for the WebPage to render correctly.
  • The cached version of this page, without the Browse markup, is found here: (Link)
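
The green/red marking described above can be sketched roughly as follows. This is not arachnode.net's code (AN is written in C#); it is a minimal Python illustration that assumes the set of locally cached URLs is already known. The names `ResourceCollector`, `classify_resources`, and `cached_urls`, and the `example.com` addresses, are hypothetical. Note that it only sees resources present in the static HTML, which is consistent with script-rendered links going unmarked.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Tag name -> attribute that holds the resource URL.
RESOURCE_ATTRS = {"img": "src", "script": "src", "link": "href"}


class ResourceCollector(HTMLParser):
    """Collects absolute URLs of images, scripts, and linked files in static HTML."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.resources = []  # list of (tag, absolute_url)

    def handle_starttag(self, tag, attrs):
        attr_name = RESOURCE_ATTRS.get(tag)
        if attr_name is None:
            return
        value = dict(attrs).get(attr_name)
        if value:
            # Resolve relative references against the page URL.
            self.resources.append((tag, urljoin(self.base_url, value)))


def classify_resources(html, base_url, cached_urls):
    """Return (tag, url, status) tuples; status is "green" if the URL
    is in cached_urls (found locally), otherwise "red"."""
    collector = ResourceCollector(base_url)
    collector.feed(html)
    collector.close()
    return [
        (tag, url, "green" if url in cached_urls else "red")
        for tag, url in collector.resources
    ]


# Hypothetical page: one cached image and one script referenced by absolute URI.
page = (
    '<img src="/images/logo.png">'
    '<script type="text/javascript" '
    'src="http://b.static.ak.fbcdn.net/rsrc.php/yK/r/NK-XVT6bZ0B.js"></script>'
)
cached = {"http://example.com/images/logo.png"}

for tag, url, status in classify_resources(page, "http://example.com/page.html", cached):
    print(status, tag, url)
```

In this sketch the image resolves to a cached URL and is marked green, while the externally hosted script is not in the cache and is marked red, so it would have to be hot-linked for the page to render as intended.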


Posted Thu, Apr 28 2011 5:17 PM by arachnode.net

copyright 2004-2017, arachnode.net LLC