arachnode.net
An Open Source C# web crawler with Lucene.NET search using SQL Server 2008/2012/2014/2016/CE An Open Source C# web crawler with Lucene.NET search using MongoDB/RavenDB/Hadoop

404 errors


samar posted on Fri, May 25 2012 3:16 PM

Hi

When I crawl a homepage, I can see in the Exceptions table that it returns some 404 (page not found) errors.

The AbsoluteUri1 column shows one URL and the AbsoluteUri2 column shows another. I have understood the relationship to be parent-child; that is, the web page at AbsoluteUri1 contains the link in AbsoluteUri2.

With this in mind, I expected to be able to open the web page at AbsoluteUri1 and find the link in AbsoluteUri2, but I cannot. I cannot find the links anywhere in the source of AbsoluteUri1.

How is this possible?
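One thing worth ruling out when doing this check by hand: a link can be written in the page source as a relative path (or be affected by a `<base href>` tag), so the absolute URL in AbsoluteUri2 may never appear literally in the source of AbsoluteUri1. The sketch below is in Python and is not arachnode.net code. It assumes, without confirmation from the project, that the crawler stores resolved absolute URLs. The function names and the example.com URLs are hypothetical. It resolves every `href`/`src` on a page against the parent URL and then checks whether the child URL is among them.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class _LinkCollector(HTMLParser):
    """Collects raw href/src attribute values and the first <base href>."""

    def __init__(self):
        super().__init__()
        self.raw_links = []
        self.base_href = None

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        # A <base href> changes how every relative link on the page resolves.
        if tag == "base":
            if self.base_href is None and attr_map.get("href"):
                self.base_href = attr_map["href"]
            return
        for name in ("href", "src"):
            value = attr_map.get(name)
            if value:
                self.raw_links.append(value)


def resolve_links(html_source, parent_uri):
    """Return the set of absolute URLs (fragments removed) linked from the page."""
    collector = _LinkCollector()
    collector.feed(html_source)
    base = parent_uri
    if collector.base_href:
        base = urljoin(parent_uri, collector.base_href)
    resolved = set()
    for raw in collector.raw_links:
        absolute, _fragment = urldefrag(urljoin(base, raw.strip()))
        resolved.add(absolute)
    return resolved


def parent_links_to(html_source, parent_uri, child_uri):
    """True if the page at parent_uri links to child_uri after resolution."""
    return urldefrag(child_uri)[0] in resolve_links(html_source, parent_uri)


if __name__ == "__main__":
    # Hypothetical page source and URLs, for illustration only.
    page = '<html><body><a href="../missing/page.html">x</a></body></html>'
    parent = "http://example.com/section/index.html"
    child = "http://example.com/missing/page.html"
    print(child in page)                         # literal text search
    print(parent_links_to(page, parent, child))  # search after resolving
```

If the resolved check finds the link while a plain text search of the page source does not, the link is there in relative form. If neither finds it, the page may have changed since the crawl, or the link may be generated by script, and the actual AbsoluteUri values would be needed to say more.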


All Replies


What are the AbsoluteUris?

For best service when you require assistance:

  1. Check the DisallowedAbsoluteUris and Exceptions tables first.
  2. Cut and paste actual exceptions from the Exceptions table.
  3. Include screenshots.

Skype: arachnodedotnet


copyright 2004-2017, arachnode.net LLC