When I crawl this site:
http://www.jenkinskling.com
the following response is returned:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>R a z o r B a l l</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta http-equiv="refresh" content="0;url=http://www.jenkinskling.com/jenkinskling/">
<script type="text/javascript" language="JavaScript">document.location.href="http://www.jenkinskling.com/jenkinskling/";</script>
</head>
<body> </body>
</html>
Arachnode sets up the next crawl request for http://www.jenkinskling.com/jenkinskling instead of http://www.jenkinskling.com/jenkinskling/, which results in a 404.
I started looking around in WebClient.cs and DataManager.cs, but I don't really want to change these files. Any ideas?
This is one place where I felt I needed to make a concession to avoid duplicate WebPages. (It also serves as a bugfix for one condition in the .NET Uri parsing, for which I have an open Connect bug that was never addressed.) For example, http://www.jenkinskling.com/jenkinskling/ and http://www.jenkinskling.com/jenkinskling are the same page on 99% of WebServers.
If you want to change this, look in DiscoveryManager.cs and find: if (Uri.TryCreate(match.Groups["HyperLink"].Value.TrimEnd('/'), UriKind.RelativeOrAbsolute, out hyperLinkDiscovery)). Remove the ".TrimEnd('/')" call so that the trailing slash is preserved.
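To see what the trimming does to the redirect target from the question, here is a minimal standalone sketch. It is not the arachnode.net source: TrailingSlashDemo, ParseHyperLink and the trimTrailingSlash flag are hypothetical names used only for illustration, and the only call taken from the thread is Uri.TryCreate with UriKind.RelativeOrAbsolute.

```csharp
using System;

public static class TrailingSlashDemo
{
    // Hypothetical helper: mirrors the TryCreate call quoted above,
    // with the trailing-slash trimming made optional for comparison.
    public static Uri ParseHyperLink(string hyperLink, bool trimTrailingSlash)
    {
        string candidate = trimTrailingSlash ? hyperLink.TrimEnd('/') : hyperLink;

        Uri hyperLinkDiscovery;
        return Uri.TryCreate(candidate, UriKind.RelativeOrAbsolute, out hyperLinkDiscovery)
            ? hyperLinkDiscovery
            : null; // not parseable as a relative or absolute URI
    }

    public static void Main()
    {
        const string redirectTarget = "http://www.jenkinskling.com/jenkinskling/";

        // With trimming: the slash is dropped, which is the URL that returns the 404.
        Console.WriteLine(ParseHyperLink(redirectTarget, true).AbsoluteUri);

        // Without trimming: the redirect target is kept exactly as the page declared it.
        Console.WriteLine(ParseHyperLink(redirectTarget, false).AbsoluteUri);
    }
}
```

The trade-off is the one described above: keeping the slash follows this redirect correctly, but it can let both forms of the same URL through as separate Discoveries on sites that serve the same page for each.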
As an alternative, you can create a plugin with the code from DiscoveryManager.cs and add to the Discoveries that way.
Thanks!
Mike
I can take a look at this later today. It should be an easy fix.