I am getting this:
The request was aborted: The connection was closed unexpectedly. System at System.Net.ConnectStream.Read(Byte buffer, Int32 offset, Int32 size) at Arachnode.SiteCrawler.Components.WebClient.DownloadData(String absoluteUri) in E:\DEVELOPMENT\2.5\SiteCrawler\Components\WebClient.cs:line 238
What can it be?
This comes from the Web servers themselves.
Also, in this case, it's not so much an error as it is you trying to outwit a site that doesn't want you to crawl it.
Check Program.cs. This is likely overriding your threads config setting.
Yes, you're right. My mistake. I should follow the server's instructions.
How can I find out the reason for that? I am using IE7 and I can see the page, but when I try to crawl the same page, I get the message...
I even changed the IP, but it's still the same.
Please advise. Thank you.
Try setting the UserAgent to a browser UserAgent.
Also, many web servers will only allow so many concurrent connections from any one IP.
This is a good post on server behavior: http://arachnode.net/blogs/arachnode_net/archive/2010/04/29/troubleshooting-crawl-result-differences-between-different-crawl-environments.aspx
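To make the two suggestions above concrete, here is a minimal sketch using the standard System.Net classes. It is not the code from arachnode.net's WebClient.cs; the class name, the method name, the IE7-style UserAgent string and the example.com default are all assumptions for illustration. It sets a browser UserAgent through the property, limits concurrent connections to the host to one, and prints the WebException status if the server closes the connection.

```csharp
using System;
using System.IO;
using System.Net;

// Hypothetical helper class, not part of arachnode.net.
public static class CrawlRequestSketch
{
    // Builds a request that resembles what a browser sends. No network I/O happens here.
    public static HttpWebRequest CreateBrowserLikeRequest(string absoluteUri)
    {
        var request = (HttpWebRequest)WebRequest.Create(absoluteUri);

        // Assumed IE7-style UserAgent string; substitute the one your browser sends.
        request.UserAgent = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)";
        request.Accept = "*/*";
        request.Headers.Add("Accept-Language", "en-us");

        // Sends Accept-Encoding and also decompresses the response for us.
        request.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;

        // Allow only one concurrent connection to this host from this process.
        request.ServicePoint.ConnectionLimit = 1;

        return request;
    }

    public static void Main(string[] args)
    {
        // Placeholder URI; pass the failing page as the first argument.
        string absoluteUri = args.Length > 0 ? args[0] : "http://example.com/";
        HttpWebRequest request = CreateBrowserLikeRequest(absoluteUri);

        try
        {
            using (var response = (HttpWebResponse)request.GetResponse())
            using (var reader = new StreamReader(response.GetResponseStream()))
            {
                string body = reader.ReadToEnd();
                Console.WriteLine("{0}: {1} characters", response.StatusCode, body.Length);
            }
        }
        catch (WebException ex)
        {
            // ex.Status distinguishes a closed connection from a timeout, DNS failure, etc.
            Console.WriteLine("{0}: {1}", ex.Status, ex.Message);
        }
        catch (IOException ex)
        {
            // Some runtimes surface a connection dropped mid-read as an IOException.
            Console.WriteLine("Read failed: {0}", ex.Message);
        }
    }
}
```

If this small program downloads the page but the crawler does not, the difference is more likely in the crawler's request rate or thread count than in the headers.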
Hi, after investigating this further:
1. Changed the user agent to several IE user agents.
2. Changed the headers in WebClient.cs:
HttpWebRequest.Headers.Add(HttpRequestHeader.AcceptEncoding, "gzip,deflate");
HttpWebRequest.Headers.Add("UA-CPU", "x86");
HttpWebRequest.Headers.Add("Accept-Language", "en-us");
HttpWebRequest.Headers.Add("Pragma", "no-cache");
but the error still occurs...
So I used Fiddler and found that the page is downloaded partially and stops in the middle: the web server closes the connection.
Next I tried crawling with only one thread, at one request every 10 seconds, and it still happens.
The funny thing is that when I debug the application, it seems like there are 10 threads running, not the 1 I defined in the configuration table...