arachnode.net
An open source .NET web crawler written in C# using SQL 2005/2008

Exception: Expected a File or an Image but discovered a WebPage.

Answered (Verified) This post has 1 verified answer | 4 Replies | 2 Followers

Top 10 Contributor
202 Posts
megetron posted on 17 Aug 2009 1:31 PM

Hello,

Why does it expect a file/image if this is a web page? I get this error far too often...

AbsoluteUri1: http://10net.co.il/108752/%D7%A6%D7%A4%D7%99%D7%99%D7%94-%D7%99%D7%A9%D7%99%D7%A8%D7%94-%D7%91%D7%A1%D7%A8%D7%98%D7%99-%D7%A7%D7%95%D7%9E%D7%93%D7%99%D7%94

AbsoluteUri2: http://10net.co.il/site/detail/detail/detailDetail.asp?detail_id=1286833&iPageNumCat0=2&seaWordCat=

HelpLink: NULL

Message: Expected a File or an Image but discovered a WebPage.

Source: Arachnode.SiteCrawler

StackTrace: at Arachnode.SiteCrawler.Components.Crawl.ProcessCrawlRequest(CrawlRequest crawlRequest, Boolean obeyCrawlRules, Boolean executeCrawlActions)

Answered (Verified) Verified Answer

Top 10 Contributor
1,202 Posts

This exception has been completely removed from the upcoming Version 1.3 release.

This basically was an attempt at catching WebPages that had, say, valid image tags returning scripts...

The new 'DataManager' and the new PreGet CrawlRule type and the DataType.cs CrawlRule have fixed this annoyance.  :)
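
To illustrate the general idea (this is not the arachnode.net implementation), here is a minimal C# sketch that decides what a response is from its Content-Type header and reports a mismatch with the expected type instead of throwing. The names `ContentKind`, `ContentClassifier`, `FromContentType` and `Resolve` are hypothetical, and the rule that anything that is neither HTML nor an image counts as a file is an assumption made for the example.

```csharp
using System;

// Hypothetical coarse categories; not taken from the arachnode.net source.
public enum ContentKind { WebPage, Image, File }

public static class ContentClassifier
{
    // Maps a Content-Type header value to a coarse kind.
    public static ContentKind FromContentType(string contentType)
    {
        // Drop parameters such as "; charset=utf-8" and normalize case.
        string mediaType = (contentType ?? string.Empty)
            .Split(';')[0]
            .Trim()
            .ToLowerInvariant();

        if (mediaType.StartsWith("image/", StringComparison.Ordinal))
        {
            return ContentKind.Image;
        }

        if (mediaType == "text/html" || mediaType == "application/xhtml+xml")
        {
            return ContentKind.WebPage;
        }

        // Assumption: everything else (including a missing header) is a file.
        return ContentKind.File;
    }

    // Returns the kind the server actually sent and flags a mismatch
    // with what the referring tag suggested, rather than throwing.
    public static ContentKind Resolve(ContentKind expected, string contentType, out bool mismatch)
    {
        ContentKind actual = FromContentType(contentType);
        mismatch = actual != expected;
        return actual;
    }
}
```

With this shape, an image tag whose URL returns `text/html` would resolve to `ContentKind.WebPage` with `mismatch` set to true, so the caller can store the page under its real type and skip the exception row.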

An open source .NET web crawler written in C# using SQL 2005/2008.

Join the arachnode.net group on Facebook: http://www.facebook.com/groups.php?ref=sb#/group.php?gid=166721755872

Twitter: http://twitter.com/arachnode_net

arachnode.net provides custom crawling and contracting resources.  Please ask.

http://bit.ly/TOFX4

C# crawler, C# web crawler, C# site crawler

All Replies

Top 10 Contributor
1,202 Posts

There are a good number of sites that list HyperLinks that should be images but return a WebPage instead.

Also, this classification isn't 100% accurate.  Needs a bit of work.

I found the bug in this piece of code.

This should be a fun one to fix properly.  :)

Thanks for all of your testing!

-Mike


Top 10 Contributor
Male
101 Posts

I seem to get this error consistently when hitting http://nytimes.com/

What's a good way to see exactly what's coming back from that url to research why it's expecting a file or image? 

I almost wonder whether the site's default page is doing something tricky.

I know I can debug it, but what's a good tool that shows everything coming back?  Maybe firebug?

Thx
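
For seeing what a URL returns without stepping through the crawler, one option is a few lines of C# that print the status line and every header of the response. This is only a sketch using the standard `System.Net.Http` types; `ResponseDump`, `Describe` and `FetchAndDescribeAsync` are hypothetical names, and the output will not necessarily match what the crawler's own request receives, since servers can vary responses by user agent and other request headers.

```csharp
using System;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

public static class ResponseDump
{
    // Formats the status line, final URI and all headers of a response.
    public static string Describe(HttpResponseMessage response)
    {
        var sb = new StringBuilder();
        sb.AppendLine("Status: " + (int)response.StatusCode + " " + response.ReasonPhrase);

        // After redirects, RequestMessage holds the last request that was sent.
        Uri finalUri = response.RequestMessage != null ? response.RequestMessage.RequestUri : null;
        sb.AppendLine("Final URI: " + finalUri);

        foreach (var header in response.Headers)
        {
            sb.AppendLine(header.Key + ": " + string.Join(", ", header.Value));
        }

        // Content-Type and Content-Length live on the content headers.
        if (response.Content != null)
        {
            foreach (var header in response.Content.Headers)
            {
                sb.AppendLine(header.Key + ": " + string.Join(", ", header.Value));
            }
        }

        return sb.ToString();
    }

    // Fetches a URL (following redirects by default) and describes the response.
    public static async Task<string> FetchAndDescribeAsync(string url)
    {
        using (var client = new HttpClient())
        using (var response = await client.GetAsync(url))
        {
            return Describe(response);
        }
    }
}
```

Calling `Console.WriteLine(await ResponseDump.FetchAndDescribeAsync("http://nytimes.com/"));` would show whether the request was redirected and which Content-Type the final response declares.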

 

Top 10 Contributor
1,202 Posts

This exception has been completely removed from the upcoming Version 1.3 release.

This basically was an attempt at catching WebPages that had, say, valid image tags returning scripts...

The new 'DataManager' and the new PreGet CrawlRule type and the DataType.cs CrawlRule have fixed this annoyance.  :)


Top 10 Contributor
202 Posts

Glad to hear this. The solution sounds good. This exception floods the exception tables to the point that you must filter these errors out whenever you query them.

Thank you for the fix.


copyright 2004-2010, arachnode.net LLC
