Hello,
Is there an option for crawling sites from a specific country? Or better, in a specific language?
1.) You could filter by domain extension (the country-code ending of the host name)...
2.) Is there a language tag that you know of that is returned in the HTML headers? If so, you could write a plug-in to achieve this functionality.
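To make the two suggestions above concrete, here is a minimal sketch in Python. It is not arachnode.net code, and the function names are made up for illustration. It assumes the "extension" is the last label of the host name, and that the language tag would come either from the HTTP Content-Language header or from the lang attribute on the html element. Many pages declare neither, so treat the result as a hint only.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


def has_country_extension(url, extension):
    """Return True if the URL's host name ends with the given extension, e.g. 'jp'."""
    host = urlsplit(url).hostname or ""
    return host.lower().endswith("." + extension.lower().lstrip("."))


class _LangParser(HTMLParser):
    """Remembers the lang attribute of the first <html> tag, if present."""

    def __init__(self):
        super().__init__()
        self.lang = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "html" and self.lang is None:
            self.lang = dict(attrs).get("lang")


def declared_language(html_text, http_headers=None):
    """Return the declared language tag in lowercase, or None if none is declared.

    The HTTP Content-Language header wins over <html lang="...">.
    """
    for key, value in (http_headers or {}).items():
        if key.lower() == "content-language" and value.strip():
            # The header may list several languages; take the first one.
            return value.split(",")[0].strip().lower()
    parser = _LangParser()
    parser.feed(html_text)
    return parser.lang.lower() if parser.lang else None


if __name__ == "__main__":
    print(has_country_extension("http://www.example.co.jp/page", "jp"))
    print(declared_language('<html lang="ja"><body></body></html>'))
```

A plug-in could apply the extension check before a page is requested and the language-tag check after it is downloaded.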
I am not sure there is one. But Unicode characters are used on most websites, and I guess that if most of the characters on a page are, let's say, Japanese characters, then it is a Japanese website. The crawler could run a language check on each page and estimate the percentages by reading the text of the website.
Is it possible?
Yes, this is possible, and actually might be rather easy for you to implement.
Look at Source.cs - this is a CrawlRule that can filter content based on the content of the page. :)
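The character-percentage idea from the question could look roughly like the sketch below. It is written in Python only to show the logic. It is not the contents of Source.cs and does not use any arachnode.net API, so the check would need to be ported to C# inside a CrawlRule. The script names and the prefix table are assumptions chosen for illustration. Note that this detects the writing script, not the language: Latin letters cover many languages, and Han ideographs are shared by Japanese and Chinese, so kana is the more reliable signal for Japanese.

```python
import unicodedata
from collections import Counter

# Illustrative mapping from Unicode character-name prefixes to script labels.
_SCRIPT_PREFIXES = (
    ("HIRAGANA", "kana"),
    ("KATAKANA", "kana"),
    ("CJK UNIFIED", "han"),
    ("HANGUL", "hangul"),
    ("CYRILLIC", "cyrillic"),
    ("ARABIC", "arabic"),
    ("HEBREW", "hebrew"),
    ("GREEK", "greek"),
    ("LATIN", "latin"),
)


def script_of(ch):
    """Return a script label for one character, or 'other' if unrecognised."""
    name = unicodedata.name(ch, "")
    for prefix, script in _SCRIPT_PREFIXES:
        if name.startswith(prefix):
            return script
    return "other"


def script_percentages(text):
    """Return {script: percentage} over the letters in text (digits etc. ignored)."""
    counts = Counter(script_of(ch) for ch in text if ch.isalpha())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {script: 100.0 * n / total for script, n in counts.items()}


def dominant_script(text):
    """Return the script with the largest share of letters, or None for no letters."""
    shares = script_percentages(text)
    return max(shares, key=shares.get) if shares else None


if __name__ == "__main__":
    print(script_percentages("こんにちは world"))
    print(dominant_script("Привет, мир"))
```

The text passed in should be the visible page text with markup stripped, otherwise tag and attribute names inflate the Latin share.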