arachnode.net
An open source .NET web crawler written in C# using SQL 2005/2008

Crawl per country / language


Top 10 Contributor
219 Posts
megetron posted on 22 May 2009 10:11 AM

Hello,

Is there an option for crawling sites from a specific country? Or, better, sites in a specific language?

 

All Replies

Top 10 Contributor
1,244 Posts

1.) You could filter by extension...

2.) Is there a language tag that you know of that is returned in the HTML headers?  If so, you could write a plug-in to achieve this functionality.
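For illustration, both ideas can be sketched outside the crawler. This is a minimal Python sketch, not arachnode.net code: it assumes "extension" in 1.) means the country-code top-level domain of the host, and that the language tag in 2.) is either the standard Content-Language response header or the lang attribute on the <html> element. The function names declared_language and country_tld are made up for this example; a real plug-in would have to port the logic to C#.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class _LangAttrParser(HTMLParser):
    """Records the lang attribute of the first <html> tag, if any."""

    def __init__(self):
        super().__init__()
        self.lang = None

    def handle_starttag(self, tag, attrs):
        if tag == "html" and self.lang is None:
            self.lang = dict(attrs).get("lang")


def declared_language(headers, html):
    """Return the primary language subtag a page declares, or None.

    `headers` is a plain dict of response headers (hypothetical input shape).
    """
    # Header names are case-insensitive, so normalise before the lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    value = lowered.get("content-language")
    if not value:
        # Fall back to <html lang="..."> when the header is absent.
        parser = _LangAttrParser()
        parser.feed(html)
        value = parser.lang
    if not value:
        return None
    # "ja-JP, en" -> "ja": first listed tag, primary subtag only.
    return value.split(",")[0].strip().split("-")[0].lower() or None


def country_tld(url):
    """Return a two-letter top-level domain such as "jp", or None."""
    host = urlsplit(url).hostname or ""
    last = host.rsplit(".", 1)[-1]
    return last if len(last) == 2 and last.isalpha() else None
```

Many pages declare no language at all, and a .com site can be written in any language, so neither check is reliable on its own. They are cheap first filters at best.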

For best service when you require assistance:

  1. Check the DisallowedAbsoluteUris and Exceptions tables first.
  2. Cut and paste actual exceptions from the Exceptions table.
  3. Include screenshots.

An open source .NET web crawler written in C# using SQL 2005/2008.

Twitter: http://twitter.com/arachnode_net

arachnode.net provides custom crawling and contracting resources.  Please ask.

C# crawler, C# web crawler, C# site crawler

Top 10 Contributor
219 Posts

I am not sure there is one. But Unicode characters are used in most websites, and I guess that if most of the characters on a page are, let's say, Japanese characters, then it is a Japanese website. The crawler could run a language check on each page and estimate the percentages by reading the text of the website.

Is it possible?

Top 10 Contributor
1,244 Posts

Yes, this is possible, and actually might be rather easy for you to implement.

Look at Source.cs - this is a CrawlRule that can filter content based on the content of the page.  :)
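The percentage idea from the question can be sketched as follows. This is a rough Python illustration, not the contents of Source.cs and not the arachnode.net CrawlRule API. It assumes the page text has already been extracted from the HTML, and it guesses each character's script from the first word of its Unicode name, which is a heuristic rather than a real script database. The names script_of, script_percentages, looks_japanese and the 50% threshold are all invented for this example.

```python
import unicodedata
from collections import Counter


def script_of(ch):
    """Rough script label: the first word of the character's Unicode name."""
    try:
        name = unicodedata.name(ch)
    except ValueError:
        # Some code points have no name; lump them together.
        return "UNKNOWN"
    return name.split(" ")[0]


def script_percentages(text):
    """Map each script label to its share (0-100) of the letters in text."""
    # Only letters are counted; digits, spaces and punctuation are ignored.
    counts = Counter(script_of(ch) for ch in text if ch.isalpha())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {script: 100.0 * n / total for script, n in counts.items()}


# Labels treated as "Japanese" here. CJK ideographs are shared with Chinese,
# so this grouping is an assumption that will misfire on Chinese pages.
JAPANESE_SCRIPTS = {"HIRAGANA", "KATAKANA", "CJK"}


def looks_japanese(text, threshold=50.0):
    """True when Japanese-associated scripts reach the threshold percentage."""
    pct = script_percentages(text)
    return sum(pct.get(s, 0.0) for s in JAPANESE_SCRIPTS) >= threshold
```

Script counting only separates languages that use different writing systems. Telling English from French, for example, would need a different technique, such as word or n-gram statistics.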



copyright 2004-2010, arachnode.net LLC
