arachnode.net
An open source .NET web crawler written in C# using SQL 2005/2008

crawl non en-us page issue

Not Answered This post has 0 verified answers | 3 Replies | 2 Followers

Top 150 Contributor
1 Posts
George2 posted on 19 Sep 2009 7:05 AM

Hello everyone,

I have set up arachnode.net and it works fine for en-us pages (both crawling and search). But for non en-us pages, I find two issues:

1. The snippet on the search results page does not display non en-us characters correctly. (When I click the link from the search results page, the actual content page displays its non en-us content correctly, so this is not a browser issue.)
2. When I search with a non en-us query, I usually find nothing.

Any ideas what is wrong?

thanks in advance,
George

All Replies

Top 10 Contributor
1,244 Posts

Which site is giving trouble?

I will very likely check in Version 1.3 today.

Mike

For best service when you require assistance:

  1. Check the DisallowedAbsoluteUris and Exceptions tables first.
  2. Cut and paste actual exceptions from the Exceptions table.
  3. Include screenshots.

An open source .NET web crawler written in C# using SQL 2005/2008.

Twitter: http://twitter.com/arachnode_net

arachnode.net provides custom crawling and contracting resources.  Please ask.

C# crawler, C# web crawler, C# site crawler

Top 10 Contributor
1,244 Posts

Non-English characters may not be supported by the Lucene.Net StandardAnalyzer.  Additionally, I'm opening all cached files as UTF-8, which obviously doesn't work in all cases.

http://www.aspfree.com/c/a/BrainDump/Working-with-Lucene-dot-Net/2/
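
To illustrate the UTF-8 point, here is a minimal sketch of how decoding cached bytes with the wrong charset garbles a snippet. It is written in Python for brevity and is not arachnode.net's actual C# code; the sample text, the `decode_cached` helper, and the fallback policy are all assumptions made up for the example.

```python
from typing import Optional

# Hypothetical sample: a page whose text was served and cached as GB2312.
page_text = "中文网页"
cached_bytes = page_text.encode("gb2312")

# Reading every cached file as UTF-8 garbles pages that used another charset.
garbled = cached_bytes.decode("utf-8", errors="replace")

# Decoding with the charset the page actually used recovers the text.
recovered = cached_bytes.decode("gb2312")


def decode_cached(data: bytes, declared_charset: Optional[str] = None) -> str:
    """Hypothetical helper: try the declared charset, else fall back to UTF-8."""
    if declared_charset:
        try:
            return data.decode(declared_charset)
        except (LookupError, UnicodeDecodeError):
            # Unknown charset name or bytes that do not match it: fall through.
            pass
    # Last resort: UTF-8, with undecodable bytes shown as U+FFFD.
    return data.decode("utf-8", errors="replace")
```

In a crawler, the declared charset would typically come from the HTTP Content-Type header or the page's meta tag, so it would need to be stored alongside the cached file for a helper like this to use it.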

Add a feature request/bug report?  I'm planning on taking two weeks away from the code, likely starting today.


Top 10 Contributor
1,244 Posts

This is fixed in Version 1.3



copyright 2004-2010, arachnode.net LLC
