arachnode.net v2.0
An open source .NET web crawler written in C# using SQL 2005/2008

crawl non en-us page issue


Top 100 Contributor
1 Posts
George2 posted on 09-19-2009 7:05 AM

Hello everyone,

I have set up arachnode.net and it works fine for en-us pages (both crawling and search work). But for non en-us pages, I find two issues:

1. The snippet on the search results page does not display non en-us characters correctly. When I click the link from the search results, the actual content page displays the non en-us characters correctly, so this is not a browser problem.
2. When I search with a non en-us query, I usually find nothing.

Any ideas what is wrong?

thanks in advance,
George

All Replies

Top 10 Contributor
Male
920 Posts

Which site is giving trouble?

I will very likely check in Version 1.3 today.

Mike

An open source .NET web crawler written in C# using SQL 2008.

Join the arachnode.net group on Facebook: http://www.facebook.com/groups.php?ref=sb#/group.php?gid=166721755872

Twitter: http://twitter.com/arachnode_net

arachnode.net provides custom crawling and contracting resources.  Please ask.

http://bit.ly/TOFX4

Top 10 Contributor
Male
920 Posts

Non-English characters may not be supported by the Lucene.Net StandardAnalyzer. Additionally, I'm opening all cached files as UTF-8, which obviously doesn't work in all cases.
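
To illustrate the encoding half of the problem, here is a minimal sketch in Python (not arachnode.net code, and not how Version 1.3 addresses it). It assumes a page that was saved as Windows-1252 and shows what happens when the bytes are read back as UTF-8. The `decode_page` helper and its fallback order are hypothetical, chosen only for the example.

```python
# Illustrative only: a language-neutral demo of the decoding mismatch.
text = "Grüße aus Köln"  # sample non-English page content

# Assumption: the crawled page was stored as Windows-1252, not UTF-8.
raw = text.encode("cp1252")

# Reading the bytes with a hard-coded UTF-8 decoder damages every
# non-ASCII character (each becomes U+FFFD, the replacement character).
wrong = raw.decode("utf-8", errors="replace")

# Reading them with the charset the page really used restores the text.
right = raw.decode("cp1252")


def decode_page(raw_bytes, declared_charset=None):
    """Hypothetical helper: try the declared charset, then UTF-8, then cp1252."""
    for enc in (declared_charset, "utf-8", "cp1252"):
        if not enc:
            continue
        try:
            return raw_bytes.decode(enc)
        except (UnicodeDecodeError, LookupError):
            # Wrong or unknown charset name: fall through to the next candidate.
            continue
    # Last resort: keep what can be read and mark the rest.
    return raw_bytes.decode("utf-8", errors="replace")
```

If the damaged text is also what gets indexed, a correctly typed non-English query would have nothing to match, which would be consistent with the second symptom reported above.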

http://www.aspfree.com/c/a/BrainDump/Working-with-Lucene-dot-Net/2/

Add a feature request/bug report?  I'm planning on taking two weeks away from the code, likely starting today.


Top 10 Contributor
Male
920 Posts

This is fixed in Version 1.3.



copyright 2009, arachnode.net LLC
