arachnode.net
An Open Source C# web crawler with Lucene.NET search using SQL Server 2008/2012/2014/2016/CE
An Open Source C# web crawler with Lucene.NET search using MongoDB/RavenDB/Hadoop

Completely Open Source @ GitHub

Issue crawling non-en-US pages

Top 200 Contributor
1 Post
George2 posted on Sat, Sep 19 2009 7:05 AM

Hello everyone,

I have set up arachnode.net and it works fine for en-US pages (both crawling and searching work). For non-en-US pages, however, I see two issues:

1. The snippet text on the search results page does not display non-English characters correctly. When I click through to the actual page from the search results, the non-English content displays correctly, so this is not a browser problem.
2. When I search with a non-English query, I usually get no results.

Any ideas what is wrong?

Thanks in advance,
George

All Replies

Top 10 Contributor
1,905 Posts

Which site is giving trouble?

I will very likely check in Version 1.3 today.

Mike

For best service when you require assistance:

  1. Check the DisallowedAbsoluteUris and Exceptions tables first.
  2. Cut and paste actual exceptions from the Exceptions table.
  3. Include screenshots.

Skype: arachnodedotnet

Top 10 Contributor
1,905 Posts

Non-English characters may not be supported by the Lucene.NET StandardAnalyzer. Additionally, I'm opening all cached files using UTF-8, which obviously doesn't work for pages served in other encodings.

http://www.aspfree.com/c/a/BrainDump/Working-with-Lucene-dot-Net/2/
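To illustrate the UTF-8 point: a page served as windows-1251 or Shift_JIS shows replacement characters if its bytes are always decoded as UTF-8. Below is a minimal sketch, written in Python for brevity and not taken from arachnode.net's C# code, of choosing the decoder from the page's declared charset instead: the HTTP `Content-Type` header first, then a `<meta charset>` tag, then UTF-8 as a fallback. `detect_charset` and `decode_page` are hypothetical helpers; in .NET the same idea would map the charset name through `System.Text.Encoding.GetEncoding`. This only covers the decoding half; analyzer support for non-English text is a separate issue.

```python
import codecs
import re

# Hypothetical helpers for illustration only; not part of arachnode.net.

# charset=... inside an HTTP Content-Type header value.
_HEADER_CHARSET = re.compile(r"charset\s*=\s*[\"']?([\w.:-]+)", re.IGNORECASE)
# <meta charset="..."> or <meta http-equiv="Content-Type" content="...; charset=...">.
_META_CHARSET = re.compile(rb"<meta[^>]+charset\s*=\s*[\"']?([\w.:-]+)", re.IGNORECASE)


def detect_charset(body: bytes, content_type: str = "") -> str:
    """Return a Python codec name for the page, falling back to UTF-8."""
    candidates = []
    header = _HEADER_CHARSET.search(content_type)
    if header:
        candidates.append(header.group(1))
    # Only scan the start of the document, where <meta> declarations live.
    meta = _META_CHARSET.search(body[:4096])
    if meta:
        candidates.append(meta.group(1).decode("ascii"))
    for name in candidates:
        try:
            # Normalizes labels, e.g. "windows-1251" -> "cp1251".
            return codecs.lookup(name).name
        except LookupError:
            continue  # unknown label: try the next candidate
    return "utf-8"


def decode_page(body: bytes, content_type: str = "") -> str:
    """Decode raw page bytes with the detected charset instead of always using UTF-8."""
    # errors="replace" turns undecodable bytes into U+FFFD rather than raising.
    return body.decode(detect_charset(body, content_type), errors="replace")
```

Typical usage would be `decode_page(raw_bytes, content_type_header)` before the text is stored or indexed. The sketch ignores byte-order marks and does no statistical charset guessing for pages that declare nothing.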

Could you add a feature request/bug report? I'm planning to take two weeks away from the code, likely starting today.

Top 10 Contributor
1,905 Posts

This is fixed in Version 1.3.

Copyright 2004-2017, arachnode.net LLC