arachnode.net
An Open Source C# web crawler with Lucene.NET search using SQL Server 2008/2012/2014/2016/CE An Open Source C# web crawler with Lucene.NET search using MongoDB/RavenDB/Hadoop


Remove Crawl/Discovery Results


Top 75 Contributor
7 Posts
rlink12 posted on Wed, Sep 8 2010 3:26 PM

How do I remove the results from a previous crawl/discovery so they are no longer part of the search results in the Web Application?

I deleted the records from WebPages, Discoveries and Files where the absolute URI includes the site to be deleted, but the search (in the web app) is still returning the unwanted results.

Thanks.

All Replies

Top 10 Contributor
1,905 Posts

The unique key inside the Lucene.NET index is the combination of the DiscoveryType and the DiscoveryID.

However, there isn't a direct reciprocal link from SQL back to the Lucene.NET indexes.

The answer is to use the 'RebuildIndexOnLoad' switch.

Engaging this switch will place a new Lucene.NET index in the LuceneDotNetIndexDirectory location.
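
To illustrate why deleting rows from SQL alone doesn't change what search returns, here is a minimal toy sketch in Python. It is not arachnode.net or Lucene.NET code: plain dicts stand in for the tables and the index, and every name in it (build_index, search, the example URIs) is hypothetical. It only assumes what is described above, namely that index entries are keyed by DiscoveryType plus DiscoveryID and that the index is stored separately from the database.

```python
# Toy model only: plain dicts stand in for the SQL tables and the Lucene.NET index.
# All names here are hypothetical and are not arachnode.net or Lucene.NET APIs.

def build_index(database_rows):
    """Build a fresh index keyed by (discovery_type, discovery_id)."""
    return {
        (row["discovery_type"], row["discovery_id"]): row["absolute_uri"]
        for row in database_rows
    }

def search(index, text):
    """Return the sorted URIs in the index that contain the given text."""
    return sorted(uri for uri in index.values() if text in uri)

database_rows = [
    {"discovery_type": "WebPage", "discovery_id": 1, "absolute_uri": "http://unwanted.example/a"},
    {"discovery_type": "WebPage", "discovery_id": 2, "absolute_uri": "http://wanted.example/b"},
    {"discovery_type": "File", "discovery_id": 1, "absolute_uri": "http://unwanted.example/c.pdf"},
]
index = build_index(database_rows)

# Deleting rows from the database leaves the separately stored index untouched.
database_rows = [r for r in database_rows if "unwanted.example" not in r["absolute_uri"]]
stale_hits = search(index, "unwanted.example")  # still finds the old entries

# Rebuilding the index from the remaining rows drops the stale entries.
index = build_index(database_rows)
fresh_hits = search(index, "unwanted.example")  # nothing left to find
```

Note that the WebPage and the File in the sketch share the same ID, which is why the key needs both parts. The rebuild step at the end plays the role of 'RebuildIndexOnLoad' in this toy model.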

This does make me think, though: it would be nice if there were a performant way to manage the indexes from within SQL, perhaps from a trigger. Too bad that CLR function projects can't (easily) reference non-SQL projects.

Does this answer your question?

::mike

For best service when you require assistance:

  1. Check the DisallowedAbsoluteUris and Exceptions tables first.
  2. Cut and paste actual exceptions from the Exceptions table.
  3. Include screenshots.

Skype: arachnodedotnet

Top 75 Contributor
7 Posts
rlink12 replied on Fri, Sep 10 2010 12:18 PM

Yes, this answered the question. It did take a long time to rebuild the index, though, and it never finished correctly.

Thanks.

Top 10 Contributor
1,905 Posts

Did you want to provide an error message?

Top 75 Contributor
7 Posts

I continued to get a COM exception error.

Top 10 Contributor
1,905 Posts

Haha... OK.  See my sig.


copyright 2004-2017, arachnode.net LLC