arachnode.net
An Open Source C# web crawler with Lucene.NET search using SQL Server 2008/2012/2014/2016/CE
An Open Source C# web crawler with Lucene.NET search using MongoDB/RavenDB/Hadoop

Disallowed Words

Top 150 Contributor
Posts 3
Jerry Posted: Tue, Oct 13 2015 7:00 PM

Hi, I'm trying to crawl some sites using disallowed words, but the sites are still stored even though the words exist in the DisallowedWords table.

Or do I need to activate something else?

Thanks for your support

Top 10 Contributor
Posts 1,905

This is correct - the Source.cs plugin works with the cfg.DisallowedWords database table.

You can always clear this table out, or use your own plugin to filter as you see fit.

    if (crawlRequest.DecodedHtml.Contains("thisforbiddenword"))
    {
        crawlRequest.IsDisallowed = true;
    }
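
One thing to keep in mind with the check above: string.Contains(string) in .NET is a case-sensitive, ordinal comparison, so "ThisForbiddenWord" in the page would not match. Below is a minimal sketch of a case-insensitive check over a list of words. The CrawlRequest class here is a hypothetical stand-in that models only the two members used above, and DisallowedWordFilter is an illustrative name, not part of arachnode.net. How the check is wired into an actual plugin is not shown.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the crawler's request type; only the two
// members used in the snippet above are modeled.
public class CrawlRequest
{
    public string DecodedHtml { get; set; }
    public bool IsDisallowed { get; set; }
}

// Illustrative helper (an assumption, not an arachnode.net class).
public static class DisallowedWordFilter
{
    // Sets IsDisallowed when any word occurs in the decoded HTML,
    // ignoring case. Returns the resulting IsDisallowed value.
    public static bool Apply(CrawlRequest crawlRequest, IEnumerable<string> disallowedWords)
    {
        if (string.IsNullOrEmpty(crawlRequest.DecodedHtml))
        {
            return crawlRequest.IsDisallowed;
        }

        foreach (string word in disallowedWords)
        {
            // Skip empty entries: IndexOf("") returns 0 and would match every page.
            if (string.IsNullOrEmpty(word))
            {
                continue;
            }

            if (crawlRequest.DecodedHtml.IndexOf(word, StringComparison.OrdinalIgnoreCase) >= 0)
            {
                crawlRequest.IsDisallowed = true;
                break;
            }
        }

        return crawlRequest.IsDisallowed;
    }
}
```

Usage would look like DisallowedWordFilter.Apply(crawlRequest, new[] { "thisforbiddenword" }). This is a plain substring match, so it also matches inside longer words and inside markup such as attribute values; stricter matching would need word-boundary handling.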

Definitely look at AbsoluteUri.cs and Source.cs

Thanks!
Mike

For best service when you require assistance:

  1. Check the DisallowedAbsoluteUris and Exceptions tables first.
  2. Cut and paste actual exceptions from the Exceptions table.
  3. Include screenshots.

Skype: arachnodedotnet


copyright 2004-2017, arachnode.net LLC