arachnode.net
An Open Source C# web crawler with Lucene.NET search using SQL Server 2008/2012/2014/2016/CE
An Open Source C# web crawler with Lucene.NET search using MongoDB/RavenDB/Hadoop

Searching pages for keywords

DataMan posted on Fri, Mar 12 2010 6:00 PM

So I've been trying to figure out how to get AN to only return pages that have certain words on them. I would think that there would be a table or a text file somewhere that you fill in and, voilà, only pages with those terms on them would be returned.

How would I accomplish that? I've been trying to figure out CrawlRules and how to write a plugin, but I can't seem to get anywhere.

Thanks

Verified Answer

Verified by arachnode.net

You mean filter them, so that only pages containing specific words are allowed into the system?

Look at Source.cs in the SiteCrawler project.
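
As a rough illustration of the kind of check being asked about, here is a minimal sketch of a keyword test on page text. It is not arachnode.net code: `KeywordFilter` and `ContainsAnyKeyword` are hypothetical names, and the match is a plain case-insensitive substring search, so "cat" would also match "concatenate".

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only; not part of arachnode.net.
public static class KeywordFilter
{
    // Returns true when pageText contains at least one of the keywords.
    // Matching is a case-insensitive substring search, not whole-word.
    public static bool ContainsAnyKeyword(string pageText, IEnumerable<string> keywords)
    {
        // Nothing to match against: treat as "no keyword found".
        if (string.IsNullOrEmpty(pageText) || keywords == null)
        {
            return false;
        }

        return keywords
            .Where(k => !string.IsNullOrWhiteSpace(k)) // skip blank entries
            .Any(k => pageText.IndexOf(k, StringComparison.OrdinalIgnoreCase) >= 0);
    }
}
```

A call such as `KeywordFilter.ContainsAnyKeyword(pageText, new[] { "lucene", "crawler" })` would return true if either word appears in the text. Where such a check would hook into the crawl (a crawl rule, a plugin, or elsewhere) is not shown here and depends on the arachnode.net source.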

-Mike

For best service when you require assistance:

  1. Check the DisallowedAbsoluteUris and Exceptions tables first.
  2. Cut and paste actual exceptions from the Exceptions table.
  3. Include screenshots.

Skype: arachnodedotnet

copyright 2004-2017, arachnode.net LLC