arachnode.net
An Open Source C# web crawler with Lucene.NET search using SQL Server 2008/2012/2014/2016/CE
An Open Source C# web crawler with Lucene.NET search using MongoDB/RavenDB/Hadoop

Completely Open Source @ GitHub

Extracting blog posts / forum messages by keyword from a single location


Top 150 Contributor
Male
2 Posts
KIK!T posted on Thu, Sep 23 2010 6:09 AM

Hello, 

About a week ago I got my licensed version of AN. First of all, it is a great (and fast) class library. I have already crawled a few sites, but I am stuck trying to achieve the following:

I would like to crawl one or a few sites and extract blog posts / forum messages that contain certain keywords. I have already set up a separate database containing the keywords to search for. I have looked into the Templater but can't really make sense of it all. Too much functionality ;)

Can someone please guide me on which steps are needed to extract only the messages containing the keywords?

Kind regards, 

Domenique / Kikit

All Replies

Top 10 Contributor
1,905 Posts

If you are interested in the Templater, check this out: http://code.google.com/p/boilerpipe/  (Documentation... :))

Then, when you have your excerpts, perform your filtering.
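
A minimal sketch of that filtering step, assuming the excerpts are already plain-text strings and the keywords have been loaded from the keyword database into a list. It is written in Python for brevity and does not use any arachnode.net or boilerpipe API; `build_keyword_pattern` and `filter_excerpts` are hypothetical names, and the same logic should port to a C# plugin with `System.Text.RegularExpressions`.

```python
import re


def build_keyword_pattern(keywords):
    """Compile one case-insensitive pattern matching any keyword as a whole word."""
    cleaned = [k.strip() for k in keywords if k and k.strip()]
    if not cleaned:
        return None
    # Longest first, so a multi-word keyword is preferred over its own prefix.
    cleaned.sort(key=len, reverse=True)
    alternation = "|".join(re.escape(k) for k in cleaned)
    # Lookarounds instead of \b so keywords containing punctuation still work.
    return re.compile(r"(?<!\w)(?:" + alternation + r")(?!\w)", re.IGNORECASE)


def filter_excerpts(excerpts, keywords):
    """Return (excerpt, matched_keywords) pairs for excerpts containing a keyword."""
    pattern = build_keyword_pattern(keywords)
    if pattern is None:
        return []
    matches = []
    for excerpt in excerpts:
        # Collect the distinct keywords found, lower-cased for stable output.
        found = sorted({m.group(0).lower() for m in pattern.finditer(excerpt)})
        if found:
            matches.append((excerpt, found))
    return matches


if __name__ == "__main__":
    # Illustrative data only; real excerpts would come from the crawl.
    sample_excerpts = [
        "The new Crawler release is fast.",
        "Nothing relevant here.",
        "Lucene.NET indexing tips",
    ]
    for text, hits in filter_excerpts(sample_excerpts, ["crawler", "lucene.net"]):
        print(hits, "->", text)
```

Matching whole words keeps a keyword such as "crawler" from firing on "crawlers"; if partial matches are wanted, the lookarounds can be dropped.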

Hopefully you have seen this too: http://arachnode.net/Content/CreatingPlugins.aspx

For best service when you require assistance:

  1. Check the DisallowedAbsoluteUris and Exceptions tables first.
  2. Cut and paste actual exceptions from the Exceptions table.
  3. Include screenshots.

Skype: arachnodedotnet

Top 150 Contributor
Male
2 Posts
KIK!T replied on Fri, Sep 24 2010 10:48 AM

Thank you very much for the quick reply. I will check out the info you provided.

Thanks again :) Keep up the good work ;)

Top 10 Contributor
1,905 Posts

:)  Thanks!  Always like to hear compliments!



copyright 2004-2017, arachnode.net LLC