arachnode.net
An Open Source C# web crawler with Lucene.NET search using SQL Server 2008/2012/2014/2016/CE
An Open Source C# web crawler with Lucene.NET search using MongoDB/RavenDB/Hadoop


How do you modify CrawlRequest data in a plugin so it's passed to other plugins?

Alan posted on Tue, May 8 2012 8:24 PM

I want to create a plugin called HtmlSlimmer that lets you specify which parts of a page's HTML (JavaScript, HTML comments, CSS styles, etc.) should be stripped out before the page is saved to the database or file system, to save disk space.

I've gone through the plugin tutorial, but all of the HTML properties of the CrawlRequest are read-only unless I keep my class inside the Plugins or Crawler apps. I would like to keep this class in my own project, into which I've pulled the AN crawler, so I can customize various aspects of the workflow and UI.

Is it possible to have a class derived from ACrawlAction in a different project and still be able to set those otherwise read-only properties, such as Html?


copyright 2004-2017, arachnode.net LLC