arachnode.net
An open source .NET web crawler written in C# using SQL 2005/2008

Anonymous Crawling

Top 75 Contributor
3 Posts
sagie.shamay posted on 1 Jun 2009 1:34 AM

Hi.

I think this is a great product, and I'm excited to start crawling with it.

Is it possible to crawl anonymously with the current version?

If so, how can I do it? If not, could it be added to your list of future features?

Thanks, Sagie

Verified Answer

Top 10 Contributor
1,244 Posts

I added an Anonymizer plugin to the branch so you can see how this would be implemented.  Don't forget to check out the DB too...

(Branch is a branch, but quite viable...)  This code is checked into the trunk now.

Mike

For best service when you require assistance:

  1. Check the DisallowedAbsoluteUris and Exceptions tables first.
  2. Cut and paste actual exceptions from the Exceptions table.
  3. Include screenshots.

An open source .NET web crawler written in C# using SQL 2005/2008.

Twitter: http://twitter.com/arachnode_net

arachnode.net provides custom crawling and contracting resources.  Please ask.

C# crawler, C# web crawler, C# site crawler

All Replies

Top 10 Contributor
1,244 Posts

This functionality isn't built in yet, but it is possible and fairly easy to add.

You would want to create a PreCrawlRequest CrawlAction and change the AbsoluteUri to your anonymizer.

Take a look at ManageLuceneDotNetActions.cs for reference; note that it is a PostCrawlRequest CrawlAction rather than a PreCrawlRequest one.

If you can't figure out how to get a plug-in going, find the other posts on the site that talk about plug-ins and keep pinging me.  It would be fairly easy for me to implement.  :)
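
To make the idea concrete, here is a minimal sketch of the "change the AbsoluteUri to your anonymizer" step. It does not use arachnode.net's actual classes: SketchCrawlRequest, AnonymizerSketchAction, and the anonymizer.example prefix are all hypothetical stand-ins, and the sketch assumes the anonymizer accepts the target address as a URL-encoded query parameter. In the real crawler, the same logic would presumably live inside a PreCrawlRequest CrawlAction.

```csharp
using System;

// Hypothetical stand-in for the crawler's request type (not arachnode.net's real class).
public sealed class SketchCrawlRequest
{
    public SketchCrawlRequest(string absoluteUri)
    {
        AbsoluteUri = absoluteUri;
    }

    // The address the crawler will actually fetch.
    public string AbsoluteUri { get; set; }
}

// Hypothetical pre-request action: rewrites the target so the fetch goes through an anonymizer.
public sealed class AnonymizerSketchAction
{
    private readonly string _anonymizerPrefix;

    // anonymizerPrefix is assumed to end where the encoded target belongs,
    // e.g. "http://anonymizer.example/fetch?url=".
    public AnonymizerSketchAction(string anonymizerPrefix)
    {
        if (anonymizerPrefix == null) throw new ArgumentNullException("anonymizerPrefix");
        _anonymizerPrefix = anonymizerPrefix;
    }

    public void PerformAction(SketchCrawlRequest request)
    {
        if (request == null) throw new ArgumentNullException("request");

        // Leave requests alone if they already point at the anonymizer.
        if (request.AbsoluteUri.StartsWith(_anonymizerPrefix, StringComparison.OrdinalIgnoreCase))
        {
            return;
        }

        // URL-encode the original address and append it to the anonymizer prefix.
        request.AbsoluteUri = _anonymizerPrefix + Uri.EscapeDataString(request.AbsoluteUri);
    }
}
```

Calling PerformAction on a request for http://example.com/page?id=1 with the prefix above would replace its AbsoluteUri with the anonymizer address carrying the encoded original. Discovered links would then come back relative to the anonymizer, so a real plug-in would likely also need to map them back to their original hosts.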

Top 75 Contributor
3 Posts

Thanks, I've figured out how to add an action, rule, etc., and I'll be happy to contribute it when I finish.

copyright 2004-2010, arachnode.net LLC