arachnode.net
An Open Source C# web crawler with Lucene.NET search using SQL Server 2008/2012/2014/2016/CE
An Open Source C# web crawler with Lucene.NET search using MongoDB/RavenDB/Hadoop

Completely Open Source @ GitHub

Robots metatag

Answered (Verified) This post has 1 verified answer | 6 Replies | 1 Follower

Top 10 Contributor
83 Posts
InvestisDev posted on Wed, Nov 20 2013 6:27 AM

Hi Mike,

Does AN.net respect the following meta tag and skip indexing pages that contain it?

<meta name="robots" content="noindex, nofollow">

Or is there some other meta tag that would keep pages from being indexed?

Thanks,

Verified Answer

Top 10 Contributor
1,905 Posts
Verified by InvestisDev

You would modify the parsing logic in DiscoveryManager.cs.

For best service when you require assistance:

  1. Check the DisallowedAbsoluteUris and Exceptions tables first.
  2. Cut and paste actual exceptions from the Exceptions table.
  3. Include screenshots.

Skype: arachnodedotnet

All Replies

Top 10 Contributor
83 Posts

oops!

Don't know what happened there. I was trying to write

<meta name="robots" content="noindex, nofollow">

Top 10 Contributor
1,905 Posts

You know... I never implemented this. :)

Top 10 Contributor
83 Posts

OK

We have tried implementing noindex logic in ManageLuceneDotNetIndexes.cs and it works.

For nofollow, where the links on the page should not be followed, we are still investigating. Can you point us toward a solution?

Thanks

Top 10 Contributor
1,905 Posts
Verified by InvestisDev

You would modify the parsing logic in DiscoveryManager.cs.
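
For anyone landing on this thread later, here is a rough sketch of the decision logic being discussed. It is written in Python only to keep it short and is not arachnode.net code: the names RobotsMetaParser and crawl_decisions are made up for illustration, and you would port the same idea to C# inside whatever parsing step you change in DiscoveryManager.cs (for nofollow) and ManageLuceneDotNetIndexes.cs (for noindex). It assumes the page HTML is available as a string.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Hypothetical helper: collects robots meta directives and <a href> links."""

    def __init__(self):
        super().__init__()
        self.directives = set()
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            # content is a comma-separated list, e.g. "noindex, nofollow"
            content = attrs.get("content") or ""
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])


def crawl_decisions(html):
    """Return (should_index, links_to_follow) for one page of HTML."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    # "none" is shorthand for "noindex, nofollow"
    blocked = "none" in parser.directives
    should_index = not (blocked or "noindex" in parser.directives)
    should_follow = not (blocked or "nofollow" in parser.directives)
    # With nofollow, hand no links from this page to the discovery step
    links = parser.links if should_follow else []
    return should_index, links
```

The point of returning both values together is that the two directives are independent: a page can be noindex but still have its links followed, so the indexing check and the link-discovery check should each look at their own directive.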

Top 10 Contributor
83 Posts

Great, thanks Mike!

We will try this and let you know how it goes.

Thanks

Top 10 Contributor
83 Posts
Hi Mike, did you find a solution for the above-mentioned issue?

copyright 2004-2017, arachnode.net LLC