arachnode.net
An Open Source C# web crawler with Lucene.NET search using SQL Server 2008/2012/2014/2016/CE An Open Source C# web crawler with Lucene.NET search using MongoDB/RavenDB/Hadoop

Is there any way to not consider subdomain while crawling?

Answered (Verified) This post has 1 verified answer | 36 Replies | 3 Followers

Top 150 Contributor
3 Posts
vchauhan.me posted on Mon, Mar 5 2012 3:05 AM

Hello,

Is there any way to exclude subdomains while crawling?

Thanks

Vinayak Chauhan

All Replies

Top 10 Contributor
1,905 Posts

Do you mean that if a link found on msn.com points to cars.msn.com, the Crawler should not be allowed to crawl cars.msn.com?

Mike

For best service when you require assistance:

  1. Check the DisallowedAbsoluteUris and Exceptions tables first.
  2. Cut and paste actual exceptions from the Exceptions table.
  3. Include screenshots.

Skype: arachnodedotnet

Top 10 Contributor
82 Posts

Hello,

I need the same thing: to exclude subdomains, for example cars.msn.com when crawling msn.com.

Top 150 Contributor
3 Posts

Yes, this kind of subdomain, and also other domains such as yahoo.com, facebook.com, and twitter.com.

Top 10 Contributor
1,905 Posts
Verified by vchauhan.me

OK.

Use UriClassificationType.Host as the value for both RestrictCrawlTo and RestrictDiscoveriesTo.

Thanks,
Mike 
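
To illustrate what a host-level restriction means for the msn.com / cars.msn.com case discussed above, here is a minimal, language-neutral sketch in Python. It is not arachnode.net code and does not use its API. The helper name `same_host` is hypothetical, and it assumes that "Host" means an exact host-name match, so cars.msn.com and yahoo.com are both rejected when the crawl starts at msn.com. How arachnode.net itself treats variants such as www.msn.com is not stated in this thread.

```python
from urllib.parse import urlsplit


def same_host(seed_url: str, discovered_url: str) -> bool:
    """Hypothetical helper: True only when both URLs share the exact same host name."""
    # urlsplit(...).hostname is lowercased by the standard library; it is None
    # when the URL has no network location, so fall back to an empty string.
    seed_host = urlsplit(seed_url).hostname or ""
    found_host = urlsplit(discovered_url).hostname or ""
    return seed_host != "" and seed_host == found_host


seed = "http://msn.com/"
links = [
    "http://msn.com/news",   # same host: kept
    "http://cars.msn.com/",  # subdomain: rejected
    "http://yahoo.com/",     # different domain: rejected
]

# Keep only the links that stay on the seed's host.
allowed = [link for link in links if same_host(seed, link)]
print(allowed)
```

Under these assumptions only the link on msn.com itself survives the filter, which is the behaviour asked for in the question.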



copyright 2004-2017, arachnode.net LLC