arachnode.net
An Open Source C# web crawler with Lucene.NET search using SQL Server 2008/2012/2014/2016/CE
An Open Source C# web crawler with Lucene.NET search using MongoDB/RavenDB/Hadoop

Completely Open Source @ GitHub

How can I crawl a site with multiple subdomains?

posted on Tue, Oct 26 2010 3:29 AM

Hello there,

Can you please tell me how I can crawl a site with multiple subdomains?

I am able to crawl a site like: http://www.abc.com

But I am NOT able to crawl a site like: http://www.xyz.abc.com     <-- (with subdomain)

Can you please tell me how I can crawl the http://www.xyz.abc.com site?

Is there a configuration setting for it?

Please reply ASAP.

 

-Nilesh

 

Verified Answer

Top 10 Contributor
1,905 Posts

Look at this post: http://arachnode.net/forums/p/323/10294.aspx#10294

...and this one: http://arachnode.net/forums/p/1400/12854.aspx#12854

You need to change UriClassificationType from Host (more specific) to Domain (less specific). www.abc.com and www.xyz.abc.com are different hosts, but they share the domain abc.com.
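
To illustrate why the Host/Domain distinction matters, here is a minimal conceptual sketch in Python. It is not arachnode.net code and does not use its API: the helper names (host_of, domain_of, in_scope) and the "host"/"domain" strings are hypothetical, and the domain is approximated as the last two labels of the host, which is wrong for suffixes such as co.uk.

```python
from urllib.parse import urlsplit


def host_of(uri: str) -> str:
    """Return the lowercased host of a URI, e.g. 'www.xyz.abc.com'."""
    return (urlsplit(uri).hostname or "").lower()


def domain_of(uri: str) -> str:
    """Naive domain: the last two labels of the host (assumption).

    A real crawler would consult a public suffix list instead.
    """
    labels = host_of(uri).split(".")
    return ".".join(labels[-2:])


def in_scope(seed: str, candidate: str, restrict_to: str) -> bool:
    """Decide whether candidate may be crawled when starting from seed.

    restrict_to is a hypothetical stand-in for the classification setting:
    'host' compares full host names, 'domain' compares domains only.
    """
    if restrict_to == "host":
        return host_of(seed) == host_of(candidate)
    if restrict_to == "domain":
        return domain_of(seed) == domain_of(candidate)
    raise ValueError("restrict_to must be 'host' or 'domain'")


seed = "http://www.abc.com"
link = "http://www.xyz.abc.com/page"

print(in_scope(seed, link, "host"))    # hosts differ, so the link is skipped
print(in_scope(seed, link, "domain"))  # both belong to abc.com, so it is kept
```

With a host-level restriction the subdomain link is treated as a different site; with a domain-level restriction both URLs fall under abc.com and are crawled together.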

Mike

For best service when you require assistance:

  1. Check the DisallowedAbsoluteUris and Exceptions tables first.
  2. Cut and paste actual exceptions from the Exceptions table.
  3. Include screenshots.

Skype: arachnodedotnet

copyright 2004-2017, arachnode.net LLC