arachnode.net
An Open Source C# web crawler with Lucene.NET search using SQL Server 2008/2012/2014/2016/CE or MongoDB/RavenDB/Hadoop

Specify CrawlRules per CrawlRequest

ptrennum posted on Sun, Jun 12 2011 2:37 PM

How is it possible to specify unique CrawlRules to be applied on a per site basis?

I currently have it set up as follows:

    // Reset crawl rules for MySite: enable only MyCrawlRule1
    foreach (ACrawlRule crawlRule in _crawler.CrawlRules.Values)
    {
        crawlRule.IsEnabled = crawlRule.TypeName == "Arachnode.Plugins.CrawlRules.MyCrawlRule1";
    }

    wasTheCrawlRequestAddedForCrawling = _crawler.Crawl(new CrawlRequest(new Discovery("http://mysite.com"), int.MaxValue, UriClassificationType.None, UriClassificationType.None, 1, RenderType.None, RenderType.None));

    // Reset crawl rules for MyOtherSite: enable only MyCrawlRule2
    foreach (ACrawlRule crawlRule in _crawler.CrawlRules.Values)
    {
        crawlRule.IsEnabled = crawlRule.TypeName == "Arachnode.Plugins.CrawlRules.MyCrawlRule2";
    }

    wasTheCrawlRequestAddedForCrawling = _crawler.Crawl(new CrawlRequest(new Discovery("http://myothersite.com"), int.MaxValue, UriClassificationType.None, UriClassificationType.None, 1, RenderType.None, RenderType.None));

I don't think this is correct...

All Replies

I think I answered my own question: the host needs to be specified in the rule itself, so that each rule only acts on URIs from its own site.
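As a rough illustration of what "specify the host in the rule itself" could look like, here is a minimal, self-contained sketch. It deliberately does not use the arachnode.net API: HostScopedRule, MySiteRule, AppliesTo, IsDisallowedForHost and the /private/ path check are all hypothetical names made up for this example. In a real plugin the same host check would presumably sit inside your own ACrawlRule subclass.

```csharp
using System;

// Hypothetical stand-in for a crawl rule. This is NOT the arachnode.net
// ACrawlRule API; it only illustrates scoping a rule to a single host.
public abstract class HostScopedRule
{
    private readonly string _host;

    protected HostScopedRule(string host)
    {
        _host = host;
    }

    // True when this rule is responsible for the URI's host.
    public bool AppliesTo(Uri uri)
    {
        return string.Equals(uri.Host, _host, StringComparison.OrdinalIgnoreCase);
    }

    // A rule can only disallow URIs on its own host; every other host
    // passes through untouched, so all rules can stay enabled at once.
    public bool IsDisallowed(Uri uri)
    {
        return AppliesTo(uri) && IsDisallowedForHost(uri);
    }

    // Site-specific logic goes here.
    protected abstract bool IsDisallowedForHost(Uri uri);
}

// Example rule (assumption): on mysite.com, skip everything under /private/.
public sealed class MySiteRule : HostScopedRule
{
    public MySiteRule() : base("mysite.com") { }

    protected override bool IsDisallowedForHost(Uri uri)
    {
        return uri.AbsolutePath.StartsWith("/private/", StringComparison.OrdinalIgnoreCase);
    }
}
```

With rules written this way there is no need to toggle IsEnabled between calls to Crawl: a rule for mysite.com simply ignores requests for myothersite.com. Note that the comparison above is an exact host match, so www.mysite.com would not be covered unless the rule is constructed with that host.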

Verified by arachnode.net

Yes. You are correct. :)

For best service when you require assistance:

  1. Check the DisallowedAbsoluteUris and Exceptions tables first.
  2. Cut and paste actual exceptions from the Exceptions table.
  3. Include screenshots.

Skype: arachnodedotnet

copyright 2004-2017, arachnode.net LLC