On one machine, in a Windows service, should we create multiple AN "Crawler" instances to crawl multiple websites? Suppose we want to crawl 10 big websites (Amazon, BestBuy, ...) and we also want to control (e.g., start, stop, and pause) each website's crawl separately. What is the best way to do this?
Are there exactly 10 sites, or some similarly low number?
The easiest way to manage a low number of separate websites is to use distinct instances/databases. AN doesn't currently have a facility to start/stop/pause crawling per domain, and while you could accomplish this via a plugin, managing the data is much easier if it is separated from the start.
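To make the "one instance per website, each controllable on its own" idea concrete, here is a minimal sketch of a service hosting one crawler controller per site, each with independent start/pause/resume/stop. This is illustrative only: AN is a .NET library with its own API, and the `SiteCrawler` class, its method names, and the thread-per-site design below are all assumptions, not AN code.

```python
import threading
import time

class SiteCrawler:
    """Hypothetical per-site crawler controller (illustrative only;
    arachnode.net is a .NET library and exposes its own API)."""

    def __init__(self, site):
        self.site = site
        self._running = threading.Event()  # set = running, cleared = paused
        self._stopped = threading.Event()
        self._running.set()
        self.pages_crawled = 0
        self._thread = None

    def _run(self):
        while not self._stopped.is_set():
            self._running.wait()           # block here while paused
            if self._stopped.is_set():
                break
            self.pages_crawled += 1        # stand-in for "crawl one page"
            time.sleep(0.01)

    def start(self):
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def pause(self):
        self._running.clear()

    def resume(self):
        self._running.set()

    def stop(self):
        self._stopped.set()
        self._running.set()                # unblock a paused worker
        if self._thread:
            self._thread.join()

# One independent controller per website, so each site can be
# started/paused/stopped without touching the others:
crawlers = {site: SiteCrawler(site) for site in ["amazon.com", "bestbuy.com"]}
crawlers["bestbuy.com"].start()
crawlers["bestbuy.com"].stop()
```

With separate instances/databases per site, stopping or re-crawling one site never disturbs another site's data, which is the management benefit described above.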
Look at ApplicationSettings.UniqueIdentifier. Since each Crawler instance shares the ASP.NET cache, give each instance a unique key so the crawlers don't crawl the same content. If you set RestrictCrawlTo.Host(/Domain) for the CrawlRequests, the Crawlers won't overlap in their Discoveries. And if you don't plan to allow non-Domain content (e.g., an image on the BestBuy site originating from Amazon.com — see RestrictDiscoveriesTo.Host(/Domain)), then you don't need to set ApplicationSettings.UniqueIdentifier at all.
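The per-instance setup described above can be sketched as a small configuration table: each instance gets a unique identifier (so shared-cache entries don't collide) and a host restriction (so Discoveries don't overlap). The dictionary keys and the `check_discovery_allowed` helper are hypothetical stand-ins for the real .NET ApplicationSettings and RestrictCrawlTo/RestrictDiscoveriesTo settings named above.

```python
# Hypothetical per-instance configuration mirroring the AN settings
# named above (UniqueIdentifier, RestrictCrawlTo.Host/Domain); the
# real configuration lives in AN's .NET ApplicationSettings.
instances = [
    {"unique_identifier": "BestBuy", "restrict_crawl_to_host": "bestbuy.com"},
    {"unique_identifier": "Amazon",  "restrict_crawl_to_host": "amazon.com"},
]

def check_discovery_allowed(instance, url_host):
    """With a host restriction set, an instance only keeps Discoveries
    from its own host, so two instances cannot overlap."""
    return url_host.endswith(instance["restrict_crawl_to_host"])
```

Under this scheme the BestBuy instance would accept `www.bestbuy.com` but reject `images.amazon.com`, which is exactly the non-overlap behavior the host restriction provides.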
The easiest way to know whether you'll need to set ApplicationSettings.UniqueIdentifier: examine the data in your BestBuy.com instance. If you see content from any other domain, then yes, you'll need to set it.
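The diagnostic check above (look for foreign-domain content in one instance's data) can be scripted along these lines. This is a generic sketch over a list of crawled URLs, not an AN query; `foreign_hosts` and its inputs are hypothetical.

```python
from urllib.parse import urlparse

def foreign_hosts(crawled_urls, own_domain):
    """Return hosts in the crawled data that don't belong to own_domain.
    If this returns anything, the instance has picked up off-domain
    content and a unique ApplicationSettings.UniqueIdentifier is needed."""
    hosts = {urlparse(u).hostname for u in crawled_urls}
    return {h for h in hosts if h and not h.endswith(own_domain)}

urls = ["http://www.bestbuy.com/tv", "http://images.amazon.com/x.jpg"]
foreign_hosts(urls, "bestbuy.com")  # → {"images.amazon.com"}
```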