arachnode.net
An Open Source C# web crawler with Lucene.NET search using SQL Server 2008/2012/2014/2016/CE
An Open Source C# web crawler with Lucene.NET search using MongoDB/RavenDB/Hadoop

Index Multiple sites Simultaneously


Top 150 Contributor
3 Posts
kanderson posted on Thu, Feb 11 2016 9:17 PM

I have a list of website URLs that need to be crawled. Each crawl should store its Lucene index in its own directory, and I want all of the websites to be indexed at the same time. Can you provide directions on how this can be accomplished? The plugin Arachnode.Plugins.CrawlActions.ManageLuceneDotNetIndexes reads the directory in which to store the results from the settings field, which would have to be changed for each new instance of the crawl. I don't want to overwrite the settings field, but it may be necessary. Any insight would be very helpful.

Verified Answer

Top 10 Contributor
1,905 Posts
Verified by arachnode.net

Yes, change 'Settings' at 'private static void AssignApplicationSettingsForDebug()' - you are correct about how this should be done.

Thanks,
Mike 

 

For best service when you require assistance:

  1. Check the DisallowedAbsoluteUris and Exceptions tables first.
  2. Cut and paste actual exceptions from the Exceptions table.
  3. Include screenshots.

Skype: arachnodedotnet

All Replies

Top 150 Contributor
3 Posts

I don't have a private static void AssignApplicationSettingsForDebug(); I am creating my own instance of the crawler. I get your point, though: for each call I have to overwrite the default value of the Lucene index directory.
But in this model there is still only one instance of the Crawler. Is it possible to have multiple instances of Crawler<ArachnodeDAO> _crawler running at the same time, for instance created inside a loop?
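
For anyone finding this thread later, here is a minimal sketch of the loop described above: one index directory per site, one crawl per site, all started together. It uses only standard .NET calls and deliberately does not call the arachnode.net API. The names PerSiteIndexing, IndexDirectoryFor, CrawlAllAsync and the crawlSite callback are hypothetical, not part of arachnode.net; the callback is where you would create your own Crawler<ArachnodeDAO> and point its Lucene directory setting at the supplied path. Whether that setting is held per instance or shared across the whole process is not confirmed in this thread, so treat running several instances in one process as an assumption to verify.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

public static class PerSiteIndexing
{
    // Builds a distinct index directory for a site, e.g. <indexRoot>/www.example.com.
    // Note: two URLs on the same host map to the same directory.
    public static string IndexDirectoryFor(string indexRoot, Uri site)
    {
        // ':' can appear in IPv6 host literals and is not valid in Windows folder names.
        string folder = site.Host.Replace(':', '_');
        return Path.Combine(indexRoot, folder);
    }

    // Starts one crawl per site and returns a task that completes when all have finished.
    // 'crawlSite' is a hypothetical callback (an assumption, not an arachnode.net API):
    // it should create its own crawler instance, set that instance's Lucene index
    // directory to the supplied path, and run the crawl to completion.
    public static Task CrawlAllAsync(
        IEnumerable<string> siteUrls,
        string indexRoot,
        Func<Uri, string, Task> crawlSite)
    {
        var crawls = new List<Task>();
        foreach (string url in siteUrls)
        {
            var site = new Uri(url);
            string indexDirectory = IndexDirectoryFor(indexRoot, site);
            Directory.CreateDirectory(indexDirectory); // no-op if it already exists

            // Task.Run moves each crawl onto the thread pool so that a callback
            // that blocks does not hold up the start of the next crawl.
            crawls.Add(Task.Run(() => crawlSite(site, indexDirectory)));
        }
        return Task.WhenAll(crawls);
    }
}
```

If it turns out the directory setting is shared across the process, the same per-site directory logic still applies, but each site would need its own process rather than its own task.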


copyright 2004-2017, arachnode.net LLC