arachnode.net
An Open Source C# web crawler with Lucene.NET search using SQL Server 2008/2012/2014/2016/CE An Open Source C# web crawler with Lucene.NET search using MongoDB/RavenDB/Hadoop
Performance Tuning

From time to time I look at areas of arachnode.net (AN) that could be made faster.

OVERVIEW TIPS:

  • Disable console logging: ApplicationSettings.EnableConsoleOutput = false;
  • Disable IntelliTrace.
  • More threads do not necessarily mean better performance.
  • Limit SQL Server's maximum memory consumption.
  • Ensure arachnode.net has enough RAM to accommodate ApplicationSettings.DesiredMaximumMemoryUsageInMegabytes.
  • Visual Studio 2008/2010/2012/2013/20XX often hangs when debugging a large number of threads: use CTRL+F5 (Start Without Debugging).
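
The console and memory tips above can be applied in one place before a crawl starts. The following is a minimal sketch, not arachnode.net's own code: the `ApplicationSettings` class here is a stand-in so the example compiles on its own, the property types and defaults are assumptions, and `ApplyTuning` and the 1024 MB budget are hypothetical.

```csharp
using System;

// Stand-in for arachnode.net's ApplicationSettings so this sketch compiles on its own.
// The property names come from the tips above; the types and defaults are assumptions.
public static class ApplicationSettings
{
    public static bool EnableConsoleOutput { get; set; } = true;
    public static int DesiredMaximumMemoryUsageInMegabytes { get; set; } = 0;
}

public static class TuningSketch
{
    // Hypothetical helper: turns console output off and records the memory budget.
    public static void ApplyTuning(int memoryBudgetInMegabytes)
    {
        if (memoryBudgetInMegabytes <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(memoryBudgetInMegabytes));
        }

        ApplicationSettings.EnableConsoleOutput = false;
        ApplicationSettings.DesiredMaximumMemoryUsageInMegabytes = memoryBudgetInMegabytes;
    }

    public static void Main()
    {
        // 1024 is an illustrative figure; pick a value the machine can actually spare.
        ApplyTuning(1024);
        Console.WriteLine("EnableConsoleOutput = " + ApplicationSettings.EnableConsoleOutput);
        Console.WriteLine("DesiredMaximumMemoryUsageInMegabytes = "
            + ApplicationSettings.DesiredMaximumMemoryUsageInMegabytes);
    }
}
```

Whatever budget is chosen should leave room for SQL Server's own memory cap when both run on the same machine.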

Running a Performance Analysis with the following settings yields this (the Performance Analysis had ApplicationSettings.EnableConsoleOutput = true):

Notice that only Discoveries and CrawlRequests are inserted into, updated in, or deleted from the database.

The function performing the most work is actually the one writing to the Console, NOT the database functions.  Those are already optimized.
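
The cost of per-item console writes is easy to reproduce outside of AN. The sketch below is a stand-alone micro-benchmark, not the Performance Analysis used for this post: `ConsoleCostSketch`, `Run`, and the iteration count are all hypothetical, and the "work" is a trivial addition rather than a real crawl.

```csharp
using System;
using System.Diagnostics;

public static class ConsoleCostSketch
{
    // Runs `iterations` units of trivial work and writes one console line
    // per unit only when enableConsoleOutput is true. Returns the elapsed time.
    public static TimeSpan Run(int iterations, bool enableConsoleOutput)
    {
        long checksum = 0;
        Stopwatch stopwatch = Stopwatch.StartNew();

        for (int i = 0; i < iterations; i++)
        {
            checksum += i; // placeholder for real per-item work

            if (enableConsoleOutput)
            {
                Console.WriteLine("Processed item " + i);
            }
        }

        stopwatch.Stop();
        GC.KeepAlive(checksum); // use the result so the loop is not trivially dead code
        return stopwatch.Elapsed;
    }

    public static void Main()
    {
        const int iterations = 20000; // illustrative figure

        TimeSpan withOutput = Run(iterations, true);
        TimeSpan withoutOutput = Run(iterations, false);

        Console.WriteLine("With console output:    " + withOutput.TotalMilliseconds + " ms");
        Console.WriteLine("Without console output: " + withoutOutput.TotalMilliseconds + " ms");
    }
}
```

The gap is usually largest when the console is an interactive window and smaller when output is redirected to a file, so the numbers depend on how the program is launched.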

Notice that the maximum number of CrawlRequests processed per second is 1,726.

Running a Performance Analysis with the following settings (all of the above + ApplicationSettings.EnableConsoleOutput = false) yields this:

With 100 threads, crawling the local test site yielded 3,145 WebPages per second!
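
A pages-per-second figure like this is simply items completed divided by elapsed time, and it is worth measuring at several thread counts rather than assuming that more threads are faster. The sketch below is a generic way to do that, not AN's crawl engine: `ThroughputSketch`, `Measure`, the 1 ms sleep standing in for request latency, and the thread counts are all assumptions.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class ThroughputSketch
{
    // Processes totalItems simulated requests on threadCount threads
    // and returns the observed throughput in items per second.
    public static double Measure(int threadCount, int totalItems)
    {
        int remaining = totalItems;
        Thread[] threads = new Thread[threadCount];
        Stopwatch stopwatch = Stopwatch.StartNew();

        for (int t = 0; t < threadCount; t++)
        {
            threads[t] = new Thread(() =>
            {
                // Each thread claims one item at a time until none are left.
                while (Interlocked.Decrement(ref remaining) >= 0)
                {
                    Thread.Sleep(1); // simulated request latency (assumption)
                }
            });
            threads[t].Start();
        }

        foreach (Thread thread in threads)
        {
            thread.Join();
        }

        stopwatch.Stop();
        return totalItems / stopwatch.Elapsed.TotalSeconds;
    }

    public static void Main()
    {
        const int totalItems = 200; // illustrative workload size

        foreach (int threadCount in new[] { 1, 10, 100 })
        {
            double perSecond = Measure(threadCount, totalItems);
            Console.WriteLine(threadCount + " thread(s): " + perSecond.ToString("F0") + " items/second");
        }
    }
}
```

With a sleep-based workload the curve mostly reflects waiting time; against a real site, CPU, network, and database contention decide where adding threads stops helping.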

After many Performance Analysis runs I was unable to find anything else significant to optimize.  What remains belongs to the .NET Framework.


Posted Sun, Apr 12 2015 1:14 PM by arachnode.net

copyright 2004-2017, arachnode.net LLC