Technical Specifications v.99 beta

System Requirements
Arachnode.net v.99 beta or higher
Microsoft .NET Framework version 2.0 or higher, or Mono 1.2.5.1 or higher
SQL Server 2005 or SQL Server 2005 Express Edition with Advanced Services. Advanced Services is required for full-text indexing capabilities. Download here.  Arachnode.net was developed with Visual Studio 2005 Professional Edition and SQL Server 2005 Developer Edition.

  • Supported Operating Systems: Windows 2000 Service Pack 4; Windows Server 2003 Service Pack 1; Windows Vista; Windows XP Service Pack 2; Linux
  • Computer with Intel or compatible Pentium III 500 MHz or faster processor (1 GHz or faster is recommended.)
  • Minimum of 512 MB of RAM (1 GB or more is recommended.)
  • 600 MB of available hard disk space (100 GB or more is recommended to store discovered data)

Some features of Arachnode.net may not be available with Express Editions.  Visual Studio 2008 Trial is available here in install or virtual hard disk format.  SQL Server Enterprise Edition Trial is available here in install or virtual hard disk format.

CommunityServer 2007.1
System Requirements here.

Key Features
.NET architecture
Arachnode.net is the only complete open source .NET site crawler available to the general public.

SQL Server 2005 and full-text indexing
SQL Server 2005 full-text indexing is enabled and configured at all appropriate content storage locations.

Analysis Services
Arachnode.net comes with over 120 stored procedures and views designed for use with Analysis Services and other business intelligence software. These procedures and views address trending, popularity, and many other common analysis and reporting needs.

SQL Server 2005 and SSIS
Arachnode.net comes pre-configured with several SSIS procedures to extract and prepare key information from collected data for text mining and analysis.

HTML to XML
Arachnode.net can convert standard Web pages to XML stored in SQL Server 2005. Use XPath to extract common elements from downloaded content using the pre-configured XML indexes.
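As a rough illustration of the HTML-to-XML idea, the following Python sketch converts a small, well-formed page into an XML tree and pulls out common elements with XPath-style queries. It is not Arachnode.net's own code: the names `HtmlToXml` and `html_to_xml` are hypothetical, the tree lives in memory rather than in SQL Server, and the standard-library `ElementTree` supports only a subset of XPath.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# HTML elements that never have a closing tag; closed immediately in the XML tree.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}

class HtmlToXml(HTMLParser):
    """Hypothetical converter: feeds HTML parser events into an XML tree builder."""

    def __init__(self):
        super().__init__()
        self.builder = ET.TreeBuilder()

    def handle_starttag(self, tag, attrs):
        # Attributes without a value (e.g. <input disabled>) become empty strings.
        self.builder.start(tag, {name: (value or "") for name, value in attrs})
        if tag in VOID_TAGS:
            self.builder.end(tag)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS:
            self.builder.end(tag)

    def handle_data(self, data):
        self.builder.data(data)

def html_to_xml(html):
    """Returns the root Element for a well-formed HTML string."""
    parser = HtmlToXml()
    parser.feed(html)
    parser.close()
    return parser.builder.close()

page = (
    "<html><head><title>Example</title></head>"
    "<body><p>See <a href='/about'>About</a><br>"
    "and <a href='/faq'>FAQ</a>.</p></body></html>"
)

root = html_to_xml(page)
title = root.findtext(".//title")                              # page title
links = [a.get("href") for a in root.findall(".//a[@href]")]   # all link targets
```

Real-world pages are rarely this tidy, so a production converter would also need to repair unclosed or mismatched tags before the result is valid XML.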

EXIF data extraction
Arachnode.net can extract, store, and index all discoverable EXIF data fields from discovered images.

Multi-threading and Throttling
Arachnode.net can be configured to run any number of threads and to use as much or as little processor time as you require.
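One way to picture a configurable thread count combined with throttling is a worker pool that shares a single rate limiter. The Python sketch below is an assumption-laden illustration, not Arachnode.net's implementation: `Throttle`, `crawl`, `thread_count`, and `min_interval` are hypothetical names, and it throttles request rate rather than processor time directly.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

class Throttle:
    """Spaces calls at least `min_interval` seconds apart across all threads."""

    def __init__(self, min_interval):
        self.min_interval = min_interval
        self._lock = threading.Lock()
        self._next_allowed = 0.0

    def wait(self):
        # Reserve the next free time slot under the lock, then sleep outside it
        # so other threads can reserve their own slots in the meantime.
        with self._lock:
            now = time.monotonic()
            start = max(now, self._next_allowed)
            self._next_allowed = start + self.min_interval
        delay = start - now
        if delay > 0:
            time.sleep(delay)

def crawl(urls, fetch, thread_count=4, min_interval=0.05):
    """Fetches every URL with `thread_count` workers, throttled as a group."""
    throttle = Throttle(min_interval)

    def worker(url):
        throttle.wait()
        return fetch(url)

    with ThreadPoolExecutor(max_workers=thread_count) as pool:
        return list(pool.map(worker, urls))  # results keep the input order

def fake_fetch(url):
    # Stand-in for a real HTTP request; returns the URL length.
    return len(url)

urls = ["http://example.com/%d" % i for i in range(5)]
started = time.monotonic()
results = crawl(urls, fake_fetch, thread_count=3, min_interval=0.02)
elapsed = time.monotonic() - started
```

Raising `thread_count` increases parallelism, while raising `min_interval` lowers the overall request rate regardless of how many threads are running.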

Respectful Crawling
Arachnode.net provides pre- and post-request rules governing address and content filtering, robots.txt behavior, request frequency and crawl depth.  The default crawling environment is respectful, courteous and kind.
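A pre-request rule of the kind described above might combine address filtering, robots.txt checks, and a crawl depth limit. The Python sketch below uses the standard-library `urllib.robotparser`; the function `make_pre_request_rule`, its parameters, the user agent string, and the sample robots.txt are all hypothetical and only illustrate the idea, not Arachnode.net's actual rules.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

# Sample robots.txt content; a real crawler would download it from the site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

def make_pre_request_rule(robots_txt, user_agent, max_depth, allowed_hosts):
    """Returns (is_allowed, crawl_delay) for one site's crawl policy."""
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())

    def is_allowed(url, depth):
        if depth > max_depth:                              # crawl depth limit
            return False
        if urlsplit(url).hostname not in allowed_hosts:    # address filtering
            return False
        return robots.can_fetch(user_agent, url)           # robots.txt behavior

    # The Crawl-delay value (seconds) can drive the request frequency.
    return is_allowed, robots.crawl_delay(user_agent)

is_allowed, delay = make_pre_request_rule(
    ROBOTS_TXT,
    user_agent="example-crawler",
    max_depth=3,
    allowed_hosts={"example.com"},
)

ok_page = is_allowed("http://example.com/index.html", depth=1)
private_page = is_allowed("http://example.com/private/report.html", depth=1)
too_deep = is_allowed("http://example.com/index.html", depth=5)
other_host = is_allowed("http://other.example.org/index.html", depth=1)
```

Here only `ok_page` passes every check; the other three requests are rejected by the robots.txt rule, the depth limit, and the host filter respectively.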

Configurable Rules and Actions
Implement your own custom pre- and post-request crawl rules and actions without source recompilation.  The existing crawl rules and actions architecture easily enables crawling enhancements such as federation and partitioning.
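The general pattern behind rules that can be changed without recompilation is a registry of named rules that are enabled and parameterized from configuration data. The following Python sketch shows that pattern under stated assumptions: the registry `RULES`, the rule names `max_depth` and `block_extension`, and the JSON layout are invented for illustration and do not reflect how Arachnode.net stores or loads its rules.

```python
import json

RULES = {}  # registry: rule name -> factory that builds a rule from settings

def rule(name):
    """Decorator that registers a rule factory under `name`."""
    def register(factory):
        RULES[name] = factory
        return factory
    return register

@rule("max_depth")
def max_depth(limit):
    return lambda request: request["depth"] <= limit

@rule("block_extension")
def block_extension(extensions):
    blocked = tuple(ext.lower() for ext in extensions)
    return lambda request: not request["url"].lower().endswith(blocked)

def load_rules(config):
    """Builds the enabled rules listed in the configuration."""
    return [
        RULES[entry["rule"]](**entry["settings"])
        for entry in config
        if entry.get("enabled", True)
    ]

def should_request(request, rules):
    """A request goes ahead only if every pre-request rule accepts it."""
    return all(check(request) for check in rules)

# Editing this configuration changes crawl behavior without touching the code.
CONFIG_JSON = """
[
  {"rule": "max_depth", "settings": {"limit": 2}},
  {"rule": "block_extension", "settings": {"extensions": [".zip", ".exe"]}},
  {"rule": "block_extension", "enabled": false, "settings": {"extensions": [".html"]}}
]
"""

rules = load_rules(json.loads(CONFIG_JSON))
page_ok = should_request({"url": "http://example.com/a.html", "depth": 1}, rules)
binary_ok = should_request({"url": "http://example.com/setup.EXE", "depth": 1}, rules)
deep_ok = should_request({"url": "http://example.com/a.html", "depth": 3}, rules)
```

Post-request actions could follow the same shape, with each registered action receiving the downloaded content instead of the pending request.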

Integration with CommunityServer 2007.1
Stored procedures and a post-request crawl action are provided to submit discovered content to any CommunityServer 2007.1 installation.


arachnode.net - a .NET web crawler written in C# using SQL 2005