arachnode.net
An open source .NET web crawler written in C# using SQL 2005/2008

Browse Forum Posts by Tags

  • Re: What is the best method to parse html tags!

    Milan Solanki: "What would be the best method of parsing html" I do this daily. I use biterscripting to parse our own web pages and extract all kinds of information from them in all kinds of formats. You can start with the sample script posted at http://www.biterscripting.com/SS_WebPageToText.html as...
    Posted to General Questions (Forum) by JenniC on 4 Nov 2009
  • Re: Plugin help

    Templater.cs is ALPHA code. 1.) HtmlAgilityPack is a memory hog and should only be used when you absolutely need it. It is used in the templater code because I need XPath support. 2.) ExtractText does a much, much better job of stripping out tags than HtmlAgilityPack does, and it's faster as...
    Posted to General Questions (Forum) by arachnode.net on 11 Aug 2009
  • Re: Plugin help

    arachnode.net already contains support for the HtmlAgilityPack; however, the HtmlAgilityPack is a HUGE memory hog and has an extremely negative impact on crawling rate. If you can avoid it, don't use it. If you have to use it, change the configuration setting for 'ExtractWebPageMetaData'...
    Posted to General Questions (Forum) by arachnode.net on 7 Aug 2009
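
The posts above contrast stripping tags in a lightweight pass with loading a full DOM parser. As a rough illustration of the lightweight approach only, here is a minimal Python sketch built on the standard library's `html.parser`. It is not arachnode.net's ExtractText and does not use the HtmlAgilityPack; the names `TextExtractor` and `strip_tags` are hypothetical, chosen for this example.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects visible text in one streaming pass.

    No document tree is built, so memory use stays proportional to the
    extracted text rather than to a full DOM.
    """

    SKIPPED = {"script", "style"}  # element content that is never visible text

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside <script> or <style>.
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Join chunks with a space and collapse runs of whitespace.
        # Note: this also puts a space where an inline tag splits a word.
        return " ".join(" ".join(self._chunks).split())


def strip_tags(html):
    """Return the visible text of an HTML string (assumed helper name)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

Calling `strip_tags(page_source)` on a downloaded page returns its text with markup, scripts, and styles removed. This approach gives no XPath support, which is the reason the second post cites for using a DOM parser in the templater code.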

copyright 2004-2010, arachnode.net LLC