arachnode.net
An Open Source C# web crawler with Lucene.NET search using SQL Server 2008/2012/2014/2016/CE or MongoDB/RavenDB/Hadoop

v1.0 first comments

Top 25 Contributor
16 Posts
polfilm posted on Mon, Jan 5 2009 3:26 AM

1. Aesthetics only: the zip contains a root folder named "New Folder".

2. The Web project did not load. It said something about not being able to load that particular project type. I found out that VS2005 SP1 was missing. Then SP1 would not install on my 2003 server, saying "Error 1718. File was rejected by digital signature policy". Fortunately there is a fix for it here: http://support.microsoft.com/kb/925336/en-us So that one is now working.

3. Types. There is no project file for Types, so the project is not loading. I created a Type project from scratch, blindly putting it in the Arachnode.Type default namespace.

4. Still 17 errors during compilation... something about HtmlAgilityPack. It just occurred to me that it might be something external that is not in the installation document. Found: http://www.codeplex.com/htmlagilitypack ... trying to figure out how to hook it up now.

Verified Answer

Top 10 Contributor
1,905 Posts

Hey!

Thanks for checking this out for me... I had a pretty serious situation occur over the weekend so I didn't have the time to test a fresh install.

Here's what I did to fix the bugs you found.

1.) Updated the release.  \source and \database are at the 'root' of the .zip folder.

2.) I have never seen this error, so thanks for posting on it.

3.) Added the Types project into the solution.  VisualSVN told me it was added but TortoiseSVN said that it wasn't.

4.) The SiteCrawler project had a reference to the HtmlAgilityPack in its \bin folder.  I corrected this to point to the HtmlAgilityPack .dll in the Library project.  Downloading and building the HtmlAgilityPack isn't required now.

Question: What are the specs on your machine?

And, please do let me know of the bugs you find and of suggestions you have!  :D

Thanks again!
Mike

p.s. The TextAnalytics reference is a hint to what's coming next.  :D

For best service when you require assistance:

  1. Check the DisallowedAbsoluteUris and Exceptions tables first.
  2. Cut and paste actual exceptions from the Exceptions table.
  3. Include screenshots.

Skype: arachnodedotnet

All Replies

Top 25 Contributor
16 Posts

1. Download HtmlAgilityPack from the website above. Compile with VS2005.

2. In VS, open the Arachnode.net/SiteCrawler project. Go to References. Remove the missing HtmlAgilityPack library.

3. Right-click on References, then Browse, and point to the newly compiled DLL.
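
For anyone who prefers to check the project file directly: steps 2 and 3 above should end up changing the HtmlAgilityPack entry in the SiteCrawler .csproj. Below is a rough sketch of what such an entry typically looks like in a VS2005-style project file. The relative path in HintPath is only an assumption for illustration (it is not taken from the arachnode.net source), so substitute wherever your compiled DLL actually lives.

```xml
<!-- Sketch only: a typical file-based assembly reference in a .csproj. -->
<!-- The HintPath value is hypothetical; point it at your own HtmlAgilityPack.dll. -->
<ItemGroup>
  <Reference Include="HtmlAgilityPack">
    <SpecificVersion>False</SpecificVersion>
    <HintPath>..\Library\HtmlAgilityPack.dll</HintPath>
  </Reference>
</ItemGroup>
```

If the reference still shows a warning icon in Solution Explorer after reloading the project, the HintPath most likely does not resolve to an existing file.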

 

The project now compiles. Ready to run some tests, with the item below still missing.

PS: There is still one library missing. Under Library and Plugins, the TextAnalytics library shows as missing.

Top 25 Contributor
16 Posts
polfilm replied on Mon, Jan 5 2009 5:56 AM

Well, what can I say... INCREDIBLE. This is a nice piece of work. 30 threads are putting my server at 95% CPU. I'm running SQL on it as well (makes a small humming noise :)), taking over 1.6GB of RAM.

The entire solution has been running from within Visual Studio for about 2 hours now. It's miles ahead of yesterday's all-day session with 0.9.

.....and best part: no more yellow :)))

 

Top 25 Contributor
16 Posts

The server is an old dual-core 3GHz with 3GB of RAM (2003 SP2), but it's running tons of my dev stuff like IIS, SharePoint (brrr) and SQL2K5, which takes most of the RAM when it gets going... and I was running your app from within Visual Studio. I think the actual console was eating about 150-200MB.


copyright 2004-2017, arachnode.net LLC