arachnode.net
An Open Source C# web crawler with Lucene.NET search using SQL Server 2008/2012/2014/2016/CE
An Open Source C# web crawler with Lucene.NET search using MongoDB/RavenDB/Hadoop

Unable to launch Console

Top 50 Contributor
14 Posts
sanwal233 posted on Wed, Dec 29 2010 11:39 PM

I compiled the whole VS solution. I get a few warnings but no errors.

There are two types of warnings:

1. 'xyz' method is obsolete, use 'abc' method. These come from the Lucene module.

2. The other warning is from Functions.csproj.user. It is schema related. For reference, I have pasted this warning below.

I set the Console project as the startup project, but on F5 nothing launches and VS says 'deployment failed'.

Can anyone help?

I am using VS 2010 (I don't have VS 2008). I don't think that should make any difference; backward compatibility is always ensured in such dev tools.

 

Warning 2 The element 'Project' in namespace 'http://schemas.microsoft.com/developer/msbuild/2003' has incomplete content. List of possible elements expected: 'PropertyGroup, ItemGroup, ItemDefinitionGroup, Choose, UsingTask, ProjectExtensions, Target, Import' in namespace 'http://schemas.microsoft.com/developer/msbuild/2003'. D:\Arachnode\LatestRelease2.5\Functions\Functions.csproj.user 13 3 Functions

All Replies

Top 10 Contributor
1,905 Posts

I can answer this post further in a few hours.

For best service when you require assistance:

  1. Check the DisallowedAbsoluteUris and Exceptions tables first.
  2. Cut and paste actual exceptions from the Exceptions table.
  3. Include screenshots.

Skype: arachnodedotnet

Top 10 Contributor
1,905 Posts

OK, back.

1.) Warnings are just that: warnings.  Lucene frequently marks methods obsolete and then removes them in the next minor version.  Most of these warnings are already addressed in code on my desktop and can be safely ignored.

2.) This I don't know, because I use VS 2008 for arachnode.net.

For your failed deployment: it sounds like a VS 2008 bug still exists in 2010.  Restart VS and restart SQL Server, and the project will deploy.  (It does in 2008.)  If it doesn't, look at the properties of the Functions project and ensure you have a valid DB connection string.

http://arachnode.net/forums/p/421/10450.aspx#10450

Do either of these posts help with the 'Functions' warning?

  1. Missing element according to the MSBuild XML schema?

    Dec 12, 2007 ... in namespace 'http://schemas.microsoft.com/developer/msbuild/2003' has incomplete content. List of possible elements expected: 'PropertyGroup, ItemGroup, ItemDefinitionGroup, Choose, UsingTask, ProjectExtensions, Target, Import' in namespace 'http://schemas.microsoft.com/developer/msbuild/2003'. ...
    social.msdn.microsoft.com/.../11350d1e-f2f2-40bb-b433-469135a37e3f
  2. [CTP] Slaam! Mobile - Page 2 - Zuneboards

    20 posts - 13 authors - Last post: Jun 5, 2008. List of possible elements expected: 'PropertyGroup, ItemGroup... in namespace 'http://schemas.microsoft.com/developer/msbuild/2003'. ...
    www.zuneboards.com/.../26046-community-technical-preview-slaam-mobile-2.html
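
On the 'Functions' warning, here is a rough sketch. It assumes the warning appears because the `<Project>` element in Functions.csproj.user has no child elements; the file itself is not shown in this thread, so that is a guess. Under that assumption, giving `<Project>` one of the elements listed in the warning, such as an empty `PropertyGroup`, should satisfy the schema check. The comment text is a placeholder, not the real contents of the file:

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Sketch of a minimal .csproj.user file (assumed layout, not the actual arachnode.net file). -->
<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
  <!-- One child element from the list in the warning; per-user settings would go inside it. -->
  <PropertyGroup>
  </PropertyGroup>
</Project>
```

Keep a copy of the original file before editing it, since it may hold per-user settings for the Functions project.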

 


Top 50 Contributor
14 Posts

OK, I got it working! The console ran and I did the search test.

I see there are 3 webdav servers running:

1. search - works fine.

2. administration - the page shows fine, but when I click to view a table's contents, it says "SQL connection couldn't be made, named pipe etc....."  I have TCP, named pipes, etc. enabled in SQL Server Configuration Manager.

Did I miss adding a connection string anywhere?

3. crawl.aspx - How do I use this? It only asks for a URL.

 

 - How can I give the starting URL to crawl? Say I want to crawl a given website (and no links outside that website): can I configure this? Where?

 - I hope crawling a single large website won't get my IP banned?

 - If I don't want images etc., only the HTML text, where do I configure this?

 - Finally, how do I access the raw downloaded HTML? Is it in the DB?

 - Can I insert some custom parsing code that runs when a page is downloaded, so that I can store it in my own way? Or maybe, once the crawl is done, I can process the whole batch.

 

Top 10 Contributor
1,905 Posts

1.) Great!

2.) The Administration project may not be supported in 2010.  Check the web.config.

3.) This is a beta and there isn't any documentation on it.  Step into the code and see what it does!  There are plenty of comments in the source.

You will want to read these sections completely:

http://arachnode.net/Content/CreatingPlugins.aspx

http://arachnode.net/Content/FrequentlyAskedQuestions.aspx

These sections will answer most of your questions.  Get back to me when you have, OK?
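
On point 2.), a "named pipe" connection error usually means the web application could not reach the SQL Server instance named in its connection string. As a rough sketch of what to look for in the Administration project's web.config, the standard ASP.NET `connectionStrings` section looks like the fragment below. The entry name, server, and database name here are hypothetical placeholders, not the values arachnode.net ships with:

```xml
<configuration>
  <connectionStrings>
    <!-- Hypothetical entry: replace name, Data Source, and Initial Catalog with your own values. -->
    <add name="ExampleConnectionString"
         connectionString="Data Source=localhost;Initial Catalog=ExampleDatabase;Integrated Security=True"
         providerName="System.Data.SqlClient" />
  </connectionStrings>
</configuration>
```

If `Data Source` points at a server or instance name that does not exist on your machine, a connection error like the one quoted above is a typical result.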


Top 50 Contributor
14 Posts

Thanks, Mike, for the help!

OK, so I went through most of the documentation and got a pretty good idea of the basic workings.

My further questions, which need your input, are:

1. Can I submit crawl requests directly by adding entries to the table? Will those get picked up even during an ongoing crawl?

2. Do I need to run the console every time I want to crawl? Program.cs resets a lot of configuration settings; should I comment those lines out if I want to control them by modifying the config table?

3. After every crawl, how do I consolidate/back up my data before another crawl? What is the minimum set of tables I should clear to get to a clean slate for the next crawl?

4. I see in Program.cs that if I create a new crawl request, I can hardcode my restrictions in code. Is there then any need to modify the UriClassificationType table?

Lastly, for now...

What is Order in the config.. table, and what does NegateIsDisallowed mean?

Thanks...

Sanwal

Top 10 Contributor
1,905 Posts
Verified by sanwal233

1.) They will, but this table is really for the crawler to use.

2.) You should comment out what you want.  You have the option to set it in the DB or in code.

3.) Check out the reset DB procedure to completely reset the DB.  Make a SQL backup to save your data.  If the crawl completes, the DB will be in 'clean slate' mode and you can crawl again.  See the options that prompt you every time you start the console; those help you reset the crawler.

4.) Don't change that table. :)

5.) http://arachnode.net/search/SearchResults.aspx?q=negateisdisallowed  What order, in which config table?  There are more than a few.



copyright 2004-2017, arachnode.net LLC