arachnode.net
An Open Source C# web crawler with Lucene.NET search using SQL Server 2008/2012/2014/2016/CE, MongoDB, RavenDB, or Hadoop

Completely Open Source @ GitHub


What are these fields used for, and what do they mean?

8 Replies | 3 Followers

Top 50 Contributor
11 Posts
VE6CPU posted on Thu, Jul 23 2009 10:29 AM

Using the CVS version: in the CrawlRequests table, what do the RestrictCrawlTo and RestrictDiscoveriesTo columns do, and what do they mean?

Thanks

Stephen

All Replies

Top 10 Contributor
1,905 Posts

They correspond to the enum: 

namespace Arachnode.SiteCrawler.Value.Enums
{
    [Flags]
    public enum UriClassificationType : byte
    {
        None = 0,
        Domain = 1,
        Extension = 2,
        FileExtension = 4,
        Host = 8,
        Scheme = 16
    }
}

RestrictCrawlTo means that the Crawl won't crawl WebPages that don't match the CrawlRequest on the UriClassificationType(s) specified by 'RestrictCrawlTo'.

RestrictDiscoveriesTo means that the Crawl won't acknowledge Discoveries that don't match the CrawlRequest on the UriClassificationType(s) specified by 'RestrictDiscoveriesTo', with those Discoveries being the ones shown in yellow.
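As an illustration of how a [Flags] restriction value works, here is a minimal, self-contained sketch (the restriction value chosen here is made up for the example; the bit tests are standard C#, not the actual arachnode.net internals):

```csharp
using System;

namespace Arachnode.SiteCrawler.Value.Enums
{
    [Flags]
    public enum UriClassificationType : byte
    {
        None = 0,
        Domain = 1,
        Extension = 2,
        FileExtension = 4,
        Host = 8,
        Scheme = 16
    }

    public static class RestrictionExample
    {
        public static void Main()
        {
            // Hypothetical restriction: match on Host and Scheme (8 | 16 = 24).
            UriClassificationType restrictCrawlTo =
                UriClassificationType.Host | UriClassificationType.Scheme;

            // Each classification can be tested with a bitwise AND:
            bool checksHost =
                (restrictCrawlTo & UriClassificationType.Host) != UriClassificationType.None;
            bool checksDomain =
                (restrictCrawlTo & UriClassificationType.Domain) != UriClassificationType.None;

            Console.WriteLine((byte)restrictCrawlTo); // 24
            Console.WriteLine(checksHost);            // True
            Console.WriteLine(checksDomain);          // False
        }
    }
}
```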

For best service when you require assistance:

  1. Check the DisallowedAbsoluteUris and Exceptions tables first.
  2. Cut and paste actual exceptions from the Exceptions table.
  3. Include screenshots.

Skype: arachnodedotnet

Top 50 Contributor
11 Posts
VE6CPU replied on Fri, Jul 24 2009 9:41 AM

So if I wanted both 5 and 7 I should just add them together?

Top 10 Contributor
1,905 Posts

Yes, but the field is a bitmask, so OR the two values together (the | operator) rather than adding them; plain addition double-counts any bits the two values share.
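A minimal sketch of why OR-ing differs from adding, using the same enum values (5 and 7 are the numbers from the question above):

```csharp
using System;

[Flags]
public enum UriClassificationType : byte
{
    None = 0,
    Domain = 1,
    Extension = 2,
    FileExtension = 4,
    Host = 8,
    Scheme = 16
}

public static class BitmaskExample
{
    public static void Main()
    {
        var five = (UriClassificationType)5;  // Domain | FileExtension
        var seven = (UriClassificationType)7; // Domain | Extension | FileExtension

        // OR keeps each distinct bit exactly once:
        var combined = five | seven;
        Console.WriteLine((byte)combined); // 7

        // Plain addition double-counts the shared bits: 5 + 7 = 12,
        // which would wrongly read back as FileExtension | Host.
        Console.WriteLine(5 + 7); // 12
    }
}
```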

It is probably easiest to use the ArachnodeDAO and call 'InsertCrawlRequest'...


Top 50 Contributor
11 Posts
VE6CPU replied on Fri, Jul 24 2009 10:37 AM

I think this is where I'm a bit confused.  Where/how do you do this?

Top 10 Contributor
1,905 Posts

ArachnodeDAO arachnodeDAO = new ArachnodeDAO();

arachnodeDAO.InsertCrawlRequest(...);

Does this help?


Top 50 Contributor
11 Posts
VE6CPU replied on Mon, Jul 27 2009 7:06 AM

Thanks.  It does help.  From the way you had first phrased it, it sounded like there was already a front end or something that had all of these in there already.  No problem.  Just have to brush up on my C# a bit.  Spent too many years in a VB6 environment.

Top 10 Contributor
1,905 Posts

A front end would be nice - someday perhaps... (there is a partial one in the works that I know of - perhaps I should check on the status...)

Check out Program.cs - this has a good example of how to use RestrictCrawlTo and RestrictDiscoveriesTo...

Guess arachnode.net will have to live as a class library for now. :)

-Mike


Top 10 Contributor
Male
101 Posts
Kevin replied on Mon, Jul 27 2009 12:13 PM

Hint taken!  Yes I've got a web admin interface in the works, just not able to spend the time on it that I want to!  It has not been forgotten though!


copyright 2004-2017, arachnode.net LLC