arachnode.net
An Open Source C# web crawler with Lucene.NET search using SQL Server 2008/2012/2014/2016/CE An Open Source C# web crawler with Lucene.NET search using MongoDB/RavenDB/Hadoop

Engine & Crawl Actions


Top 25 Contributor
27 Posts
egecko posted on Wed, Jun 3 2015 11:13 PM

So everything was working great, and then I made some changes to add a new class into the SiteCrawler project. Afterwards when I deployed it for testing, the service was having trouble starting.  In particular it was failing and throwing an exception when it was attempting to get our custom plugin with the message that the constructor for the class could not be found.  That's very strange since nothing in the plugin's actual constructor implementation or signature changed and only the internal logic had been updated to include a new protected method.  Anyway, after tracing through, I isolated the problem to some logic in the Engine.cs class' GetObjectHandle() method.  

In particular, when it was looping through, it actually found the plugin type twice without breaking out of the loop.  This in turn ended up leaving the "type" variable pointed at the _second_ type which appears to be some kind of "DisplayClass" C# is generating for the generic.  I'm not sure what the best way to handle this actually is;  I have some other code that has a similar need to find types as well but this is the first time I've encountered this particular issue.  

I would suggest at least breaking out of the loop once you've found the first matching type. =D  It'd probably be better to figure out some way to weed out these particular types of classes but the suggestions I've found so far haven't been all that great.  If I have some free time I'm going to learn more about these display classes and how they come into existence, perhaps that will shed some light on how better to filter them out.

eGecko

Answered (Verified) Verified Answer

Top 10 Contributor
1,905 Posts

We only need to check for the top-level class, as this is what the Engine needs to instantiate Plugins from the DB.  :)

I checked in a fix for Engine.cs.

LMK,
Mike 

For best service when you require assistance:

  1. Check the DisallowedAbsoluteUris and Exceptions tables first.
  2. Cut and paste actual exceptions from the Exceptions table.
  3. Include screenshots.

Skype: arachnodedotnet

All Replies

Top 10 Contributor
1,905 Posts

What are the full types for the matching candidates?

Mike


Top 25 Contributor
27 Posts
egecko replied on Thu, Jun 4 2015 12:50 AM

So I found out where the "DisplayClasses" come from and why this actually happened.  Display classes are compiler-generated nested classes that hold the local variables captured by a lambda expression.  In an earlier thread I mentioned spinning off a new Task inside our plug-in to perform its work in the background while AN went off and did more important stuff.  To do this, I used a lambda inside the plugin's PerformAction() method, e.g.

public override void PerformAction(CrawlRequest<TArachnodeDAO> crawlRequest, IArachnodeDAO arachnodeDAO)
{
    // .....

    Task.Run(new Action(() =>
    {
        // .... do the background logic here .....
    }));

    // .....
}

As a result, the compiler generated a display class nested inside the containing class, so its full name starts with the name of the plugin.  That is how two types with the same starting name were found among the assembly's types.  For completeness, the names of the matching candidates in this case were:

Plug-in class match:  Arachnode.Plugins.CrawlActions.SolrNotifier`1

Display class match: Arachnode.Plugins.CrawlActions.SolrNotifier`1+<>c__DisplayClass3
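
To make the collision concrete, here is a minimal sketch of how a capturing lambda produces a second type that also passes a "starts with" name check. SamplePlugin and PrefixMatchDemo are hypothetical names made up for illustration; this is not the actual Engine.cs code, only an approximation of the kind of prefix match described above.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;
using System.Threading.Tasks;

// Hypothetical stand-in for a plugin class; not the real SolrNotifier.
public class SamplePlugin<T>
{
    public Task PerformAction(string uri)
    {
        // The lambda captures the parameter 'uri', so the compiler emits a
        // nested closure ("display") class inside SamplePlugin<T> to hold it.
        return Task.Run(new Action(() => Console.WriteLine(uri)));
    }
}

public static class PrefixMatchDemo
{
    // Collects every type in the assembly whose full name starts with typeName.
    // A loop that keeps only the last hit and never breaks would end up
    // depending on the order in which these matches are enumerated.
    public static List<Type> FindByPrefix(Assembly assembly, string typeName)
    {
        var matches = new List<Type>();
        foreach (Type candidate in assembly.GetTypes())
        {
            string fullName = candidate.FullName;
            if (fullName != null && fullName.StartsWith(typeName, StringComparison.Ordinal))
            {
                matches.Add(candidate);
            }
        }
        return matches;
    }
}
```

Calling PrefixMatchDemo.FindByPrefix(typeof(SamplePlugin<>).Assembly, typeof(SamplePlugin<>).FullName) should return at least two types: the plugin itself and the nested closure class. The exact name of the generated class depends on the compiler version.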

 

eGecko

 

Top 10 Contributor
1,905 Posts

OK.  Nice catch!

I will make the Engine's code check for an exact match.  (later this evening)

Mike
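
One way an exact match could look, as a sketch only: Assembly.GetType(string, bool, bool) resolves a type by its full name, and a nested type can only be reached through the "Outer+Inner" form, so asking for the top-level name never returns a display class. ExactPlugin and ExactTypeLookup are hypothetical names, and the assumption is that the stored type name includes the generic arity suffix (such as `1) seen in the candidate names above.

```csharp
using System;
using System.Reflection;

// Hypothetical plugin with a capturing lambda, used only to exercise the lookup.
public class ExactPlugin<T>
{
    public Action Build(string uri)
    {
        // Capturing 'uri' makes the compiler emit a nested closure class.
        return () => Console.WriteLine(uri);
    }
}

public static class ExactTypeLookup
{
    // Resolves a type by its exact full name. Returns null when no type has
    // exactly that name (throwOnError: false) and compares case-sensitively
    // (ignoreCase: false).
    public static Type Find(Assembly assembly, string fullName)
    {
        return assembly.GetType(fullName, false, false);
    }
}
```

With this shape, the number of lambdas in the plugin does not matter: only the plugin's own full name is compared, and the generated nested types have different full names.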

 

 


Top 25 Contributor
27 Posts

I'm not sure you can do an exact match unless you know exactly how many lambdas appear in the actual code being instantiated. :( There were some discrepancies in the flags on the types being checked that could be helpful, though.  In particular, as I was tracing through I noted the following differences that could be leveraged to make sure the proper class is instantiated:

type2.IsNested -- this was False on the actual class but True for the lambda/display class

type2.IsNestedPrivate -- this was False on the actual class but True for the lambda/display class

type2.IsPublic -- this was True on the actual class but False for the lambda/display class

type2.IsSealed -- this was False on the actual class but True for the lambda/display class

 

I also came across some helpful internetizens who suggested checking for the presence of the System.Runtime.CompilerServices.CompilerGeneratedAttribute on the type.  I did not have much luck with my very brief attempt at checking it.  Instead I just presumed that the first match for the name is the proper one and issued a break when it was found. :)
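
The flag differences and the attribute check mentioned above can be combined into a small filter. This is a sketch under assumptions, not the fix that went into Engine.cs: NotifierPlugin and PluginTypeFilter are hypothetical names, and the prefix comparison only mirrors the matching behaviour described earlier in the thread.

```csharp
using System;
using System.Reflection;
using System.Runtime.CompilerServices;

// Hypothetical plugin used only to exercise the filter.
public class NotifierPlugin<T>
{
    public Func<string> Describe(string uri)
    {
        // Capturing 'uri' forces the compiler to emit a nested closure class.
        return () => "notified " + uri;
    }
}

public static class PluginTypeFilter
{
    // True for types marked [CompilerGenerated], which includes closure classes.
    public static bool IsCompilerGenerated(Type type)
    {
        return type.IsDefined(typeof(CompilerGeneratedAttribute), false);
    }

    // Returns the first top-level, hand-written type whose full name starts
    // with typeName, or null when there is none.
    public static Type FindPluginType(Assembly assembly, string typeName)
    {
        foreach (Type candidate in assembly.GetTypes())
        {
            // Skip nested types and anything the compiler generated.
            if (candidate.IsNested || IsCompilerGenerated(candidate))
            {
                continue;
            }

            string fullName = candidate.FullName;
            if (fullName != null && fullName.StartsWith(typeName, StringComparison.Ordinal))
            {
                return candidate; // stop at the first acceptable match
            }
        }
        return null;
    }
}
```

Checking IsNested alone is enough to rule out display classes, since they are always nested inside the class that contains the lambda; the attribute check is an extra guard against other generated types.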

eGecko



copyright 2004-2017, arachnode.net LLC