arachnode.net
An Open Source C# web crawler with Lucene.NET search using SQL Server 2008/2012/2014/2016/CE or MongoDB/RavenDB/Hadoop

Limit crawl to web site query results

jrief posted on Fri, Jun 19 2015 8:54 PM

Can we limit the crawl to the results page of a web query? For example, I have a URL like:

www.somesite.com?query=findThis

I would like to crawl only the links on that results page. The results are (typically) in a frame of their own, surrounded by other frames that also get crawled. I guess this is really a request to crawl only within a specific frame?

It appears that none of the UriClassificationType enum values handle this.

Why am I expecting a response about a custom plugin? ;)

Verified Answer

Verified by arachnode.net

Yes, it's custom plugin time. :)

arachnode.net (AN) doesn't know that you only want to deal with the links from one frame, so you'll need to parse those links out yourself in a custom plugin.
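To make the "parse them out yourself" step concrete, here is a rough sketch of the parsing logic only. It is written in Python with the standard library so it stays independent of any crawler; it is not arachnode.net's plugin API, and a real plugin would be C#. The helper names (`find_frame_url`, `extract_links`) and the frame name `results` are hypothetical, and it assumes the results frame can be identified by its `name` or `id` attribute.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class _TagCollector(HTMLParser):
    """Collects the attributes of every start tag whose name is in `tags`."""

    def __init__(self, tags):
        super().__init__()
        self.tags = set(tags)
        self.found = []  # (tag, attribute dict) pairs in document order

    def handle_starttag(self, tag, attrs):
        if tag in self.tags:
            self.found.append((tag, dict(attrs)))


def find_frame_url(page_html, page_url, frame_name):
    """Return the absolute URL of the <frame>/<iframe> whose name or id
    equals frame_name, or None if the page has no such frame.
    (Hypothetical helper; frame_name is an assumption about the site.)"""
    collector = _TagCollector({"frame", "iframe"})
    collector.feed(page_html)
    for _tag, attrs in collector.found:
        if frame_name in (attrs.get("name"), attrs.get("id")) and attrs.get("src"):
            return urljoin(page_url, attrs["src"])
    return None


def extract_links(frame_html, frame_url):
    """Return the absolute, de-duplicated hrefs of all <a> tags in the
    frame's own document, in the order they first appear."""
    collector = _TagCollector({"a"})
    collector.feed(frame_html)
    links = []
    for _tag, attrs in collector.found:
        href = attrs.get("href")
        if href:
            absolute = urljoin(frame_url, href)
            if absolute not in links:
                links.append(absolute)
    return links
```

The idea would be to call `find_frame_url` on the outer page, download only that frame's document, and hand just the output of `extract_links` to the crawler, so links in the surrounding frames are never queued.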

Let me know what you come up with.

Mike

For best service when you require assistance:

  1. Check the DisallowedAbsoluteUris and Exceptions tables first.
  2. Cut and paste actual exceptions from the Exceptions table.
  3. Include screenshots.

Skype: arachnodedotnet


copyright 2004-2017, arachnode.net LLC