arachnode.net
An Open Source C# web crawler with Lucene.NET search using SQL Server 2005/2008/CE An Open Source C# web crawler with Lucene.NET search using MongoDB/RavenDB/Hadoop

absoluteuris batch processing


Top 10 Contributor
229 Posts
megetron posted on Sun, Aug 23 2009 10:26 AM

Hello,

If I may, I'd like to suggest a new feature that might be useful for others.

The feature is batch processing of specific AbsoluteUris, selected by a regular expression or a similar pattern.

For example, let's say I wish to add to CrawlRequests only AbsoluteUris that look like "watch.php?catid={0}&pid={1}".

Below is how I implemented this. What do you think of adding support for something like this? It would be a more flexible way to populate the CrawlRequests (CR) table.

 

                // Query-string suffixes to crawl, written as "<category id>&pid=<page id>".
                string[] sPID = { "23&pid=1", "23&pid=2", "23&pid=3", "23&pid=4",
                                  "22&pid=1", "22&pid=2", "22&pid=3", "22&pid=4",
                                  "21&pid=1", "21&pid=2", "21&pid=3", "21&pid=4", "21&pid=5",
                                  "27&pid=1",
                                  "15&pid=1", "15&pid=2", "15&pid=3", "15&pid=4", "15&pid=5", "15&pid=6",
                                  "24&pid=1", "24&pid=2", "24&pid=3",
                                  "28&pid=1", "28&pid=2", "28&pid=3", "28&pid=4", "28&pid=5",
                                  "20&pid=1", "20&pid=2", "20&pid=3", "20&pid=4", "20&pid=5", "20&pid=6", "20&pid=7", "20&pid=8", "20&pid=9", "20&pid=10",
                                  "34&pid=1", "34&pid=2",
                                  "29&pid=1", "29&pid=2",
                                  "30&pid=1", "30&pid=2", "30&pid=3"
                                };

                // Submit one CrawlRequest per suffix.
                foreach (string s in sPID)
                {
                    _crawler.Crawl(new CrawlRequest(new Discovery("mydomain.com/watch.php?catgid=" + s), int.MaxValue, UriClassificationType.Domain, UriClassificationType.Domain, 1));
                }
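
As a rough sketch of what the requested feature could look like, the hard-coded array above might instead be generated from the "{0}/{1}" template plus a table of page counts per category. The class name AbsoluteUriBatch, the method Expand, and its parameters are hypothetical names made up for this illustration; they are not part of arachnode.net.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper, not part of arachnode.net.
public static class AbsoluteUriBatch
{
    // Expands a template such as "watch.php?catid={0}&pid={1}" into one URI
    // per (category, page) pair. pageCounts maps a category id to its number
    // of pages; pages are numbered starting at 1.
    public static List<string> Expand(string baseUri, string template, IDictionary<int, int> pageCounts)
    {
        var uris = new List<string>();

        foreach (KeyValuePair<int, int> entry in pageCounts)
        {
            for (int pid = 1; pid <= entry.Value; pid++)
            {
                // InvariantCulture keeps the numbers free of locale-specific formatting.
                uris.Add(baseUri + string.Format(CultureInfo.InvariantCulture, template, entry.Key, pid));
            }
        }

        return uris;
    }
}
```

With a dictionary such as { 23 => 4, 22 => 4, 21 => 5, 27 => 1, 15 => 6, 24 => 3, 28 => 5, 20 => 10, 34 => 2, 29 => 2, 30 => 3 }, this should yield the same category/page combinations as the array in the post, and each returned string could then be passed to new Discovery(...) inside the same foreach loop. Note that the loop above spells the parameter "catgid" while the example pattern says "catid"; the template should use whichever spelling the target site actually expects.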

All Replies

Top 10 Contributor
1,750 Posts

At one point there was code in the QueryProcessor that did this.  Feature request duly noted.  :)

For best service when you require assistance:

  1. Check the DisallowedAbsoluteUris and Exceptions tables first.
  2. Cut and paste actual exceptions from the Exceptions table.
  3. Include screenshots.

Skype: arachnodedotnet


copyright 2004-2014, arachnode.net LLC