arachnode.net
An Open Source C# web crawler with Lucene.NET search using SQL Server 2008/2012/2014/2016/CE
An Open Source C# web crawler with Lucene.NET search using MongoDB/RavenDB/Hadoop

AN with SiteMap without hyperlinks.

kanderson posted on Sat, Oct 17 2015 2:40 AM

I need to know how to index a website using only its sitemap. The pages that should be indexed are listed in the XML loc elements. The sitemap does not contain hyperlinks.

Verified Answer

Verified by kanderson

There are many forms of .xml documents on the internet.

You'll want to parse the sitemap with the XML parser of your choosing, extract the URL from each loc element, and then submit those URLs to the Crawler:

crawlRequest.Crawler.Crawl(...);
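
The parsing step could be sketched roughly as follows. This is only an illustration using System.Xml.Linq from the .NET base class library; the class name SitemapReader and the method ExtractLocUris are hypothetical and not part of arachnode.net. It assumes the sitemap XML has already been downloaded into a string.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, not part of arachnode.net.
public static class SitemapReader
{
    // Returns every absolute URI found in a <loc> element of the sitemap XML.
    // Matching on the element's local name avoids depending on the exact
    // XML namespace the sitemap declares.
    public static List<Uri> ExtractLocUris(string sitemapXml)
    {
        XDocument document = XDocument.Parse(sitemapXml);
        var uris = new List<Uri>();

        foreach (XElement loc in document.Descendants().Where(e => e.Name.LocalName == "loc"))
        {
            Uri uri;
            // Skip entries that are not well-formed absolute URIs.
            if (Uri.TryCreate(loc.Value.Trim(), UriKind.Absolute, out uri))
            {
                uris.Add(uri);
            }
        }

        return uris;
    }
}
```

Each Uri returned would then be handed to the Crawler through the Crawl call shown above; its arguments are not spelled out here, so check them against the arachnode.net source. Note that in a sitemap index file the loc elements point to further sitemaps rather than to pages, so those would need to be fetched and parsed the same way.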

For best service when you require assistance:

  1. Check the DisallowedAbsoluteUris and Exceptions tables first.
  2. Cut and paste actual exceptions from the Exceptions table.
  3. Include screenshots.

Skype: arachnodedotnet

copyright 2004-2017, arachnode.net LLC