arachnode.net
An Open Source C# web crawler with Lucene.NET search using SQL Server 2008/2012/2014/2016/CE
An Open Source C# web crawler with Lucene.NET search using MongoDB/RavenDB/Hadoop

No webpage content in 'source' column in WebPage table


Manoj posted on Tue, Mar 19 2013 2:08 AM

Hi all, I am using the arachnode.net web crawler. After the crawl completed, I checked the WebPages table and found that the Source column is empty: after casting its value to byte[] (a byte array), the array length is 0. This means none of the web page content is being saved to the database.

Why is this happening? Please help. I am using the paid version of the arachnode.net web crawler.

I also ran the following query to convert the byte[] content to a string, but it too returned an empty string:

SELECT TOP 1000 AbsoluteUri, CodePage, dbo.ConvertSource(Source, CodePage) AS data
FROM [dbo].[WebPages]

The result of this query is as follows: [screenshot not preserved]

Please help me resolve this issue.
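An empty result is exactly what decoding a zero-length byte array produces, whatever the code page, so the query output confirms that the page bytes were never stored rather than being mis-decoded. The C# sketch that follows illustrates this. SourceDecoder.Decode is a hypothetical helper, not part of arachnode.net; it assumes dbo.ConvertSource does something similar (decoding Source with the row's CodePage), and the UTF-8 fallback for an unknown code page is also an assumption.

```csharp
using System;
using System.Text;

// Hypothetical helper, NOT part of arachnode.net: decodes a WebPages.Source
// value with the row's stored code page, roughly what dbo.ConvertSource is
// assumed to do on the SQL Server side.
public static class SourceDecoder
{
    public static string Decode(byte[] source, int codePage)
    {
        // A NULL or zero-length Source has nothing to decode: the result is an
        // empty string whatever the code page, so an empty query result means
        // the bytes were never stored, not that decoding failed.
        if (source == null || source.Length == 0)
        {
            return string.Empty;
        }

        Encoding encoding;
        try
        {
            encoding = Encoding.GetEncoding(codePage);
        }
        catch (Exception ex) when (ex is ArgumentException || ex is NotSupportedException)
        {
            // Code page invalid or unavailable on this runtime: fall back to
            // UTF-8 (an assumption made for this sketch).
            encoding = Encoding.UTF8;
        }

        return encoding.GetString(source);
    }
}
```

For example, SourceDecoder.Decode(new byte[0], 65001) returns an empty string, while the UTF-8 bytes of "<html></html>" decode back to "<html></html>".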

Verified Answer

Verified by arachnode.net

Set ApplicationSettings.InsertWebPageSource = true;
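As a rough illustration of why this setting explains the empty column, here is a toy C# model. ToySettings and ToyWebPageStore are hypothetical stand-ins invented for this sketch, not arachnode.net's classes; only the setting name InsertWebPageSource comes from the answer, and the gating behaviour shown is an assumption drawn from it.

```csharp
using System;

// Toy model only: hypothetical stand-ins, NOT arachnode.net's ApplicationSettings
// or storage code. They model the behaviour implied by the answer: downloaded
// bytes reach WebPages.Source only when InsertWebPageSource is true.
public static class ToySettings
{
    // Mirrors the real setting's name; the default of false is an assumption.
    public static bool InsertWebPageSource { get; set; } = false;
}

public static class ToyWebPageStore
{
    // Returns the bytes that would be written to the Source column.
    public static byte[] SourceToPersist(byte[] downloaded)
    {
        // With the flag off, an empty array is stored, matching the
        // zero-length byte[] reported in the question.
        return ToySettings.InsertWebPageSource ? downloaded : new byte[0];
    }
}
```

In this model the flag is read at the moment a page is stored, so rows saved while it was false keep an empty Source; changing it only affects pages stored afterwards.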

For best service when you require assistance:

  1. Check the DisallowedAbsoluteUris and Exceptions tables first.
  2. Cut and paste actual exceptions from the Exceptions table.
  3. Include screenshots.

Skype: arachnodedotnet


copyright 2004-2017, arachnode.net LLC