arachnode.net
An Open Source C# web crawler with Lucene.NET search using SQL Server 2008/2012/2014/2016/CE An Open Source C# web crawler with Lucene.NET search using MongoDB/RavenDB/Hadoop

Crawling Google Docs secure website


Top 150 Contributor
2 Posts
lizatt posted on Tue, Mar 8 2011 1:19 PM

I tried crawling Google Docs using my credentials and got the following error:

System.Net.WebException was caught
  Message=The underlying connection was closed: Could not establish trust relationship for the SSL/TLS secure channel.
  Source=System
  StackTrace:
       at System.Net.HttpWebRequest.GetResponse()
       at Arachnode.Next.Crawler`1.CrawlThread(Object o) in E:\dev331c\ad\ediscovery\Connectors\WebCrawler\arachnode\trunk\Next\Crawler.cs:line 466
  InnerException: System.Security.Authentication.AuthenticationException
       Message=The remote certificate is invalid according to the validation procedure.
       Source=System
       StackTrace:
            at System.Net.Security.SslState.StartSendAuthResetSignal(ProtocolToken message, AsyncProtocolRequest asyncRequest, Exception exception)
            at System.Net.Security.SslState.StartSendBlob(Byte[] incoming, Int32 count, AsyncProtocolRequest asyncRequest)
            at System.Net.Security.SslState.ProcessReceivedBlob(Byte[] buffer, Int32 count, AsyncProtocolRequest asyncRequest)
            at System.Net.Security.SslState.StartReceiveBlob(Byte[] buffer, AsyncProtocolRequest asyncRequest)
            at System.Net.Security.SslState.StartSendBlob(Byte[] incoming, Int32 count, AsyncProtocolRequest asyncRequest)
            at System.Net.Security.SslState.ProcessReceivedBlob(Byte[] buffer, Int32 count, AsyncProtocolRequest asyncRequest)
            at System.Net.Security.SslState.StartReceiveBlob(Byte[] buffer, AsyncProtocolRequest asyncRequest)
            at System.Net.Security.SslState.StartSendBlob(Byte[] incoming, Int32 count, AsyncProtocolRequest asyncRequest)
            at System.Net.Security.SslState.ProcessReceivedBlob(Byte[] buffer, Int32 count, AsyncProtocolRequest asyncRequest)
            at System.Net.Security.SslState.StartReceiveBlob(Byte[] buffer, AsyncProtocolRequest asyncRequest)
            at System.Net.Security.SslState.StartSendBlob(Byte[] incoming, Int32 count, AsyncProtocolRequest asyncRequest)
            at System.Net.Security.SslState.ForceAuthentication(Boolean receiveFirst, Byte[] buffer, AsyncProtocolRequest asyncRequest)
            at System.Net.Security.SslState.ProcessAuthentication(LazyAsyncResult lazyResult)
            at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean ignoreSyncCtx)
            at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
            at System.Net.TlsStream.ProcessAuthentication(LazyAsyncResult result)
            at System.Net.TlsStream.Write(Byte[] buffer, Int32 offset, Int32 size)
            at System.Net.PooledStream.Write(Byte[] buffer, Int32 offset, Int32 size)
            at System.Net.ConnectStream.WriteHeaders(Boolean async)
       InnerException:

I am setting the credentials at around line 437 of Crawler.cs with the following:

httpWebRequest.Credentials = new NetworkCredential("username", "password", "");
crawlRequest.HttpWebRequest = httpWebRequest;

Verified Answer

Top 10 Contributor
1,905 Posts

http://stackoverflow.com/questions/536352/webclient-https-issues

Investigating this.
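
For readers following the link: the workaround commonly used for this exception is to supply a ServicePointManager.ServerCertificateValidationCallback. The sketch below is an assumption about what such a fix could look like, not the code that was checked in to arachnode.net. The class name, the Install method, and the trustedHost parameter are hypothetical. It logs the validation failure and tolerates it for one named host only.

```csharp
using System;
using System.Net;
using System.Net.Security;

// Hypothetical helper for illustration; not part of arachnode.net.
public static class CertificateValidationSketch
{
    // Installs a process-wide callback that accepts certificate errors
    // only for requests to trustedHost (an assumed parameter).
    public static void Install(string trustedHost)
    {
        ServicePointManager.ServerCertificateValidationCallback =
            (sender, certificate, chain, sslPolicyErrors) =>
            {
                if (sslPolicyErrors == SslPolicyErrors.None)
                {
                    // The certificate passed normal validation.
                    return true;
                }

                // Log the failure so the underlying problem stays visible.
                Console.Error.WriteLine("Certificate problem: " + sslPolicyErrors);

                // The sender may be the HttpWebRequest or a host name string.
                var request = sender as HttpWebRequest;
                string host = request != null ? request.RequestUri.Host : sender as string;

                // Tolerate the error for the one named host, reject everything else.
                return string.Equals(host, trustedHost, StringComparison.OrdinalIgnoreCase);
            };
    }
}
```

It would be called once before the crawl starts, for example CertificateValidationSketch.Install("docs.google.com"). Returning true unconditionally would disable certificate checking for every request in the process, so limiting the exception to a single host is the more cautious choice.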

For best service when you require assistance:

  1. Check the DisallowedAbsoluteUris and Exceptions tables first.
  2. Cut and paste actual exceptions from the Exceptions table.
  3. Include screenshots.

Skype: arachnodedotnet

All Replies

Top 10 Contributor
1,905 Posts

Which AbsoluteUri?

Top 150 Contributor
2 Posts

If you have a Gmail account, just go to your own Google Docs URI and supply your own credentials. It will be something along the lines of:

https://docs.google.com/?tab=mo&authuser=0#home

Top 10 Contributor
1,905 Posts

Yes, the workaround from the Stack Overflow question linked above fixes it.

I am not at a place where I can check in, however.

Top 10 Contributor
1,905 Posts

Quick update: I did check this in...

copyright 2004-2017, arachnode.net LLC