First, download Fiddler Web Proxy if you haven't already, and start Fiddler.
http://www.fiddler2.com/fiddler2/
Second, log in to the site you want to crawl using a browser.
In Fiddler, find the Cookie value your browser sends to the site. In this example, the site is LinkedIn.

Navigate to Console\Program.cs and locate the following section of code:

Enter your cookie value in place of the one shown.
arachnode.net will use the values in Crawler.CookieContainer to log in to sites that require a login, and will dynamically manage all cookie interaction.
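
As a rough illustration of the idea, and not the actual code in Console\Program.cs, the sketch below copies a Cookie header captured in Fiddler into a standard System.Net.CookieContainer. The helper name BuildCookieContainer, the example.com address, and the cookie names and values are all hypothetical. It also assumes that Crawler.CookieContainer is an ordinary System.Net.CookieContainer.

```csharp
using System;
using System.Net;

// Hypothetical helper (not part of arachnode.net): copies each "name=value"
// pair from a Cookie request header, as displayed by Fiddler, into a
// CookieContainer scoped to siteUri.
static CookieContainer BuildCookieContainer(string cookieHeader, Uri siteUri)
{
    var container = new CookieContainer();

    // A Cookie request header separates its pairs with semicolons.
    foreach (string pair in cookieHeader.Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries))
    {
        // Split on the first '=' only; cookie values may contain '=' themselves.
        int separator = pair.IndexOf('=');
        if (separator <= 0)
        {
            continue; // Skip fragments that have no name.
        }

        string name = pair.Substring(0, separator).Trim();
        string value = pair.Substring(separator + 1).Trim();

        // Add(Uri, Cookie) takes the cookie's domain and path from siteUri.
        container.Add(siteUri, new Cookie(name, value));
    }

    return container;
}

// Placeholder site and cookie values; substitute the ones captured in Fiddler.
var siteUri = new Uri("https://www.example.com/");
string capturedHeader = "session_id=abc123; lang=en";

CookieContainer cookies = BuildCookieContainer(capturedHeader, siteUri);

Console.WriteLine(cookies.Count);                    // Number of cookies stored.
Console.WriteLine(cookies.GetCookieHeader(siteUri)); // Header sent for siteUri.
```

Real cookie values that contain commas or semicolons may need quoting before System.Net.Cookie will accept them, so treat the parsing above as a starting point only.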
Place a breakpoint in SiteCrawler\Crawl.cs at the location shown, then view the crawlRequest.DecodedHtml property in Html view to confirm that you have logged in to the site successfully.
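
If you would rather check in code than in the debugger, one option is to look for text that only appears to signed-in users. This is a minimal sketch under that assumption: LooksLoggedIn is a hypothetical helper, the "Sign Out" marker and the sample HTML are made up, and in practice the string you pass in would be the value of crawlRequest.DecodedHtml.

```csharp
using System;

// Hypothetical helper (not part of arachnode.net): returns true when the HTML
// contains a marker that only appears for signed-in users.
static bool LooksLoggedIn(string decodedHtml, string signedInMarker)
{
    return !string.IsNullOrEmpty(decodedHtml)
        && decodedHtml.IndexOf(signedInMarker, StringComparison.OrdinalIgnoreCase) >= 0;
}

// Made-up pages standing in for crawlRequest.DecodedHtml.
string signedInPage = "<html><body><a href=\"/logout\">Sign Out</a></body></html>";
string anonymousPage = "<html><body><form action=\"/login\"></form></body></html>";

Console.WriteLine(LooksLoggedIn(signedInPage, "Sign Out"));  // True
Console.WriteLine(LooksLoggedIn(anonymousPage, "Sign Out")); // False
```

The marker differs per site, so pick one after viewing a signed-in page in your browser.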

Results for Facebook:

Facebook uses a large amount of JavaScript to render content. To fully render the page, you would use the Renderers.
Posted Thu, Jun 2 2011 1:53 PM by arachnode.net