I downloaded arachnode.net 2 days ago and I am impressed by your work. My question: can I crawl Facebook and Twitter using arachnode?
Twitter shouldn't be any problem at all. You may have to ignore the Robots.txt file to crawl ALL of twitter, but you CAN turn this off in AN.
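For anyone wondering what "ignoring robots.txt" amounts to in practice, here is a rough sketch in Python. It is not arachnode.net code (AN is a .NET project, and its actual setting is not shown here). The robots.txt content, the example.com URLs, the `example-crawler` user agent, and the `obey_robots` flag are all made up for illustration. Only `urllib.robotparser` is a real standard-library module.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for this illustration.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, url: str,
              user_agent: str = "example-crawler",
              obey_robots: bool = True) -> bool:
    """Return True if the crawler should request the URL.

    obey_robots is a hypothetical switch: when False, the robots.txt
    rules are skipped entirely, which is what "turning it off" means.
    """
    if not obey_robots:
        return True
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rules = build_parser(SAMPLE_ROBOTS_TXT)
    url = "http://example.com/search?q=haiti"
    print(may_fetch(rules, url))                     # rules applied
    print(may_fetch(rules, url, obey_robots=False))  # rules ignored
```

With the check on, the disallowed search URL is skipped. With it off, every discovered URL is eligible, so it is worth keeping some other rate limiting in place.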
I ran a test crawl from Facebook, while logged out, and downloaded a ton of pages. So, tell me - what do you want to crawl on Facebook?
I started crawls from here: http://www.facebook.com/people/Mike-Anderson/773002299 and here: http://twitter.com/arachnode_net
...and got these results (SUCCESS, to me, anyway...)
Also, if you register, you can receive email notifications when there are replies to your posts.
Twitter links are available whether you are logged in or not. So you can submit a crawl request for something like http://twitter.com/search?q=haiti and walk it, no problem. The rules/templates for walking Twitter content are likely quite different from those for HTML pages, though ;)
Regarding facebook, I'm not sure about that one. I believe you have to be logged in with a valid fb account to see most anything.
Hope that helps a bit.
I am coming back from vacation today and can answer your question when I get back... (just wanted to say 'Hello' though...)
Thanks, Kevin, for your info.
Waiting for you, Mike.
Cool. Just got back - weary and worn but had an amazing time.
Will write more in the morning.
Aha, I managed to crawl Facebook. I had to comment out the Politness.cs code (oops :S), and I was logged on to my Facebook account. Is this the correct way to do it for any social site?
You can get a lot of mileage out of using the CredentialCache in the Crawler.
Which version are you using, BTW?