Hello, I've really been studying the code of AN, but I'm having some trouble with it. I hope you can help me, thanks!
1.) http://something.arachnode.net/Default.aspx
These classifications are used for restricting a Crawl to a certain Domain, Extension, FileExtension, Host, and/or Scheme.
2.) CrawlRestricted means: where can the crawl go? Can I follow a WebPage to another domain, etc.? DiscoveryRestricted means: can I download images that come from another domain? (A WebPage may contain images/files from another Domain; is it OK to download those?)
3.) You should DEFINITELY write a plugin and NOT change the core, because as the core changes/improves you will have to modify/merge your code with mine. Can you tell me more about what you want to do for #3?
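A rough way to picture items 1.) and 2.) is sketched below. This is a language-neutral illustration in Python, not arachnode.net's C# API: `classify`, `is_allowed`, `Restrictions`, `crawl_restricted`, and `discovery_restricted` are hypothetical names. It also assumes that Extension means the host's top-level suffix (e.g. `.net`) and FileExtension means the suffix of the path (e.g. `.aspx`), and it uses a naive "last two labels" rule for Domain.

```python
import posixpath
from dataclasses import dataclass
from urllib.parse import urlsplit


def classify(uri: str) -> dict:
    """Split an absolute URI into the five classifications (hypothetical helper)."""
    parts = urlsplit(uri)
    host = (parts.hostname or "").lower()
    labels = host.split(".")
    # Naive: treats the last two labels as the Domain, so "www.example.co.uk"
    # would yield "co.uk". A real crawler needs a public-suffix list here.
    return {
        "Scheme": parts.scheme.lower(),
        "Host": host,
        "Domain": ".".join(labels[-2:]),
        "Extension": "." + labels[-1] if len(labels) > 1 else "",
        "FileExtension": posixpath.splitext(parts.path)[1].lower(),
    }


def is_allowed(uri: str, allowed: dict) -> bool:
    """True if each restricted classification of uri is in its allowed set."""
    c = classify(uri)
    return all(c[key] in values for key, values in allowed.items())


def _same_domain(a: str, b: str) -> bool:
    return classify(a)["Domain"] == classify(b)["Domain"]


@dataclass
class Restrictions:
    crawl_restricted: bool = True       # only follow links within the page's Domain
    discovery_restricted: bool = False  # download embedded files from any Domain

    def should_crawl(self, page_uri: str, link_uri: str) -> bool:
        """CrawlRestricted: may the crawl follow this hyperlink to another WebPage?"""
        return not self.crawl_restricted or _same_domain(page_uri, link_uri)

    def should_discover(self, page_uri: str, file_uri: str) -> bool:
        """DiscoveryRestricted: may this image/file referenced by the WebPage be downloaded?"""
        return not self.discovery_restricted or _same_domain(page_uri, file_uri)
```

With the defaults above, a WebPage on arachnode.net would not lead the crawl to example.com, but an image hosted on example.com and embedded in that page would still be downloaded. Keeping the two flags separate is what lets a crawl stay on one site while still collecting that site's off-domain images/files.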
An open source .NET web crawler written in C# using SQL 2005/2008.
Join the arachnode.net group on Facebook: http://www.facebook.com/groups.php?ref=sb#/group.php?gid=166721755872
Twitter: http://twitter.com/arachnode_net
arachnode.net provides custom crawling and contracting resources. Please ask.
http://bit.ly/TOFX4
C# crawler, C# web crawler, C# site crawler
Thank you for your reply!
In my project, I want to get specific content, like:
On the page http://arachnode.net/forums/, only crawl the following WebPages:
Then get the content of these WebPages. So if I want to control the crawling, I could DEFINITELY write a plugin, but I must call the plugin from the core. Otherwise, it will crawl a lot of data that I don't want. Thank you!
How can I help?
Hello again,
Yes, you should write a RegEx plugin to filter CrawlRequests and Discoveries.
See AbsoluteUri.cs.
Do you know how to call a plugin in the core? See cfg.CrawlActions and AbsoluteUri.cs.
Thanks!
Mike
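The RegEx-plugin idea can be sketched as below. This is a minimal Python illustration, not arachnode.net's plugin API: `RegexFilter`, `crawl_actions`, and `run_actions` are hypothetical names, and the forum pattern is only an example, since the list of wanted pages is not given above. For the real plugin base class and registration, see AbsoluteUri.cs and cfg.CrawlActions. The sketch assumes, as the pointer to cfg.CrawlActions suggests, that the core calls every registered action for each CrawlRequest/Discovery, so the filter restricts the crawl without any change to the core.

```python
import re
from typing import Callable, Iterable, List


class RegexFilter:
    """Hypothetical plugin: accepts a URI only if it matches an allow pattern."""

    def __init__(self, allow_patterns: Iterable[str]):
        self._patterns = [re.compile(p, re.IGNORECASE) for p in allow_patterns]

    def __call__(self, absolute_uri: str) -> bool:
        return any(p.search(absolute_uri) for p in self._patterns)


# Hypothetical stand-in for the actions the core is configured to call.
crawl_actions: List[Callable[[str], bool]] = []


def run_actions(absolute_uri: str) -> bool:
    """The 'core' calls every registered action; any one of them can reject the URI."""
    return all(action(absolute_uri) for action in crawl_actions)


# Registering the plugin is configuration, not a change to the core.
crawl_actions.append(RegexFilter([r"^https?://arachnode\.net/forums/"]))
```

Here a URI such as `http://arachnode.net/forums/...` passes, while other sections of the site and other domains are dropped before they are crawled.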