I am thinking of using your product to crawl up to a million websites. My goal is to find the top websites that abound in PDF documents. I am not really interested in indexing, downloading content, or storing it to a hard drive. I just need PDF discovery within only the sites I specify, but with unlimited discovery depth.
The list of websites will be dynamic. Is it possible to add new crawl requests dynamically, without changing source code the way it's done in the demo console application?
And the last question: can I run the crawler on a schedule?
http://arachnode.net/media/g/releases/tags/AN.Next+_2800_DEMO_2900_/default.aspx - not found
Yes, AN can help you. You can easily filter for .pdf documents and nothing else: simply crawl pages, validate the content type of suspected PDFs, and only download and process those documents.
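A minimal sketch of that content-type check, assuming plain HTTP (an illustration, not the AN API; adapt it to the crawler's own hooks):

    using System;
    using System.Net;

    static class PdfFilter
    {
        // Issue a HEAD request so we can inspect the Content-Type header
        // without downloading the document body.
        public static bool IsProbablyPdf(string url)
        {
            var request = (HttpWebRequest)WebRequest.Create(url);
            request.Method = "HEAD";

            using (var response = (HttpWebResponse)request.GetResponse())
            {
                string contentType = response.ContentType ?? string.Empty;
                return contentType.StartsWith("application/pdf",
                    StringComparison.OrdinalIgnoreCase);
            }
        }
    }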
It is possible to add them dynamically. You could insert them into the CrawlRequests database table while the crawler is running, or have your process write to CrawlRequests.txt and let it be read in, as the Service (Service project) does.
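For the file-based route, a minimal sketch, assuming one URL per line and a hypothetical install path (check your Service configuration for the actual file location and format):

    using System.IO;

    static class CrawlQueue
    {
        public static void QueueCrawlRequest(string url)
        {
            // The Service picks up CrawlRequests.txt on its next pass.
            // The path below is hypothetical; point it at your actual file.
            File.AppendAllText(@"C:\arachnode.net\CrawlRequests.txt",
                url + "\r\n");
        }
    }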
Yes, it's easily possible to run AN on a schedule.
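The simplest option is to point Windows Task Scheduler at the console crawler. If you would rather schedule it from your own process, here is a minimal sketch (the executable path is hypothetical; use your actual build output):

    using System;
    using System.Diagnostics;
    using System.Threading;

    class CrawlScheduler
    {
        static void Main()
        {
            // Launch the crawler executable every 24 hours.
            var timer = new Timer(
                _ => Process.Start(@"C:\arachnode.net\Console\Console.exe"),
                null, TimeSpan.Zero, TimeSpan.FromHours(24));

            Thread.Sleep(Timeout.Infinite); // keep the host process alive
        }
    }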
Thanks for reporting the broken link. It looks like CommunityServer needs to re-index the Media galleries.