Hi, this is solid, professional work :)
I have a bunch of questions. If I want to deploy it on multiple servers, where should I start, how many servers can I use to maximize its performance, and which components should go on each server? Also, do you know of a place, such as a website directory, where we can find servers and domains to add to the crawler database?
Thank you :)
Important: The biggest limiting factor in AN, using the default configuration, is the speed of your database disks.
That said, how AN performs depends on what you have turned on.
If you aren't taxing the DB, however, the biggest limiting factor may well be your internet connection and connection hardware, specifically the number of simultaneous connections you can make.
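To make the bottleneck reasoning above concrete, here is a rough back-of-envelope sketch in Python. It is not part of AN; the function name, the parameters, and the assumption of one DB write per crawled page are all hypothetical, and you would need to plug in numbers measured on your own hardware.

```python
def estimate_pages_per_second(connections, avg_fetch_seconds,
                              bandwidth_bytes_per_s, avg_page_bytes,
                              db_writes_per_s):
    """Return (pages_per_second, bottleneck_name) for a crawl setup.

    Hypothetical model, not AN's own logic: each resource caps the
    crawl rate independently, and the lowest cap is the bottleneck.
    Assumes one DB write per crawled page.
    """
    limits = {
        # Each connection completes one fetch every avg_fetch_seconds.
        "connections": connections / avg_fetch_seconds,
        # The network link can carry this many average-sized pages per second.
        "bandwidth": bandwidth_bytes_per_s / avg_page_bytes,
        # The DB disks can absorb this many page writes per second.
        "database": db_writes_per_s,
    }
    bottleneck = min(limits, key=limits.get)
    return limits[bottleneck], bottleneck


# Illustrative inputs only; measure your own values.
rate, bottleneck = estimate_pages_per_second(
    connections=100, avg_fetch_seconds=2.0,
    bandwidth_bytes_per_s=12_500_000, avg_page_bytes=100_000,
    db_writes_per_s=40,
)
print(rate, bottleneck)
```

With these made-up inputs the database is the lowest cap, which matches the default-configuration case described above. Raise the database figure and the connection count becomes the limit instead.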
AN currently supports one DB machine but multiple crawlers, and it can distribute each of the DownloadedImages/DownloadedFiles/DownloadedWebPages directories across any number of servers, provided you use DFS or another file-system clustering technology.
So, the crawl code (the solution files) goes on the crawling machines, and the DB is restored to the DB server.
You could have three additional machines that do nothing other than provide file shares for the Discoveries (Files, Images, WebPages), thereby offloading this work from, say, the DB server.
Again, the balance of resources will depend on what you want to crawl. (It wouldn't make sense to have a killer DB machine if you aren't storing tons of data.)
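As a planning aid, the layout described above (one DB server, several crawlers, and one file share per Discoveries directory) can be written down and sanity-checked with a small Python sketch. This is not AN's configuration format; the class, the host names, and the share paths are hypothetical placeholders.

```python
from dataclasses import dataclass, field

# The three Discoveries directories named in this thread.
DISCOVERY_DIRECTORIES = ("DownloadedFiles", "DownloadedImages", "DownloadedWebPages")


@dataclass
class Deployment:
    """Hypothetical description of a multi-machine layout (not an AN API)."""
    db_server: str                 # the single DB machine
    crawlers: list                 # machines running the crawl code
    file_shares: dict = field(default_factory=dict)  # directory -> share path

    def validate(self):
        """Return a list of problems; an empty list means the plan is complete."""
        problems = []
        if not self.db_server:
            problems.append("a DB server is required")
        if not self.crawlers:
            problems.append("at least one crawler machine is required")
        for directory in DISCOVERY_DIRECTORIES:
            if directory not in self.file_shares:
                problems.append(f"no share configured for {directory}")
        return problems


# Placeholder host names and UNC paths, for illustration only.
plan = Deployment(
    db_server="db01",
    crawlers=["crawl01", "crawl02"],
    file_shares={
        "DownloadedFiles": r"\\files01\DownloadedFiles",
        "DownloadedImages": r"\\files02\DownloadedImages",
        "DownloadedWebPages": r"\\files03\DownloadedWebPages",
    },
)
print(plan.validate())
```

An empty list from `validate()` only means every role has a machine assigned; how powerful each machine needs to be still depends on what you crawl and store.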
Does this answer your question?
You can check http://directory.google.com/ for sites to crawl. AN comes pre-configured with about 1 million Priorities for WebPages, to crawl by priority, of course.
If you purchase a license, I am more than happy to help you set AN up across multiple machines.
For best service when you require assistance:
Would you register so I know who you are, please? This question is a bit involved, and if you register you will be notified when the thread is updated.
I registered in the forums, and as you requested, I am asking my questions again :)
If I want to deploy it on multiple servers, where should I start, how many servers can I use to maximize its performance, and which components should go on each server? Also, do you know of a place, such as a website directory, where we can find servers and domains to add to the crawler database? Thank you :)
Thank you for your great reply. How can I purchase AN 1.4? BTW, the buy link doesn't work! Should I browse the site using Firefox? And how much will it cost?
You are very welcome!
Try this direct link: https://checkout.google.com/view/buy?o=shoppingcart&shoppingcart=973929896308267
Try using Firefox. (Really surprising that the Google checkout link doesn't show up...)
Which version you need depends on how you will use it: commercial or personal.
Question: Which browser/version are you using? (Thanks so much for telling me...)
Also, v2.0 should be out today.