Hello
I am saving discovered web pages to disk. However, I would also like the saved file to keep the exact name from each page's URL.
For example, when www.mydomain.com/intro.html is stored inside the ...\www\mydomain\com\ folder, I would like the file name to be intro.html. Is this possible?
Thanks
Bimalka
You could, but some characters, character combinations, and file name lengths that are permitted by non-Windows web servers won't save to disk properly on Windows. Also, the DiscoveryManager accounts for a maximum path length, including the hash. So if the file path gets long and the file name gets really long, you won't be able to save the file unless you invoke some semi-documented or undocumented Windows APIs that can store paths of nearly unlimited length.
If you want to change it, you can do so here, in DiscoveryManager.cs: public static string GetDiscoveryPath(string downloadedDiscoveryDirectory, string absoluteUri, string fullTextIndexType)
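As a rough illustration of the kind of logic you would need inside that method, here is a minimal sketch that derives a readable file name from a URL. It is not the actual DiscoveryManager code: the class name UrlFileNameSketch, the method GetReadableFileName, the 100-character limit, and the "index.html" fallback are all assumptions made for the example. It also does not handle reserved Windows device names such as CON or NUL, and two different URLs can map to the same name, so you would still need something like the hash to keep files unique.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

// Hypothetical helper, not part of the crawler's code base.
public static class UrlFileNameSketch
{
    // Assumed limit; choose one that leaves room for the directory part of the path.
    private const int MaxFileNameLength = 100;

    // Characters Windows rejects in file names, listed explicitly so the
    // result does not depend on the OS the code happens to run on.
    private static readonly char[] InvalidChars =
        { '<', '>', ':', '"', '/', '\\', '|', '?', '*' };

    // absoluteUri must include the scheme, e.g. "http://www.mydomain.com/intro.html".
    public static string GetReadableFileName(string absoluteUri)
    {
        var uri = new Uri(absoluteUri);

        // Last path segment, e.g. "intro.html"; empty for directory-style URLs.
        string name = Uri.UnescapeDataString(uri.Segments.Last().Trim('/'));
        if (name.Length == 0)
        {
            name = "index.html"; // assumed default for URLs without a file name
        }

        // Keep the query string so "page.aspx?id=1" and "page.aspx?id=2" differ.
        name += uri.Query;

        // Replace characters that cannot be stored in a Windows file name.
        var sb = new StringBuilder(name.Length);
        foreach (char c in name)
        {
            bool invalid = char.IsControl(c) || Array.IndexOf(InvalidChars, c) >= 0;
            sb.Append(invalid ? '_' : c);
        }

        // Windows silently drops trailing dots and spaces, so remove them up front.
        name = sb.ToString().TrimEnd('.', ' ');
        if (name.Length == 0)
        {
            name = "_";
        }

        // Truncate overly long names, keeping the extension when it fits.
        if (name.Length > MaxFileNameLength)
        {
            string ext = Path.GetExtension(name);
            if (ext.Length >= MaxFileNameLength)
            {
                ext = string.Empty;
            }
            name = name.Substring(0, MaxFileNameLength - ext.Length) + ext;
        }

        return name;
    }
}
```

Calling UrlFileNameSketch.GetReadableFileName("http://www.mydomain.com/intro.html") should give you intro.html, which you could then combine with the directory that GetDiscoveryPath already builds. You would still have to check the total path length yourself.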
Mike
For best service when you require assistance:
Skype: arachnodedotnet