Hello,
We need to crawl a multilingual site. In our case the URLs look something like this:
The English site starts with -> http://xyz.com/en/homepage.aspx
The German site starts with -> http://xyz.com/de-DE//homepage.aspx
When we crawl the English site, the crawler also picks up the German URLs.
Is there any way to restrict a crawl to URLs that start with a given path?
Thanks
Look at this post...
http://arachnode.net/forums/p/1718/15661.aspx#15661
[Flags]
public enum UriClassificationType : short
{
    None = 0,
    Domain = 1,
    Extension = 2,
    FileExtension = 4,
    Host = 8,
    Scheme = 16,
    OriginalDirectoryLevelUp = 32,
    OriginalDirectory = 64,
    OriginalDirectoryLevel = 128,
    OriginalDirectoryLevelDown = 256
}
Logically OR OriginalDirectory with whatever else you'd like to restrict your crawl to. (OriginalDirectory will restrict to /en/ or /de-DE/...)
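As a rough illustration of the OR-ing described above, here is a minimal, self-contained sketch. The enum is copied from this post; the class CrawlRestrictionSketch, the field RestrictTo, and the helper IsInOriginalDirectory are hypothetical names that are not part of arachnode.net, and the helper is only my guess at what an original-directory restriction amounts to. How the combined value is actually handed to the crawler is covered in the linked post and is not shown here.

```csharp
using System;

// Copied from the post above.
[Flags]
public enum UriClassificationType : short
{
    None = 0,
    Domain = 1,
    Extension = 2,
    FileExtension = 4,
    Host = 8,
    Scheme = 16,
    OriginalDirectoryLevelUp = 32,
    OriginalDirectory = 64,
    OriginalDirectoryLevel = 128,
    OriginalDirectoryLevelDown = 256
}

// Hypothetical helper class, for illustration only (not an arachnode.net type).
public static class CrawlRestrictionSketch
{
    // Combine flags with bitwise OR: stay on the same host AND in the original directory.
    public static readonly UriClassificationType RestrictTo =
        UriClassificationType.Host | UriClassificationType.OriginalDirectory;

    // Assumption: "original directory" means the directory part of the start URL,
    // e.g. "/en/" for http://xyz.com/en/homepage.aspx. This simple prefix check
    // also accepts deeper paths below that directory.
    public static bool IsInOriginalDirectory(Uri original, Uri candidate)
    {
        string path = original.AbsolutePath;
        string directory = path.Substring(0, path.LastIndexOf('/') + 1);

        return string.Equals(original.Host, candidate.Host, StringComparison.OrdinalIgnoreCase)
            && candidate.AbsolutePath.StartsWith(directory, StringComparison.OrdinalIgnoreCase);
    }
}
```

With http://xyz.com/en/homepage.aspx as the start URL, the helper accepts other pages under /en/ and rejects http://xyz.com/de-DE//homepage.aspx, which is the behavior asked for in the question.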
Thanks, Mike