We have to crawl a multilingual site. In our case the URLs look something like this:
The English site starts with -> http://xyz.com/en/homepage.aspx
The German site starts with -> http://xyz.com/de-DE//homepage.aspx
While crawling the English site, the crawler also picks up the German URLs.
Is there any way to restrict a crawl to URLs that start with a given path?
Look at this post...
public enum UriClassificationType : short
{
    None = 0,
    Domain = 1,
    Extension = 2,
    FileExtension = 4,
    Host = 8,
    Scheme = 16,
    OriginalDirectoryLevelUp = 32,
    OriginalDirectory = 64,
    OriginalDirectoryLevel = 128,
    OriginalDirectoryLevelDown = 256
}
Logically OR OriginalDirectory with whatever else you'd like to restrict your crawl to. (OriginalDirectory will restrict to /en/ or /de-DE/...)
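To make the OR-ing concrete, here is a minimal, self-contained sketch. It is not the crawler's own code: the enum below is a trimmed local copy (the [Flags] attribute is my addition), IsAllowed is a hypothetical helper, and treating OriginalDirectory as "the candidate path starts with the start URL's directory" is an assumption based on the description above.

```csharp
using System;

// Trimmed local copy of the enum from the post so the sketch compiles on its own.
// [Flags] is an assumption added here; the original declaration does not show it.
[Flags]
public enum UriClassificationType : short
{
    None = 0,
    Host = 8,
    OriginalDirectory = 64
}

public static class RestrictionSketch
{
    // Hypothetical helper, not part of any crawler API: returns true when
    // 'candidate' satisfies every restriction that is set in 'restrictTo'.
    public static bool IsAllowed(Uri start, Uri candidate, UriClassificationType restrictTo)
    {
        if ((restrictTo & UriClassificationType.Host) != 0 &&
            !string.Equals(start.Host, candidate.Host, StringComparison.OrdinalIgnoreCase))
        {
            return false;
        }

        if ((restrictTo & UriClassificationType.OriginalDirectory) != 0)
        {
            // Directory of the start URL, e.g. "/en/" for "/en/homepage.aspx".
            string path = start.AbsolutePath;
            string directory = path.Substring(0, path.LastIndexOf('/') + 1);

            // Assumed semantics: the candidate must live under that directory.
            if (!candidate.AbsolutePath.StartsWith(directory, StringComparison.OrdinalIgnoreCase))
            {
                return false;
            }
        }

        return true;
    }

    public static void Main()
    {
        // Combine the restrictions with a bitwise OR, as suggested above.
        var restrictTo = UriClassificationType.Host | UriClassificationType.OriginalDirectory;
        var start = new Uri("http://xyz.com/en/homepage.aspx");

        Console.WriteLine(IsAllowed(start, new Uri("http://xyz.com/en/about.aspx"), restrictTo));
        Console.WriteLine(IsAllowed(start, new Uri("http://xyz.com/de-DE//homepage.aspx"), restrictTo));
    }
}
```

With the crawl started at the /en/ URL, the first candidate passes both checks and the German URL is rejected by the directory check. How the combined value is actually handed to the crawler depends on its configuration, which this sketch does not cover.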