Hi,
I have a small question. Does the WebPages_MetaData text/XML content change (get updated) when I recrawl the page the next day? Or does a recrawl create a new WebPageID (in WebPage) and insert a new record into the WebPages_MetaData table? Or does it check the page (or its contents) for modifications and, if modified, create a new record or update the existing WebPages_MetaData record?
Also, does the recrawl take into account what is in the disallowed table, and skip crawling a URL if it is present there?
Please note that the recrawls I am talking about will be separate runs on different dates.
Thanks
Debasish
Sorry, I missed one more question.
What exactly do the lastDiscovered and lastModified dates in WebPage tell us? I have noticed that the lastModified field is mostly null. When does it get populated?
Yes. Each WebPage and WebPages_MetaData row is tied to the AbsoluteUri that was crawled, and an AbsoluteUri is an AbsoluteUri is an AbsoluteUri...
The crawl process updates the existing row if there is one, rather than creating a new one.
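A minimal Python sketch of the "one row per AbsoluteUri, updated on recrawl" behavior described above. WebPageTable, save_crawl and the WebPageRow fields are hypothetical names for illustration only, not arachnode.net's actual schema or code: the first crawl of an AbsoluteUri inserts a row with a new ID, and a later crawl of the same AbsoluteUri modifies that row in place.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class WebPageRow:
    # Hypothetical stand-in for a WebPage row and its WebPages_MetaData content.
    web_page_id: int
    absolute_uri: str
    metadata_text: str


class WebPageTable:
    """Toy in-memory table holding exactly one row per AbsoluteUri."""

    def __init__(self) -> None:
        self._rows: dict[str, WebPageRow] = {}
        self._next_id = 1

    def save_crawl(self, absolute_uri: str, metadata_text: str) -> WebPageRow:
        row = self._rows.get(absolute_uri)
        if row is None:
            # First crawl of this AbsoluteUri: insert a new row with a new ID.
            row = WebPageRow(self._next_id, absolute_uri, metadata_text)
            self._rows[absolute_uri] = row
            self._next_id += 1
        else:
            # Recrawl (e.g. the next day): update the existing row in place,
            # so the WebPageID does not change.
            row.metadata_text = metadata_text
        return row

    def __len__(self) -> int:
        return len(self._rows)


table = WebPageTable()
day1 = table.save_crawl("http://example.com/", "<meta crawl='day1'/>")
day2 = table.save_crawl("http://example.com/", "<meta crawl='day2'/>")
print(day1.web_page_id == day2.web_page_id, len(table))  # True 1
```

Keying on the AbsoluteUri is what makes a recrawl an update rather than an insert; the ID is assigned once, on the first crawl.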
If an AbsoluteUri is in the DisallowedAbsoluteUris table it won't be crawled.
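A minimal Python sketch of skipping disallowed URIs before a recrawl. The function filter_disallowed and the in-memory set standing in for the DisallowedAbsoluteUris table are hypothetical; exact string matching is also an assumption, since a real crawler may normalize URIs before comparing them.

```python
from __future__ import annotations


def filter_disallowed(candidates: list[str], disallowed: set[str]) -> list[str]:
    """Keep candidate URIs in order, dropping any that appear in `disallowed`."""
    # Exact string match only; a real crawler may normalize URIs first (assumption).
    return [uri for uri in candidates if uri not in disallowed]


# Hypothetical in-memory stand-in for the DisallowedAbsoluteUris table.
disallowed_absolute_uris = {"http://example.com/private/"}
recrawl_queue = ["http://example.com/", "http://example.com/private/"]
print(filter_disallowed(recrawl_queue, disallowed_absolute_uris))  # ['http://example.com/']
```

Because the check runs on every run, a URI added to the disallowed table between two recrawls is skipped from the next run onward.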
-Mike
For best service when you require assistance:
Skype: arachnodedotnet
Those dates are database row "timestamps".
lastModified gets populated/updated when the WebPage source changes.
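One reading of the answer above is that lastModified stays null until a recrawl finds changed source, which would explain why it is mostly null. A minimal Python sketch of that reading follows. The PageTimestamps/record_crawl names, detecting a change by comparing SHA-256 hashes of the source, and refreshing lastDiscovered on every crawl are all assumptions for illustration, not arachnode.net's documented behavior.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class PageTimestamps:
    # Hypothetical stand-in for the WebPage columns discussed above.
    source_hash: str
    last_discovered: datetime
    last_modified: datetime | None = None  # stays None until the source changes


def record_crawl(row: PageTimestamps | None, source: bytes, now: datetime) -> PageTimestamps:
    digest = hashlib.sha256(source).hexdigest()
    if row is None:
        # First crawl: the row is created and lastModified is left empty (null).
        return PageTimestamps(source_hash=digest, last_discovered=now)
    # Assumption: lastDiscovered is refreshed on every crawl of the page.
    row.last_discovered = now
    if digest != row.source_hash:
        # Source changed since the previous crawl: populate/update lastModified.
        row.source_hash = digest
        row.last_modified = now
    return row


day1 = datetime(2024, 1, 1, tzinfo=timezone.utc)
day2 = datetime(2024, 1, 2, tzinfo=timezone.utc)
day3 = datetime(2024, 1, 3, tzinfo=timezone.utc)

page = record_crawl(None, b"<html>v1</html>", day1)
page = record_crawl(page, b"<html>v1</html>", day2)  # source unchanged
print(page.last_modified)  # None
page = record_crawl(page, b"<html>v2</html>", day3)  # source changed
print(page.last_modified == day3)  # True
```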
IM me when you are ready to move forward, if you haven't already.