I have a small doubt. Does the WebPages_MetaData text/xml content change (or get updated) when I recrawl the page the next day? Or does it create a new WebPageID (in WebPage) and insert a new record in the WebPages_MetaData table when the page gets recrawled? Or does it check for any modifications in the page (or its contents) and, if modified, create a new record or update the existing WebPages_MetaData record?
Also, does the recrawl take into account what is in the disallowed table and bypass crawling if the URL is present there? Please note that the recrawl I am talking about will happen in separate runs on different dates.
Sorry, I missed out one more question. What do the lastDiscovered and lastModified dates tell us here, exactly (in WebPage)? I have noticed that the lastModified field is mostly null. When does it get populated?
Yes - each WebPage and WebPages_MetaData row is tied to the AbsoluteUri that was crawled - and an AbsoluteUri is an AbsoluteUri is an AbsoluteUri...
The crawl process modifies the row, if existing.
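A minimal sketch of that behavior, using an in-memory dict keyed by AbsoluteUri as a stand-in for the WebPage table (names and structure here are illustrative, not the actual schema):

```python
# Hypothetical sketch: a recrawl updates the existing row keyed by
# AbsoluteUri rather than inserting a duplicate record.

pages = {}  # stand-in for the WebPage table, keyed by AbsoluteUri

def crawl(absolute_uri, content):
    if absolute_uri in pages:
        # Existing row: the crawl process modifies it in place.
        pages[absolute_uri]["content"] = content
    else:
        # First discovery: a new row is created.
        pages[absolute_uri] = {"content": content}
    return pages[absolute_uri]

crawl("http://example.com/", "<html>v1</html>")
crawl("http://example.com/", "<html>v2</html>")  # recrawl: same row, new content
```

So a recrawl on a later date leaves you with one row per AbsoluteUri, holding the latest content.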
If an AbsoluteUri is in the DisallowedAbsoluteUris table it won't be crawled.
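The disallowed check is effectively a membership test before crawling; a hedged sketch, with a plain set standing in for the DisallowedAbsoluteUris table:

```python
# Hypothetical sketch: any AbsoluteUri found in the disallowed set is
# skipped by the crawler (set stands in for DisallowedAbsoluteUris).

disallowed = {"http://example.com/private"}

def should_crawl(absolute_uri):
    # Skip crawling when the URI appears in the disallowed table.
    return absolute_uri not in disallowed
```

This check applies on recrawls too, so a URL added to the table between runs will be bypassed on the next run.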
Those dates are database row timestamps. lastModified gets populated/updated when the WebPage source changes.
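That is why lastModified is mostly null: it only gets a value once a recrawl sees different source. A hypothetical sketch (not the actual schema) using a content hash to detect the change:

```python
import hashlib
from datetime import datetime, timezone

# Hypothetical sketch: lastDiscovered is touched on every crawl, while
# lastModified is only set once the page source actually changes -
# so it stays null until then.

def recrawl(row, new_source):
    now = datetime.now(timezone.utc)
    row["lastDiscovered"] = now  # updated on every crawl
    new_hash = hashlib.sha256(new_source.encode()).hexdigest()
    if row.get("source_hash") != new_hash:
        row["source_hash"] = new_hash
        row["lastModified"] = now  # only when the source changed
    return row
```

Recrawling identical content updates lastDiscovered but leaves lastModified untouched.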
IM when you are ready to move forward, if you already haven't.