Probably the way I pulled this down but getting an error. I am walking through it but wanted to throw a note up.
Pulled down latest from svn, but tried to keep my connectionstring info there. THANKS for making the changes so that there are no longer any hard coded connectionstrings anywhere that I can see.
Left db as is.
Recompiled and am just trying to run the web search page directly.
SearchResults.ascx.cs has webPagesRow coming back null: ArachnodeDataSet.WebPagesRow webPagesRow = arachnodeDAO.GetWebPage(Path.GetFileNameWithoutExtension(discoveryPath));
The discoveryPath looks to be: "C:\\AppWorkspace\\arachnode.net\\source\\Console\\DownloadedWebPages\\http\\dev\\communityserver\\com\\forums\\979.aspx". Not sure where it is grabbing this - must be the first search result that comes back with a hit on my search of 'test'. So it's in the section where the file does NOT exist.
Anyway webPagesRow comes back null which makes the following bomb: ManagedWebPage managedWebPage = webPageManager.ManageWebPage(webPagesRow.ID, webPagesRow.AbsoluteUri, webPagesRow.Source, webPagesRow.FullTextIndexType, false, false, true);
I know I took a shortcut in pulling down latest and trying to recompile and run, but I assume this could be a bona fide error we need to catch?
Thx
I believe I have a better strategy for finishing the reporting procedures. So, Friday, Saturday and Sunday are dedicated to finishing 1.1.
To solve the 'One Place One ConnectionString' problem I think something like this will need to be implemented: http://geekswithblogs.net/akraus1/articles/75391.aspx
Tonight, I added a configuration parameter to set the Request timeout.
Also, I have a goal to get the current dataset and lucene.net indexes to 1,000,000 indexed pages. We're at 74,888 right now. We should be at 100,000 by morning.
There is still a hardcoded connection string in IsDisallowed.cs, sadly.
I added code in the Search functionality to try and retrieve the page from the database if it isn't found on disk. So, if the WebPage is returning NULL then 979 isn't in the WebPages table. Either that, or there is an error in retrieving WebPage 979.
Yes. We should be trapping this error and logging it in the database. I just added some code that updates the page at the bottom of the search page when a page/result isn't found and logs the error as well.
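For anyone following along, the trap-and-log approach described above can be sketched roughly as below. This is only an illustration under assumptions: WebPageRecord, SearchResultGuard, TryResolve and the two delegates are hypothetical stand-ins, not the actual arachnode.net types or the code that was checked in. The idea is simply to check the lookup result for null before touching any of its members, log the miss, and let the caller skip that hit.

```csharp
using System;
using System.IO;

// Hypothetical stand-in for a WebPages row; NOT the real ArachnodeDataSet.WebPagesRow.
public class WebPageRecord
{
    public long ID;
    public string AbsoluteUri;
}

public static class SearchResultGuard
{
    // Looks up the page whose ID is the file name of discoveryPath.
    // Returns true when a row was found; otherwise logs the miss and returns false.
    public static bool TryResolve(
        string discoveryPath,
        Func<string, WebPageRecord> getWebPage,  // assumed DAO lookup; may return null
        Action<string, string> logException,     // assumed logger (e.g. an Exceptions table)
        out WebPageRecord webPage)
    {
        string id = Path.GetFileNameWithoutExtension(discoveryPath);
        webPage = getWebPage(id);

        if (webPage == null)
        {
            // Never dereference the missing row here; log using data we know we have.
            logException(discoveryPath, "No WebPages row found for ID " + id);
            return false;
        }

        return true;
    }
}
```

A caller would skip the result and decrement its hit count when TryResolve returns false, rather than passing a null row on to the page manager.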
Thanks for catching this!!!
I'm working hard on getting the Reporting right - I had to change tack on the Reporting Views - managing 80+ views and stored procedures is pretty draining.
Yeah I plugged in a working connectionstring, and tweaked the Settings.Settings file with my connection info, and can do crawls.
But the search definitely bombs because of the error that needs to be trapped. I'll see if you make a tweak to the code in the next day or so, rather than tweak it myself. Maybe you'll have some comments on my reply regarding ideas on the priority stuff and end up doing something there too.
Keep up the good work!
Just checked in the error handling code. :)
All good stuff on the priority post. I'll work on nailing down 1.1 and will tackle whatever strategy we come up with for 1.2.
-Mike
I'm wondering if my data is maybe whacked? In your new code below, the following line is still giving an error:
if (webPagesRow.AbsoluteUri != null)
It's webPagesRow itself that is null, so the above line will actually cause a null reference exception. I'll walk through the code and see why webPagesRow is returning null.
try
{
    webPagesRow = ArachnodeDAO.GetWebPage(Path.GetFileNameWithoutExtension(discoveryPath));
    ... (ArachnodeDAO);
    ... );
    managedWebPage.FileStream.Close();
    discoveryPath = managedWebPage.DiscoveryPath;
}
catch (... exception)
{
    if (webPagesRow.AbsoluteUri != null)
    {
        _arachnodeDAO.InsertException(webPagesRow.AbsoluteUri, ..., exception);
    }
    else
    {
        _arachnodeDAO.InsertException(..., exception);
    }
    TotalNumberOfHits--;
}
Sorry, that didn't paste very well did it :(
Just checked in again. My bad.
Is your working directory \AppWorkspace too?
Don't remember which test I just did (which search term I used). But, the discovery path is:
discoveryPath = "C:\\AppWorkspace\\arachnode.net\\source\\Console\\DownloadedWebPages\\http\\en\\wikipedia\\org\\wiki\\3461.htm"
...and no that file does not exist.
I had done a db reset, then started a new crawl against the default uri you provide in program.cs. Actually I think you removed this from the current revision but I grabbed from previous :)
Did the file exist at one point? Resetting the database also resets the IDENTITY SEEDS.
I bet it would be good once the product matures a bit more to have the service install itself and protect the lucene.net index files like SQL does with its databases...
Yeah. And maybe I'll just change:
if (webPagesRow.AbsoluteUri != null)
to be:
if ((webPagesRow != null) && (webPagesRow.AbsoluteUri != null))
UPDATE
I made that change above, and now down in SearchResults.ascx.cs I'm getting a null error at:
uxLblStrength.Text = Document.GetField("strength").StringValue();
...no strength field in the Document. Should I manually delete all lucene index files and do a reset just to make sure all is in synch?! Thx
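As a rough illustration of guarding against a field that is missing from the index, here is a minimal sketch. It assumes a plain dictionary as a stand-in for the Lucene.net Document, and FieldReader and GetOrDefault are hypothetical names, not part of arachnode.net or Lucene.net. With the real Document the equivalent would presumably be to check the result of GetField for null before calling StringValue() on it.

```csharp
using System.Collections.Generic;

public static class FieldReader
{
    // 'fields' is a hypothetical stand-in for the stored fields of an index document.
    // Returns the stored value, or 'fallback' when the field is absent or null.
    public static string GetOrDefault(
        IDictionary<string, string> fields, string name, string fallback)
    {
        string value;
        bool found = fields.TryGetValue(name, out value);
        return (found && value != null) ? value : fallback;
    }
}
```

With something like this, a label bound to "strength" would show the fallback text for documents indexed before the field existed instead of throwing.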
Sure. That works too.
I had to change the lucene.net indexes. I removed a few fields and added one or two, and I have to write a converter for 1.0 to 1.1 indexes. Sadly, I'm not 100% happy with myself for this... it's essentially a breaking change. I have to write a converter on the chance that someone has invested a lot of serious crawling time in the 1.0 indexes.
So, yeah - I would start fresh. Keep the 1.0 indexes and I will write a converter/merger.
I zapped all the stored web pages, and all the lucene index files, and ran a db reset. Then I did a walk that grabbed about 1000 domains and only about 100 webpages.
I killed the crawl, and sadly my lucene files look very small. The search does not bomb now, but the search isn't returning anything.
I get 4 files in the currentcrawls folder, and 2 in the parent folder, but they are all 0k or 1k.
Any ideas?
Unfortunately I can't just do a revert in SVN. I pull things down, convert the project/solution to vs2008, plug in my connectionstrings, etc. Maybe I'm missing a config file change since I am not replacing them?
Sorry to waste time on this.
Lucene.net stores results in memory and they aren't immediately flushed.
Find the Stop() method in ManageLuceneDotNetIndexes. That's where the indexes are written to disk if the crawl is stopped. If you click close on the console window you should get a dialog like the one for an application that is hanging. When this happens the console is actually writing its state back to the DB.
Is your CrawlRequests table empty?
What is in the Exceptions table?
LOL yeah I have the single crawl request in the crawl table. But I definitely missed the exceptions! I'm getting a ton of:
Procedure or function arachnode_omsp_CrawlRequests_SELECT has too many arguments specified.
I totally forgot to even look here. sp change? Or did I miss a revision in the data layer?
EDIT:
Yeah, arachnode_omsp_CrawlRequests_SELECT in my db has 3 parms but my ArachnodeDataset.xsd has 4 defined. Looks like added
Yeah - I did a database checkin too. :) I should have clarified this. Burning the candle at both ends here. Every time I have to touch the reporting views they always kick my ass.