<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://arachnode.net/utility/FeedStylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>General Questions</title><link>http://arachnode.net/forums/7.aspx</link><description /><dc:language>en</dc:language><generator>CommunityServer 2008.5 SP1 (Debug Build: 31106.3070)</generator><item><title>Re: How to reduce storage requirements...</title><link>http://arachnode.net/forums/thread/15612.aspx</link><pubDate>Sat, 23 Jul 2011 01:55:58 GMT</pubDate><guid isPermaLink="false">a2478770-777f-41ab-83b8-a21ff47ebb1f:15612</guid><dc:creator>ptrennum</dc:creator><slash:comments>0</slash:comments><comments>http://arachnode.net/forums/thread/15612.aspx</comments><wfw:commentRss>http://arachnode.net/forums/commentrss.aspx?SectionID=7&amp;PostID=15612</wfw:commentRss><description>&lt;p&gt;Thanks Mike this is great info and exactly what I was wondering!&lt;/p&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;</description></item><item><title>Re: How to reduce storage requirements...</title><link>http://arachnode.net/forums/thread/15606.aspx</link><pubDate>Tue, 19 Jul 2011 00:36:55 GMT</pubDate><guid isPermaLink="false">a2478770-777f-41ab-83b8-a21ff47ebb1f:15606</guid><dc:creator>arachnode.net</dc:creator><slash:comments>0</slash:comments><comments>http://arachnode.net/forums/thread/15606.aspx</comments><wfw:commentRss>http://arachnode.net/forums/commentrss.aspx?SectionID=7&amp;PostID=15606</wfw:commentRss><description>&lt;p&gt;It does not happen very often &lt;img src="http://arachnode.net/emoticons/emotion-4.gif" alt="Stick out tongue" /&gt; but I may be wrong on this...&lt;/p&gt;
&lt;p&gt;The space &amp;#39;saved&amp;#39; by storing in the 8KB extents in the DB and efficiently on disk (Size: = Size on disk:) seems to be offset by the information required to either store a &amp;#39;0x0&amp;#39; in the Source column (less space) or a NULL value (more space)... &amp;nbsp;So, unless you were storing a lot of files that were significantly below the cluster size of the drive, it is likely that SQL will take up slightly MORE space than storing on disk. &amp;nbsp;My tests show about a 5% overhead.&lt;/p&gt;
&lt;p&gt;(I compared the size of the WebPages directory with the size of the shrunk WebPages FILEGROUP file before and after setting the Source column of the WebPages table to &amp;#39;0x0&amp;#39; and the difference was greater than the size of the WebPages directory. &amp;nbsp;Setting the Source column to &amp;#39;NULL&amp;#39; increased the difference.)&lt;/p&gt;
&lt;p&gt;&lt;a href="http://arachnode.net/cfs-file.ashx/__key/CommunityServer.Discussions.Components.Files/7/3443.wp1.JPG"&gt;&lt;img src="http://arachnode.net/resized-image.ashx/__size/550x0/__key/CommunityServer.Discussions.Components.Files/7/3443.wp1.JPG" border="0" alt="" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="http://msdn.microsoft.com/en-us/library/aa174529(v=sql.80).aspx"&gt;http://msdn.microsoft.com/en-us/library/aa174529(v=sql.80).aspx&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;What you might want to try is either compressing the Source column or compressing the DownloadedWebPages folder.&lt;/p&gt;
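&lt;p&gt;A rough way to gauge how much compression might save, before committing to either option, is to compress a sample of page source and compare sizes. &amp;nbsp;The sketch below uses Python&amp;#39;s zlib purely as a stand-in; it is not the algorithm SQL Server or NTFS uses, and the sample text and the function name compression_ratio are made up for illustration.&lt;/p&gt;

```python
import zlib


def compression_ratio(data):
    """Return compressed size divided by original size for a bytes object."""
    if not data:
        return 1.0
    return len(zlib.compress(data, 6)) / len(data)


# Hypothetical sample: repetitive text standing in for crawled page source.
sample = b"".join(
    b"[row %d] some repeated page text\n" % i for i in range(500)
)

# A ratio well below 1.0 suggests compression is worth trying.
print(round(compression_ratio(sample), 3))
```

&lt;p&gt;Running this over a handful of real files from the folder should give a more realistic figure than the synthetic sample.&lt;/p&gt;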
&lt;p&gt;&lt;a href="http://arachnode.net/cfs-file.ashx/__key/CommunityServer.Discussions.Components.Files/7/4743.Squashed.JPG"&gt;&lt;img src="http://arachnode.net/resized-image.ashx/__size/550x0/__key/CommunityServer.Discussions.Components.Files/7/4743.Squashed.JPG" border="0" alt="" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="http://www.microsoft.com/sqlserver/2008/en/us/compression.aspx"&gt;http://www.microsoft.com/sqlserver/2008/en/us/compression.aspx&lt;/a&gt;&lt;/p&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;</description></item><item><title>Re: How to reduce storage requirements...</title><link>http://arachnode.net/forums/thread/15602.aspx</link><pubDate>Mon, 18 Jul 2011 16:19:44 GMT</pubDate><guid isPermaLink="false">a2478770-777f-41ab-83b8-a21ff47ebb1f:15602</guid><dc:creator>arachnode.net</dc:creator><slash:comments>0</slash:comments><comments>http://arachnode.net/forums/thread/15602.aspx</comments><wfw:commentRss>http://arachnode.net/forums/commentrss.aspx?SectionID=7&amp;PostID=15602</wfw:commentRss><description>&lt;p&gt;You are very welcome.&lt;/p&gt;
&lt;p&gt;I will put on a crawl to give some real numbers.&lt;/p&gt;
&lt;p&gt;Be back later today...&lt;/p&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;</description></item><item><title>Re: How to reduce storage requirements...</title><link>http://arachnode.net/forums/thread/15600.aspx</link><pubDate>Sat, 16 Jul 2011 18:01:09 GMT</pubDate><guid isPermaLink="false">a2478770-777f-41ab-83b8-a21ff47ebb1f:15600</guid><dc:creator>ptrennum</dc:creator><slash:comments>0</slash:comments><comments>http://arachnode.net/forums/thread/15600.aspx</comments><wfw:commentRss>http://arachnode.net/forums/commentrss.aspx?SectionID=7&amp;PostID=15600</wfw:commentRss><description>&lt;p&gt;Thanks a lot Mike.&amp;nbsp; Just to get your final opinion on this...&lt;/p&gt;
&lt;p&gt;Do you think that saving to the DB will actually reduce the storage requirements and if yes by approximately how much?&lt;/p&gt;
&lt;p&gt;Thanks again!&lt;/p&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;</description></item><item><title>Re: How to reduce storage requirements...</title><link>http://arachnode.net/forums/thread/15598.aspx</link><pubDate>Sat, 16 Jul 2011 05:45:23 GMT</pubDate><guid isPermaLink="false">a2478770-777f-41ab-83b8-a21ff47ebb1f:15598</guid><dc:creator>arachnode.net</dc:creator><slash:comments>0</slash:comments><comments>http://arachnode.net/forums/thread/15598.aspx</comments><wfw:commentRss>http://arachnode.net/forums/commentrss.aspx?SectionID=7&amp;PostID=15598</wfw:commentRss><description>&lt;p&gt;The changes were made in Plugins\SearchManager.cs and in the Web project.&lt;/p&gt;
&lt;p&gt;When a feature needs the WebPage Source, AN checks the filesystem first and then the DB. &amp;nbsp;If no WebPage Source is found, an error message is presented to the user and a row is inserted into the Exceptions table.&lt;/p&gt;
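&lt;p&gt;The lookup order can be sketched roughly as follows. &amp;nbsp;This is not the code in SearchManager.cs; it is a minimal Python illustration, and load_source, disk_dir, db_rows and exceptions_log are hypothetical names standing in for the downloaded pages folder, the WebPages table and the Exceptions table.&lt;/p&gt;

```python
import os
import tempfile


def load_source(page_id, disk_dir, db_rows, exceptions_log):
    """Return page source as bytes, trying disk first and then the DB.

    All names are hypothetical: disk_dir plays the role of the downloaded
    pages folder, db_rows the WebPages table (id mapped to Source), and
    exceptions_log the Exceptions table.
    """
    path = os.path.join(disk_dir, str(page_id))
    if os.path.isfile(path):
        with open(path, "rb") as handle:
            return handle.read()

    # Assumption: an empty or missing value means no source was stored.
    source = db_rows.get(page_id)
    if source:
        return source

    # Neither location has the source: record the failure, then raise so
    # the caller can show an error message to the user.
    message = "WebPage Source not found for id %s" % page_id
    exceptions_log.append(message)
    raise LookupError(message)


with tempfile.TemporaryDirectory() as folder:
    log = []
    rows = {2: b"from db"}
    with open(os.path.join(folder, "1"), "wb") as handle:
        handle.write(b"from disk")
    print(load_source(1, folder, rows, log))  # found on disk
    print(load_source(2, folder, rows, log))  # falls back to the DB
```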
&lt;p&gt;Check out the SVN history to see the changes.&lt;/p&gt;
&lt;p&gt;(TortoiseSVN &amp;gt; Show log)&lt;/p&gt;
&lt;p&gt;&lt;a href="http://arachnode.net/cfs-file.ashx/__key/CommunityServer.Discussions.Components.Files/7/2727.Log1.JPG"&gt;&lt;img border="0" src="http://arachnode.net/resized-image.ashx/__size/550x0/__key/CommunityServer.Discussions.Components.Files/7/2727.Log1.JPG" alt="" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;You can compare revisions to see the changes.&lt;/p&gt;
&lt;p&gt;&lt;a href="http://arachnode.net/cfs-file.ashx/__key/CommunityServer.Discussions.Components.Files/7/3175.Log2.JPG"&gt;&lt;img border="0" src="http://arachnode.net/resized-image.ashx/__size/550x0/__key/CommunityServer.Discussions.Components.Files/7/3175.Log2.JPG" alt="" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;To store the WebPage Source in the database you need to set ApplicationSettings.InsertWebPageSource = true, and (optionally) set ApplicationSettings.SaveDiscoveredWebPagesToDisk = false.&lt;/p&gt;
&lt;p&gt;That should be it.&lt;/p&gt;
&lt;p&gt;Whenever a modification needs to be made to AN that is for the good of AN, and for others, I am happy to commit to the modification. &amp;nbsp;This includes bug fixes and features such as this. &amp;nbsp;When in doubt, please ask. &amp;nbsp;&lt;img src="http://arachnode.net/emoticons/emotion-2.gif" alt="Big Smile" /&gt;&lt;/p&gt;
&lt;p&gt;You are very welcome.&lt;/p&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;</description></item><item><title>Re: How to reduce storage requirements...</title><link>http://arachnode.net/forums/thread/15597.aspx</link><pubDate>Fri, 15 Jul 2011 22:57:23 GMT</pubDate><guid isPermaLink="false">a2478770-777f-41ab-83b8-a21ff47ebb1f:15597</guid><dc:creator>ptrennum</dc:creator><slash:comments>0</slash:comments><comments>http://arachnode.net/forums/thread/15597.aspx</comments><wfw:commentRss>http://arachnode.net/forums/commentrss.aspx?SectionID=7&amp;PostID=15597</wfw:commentRss><description>&lt;p&gt;awesome thanks!!&amp;nbsp; So the console project is where the changes have been made then?&amp;nbsp; Should I just be able to run the console and it will now put everything in DB instead of file system?&amp;nbsp; Are there any changes to the Web project as well?&lt;/p&gt;
&lt;p&gt;thanks again&lt;/p&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;</description></item><item><title>Re: How to reduce storage requirements...</title><link>http://arachnode.net/forums/thread/15596.aspx</link><pubDate>Fri, 15 Jul 2011 21:59:42 GMT</pubDate><guid isPermaLink="false">a2478770-777f-41ab-83b8-a21ff47ebb1f:15596</guid><dc:creator>arachnode.net</dc:creator><slash:comments>0</slash:comments><comments>http://arachnode.net/forums/thread/15596.aspx</comments><wfw:commentRss>http://arachnode.net/forums/commentrss.aspx?SectionID=7&amp;PostID=15596</wfw:commentRss><description>&lt;p&gt;Checked in. &amp;nbsp;&lt;img src="http://arachnode.net/emoticons/emotion-2.gif" alt="Big Smile" /&gt;&lt;/p&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;</description></item><item><title>Re: How to reduce storage requirements...</title><link>http://arachnode.net/forums/thread/15595.aspx</link><pubDate>Thu, 14 Jul 2011 22:56:23 GMT</pubDate><guid isPermaLink="false">a2478770-777f-41ab-83b8-a21ff47ebb1f:15595</guid><dc:creator>arachnode.net</dc:creator><slash:comments>0</slash:comments><comments>http://arachnode.net/forums/thread/15595.aspx</comments><wfw:commentRss>http://arachnode.net/forums/commentrss.aspx?SectionID=7&amp;PostID=15595</wfw:commentRss><description>&lt;p&gt;OK.&lt;/p&gt;
&lt;p&gt;Yes. &amp;nbsp;I will test it tonight and tomorrow and likely check it in tomorrow evening.&lt;/p&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;</description></item><item><title>Re: How to reduce storage requirements...</title><link>http://arachnode.net/forums/thread/15591.aspx</link><pubDate>Thu, 14 Jul 2011 21:31:15 GMT</pubDate><guid isPermaLink="false">a2478770-777f-41ab-83b8-a21ff47ebb1f:15591</guid><dc:creator>ptrennum</dc:creator><slash:comments>0</slash:comments><comments>http://arachnode.net/forums/thread/15591.aspx</comments><wfw:commentRss>http://arachnode.net/forums/commentrss.aspx?SectionID=7&amp;PostID=15591</wfw:commentRss><description>&lt;p&gt;1) sorry I had only collected 1 webpage for the size&lt;/p&gt;
&lt;p&gt;So you will make the code changes for storing webpages in the DB and I can pull it down from SVN?&lt;/p&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;</description></item><item><title>Re: How to reduce storage requirements...</title><link>http://arachnode.net/forums/thread/15590.aspx</link><pubDate>Thu, 14 Jul 2011 21:17:16 GMT</pubDate><guid isPermaLink="false">a2478770-777f-41ab-83b8-a21ff47ebb1f:15590</guid><dc:creator>arachnode.net</dc:creator><slash:comments>0</slash:comments><comments>http://arachnode.net/forums/thread/15590.aspx</comments><wfw:commentRss>http://arachnode.net/forums/commentrss.aspx?SectionID=7&amp;PostID=15590</wfw:commentRss><description>&lt;p&gt;1.) I mean, how many webpages have you collected for the size you have on disk? &amp;nbsp;&lt;img src="http://arachnode.net/emoticons/emotion-2.gif" alt="Big Smile" /&gt;&lt;/p&gt;
&lt;p&gt;2.) This may be a viable option, storing webpages in the DB.&lt;/p&gt;
&lt;p&gt;3.) Cool. &amp;nbsp;OK.&lt;/p&gt;
&lt;p&gt;The modification? &amp;nbsp;I will make it for you as this is a nice feature.&lt;/p&gt;
&lt;p&gt;It will first examine the filesystem for the Discovery, then examine the DB, and, if the Discovery isn&amp;#39;t found, finally report the missing Discovery to the user and report the exception to the database.&lt;/p&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;</description></item><item><title>Re: How to reduce storage requirements...</title><link>http://arachnode.net/forums/thread/15588.aspx</link><pubDate>Thu, 14 Jul 2011 16:59:56 GMT</pubDate><guid isPermaLink="false">a2478770-777f-41ab-83b8-a21ff47ebb1f:15588</guid><dc:creator>ptrennum</dc:creator><slash:comments>0</slash:comments><comments>http://arachnode.net/forums/thread/15588.aspx</comments><wfw:commentRss>http://arachnode.net/forums/commentrss.aspx?SectionID=7&amp;PostID=15588</wfw:commentRss><description>&lt;p&gt;Sorry about the delayed response here.&lt;/p&gt;
&lt;p&gt;1) would be indexing about 10-15 webpages total&lt;/p&gt;
&lt;p&gt;2) to allow users to perform searches on the 10-15 webpage index while updating the index potentially twice a year&lt;/p&gt;
&lt;p&gt;3) will do&lt;/p&gt;
&lt;p&gt;What is the code modification to elect to insert WebPage source and read from the DB?&amp;nbsp; I know there is at least one config change but there must be more...&lt;/p&gt;
&lt;p&gt;Thanks!!&lt;/p&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;</description></item><item><title>Re: How to reduce storage requirements...</title><link>http://arachnode.net/forums/thread/15560.aspx</link><pubDate>Wed, 06 Jul 2011 21:14:44 GMT</pubDate><guid isPermaLink="false">a2478770-777f-41ab-83b8-a21ff47ebb1f:15560</guid><dc:creator>arachnode.net</dc:creator><slash:comments>0</slash:comments><comments>http://arachnode.net/forums/thread/15560.aspx</comments><wfw:commentRss>http://arachnode.net/forums/commentrss.aspx?SectionID=7&amp;PostID=15560</wfw:commentRss><description>&lt;p&gt;Compress: Let Windows compress the contents of the files. &amp;nbsp;You can always experiment with things like this as well:&lt;/p&gt;
&lt;p&gt;&lt;a href="http://support.microsoft.com/kb/307987"&gt;http://support.microsoft.com/kb/307987&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="http://en.wikipedia.org/wiki/Data_cluster"&gt;http://en.wikipedia.org/wiki/Data_cluster&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="http://support.microsoft.com/kb/140365"&gt;http://support.microsoft.com/kb/140365&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Yes, you do need to store the WebPages else Lucene.Net won&amp;#39;t have anything to summarize. &amp;nbsp;The text of the pages isn&amp;#39;t stored in Lucene.Net as my testing indicated that searching was much faster when retrieving the WebPage from disk rather than extracting from the index itself. &amp;nbsp;And, optimizing the index takes significantly longer when you have large amounts of data, as you would expect.&lt;/p&gt;
&lt;p&gt;As an alternative to storing the WebPages on disk, you could elect to insert the WebPage source and read from the database (requires code modification). &amp;nbsp;This method will make better use of disk space (due to cluster allocation), but in my experience, personally and professionally, once the database table reaches N rows (depends on your system) the table becomes unmanageable (very subjective, I know), and operations such as re-organize and re-build simply take too long.&lt;/p&gt;
&lt;p&gt;As an example, in 2008 I was aware of a 400 million row table that stored the text of posts (not complete webpages) on a 16-way server with 64GB of RAM and a 50 drive SAN. &amp;nbsp;It took 24 hours to re-organize the index, and it was clear that this particular implementation would not sustain the current rate of growth.&lt;/p&gt;
&lt;p&gt;However, if you are building static indexes, this solution may work for you.&lt;/p&gt;
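&lt;p&gt;On the cluster allocation point: each file on disk occupies a whole number of clusters, so many small files waste the unused tail of their last cluster. &amp;nbsp;The sketch below shows the arithmetic only. &amp;nbsp;The 4096 byte cluster size and the file sizes are assumptions for illustration, and it ignores filesystem details such as very small files being stored inside the MFT.&lt;/p&gt;

```python
def size_on_disk(file_sizes, cluster_size=4096):
    """Sum the space files occupy when each is rounded up to whole clusters."""
    total = 0
    for size in file_sizes:
        clusters = -(-size // cluster_size)  # ceiling division
        total += clusters * cluster_size
    return total


# Hypothetical file sizes in bytes; 4096 is an assumed cluster size.
sizes = [1200, 5000, 300, 8192]
logical = sum(sizes)
allocated = size_on_disk(sizes)

# Logical size, allocated size, and the slack lost to cluster rounding.
print(logical, allocated, allocated - logical)
```

&lt;p&gt;The smaller the average page relative to the cluster size, the larger the share of allocated space that is slack.&lt;/p&gt;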
&lt;p&gt;1.) How many WebPages total?&lt;/p&gt;
&lt;p&gt;2.) What is the desired implementation of AN?&lt;/p&gt;
&lt;p&gt;3.) When you do get the index back up to size, do take a screenshot, please.&lt;/p&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;</description></item><item><title>Re: How to reduce storage requirements...</title><link>http://arachnode.net/forums/thread/15559.aspx</link><pubDate>Wed, 06 Jul 2011 17:52:22 GMT</pubDate><guid isPermaLink="false">a2478770-777f-41ab-83b8-a21ff47ebb1f:15559</guid><dc:creator>ptrennum</dc:creator><slash:comments>0</slash:comments><comments>http://arachnode.net/forums/thread/15559.aspx</comments><wfw:commentRss>http://arachnode.net/forums/commentrss.aspx?SectionID=7&amp;PostID=15559</wfw:commentRss><description>&lt;p&gt;The webpages directory is about 6GB and the Lucene.Net indexes are about 1.5GB&lt;/p&gt;
&lt;p&gt;I&amp;#39;m not sure what you mean by compress the WebPages directory?&amp;nbsp; &lt;/p&gt;
&lt;p&gt;I will make the autogrow less than 1GB thanks!&lt;/p&gt;
&lt;p&gt;I would take a screenshot but I have just restored everything to an older (smaller) version as I completely ran out of space on my server.&lt;/p&gt;
&lt;p&gt;I&amp;#39;m trying to figure out why I need the WebPages to be saved?&amp;nbsp; I tried changing the config to insertwebpages but when I ran a search no results were returned.&amp;nbsp; As soon as I set it back to save webpages the search results worked again.&amp;nbsp; Do I have to save webpages in order for the search to work?&lt;/p&gt;
&lt;p&gt;Thanks!&lt;/p&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;</description></item><item><title>Re: How to reduce storage requirements...</title><link>http://arachnode.net/forums/thread/15558.aspx</link><pubDate>Tue, 05 Jul 2011 22:15:41 GMT</pubDate><guid isPermaLink="false">a2478770-777f-41ab-83b8-a21ff47ebb1f:15558</guid><dc:creator>arachnode.net</dc:creator><slash:comments>0</slash:comments><comments>http://arachnode.net/forums/thread/15558.aspx</comments><wfw:commentRss>http://arachnode.net/forums/commentrss.aspx?SectionID=7&amp;PostID=15558</wfw:commentRss><description>&lt;p&gt;What are the distinct sizes of the WebPages directory and the Lucene.NET indexes?&lt;/p&gt;
&lt;p&gt;The only thing you can really do to reduce the storage for WebPages is to compress the directory.&lt;/p&gt;
&lt;p&gt;You can also set the AutoGrow to something less than 1GB for the DB files. &amp;nbsp;(this will free up space on your drive(s) too...)&lt;/p&gt;
&lt;p&gt;Would you please take a screenshot of the Lucene.NET directory?&lt;/p&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;</description></item><item><title>How to reduce storage requirements...</title><link>http://arachnode.net/forums/thread/15557.aspx</link><pubDate>Tue, 05 Jul 2011 18:39:37 GMT</pubDate><guid isPermaLink="false">a2478770-777f-41ab-83b8-a21ff47ebb1f:15557</guid><dc:creator>ptrennum</dc:creator><slash:comments>0</slash:comments><comments>http://arachnode.net/forums/thread/15557.aspx</comments><wfw:commentRss>http://arachnode.net/forums/commentrss.aspx?SectionID=7&amp;PostID=15557</wfw:commentRss><description>&lt;p&gt;I am wondering what options are available to reduce the amount of storage required for saving webpages etc.&amp;nbsp; Right now after indexing two sites I am at about 7-8GB between saved webpages and Lucene indexes.&lt;/p&gt;
&lt;p&gt;I am not storing images or anything else other than the webpages and the lucene indexes.&amp;nbsp; Basically I want to be able to continue indexing and allowing for searches on my indexes without getting up to huge storage numbers if possible.&lt;/p&gt;
&lt;p&gt;Thanks!&lt;/p&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;</description></item></channel></rss>