<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://arachnode.net/utility/FeedStylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>Feature Requests</title><link>http://arachnode.net/forums/9.aspx</link><description /><dc:language>en</dc:language><generator>CommunityServer 2008.5 SP1 (Debug Build: 31106.3070)</generator><item><title>Re: Handling content type application/download</title><link>http://arachnode.net/forums/thread/15095.aspx</link><pubDate>Sat, 16 Apr 2011 19:51:33 GMT</pubDate><guid isPermaLink="false">a2478770-777f-41ab-83b8-a21ff47ebb1f:15095</guid><dc:creator>arachnode.net</dc:creator><slash:comments>0</slash:comments><comments>http://arachnode.net/forums/thread/15095.aspx</comments><wfw:commentRss>http://arachnode.net/forums/commentrss.aspx?SectionID=9&amp;PostID=15095</wfw:commentRss><description>&lt;p&gt;Yes, it would be nice if there were a 1 to 1 mapping of content-types to extensions, but as you know, this isn&amp;#39;t the case. &amp;nbsp;Oh, well - the internet works pretty well without that. 
&amp;nbsp;&lt;img src="http://arachnode.net/emoticons/emotion-2.gif" alt="Big Smile" /&gt;&lt;/p&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;</description></item><item><title>Re: Handling content type application/download</title><link>http://arachnode.net/forums/thread/15094.aspx</link><pubDate>Fri, 15 Apr 2011 20:50:56 GMT</pubDate><guid isPermaLink="false">a2478770-777f-41ab-83b8-a21ff47ebb1f:15094</guid><dc:creator>ucg</dc:creator><slash:comments>0</slash:comments><comments>http://arachnode.net/forums/thread/15094.aspx</comments><wfw:commentRss>http://arachnode.net/forums/commentrss.aspx?SectionID=9&amp;PostID=15094</wfw:commentRss><description>&lt;p&gt;Thanks, that got it going.&amp;nbsp; I had to add &amp;quot;application/download&amp;quot; to the content types table first.&amp;nbsp; &lt;/p&gt;
&lt;p&gt;One issue that I had to work around is that the Arachnode file is output with the extension set in the AllowedDataTypes column &amp;quot;FullTextIndexTypes&amp;quot;, regardless of the type of data being downloaded.&amp;nbsp; For my application I set it to &amp;quot;.dat&amp;quot;.&amp;nbsp; To get real output names I already have a plugin that generates a table mapping the Arachnode &amp;quot;hash&amp;quot; name to the Discovery URI (or &amp;quot;crawlRequest.WebClient.HttpWebResponse.ResponseUri&amp;quot;, which more accurately reflects the returned file) for post-crawl moving and renaming of data.&amp;nbsp; It is a post-request plugin that builds entries during the crawl.&amp;nbsp; That had to be altered to record the file name given in the &amp;quot;Content-Disposition&amp;quot; part of the header so the original data type would not be lost.&amp;nbsp; &lt;/p&gt;
&lt;p&gt;Like this...&lt;/p&gt;
&lt;p&gt;&amp;nbsp; WebHeaderCollection whc = (crawlRequest.WebClient.HttpWebResponse != null)? crawlRequest.WebClient.HttpWebResponse.Headers : null;&lt;br /&gt;&amp;nbsp; string localcontenttype = ((whc != null) &amp;amp;&amp;amp; (whc.Get(&amp;quot;Content-Type&amp;quot;) != null)) ? whc.Get(&amp;quot;Content-Type&amp;quot;).ToLower() : null;&lt;br /&gt;&amp;nbsp; string localfilename = ((localcontenttype != null) &amp;amp;&amp;amp; localcontenttype.Equals(&amp;quot;application/download&amp;quot;)) ? whc.Get(&amp;quot;Content-Disposition&amp;quot;) : null;&lt;br /&gt;&amp;nbsp; if(localfilename != null) localfilename = Regex.Replace(localfilename,&amp;quot;.*?filename=[\\\&amp;quot;]?([^\\\&amp;quot;]*).*&amp;quot;,&amp;quot;$1&amp;quot;);&lt;/p&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;</description></item><item><title>Re: Handling content type application/download</title><link>http://arachnode.net/forums/thread/15093.aspx</link><pubDate>Fri, 08 Apr 2011 17:34:00 GMT</pubDate><guid isPermaLink="false">a2478770-777f-41ab-83b8-a21ff47ebb1f:15093</guid><dc:creator>arachnode.net</dc:creator><slash:comments>0</slash:comments><comments>http://arachnode.net/forums/thread/15093.aspx</comments><wfw:commentRss>http://arachnode.net/forums/commentrss.aspx?SectionID=9&amp;PostID=15093</wfw:commentRss><description>&lt;p&gt;You need to add the appropriate row to the cfg.AllowedDataTypes table.&lt;/p&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;</description></item><item><title>Handling content type application/download</title><link>http://arachnode.net/forums/thread/15092.aspx</link><pubDate>Fri, 08 Apr 2011 16:10:36 GMT</pubDate><guid isPermaLink="false">a2478770-777f-41ab-83b8-a21ff47ebb1f:15092</guid><dc:creator>ucg</dc:creator><slash:comments>0</slash:comments><comments>http://arachnode.net/forums/thread/15092.aspx</comments><wfw:commentRss>http://arachnode.net/forums/commentrss.aspx?SectionID=9&amp;PostID=15092</wfw:commentRss><description>&lt;p&gt;In attempting to gather data from a number of sites our spider has encountered a site that 
returns pages with the content type &amp;quot;application/download&amp;quot;.&amp;nbsp; The normal browser response is to prompt the user to view or save the file.&amp;nbsp; Currently it appears that Arachnode ignores this content type entirely.&amp;nbsp; Are there any plans to address this?&amp;nbsp; Any advice?&lt;/p&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;</description></item></channel></rss>