<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://arachnode.net/utility/FeedStylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>General Questions</title><link>http://arachnode.net/forums/7.aspx</link><description /><dc:language>en</dc:language><generator>CommunityServer 2008.5 SP1 (Debug Build: 31106.3070)</generator><item><title>Re: can I start the Crawl Action thrue a web site</title><link>http://arachnode.net/forums/thread/12696.aspx</link><pubDate>Thu, 17 Jun 2010 13:35:16 GMT</pubDate><guid isPermaLink="false">a2478770-777f-41ab-83b8-a21ff47ebb1f:12696</guid><dc:creator>arachnode.net</dc:creator><slash:comments>0</slash:comments><comments>http://arachnode.net/forums/thread/12696.aspx</comments><wfw:commentRss>http://arachnode.net/forums/commentrss.aspx?SectionID=7&amp;PostID=12696</wfw:commentRss><description>&lt;p&gt;OK, got it.&amp;nbsp; As an alternative, so you don&amp;#39;t have to worry about oddities calling AN from a WebPage... would it be feasible for you to call the Console via Process.Start(...);?&lt;/p&gt;
&lt;p&gt;I will take a look at your code and see if I can make a RegEx that covers &amp;#39;onclick&amp;#39; events.&lt;/p&gt;
&lt;p&gt;-Mike&lt;/p&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;</description></item><item><title>Re: can I start the Crawl Action thrue a web site</title><link>http://arachnode.net/forums/thread/12695.aspx</link><pubDate>Thu, 17 Jun 2010 07:24:30 GMT</pubDate><guid isPermaLink="false">a2478770-777f-41ab-83b8-a21ff47ebb1f:12695</guid><dc:creator>flash</dc:creator><slash:comments>0</slash:comments><comments>http://arachnode.net/forums/thread/12695.aspx</comments><wfw:commentRss>http://arachnode.net/forums/commentrss.aspx?SectionID=7&amp;PostID=12695</wfw:commentRss><description>&lt;p&gt;It is just that we don&amp;#39;t want to have anything open on the computer which launches AN, and we want it to happen automatically at a set interval (every XXX).&lt;/p&gt;
&lt;p&gt;I will take a look at the things you suggested, and if I have any further problems I will contact you.&lt;br /&gt;&lt;br /&gt;About the code change:&lt;br /&gt;I didn&amp;#39;t try to create an additional regex or anything like that. The reason is that I totally suck at regex, so I just copied a regex from the ASP.NET forums and changed the values to fit ;)&lt;br /&gt;&lt;/p&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;</description></item><item><title>Re: can I start the Crawl Action thrue a web site</title><link>http://arachnode.net/forums/thread/12694.aspx</link><pubDate>Wed, 16 Jun 2010 16:00:07 GMT</pubDate><guid isPermaLink="false">a2478770-777f-41ab-83b8-a21ff47ebb1f:12694</guid><dc:creator>arachnode.net</dc:creator><slash:comments>0</slash:comments><comments>http://arachnode.net/forums/thread/12694.aspx</comments><wfw:commentRss>http://arachnode.net/forums/commentrss.aspx?SectionID=7&amp;PostID=12694</wfw:commentRss><description>&lt;p&gt;Is there any particular reason why you need to run AN from a WebPage?&amp;nbsp; It can be done, but IMO AN runs best from a console or from the included Service.&amp;nbsp; I believe it would be best to run AN from&amp;nbsp;the included Service, which could check your webpage for a &amp;#39;start&amp;#39; value, and then start crawling.&amp;nbsp; The Service implementation has been much more thoroughly tested than running AN from an ASP.NET page.&amp;nbsp; Thoughts?&lt;/p&gt;
&lt;p&gt;Take a look at Application (may not be in the solution but will be on disk...)&amp;nbsp; This shows you how to run AN from a WebPage.&amp;nbsp; It can be a bit tricky, based on your timeouts, but if you run AN in a background thread all should work fine.&lt;/p&gt;
&lt;p&gt;Thanks for sharing your modification.&lt;/p&gt;
&lt;p&gt;Have you tried running any perf tests on your code?&amp;nbsp; It may be simpler/faster to create an additional RegEx that matches &amp;quot;onclick&amp;quot;, or simply &amp;quot;href=&amp;quot; (after filtering) than running index checks.&lt;/p&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;</description></item><item><title>Re: can I start the Crawl Action thrue a web site</title><link>http://arachnode.net/forums/thread/12693.aspx</link><pubDate>Wed, 16 Jun 2010 15:31:02 GMT</pubDate><guid isPermaLink="false">a2478770-777f-41ab-83b8-a21ff47ebb1f:12693</guid><dc:creator>flash</dc:creator><slash:comments>0</slash:comments><comments>http://arachnode.net/forums/thread/12693.aspx</comments><wfw:commentRss>http://arachnode.net/forums/commentrss.aspx?SectionID=7&amp;PostID=12693</wfw:commentRss><description>&lt;p&gt;&lt;span style="text-decoration:underline;"&gt;What I am trying to do&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;1.&lt;br /&gt;I am trying to create an ASP.NET page&lt;br /&gt;which would be called by some application at a regular interval (probably once a day, not sure yet though)&lt;br /&gt;2.&lt;br /&gt;The ASP.NET page would launch AN on a few sites&lt;br /&gt;(the AN code would be in a different class library, and all that would be in the ASP.NET page would be SomeClassLibrary.Run();)&lt;br /&gt;3.&lt;br /&gt;The class library would check a list of sites and crawl them&lt;br /&gt;4.&lt;br /&gt;When AN finishes its job, I would perform my own actions on the data from the AN databases and insert it, in a different form better suited to my domain model, into a different database&lt;/p&gt;
&lt;p&gt;&lt;span style="text-decoration:underline;"&gt;About the DiscoveryManager change:&lt;/span&gt;&lt;br /&gt;There was also an issue where AN didn&amp;#39;t give me all link results. After a look in DiscoveryManager, I saw that it doesn&amp;#39;t retrieve hyperlinks which contain &amp;quot;onclick&amp;quot; and some other attributes,&lt;br /&gt;so I changed the regex to the following:&lt;br /&gt;@&amp;quot;&amp;lt;a[\s]+[^&amp;gt;]*?href[\s]?=[\s\&amp;quot;&amp;quot;\&amp;#39;]+(.*?)[\&amp;quot;&amp;quot;\&amp;#39;]+.*?&amp;gt;([^&amp;lt;]+|.*?)?&amp;lt;\/a&amp;gt;&amp;quot;&lt;br /&gt;&lt;br /&gt;which now gives me the FULL &amp;lt;a href /&amp;gt; instead of only the href part,&lt;br /&gt;so in the AssignHyperLinkDiscoveries function, after &lt;br /&gt;&amp;quot;if (!match.Value.ToLower().StartsWith(&amp;quot;&amp;lt;script&amp;quot;))&lt;br /&gt;{ &amp;quot;... &lt;br /&gt;I added:&lt;br /&gt;string value = FixMatchValueBug(match.Value); // this gives you &amp;quot;href:LINK&amp;quot; like you had for match.value&lt;br /&gt;string groupValue = value.Replace(&amp;quot;href=&amp;quot;, &amp;quot;&amp;quot;).Replace(&amp;quot;&amp;#39;&amp;quot;,&amp;quot;&amp;quot;).Replace(&amp;#39;&amp;quot;&amp;#39;,&amp;#39; &amp;#39;).Trim(); // this gives you only the LINK like you had for &lt;br /&gt;&lt;br /&gt;I also added the following function:&lt;/p&gt;
&lt;p&gt;//The next part is a bit messy; it is the function that returns the &amp;quot;href:LINK&amp;quot;&lt;br /&gt;private static string FixMatchValueBug(string value)&lt;br /&gt;{&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; int nHrefStartIndex = value.IndexOf(&amp;quot;href=&amp;quot;); // gets the starting index of href &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; char perfix = value.ToCharArray()[nHrefStartIndex + &amp;quot;href=&amp;quot;.Length]; //gets the char after href= to see if it is &amp;#39; or &amp;quot;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; int nHrefEndIndex = value.IndexOf(perfix, nHrefStartIndex + &amp;quot;href=&amp;quot;.Length + 1); //gets the href=LINK ending index&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; return value.Substring(nHrefStartIndex, nHrefEndIndex - nHrefStartIndex); //returns the string from href up to its end&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;and lastly, I changed the following 2 lines in the AssignHyperLinkDiscoveries function&lt;br /&gt;from:&lt;br /&gt;if (Uri.TryCreate(match.Groups[&amp;quot;HyperLink&amp;quot;].Value.TrimEnd(&amp;#39;/&amp;#39;), UriKind.RelativeOrAbsolute, out hyperLinkDiscovery))&lt;br /&gt;crawlRequest.Tag = match.Value;&lt;br /&gt;to:&lt;br /&gt;if (Uri.TryCreate(groupValue.TrimEnd(&amp;#39;/&amp;#39;), UriKind.RelativeOrAbsolute, out hyperLinkDiscovery))&lt;br /&gt;crawlRequest.Tag = value;&lt;/p&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;</description></item><item><title>Re: can I start the Crawl Action thrue a web site</title><link>http://arachnode.net/forums/thread/12692.aspx</link><pubDate>Wed, 16 Jun 2010 15:13:42 GMT</pubDate><guid 
isPermaLink="false">a2478770-777f-41ab-83b8-a21ff47ebb1f:12692</guid><dc:creator>arachnode.net</dc:creator><slash:comments>0</slash:comments><comments>http://arachnode.net/forums/thread/12692.aspx</comments><wfw:commentRss>http://arachnode.net/forums/commentrss.aspx?SectionID=7&amp;PostID=12692</wfw:commentRss><description>&lt;p&gt;I am completely confused as to what you are trying to do.&amp;nbsp; Could you write out the steps (1. 2. 3.) as to what you need to accomplish?&lt;/p&gt;
&lt;p&gt;Thanks!&lt;/p&gt;
&lt;p&gt;Mike&lt;/p&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;</description></item><item><title>Re: can I start the Crawl Action thrue a web site</title><link>http://arachnode.net/forums/thread/12691.aspx</link><pubDate>Wed, 16 Jun 2010 15:12:05 GMT</pubDate><guid isPermaLink="false">a2478770-777f-41ab-83b8-a21ff47ebb1f:12691</guid><dc:creator>arachnode.net</dc:creator><slash:comments>0</slash:comments><comments>http://arachnode.net/forums/thread/12691.aspx</comments><wfw:commentRss>http://arachnode.net/forums/commentrss.aspx?SectionID=7&amp;PostID=12691</wfw:commentRss><description>&lt;p&gt;OK, would you mind sharing the RegEx you created?&lt;/p&gt;
&lt;p&gt;You don&amp;#39;t need to explicitly wire the Engine events.&lt;/p&gt;
&lt;p&gt;I need to read your other posts, trying to figure out what you are trying to do...&lt;/p&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;</description></item><item><title>Re: can I start the Crawl Action thrue a web site</title><link>http://arachnode.net/forums/thread/12690.aspx</link><pubDate>Wed, 16 Jun 2010 14:25:09 GMT</pubDate><guid isPermaLink="false">a2478770-777f-41ab-83b8-a21ff47ebb1f:12690</guid><dc:creator>arachnode.net</dc:creator><slash:comments>0</slash:comments><comments>http://arachnode.net/forums/thread/12690.aspx</comments><wfw:commentRss>http://arachnode.net/forums/commentrss.aspx?SectionID=7&amp;PostID=12690</wfw:commentRss><description>&lt;p&gt;Thank you very much.&amp;nbsp; This clears it up!&amp;nbsp; &lt;img src="http://arachnode.net/emoticons/emotion-2.gif" alt="Big Smile" /&gt;&lt;/p&gt;
&lt;p&gt;I will answer your questions in :45.&lt;/p&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;</description></item><item><title>Re: can I start the Crawl Action thrue a web site</title><link>http://arachnode.net/forums/thread/12689.aspx</link><pubDate>Wed, 16 Jun 2010 14:20:22 GMT</pubDate><guid isPermaLink="false">a2478770-777f-41ab-83b8-a21ff47ebb1f:12689</guid><dc:creator>flash</dc:creator><slash:comments>0</slash:comments><comments>http://arachnode.net/forums/thread/12689.aspx</comments><wfw:commentRss>http://arachnode.net/forums/commentrss.aspx?SectionID=7&amp;PostID=12689</wfw:commentRss><description>&lt;p&gt;This is my client account.&lt;br /&gt;itayeng = flash, same user&lt;br /&gt;itayeng is the developer who works for the site that needs Arachnode&lt;br /&gt;flash is the client who bought the application&lt;/p&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;</description></item><item><title>Re: can I start the Crawl Action thrue a web site</title><link>http://arachnode.net/forums/thread/12688.aspx</link><pubDate>Wed, 16 Jun 2010 14:17:18 GMT</pubDate><guid isPermaLink="false">a2478770-777f-41ab-83b8-a21ff47ebb1f:12688</guid><dc:creator>arachnode.net</dc:creator><slash:comments>0</slash:comments><comments>http://arachnode.net/forums/thread/12688.aspx</comments><wfw:commentRss>http://arachnode.net/forums/commentrss.aspx?SectionID=7&amp;PostID=12688</wfw:commentRss><description>&lt;p&gt;How do you have the code for DiscoveryManager.cs?&lt;/p&gt;
&lt;p&gt;I will help you, but please tell me how you have the code inside the SiteCrawler project.&lt;/p&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;</description></item><item><title>Re: can I start the Crawl Action thrue a web site</title><link>http://arachnode.net/forums/thread/12687.aspx</link><pubDate>Wed, 16 Jun 2010 14:05:38 GMT</pubDate><guid isPermaLink="false">a2478770-777f-41ab-83b8-a21ff47ebb1f:12687</guid><dc:creator>itayeng</dc:creator><slash:comments>0</slash:comments><comments>http://arachnode.net/forums/thread/12687.aspx</comments><wfw:commentRss>http://arachnode.net/forums/commentrss.aspx?SectionID=7&amp;PostID=12687</wfw:commentRss><description>&lt;p&gt;Thank you for the quick reply&lt;/p&gt;
&lt;p&gt;In SiteCrawler the only change I made was the previous one:&lt;br /&gt;in the DiscoveryManager I changed the regex of the hyperlink,&lt;br /&gt;and in the hyperlink matches function I added a part which cuts out the parts you used (because my regex got the whole &amp;quot;&amp;lt;a href ... *** &amp;gt; &amp;lt;/a&amp;gt;&amp;quot; and yours copied only the &amp;quot;href=***&amp;quot;, so I cut the values out and pasted them in instead)&lt;br /&gt;&lt;br /&gt;Except for that I didn&amp;#39;t touch SiteCrawler, but I opened a new ClassLibrary and copied the Program file and App.config from the console application; in the Program file I commented out the console parts&lt;br /&gt;&lt;br /&gt;In the Program (the copied one) I also commented out _crawler.Engine.OnCrawlRequestCompleted += Engine_OnCrawlRequestCompleted; and the Engine_OnCrawlRequestCompleted function (do I need to keep this?)&lt;br /&gt;&lt;br /&gt;Other than that, I call the function from my ASP.NET page and let it run&lt;br /&gt;If you think that is the case then I can paste you the changed parts if you wish, or just roll back to the original version, although the change in DiscoveryManager is quite important and I really don&amp;#39;t think it is the cause&lt;br /&gt;&lt;br /&gt;Also, as far as I checked,&lt;br /&gt;the original console project runs smoothly.&lt;br /&gt;&lt;br /&gt;Thank you for your time and patience&lt;/p&gt;
&lt;p&gt;Itay&lt;/p&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;</description></item><item><title>Re: can I start the Crawl Action thrue a web site</title><link>http://arachnode.net/forums/thread/12686.aspx</link><pubDate>Wed, 16 Jun 2010 13:42:58 GMT</pubDate><guid isPermaLink="false">a2478770-777f-41ab-83b8-a21ff47ebb1f:12686</guid><dc:creator>arachnode.net</dc:creator><slash:comments>0</slash:comments><comments>http://arachnode.net/forums/thread/12686.aspx</comments><wfw:commentRss>http://arachnode.net/forums/commentrss.aspx?SectionID=7&amp;PostID=12686</wfw:commentRss><description>&lt;p&gt;I will answer this question after you explain how you modified SiteCrawler.dll.&lt;/p&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;</description></item><item><title>can I start the Crawl Action thrue a web site</title><link>http://arachnode.net/forums/thread/12685.aspx</link><pubDate>Wed, 16 Jun 2010 11:45:28 GMT</pubDate><guid isPermaLink="false">a2478770-777f-41ab-83b8-a21ff47ebb1f:12685</guid><dc:creator>itayeng</dc:creator><slash:comments>0</slash:comments><comments>http://arachnode.net/forums/thread/12685.aspx</comments><wfw:commentRss>http://arachnode.net/forums/commentrss.aspx?SectionID=7&amp;PostID=12685</wfw:commentRss><description>&lt;p&gt;Can I post to the crawler through a web page to start crawling and, once it has started, return a different page to the client (with my results after I have fetched them)?&lt;br /&gt;&lt;br /&gt;I ask because I tried to create an application which is almost the same as the console one, just without the &amp;#39;console&amp;#39; stuff (I also retrieved all references), and when I send the request it seems to get to engine.stop() instantly without crawling anything,&lt;br /&gt;&lt;br /&gt;while if I test it in the console app it crawls a lot of stuff before it actually gets there&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;</description></item></channel></rss>