arachnode.net
An Open Source C# web crawler with Lucene.NET search using SQL Server 2008/2012/2014/2016/CE or MongoDB/RavenDB/Hadoop

Write with arachnode.net

Answered (Verified) This post has 1 verified answer | 16 Replies | 2 Followers

Top 10 Contributor
229 Posts
megetron posted on Thu, Feb 25 2010 2:23 AM

Hello,
Does arachnode.net know how to write data, or only how to read it?

As part of an SEO campaign, you sometimes want to post links to a specific website across the internet.

That means adding such links automatically to pages that accept user-submitted data (forums, comments, and so on).

Is this possible today, or would it be a new feature?

Thanks.

Answered (Verified) Verified Answer

Top 10 Contributor
1,905 Posts
Verified by megetron

Natively, AN doesn't write/post, but you could easily do so from a CrawlAction.

Good starting links...

  • WebClient.UploadData Method

    GetString(responseArray)) [C#] string uriString; Console.Write("\nPlease enter the URI to post data to {for example, http://www.contoso.com} : "); uriString ...
    msdn.microsoft.com/.../system.net.webclient.uploaddata(VS.71).aspx
  • C# and the Web: Writing a Web Client Application with Managed Code ...

    In order to build a Web client using managed code, the author had to build two custom classes in C# - HTTPWebRequest and HTTPWebResponse.
    msdn.microsoft.com/en-us/magazine/cc301587.aspx
  • Scott Hanselman's Computer Zen - HTTP POSTs and HTTP GETs with ...

    HTTP POSTs and HTTP GETs with WebClient and C# and Faking a PostBack ... With a POST the 'DATA' moves from the QueryString into the HTTP Body, ...
    www.hanselman.com/.../HTTPPOSTsAndHTTPGETsWithWebClientAndCAndFakingAPost...
  • Dave Amenta .com » Blog Archive » C#: WebClient Usage

    May 9, 2008 ... UploadString or WebClient.UploadData you can POST data to the server easily. ... C# WebClient Asynchronous call example: ...
    www.daveamenta.com/2008-05/c-webclient-usage/
  • Http Post in C#

    Apr 21, 2006 ... DownloadString(msSite); But this is not post. WebClient in C# allow in addition to download files from the web very easilly. Regards ...
    geekswithblogs.net/rakker/archive/2006/04/21/76044.aspx
  • Fake a form submission with C# WebClient - Stack Overflow

    Fake a form submission with C# WebClient .... The "string postData =" in the example above is the entire post data that you want to send. ...
    stackoverflow.com/.../fake-a-form-submission-with-c-webclient
  • Use HttpWebRequest To send POST HTTP request to another web server

    Dec 17, 2009 ... Another realted article about posting data to another web server and directing user to that site with it is is posted here Post Request To ...
    www.netomatix.com/httppostdata.aspx
  • C# & WebClient - POST/File upload problem - C# / C Sharp answers

    Dear Group, I'm currently developing a simple Windows application in C#, which is supposed to upload images - through 'WebClient' - into remote ...
    bytes.com › topicc sharpanswers
  • HTTP POST/WebClient (C#) and CSV file

    Back to RapNet Upload · Full HTTP POST/WebRequest Example (C#) and CSV file · HTTP POST/WebClient (C#) and CSV formated string; HTTP POST/WebClient (C#) and ...
    technet.rapaport.com/Info/LotUpload/.../WebClient_file.aspx
  • .NET Buzz Forum - HTTP POSTs and HTTP GETs with WebClient and C ...

    1 post - 1 author - Last post: Dec 5, 2004 Original Post: HTTP POSTs and HTTP GETs with WebClient and C# and Faking a PostBack. Feed Title: Scott Hanselman's ComputerZen.com ...

    - Mike

    For best service when you require assistance:

    1. Check the DisallowedAbsoluteUris and Exceptions tables first.
    2. Cut and paste actual exceptions from the Exceptions table.
    3. Include screenshots.

    Skype: arachnodedotnet

    All Replies

    Top 10 Contributor
    229 Posts

    Thanks!

    After browsing the links you sent, I found that WebClient.UploadValues is exactly what I need to make it happen.

    Top 10 Contributor
    229 Posts
    megetron replied on Fri, Apr 23 2010 12:52 AM

    Hi Mike,

    I keep trying to add write capabilities to AN; this could be a nice feature. It is needed as an alternative to writing directly to a database.

    The problem is that it just won't work. After following all of your links, I ended up with this test code:


        // Requires: using System.Collections.Specialized; using System.Net; using System.Text;
        string URLAuth = "http://karusela.net/2243_%d7%94%d7%a1%d7%9e%d7%95%d7%a8%d7%90%d7%99-%d7%94%d7%90%d7%97%d7%a8%d7%95%d7%9f.aspx";

        // Form fields are keyed by the controls' name attributes (the '$'-separated form).
        NameValueCollection formData = new NameValueCollection();
        formData["ctl00$PersonalizationManager1$WebPartManager1$wp961475462$wp927486962$dvwComment$txtAddedBy"] = "AddedBy";
        formData["ctl00$PersonalizationManager1$WebPartManager1$wp961475462$wp927486962$dvwComment$txtTitle"] = "Title";
        formData["ctl00$PersonalizationManager1$WebPartManager1$wp961475462$wp927486962$dvwComment$txtBody"] = "Body";
        formData["ctl00$PersonalizationManager1$WebPartManager1$wp961475462$wp927486962$dvwComment"] = "Insert$-1";

        string ResultAuthTicket;

        // 'using' disposes the WebClient even if UploadValues throws.
        using (WebClient webClient = new WebClient())
        {
            byte[] responseBytes = webClient.UploadValues(URLAuth, "POST", formData);
            ResultAuthTicket = Encoding.UTF8.GetString(responseBytes);
        }

     

    But it won't work.

    Can you see at a glance what is wrong?
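    When a POST like the one above is rejected, it can help to look at the request body that a form submission carries. The following is a minimal sketch of `application/x-www-form-urlencoded` encoding for a `NameValueCollection`. `FormEncoding` and `ToFormUrlEncoded` are hypothetical names, not part of arachnode.net or of `WebClient`, and `WebClient.UploadValues` uses its own encoder, whose output may differ in details such as how spaces are written.

```csharp
using System;
using System.Collections.Specialized;
using System.Text;

public static class FormEncoding
{
    // Hypothetical helper: joins name=value pairs with '&' and
    // percent-encodes both sides, so '$' in a field name becomes "%24".
    // Assumes every key in the collection is non-null.
    public static string ToFormUrlEncoded(NameValueCollection formData)
    {
        if (formData == null) throw new ArgumentNullException("formData");

        var body = new StringBuilder();
        foreach (string key in formData.AllKeys)
        {
            if (body.Length > 0) body.Append('&');
            body.Append(Uri.EscapeDataString(key));
            body.Append('=');
            body.Append(Uri.EscapeDataString(formData[key] ?? string.Empty));
        }
        return body.ToString();
    }
}
```

    Comparing this string with the body a real browser sends for the same form (as shown by any HTTP debugging proxy) shows which fields the hand-built request is missing.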

     

    Top 10 Contributor
    1,905 Posts

    Try the ID instead?  "ctl00_PersonalizationManager1_WebPartManager1_wp961475462_wp927486962_dvwComment_txtAddedBy"

    But that does look correct.  One thing I noticed when I was toying with the idea of adding a form submission plugin to AN was that many of the sites I attempted to interact with simply didn't work with the WebClient.  I came to the conclusion that the sites were able to detect the .NET WebClient and simply didn't allow POSTs.

    Also, are there multiple ways to POST to the web server?  The code for the actual site may require a post from a specific control.  There is beta code (it needs to be vetted by a few people first) for the Renderer functionality, which will allow you to fill in the form values on the actual webpage and then click whatever button you want.  This DOES work, and well.  Look for Renderer.cs.

    - Mike
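
    The field name in the test code and the ID quoted in this reply differ only in their separator: the name uses `$`, while the element ID uses `_`. A minimal sketch of that mapping follows, assuming the site keeps the default ASP.NET WebForms ID generation. `WebFormsNames` and `ToClientId` are hypothetical names, not part of arachnode.net.

```csharp
using System;

public static class WebFormsNames
{
    // Hypothetical helper: derives the default client-side id from a
    // WebForms control name by replacing the '$' separator with '_'.
    // Assumption: the page uses the default id generation.
    public static string ToClientId(string uniqueName)
    {
        if (uniqueName == null) throw new ArgumentNullException("uniqueName");
        return uniqueName.Replace('$', '_');
    }
}
```

    The reverse direction is ambiguous whenever a control's own ID contains an underscore, so the sketch converts one way only.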


    Top 10 Contributor
    229 Posts

    I cannot find that file.

    Is it in the 1.4 release version?

    Top 10 Contributor
    1,905 Posts

    Renderer.cs is in the SVN trunk ATM.

    - Mike


    Top 10 Contributor
    229 Posts

    This is a revolutionary statement: web browsers detect the WebRequest and do not allow POSTs... Do they do this because of spammers?

    Dealing with that is the webmaster's responsibility, so why block a WebRequest from posting data?

    I am now reading the Renderer code, and if I understand it correctly, you are simulating a user click by invoking the control?

    That is a very different way of doing it, and I am a little confused.

    Top 10 Contributor
    229 Posts

    I copied the beta code from the Renderer.cs file into a custom plugin.
    The Uri being used is exactly the same Uri you toyed with in the Renderer plugin.

    For some reason, crawlRequest.HtmlDocument is always null.

    I tried changing the crawlActionTypeID, but it still stays null.

    Can you please advise on that?

    Top 10 Contributor
    1,905 Posts

    Not web browsers, but web servers... and absolutely because of spam.

    Yes, with the Renderer, you use the actual IE control so it looks as though a human is interacting with the page.


    Top 10 Contributor
    1,905 Posts

    You have to change the Render type when creating a CrawlRequest and (for the time being) uncomment a section of code in Program.cs in the Console project.

     


    Top 10 Contributor
    229 Posts

    I had to construct the crawler with a true value for this to work.

    Thank you so much for your support.

    Building on the Renderer idea, I am still struggling to make the page above work.

        HtmlElement addedBy = crawlRequest.HtmlDocument.GetElementById("ctl00_PersonalizationManager1_WebPartManager1_wp961475462_wp927486962_dvwComment_txtAddedBy");
        addedBy.InnerText = "adde by";
        HtmlElement title = crawlRequest.HtmlDocument.GetElementById("ctl00_PersonalizationManager1_WebPartManager1_wp961475462_wp927486962_dvwComment_txtTitle");
        title.InnerText = "title";
        HtmlElement body = crawlRequest.HtmlDocument.GetElementById("ctl00_PersonalizationManager1_WebPartManager1_wp961475462_wp927486962_dvwComment_txtBody");
        body.InnerText = "Body messege";

        string originalHtmlDocumentBodyInnerHtml = crawlRequest.HtmlDocument.Body.InnerHtml;
        crawlRequest.HtmlDocument.InvokeScript("__doPostBack", new object[] { "ctl00$PersonalizationManager1$WebPartManager1$wp961475462$wp927486962$dvwComment", "Insert$-1" });
        RenderDecodedHtml(originalHtmlDocumentBodyInnerHtml, crawlRequest.HtmlDocument);

    I debugged it for almost an hour but still had difficulty finding the correct syntax.

    Please note that I am trying to invoke some JavaScript.

    Top 10 Contributor
    1,905 Posts

    Of course!

    Try backing off the complexity a bit and see if you can click a button on the page first, and then move on to the scripts.  (The IE control isn't exactly the most user-friendly piece of software to work with...)

    -Mike


    Top 10 Contributor
    229 Posts

    Once the complexity is backed off, it works fine...

    There are some problems with the WebBrowser mechanism, and I hope to sort them out.

    One of the problems I have encountered is that I cannot set a textarea, and I have tried all of these forms:

     

     

     

     

     

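        // Cast the underlying DOM element to the mshtml textarea type.
        mshtml.HTMLTextAreaElement AddedBy = (mshtml.HTMLTextAreaElement)crawlRequest.HtmlDocument.GetElementById("vB_Editor_QR_textarea").DomElement;

        // Look the element up through the All collection.
        crawlRequest.HtmlDocument.All["vB_Editor_QR_textarea"];

        // Set the content through the managed HtmlElement wrapper.
        {
            HtmlElement el = crawlRequest.HtmlDocument.GetElementById("vB_Editor_QR_textarea");
            el.InnerHtml = "sdfsdf";
            el.SetAttribute("value", "fghdfg");
        }

        // Assign the value through the mshtml element of the current form.
        ((mshtml.HTMLTextAreaElement)GetCurrentWebForm.item("txtArea")).value = "text";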

    BTW, can you upload a new version to the repository that fixes the 61 Renderer forms that pop up whenever you launch the application? Or else tell me where to eliminate them in code?

    Thank you.

     

     

     

     

    Top 10 Contributor
    1,905 Posts

    Crawl this AbsoluteUri: http://www.stradeanas.it/index.php?/appalti/rilevanza_comunitaria/index and find code in Renderer.cs for a working example.

    Also, find the Renderer form in the Renderer project and set Visible to false - this should hide the Renderers.  I haven't tried hiding them - need to test more with this feature - still a beta IMO.

     

     

     

     

     

     

    namespace Arachnode.SiteCrawler.Actions
    {
        /// <summary>
        /// The Renderer plugin is used to process DecodedHtml from rendered CrawlRequests.
        /// The primary usage of rendering a WebPage is to obtain HyperLinks to Discoveries that are not present in the DecodedHtml
        /// when downloaded with the WebClient, or when viewing the source from a browser.
        /// </summary>
        internal class Renderer : ACrawlAction



    copyright 2004-2017, arachnode.net LLC