arachnode.net
An Open Source C# web crawler with Lucene.NET search using SQL Server 2008/2012/2014/2016/CE or MongoDB/RavenDB/Hadoop

  • Re: Proxies

    Thank you, Mike, for the quick reply. Above you showed how to connect for the console project, but now I am looking at the Renderer and I am not sure how to pass credentials to the proxy project. Can you shed some light on this? If you need access to private proxies, let me know. Thank you, JCrawl
    Posted to General Questions (Forum) by JCrawl on Sun, Aug 16 2015
  • Re: Frames

    Hello, sorry, I have been on vacation... In my example, I cannot access the product description because the contents are in a frame: http://www.amazon.com/First-Baby-Annabell-Soft-Doll/dp/B00FBWB9A2 I can see the contents when I view source, but when I look at the content downloaded from AN it is missing: <iframe id="product-description-iframe" class="ap_never_hide"
    Posted to General Questions (Forum) by JCrawl on Fri, Jul 24 2015
  • Frames

    Hello, I have noticed that the crawl requests do not contain the content found in frames. Is it possible to get this? Is there something I did not set right? Thank you
    Posted to General Questions (Forum) by JCrawl on Fri, Jul 17 2015
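
The frames question above comes down to the fact that a framed document lives at its own URL, so it has to be requested separately from the parent page. A minimal, language-agnostic sketch in Python (not arachnode.net's API; `FrameSourceCollector` and `frame_sources` are hypothetical names) that collects frame URLs from downloaded HTML so they could be queued as further requests:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FrameSourceCollector(HTMLParser):
    """Collects the src of every <iframe>/<frame> tag (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag in ("iframe", "frame"):
            src = dict(attrs).get("src")
            if src:
                # Resolve relative frame URLs against the parent page.
                self.sources.append(urljoin(self.base_url, src))


def frame_sources(html, base_url):
    """Return absolute URLs of all frames that declare a src attribute."""
    collector = FrameSourceCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.sources
```

Frames without a `src` attribute are skipped here; if a page fills its frame from script, this approach would not find the content.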
  • Proxies

    Hello, I am just getting started with proxies. My proxies require authentication; how do I set that up? I entered the proxies as 12.12.12.12:80:UserName:Password, however I am sure that is not the proper format. Can you please help me? Thank you
    Posted to General Questions (Forum) by JCrawl on Wed, Jul 15 2015
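
For the authenticated-proxy question above, one way to handle entries written as host:port:username:password is to split them into fields and, where a client expects it, rebuild the conventional `scheme://user:password@host:port` proxy URL. This is a minimal sketch in Python, not arachnode.net's configuration format (which the post does not state); `ProxyEntry`, `parse_proxy_line` and `to_proxy_url` are hypothetical names.

```python
from typing import NamedTuple
from urllib.parse import quote


class ProxyEntry(NamedTuple):
    host: str
    port: int
    username: str
    password: str


def parse_proxy_line(line):
    """Parse 'host:port:username:password' (assumed input format)."""
    # maxsplit=3 keeps any ':' inside the password intact.
    host, port, username, password = line.strip().split(":", 3)
    return ProxyEntry(host, int(port), username, password)


def to_proxy_url(entry, scheme="http"):
    """Build a proxy URL with percent-encoded credentials."""
    user = quote(entry.username, safe="")
    password = quote(entry.password, safe="")
    return f"{scheme}://{user}:{password}@{entry.host}:{entry.port}"
```

Usage: `to_proxy_url(parse_proxy_line("12.12.12.12:80:UserName:Password"))` yields a URL with the credentials placed before the host.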
  • Re: crawl peer and database peer questions

    Hey Mike, did the documentation ever get finished? Thanks
    Posted to General Questions (Forum) by JCrawl on Mon, Jul 6 2015
  • Manipulate URI before checking Cache and DB

    Hello, is there a location to manipulate the incoming URI before checking whether it is in the cache? For example: http://someuri/date=today and http://someuri/date=FiveMinuteLater For me, it does not matter that the page is five minutes older; I can get the same data from http://someuri/ alone. So instead of processing http://someuri/date=today or any other variation
    Posted to General Questions (Forum) by JCrawl on Sun, Jul 5 2015
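
The idea in the post above is usually called URI normalization: strip the volatile part of the address before the cache or database lookup so that every variation maps to one key. A minimal sketch in Python, assuming the volatile part is a `date=...` path segment or query parameter as in the post's examples; `normalize_uri` and `volatile_keys` are hypothetical names, and where such a hook would plug into arachnode.net is not stated in the post.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def normalize_uri(uri, volatile_keys=("date",)):
    """Drop 'key=value' path segments and query parameters whose key is volatile."""
    parts = urlsplit(uri)

    def is_volatile(segment):
        key, sep, _ = segment.partition("=")
        return sep == "=" and key in volatile_keys

    # Path form, e.g. http://someuri/date=today
    segments = [s for s in parts.path.split("/") if not is_volatile(s)]
    path = "/".join(segments) or "/"

    # Query form, e.g. http://someuri/page?date=today
    query = urlencode(
        [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
         if k not in volatile_keys]
    )
    # The fragment is discarded; it is never sent to the server.
    return urlunsplit((parts.scheme, parts.netloc, path, query, ""))
```

With this, both http://someuri/date=today and http://someuri/date=FiveMinuteLater reduce to http://someuri/, so only the first one seen would need to be fetched.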
  • CrawlRule or CrawlAction?

    I am not sure whether I need to create a crawl rule or a crawl action. I am trying to extend with a plugin. I am looking for pages with specific content, and if it exists I would like to save that specific content to the database but not store the actual content of the site. I originally did this as a crawl rule, however... my performance has dropped
    Posted to General Questions (Forum) by JCrawl on Wed, Jul 1 2015
  • Proxies

    I have been busy and have not had time to get back to this. Where is the best place / price to get proxies? How many do you suggest? Thank you
    Posted to General Questions (Forum) by JCrawl on Wed, Jul 1 2015
  • Renderer, agilityPack question

    Hello, 2 simple questions: 1) What is the Renderer project used for, and when would you need to enable this option? 2) I have read on the forum that the HTML Agility Pack is a memory hog and does not clean up until memory pressure is experienced. Is there a recommended alternative that has the same functionality as the HTML Agility Pack? Let me know Thank
    Posted to General Questions (Forum) by JCrawl on Wed, Jun 10 2015
  • Re: Lost Crawl Requests?

    Are you sure you do not have the resetdatabase option set? That would explain why the requests were lost when the crawler started back up
    Posted to Bug Reports (Forum) by JCrawl on Wed, Jun 10 2015

copyright 2004-2017, arachnode.net LLC