arachnode.net
An Open Source C# web crawler with Lucene.NET search using SQL Server 2008/2012/2014/2016/CE or MongoDB/RavenDB/Hadoop

Search

  • Re: Crawling Facebook and twitter

    Twitter links are available whether logged in or not. So, you can submit a crawl request for something like http://twitter.com/search?q=haiti and walk it no problem. Rules/templates for walking Twitter content are likely quite different from those for HTML pages though ;) Regarding Facebook, I'm not sure about that one. I believe you have to be logged
    Posted to Anonymous Forum (Forum) by Kevin on Mon, Jan 18 2010
  • Re: CREATE ASSEMBLY failed because method 'InstantiateCharactersNear'.....

    I have it running successfully on a Win7 laptop but I do remember making some tweaks. I turned off some of the default Win7 security crud that bugs you all the time! Post again if you don't get it going and we can try to compare your config with what I have set up.
    Posted to Anonymous Forum (Forum) by Kevin on Wed, Jan 13 2010
  • Re: CREATE ASSEMBLY failed because method 'InstantiateCharactersNear'.....

    Did you run the database reset stored procedure? It does some work to set up permissions and such for the CLR functions.
    Posted to Anonymous Forum (Forum) by Kevin on Mon, Jan 11 2010
  • Re: The project file \Functions.csproj cannot be opened.

    Hmmm... I REALLY recommend upgrading and purchasing the latest version. Tons of improvements and tweaks to make it easier to use. Short of that, the integration project should not be required, so technically you could just exclude that from the solution. As for the functions project, it does some SQL CLR stuff with SQL Server so I wonder if there's some
    Posted to Anonymous Forum (Forum) by Kevin on Wed, Jan 6 2010
  • Re: Requested Registry Access is not allowed

    Sounds like a possible permissions error for the user you are running the console app under. Anything special about that user's permissions? You might look at: http://support.microsoft.com/kb/329291
    Posted to Bug Reports (Forum) by Kevin on Wed, Jan 6 2010
  • Re: Disallowed Directories

    I think you are proposing a crawl rule that just does a string match against the URI being walked to make sure it is in the /us/ folder, yes?
    Posted to Feature Requests (Forum) by Kevin on Wed, Jan 6 2010
  • Re: Extracting words: How to

    I might add that you may think AN is overkill, but actually it's a great solution! It very likely takes care of lots of details that other packages don't handle or even know about. So, you don't have to worry about them! And the architecture is great, scalable, and easy to customize either in code or in custom actions and rules. Sounds like
    Posted to Anonymous Forum (Forum) by Kevin on Tue, Nov 24 2009
  • Re: Is this the tool for me

    Wayne, I echo what Mike said. It sounds like AN has all the core tools you need to make things happen. It can pull down all content and assets for you. It has parsing opportunities, you can use the HtmlAgilityPack, and you can write custom plug-ins to do things with data as it is visited and retrieved. There is definitely some coding involved in building
    Posted to General Questions (Forum) by Kevin on Tue, Nov 24 2009
  • Re: Request.decodedhtml

    Not built in. However, it wouldn't be too hard to add. I'd suggest doing this type of work OUTSIDE the actual crawling process, as it would really slow things down. If you want to do it during the crawl, this might be a great opportunity to write a post-crawl plug-in. If done after crawling, you could have a little app that spins through, instantiates
    Posted to General Questions (Forum) by Kevin on Tue, Oct 27 2009
  • Re: What is the best method to parse html tags!

    Milan, if you don't have it already, here's a link to the HtmlAgilityPack docs: http://htmlagilitypack.codeplex.com/Release/ProjectReleases.aspx?ReleaseId=33903
    Posted to General Questions (Forum) by Kevin on Fri, Oct 23 2009
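
For the "Disallowed Directories" reply above, the proposed rule (only walk URIs that sit inside the /us/ folder) could look roughly like the sketch below. It is written in Python purely for illustration and is not arachnode.net's crawl rule API; the function name `is_in_allowed_folder`, the `allowed_prefix` parameter, and the example.com URLs are all hypothetical. One assumption worth flagging: it matches against the URI's path rather than the whole URI string, so a "/us/" that only appears in the query string does not count.

```python
from urllib.parse import urlsplit


def is_in_allowed_folder(uri: str, allowed_prefix: str = "/us/") -> bool:
    """Return True when the URI's path starts with the allowed folder.

    Hypothetical helper, not part of arachnode.net. The comparison is
    case-sensitive and looks only at the path component of the URI.
    """
    path = urlsplit(uri).path or "/"
    return path.startswith(allowed_prefix)


if __name__ == "__main__":
    candidates = [
        "http://example.com/us/products/page.html",
        "http://example.com/uk/products/page.html",
        "http://example.com/?next=/us/",
    ]
    for uri in candidates:
        verdict = "walk" if is_in_allowed_folder(uri) else "skip"
        print(verdict, uri)
```

A plain substring test against the full URI would be simpler, but it would also accept the third URL, which is probably not what the rule intends.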

copyright 2004-2017, arachnode.net LLC