I have installed all the parts of arachnode.net 1.1 and compiled the whole project without any errors.
However, I am puzzled about how to use arachnode.net. There are so many files in the folder, but where is the entry point of the built project? Are there any windows in the project, or is this only a console program? (Such a big project without any windows really puzzles me.)
I am a college student in China. My partner and I have great interest in this project, and we need your help to learn more about it.
You are very welcome!
1.) Check the FullTextIndexType column in the Files, Images and WebPages tables. This column contains the file extensions.
2.) I haven't seen anywhere near that many errors from Analysis Services. That project isn't essential to crawling and can be excluded.
3.) You can deploy to IIS or you can use Visual Studio's web host. Either one is fine.
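As a quick way to act on point 1, the extensions can be inspected with a simple query (a sketch only: the FullTextIndexType column and the Files, Images and WebPages tables are named above, but the assumption that they live in the dbo schema is mine; adjust to your database):

```sql
-- Inspect which file extensions have been recorded for full-text indexing.
-- Schema name [dbo] is an assumption; verify against your installation.
SELECT DISTINCT [FullTextIndexType] FROM [dbo].[WebPages];
SELECT DISTINCT [FullTextIndexType] FROM [dbo].[Files];
SELECT DISTINCT [FullTextIndexType] FROM [dbo].[Images];
```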
The first part, about not receiving any compilation errors, is great news! ;)
arachnode.net is currently meant to be used as a service (always on) or as a class library. A Console project is provided to help with stepping through the code. A Web Admin interface is in development, but we do not have a release date at this time.
Tell me, what are you trying to accomplish? I can best help you by knowing more about your specific intentions for the code.
(In the meantime, set the Console project to the startup project, and press F5. A crawl will start at arachnode.net. Check the bottom of the stored procedure '[dbo].[arachnode_usp_arachnode.net_RESET_DATABASE]'.)
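If you want to reset the database between crawls, the stored procedure mentioned above can be executed directly from SQL Server Management Studio (a minimal sketch; the procedure name comes from the text above, and you should run it only against a development copy of the database, since it clears crawl data):

```sql
-- Reset the arachnode.net database to its default state.
-- WARNING: this clears collected crawl data; use a development database.
EXEC [dbo].[arachnode_usp_arachnode.net_RESET_DATABASE];
```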
Always glad to help,
Mike
It means a great deal to me that you have chosen arachnode.net as something to learn from, especially as something to influence your graduation project.
The project Test.csproj can be removed from the solution if you can't load it.
One of my main intentions for arachnode.net was to keep the code simple and accessible for everyone. While this was and is a great intention, crawling the internet, and crawling it properly, is not a simple process. Undoubtedly you have evaluated other crawlers and noticed that almost all of them in C# crawl HyperLinks only, do not download content in the form of files or images, and do not store or index the content. Why? The process of crawling seems simple. I thought that it would be relatively easy to craft a crawler until I discovered the thousands of conditions that must be met to crawl properly. If you have evaluated Nutch (a complete crawler, like arachnode.net is), then you have noticed that Nutch is quite complex and takes a good amount of time to learn and debug what is going on under the covers.
The basic usage of arachnode.net is this: CrawlRequests are placed into the CrawlRequests table in the database. The CrawlRequests are fed into the system and crawled, and each CrawlRequest is run against a configurable set of CrawlActions and CrawlRules. The lucene.net functionality is implemented as a CrawlAction. The Web project attaches to the indexes so you can search.
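To illustrate that flow, a crawl can be seeded by inserting a row directly into the database (a hedged sketch: the CrawlRequests table is named above, but the column names AbsoluteUri and Depth are my assumptions and may differ in your schema, so check the table definition before running this):

```sql
-- Seed a crawl by inserting a row into the CrawlRequests table.
-- Column names (AbsoluteUri, Depth) are illustrative assumptions;
-- verify them against your actual schema first.
INSERT INTO [dbo].[CrawlRequests] (AbsoluteUri, Depth)
VALUES ('http://arachnode.net/', 1);
```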
The best way to understand how arachnode.net works is to 1.) Run the application from the default configuration and 2.) Step through the code using the debugger.
From the default installation, start Visual Studio and get the crawler crawling. Take a look at the database tables and familiarize yourself with what is being collected. Now, reset the database per the installation instructions. Modify the ‘MaximumNumberOfCrawlThreads’ setting in the Configuration table in the database and set this value to 1. This will instruct the crawler to use only one crawl thread, which makes debugging much, much easier. Next, brew yourself a fresh cup of coffee or tea and step into the code. The best way to learn what is occurring is to read the code, line by line. Yes, there is a good deal of code, but it will be worth it, I promise.
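That single-thread change can be made with a short UPDATE (a sketch; the Configuration table and the MaximumNumberOfCrawlThreads setting are named above, but the Name/Value column names are my assumptions about the schema):

```sql
-- Restrict the crawler to a single thread to simplify debugging.
-- Column names [Name] and [Value] are illustrative assumptions;
-- verify them against your Configuration table.
UPDATE [dbo].[Configuration]
SET [Value] = '1'
WHERE [Name] = 'MaximumNumberOfCrawlThreads';
```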
In the past several days, what have you discovered in your research?
First, I am deeply moved that you are so warm-hearted in helping me. Thank you very much.
The good news is that I have built the whole project successfully after setting the Console project as the startup item. I tried running the console program on the Internet and changed the target Uri to "www.sohu.com", which is a portal site in China. To my great surprise, the crawling process is faster than any other crawler I have used, but three more questions come to mind:
I will persist in reading your code line by line. Thanks for all your answers above.