BioCrawler: An intelligent crawler for the semantic web

Abstract

Web crawling has become an important aspect of web search, as the World Wide Web keeps growing and search engines strive to index the most important and up-to-date content. Many experimental approaches exist, but few try to model the current behaviour of search engines, which is to crawl and refresh the sites they deem important much more frequently than others. BioCrawler mirrors this behaviour on the semantic web by applying the learning strategies adopted in BioTope, previous work on ecosystem simulation. BioCrawler applies the principles of BioTope's intelligent agents to the semantic web: it learns which sites are rich in semantic content and which sites link to them, and adjusts its crawling habits accordingly. Over time, it learns to behave much like state-of-the-art search engine crawlers. However, BioCrawler reaches that behaviour solely by exploiting on-page factors, rather than off-page factors such as the link popularity used by current search engines.
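The abstract describes the adaptive behaviour but not the learning rule itself. As an illustration only, the sketch below shows one generic way a crawler could turn on-page observations into revisit frequencies. It is not BioTope's or BioCrawler's algorithm. The class `AdaptiveScheduler`, the per-fetch count of semantic items (for example, RDF triples found on a page), the moving-average learning rate `alpha`, the `link_weight` credit for linking to content-rich sites, and the hours-based `revisit_interval` are all hypothetical assumptions.

```python
from dataclasses import dataclass


@dataclass
class SiteStats:
    # Moving-average estimate of semantic items found on the site itself.
    own_value: float = 0.0
    # Moving-average estimate of semantic items on the sites it links to.
    link_value: float = 0.0


class AdaptiveScheduler:
    """Hypothetical revisit scheduler driven only by on-page observations."""

    def __init__(self, alpha: float = 0.5, link_weight: float = 0.5) -> None:
        self.alpha = alpha              # learning rate of the moving averages
        self.link_weight = link_weight  # credit for linking to rich sites
        self.stats: dict[str, SiteStats] = {}

    def observe(self, site: str, semantic_items: int, linked_sites: list[str]) -> None:
        """Update a site's estimates after fetching one of its pages."""
        s = self.stats.setdefault(site, SiteStats())
        s.own_value += self.alpha * (semantic_items - s.own_value)
        if linked_sites:
            # Average own_value of the linked sites seen so far (unseen sites count as 0).
            reached = sum(
                self.stats[t].own_value if t in self.stats else 0.0
                for t in linked_sites
            ) / len(linked_sites)
            s.link_value += self.alpha * (reached - s.link_value)

    def score(self, site: str) -> float:
        """Combine the site's own value with credit for its outgoing links."""
        s = self.stats.get(site, SiteStats())
        return s.own_value + self.link_weight * s.link_value

    def revisit_interval(self, site: str, base_hours: float = 24.0,
                         min_hours: float = 1.0) -> float:
        """Higher-scoring sites get shorter revisit intervals (in hours)."""
        return max(min_hours, base_hours / (1.0 + self.score(site)))


# Hypothetical example: one content-rich site, one hub linking to it, one plain site.
sched = AdaptiveScheduler()
sched.observe("data.example.org", semantic_items=10, linked_sites=[])
sched.observe("hub.example.org", semantic_items=0, linked_sites=["data.example.org"])
sched.observe("plain.example.org", semantic_items=0, linked_sites=[])
for site in ("data.example.org", "hub.example.org", "plain.example.org"):
    print(site, round(sched.revisit_interval(site), 2))
```

In this toy run, `hub.example.org` carries no semantic content itself but is still scheduled more often than `plain.example.org` because its own pages link to a content-rich site. Only what the crawler sees on fetched pages is used; no inbound-link counts (an off-page factor) enter the score.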

Keywords: Web crawling, Focused crawling, Multi-agent system, Semantic web

Article history: Available online 28 July 2007.

Official paper URL: https://doi.org/10.1016/j.eswa.2007.07.054