Discovering and analyzing World Wide Web collections

Author: Sougata Mukherjea

Abstract

With the explosive growth of the World Wide Web, it is becoming increasingly difficult for users to discover Web pages that are relevant to a topic. To address this problem, we are developing a system that collects and analyzes Web pages related to a particular topic. In this paper we present the system's overall architecture and introduce the focused crawler it uses. We also discuss the techniques that allow the user to analyze a collection and gain useful insights about it. Finally, we present some statistics on the collections.

Keywords: Authorities, Focused crawling, Graph algorithms, Hubs, Site graph analysis

Official paper URL: https://doi.org/10.1007/BF02637157