Document Categorization and Query Generation on the World Wide Web Using WebACE

作者:Daniel Boley, Maria Gini, Robert Gross, Eui-Hong (Sam) Han, Kyle Hastings, George Karypis, Vipin Kumar, Bamshad Mobasher, Jerome Moore

摘要

We present WebACE, an agent for exploring and categorizing documents onthe World Wide Web based on a user profile. The heart of the agent is anunsupervised categorization of a set of documents, combined with a processfor generating new queries that is used to search for new relateddocuments and for filtering the resulting documents to extract the onesmost closely related to the starting set. The document categories are notgiven a priori. We present the overall architecture and describe twonovel algorithms which provide significant improvement over HierarchicalAgglomeration Clustering and AutoClass algorithms and form the basis forthe query generation and search component of the agent. We report on theresults of our experiments comparing these new algorithms with moretraditional clustering algorithms and we show that our algorithms are fastand sacalable.

论文关键词:clustering, divisive partitioning, graph partitioning, principal component analysis, web documents

论文评审过程:

论文官网地址:https://doi.org/10.1023/A:1006592405320