An investigation of document partitions

Authors:

Highlights:

Abstract

This paper investigates the empirical significance of document partitions as a function of two thresholds: a term-weight threshold and a similarity threshold. The term-weight threshold selects a particular level of indexing exhaustivity and specificity for the document representation, and the similarity threshold selects a specific level of the associated single-link hierarchy. The results show that the same empirically “preferred” partitions can be detected by two independent strategies: an analysis of cluster-based retrieval effectiveness and an analysis of regularities in the underlying structure of the document graph. These results are the first step in an investigation designed to determine whether the statistical significance of document partitions can explain the empirical significance of the same partitions.
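
As an illustration of how the two thresholds interact, the following is a minimal sketch, not the authors' implementation. It relies on a standard property of single-link clustering: the partition at a given similarity level equals the connected components of the graph whose edges join document pairs with similarity at or above that level. The abstract does not name a similarity measure, so cosine similarity is an assumption here, and the names `prune_terms`, `cosine`, and `single_link_partition` are hypothetical.

```python
from typing import Dict, List

Doc = Dict[str, float]  # maps an index term to its weight


def prune_terms(doc: Doc, weight_threshold: float) -> Doc:
    """Keep only the terms whose weight meets the term-weight threshold."""
    return {term: w for term, w in doc.items() if w >= weight_threshold}


def cosine(a: Doc, b: Doc) -> float:
    """Cosine similarity between two sparse term-weight vectors (assumed measure)."""
    dot = sum(w * b[term] for term, w in a.items() if term in b)
    norm_a = sum(w * w for w in a.values()) ** 0.5
    norm_b = sum(w * w for w in b.values()) ** 0.5
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a document with no surviving terms matches nothing
    return dot / (norm_a * norm_b)


def single_link_partition(
    docs: List[Doc], weight_threshold: float, similarity_threshold: float
) -> List[List[int]]:
    """Return the single-link partition selected by the two thresholds.

    Documents are linked when their similarity is at or above the
    similarity threshold; clusters are the connected components.
    """
    pruned = [prune_terms(d, weight_threshold) for d in docs]
    n = len(pruned)
    parent = list(range(n))  # union-find forest over document indices

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if cosine(pruned[i], pruned[j]) >= similarity_threshold:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[max(ri, rj)] = min(ri, rj)

    clusters: Dict[int, List[int]] = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())
```

Calling `single_link_partition` over a grid of both thresholds yields one partition per grid point, which is the family of partitions whose significance the paper examines. Raising either threshold can only remove links, so clusters split or stay the same as the thresholds increase under this sketch's pruning rule.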

Keywords:

Review history: Available online 13 July 2002.

Paper URL: https://doi.org/10.1016/0306-4573(86)90006-3