An investigation of document structures

Authors:

Highlights:

Abstract

This study investigates the presence of clustering structure in a document collection, and the influence of that structure on the success of cluster-based retrieval, as a function of term-weight and similarity thresholds. The term-weight threshold selects a particular level of indexing exhaustivity for the document representation, and the similarity threshold selects a specific level of the associated single-link hierarchy. The results show clear evidence of clustering structure in both the most exhaustive and the least exhaustive subject representations. They also show that the observed values of cluster-based retrieval effectiveness, at all exhaustivity levels, can be explained by assuming that the pairwise associations responsible for the structure imposed on the document collection are generated randomly. Together, the results suggest that the structure imposed on a small document collection by an automatically produced subject representation is unrelated to the structure imposed on the documents by relevance relationships.
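
The role of the two thresholds can be illustrated with a minimal sketch. It is not the paper's implementation: the cosine similarity measure, the toy term weights, and all function names are assumptions made for illustration. It relies only on the standard property that the single-link clusters at a given similarity level are the connected components of the graph linking every document pair whose similarity reaches that level.

```python
from itertools import combinations
from math import sqrt


def apply_term_weight_threshold(doc, min_weight):
    """Keep only terms weighted at least min_weight (lower exhaustivity)."""
    return {term: w for term, w in doc.items() if w >= min_weight}


def cosine(a, b):
    """Cosine similarity of two sparse term-weight vectors (assumed measure)."""
    dot = sum(w * b[term] for term, w in a.items() if term in b)
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def single_link_clusters(docs, min_weight, min_similarity):
    """Cut the single-link hierarchy at min_similarity.

    Clusters are the connected components of the thresholded
    similarity graph, found here with a small union-find.
    """
    reps = [apply_term_weight_threshold(d, min_weight) for d in docs]
    parent = list(range(len(reps)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(reps)), 2):
        if cosine(reps[i], reps[j]) >= min_similarity:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(reps)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Hypothetical toy collection: four documents as term -> weight maps.
docs = [
    {"cluster": 0.9, "retrieval": 0.8, "index": 0.2},
    {"cluster": 0.7, "retrieval": 0.9},
    {"index": 0.9, "weight": 0.6},
    {"weight": 0.8, "index": 0.1},
]

if __name__ == "__main__":
    # Exhaustive representation (all terms kept), similarity level 0.6.
    print(single_link_clusters(docs, 0.0, 0.6))
    # Less exhaustive representation (weights below 0.5 dropped), same level.
    print(single_link_clusters(docs, 0.5, 0.6))
```

In this toy data, raising the term-weight threshold removes the low-weight terms that linked documents 2 and 3, so the same similarity level yields a different partition. This is the kind of dependence on the two thresholds that the abstract describes, though the numbers here carry no empirical meaning.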

Keywords:

Review history: Received 3 April 1989; accepted 11 July 1989; available online 19 July 2002.

Official URL: https://doi.org/10.1016/0306-4573(90)90095-J