A comparison of two-Poisson, inverse document frequency and discrimination value models of document representation

Authors:

Highlights:

Abstract

In this paper we present a comparison of the Two-Poisson (2P), Inverse Document Frequency (IDF) and Discrimination Value (DV) models of document representation. The first objective of the study was to understand the nature of the relationship between the term property underlying the 2P model and the other known statistical properties of index terms: discrimination value and inverse document frequency. The second objective was to compare the properties with respect to the feature of ultimate interest, i.e., retrieval effectiveness. The study showed that the 2P and IDF properties work in parallel, while 2P and DV have a negative relationship. An explanation for this negative correlation was given by viewing the distribution of inter-document dissimilarities. In the retrieval experiment, most of the 2P strategies tested achieved the same performance level as the DV and IDF strategies. The important conclusion drawn from this study is that, although the 2P model is extremely selective in its choice of indexing vocabulary for a database, it still performs with the same effectiveness as the more traditional models. The overall contribution of this work is in the area of understanding features that influence the indexing potential of terms.

Keywords:

Article history: Received 12 September 1988, Accepted 22 June 1989, Available online 19 July 2002.

Paper URL: https://doi.org/10.1016/0306-4573(90)90030-6