Term dependence: Truncating the Bahadur Lazarsfeld expansion

Authors:

Highlights:

Abstract

The performance of probabilistic information retrieval systems is studied where differing statistical dependence assumptions are used when estimating the probabilities inherent in the retrieval model. Experimental results using the Bahadur Lazarsfeld expansion suggest that the greatest degree of performance increase is achieved by incorporating term dependence information in estimating Pr(d|rel). It is suggested that incorporating dependence in Pr(d|rel) to degree 3 be used; incorporating more dependence information results in relatively little increase in performance. Experiments examine the span of dependence in natural language text, the window of terms in which dependencies are computed, and their effect on information retrieval performance. Results provide additional support for the notion of a window of ±3 to ±5 terms in width; terms in this window may be most useful when computing dependence.
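
To make the idea of truncating the expansion concrete, the following is a minimal sketch, not the paper's implementation. It assumes binary term-presence vectors, a small sample of documents (for example, known relevant ones) from which the statistics are estimated, and marginal probabilities strictly between 0 and 1. The function name `truncated_ble` and its parameters are hypothetical. The estimate starts from the term-independence product and multiplies it by a correction that adds correlation terms for pairs, triples, and so on, up to the chosen degree.

```python
from itertools import combinations
from math import prod, sqrt


def truncated_ble(x, sample, degree):
    """Estimate Pr(x) from binary sample rows, keeping expansion terms up to `degree`.

    Hypothetical helper for illustration; assumes every term occurs in some
    but not all sample rows, so each marginal lies strictly between 0 and 1.
    """
    n = len(sample)
    m = len(x)
    # Marginal probabilities p_i = Pr(x_i = 1), estimated from the sample.
    p = [sum(row[i] for row in sample) / n for i in range(m)]
    sd = [sqrt(pi * (1 - pi)) for pi in p]

    def z(row, i):
        # Standardized value of term i in this row.
        return (row[i] - p[i]) / sd[i]

    # Degree 1: the independence estimate, a product of marginals.
    independent = prod(p[i] if x[i] else 1 - p[i] for i in range(m))

    # Degrees 2..degree: add one correlation term per subset of terms.
    correction = 1.0
    for k in range(2, degree + 1):
        for idx in combinations(range(m), k):
            # Correlation parameter: sample mean of the product of z values.
            rho = sum(prod(z(row, i) for i in idx) for row in sample) / n
            correction += rho * prod(z(x, i) for i in idx)
    return independent * correction
```

With `degree=1` the function returns the independence estimate; with `degree` equal to the number of terms it reproduces the empirical frequency of `x` in the sample. Intermediate truncations, such as the degree 3 discussed above, trade accuracy for fewer parameters, and a truncated estimate is not guaranteed to stay within [0, 1].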

Keywords:

Review history: Received 22 September 1992, Accepted 4 February 1993, Available online 18 July 2002.

Official paper URL: https://doi.org/10.1016/0306-4573(94)90071-X