Exploring term dependences in probabilistic information retrieval model

Authors:

Abstract

Most previous information retrieval (IR) models assume that the terms of queries and documents are statistically independent of one another. However, this conditional independence assumption is widely recognized to be wrong, so we present a new method of incorporating term dependence into the probabilistic retrieval model by adapting the Bahadur–Lazarsfeld expansion (BLE) to compensate for the weakness of the assumption. In this paper, we describe a theoretical procedure for applying the BLE to general probabilistic models and to the state-of-the-art 2-Poisson model. Through experiments on two standard document collections, HANTEC2.0 in Korean and WT10g in English, we demonstrate that incorporating term dependences using the BLE contributes significantly to improved performance in IR systems for at least two different languages.
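
As a rough illustration of the idea behind the BLE, the sketch below evaluates the textbook form of the expansion truncated at second order for binary term-occurrence vectors: the independence product is multiplied by a correction built from pairwise correlations. This is not the paper's retrieval formula or its 2-Poisson adaptation; the function name `ble_second_order` and the parameters `x`, `p`, and `rho` are hypothetical names chosen for this example.

```python
from itertools import combinations
from math import sqrt


def ble_second_order(x, p, rho):
    """Second-order truncated Bahadur-Lazarsfeld estimate of P(x).

    x   -- tuple of 0/1 term-occurrence indicators (assumed binary)
    p   -- marginal probabilities P(x_i = 1), each strictly between 0 and 1
    rho -- dict mapping (i, j) with i < j to the correlation of terms i and j;
           missing pairs are treated as uncorrelated
    """
    # Independence model: product of the Bernoulli marginals.
    independent = 1.0
    for xi, pi in zip(x, p):
        independent *= pi if xi == 1 else (1.0 - pi)

    # Standardized variables y_i = (x_i - p_i) / sqrt(p_i * (1 - p_i)).
    y = [(xi - pi) / sqrt(pi * (1.0 - pi)) for xi, pi in zip(x, p)]

    # Correction factor: 1 + sum over pairs of rho_ij * y_i * y_j.
    correction = 1.0
    for i, j in combinations(range(len(x)), 2):
        correction += rho.get((i, j), 0.0) * y[i] * y[j]

    return independent * correction
```

With two terms the second-order expansion is exact, so the four joint probabilities sum to one. With more terms, dropping the higher-order terms gives only an approximation, which can even be negative when correlations are strong.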

Keywords: Information retrieval, Term dependence, Bahadur–Lazarsfeld expansion, Probabilistic model, 2-Poisson model

Review history: Received 3 January 2002, Accepted 9 August 2002, Available online 19 December 2002.

Official paper URL: https://doi.org/10.1016/S0306-4573(02)00078-X