Linear combination of component results in information retrieval

Abstract

In information retrieval, data fusion (also known as meta-search) has been investigated by many researchers. Previous investigations and experiments have shown that the linear combination method is an effective data fusion method for combining the results of multiple information retrieval systems. One of its advantages is flexibility: different weights can be assigned to different component systems to obtain better fusion results. The key issue is how to assign good weights to all the component retrieval systems involved. Surprisingly, research on this issue is limited, and it remains an open question. In this paper, we use multiple linear regression with estimated relevance scores and judged scores to obtain suitable weights. Although multiple linear regression is not new, the way it is used in this paper has not been attempted before for the data fusion problem in information retrieval. Our experiments with five groups of runs submitted to TREC show that the linear combination method with this weighting strategy consistently outperforms, by large margins, the best component system and other data fusion methods, including CombSum, CombMNZ, PosFuse, MAPFuse, SegFuse, and the linear combination method with performance-level and performance-square weighting schemes.
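The abstract describes the weighting strategy only in outline, so the following Python sketch illustrates the general idea rather than the authors' exact procedure. It assumes that each component system's scores have already been converted to estimated relevance scores for a shared set of judged training documents (how those estimates are produced is not shown), that judged scores are binary relevance judgments (1 = relevant, 0 = not), that the regression has no intercept term, and that a document missing from a run contributes 0 to the fused score. The function names `fit_weights` and `linear_combination` and the toy data are hypothetical.

```python
from typing import Dict, List


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("normal equations are singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_weights(scores: List[List[float]], judged: List[float]) -> List[float]:
    """Multiple linear regression: weights w minimizing
    sum_d (judged[d] - sum_i w[i] * scores[d][i])^2.

    scores[d][i] is system i's estimated relevance score for training document d;
    judged[d] is its relevance judgment. No intercept is fitted (an assumption).
    """
    k = len(scores[0])
    # Normal equations: (X^T X) w = X^T y.
    xtx = [[sum(row[i] * row[j] for row in scores) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(scores, judged)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def linear_combination(runs: List[Dict[str, float]], weights: List[float]) -> List[str]:
    """Fuse runs by a weighted sum of scores; a document absent from a run adds 0."""
    fused: Dict[str, float] = {}
    for run, w in zip(runs, weights):
        for doc, score in run.items():
            fused[doc] = fused.get(doc, 0.0) + w * score
    # Highest fused score first.
    return sorted(fused, key=lambda d: fused[d], reverse=True)


# Toy training data: one row per judged document, one column per component system.
train_scores = [[0.9, 0.2], [0.8, 0.9], [0.1, 0.8], [0.2, 0.1]]
train_judged = [1.0, 1.0, 1.0, 0.0]
weights = fit_weights(train_scores, train_judged)

# Two hypothetical component runs for a new query (doc id -> estimated relevance score).
run_a = {"d1": 0.9, "d2": 0.3, "d3": 0.5}
run_b = {"d1": 0.1, "d2": 0.8, "d4": 0.6}
ranking = linear_combination([run_a, run_b], weights)
print(weights, ranking)
```

Setting every weight to 1 makes `linear_combination` equivalent to CombSum; the regression step is what allows the component systems to receive different weights.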

Keywords: Information retrieval, Data fusion, Meta-search, The linear combination method, Multiple linear regression, Logistic regression, Model

Article history: Received 26 July 2010, revised 26 August 2011, accepted 29 August 2011, available online 9 September 2011.

Publisher's page: https://doi.org/10.1016/j.datak.2011.08.003