Evaluation of retrieval effectiveness with incomplete relevance data: Theoretical and experimental comparison of three measures

Authors:

Abstract

This paper investigates two relatively new measures of retrieval effectiveness in relation to the problem of incomplete relevance data. The measures, Bpref and RankEff, which do not take into account documents that have not been relevance judged, are compared theoretically and experimentally. The experimental comparisons involve a third measure, the well-known mean uninterpolated average precision. The results indicate that RankEff is the most stable of the three measures when the amount of relevance data is reduced, with respect to system ranking and absolute values. In addition, RankEff has the lowest error-rate.
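To make the contrast concrete, the following is a minimal sketch of how a measure that ignores unjudged documents differs from uninterpolated average precision. It assumes the commonly cited bpref formulation (each relevant document is penalized by the number of judged nonrelevant documents ranked above it, capped at R, the number of judged relevant documents); the exact variant used in the paper may differ. RankEff is left out because the abstract does not define it. The function names and the `judgments` mapping are hypothetical, chosen only for illustration.

```python
from typing import Dict, List


def average_precision(ranking: List[str], judgments: Dict[str, bool]) -> float:
    """Uninterpolated average precision for one topic.

    Unjudged documents (absent from `judgments`) count as nonrelevant,
    so they still push relevant documents down the ranking.
    """
    total_relevant = sum(judgments.values())
    if total_relevant == 0:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc in enumerate(ranking, start=1):
        if judgments.get(doc, False):
            hits += 1
            precision_sum += hits / rank
    return precision_sum / total_relevant


def bpref(ranking: List[str], judgments: Dict[str, bool]) -> float:
    """bpref for one topic (assumed formulation, see the text above).

    Unjudged documents are skipped entirely; only judged nonrelevant
    documents ranked above a relevant one reduce the score.
    """
    total_relevant = sum(judgments.values())
    if total_relevant == 0:
        return 0.0
    nonrelevant_seen = 0
    score = 0.0
    for doc in ranking:
        if doc not in judgments:
            continue  # unjudged: ignored by design
        if judgments[doc]:
            penalty = min(nonrelevant_seen, total_relevant) / total_relevant
            score += 1.0 - penalty
        else:
            nonrelevant_seen += 1
    return score / total_relevant


# Hypothetical topic: d1, d3, d4 judged relevant; d2 judged nonrelevant;
# u1 and u2 were never judged.
judgments = {"d1": True, "d2": False, "d3": True, "d4": True}
with_unjudged = ["d1", "u1", "d2", "d3", "u2", "d4"]
judged_only = ["d1", "d2", "d3", "d4"]

ap_with = average_precision(with_unjudged, judgments)
ap_without = average_precision(judged_only, judgments)
bpref_with = bpref(with_unjudged, judgments)
bpref_without = bpref(judged_only, judgments)
```

In this toy topic, removing the two unjudged documents from the ranking changes the average precision score but leaves bpref unchanged, which is the property the abstract refers to when it says the measure does not take unjudged documents into account.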

Keywords: Cranfield, Incomplete judgments, Retrieval effectiveness, Rank effectiveness

Article history: Received 7 September 2006, Revised 9 January 2007, Accepted 10 January 2007, Available online 6 March 2007.

Official paper URL: https://doi.org/10.1016/j.ipm.2007.01.011