Relevance judgements for assessing recall

Authors:

Highlights:

Abstract

Recall and precision have become the principal measures of the effectiveness of information retrieval systems. Inherent in these measures of performance is the idea of a relevant document. Although recall and precision are easily and unambiguously defined, selecting the documents relevant to a query has long been recognized as problematic. To compare the performance of different systems, standard collections of documents, queries, and relevance judgments have been used. Unfortunately, the standard collections, such as SMART and TREC, have locked in a particular approach to relevance that is suitable for assessing precision but not recall. The problem is demonstrated by comparing two information retrieval methods over several queries, and showing how a new method of forming relevance judgments that is suitable for assessing recall gives different results. Recall is an interesting and practical issue, but current test procedures are inadequate for measuring it.
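To make the two measures concrete, here is a minimal sketch (not taken from the paper) of how precision and recall are computed for one query from a set of relevance judgments. The document IDs and judgments are hypothetical; the point is that recall requires knowing *all* relevant documents in the collection, which is exactly what the paper argues precision-oriented judgment methods fail to provide.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    retrieved: document ids returned by the system.
    relevant:  document ids judged relevant to the query.
    Precision = |retrieved ∩ relevant| / |retrieved|
    Recall    = |retrieved ∩ relevant| / |relevant|
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical example: the system returns 4 documents,
# 2 of which are among the 3 judged relevant.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d2", "d4", "d7"])
print(p, r)  # precision 0.5, recall 2/3
```

Note that the recall denominator is the full set of relevant documents: if the judgment pool misses relevant documents, recall is overestimated, while precision is unaffected.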

Keywords:

Publication history: Available online 23 February 1999.

Paper link: https://doi.org/10.1016/0306-4573(95)00061-5