How many performance measures to evaluate information retrieval systems?

Authors: Alain Baccini, Sébastien Déjean, Laetitia Lafage, Josiane Mothe

Abstract

The effectiveness of information retrieval systems is evaluated by running a set of test queries against a collection of documents for which, for each query, the list of relevant documents is known. This evaluation framework also includes performance measures that make it possible to assess the impact of a change in search parameters. The program trec_eval calculates a large number of measures, some of which, such as mean average precision or recall-precision curves, are used more often than others. The motivation of our work is to compare all the measures and to help the user choose a small number of them when evaluating different information retrieval systems. In this paper, we present a study based on a large-scale data analysis of TREC results. We investigate the relationships between the 130 measures calculated by trec_eval for individual queries and show that they can be grouped into homogeneous clusters.

Keywords: Information retrieval, Performance measures, Evaluation, Statistical data analysis

Paper URL: https://doi.org/10.1007/s10115-011-0391-7