Median measure: an approach to IR systems evaluation

Authors:

Highlights:

Abstract

This paper reports results from three studies examining 1295 relevance judgments by 36 information retrieval (IR) system end-users. Both the region of the relevance judgments, from non-relevant to highly relevant, and the motivations or levels for the relevance judgments are examined. Three major findings are presented. First, the frequency distributions of relevance judgments by IR system end-users tend to take on a bimodal shape, with peaks at the extremes (non-relevant/relevant) and a flatter middle range. Second, the different types of scale (interval or ordinal) used in each study did not alter the shape of the relevance frequency distributions. Third, on an interval scale, the median point of relevance judgment distributions correlates with the point where relevant and partially relevant items begin to be retrieved. The median point of a distribution of relevance judgments may provide a measure of user/IR system interaction to supplement precision/recall measures. The implications of this investigation for relevance theory and IR systems evaluation are discussed.
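As a rough illustration of the idea, the following Python sketch computes the median point and a binned frequency distribution for a set of interval-scale relevance judgments. The 0 to 100 scale, the bin width, the function names, and the sample data are all assumptions made for this example; they are not taken from the paper, and the paper's own procedure may differ.

```python
from collections import Counter
from statistics import median


def median_relevance(judgments):
    """Return the median of a list of numeric relevance judgments."""
    return median(judgments)


def frequency_distribution(judgments, bin_width=20, scale_max=100):
    """Count judgments per bin on an assumed 0..scale_max interval scale.

    The top score (scale_max) is folded into the last bin.
    """
    n_bins = scale_max // bin_width
    counts = Counter(min(int(j // bin_width), n_bins - 1) for j in judgments)
    return [counts.get(b, 0) for b in range(n_bins)]


# Hypothetical judgments (0 = non-relevant, 100 = highly relevant).
sample = [0, 5, 10, 0, 95, 100, 90, 50, 100, 0]

print(median_relevance(sample))        # median point of the distribution
print(frequency_distribution(sample))  # counts per 20-point bin
```

With this made-up sample, most counts fall in the lowest and highest bins, which mirrors the kind of bimodal shape the abstract describes; real judgment data would be needed to draw any conclusion.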

Keywords:

Review history: Received 28 July 2000, Accepted 27 October 2000, Available online 27 June 2001.

Official paper URL: https://doi.org/10.1016/S0306-4573(00)00064-9