Methods for estimating the number of relevant documents in a collection

Authors:

Abstract

Several statistical sampling methods are evaluated for estimating the total number of relevant documents in a collection for a given query. The total number of relevant documents is needed in order to compute recall values for use in evaluating document retrieval systems. The simplest method considered uses simple random sampling to estimate the number of relevant documents. Another type of random sampling, which assigns unequal selection probabilities to the individual documents in the collection, is also investigated. An alternative approach considered uses curve fitting and extrapolation, where a smooth curve is developed which relates precision to document rank. Another curve relates a function of precision to the query-document score. In either case, the curve is extrapolated to the total number of documents in order to estimate the number of relevant documents. Empirical comparisons are made of all three methods.
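The two sampling approaches named in the abstract can be illustrated with a minimal sketch. The abstract does not give the estimators' formulas, so the code below is an assumption: it uses the textbook expansion estimator for simple random sampling and the standard Hansen-Hurwitz estimator (sampling with replacement) as a stand-in for unequal-probability sampling. The function names, the `is_relevant` judgment callback, and the `weights` argument are hypothetical and are not taken from the paper.

```python
import random


def estimate_srs(is_relevant, population_size, sample_size, rng):
    """Simple random sampling without replacement.

    Scales the fraction of relevant documents in the sample up to the
    whole collection: N * (hits / n).
    """
    sample_ids = rng.sample(range(population_size), sample_size)
    hits = sum(1 for doc_id in sample_ids if is_relevant(doc_id))
    return population_size * hits / sample_size


def estimate_unequal(is_relevant, weights, sample_size, rng):
    """Unequal-probability sampling with replacement (Hansen-Hurwitz).

    Document i is drawn with probability p_i proportional to weights[i]
    (assumed here to be something like a query-document score). Each
    relevant draw contributes 1 / p_i, and the contributions are averaged
    over the number of draws.
    """
    total = sum(weights)
    probs = [w / total for w in weights]
    draws = rng.choices(range(len(weights)), weights=probs, k=sample_size)
    return sum(1.0 / probs[i] for i in draws if is_relevant(i)) / sample_size
```

Under these assumptions, the unequal-probability estimator tends to have lower variance when the weights are larger for documents that are more likely to be relevant, which is one plausible motivation for sampling in proportion to a retrieval score. It is unbiased only if every relevant document has a nonzero selection probability.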

Keywords:

Review history: Received 24 July 1981; available online 23 August 2002.

Official paper URL: https://doi.org/10.1016/0306-4573(82)90058-9