Searching in Medline: Query expansion and manual indexing evaluation

Authors:

Abstract

Based on a relatively large subset representing one third of the Medline collection, this paper evaluates ten different IR models, including recent developments in both probabilistic and language models. We show that the best-performing IR model is a probabilistic model developed within the Divergence from Randomness framework [Amati, G., & van Rijsbergen, C.J. (2002). Probabilistic models of information retrieval based on measuring the divergence from randomness. ACM Transactions on Information Systems, 20(4), 357–389], which results in a 170% enhancement in mean average precision when compared to the classical tf-idf vector-space model. This paper also evaluates the impact of manually assigned descriptors (MeSH, or Medical Subject Headings) on retrieval effectiveness, showing that including these terms improves retrieval performance by 2.4% to 13.5%, depending on the underlying IR model. Finally, we design a new general blind query expansion approach that shows improved retrieval performance compared to that obtained using the Rocchio approach.

Keywords: Manual indexing, Blind query expansion, Medline, MeSH, Genomics TREC, Probabilistic model, Language model, Rocchio query expansion, Evaluation

Review history: Received 19 December 2006, Revised 16 March 2007, Accepted 17 March 2007, Available online 23 May 2007.

Paper URL: https://doi.org/10.1016/j.ipm.2007.03.013