An information retrieval model based on vector space method by supervised learning

Authors:

Abstract

This paper proposes a method to improve the retrieval performance of the vector space model (VSM), in part by using user-supplied information about which documents are relevant to a given query. In addition to the user's relevance feedback, information such as the original document similarities is incorporated into the retrieval model, which is built from a sequence of linear transformations. The high-dimensional, sparse vectors are then reduced by singular value decomposition (SVD) and transformed into a low-dimensional vector space, namely the space representing the latent semantic meanings of words. The method has been tested on two test collections, the Medline collection and the Cranfield collection. To train the model, multiple partitions are created for each collection. Averaged over all partitions, the improvements in average precision over the latent semantic indexing (LSI) model are 20.57% (Medline) and 22.23% (Cranfield) on the training data, and 0.47% (Medline) and 4.78% (Cranfield) on the test data. The proposed method makes it possible to preserve user-supplied relevance information in the system over the long term so that it can be used later.
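The SVD reduction step mentioned in the abstract can be illustrated with a minimal sketch of the plain LSI baseline. This is not the paper's supervised model: the abstract does not specify the linear transformations that incorporate relevance feedback, so they are omitted here. The function names (`lsi_fit`, `lsi_project`, `cosine_rank`), the toy term-by-document matrix, and the choice `k=2` are all assumptions made for illustration.

```python
import numpy as np

def lsi_fit(term_doc, k):
    """Truncated SVD of a term-by-document matrix A ~ U_k S_k V_k^T."""
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    U_k, s_k = U[:, :k], s[:k]
    # Row j is document j in the k-dimensional latent space (V_k S_k = A^T U_k).
    doc_vecs = Vt[:k, :].T * s_k
    return U_k, s_k, doc_vecs

def lsi_project(U_k, term_vec):
    """Map a term-space vector (e.g. a query) into the latent space: U_k^T q."""
    return U_k.T @ term_vec

def cosine_rank(query_vec, doc_vecs):
    """Return document indices sorted by descending cosine similarity, and the similarities."""
    norms = np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec)
    norms = np.where(norms == 0, 1.0, norms)  # avoid division by zero
    sims = doc_vecs @ query_vec / norms
    return np.argsort(-sims), sims

# Hypothetical toy collection: 5 terms (rows) x 4 documents (columns).
# Terms 0-2 occur only in documents 0-1; terms 3-4 only in documents 2-3.
A = np.array([
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
], dtype=float)

U_k, s_k, doc_vecs = lsi_fit(A, k=2)

# A query containing only term 0.
query = np.array([1.0, 0.0, 0.0, 0.0, 0.0])
order, sims = cosine_rank(lsi_project(U_k, query), doc_vecs)
```

In this toy setup the query matches document 1 in the latent space even though the query term is also the only link between documents 0 and 1, which is the kind of term association the reduced space is meant to capture. A supervised variant along the lines of the abstract would further transform these vectors using relevance judgments before ranking.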

Keywords: Information retrieval, Supervised learning, Vector space model, Relevance feedback, Singular value decomposition, Linear transformation

Review history: Received 5 May 2001, Accepted 25 October 2001, Available online 6 December 2001.

Official paper URL: https://doi.org/10.1016/S0306-4573(01)00053-X