Support vector machines: relevance feedback and information retrieval

Authors:

Abstract

We compare support vector machines (SVMs) with the Rocchio, Ide regular and Ide dec-hi algorithms for information retrieval (IR) of text documents using relevancy feedback. We assume that a preliminary search finds a set of documents that the user marks as relevant or not relevant, after which feedback iterations commence. Particular attention is paid to IR searches in which the number of relevant documents in the database is low and the preliminary set of documents used to start the search contains few relevant documents. Experiments show that if inverse document frequency (IDF) weighting is not used, because one is unwilling to pay the time penalty needed to obtain these features, then SVMs perform better with either term-frequency (TF) or binary weighting. SVM performance is marginally better than that of Ide dec-hi if TF-IDF weighting is used and a reasonable number of relevant documents is found in the preliminary search. If the preliminary search is so poor that one has to search through many documents to find at least one relevant document, then the SVM is preferred.
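
For orientation, the three baseline feedback rules named in the abstract can be sketched in their common textbook forms. This is an illustrative sketch, not the authors' implementation: the function names, the dense-list vector representation, and the default weights `alpha`, `beta` and `gamma` are assumptions, and the abstract does not state which parameter values the experiments used.

```python
from typing import List, Sequence

Vector = List[float]


def _vector_sum(vectors: Sequence[Vector], dim: int) -> Vector:
    """Component-wise sum of term-weight vectors (zeros if the set is empty)."""
    total = [0.0] * dim
    for vec in vectors:
        for i, weight in enumerate(vec):
            total[i] += weight
    return total


def rocchio(query: Vector, relevant: Sequence[Vector],
            nonrelevant: Sequence[Vector],
            alpha: float = 1.0, beta: float = 0.75,
            gamma: float = 0.15) -> Vector:
    """Textbook Rocchio: move toward the relevant centroid and away from
    the non-relevant centroid. The default weights are assumed values."""
    dim = len(query)
    new_query = [alpha * w for w in query]
    if relevant:
        rel_sum = _vector_sum(relevant, dim)
        new_query = [q + beta * r / len(relevant)
                     for q, r in zip(new_query, rel_sum)]
    if nonrelevant:
        non_sum = _vector_sum(nonrelevant, dim)
        new_query = [q - gamma * n / len(nonrelevant)
                     for q, n in zip(new_query, non_sum)]
    return new_query


def ide_regular(query: Vector, relevant: Sequence[Vector],
                nonrelevant: Sequence[Vector]) -> Vector:
    """Ide regular: add every relevant vector, subtract every non-relevant
    vector, with no normalization by set size."""
    dim = len(query)
    rel_sum = _vector_sum(relevant, dim)
    non_sum = _vector_sum(nonrelevant, dim)
    return [q + r - n for q, r, n in zip(query, rel_sum, non_sum)]


def ide_dec_hi(query: Vector, relevant: Sequence[Vector],
               nonrelevant_ranked: Sequence[Vector]) -> Vector:
    """Ide dec-hi: add every relevant vector, but subtract only the
    highest-ranked non-relevant one (first element of the ranked list)."""
    dim = len(query)
    rel_sum = _vector_sum(relevant, dim)
    top_non = nonrelevant_ranked[0] if nonrelevant_ranked else [0.0] * dim
    return [q + r - n for q, r, n in zip(query, rel_sum, top_non)]


# Toy example with a three-term vocabulary (made-up weights).
q0 = [1.0, 0.0, 0.0]
rel_docs = [[1.0, 1.0, 0.0], [1.0, 0.0, 1.0]]
non_docs = [[0.0, 0.0, 2.0], [0.0, 2.0, 0.0]]  # ordered by retrieval rank

q_rocchio = rocchio(q0, rel_docs, non_docs, alpha=1.0, beta=1.0, gamma=1.0)
q_regular = ide_regular(q0, rel_docs, non_docs)
q_dec_hi = ide_dec_hi(q0, rel_docs, non_docs)
```

In an SVM-based feedback loop, by contrast, the documents marked so far would serve as labeled training examples, and unseen documents would be ranked by the classifier's decision value rather than by similarity to an updated query vector.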

Keywords: Information retrieval, Support vector machines, Relevancy feedback, Rocchio, Ide dec-hi

Review history: Received 28 January 2001, Accepted 11 May 2001, Available online 3 January 2002.

Official paper URL: https://doi.org/10.1016/S0306-4573(01)00037-1