A multi-level matching method with hybrid similarity for document retrieval

Authors:

Highlights:

Abstract

This paper presents a multi-level matching method for document retrieval (DR) based on a hybrid document similarity. Each document is represented by a multi-level structure comprising a document level and a paragraph level. This representation is designed to model the underlying semantics more flexibly and accurately than conventional flat term histograms, which struggle to capture them. Matching between documents is then formulated as an optimization problem using the Earth Mover’s Distance (EMD). A hybrid similarity combines the global and local semantics of documents to improve retrieval accuracy. We performed extensive experiments to verify the method. The results suggest that the proposed method works well for lengthy documents with evident spatial distributions of terms.
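
To make the idea concrete, the following is a minimal sketch of how a document-level (global) similarity and a paragraph-level (local) EMD-based similarity might be combined. The abstract does not specify the paper's formulation, so everything here is an assumption for illustration: whitespace tokenization, cosine similarity over raw term counts, paragraph weights proportional to paragraph length, a ground distance of 1 minus cosine, the linear mixing weight `alpha`, and the hypothetical function names `cosine`, `emd` and `hybrid_similarity`. The EMD is solved with a small successive-shortest-path min-cost flow, which is adequate only for documents with few paragraphs.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def emd(supply, demand, cost, eps=1e-12):
    """EMD between two weight vectors of equal total mass.

    Solved as a min-cost flow with successive shortest paths
    (Bellman-Ford on the residual graph).
    """
    n, m = len(supply), len(demand)
    src, snk, size = n + m, n + m + 1, n + m + 2
    edges = []                       # [target, residual capacity, cost]
    graph = [[] for _ in range(size)]

    def add_edge(u, v, cap, c):
        graph[u].append(len(edges)); edges.append([v, cap, c])
        graph[v].append(len(edges)); edges.append([u, 0.0, -c])  # reverse edge

    for i in range(n):
        add_edge(src, i, supply[i], 0.0)
        for j in range(m):
            add_edge(i, n + j, math.inf, cost[i][j])
    for j in range(m):
        add_edge(n + j, snk, demand[j], 0.0)

    total_cost = total_flow = 0.0
    while True:
        dist = [math.inf] * size
        prev = [-1] * size
        dist[src] = 0.0
        for _ in range(size - 1):
            changed = False
            for u in range(size):
                if dist[u] == math.inf:
                    continue
                for e in graph[u]:
                    v, cap, c = edges[e]
                    if cap > eps and dist[u] + c < dist[v] - eps:
                        dist[v], prev[v], changed = dist[u] + c, e, True
            if not changed:
                break
        if dist[snk] == math.inf:
            break                    # no augmenting path left
        # Find the bottleneck capacity along the path, then push flow.
        push, v = math.inf, snk
        while v != src:
            push = min(push, edges[prev[v]][1])
            v = edges[prev[v] ^ 1][0]
        v = snk
        while v != src:
            edges[prev[v]][1] -= push
            edges[prev[v] ^ 1][1] += push
            v = edges[prev[v] ^ 1][0]
        total_cost += push * dist[snk]
        total_flow += push
    return total_cost / total_flow if total_flow else 0.0


def hybrid_similarity(doc_a, doc_b, alpha=0.5):
    """Mix global and local similarity; alpha is an assumed weight."""
    paras_a = [Counter(p.lower().split()) for p in doc_a]
    paras_b = [Counter(p.lower().split()) for p in doc_b]
    # Global level: one flat histogram per document.
    global_sim = cosine(sum(paras_a, Counter()), sum(paras_b, Counter()))
    # Local level: paragraphs weighted by length (an assumption).
    len_a = [sum(p.values()) for p in paras_a]
    len_b = [sum(p.values()) for p in paras_b]
    w_a = [x / sum(len_a) for x in len_a]
    w_b = [x / sum(len_b) for x in len_b]
    cost = [[1.0 - cosine(pa, pb) for pb in paras_b] for pa in paras_a]
    local_sim = 1.0 - emd(w_a, w_b, cost)
    return alpha * global_sim + (1.0 - alpha) * local_sim
```

A document is passed as a list of paragraph strings, for example `hybrid_similarity(["earth mover distance", "document retrieval method"], ["document retrieval method", "earth mover distance"])`. Because the EMD searches for the cheapest paragraph-to-paragraph flow, reordering paragraphs does not by itself lower the local score in this sketch.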

Keywords: Document retrieval, EMD, Multi-level matching, Hybrid similarity, Multi-level structure

Review history: Available online 31 August 2011.

Official paper URL: https://doi.org/10.1016/j.eswa.2011.08.128