Sentence similarity based on semantic kernels for intelligent text retrieval

Authors: Samir Amir, Adrian Tanasescu, Djamel A. Zighed

Abstract

We propose a new approach to computing semantic similarity between sentences. It is based on the semantic kernel, composed of the subject, verb, and object, which we assume summarizes the general meaning of each sentence. Using available linguistic resources such as the Stanford Parser, many features are extracted from the semantic kernels and aggregated by means of weights. The weights are learned by a supervised machine learning technique from a training data set labeled by human experts as ground truth. Cross-validation shows good performance. With this sentence similarity measure, one can build an intelligent text retrieval engine that is more sensitive to semantic content, and better suited to short texts, than classical bag-of-words methods. An application is being developed for highlighting parts of speech in scientific articles.
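
As an illustration of the pipeline described in the abstract, the following minimal sketch compares two subject-verb-object kernels slot by slot and aggregates the per-slot scores with weights. Everything in it is an assumption made for illustration: the names `SemanticKernel`, `jaccard`, `kernel_features`, and `kernel_similarity` are hypothetical, token-set Jaccard overlap stands in for the paper's features (which the abstract does not enumerate), the kernels are written by hand rather than extracted with the Stanford Parser, and the weights are placeholders rather than values learned from expert-labeled data.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SemanticKernel:
    """Hypothetical container for a sentence's subject, verb, and object."""
    subject: str
    verb: str
    obj: str


def jaccard(a: str, b: str) -> float:
    """Token-set overlap between two phrases (a stand-in feature, not the paper's)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0  # two empty slots are treated as identical
    return len(ta & tb) / len(ta | tb)


def kernel_features(k1: SemanticKernel, k2: SemanticKernel) -> List[float]:
    """One similarity feature per kernel slot: subject, verb, object."""
    return [
        jaccard(k1.subject, k2.subject),
        jaccard(k1.verb, k2.verb),
        jaccard(k1.obj, k2.obj),
    ]


def kernel_similarity(k1: SemanticKernel, k2: SemanticKernel,
                      weights: List[float]) -> float:
    """Weighted sum of the per-slot features."""
    return sum(w * f for w, f in zip(weights, kernel_features(k1, k2)))


# Placeholder weights (assumption). In the approach described above they
# would be fitted by a supervised learner on human-annotated sentence pairs.
WEIGHTS = [0.3, 0.4, 0.3]

# Hand-written kernels (assumption); a parser would normally produce these.
a = SemanticKernel("the cat", "chases", "a mouse")
b = SemanticKernel("the cat", "chases", "a rat")
c = SemanticKernel("stock prices", "fell", "sharply")

sim_ab = kernel_similarity(a, b, WEIGHTS)  # same subject and verb, similar object
sim_ac = kernel_similarity(a, c, WEIGHTS)  # no shared tokens in any slot
```

With these placeholder inputs, `sim_ab` comes out higher than `sim_ac`, which is the behavior a retrieval engine built on such a measure would rely on when ranking short texts. Swapping `jaccard` for a lexical-resource or embedding-based feature would leave the aggregation step unchanged.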

Keywords: Sentence similarity, Text retrieval, Semantic kernels

Official paper URL: https://doi.org/10.1007/s10844-016-0434-3