Learning short-text semantic similarity with word embeddings and external knowledge sources
Abstract
We present a novel method for determining the degree of semantic similarity between short texts, based on interdependent representations. The method represents each short text as two dense vectors: the first is built from word-to-word similarities computed with pre-trained word vectors, and the second from word-to-word similarities derived from external knowledge sources. We also developed a preprocessing algorithm that chains coreferential named entities together and performs word segmentation to preserve the meaning of phrasal verbs and idioms. We evaluated the proposed method on three popular datasets, namely the Microsoft Research Paraphrase Corpus, STS2015, and P4PIN, and obtained state-of-the-art results on all three without using prior linguistic knowledge such as part-of-speech tags or parse trees, which indicates that interdependent representations of short text pairs are effective and efficient for semantic textual similarity tasks.
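The abstract does not give the exact construction, but the idea of representing one text through its word-to-word similarities against the other can be sketched as follows. This is a minimal illustration, assuming a toy embedding table and max-pooled cosine similarity (the embeddings and pooling choice here are hypothetical, not the paper's specification):

```python
import numpy as np

# Hypothetical toy embeddings; the paper uses pre-trained word vectors,
# these 4-dimensional vectors are for illustration only.
EMBEDDINGS = {
    "cat": np.array([0.9, 0.1, 0.0, 0.2]),
    "dog": np.array([0.8, 0.2, 0.1, 0.1]),
    "sat": np.array([0.1, 0.9, 0.3, 0.0]),
    "ran": np.array([0.2, 0.8, 0.4, 0.1]),
}

def cosine(u, v):
    """Cosine similarity between two word vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def interdependent_vector(text_a, text_b):
    """Represent text_a relative to text_b: for each word in text_a,
    keep its maximum word-to-word similarity against text_b."""
    return [
        max(cosine(EMBEDDINGS[wa], EMBEDDINGS[wb]) for wb in text_b)
        for wa in text_a
    ]

# Each text gets a dense vector that depends on the paired text,
# hence "interdependent" representations.
vec_a = interdependent_vector(["cat", "sat"], ["dog", "ran"])
```

A second vector of the same shape would be built analogously, with the cosine similarity replaced by a word-to-word similarity from an external knowledge source (e.g., a lexical database).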
Keywords: Paraphrase identification, Sentence similarity, Short text similarity, Semantic textual similarity
Article history: Received 22 August 2018, Revised 7 July 2019, Accepted 10 July 2019, Available online 25 July 2019, Version of Record 9 September 2019.
DOI: https://doi.org/10.1016/j.knosys.2019.07.013