Phrase embedding learning based on external and internal context with compositionality constraint

Authors:

Abstract

Various methods have been proposed for learning phrase embeddings, and they fall mainly into two strands. The first strand follows the distributional hypothesis: it treats a phrase as a single indivisible unit and learns its embedding from its external context, in the same way word embeddings are learned. However, distributional methods cannot exploit the information carried by a phrase's component words, and they also suffer from data sparseness. The second strand follows the principle of compositionality: it infers a phrase embedding from the embeddings of its component words. Compositional methods give erroneous results when a phrase is non-compositional. In this paper, we propose a hybrid method that linearly combines the distributional component and the compositional component under an individualized phrase compositionality constraint. The compositionality of each phrase is computed automatically from the distributional embeddings of the phrase and its component words. Evaluation on five phrase-level semantic tasks shows that the proposed method achieves the best overall performance. Most importantly, the method is more robust, as it is less sensitive to the choice of dataset.
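The hybrid idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so everything here is an assumption made for illustration: composition is taken to be the average of the component word vectors, the compositionality score is taken to be the cosine similarity between the phrase's distributional vector and that composed vector (clipped to [0, 1]), and the combination is a simple post hoc interpolation rather than a constraint applied during training. The function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def compose(word_vecs):
    """Compositional component (assumption): element-wise average of word vectors."""
    n = len(word_vecs)
    return [sum(column) / n for column in zip(*word_vecs)]


def compositionality(phrase_vec, word_vecs):
    """Per-phrase score in [0, 1] (assumption): clipped cosine between the
    distributional phrase vector and the composed vector."""
    return max(0.0, min(1.0, cosine(phrase_vec, compose(word_vecs))))


def hybrid_embedding(phrase_vec, word_vecs):
    """Linear combination weighted by the phrase's own compositionality score.

    A highly compositional phrase leans on the composed vector; a
    non-compositional phrase leans on its distributional vector.
    """
    alpha = compositionality(phrase_vec, word_vecs)
    composed = compose(word_vecs)
    return [alpha * c + (1.0 - alpha) * d for c, d in zip(composed, phrase_vec)]
```

With these assumptions, a phrase whose distributional vector is orthogonal to the composition of its words keeps its distributional vector unchanged, while a phrase whose distributional vector points the same way as the composition falls back entirely on the composed vector.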

Keywords: Phrase embedding, Compositionality, Distributional hypothesis, Composition model

Review history: Received 20 November 2017, Revised 4 April 2018, Accepted 5 April 2018, Available online 6 April 2018, Version of Record 12 May 2018.

Official paper URL: https://doi.org/10.1016/j.knosys.2018.04.009