A constrained optimization algorithm for learning GloVe embeddings with semantic lexicons

Abstract

GloVe represents words as vector embeddings in a continuous space. These embeddings are learned by factorizing a word co-occurrence matrix constructed from large corpora. Because they are high-quality textual features, GloVe embeddings have been used extensively, and with considerable success, in many text mining and natural language processing tasks. These word representations can be improved further by also exploiting the semantic properties of words and the complex relationships between them, as provided by semantic lexicons. In this paper we adopt constrained optimization techniques from machine learning to leverage the relational knowledge between words, and we propose an efficient algorithm that produces word embeddings enhanced with this semantic information. The proposed algorithm outperforms related approaches that incorporate semantic information either during training or as a post-processing step. We validate these claims with experiments on popular text mining and natural language processing tasks, including word similarity, word analogy, and sentiment analysis. The results show that the proposed model can significantly improve the quality of word vector representations.
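The abstract says GloVe learns embeddings by factorizing a co-occurrence matrix. The pure-Python sketch below illustrates that baseline objective only roughly. It builds window-based co-occurrence counts, in which a pair d tokens apart adds 1/d. It then fits the standard GloVe weighted least-squares objective, using the weighting f(x) = (x / x_max)^α with x_max = 100 and α = 0.75, as in the original GloVe paper. To show where lexicon knowledge could enter, it adds a hypothetical soft penalty λ‖w_i − w_j‖² for word pairs marked as related. This penalty is an assumption chosen for brevity. It is NOT the constrained optimization algorithm proposed in this paper. The names `train`, `lexicon_pairs`, and `lam`, and the toy corpus, are illustrative only.

```python
import math
import random
from collections import defaultdict


def cooccurrence_counts(tokens, window=2):
    """Symmetric co-occurrence counts; a pair d tokens apart adds 1/d."""
    counts = defaultdict(float)
    for i, w in enumerate(tokens):
        for d in range(1, window + 1):
            j = i + d
            if j >= len(tokens):
                break
            counts[(w, tokens[j])] += 1.0 / d
            counts[(tokens[j], w)] += 1.0 / d
    return dict(counts)


def weight(x, x_max=100.0, alpha=0.75):
    """GloVe weighting f(x): damps rare pairs, caps frequent ones at 1."""
    return (x / x_max) ** alpha if x < x_max else 1.0


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def init_params(vocab, dim, seed=0):
    rng = random.Random(seed)

    def vec():
        return [rng.uniform(-0.5, 0.5) / dim for _ in range(dim)]

    return {
        "W": {w: vec() for w in vocab},  # word vectors w_i
        "C": {w: vec() for w in vocab},  # context vectors w~_j
        "b": {w: 0.0 for w in vocab},    # word biases b_i
        "c": {w: 0.0 for w in vocab},    # context biases b~_j
    }


def loss(p, counts, lexicon_pairs, lam):
    """GloVe weighted least squares plus a HYPOTHETICAL lexicon penalty."""
    total = 0.0
    for (i, j), x in counts.items():
        err = dot(p["W"][i], p["C"][j]) + p["b"][i] + p["c"][j] - math.log(x)
        total += weight(x) * err * err
    for i, j in lexicon_pairs:
        total += lam * sq_dist(p["W"][i], p["W"][j])
    return total


def sgd_epoch(p, counts, lexicon_pairs, lam, lr):
    """One plain SGD pass over the nonzero counts, then the penalty terms."""
    W, C, b, c = p["W"], p["C"], p["b"], p["c"]
    for (i, j), x in counts.items():
        err = dot(W[i], C[j]) + b[i] + c[j] - math.log(x)
        g = 2.0 * weight(x) * err
        for k in range(len(W[i])):
            wi, cj = W[i][k], C[j][k]
            W[i][k] -= lr * g * cj
            C[j][k] -= lr * g * wi
        b[i] -= lr * g
        c[j] -= lr * g
    for i, j in lexicon_pairs:
        # Hypothetical penalty gradient: pulls related word vectors together.
        for k in range(len(W[i])):
            diff = W[i][k] - W[j][k]
            W[i][k] -= lr * 2.0 * lam * diff
            W[j][k] += lr * 2.0 * lam * diff


def train(tokens, lexicon_pairs, dim=8, epochs=50, lam=0.1, lr=0.05, seed=0):
    counts = cooccurrence_counts(tokens)
    p = init_params(sorted(set(tokens)), dim, seed)
    history = [loss(p, counts, lexicon_pairs, lam)]
    for _ in range(epochs):
        sgd_epoch(p, counts, lexicon_pairs, lam, lr)
        history.append(loss(p, counts, lexicon_pairs, lam))
    return p, history


if __name__ == "__main__":
    corpus = "the cat sat on the mat the dog sat on the rug".split()
    related = [("cat", "dog"), ("mat", "rug")]  # toy, made-up lexicon pairs
    params, history = train(corpus, related, lam=1.0)
    print(f"loss: {history[0]:.4f} -> {history[-1]:.4f}")
    print("cat-dog distance:", sq_dist(params["W"]["cat"], params["W"]["dog"]))
```

In this sketch, the penalty is added directly to the training loss. That places it loosely in the "during training" family the abstract mentions, not in the post-processing family. Reproducing the paper's constrained formulation would require details that the abstract does not give.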

Keywords: GloVe, Word embeddings, Semantic lexicons, Constrained optimization

Article history: Received 8 August 2019, Revised 5 February 2020, Accepted 6 February 2020, Available online 12 February 2020, Version of Record 4 April 2020.

Paper URL: https://doi.org/10.1016/j.knosys.2020.105628