Word representation using refined contexts

作者:Ming Zhang, Vasile Palade, Yan Wang, Zhicheng Ji

摘要

In this paper, inspired from the idea that the contextual distances and positions may have a substantial impact on distinguishing the relationships between words, a novel word association method with two weighting schemes for refining contexts, named as Weighted Point-wise Mutual Information with Contextual Distances and Positions (PMIDP), is proposed to eliminate the noisy and redundant information hidden in an imbalanced corpus. One weighting scheme, called PMIdist, revises the Point-wise Mutual Information (PMI) method by scaling the co-occurrence counts according to the distance between a word and the context. The second weighting scheme is a ratio that can measure the contextual position variation within the window of a given word. Then, the refined word association in PMIDP is defined as the multiplication of the two proposed weighting schemes, which essentially aims to flexibly adjust the word association when solving target-oriented similarity tasks. The proposed word association method has been applied on two widely known models, i.e., the positive PMI matrix with truncated Singular Vector Decomposition (PPMI-SVD) model and the Global Vectors (GloVe) model. Experimental results demonstrate that the PMIDP method can significantly improve the performances of the two models on both semantic and relational similarity tasks and show advantages when compared with other state-of-the-art models.

论文关键词:Word representation, Contextual distance, Contextual position variation, Point-wise mutual information, Singular vector decomposition

论文评审过程:

论文官网地址:https://doi.org/10.1007/s10489-021-02898-y