Neural graph embeddings as explicit low-rank matrix factorization for link prediction

作者:

Highlights:

• In this work, we extend the work of Qiu et al. (WSDM 2018) and propose a new approach to learning low-rank factorization embeddings that incorporate information from unlikely pairs of nodes.

• We apply smoothing to shifted PMI scores to enable a low-rank factorization of matrix entries that does not penalize low-frequency node pairs. We show that this correction may drastically improve accuracy.

• By considering different transformations of the PMI matrix, and by deriving matrix forms of the joint probability matrix, we demonstrate superior link prediction performance. These results lead us to argue that the linguistic collocation metric (PMI) may not be the best metric for link prediction.

• Finally, we compare our results with the state-of-the-art accuracy scores from the approach of Abu-El-Haija et al. (NeurIPS 2018), and conclude that our approach is on par. Their algorithm is using a stochastic (implicit) matrix factorization using neural networks, while ours is based on explicit SVD.

摘要

•In this work, we extend the work of Qiu et al. (WSDM 2018) and propose a new approach to learning low-rank factorization embeddings that incorporate information from unlikely pairs of nodes.•We apply smoothing to shifted PMI scores to enable a low-rank factorization of matrix entries that does not penalize low-frequency node pairs. We show that this correction may drastically improve accuracy.•By considering different transformations of the PMI matrix, and by deriving matrix forms of the joint probability matrix, we demonstrate superior link prediction performance. These results lead us to argue that the linguistic collocation metric (PMI) may not be the best metric for link prediction.•Finally, we compare our results with the state-of-the-art accuracy scores from the approach of Abu-El-Haija et al. (NeurIPS 2018), and conclude that our approach is on par. Their algorithm is using a stochastic (implicit) matrix factorization using neural networks, while ours is based on explicit SVD.

论文关键词:Graph embedding,Random walks,Matrix factorization,Information theory,Link prediction

论文评审过程:Received 26 March 2021, Revised 8 June 2022, Accepted 13 August 2022, Available online 19 August 2022, Version of Record 26 August 2022.

论文官网地址:https://doi.org/10.1016/j.patcog.2022.108977