Neural graph embeddings as explicit low-rank matrix factorization for link prediction

Highlights：

• In this work, we extend the work of Qiu et al. (WSDM 2018) and propose a new approach to learning low-rank factorization embeddings that incorporate information from unlikely pairs of nodes.

• We apply smoothing to shifted PMI scores to enable a low-rank factorization of matrix entries that does not penalize low-frequency node pairs. We show that this correction may drastically improve accuracy.

• By considering different transformations of the PMI matrix, and by deriving matrix forms of the joint probability matrix, we demonstrate superior link prediction performance. These results lead us to argue that the linguistic collocation metric (PMI) may not be the best metric for link prediction.

• Finally, we compare our results with the state-of-the-art accuracy scores from the approach of Abu-El-Haija et al. (NeurIPS 2018), and conclude that our approach is on par. Their algorithm is using a stochastic (implicit) matrix factorization using neural networks, while ours is based on explicit SVD.

摘要

•In this work, we extend the work of Qiu et al. (WSDM 2018) and propose a new approach to learning low-rank factorization embeddings that incorporate information from unlikely pairs of nodes.•We apply smoothing to shifted PMI scores to enable a low-rank factorization of matrix entries that does not penalize low-frequency node pairs. We show that this correction may drastically improve accuracy.•By considering different transformations of the PMI matrix, and by deriving matrix forms of the joint probability matrix, we demonstrate superior link prediction performance. These results lead us to argue that the linguistic collocation metric (PMI) may not be the best metric for link prediction.•Finally, we compare our results with the state-of-the-art accuracy scores from the approach of Abu-El-Haija et al. (NeurIPS 2018), and conclude that our approach is on par. Their algorithm is using a stochastic (implicit) matrix factorization using neural networks, while ours is based on explicit SVD.