From biological to social networks: Link prediction based on multi-way spectral clustering

作者:

Highlights:

摘要

Link prediction in protein–protein interaction networks (PPINs) is an important task in biology, since the vast majority of biological functions involve such protein interactions. Link prediction is also important for online social networks (OSNs), which provide predictions about who is a friend of whom. Many link prediction methods for PPINs/OSNs are local-based and do not exploit all network structure, which limits prediction accuracy. On the other hand, there are global approaches to detect the overall path structure in a network, being computationally prohibitive for huge-size PPINs/OSNs. In this paper, we enhance a previously proposed multi-way spectral clustering method by introducing new ways to capture node proximity in both PPINs/OSNs. Our new enhanced method uses information obtained from the top few eigenvectors of the normalized Laplacian matrix. As a result, it produces a less noisy matrix, which is smaller and more compact than the original one. In this way, we are able to provide faster and more accurate link predictions. Moreover, our new spectral clustering model is based on the well-known Bray–Curtis coefficient to measure proximity between two nodes. Compared to traditional clustering algorithms, such as k-means and DBSCAN, which assume globular (convex) regions in Euclidean space, our approach is more flexible in capturing the non-connected components of a social graph and a wider range of cluster geometries. We perform an extensive experimental comparison of the proposed method against existing link prediction algorithms and k-means algorithm, using two synthetic data sets, three real social networks and three real human protein data sets. Our experimental results show that our SpectralLink algorithm outperforms the local approaches, the k-means algorithm and another spectral clustering method in terms of effectiveness, whereas it is more efficient than the global approaches.

论文关键词:Cross-disciplinary applications,Information filtering,Clustering,Bioinformatics databases

论文评审过程:Received 25 February 2012, Revised 16 May 2013, Accepted 16 May 2013, Available online 28 May 2013.

论文官网地址:https://doi.org/10.1016/j.datak.2013.05.008