DFuzzy: a deep learning-based fuzzy clustering model for large graphs

Authors: Vandana Bhatia, Rinkle Rani

Abstract

Graph clustering has been applied successfully in many applications to find similar patterns. Recently, deep learning-based autoencoders have been used to detect disjoint clusters efficiently. In real-world graphs, however, a vertex may belong to multiple clusters, so it is necessary to analyze the degree of membership of each vertex toward each cluster. Furthermore, existing approaches are centralized and handle large graphs inefficiently. In this paper, a deep learning-based model, ‘DFuzzy’, is proposed for finding fuzzy clusters in large graphs in a distributed environment. It performs clustering in three phases. In the first phase, pre-training is performed by initializing the candidate cluster centers. In the second phase, fine-tuning is performed to learn the latent representations by mining local information and capturing the graph structure using PageRank; modularity is then used to redefine the clusters. In the last phase, the reconstruction error is minimized and the final cluster centers are updated. Experiments are performed on real-life graph data, and the performance of DFuzzy is compared with four state-of-the-art clustering algorithms. The results show that DFuzzy scales linearly to handle large graphs and produces higher-quality clusters than these algorithms. It is also observed that deep structures help in obtaining better graph representations and improve clustering performance.
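
As a rough illustration of two ingredients the abstract mentions, PageRank scores and fuzzy (soft) cluster membership, the following minimal Python sketch may help. It is not the DFuzzy implementation: the function names `pagerank` and `fuzzy_memberships`, the damping factor 0.85, the fuzzifier `m = 2`, and the standard fuzzy c-means membership formula are all assumptions chosen for illustration, and the autoencoder, modularity, and distributed (Pregel) parts are omitted entirely.

```python
from math import dist


def pagerank(adj, damping=0.85, iters=100):
    """Power-iteration PageRank on an adjacency dict {node: [neighbors]}.

    The damping factor and iteration count are illustrative defaults.
    """
    nodes = list(adj)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        # Every node starts each round with the teleportation share.
        new = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            out = adj[v]
            if out:
                share = damping * rank[v] / len(out)
                for u in out:
                    new[u] += share
            else:
                # Dangling node: spread its rank uniformly over all nodes.
                for u in nodes:
                    new[u] += damping * rank[v] / n
        rank = new
    return rank


def fuzzy_memberships(point, centers, m=2.0):
    """Fuzzy c-means style membership of one point toward each center.

    Returns one value per center; the values sum to 1. This formula is an
    assumed stand-in, not necessarily the membership rule used by DFuzzy.
    """
    d = [dist(point, c) for c in centers]
    if any(x == 0.0 for x in d):
        # The point coincides with one or more centers: split membership
        # equally among exactly those centers.
        hits = [1.0 if x == 0.0 else 0.0 for x in d]
        total = sum(hits)
        return [h / total for h in hits]
    power = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d[i] / d[j]) ** power for j in range(len(d)))
        for i in range(len(d))
    ]


# Toy undirected graph: two triangles (0-1-2 and 3-4-5) joined by edge 2-3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
ranks = pagerank(adj)

# Soft membership of a 2-D point toward two hypothetical cluster centers.
memberships = fuzzy_memberships((0.0, 0.0), [(1.0, 0.0), (3.0, 0.0)])
```

In this toy graph the two bridge vertices (2 and 3) receive the highest PageRank scores, which is the kind of structural signal that could be used to pick candidate centers, while `memberships` shows how a single vertex embedding can belong partially to more than one cluster.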

Keywords: Fuzzy clustering, PageRank, Deep learning, Large graphs, Pregel

Paper URL: https://doi.org/10.1007/s10115-018-1156-3