A Dynamic Programming Framework for Large-Scale Online Clustering on Graphs

作者:Yantao Li, Xiang Zhao, Zehui Qu

摘要

As a fundamental technique for data analysis, graph clustering grouping graph data into clusters has attracted great attentions in recent years. In this paper, we present DPOCG, a dynamic programming framework for large-scale online clustering on graphs, which improves the scalability of a wide range of graph clustering algorithms. Specifically, DPOCG first identifies the nodes whose states are unchanged compared with the states at the previous time on a large-scale graph, then constructs these unchanged nodes as supernodes, which greatly reduces the size of the graph at the current time, and collapses nodes whose degrees are less than a predefined threshold. Based on our density-based graph clustering algorithm (DGCM), DPOCG partitions the reduced graph into clusters. In addition, we theoretically analyze DPOCG in terms of supernode generation, clustering on reduced graph, and computational complexity. We evaluate DPOCG on a synthetic dataset and seven real-world datasets, respectively, and the experimental results show that DPOCG consumes less running time and improves the efficiency of clustering.

论文关键词:Online graph clustering, Large-scale graphs, supernodes, Running time, Efficiency

论文评审过程:

论文官网地址:https://doi.org/10.1007/s11063-020-10329-1