Practical and high-quality partitioning algorithm for large-scale and time-evolving graphs

作者:

Highlights:

摘要

With the appearance of widespread large-scale graph data, distributed graph processing grows in popularity. In order to store and analyze a large-scale graph in a distributed manner, the graph should be properly partitioned in advance. Although the partitioning of static graphs has been sufficiently investigated, the emerging graphs nowadays are typically dynamic or time-evolving. State-of-the-art partitioning algorithms for such large-scale and time-evolving graphs either are impractical because of the complexity caused by the global optimization or sacrifice partitioning quality due to a huge number of cross partition edges. This paper investigates the problem and proposes a new practical and high-quality graph partitioning algorithm. First, graph updates accumulated in a time interval are expressed as a delta graph. Second, vertices in the delta graph are classified into several subsets based on a non-overlapping community detection method. Third, vertices in the same subset are assigned to a properly chosen partition as a whole. The chosen partition is lightly loaded as well as closely related with vertices in the subset. Experimental results show that compared with the widely used hash-based and heuristic-based partitioning algorithms, our proposed algorithm gains significant decrease in terms of the number of cross partition edges while maintaining a similar level of load balance.

论文关键词:Time-evolving graph,Graph partitioning,Distributed graph processing,Cross partition edge,Load balance

论文评审过程:Received 18 January 2021, Revised 12 May 2021, Accepted 9 June 2021, Available online 10 June 2021, Version of Record 24 June 2021.

论文官网地址:https://doi.org/10.1016/j.knosys.2021.107211