Scalable distributed reachability query processing in multi-labeled networks

作者:

Highlights:

摘要

Testing reachability in a graph gains substantial interest as an important operation in network analysis and graph mining. In its simplest form, a reachability query is defined by a pair of nodes (u, v) and a graph G, and detects if there is a path from u to v. This paper addresses a specific case of reachability on multi-labeled distributed graphs, where the query is parameterized by a set of source nodes S, a set of target nodes T and a set of constraints C on the edge labels. We conduct a performance evaluation on both synthetic and real-world datasets, using multiple instances of Neo4j servers (as workers) running simultaneously. The results show that the number of workers, the network density and the number of cross-edges have a significant impact on the overall performance. Moreover, we observe that the proposed approach is scalable and can be used to solve label-constrained distributed set reachability queries in multi-labeled networks.

论文关键词:Graph mining,Reachability queries,Distributed algorithms

论文评审过程:Received 28 June 2019, Revised 31 August 2020, Accepted 3 September 2020, Available online 8 September 2020, Version of Record 25 November 2020.

论文官网地址:https://doi.org/10.1016/j.datak.2020.101854