Query processing on large graphs: Approaches to scalability and response time trade offs

作者:

Highlights:

摘要

Graphs, being an expressive data structure, have become increasingly important for modeling real-world applications, such as collaboration, different kinds of transactions, social networks, to name a few. With the advent of social networks and the web, the graph sizes have grown too large to fit in main memory precipitating the need for alternative approaches for an efficient, scalable evaluation of queries on graphs of any size.In this paper, we use the time-tested “divide and conquer” approach by partitioning a graph into desired number of partitions (and possibly with appropriate characteristics) and process queries over those partitions to obtain all or specified number of answers. This entails correctly computing answers that span multiple partitions or even need the same partition more than once. Given a set of partitions, there are a number of approaches using which a query can be evaluated: (i) One Partition At a Time (OPAT) approach, (ii) Traditional use of Multiple Processors (TraditionalMP), and (iii) using the Map/Reduce Multi-Processor approach (MapReduceMP) approach. The first approach, detailed in this paper, has established scalability through independent processing of partitions. The other two approaches address response time in addition to scalability. For the OPAT query evaluation approach, necessary minimal book keeping has been identified and its correctness established in this paper. Query answering on partitioned graphs also requires analyzing partitioning schemes for their impact on query processing and determining the number as well as the sequence in which partitions need to be loaded to reduce the response time for processing queries. We correlate query properties and partition characteristics to reduce query processing time in terms of the resources available.We also identify a set of quantitative metrics and use them for formulating heuristics to determine the order of loading partitions for efficient query processing. For OPAT approach, extensive experiments on large graphs (synthetic and real-world) using different partitioning schemes analyze the proposed heuristics on a variety of query types. The other two approaches are fleshed out, analyzed, and contrasted with the OPAT approach. An existing graph querying system has been extended to evaluate queries on partitioned graphs. Finally all three approaches are compared for their strengths and weaknesses.

论文关键词:Graph query processing,Plan generation,Query evaluation on partitioned graphs,Scalability,Map/Reduce

论文评审过程:Available online 2 September 2019, Version of Record 9 April 2020.

论文官网地址:https://doi.org/10.1016/j.datak.2019.101736