- Main
Runtime Optimizations for Evaluating Batches of Graph Queries
- Xu, Chengshuo
- Advisor(s): Gupta, Rajiv
Abstract
Graph processing frameworks are typically designed to optimize the evaluation of a single graph query. However, in practice, we often need to respond to multiple graph queries, either from different users or from a single user performing a complex analytics task. This thesis is aimed at simultaneously evaluating batches of graph queries of two types: point-to-all and point-to-point. By fully utilizing system resources, batched evaluation amortizes runtime overheads incurred due to fetching vertices and edge lists, synchronizing threads, and maintaining computation frontiers. In addition, new runtime optimizations are developed that enable faster evaluation of batches of queries than their independent and one-by-one evaluation.
In the context of point-to-all queries, we develop the sharing optimization that dynamically identifies shared queries that substantially represent subcomputations in the evaluation of different queries in a batch, evaluates the shared queries, and then uses their results to accelerate the evaluation of all queries in the batch. The resulting SimGQ system delivers substantial speedups over a conventional framework that evaluates and responds to queries one by one. We have also adapted the batching principles used by SimGQ to the streaming graph scenario in which we continuously maintain the results of a small batch of shared queries and use them for low-cost evaluation of arbitrary user queries.
For point-to-point queries we have identified unique characteristics of such queries and based upon them developed two new optimizations. The first optimization, online pruning, eliminates propagation from vertices that are determined to not contribute to a query’s final solution and thus enables early termination. The second optimization, dynamic direction prediction, dynamically selects the direction in which to evaluate the query – either forward (from source) or backward (from destination) – as their costs can differ greatly. The resulting system, PnP, delivers substantial performance benefits over the state-of-the-art. To solve a batch of point-to-point queries, we extended this system by incorporating the batching principles in SimGQ along with a new query aggregation technique that eliminates the redundant computation across point-to-point queries that share the same source or destination vertex.
Main Content
Enter the password to open this PDF file:
-
-
-
-
-
-
-
-
-
-
-
-
-
-