- Main
Reuse Techniques for Efficiently Evaluating a Sequence of Iterative Graph Queries
- Jiang, Xiaolin
- Advisor(s): Gupta, Rajiv
Abstract
Graph analytics is employed in many domains (e.g., social networks, web graphs) to uncover insights by analyzing high volumes of connected data. Real world graphs are often large and iterative graph analytics requires repeated passes over the graph till the algorithm converges to a stable solution. As a result, iterative graph queries are expensive to evaluate. Thus, there has been a great deal of interest in developing scalable and efficient graph analytics systems for various platforms (e.g., GPUs, multicore servers, clusters). While much of the research has focused on developing algorithms for efficiently evaluating a single global graph query, in practice we may be faced with a sequence of queries. This research develops two platform independent complimentary reuse based approaches, value reuse and graph structure reuse, that collect information from executing a small number of queries and then use this information to speedup the evaluation of all future queries.
The first approach is based on the observation that, due to their global nature, vertex specific graph queries present an opportunity for sharing work across queries. To take advantage of this opportunity, we have developed the VRGQ framework that accelerates the vi evaluation of a stream of queries via coarse-grained value reuse. In particular, the results of queries for a small set of source vertices are reused to speedup the evaluation of all future queries. We present a two step algorithm that in its first step initializes the query result based upon value reuse and then in the second step iteratively evaluates the query to convergence. The reused results for a small number of queries are held in a reuse table. Our experiments with best reuse configurations on four power law graphs and thousands of graph queries of five kinds yielded average speedups of 143×, 13.2×, 6.89×, 1.43×, and 1.18×. We have also extended the above approach to evaluation of queries over streaming graphs and simultaneous evaluation of a batch of graph queries.
The second approach is based upon graph structure reuse. We have developed a new technique for identifying a proxy graph named the Core Graph (CG) that is not only small, it also produces highly precise results. A CG is a subgraph of the larger input graph that contains all vertices but on average contains only 10.7% of edges and yet produces precise results for 94.5–99.9% vertices in the graph. The identification of such an effective CG is based on our key new insight, namely, a small subset of critical edges are responsible for determining the converged results of nearly all the vertices across different queries. We develop algorithms for identifying a CG and demonstrate its benefits by improving the performance of three systems: Subway for GPU-based graph processing, GridGraph for out- of-core disk-based processing, and Ligra for in-memory graph processing. In our experiments with six kinds of graph queries and four large input graphs, we achieved speedups of up to 13.62× in GridGraph and 4.97× in Ligra.
Main Content
Enter the password to open this PDF file:
-
-
-
-
-
-
-
-
-
-
-
-
-
-