Vertex-centric Parallel Computation of SQL Queries
In response to the increased interest in analyzing graph data, a large number of graph processing systems have been developed. In particular, systems that implement the vertex-centric bulk synchronous parallel (BSP) computational model, where vertices can run a user-defined program in parallel and communicate with each other via messages. SQL is known to be well suited for parallelism, this motivates us to investigate whether we can exploit the vertex-centric parallelism to evaluate SQL queries.
In this dissertation we present a scheme for parallel execution of SQL queries on top of any vertex-centric BSP graph processing engine. The scheme comprises a graph encoding of relational instances and a vertex program specification of our algorithm called TAG-join, which matches the theoretical communication and computation complexity of state-of-the-art join algorithms. When run on top of the vertex-centric TigerGraph database engine on a single multi-core server, TAG-join exploits thread parallelism and is competitive with (and often outperforms) reference RDBMSs on the TPC benchmarks they are traditionally tuned for. In a distributed cluster, TAG-join shows competitive performance with popular distributed SQL engines.