A Declarative Language for Advanced Analytics and its Scalable Implementation
- Author(s): Shkapsky, Alexander Philip
- Advisor(s): Zaniolo, Carlo; et al.
Advanced analytics are used to discover hidden patterns and trends in massive datasets. Researchers have made great strides in providing computational models, systems, and accompanying languages for analytics. However, there is still a dire need for highly expressive declarative languages that enable the compilation, optimization, and evaluation of advanced analytics over massive datasets. Specifically, a language for analytics needs (i) to support the expression of analytics over multiple data models, (ii) to provide high-level declarative constructs that enable system optimizations, and (iii) to be conducive to iterative or recursive evaluation.
In this dissertation, we propose an expressive Datalog language for advanced analytics, and compilation and optimization techniques for its efficient evaluation on systems designed for iterative execution. Specifically, this dissertation makes two main contributions:
(i) We develop and demonstrate a next-generation Datalog system, the Deductive Application Language System (DeALS). To extend the range of analytics supported in DeALS, we add support for aggregation in recursion to our logic-based language. We propose the design and implementation of several monotonic aggregates that can be used in recursive Datalog rules and evaluated efficiently using our novel optimization techniques. We demonstrate the effectiveness of these aggregates, and an experimental comparison with other Datalog systems shows that DeALS combines superior generality with superior performance.
(ii) We design and implement BigDatalog, a Datalog system on Apache Spark, for large-scale advanced analytics. BigDatalog is implemented for efficient distributed evaluation and uses communication-reduction techniques during evaluation. We propose compilation and optimization techniques, as well as job scheduling techniques, to efficiently support the evaluation of DeAL programs on Spark. We conduct an experimental comparison with other state-of-the-art large-scale Datalog systems and demonstrate the efficacy of our techniques and the effectiveness of our Spark extensions in supporting Datalog-based analytics.
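To make the idea of aggregation in recursion from contribution (i) more concrete, the following is a minimal Python sketch of a recursive shortest-path query in which a min aggregate is applied inside the fixpoint rather than after it. It is an illustration only, not the DeALS or BigDatalog implementation: the rule notation in the docstring, the function name `shortest_paths`, and the delta-based (semi-naive style) loop are all assumptions chosen for the example, and the sketch assumes strictly positive edge weights so that the iteration terminates.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str, float]  # (source, destination, weight)


def shortest_paths(edges: List[Edge]) -> Dict[Tuple[str, str], float]:
    """Evaluate a recursive min-aggregate query to a fixpoint.

    Illustrative (hypothetical) rule notation:
        path(X, Y, min<D>) <- edge(X, Y, D).
        path(X, Z, min<D>) <- path(X, Y, D1), edge(Y, Z, D2), D = D1 + D2.
    """
    # Index edges by source node so the recursive rule can join efficiently.
    outgoing: Dict[str, List[Tuple[str, float]]] = {}
    for src, dst, weight in edges:
        outgoing.setdefault(src, []).append((dst, weight))

    best: Dict[Tuple[str, str], float] = {}   # current minimum per (X, Y)
    delta: Dict[Tuple[str, str], float] = {}  # facts improved in the last round

    # Base rule: every edge is a candidate path.
    for src, dst, weight in edges:
        if weight < best.get((src, dst), float("inf")):
            best[(src, dst)] = weight
            delta[(src, dst)] = weight

    # Recursive rule: extend only the paths that improved in the previous
    # round, and keep a derived distance only if it lowers the current minimum.
    while delta:
        new_delta: Dict[Tuple[str, str], float] = {}
        for (x, y), d1 in delta.items():
            for z, d2 in outgoing.get(y, []):
                d = d1 + d2
                if d < best.get((x, z), float("inf")):
                    best[(x, z)] = d
                    new_delta[(x, z)] = d
        delta = new_delta

    return best


if __name__ == "__main__":
    graph = [("a", "b", 1.0), ("b", "c", 2.0), ("a", "c", 5.0)]
    for pair, dist in sorted(shortest_paths(graph).items()):
        print(pair, dist)
```

Because the minimum for each pair can only decrease from one round to the next, the aggregate behaves monotonically with respect to the iteration, which is what lets it sit inside the recursion in this sketch instead of being computed over all enumerated paths afterwards.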