Open Access Publications from the University of California


UCLA Electronic Theses and Dissertations

Bridging the Gap Between Application Logic and Auto-optimization in Modern Data Analytics


Recent decades have seen an explosion in the diversity and scale of data analytics tasks. While data analysis of the late 20th century was characterized by the dominance of relational databases and highly structured querying languages, demand for less structured and more complex tasks has resulted in new data analytics frameworks that break with the norms of historical systems.

This growth has come at the cost of breaking with the assumptions that guided automated optimization in historical systems. Automated optimizations analyze the input program of a system and extract insights that allow the system to better execute a given task with little to no human effort. Absent these features, analysts must manually tune data processing frameworks to achieve reasonable performance, a delicate and time-consuming endeavor.

In this thesis, I argue that automated optimization techniques, such as caching and physical design, that have long been deployed in relational frameworks remain both relevant and necessary to achieving sustainable performance in modern systems. Via two projects, each targeting a different class of analysis tasks, I identify new techniques to bridge the gap between modern data analytics frameworks and established automated optimization techniques.
