UC Santa Cruz
Physical Design Tuning Methods for Emerging System Architectures
- Author(s): LeFevre, Jeff
- Advisor(s): Polyzotis, Neoklis
- et al.
Physical design tuning (i.e., configuring the physical data model and secondary data structures) is key to obtaining good performance for a database management system. Recently, there have been disruptions to the types of data, queries, and systems used for analytical data management and processing. The advent of the cloud and database-as-a-service, along with these recent disruptions, has changed the analytical landscape significantly. Physical design tuning methods that were developed to address traditional RDBMS architectures are inadequate for the emerging system architectures and analytics. While tuning the physical design remains crucial for good performance, the problem acquires interesting dimensions in these new contexts.
This dissertation introduces new physical design tuning methods for emerging system architectures. Our methods exploit the unique characteristics of several different architectures, leveraging them toward good physical design. For replicated database architectures, we introduce the concept of divergent design that exploits database replication. Our method specializes the design of replicas to efficiently process subsets of the workload while still affording the opportunity to load balance the workload across replicas. For MapReduce architectures, we introduce the concept of opportunistic design that exploits the by-products of query processing in MapReduce. Our method includes a semantic model for user-defined functions (UDFs) and a novel query rewriting algorithm that together enable the effective reuse of previous results. For hybrid MapReduce-RDBMS architectures, we introduce the concept of multistore design that exploits the unique strengths of both stores. Our method periodically reorganizes the data in each store by transferring data between them, adapting their physical designs as the workload changes dynamically. Lastly, based on our experiences with the changing analytical landscape, we examine big data exploratory queries in current real-world scenarios and present a workload modeled upon our findings. We then present a benchmark along with a set of system performance metrics that highlight the various design tradeoffs made by different architectures. By making these tradeoffs easier to understand, the benchmark can help guide the design of future hybrid architectures.