eScholarship
Open Access Publications from the University of California

UC Irvine

UC Irvine Electronic Theses and Dissertations

Optimizing Repartitioning Parallel Sort in AsterixDB

Abstract

As big data evolves, more and more databases are incorporating a parallel architecture. Sorting a dataset on a key value is a required operation of any database. The hope is that a linear increase in the computing power given to a database results in a linear increase in performance. AsterixDB is one such parallel database, and its sort operator is Repartitioning Parallel Sort (RPS).

The goal of this thesis is to optimize RPS so that it fully utilizes the parallel nature of AsterixDB in all cases. Currently, the sort operator performs poorly when the sort attribute of the input dataset is skewed toward one or more repeated values. In this thesis, we first discuss the current state of sorting in AsterixDB and the problems associated with it. Second, we present a proposed optimization to the sort operator. Third, we compare the old approach with the new one through performance testing. Finally, we discuss future work that could further improve sorting in AsterixDB.
