Optimizing Repartitioning Parallel Sort in AsterixDB
- Lychagin, Mikhail Dmitriyevich
- Advisor(s): Carey, Michael J
Abstract
As big data evolves, more and more databases are adopting a parallel architecture. Sorting a dataset on a key value is a required operation in any database, and the hope is that a linear increase in the computing power given to a database yields a linear increase in performance. One such database is AsterixDB, which uses Repartitioning Parallel Sort (RPS) as its sort operator.
The goal of this thesis is to optimize RPS so that it fully utilizes the parallel nature of AsterixDB in all cases. Currently, the sort operator performs poorly when the input dataset's sort attribute is skewed, with many records sharing one or more identical values. In this thesis, we first discuss the current state of sorting in AsterixDB and the problems associated with it. Second, we go over a proposed optimization to the sort operator. Third, we compare the old approach with the new one through performance testing. Finally, we discuss future work that could further improve sorting in AsterixDB.