Optimizing External Parallel Sorting in AsterixDB
In parallel database systems, each operation in a query plan is typically parallelized when the query is evaluated. This allows a large cluster of machines to process a vast amount of data efficiently, since all the machines work simultaneously. AsterixDB, a big data management system, uses this approach to parallelize operations when executing a query, and sort is one such operation.
This work attempts to optimize the sort operation in AsterixDB so that the entire sort process runs in parallel from start to finish, without the need for an additional final merge step at the end of the sort operation. We state the problem and highlight some of the issues with the current approach to sorting. We then present the architecture and high-level design of the proposed optimization, and describe the implementation details and how the different components interact with each other. We conclude with the results of a performance evaluation comparing the sort operation before and after this optimization.