UC San Diego
Achieving Efficient I/O with High-Performance Data Center Technologies
- Author(s): Conley, Michael Aaron
- et al.
Recently there has been a significant effort to build systems designed for large-scale data processing, or "big data." These systems are capable of scaling to thousands of nodes, and offer large amounts of aggregate processing throughput. However, there is a severe lack of attention paid to the efficiency of these systems, with individual hardware components operating at speeds as low as 3% of their available bandwidths. In light of this observation, we aim to demonstrate that efficient data-intensive computation is not only possible, but also results in high levels of overall performance. In this work, we describe two highly efficient data processing systems, TritonSort and Themis, built using 2009-era cluster technology. We evaluate the performance of these systems and use them to set world records in high-speed sorting. Next, we consider newer, faster hardware technologies that are not yet widely deployed. We give a detailed description of the design decisions and optimizations necessary for efficient data-intensive computation on these technologies. Finally, we apply these optimizations to large-scale data-processing applications running in the public cloud, and once again set world records in high-speed sorting. We present the details of our experience with the Amazon Web Services (AWS) cloud and also explore Google Cloud Platform