eScholarship
Open Access Publications from the University of California

Sorting 100 TB on Google Compute Engine

Abstract

Google Compute Engine offers a high-performance, cost-effective means for running I/O-intensive applications. This report details our experience running large-scale, high-performance sorting jobs on GCE. We run sort applications up to 100 TB in size on clusters of up to 299 VMs, and find that we are able to sort data at or near the hardware capabilities of the locally attached SSDs. In particular, we sort 100 TB on 296 VMs in 915 seconds at a cost of $154.78. We compare this result to our previous sorting experience on Amazon Elastic Compute Cloud and find that Google Compute Engine can deliver similar levels of performance. Although individual EC2 VMs have higher levels of performance than GCE VMs, permitting significantly smaller cluster sizes on EC2, we find that the total dollar cost that the user pays on GCE is 48% less than the cost of running on EC2.
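To put the headline figures in perspective, the short sketch below derives aggregate throughput, per-VM throughput, and cost per terabyte from the numbers quoted in the abstract. It assumes decimal units (1 TB = 10^12 bytes), which the abstract does not state, and it counts each byte of input once, so the rates describe sorted data per second rather than raw disk I/O. The function name `sort_metrics` and the dictionary keys are illustrative only.

```python
# Figures quoted in the abstract.
DATA_TB = 100      # data sorted, in terabytes
VMS = 296          # cluster size for the 100 TB run
SECONDS = 915      # elapsed sort time
COST_USD = 154.78  # reported dollar cost of the run

# Assumption (not stated in the abstract): decimal units, 1 TB = 10**12 bytes.
BYTES_PER_TB = 10**12


def sort_metrics(data_tb, vms, seconds, cost_usd):
    """Derive throughput and unit cost from a sort run's headline numbers."""
    total_bytes = data_tb * BYTES_PER_TB
    aggregate_bytes_per_s = total_bytes / seconds
    return {
        "aggregate_gb_per_s": aggregate_bytes_per_s / 1e9,
        "per_vm_mb_per_s": aggregate_bytes_per_s / vms / 1e6,
        "usd_per_tb": cost_usd / data_tb,
    }


metrics = sort_metrics(DATA_TB, VMS, SECONDS, COST_USD)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Under these assumptions the run works out to roughly 109 GB/s across the cluster, about 369 MB/s of sorted data per VM, and about $1.55 per terabyte sorted.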

Pre-2018 CSE ID: CS2015-1013
