Chen, Yuxin; Brock, Benjamin; Porumbescu, Serban; Buluç, Aydin; Yelick, Katherine; Owens, John D

doi:10.1109/sc41404.2022.00055

Scalable Irregular Parallelism with GPUs: Getting CPUs Out of the Way

2022

Published Web Location

https://doi.org/10.1109/sc41404.2022.00055

Creative Commons 'BY' version 4.0 license

Abstract

We present Atos, a dynamic scheduling framework for multi-node-GPU systems that supports PGAS-style lightweight one-sided memory operations within and between nodes. Atos's lightweight GPU-to-GPU communication enables latency hiding and can smooth interconnection usage for bisection-limited problems. These benefits are significant for dynamic, irregular applications, which often involve fine-grained communication at unpredictable times and without predetermined patterns. We highlight three principles for high performance: (1) do not involve the CPU in the communication control path; (2) allow GPU communication within kernels, addressing memory consistency directly rather than relying on synchronization with the CPU; (3) perform dynamic communication aggregation when interconnections have limited bandwidth. By lowering the overhead of communication and allowing it within GPU kernels, we support large, high-utilization GPU kernels while communicating more frequently. We evaluate Atos on two irregular problems: breadth-first search (BFS) and PageRank. Atos outperforms the state-of-the-art graph libraries Gunrock, Groute, and Galois in both single-node-multi-GPU and multi-node-GPU settings.
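
Principle (3) in the abstract, dynamic communication aggregation, can be illustrated with a small host-side simulation. The sketch below is not the Atos implementation or its API: the class `AggregatingSender`, the `transport` callback, and the fixed `batch_size` threshold are all hypothetical names chosen for illustration, and the real system performs this kind of batching on the GPU rather than in Python. It only shows the general idea of buffering fine-grained updates per destination and sending each buffer as a single larger transfer.

```python
from collections import defaultdict


class AggregatingSender:
    """Buffers small updates per destination and sends them in batches.

    Hypothetical illustration only; this is not the Atos API.
    """

    def __init__(self, batch_size, transport):
        self.batch_size = batch_size      # updates per destination before a flush
        self.transport = transport        # callable(dest, list_of_updates)
        self.buffers = defaultdict(list)  # dest -> pending updates

    def put(self, dest, update):
        """Queue one fine-grained update; flush once the batch is full."""
        buf = self.buffers[dest]
        buf.append(update)
        if len(buf) >= self.batch_size:
            self.flush(dest)

    def flush(self, dest):
        """Send everything pending for one destination as a single transfer."""
        buf = self.buffers[dest]
        if buf:
            self.transport(dest, list(buf))
            buf.clear()

    def flush_all(self):
        """Drain the remaining partial batches for every destination."""
        for dest in list(self.buffers):
            self.flush(dest)


def run_demo():
    transfers = []  # records (dest, batch) for each simulated transfer
    sender = AggregatingSender(
        batch_size=4,
        transport=lambda dest, batch: transfers.append((dest, batch)),
    )
    # Ten single-vertex updates, alternating between two simulated remote GPUs.
    for vertex in range(10):
        sender.put(dest=vertex % 2, update=vertex)
    sender.flush_all()
    return transfers


if __name__ == "__main__":
    for dest, batch in run_demo():
        print(f"transfer to GPU {dest}: {batch}")
```

In this toy run, ten individual updates reach their destinations in four transfers instead of ten. The trade-off is that an update may wait in a buffer until its batch fills or is flushed, which is why aggregation is attractive mainly when the interconnect is bandwidth-limited.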

UC Berkeley
