eScholarship
Open Access Publications from the University of California

Unsteady Turbulent Simulations on a Cluster of Graphics Processors

Abstract

This paper describes the GPU acceleration of the MBFLO2 multi-block turbulent flow solver, implemented entirely in double precision using CUDA on the latest generation of GPUs. On a cluster of 8 Tesla C2050 "Fermi" GPUs and Intel Xeon X5550 "Nehalem" quad-core CPUs, we achieve a 9x speedup over the parallel CPU solver and a 70x speedup over the serial solver. High performance is obtained by optimizing the data layout on the GPU, optimizing data transfers, and using asynchronous memory copies to overlap GPU execution with communication. We test the solver on a turbulent flat plate and on unsteady turbulent flow past a cylinder with 3.2 million grid points. We confirm that the GPU results agree with turbulent flow theory, and we discuss the GPU optimization techniques used to reach this level of performance.
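
As a reading aid, the two reported speedups can be related to each other. The sketch below assumes that both figures were measured against the same GPU run on the same problem, which the abstract does not state explicitly; the symbols T_serial, T_parallel and T_GPU are introduced here for illustration only and do not come from the paper.

```latex
% Assumption: both speedups refer to the same GPU wall-clock time T_GPU.
\[
  \frac{T_{\mathrm{serial}}}{T_{\mathrm{GPU}}} \approx 70,
  \qquad
  \frac{T_{\mathrm{parallel}}}{T_{\mathrm{GPU}}} \approx 9
  \quad\Longrightarrow\quad
  \frac{T_{\mathrm{serial}}}{T_{\mathrm{parallel}}} \approx \frac{70}{9} \approx 7.8 .
\]
```

Under that assumption, the parallel CPU solver would itself be roughly 7.8 times faster than the serial solver.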
