Computational Fluid Dynamics (CFD) is a popular tool in engineering applications and an active research area in fluid mechanics, yet today's state-of-the-art CFD solvers do not perform adequately on modern architectures. Designing an efficient CFD solver that takes advantage of the computational resources available on both CPU and GPU architectures requires a comprehensive analysis of the solver that accounts for the architectural differences between these processing units. In this thesis, we present our efforts in designing an efficient and scalable implementation of a solver for compressible viscous flow at transonic speeds.
Stencil computation, the core computational pattern of these solvers, has been extensively studied and optimized for different architectures. However, these optimizations have mostly focused on applications that consist primarily of a single stencil pattern. We tailor these optimizations to our specific application, which involves multiple stencil patterns with different characteristics. Our optimizations also include unique numerical and application-specific improvements, while building on well-known parallelization techniques for GPU and multicore CPU implementations.