Understanding and Mitigating Multicore Performance Issues on the AMD Opteron
Over the past 15 years, microprocessor performance has doubled approximately every 18 months through increased clock rates and processing efficiency. In the past few years, clock frequency growth has stalled, and microprocessor manufacturers such as AMD have moved towards doubling the number of cores every 18 months in order to maintain historical growth rates in chip performance. This document investigates the ramifications of multicore processor technology for the new Cray XT4 systems based on AMD processor technology. We begin by walking through the AMD single-core, dual-core, and upcoming quad-core processor architectures. This is followed by a discussion of methods for collecting performance counter data to understand code performance on the Cray XT3 and XT4 systems. We then use the performance counter data to analyze the impact of multicore processors on the performance of microbenchmarks such as STREAM, application kernels such as the NAS Parallel Benchmarks, and the full application codes that comprise the NERSC-5 SSP benchmark suite. We explore compiler options and software optimization techniques that mitigate the memory bandwidth contention that can reduce computing efficiency on multicore processors. The last section provides a case study of applying the dual-core optimizations to the NAS Parallel Benchmarks to dramatically improve their performance.