Lawrence Berkeley National Laboratory
Identifying performance bottlenecks on modern microarchitectures using an adaptable probe
Author(s):
- Griem, Gorden
- Oliker, Leonid
- Shalf, John
- Yelick, Katherine
- et al.
The gap between peak and delivered performance for scientific applications running on microprocessor-based systems has grown considerably in recent years. The inability to achieve the desired performance, even on a single processor, is often attributed to an inadequate memory system, but usually without identifying or quantifying a specific bottleneck. In this work, we use an adaptable synthetic benchmark to isolate the application characteristics that cause a significant drop in performance, giving application programmers and architects information about possible optimizations. Our adaptable probe, called sqmat, uses only four parameters to capture key characteristics of scientific workloads: working-set size, computational intensity, indirection, and irregularity. This paper describes the implementation of sqmat and uses its tunable parameters to evaluate four leading 64-bit microprocessors that are popular building blocks for current high-performance systems: Intel Itanium2, AMD Opteron, IBM Power3, and IBM Power4.
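
To make the four parameters concrete, the following C sketch shows one way a probe of this kind could be organized. It is an illustration under stated assumptions, not the sqmat source: the abstract names only the parameters, so the kernel (repeatedly squaring small dense matrices, as the name suggests) and every identifier here (`square_in_place`, `build_index`, `run_probe`, `run_length`, `seed`) are hypothetical. In this sketch, `count` and `n` set the working-set size, `squarings` sets the computational intensity, a non-NULL `index` turns on indirection, and `run_length` controls irregularity by limiting how many blocks are visited contiguously before a jump.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Square one n x n row-major matrix in place, using tmp as scratch. */
void square_in_place(double *m, double *tmp, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            double acc = 0.0;
            for (size_t k = 0; k < n; k++)
                acc += m[i * n + k] * m[k * n + j];
            tmp[i * n + j] = acc;
        }
    }
    memcpy(m, tmp, n * n * sizeof *m);
}

/* Hypothetical helper: build a visiting order for `count` blocks.
 * Blocks are grouped into runs of `run_length` contiguous blocks and the
 * runs are shuffled, so a smaller run_length means more irregular access.
 * Returns a malloc'ed permutation of 0..count-1, or NULL on failure. */
size_t *build_index(size_t count, size_t run_length, unsigned long long seed)
{
    if (count == 0 || run_length == 0)
        return NULL;

    size_t num_runs = (count + run_length - 1) / run_length;
    size_t *runs = malloc(num_runs * sizeof *runs);
    size_t *index = malloc(count * sizeof *index);
    if (!runs || !index) {
        free(runs);
        free(index);
        return NULL;
    }

    for (size_t r = 0; r < num_runs; r++)
        runs[r] = r;

    /* Fisher-Yates shuffle of the run order, driven by a small LCG. */
    for (size_t r = num_runs - 1; r > 0; r--) {
        seed = seed * 6364136223846793005ULL + 1442695040888963407ULL;
        size_t pick = (size_t)((seed >> 33) % (r + 1));
        size_t t = runs[r];
        runs[r] = runs[pick];
        runs[pick] = t;
    }

    /* Expand each run into its contiguous block numbers. */
    size_t out = 0;
    for (size_t r = 0; r < num_runs; r++) {
        size_t first = runs[r] * run_length;
        for (size_t k = 0; k < run_length && first + k < count; k++)
            index[out++] = first + k;
    }

    free(runs);
    return index;
}

/* Visit `count` blocks of n x n doubles stored back to back in `data`,
 * squaring each one `squarings` times. If `index` is non-NULL, blocks are
 * reached through it (indirection); otherwise they are visited in order.
 * Returns a checksum so the compiler cannot discard the work. */
double run_probe(double *data, size_t count, size_t n,
                 unsigned squarings, const size_t *index)
{
    double *tmp = malloc(n * n * sizeof *tmp);
    if (!tmp)
        return 0.0;

    double checksum = 0.0;
    for (size_t b = 0; b < count; b++) {
        size_t blk = index ? index[b] : b;
        double *m = data + blk * n * n;
        for (unsigned s = 0; s < squarings; s++)
            square_in_place(m, tmp, n);
        checksum += m[0];
    }

    free(tmp);
    return checksum;
}
```

A driver would time `run_probe` while sweeping one parameter and holding the others fixed, for example growing `count` until the data no longer fits in cache, or shrinking `run_length` toward 1 to see how quickly performance degrades as access becomes less regular. The timing code and the choice of sweep ranges are left out here because they depend on the platform.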