eScholarship
Open Access Publications from the University of California

UC San Diego

UC San Diego Electronic Theses and Dissertations

PMap : unlocking the performance genes of HPC applications

Abstract

Performance modeling, the science of understanding and predicting application performance, is important but challenging. High Performance Computing (HPC), with its large-scale applications and aggressive technologies such as dynamic computational grids, hybrid computing platforms, and innovative storage systems, further complicates the task. This dissertation proposed and proved the hypothesis that a small number of performance primitives can be extracted from HPC applications and leveraged for fast application performance modeling and prediction, even on large-scale dynamic systems. PMap, a set of methods and tools to extract, measure, and analyze performance primitives in HPC applications, was proposed, implemented, and verified under these challenging environments. Two production computational grids, Teragrid and Geon, were monitored with periodically running benchmarks for about half a year. Their performance fluctuated within a range of about 50%. However, simple benchmarks that serve as performance primitives can be used to predict application performance with a relative error as low as 9%. To map program constructs to the best-matched hardware components in hybrid computing platforms, an automatic recognition method for idioms (performance primitives) was proposed and implemented based on the open source compiler Open64. With the NAS Parallel Benchmarks (NPB) as a case study, the prototype system is about 90% accurate compared with idiom classification by a human expert. The performance of the idiom benchmarks and of their corresponding instances in the NPB codes on two different platforms was compared using different methods. The approximation accuracy is up to 97%. In response to the HPC data challenge and emerging storage technologies, a flash-based supercomputer, DASH, was designed, built, and tuned. Fast and reliable measurements were developed to sweep a large parameter space and investigate varying design options, and the results showed that performance can be improved by as much as 9x with appropriate existing technologies developed here. Finally, the PMaC framework was extended to model and predict application performance on flash storage systems. Results showed that the total I/O time can be predicted with a reasonable error of 15%. The end result of this body of work is that the performance of applications on supercomputers can be understood by mapping their performance genetics.
