PMap: unlocking the performance genes of HPC applications
Abstract
Performance modeling, the science of understanding and predicting application performance, is important but challenging. High Performance Computing (HPC), with its large-scale applications and aggressive technologies such as dynamic computational grids, hybrid computing platforms, and innovative storage systems, further complicates the task. This dissertation proposed and proved the hypothesis that a small number of performance primitives can be extracted from HPC applications and leveraged for fast application performance modeling and prediction, even on large-scale dynamic systems. PMap, a set of methods and tools to extract, measure, and analyze performance primitives in HPC applications, was proposed, implemented, and verified under these challenging environments. Two production computational grids, Teragrid and Geon, were monitored with periodically running benchmarks for about half a year. Their performance fluctuated within a 50% range. However, simple benchmarks that serve as performance primitives can be used to predict application performance with a relative error as low as 9%. To map program constructs to the best-matched hardware components in hybrid computing platforms, an automatic method for recognizing idioms (performance primitives) was proposed and implemented based on the open source compiler Open64. With the NAS Parallel Benchmark (NPB) as a case study, the prototype system is about 90% accurate compared with idiom classification by a human expert. The performance of the idiom benchmarks was compared, using different methods, with that of their corresponding instances in the NPB codes on two different platforms. The approximation accuracy is up to 97%. In response to the HPC data challenge and emerging storage technologies, a flash-based supercomputer, DASH, was designed, built, and tuned. Fast and reliable measurements were developed to sweep a large parameter space and investigate varying design options, and the results showed that performance can be improved by as much as 9x with appropriate existing technologies developed here. Finally, the PMaC framework was extended to model and predict application performance on flash storage systems. Results showed that the total I/O time can be predicted with a reasonable error of 15%. The end result of this body of work is that the performance of applications on supercomputers can be understood by mapping their performance genetics.