The Cosmic Microwave Background (CMB) is an exquisitely sensitive probe of the fundamental parameters of cosmology. Extracting this information is computationally intensive and requires both massively parallel computing and sophisticated numerical algorithms. In this work we present MADbench, a lightweight version of the MADCAP CMB power spectrum estimation code that retains the operational complexity and integrated system requirements of the full application. In addition, to quantify communication behavior across a variety of architectural platforms, we introduce the Integrated Performance Monitoring (IPM) package: a portable, lightweight, and scalable tool for extracting MPI message-passing overheads. We conduct a performance characterization study on some of the world's most powerful supercomputers: the superscalar Seaborg (IBM Power3+) and the CC-NUMA Columbia (SGI Altix), as well as the vector-based Earth Simulator (enhanced NEC SX-6) and Phoenix (Cray X1) systems. In-depth analysis shows that bridging the gap between theoretical and sustained system performance requires a clear understanding of how the distinct parts of large-scale parallel applications interact with the individual subcomponents of high-end computing (HEC) platforms.