Optimizing and tuning the fast multipole method for state-of-the-art multicore architectures
- Author(s): Chandramowlishwaran, A; Williams, S; Oliker, L; Lashuk, I; Biros, G; Vuduc, R; et al.
Published Web Location: https://doi.org/10.1109/IPDPS.2010.5470415
This work presents the first extensive study of single-node performance optimization, tuning, and analysis of the fast multipole method (FMM) on modern multicore systems. We consider single- and double-precision with numerous performance enhancements, including low-level tuning, numerical approximation, data structure transformations, OpenMP parallelization, and algorithmic tuning. Among our numerous findings, we show that optimization and parallelization can improve double-precision performance by 25× on Intel's quad-core Nehalem, 9.4× on AMD's quad-core Barcelona, and 37.6× on Sun's Victoria Falls (dual-socket on all systems). We also compare our single-precision version against our prior state-of-the-art GPU-based code and show, surprisingly, that the most advanced multicore architecture (Nehalem) reaches parity in both performance and power efficiency with NVIDIA's most advanced GPU architecture. © 2010 IEEE.