Open Access Publications from the University of California

Optimizing and tuning the fast multipole method for state-of-the-art multicore architectures

Author(s):
  • Chandramowlishwaran, A
  • Williams, S
  • Oliker, L
  • Lashuk, I
  • Biros, G
  • Vuduc, R
  • et al.

This work presents the first extensive study of single-node performance optimization, tuning, and analysis of the fast multipole method (FMM) on modern multicore systems. We consider single- and double-precision with numerous performance enhancements, including low-level tuning, numerical approximation, data structure transformations, OpenMP parallelization, and algorithmic tuning. Among our numerous findings, we show that optimization and parallelization can improve double-precision performance by 25× on Intel's quad-core Nehalem, 9.4× on AMD's quad-core Barcelona, and 37.6× on Sun's Victoria Falls (dual-socket configurations on all systems). We also compare our single-precision version against our prior state-of-the-art GPU-based code and show, surprisingly, that the most advanced multicore architecture (Nehalem) reaches parity in both performance and power efficiency with NVIDIA's most advanced GPU architecture. © 2010 IEEE.

