eScholarship
Open Access Publications from the University of California

Optimizing and tuning the fast multipole method for state-of-the-art multicore architectures

Author(s):
  • Chandramowlishwaran, A
  • Williams, S
  • Oliker, L
  • Lashuk, I
  • Biros, G
  • Vuduc, R
  • et al.

Abstract

This work presents the first extensive study of single-node performance optimization, tuning, and analysis of the fast multipole method (FMM) on modern multicore systems. We consider single- and double-precision with numerous performance enhancements, including low-level tuning, numerical approximation, data structure transformations, OpenMP parallelization, and algorithmic tuning. Among our numerous findings, we show that optimization and parallelization can improve double-precision performance by 25× on Intel's quad-core Nehalem, 9.4× on AMD's quad-core Barcelona, and 37.6× on Sun's Victoria Falls (dual-sockets on all systems). We also compare our single-precision version against our prior state-of-the-art GPU-based code and show, surprisingly, that the most advanced multicore architecture (Nehalem) reaches parity in both performance and power efficiency with NVIDIA's most advanced GPU architecture. © 2010 IEEE.

