- Author(s): Druinsky, Alex;
- Ghysels, Pieter;
- Li, Xiaoye S;
- Marques, Osni;
- Williams, Samuel;
- Barker, Andrew;
- Kalchev, Delyan;
- Vassilevski, Panayot
- Editor(s): Wyrzykowski, Roman;
- Deelman, Ewa;
- Dongarra, Jack J;
- Karczewski, Konrad;
- Kitowski, Jacek;
- Wiatr, Kazimierz
We study the performance of a two-level algebraic-multigrid algorithm, focusing on how the choice of coarse-grid solver affects it. We consider two algorithms for solving the coarse-space systems: the preconditioned conjugate gradient method and a new robust HSS-embedded low-rank sparse-factorization algorithm. Our test data come from the SPE Comparative Solution Project for oil-reservoir simulations. We contrast the performance of our code on one 12-core socket of a Cray XC30 machine with its performance on a 60-core Intel Xeon Phi coprocessor. To obtain top performance, we optimized the code to take full advantage of fine-grained parallelism and made it thread-friendly for high thread counts. We also developed a bounds-and-bottlenecks performance model of the solver, which guided the optimization effort, and we tuned performance over the solver’s large parameter space. As a result, we obtained significant speedups on both machines.
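For readers unfamiliar with the first of the two coarse-space solvers named in the abstract, the following is a minimal sketch of a preconditioned conjugate gradient iteration. It is not the authors' implementation. The Jacobi (diagonal) preconditioner, the dense list-of-rows matrix storage, the function name `pcg`, and the small 1D Laplacian test matrix are all assumptions chosen for illustration; the abstract does not say which preconditioner or data layout the paper uses.

```python
def pcg(A, b, tol=1e-10, max_iter=1000):
    """Jacobi-preconditioned conjugate gradient for a dense SPD matrix.

    A is a list of rows, b a list of floats. Returns (x, iterations).
    The diagonal preconditioner is an illustrative assumption.
    """
    n = len(b)

    def matvec(v):
        return [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    inv_diag = [1.0 / A[i][i] for i in range(n)]  # M^{-1} for M = diag(A)
    x = [0.0] * n
    r = list(b)                                   # residual b - A x for x = 0
    z = [d * ri for d, ri in zip(inv_diag, r)]    # preconditioned residual
    p = list(z)                                   # first search direction
    rz = dot(r, z)
    b_norm = dot(b, b) ** 0.5 or 1.0              # avoid dividing by zero

    for k in range(max_iter):
        if dot(r, r) ** 0.5 <= tol * b_norm:      # relative residual test
            return x, k
        Ap = matvec(p)
        alpha = rz / dot(p, Ap)                   # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        beta = rz_new / rz                        # keeps directions A-conjugate
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


# Usage: a 5x5 1D Laplacian (hypothetical test matrix) with b = A * ones.
n = 5
A = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
     for i in range(n)]
b = [sum(row) for row in A]
x, iters = pcg(A, b)
```

In a two-level method a solve of this kind would be applied to the coarse-space system at each outer iteration, so its cost and convergence rate feed directly into the overall solver time that the abstract discusses.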