Systems of linear equations lie at the heart of many scientific and
engineering applications. Many of these linear systems are sparse; i.e., most
of the elements in the coefficient matrix are zero. Direct methods based on
matrix factorizations are sometimes needed to ensure accurate solutions. For
example, accurate solution of sparse linear systems is needed in shift-invert
Lanczos to compute interior eigenvalues. The performance and resource usage of
sparse matrix factorizations determine both the time-to-solution and the largest
problem size solvable on a given platform. In many applications, the
coefficient matrices are symmetric, and exploiting this symmetry reduces both
the work and the storage required for factorization. When the
factorization is performed on large-scale distributed memory platforms,
communication cost is critical to the performance of the algorithm. At the same
time, network topologies have become increasingly complex, so that modern
platforms exhibit a high level of performance variability. This makes
scheduling of computations an intricate and performance-critical task. In this
paper, we investigate the use of an asynchronous task paradigm, one-sided
communication, and dynamic scheduling in implementing a sparse Cholesky
factorization solver, symPACK, on large-scale distributed memory platforms.
The solver relies on efficient and flexible communication primitives provided
by the UPC++ library. Performance evaluation on practical cases shows that
symPACK scales well and outperforms state-of-the-art parallel distributed
memory factorization packages, validating our approach.