eScholarship
Open Access Publications from the University of California

Fault-Tolerant LOBPCG for Nuclear CI Calculations

Abstract

Exascale computing platforms with millions of compute units and thousands of nodes are predicted to experience frequent faults that interrupt application execution. In this context, resilience against faults becomes important. We examine user-level and software-level fault mitigation strategies in a distributed LOBPCG (Locally Optimal Block Preconditioned Conjugate Gradient) algorithm targeting nuclear configuration interaction (CI) calculations. In particular, we present and evaluate one strategy that keeps the total number of fault-tolerant LOBPCG iterations close to that of the standard LOBPCG algorithm run on a fault-free machine.
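The abstract compares iteration counts against "the standard LOBPCG algorithm" without spelling that algorithm out. As a point of reference only, here is a minimal single-process sketch of textbook LOBPCG without a preconditioner, assuming a small dense symmetric matrix held in NumPy. The function name `lobpcg_sketch` and its parameters are hypothetical. This is neither the authors' distributed implementation nor their fault-tolerance strategy. SciPy ships a production version as `scipy.sparse.linalg.lobpcg`.

```python
import numpy as np


def lobpcg_sketch(A, k, tol=1e-6, maxiter=1000, seed=0):
    """Hypothetical textbook LOBPCG (identity preconditioner) for the k
    smallest eigenpairs of a dense symmetric matrix A. Illustration only:
    single process, no fault tolerance. Returns (eigenvalues, vectors, updates)."""
    n = A.shape[0]
    rng = np.random.default_rng(seed)
    # Random orthonormal starting block X (n x k).
    X, _ = np.linalg.qr(rng.standard_normal((n, k)))
    # Initial Rayleigh-Ritz step on span(X).
    lam, C = np.linalg.eigh(X.T @ A @ X)
    X = X @ C
    P = None  # previous-direction block; empty on the first update
    for it in range(maxiter):
        # Block residual R = A X - X diag(lam); column j belongs to pair j.
        R = A @ X - X * lam
        if np.all(np.linalg.norm(R, axis=0) < tol):
            return lam, X, it  # `it` updates have been performed so far
        # Trial subspace [X, W, P] with W = R (no preconditioning).
        S = np.hstack([X, R] if P is None else [X, R, P])
        # Householder QR keeps Q orthonormal even if S is nearly rank-deficient;
        # since X is already orthonormal, Q[:, :k] spans the same space as X.
        Q, _ = np.linalg.qr(S)
        # Rayleigh-Ritz: small projected eigenproblem, keep the k smallest.
        H = Q.T @ A @ Q
        theta, C = np.linalg.eigh((H + H.T) / 2)
        Ck = C[:, :k]
        lam = theta[:k]
        X = Q @ Ck
        # New P: the component of the update lying outside span(old X).
        P = Q[:, k:] @ Ck[k:, :]
    return lam, X, maxiter
```

Usage, under the same assumptions: for a symmetric `A` with eigenvalues 1, 2, ..., 40, `lobpcg_sketch(A, 3)` should return approximations to 1, 2 and 3 together with the number of updates it needed. That update count is the quantity the abstract's fault-tolerant variant aims to keep close to the fault-free baseline. Orthonormalizing the whole trial block with QR trades some efficiency for robustness when `W` or `P` become nearly dependent on `X` near convergence.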
