An Examination and Survey of Random Bit Flips and Scientific Computing
Abstract
Data error, modification, and loss can occur in scientific computing due to numerous potential causes, whether natural, accidental, or malicious. Such errors and modifications can silently affect scientific computing results if they are not detected and corrected, or compensated for through other means. This document describes, in high-level, practical terms, how integrity faults due to bit flips occur, how likely they are to occur, and how they can be mitigated. It is organized by individual computer component and environment, with each discussed in turn. We conclude the report by summarizing key issues and several best practices for mitigating the effects of bit flip-induced errors.