Partial Fault Detection Using Speculative Architecture Structures
eScholarship
Open Access Publications from the University of California

Abstract

Future microprocessors will be highly susceptible to transient errors as transistor sizes decrease with CMOS scaling. Prior techniques advocated full-scale structural or temporal redundancy to achieve fault tolerance. Though they can provide complete fault coverage, they incur significant area and/or performance overhead. It is desirable to have a mechanism that provides incomplete but still sufficiently high fault coverage at negligible area and/or performance cost. To achieve this goal, we examine exploiting speculative structures that already exist in modern processors to provide partial fault coverage. We start by quantifying how much a faulty execution deviates from the correct program execution in terms of control flow, address patterns, and store values. We find this classification useful for designing techniques that each detect a particular form of deviation and thereby ultimately detect the transient fault. To detect transient faults, we propose augmenting branch predictors to detect control-flow errors, store sets and L2 cache misses to predict faults that might have resulted in incorrect address references, and a value predictor to detect incorrect store values.
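
To make the branch-predictor idea concrete, the following is a minimal sketch and not the report's actual design. It assumes, purely for illustration, that a misprediction on a branch whose saturating counter is fully saturated is treated as a possible symptom of a transient fault. The class name `ConfidentBranchChecker`, the 3-bit counter width, and the "saturated means confident" rule are all hypothetical choices that the abstract does not specify.

```python
class ConfidentBranchChecker:
    """Toy per-branch saturating counters (hypothetical, for illustration only)."""

    def __init__(self, bits=3):
        self.max_count = (1 << bits) - 1   # 7 for the assumed 3-bit counters
        self.table = {}                    # branch PC -> counter value

    def observe(self, pc, taken):
        """Record one resolved branch; return True if the outcome looks suspicious."""
        midpoint = self.max_count // 2
        counter = self.table.get(pc, midpoint)

        predicted_taken = counter > midpoint
        # Assumed rule: only fully saturated counters count as high confidence.
        confident = counter in (0, self.max_count)
        suspicious = confident and (predicted_taken != taken)

        # Train the counter on the actual outcome, saturating at both ends.
        if taken:
            counter = min(counter + 1, self.max_count)
        else:
            counter = max(counter - 1, 0)
        self.table[pc] = counter
        return suspicious


checker = ConfidentBranchChecker()
# A strongly biased branch trains the counter up to saturation without alarms.
warmup_flags = [checker.observe(0x400, True) for _ in range(10)]
# A sudden deviation from a confident prediction is flagged for re-checking.
deviation_flag = checker.observe(0x400, False)
```

In a sketch like this, a flag would only trigger a cheaper recovery or re-execution step, since ordinary mispredictions also occur; that trade-off is what makes the coverage partial rather than complete.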

Pre-2018 CSE ID: CS2005-0847
