Improved Timing Error Resilience of Microelectronic Computing Systems using Cross-layer Optimizations
- Jiao, Xun
- Advisor(s): Gupta, Rajesh
Abstract
Microelectronic scaling has entered the nanoscale era, with tremendous capacity and performance advantages that continue to drive new devices and systems, from high-performance computing to ultra-low-power Internet endpoints. This scaling, however, faces challenges from microelectronic variability, which causes significant variation in individual device parameters. The most common manifestation of this variability is increased susceptibility to timing errors. Combating these errors usually requires larger guardbands in circuit and architectural design, which reduces the gains from process technology advances.
This dissertation focuses on methods to improve the timing error resilience of microelectronic computing systems by reducing guardbands, which also improves the operational efficiency of microelectronic circuits. Timing errors can appear at various abstraction levels: the circuit layer, the architecture layer, and the software layer. Accordingly, we propose error tolerance methods that correspond to the layer where such errors manifest. Because overall system performance and application quality depend on design choices made at various abstraction levels, an integrated view of error tolerance strategies is necessary to evaluate their effects at the system or application layer. Cross-layer optimizations are thus important in addressing the effects of timing errors. First, at the circuit layer, we examine the root cause of timing errors by analyzing dynamic path sensitization of the circuit. We use machine learning methods to build a prediction model for timing errors based on features extracted from computation history, circuit workload, and circuit switching. Results show high prediction accuracy and fast computation, which make the model useful in early circuit reliability evaluation. Second, at the architecture layer, by characterizing the delay of various instructions, we dynamically adjust the clock frequency, which reduces timing errors and improves operational efficiency. Finally, at the software layer, by exploiting the inherent error tolerance of emerging applications such as neural networks, we reduce design margins on the condition that application quality remains acceptable. Specifically, we investigate the vulnerability of emerging neural networks to timing errors and deliver approximate computing hardware for neural networks that achieves significant energy savings with negligible accuracy loss.
Building on this dissertation, our future research aims to develop emerging non-conventional computing systems that are high-performance, low-power, reliable, and intelligent.