eScholarship
Open Access Publications from the University of California

UC San Diego

UC San Diego Electronic Theses and Dissertations

From Variability-Tolerance to Approximate Computing in Parallel Computing Architectures

Abstract

Variation in performance and power across manufactured parts and their operating conditions is an accepted reality in modern microelectronic manufacturing processes with nanometer-scale geometries. This dissertation covers the challenges and opportunities in identifying variations and their effects, and methods to combat these variations for improved microelectronic devices. We focus on timing errors caused by various sources of variation at different levels. We devise methods to mitigate such errors by jointly exposing hardware variations to the software and by exploiting parallel processing. We investigate methods to predict and prevent errors, to detect and correct them, and finally the conditions under which errors can be accepted. For each of these methods, our work spans defining and measuring the notion of error tolerance at various levels, from the ISA to procedures to parallel programs. These measures essentially capture the likelihood of errors and the associated cost of error correction at different levels. The result is a design platform that enables us to combine these methods into a new joint method that couples detecting and correcting errors with accepting them across the hardware/software interface via memoization (i.e., spatial or temporal reuse of computation). We accordingly devise an arsenal of software techniques and microarchitecture optimizations for improving the cost and scale of these methods in massively parallel computing units, such as GP-GPUs and clustered many-core accelerators. We find that parallel architectures and parallelism in general provide the best means to combat and exploit variability to design resilient and efficient systems. Using such programmable parallel accelerator architectures, we show how system designers can coordinate the propagation of error information and its effects along with new techniques for memoization and memristive associative memory.
This discussion naturally leads to the use of these techniques in the emerging area of "approximate computing" and to how they can be used to build resilient and efficient computing systems.
