UCLA Electronic Theses and Dissertations

Learned Approximate Computing for Machine Learning

Abstract

Machine learning with deep neural networks is growing in popularity while demanding ever-increasing amounts of computation. Approximate computing is a promising approach that trades accuracy for performance, and stochastic computing (SC) is an especially interesting variant that preserves single-bit compute units while allowing adjustable compute precision. This dissertation centers on enabling and improving stochastic computing for neural networks, while also discussing the work that led up to stochastic computing and how the techniques developed for it apply to other approximate computing methods and to applications beyond deep neural networks.

We start with 3pxnet, which combines extreme quantization with model pruning. While 3pxnet achieves extremely compact models, it also exposes the limits of binarization, including the inability to scale to higher precision levels and performance bottlenecks in accumulation. This leads us to stochastic computing, which performs single-gate multiplications and additions on probabilistic bit streams. The initial SC neural network implementation, ACOUSTIC, aims to maximize SC performance benefits while achieving usable accuracy. This is accomplished through design choices in stream representation, performance optimizations using pooling layers, and training modifications that make single-gate accumulation possible. The subsequent work, GEO, improves the stream generation and computation aspects of stochastic computing and narrows the accuracy gap between stochastic and fixed-point computing. The accumulation stage of SC is further optimized in REX-SC, which enables efficient modeling of SC accumulation during training.

Across these iterations of the SC algorithm, we developed efficient training pipelines that target various aspects of training for approximate computing. Both the forward and backward passes of training are optimized, which allows us to demonstrate model convergence results for SC and other approximate computing methods with limited hardware resources. Finally, we apply the training concept to other applications. In LAC, we show that almost any parameterized application can be trained to perform well under approximate computing, while simultaneously searching for the optimal hardware configuration using neural architecture search (NAS) techniques.
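As a brief illustration of the single-gate arithmetic mentioned above, the sketch below shows generic unipolar stochastic computing in Python: a value in [0, 1] is encoded as a random bit stream whose fraction of 1s approximates the value, a bitwise AND of two independent streams approximates multiplication, and a multiplexer performs scaled addition. This is a minimal, assumed example for orientation only; it does not reproduce the specific stream representations or accumulation schemes of ACOUSTIC, GEO, or REX-SC.

```python
import numpy as np

rng = np.random.default_rng(0)

def to_stream(p, length=1024):
    """Encode a value p in [0, 1] as a unipolar stochastic bit stream."""
    return (rng.random(length) < p).astype(np.uint8)

def from_stream(stream):
    """Decode a stream back to a value: the fraction of 1 bits."""
    return stream.mean()

a, b = 0.75, 0.40
sa, sb = to_stream(a), to_stream(b)

# Single-gate multiplication: bitwise AND of two independent unipolar
# streams approximates the product of the encoded values.
product = from_stream(sa & sb)
print(product, a * b)          # roughly 0.30 vs 0.30

# Scaled addition with a multiplexer: randomly selecting between the two
# streams approximates (a + b) / 2.
select = to_stream(0.5)
scaled_sum = from_stream(np.where(select, sa, sb))
print(scaled_sum, (a + b) / 2)  # roughly 0.575 vs 0.575
```

Longer streams reduce the random error of the decoded values, which is the adjustable precision-versus-latency trade-off that the dissertation exploits.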
