UC San Diego
Algorithm-Hardware Optimization of Deep Neural Networks for Edge Applications
- Author(s): Akhlaghi, Vahideh
- Advisor(s): Gupta, Rajesh K.
- et al.
Deep Neural Network (DNN) models are now commonly used to automate and optimize complicated tasks in various fields. For improved performance, models use increasingly many processing layers and are frequently over-parameterized. Together, these trends lead to tremendous increases in compute and memory demands. While these demands can be met in large-scale and accelerated computing environments, they are out of reach for the embedded devices at the edge of a network and for near-edge devices such as smartphones. Yet the demand for moving recognition and decision tasks to edge devices continues to grow, driven by the need for localized processing that meets privacy, real-time data processing, and decision-making requirements. Thus, DNNs continue to move toward 'edge' or 'near-edge' devices, even though the limited off-chip storage, on-chip memory, and logic on these devices prohibit the deployment and efficient computation of large yet highly accurate models. Existing solutions address these issues by improving either the underlying algorithms of these models, to reduce their size and computational complexity, or the underlying computing architectures, to provide efficient platforms for these algorithms. While these efforts improve the computational efficiency of the models, significant reductions are only possible through optimization of both the algorithms and the hardware for DNNs.
In this dissertation, we focus on reducing the computation cost of DNN models by taking into account the algorithmic optimization opportunities in the models along with hardware-level optimization opportunities and limitations. The proposed techniques fall into two categories: optimal reduction of computation precision and optimal elimination of inessential computation and memory demands. First, we propose a low-precision, low-cost implementation of highly frequent computations using low-cost probabilistic data structures. Second, to eliminate excessive computation that has no more than minimal impact on model accuracy, we propose a software-hardware approach that detects and predicts the outputs of costly layers with fewer operations. Third, through the design of a machine-learning-based optimization framework, we show that optimal platform-aware precision reduction at both the algorithmic and hardware levels minimizes computation cost while achieving acceptable accuracy. Finally, inspired by parameter redundancy in over-parameterized models and by hardware limitations, we propose reducing the number of model parameters through a linear approximation of the parameters from a lower-dimensional space. We show how a collection of these measures improves the deployment of sophisticated DNN models on edge devices.
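As a rough illustration of the first category, precision reduction, the sketch below applies plain uniform quantization to a handful of weights and reports the worst-case reconstruction error at several bit widths. It is a generic textbook scheme and not the framework developed in the dissertation; the function names (`quantize`, `dequantize`, `max_error`) and the sample weights are hypothetical and chosen only for this example.

```python
def quantize(values, bits):
    """Uniformly quantize floats to signed integer codes plus one shared scale.

    Assumption for illustration: symmetric quantization with a single scale
    per list of values. Real deployments often use per-layer or per-channel scales.
    """
    if bits < 2:
        raise ValueError("bits must be at least 2")
    qmax = 2 ** (bits - 1) - 1              # largest representable code
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest code and clamp to the symmetric range [-qmax, qmax].
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate floating-point values."""
    return [c * scale for c in codes]


def max_error(values, bits):
    """Worst-case absolute error introduced by a quantize/dequantize round trip."""
    codes, scale = quantize(values, bits)
    restored = dequantize(codes, scale)
    return max(abs(a - b) for a, b in zip(values, restored))


# Hypothetical weights, used only to show the precision/error trade-off.
weights = [0.82, -0.41, 0.07, -0.95, 0.33, 0.0]

if __name__ == "__main__":
    for bits in (8, 4, 2):
        print(f"{bits}-bit: max abs error = {max_error(weights, bits):.4f}")
```

Lower bit widths shrink storage and arithmetic cost but raise the reconstruction error, which is the trade-off that a platform-aware precision search has to balance against accuracy.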