Reshaping Deep Neural Networks for Efficient Hardware Inference
- Author(s): Khodamoradi, Alireza
- Advisor(s): Kastner, Ryan RK
- et al.
The latest Deep Learning (DL) methods for designing Deep Neural Networks (DNN) have significantly expanded our ability to train data processing systems. Coupled with exponential growth in available digital data, we have seen dramatic accuracy improvements in DNNs and widespread adoption of these models in different applications.
This increased demand has motivated innovations in DNN architecture design to deliver high-quality output. For example, advanced DL models can include irregular connections between their layers, have more parameters, and employ computationally complex neurons. Unfortunately, these new architectural additions often increase the implementation complexity of the DNNs on hardware, particularly when deploying DL models for inference in scale-out and power-limited systems.
Currently, to deploy a DNN on a custom platform, an abstract of the DL model is used to create a functionally identical realization. However, because altering this abstract changes the functionality of the DL model, hardware designers keep the model unchanged for a lossless implementation.
This thesis shows that a co-design approach can improve the hardware implementation of DL models. In a co-design approach, the designer reshapes the DNN architecture to better fit a target processing platform and preserves its accuracy by retraining the model.
We describe a custom accelerator for Spiking Neural Networks (SNN) with improved computational cost and memory utilization because of reshaping the layers and neurons of the model. We then apply these changes to the existing SNN models and show that they can maintain their accuracy after the reshaping and retraining. In addition, we introduce novel applications for SNNs based on the new architecture. We also present a stochastic noise filter for pre-processing SSN's input with improved accuracy and memory utilization. Furthermore, we explain a reshaping method for Residual Networks (ResNet) to reduce their memory footprint while preserving their accuracy.
This thesis also introduces a method for accelerating the co-design process. Reshaping DL models can increase the complexity of their training stage. We present an auto tuner for the learning rate (an essential parameter for training DNNs) that simplifies the manual tuning for this parameter and can accelerate the retraining of DL models.