Adaptive Solution to Compress Deep Neural Networks for Resource-Constrained Devices
- Wang, Ruzhuo
- Advisor(s): Kim, Hyoseung
Abstract
Prior works applied singular value decomposition (SVD) and dropout-based compression methods to fully-connected structures, or pruned feature maps by the magnitude of kernel weights for convolutional structures; each focused on only one specific DNN structure. In addition, prior frameworks required considerable time during compression to minimize the accuracy penalty, and they could not take a compression ratio as an input. This thesis proposes combining different compression methods into a unified approach that compresses the whole DNN model. We combine singular value decomposition and dropout into a new method, called SVD-based dropout, which proves to be an efficient method for compressing fully-connected structures. We also create a framework called Adaptive-Surgery that takes a user-specified compression ratio, automatically decides the compression parameter $\beta$ for each convolutional and fully-connected layer, and compresses each layer based on $\beta$. The compressed models generated by Adaptive-Surgery can be directly deployed on the Raspberry Pi 3 Model B, which has constrained processing resources. The experimental results show that the two target DNN models (AlexNet and Cifar10-quick) can be compressed by Adaptive-Surgery to three input compression ratios (43.75\%, 75\%, and 93.75\%) with only a mild accuracy penalty. The results also show that, compared with models whose initial parameters are chosen randomly, the models generated by Adaptive-Surgery incur a smaller accuracy penalty. More importantly, the compression ratio of Adaptive-Surgery can be set manually, so DNN models can be compressed to meet the specific real-time requirements of resource-constrained devices.
In future work, Adaptive-Surgery could be evaluated on more than two DNN models to further validate its performance. Although our SVD-based dropout performs well at compressing fully-connected structures, it could be improved by finding a better way to represent the bias matrices in the pruned model rather than ignoring their influence. Moreover, since two different compression methods can be combined into a unified compression framework, researchers could try other combinations of compression methods to achieve a unified approach to DNN compression. Last but not least, the evaluation in this thesis tests three compression ratios; since our framework supports arbitrary ratios, it would be interesting to experiment with a more diverse set of compression ratios and analyze the results.
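The low-rank factorization at the core of SVD-style compression of a fully-connected layer can be sketched as follows. This is an illustrative sketch only, not the thesis's actual Adaptive-Surgery or SVD-based dropout implementation: the function name `svd_compress` and the rule for choosing the rank from the target compression ratio are assumptions for the example.

```python
import numpy as np

def svd_compress(W, ratio):
    """Low-rank approximation of a fully-connected weight matrix W.

    `ratio` is the fraction of parameters to remove (e.g. 0.75 keeps 25%).
    The rank r is chosen so that the factors U_r (m x r) and V_r (r x n)
    together hold roughly (1 - ratio) of W's original parameters.
    """
    m, n = W.shape
    # Parameter count after factorization is r * (m + n); solve for r.
    r = max(1, int((1.0 - ratio) * m * n / (m + n)))
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    U_r = U[:, :r] * s[:r]   # fold singular values into the left factor
    V_r = Vt[:r, :]
    return U_r, V_r

# Example: compress a 512x256 layer by 75%; the layer's forward pass
# W @ x is replaced by U_r @ (V_r @ x).
W = np.random.randn(512, 256)
U_r, V_r = svd_compress(W, 0.75)
print((U_r.size + V_r.size) / W.size)  # at most 0.25
```

Replacing one dense layer with two smaller ones in this way keeps the layer's input/output shape unchanged, which is why such a factorization can be dropped into an existing model; note that this sketch ignores the bias term, as the abstract's future-work discussion points out.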