UC San Diego Electronic Theses and Dissertations

Algorithmic Techniques towards Efficient Quantization of Deep Neural Networks

Abstract

With numerous breakthroughs over the past several years, deep learning (DL) techniques have transformed the world of artificial intelligence (AI). Abilities once considered uniquely human are now characteristics of powerful machines. State-of-the-art performance has been demonstrated across a range of perceptual tasks, including computer vision, speech recognition, and game playing. Now that we know it works, current research is directed more towards exploring: (a) TinyAI: how to make it more efficient (deployable on resource-constrained devices) through new optimization algorithms and tooling; (b) AutoAI: how to reduce human effort and speed up the development cycle of AI systems through automation; (c) InterpretableAI: why it works, through detailed theoretical study; (d) AppliedAI: how to combine all these efforts to move from “Narrow AI” (solving a specific task) towards “General AI” (approaching human-equivalent generality).

This dissertation primarily takes on the first two directions (AutoAI and TinyAI). In particular, we make progress towards developing algorithms for more efficient and automated AI systems, with a particular focus on quantization methods.

(i) Discovering optimal quantization bitwidths. Research question: What is the optimal bitwidth per layer for quantizing a deep neural network? Proposal: we developed a systematic approach that automates the discovery of the optimal bitwidth for each layer of a deep neural network, subject to the constraint of maintaining accuracy, through an end-to-end deep reinforcement learning framework (ReLeQ).

(ii) Quantization-aware training. Research question: Can we train DNNs in a way that makes them inherently robust to quantization? Proposal: we developed a novel quantization-friendly regularization technique based on a sinusoidal function, called WaveQ. WaveQ exploits the periodicity, differentiability, and local convexity of sinusoidal functions to automatically propel weights towards values that are inherently closer to quantization levels. Moreover, because the sinusoidal period is a continuous-valued parameter, we use it as an optimization objective and a proxy for the actual quantization bitwidth, which avoids the issues of gradient-based optimization over discrete-valued parameters.

(iii) Improved and accelerated finetuning methods. Research question: Can we finetune a quantized DNN efficiently so as to improve its final accuracy? Proposal: we developed a novel finetuning algorithm for quantized DNNs. The proposed approach uses knowledge distillation in a teacher-student paradigm, in a novel setting that exploits the feature-extraction capability of DNNs for higher-accuracy quantization. This divide-and-conquer strategy makes it possible to train each student section in isolation; the independently trained sections are later stitched together to form the equivalent fully quantized network.
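As an illustration of the idea behind the sinusoidal regularizer in (ii), the sketch below (in PyTorch) adds a periodic, differentiable sin² penalty to the training loss that vanishes exactly at the quantization levels, so its gradient pulls weights toward them. The function name, the strength parameter, and the assumption of a uniform b-bit grid over [-1, 1] are illustrative assumptions, not the dissertation's implementation.

import torch

def sinusoidal_quantization_penalty(weights, bitwidth=4, strength=1e-3):
    # Hypothetical sketch of a WaveQ-style regularizer: a periodic sin^2 term
    # that is zero exactly at the uniform quantization levels and differentiable
    # everywhere, so gradients nudge weights toward the nearest level.
    step = 1.0 / (2 ** bitwidth - 1)  # assumed uniform grid over [-1, 1]
    penalty = torch.sin(torch.pi * weights / step).pow(2).mean()
    return strength * penalty

# During quantization-aware training, the penalty would be added to the task loss:
#   loss = criterion(model(x), y)
#   loss = loss + sum(sinusoidal_quantization_penalty(p) for p in model.parameters())

Because the period of this penalty (the quantization step) is a continuous quantity, it could in principle be optimized directly, which is the point the abstract makes about using the sinusoidal period as a differentiable proxy for the bitwidth.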
