Neural networks have gained widespread use in many machine learning tasks due to their state-of-the-art performance. However, this progress comes at the cost of the ever-increasing size and computational demands of the resulting models. As such, neural network compression --- the process of reducing the size, power consumption, or any other cost of interest of a model --- has become an important practical step when deploying trained models to perform inference.
In this dissertation, we explore a particular compression mechanism --- the low-rank decomposition --- and its extensions for the purposes of neural network compression. We study important aspects of low-rank compression: how to select the decomposition ranks across the layers, how to choose the best decomposition shape for non-matrix weights among a number of options, and how to adapt the low-rank scheme to directly target inference speed. Computationally, these are hard problems involving both integer variables (ranks, decomposition shapes) and continuous variables (weights), as well as a nonlinear loss and nonlinear constraints.
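To fix ideas, a generic form of such a problem for a network with $K$ weight matrices can be sketched as follows (the notation $L$, $\mathbf{W}_l$, $r_l$, $C_l$ is illustrative, not the exact formulation developed later):
\[
  \min_{\{\mathbf{W}_l\},\,\{r_l\}} \; L(\mathbf{W}_1,\dots,\mathbf{W}_K)
  \quad \text{s.t.} \quad \operatorname{rank}(\mathbf{W}_l) \le r_l, \qquad
  \sum_{l=1}^{K} C_l(r_l) \le C,
\]
where $C_l(r_l)$ is the parameter or FLOP cost of layer $l$ at rank $r_l$, and $C$ is the target budget.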
As we show over the course of this dissertation, all of these problems admit suitable formulations that can be solved efficiently using the recently proposed \emph{learning-compression (LC) algorithm}. The algorithm alternates two optimization steps: a step over the neural network parameters (the L step) and a step over the compression parameters (the C step). Once we formulate each compression problem, we show how its L and C steps are derived. Each step can be solved efficiently: the L step by stochastic gradient descent, and the C step via the singular value decomposition. We demonstrate the effectiveness of the proposed compression schemes and the corresponding algorithms on multiple networks and datasets.
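To make the alternation concrete, the following is a minimal sketch of the LC iteration for a single weight matrix under a quadratic-penalty formulation; the function names, the penalty schedule, and all hyperparameters are illustrative assumptions rather than the actual interface used in this dissertation, and plain gradient descent stands in for SGD:
\begin{verbatim}
import numpy as np

def c_step_lowrank(W, r):
    # C step: project W onto the set of rank-r matrices. By the
    # Eckart-Young theorem, the truncated SVD gives the closest
    # rank-r matrix in the Frobenius norm.
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r, :]

def lc_lowrank(W, loss_grad, r, mu=1e-3, lr=1e-2,
               outer_iters=20, l_iters=100):
    # Alternate the two steps on the penalized objective
    #   L(W) + (mu/2) * ||W - Theta||_F^2,  rank(Theta) <= r,
    # where loss_grad(W) returns the gradient of the task loss at W.
    Theta = c_step_lowrank(W, r)
    for _ in range(outer_iters):
        for _ in range(l_iters):               # L step
            W = W - lr * (loss_grad(W) + mu * (W - Theta))
        Theta = c_step_lowrank(W, r)           # C step
        mu *= 1.5                              # drive the penalty up
    return Theta                               # low-rank weights
\end{verbatim}
Note that in this sketch the C step is a closed-form projection and is therefore cheap relative to the L step, while the growing penalty parameter drives the network weights and their low-rank approximation toward agreement.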
Finally, we discuss the resulting general-purpose neural network compression toolkit, which encompasses all of the compression schemes presented in this dissertation and many others. The toolkit is designed to be flexible and extensible, and it is released under an open-source license.