Over the past few years, quantization has shown consistent success in compressing high-dimensional data and over-parameterized models. This dissertation focuses on theoretical guarantees and applications of quantization algorithms for fast binary embeddings (FBEs), random Fourier features (RFFs), and neural networks (NNs). Chapter 1 presents an introduction to quantization and background material for the topics covered in later chapters.
In Chapter 2, we introduce a novel fast binary embedding algorithm that transforms data points from a high-dimensional space into low-dimensional binary sequences. We prove that the $\ell_2$ distances among the original data points can be recovered from the $\ell_1$ distances between their binary embeddings, and that the associated approximation error is comparable to that of a continuous-valued Johnson-Lindenstrauss embedding plus a quantization error that decays polynomially as the embedding dimension increases. Consequently, the length of the binary codes required to achieve a desired accuracy is quite small, which we verify empirically in experiments on natural images.
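For illustration, the sketch below shows the two ingredients such an algorithm combines: a Johnson-Lindenstrauss-type random projection followed by a noise-shaping (here, first-order $\Sigma\Delta$) quantizer onto the one-bit alphabet $\{-1,+1\}$. The dense Gaussian projection, the one-bit alphabet, and the omission of the fast structured transform and of the post-processing used when comparing codes are simplifying assumptions for exposition, not the exact construction of Chapter 2.

import numpy as np

def jl_project(X, m, rng):
    # Dense Gaussian Johnson-Lindenstrauss projection; Chapter 2 uses a fast
    # structured transform instead, which this sketch does not reproduce.
    d = X.shape[1]
    A = rng.normal(size=(d, m)) / np.sqrt(m)
    return X @ A

def sigma_delta_1bit(y):
    # First-order Sigma-Delta quantization onto {-1, +1}: the state u carries
    # the accumulated quantization error, and each bit is chosen greedily to
    # keep that state bounded (assuming the coordinates of y are suitably scaled).
    u, q = 0.0, np.empty_like(y)
    for i, yi in enumerate(y):
        q[i] = 1.0 if yi + u >= 0 else -1.0
        u += yi - q[i]
    return q

In the full algorithm, the $\ell_2$ distance between two data points is then estimated from the $\ell_1$ distance between their binary codes after appropriate normalization.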
As a natural extension of Chapter 2, Chapter 3 proposes low-bit $\Sigma\Delta$ and distributed noise-shaping methods for quantizing RFFs associated with shift-invariant kernels. We show that the quantized RFFs achieve a high-accuracy approximation of the underlying kernels, with the approximation error decaying polynomially as the dimension of the RFFs increases and exponentially in the number of bits used. Moreover, we test our method on several machine learning tasks that rely on kernel methods.
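As a point of reference, a standard unquantized RFF construction for the Gaussian kernel is sketched below; Chapter 3 replaces the real-valued features produced here with their low-bit $\Sigma\Delta$ or distributed noise-shaping quantizations. The choice of the Gaussian kernel, the bandwidth parameter gamma, and the variable names are illustrative and not part of the chapter's notation.

import numpy as np

def rff_features(X, m, gamma, rng):
    # Random Fourier features z(x) = sqrt(2/m) * cos(W^T x + b), whose inner
    # products approximate the shift-invariant Gaussian kernel
    # k(x, y) = exp(-gamma * ||x - y||^2).
    d = X.shape[1]
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, m))
    b = rng.uniform(0.0, 2.0 * np.pi, size=m)
    return np.sqrt(2.0 / m) * np.cos(X @ W + b)

With Z = rff_features(X, m, gamma, rng), the Gram matrix Z @ Z.T approximates the kernel matrix, and it is this approximation that the quantized features must preserve.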
In Chapter 4, we generalize a post-training neural-network quantization method, GPFQ, that is based on a greedy path-following mechanism. We extend previous results on GPFQ to handle general quantization alphabets and a range of input distributions, showing that, for quantizing a single-layer network, the relative square error essentially decays linearly in the number of weights, i.e., the level of over-parametrization. Additionally, we propose modifications that promote sparsity of the weights and rigorously analyze the associated error. Without fine-tuning, we can quantize several common architectures to $4$ bits while incurring an accuracy loss of less than $1\%$.
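For intuition, a minimal sketch of the greedy path-following iteration for a single neuron is given below; here X is a matrix of layer inputs (samples by features), w the neuron's pretrained weights, and alphabet a finite quantization grid. The alphabet scaling, the parallelization over neurons and layers, and the sparsity-promoting modification are omitted, so this is an illustrative sketch rather than the full method analyzed in Chapter 4.

import numpy as np

def gpfq_neuron(X, w, alphabet):
    # Greedy path-following quantization of one neuron: choose each q_t so that
    # the running quantized output sum_t q_t X[:, t] tracks the analog running
    # sum sum_t w_t X[:, t] on the given data.
    u = np.zeros(X.shape[0])                    # accumulated output error
    q = np.empty_like(w, dtype=float)
    for t in range(len(w)):
        Xt = X[:, t]
        c = Xt @ (u + w[t] * Xt) / (Xt @ Xt)    # best real-valued choice for q_t
        q[t] = alphabet[np.argmin(np.abs(alphabet - c))]
        u += (w[t] - q[t]) * Xt                 # update the running error
    return q

The closed-form expression for c comes from minimizing the Euclidean norm of the updated error u over a single real parameter before rounding to the nearest alphabet element.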
Since the theoretical results in Chapter 4 are limited to single-layer neural networks, in Chapter 5 we propose a new stochastic algorithm for quantizing pretrained neural networks. We establish, for the first time, rigorous full-network error bounds under an infinite alphabet condition and minimal assumptions on the weights and input data. Moreover, we demonstrate that it is possible to achieve error bounds equivalent to those obtained in the infinite alphabet case using only $\log_2\log N$ bits, where $N$ denotes the maximum width across all layers.
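A natural primitive for such a stochastic scheme is unbiased stochastic rounding, sketched below for a grid of spacing delta applied to an array x of values; the full algorithm and its analysis are given in Chapter 5, and the grid spacing, interface, and variable names shown here are purely illustrative.

import numpy as np

def stochastic_round(x, delta, rng):
    # Round each entry of x to the grid delta * Z, taking the upper grid point
    # with probability equal to the normalized fractional part, so that the
    # rounding is unbiased: E[stochastic_round(x, delta, rng)] = x.
    lower = np.floor(x / delta) * delta
    p = (x - lower) / delta
    return lower + delta * (rng.random(x.shape) < p)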