Sparsity has played an important role in numerous signal processing systems. By leveraging sparse representations of signals, many batch estimation algorithms have been developed that are efficient, robust, and effective for practical engineering problems. However, less computationally expensive gradient descent-based approaches have become essential to the development of modern machine learning systems, especially deep neural networks (DNNs). This dissertation examines how we can incorporate sparsity principles into gradient-based learning algorithms, in both signal processing and machine learning applications, for improved estimation and optimization performance.
On the signal processing side, we study how to take advantage of sparsity in the system response to improve the convergence rate of the least mean square (LMS) family of adaptive filters, which are derived by applying gradient descent to the mean square error objective function. Based on iterative reweighting sparse signal recovery (SSR) techniques, we propose a novel framework for deriving a class of sparsity-aware LMS algorithms by adopting an affine scaling transformation (AST) methodology in the algorithm design process. The resulting Sparsity-promoting LMS (SLMS) and Sparsity-promoting Normalized LMS (SNLMS) algorithms exploit, though do not strictly enforce, any sparsity present in the underlying system to speed up convergence. In addition, the reweighting--AST framework is applied to the conjugate gradient (CG) class of adaptive algorithms, which in general exhibit a much higher convergence rate than the LMS family. The resulting Sparsity-promoting CG (SCG) algorithm likewise demonstrates improved convergence characteristics for sparse system identification. Finally, the proposed algorithms are applied to the real-world problem of acoustic feedback reduction encountered in hearing aids.
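To make the flavor of these scaled-gradient updates concrete, the following is a minimal Python sketch of sparse FIR system identification with a diagonally weighted, normalized LMS update. The particular weighting (reweighted coefficient magnitudes) and the proportionate-style normalization are illustrative assumptions only, not the exact SLMS/SNLMS recursions derived in this dissertation.

\begin{verbatim}
import numpy as np

def diagonally_weighted_nlms(x, d, num_taps, mu=0.5, eps=1e-3, delta=1e-6):
    """Identify a (possibly sparse) FIR system from input x and desired output d.

    Standard NLMS:  w <- w + mu * e * u / (u'u + delta)
    Sketch here:    w <- w + mu * G e u / (u' G u + delta),
    where G = diag(g) and g_i is proportional to |w_i| + eps (hypothetical
    reweighting).  Larger taps receive proportionally larger updates, so a
    sparse solution is approached faster; sparsity is encouraged, not enforced.
    """
    w = np.zeros(num_taps)
    for n in range(num_taps - 1, len(x)):
        u = x[n - num_taps + 1:n + 1][::-1]   # regressor: u[k] = x[n-k]
        e = d[n] - w @ u                      # a priori estimation error
        g = np.abs(w) + eps                   # diagonal of the scaling matrix G
        g = g / g.mean()                      # keep the average scale at one
        w = w + mu * g * u * e / (u @ (g * u) + delta)
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    h = np.zeros(64)
    h[[3, 17, 40]] = [1.0, -0.5, 0.3]         # sparse "unknown" FIR system
    x = rng.standard_normal(5000)
    d = np.convolve(x, h)[:len(x)] + 0.01 * rng.standard_normal(len(x))
    w_hat = diagonally_weighted_nlms(x, d, num_taps=64)
    print("coefficient error norm:", np.linalg.norm(w_hat - h))
\end{verbatim}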
On the machine learning side, we investigate how to exploit SSR techniques in gradient-based optimization algorithms for learning compact representations in nonlinear estimation tasks, especially with overparameterized models. In particular, the reweighting--AST framework is utilized to estimate a regularized solution exhibiting desired properties such as sparsity without having to incorporate an explicit regularization penalty. The resulting algorithms in general contain a weighted gradient term in the update equation, in which the weighting matrix provides certain implicit regularization capabilities. We start by establishing a general framework that can potentially extend to various regularizers and then focus on the sparsity regularization aspect. As notable applications of nonlinear model sparsification, we propose i) Sparsity-promoting Stochastic Gradient Descent (SSGD) algorithms for DNN compression and ii) Sparsity-promoting Kernel LMS (SKLMS) and Sparsity-promoting Kernel NLMS (SKNLMS) algorithms for dictionary pruning in kernel methods.
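As a concrete illustration of how a diagonal weighting of the gradient can act as an implicit regularizer, the Python sketch below fits an overparameterized linear model with and without the weighting; in this toy example the weighted iteration tends toward a sparser interpolating solution than plain gradient descent, even though no penalty term appears in the objective. The weighting D(w) = diag(|w_i| + eps) and the regression problem are assumptions made for illustration, not the SSGD, SKLMS, or SKNLMS algorithms themselves, which operate on DNN weights and kernel dictionaries with their own reweighting rules.

\begin{verbatim}
import numpy as np

def weighted_gradient_descent(X, y, steps=20000, eta=0.01, eps=1e-4, scaled=True):
    """Fit an overparameterized linear model y ~ X w by full-batch gradient descent.

    Plain GD:     w <- w - eta * grad
    Sketch here:  w <- w - eta * D(w) * grad,  D(w) = diag(|w_i| + eps),
    i.e. the gradient is premultiplied by a diagonal weighting matrix built
    from the current parameter magnitudes (hypothetical reweighting).  The
    scaling biases the interpolating solution toward sparsity without adding
    an explicit penalty to the objective.
    """
    n, p = X.shape
    w = np.full(p, 1e-2)                      # small nonzero init so D(w) is not zero
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / n          # gradient of the mean squared error / 2
        if scaled:
            grad = (np.abs(w) + eps) * grad   # apply the diagonal weighting D(w)
        w = w - eta * grad
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, p = 40, 200                            # far fewer samples than parameters
    w_true = np.zeros(p)
    w_true[[5, 50, 120]] = [2.0, -1.5, 1.0]   # sparse ground truth
    X = rng.standard_normal((n, p))
    y = X @ w_true
    for name, scaled in [("plain GD", False), ("weighted GD", True)]:
        w = weighted_gradient_descent(X, y, scaled=scaled)
        print(name, "| coefficients above 0.1:", int(np.sum(np.abs(w) > 0.1)),
              "| error:", round(float(np.linalg.norm(w - w_true)), 3))
\end{verbatim}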