eScholarship
Open Access Publications from the University of California

UC San Diego Electronic Theses and Dissertations

Robust PCA and Robust Linear Regression via Sparsity Regularization

Abstract

Robustness to outliers is of paramount importance in data analytics. However, many data analysis tools are not robust to outliers because they minimize the sum of squared errors. One essential characteristic of outliers is that they are sparse. A significant contribution of this thesis is a novel framework that directly uses the genuine L0-"norm" to enforce the sparsity of the outliers while using the L1-norm to address the inlier noise, together with algorithms that have better recovery guarantees than the state-of-the-art L1 relaxation approach.

We first study this framework in the Robust Linear Regression setting and propose an Algorithm for Robust Outlier Support Identification (AROSI) to minimize a novel objective function. The proposed algorithm is guaranteed to converge to a local optimum in a finite number of iterations. Under certain conditions, AROSI is guaranteed to achieve exact recovery when only sparse outliers are present, and its estimation error is bounded when dense inlier noise is present as well. It can also identify the outliers without any false alarms.
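To make the L0/L1 idea concrete, the following is a minimal sketch and not the thesis's AROSI implementation. It assumes an objective of the form sum_i |y_i - a_i*x - s_i| + lam * ||s||_0 with a single scalar unknown x, where s collects the outliers. The function names `weighted_median` and `alternate_l1_l0`, the parameter `lam`, and the toy data are all hypothetical. The sketch alternates between an L1 fit on the current inliers and a hard threshold on the residuals to update the outlier support.

```python
def weighted_median(values, weights):
    """Return a minimizer of sum(w * |v - x|) over x (a lower weighted median)."""
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2.0
    running = 0.0
    for value, weight in pairs:
        running += weight
        if running >= half:
            return value
    raise ValueError("weighted_median needs at least one positive weight")


def alternate_l1_l0(a, y, lam, max_iter=100):
    """Alternating minimization for the assumed scalar L1 + L0 objective.

    Returns the estimate of x and the sorted indices flagged as outliers.
    """
    outliers = set()  # start by treating every sample as an inlier
    x = 0.0
    for _ in range(max_iter):
        # Step 1: least-absolute-deviations fit on the current inliers.
        # For a scalar x, sum |y_i - a_i x| = sum |a_i| * |y_i / a_i - x|,
        # so the minimizer is a weighted median of the ratios y_i / a_i.
        inliers = [i for i in range(len(y)) if i not in outliers and a[i] != 0]
        if inliers:
            x = weighted_median(
                [y[i] / a[i] for i in inliers],
                [abs(a[i]) for i in inliers],
            )
        # Step 2: for fixed x, declaring sample i an outlier costs lam,
        # keeping it as an inlier costs |residual|; pick the cheaper option.
        new_outliers = {i for i in range(len(y)) if abs(y[i] - a[i] * x) > lam}
        if new_outliers == outliers:
            break  # support unchanged, so the iteration has converged
        outliers = new_outliers
    return x, sorted(outliers)


# Toy data (hypothetical): y = 2 * a, with two grossly corrupted samples.
a = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2, 4, 50, 8, 10, 12, -30, 16]
x_hat, support = alternate_l1_l0(a, y, lam=1.0)
print(x_hat, support)
```

On this toy data the sketch returns the slope 2.0 and flags samples 2 and 6. Neither step can increase the objective and there are only finitely many supports, which is consistent with the finite-iteration convergence to a local optimum described above. A multivariate version would replace the weighted median with a general least-absolute-deviations solver.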

Then, we study this framework in the Robust Principal Component Analysis (PCA) setting and propose a novel objective that additionally uses the nuclear norm to capture the low-rank matrix. The associated algorithm, termed Sparsity Regularized Principal Component Pursuit (SRPCP), is shown to converge to a local optimum in a finite number of iterations. Under certain conditions, SRPCP is guaranteed to achieve exact recovery in the presence of sparse outliers only, and bounded error in the noisy case. It can also identify the outliers without any false alarms. An important byproduct of our analysis is the result that the widely used Principal Component Pursuit (PCP) method and its missing-entry version are actually stable to dense inlier noise. We further propose an Iterative Reweighted SRPCP method that instead uses the log-determinant to capture the low-rank matrix; it also converges and achieves even better performance.

To better enforce low-rankness, we transform the Robust PCA objective into a novel Robust Sparse Linear Regression objective that is guaranteed to have equivalent global optima. We then propose a concise Sparse Bayesian Learning method to solve this new objective, and show that the method encourages the solution to be low-rank and the outliers to be sparse. To further exploit the sparsity pattern of the outliers in the Robust PCA problem, a modification of this Bayesian method is proposed and analyzed. Empirical studies demonstrate the superiority of the proposed methods over existing state-of-the-art methods.
