Low-Dimensional Models for PCA and Regression
This thesis examines two separate statistical problems for which
low-dimensional models are effective.
In the first part of this thesis, we examine the Robust Principal Components Analysis
(RPCA) problem: given a
matrix $\datam$ that is the sum of a low-rank matrix $\lowopt$ and a sparse noise matrix
$\sparseopt$, recover $\lowopt$ and $\sparseopt$.
This problem appears in various settings, including image processing,
computer vision, and graphical models. Various polynomial-time heuristics
and algorithms have been proposed to solve this problem.
We introduce a block coordinate descent algorithm for this
problem and prove a convergence result. In addition, our iterative
algorithm has low complexity per iteration and empirically performs well
on synthetic datasets.
In the second part of this thesis, we examine a variant of ridge regression:
unlike in the classical setting where we know that the parameter of
interest lies near a single point, we instead only know that it lies near
a known low-dimensional subspace.
We formulate this regression problem as a convex optimization problem, and
introduce an efficient block coordinate descent algorithm for solving it.
We demonstrate that this ``subspace prior" version of ridge regression is
an appropriate model for understanding player effectiveness in basketball.
In particular, we apply our algorithm to real-world data and demonstrate
empirically that it produces a more accurate model of player effectiveness
by showing that (1) the algorithm outperforms existing approaches and (2)
it leads to a profitable betting strategy.