High-dimensional statistics with systematically corrupted data
- Author(s): Loh, Po-LIng
- Advisor(s): Wainwright, Martin J
- et al.
Noisy and missing data are prevalent in many real-world statistical estimation problems. Popular techniques for handling nonidealities in data, such as imputation and expectation-maximization, are often difficult to analyze theoretically and/or terminate in local optima of nonconvex functions---these problems are only exacerbated in high-dimensional settings. We present new methods for obtaining high-dimensional regression estimators in the presence of corrupted data, and provide theoretical guarantees for the statistical consistency of our methods. Although our estimators also arise as minima of nonconvex functions, we show the rather surprising result that all stationary points are clustered around a global minimum. We describe extensions of our work to nonconvex regularizers, and demonstrate that an adaptation of composite gradient descent may be used to compute a global optimum up to statistical precision in log-linear time. Finally, we show how our corrupted regression methods may be applied to structure estimation for undirected graphical models, even when data are observed with systematic corruption. We derive new relationships between augmented inverse covariance matrices and the edge structure of discrete-valued graphs, and combine our population-level results with corrupted estimation methods to create new algorithms for graph estimation. We close with theoretical results and preliminary simulations in the domain of compressed sensing MRI.