Polynomial Time and Sample Complexity for Non-Gaussian Component Analysis: Spectral Methods
Open Access Publications from the University of California

## Polynomial Time and Sample Complexity for Non-Gaussian Component Analysis: Spectral Methods

• Author(s): Tan, Yan Shuo;
• Vershynin, Roman
• et al.
Abstract

The problem of Non-Gaussian Component Analysis (NGCA) is about finding a maximal low-dimensional subspace $E$ in $\mathbb{R}^n$ so that data points projected onto $E$ follow a non-gaussian distribution. Although this is an appropriate model for some real world data analysis problems, there has been little progress on this problem over the last decade. In this paper, we attempt to address this state of affairs in two ways. First, we give a new characterization of standard gaussian distributions in high-dimensions, which lead to effective tests for non-gaussianness. Second, we propose a simple algorithm, \emph{Reweighted PCA}, as a method for solving the NGCA problem. We prove that for a general unknown non-gaussian distribution, this algorithm recovers at least one direction in $E$, with sample and time complexity depending polynomially on the dimension of the ambient space. We conjecture that the algorithm actually recovers the entire $E$.

Many UC-authored scholarly publications are freely available on this site because of the UC's open access policies. Let us know how this access is important for you.