Non-Gaussian Component Analysis
- Author(s): Bean, Derek
- Advisor(s): Bickel, Peter J.; El Karoui, Noureddine; et al.
Extracting relevant low-dimensional information from high-dimensional data is a common pre-processing task with an extensive history in Statistics. Dimensionality reduction can facilitate data visualization and other exploratory techniques; in estimation, it can reduce the number of parameters to be estimated; and in hypothesis testing, it can reduce the number of comparisons being made. More generally, dimension reduction, done in a suitable manner, can alleviate or even bypass the poor statistical outcomes associated with the so-called "curse of dimensionality."
Statistical models may be specified to guide the search for relevant low-dimensional information, or "signal," while eliminating extraneous high-dimensional "noise." A plausible choice is to assume the data are a mixture of two sources: a low-dimensional signal with a non-Gaussian distribution, and independent high-dimensional Gaussian noise. This is the Non-Gaussian Component Analysis (NGCA) model. Accordingly, the goal of an NGCA method is to project the data onto a subspace that contains the signal but not the noise.
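To make the model concrete, the sketch below simulates data from one common way of writing it, assumed here for illustration: X = A[s; g], where s is a d-dimensional non-Gaussian vector (here with i.i.d. uniform coordinates), g is independent (p - d)-dimensional standard Gaussian noise, and A is an invertible mixing matrix. The function names `sample_ngca` and `excess_kurtosis`, the uniform signal, and the random Gaussian mixing matrix are hypothetical choices, not taken from the text. Unmixing with the true A and measuring excess kurtosis separates the signal coordinates (about -1.2 for a uniform distribution) from the Gaussian ones (about 0).

```python
import numpy as np


def sample_ngca(n, p, d, rng):
    """Draw n samples from a hypothetical p-dimensional NGCA model with a
    d-dimensional non-Gaussian signal.

    Assumed form (for illustration only): X = A @ [s; g], where s has i.i.d.
    uniform coordinates, g is standard Gaussian noise independent of s,
    and A is a random mixing matrix.
    """
    # Non-Gaussian signal: uniform on [-sqrt(3), sqrt(3)] has mean 0 and variance 1.
    s = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(n, d))
    # Independent Gaussian noise in the remaining p - d coordinates.
    g = rng.standard_normal(size=(n, p - d))
    latent = np.hstack([s, g])
    # A random Gaussian matrix is invertible with probability 1.
    A = rng.standard_normal(size=(p, p))
    X = latent @ A.T  # each row is A @ latent_row
    return X, A, latent


def excess_kurtosis(z):
    """Column-wise excess kurtosis: 0 for a Gaussian, -1.2 for a uniform."""
    z = (z - z.mean(axis=0)) / z.std(axis=0)
    return (z ** 4).mean(axis=0) - 3.0


rng = np.random.default_rng(0)
X, A, latent = sample_ngca(n=200_000, p=5, d=2, rng=rng)

# Unmixing with the (normally unknown) true A recovers the latent coordinates.
recovered = np.linalg.solve(A, X.T).T
print(np.round(excess_kurtosis(recovered), 2))
```

In practice A is unknown: an NGCA method sees only X and must estimate the projection onto the signal. Generally only the non-Gaussian subspace, not the individual coordinates of s, is identifiable.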
We conduct a comprehensive review of NGCA. We analyze the probabilistic features of the NGCA model and elucidate connections to similar well-known models and methods in the literature, including a previously unnoticed and surprising connection to a set of models proposed by Cook in the context of dimension reduction in regression. We review the literature on NGCA, catalogue existing NGCA methods, and compare them to the method proposed in Chapter 2.
We also propose and analyze CHFNGCA, a new NGCA method based on characteristic functions. We show that, under mild moment conditions on the non-Gaussian sources, CHFNGCA is consistent and asymptotically normal; asymptotic normality has not been demonstrated for any other NGCA method in the literature. We conclude by highlighting areas for future work.
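The abstract does not spell out how CHFNGCA uses characteristic functions, so the following is only a generic illustration of why characteristic functions can detect non-Gaussianity; it is not the CHFNGCA estimator. It relies on one standard fact: a standardized Gaussian variable has characteristic function exp(-t^2/2). The gap between the empirical characteristic function of a standardized one-dimensional sample and exp(-t^2/2) should therefore be close to zero for Gaussian data and is typically noticeably larger for non-Gaussian data. The functions `empirical_chf` and `chf_gaussian_discrepancy`, the grid of t values, and the sample sizes are hypothetical choices.

```python
import numpy as np


def empirical_chf(z, t):
    """Empirical characteristic function of a 1-D sample z at the points t:
    phi_hat(t) = mean_j exp(i * t * z_j)."""
    return np.exp(1j * np.outer(t, z)).mean(axis=1)


def chf_gaussian_discrepancy(z, t):
    """Hypothetical non-Gaussianity score: the maximum over t of
    |phi_hat(t) - exp(-t^2 / 2)| after standardizing z.
    A standard normal variable has characteristic function exp(-t^2 / 2)."""
    z = (z - z.mean()) / z.std()
    return np.max(np.abs(empirical_chf(z, t) - np.exp(-t ** 2 / 2)))


rng = np.random.default_rng(1)
t = np.linspace(-3, 3, 61)
n = 20_000
gauss = chf_gaussian_discrepancy(rng.standard_normal(n), t)
unif = chf_gaussian_discrepancy(rng.uniform(-1, 1, n), t)
print(f"Gaussian: {gauss:.3f}, uniform: {unif:.3f}")
```

Here the uniform sample yields a clearly larger discrepancy than the Gaussian one; in a multivariate setting a score of this kind would be evaluated on projections of the data onto candidate directions.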
The proofs of all stated propositions, lemmas, and theorems are contained in Appendices A and B.