eScholarship
Open Access Publications from the University of California

UC Berkeley

UC Berkeley Electronic Theses and Dissertations

Nonparametric Methods for High-Dimensional Data Analysis

Abstract

Modern biomedical studies generate high-dimensional data, meaning that the number of variables collected is equal to or larger than the number of observations. Examples are numerous, including single-cell transcriptome analyses and clinical trials. The dimensions of these data often prevent the use of traditional statistical methods: the theory that justifies them no longer applies. New procedures must be developed expressly for these contexts to ensure trustworthy inference. This dissertation addresses two topics in high-dimensional statistics, the first being covariance matrix estimator selection. Motivated by the need to nonparametrically identify an optimal estimator of the covariance matrix for a given dataset, we propose a cross-validated estimator selection procedure and investigate its finite-sample and high-dimensional asymptotic performance. Our theoretical results, supported by empirical evidence, demonstrate that this procedure selects the optimal estimator asymptotically. Here, optimality is defined in terms of a Frobenius-norm-based risk. Applications are myriad, though we focus on improving exploratory analyses of single-cell transcriptome data. The second topic, born of the need to reliably uncover biomarkers that predict clinical trial patients’ response to novel therapies, is treatment effect modifier discovery. Treatment effect modifiers are pre-treatment covariates that influence the effect of a treatment on an outcome. While many approaches exist for identifying these effect modifiers in traditional asymptotic settings, few have been developed for high-dimensional data. We propose a nonparametric framework for defining parameters that measure treatment effect modification, deriving accompanying estimators, and establishing these estimators’ asymptotic properties. We derive several such parameters and estimators using our methodology, and assess the estimators’ empirical performance through comprehensive simulation studies and analyses of real clinical trial data.
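
As a rough illustration of the cross-validated covariance estimator selection described in the abstract, the sketch below scores a few candidate estimators by their average squared Frobenius distance to the sample covariance of held-out observations and picks the candidate with the smallest score. It is not the dissertation's implementation: the function names (`cv_select`, `linear_shrinkage`), the candidate set, the fold count, and the particular form of the loss are all assumptions made for this example.

```python
import numpy as np


def sample_cov(x):
    """Sample covariance of the rows of x (n observations by p variables)."""
    return np.cov(x, rowvar=False)


def linear_shrinkage(alpha):
    """Hypothetical candidate: shrink the sample covariance toward the identity."""
    def estimator(x):
        s = sample_cov(x)
        return alpha * s + (1.0 - alpha) * np.eye(s.shape[0])
    return estimator


def cv_select(x, candidates, n_folds=5, seed=0):
    """Pick the candidate with the smallest cross-validated Frobenius risk.

    Assumed loss: squared Frobenius distance between the estimate fitted on
    the training folds and the sample covariance of the validation fold.
    """
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    folds = np.array_split(rng.permutation(n), n_folds)
    risks = {name: 0.0 for name in candidates}
    for valid_idx in folds:
        train_mask = np.ones(n, dtype=bool)
        train_mask[valid_idx] = False
        train, valid = x[train_mask], x[valid_idx]
        valid_cov = sample_cov(valid)
        for name, estimator in candidates.items():
            diff = estimator(train) - valid_cov
            risks[name] += np.linalg.norm(diff, "fro") ** 2 / n_folds
    best = min(risks, key=risks.get)
    return best, risks


# Simulated high-dimensional data: 40 observations, 60 variables,
# true covariance equal to the identity (an assumption for this demo).
rng = np.random.default_rng(1)
x = rng.standard_normal((40, 60))

candidates = {
    "sample": sample_cov,
    "shrink_0.5": linear_shrinkage(0.5),
    "shrink_0.1": linear_shrinkage(0.1),
}
best, risks = cv_select(x, candidates)
print(best, risks)
```

Because the held-out sample covariance is itself noisy, the reported risks are inflated by a term common to all candidates; only their ordering is meaningful for selection.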
