Concurrent Object Regression and Single Index Fréchet Model for Metric Space Valued Data
- Bhattacharjee, Satarupa
- Advisor(s): Mueller, Hans-Georg
Abstract
In the era of modern data science, it has become increasingly common to observe complex data structures, arising in areas such as the biological or social sciences, that are non-Euclidean and, specifically, do not lie in a vector space. These complex and big data structures have attracted a great deal of attention both within and outside of Statistics. A commonly encountered and important scenario is a regression framework where the response variables take values in a metric space, with or without any algebraic structure, and where only the pairwise distances between the observed data are available. Examples of such random object data include covariance matrices, graph Laplacians of networks, and univariate probability distribution functions, among others.
Since the data are metric space-valued, many classical notions of Statistics, such as the definition of a sample or population mean as an average or expected value, no longer apply. However, the complexity of data objects can be handled elegantly using notions of geometry, which help to determine how to represent the data structures and quantify relationships among objects in a sample. The Fréchet mean and variance provide a way of obtaining a mean and variance for metric space-valued random variables that lie in abstract spaces devoid of algebraic structures and operations. This dissertation concerns the analysis of random object data in the context of building statistically justified regression methodologies to quantify the dependence of general metric space-valued response variables on Euclidean predictors, by modeling the conditional Fréchet means appropriately.
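As a minimal illustration of these notions, the sketch below computes a sample Fréchet mean and Fréchet variance in one special case: univariate probability distributions equipped with the 2-Wasserstein metric, where the Fréchet mean has a quantile function equal to the pointwise average of the individual quantile functions. The function names, the quantile grid, and the Gaussian example data are assumptions made for illustration and are not taken from the dissertation.

```python
import numpy as np
from statistics import NormalDist


def frechet_mean_wasserstein(quantiles):
    """Sample Frechet mean under the 2-Wasserstein metric.

    quantiles: (n, m) array; row i holds the quantile function of
    distribution i evaluated on a common grid in (0, 1).
    The minimizer of the average squared distance is the pointwise
    average of the quantile functions.
    """
    return np.asarray(quantiles, dtype=float).mean(axis=0)


def frechet_variance_wasserstein(quantiles):
    """Sample Frechet variance: average squared distance to the Frechet mean.

    Assumes a uniform midpoint grid on (0, 1), so the integral of the
    squared quantile difference is approximated by a plain average.
    """
    q = np.asarray(quantiles, dtype=float)
    mean_q = frechet_mean_wasserstein(q)
    sq_dists = ((q - mean_q) ** 2).mean(axis=1)  # squared W2 distances
    return sq_dists.mean()


# Hypothetical example: three Gaussian distributions N(mu_i, sigma_i^2).
m = 1000
grid = (np.arange(m) + 0.5) / m  # midpoints of a uniform partition of (0, 1)
z = np.array([NormalDist().inv_cdf(u) for u in grid])  # standard normal quantiles
mus = np.array([0.0, 1.0, 2.0])
sigmas = np.array([1.0, 2.0, 3.0])
quantiles = mus[:, None] + sigmas[:, None] * z[None, :]

mean_q = frechet_mean_wasserstein(quantiles)
var = frechet_variance_wasserstein(quantiles)
```

In this Gaussian example the averaged quantile function is again that of a Gaussian, whose mean and standard deviation are the averages of the individual means and standard deviations. For general metric spaces no such closed form is available, and the minimization has to be carried out numerically.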
In the first chapter, a new concurrent regression model is proposed to characterize the time-varying relation between object responses and real-valued predictors, where concepts from Fréchet regression are employed. Concurrent regression has been a well-developed area of research for Euclidean predictors and responses, with many important applications for longitudinal studies and functional data. However, no such model has been available so far for general object data as responses. We develop generalized versions of both global least squares regression and locally weighted least squares smoothing in the context of concurrent regression for responses that are situated in general metric spaces, and propose estimators that can accommodate sparse and/or irregular designs. Consistency results are demonstrated for sample estimates of appropriate population targets, along with the corresponding rates of convergence.
In the second chapter, a single index model is developed for regression models where the random object responses are coupled with multivariate Euclidean predictors. Single index models provide an effective dimension reduction tool in regression, especially for high-dimensional data, by projecting the multivariate predictor onto a direction vector to obtain a univariate index. While Fréchet regression has proved useful for modeling the conditional mean of such random objects given Euclidean vectors, it does not provide for regression parameters such as slopes or intercepts, since the metric space-valued responses are not amenable to linear operations. As a consequence, distributional results for Fréchet regression have been elusive. We show here that the parameters that define the single index projection vector can be used to substitute for the inherent absence of parameters in Fréchet regression. Specifically, we derive the asymptotic distribution of suitable estimates of these parameters, which can then be utilized to test linear hypotheses for the parameters, subject to an identifiability condition. Consistent estimation of the link function of the single index Fréchet regression model is obtained through local Fréchet regression. We demonstrate the finite-sample performance of estimation and inference for the proposed single index Fréchet regression model through simulation studies, including the special cases where responses are probability distributions and graph adjacency matrices. The method is illustrated for resting-state functional Magnetic Resonance Imaging (fMRI) data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) study.
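For orientation, one way to write down a single index model for the conditional Fréchet mean is sketched below. The notation is an assumption introduced here rather than quoted from the dissertation: $(\Omega, d)$ denotes the metric space of responses, $Y \in \Omega$ the response, $X \in \mathbb{R}^p$ the predictor, $\theta_0$ the index direction, and $m_\oplus$ the object-valued link function. The normalization shown is a common convention in single index models and is not necessarily the identifiability condition used in the dissertation.

```latex
\[
  E_\oplus(Y \mid X = x)
  \;=\; \operatorname*{argmin}_{\omega \in \Omega}
        E\!\left\{ d^2(Y, \omega) \mid X = x \right\}
  \;=\; m_\oplus\!\left(\theta_0^\top x\right),
  \qquad
  \|\theta_0\| = 1, \quad \theta_{0,1} > 0 .
\]
```

Under such a formulation, the response depends on the predictor only through the scalar index $\theta_0^\top x$, so the link $m_\oplus$ is a function of a single real argument and can be estimated by a one-dimensional smoother such as local Fréchet regression.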