Regression and optimal transport models for functional and surface-valued data
- Author(s): Liu, Xi
- Advisor(s): Petersen, Alexander
Many types of data, such as shapes and constrained curves, cannot be represented by a scalar variable or a simple Euclidean vector. For these nonstandard data types, inherent constraints and geometric features can often be exploited to inform model development and data analysis. In analyzing such data, the Euclidean norm that is implicitly used in standard multivariate analyses must be replaced by suitable functional norms or metrics. In this dissertation, statistical models and computational tools are developed to analyze functional and surface-valued data.
In Chapter 1, the effect of a smooth curve on a binary response is analyzed through a functional generalized linear model. The proposed method assumes that the coefficient function $\beta(t)$ is truncated, i.e., that the curve predictor loses its influence on the response after some time point in its domain. To obtain an estimate of $\beta(t)$ that is simultaneously smooth and truncated, a structured variable selection method and a localized B-spline expansion of $\beta(t)$ are combined to formulate a penalized log-likelihood function, in which a nested group lasso penalty forces the B-spline basis functions to enter the model sequentially and hence induces truncation in $\beta(t)$. Unlike previous methods, which either directly penalized the value of the truncation point or resulted in a nonconvex optimization problem, the proposed approach leads to a convex optimization problem. Computationally, an optimization scheme is developed to compute the entire solution path efficiently as the truncation tuning parameter varies from $\infty$ to 0. Expressing the nonsmooth penalty in its dual formulation allows it to be smoothed, so that the objective function can be optimized by an accelerated gradient descent algorithm. Theoretically, the convergence rate of the estimate and the consistency of the truncation point estimate are derived under suitable smoothness assumptions. The proposed method is demonstrated with an application involving the effects of blood pressure curves in patients who suffered a spontaneous intracerebral hemorrhage.
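To make the truncation mechanism of Chapter 1 concrete, the following is a minimal sketch of how a nested group lasso penalty can favor coefficient vectors whose trailing B-spline coefficients are zero. The abstract does not state the exact form of the penalty, the weights, or the smoothing that is used; the form $\sum_k w_k \|(b_k, \dots, b_K)\|_2$, the unit weights, the function names, and the Nesterov-style smoothing of the Euclidean norm shown here are all assumptions made for illustration.

```python
import numpy as np

def nested_group_lasso(b, weights=None):
    """Assumed penalty form: sum over k of w_k * ||(b_k, ..., b_K)||_2."""
    b = np.asarray(b, dtype=float)
    w = np.ones(b.size) if weights is None else np.asarray(weights, dtype=float)
    # tail_norms[k] is the Euclidean norm of coefficients k, k+1, ..., K-1.
    tail_norms = np.sqrt(np.cumsum(b[::-1] ** 2)[::-1])
    return float(np.dot(w, tail_norms))

def smoothed_norm(v, mu):
    """Smoothed ||v||_2 from its dual form:
    max over ||a||_2 <= 1 of a.v - (mu / 2) * ||a||_2^2 (illustrative choice)."""
    r = float(np.linalg.norm(v))
    # Closed form of the maximization: quadratic near zero, linear elsewhere.
    return r * r / (2 * mu) if r <= mu else r - mu / 2

# Same nonzero values, but only the first vector is zero after a cutoff index.
b_truncated = np.array([0.8, -0.5, 0.3, 0.0, 0.0])
b_spread = np.array([0.8, -0.5, 0.0, 0.0, 0.3])

print(nested_group_lasso(b_truncated))
print(nested_group_lasso(b_spread))
```

Under these assumptions the vector with trailing zeros receives the smaller penalty, because every nested group that lies entirely beyond the cutoff contributes nothing. With a localized B-spline basis, zero trailing coefficients correspond to $\beta(t) = 0$ beyond a time point, which is the truncation described above.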
In Chapter 2, a set of computational tools is developed to perform inference for a regression model in which density curves appear as functional responses with vector predictors. For such models, inference is key to understanding the importance of density-predictor relationships and the uncertainty associated with the estimated conditional mean densities, defined as conditional Fr