eScholarship
Open Access Publications from the University of California

UC San Diego

UC San Diego Electronic Theses and Dissertations

Bayesian and Frequentist Methods for Uncertainty Quantification and Interpretation in Statistical and Machine Learning Models

No data is associated with this publication.
Abstract

Modern statistical and machine learning models excel at capturing complex non-linear relationships between outcomes and predictors, resulting in high accuracy. However, the complexity of these models can impede statistical inference and interpretation. This dissertation addresses the emerging challenges posed by intricate models and big data.

One significant challenge involves modeling and statistical inference for zero-inflated semi-continuous data. In the first part, we develop a flexible Bayesian semi-parametric mixture model for zero-inflated skewed longitudinal data. The model generates credible intervals not only for the mean but also for any quantile of the parameters and predictions, aiding population inference for skewed data. We apply the model to evaluate how the number of binge drinking episodes changes with neuromaturation, using the National Consortium on Alcohol and Neuro-Development in Adolescence data.

However, credible or confidence intervals do not directly address a common question: can we identify, with confidence, a subset of predictions or parameters whose true values exceed a specific threshold? To tackle this, in the second part, we improve upon the inverse set estimation framework, which estimates such sets, by developing an approach with fewer assumptions and broader applicability to various data settings. Using the proposed method, we construct an excursion set map with a probability guarantee on the North American Regional Climate Change Assessment Program data. Moreover, we use this new method to discover characteristics of in-patients at high risk for severe outcomes using University of California San Diego hospital data.

In the third part, we apply this inverse set estimation inference framework to quantify prediction model uncertainty, and we develop theories and algorithms that ensure non-conservative coverage rates for a single threshold in non-asymptotic settings in regression problems. We demonstrate the effectiveness of the constructed confidence sets for uncertainty quantification and interpretation on both simulated data and PhysioNet sepsis prediction data.

Main Content

This item is under embargo until June 27, 2025.