Novel Computational Methods for Bayesian Hierarchical Modeling in the Biomedical Domain
The recent growth in the availability of biomedical data promises to reshape healthcare by ushering in an era of personalized medicine, in which data can be used to diagnose and treat patients with pinpoint accuracy. Truly realizing this goal requires statistical models that account for individual patient variations such as age, sex, and genetic makeup. Modeling each such variation separately leads to combinatorial growth in the number of model parameters and, because each subgroup then has little data, to noisy estimates. Fortunately, Bayesian hierarchical models, together with recent computational advances, address this issue. By embedding the hierarchical structure that many datasets naturally exhibit directly into the model, these models produce separate estimates that capture variation across the population while reducing noise by regularizing (shrinking) those estimates toward a population mean.
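The shrinkage toward a population mean can be made concrete with the simplest hierarchical model, a normal-normal model with known variances. This is an illustrative sketch under our own assumptions, not one of the models developed in this thesis: each subgroup j (for example, a hypothetical age-by-sex stratum) has observations y_ij ~ N(theta_j, sigma^2) and effect theta_j ~ N(mu, tau^2), with sigma, mu, and tau treated as known. The posterior mean of theta_j is then a precision-weighted average of the subgroup's sample mean and the population mean. The function name `partial_pool` and the numbers below are hypothetical.

```python
import numpy as np


def partial_pool(group_means, group_sizes, sigma, mu, tau):
    """Posterior mean of each theta_j in the normal-normal model
    y_ij ~ N(theta_j, sigma^2), theta_j ~ N(mu, tau^2), with sigma, mu, tau known."""
    precision_data = np.asarray(group_sizes, dtype=float) / sigma**2  # n_j / sigma^2
    precision_prior = 1.0 / tau**2                                    # 1 / tau^2
    # Weight on the subgroup's own mean grows with its sample size.
    weight = precision_data / (precision_data + precision_prior)
    return weight * np.asarray(group_means, dtype=float) + (1.0 - weight) * mu


# Three hypothetical subgroups with very different amounts of data.
est = partial_pool(group_means=[2.0, 0.5, -1.0], group_sizes=[100, 5, 1],
                   sigma=1.0, mu=0.0, tau=0.5)
# est is approximately [1.923, 0.278, -0.2]
```

The subgroup with 100 observations keeps nearly its raw mean, while the subgroup with a single observation is pulled most of the way toward the population mean of 0; this is the regularization that keeps per-subgroup estimates from being dominated by noise.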
In this thesis, we describe novel models and computational methods for Bayesian hierarchical modeling of biomedical data. We begin by describing our contributions to various areas of Bayesian modeling, along with the problems from our applied work that motivated these contributions.
Specifically, we describe our disease progression model, which hierarchically models patient disease trajectories. The model, which was motivated by our applied work on Alzheimer's disease, uses I-splines to capture the characteristic monotonic shape of dementia disease trajectories, along with Dirichlet distributions over the I-spline coefficients to model these trajectories hierarchically. Next, we describe our work on using the Givens representation of orthogonal matrices to perform inference in models with orthogonal matrix parameters, such as factor models, within a general Bayesian framework. We describe the innovations in our method along with our motivating hierarchical example, which is based on an analysis of protein biomarkers in coagulopathic trauma patients. We then describe a mechanistic model of coagulopathy that relates clotting assay data to protein concentrations, providing clinicians with a fast and convenient way to understand the key protein markers involved in clotting.
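The two constructions named above can be sketched in a few lines of NumPy. This is an illustrative sketch under our own assumptions, not the implementation used in this thesis. For the trajectories, each I-spline basis function is obtained by numerically integrating a normalized M-spline (a B-spline rescaled to integrate to 1), so it rises monotonically from 0 to 1; any coefficient vector on the simplex, such as a Dirichlet draw, then yields a monotone curve from 0 to 1. The knot placement, the degree, and the patient-level prior `Dirichlet(20 * pop_weights)` centered on hypothetical population weights are illustrative choices. For the orthogonal matrices, an n x p matrix with orthonormal columns is built as the first p columns of a product of n*p - p(p+1)/2 Givens rotations; the rotation ordering and angle ranges here are assumptions, and the sketch omits any change-of-measure term that inference over the angles may require. The helper names `bspline_basis`, `ispline_basis`, and `givens_orthogonal` are hypothetical.

```python
import numpy as np


def bspline_basis(x, knots, degree):
    """Evaluate all B-spline basis functions at x via the Cox-de Boor recursion."""
    x = np.asarray(x, dtype=float)
    t = np.asarray(knots, dtype=float)
    # Degree-0 basis: indicator of each half-open knot interval [t_i, t_{i+1}).
    B = np.array([(t[i] <= x) & (x < t[i + 1]) for i in range(len(t) - 1)], dtype=float).T
    # Assign the right endpoint to the last non-empty interval.
    last = np.nonzero(t[:-1] < t[1:])[0].max()
    B[x == t[-1], last] = 1.0
    for d in range(1, degree + 1):
        B_next = np.zeros((len(x), len(t) - d - 1))
        for i in range(len(t) - d - 1):
            left = t[i + d] - t[i]
            right = t[i + d + 1] - t[i + 1]
            if left > 0:
                B_next[:, i] += (x - t[i]) / left * B[:, i]
            if right > 0:
                B_next[:, i] += (t[i + d + 1] - x) / right * B[:, i + 1]
        B = B_next
    return B


def ispline_basis(x, interior_knots, degree=2):
    """I-spline basis on a sorted grid x, by numerically integrating normalized M-splines."""
    x = np.asarray(x, dtype=float)
    lo, hi = x[0], x[-1]
    knots = np.r_[[lo] * (degree + 1), interior_knots, [hi] * (degree + 1)]
    B = bspline_basis(x, knots, degree)
    n_basis = B.shape[1]
    widths = knots[degree + 1:degree + 1 + n_basis] - knots[:n_basis]
    M = B * (degree + 1) / widths  # M-splines: each column integrates to 1
    # Cumulative trapezoid rule; integrands are nonnegative, so columns are nondecreasing.
    steps = 0.5 * (M[1:] + M[:-1]) * np.diff(x)[:, None]
    return np.vstack([np.zeros((1, n_basis)), np.cumsum(steps, axis=0)])


def givens_orthogonal(angles, n, p):
    """n x p matrix with orthonormal columns from n*p - p*(p+1)/2 Givens angles."""
    Q = np.eye(n)
    k = 0
    for i in range(p):
        for j in range(i + 1, n):
            c, s = np.cos(angles[k]), np.sin(angles[k])
            R = np.eye(n)  # rotation in the (i, j) coordinate plane
            R[i, i], R[j, j] = c, c
            R[i, j], R[j, i] = -s, s
            Q = Q @ R
            k += 1
    return Q[:, :p]


rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 1001)  # hypothetical normalized disease time
basis = ispline_basis(x, interior_knots=[0.25, 0.5, 0.75], degree=2)
pop_weights = np.full(basis.shape[1], 1.0 / basis.shape[1])  # hypothetical population weights
patient_weights = rng.dirichlet(20.0 * pop_weights)  # patient weights scattered around them
trajectory = basis @ patient_weights  # monotone curve rising from 0 to about 1

n, p = 5, 2
angles = rng.uniform(-np.pi / 2, np.pi / 2, size=n * p - p * (p + 1) // 2)
Q = givens_orthogonal(angles, n, p)  # columns are orthonormal: Q.T @ Q is approximately I
```

In both cases the constraint (monotonicity, orthogonality) holds by construction for every parameter value, so a sampler can work with unconstrained-looking parameters (simplex weights, angles) rather than enforcing the constraint separately.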
We then turn to Hamiltonian Monte Carlo (HMC) and elucidate the connection between multiscale posterior distributions and the efficiency of HMC. We describe the issue of numerical stability within HMC and present our implicit HMC algorithm for efficiently sampling non-Gaussian posterior distributions.
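The stability issue can be illustrated with standard HMC's explicit leapfrog integrator on a multiscale Gaussian. This sketch shows only the problem, not the implicit HMC algorithm of this thesis: with unit mass, leapfrog on a Gaussian direction with standard deviation sigma is stable only when the step size satisfies eps < 2 * sigma, so the narrowest direction caps the step size even though the widest direction then needs many steps to traverse. The two scales (1.0 and 0.01), the starting point, and the helper names `leapfrog` and `energy_error` are hypothetical choices for illustration.

```python
import numpy as np


def leapfrog(q, p, grad_U, eps, n_steps):
    """Standard leapfrog (kick-drift-kick) integration of Hamiltonian dynamics, unit mass."""
    p = p - 0.5 * eps * grad_U(q)
    for step in range(n_steps):
        q = q + eps * p
        if step != n_steps - 1:
            p = p - eps * grad_U(q)
    p = p - 0.5 * eps * grad_U(q)
    return q, p


sigma = np.array([1.0, 0.01])  # hypothetical wide and narrow posterior scales


def U(q):
    return 0.5 * np.sum((q / sigma) ** 2)  # negative log density of N(0, diag(sigma^2))


def grad_U(q):
    return q / sigma**2


def energy_error(eps, n_steps=20):
    """Absolute change in the Hamiltonian over one leapfrog trajectory."""
    q0, p0 = np.array([1.0, 0.01]), np.array([1.0, 1.0])
    q, p = leapfrog(q0, p0, grad_U, eps, n_steps)
    H0 = U(q0) + 0.5 * np.sum(p0**2)
    H1 = U(q) + 0.5 * np.sum(p**2)
    return abs(H1 - H0)


stable = energy_error(0.015)  # eps < 2 * 0.01: error stays bounded
unstable = energy_error(0.1)  # eps > 2 * 0.01: narrow direction diverges
```

A step size of 0.1 would be comfortable for the wide direction alone, yet it makes the narrow direction diverge and every proposal would be rejected; the stable step of 0.015 instead requires on the order of a hundred steps to move one standard deviation along the wide direction. This is the link between multiscale posteriors and HMC efficiency that motivates a different integrator.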
Lastly, we provide a summary of our contributions along with ideas for future work.