eScholarship
Open Access Publications from the University of California

Bayesian Analysis in Problems with High Dimensional Data and Complex Dependence Structure

  • Author(s): Lee, Wayne Tai
  • Advisor(s): Kaufman, Cari G
  • et al.
Abstract

This dissertation addresses three applied statistical problems from a Bayesian perspective. Although the statistical question differs from problem to problem, all three share two challenges: high-dimensional data and a complex dependence structure. These features complicate the use of standard statistical techniques and raise computational issues. For each problem, we address the statistical question and resolve the computational issues that arise in the implementation.

The first topic considers Bayesian inference for the location of the global extreme of a nonparametric regression function given noisy observations. We model the unknown function using a Gaussian Process (GP) prior. The unknown function may be high dimensional, and sampling posterior realizations of the function can be computationally intensive. We introduce a novel algorithm that uses existing optimization routines to sample and optimize GP realizations simultaneously and efficiently. We demonstrate our method on spatial data sets with non-Gaussian observations and on an application in astronomy in which the location of the extreme varies over time.
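
The inference target can be illustrated with a naive grid-based approach; this is not the dissertation's algorithm, which combines sampling with optimization routines and is not reproduced here. The minimal sketch below assumes Gaussian noise, a one-dimensional input, and a squared-exponential kernel with fixed hyperparameters (all assumptions). It draws GP posterior realizations on a fixed grid and records where each one attains its maximum, giving Monte Carlo samples from the posterior distribution of the maximizer's location. The function names `sq_exp_kernel` and `sample_argmax` are hypothetical. Because the Cholesky factorization of the grid covariance costs O(n^3) in the number of grid points, this approach becomes expensive quickly as the input dimension grows.

```python
import numpy as np

def sq_exp_kernel(a, b, length=0.2, var=1.0):
    # Squared-exponential covariance between 1-D input vectors a and b (assumed kernel)
    d = a[:, None] - b[None, :]
    return var * np.exp(-0.5 * (d / length) ** 2)

def sample_argmax(x_obs, y_obs, grid, noise=0.1, n_draws=200, seed=0):
    """Draw GP posterior realizations on `grid`; return the location of each draw's maximum."""
    rng = np.random.default_rng(seed)
    # Covariance of the noisy observations (Gaussian noise is an assumption)
    K = sq_exp_kernel(x_obs, x_obs) + noise**2 * np.eye(len(x_obs))
    Ks = sq_exp_kernel(grid, x_obs)
    Kss = sq_exp_kernel(grid, grid)
    L = np.linalg.cholesky(K)
    # Posterior mean and covariance of the latent function on the grid
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs))
    mean = Ks @ alpha
    v = np.linalg.solve(L, Ks.T)
    cov = Kss - v.T @ v
    # Small jitter keeps the dense-grid covariance numerically positive definite
    L_post = np.linalg.cholesky(cov + 1e-6 * np.eye(len(grid)))
    draws = mean[:, None] + L_post @ rng.standard_normal((len(grid), n_draws))
    # Each column is one realization; its argmax is one posterior sample of the location
    return grid[np.argmax(draws, axis=0)]
```

For example, with observations of a smooth function peaking at 0.6, `sample_argmax(x, y, np.linspace(0, 1, 101))` returns 200 draws of the peak location, whose spread reflects posterior uncertainty.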

The second topic constructs a Bayesian hierarchical model for surface wind fields over the globe. Surface winds are intrinsically multivariate and exhibit spatially heteroscedastic behavior over the globe. Ours is the first model of wind fields at the global scale over both land and sea. Motivated by the geostrophic relationship, we fit a varying coefficient model that relates wind fields to the pressure gradient. We apply our method to surface wind and sea level pressure products from a general circulation model and show that it can produce realistic wind fields that resemble those from the climate model.
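
The geostrophic relationship balances the pressure-gradient force against the Coriolis force: u = -(1/(ρf)) ∂p/∂y and v = (1/(ρf)) ∂p/∂x, with Coriolis parameter f = 2Ω sin(latitude). Because f vanishes at the equator, geostrophic balance breaks down there, so fixed geostrophic coefficients cannot describe winds everywhere on the globe. The sketch below computes the geostrophic wind from a gridded pressure field and then, as a hypothetical stand-in for a varying coefficient model, fits u = b0 + b1 ∂p/∂x + b2 ∂p/∂y separately within latitude bands by ordinary least squares. The constant air density, the constant metric grid spacing (which ignores the convergence of meridians), the banding scheme, and all function names are assumptions for illustration; the dissertation's model is Bayesian and hierarchical, and its exact form is not given here.

```python
import numpy as np

OMEGA = 7.2921e-5  # Earth's angular velocity (rad/s)
RHO = 1.2          # assumed constant near-surface air density (kg/m^3)

def geostrophic_wind(p, lat_deg, dy, dx):
    """Geostrophic (u, v) from pressure p[lat_row, lon_col] (Pa); spacings dy, dx in metres."""
    # Coriolis parameter per latitude row; it vanishes at the equator
    f = 2 * OMEGA * np.sin(np.deg2rad(lat_deg))[:, None]
    # np.gradient returns derivatives along axis 0 (latitude) then axis 1 (longitude)
    dpdy, dpdx = np.gradient(p, dy, dx)
    return -dpdy / (RHO * f), dpdx / (RHO * f)

def fit_varying_coefficients(u, dpdx, dpdy, band):
    """Hypothetical varying-coefficient fit: u = b0 + b1*dp/dx + b2*dp/dy,
    with (b0, b1, b2) estimated separately in each latitude band (band[i] labels row i)."""
    betas = {}
    for b in np.unique(band):
        rows = band == b
        X = np.column_stack([np.ones(u[rows].size), dpdx[rows].ravel(), dpdy[rows].ravel()])
        betas[int(b)] = np.linalg.lstsq(X, u[rows].ravel(), rcond=None)[0]
    return betas
```

Letting the coefficients differ by band is the simplest way to let the wind-pressure relationship change over space; a Bayesian hierarchical model would instead give the coefficients a spatial prior and share information across locations.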

The third topic considers hierarchical multilabel classification (HMC) given the outputs of existing single-label classifiers. In our setting, multiple labels can be assigned to each subject, and the assignments must respect a given hierarchy. We aim to use the existing local classifiers to produce assignments that are consistent with the hierarchy. Under a Bayesian framework, we rank each label assignment by its probability of being positive given all local classifier outputs. We use this ranking to assign labels sequentially according to a cutoff, and we update the ranking after each assignment to ensure consistency. Our algorithm outperforms existing HMC methods in various simulation studies and on a disease diagnosis dataset with a large hierarchy and few independent observations.
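
The sequential, cutoff-based assignment can be sketched as a greedy loop over a tree-shaped hierarchy. This is a simplified stand-in, not the dissertation's algorithm: it assumes each label's probability of being positive has already been computed (the Bayesian model that combines all local classifier outputs is not reproduced) and that the hierarchy is a tree given as a parent map. The "update" here is only that a label becomes eligible for ranking once its parent has been assigned, which guarantees consistency; the dissertation's update of the ranking itself may differ. The function name `sequential_assign` is hypothetical.

```python
import heapq

def sequential_assign(parent, prob, cutoff=0.5):
    """Greedy, hierarchy-consistent label assignment (simplified sketch).

    parent: dict mapping each label to its parent label (None for top-level labels).
    prob:   dict mapping each label to an assumed, precomputed probability of being positive.
    """
    children = {}
    for label, par in parent.items():
        children.setdefault(par, []).append(label)
    # Max-heap of eligible candidates keyed by probability (negated for heapq's min-heap)
    heap = [(-prob[c], c) for c in children.get(None, [])]
    heapq.heapify(heap)
    assigned = []
    while heap:
        neg_p, label = heapq.heappop(heap)
        if -neg_p < cutoff:
            break  # the top-ranked candidate falls below the cutoff, so all others do too
        assigned.append(label)
        # Re-rank: children of the newly assigned label become eligible candidates
        for c in children.get(label, []):
            heapq.heappush(heap, (-prob[c], c))
    return assigned
```

For example, with top-level labels `A` (0.9) and `B` (0.4), children `A1` (0.7) and `A2` (0.3) under `A`, and `B1` (0.95) under `B`, a cutoff of 0.5 assigns `A` then `A1`. `B1` is never assigned despite its high probability, because its parent `B` falls below the cutoff.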
