Distributed Inference and Data Sketching for High Dimensional Spatial Regression Models
eScholarship
Open Access Publications from the University of California


UC Santa Cruz Electronic Theses and Dissertations


Abstract

Modeling spatial data with flexible statistical models has become an enormously active area of research over the last decade in many disciplines. This work focuses on scaling Monte Carlo (MC) computations for large-scale Bayesian inference in complex spatial models while preserving adequate point estimation and uncertainty quantification in inference and prediction. We first derive a three-step distributed Bayesian inferential framework for multivariate spatial generalized linear mixed-effects models (MVspGLMMs) for big data. The proposed approach delivers fully model-based Bayesian parameter inference by constructing a “meta posterior” as the Wasserstein barycenter of pseudo posterior distributions obtained by partitioning the data into independent subsets.
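For a single scalar parameter, the Wasserstein barycenter has a simple closed form: with equal weights, its quantile function is the average of the subset quantile functions. The minimal Python sketch below assumes that each subset pseudo posterior is summarized by the same number of posterior draws of one scalar parameter and that all subsets receive equal weight. The function name `wasserstein_barycenter_1d` and the simulated subset draws are hypothetical. The code shows the standard one-dimensional construction, not the dissertation's own implementation.

```python
import random
from typing import List, Sequence


def wasserstein_barycenter_1d(subset_draws: Sequence[Sequence[float]]) -> List[float]:
    """Equal-weight W2 barycenter of K univariate empirical distributions.

    In one dimension the barycenter's quantile function is the average of the
    subset quantile functions. With equal-sized samples this reduces to
    averaging the sorted draws position by position.
    """
    if not subset_draws:
        raise ValueError("need at least one subset posterior")
    n = len(subset_draws[0])
    if any(len(d) != n for d in subset_draws):
        raise ValueError("all subsets must supply the same number of draws")
    sorted_draws = [sorted(d) for d in subset_draws]
    k = len(sorted_draws)
    # Average the i-th order statistic (empirical quantile) across the K subsets.
    return [sum(s[i] for s in sorted_draws) / k for i in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical draws from K = 4 subset pseudo posteriors of one parameter.
    subsets = [[rng.gauss(mu, 0.5) for _ in range(2000)] for mu in (0.9, 1.0, 1.1, 1.0)]
    meta = wasserstein_barycenter_1d(subsets)
    print("meta posterior mean:", round(sum(meta) / len(meta), 2))
```

Averaging sorted draws is the same as averaging empirical quantiles, so the cost is dominated by sorting each subset's draws. The shortcut can be applied to each marginal of a vector-valued parameter. That yields barycenters of the marginal posteriors, not the barycenter of the joint posterior.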

We introduce Bayesian data sketching for spatially varying coefficient regression models (SVCMs) to obviate the computational challenges posed by large numbers of spatial locations. To analyze very large spatial data, we compress the spatially oriented data through a random linear transformation, which reduces its dimension, and conduct inference on the compressed data. We establish posterior contraction rates for estimating the spatially varying coefficients and for predicting the outcome at new locations under the randomly compressed data model.
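The compression step can be illustrated with a deliberately simplified stand-in. Instead of an SVCM, the sketch uses an ordinary linear regression with an intercept and one covariate. Instead of a Bayesian posterior, it computes a least-squares fit. An n-row data set (X, y) is replaced by (ΦX, Φy), where Φ is an m × n random matrix with m much smaller than n, and the fit uses only the m compressed rows. The choice of i.i.d. N(0, 1/n) entries for Φ is an assumption; it is one common choice. All function names and simulated values are hypothetical, and the code does not reproduce the dissertation's model or its contraction-rate analysis.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def gaussian_sketch(m: int, n: int, rng: random.Random) -> Matrix:
    """Hypothetical m x n compression matrix Phi with i.i.d. N(0, 1/n) entries."""
    sd = 1.0 / math.sqrt(n)
    return [[rng.gauss(0.0, sd) for _ in range(n)] for _ in range(m)]


def compress(phi: Matrix, x: Matrix, y: List[float]) -> Tuple[Matrix, List[float]]:
    """Return (Phi X, Phi y): m compressed rows replace the n original rows."""
    p = len(x[0])
    phi_x = [[sum(w * row[j] for w, row in zip(prow, x)) for j in range(p)] for prow in phi]
    phi_y = [sum(w * yi for w, yi in zip(prow, y)) for prow in phi]
    return phi_x, phi_y


def least_squares_2(x: Matrix, y: List[float]) -> Tuple[float, float]:
    """Solve the two-column normal equations (X^T X) b = X^T y by Cramer's rule."""
    s11 = sum(r[0] * r[0] for r in x)
    s12 = sum(r[0] * r[1] for r in x)
    s22 = sum(r[1] * r[1] for r in x)
    t1 = sum(r[0] * yi for r, yi in zip(x, y))
    t2 = sum(r[1] * yi for r, yi in zip(x, y))
    det = s11 * s22 - s12 * s12
    return ((s22 * t1 - s12 * t2) / det, (s11 * t2 - s12 * t1) / det)


def demo(n: int = 2000, m: int = 100, seed: int = 1) -> Tuple[float, float]:
    """Simulate y = 1 + 2x + noise, compress n rows to m rows, then fit."""
    rng = random.Random(seed)
    beta0, beta1, noise_sd = 1.0, 2.0, 0.1  # hypothetical true values
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    x = [[1.0, xi] for xi in xs]
    y = [beta0 + beta1 * xi + rng.gauss(0.0, noise_sd) for xi in xs]
    phi = gaussian_sketch(m, n, rng)
    phi_x, phi_y = compress(phi, x, y)
    # Estimation sees only the m compressed rows, never the n original rows.
    return least_squares_2(phi_x, phi_y)


if __name__ == "__main__":
    print("estimates from compressed data:", demo())
```

Multiplying Φ by a constant cancels in the least-squares solution, so the scaling of Φ does not change this fit; only its random directions and its m × n shape matter. In a Bayesian treatment, the compressed pair (ΦX, Φy) would instead be passed to a posterior sampler.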

Finally, we present a novel idea that employs data sketching for distributed Bayesian inference. The proposed model exploits parallel computation by performing Bayesian inference built on the aggregation of “sketched subset posteriors”. This approach addresses spatial variable selection in SVCMs with big data without requiring fundamentally new models or algorithms or any specialized computational hardware. The models are illustrated empirically through simulation experiments and a spatial analysis of remotely sensed vegetation data.
