Uncertainty arises at every stage of research, including experimentation, model formulation, input specification, parameter estimation, and prediction. Quantifying this uncertainty through statistical inference is therefore essential in many disciplines, including physics, chemistry, biology, geography, ecology, epidemiology, and power systems management. The Gaussian process (GP) model is well suited to predicting nonlinear relationships in these applications because it provides uncertainty assessment and is statistically efficient. However, its application to large datasets is limited by computational cost: computing the inverse and the determinant of the covariance matrix in the likelihood function requires O(n^3) operations, where n is the number of observations. To address this issue, we develop fast GP models by exploiting the connection between GPs with Matérn kernels (with half-integer smoothness parameters) on 1D inputs and the dynamic linear model. This connection allows the Kalman filter and the Rauch-Tung-Striebel smoother to reduce the computational complexity from O(n^3) to O(n) without approximation, and it enables fast algorithms for changepoint detection and for predicting spatio-temporal or functional data with multi-dimensional inputs. The thesis pursues two main objectives. First, motivated by the COVID-19 pandemic that began in 2020, we introduce two new models, one for patient-level and one for regional-level detection. The first model efficiently detects changepoints in patients' biomarker data to identify the COVID-19 infection date of each patient in dialysis facilities. The second model estimates the transmission dynamics of the COVID-19 pandemic in more than 3,000 U.S. counties and updates the analysis weekly. The second objective is to develop efficient computational methods for GP models of spatio-temporal and functional data with high-dimensional inputs.
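The complexity reduction described above can be illustrated in the simplest case. The sketch below assumes a zero-mean GP with the Matérn kernel of smoothness 1/2, sigma2 * exp(-|t - t'| / gamma), sorted 1D inputs, and i.i.d. Gaussian noise with variance tau2; the function names `matern12_kalman` and `dense_gp` and their parameters are hypothetical, not the thesis's code. In this case the GP is an Ornstein-Uhlenbeck process with a scalar state, so a forward Kalman filter gives the exact log-likelihood through the prediction-error decomposition, and a backward Rauch-Tung-Striebel pass gives the exact posterior means and variances, all in O(n). The dense version computes the same quantities with O(n^3) matrix factorizations for comparison.

```python
import numpy as np


def matern12_kalman(t, y, sigma2, gamma, tau2):
    """O(n) Kalman filter + RTS smoother for y_i = x(t_i) + eps_i.

    Assumes x is a zero-mean GP with kernel sigma2 * exp(-|t - t'| / gamma)
    (Matérn, smoothness 1/2), eps_i ~ N(0, tau2), and sorted inputs t.
    Returns the exact log-likelihood and the posterior mean/variance of x(t_i).
    """
    n = len(t)
    rho = np.exp(-np.diff(t) / gamma)  # rho[i] links x(t_i) to x(t_{i+1})
    m_pred, P_pred = np.zeros(n), np.zeros(n)
    m_filt, P_filt = np.zeros(n), np.zeros(n)
    loglik = 0.0
    for i in range(n):
        if i == 0:
            m_pred[i], P_pred[i] = 0.0, sigma2  # stationary prior of the state
        else:
            m_pred[i] = rho[i - 1] * m_filt[i - 1]
            P_pred[i] = rho[i - 1] ** 2 * P_filt[i - 1] + sigma2 * (1 - rho[i - 1] ** 2)
        S = P_pred[i] + tau2          # predictive variance of y_i
        e = y[i] - m_pred[i]          # one-step prediction error
        loglik += -0.5 * (np.log(2 * np.pi * S) + e ** 2 / S)
        gain = P_pred[i] / S
        m_filt[i] = m_pred[i] + gain * e
        P_filt[i] = (1 - gain) * P_pred[i]
    # Backward Rauch-Tung-Striebel pass.
    m_smooth, P_smooth = m_filt.copy(), P_filt.copy()
    for i in range(n - 2, -1, -1):
        G = P_filt[i] * rho[i] / P_pred[i + 1]
        m_smooth[i] = m_filt[i] + G * (m_smooth[i + 1] - m_pred[i + 1])
        P_smooth[i] = P_filt[i] + G ** 2 * (P_smooth[i + 1] - P_pred[i + 1])
    return loglik, m_smooth, P_smooth


def dense_gp(t, y, sigma2, gamma, tau2):
    """Same quantities via O(n^3) dense linear algebra, for comparison."""
    K = sigma2 * np.exp(-np.abs(t[:, None] - t[None, :]) / gamma)
    C = K + tau2 * np.eye(len(t))
    L = np.linalg.cholesky(C)
    alpha = np.linalg.solve(L, y)
    loglik = -0.5 * alpha @ alpha - np.log(np.diag(L)).sum() - 0.5 * len(t) * np.log(2 * np.pi)
    mean = K @ np.linalg.solve(C, y)
    var = np.diag(K - K @ np.linalg.solve(C, K))
    return loglik, mean, var


rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0, 10, 300))
y = np.sin(t) + 0.3 * rng.normal(size=300)
ll_kf, m_kf, v_kf = matern12_kalman(t, y, 1.0, 2.0, 0.09)
ll_gp, m_gp, v_gp = dense_gp(t, y, 1.0, 2.0, 0.09)
print(np.isclose(ll_kf, ll_gp), np.allclose(m_kf, m_gp), np.allclose(v_kf, v_gp))
```

Smoother Matérn kernels (for example, smoothness 3/2 or 5/2) follow the same pattern with a 2- or 3-dimensional state vector and matrix-valued transition and covariance updates.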
We develop fast algorithms for latent factor processes with an orthogonal factor loading matrix, enabling scalable computation on large, incomplete lattice datasets. We further study GP models for predicting computer simulations of power systems with high-dimensional inputs and outputs, and we outline several directions for future research.
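One reason an orthogonal factor loading matrix helps can be checked numerically. The sketch below assumes complete data on an n1 x n2 grid, Y = A Z + E with A having orthonormal columns, independent exponential-kernel GPs for the rows of Z, and i.i.d. N(0, tau2) noise; all names and values are hypothetical. It is not the GOLF estimator and does not handle the incomplete lattices mentioned above. It only shows the likelihood identity that orthogonality provides: after projecting onto A, the joint likelihood splits into d independent GP likelihoods of size n2 plus a pure-noise term for the residual.

```python
import numpy as np


def gauss_loglik(y, C):
    """log N(y; 0, C) via a Cholesky factor."""
    L = np.linalg.cholesky(C)
    alpha = np.linalg.solve(L, y)
    return -0.5 * alpha @ alpha - np.log(np.diag(L)).sum() - 0.5 * len(y) * np.log(2 * np.pi)


def exp_kernel(t, gamma):
    """Unit-variance exponential (Matérn-1/2) covariance matrix."""
    return np.exp(-np.abs(t[:, None] - t[None, :]) / gamma)


rng = np.random.default_rng(1)
n1, n2, d, tau2 = 6, 40, 2, 0.3                  # locations, time points, factors, noise variance
t = np.linspace(0.0, 1.0, n2)
Sigmas = [exp_kernel(t, g) for g in (0.1, 0.5)]  # one GP covariance per latent factor
A, _ = np.linalg.qr(rng.normal(size=(n1, d)))    # orthonormal columns: A.T @ A = I_d
Z = np.stack([rng.multivariate_normal(np.zeros(n2), S) for S in Sigmas])
Y = A @ Z + np.sqrt(tau2) * rng.normal(size=(n1, n2))

# (a) Brute force: vec(Y) (column-major) has covariance sum_l Sigma_l kron a_l a_l^T + tau2 I,
# which needs one Cholesky factorization of size n1 * n2.
C_full = sum(np.kron(S, np.outer(A[:, l], A[:, l])) for l, S in enumerate(Sigmas))
C_full = C_full + tau2 * np.eye(n1 * n2)
ll_full = gauss_loglik(Y.flatten(order="F"), C_full)

# (b) Decoupled: row l of A.T @ Y is factor l plus i.i.d. N(0, tau2) noise, and the
# residual outside the column space of A is pure noise. Needs d factorizations of size n2.
Y_proj = A.T @ Y
ll_factors = sum(gauss_loglik(Y_proj[l], S + tau2 * np.eye(n2)) for l, S in enumerate(Sigmas))
resid = Y - A @ Y_proj
ll_resid = -0.5 * np.sum(resid ** 2) / tau2 - 0.5 * (n1 - d) * n2 * np.log(2 * np.pi * tau2)
ll_decoupled = ll_factors + ll_resid

print(np.isclose(ll_full, ll_decoupled))
```

The identity holds because multiplying each column of Y by the orthogonal matrix [A, A_perp] leaves the noise i.i.d. and has unit Jacobian, so no cross-factor terms remain.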
Chapter 1 reviews the background of GP models and their connection to the dynamic linear model, also known as the linear state space model. It also reviews the Kalman filter and the Rauch-Tung-Striebel smoother for efficient computation with dynamic linear models. Chapters 2 and 3 apply the GP model to COVID-19 research. Chapter 2 introduces the sequential Kalman filter online changepoint detection (SKFCPD) algorithm for detecting changes in temporally correlated data modeled by a GP; the algorithm has linear computational complexity at each time step, without any approximation. One challenge is incorporating a large number of covariates, a large proportion of which are missing. We propose a two-step approach that integrates classification methods with the SKFCPD algorithm for fast and accurate detection of COVID-19 infection dates from patients' biomarker data. Chapter 3 introduces a new mathematical model that integrates regional case and death counts. By using the GP model to quantify the uncertainty of predictions, the model provides robust, real-time estimates of COVID-19 transmission dynamics for over 3,000 U.S. counties. Chapter 4 proposes the Gaussian orthogonal latent factor (GOLF) model for efficient computation on large correlated spatial, spatio-temporal, and functional data. Chapter 5 summarizes the preceding chapters, explores applications of the GP model to large-scale power systems, and discusses future research directions.
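Why Kalman-filter-based online changepoint detection can run with linear cost per time step can be sketched with a generic Bayesian online changepoint detector in the style of Adams and MacKay, with a scalar Kalman filter attached to every run-length hypothesis. This is an illustration under stated assumptions, not the SKFCPD algorithm: segments are assumed to be independent zero-mean Matérn-1/2 (Ornstein-Uhlenbeck) processes observed with i.i.d. noise, the hazard rate is assumed constant, and `online_changepoints` and its parameters are hypothetical. At step i the detector carries i + 1 hypotheses, each updated in constant time, so the work per step grows linearly and no covariance matrix is ever inverted.

```python
import numpy as np


def log_normal_pdf(y, mean, var):
    return -0.5 * (np.log(2 * np.pi * var) + (y - mean) ** 2 / var)


def online_changepoints(t, y, sigma2, gamma, tau2, hazard):
    """Hypothetical sketch: history[i][k] = P(current segment started at y[i-k] | y[0..i]).

    Assumes each segment is a zero-mean OU process with kernel
    sigma2 * exp(-|t - t'| / gamma), noise N(0, tau2), and a constant hazard.
    """
    log_R = np.zeros(0)               # log run-length posterior from the previous step
    m, P = np.zeros(0), np.zeros(0)   # filtered state mean/variance, one per hypothesis
    history = []
    for i in range(len(y)):
        if i > 0:  # OU predict step for every existing segment
            rho = np.exp(-(t[i] - t[i - 1]) / gamma)
            m, P = rho * m, rho ** 2 * P + sigma2 * (1 - rho ** 2)
        # Prepend a new segment starting at y[i], drawn from the stationary prior.
        m_pred = np.concatenate(([0.0], m))
        P_pred = np.concatenate(([sigma2], P))
        log_prior = np.concatenate(([np.log(hazard)], log_R + np.log1p(-hazard)))
        S = P_pred + tau2             # predictive variance of y[i] under each hypothesis
        log_joint = log_prior + log_normal_pdf(y[i], m_pred, S)
        log_R = log_joint - np.logaddexp.reduce(log_joint)
        gain = P_pred / S             # Kalman update of all i + 1 hypotheses: O(i) work
        m = m_pred + gain * (y[i] - m_pred)
        P = (1 - gain) * P_pred
        history.append(np.exp(log_R))
    return history


rng = np.random.default_rng(2)
t = np.arange(120.0)
y = np.where(t < 60, -1.5, 1.5) + 0.1 * rng.normal(size=120)
history = online_changepoints(t, y, sigma2=1.0, gamma=10.0, tau2=0.01, hazard=0.01)
start = len(y) - 1 - int(np.argmax(history[-1]))  # MAP start index of the final segment
print(start)
```

In the usage example the level jumps at index 60, so the posterior at the final step should concentrate on a segment starting there.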