Methods for Bayesian Inference and Data Assimilation of Soil Biogeochemical Models
Improving the mechanistic understanding and predictive capability of long-term soil organic matter dynamics is a high priority for biogeochemists, soil scientists, and climate policy researchers who aim to reduce uncertainty about the global trajectory of soil carbon sequestration and emissions. While popular "black box" machine learning and classical statistical approaches such as XGBoost, long short-term memory (LSTM) networks, and ARIMA have proven effective and efficient for time series forecasting, they are not designed to reveal the physical processes underlying a data-generating process. Instead, we can turn to soil biogeochemical models, also known as soil carbon models, both to predict soil dynamics and to falsify hypotheses about them. Soil biogeochemical models are formulated to simulate the microbially driven transfer of carbon and other elements among pools of soil organic matter. As dynamical systems, they provide a means of mathematically translating and formalizing hypotheses about soil system mechanics as parameterized differential equations.
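As a minimal illustration of how a hypothesis becomes a parameterized differential equation, the sketch below assumes a hypothetical two-pool linear model: carbon inputs enter a fast pool, a fraction of decomposed fast-pool carbon is transferred to a slow pool, and the remainder is respired as CO2. The class name, pool structure, rate constants, and forward Euler integrator are illustrative assumptions, not one of the models studied in this dissertation.

```python
from dataclasses import dataclass


@dataclass
class TwoPoolParams:
    """Hypothetical parameters for an illustrative two-pool soil carbon model."""
    inputs: float    # carbon input rate to the fast pool (mass per area per time)
    k_fast: float    # first-order decay rate of the fast pool (1/time)
    k_slow: float    # first-order decay rate of the slow pool (1/time)
    transfer: float  # fraction of decomposed fast-pool carbon moved to the slow pool


def derivatives(c_fast, c_slow, p):
    """Right-hand side of the ODE system.

    Returns (dC_fast/dt, dC_slow/dt, respiration flux)."""
    decomp_fast = p.k_fast * c_fast
    decomp_slow = p.k_slow * c_slow
    d_fast = p.inputs - decomp_fast
    d_slow = p.transfer * decomp_fast - decomp_slow
    respiration = (1.0 - p.transfer) * decomp_fast + decomp_slow
    return d_fast, d_slow, respiration


def simulate(c_fast0, c_slow0, p, dt, n_steps):
    """Integrate with forward Euler (chosen only for simplicity here)."""
    c_fast, c_slow = c_fast0, c_slow0
    trajectory = [(0.0, c_fast, c_slow)]
    for step in range(1, n_steps + 1):
        d_fast, d_slow, _ = derivatives(c_fast, c_slow, p)
        c_fast += dt * d_fast
        c_slow += dt * d_slow
        trajectory.append((step * dt, c_fast, c_slow))
    return trajectory


def steady_state(p):
    """Analytical equilibrium obtained by setting both derivatives to zero."""
    c_fast = p.inputs / p.k_fast
    c_slow = p.transfer * p.inputs / p.k_slow
    return c_fast, c_slow


if __name__ == "__main__":
    params = TwoPoolParams(inputs=1.0, k_fast=0.5, k_slow=0.02, transfer=0.3)
    final_time, c_fast, c_slow = simulate(0.0, 0.0, params, dt=0.1, n_steps=10_000)[-1]
    print(final_time, c_fast, c_slow, steady_state(params))
```

In this framing, `k_fast`, `k_slow`, and `transfer` are the unknowns that a Bayesian inference scheme would try to constrain from observed carbon stocks or respiration fluxes, with priors encoding biologically realistic ranges.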
If we assume that soil biogeochemical models that better describe empirical soil measurements more closely represent the actual data-generating processes, we can surmise that models that fit data better under biologically realistic parameter regimes are more useful for forecasting; conversely, a mismatch with data can indicate a need to reparameterize or restructure a model. However, identifying statistical frameworks that can rigorously assess how well models assimilate observations under limited computing time and resources remains an open problem. To that end, across the first two chapters of this interdisciplinary dissertation, we will first describe soil biogeochemical models in greater detail and motivate Bayesian statistical methods as a means of model fitting and parameter inference that incorporates expert beliefs and uncertainty. Next, we will demonstrate the use of a contemporary inference algorithm to assimilate the same data set into two models and compare their goodness of fit, quantified with Bayesian information criteria and cross-validation metrics. Finally, in the remaining chapters, we will test whether two novel Bayesian inference schemes for soil biogeochemical models, which offer improved computational efficiency, can recover the observations and parameter values of known synthetic data-generating processes, and we will present evidence that these algorithms merit further exploration.
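The abstract does not specify which criteria are used. As one common example of an information criterion computed from posterior samples, the sketch below computes the widely applicable information criterion (WAIC) from a matrix of pointwise log-likelihoods, where `log_lik[s][i]` is assumed to hold log p(y_i | theta_s) for posterior draw s and observation i. The function names and input layout are illustrative assumptions; libraries such as ArviZ provide tested implementations of WAIC and leave-one-out cross-validation.

```python
import math
import statistics


def log_mean_exp(values):
    """Numerically stable log(mean(exp(values)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def waic(log_lik):
    """WAIC on the deviance scale from a draws-by-observations log-likelihood matrix.

    Returns (waic, lppd, p_waic). Lower WAIC suggests better expected
    out-of-sample predictive accuracy when comparing models fit to the same data."""
    n_draws = len(log_lik)
    if n_draws < 2:
        raise ValueError("need at least two posterior draws")
    n_obs = len(log_lik[0])
    lppd = 0.0    # log pointwise predictive density
    p_waic = 0.0  # effective number of parameters (penalty term)
    for i in range(n_obs):
        column = [log_lik[s][i] for s in range(n_draws)]
        lppd += log_mean_exp(column)
        p_waic += statistics.variance(column)  # sample variance across draws
    return -2.0 * (lppd - p_waic), lppd, p_waic


if __name__ == "__main__":
    # Toy matrices standing in for two models' pointwise log-likelihoods.
    model_a = [[-1.0, -2.0, -1.5], [-1.1, -1.9, -1.4], [-0.9, -2.1, -1.6]]
    model_b = [[-1.5, -2.5, -2.0], [-1.8, -2.2, -2.4], [-1.2, -2.9, -1.7]]
    print("model A WAIC:", waic(model_a)[0])
    print("model B WAIC:", waic(model_b)[0])
```

Because both matrices must be evaluated on the same observations, this kind of comparison fits the setting described above, in which two models assimilate a single shared data set.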