A central task in statistical analyses of infectious disease surveillance data is nowcasting transmission dynamics, that is, understanding how transmissible a pathogen is in the present day. One way to summarize transmissibility is through the effective reproduction number: the average number of individuals an individual infected today would subsequently infect under current conditions. When the effective reproduction number is above one, an outbreak is expected to grow; when it is below one, the outbreak is expected to shrink. Estimating the effective reproduction number from observed data is non-trivial, as epidemics are only ever partially observed, and existing data streams are subject to ascertainment biases that must be taken into account. Ideally, epidemics would be modeled as partially observed stochastic processes, but in practice this is computationally prohibitive. In this dissertation, we develop statistical models for estimating the effective reproduction number from a variety of data sources using a series of computationally tractable approximate models of epidemics. In particular, we develop models for estimating the effective reproduction number from case and test data, from pathogen genome concentrations collected from wastewater in large populations, and from pathogen genome concentrations collected from wastewater in small populations. We compare our methods against state-of-the-art methods in simulation studies, and we apply our methods to estimate the effective reproduction number of SARS-CoV-2 in California from 2020 to 2022.
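The threshold behavior of the effective reproduction number can be illustrated with the renewal equation, a standard approximate epidemic model that links reproduction numbers to expected incidence. The sketch below is illustrative only; the generation-interval weights and all numbers are hypothetical, not taken from the dissertation's models.

```python
# Illustrative sketch (hypothetical numbers, not the dissertation's models):
# the renewal equation sets expected incidence to R times a weighted sum of
# recent incidence, I_t = R * sum_s w_s * I_{t-s}, where w is the
# generation-interval distribution.
def renewal_incidence(r, n_days, w=(0.25, 0.5, 0.25), seed_cases=100.0):
    """Simulate expected daily incidence under a constant reproduction
    number r and generation-interval weights w (must sum to 1)."""
    incidence = [seed_cases]
    for t in range(1, n_days):
        infectious_pressure = sum(
            w_s * incidence[t - s - 1]
            for s, w_s in enumerate(w)
            if t - s - 1 >= 0
        )
        incidence.append(r * infectious_pressure)
    return incidence

# Above one, incidence grows; below one, it decays toward zero.
growing = renewal_incidence(r=1.5, n_days=30)
shrinking = renewal_incidence(r=0.7, n_days=30)
```

With identical seeding, only the value of `r` relative to one determines whether the simulated outbreak expands or contracts.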
Statistical modeling of infectious disease data is among the oldest applications of statistics. Today, it is an increasingly relevant area of research, owing to globalization, which enables diseases to spread further and faster, and to the abundance of relevant data from electronic surveillance systems, seroprevalence studies, and genetic sequencing of pathogens. In this work, we develop novel statistical methods that combine varied data sources to improve both inference and forecasting. First, we use data from assay validation studies and active surveillance studies to develop confidence intervals for prevalence estimates from complex surveys with imperfect assays. In this complicated setting there are no established competing methods, and ours exhibits at least nominal coverage. In addition, we apply our method in simplified cases where competitors do exist and demonstrate its desirable properties. Next, we develop a semi-parametric Bayesian compartmental model that effectively integrates passively collected time series of diagnostic tests and mortality data, as well as actively collected seroprevalence data. We emphasize retrospective inference and evaluate the utility of each data stream in the context of short-term forecasting. Finally, we focus on healthcare demand forecasting during epidemic surges of pathogen variants capable of immune escape. We build upon our Bayesian compartmental model to incorporate time series of cases, hospitalizations, ICU admissions, deaths, and genetic sequence counts. We show that using genetic information leads to superior forecasting performance compared to traditional models. Throughout each project, we apply our methods to a variety of COVID-19 data sets at the county, state, and national levels.
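For intuition on why imperfect assays bias prevalence estimates, the classical Rogan-Gladen correction adjusts the apparent (test-positive) prevalence using the assay's sensitivity and specificity. This is the standard textbook point estimator, not the confidence interval methodology developed in the dissertation, and the numbers below are hypothetical.

```python
# Classical Rogan-Gladen correction (standard estimator, not the
# dissertation's method); all inputs below are hypothetical.
def rogan_gladen(apparent_prev, sensitivity, specificity):
    """Correct apparent prevalence for assay error:
    true_prev = (apparent + spec - 1) / (sens + spec - 1),
    truncated to [0, 1]."""
    corrected = (apparent_prev + specificity - 1.0) / (
        sensitivity + specificity - 1.0
    )
    return min(max(corrected, 0.0), 1.0)

# E.g., 8% of assays positive with 90% sensitivity and 95% specificity:
est = rogan_gladen(0.08, sensitivity=0.90, specificity=0.95)
```

Note that when the apparent prevalence falls below the false-positive rate (here, below 5%), the uncorrected estimator would go negative, which is why the truncation to [0, 1] matters and why interval estimation in this setting is delicate.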
Hematopoiesis is the complex mechanism by which hematopoietic stem cells produce a variety of functional blood cells through multiple stages of differentiation. Since the numbers of various blood cell types need to be maintained in homeostasis, with occasional short-lived departures from it, hematopoiesis must have multiple regulatory mechanisms; however, these are still not fully understood. Although many mathematical models of hematopoiesis regulation have been proposed, more work is needed on methods for fitting and interpreting experimental data that integrate statistical and mechanistic models. Here, using a new chemical reaction ordinary differential equation (ODE) model of negative feedback regulation in hematopoiesis, we develop a scalable, hierarchical Bayesian framework that uses a latent variables approach, takes heterogeneity across mice into account, and infers division, differentiation, and feedback regulation parameters of hematopoietic cells. We designed and performed an experiment in which mice were injected with the chemotherapy drug 5-FU, which reduces the number of stem and progenitor cells by blocking DNA synthesis and repair, to perturb the hematopoietic equilibrium. Counting the number of cells in the bone marrow (BM) requires sacrificing the mouse, so each mouse can contribute cell count data at one time point only. To work with these partially observed datasets, we use the ODE model to interpolate the noisy means of the experimental cell count data, inferring the missing data. We evaluate the performance of the new model and inferential framework using synthetic data and find that we are able to distinguish between models that account for biological variation and models that include only technical variation (measurement error). We find that the experimental data are best described by a hierarchical model in which the hematopoiesis model parameters are allowed to vary among mice, suggesting the presence of significant biological variability.
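To illustrate the kind of dynamics such negative-feedback ODE models produce, the sketch below integrates a toy two-compartment model in which the stem-cell self-renewal rate is suppressed by the mature-cell count. The model structure, rates, and initial conditions are hypothetical illustrations, not the dissertation's actual model.

```python
# Toy negative-feedback model (hypothetical rates and structure, not the
# dissertation's model): stem cells S self-renew at a rate damped by the
# mature-cell count M, differentiate into M at rate q, and mature cells
# die at rate d:
#     dS/dt = (p / (1 + k*M)) * S - q * S
#     dM/dt = q * S - d * M
# Steady state: M* = (p/q - 1)/k and S* = d*M*/q; here S* = M* = 100.
def simulate_feedback(s0=20.0, m0=100.0, dt=0.01, n_steps=20000,
                      p=1.0, q=0.5, k=0.01, d=0.5):
    """Integrate the toy feedback model with forward Euler and return the
    final (S, M) state."""
    s, m = s0, m0
    for _ in range(n_steps):
        ds = (p / (1.0 + k * m)) * s - q * s
        dm = q * s - d * m
        s, m = s + dt * ds, m + dt * dm
    return s, m

# Starting from a depleted stem-cell pool (loosely analogous to the state
# after 5-FU), the feedback returns the system to its equilibrium.
s_eq, m_eq = simulate_feedback()
```

In this toy model the Jacobian at equilibrium has complex eigenvalues with negative real part, so the return to equilibrium occurs through damped oscillations rather than a monotone approach.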
Our experimental data and the model show that, after perturbation, hematopoiesis returns to equilibrium via damped oscillations, with a notable overshoot of the depleted cell counts shortly after the system is perturbed from equilibrium. We then explore an alternative way of accounting for data heterogeneity by employing stochastic differential equations instead of letting division and feedback regulation parameters vary across mice. Computational tractability of the likelihood in a Bayesian inference framework is achieved with the linear noise approximation (LNA) derived from the chemical Langevin equation, which lets us approximate the joint posterior density of the hematopoietic rate parameters and the missing data. We evaluate the performance of the new Bayesian LNA framework and compare it to the Bayesian ODE frameworks we developed previously. We find that the new framework can further improve out-of-sample prediction, as indicated by leave-one-out cross-validation. We identify limitations of inference for our LNA model when multiple sources of biological and technical variation in the dataset are significant, and we develop a procedure for overcoming them. Finally, we investigate experimental designs that optimize the amount of information gained about the model parameters and missing data. We employ a new adversarial approach that uses a game theory framework for experimental design without requiring calculation of posterior probability distributions. This allows us to avoid the cost of traditional Bayesian optimal design methodology, which requires repeated approximations of posterior distributions that are expensive to generate and prohibitively costly for high-dimensional models.
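For intuition on the chemical Langevin equation that underlies the LNA, the sketch below simulates a toy immigration-death process with Euler-Maruyama and checks that the sample mean settles near the deterministic ODE mean, which is the trajectory the LNA linearizes fluctuations around. The process, rates, and discretization choices are hypothetical illustrations, not the hematopoiesis model.

```python
import random

# Euler-Maruyama simulation of the chemical Langevin equation for a toy
# immigration-death process (hypothetical rates, not the hematopoiesis
# model): dX = (b - d*X) dt + sqrt(b + d*X) dW.
# The LNA approximates these fluctuations as Gaussian noise around the
# deterministic mean, which relaxes to b/d.
def cle_immigration_death(b=50.0, d=0.5, x0=0.0, dt=0.01,
                          n_steps=2000, n_paths=200, seed=1):
    """Simulate n_paths trajectories and return the mean final state."""
    rng = random.Random(seed)
    finals = []
    for _ in range(n_paths):
        x = x0
        for _ in range(n_steps):
            drift = b - d * x
            noise = max(b + d * x, 0.0) ** 0.5  # diffusion coefficient
            x += drift * dt + noise * rng.gauss(0.0, dt ** 0.5)
        finals.append(x)
    return sum(finals) / len(finals)

# Averaging over paths, the final state sits near the ODE mean b/d = 100,
# with Gaussian-looking spread across paths -- the regime the LNA exploits.
mean_final = cle_immigration_death()
```

Because the diffusion term depends on the state, the exact likelihood of such a model is intractable; linearizing the noise around the deterministic path is what makes the LNA likelihood computable in a Bayesian framework.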