eScholarship
Open Access Publications from the University of California

UC Davis

UC Davis Electronic Theses and Dissertations

Bayesian Phylogenetic Inference for Viral Dispersal Process

Abstract

Phylogenies are increasingly used to study the spatial and temporal dynamics of infectious disease outbreaks. This phylodynamic approach encompasses a suite of methods for inferring various aspects of pathogen biology, including: (1) patterns of variation in demography through time; (2) the history of geographic spread, either over continuous space or among a set of discrete-geographic areas; and (3) the interaction between demography and geographic history.

This dissertation focuses on discrete-geographic phylodynamic methods, which have been used extensively to understand the spatial and temporal spread of infectious disease outbreaks, and have played a central role in inferring key aspects of the COVID-19 pandemic, such as the geographic location and time of origin of the disease, the rates and geographic routes by which it spread, and the efficacy of various mitigation measures to limit its geographic expansion. These phylodynamic methods adopt an explicitly probabilistic approach that models the process of pathogen dispersal among a set of discrete-geographic areas (e.g., cities, states, countries) over the branches of the pathogen phylogeny. The observations include the times and locations of pathogen sampling, and the genomic sequences of the sampled pathogens. These data are used to estimate the parameters of discrete-geographic phylodynamic models, which include a dated phylogeny of the pathogen samples, the average dispersal rate among all areas, and the relative dispersal rates (the dispersal rate between each pair of areas). Inference under these models is performed within a Bayesian statistical framework.
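To make the roles of the average and relative dispersal rates concrete, the following is a minimal illustrative sketch, not the dissertation's or BEAST's implementation. It assumes dispersal is modeled as a continuous-time Markov chain among areas, rescales the relative rates so that the mean rate of leaving an area (weighting all areas equally, an assumption made here for simplicity) equals the average dispersal rate, and computes the probability of ending a branch in each area given the starting area. All function names and example values are hypothetical.

```python
import math

def build_rate_matrix(relative_rates, average_rate):
    """Build a rate matrix Q from pairwise relative rates.

    Off-diagonal entries are rescaled so that the mean rate of leaving an
    area (all areas weighted equally; an assumption) equals average_rate.
    """
    k = len(relative_rates)
    total = sum(relative_rates[i][j] for i in range(k) for j in range(k) if i != j)
    scale = average_rate * k / total
    q = [[relative_rates[i][j] * scale if i != j else 0.0 for j in range(k)]
         for i in range(k)]
    for i in range(k):
        q[i][i] = -sum(q[i])  # each row of a rate matrix sums to zero
    return q

def mat_mul(a, b):
    """Multiply two square matrices stored as lists of lists."""
    k = len(a)
    return [[sum(a[i][m] * b[m][j] for m in range(k)) for j in range(k)]
            for i in range(k)]

def transition_probabilities(q, t, terms=20):
    """Approximate P(t) = exp(Q t) by a Taylor series with scaling and squaring."""
    k = len(q)
    norm = max(abs(q[i][i]) for i in range(k)) * t
    squarings = max(0, math.ceil(math.log2(norm)) + 1) if norm > 0 else 0
    factor = t / (2 ** squarings)
    scaled = [[q[i][j] * factor for j in range(k)] for i in range(k)]
    identity = [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    result = [row[:] for row in identity]
    term = [row[:] for row in identity]
    for n in range(1, terms + 1):
        # term holds (scaled^n) / n!
        term = [[x / n for x in row] for row in mat_mul(term, scaled)]
        result = [[result[i][j] + term[i][j] for j in range(k)] for i in range(k)]
    for _ in range(squarings):
        result = mat_mul(result, result)  # undo the scaling by repeated squaring
    return result

# Hypothetical three-area example: dispersal between areas 0 and 1 is
# relatively frequent; dispersal involving area 2 is relatively rare.
example_relative_rates = [[0.0, 4.0, 1.0],
                          [4.0, 0.0, 1.0],
                          [1.0, 1.0, 0.0]]
example_q = build_rate_matrix(example_relative_rates, average_rate=0.5)
example_p = transition_probabilities(example_q, t=1.0)
```

Row i of `example_p` gives the probability that a lineage starting a branch of duration 1.0 in area i ends it in each area. In an actual analysis these probabilities enter the likelihood of the observed sampling locations on the phylogeny.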

Although these phylodynamic models provide a powerful tool for understanding pathogen spread, they contain many parameters that must be inferred from minimal information (i.e., the single geographic area in which each pathogen occurs). As a result, inferences under these models are inherently sensitive to our prior assumptions about the model parameters. In Chapter 1, I (and co-authors) demonstrate that the priors on the average dispersal rate and the number of dispersal routes, implemented as defaults in BEAST (and assumed in the vast majority of empirical studies), make strong and biologically unrealistic assumptions about the underlying dispersal process. I present empirical evidence demonstrating that these priors are strongly disfavored by real data, and that they strongly (and adversely) distort central conclusions of epidemiological studies, including the importance of dispersal routes for the spread of pathogens and the ancestral area in which a given epidemic originated. I conclude this chapter by offering strategies and introducing an interactive web utility, PrioriTree, to help researchers avoid these issues.

Chapter 2 presents PrioriTree in detail. This utility is designed to help researchers more easily follow the strategies I explored and recommended in Chapter 1. Specifically, it provides a suite of functions that allow users to interactively set up BEAST discrete-geographic phylodynamic analyses with visualized priors, and to specify BEAST analyses and summarize their results for assessing prior sensitivity and model fit. Apart from generating BEAST analysis scripts and figures summarizing the analyses, PrioriTree also dynamically generates a description of the associated methods to facilitate transparent and explicit communication in empirical biogeographic studies regarding exactly which priors are used, how they are chosen, and how their impacts are assessed, ultimately enhancing the reproducibility of biogeographic studies.

Virtually all discrete-geographic phylodynamic studies are based on models that assume that pathogen dispersal dynamics—including the average and relative rates of pathogen dispersal—remain constant over time. However, the dispersal dynamics of emerging pathogens (e.g., SARS-CoV-2) may have been impacted by the initiation (or alteration or cessation) of intervention measures. Moreover, pathogen dispersal processes may inevitably vary over time due to temporal variation of human travel dynamics even without the impact of intervention measures.

In Chapter 3, I (and co-authors): (1) extend discrete-geographic phylodynamic models to allow both the average and relative dispersal rates to vary independently across pre-specified time intervals; (2) enable stochastic mapping under these interval-specific models to infer the number and timing of pathogen dispersal events between areas; and (3) develop posterior-predictive statistics to assess the absolute fit of discrete-geographic phylodynamic models to empirical datasets. I first validate the new methods using simulations, and then apply them to a SARS-CoV-2 dataset from the early phase of the COVID-19 pandemic. These analyses reveal that: (1) under simulation, failure to accommodate interval-specific variation in the study data will severely bias parameter estimates; (2) in practice, the interval-specific models can significantly improve the relative and absolute fit to empirical data; and (3) the increased realism of these interval-specific models provides qualitatively different inferences regarding key aspects of the COVID-19 pandemic (revealing significant temporal variation in global viral dispersal rates, viral dispersal routes, and the number of viral dispersal events between areas) and alters interpretations regarding the efficacy of intervention measures to mitigate the spread of SARS-CoV-2.
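The idea of interval-specific dispersal rates can be sketched as follows. This is a hedged illustration under simplifying assumptions, not the method as implemented in the dissertation. It assumes only two areas (so that transition probabilities have a closed form), forward-running time, and rates that are constant within each pre-specified interval. A branch that crosses an interval boundary is split at the boundary, and the transition-probability matrices of its pieces are multiplied. All names and values are hypothetical.

```python
import math

def split_branch(start, end, boundaries):
    """Split the branch [start, end] at the interval boundaries.

    Returns (interval_index, duration) pairs; interval i lies between
    boundary i-1 and boundary i (the first and last are unbounded).
    """
    pieces = []
    edges = sorted(boundaries)
    t = start
    for idx, b in enumerate(edges):
        if t >= end:
            break
        if b > t:
            stop = min(b, end)
            pieces.append((idx, stop - t))
            t = stop
    if t < end:
        pieces.append((len(edges), end - t))
    return pieces

def two_area_p(a, b, t):
    """Closed-form P(t) for two areas with rates a (0 to 1) and b (1 to 0)."""
    s = a + b
    if s == 0.0:
        return [[1.0, 0.0], [0.0, 1.0]]
    e = math.exp(-s * t)
    return [[(b + a * e) / s, (a - a * e) / s],
            [(b - b * e) / s, (a + b * e) / s]]

def mat_mul(x, y):
    """Multiply two 2x2 matrices stored as lists of lists."""
    return [[sum(x[i][m] * y[m][j] for m in range(2)) for j in range(2)]
            for i in range(2)]

def branch_p(start, end, boundaries, interval_rates):
    """Transition probabilities over a branch under interval-specific rates.

    interval_rates[i] is the pair (a, b) that applies within interval i.
    """
    p = [[1.0, 0.0], [0.0, 1.0]]
    for idx, duration in split_branch(start, end, boundaries):
        a, b = interval_rates[idx]
        p = mat_mul(p, two_area_p(a, b, duration))
    return p

# Hypothetical example: dispersal rates drop sharply after time 1.0,
# e.g. following the start of an intervention measure.
example_boundaries = [1.0]
example_rates = [(0.8, 0.4), (0.1, 0.05)]
example_p = branch_p(0.5, 1.5, example_boundaries, example_rates)
```

When every interval shares the same rates, `branch_p` reduces to the constant-rate result, which provides a convenient sanity check for the interval-splitting logic.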

Together, this dissertation serves as a careful and thorough exploration of various aspects of phylodynamic methods for inferring the pathogen dispersal process, and represents an advance in the conceptual and statistical framework of Bayesian phylogenetic inference.
