UC San Diego
Making sense of microbial populations from representative samples
- Author(s): Morton, James
- Advisor(s): Knight, Robin; et al.
Microbiomes make up the vast majority of life on Earth, and we are just beginning to understand how to study them using high-throughput omics. However, analysis of microbial populations is complicated by numerous statistical challenges. We first outline these challenges in the context of phylogenetically aware methods, then focus on two concepts: the horseshoe effect and compositionality.
The horseshoe effect is a phenomenon in which horseshoe-shaped patterns appear in low-dimensional representations of high-dimensional data. For decades, this pattern confounded ecologists studying populations across multiple environmental conditions. Here, we show that the horseshoe effect arises from distance saturation, where samples that share few or no species all appear maximally distant from one another, and that it can be indicative of microbial population displacement. We illustrate this phenomenon in a soil study and a decomposition study.
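To make distance saturation concrete, here is a minimal sketch (not the thesis's code) that assumes a hypothetical one-dimensional environmental gradient in which each species has a Gaussian response centred on its own optimum, with abundances below a detection cutoff set to zero. It computes Bray-Curtis dissimilarities and a classical principal coordinates analysis (PCoA) directly in NumPy. The helper names `simulate_gradient`, `bray_curtis`, and `pcoa`, and all parameter values, are illustrative assumptions. Samples at opposite ends of the gradient share no species, so their dissimilarity is pinned at exactly 1 no matter how far apart they are. This saturation is what bends the second principal coordinate into an arch.

```python
import numpy as np

def simulate_gradient(n_samples=50, n_species=30, width=0.05, detection=1e-3):
    """Hypothetical samples along a 1-D gradient; each species has a Gaussian
    (unimodal) response centred on its own optimum."""
    x = np.linspace(0.0, 1.0, n_samples)           # sample positions on the gradient
    optima = np.linspace(0.0, 1.0, n_species)      # species optima
    abund = np.exp(-(x[:, None] - optima[None, :]) ** 2 / (2 * width ** 2))
    abund[abund < detection] = 0.0                 # below detection -> absent
    return x, abund

def bray_curtis(counts):
    """Pairwise Bray-Curtis dissimilarity between the rows of `counts`."""
    n = counts.shape[0]
    totals = counts.sum(axis=1)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            shared = np.minimum(counts[i], counts[j]).sum()
            D[i, j] = D[j, i] = 1.0 - 2.0 * shared / (totals[i] + totals[j])
    return D

def pcoa(D, k=2):
    """Classical multidimensional scaling (principal coordinates analysis)."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n            # centring matrix
    B = -0.5 * J @ (D ** 2) @ J                    # double-centred squared distances
    vals, vecs = np.linalg.eigh(B)                 # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:k]             # keep the k largest
    return vecs[:, order] * np.sqrt(np.maximum(vals[order], 0.0))

x, abund = simulate_gradient()
D = bray_curtis(abund)
coords = pcoa(D)
pc1, pc2 = coords[:, 0], coords[:, 1]

# Fraction of sample pairs whose dissimilarity has saturated at its maximum.
print("saturated pairs:", (D == 1.0).mean())
```

Plotting `pc1` against `pc2` (for example with matplotlib) should show the ends of the gradient curling toward each other. Both ends share the same sign on the second axis, opposite to the middle of the gradient, even though the samples themselves lie on a straight line.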
In the second part of the thesis, we discuss the identifiability problem that arises from representative sampling, also known as compositionality. Statistical theory shows that unbiased estimators of population proportions can be obtained from representative samples. However, representative samples alone cannot determine which species have grown or declined in abundance, because infinitely many changes in absolute abundance can explain the same change in proportions. In the biological sciences, this is known as the differential abundance problem, and solving it is critical for determining which microbes have been altered across experimental conditions. Here, we show that in order to estimate which species have been altered, the total population size needs to be estimated.
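The ambiguity can be shown with two taxa. The following is a minimal sketch with hypothetical numbers. Taxa A and B start at 100 cells each (50/50). After an intervention, we observe proportions of 0.8 and 0.2. Every choice of the unobserved total population size is consistent with those proportions. With a total of 50, both taxa declined. With 200, A grew and B declined. With 1000, both grew. The helper `implied_change` is an illustrative name, not an existing API.

```python
import numpy as np

before = np.array([100.0, 100.0])   # hypothetical absolute abundances of taxa A and B
after_props = np.array([0.8, 0.2])  # proportions observed after an intervention

def implied_change(props, total, baseline):
    """Direction of change (+1 grew, -1 declined, 0 unchanged) for each taxon,
    assuming the unobserved total population size equals `total`."""
    after = props * total
    return np.sign(after - baseline)

# Three hypothetical totals, all consistent with the same observed proportions.
scenarios = {total: implied_change(after_props, total, before)
             for total in (50.0, 200.0, 1000.0)}
for total, change in scenarios.items():
    print(f"total={total:>6}: change={change}")
```

Sequencing data only pin down `after_props`, so the direction of change for each taxon is undetermined until the total is known.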
We present two workarounds to this problem that ultimately remove the need to estimate total population size. The first solution uses ratios, analogous to concentrations in chemistry. We showcase the usefulness of this technique in a soil study and a cystic fibrosis study. The second solution uses ranks as a proxy for feature importance. Rather than attempting to compute absolute change, we compute relative change, ultimately ranking which microbes have increased or decreased the most across experimental conditions. We show how these ranks can be computed using multinomial regression and how they can facilitate reproducible findings in the context of oral microbial communities and atopic dermatitis.
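Both workarounds rely on the unknown total cancelling out. The following is a minimal, idealized sketch with hypothetical absolute abundances for four taxa under two conditions. It does not reproduce the thesis's implementation, which fits a regression to noisy counts.

* **Ratios.** The log ratio of two taxa is identical whether it is computed from proportions or from the unobserved absolute abundances.
* **Ranks.** The sketch assumes a single binary covariate and noise-free data. In that case the unregularized multinomial-regression fit reproduces each group's proportions, so the covariate coefficients reduce to `log(p_trt) - log(p_ctrl)`. These coefficients are identifiable only up to an additive constant, which is fixed here by centring. Their signs therefore depend on that arbitrary choice, but their ordering matches the true fold changes.

```python
import numpy as np

# Hypothetical absolute abundances of four taxa under two conditions.
abs_ctrl = np.array([400.0, 300.0, 200.0, 100.0])
abs_trt = np.array([800.0, 150.0, 220.0, 60.0])

def proportions(counts):
    """Close the data: only relative abundances survive sequencing."""
    return counts / counts.sum()

p_ctrl, p_trt = proportions(abs_ctrl), proportions(abs_trt)

def log_ratio(v, i, j):
    """Log ratio of taxon i to taxon j; the total population size cancels."""
    return np.log(v[i] / v[j])

# Workaround 1 (ratios): same value from proportions and from absolute counts.
print(log_ratio(p_ctrl, 0, 1), log_ratio(abs_ctrl, 0, 1))

# Workaround 2 (ranks): in this saturated, noise-free setting the multinomial
# MLE equals the group proportions, so the covariate coefficients are the
# difference of log proportions, defined only up to an additive constant.
beta = np.log(p_trt) - np.log(p_ctrl)
beta -= beta.mean()                             # fix the constant by centring

true_lfc = np.log(abs_trt) - np.log(abs_ctrl)   # unobservable in practice
ranking = np.argsort(-beta)                     # taxa ordered by relative change
print(ranking, np.argsort(-true_lfc))
```

In this example, taxon 2 grows in absolute terms (200 to 220), but its proportion falls (0.2 to about 0.18), so naively comparing proportions points the wrong way. The ranking derived from `beta` still agrees with the ranking of the true log fold changes.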