Phylogenetic Factor Analysis and Natural Extensions
- Author(s): Tolkoff, Max Ryan
- Advisor(s): Suchard, Marc A
- et al.
In evolutionary biology we are frequently interested in how different quantitative traits of an organism evolve together over time. To understand these relationships properly, we need to adjust for the shared evolutionary history of the organisms. Previous methods model the quantitative traits as undergoing a high-dimensional, correlated multivariate Brownian diffusion (MBD) down a phylogenetic tree. To offer a more nuanced approach to understanding these trait relationships, we develop a phylogenetic factor analysis (PFA) model that assumes that relatively low-dimensional factors, rather than the traits themselves, undergo independent Brownian diffusion down the phylogenetic tree. Additionally, we develop a novel method for estimating the marginal likelihood of probit models, which allows for accurate model selection in the presence of discrete data. We demonstrate using Bayes factors that the PFA model is more probable than the MBD model. We then extend the PFA method by placing a shrinkage prior on the loadings matrix. This shrinkage prior consists of a normal prior with global and local standard deviation components, and half-Cauchy priors on these standard deviation components. With it, we can distinguish trait relationships that would otherwise remain hidden under a standard normal prior on the loadings. Lastly, when we wish to incorporate a large number of taxa in our MBD and PFA models, obtaining a complete suite of measurements is difficult. The resulting missing measurements make these analyses relatively inefficient and difficult to use for larger problems. To rectify this, we develop a method for evaluating the likelihood of an MBD model by analytically integrating out the missing values, and then apply similar principles to integrate out the factors in a PFA model. These innovations allow for massive speedup in our inference.
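
The generative structure summarized above can be illustrated with a minimal simulation sketch. It is not the thesis implementation, and every specific in it is an assumption made for illustration: the four-taxon tree and its branch lengths (`TREE_COV`), a root value fixed at zero, a single global scale shared by all loadings, independent normal residuals with standard deviation `residual_sd`, and all function names. Each factor diffuses independently down the tree, the loadings receive a normal prior whose standard deviation is the product of a global and a local half-Cauchy component, and the traits are the factors multiplied by the loadings plus noise.

```python
import math
import random

# Hypothetical 4-taxon tree ((A,B),(C,D)) with unit root-to-tip height.
# Under Brownian diffusion, the covariance between two tips equals the
# branch length they share from the root.
TREE_COV = [
    [1.0, 0.6, 0.0, 0.0],
    [0.6, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.3],
    [0.0, 0.0, 0.3, 1.0],
]


def cholesky(a):
    """Return lower-triangular L with L L^T = a (a small, positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def half_cauchy(rng):
    """Draw from a standard half-Cauchy via its inverse CDF, tan(pi * u / 2)."""
    return math.tan(0.5 * math.pi * rng.random())


def sample_loadings(rng, n_factors, n_traits):
    """Normal loadings with sd = (global scale) * (local scale).

    Assumption: one global scale for the whole matrix, one local scale per entry.
    """
    global_sd = half_cauchy(rng)
    return [
        [rng.gauss(0.0, global_sd * half_cauchy(rng)) for _ in range(n_traits)]
        for _ in range(n_factors)
    ]


def simulate_pfa(rng, n_factors, n_traits, residual_sd=0.1):
    """Simulate tip traits as factors @ loadings + independent normal noise."""
    n_taxa = len(TREE_COV)
    low = cholesky(TREE_COV)
    # Each factor is an independent Brownian diffusion on the tree, so each
    # column of `factors` is a normal draw with covariance TREE_COV.
    factors = [[0.0] * n_factors for _ in range(n_taxa)]
    for k in range(n_factors):
        z = [rng.gauss(0.0, 1.0) for _ in range(n_taxa)]
        for i in range(n_taxa):
            factors[i][k] = sum(low[i][m] * z[m] for m in range(n_taxa))
    loadings = sample_loadings(rng, n_factors, n_traits)
    traits = [
        [
            sum(factors[i][k] * loadings[k][j] for k in range(n_factors))
            + rng.gauss(0.0, residual_sd)
            for j in range(n_traits)
        ]
        for i in range(n_taxa)
    ]
    return traits, factors, loadings
```

Calling `simulate_pfa(random.Random(0), n_factors=2, n_traits=5)` returns a 4 x 5 trait matrix together with the 4 x 2 factors and 2 x 5 loadings that generated it. Because the half-Cauchy scales are heavy-tailed, most simulated loadings sit near zero while a few are large, which is the behavior a shrinkage prior of this form is meant to encourage.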