Biologically-interpretable machine learning for microbial genomics
Advances in high-throughput biotechnology have enabled unprecedented characterization of microbial diversity. Researchers now have the opportunity to understand evolution as a function of the genomic, transcriptomic, metabolic, and physiological variables underlying differential fitness. A comprehensive understanding of microbial evolution will help eradicate infectious disease, engineer robust synthetic circuits, and tackle environmental issues facing our planet. While the information revolution in biology has enabled researchers to measure many biomolecules simultaneously and at low cost, a major bottleneck remains in translating these datasets into actionable knowledge. In this dissertation, we address the challenge of biological data analysis by developing computational methods that combine the predictive power of machine learning (ML) with the biological interpretability of mechanistic genome-scale models. First, classical ML is applied to thousands of drug-tested Mycobacterium tuberculosis genome sequences, recovering 33 known genetic determinants of antimicrobial resistance (AMR) and identifying 24 novel candidates. Second, a biochemically interpretable ML model is developed and applied to the same genomic dataset to reveal metabolic mechanisms of AMR. Third, independent component analysis is applied to a multi-omics dataset from E. coli laboratory evolution to reveal multi-scale adaptation principles governing causal mutations. In conclusion, this dissertation broadens our understanding of microbial evolution through the development and application of interpretable ML models.