The use of whole genome sequencing in infectious disease diagnostics has generated information of unprecedented volume and resolution. Large-scale pathogen sequencing requires scalable methods for species identification, outbreak clustering, virulence phenotyping, antimicrobial resistance profiling, and epidemic modeling.
This dissertation presents a new approach to defining species membership using a pangenome framework. Applied to the whole genome sequences of the genus Hungatella, it identified a misclassified reference strain. Genomic epidynamics is a phylogeny-free approach to epidemiological inference, in particular to estimating the reproductive number (R), a key disease transmission parameter. This approach offers a scalable way to elucidate heterogeneous transmission among genomic variants of SARS-CoV-2, and thereby bridges pathogen population genomics and epidemic modeling. A genome-first approach to defining antimicrobial resistance combines automated machine learning, which ranks resistance genes, with phenotypic data through genomic minimum inhibitory concentrations (MICs). This approach was applied to multidrug-resistant Salmonella enterica subsp. enterica serovar Dublin (S. Dublin). Finally, a machine learning-based genome-wide association study revealed allelic variants of porA in Campylobacter jejuni linked to an abortive phenotype, in which the organism invades from the gut and resides in the reproductive system.