The microbiome is a community of microorganisms living in our bodies and throughout the environment. The genomic data that researchers extract from microbiomes, known as metagenomic data, can be used to predict traits of a host or environment. By identifying microbiome biomarkers associated with disease or health, researchers can develop better therapeutics for microbiome-associated diseases. However, metagenomic data is commonly affected by technical variables unrelated to the phenotype of interest, such as sequencing protocol, which can make it difficult to predict phenotype and find biomarkers of disease. Here, we evaluate methods that remove the background noise introduced by these technical variables, thereby improving our ability to find accurate biomarkers of human disease. Understanding host health also requires elucidating the sources of the host's microbiome, because doing so reveals how microbial communities form and how they respond to changing environments. In this work, we introduce a method that uses metagenomic variants obtained from hundreds of species in microbiome data to perform source tracking, that is, to estimate the colonization sources of a sample of interest. These analyses shed light on phenomena such as the colonization of the early infant gut microbiome and spatial patterns in ocean microbiomes around the world. Lastly, we analyze metagenomic data to understand how genetic diversity changes along the human gut at the species, strain, and gene levels. In sum, this work leverages the genomic information contained in our microbiomes to find universal patterns across microbiomes, allowing us to better understand the relationship between microbiome and phenotype, the colonization sources of microbiomes, and colonization dynamics at the species and strain levels.