Bones and Biobanks: An Anthropological Approach to Investigating the Genetic Architecture of Complex Traits
- Cataldo-Ramirez, Chelsea
- Advisor(s): Weaver, Timothy D
Abstract
Recent advances in computational power, in conjunction with biobank development, have resulted in a proliferation of genotype-phenotype research. The predominant context of such research has been biomedical, with the aim of better understanding the genetic underpinnings of disease. Genome-wide association studies (GWAS) have become a standard method for identifying potentially causative genetic variants underlying health outcomes, and they provide the basis for developing precision healthcare models. However, more precisely clarifying the nuances of genetic and environmental influences on human phenotypes could be similarly impactful for biological anthropology and could improve our ability to test evolutionary hypotheses about human biological variation. Anthropologically relevant phenotypes, particularly those preserved in the fossil record, are understudied within the domain of genetic analyses. This is in part due to the lack of paired genotype-phenotype datasets large enough to overcome the power limitations imposed by the size of the human genome. Additionally, collecting skeletal measurements from medical imaging databases remains a tedious task, limiting the research utility of biobank-level data. In Chapter 1, I present an automated phenotyping pipeline for obtaining skeletal measurements from DXA scans and compare its performance to manually collected measurements. Results indicate that a subset of measurements can be reliably extracted from DXA scans, greatly expanding the utility of biobank-level data for biological anthropologists and medical researchers. Overrepresentation of Europeans within biobank repositories compounds issues for genotype-phenotype analyses, as these are typically the only data available for GWAS and polygenic score (PGS) development. This leads to reduced PGS applicability in non-Europeans, as it has been demonstrated that PGS performance declines when applied to groups outside of the GWAS sample.
To address this limitation of GWAS-PGS research, in Chapter 2 I present a partial solution: increasing the utility of extant non-European participant data within the UK Biobank (UKB) by characterizing the genetic affinities of UKB participants who self-identify as Bangladeshi, Indian, Pakistani, “White and Asian” (WA), and “Any Other Asian” (AOA), with the goal of creating a more robust South Asian sample for future genetic analyses. Through a detailed investigation of the data quality, we are able to increase the sample size of the UKB South Asian group by 1,381 participants and hypothesize that, because of our efforts to reduce the effects of the ascertainment bias built into the UKB SNP array, this sample will be better matched to other individuals of South Asian ancestry outside of the UKB data. Lastly, in Chapter 3, I assess the impact of a quality-centered approach to GWAS-PGS research on refining GWAS results for height (a canonical complex trait) and improving PGS performance for groups underrepresented in biobank repositories. Using UKB data, I assess the effects of including environmental covariates in height GWAS modeling, develop an environmentally adjusted PGS model, and compare its performance with that of a better powered, but traditionally modeled, population-matched PGS equation. Comparisons of GWAS results demonstrate increased detection of tag-SNPs when environmental covariates are included in the GWAS model, as well as differences in the distribution of SNP-effect sizes and changes in variant-specific effect sizes. This suggests that adjusting for environmental influences reduces some of the noise that may be limiting the ability to detect significant hits, and that the significance of some variants may be falsely inflated when environmental confounders are not adjusted for, leading to misidentification of potentially causal genetic loci.
Additionally, results demonstrate that PGS models derived from environmentally adjusted GWAS summary statistics yield predictive ability comparable to that of PGS models developed using substantially larger training data. In addressing these hindrances to genotype-phenotype research, I aim to exemplify how an anthropological approach to investigating the genetic architecture of human height and related skeletal endophenotypes can generate insights beneficial to both biomedical and anthropological pursuits.