Identification of rare-variant effect in complex human traits using whole-genome and whole-exome sequencing data
- ZHAN, LINGYU
- Advisor(s): Sul, Jae Hoon JHS
Abstract
With recent advances in sequencing technologies, genetic information can be obtained from large populations at relatively low cost. This provides an unprecedented opportunity to understand how genetic variability relates to complex human traits. One common strategy is to conduct genome-wide association studies (GWAS) to identify loci significantly associated with phenotypes of interest. However, the findings are usually limited to common variants with small effect sizes. Collectively, these loci cannot fully explain the observed heritability, a problem commonly referred to as “the missing heritability.” To address this problem, human genetic research has shifted its focus toward other types of genetic variation, including rare variants, a shift enabled by next-generation sequencing. Rare variants are believed to have large effect sizes and, therefore, to be major contributors to complex traits.

Here, we describe our efforts to analyze the effects of rare variants in two complex human traits, Alzheimer’s Disease (AD) and Tourette Syndrome (TS), followed by a genome-wide association study of human blood lipids. First, using large whole-genome sequencing datasets and a gene-set burden analysis framework, we demonstrated that rare variants within the endocytic pathway were strongly associated with AD, neurofibrillary tangles, and age-related phenotypes. Subsequent gene-based analyses identified one AD-associated gene, ANKRD13D, and two e-Genes, HLA-A and SLC26A7. Leveraging bulk and scRNA-Seq data, we observed significant differential expression patterns in all three implicated genes. Second, we explored a specific type of rare variant, de novo mutations, in TS patients using a whole-exome sequencing trio dataset, and identified a recurrent mutation in FBN2, a gene previously implicated in TS.
Compared with the expected mutation rate, protein-truncating variants were enriched in probands. In addition, gene-set analysis revealed differential expression patterns across tissue types and brain developmental stages. Lastly, we performed a multi-population meta-analysis of blood lipid levels using electronic health records and genotype data from the UCLA ATLAS database. We observed genetic effects both specific to and shared across five populations. Compared with previous large-scale GWAS, our results showed consistent effect estimates while identifying one novel locus, rs72552763.
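The abstract does not state which statistical tests were used. Purely as an illustration, the sketch below assumes two common choices: a one-sided Poisson test comparing an observed count of de novo protein-truncating variants with the count expected under a mutation-rate model, and an inverse-variance-weighted fixed-effect meta-analysis for combining per-population effect estimates. The function names `poisson_upper_tail` and `fixed_effect_meta`, and every number in the usage lines, are hypothetical and are not taken from this work.

```python
import math
from typing import Sequence, Tuple


def poisson_upper_tail(observed: int, expected: float) -> float:
    """One-sided enrichment p-value P(X >= observed) for X ~ Poisson(expected).

    Assumption: de novo counts are modeled as Poisson with a mean given by a
    mutation-rate model. This is an illustrative choice, not the author's method.
    """
    if observed <= 0:
        return 1.0
    # Accumulate P(X = i) for i < observed, then take the complement.
    term = math.exp(-expected)  # P(X = 0)
    cdf = term
    for i in range(1, observed):
        term *= expected / i  # P(X = i) derived from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def fixed_effect_meta(
    betas: Sequence[float], ses: Sequence[float]
) -> Tuple[float, float, float]:
    """Inverse-variance-weighted fixed-effect meta-analysis of one variant.

    Returns (pooled beta, pooled standard error, two-sided p-value).
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("betas and ses must be non-empty and equal in length")
    weights = [1.0 / (se * se) for se in ses]  # w_i = 1 / se_i^2
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    # Two-sided normal p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, p


# Hypothetical usage with made-up numbers.
enrichment_p = poisson_upper_tail(observed=12, expected=5.0)
pooled_beta, pooled_se, pooled_p = fixed_effect_meta(
    betas=[0.12, 0.08, 0.15, 0.10, 0.05],
    ses=[0.03, 0.05, 0.06, 0.04, 0.07],
)
```

A fixed-effect model assumes one true effect shared by all populations; population-specific effects, such as those the abstract reports, are usually examined with a heterogeneity statistic (for example, Cochran's Q) or a random-effects model.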