With more efficient genotyping technologies and lower sequencing costs, genome-wide association studies (GWAS) have been broadly applied to many complex human traits. However, people of European descent remain the predominant subjects of genetic research, and other ethnic groups might not fully benefit from GWAS efforts. In addition to expanding GWAS to include more diverse populations, new approaches that enable trans-ethnic or multi-ethnic analyses in GWAS will be a crucial stepping stone for future genetic studies. To address this disparity and knowledge gap, we developed and applied a new approach, cross-population allele screen (CPAS) prior to GWAS, to identify population-specific variants that are associated with complex traits or diseases (Chapter 2). In our study, we identified novel genetic variants that are associated with serum triglycerides (TGs), high-density lipoprotein cholesterol (HDL-C), and body mass index (BMI) and that exhibit differential allele frequencies between Finns and Mexicans. Notably, one of the novel TG-associated genes, SIK family kinase 3 (SIK3), harbors an Amerindian-specific common risk variant (allele frequency = 18% in Mexicans) that is not observed in other continental populations, and the risk allele carriers also exhibit higher serum TG levels after a high-fat meal. In addition, this locus displays a signal of positive selection in Mexicans, suggesting that delayed serum lipid clearance might have been evolutionarily advantageous for ancient Amerindian people.
While GWAS have uncovered many trait-associated loci, translating GWAS results into actionable medical information remains nontrivial because the true causal variants and genes are difficult to pinpoint. To understand the molecular mechanisms of GWAS variants, many functional genomic approaches have been developed. In this dissertation, I will present two computational methods that integrate genetic and transcriptomic data to infer functional variants and their possible underlying genes. First, we developed Functional Summary-based Imputation (FUSION), which leverages GWAS summary statistics and a relatively small reference panel of transcriptomes to infer the association between gene expression and traits (Chapter 3). Using FUSION together with subcutaneous adipose and whole blood RNA-sequencing (RNA-seq) data, we performed transcriptome-wide association studies (TWAS) and identified 69 novel genes associated with BMI, serum lipids, and height. As GWAS summary statistics and transcriptomic data continue to accumulate, FUSION can be used to extend TWAS to many different traits and tissues.
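The summary-based association step can be illustrated with a short sketch. Assuming a vector of cis-SNP expression weights w learned from the reference panel, GWAS z-scores z for the same SNPs, and their linkage disequilibrium (LD) correlation matrix, the TWAS statistic takes the form w'z / sqrt(w' LD w). The function name and the toy numbers below are hypothetical and serve only as an illustration; this is not the code of the FUSION software.

```python
import math


def twas_z(weights, gwas_z, ld):
    """Summary-based TWAS z-score: w'z / sqrt(w' LD w).

    weights : expression weights per SNP (hypothetical input, learned elsewhere)
    gwas_z  : GWAS z-scores for the same SNPs, in the same order
    ld      : SNP-by-SNP LD correlation matrix as a list of lists
    """
    n = len(weights)
    if len(gwas_z) != n or len(ld) != n or any(len(row) != n for row in ld):
        raise ValueError("weights, gwas_z, and ld must have matching dimensions")

    # Numerator: weighted sum of GWAS z-scores.
    numerator = sum(w * z for w, z in zip(weights, gwas_z))

    # Denominator: variance of the weighted sum under the null, given LD.
    variance = sum(
        weights[i] * ld[i][j] * weights[j] for i in range(n) for j in range(n)
    )
    if variance <= 0:
        raise ValueError("w' LD w must be positive")

    return numerator / math.sqrt(variance)


# Toy example with three SNPs (made-up values).
weights = [0.5, -0.2, 0.1]
gwas_z = [3.0, -1.0, 0.5]
ld = [
    [1.0, 0.3, 0.0],
    [0.3, 1.0, 0.2],
    [0.0, 0.2, 1.0],
]
print(round(twas_z(weights, gwas_z, ld), 3))
```

Because only z-scores, weights, and LD are required, a calculation of this kind needs no individual-level GWAS genotypes, which is what makes the summary-based design practical.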
To take advantage of the increasing number of large-scale RNA-seq cohorts, we created a new computational tool, ASElux, which efficiently performs allele-specific expression (ASE) estimation, an analysis that was previously impractical at this scale because of excessive computing time (Chapter 4). ASElux uses a hybrid index system: it first builds an individualized reference genome from the available genotype data and then aligns only the variant-carrying reads that are informative for ASE calculation. Thus, ASElux corrects for reference allele bias during alignment while requiring much less computing time. In our comparison tests, ASElux was 4-33 times faster than other commonly used software or pipelines for ASE and achieved similar or better accuracy. We applied ASElux to 273 lung RNA-seq samples and uncovered a splice variant, rs11078928, which could explain the molecular mechanism of an asthma GWAS hit, rs11078927. We envision that the speed and efficiency of ASElux will facilitate ASE analysis in many RNA-seq datasets to uncover functional variants in the future.
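Once allele-specific read counts are available at a heterozygous site, allelic imbalance is commonly assessed with a binomial test against an expected reference-allele fraction of 0.5. The following is a minimal sketch of that downstream test only. It does not reproduce the indexing or alignment performed by ASElux, and the function name and read counts are hypothetical.

```python
from math import comb


def ase_binomial_p(ref_reads, alt_reads, expected_ref_fraction=0.5):
    """Two-sided exact binomial p-value for allelic imbalance at one site.

    Suitable for modest read depths; very deep sites would need a
    log-space or library implementation to avoid float overflow.
    """
    n = ref_reads + alt_reads
    if n == 0:
        return 1.0  # no informative reads, no evidence of imbalance

    p = expected_ref_fraction
    # Probability of every possible reference-read count under the null.
    pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

    # Two-sided p-value: sum the outcomes that are no more likely than
    # the observed one (small tolerance guards against rounding ties).
    observed = pmf[ref_reads]
    p_value = sum(x for x in pmf if x <= observed * (1 + 1e-7))
    return min(1.0, p_value)


# Hypothetical heterozygous site: 30 reference reads, 10 alternative reads.
print(ase_binomial_p(30, 10))
```

Restricting the test to reads that overlap a heterozygous variant mirrors the idea that only variant-carrying reads are informative for ASE.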
In Chapters 5 and 6, I will present our studies that use epigenomic and transcriptomic data to gain insight into the causal mechanisms of obesity and non-alcoholic fatty liver disease (NAFLD). To elucidate the molecular mechanisms underlying obesity-related GWAS variants, we integrated promoter-enhancer interactions in human primary adipocytes with adipose cis expression quantitative trait locus (eQTL) variants (Chapter 5). Using promoter capture Hi-C, we first assayed chromosomal interactions in human primary adipocytes. In combination with human subcutaneous adipose transcriptomes, we then identified four genes that are associated with BMI or obesity-related traits and are also under cis regulation via chromosomal looping. We further performed electrophoretic mobility shift assays (EMSAs) to validate the allelic effect of a cis eQTL, rs4776984, that regulates mitogen-activated protein kinase kinase 5 (MAP2K5). The reference allele displayed a lower protein binding affinity than the alternative allele, in line with the computationally predicted disruptive effect. Finally, we reported 38 additional BMI candidate genes under the regulation of chromosomal interactions for future studies of obesity.
In our NAFLD study (Chapter 6), we tested the hypothesis that obesity may impair the function of adipose tissue, which can lead to ectopic fat accumulation in the liver and thus to NAFLD. To understand the molecular pathogenesis of NAFLD driven by obesity, we examined the liver histology and subcutaneous adipose transcriptomes of 259 morbidly obese Finnish individuals who underwent bariatric surgery. One year after the surgery, we re-profiled their adipose transcriptomes to assess the effect of weight loss on adipose gene expression. At baseline, we identified 43 genes whose adipose expression was downregulated in non-alcoholic steatohepatitis (NASH) patients. Of these, the adipose expression of 17 genes was negatively correlated with liver steatosis and serum TGs. In a large panel of mouse strains, the expression of five of the 17 genes was also correlated with diet-induced liver steatosis. Specifically, the adipose expression of one of the five genes, death-associated protein kinase 2 (DAPK2), recovered after weight loss at the one-year follow-up. Combining phenotype and longitudinal transcriptome data, we performed mediation analyses to demonstrate the causal effect of DAPK2 adipose expression on NAFLD. When DAPK2 expression was knocked down in human primary preadipocytes, five key genes involved in autophagy, two of which also function in adipocyte differentiation, were also downregulated. Our findings suggest that an obesity-induced reduction of DAPK2 expression is a new pathogenic mechanism of NAFLD, acting through impairment of the autophagy pathway and adipocyte differentiation. In summary, our work presented in Chapters 5 and 6, which employs functional genomic approaches and computational methods to decipher the disease mechanisms of obesity and NAFLD, highlights strategies to understand the molecular pathogenesis of human disease beyond GWAS.