UC San Diego
Human genetic-epidemiologic association analysis via allelic composition and DNA sequence similarity methods : applications to blood-based gene expression biomarkers of disease
- Author(s): Wessel, Jennifer
- et al.
The Human Genome Project and related DNA sequence variation projects have provided researchers with both the motivation and the raw material for large-scale genetic association studies that seek to identify genetic variations contributing to disease susceptibility. Association studies are plagued by many problems, including inappropriate data analysis methodologies and the potential for false positive results due to the testing of hundreds of thousands of polymorphic loci for association with a disease. Most association analysis methodologies ignore biological realities mediating gene-phenotype relationships, such as the possibility that genes and genetic variations work in concert or in combination to influence a disease and/or phenotypic expression. I describe a statistical analysis methodology for association studies that considers the genetic variation within a gene (chapter 2), across the entire genome (chapter 3), or across a series of genes in pathways (chapter 4) as "wholes" rather than as individual isolated entities to be assessed independently of each other. I showcase the methodology by applying it to publicly available genotype and gene expression data from the HapMap Project on 57 CEPH individuals. I provide biological motivation for this type of analysis approach and consider measures that assess the "genomic similarity" of individuals with respect to the variations they possess across a number of loci. I describe a weighted distance-based regression method that exploits this similarity measure in association analyses. In chapter 2, I develop and apply the method to an analysis of the CHI3L2 gene and document the utility and flexibility of the method. In chapter 3, I apply the method developed in chapter 2 to a whole genome analysis of 811,886 phased genetic variations typed on the CEPH subjects.
In chapter 4, I extend the method to the analysis of biochemical pathways involved in diseases, functions, and drug targets that are affected by multiple SNPs. I ultimately argue that my work has the potential not only to open up a new area of research in genetic epidemiology and statistical genetic methodology, but also to shed light on the genetic basis of complex, multifactorial diseases and phenotypes.
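
To make the idea of a "genomic similarity" measure and a distance-based regression concrete, the following is a minimal Python sketch. It is not the implementation from the thesis: the abstract does not specify the similarity measure or the test statistic, so the sketch assumes genotypes coded as 0/1/2 allele counts, a weighted allele-sharing similarity, and a pseudo-F statistic on the Gower-centered distance matrix assessed by permutation. The function names (`allele_sharing_similarity`, `pseudo_f`, `permutation_p_value`) and the locus weights are hypothetical.

```python
import numpy as np


def allele_sharing_similarity(genotypes, weights=None):
    """Weighted allele-sharing similarity between individuals.

    genotypes: (n_individuals, n_loci) array of 0/1/2 allele counts (assumed coding).
    weights:   optional per-locus weights (hypothetical; equal weights by default).
    """
    g = np.asarray(genotypes, dtype=float)
    n_loci = g.shape[1]
    w = np.ones(n_loci) if weights is None else np.asarray(weights, dtype=float)
    # Per-locus sharing is 1 when genotypes match and 0 for opposite homozygotes.
    sharing = 1.0 - np.abs(g[:, None, :] - g[None, :, :]) / 2.0
    return (sharing * w).sum(axis=2) / w.sum()


def pseudo_f(distance, predictors):
    """Ratio of explained to unexplained variation in a distance matrix.

    The degrees-of-freedom scaling is omitted because it does not change
    a permutation test.
    """
    d = np.asarray(distance, dtype=float)
    n = d.shape[0]
    x = np.column_stack([np.ones(n), np.asarray(predictors, dtype=float).reshape(n, -1)])
    centering = np.eye(n) - np.ones((n, n)) / n
    gower = -0.5 * centering @ (d ** 2) @ centering  # Gower-centered matrix
    hat = x @ np.linalg.pinv(x)                      # projection onto the predictors
    resid = np.eye(n) - hat
    return np.trace(hat @ gower @ hat) / np.trace(resid @ gower @ resid)


def permutation_p_value(distance, predictors, n_perm=999, seed=0):
    """Permutation p-value obtained by shuffling predictor rows across individuals."""
    rng = np.random.default_rng(seed)
    x = np.asarray(predictors, dtype=float)
    observed = pseudo_f(distance, x)
    hits = sum(
        pseudo_f(distance, x[rng.permutation(len(x))]) >= observed
        for _ in range(n_perm)
    )
    return (hits + 1) / (n_perm + 1)
```

In use, one would convert the similarity matrix to a distance (for example `1 - allele_sharing_similarity(genotypes)`) and pass it, together with a phenotype such as an expression level, to `permutation_p_value`. Because all loci in a gene, pathway, or genome enter through a single distance matrix, the test treats them as a whole rather than one locus at a time.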