eScholarship
Open Access Publications from the University of California


UCLA Electronic Theses and Dissertations

Methods for detecting structure in large-scale genomic data

Abstract

Large-scale repositories of genomic data give researchers the opportunity to answer biological questions at unprecedented resolution. Uncovering the structure underlying these datasets is a fundamental task: the structure can correspond to biological signals of interest or to confounders, such as ancestry and batch effects, that must be accounted for to prevent spurious findings. Discovering structure is already a challenging problem, and the growing size of genomic datasets creates computational bottlenecks that further complicate the analysis. Here, we propose three scalable approaches for detecting structure in genomic data. We present ProPCA, a probabilistic principal component analysis method for large-scale genomic data. We also introduce SCOPE, a method for inferring admixture proportions from biobank-scale data. Both methods exploit randomized eigendecomposition and the unique structure of the genotype matrix to perform scalable population structure inference. In simulations, both methods remain accurate while running faster than existing methods. We applied both methods to the UK Biobank, a dataset of half a million individuals, to uncover fine-scale structure within the United Kingdom. We then introduce a statistical testing framework for detecting differences in variance and covariance, which extends eigengene analysis through a set of transformations and randomized eigendecomposition. Applied to RNA-seq data from individuals with psychiatric disease, the framework reveals several (co)variance differences, highlighting the need to look beyond mean effects. With the increasing availability of large biological datasets, our work enables researchers to efficiently discover and test for structure and to perform downstream analyses.
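Randomized eigendecomposition is what keeps PCA tractable when the genotype matrix has hundreds of thousands of rows. The sketch below illustrates the general idea with a generic randomized range finder (in the style of Halko, Martinsson and Tropp) applied to a standardized genotype matrix. It does not reproduce ProPCA or SCOPE: the function names `standardize_genotypes` and `randomized_top_pcs`, the oversampling and power-iteration defaults, and the simulated two-population data are all hypothetical choices made for illustration.

```python
import numpy as np


def standardize_genotypes(G):
    """Center and scale an (individuals x SNPs) matrix of 0/1/2 allele counts.

    Each SNP column is centered by 2p and scaled by sqrt(2p(1-p)), where p is the
    sample allele frequency; monomorphic SNPs are dropped to avoid dividing by zero.
    """
    p = G.mean(axis=0) / 2.0
    keep = (p > 0) & (p < 1)
    G, p = G[:, keep], p[keep]
    return (G - 2 * p) / np.sqrt(2 * p * (1 - p))


def randomized_top_pcs(X, k, oversample=10, n_iter=4, seed=0):
    """Approximate the top-k singular triplets of X with a randomized range finder.

    Hypothetical helper: the defaults are illustrative, not tuned values.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    l = min(k + oversample, n, m)
    # Sketch the column space of X by multiplying it with a random Gaussian matrix.
    Q, _ = np.linalg.qr(X @ rng.standard_normal((m, l)))
    # Power iterations widen the gap between leading and trailing singular values;
    # re-orthonormalizing at every step keeps the iteration numerically stable.
    for _ in range(n_iter):
        Q, _ = np.linalg.qr(X.T @ Q)
        Q, _ = np.linalg.qr(X @ Q)
    # Exact SVD of the small (l x m) projection B = Q^T X.
    B = Q.T @ X
    U_b, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_b  # lift the left singular vectors back to individual space
    return U[:, :k], s[:k], Vt[:k]


# Usage on simulated data: two populations whose allele frequencies differ slightly.
rng = np.random.default_rng(1)
n_per, m = 100, 2000
p1 = rng.uniform(0.1, 0.9, m)
p2 = np.clip(p1 + rng.normal(0, 0.15, m), 0.05, 0.95)
G = np.vstack([
    rng.binomial(2, p1, (n_per, m)),
    rng.binomial(2, p2, (n_per, m)),
]).astype(float)

X = standardize_genotypes(G)
U, s, Vt = randomized_top_pcs(X, k=2)
print("top singular values:", s)
```

Only products with `X` and `X.T` touch the full matrix, which is why this family of methods scales. Because `X = U S V^T`, the columns of `U` are also the eigenvectors of the genetic relationship matrix `X X^T / m` (with `m` the number of retained SNPs), with eigenvalues `s**2 / m`; in this toy example the first column of `U` separates the two simulated populations.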
