eScholarship
Open Access Publications from the University of California

UCLA

UCLA Electronic Theses and Dissertations

Computational Comparative and Epigenomic Approaches to Improve Genome Interpretation

Abstract

Systematic analysis of genomic sequences or maps of biochemical activity can reveal biologically relevant information that might otherwise be overlooked. Such information can be elusive in a large collection of genomic data from varied sources. We therefore propose and apply computational methods that detect complex relationships among data from different genomic loci, within or across genomes, and generate annotations that highlight notable patterns.

First, we focus on locating genomic regions with conserved properties by scoring the cross-species similarity between two regions from different species based on their functional genomic datasets. To do so, we develop a method, Learning Evidence of Conservation from Integrated Functional genomic annotations (LECIF). Applying LECIF to thousands of human and mouse datasets, we learn a score that highlights human and mouse loci with shared properties, which we expect to be useful in mouse model research.

Building on this work, we also develop a method that scores the association between two regions within the same genome based on epigenomic and transcription factor (TF) binding data. We apply this approach to thousands of human datasets and learn a score that highlights regions with similar or associated properties within the human genome, which we expect to be useful in studying multiple loci together.

Lastly, motivated by the COVID-19 pandemic, we apply an existing comparative genomics approach to coronavirus sequences and annotate the SARS-CoV-2 genome. Specifically, we apply ConsHMM, a hidden Markov model method that learns conservation states that capture recurring patterns in an alignment of sequences, to alignments of coronavirus sequences. We then analyze the learned state annotations using external annotations of genes, protein domains, SARS-CoV-2 mutations, and other regions of interest and demonstrate that the states reflect biologically relevant information for interpreting the SARS-CoV-2 genome.

Overall, our work aims to learn meaningful patterns in large genomic datasets from diverse sources and to provide annotations for interpreting important DNA elements and their relationships. All methods we present are flexible and scalable, making them applicable to newer and larger datasets as they become available. We expect our methods and genomic annotations to be useful resources for studying various genomes.
