eScholarship
Open Access Publications from the University of California

UC San Diego

UC San Diego Electronic Theses and Dissertations

Bioinformatic characterization of genomic and transcriptomic diversity in the human brain

No data is associated with this publication.
Abstract

The cells of the human brain can be characterized through several layers of information: epigenetic, genomic, transcriptomic, proteomic, and others. Recent efforts have invested heavily in mapping the brain cell by cell using these layers. A challenge of these multi-modal approaches is parsing the gigabyte- to terabyte-scale data they generate. My thesis work has focused on investigating the diversity of the brain’s genome (DNA) and transcriptome (RNA) and on developing the bioinformatic tools that make this possible. The work falls into two general categories, addressing the genome and the transcriptome.

On the genomic side, I focused on identifying novel features known as gencDNAs (genomic cDNAs). gencDNAs are hypothesized to arise when a highly expressed gene is transcribed and the transcript is spliced, reverse-transcribed, and inserted back into the genome at the site of a DNA strand break. These novel sequences are predicted to be functional, resulting in additional translation of a protein. APP, the amyloid precursor protein gene, was the first gene identified as a source of gencDNAs, which were found to be more prevalent in neurons from the brains of Alzheimer’s disease (AD) patients. I developed an unbiased approach to identify additional gencDNAs in the genome from short-read sequencing data.

The transcriptome can be studied at various resolutions. Across several projects, I examined gene expression at the single-cell level and characterized full-length isoforms using long-read sequencing technologies. Recent advances in sequencing have made it possible to sequence mRNA transcripts along their entire length. Because this technology is relatively new, bioinformatic tools for handling this type of data are still needed. While several packages and tools exist for quality control, alignment, redundancy reduction, and annotation, none is available for comparing isoforms (known and novel) across multiple samples and groups. I developed a database-driven tool for this purpose that is compatible with current analysis pipelines. I demonstrated its applications on a dataset from the 1000 Genomes Project and on a large single-cell dataset investigating gene and isoform expression changes in several neurodegenerative diseases.

Main Content

This item is under embargo until January 12, 2025.