Streaming Methods for Assembly Graph Analysis
eScholarship
Open Access Publications from the University of California

UC Davis

UC Davis Electronic Theses and Dissertations

Abstract

The advent of high-throughput sequencing has radically altered the theory and practice of biology. Massive volumes of sequence data have necessitated commensurate advances in computational approaches to biological analysis; methods for the assembly of shotgun sequencing data into complete genomes, transcriptomes, and metagenomes have been particularly foundational in enabling downstream study. More recently, sketching methods have allowed classification and comparison to scale to hundreds of thousands of samples. This dissertation extends that work by exploring streaming implementations of several core approaches. First, we introduce a method for single-pass construction of the compact de Bruijn graph, with support for novel methods of dynamic assembly graph analysis. Next, we explore the saturation behavior of streaming sequence sketches and introduce a novel sketch we call draff, which uses Universal k-mer Hitting Sets to represent the shape of assembly graphs. Finally, we introduce several pieces of research software for sequence comparison, transcript annotation, and phylogenetic tree building, which showcase accessible, open-access design philosophies. We also introduce the goetia library and toolset for efficient de Bruijn graph algorithms in C++ and Python.