Streaming Methods for Assembly Graph Analysis
- Welcher, Camille Scott
- Advisor(s): Brown, Charles T
Abstract
The advent of high-throughput sequencing has radically altered the theory and practice of biology. Massive volumes of sequence data have necessitated commensurate advances in computational approaches to biological analysis; methods for the assembly of shotgun sequencing data into complete genomes, transcriptomes, and metagenomes have been particularly foundational in enabling downstream study. More recently, sketching methods have allowed classification and comparison to scale to hundreds of thousands of samples. This dissertation extends that work by exploring streaming implementations of several core approaches. First, we introduce a method for single-pass construction of the compact de Bruijn graph, with support for novel methods of dynamic assembly graph analysis. Next, we explore the saturation behavior of streaming sequence sketches, and introduce a novel sketch we call draff, which uses Universal k-mer Hitting Sets to represent the shape of assembly graphs. Finally, we introduce several pieces of research software for sequence comparison, transcript annotation, and phylogenetic tree building, which showcase accessible, open-access design philosophies. We also introduce the goetia library and toolset for efficient de Bruijn graph algorithms in C++ and Python.