eScholarship
Open Access Publications from the University of California

UC San Diego

UC San Diego Electronic Theses and Dissertations

Algorithms for long-read assembly

Abstract

The recently introduced long-read sequencing technologies (such as Pacific Biosciences or Oxford Nanopore) have substantially improved genome assemblies of many organisms, including the human reference genome. These technologies, however, suffer from high read error rates. In this dissertation, we describe multiple algorithms for the assembly and analysis of long-read sequencing data. First, we introduce the ABruijn algorithm for long-read assembly, which bypasses the expensive read error-correction step by identifying reliable k-mers in reads. We then describe the Flye package, which combines ABruijn with a new repeat graph approach that accurately resolves the genomic structure. Finally, we extend Flye to the assembly of complex metagenomic communities using long reads.
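
As a rough illustration of what "identifying reliable k-mers in reads" could mean, the sketch below treats a k-mer as reliable when it occurs at least a given number of times across the reads. This frequency criterion, the function names `count_kmers` and `reliable_kmers`, and the `min_count` parameter are assumptions made for illustration; the abstract does not state the exact rule used by ABruijn.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every length-k substring across all reads."""
    counts = Counter()
    for read in reads:
        # Reads shorter than k contribute nothing (the range is empty).
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def reliable_kmers(reads, k, min_count):
    """Return the k-mers seen at least min_count times.

    Assumption: frequency is used here as a stand-in for reliability.
    """
    counts = count_kmers(reads, k)
    return {kmer for kmer, c in counts.items() if c >= min_count}


# Toy reads sharing the prefix "ACG"; the later bases differ,
# mimicking sequencing errors.
reads = ["ACGTAC", "ACGTTC", "ACGAAC"]
solid = reliable_kmers(reads, k=3, min_count=3)
print(solid)
```

In this toy input only the k-mer shared by all three reads passes the threshold, while k-mers that appear in a single read are discarded as likely errors.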
