eScholarship
Open Access Publications from the University of California

UC Berkeley

UC Berkeley Electronic Theses and Dissertations

Computational Methods for Higher Accuracy Nanopore Sequencing

Abstract

Nanopore sequencing is a versatile technology that can generate long, single-molecule reads on a portable device, which gives it strong advantages in areas such as metagenomics and de novo genome assembly. These advantages have traditionally been tempered by an error rate higher than that of other sequencing technologies, though advances in both sequencing chemistry and basecalling models have steadily increased sequencing accuracy. This work presents multiple computational techniques for further reducing this error rate, primarily by combining information from multiple noisy reads into a single, higher-accuracy consensus sequence. We present algorithms that use the probabilistic output of existing basecallers to find the consensus of two (Chapter 2) or more (Chapter 3) reads. We also introduce a neural network polisher for multi-read consensus at higher read depths (Chapter 4). Finally, we explore the application of policy gradients, a technique developed for reinforcement learning, to training nanopore basecallers (Chapter 5). These methods are implemented in a variety of software tools, all of which are freely available on GitHub (https://github.com/jordisr/). Our software PoreOver is designed to work with the output of multiple basecallers and implements two main consensus algorithms. We also present PoreOverNet, a simple standalone basecaller. In addition to the maximum likelihood loss used to train most basecallers, PoreOverNet implements a multi-objective loss designed to reduce the expected number of errors through a policy gradient approach. Finally, Semapore is a neural network tool for consensus and polishing of genome assemblies.
