eScholarship
Open Access Publications from the University of California

Genome reassembly with high-throughput sequencing data

  • Author(s): Parrish, Nathaniel Dion
  • Advisor(s): Eskin, Eleazar
  • et al.
Abstract

Recent studies in genomics have highlighted the significance of structural variation in determining individual variation. Current methods, however, are predominantly focused on either assembling whole genomes from scratch, or identifying the relatively small changes between a genome and a reference sequence. While significant progress has been made in recent years on both de novo assembly and resequencing (read mapping) methods, few attempts have been made to bridge the gap between them.

In this paper, we present a computational method for incorporating a reference sequence into an assembly algorithm. We propose a novel graph construction that builds upon the well-known de Bruijn graph to incorporate the reference, and describe a simple algorithm, based on iterative message passing, which uses this information to significantly improve assembly results. We validate our method by applying it to a series of 5 Mb simulation genomes derived from both mammalian and bacterial references. The results of applying our method to this simulation data are presented along with a discussion of the benefits and drawbacks of this technique.
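
To make the idea of a reference-aware de Bruijn graph concrete, here is a minimal sketch in Python. It is not the construction proposed in this work, whose details the abstract does not give. It only shows a standard de Bruijn graph built from reads, with each edge additionally flagged when the same k-mer occurs in the reference. The function names (`kmers`, `build_graph`), the edge fields (`reads`, `in_reference`), and the toy sequences are all assumptions made for illustration, and the message-passing step is not shown.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every length-k substring of seq, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_graph(reads, reference, k):
    """Build a de Bruijn graph whose edges record read and reference support.

    Nodes are (k-1)-mers. Each k-mer contributes one edge from its
    (k-1)-prefix to its (k-1)-suffix. The edge fields are hypothetical
    names chosen for this sketch.
    """
    edges = defaultdict(lambda: {"reads": 0, "in_reference": False})
    # Count how many times each k-mer is observed in the reads.
    for read in reads:
        for kmer in kmers(read, k):
            edges[(kmer[:-1], kmer[1:])]["reads"] += 1
    # Flag edges whose k-mer also appears in the reference; reference-only
    # k-mers become edges with zero read support.
    for kmer in kmers(reference, k):
        edges[(kmer[:-1], kmer[1:])]["in_reference"] = True
    return dict(edges)


# Toy data (made up): the second read diverges from the reference after "GTAC".
reference = "ACGTACGA"
reads = ["ACGTAC", "GTACTA"]
graph = build_graph(reads, reference, k=4)
```

In this toy graph, edges supported by reads but absent from the reference (for example `("TAC", "ACT")`) mark where the sequenced genome departs from the reference, while reference-only edges (for example `("TAC", "ACG")`) mark the reference path that the reads do not follow.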
