eScholarship
Open Access Publications from the University of California

Genome Assembly and Comparison

  • Author(s): Pham, Kim Son
  • et al.
Abstract

The recent proliferation of next-generation sequencing with short reads has enabled many new experimental opportunities but, at the same time, has raised formidable computational challenges in genome assembly. One of the key advances that has led to an improvement in contig lengths has been read-pairs, which facilitate the assembly of repeating regions. The shortcomings of current read-pair algorithms stem from the fact that they are heuristic approaches applied after the de Bruijn graphs have been constructed. First, we introduce the paired de Bruijn graph, a generalization of the de Bruijn graph that incorporates mate-pair information into the graph structure itself instead of analyzing mate pairs in a post-processing step. While paired de Bruijn graphs provide an elegant solution to read-pair analysis in theory, they are impractical for real sequencing data. Next, we introduce rectangle graphs and pathset graphs, which address additional challenges encountered in real data. In the final chapter of the thesis, we introduce an A-Bruijn graph algorithm for finding synteny blocks in highly duplicated genomes.
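As an illustration of the paired de Bruijn graph idea mentioned in the abstract, the following is a minimal sketch, not the thesis implementation. It assumes an idealized setting: a single error-free linear sequence, and paired k-mers whose start positions are separated by an exactly known distance d. This is the kind of idealization that the abstract notes does not hold in real data. The function name `paired_de_bruijn_graph` and its parameters are hypothetical, chosen for this example only.

```python
from collections import defaultdict


def paired_de_bruijn_graph(genome, k, d):
    """Build a paired de Bruijn graph as an adjacency list.

    Assumptions (illustrative only): `genome` is a linear, error-free
    string, and every paired k-mer (a|b) has starts exactly `d` apart.
    Each paired k-mer becomes an edge from (prefix(a), prefix(b)) to
    (suffix(a), suffix(b)); vertices with identical labels are merged
    automatically because they share the same dictionary key.
    """
    graph = defaultdict(list)
    # The second k-mer must fit in the string: i + d + k <= len(genome).
    for i in range(len(genome) - k - d + 1):
        a = genome[i:i + k]          # first k-mer of the pair
        b = genome[i + d:i + d + k]  # second k-mer, d positions downstream
        source = (a[:-1], b[:-1])    # pair of (k-1)-mer prefixes
        target = (a[1:], b[1:])      # pair of (k-1)-mer suffixes
        graph[source].append(target)
    return dict(graph)


if __name__ == "__main__":
    for source, targets in paired_de_bruijn_graph("ACGTAC", k=2, d=3).items():
        print(source, "->", targets)
```

Because each vertex carries a pair of labels, two occurrences of the same (k-1)-mer are merged only if their downstream partners also match, which is how pairing information can separate repeat copies inside the graph itself.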
