De novo genome assembly is one of the most critical problems in computational biology. Due to the limitations of current sequencing technologies, de novo assembly is typically carried out in two stages, namely contig (sequence) assembly and scaffolding. Scaffolding can vastly improve assembly contiguity and can produce chromosome-level assemblies. Despite significant algorithmic progress, the scaffolding problem remains challenging due to the highly repetitive content of eukaryotic genomes, possible mis-joins in the assembled contigs, and inaccuracies in the linkage information.
Different types of linkage information, such as paired-end, mate-pair, linked, or Hi-C reads, or genome-wide maps (optical, physical or genetic), are used to carry out the scaffolding process. Optical maps (in particular Bionano Genomics maps) have been extensively used in many recent large-scale genome assembly projects (e.g., goat, apple, barley, maize, quinoa, and sea bass).
In this dissertation, we address some of the computational issues associated with genome scaffolding when optical maps are used. We propose novel algorithms for scaffolding, chimeric detection, and assembly reconciliation. First, we introduce a novel chimeric removal tool called Chimericognizer. Chimericognizer takes advantage of one or more Bionano Genomics optical maps to accurately detect and correct chimeric contigs. Experimental results show that Chimericognizer is very accurate, and significantly better than the chimeric detection method offered by the Bionano Hybrid Scaffold pipeline. Chimericognizer can also detect and correct chimeric optical molecules.
Second, we describe a novel method called Novo&Stitch that can take advantage of optical maps to accurately carry out assembly reconciliation. Experimental results demonstrate that Novo&Stitch can double the contiguity (N50) of the input assemblies without introducing mis-joins or reducing genome completeness.
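For readers unfamiliar with the contiguity metric mentioned above, N50 is the length L such that sequences of length at least L account for at least half of the total assembly length. The following is a minimal sketch of that standard definition only; the function name `n50` and the toy length lists are hypothetical illustrations, not data or code from Novo&Stitch.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("need at least one length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical toy lengths (arbitrary units), not experimental data.
before = [40, 30, 20, 10]
# Same total length after joining the 40 and 20 sequences into one.
after = [60, 30, 10]

print(n50(before), n50(after))
```

In this toy case the total length is unchanged, and joining two sequences is what raises the N50.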
Third, we introduce a scaffolding algorithm called OMGS that, for the first time, can take advantage of multiple optical maps. OMGS solves several optimization problems to generate scaffolds with optimal contiguity and correctness. Extensive experimental results demonstrate that our tool outperforms existing methods when multiple optical maps are available, and produces comparable scaffolds when a single optical map is used.