Combinatorial Algorithms for Haplotype Assembly
- Author(s): Mazrouee, Sepideh
- Advisor(s): Wang, Wei, et al.
Many phenotypes, such as genetic disorders, may be hereditary, while others may be influenced by the environment. However, some genetic disorders are due to new mutations in an individual's deoxyribonucleic acid (DNA). Diseases such as diabetes and specific types of cancer are examples of conditions that can be inherited or affected by lifestyle and genetic mutations. In order to investigate and predict the incidence of such diseases, the sequences of single individuals need to be examined. In the past decade, next-generation sequencing (NGS) technology has enabled us to generate DNA sequences of many organisms. Yet reconstructing each copy of a chromosome remains an open research problem because of the computational challenges associated with processing massive amounts of DNA data and understanding the complex structure of such data for individual DNA phasing.
In this dissertation, I introduce several computational frameworks for understanding the complex structure of DNA sequence data to reconstruct chromosome copies in diploid and polyploid organisms. The methodologies presented in this dissertation span several areas of research, including unsupervised learning, combinatorial optimization, graph partitioning, and association rule learning. The overarching theme of this research is the design and validation of novel combinatorial algorithms for fast and accurate haplotype assembly. The first two frameworks presented in this dissertation, called FastHap and ARHap, are tailored toward computationally simple assembly of diploid haplotypes, with the objectives of minimizing the minimum error correction (MEC) score and the switching error, respectively. I then introduce HapColor and PolyCluster, which aim to improve minimum error correction and switching error in polyploid