Discovery and analysis of mosaic arrangements in biological sequences and structures
Abstract
Biological molecules are composed of discrete units called domains. The study of the identity and organization of these domains can reveal the correspondence between individual units in different molecules, and the history of the domains themselves, which may guide our understanding of the evolutionary history of individual molecules. The study of domain organization in protein sequences is a mature field; however, the study of domain organization in other types of biological sequences and in protein structures is still in its infancy. There is currently no general framework, and there are no specific tools, for the identification of domains or for the discovery of domain organization. Existing tools do not explicitly define what a domain is. In some cases, existing tools (e.g., multiple sequence alignment tools) ignore domain organization entirely, or represent only a limited subset of domain organizations. As a result, the mosaic structures of biological data are left undetected, and we demonstrate that the prevalence of mosaic arrangements is underappreciated. This dissertation considers the shortcomings of current technologies and develops a generic framework for the discovery and analysis of domain organization in any type of sequential data. We apply this framework in several biological contexts. First, we develop the A-Bruijn Aligner (ABA), which represents a multiple sequence alignment (MSA) as a graph that automatically reveals the domain structures. Second, we develop a repeat domain graph approach that decomposes a repeat family library into repeat domains, which is the first method for the comprehensive identification of repeat domains in large genomes. Third, we extend the A-Bruijn graph approach to an exploration of the mosaic arrangements in protein structures. Finally, we propose a new method for structure comparison based on a simplified representation of protein structures using the local curvatures along their generalized backbones.