eScholarship
Open Access Publications from the University of California

UC San Diego

UC San Diego Electronic Theses and Dissertations

Discovery and analysis of mosaic arrangements in biological sequences and structures

Abstract

Biological molecules are composed of discrete units called domains. The study of the identity and organization of these domains can reveal the correspondence between individual units in different molecules, as well as the history of the domains themselves, which may guide our understanding of the evolutionary history of individual molecules. The study of domain organization in protein sequences is a mature field; however, studies of domain organization in other types of biological sequences and in protein structures are still in their infancy. There is currently no general framework, and there are no specific tools, for identifying domains or discovering domain organization. Existing tools do not explicitly define what a domain is. In some cases, existing tools (e.g., multiple sequence alignment tools) ignore domain organization entirely or represent only a limited subset of domain organizations. As a result, the mosaic structures of biological data are left undetected, and we demonstrate that the prevalence of mosaic arrangements is underappreciated. This dissertation considers the shortcomings of current technologies and develops a generic framework for the discovery and analysis of domain organization in any type of sequential data. We apply this framework in several biological contexts. First, we develop the A-Bruijn Aligner (ABA), which represents a multiple sequence alignment (MSA) as a graph that automatically reveals the domain structures. Second, we develop a repeat domain graph approach that decomposes a repeat family library into repeat domains; this is the first method for the comprehensive identification of repeat domains in large genomes. Third, we extend the A-Bruijn graph approach to an exploration of the mosaic arrangements in protein structures. Finally, we propose a new method for structure comparison based on a simplified representation of protein structures that uses the local curvatures along their generalized backbones.
