Transformation of High-Throughput Data into Hierarchical Cellular Models Enables Biological Prediction and Discovery
- Author(s): Kramer, Michael Harris
- Advisor(s): Ideker, Trey, et al.
A holy grail of bioinformatics is the creation of whole-cell models with the ability to enhance human understanding and facilitate discovery. To this end, a successful and widely used effort is the Gene Ontology (GO), a massive project to manually annotate genes into terms describing molecular functions, biological processes and cellular components and to provide relations between terms, e.g., capturing that “small ribosomal subunit” and “large ribosomal subunit” come together to make “ribosome”. GO is widely used to understand the function of a gene or group of genes. Unfortunately, GO is limited by the effort required to create and update it by hand. It exists only for well-studied organisms, and even then only in a single, generic form per organism, with limited overall genome coverage and a bias towards well-studied genes and functions. It is not possible to learn about an uncharacterized gene or discover a new function using GO, and one cannot quickly assemble an ontology model for a new organism, cell type or disease state.
Here we change this state of affairs by developing and utilizing the concept of purely data-driven gene ontologies. In chapter two, we show that large networks of gene and protein interactions in Saccharomyces cerevisiae can be used to computationally infer a data-driven ontology whose coverage and power are equivalent to those of the manually curated GO. In chapter three, we further develop the algorithmic foundations for data-driven ontologies, laying the groundwork for machine learning to intelligently integrate many types of experimental data into ontology models. In chapter four, we focus on a cellular process (autophagy in Saccharomyces cerevisiae) and develop a framework (Active Interaction Mapping) which guides experimental selection, systematically improves an existing process-specific ontology model, and uncovers new autophagy biology. Finally, in chapter five, we illustrate the power of hierarchical whole-cell ontology models for biological modeling by demonstrating an ontology-based framework for translation of genotype to phenotype.
Overall, this work provides a roadmap to construct data-driven, hierarchical models of gene function for the whole cell or a specific cellular process and illustrates the power of these models for both discovery of new biology and biological modeling.