For biological systems, structure and hence function are hierarchically organized at multiple scales. For example, genetic variation in nucleotides (1 nm) gives rise to functional changes in proteins (1–10 nm), which in turn affect protein complexes, cellular processes, organelles (10 nm–1 μm), and, ultimately, phenotypes observed in cells (1–10 μm), tissues (100 μm–100 mm), and individuals (>1 m). Here, I exploit this principle for biological modeling.
First, I develop a software library that facilitates the assembly, analysis, and visualization of biological hierarchies, each represented by a data structure called an ontology. As a demonstration, I assemble a compendium of hierarchies describing the molecular mechanisms of 649 diseases by integrating a set of gene-disease associations with a gene similarity network derived from ‘omics data. For example, the hierarchy for Fanconi Anemia recapitulates the disease’s known relation with DNA repair and proposes new relations with orthogonal pathways.
Next, I introduce a strategy for genotype-to-phenotype translation by using existing knowledge of a hierarchy of cellular subsystems. Guided by this structure, I organize genotype data into an “ontotype,” that is, a hierarchy of perturbations representing the effects of genetic variation at multiple cellular scales. The ontotype is then interpreted using logical rules generated by machine learning to predict phenotype. This approach substantially outperforms previous, non-hierarchical methods for translating yeast genotype to cell growth phenotype. It also accurately predicts the growth outcomes of two new screens of 2,503 double gene knockouts impacting DNA repair or nuclear lumen and generalizes to larger knockout combinations.
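To make the ontotype idea concrete, the following is a minimal sketch under stated assumptions, not the implementation used in this work. It assumes that each subsystem's state is the number of knocked-out genes annotated to that subsystem or to any of its descendants. The toy hierarchy, the gene labels g1 to g4, and the function names are all hypothetical.

```python
from typing import Dict, List, Set

def genes_under(term: str,
                children: Dict[str, List[str]],
                direct: Dict[str, Set[str]],
                cache: Dict[str, Set[str]]) -> Set[str]:
    """Collect genes annotated to a term or to any of its descendants."""
    if term not in cache:
        genes = set(direct.get(term, set()))
        for child in children.get(term, []):
            genes |= genes_under(child, children, direct, cache)
        cache[term] = genes
    return cache[term]

def ontotype(knockouts: Set[str],
             children: Dict[str, List[str]],
             direct: Dict[str, Set[str]]) -> Dict[str, int]:
    """Map a genotype (set of knocked-out genes) to per-subsystem counts.

    Assumption: a subsystem's state is the count of perturbed genes
    it contains, propagated upward through the hierarchy.
    """
    cache: Dict[str, Set[str]] = {}
    return {term: len(genes_under(term, children, direct, cache) & knockouts)
            for term in children}

# Hypothetical toy hierarchy: cell -> {repair -> recomb, lumen}
children = {"cell": ["repair", "lumen"], "repair": ["recomb"],
            "recomb": [], "lumen": []}
# Genes annotated directly to each subsystem (g3 belongs to two subsystems)
direct = {"repair": {"g1"}, "recomb": {"g2", "g3"}, "lumen": {"g3", "g4"}}

state = ontotype({"g2", "g3"}, children, direct)
print(state)
```

The resulting dictionary is a fixed-length feature vector, one entry per subsystem, which a supervised learner could then take as input in place of the raw genotype.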
Finally, I present a more accurate and interpretable model for translation called DeepCell, a “visible” neural network that couples the model’s inner workings to those of the cell. Outperforming the ontotype approach, DeepCell simulates cellular growth nearly as accurately as laboratory observations. During simulation, genotypes induce patterns of activity across cellular subsystems, enabling in-silico investigation of the molecular mechanisms underlying each genotype-phenotype association. These mechanisms can be validated, and many are unexpected; some are governed by Boolean logic. Cumulatively, 80% of the importance for growth prediction is captured by 484 subsystems (21% of all subsystems), reflecting the emergence of a complex phenotype.
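A cumulative statistic of this kind can be computed by ranking subsystems by importance and counting how many are needed to reach a target share of the total. The sketch below is illustrative only: the importance scores are made-up numbers, the function name is hypothetical, and how importance is actually scored in DeepCell is not specified here.

```python
from typing import Sequence

def subsystems_for_fraction(importances: Sequence[float],
                            fraction: float = 0.8) -> int:
    """Return the smallest number of top-ranked subsystems whose
    summed importance reaches `fraction` of the total importance."""
    total = sum(importances)
    if total <= 0:
        return 0
    running = 0.0
    # Walk subsystems from most to least important
    for k, value in enumerate(sorted(importances, reverse=True), start=1):
        running += value
        if running >= fraction * total:
            return k
    return len(importances)

# Made-up scores for four hypothetical subsystems
scores = [60, 25, 10, 5]
k = subsystems_for_fraction(scores, fraction=0.8)
print(k, k / len(scores))
```

Dividing the returned count by the total number of subsystems gives the corresponding percentage of subsystems.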