An integrative method for predicting enzyme functions and mapping metabolic pathways
Reconstructing metabolic networks is useful for understanding the functional contexts of compounds, genes, and their interactions. Yet there is still a significant gap in our knowledge of metabolic networks, particularly for organism-specific pathways, as many pathways are missing from current pathway databases. Computational approaches play an important role in bridging this gap. Predicting metabolic pathways that consist of enzymes of unknown function requires identifying multiple enzymes and their interactions with compounds. Strategies that combine virtual screening of metabolites against the structures of multiple enzymes in a pathway with genome context analysis have recently been applied to annotate novel enzymes. However, when the information from virtual screening is too ambiguous and the number of possible pathway components is large, constructing candidate pathways manually becomes challenging. We have developed an automated approach that integrates different types of information for the prediction of metabolic pathways. Interpreting these types of information simultaneously is beneficial because each individual feature of a network model contributes information toward determining a full pathway. The approach was benchmarked on linear and nonlinear pathway topologies. We then applied it to identify a novel L-gulonate catabolic pathway in Haemophilus influenzae Rd KW20, integrating information from metabolite docking, chemoinformatics similarity calculations, chemical transformations, functions of close homologs, and high-throughput screening hits. These benchmarks and this application demonstrate the potential of our approach to contribute to the discovery of metabolic pathways and the annotation of enzymes.