This thesis explores the automatic prediction of biomolecular interactions using machine learning. The guiding philosophy of these investigations is to model the interactions between biomolecules (proteins and small-molecule ligands) using simple features that represent characteristics hypothesized to contribute to binding.
For these investigations, I use "support vector" learning to build discrimination functions that separate input feature vectors into classes, yielding a hypothesis as to whether (or how strongly) the biomolecules will interact. These discrimination functions are learned from training data sets of known interactions.
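For orientation, the textbook form of a support vector discrimination function is sketched below. This is the standard formulation, not necessarily the exact one used in each chapter; the abstract does not specify the kernel or the feature encoding. The symbols are assumptions introduced here for illustration: \mathbf{x}_i are training feature vectors, y_i \in \{-1, +1\} are their class labels (for example, non-interacting versus interacting), \alpha_i and \alpha_i^{*} are learned coefficients that are nonzero only for the support vectors, K is a kernel function, and b is a bias term.

```latex
% Classification ("whether or not"): the predicted class is the sign
% of a kernel-weighted sum over the training examples.
f(\mathbf{x}) = \operatorname{sign}\!\left( \sum_{i=1}^{n} \alpha_i \, y_i \, K(\mathbf{x}_i, \mathbf{x}) + b \right)

% Regression ("how strongly"): the same kind of sum, read as a
% real-valued prediction instead of a class label.
g(\mathbf{x}) = \sum_{i=1}^{n} \left( \alpha_i - \alpha_i^{*} \right) K(\mathbf{x}_i, \mathbf{x}) + b
```

Under this reading, the classification form would correspond to predicting whether two biomolecules interact, and the regression form to predicting a continuous quantity such as binding free energy.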
Individual chapters of the thesis present separate investigations that predict protein-protein interactions in three settings: in a multi-species database, within a single organism, and across species. A final study focuses on predicting the binding free energy between a receptor and a ligand.
An important contribution of this research is the demonstration that no explicit information about three-dimensional protein structure is necessary to predict protein interactions. This implies that researchers may proceed directly from sequence to inference of protein function, as represented by the context of a protein's interactions with other biomolecules.