Ultimately, our bodies are biochemical factories of diabolical complexity. As scaffolds, reactors, engines, and signals, proteins are our essential building blocks. Drugs, with their potential to alleviate symptoms or cure disease, are often small molecules that amplify or extinguish protein function in just the right way. Molecular docking attempts to understand and predict how these small-molecule drugs interact with their protein targets inside us. In isolation, both ligand and protein are bathed in water; to bind one another, some of that water must necessarily depart. At its core, this dissertation is about how to account for desolvation of the ligand upon protein binding. To demonstrate why ligand desolvation matters, we discover new chemical probes for CXCR4, a protein target implicated in cancer and HIV. En route, we create the LogAUC metric and the DUD-E benchmarking dataset to better assess retrospective docking performance.
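LogAUC summarizes early enrichment as the area under a ROC curve whose x-axis (fraction of decoys found) is on a log10 scale, so the top of the ranked list dominates the score. The sketch below is a minimal illustration, not the dissertation's code: it assumes a lower cutoff lambda = 0.001, treats the ROC curve as piecewise linear between ranks, and integrates each segment exactly over log10 x. The names `roc_curve`, `log_auc`, and `RANDOM_LOGAUC` are hypothetical.

```python
import math

LAMBDA = 0.001  # assumed lower cutoff on the fraction of decoys found

# LogAUC of a random ranking (the diagonal y = x) under the same conventions.
RANDOM_LOGAUC = (1 - LAMBDA) / (math.log(10) * math.log10(1 / LAMBDA))


def roc_curve(ranked_is_active):
    """ROC points from a ranked list (best first); True marks a ligand, False a decoy.

    Ties must already be broken by the caller; each entry advances one step.
    """
    n_act = sum(ranked_is_active)
    n_dec = len(ranked_is_active) - n_act
    if n_act == 0 or n_dec == 0:
        raise ValueError("need at least one ligand and one decoy")
    xs, ys = [0.0], [0.0]
    tp = fp = 0
    for is_active in ranked_is_active:
        if is_active:
            tp += 1
        else:
            fp += 1
        xs.append(fp / n_dec)  # fraction of decoys found so far
        ys.append(tp / n_act)  # fraction of ligands found so far
    return xs, ys


def log_auc(xs, ys, lam=LAMBDA):
    """Area under y(x) over log10 x on [lam, 1], normalized so a perfect curve scores 1."""
    total = 0.0
    for x1, y1, x2, y2 in zip(xs, ys, xs[1:], ys[1:]):
        if x2 <= lam or x2 == x1:
            continue  # segment lies left of the cutoff, or is a vertical step
        if x1 < lam:
            # clip the segment's left end to x = lam, interpolating linearly in x
            y1 = y1 + (y2 - y1) * (lam - x1) / (x2 - x1)
            x1 = lam
        slope = (y2 - y1) / (x2 - x1)
        intercept = y1 - slope * x1
        # exact integral of (intercept + slope * x) d(log10 x) from x1 to x2
        total += (intercept * math.log(x2 / x1) + slope * (x2 - x1)) / math.log(10)
    return total / math.log10(1 / lam)
```

Under these assumptions a perfect ranking scores 1.0 and a random ranking scores about 0.145, so `log_auc(xs, ys) - RANDOM_LOGAUC` gives an adjusted value that is zero for random performance; such scores are often quoted as percentages.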
Our rapid, context-dependent ligand desolvation scoring term relates the Generalized Born effective Born radius of every ligand atom to a fractional desolvation, and then uses this fraction to scale an atom-by-atom decomposition of the full transfer free energy. In a test that fails when desolvation is ignored, our method correctly discriminates ligands from highly charged molecules. The method is also flexible, performing well whether the protein binding site is charged or neutral, open or closed.
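One way to picture this scoring term: under Generalized Born, an atom's self-solvation energy scales as 1/R, where R is its effective Born radius, so an atom whose radius grows from R_free in bulk solvent to R_bound in the complex retains only R_free/R_bound of that energy. The sketch below assumes exactly that mapping, f = 1 - R_free/R_bound clamped to [0, 1], and assumes the Born radii and per-atom transfer free energies have already been computed elsewhere. The class and function names are hypothetical, and the dissertation's actual relation between radius and fractional desolvation may differ.

```python
from dataclasses import dataclass


@dataclass
class LigandAtom:
    born_radius_free: float   # effective Born radius in the unbound, solvated ligand (Angstrom)
    born_radius_bound: float  # effective Born radius in the protein-ligand complex (Angstrom)
    # Assumed sign convention: cost of fully desolvating this atom (kcal/mol),
    # positive for polar atoms (a penalty) and negative for apolar ones.
    transfer_energy: float


def fractional_desolvation(r_free, r_bound):
    """Assumed mapping: GB self-energy ~ 1/R, so the fraction lost is 1 - R_free/R_bound."""
    if r_free <= 0 or r_bound <= 0:
        raise ValueError("Born radii must be positive")
    frac = 1.0 - r_free / r_bound
    return min(1.0, max(0.0, frac))  # clamp numerical noise into [0, 1]


def desolvation_penalty(atoms):
    """Scale each atom's transfer free energy by its fractional desolvation and sum."""
    return sum(
        fractional_desolvation(a.born_radius_free, a.born_radius_bound) * a.transfer_energy
        for a in atoms
    )
```

In this picture a deeply buried atom (very large R_bound) pays nearly its full transfer free energy, a fully exposed atom pays nothing, and burying apolar atoms with negative transfer energies makes the term favorable.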
We first test ligand desolvation retrospectively on the 40 original DUD targets, but find many ways to improve that benchmark. We therefore construct DUD-E, an improved set with more diverse and biomedically relevant targets, totaling 102 proteins with 22,886 clustered ligands, each with 50 property-matched decoys. To ensure chemotype diversity, we cluster the ligands by their Bemis-Murcko frameworks. To improve the decoys, we add net charge as an additional matched physicochemical property and keep only the decoys that are most topologically dissimilar to the ligands. To test our method prospectively, we screen first a homology model and then a crystal structure of CXCR4. Several of our novel scaffolds are potent and relatively small, with IC50 values as low as 306 nM, ligand efficiencies as high as 0.36, and substantial efficacy in blocking cellular chemotaxis.
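The decoy rules described above can be illustrated with a small filter: keep only candidates whose physicochemical properties match a ligand's (with net charge matched exactly), then rank the survivors by their highest fingerprint Tanimoto similarity to any of the target's ligands and keep the least similar ones. This is a simplified sketch, not the DUD-E pipeline: the property tolerances, the fingerprint representation (a set of on-bit indices), and all names are hypothetical choices for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Molecule:
    name: str
    mol_weight: float
    logp: float
    rot_bonds: int
    hbd: int             # hydrogen-bond donors
    hba: int             # hydrogen-bond acceptors
    net_charge: int
    fingerprint: frozenset  # indices of "on" bits in a topological fingerprint


def tanimoto(a, b):
    """Tanimoto similarity of two bit sets; two empty fingerprints are treated as 0."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def property_matched(ligand, cand, mw_tol=25.0, logp_tol=0.5):
    """Hypothetical tolerances; net charge must match exactly."""
    return (
        abs(ligand.mol_weight - cand.mol_weight) <= mw_tol
        and abs(ligand.logp - cand.logp) <= logp_tol
        and abs(ligand.rot_bonds - cand.rot_bonds) <= 1
        and abs(ligand.hbd - cand.hbd) <= 1
        and abs(ligand.hba - cand.hba) <= 1
        and ligand.net_charge == cand.net_charge
    )


def select_decoys(ligand, all_ligands, candidates, n_decoys=50):
    """Property-matched candidates, least similar to every ligand of the target first."""
    matched = [c for c in candidates if property_matched(ligand, c)]
    # Rank by each candidate's highest similarity to any ligand; lower = more dissimilar.
    matched.sort(key=lambda c: max(tanimoto(c.fingerprint, l.fingerprint) for l in all_ligands))
    return matched[:n_decoys]
```

Scoring each candidate against all of the target's ligands, rather than only the ligand it was matched to, is what guards against a decoy that is secretly a close analog of some other active.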