- Mobley, David L
- Bannan, Caitlin C
- Rizzi, Andrea
- Bayly, Christopher I
- Chodera, John D
- Lim, Victoria T
- Lim, Nathan M
- Beauchamp, Kyle A
- Slochower, David R
- Shirts, Michael R
- Gilson, Michael K
- Eastman, Peter K
Traditional approaches to specifying a molecular mechanics force field encode all the information needed to assign force field parameters to a given molecule into a discrete set of atom types. This is equivalent to a representation consisting of a molecular graph comprising a set of vertices, which represent atoms labeled by atom type, and unlabeled edges, which represent chemical bonds. Valence terms (bond stretch, angle bend, and dihedral parameters) are then assigned by looking up bonded pairs, triplets, and quartets of atom types in parameter tables, while nonbonded parameters are assigned from the atom types themselves. This approach, which we call indirect chemical perception because it operates on the intermediate graph of atom-typed nodes, creates a number of technical problems. For example, atom types must be sufficiently complex to encode all necessary information about the molecular environment, making it difficult to extend force fields encoded this way. Atom typing also results in a proliferation of redundant parameters applied to chemically equivalent classes of valence terms, needlessly increasing force field complexity. Here, we describe a new approach to assigning force field parameters via direct chemical perception. Rather than working through the intermediary of the atom-typed graph, direct chemical perception operates directly on the unmodified chemical graph of the molecule to assign parameters. In particular, parameters are assigned to each type of force field term (e.g., bond stretch, angle bend, torsion, and Lennard-Jones) based on standard chemical substructure queries implemented via the industry-standard SMARTS chemical perception language, using SMIRKS extensions that permit labeling of specific atoms within a chemical pattern. We use this approach to implement a new force field format, called the SMIRKS Native Open Force Field (SMIRNOFF) format.
We demonstrate the power and generality of this approach using examples of specific molecules that pose problems for indirect chemical perception, and we construct and validate a minimalist yet very general force field, SMIRNOFF99Frosst. We find that a parameter definition file only ∼300 lines long provides coverage of all but <0.02% of a 5-million-molecule drug-like test set. Despite its simplicity, the accuracy of SMIRNOFF99Frosst for small-molecule hydration free energies and selected properties of pure organic liquids is similar to that of the General Amber Force Field, whose specification requires thousands of parameters. This force field provides a starting point for further optimization and refitting work to follow.
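
The contrast between indirect and direct chemical perception can be illustrated with a minimal Python sketch for the bond terms of methanol. This is a toy under stated assumptions, not the SMIRNOFF implementation: the atom type names, the parameter identifiers (`b_CO`, `b_CH`, `b_OH`, `b_generic`), and the convention that later rules override earlier ones are all hypothetical choices made for illustration. The SMIRKS strings appear only as labels; simple Python predicates stand in for real substructure matching.

```python
from typing import Callable, Dict, List, Tuple

Bond = Tuple[int, int]

# Toy chemical graph for methanol (CH3OH): element per atom, bond order per edge.
ELEMENTS: List[str] = ["C", "O", "H", "H", "H", "H"]
BONDS: Dict[Bond, int] = {(0, 1): 1, (1, 2): 1, (0, 3): 1, (0, 4): 1, (0, 5): 1}


def neighbors(i: int) -> List[int]:
    """Indices of atoms bonded to atom i."""
    return [b if a == i else a for (a, b) in BONDS if i in (a, b)]


# --- Indirect perception: type the atoms first, then look up typed pairs ---
def atom_type(i: int) -> str:
    """Hypothetical typing rules; the type names are placeholders."""
    elem = ELEMENTS[i]
    if elem == "H":
        # Hydrogen needs two types only so the bond table can tell O-H from C-H.
        return "HO" if ELEMENTS[neighbors(i)[0]] == "O" else "HC"
    return {"C": "CT", "O": "OH"}[elem]


BOND_TABLE: Dict[Tuple[str, str], str] = {
    ("CT", "OH"): "b_CO",
    ("HO", "OH"): "b_OH",
    ("CT", "HC"): "b_CH",
}


def assign_indirect() -> Dict[Bond, str]:
    """Assign bond parameters via the intermediate atom-typed graph."""
    out = {}
    for (i, j) in BONDS:
        key = tuple(sorted((atom_type(i), atom_type(j))))
        out[(i, j)] = BOND_TABLE[key]
    return out


# --- Direct perception: match patterns against the chemical graph itself ---
def pair(e1: str, e2: str) -> Callable[[int, int], bool]:
    """Predicate standing in for a two-atom, single-bond substructure query."""
    def match(i: int, j: int) -> bool:
        return BONDS[(i, j)] == 1 and {ELEMENTS[i], ELEMENTS[j]} == {e1, e2}
    return match


# (SMIRKS-style label, predicate, parameter id); general first, specific last.
RULES = [
    ("[*:1]~[*:2]", lambda i, j: True, "b_generic"),
    ("[#6:1]-[#8:2]", pair("C", "O"), "b_CO"),
    ("[#6:1]-[#1:2]", pair("C", "H"), "b_CH"),
    ("[#8:1]-[#1:2]", pair("O", "H"), "b_OH"),
]


def assign_direct() -> Dict[Bond, str]:
    """Assign bond parameters with no atom types; the last matching rule wins."""
    out = {}
    for bond in BONDS:
        for _label, matches, param in RULES:
            if matches(*bond):
                out[bond] = param
    return out
```

In this sketch both routes give the same assignment, but the indirect route had to introduce a second hydrogen atom type solely to distinguish two bond parameters, whereas the direct route expresses that distinction in the bond rule itself.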