Development of an alternative approach to protein crystallization

We are developing an alternative strategy for the crystallization of macromolecules that, unlike current methods, does not depend on the optimization of traditional variables such as pH and precipitant concentration. It is based instead on the hypothesis that many conventional small molecules might establish stabilizing, intermolecular, noncovalent crosslinks in crystals and thereby promote lattice formation. To test the hypothesis, we carried out preliminary experiments encompassing 18,240 crystallization trials using 81 different proteins and 200 chemical compounds. Statistical analysis of the results demonstrated the validity of the idea. In addition, we conducted X-ray diffraction analyses of some of the crystals grown in the experiments. These clearly showed incorporation of conventional molecules into the protein crystal lattices, and further validated the underlying hypothesis. We are currently extending the investigations to include a broader and more diverse set of proteins, an expanded search of conventional and biologically active small molecules, and a wider range of precipitants. The strategy proposed here is essentially orthogonal to current approaches, and its objective is to double the current success rate.


Introduction
Structural biology has become wholly dependent upon X-ray crystallography, which has, in turn, become entirely dependent on the crystallization of proteins, nucleic acids, viruses, and macromolecular complexes. It follows that improvements and advances in the area of macromolecular crystallization translate directly into corresponding developments in structural biology, molecular biology, and the medical sciences that derive from them. Crystallization of macromolecules has become the linchpin of the enterprise. At this time, the techniques in common use succeed in about 40% of cases, leaving many of the most biologically and medically relevant proteins and complexes out of reach. This is true both within the laboratories of individual investigators and at large structural genomics centers. Present methods are proving incapable of addressing the more intractable problems. Innovative alternatives to the approaches now in use are needed.
We have been developing an alternative strategy to current methods based, not on optimization of traditional variables such as precipitant concentration and pH, but on the idea of identifying conventional and biologically active small molecules that promote crystallization through formation of favorable lattice contacts. Such small molecules, traditionally referred to as "additives," have often proven crucial to macromolecular crystallization. The strategy we are developing will carry that idea to the forefront. The small molecule reagents, and their combinations, become the primary factor in the crystallization process.
The fundamental hypothesis driving this research is that conventional molecules having unique features may (1) be bound by biological macromolecules, which may then be stabilized or induced to assume a more favorable conformation for crystallization, (2) alter the interactions between macromolecules and their solvent and induce ordered association, or (3) form reversible cross-links in a lattice through hydrogen bonding, electrostatic, and possibly hydrophobic interactions, and thereby promote formation of a crystal. There is ample evidence in the literature, and in our preliminary results, to support this hypothesis and suggest that future research may be profitable.
While it might appear that identifying specific molecules that promote the crystallization of a particular protein is a hopeless task, there being an impossibly vast number of chemical compounds, this is not, in fact, the case. As we have shown in preliminary experiments [4], we are not obliged to evaluate compounds individually, but can do so in groups of various sizes. By grouping compounds into formulations, sample matrices can be devised to test 200-300 chemicals in a single 96-well screen. The only chemicals that need be considered are those of significant solubility in water that do not denature proteins. Further, available evidence suggests that the most suitable compounds will be those bearing groups that can engage in electrostatic and/or hydrogen bonding interactions with proteins.
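The grouping of compounds into formulations can be illustrated with a minimal sketch. It simply deals a list of compounds round-robin into 96 wells; the function name, the placeholder compound labels, and the round-robin rule are assumptions made here for illustration, not the procedure used in [4]. A real design would also have to respect chemical compatibility and solubility within each mix.

```python
def group_into_formulations(compounds, n_wells=96):
    """Deal compounds round-robin into n_wells formulations.

    Hypothetical helper: it only balances the number of components per
    well and ignores chemical compatibility between components.
    """
    return [compounds[i::n_wells] for i in range(n_wells)]


# Placeholder labels standing in for 200 water-soluble, non-denaturing compounds.
compounds = [f"compound_{i:03d}" for i in range(1, 201)]
formulations = group_into_formulations(compounds)
sizes = [len(f) for f in formulations]
print(min(sizes), max(sizes), sum(sizes))
```

With 200 compounds and 96 wells, every well receives two or three components, so each compound is tested exactly once within a single plate.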
It may require some time and effort to assemble a sufficiently broad base of experience, and continued refinement of reagent combinations, but eventually those compounds offering the greatest potential benefit will emerge. While any one compound or reagent mix may have only a small chance of promoting the crystallization of a specific protein, the probabilities of success contributed by each reagent mix in a large set are, so long as each is small, approximately additive. The problem currently facing us is to identify those molecules and compounds that can serve, for at least some proteins, to occasionally enhance, even by a small amount, the probability of a successful outcome.
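The way small per-mix probabilities accumulate across a large set can be made concrete with a short sketch. It assumes, purely for illustration, that each reagent mix succeeds independently of the others; the per-mix probability of 1% and the function name are hypothetical and are not values measured in these experiments.

```python
from math import prod


def combined_success_probability(per_mix_probabilities):
    """Probability that at least one reagent mix yields crystals.

    Assumes the mixes succeed independently of one another, which is an
    assumption of this sketch and not a result from the experiments.
    """
    return 1.0 - prod(1.0 - p for p in per_mix_probabilities)


# Hypothetical numbers: 96 reagent mixes, each with a 1% chance of success.
probabilities = [0.01] * 96
exact = combined_success_probability(probabilities)
additive = sum(probabilities)  # simple sum, an upper bound on the exact value
print(f"exact: {exact:.3f}, additive estimate: {additive:.3f}")
```

The simple sum tracks the exact value closely while the total remains small and overestimates it as the set grows, but in either case a large set of individually unpromising mixes can give a substantial overall chance of success.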

Experimental
In a previous paper [4] we described the results of three experiments that were intended as initial steps in identifying classes of molecules, and individual compounds, that might be generally useful in promoting the crystallization of macromolecules, or that might have utility in increasing the probability that a specific protein crystallizes. The techniques, methods, and materials that we have used in those and subsequent experiments are detailed there. In two experiments we investigated the effects of various reagent mixes, generally having two to eight components, on the crystallization success of 81 different proteins. In these two experiments the compounds were not principally bioactive, but were chemical compounds that might affect the solubilities of the proteins, their stability, surface properties, or their interactions within a crystal lattice. In another experiment, bioactive and physiologically relevant compounds were tested on 66 proteins with the objective of finding combinations that might crystallize more readily, or crystallize in a different form than the unliganded protein. In all, 200 compounds were explored as possible additives. Only two fundamental crystallization conditions were used. One of these was based on 30% PEG 3350, the other on 50% Tacsimate, both at pH 7. The experiments encompassed 18,240 individual crystallization trials, using sitting drops, deployed manually.

Results
The success of the experiments was measured according to two fairly rigorous criteria: a statistical analysis of the number of macromolecules crystallized as a function of reagent cocktail, and the ability of certain cocktails to cause the crystallization of a protein when all others, or most others, failed. We termed the latter feature a measure of the cocktail's "silver bullet" potential for a specific protein or virus. The statistical analysis revealed a number of interesting clues, which suggested those types of compounds that might be most generally useful. Of particular significance was the finding that the use of reagent cocktails more than doubled the number of proteins that could be crystallized under the two basic conditions utilized, when compared with those two conditions free of any small molecules.
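The two criteria can be sketched as a simple tally over a table of hits. The data layout, the function name, and the `max_hits` threshold (how few successful cocktails a protein may have for a hit to count as a "silver bullet") are assumptions introduced here for illustration; the statistical analysis actually used is described in [4].

```python
from collections import defaultdict


def score_screen(hits, max_hits=1):
    """Tally a screen by cocktail.

    hits maps each protein to the set of cocktails that gave crystals.
    Returns (number of proteins crystallized per cocktail,
             proteins for which each cocktail acted as a silver bullet).
    A protein counts toward silver-bullet potential when it crystallized
    in at most max_hits cocktails (hypothetical threshold).
    """
    proteins_per_cocktail = defaultdict(int)
    silver_bullets = defaultdict(list)
    for protein, cocktails in hits.items():
        for cocktail in cocktails:
            proteins_per_cocktail[cocktail] += 1
        if 0 < len(cocktails) <= max_hits:
            for cocktail in cocktails:
                silver_bullets[cocktail].append(protein)
    return dict(proteins_per_cocktail), dict(silver_bullets)


# Made-up example data, not results from the experiments.
hits = {
    "protein_A": {"mix_01"},
    "protein_B": {"mix_01", "mix_02", "mix_03"},
    "protein_C": set(),
}
counts, bullets = score_screen(hits)
print(counts)
print(bullets)
```

In this made-up example, `mix_01` crystallizes two proteins and is the only cocktail that works for `protein_A`, so it scores on both criteria.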
The crystallization data provided persuasive evidence that, for many macromolecules, incorporation of one or more small molecules could be crucial to obtaining crystals of specific proteins. The results further indicated that certain classes of small molecules, such as dicarboxylic acids and diamino compounds of various sizes and geometries, promoted the crystallization of proteins in a general sense. In addition, different crystal polymorphs of some proteins were produced in the presence of various small molecules, and effects on the diffraction resolutions of some crystals were also observed. Results that complement ours were also reported based on independent experiments in other laboratories [1,5].
To further test the underlying hypothesis, namely that the small ligands did indeed serve to tether macromolecules to one another and thereby encourage lattice formation, X-ray analysis of at least a sampling of the crystals grown in the experiments was necessary. Only by this technique could detailed interactions within lattices be directly visualized, and the original idea rigorously evaluated. In a subsequent paper [2] we described analyses of nine crystalline proteins grown in the original experiments, each obtained in the presence of a "cocktail" of low molecular weight compounds. Another, independent analysis was carried out on bovine trypsin crystals grown in our experiments in the presence of benzamidine and protamine [6].
In the nine examples, small molecules from the different reagent mixes were bound and were clearly seen in difference Fourier maps. In all cases we were able to unambiguously identify which component of the relevant reagent mix the ligand represented. The cases investigated were most frequently those where the small molecule contained an aromatic ring, such as sulfanilic acid, or was a larger ligand such as a nucleotide. We could not be certain that some small ligands, such as formate, would be readily identifiable in difference maps, particularly at resolutions poorer than 1.8 Å. X-ray diffraction analyses in which data were recorded from crystals grown in the presence of various reagent mixes, and then used to calculate difference Fourier syntheses, generally revealed the presence of ligands in the lattices at interfaces between protein molecules, consistent with the motivating hypothesis.
On the other hand, some small molecules often do interact in more biochemically relevant ways with protein molecules and thus affect the way in which they crystallize. For example, in the case of benzamidine in trypsin, the ligand binds tightly at the active site, does not form intermolecular bonds, yet enhances the crystallization of the protein. Without benzamidine present, under the same conditions, neither bovine nor porcine trypsin crystallizes. In this case the positive effect of the small molecule must be exerted through stabilization of the protein structure, or by its promotion of imperceptibly small alterations in conformation or disposition of surface groups.
As illustrated by one study on lysozyme crystals, significant conformational changes were induced by interaction with small molecules. Although the ligand was not itself directly involved in intermolecular interactions, a loop, whose conformation was altered, was involved. Thus a ligand may indirectly influence lattice interactions by modifying the surface of the protein, and we know from a long history of protein crystallography that minor changes in surface structure can be all-important in crystallization.
Examples were also presented that clearly illustrated the validity of the hypothesis underlying the original crystallization experiments: that the formation of intermolecular lattice interactions does occur through the incorporation of small molecules at protein-protein interfaces. In other experiments on lysozyme and porcine trypsin crystals, p-aminobenzoic acid and sulfanilic acid, respectively, were clearly observed to be present in the lattice, and in both crystals served to establish an interface between protein molecules. This kind of binding was exactly what we might have anticipated. The same was true of trimesic acid in another lysozyme crystal; malonate, oxamic acid, and sorbitol in porcine trypsin crystals; and tartrate in crystals of thaumatin. In all of these cases, virtually every hydrogen bonding possibility inherent to the ligand was utilized in making the intermolecular interactions.
Although interactions were principally hydrogen bonds, hydrophobic interactions were also important, as in an RNase A structure with dGMP. Thus, in at least those nine examples, the observations conformed well to our expectations. All of the results support the idea that "cocktails" of small molecules may provide a highly useful alternative for promoting the crystallization of macromolecules.
Subsequent to the screening and X-ray diffraction experiments we have already described [2,4], we have now conducted two additional experiments. Details of those will be presented elsewhere, but a few observations are appropriate here. First, both experiments further confirmed our belief that our underlying hypothesis is valid. Second, certain broad classes of compounds showed themselves to be of rather little value, while others demonstrated surprising efficacy. Among the former were two classes, histological stains and dyes, and antibiotics. Both of these groups are known to bind to biological macromolecules, either specifically or non-specifically, so we had hoped that they might efficiently form co-crystals with proteins. Unfortunately, they did little to promote or alter the crystallization of the members of our protein test set.
Another class of small molecules, however, stood out as promoting the crystallization of a substantial number of macromolecules. This class comprised peptides of various lengths and sequences, ranging in size from dipeptides and tripeptides of defined sequence up to proteolytic fragments of undefined length and sequence. It is by no means evident at this point what sequences or lengths are most useful, but we are initiating experiments to study the matter further.
In the more recent experiments, we have also included in our macromolecular test set some tRNAs. We were pleased to find that, like many proteins, these nucleic acids also could be favorably affected by the presence of certain small molecules. Thus we now know that the usefulness of this approach is not confined to proteins alone, but extends to viruses and nucleic acids as well.

Comments on the evaluation of crystallization experiments
In evaluating the effects of reagents or physical factors on the crystallization of macromolecules, proper design and scoring of the experiments, the crystallization trials, is obviously important. Unfortunately, no standard or generally accepted approach to such experiments has developed, and the literature is becoming cluttered with ad hoc experiments and, occasionally, meaningless observations that prove of little value. We are not proposing here any rigorous experimental design for crystallization experiments, but would like to call attention to several considerations, and in particular to how the results might best be scored, or evaluated. We would like to suggest a few general guidelines that might be useful in increasing the utility of crystallization experiments, and encouraging the experiments to yield the most in terms of value. The ideas presented here have emerged from the recent experiments described above, and previously [4], along with subsequent experiments using an expanded range of small molecules. Those experiments, in total, involved tens of thousands of crystallization trials, deployed both manually and with robotic systems, and on the order of a hundred different proteins, nucleic acids, and viruses.
(1) It is almost always essential to use a large set of test macromolecules if one is to generalize the results. If findings are presented for one or a few proteins, or for a group of closely related proteins, then it is difficult to know whether the results apply to an individual, or a small subset of proteins, or are meaningful in a broader sense. In general, we feel that between 20 and 30 carefully selected proteins give a good idea of the generality of the results. In our experiments, we tended to use much larger protein test sets, 65-80 macromolecules, but these were probably more than was necessary. Some provided very little information, and often the results for a smaller subset were well representative of the whole.
(2) The best proteins are those that are known to be crystallizable, but are difficult to crystallize. We can speak of them as fastidious in this respect. If crystals of a particular protein have never been obtained, and that continues to be the case in spite of the efforts of the investigator, then it might very well be that some inherent feature of the macromolecule prohibits success, regardless of the efficacy of the reagents or physical factors being examined.
(3) Proteins that crystallize under a very wide range of conditions, i.e. they form crystals no matter what the pH, precipitant, temperature, additives, etc., are similarly poor in information content. One doesn't know from these whether the reagents or factors being tested were effective or not. Proteins that fall into this category, however, are not useless. As pointed out below, even proteins that crystallize promiscuously may serve as important signals of the effects of specific factors.
(4) When proteins are tested in the presence of a particular reagent by setting them up using some standard crystallization screen (generally commercially available), then the number of successful trials, or "hits," obtained with that screen is likely an unreliable measure. The results may reflect the redundancy of components in the screen, the fickle nature of the protein, the stochastic nature of the crystallization process, or simple technical variations. The argument that inclusion of the test reagent opens up new paths to crystal growth that were closed in its absence seems to us to be rather weak. It should only be given some attention when reproducibility can be clearly demonstrated.
(5) Reasonable controls, as for any experiment, are important. This is in great part why so many crystallization studies are difficult to evaluate. It is also necessary to demonstrate reproducibility, perhaps not between different preparations of a protein, but internally, within a single experiment. Because probability plays such an important role in nucleation, multiple, identical trials, parallel sample arrays, should also be used whenever possible. We have been surprised in our experiments, however, at the degree of duplication of results observed in identical samples. This we find encouraging.
(6) Evaluating the results of large arrays of crystallization trials is perhaps the most important part of the investigation. Again, it is important to emphasize that over time, as experience with a large set of test proteins accumulates, certain macromolecules tend to stand out as meaningful indicators of positive or negative effects. Horse hemoglobin and beef catalase, for example, are not very helpful; yeast hexokinase, rabbit aldolase, and rabbit serum albumin are.
When examining the results of a crystallization screen of many proteins, what does one look for? What is meaningful and what is not? We tend to use the following as strong and weak indicators.
(a) Crystals appear in only one, or a very limited number of trials, and they are completely absent in all other trials. Clearly, this is the strongest indicator that one can have, as it fundamentally fulfills the objective of the experiment, and the result is unequivocal (assuming, of course, that the crystals are verifiably protein).
(b) The second strongest indicator is the appearance of some unique crystal form (crystallographic unit cell) which is clearly different from that obtained in other samples. This is where promiscuous crystallizers are still very valuable. One may obtain crystals of a certain unit cell in virtually every trial, but one trial yields crystals of a distinctly different unit cell.
Clearly, this indicates that some reagent or factor particular to that specific sample had a profound impact on how the protein crystallized. Two examples are shown in Fig. 1. Bovine trypsin inhibited with benzamidine almost invariably crystallizes in a P2₁2₁2₁ unit cell, but in the presence of mellitic acid, it crystallizes in a trigonal (pseudo cubic) unit cell [4]. Similarly, thaumatin is almost always found in a tetragonal unit cell, but in the presence of a cocktail of certain small molecules, was grown in an orthorhombic unit cell.
(c) One may observe that masses of microcrystals, usually tiny needles or thin plates, are commonly obtained throughout a screen regardless of specific conditions, but that in one or a few samples significantly larger crystals are obtained. If this can be shown to be reproducible, then it is indeed meaningful.
(d) A different crystallographic unit cell may not be obtained, but a dramatic change in habit may. From the standpoint of X-ray diffraction analysis, this may be equally helpful. Needles or plates may achieve the three-dimensional form necessary for data collection. This too indicates that the specific factor for that sample has had a positive effect. One might not consider a deleterious effect, such as the transformation of polygonal crystals into fine needles, as helpful, but it does indicate that something interesting has happened. Just as fine tuning the dose of a toxin can have useful medicinal effects, optimization of the reagent or factor may do as well. In any case, it is an unusual occurrence that should be noted.
(e) Massive showers of microcrystals may be observed throughout a screen, but some trials may experience only very limited nucleation, with only a few larger crystals present. This is also a positive result. Similarly, crystals in general may exhibit obvious disorder, twinning, or other visible defects, but some samples do not. Most crystals of a particular protein may show poor or no birefringence under polarized light, while one or a few others have strong optical effects. Again, this is a favorable outcome and should not be ignored.
(f) More subtle effects are sometimes difficult to observe, but may be meaningful in terms of crystal quality. For example, crystals may appear relatively early after the screen is made, while most samples yield crystals only after a longer period. Such a result suggests that a reagent or other factor may promote nucleation. Even if crystals are not obtained at all, some observations may be meaningful. If precipitate, phase separation, or some other insoluble form predominates across an entire matrix of conditions, but one or two samples remain clear, then this may be a helpful pointer for maintaining the protein soluble.
(g) Finally, though it requires a considerably greater commitment of effort, the results of crystallization experiments may be evaluated using X-ray diffraction analysis. The measure here is a comparison of X-ray crystallographic properties such as resolution limit, mosaic spread of reflections, cryo-properties, or radiation sensitivity. For a large number of crystals in a large screen, this approach may not prove broadly practical, but it may be worthwhile in those instances where other, visible features of certain crystals suggest an improvement. One must bear in mind, however, that comparison of crystal quality, in a quantitative manner, has a substantial history of controversy (see, for example, the results of experiments conducted in microgravity [3]).
In summary, there are several guiding principles of experimental design and evaluation that emerge from these considerations. First, the set of test macromolecules should be large and carefully chosen, and contain as broad a distribution of types, characteristics, and functions as is possible. They should be carefully selected to yield the most information in the experiment, and broad enough to permit generalization of results. Second, it is unwise to use the number of successes in a common screen when investigating an individual reagent or physical factor. Third, scoring based only on the appearance or non-appearance of crystals is too rigid an evaluation approach and ignores much useful data and many meaningful observations. It is wise to look as well for unusual outcomes that stand out from the norm, even if those outcomes are awkward or difficult to express quantitatively.
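One way to keep such non-binary observations usable is to record more than a yes/no result for each trial and then flag whatever departs from the norm for that protein. The following is a minimal sketch under stated assumptions: the `Outcome` categories, the `Trial` fields, and the rule of comparing each trial with the most common result are all hypothetical choices made for illustration, not a scoring scheme proposed in the text.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum, auto
from typing import List, Optional


class Outcome(Enum):
    CLEAR = auto()
    PRECIPITATE = auto()
    MICROCRYSTALS = auto()
    CRYSTALS = auto()


@dataclass(frozen=True)
class Trial:
    well: str
    outcome: Outcome
    unit_cell: Optional[str] = None  # e.g. "tetragonal", if determined
    habit: Optional[str] = None      # e.g. "needles", "plates"


def unusual_trials(trials: List[Trial]) -> List[Trial]:
    """Return the trials whose result differs from the most common one."""
    if not trials:
        return []

    def signature(trial: Trial):
        return (trial.outcome, trial.unit_cell, trial.habit)

    common, _ = Counter(signature(t) for t in trials).most_common(1)[0]
    return [t for t in trials if signature(t) != common]


# Made-up screen of a promiscuous crystallizer: one well gives a new unit cell.
screen = [
    Trial(f"A{i}", Outcome.CRYSTALS, "tetragonal", "bipyramids")
    for i in range(1, 6)
]
screen.append(Trial("A6", Outcome.CRYSTALS, "orthorhombic", "plates"))
for trial in unusual_trials(screen):
    print(trial.well, trial.unit_cell, trial.habit)
```

Here the single well with a different unit cell and habit is singled out even though every trial would score identically under a crystals/no-crystals criterion.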

Fig. 1. In (a) are orthorhombic crystals of bovine trypsin plus benzamidine, which are commonly obtained from PEG 3350. In the presence of mellitic acid, however, the crystals of cubic habit seen in (b), which are formally trigonal, are reproducibly grown. In (c) is the most common crystal form of the protein thaumatin, which has a tetragonal unit cell. In the presence of a mixture of anthrone/N-(2-acetamido)-2-aminoethanesulfonic acid/Congo Red, the orthorhombic crystals seen in (d) are obtained.