A guide to the crystallographic analysis of icosahedral viruses

Determining the structure of an icosahedral virus crystal by X-ray diffraction follows very much the same course as conventional protein crystallography. The major differences arise from the relatively large sizes of the particles, which significantly affect the data collection process, data processing and management, and later, the refinement of a model. Most of the other differences are due to the high 5 3 2 point group symmetry of icosahedral viruses. This dramatically alters the means by which initial phases are obtained by molecular substitution and extended to higher resolution by electron density averaging and density modification, as well as the refinement of the structure in the light of high non-crystallographic symmetry. In this review, we attempt to lead the investigator through the various steps involved in solving the structure of a virus crystal. These steps include the purification of viruses, their crystallization, the recording of X-ray diffraction data, and their reduction to structure amplitudes. We further address the problems attending phase determination and ultimately the refinement of a model. Finally, we describe the unique properties of virus crystals and the factors that influence their physical and diffraction properties.

On the left is a ribbon image of the STMV where protein subunits surrounding each of the 12 pentameric vertices are coloured differently so that the symmetry elements of the particle are more pronounced. STMV has a diameter of about 18 nm. On the right is a backbone image of TYMV, a T = 3 icosahedral virus. The pentameric capsid proteins are in yellow and the two different conformations of the pseudo-hexameric capsid protein subunits are in blue and green. The diameter of TYMV is about 30 nm.
Figure 2. The simplest icosahedral viruses, like STMV shown here, are comprised of a protein capsid having T × 60 subunits; here T = 1. The virion has been separated along a fivefold axis so that the capsid (shown in ribbon representation) is divided into two parts exposing the RNA (shown in tubular form) within.
Virus capsids are organized to exhibit 5 3 2 point group symmetry. Inside the capsid are one or more strands of either RNA, usually single-stranded, or DNA that code for the capsid protein amino acid sequence, and for some other enzymes that might be required for replication. The peculiar feature of viruses is that they cannot replicate except with the aid of host cell enzymes.
PRD. [9] Host cells, which serve as the sources for the viruses, are highly varied and may come from the plant, animal, or microbial kingdoms.
An excellent general source book for macromolecular structure determination is volume F of the International Tables for Crystallography. [10] Although not strictly dedicated to virus crystallography, so many of the methods, instruments, and strategies are the same that it also serves as an invaluable guide if the objective is a virus crystal structure. Specific areas of virus structure analysis using X-ray methods are found in other articles in preparation for this and following volumes of Crystallography Reviews. General treatments for the X-ray analysis of macromolecules are also easily accessible. [11][12][13][14]

Crystallization
The crystallization of biological macromolecules of all kinds, proteins, nucleic acids, assemblies, and complexes has been addressed and reviewed extensively. [58][59][60][61][62][63][64] No attempt will be made to repeat that here as it is unnecessary. The question arises, however, as to whether there are any obvious or even suspicious differences between the crystallization of viruses and other biological macromolecules. Indeed, one can find such differences only with difficulty. For the most part the crystallization of viruses and the approaches to accomplishing it are no different than for other macromolecules and their complexes. Indeed, the appearance and mechanical properties of virus crystals, like those shown in Figure 3, give no suggestion that they differ in any respects from more conventional protein crystals.
A review of successful virus crystallization conditions (Table 1) would show that the same variety of precipitating agents (ammonium sulphate, sodium phosphate, PEGs, methylpentanediol (MPD), etc.) were used with viruses as with most crystalline proteins (Protein Data Bank (PDB) [65]). The pH of successful crystallization experiments generally tends towards the acid side of neutrality, likely reflecting the greater stability of most viruses there, and the tendency to swell at higher pH. Use of detergents, even non-ionic detergents, is essentially absent and specific additives [61,66] are rare. In addition, the most common methods for producing supersaturation, hanging and sitting drop vapour diffusion, microdialysis, batch, and free interface diffusion, [58,63] predominate. The only obvious difference is that the virus concentration in virus crystallization trials is usually lower, in terms of milligrams per millilitre, than for most proteins. Generally 3-5 mg ml −1 is sufficient to produce diffraction size crystals of viruses, whereas concentrations four times that, or more, may be required for proteins.

Table 1. Crystallization precipitants for viruses solved by X-ray diffraction as reported in the PDB.
Precipitant | Number of crystals | Precipitant amount
(1) PEGs < 1000 | 11 | 20-30% w/v most common
(2) PEGs 1000-8000 | 62 | See Figure 4 for % distribution
(3) PEGs > 8000 | None |
(4) Ammonium sulphate | 10 | 1.5-2.5 M most common
(5) Other salts | 18 | 1.0-2.0 M most common
(6) MPD and hexanediol | 5 | 10-50% v/v
(7) Ethanol or propanol | 2 | 15-25% v/v
(8) Numerous virus crystals obtained by adjustment of pH; no reported precipitant

Examination of crystallization conditions detailed for 110 unique crystals in the PDB shows that the vast majority, about 62%, were obtained with PEGs having molecular weights between 1000 and 8000. In many of the PEG dependent mother liquors, some salts were also included, often at appreciable concentrations of 0.2-0.3 M. As further shown by Figure 4, PEG concentrations were low, generally between 1% and 5%, in comparison with those used for more conventional protein crystallizations, which tend to be around 10% and higher. [67] An additional 11 viruses were crystallized using PEGs of molecular weight less than 1000, along with 5 others crystallized from MPD or hexanediol. For these crystallizations, the range of effective PEG (or MPD) concentrations was 14-30% w/v (with a single virus at 50% MPD). Two viruses were crystallized from ethanol (14-17%) and isopropanol (24%). Viruses have been crystallized over a wide range of pH from 3 to 9, although the majority are around neutrality.
As illustrated by the histogram of Figure 5, however, the distribution clearly favours the low side of pH 7, again likely reflecting the greater stability of many viruses at acidic pH, and their tendency to become less uniform in structure at alkaline values. [47] Of the virus crystals, 25-30% were grown using some salt as the principal precipitating agent (though when combined as well with some low concentration of PEG, it is difficult to make a distinction). Ammonium sulphate was most commonly used ( ∼ 12%) and over a concentration range of 0.5-2.5 M, with most successes clustered about 2 M. In addition to ammonium sulphate,  other salts used to promote crystallization were sodium salts of acetate, chloride, and formate; ammonium salts of phosphate, acetate, and formate; and lithium sulphate. These were used in concentrations ranging from 1 to 3.5 M with most successes clustering about 2.5 M. It should also be noted that some virus crystals were obtained from relatively low ionic strength buffers simply by adjusting the pH to a minimum point of virus solubility.
As with some conventional protein molecules, but perhaps more so because of their high symmetry, individual viruses may crystallize in a diverse variety of crystallographic unit cells. This is particularly true as the pH is varied (see, for example, the crystals of BMV in Figure 6), or the precipitant is varied between different salts, or changed from, for example, ammonium sulphate to PEG. The polymorphs may have widely different solvent contents and degrees of order, and may vary widely in their diffraction properties. Sesbania mosaic virus, for example, crystallizes in seven distinctly different unit cells according to the PDB, and human rhinovirus crystallizes in at least six forms.
Figure 6. An example of multiple crystal forms of a virus is provided by the four crystals of BMV shown here, each having a unit cell different in symmetry and dimensions from the others. The crystals were grown under similar conditions, but the pH of the mother liquor was varied over the range 5.5 to 7.5.
Figure 7. Multiple crystal forms are common with virus crystals, as with most macromolecular crystals, reflecting a lattice maintained by a large array of weak intermolecular interactions. Three crystal forms of STMV are shown. From left to right, the forms are cubic, monoclinic, and orthorhombic. The largest dimension of the crystal in each case is between 1 and 1.5 mm, very large by most macromolecular crystal standards.
There is one particularly challenging requirement in virus crystallization (there may, of course, be more for a specific virus): the crystals for diffraction analysis must be relatively large. Optimally, they would be of the sizes of the STMV crystals seen in Figure 7. While micro-beams, high-intensity synchrotron radiation, cryopreservation, and other technological advances have reduced the necessary dimensions of protein crystals to a few tens of microns, this is not usually true for virus crystals. One must still attain virus crystal dimensions measured in fractions of a millimetre (sometimes large fractions). This requirement stems from the very large unit cell dimensions of virus crystals, which may range from 200 Å to over 1000 Å, and the large asymmetric units, which will always be some significant fraction of the entire virus, and sometimes the entire particle. Because the average intensity of reflections is inversely related to unit cell volume and asymmetric unit size, the crystals must be large just to produce intensities strong enough to be accurately measured. Crystallization trials of nanolitre volume, often the standard when robotic methods are used for conventional protein crystallization, are, therefore, seldom of value for viruses except possibly to identify initial conditions.
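The size requirement can be illustrated with a rough sketch. It assumes, as a simplification that is not taken from the text, the usual kinematic scaling in which the mean intensity per reflection is proportional to the illuminated crystal volume divided by the unit cell volume, with beam, exposure, and composition held fixed. The function names and the example dimensions are hypothetical.

```python
def relative_mean_intensity(crystal_edge_um, cell_edge_A):
    """Relative mean reflection intensity for a cube-shaped crystal with a cubic cell.

    Assumption (illustrative only): <I> scales as V_crystal / V_cell.
    The result is in arbitrary units and is only meaningful as a ratio
    between two crystals computed with the same function.
    """
    v_crystal = crystal_edge_um ** 3
    v_cell = cell_edge_A ** 3
    return v_crystal / v_cell


def crystal_edge_for_equal_intensity(ref_crystal_edge_um, ref_cell_edge_A, cell_edge_A):
    """Crystal edge needed to match the mean intensity of a reference crystal.

    Setting V_crystal / V_cell equal for both crystals gives
    edge = ref_edge * (cell_edge / ref_cell_edge).
    """
    return ref_crystal_edge_um * cell_edge_A / ref_cell_edge_A


# Hypothetical comparison: a 30 micron protein crystal with an 80 A cell
# versus a virus crystal with a 400 A cell.
needed = crystal_edge_for_equal_intensity(30, 80, 400)
print(f"virus crystal edge for equal mean intensity: {needed:.0f} microns")
```

Under this assumption the required crystal edge grows linearly with the cell edge, which is consistent with virus crystals needing dimensions of fractions of a millimetre where tens of microns suffice for a small protein cell.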

Crystal considerations
For roughly the past 20 years, most X-ray diffraction data have been recorded from protein crystals, and complexes such as ribosomes, that have been flash-cooled in liquid nitrogen and preserved in a cryo-stream during exposure. [68,69] This was done both to gain advantage (reduced scaling, less radiation damage [69,70]) and, often, out of practical necessity. Virtually all of those data, particularly the high-resolution data, were collected at synchrotron sources which supplied very high flux density X-ray beams. [71] Crystals simply could not withstand the radiation doses for any useful period of time in the absence of cryo-cooling. Combination of flash-cooling with very high-intensity sources also meant that crystals of progressively smaller sizes could be used for data collection, a further advantage since it relaxed requirements in the crystal growth phase of a project.
It appears that investigators have frequently encountered difficulties in freezing virus crystals. As discussed above, however, unit cell considerations do not allow for small virus crystals as are now common for most proteins. They just do not produce sufficiently high-intensity X-ray reflections for accurate measurement. Thus, even if virus crystals can be flash-cooled, they still must be large crystals. The problems in freezing virus crystals have been variously attributed to their high solvent content, the large volume of solvent within the particles themselves, and the large interstitial spaces between particles in the lattice. These explanations, though possibly contributors, are probably not sufficient.
Atomic force microscopy (AFM) studies of various icosahedral plant virus crystals (STMV, BMV, TYMV, PMV), in situ, during growth, suggest another explanation. [72][73][74][75] AFM analyses indicate that virus crystals have a relatively high density of defects and that the defects include the incorporation of large foreign particles, misoriented microcrystals, anomalous virus particles and lattice vacancies. These produce more or less localized disorders, and tolerable disruptions to the lattice, in so far that growth continues. More importantly, they also include large numbers of stacking faults or planar defects. Interestingly, no screw dislocations have yet been observed in virus crystals, though they are common in protein and conventional crystals. [58,61,73,76] The planar defects, which subdivide the virus crystals into sectors, or domains, are responsible for their mosaic character (see below). The stacking faults also serve as tributaries and reservoirs in the lattice where solvent accumulates and flows. By way of analogy, when winter comes and the cracks in a pavement fill with rain or snow that subsequently freezes to ice, the cracks expand and eventually the concrete may shatter. The same process likely operates in macromolecular crystals that contain both large amounts of solvent [58,77,78,79] and a high density of planar defects. This is the more likely explanation for the problems that arise in cryo-preservation of virus crystals.
Virus crystals are additionally susceptible to damage from freezing because, as discussed above, they must be large in size. The number of defects as well as their extent is proportional to crystal volume. In addition, defects produce both local and long-range perturbation and strain within crystals, and the accumulated lattice strain is also (probably in a nonlinear manner) proportional to volume. The lattice of large crystals always experiences greater stress than that of small crystals. [80,81] The end result is that virus crystals, by virtue of their size, have high defect densities, high solvent content, and an elevated degree of lattice stress baked into them. It is not surprising then that the trauma inflicted by sudden exposure to cryogenic temperatures causes severe disruption, cracking, shattering or, at the least, a significant increase in mosaicity.
In passing it should be noted that the most common technique used in flash-cooling macromolecular crystals [68,69] is to select a crystal from its mother liquor with a small fibre loop, pass the looped crystal rather quickly through a cryo-preservative solution (e.g. 20% glycerol or ethylene glycol) and then plunge it into liquid nitrogen. The common assumption with this procedure is that damage to crystals results primarily from the freezing of solvent (water) about the surface layers of the crystals. That is, an ice shell forms about crystals that compresses and crushes them. Cryo-preservative solutions are intended to eliminate that shell. Passing a crystal quickly through a cryo-preservative, however, may not allow diffusion of the cryo-protectant into the crystal and the replacement of the water in the defects and vacancies. These may drain or exchange more slowly. Water, upon freezing, will then cause the crystals to crack at domain boundaries.
With virus crystals it may be advantageous to expose them for longer periods to the cryo-preservative before freezing. In addition, anything that can be done to reduce defects, such as enhanced purification, should be undertaken. In spite of the difficulties and many failures, a substantial number of virus crystals have been successfully frozen for X-ray data collection. In most cases, rather complex concoctions of cryo-preservatives have had to be formulated and tested by trial and error. Some examples are STMV, [69,82] BMV, [52] PMV, [46] and TYMV. [83] Crystals of smaller, T = 1, viruses such as STMV and SPMV have proven easier to flash-cool, and this follows from their smaller unit cell dimensions and the arguments presented above. T = 3 viruses have shown themselves to be more challenging, and viruses of even larger sizes and greater T numbers the most difficult of all.
If virus crystals cannot be frozen, and it is well worth extensive efforts to successfully freeze them, then the only recourse is to record X-ray data at room temperature. Conventional X-ray sources that allow longer lifetime in the X-ray beam, and that are commonly sufficient for high-resolution data collection on conventional protein crystals, generally do not provide sufficient intensities for virus crystals of T > 1 particles or unit cell dimensions exceeding about 250 Å. Hence, to acquire good X-ray data, it usually is essential to employ synchrotron sources. The trade-off is that synchrotron sources provide measurable reflections, but the crystals, at room temperature, suffer severe radiation damage and deteriorate rapidly.
The investigator must do the best possible under these circumstances. For fairly robust T = 3 viruses, we have found that up to about 6 minutes of total exposure time to radiation produced by most second-generation synchrotron sources can be used to obtain still useful data before the crystal is exhausted. This interval may allow 2-4 exposures, usually about 0.5° of rotation each, to be recorded. The last exposure in the set is of course always questionable and has to be evaluated with care, but this concern is lessened somewhat when a large number of exposures are recorded and scaled.
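The practical consequence of so few exposures per crystal can be estimated with a short sketch. It takes the figures quoted above (about 0.5° per exposure, 2-4 exposures per crystal) and assumes, purely for illustration, that wedges from different crystals tile the required rotation range without overlap and that every crystal yields usable frames. The function name, the redundancy factor, and the rotation ranges are hypothetical.

```python
import math


def crystals_needed(total_rotation_deg, oscillation_deg=0.5,
                    frames_per_crystal=3, redundancy=1.0):
    """Estimate how many crystals are needed to cover a rotation range.

    Assumptions (illustrative only): no overlap between wedges, no failed
    crystals, and a uniform number of usable frames per crystal.
    """
    # Total number of exposures required, including any desired redundancy.
    frames = math.ceil(total_rotation_deg * redundancy / oscillation_deg)
    # Each crystal contributes only a few frames before it is exhausted.
    return math.ceil(frames / frames_per_crystal)


# Hypothetical cases: 90 degrees of data at 3 frames per crystal,
# and 180 degrees at 2 frames per crystal.
print(crystals_needed(90))
print(crystals_needed(180, frames_per_crystal=2))
```

Even under these optimistic assumptions the count runs to tens or hundreds of crystals, and real experiments, in which many crystals fail or frames are rejected during scaling, will need more.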
The overall objective is to collect as many acceptable exposures (frames of data) as possible, scale them together, and then eliminate those that contribute more error than information. The saving grace with this approach is that scaling of many exposures collected at room temperature is surprisingly good. Data clusters or sets collected at cryo-temperatures from multiple crystals, on the other hand, scale poorly, if they can be scaled at all. Another positive consideration is that virus crystals usually yield an abundance of independent reflections in proportion to the size of the molecule (usually the capsid protein) that must be solved, because of the non-crystallographic symmetry (NCS) inherent in the virus particle.
For room temperature data collection, crystals must be mounted in what was once called 'the conventional manner' in sealed glass or quartz capillaries. [62,84,85] The art of mounting crystals in this way requires, in addition to 'grace under pressure', patience and skill. The art has been nearly lost over time due to the successes of cryo-crystallography, but may be experiencing a revival. In any case, proficiency can be acquired through practice. If it is the only way to obtain the X-ray data, then that is usually sufficient inducement to merit the commitment.
A somewhat simpler method that has more recently appeared is to mount or secure the crystal in a fibre loop, as for cryo-crystallography, but then cover the loop and crystal with an envelope of thin, transparent plastic (MiTeGen, Ithaca, NY). This can work as well as capillary mounts, but due to slow loss of water the arrangement is only useful for data collection for four to six hours. This, however, is ample time to record several frames of data on a synchrotron source at room temperature. Indeed, because a virus crystal at room temperature can only tolerate a few minutes of exposure to such a source, in total, it may not even be necessary to enclose the crystal in an envelope. One can simply trap a crystal on a loop, align it in the X-ray beam and collect a cluster of frames before significant dehydration occurs. This has been done in a few cases, but it does entail risk.
Room temperature data collection of virus crystals carries an additional implication. Because success depends on the scaling and merging of reflections from many small data sets, it means that many large crystals must be grown. With flash-cooling, all data can conceivably be acquired from a single crystal. Growing many large crystals of any macromolecule is a daunting, formidable challenge, but one which has been overcome by many investigators. The structure determination of the bacteriophage HK-97, which required 720 separate crystals, is one such heroic example. [8,86]
The discussion above would suggest that virus crystallography presents some unique problems at the crystal growth stage, and this is largely true. There is one distinct advantage that icosahedral viruses have, however, that appreciably eases issues at the crystallization stage, and that is their high symmetry. It has been observed [62,87,88] that symmetry in macromolecules tends to advance the probability of crystallization. Indeed, vast numbers of symmetrical protein oligomers have been crystallized and their structures determined. In a majority of cases, symmetry elements of the oligomer or particle were incorporated entirely, or at least in part, into the ultimate crystallographic symmetry. There is now little argument against the fact that symmetry promotes crystallization. [89] Icosahedral virus particles ([2][3][4][5][6][90]; see also the article by DLD Caspar in preparation for a future issue of Crystallography Reviews) have exact 5 3 2 point group symmetry relating their 60 identical asymmetric units (coat protein subunits in the case of T = 1 viruses). Although the fivefold symmetry cannot be incorporated into the space group symmetry of a crystal, twofold and threefold symmetry elements can. Thus, icosahedral viruses are often centred on crystallographic special positions, on twofold or threefold axes, and at 23, 222 or 32 symmetry points.
Icosahedral viruses of triangulation number T > 1 possess quasi-symmetry elements as well, such as quasi-sixfold axes. [2,5,6] Quasi-symmetry cannot be incorporated into the space group symmetry, but there is some likelihood that the periodic nature of the quasi-symmetry and the isotropic shape of the overall particle may contribute to favourable and repetitive lattice interactions. The end result is that, in comparison with most other biological macromolecules and assemblies, viruses are fairly easy to crystallize once they have been obtained undamaged and in a pure form.

AFM analysis of virus crystals
In addition to earlier studies [91,92] of virus crystallization using quasi-elastic light scattering (QELS), extensive AFM studies have also been carried out on virus crystals in situ. [93] Viruses proved to be particularly valuable as samples in QELS investigations, because their large size meant that they produced a strong scattering signal, as did their aggregates. This was especially true in studies of prenucleation and nucleation events. Viruses were equally valuable in AFM studies because their size allowed them to be seen as single particles and their incorporation into crystal lattices directly visualized. Figure 8 provides examples of AFM images of the surfaces of several T = 1 and T = 3 virus crystals where the individual particles composing the lattice are clearly defined. Figure 9 shows that in some cases, like that of TYMV, individual pentameric and hexameric capsomeres composing the capsids of single particles could be observed. Thus, growth kinetics, defect formation, and other features of crystal growth could be visualized at  essentially molecular resolution. This is not generally possible with most proteins, though there are some exceptions. [94] AFM investigations, focused primarily on STMV, BMV, TYMV, cucumber mosaic virus (CMV), and PMV, have been reviewed [2,5,6,74,75,95] and thus need not be repeated here. A few observations from that work are, however, worth recounting. The observations, to some extent, emphasize those features of virus crystals that discriminate them from other macromolecule crystals, and certainly, small organic molecule crystals.
Virus crystals grow from solutions, as do most conventional organic molecule crystals and all other macromolecular crystals, by what is classically referred to as sequential layer addition. Layer addition in the face normal direction relies on the generation of terraces and growth steps by two-dimensional nucleation and/or by spiral dislocations. [80,81,96,97] It pictures the ordered addition of individual molecules at the resulting step edges, by tangential growth, at a rate determined by the level of supersaturation. The only really distinctive difference between virus crystals and protein crystals that has been observed by AFM is that the generation of growth steps by spiral dislocation does not appear to be a growth mechanism for virus crystals. Face normal growth seems to be exclusively due to two- and three-dimensional nucleation on existing surfaces. [58,98] Virus crystals, because of the large growth step heights at the advancing edges (Figure 10), incorporate vast amounts of impurities into their lattices. That is, the growth steps, as they move across the surfaces of crystals, singly or in step bunches, [99] sweep everything before them, like great waves, into the channels and interstices between particles. This has been seen to include fibres of various sorts (Figure 11), [100] dust particles, [101] misoriented microcrystals (Figure 12), [102,103] and as seen in Figure 13, anomalous and mutant particles.
Figure 10. Virus crystals develop, as do almost all crystals grown from solution, by the sequential addition of planes of ordered molecules to their surfaces. The planes expand laterally from an initial island, or two-dimensional nucleus on the surface, by incorporation of virus particles to the edges of the expanding planes, the so-called step edges. [80,81] Here step edges are seen on a variety of growing virus crystals.
A particularly striking case was recorded for the lattice of a monoclinic crystal of PMV, a T = 3 plant virus of 30 nm diameter. Preparation of this virus required separating PMV from its satellite SPMV (T = 1, 17 nm diameter) by PEG fractionation. As a consequence, SPMV remained a prominent contaminant in PMV crystallization mother liquors. In Figure 14, virions of SPMV can be seen incorporated into the crystal lattice of PMV in the interstitial spaces between PMV virions. [53,104] Another interesting case is typified by BMV. [32] In Figure 13, the lattice of a BMV crystal is seen to contain not only normal 30 nm diameter BMV virions, but occasionally, distinctly larger, anomalous virus particles of greater diameter. The lattice of an orthorhombic STMV crystal in Figure 15 illustrates another common phenomenon in which there are frequent absences in the lattice (called vacancies) of single particles, clusters, and lines of particles in the lattice. The images demonstrate that in STMV crystals, as in most other macromolecule crystals, some unit  cells, perhaps as many as several per cent, remain unoccupied. In spite of these defects, the crystals diffract to unusually high resolution.
In Figure 12, also AFM images of an STMV crystal, it can be seen that during the course of its growth, microcrystals, presumably having formed spontaneously in the mother liquor, sediment on the surfaces of a larger, growing crystal. These too are incorporated, misoriented as they are, into the larger crystal. Thus, we see that virus crystals may be inordinately permeated with a wide array of different impurities that exceed, probably by several orders of magnitude, the quantity consumed by conventional crystals, and even protein crystals. [102] It might be expected that the extensive impurity incorporation observed by AFM would gravely interfere with crystal growth and even cause it to cease. It would certainly do so for conventional crystals. It might seem extraordinary, in fact, that large virus crystals can even be obtained. They do, nonetheless, grow to large dimensions because the lattices of virus crystals appear to be unusually forgiving. They can absorb extensive insults and offenses and simply grow around them. Defects (see below) are created as consequences of impurity incorporation, but these too fail to prevent further growth from proceeding. Apparently this is due to the plastic nature of virus crystals, likely a consequence of particle elasticity and size, the high solvent content of the crystals, and the large spaces between particles in the lattice. As noted above and illustrated by STMV and BMV ( Figure 15), lattices can exhibit point defects and vacancies, and even line defects due to strings of vacancies. These defects are relatively innocuous, localized, and while the absent lattice points fail to contribute to the Bragg scattering, any damage that is consequential to diffraction is limited. Similarly, incorporation of anomalous particles results in some local disorder, as seen in Figure 13 for BMV, but it too is fairly restricted and the effects are not serious. 
Because no spiral dislocations are apparently present in virus crystals, likely because of the large step heights, there are no long distance line defects passing through crystals along screw dislocation axes.
What is found in relative abundance in virus crystals are stacking faults and planar defects of various kinds. These arise when separate growth terraces and planes (from two- and three-dimensional islands) encounter one another on the developing layer and their step edges fail to merge and knit in a flawless manner, as seen in Figure 16. That is, there is some vertical displacement of a fraction of the step edge height between apposing steps. When this occurs, uniform forward advancement of step edges (Figure 10) is disrupted and redirected, as seen on the surface of an STMV crystal in Figure 17. A consequence of this is that vertical dislocations are created between the molecules and unit cells comprising one expanding plane and those of others. The net effect is to create mosaic blocks within the crystal (Figure 18) and produce a spread in the Bragg angles for reflections, and therefore in the width of observed intensities. [71,105] In some virus crystals, such as STMV [72] or CMV, [106] the planar defects are very common and the crystals exhibit what we call a high defect density (Figure 18), orders of magnitude greater than for conventional crystals. This elevated defect density is likely a major constraint on the resolution of the diffraction patterns yielded by virus crystals, which, it might be noted in passing, varies for the diversity of virus crystals over a wide range (Figure 19).

Preliminary X-ray analysis of virus crystals
The symmetry properties of icosahedral viruses have been dealt with extensively in the literature [2,3,5,6,107] and will be further discussed by DLD Caspar in a review in preparation for a future issue of Crystallography Reviews, so no attempt will be made to comprehensively review that here. A few points relating to crystallographic analyses are, however, appropriate. Icosahedral viruses are cubic solids, the highest of the Platonic solids, and they may also be described as dodecahedra. The two are complementary solids and both exhibit the same 5 3 2 symmetry. One can be inscribed within the other so that faces in one become vertices in the other, and vice versa. Some icosahedral viruses may in fact actually have the shape of a dodecahedron. As cubic solids they are isotropic and exhibit identical optical properties independent of direction.
Although the particles are isotropic, they can form crystals that do not exhibit isotropic properties, that is, monoclinic, orthorhombic, rhombohedral, etc., and those crystals, having different refractive indexes for different crystallographic directions, can exhibit optical effects with polarized light, including birefringence and extinction. [85,108] Because the particles making up the crystals are isotropic, however, the optical effects of virus crystals are very weak. Birefringence, for example, is dependent upon the product of the difference in the refractive index in two directions with the thickness of the crystal. To obtain any strong birefringence, therefore, it is necessary to have a large, thick virus crystal, a good fraction of a millimetre in thickness. As with all other crystals, no birefringence or extinction is possible if the virus crystal itself has cubic symmetry, or if a crystal of lower symmetry (e.g. rhombohedral and tetragonal) is viewed along an optical axis (i.e. threefold and fourfold). The end result is that optical analysis of virus crystals usually yields little reward.
As noted above, when icosahedral viruses crystallize, some of their cubic symmetry elements may be incorporated into the space group symmetry of the crystal. Thus, they are often situated on crystallographic two- or threefold axes, or at special symmetry points. [109] It is, further, not uncommon for icosahedral viruses to crystallize in cubic unit cells and reside on 23 symmetry points that thereby yield the smallest possible asymmetric unit size in terms of protein subunits. Asymmetric units of icosahedral virus crystals may also be the entire virus (or even multiple n particles, e.g. PMV [46]) in which cases there are T (60) or nT (60) protein subunits as the crystallographic asymmetric unit. If residing on a rotation axis or special position, then the asymmetric unit size will be T/2 (60) if on a twofold axis, T/3 (60) on a threefold axis, T/4 (60) if on a 222 symmetry point, T/6 (60) if on a 32 symmetry point, and T/12 (60) if positioned on a 23 symmetry point. Because of the inherent symmetry of the particles, crystallographic space groups of relatively high symmetry are more common for virus crystals than for proteins and nucleic acids, but low symmetries, including P1, also frequently occur (http://viperdb.scripps.edu/ [109]). High symmetry is to be preferred because it substantially simplifies data collection and data management, and it fixes orientation so that the entire analysis proceeds with less ambiguity (see below). Lower symmetry, on the other hand, means that there is greater opportunity for particle averaging in the phasing and analysis stages of a structure determination.
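The subunit counts listed above follow from dividing the 60T subunits of a particle by the order of the crystallographic site symmetry it occupies. A minimal Python sketch of that arithmetic is given here; the function name and the lookup table are illustrative assumptions, not part of any crystallographic package.

```python
# Order (number of operations) of the crystallographic point symmetry
# at the position occupied by the particle centre.
SITE_ORDER = {"1": 1, "2": 2, "3": 3, "222": 4, "32": 6, "23": 12}


def subunits_per_asymmetric_unit(t_number, site="1", particles=1):
    """Protein subunits in the crystallographic asymmetric unit.

    A capsid contains 60*T subunits. A particle centred on a site of
    order n contributes 60*T/n of them; `particles` covers the case of
    several independent particles (hypothetical parameter name).
    """
    total = 60 * t_number * particles
    return total // SITE_ORDER[site]


if __name__ == "__main__":
    for site in SITE_ORDER:
        print(site, subunits_per_asymmetric_unit(3, site))
```

For a T = 1 particle on a 23 symmetry point the function returns 5, which is the minimum of 5T subunits imposed by the non-crystallographic fivefold axes.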
There are 230 crystallographic space groups in total, but only 65 are possible for strictly chiral asymmetric units as is the case with biological macromolecules. Many of the possible 65, however, have not been observed for icosahedral viruses. As Table 2 shows, for 163 unique virus crystals in the PDB that served for structure determination, only 26 space groups are represented, and 9 of these only a single time. Fifteen of the 26 space groups were observed three or fewer times. The most frequently observed space groups were P2₁ and I222, which together account for 21% of all observations. When the next six most frequent space groups, C2, P1, I23, P2₁2₁2, H3, and P2₁2₁2₁ (by a large margin the most generally observed space group for globular proteins), are included, the top eight symmetries account for 113/163 or 69% of all space groups for virus crystals. As noted already, while cubic space groups are relatively rare for biological macromolecule crystals, for virus crystals they account for 35 observations or 21% of the total.
If a particle is centred on a 32 or 23 symmetry point, then the directions of all icosahedral axes are specified. If it lies on a twofold or threefold crystallographic axis, however, then the directions of remaining particle axes must be determined to fix the orientation of the virion in the unit cell. If the virus is centred on a 222 symmetry point, then this could correspond to either of two possible orientations for the particle and, again, that ambiguity must be resolved (see below).
Figure 20. If a portion of the icosahedral array of a T = 3 virus is projected on to a plane then the interactions between adjacent A, B, and C conformers of the capsid subunits can be visualized, as it is here for the protein lattice of PMV. The numbers attached to the letters refer to the icosahedral symmetry operators that generate the subunit from the reference subunits A1, B1, and C1.
Because the fivefold axes of an icosahedron can never be crystallographic symmetry elements, there must always be at least 5T protein subunits in the asymmetric unit. This further implies that for any icosahedral virus crystal there will always be the opportunity for at least fivefold averaging within the crystallographic asymmetric unit to be exploited in phasing. For any T > 1 virus, however, the subunits fall into T conformational classes [2] and the T subunits do not have strictly identical conformations, though their amino acid sequences are generally the same, nor do they have identical environments. The T subunits in the icosahedral asymmetric unit are then described as being quasi-equivalent. For example, for a T = 3 particle, there are T (60) = 180 subunits, but these contain equal amounts of three quasi-equivalent variants generally denoted subunits A, B, and C (Figure 20). The three protein subunits must be treated in the analysis as different proteins.
The smallest icosahedral viruses have, of course, the lowest triangulation numbers T, such as T = 1, T = 3, T = 4, and T = 7. Beyond T = 7 the virions are generally too large to be addressed by single-crystal X-ray diffraction, though not, evidently, by cryo-electron microscopy (article in preparation for Crystallography Reviews by Veesler and Johnson, and also the review by Baker et al. [106]). For T > 1, the principle of quasi-equivalence, in which the icosahedral asymmetric unit is composed of multiple, identical proteins, comes into play. Virus capsids can also have asymmetric units composed of multiple, non-identical polypeptides, or protein subunits, which results in what are termed 'pseudo-quasi-equivalences', in which case the symbol T is usually replaced or preceded by a p, for example pT3 or p3 [1,4,5,110,111] (http://viperdb.scripps.edu). Poliovirus [112] and rhinovirus [113] are prominent examples of viruses with 'pseudo-T3 quasi-equivalence' symmetry because of their multiple and distinct capsid polypeptide chains. The presence of 'pseudo-quasi-equivalence', though suggesting ominous complications, can also be precisely characterized and does not, in fact, make structure solution significantly more difficult. It simply enlarges the size of the icosahedral asymmetric unit according to the multiple polypeptide chains.
Preliminary X-ray diffraction analyses need not be carried out using a synchrotron X-ray source or with the most advanced detectors because fairly low-resolution reflections are usually sufficient to allow determination of the unit cell parameters and the space group symmetry. Screw axis ambiguities and specification of the number of virus particles per unit cell (Z) are usually absent or are straightforward to resolve. This may not, on the other hand, be true for conventional protein crystals. Monoclinic and rhombohedral unit cells are notorious among crystallographers for their tendencies to twin. [114,115] Thus, the investigator is advised to keep a wary eye on that possibility.
To determine the true diffraction limit of a crystal of any macromolecule, including virus crystals, the crystals must be examined by X-rays at room temperature as well as at cryogenic temperature, as cryo-cooling may reduce the diffraction resolution and increase mosaicity. Flash-cooling of any macromolecular crystal, and particularly large crystals (see above), inevitably produces damage, cracking or disorder that reduces the resolution of the diffraction pattern. Similarly, freezing also increases mosaicity, increases background intensity, and generally degrades the overall quality of diffraction. Crystal mounting, best done in quartz capillaries (see above), and preliminary room temperature analysis are, therefore, essential. Crystal decay from X-ray exposure can also be evaluated at room temperature, and if it is severe, then efforts to reduce it should focus on identifying cryo-crystallography conditions. There is always a trade-off between radiation damage and freezing damage, and as early on as possible that conflict needs to be resolved.

X-ray data collection
Recording X-ray intensities is the last truly experimental step in any crystallographic structure determination, as every procedure after that basically involves some manipulation of the X-ray amplitudes or model parameters in a computer. Thus, data collection deserves particular attention and care. [116-119] If the data are poor, subsequent steps of the analysis will be more difficult, and those steps are challenging enough with high-quality data. Recording the X-ray intensities from virus crystals, in the view of the authors, is still the most demanding part of the structure determination. Because the unit cell dimensions are several hundred or more angstroms in length and the unit cell volumes large, there are usually hundreds of thousands of independent reflections to be measured and, optimally, there will be many equivalent reflections recorded. Low crystallographic symmetry or otherwise unusually large asymmetric unit sizes intensify the challenges.
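To see where the hundreds of thousands of independent reflections come from, a back-of-envelope count can be made from the cell volume and the resolution limit. The Python sketch below assumes the standard approximation that a primitive cell of volume V has about (4π/3)V/d³ reciprocal lattice points inside the resolution sphere, and that the unique set is obtained by dividing by the lattice centring multiplicity and the order of the Laue group. The function name and the cell dimensions in the usage lines are hypothetical.

```python
import math


def unique_reflections(cell_volume_A3, d_min_A, laue_order,
                       centring_multiplicity=1):
    """Approximate number of unique reflections to resolution d_min.

    cell_volume_A3        unit cell volume in cubic angstroms
    d_min_A               high-resolution limit in angstroms
    laue_order            order of the Laue group (2 for P1, 8 for mmm, ...)
    centring_multiplicity 1 for P, 2 for C or I, 4 for F lattices
    """
    # Reciprocal lattice points inside a sphere of radius 1/d_min
    # for a primitive cell of this volume.
    lattice_points = (4.0 * math.pi / 3.0) * cell_volume_A3 / d_min_A ** 3
    return lattice_points / (centring_multiplicity * laue_order)


if __name__ == "__main__":
    # Hypothetical primitive orthorhombic cell, 300 x 350 x 400 angstroms.
    volume = 300.0 * 350.0 * 400.0
    print(round(unique_reflections(volume, 3.0, laue_order=8)))
```

For the hypothetical cell above the estimate is on the order of 800,000 unique reflections at 3.0 Å. Because the count scales as 1/d³, extending the resolution limit from 3.0 to 1.5 Å multiplies it by eight.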
Because the total scattering of the unit cell is spread over so many intensities, any individual reflection tends to be weak and therefore associated with greater error or imprecision. Large unit cell dimensions mean very small reciprocal lattice spacings, hence the reflections are very close together and frequently difficult to resolve. The resolution limits of T = 1 virus crystals are generally comparable to protein crystals and several have extended to beyond 2.0 Å. Cubic crystals of SPMV yield excellent data to at least 1.9 Å resolution, [120] orthorhombic crystals of STMV diffract to at least 1.4 Å (Figure 21), [82] and crystals of STNV VLPs also to that resolution. [29] Larger T = 3 virus crystals do not diffract to such limits and sometimes diffract to no more than 3.5 Å at best. The histogram in Figure 19 shows the resolution limits of the structure determinations for all virus crystals in the PDB.
As discussed above, virus crystals may prove difficult to flash-cool for data collection, and even when they can be frozen, they often suffer severe damage that degrades the overall diffraction pattern. At best, freezing increases mosaicity which makes it even more difficult to resolve reflections. If it is impossible to obtain data from frozen crystals, then it becomes necessary to do so at room temperature. This places additional demand on data collection, as many crystals, generally having random orientations, are required, only short exposure times are possible, radiation damage may be severe, and clusters of only two to six frames of data must be scaled together. As with all data collection, redundancy of measurement of reflections and their symmetry equivalents is highly desirable, but this must sometimes be reconciled with other considerations.
For structure determination of T = 1 particle crystals, conventional rotating anode sources have proven adequate for many cases, and the robust character of T = 1 viruses has allowed the use of multiple crystals and structure determination at room temperature. T = 1 virus crystals, including T = 1 particles derived from viruses, which all have diameters of 16-19 nm, such as the T = 1 VLP of STNV [29] or the T = 1 particles derived from BMV [121] and AMV, [34] have cell dimensions in the neighbourhood of 200 Å. These generally present no serious issues for data collection with rotating anode sources fitted with appropriate optical devices and using rapid detector systems such as image plates or CCD detectors. [122,123] Even multiwire detectors used in the 1990s (San Diego Multiwire Systems, San Diego, CA) proved themselves entirely adequate.
For data collection from crystals of viruses of T > 1, the situation is quite different (Figure 22). All of the problems alluded to above come into play. In addition, the crystals are softer, more fragile, and both mechanically and radiation sensitive. Animal and bacterial viruses such as HK97 [124,125] epitomize the inherent problems of virus data collection. With these viruses, the use of synchrotron radiation is obligatory. Lesser sources simply do not provide sufficient intensity. The beams, furthermore, must be of very low divergence and use the best collimation available to provide the least spread of reflection intensity possible at the detector. If spot size is too great, then unacceptable numbers of reflection overlaps occur, and these are either worthless, or at best difficult to disentangle and merge. With virus crystals, beam optics [71,126] become important considerations. This is particularly so because simply reducing beam diameter also reduces X-ray flux and the volume of crystal illuminated, and therefore already weak reflections become even more so.
Spot separation is an important requirement in terms of the detector as well. In general, spot separation increases as a function of crystal to detector distance. Thus for most virus crystals, the detector is pushed back as far as possible to give the greatest crystal to detector distance. Distance, however, must be weighed against other effects. As the detector is pushed further back, the spread of reflections also increases on the detector face due to divergence, increasingly so as a function of Bragg angle, and this results in more reflection overlap. In addition, as the crystal to detector distance is increased, the angle subtended by the detector decreases and reduces the maximum Bragg angle, hence the resolution, of the recorded reflections. In practice, however, the detector is usually pushed back as far as it will go. With the detector at maximum distance from the crystal, it will probably be necessary to swing the detector up (or out, depending on beam line geometry) to gather reflections at high 2θ. This must be done with caution as mechanical movements in the goniostat or detector must be extremely precise in order to properly correlate the reflections (h k l indexes) of high with low-resolution reflections. Reflection centres for virus crystals are separated from one another by only a few pixels on the detector. In addition, indexing of reflections at any resolution is very sensitive to the specified position of the beam centre.
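The dependence of spot separation on crystal to detector distance can be put into numbers with a small-angle estimate. The Python sketch below assumes that neighbouring reflections along a cell edge a are separated on the detector by roughly Dλ/a near the direct beam; the function name, the detector parameters, and the cell edges in the usage lines are hypothetical and serve only as illustration.

```python
def spot_separation_pixels(cell_edge_A, wavelength_A, distance_mm, pixel_mm):
    """Approximate spacing, in pixels, between adjacent reflections.

    Small-angle estimate valid near the direct beam: adjacent
    reciprocal lattice points 1/a apart subtend an angle of about
    wavelength/a, which spans distance * wavelength / a on the detector.
    """
    separation_mm = distance_mm * wavelength_A / cell_edge_A
    return separation_mm / pixel_mm


if __name__ == "__main__":
    # Hypothetical set-up: 1.0 angstrom X-rays, 400 mm distance, 0.1 mm pixels.
    for edge in (79.0, 200.0, 500.0):
        print(edge, round(spot_separation_pixels(edge, 1.0, 400.0, 0.1), 1))
```

Under these assumed conditions a 500 Å cell edge gives a spacing of 8 pixels, so a beam centre error of only a few pixels is already a large fraction of the distance between neighbouring reflections.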
Obtaining very precise coordinates for the beam centre on the detector is essential. For most protein crystals this is less important because the reciprocal lattice spacings are relatively large and the reflections on the detector well-separated. For virus crystals with large unit cell dimensions, the reflections are very close together, a few pixels of separation. Indexing of reflections is, therefore, absolutely dependent on knowing the exact beam centre. If this is in error by only two or three pixels in one or more directions, it will throw the indexing off by one or more in h, k, or l. The miseries inflicted on the individual trying to process and scale the data then become legion. More than once entire data sets have had to be entirely recollected because the beam centre was indeterminate.
The authors suggest the following expedient to locate precisely the beam centre. Before data collection on a virus crystal is initiated, or after any significant mechanical adjustment has been made, a lysozyme crystal (easily grown [58,127]) is mounted in the beam. A dozen frames or more are then quickly collected from that strongly diffracting crystal (a few minutes in total is adequate). The lysozyme data can then be quickly processed and when the unit cell is refined, the beam centre is as well. In this way, a very accurate centre point is obtained that can be trusted to support a correct indexing for the virus crystal.
Although diffraction intensities from virus crystals tend to be weak, they are not uniformly so. At low resolution, say at d-spacings greater than 6 Å, there may be strong intensities. In compensating for generally weak diffraction by making relatively long exposures, these strong reflections may become saturated (detector dependent) and then rejected from the data set. To obviate this, it may be necessary to recollect the low-resolution portion of the diffraction pattern with short exposure times to recapture strong reflections. This is usually done after rather than before higher resolution data collection, because low-resolution reflections are less sensitive to radiation damage. Do not ignore or dismiss the value of these strong, low-resolution reflections. Virus crystallographers will unanimously attest to their importance in the subsequent structure analysis.
Attention should be given to data collection strategy [128] to ensure efficiency. Some rules are almost self-evident. If the crystal possesses a high symmetry axis (threefold, fourfold, and sixfold), then clearly rotation around that axis provides the most rapid measurement of an entire asymmetric unit of reciprocal space. A 60° wedge about a sixfold axis is a delight, or even 90° about a fourfold axis, but neither as delightful as a 22.5° wedge in a cubic space group. Redundancy will be lacking, however, except from Friedel mates. A second orientation may be necessary about some general direction to fill in the 'apple core' region surrounding the first rotation axis, but only a partial data set is required for that second orientation.
If the crystal has a particularly long cell edge (correspondingly short reciprocal axis), then reflections along that direction in reciprocal space will be most difficult to resolve. If the choice presents itself, then collection by rotation about that short reciprocal axis provides the best separation. If frozen crystals cannot be used and the approach is to collect small wedges of data from many crystals and scale them together, then strategy is usually out of the question. In these circumstances, the objective must be to obtain as much data as possible from as many randomly oriented crystals as possible. That is, collect data until you run out of crystals or until the authorities show you the door.
In addition to crystal to detector distance and beam size, two other data collection parameters that deserve some thoughtful consideration are the angular increment that defines a frame of data, and the exposure time devoted to that frame. Statistical considerations of error regarding frame size suggest that small intervals or 'fine slicing' is preferable. [129] This is also favoured because it minimizes the overlap of reflections, usually a significant problem. On the other hand, the smaller the angular increment, the more the frames that must be collected and scaled together, and the more partial than whole reflections one will find on a frame (generally, one seldom finds an entire reflection on one frame, but the smaller the increment the more the frames over which a spot will be spread). The authors commonly use a rotation or oscillation angular increment of 0.5° unless special circumstances prevail, such as extreme overlap problems, in which case that increment may be 0.33°. With the new pixel detectors, angular increments are no longer relevant because there is continuous rotation, the shutter never closes, and there is near-continuous readout so that angular increments as small as 0.2° are practical. [129] Exposure time per frame also presents trade-offs. In the end, it will be determined by the diffracting power of the crystal and its sensitivity to radiation damage. Obviously, to obtain data from a weakly diffracting crystal, a frame will have to be exposed longer; the longer the exposure, the greater the magnitudes of the intensities, and the less their associated error; on the other hand, the longer the exposure, the greater the radiation damage and the fewer the frames that can be collected before the crystal becomes useless. One to two minutes of exposure per frame of 0.5°, however, might be a good starting point.
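The trade-off between angular increment and exposure time is easy to quantify before going to the beam line. The Python sketch below simply counts frames and total exposure for a given rotation wedge; the function name is a hypothetical choice, and detector readout and crystal changes are ignored.

```python
import math


def collection_plan(wedge_deg, increment_deg, exposure_s):
    """Return (number of frames, total exposure time in hours).

    The small tolerance guards against floating-point round-off when
    the wedge is an exact multiple of the increment.
    """
    frames = math.ceil(wedge_deg / increment_deg - 1e-9)
    hours = frames * exposure_s / 3600.0
    return frames, hours


if __name__ == "__main__":
    # A 90 degree wedge in 0.5 degree frames at 90 s per frame.
    print(collection_plan(90.0, 0.5, 90.0))
```

With 0.5° frames exposed for 90 s, a 90° wedge requires 180 frames and 4.5 hours of exposure, which makes plain why radiation damage and the number of available crystals so often dictate the strategy.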
As noted above, it is necessary to use large virus crystals for data collection simply to obtain adequate X-ray intensities. Frequently, though, because of its size that crystal is considerably larger than the diameter of the X-ray beam. In that case, the beam may be directed through different, non-overlapping volumes of the crystal. This allows, in some cases, multiple clusters of frames to be collected from a single crystal at room temperature or allows an entire data set from a single frozen crystal. Thus, when collecting data from a large crystal, do not begin by shooting through the fat middle of the crystal. Begin at one end, collect there as long as you can, then move to the middle, collect, and then finally record the data from the far end. Some investigators have even mastered the technique of spiralling down a symmetry axis of a crystal.

X-ray data processing
Before initiating the processing of X-ray images into Lorentz-polarization (Lp) corrected structure amplitudes, it is wise to inspect the images in the set visually. Hopefully, this will produce only boredom, but it also allows one to catch unexpected, and usually unexplained, instrument or computer glitches that may mar individual frames or a series of frames. Examination of at least some images, taken at different angular settings, sometimes exposes anomalous reflections that reveal the existence of a spur or parasite crystal, or reflections out of place that could raise the suspicion of twinning.
Examination of the images, acquired over a wide angular range, can also reveal that a crystal is cracked or split. This may not be evident in one orientation, but be pronounced in another. If the intensity distribution for a crystal is anisotropic, this may indicate that the crystal suffers from disorder in one or more directions. Problems with the diffraction data are often clearly evident by simple visual inspection that otherwise may be submerged in the statistics at later stages. There are several data processing packages that are available to the user that are well proven for virus crystal data. Prominent among these are HKL2000, [130] MOSFLM, [131] XDS, [132] and d*Trek. [118] Each may have its own specific advantages, but all have demonstrated themselves capable of handling data from virus crystals, and have been successful in yielding structure solutions. The authors favour d*Trek for its capable treatment of overlapping reflections, but we have used others as well.
Generally, processing of X-ray images requires the determination of crystal orientation and subsequent specification of the hkl indexes and expected position of every reflection on every image, as well as a measure of its partiality on the image. [12,133] This is dependent on the estimated mosaicity of the crystal, among other parameters. Most programs 'track' the images and can correct for small amounts of crystal 'slippage' during data collection so that the indexing and processing proceed properly. The 'slippage' may arise from actual crystal movement in the capillary or in the cryo-stream, stress in the fibre loop, accumulated ice or other experimental factors.
In the authors' view, from a perspective formed from the trials and tribulations of data collection over the past 40 years, modern index assignment is little short of remarkable. No attempt will be made here to explain the algorithms and computing technology that underlie indexing, as that is done elsewhere. [133] Suffice it to say that the programs are generally capable of correctly assigning indexes and predicting precise spot positions in reciprocal space even when the reflections number in the millions (see, e.g. PMV [46]), and even when the spot separation is no more than a few pixels on the detector. When things do go wrong, however, it is most commonly at the indexing stage. As noted above, indexing is very sensitive to correct specification of the beam centre and experimental parameters. A saving grace is that most cases of mis-indexing become evident at a later stage where symmetry-related reflections are scaled and merged. Thus, a fault can be detected and corrected.
The next stage of processing after indexing and the prediction of the locations on every X-ray image of the centres of reflections is integration of the total intensity contained within the spot on the detector. This is more complicated than one might suppose. Integration depends on getting the spot centre exactly right as well as evaluating the crystal mosaicity, or reflection spread (also dependent on the divergence of the reflection), as a function of Bragg angle or position on the detector. A box or envelope of the appropriate shape and dimensions is then defined about the spot, and all of the intensities of the pixels inside the envelope summed. Following this, the background, also dependent on detector position and evaluated by separate procedures, must be subtracted from the summed intensities. Reflections are especially weak for virus crystals as discussed above, so that accurate background estimates are of crucial importance.
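The box summation and background subtraction just described can be sketched in a few lines. The Python example below is a deliberately simple summation integration, assuming a square peak box, a flat local background estimated from a surrounding border of pixels, and an image stored as a list of rows; it is not the three-dimensional profile fitting used by the processing packages, and all names are hypothetical.

```python
def integrate_spot(image, row, col, half_box=2, border=2):
    """Background-subtracted summed intensity of one spot.

    image     list of rows of pixel counts
    row, col  predicted spot centre in pixels
    half_box  half-width of the square peak box
    border    width of the surrounding frame used for the background
    """
    outer = half_box + border
    if (row - outer < 0 or col - outer < 0
            or row + outer >= len(image) or col + outer >= len(image[0])):
        raise ValueError("integration window runs off the image")

    peak_sum = 0.0
    n_peak = 0
    background_pixels = []
    for r in range(row - outer, row + outer + 1):
        for c in range(col - outer, col + outer + 1):
            if abs(r - row) <= half_box and abs(c - col) <= half_box:
                peak_sum += image[r][c]
                n_peak += 1
            else:
                background_pixels.append(image[r][c])

    # Mean background per pixel, scaled to the area of the peak box.
    background = sum(background_pixels) / len(background_pixels)
    return peak_sum - n_peak * background
```

Because the background term is multiplied by the number of pixels in the peak box, a small error in the background per pixel is amplified in the result, which is why accurate background estimates matter so much for weak reflections.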
Integration is further complicated by the fact that with small data collection angular increments commonly used for viruses, say 0.5°, most reflections are only partially recorded on any individual image. Thus, to determine the total integrated intensity for a single observation, it is usually necessary to sum the contributions from multiple images. It is at this stage that the mosaic spread increase associated with flash-cooling of the crystals may impose itself most painfully. These difficulties are largely overcome with the application of three-dimensional profile fitting such as that implemented in the program XDS. [129,132] Because the reciprocal lattice spacings for virus crystals are so short and, particularly with freezing, the mosaicity is high, reflections tend to overlap. This is a frequent difficulty. Overlap increases with the Bragg angle so that high-resolution reflections are particularly afflicted. One approach is to simply eliminate measurements of reflections predicted to overlap. This, however, often leads to unacceptable losses of high-resolution data. Most of the programs and integration strategies, however, incorporate procedures to separate overlapping reflections and preserve the $I_{hkl}$ values.
Once indexed intensities have been obtained, the next stage in data processing is scaling and merging multiple observations of the same reflection $I_{hkl}$ together along with symmetry equivalents. This means, at the least, $I_{hkl}$ and its Friedel mate $I_{\bar{h}\bar{k}\bar{l}}$ but generally, if the crystal symmetry is high, many other reflections as well. Anomalous pairs are seldom used in virus crystallography and Friedel mates are usually averaged. It is at this stage that outliers may be eliminated and a meaningful evaluation of the quality of the X-ray data becomes possible. There are many scaling algorithms in use and they have been treated elsewhere. [133-135] Scaling not only merges reflections, but it also smooths out defects in the data arising from a host of sources. These include crystal shape, deterioration, beam intensity fluctuations, and absorption effects due to solvent, glass, or fibre loop. The square roots of the scaled and merged intensities become the structure amplitudes, $F_{hkl}$.
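The merging step can be illustrated for the simplest case, in which only Friedel mates are equivalent. The Python sketch below assumes the observations have already been scaled, averages each reflection with its Friedel mate, and takes the square root of the mean intensity. Setting negative mean intensities to zero is a simplification made here for illustration; it stands in for the statistical treatment of weak data that processing programs apply. All function names are hypothetical.

```python
from math import sqrt


def friedel_key(hkl):
    """Map a reflection and its Friedel mate to one representative index."""
    h, k, l = hkl
    return max((h, k, l), (-h, -k, -l))


def merge_to_amplitudes(observations):
    """Average scaled intensities over Friedel mates and return amplitudes.

    observations  iterable of ((h, k, l), intensity) pairs
    returns       dict mapping representative (h, k, l) to F = sqrt(<I>)
    """
    sums = {}
    for hkl, intensity in observations:
        key = friedel_key(hkl)
        total, count = sums.get(key, (0.0, 0))
        sums[key] = (total + intensity, count + 1)
    # Negative means are clamped to zero (illustrative simplification).
    return {key: sqrt(max(total / count, 0.0))
            for key, (total, count) in sums.items()}
```

For a crystal of higher symmetry, the mapping in `friedel_key` would be replaced by one that sends every symmetry equivalent to a single representative in the asymmetric unit of reciprocal space.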

Measures of X-ray data quality
Despite the differences and cautions offered above with regard to X-ray data collection from virus crystals, assessing the quality of such data is no different than for non-virus crystals. Although there are those that claim that the relationship between model quality and data quality is uncertain, [136] intuitively, the quality of a data set should be assessed from the quality of the model derived from it, since it is doubtful that a quality model could be obtained from poor data. Although we may not be able to make an unambiguous a priori assessment of the true quality of a data set, certain statistical quantities do suggest how good it is.
The quality of a model, for example, is related to the detail seen in electron density maps, and this is dependent upon the resolution and completeness of the data. [137] Lack of completeness diminishes the effective resolution. [138] The objective should be to collect data to the diffraction limit of the crystals with as close to 100% completeness in all resolution shells as is practicable. Furthermore, the data should be as redundant as resources (i.e. beam time and suitable crystals) will allow, particularly in the highest resolution shells. High redundancy improves precision in the final averaged intensities and permits the identification of outliers through greater sampling of each reflection. It should be borne in mind, however, that precision does not necessarily imply accuracy since a constant systematic error may result in high precision (low standard deviation) but inaccurate intensities.
Historically, the critical quantities that are reported with regard to data processing that have served as a summary of data quality are (1) the internal agreement (precision) of the data (expressed as $R_{\mathrm{merge}}$); (2) the signal-to-noise ratio of the data (usually described as $\langle I/\sigma(I) \rangle$, although some data processing programs report $\langle I \rangle / \langle \sigma(I) \rangle$); (3) the completeness; (4) the redundancy; and (5) the high-resolution limit of the data set. For each of the first four quantities, two numbers are usually given, one for the whole data set and the other for the highest resolution shell. The resolution is reported as a range for the whole data set and a range for the highest resolution shell. Referees and readers can assess the quality of the data upon which the structure is based, although model statistics (i.e. $R_{\mathrm{work}}$, $R_{\mathrm{free}}$, and deviations from ideal geometry) will likewise serve as an assessment of the data quality, assuming that structure solution and refinement were carried out properly. Poor model statistics but good data statistics suggest a problem with the model. Good model statistics consistent with the resolution of the data would confirm a good-quality data set.
It has been customary to use the internal consistency and/or the signal-to-noise ratio of the highest resolution shells in specifying the resolution limit of a data set. Redundancy and completeness have likewise been used. During data processing, arbitrary targets for these four criteria may be used to set the resolution limit. Generally, the intensities or structure amplitudes are partitioned into resolution shells or bins. When a statistic calculated for the highest resolution shell does not meet the target value of one or more of these criteria, the resolution is cut to some value lower than the resolution of that bin. For example, if $R_{\mathrm{merge}}$ exceeds 0.5 or 0.6 or $I/\sigma(I)$ is less than 2.0 in the highest resolution bin (say 2.3-2.2 Å), a resolution cut-off lower than the high-resolution edge of the bin (2.2 Å) would be applied to the data. [136,138,139] Similarly, if the average redundancy is less than 2 or the completeness is less than 50%, a resolution cut-off might also be applied. While resolution and completeness may be the primary determinants of model quality and, hence, data quality, low-resolution data sets (which are often obtained for virus data) can be evaluated by their internal consistency and signal-to-noise ratio. Even though low-resolution data sets generally result in poorer model quality, the ability to take advantage of the icosahedral symmetry of a virus through NCS map averaging tends to produce unusually good phases and, hence, maps and models that appear to be of higher resolution than otherwise would be suggested by the nominal resolution of the data.
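The shell-by-shell procedure described above can be written down as a short routine. The Python sketch below assumes that per-shell statistics are already available as dictionaries ordered from low to high resolution, and it uses as defaults the illustrative targets quoted in the text (R merge of 0.6, I/σ(I) of 2.0, redundancy of 2, completeness of 50%). The data layout, key names, and function name are hypothetical, and the targets themselves are, as noted, arbitrary.

```python
def resolution_cutoff(shells, max_rmerge=0.6, min_i_over_sigma=2.0,
                      min_redundancy=2.0, min_completeness=0.5):
    """Return d_min (angstroms) of the last shell meeting every target.

    shells  list of dicts ordered from low to high resolution, each with
            keys 'd_min', 'rmerge', 'i_over_sigma', 'redundancy' and
            'completeness' (completeness as a fraction between 0 and 1)
    Returns None if even the lowest resolution shell fails.
    """
    cutoff = None
    for shell in shells:
        acceptable = (shell["rmerge"] <= max_rmerge
                      and shell["i_over_sigma"] >= min_i_over_sigma
                      and shell["redundancy"] >= min_redundancy
                      and shell["completeness"] >= min_completeness)
        if not acceptable:
            break  # stop at the first shell that misses a target
        cutoff = shell["d_min"]
    return cutoff
```

Relaxing one of the thresholds, for example lowering `min_i_over_sigma`, shows immediately how sensitive the nominal resolution of a data set is to the chosen targets.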
Wlodawer et al. [138] suggested that contemporary refinement programs that employ maximum-likelihood methods, for which it is generally recommended that no data be excluded from the refinement process, allow the use of weak data without severe consequences. Hence, it would not be detrimental to process data to the maximum resolution and base the high-resolution limit in refinement on the fit of the model to the data. Furthermore, they state that 'all reflections are very precious and should always be included, particularly at high resolution'. Under the assumption that all data are valuable, data should be processed to the highest resolution.
Ideally, data sets should have good internal agreement (precision) and high signal-to-noise ratio while maximizing the completeness, redundancy, and resolution over all resolution bins. These criteria are not independent; if high-resolution shells are discarded due to high $R_{\mathrm{merge}}$ or low signal-to-noise statistics, the high-resolution limit is reduced. If weak observations or outliers are rejected to improve the signal-to-noise ratio or precision, the redundancy and possibly completeness are reduced. So, data processing involves compromise among these five criteria.
The conventional measure of the internal consistency of X-ray data has been the statistic R merge . Commonly used as a global indicator of data quality, R merge derives from the merging of all intensity measurements of a reflection and its symmetry equivalents into a single averaged value and is given by
$$R_{\mathrm{merge}} = \frac{\sum_{h}\sum_{i}\left|I_{hi} - \langle I_{h}\rangle\right|}{\sum_{h}\sum_{i} I_{hi}}$$
where h runs through the set of unique reflections, i runs through the set of observations (including symmetry relatives) of each reflection h in the data set, $I_{hi}$ is the ith observation of reflection h, and $\langle I_{h}\rangle$ is the mean of those observations. Diederichs and Karplus [140] suggested that data sets with values < 5% are classified as good, 5-10% as usable, and 10-20% as marginal, and data sets with R merge > 20% classified as questionable. Generally, it is expected that the lowest resolution shell has R merge < 5%, while the highest resolution shell should be < 50-80%. [136,138,139] As pointed out by Karplus and Diederichs, [139] the data precision statistic, R merge , and the statistic for agreement of the model to the data, R cryst , have very different characteristics. R merge at high resolution, where reflections are weakest, tends towards infinity since the numerator is dominated by noise while the denominator tends to zero. Hence, R merge should be expected to be large in the highest resolution shells. On the other hand, a value for R cryst near 0.59 is representative of a random model. [141] Thus, values of R merge should not be evaluated on a similar basis to R cryst .
Weiss and Hilgenfeld [142] and Diederichs and Karplus [140] further pointed out that R merge is inherently flawed as a global indicator of data quality because it will increase as the redundancy of the data set increases, which is somewhat counter-intuitive since more observations of an event produce a more precise description of that event. Thus, in the case of reflection intensities, the more times a reflection is measured, the more precise that measurement should be because the uncertainty or error in that measurement will decrease. Therefore, they have proposed other statistical quantities that are, so-called, redundancy independent. These include R meas (also called R rim for redundancy-independent merging R factor) given by
$$R_{\mathrm{meas}} = \frac{\sum_{h}\left[\frac{N_{h}}{N_{h}-1}\right]^{1/2}\sum_{i}\left|I_{hi} - \langle I_{h}\rangle\right|}{\sum_{h}\sum_{i} I_{hi}}$$
and R pim (or precision-indicating merging R factor) given by
$$R_{\mathrm{pim}} = \frac{\sum_{h}\left[\frac{1}{N_{h}-1}\right]^{1/2}\sum_{i}\left|I_{hi} - \langle I_{h}\rangle\right|}{\sum_{h}\sum_{i} I_{hi}}$$
where in each case N h is the number of observations of reflection h, and h and i are defined as for R merge . R meas is always larger than R merge but should approach R merge as the redundancy increases, as shown by the tendency of the term $[N_{h}/(N_{h}-1)]^{1/2}$ to approach 1 as N h increases. R pim can be considered an average value of the precision of redundant reflections and would have much smaller values than either R merge or R meas because of the $[1/(N_{h}-1)]^{1/2}$ terms. Although R merge or R meas gives some idea of the precision of a data set on a global basis, and in the individual resolution shells, and allows the data to be classified as good, usable, marginal or questionable, [140] the greater utility of these statistics may be in comparing different batches or images that produced the data set. Generally during data processing, values of R merge and/or R meas are calculated for each batch or each image against the whole data set. When the value for an image is considerably different from those of all other images, that image becomes questionable and may warrant rejection.
Additionally, severe decay of the diffraction due to X-ray exposure may be identified in this manner. A researcher with a data set in hand, regardless of the value of R merge or R meas , will attempt to solve the structure. One always does the best one can with what one has. High values for these statistics, however, may prompt the researcher to pursue better crystallization conditions or to seek better crystals with the same conditions.
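The three merging statistics discussed above differ only in the weight applied to each reflection's summed absolute deviations, which a short calculation makes concrete. The following is a minimal sketch, not the procedure of any particular data processing program: the function name, the input layout (a list of already symmetry-reduced index and intensity pairs), the toy intensities, and the choice to skip singly measured reflections are all assumptions made for illustration.

```python
from collections import defaultdict

import numpy as np


def merging_statistics(observations):
    """Return Rmerge, Rmeas and Rpim for unmerged observations.

    observations: iterable of (hkl, intensity) pairs, where hkl has already
    been reduced to the unique (asymmetric unit) index. Hypothetical layout.
    """
    groups = defaultdict(list)
    for hkl, intensity in observations:
        groups[hkl].append(float(intensity))

    num_merge = num_meas = num_pim = denom = 0.0
    for values in groups.values():
        n = len(values)
        if n < 2:
            continue  # assumption: single observations are left out entirely
        vals = np.asarray(values)
        abs_dev = np.abs(vals - vals.mean()).sum()
        num_merge += abs_dev
        num_meas += np.sqrt(n / (n - 1)) * abs_dev   # redundancy-independent
        num_pim += np.sqrt(1.0 / (n - 1)) * abs_dev  # precision-indicating
        denom += vals.sum()

    return {"Rmerge": num_merge / denom,
            "Rmeas": num_meas / denom,
            "Rpim": num_pim / denom}


# Toy data: one reflection measured twice, another measured four times.
toy = [((1, 0, 0), 100.0), ((1, 0, 0), 110.0),
       ((2, 0, 0), 50.0), ((2, 0, 0), 40.0),
       ((2, 0, 0), 60.0), ((2, 0, 0), 50.0)]
stats = merging_statistics(toy)
print(stats)
```

For any such data set the ordering R pim ≤ R merge ≤ R meas follows directly from the weights, since $[1/(N_h-1)]^{1/2} \le 1 \le [N_h/(N_h-1)]^{1/2}$ whenever N h ≥ 2.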
In assessing the high-resolution limit by signal-to-noise ratios, the traditional target for I/σ I in the highest resolution shell is ∼ 2.0. [138,139] It was pointed out by Wlodawer et al., [138] however, that 48% of the structures deposited in the PDB [65] report I/σ I of 3.0 or greater. This suggests that many structures were not determined to the maximum diffraction limit of the underlying crystals, a result of setting some arbitrary cut-off value for I/σ I such as 3.0, or not performing a preliminary analysis of the diffraction properties of the crystals to establish the parameters for optimal data collection. An arbitrary limit for I/σ I implies that a significant number of reflections with I/σ I greater than the cut-off value are being discarded and, hence, potentially useful data are lost. Until the last 20-30 years, data were often eliminated from refinement by an amplitude cut-off of F < 4.0σ F ; however, with maximum-likelihood methods the recommendation is, again, to use all data. If the rule pertains to model refinement, then it should pertain to data processing as well. Once a structure is solved and refinement is initiated, an evaluation of the quality of the data in the higher resolution shells can be assessed against the model and a more practical resolution cut-off can be applied to the processed data rather than applying an arbitrary cut-off value during data processing.
Wang and Boisvert [143] demonstrated the value of weak high-resolution reflections on structure solution and refinement. They reprocessed data for a (GroEL-KMgATP) 14 complex that had been truncated at 2.4 Å because the 2.4-2.3 Å resolution shell had I/σ I = 1.5. The data were reprocessed to a resolution of 2.0 Å. The number of reflections used in the subsequent refinement was about 40% greater than in the previous studies even though 143,333 reflections with F = 0 were excluded. The F = 0 reflections were, however, included in map calculations. The authors reported F/σ F = 1.16 in the 2.0-2.1 Å shell which equates to I/σ I = ∼ 0.58. The final R work and R free values for the new refinement were 0.243 and 0.258, respectively, compared to 0.247 and 0.283 obtained using 2.4 Å resolution data with F ≥ 2.0σ F previously reported. With the higher resolution reprocessed data, the authors were able to identify an E434A mutation, analyse probable domain motions, identify deviations from the sevenfold symmetry of the complex and nearly double the number of water molecules in the model.
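The conversion between F/σ F and I/σ I quoted above follows from first-order error propagation. The short derivation below assumes that I is proportional to F² and that the relative errors are small, an approximation that is admittedly strained for the very weak reflections in question.

```latex
I = F^{2}
\;\Rightarrow\;
\sigma_{I} \approx \left|\frac{\partial I}{\partial F}\right|\sigma_{F} = 2F\,\sigma_{F}
\;\Rightarrow\;
\frac{I}{\sigma_{I}} \approx \frac{1}{2}\,\frac{F}{\sigma_{F}} = \frac{1.16}{2} = 0.58
```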
A second impressive example was reported by Wang [144] involving a group II intron structure that had been truncated at 3.1 Å even though the highest resolution shell of 3.21-3.10 Å had I/σ I = 3.7. The data were reprocessed to 2.8 Å with I/σ I = 0.38 in the 2.9-2.8 Å resolution shell. The overall I/σ I was 20.7 versus 13.9 for the previous data set. The two highest resolution shells had R merge > 100%. In this case, the total unique reflections increased by 37%. The final model R factors using the new data were R work = 0.196 and R free = 0.226 to 2.8 Å, whereas the previous model gave R work = 0.276 and R free = 0.310 to 3.1 Å, respectively. As a consequence, a host of additional features emerged in difference electron density maps. These two examples support the premise, suggested by others, [136,139] that measures of precision and signal-to-noise ratio are not good arbiters of the maximum resolution of a data set.
Recently, it has been suggested that the correlation coefficient of random half sets of data, designated CC 1/2 , is a useful statistic for determining the high-resolution limit of a data set. [139,145] This statistic is calculated in the CCP4 programs SCALA and AIMLESS [136] and was recently added to the program XDS. [132,146] Unmerged data are randomly divided into two half sets for each unique reflection and the correlation coefficient is calculated between the average intensities of the reflections of the two sets. CC 1/2 is close to 1 at low resolution and falls sharply at resolutions near the high-resolution limit of the data as the data become weaker. [139] This is a more objective statistic because its calculation does not involve the uncertainties in the intensities, which are more subjective since the estimation of σ I varies among the various data processing programs. [136,139,145] Several studies [139,147,148] suggest that data sets with highest resolution shells having CC 1/2 in the range 0.1-0.2 produce better atomic models than data sets that have been truncated to a lower resolution limit. Furthermore, the work of Karplus and Diederichs [139,147] demonstrates how the CC 1/2 statistic can be a predictor of model quality through the derived statistic
$$CC^{*} = \sqrt{\frac{2\,CC_{1/2}}{1 + CC_{1/2}}}$$
which is an upper limit of the CC work for derived models. In summary, the data set that will yield the best model will be highly redundant, extend to a high-resolution limit characterized by a CC 1/2 statistic in the 0.1-0.2 range, and will be nearly complete in all resolution ranges with only randomly missing reflections.
In other words, (1) merging R factors are of little value, especially in determining the high-resolution limit of a data set, [136,139,147,149] (2) strict signal-to-noise criteria discard useful data, degrading data quality and, consequently, model quality, [136,139,143,144,147,148] (3) highly redundant data sets are better than low redundancy sets, [147,149] (4) the CC 1/2 statistic is a better high-resolution limit indicator than previously used statistics, that is, merging R factors and I/σ I , [136,139,145,147] and (5) completeness in all resolution ranges is important, especially for structure solution, although incompleteness in the highest resolution shells only reduces the effective resolution of the data. [138,145,147] In the words of Evans and Murshudov, [136] 'There is no reason to suppose that cutting back the resolution of the data will improve the model.'
Occasionally, crystals may exhibit some degree of pseudosymmetry, and their X-ray diffraction patterns appear to possess higher symmetry than is actually present. Usually, however, the lower symmetry becomes evident at the scaling stage. Truly symmetry equivalent asymmetric units of reciprocal space scale together with reasonable residuals (R factors) comparable to those for protein crystals, at least at low and moderate resolution. Asymmetric units that only appear to be symmetry related do not; they yield markedly higher residuals. Thus, scaling statistics can be used to resolve some questions of space group.
Equivalent reflections having the same hkl indexes for all crystals must then be scaled and merged. The primary difference between this phase of the analysis for virus crystals in contrast to most protein crystals is the sheer number of observations and independent structure amplitudes. A second difference is that the weaker average intensity associated with virus data means that every reflection is less precisely determined and carries a greater error. As a consequence, particularly at higher resolution, statistical measures are generally inferior to those for protein crystals with smaller unit cells.
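The half-set correlation CC 1/2 discussed earlier in this section, and the CC* value derived from it, are simple to compute once unmerged intensities are grouped by unique reflection. The sketch below is illustrative only and does not reproduce the implementation in AIMLESS or XDS: the function names, the dictionary input, and the synthetic 'strong' and 'weak' shells are assumptions.

```python
import numpy as np


def cc_half(unmerged, rng):
    """Correlation between mean intensities of two random half sets.

    unmerged: dict mapping a unique hkl to its list of observed intensities
    (hypothetical layout). Reflections observed once are skipped.
    """
    half1, half2 = [], []
    for values in unmerged.values():
        if len(values) < 2:
            continue
        vals = rng.permutation(np.asarray(values, dtype=float))
        mid = len(vals) // 2
        half1.append(vals[:mid].mean())
        half2.append(vals[mid:].mean())
    return float(np.corrcoef(half1, half2)[0, 1])


def cc_star(cc12):
    """CC* = sqrt(2 CC1/2 / (1 + CC1/2)); undefined for CC1/2 <= 0."""
    if cc12 <= 0:
        return float("nan")
    return float(np.sqrt(2.0 * cc12 / (1.0 + cc12)))


def synthetic_shell(mean_intensity, sigma, rng, n_refl=500, n_obs=4):
    """Made-up shell: exponential true intensities plus Gaussian noise."""
    true = rng.exponential(mean_intensity, n_refl)
    obs = true[:, None] + rng.normal(0.0, sigma, (n_refl, n_obs))
    return {i: list(row) for i, row in enumerate(obs)}


rng = np.random.default_rng(0)
cc_strong = cc_half(synthetic_shell(1000.0, 10.0, rng), rng)  # low-resolution-like
cc_weak = cc_half(synthetic_shell(1.0, 10.0, rng), rng)       # beyond the signal
print(cc_strong, cc_weak, cc_star(0.15))
```

A shell with CC 1/2 = 0.15, in the middle of the 0.1-0.2 range mentioned above, corresponds to CC* of roughly 0.51.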

Determining the orientation of virus particles in the unit cell
Often a symmetry axis, or multiple symmetry axes of an icosahedral virus will be coincident with space group symmetry axes. If a single twofold axis of the particle coincides with a crystallographic twofold axis, then the asymmetric unit of the crystal will be 1/2 the particle or 30 units. For a threefold axis the asymmetric unit will be 1/3 of the particle, or 20 units. No icosahedral virus can possess a screw axis symmetry element of any order, so crystallographic screw axes can only relate entire particles in the unit cell and never result directly in an asymmetric unit of a fraction of a particle.
Icosahedral viruses may also be centred at a symmetry point, 23, 32 or 222, whenever they exist in the space group of a crystal. These special positions give rise to asymmetric units of 1/12, 1/6, and 1/4 (5, 10 and 15 units, respectively) of a particle. For a virus lying on a crystallographic dyad, knowing the exact orientation requires determination of its rotational angle about that axis. The same holds true if it lies on a threefold axis as well. If the unit cell has a unique origin fixed by crystallographic symmetry elements, then the position along the dyad or triad axis must also be determined.
If the virus particle is centred on a 23, 32, or 222 symmetry point, then its position in the cell is fixed, as is its orientation (with an ambiguity in the case of 222). A particle, however, might lie on a twofold or threefold axis of a unit cell having a special symmetry point, but not be centred at that special position. In such a case, the position of the particle centre with respect to the special position origin must be determined. In the case of a particle centred at a 222 special position, there are two orientations of the particle that are consistent with the crystallographic symmetry. This choice must be resolved before the precise orientation of the virus can be specified.
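The asymmetric unit contents quoted above all follow from dividing the 60 icosahedral asymmetric units of a particle by the order of the crystallographic site symmetry on which the particle sits. The small table-driven sketch below restates that arithmetic; the function name and the optional T number argument are illustrative assumptions.

```python
from fractions import Fraction

# Order of each crystallographic site symmetry compatible with 5 3 2.
SITE_SYMMETRY_ORDER = {"1": 1, "2": 2, "3": 3, "222": 4, "32": 6, "23": 12}


def asymmetric_unit_content(site_symmetry, t_number=1):
    """Return (fraction of particle, icosahedral units, protein subunits)
    in the crystallographic asymmetric unit for one particle position."""
    order = SITE_SYMMETRY_ORDER[site_symmetry]
    units = 60 // order                 # icosahedral asymmetric units
    return Fraction(1, order), units, units * t_number


for site in ("1", "2", "3", "222", "32", "23"):
    print(site, asymmetric_unit_content(site))
```

For a T = 3 particle centred on a 23 point, for instance, the call `asymmetric_unit_content("23", t_number=3)` gives 1/12 of a particle, 5 icosahedral units and 15 subunits.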
In general, calculation of a rotation function [150,151] can resolve all questions of rotation about any axis. Because the particle has 5 3 2 symmetry, in the Eulerian rotation function, only the χ sections at 180°, 120°, and 72° need be calculated and inspected. Peaks indicating the dispositions of the 5-, 3-, and 2-fold axes are usually clear. When a rotation function is calculated, it is important that the X-ray data be complete in sampling all of reciprocal space. It is less important that weaker data are included; strong reflections alone may be adequate. Resolution too is secondary, as answers can emerge from relatively low-resolution rotation functions of 6-8 Å. It is essential that sectors of data in reciprocal space not be omitted or absent, as that may lead to erroneous rotation function results.
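That only these rotation angles are needed can be verified by enumerating the 60 proper rotations of point group 5 3 2 and tabulating their rotation angles. The sketch below builds the group by closure from two generators in one conventional setting (twofold axes along x, y and z, a fivefold axis along (0, 1, φ) with φ the golden ratio); the setting, function names and tolerance are assumptions chosen for illustration.

```python
from collections import Counter

import numpy as np


def axis_angle_matrix(axis, angle_deg):
    """Rotation matrix from an axis and an angle (Rodrigues formula)."""
    axis = np.asarray(axis, dtype=float)
    axis = axis / np.linalg.norm(axis)
    a = np.radians(angle_deg)
    k = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]])
    return np.eye(3) + np.sin(a) * k + (1.0 - np.cos(a)) * (k @ k)


def icosahedral_rotations():
    """Generate all proper rotations of 5 3 2 by closure of two generators."""
    phi = (1.0 + np.sqrt(5.0)) / 2.0
    generators = [axis_angle_matrix([0.0, 1.0, phi], 72.0),   # a fivefold axis
                  axis_angle_matrix([0.0, 0.0, 1.0], 180.0)]  # a twofold axis
    group = [np.eye(3)]
    frontier = [np.eye(3)]
    while frontier:
        new = []
        for g in frontier:
            for s in generators:
                cand = s @ g
                if not any(np.allclose(cand, h, atol=1e-8) for h in group):
                    group.append(cand)
                    new.append(cand)
        frontier = new
    return group


def rotation_angle(r):
    """Rotation angle in whole degrees, in the range 0 to 180."""
    c = np.clip((np.trace(r) - 1.0) / 2.0, -1.0, 1.0)
    return int(round(float(np.degrees(np.arccos(c)))))


operators = icosahedral_rotations()
angle_counts = Counter(rotation_angle(r) for r in operators)
print(len(operators), dict(angle_counts))
```

The 144° rotations are simply second powers of the fivefold rotations, so they introduce no axes beyond those already seen in the 72° section.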
Once rotation angles have been determined, the only question that remains, and only for some cases, is the position of the virus centre. This also does not present a difficult problem. Packing considerations taking into account the diameter of the roughly spherical particle provide a good starting point. The particle (generally the chosen probe model, see below), in the correct orientation, can be incrementally translated in each unrestricted direction and an R factor calculated based on observed low-resolution data. By using only low-resolution (> 10 Å) structure amplitudes, a good estimate of the virus centre can be obtained. Both the coordinates of the virus centre and the angles defining the orientation of the virus, obtained from the rotation function, can then be refined precisely using higher resolution data.
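The R factor translation search just described can be illustrated with a deliberately simple toy: a one-dimensional centrosymmetric cell containing two Gaussian 'particles' at +t and −t, for which the structure amplitudes depend on t. Everything in the sketch below (the one-dimensional cell, the Gaussian width, the true position of 0.13, the grid step, the absence of a scale factor) is an assumption for illustration and not a description of any real virus analysis.

```python
import numpy as np


def amplitudes(t, h, width=0.03):
    """|F(h)| for Gaussian particles at +t and -t (fractional coordinates)."""
    form_factor = np.exp(-2.0 * (np.pi * h * width) ** 2)
    return np.abs(2.0 * form_factor * np.cos(2.0 * np.pi * h * t))


def r_factor(f_obs, f_calc):
    """Conventional residual; no scale factor is needed in this toy."""
    return np.abs(f_obs - f_calc).sum() / f_obs.sum()


h_low = np.arange(1, 6)                 # only the lowest-order reflections
t_true = 0.13                           # 'unknown' particle centre
f_obs = amplitudes(t_true, h_low)

# Step the probe along the unrestricted direction. Positions t and 1/2 - t
# are equivalent by an origin shift, so 0 to 1/4 is a unique search range.
trial = np.arange(0, 51) * 0.005
r_values = np.array([r_factor(f_obs, amplitudes(t, h_low)) for t in trial])
best_t = trial[np.argmin(r_values)]
print(best_t, r_values.min())
```

In a real case the minimum found on the coarse grid would serve only as the starting point for refinement of the centre against higher resolution data.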
It is absolutely essential that at the end of this analysis the orientation and position of the icosahedral particle be defined with precision. All subsequent operations and procedures such as isomorphous heavy atom position determination, phase extension, electron density averaging, and coordinate refinement will be completely dependent on its accuracy and precision.

Probe models
Crystal structures of proteins that are homologous to others of known structure are currently solved using molecular replacement, [150][151][152] and de novo structures are now solved using phases based on anomalous dispersion measurements and, to a lesser extent, traditional isomorphous replacement. Virus structures are not usually solved using exactly these techniques, though isomorphous replacement still has its place, and virus phasing does, at least initially (see above), use a kind of molecular substitution to obtain starting phases. With virus crystals one begins by obtaining estimates for the phases of low-resolution (>10 Å) reflections and then takes advantage of the symmetry within the crystallographic asymmetric unit to extend the phases to higher resolution. [153,154] The process is abetted by the high solvent volume of the crystals, which allows effective solvent flattening as well. [155,156] A more detailed description of the phase determination procedure for particles having high NCS, especially viruses, is presented in an article by V. Reddy in preparation for a future issue of Crystallography Reviews.
To begin, however, estimates of low-resolution phases must be obtained. To accomplish this, some model, hopefully one that resembles (the closer the better) the unknown crystalline virus, is placed in the unit cell in the correct orientation and at the correct position, as determined above. Phases are then calculated from the model, and these are then used as the starting phases in a subsequent 'boot strap' series of procedures.
Fortunately for virus crystallographers today, a lot of advantages exist. First of all, we know that different virus species within a family closely resemble one another, particularly at low resolution. For example, TYMV and desmodium yellow mottle virus (DYMV), two tymoviruses, are almost indistinguishable, [157] as are Cowpea chlorotic mottle virus (CCMV) and BMV of the bromovirus family. Thus, if the structure of another member of the same virus family is available as a model, it is almost certain to suffice. Even if no family member is available, the amino acid sequences of the coat proteins of viruses whose structures are known can be searched for maximum amino acid identity and homology with the amino acid sequence of the crystalline virus. The VIPER data base (http://viperdb.scripps.edu/) now contains well over 250 unique virus structures that have been precisely determined by X-ray diffraction. Even more models are available, though to lesser precision, based on cryo-electron microscopy, and their amino acid sequences are also known.
It is remarkable how little identity there must be between virus amino acid sequences in order for their three-dimensional structures to serve as adequate probe models for initial phase estimation. In the structure solution for PMV, for example, a particularly difficult problem because there were two entire virus particles in the asymmetric unit (360 protein subunits), a model based on cocksfoot mottle virus (CfMV) having only about 20% amino acid identity was successful. Though it was from a different virus family, phases based on its known structure were adequate as initial phases for PMV. [46] This weak dependence on amino acid identity is undoubtedly due to the strong preservation of three-dimensional structure within the coat proteins of almost all icosahedral plant viruses. There are other classes of viruses having coat proteins of different structures including those that have large amounts of alpha helix, [25] a single jelly roll β-barrel, [158] a double jelly roll β-barrel, [159] and HK97 fold, [8,86,160] but they can usually be identified by the amino acid sequences.
Probe models based on homologous structures can also be improved before use in initial phase calculations. Differences in homologous coat proteins tend to occur as amino acid replacements, deletions or insertions, and are found in extended polypeptide loops that project away from the core of secondary structure. Often these loops, being less representative of the actual virus, are simply eliminated from the model. Features that are not likely to be common between the probe model and the unknown protein, such as metal ions like Ca ++ , should be eliminated from the probe. One can try to make appropriate amino acid substitutions from the probe model to conform to the correct amino acid sequence, but it appears unlikely that this is worth the effort. Pruning, however, may be useful. Obviously, the better the starting phases, the fewer the difficulties that will subsequently be encountered.
Although an X-ray structure-based probe model is to be preferred, such a structure may not always be available. It may be preferable then to choose a probe model of less precise structure similar to the unknown, than to use a model that is precise but does not resemble the unknown. In those cases a model based on cryo-electron microscopy may prove the best choice (see the article in preparation for a future issue of Crystallography Reviews by Veesler and Johnson). Cryo-electron microscopy models are generally of lower resolution ( > 8 Å, though some are much better) compared with X-ray structures, but since only low-resolution phases are required, they may prove adequate, and have in a number of analyses. [107] Additional comments on the choice of probes may be found in the article by V. Reddy in preparation for a future issue of Crystallography Reviews.
It may be appropriate to say a few words here regarding low-resolution X-ray reflections and their value in analyses. Low-resolution reflections are generally strong, and they play the major role in defining the envelope of the virus and those spaces within the unit cell occupied by solvent. In the initial stages of phase determination that utilize chiefly low-resolution reflections, these data are of especially high significance in generating starting phases. Thus, it is wise to measure them with care and to make sure as few of them as possible are lost in data collection.

Heavy atoms and molecular replacement
Although extension from phases based on homologous models is generally successful in virus structure determination, this is not always the case. For a truly novel virus that may have no obvious homologues of known structure, when no adequate probe model can be found, or when, for whatever reason, phase extension fails, it is necessary to resort to traditional methods. In practice this means relying on isomorphous replacement using heavy atoms. [161,162] It is also true that isomorphous replacement phases are almost always better starting points for phase extension than those obtained from a model, and furthermore they allow that extension to be started at a higher resolution.
Isomorphous replacement was used to obtain phases for the earliest virus structures that were determined [158,159,163] and it has proven successful for many viruses that followed. In principle it operates with virus crystals in exactly the same way as with protein crystals. The target protein, the virus coat protein, is no different than other proteins in composition and it generally contains cysteine, methionine, and histidine residues that are susceptible to reaction with mercury, platinum, and silver compounds among others. They often have sites for binding lanthanides (which can replace divalent metal ions such as Mg ++ and Ca ++ ), uranyl compounds, and other heavy atom containing organic molecules that may be attracted to low-affinity sites. [13,164] The major difference between virus heavy atom substitution and most proteins arises from the fact that the crystallographic asymmetric units of virus crystals contain at least five and usually many more (see above) coat protein subunits. As a consequence, even when a highly selective heavy atom compound is identified that has only one or a very few reaction sites on the coat protein, there will be many substitution sites within the crystallographic asymmetric unit. The more substitution sites there are, generally the more difficult it is to determine their coordinates. Heavy atom sites are usually determined by Patterson techniques (though increasingly by direct methods), and these become increasingly complex (almost by the square) and difficult to interpret as the number of sites increases. Thus, virus crystals invariably present a complicated Patterson puzzle with heavy atoms.
Figure 23. Shown here is a trimer of the A, B, and C protein subunits of the virus PMV with a difference Fourier map of the orthochloro mercuriphenol heavy atom derivative superimposed. The large peaks represent the binding sites of mercury atoms. Interestingly, and puzzlingly, even though the three different subunit conformers have the same chemical composition, the heavy atom compound binds to all 5 of the A subunits, but only the B subunits of the pseudo-hexameric B and C subunits.
The good news with virus crystals, of course, is that the multiple substitution sites within the crystallographic asymmetric unit are related by icosahedral symmetry, and that symmetry is known precisely from earlier work (see above). This makes it possible, at least in principle, to identify Patterson solutions consistent with symmetry considerations, and indeed this has been put into practice. The problem of finding multiple heavy atom substitution sites has also been reduced by new approaches to analysing Patterson maps using automated procedures, [165][166][167] and by the application of direct methods that have proven remarkably successful in identifying sites even when in great numbers. [168][169][170] Another approach that does not require Patterson interpretation has also proven very useful. When phase extension from initial low-resolution phases based on a probe model fails for whatever reason, as it did, for example, with STMV, [55] it does not necessarily mean that the initial phases are worthless. While failing to extend to higher resolution, the phases may be adequate for calculating low-resolution F HA(obs) − F nat(obs) difference Fourier maps on the putative heavy atom derivatives. The difference Fourier maps can then be icosahedrally averaged within the crystallographic asymmetric unit to directly reveal the heavy atom binding sites. The important point is that even marginal low-resolution phases when combined with the NCS can provide a powerful way to locate the heavy atom sites. An example where this was done, on crystalline PMV, is shown in Figure 23.
Heavy atom parameters can subsequently be refined using conventional Blow-Crick refinement, [162,171] or some corresponding algorithm, at the maximum resolution of the heavy atom derivative data. Conventional heavy atom refinement has, in recent years, largely been supplanted by maximum-likelihood approaches. [172] From the refined parameters and observed differences, phases can be determined. An advantage of these phases over those obtained from a probe model is that they suffer no model bias, that is, the resulting structure will not reflect the structure of the probe. Once heavy atom parameters have been obtained for one heavy atom derivative, then phases based on that derivative can be used alone, or in combination with probe model phases, to locate the sites for other potential derivatives. This is again done with difference Fourier syntheses.

A general overview of structure determination
The fundamental difference between structure determination of a conventional macromolecular crystal and a virus crystal is that, in the latter, the use of NCS, density modification, and symmetry averaging provide the essential means for phase determination and improvement. [153,154,167,173,174] To see how this is applied in practice, let us assume as a starting point that the orientation of the virus particles in the unit cell has been specified and the directions of the icosahedral axes known. Assume also that a set of initial low-resolution phases have been obtained from a probe model appropriately placed in the cell, or alternately, from isomorphous replacement. The next step is to define an envelope for the capsid that excludes the exterior solvent and may exclude the interior cavity of the virus. The envelope may be the shape of the probe model, or it may be a spherical shell the approximate thickness of the protein capsid. Selection, or definition of this is not necessarily straightforward, and all subsequent phasing, solvent flattening, and electron density averaging operations depend upon its quality. It is sometimes necessary, therefore, to try different envelopes to achieve success in obtaining accurate phases. This is apparent if one considers the pronounced variations of the exterior shapes of several different T = 3 viruses like those in Figure 24. A similar problem exists, but probably to a lesser extent, on the inside of a virus particle, since density modification is applied there as well. The inside surfaces, however, are more consistently smooth and uniform. An example of a capsid shell, for PMV, is shown in Figure 25.
An electron density map is calculated using the observed structure amplitudes (measured experimentally) and phases taken from the structure factors calculated from the model probe. This map will be at low resolution, perhaps at 8 Å or 10 Å. Two operations are then carried out as one. The electron density map is solvent flattened outside the envelope (both inside and outside the particle). That is, the density outside the envelope is set to some uniform low value. The solvent-flattened map is then averaged about each of the icosahedral NCS operators. [153,154,175] Presumably this map better represents reality and contains information not present in the initial map. Using the solvent flattened, averaged electron density map, new phases at the same resolution are calculated. These phases are an improvement over the earlier phases. The process is then repeated until no further changes are evident and can be monitored by calculating R factors or correlation coefficients between the observed data and the backtransformed structure amplitudes and also by the phase changes for each cycle, especially in the highest resolution bins.
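The cycle just described (map calculation, solvent flattening, symmetry averaging, back-transformation, and recombination of new phases with observed amplitudes) can be sketched on a two-dimensional toy. The example below is an illustration under strong simplifying assumptions and is not a virus averaging program: a fourfold axis on a square grid stands in for the icosahedral NCS operators so that rotation needs no interpolation, the envelope is a circular mask, solvent is set to zero, and the 'probe' phases come from the true map with added noise. All names are hypothetical.

```python
import numpy as np


def ncs_average(rho):
    """Average a square map over the four rotations about its centre."""
    return sum(np.rot90(rho, k) for k in range(4)) / 4.0


def density_modification(f_obs, start_phases, mask, n_cycles=20):
    """Iterate flattening and averaging at fixed resolution."""
    phases = start_phases
    misfit = []
    for _ in range(n_cycles):
        # Observed amplitudes combined with the current phases.
        rho = np.fft.ifft2(f_obs * np.exp(1j * phases)).real
        rho = np.where(mask, rho, 0.0)      # solvent flattening
        rho = ncs_average(rho)              # symmetry averaging
        f_calc = np.fft.fft2(rho)           # back-transformation
        phases = np.angle(f_calc)           # new phases for the next cycle
        # Monitor: normalised amplitude misfit of the modified map.
        misfit.append(np.linalg.norm(np.abs(f_calc) - f_obs)
                      / np.linalg.norm(f_obs))
    return rho, phases, misfit


rng = np.random.default_rng(1)
n = 64
centre = (n - 1) / 2.0
y, x = np.indices((n, n))
mask = (x - centre) ** 2 + (y - centre) ** 2 < (0.35 * n) ** 2

# Synthetic 'true' structure: symmetric density inside the envelope.
true_rho = np.where(mask, ncs_average(rng.random((n, n))), 0.0)
f_obs = np.abs(np.fft.fft2(true_rho))

# Starting phases from a noisy probe model.
probe = true_rho + 0.5 * rng.standard_normal((n, n))
start_phases = np.angle(np.fft.fft2(probe))

rho, phases, misfit = density_modification(f_obs, start_phases, mask)
print(misfit[0], misfit[-1])
```

Because flattening to zero and averaging together act as a projection onto maps that obey the envelope and the symmetry, the amplitude misfit in this toy cannot increase from one cycle to the next; it is the kind of quantity one would watch to decide when 'no further changes are evident'. It does not by itself guarantee that the phases have converged to the correct values.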
At this point another set of structure factors is calculated from the solvent-flattened and symmetry-averaged electron density map, but this time the resolution of the calculated structure factors is increased by some increment, usually about one reciprocal lattice point. Phases from the slightly higher resolution F hkl−calc. are then applied to the experimentally recorded structure amplitudes and a new hybrid map is calculated at the slightly extended resolution. Again, cycles of solvent flattening, map averaging, and recalculation of maps are carried out. At the end of an appropriate number of cycles at the fixed resolution, the resolution is extended slightly in reciprocal space and the procedure continued. The word 'slightly' appears repeatedly here to caution that the extension in reciprocal space must be slow and measured. If the increment is too large, then the procedure will falter. Patience here is a great virtue. It also may well be that the increment must be evaluated by trial and error to obtain the best result. Like the envelope, the increment is a critical parameter.
Figure 24 (partial caption). What is noteworthy is that each would require an envelope definition that is quite different from those of the others. This reveals the rather critical need to find a probe, or a known, closely related virus structure, that provides a suitable initial model and envelope for icosahedral averaging and solvent levelling during the course of phase extension from low to high resolution.
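To give a sense of how slow the extension is, the number of steps implied by an increment of one reciprocal lattice point can be worked out for a hypothetical case. The sketch below assumes a cubic cell, so that one reciprocal lattice point corresponds to 1/a in reciprocal space; the cell edge of 300 Å and the 10 Å to 3 Å range are illustrative values only.

```python
import math


def extension_schedule(cell_edge, d_start, d_end):
    """Resolution limits (in Å) when extending by one reciprocal lattice
    point per step along a cubic cell edge, from d_start to d_end."""
    n_start = math.ceil(cell_edge / d_start - 1e-9)
    n_end = math.floor(cell_edge / d_end + 1e-9)
    return [cell_edge / n for n in range(n_start, n_end + 1)]


limits = extension_schedule(300.0, 10.0, 3.0)
print(len(limits) - 1, "extension steps")
print([round(d, 2) for d in limits[:4]], "...", round(limits[-1], 2))
```

With several averaging cycles at every one of these steps, the total number of cycles quickly runs into the hundreds, which is one reason the choice of increment has to be balanced against patience.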
If fully successful then, ultimately in the resolution range of 4-3 Å, an electron density map is obtained that allows identification of polypeptide chain or some secondary structural features such as beta sheet or alpha helix. The averaging envelope should be analysed against the model to assure that it fully covers it (e.g. no loops are missing due to the envelope boundary) and that the envelope is not excessively large (tight envelopes are the most effective phase restraints under solvent flattening). At this point, structure factors and phases can be calculated from the model built into the electron density map and combined with experimental phases to produce further improved maps. One also has the option of improving the model by more conventional approaches, such as 2F obs − F calc Fourier maps, or even continuing phase extension with model-enhanced phases at each increase in resolution.
Artificial particles, or VLPs, can prove useful in improving the precision of a virus model in some cases, as can an independent, parallel structure determination of crystals of the coat protein.
For example, the T = 3 virus BMV, though crystallized in numerous unit cells, never produced diffraction intensities much beyond about 3.5 Å resolution (Figure 22). Nevertheless, data at this marginal resolution allowed delineation of the capsid. Later, however, a T = 1 particle was made by cleaving the amino terminal tail from the capsid protein, and crystals of these 19 nm diameter particles diffracted to about 2.9 Å. Using X-ray data from the reduced particle crystals led to far better definition of the capsid protein, which could then be assembled according to the earlier T = 3, lower resolution structure. [32,52,121]
Figure 25. This diagram shows the variation in PMV capsid thickness and form and further illustrates the problem in defining a suitable envelope for phase determination. The interior space of the virion contains nucleic acid, which may be semi-ordered in some cases. This would, in principle, require different treatment in levelling than the exterior of the shell, which would be solvent.

Refinement of virus crystal structures
The principal feature of virus crystal structure refinement that distinguishes it from more conventional protein structure refinement [176][177][178][179] is the high degree of NCS, ranging from as little as 5-fold for some cubic crystals to as high as 60-fold, or even to multiples of 60 if there is more than one particle in the crystallographic asymmetric unit. Because the number of reflections available for refinement is a function of the crystallographic asymmetric unit size, but the number of independent atoms to be refined is determined by the size of the icosahedral asymmetric unit, the observation to parameter ratio is usually quite favourable. In the case of fully constrained NCS refinement, the ratio is very high and a substantially greater degree of precision is possible than for most protein crystals. For constrained refinement of STMV using 1.4 Å resolution X-ray data, for example, there were 570,721 independent reflections for 13,624 atomic parameters. [82]
Temperature factors are generally refined in viruses, as well as positional parameters, as they are in more conventional protein structures. As with the coordinates, NCS constraints or restraints must be applied to B factors as well. In general, temperature factors in virus structures have been treated isotropically, as resolution has not really permitted otherwise. In STMV, however, 1.4 Å resolution data did permit anisotropic temperature factors to be refined. [82] B factors for virus coat proteins tend to be no higher than are observed for proteins in general, reflecting their sturdy construction and extensive intersubunit contacts. As with protein structures, the variation in B factors is mainly indicative of the dynamics or flexibility of local regions, for example, loops and mobile termini (Figure 26).
The application of fully constrained NCS refinement to virus structures assumes that the icosahedral symmetry of the virus is exact and that it is rigorously maintained in the crystal.
There is reason to doubt that this is strictly true. Crystallographic symmetry is inviolable, but the icosahedral symmetry of a virion is a consequence of biological considerations. If all protein subunits have the same amino acid sequence, and both the particle and its chemical environment are truly isotropic, then the symmetry should be exact. On the other hand, we know that as virus particles become larger they also become softer and more deformable. Deformations from perfect sphericity might also occur when the particles are packed in a crystal lattice.
Most importantly, in a crystal lattice, local surface areas, and therefore individual protein subunits, are exposed to different environments. Some subunits may be entirely exposed to solvent and participate in no inter-particle contacts. Extended loops, or individual amino acid residues, of otherwise identical protein subunits may, on the other hand, be in close contact, or form significant interactions, with those on subunits of neighbouring particles in the lattice. There is no reason, therefore, to expect that NCS holds absolutely at the molecular level.
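The observation to parameter argument made earlier for constrained refinement can be illustrated with a short calculation. The Python sketch below is only illustrative: the function name `obs_per_parameter` and its arguments are assumptions introduced here, the STMV counts are those quoted in the text, and the 15 independent copies used for the unconstrained comparison are a hypothetical value, not a figure taken from the STMV analysis.

```python
def obs_per_parameter(n_reflections, n_params_per_copy, ncs_copies=1, constrained=True):
    """Return the number of observations per refined parameter.

    Under strict NCS constraints only one copy of the icosahedral
    asymmetric unit is refined; without them every copy is independent.
    """
    if n_reflections <= 0 or n_params_per_copy <= 0 or ncs_copies < 1:
        raise ValueError("counts must be positive")
    n_params = n_params_per_copy if constrained else n_params_per_copy * ncs_copies
    return n_reflections / n_params


# Counts quoted in the text for constrained refinement of STMV at 1.4 angstrom.
stmv_constrained = obs_per_parameter(570_721, 13_624)

# Hypothetical comparison: the same data with 15 independently refined copies.
stmv_unconstrained = obs_per_parameter(
    570_721, 13_624, ncs_copies=15, constrained=False
)

print(f"constrained:   {stmv_constrained:.1f} observations per parameter")
print(f"unconstrained: {stmv_unconstrained:.1f} observations per parameter")
```

With the quoted counts the constrained ratio is roughly 42 observations per parameter. Releasing the constraints divides that ratio by the number of copies refined independently.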
At modest resolutions of around 3 Å, or lower, it is probably safest to refine the virus structure using fully constrained refinement and exact icosahedral symmetry. This improves the observation to parameter ratio in a range where that is needed, and it represents a sound and conservative approach. At high resolution (and T = 1 particles have commonly been refined to beyond 2 Å Bragg spacings), most evidence suggests that restrained refinement is more appropriate. With restrained refinement exact icosahedral symmetry is not imposed, but the atoms within different subunits are allowed to deviate slightly (dependent on the nature and strengths of the restraints) from the mean position for all equivalent atoms. If very high-resolution data are available, say beyond 1.5 Å resolution, then it may be justifiable to refine anisotropic temperature factors for some or all atoms as well.
Figure 27. All water molecules that are firmly bound to either the protein or the nucleic acid of a hemisphere of STMV are shown here. The waters are colour coded according to their role in the structure [181] and make up a significant portion of the entire virus structure.
Figure 28. Both anions and cations are frequently found bound to the protein surfaces of viruses, and they utilize a variety of different ligands and modes of coordination. In (a) is a phosphate ion bound on a fivefold axis of STMV. In (b) is a calcium ion bound on a threefold axis of PMV. The ligands in (a) are the amides of 10 symmetrically disposed asparagine side chains. In (b) the calcium coordination involves six aspartic acid side groups.
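The difference between constrained and restrained NCS refinement can be sketched for a single atom. The Python fragment below is a minimal illustration, not the procedure of any particular refinement program. It assumes that the NCS-related copies have already been superposed into a common frame by the icosahedral operators, and the function names, the harmonic form of the penalty, the `sigma` weight and the coordinates are all hypothetical choices made for the example.

```python
from statistics import fmean


def mean_position(copies):
    """Mean of equivalent atom positions already superposed by the NCS operators."""
    return tuple(fmean(c[k] for c in copies) for k in range(3))


def apply_constraint(copies):
    """Strict NCS: every copy is replaced by the common mean position."""
    mean = mean_position(copies)
    return [mean for _ in copies]


def restraint_penalty(copies, sigma):
    """Restrained NCS: harmonic penalty on deviations from the mean position.

    A small sigma ties the copies tightly together; a large sigma lets
    them drift apart, for example at lattice contacts.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    mean = mean_position(copies)
    squared = sum((c[k] - mean[k]) ** 2 for c in copies for k in range(3))
    return squared / sigma ** 2


# Hypothetical positions (angstrom) of one atom in three NCS-related subunits.
atom_copies = [(10.0, 5.0, 2.0), (10.2, 5.0, 2.0), (10.1, 5.0, 2.0)]

constrained = apply_constraint(atom_copies)          # all copies identical
penalty = restraint_penalty(atom_copies, sigma=0.1)  # added to the refinement target
print(constrained[0], round(penalty, 3))
```

In the constrained case the three copies collapse to a single set of coordinates, so only one position is refined. In the restrained case all three positions remain parameters, and the penalty term simply makes large departures from the mean costly.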
An important feature of a virus model is the structure of the shell of ordered water molecules associated with the virus. [77,78,180] This does not include the bulk water in the crystal interstices, or at the centre of the particles, but those waters that have fixed positions by virtue of hydrogen bonding interactions with the virus. In the case of STMV, for example, it was found that ordered water molecules amounted to between 10% and 15% of the total mass of the capsid. Identifying and placing the water molecules in electron density maps is time consuming and demanding. This is particularly true with high NCS, as we find in virus crystals, where redundancies in solvent structure are difficult to detect. Nevertheless, because of their sheer number and extent, water molecules must be included. Figure 27 shows the distribution of structural water molecules in STMV. Many viruses also have ions incorporated into their structures. These may be cations, such as Mg²⁺ or Ca²⁺, which are common, [6,18,19,50] or anions, as in STMV. [82] Figure 28 provides examples. These too must be identified and introduced into the model. Discriminating ions from waters, it should be noted, is frequently a challenge.
High-resolution refinement, at 1.4 Å, of at least one virus, STMV, [82] shows that as many as 30% of the amino acid side chains, principally surface residues, on the capsid proteins have alternate, or even multiple conformations. It is unlikely that amino acids on the different subunits coordinate or synchronize their conformations. This suggests that the surface features of individual viruses are constantly changing, and that no two virus particles ever appear exactly the same. It further requires that refinement, if observations allow, must take these alternate conformations into account.
Viruses, of course, contain nucleic acid, which may be single- or double-stranded RNA or DNA. The expectation is that the absence of icosahedral symmetry within the nucleic acid structure will render it invisible in electron density maps. As a consequence, although we acknowledge its presence, we usually cannot model it in molecular terms. Even if it is identically organized in all particles in the crystal, it is physically 60-fold averaged when crystallization occurs. Currently, the only way to treat it in refinement is with something similar to a bulk solvent correction, and this is probably inadequate in most cases.
In a number of virus particles whose structures were solved by X-ray crystallography, fragments of nucleic acid, generally double helical, [182][183][184][185] but not always, [46,83,157,186] appeared in electron density maps. Figure 29 shows an example. This was because the inside of the capsid displayed sequence-independent, but secondary structure-specific, nucleic acid binding sites that were consistent with the icosahedral symmetry of the virion. Double helical segments of otherwise single-stranded RNA have twofold axes perpendicular to their helical axes, and in some viruses those dyads were observed to share twofold symmetry axes with the capsid and with the crystallographic symmetry elements. In some other cases (Figure 30), single strands of RNA were seen filling the cavities within pentameric or hexameric capsomeres. [182]
A measure of the degree to which segments of nucleic acid are ordered within a virion, and to what extent they conform to the icosahedral symmetry of the particle, is the distribution of temperature factors for the atoms in the segment. Nucleic acid structures, in general, have significantly higher temperature factors than do proteins, but nevertheless the variation among the nucleotides can be telling. In STMV, for example, helical segments bound to the interior of the protein exhibited a steep gradient. The nucleotides bound near the icosahedral dyad and in close contact with protein had unusually low B factors (Figure 31), but these increased as the segment became less ordered, or less consistent with the icosahedral symmetry. At the visible termini of the segment the temperature factors were over 150 Å².
The nucleic acid must, when it is evident in the structure, be modelled and refined along with the coat protein, water and ions. This adds some complication to the refinement process, particularly when the symmetry axis of a sequence-generic nucleic acid helix lies on a symmetry axis of the particle.
It has also been observed that subunits having a specific conformation (e.g. the A subunits of a T = 3 particle) may bind nucleic acid while others do not. Protein and nucleic acids require different geometrical parameter dictionaries and the application of different restraints.
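To give a physical sense of the temperature factors quoted above for the STMV nucleic acid, a B factor can be converted to a root-mean-square atomic displacement. The Python sketch below assumes the standard isotropic relation B = 8π²⟨u²⟩. The function name is introduced here for illustration, and the value of 20 Å² used for comparison is a hypothetical example of a well-ordered atom, not a figure from the text.

```python
import math


def rms_displacement(b_factor):
    """Root-mean-square displacement (angstrom) for an isotropic B factor (angstrom^2).

    Uses B = 8 * pi^2 * <u^2>, so u_rms = sqrt(B / (8 * pi^2)).
    """
    if b_factor < 0:
        raise ValueError("B factor must be non-negative")
    return math.sqrt(b_factor / (8.0 * math.pi ** 2))


# B factor quoted in the text for the visible termini of the RNA segments.
u_termini = rms_displacement(150.0)

# Hypothetical B factor for a well-ordered atom, for comparison only.
u_ordered = rms_displacement(20.0)

print(f"B = 150: u_rms = {u_termini:.2f} angstrom")
print(f"B =  20: u_rms = {u_ordered:.2f} angstrom")
```

A B factor of 150 Å² corresponds to an r.m.s. displacement of about 1.4 Å, which is comparable to a covalent bond length and is consistent with the segment fading from the map at its termini.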
If there is one structural property of viruses that continues to remain a mystery, it is the conformation of RNA and DNA within the virion. Thus, observations of nucleic acid that provide clues are always of special interest and deserve particular attention when a virus structure is refined. In searching for nucleic acid segments, or the shadows of disordered pieces, it may be useful to take further advantage of the fact that experimentally determined phases for many viruses are so accurate that, in some cases, model phases are best traded in their favour.
An example is TYMV, a T = 3 plant virus. It is unusual in that empty virions, devoid of RNA, are also produced naturally during infection. The full and empty virions can be separated by centrifugation and crystallized isomorphously. Crystals of each were solved independently by phase extension, producing corresponding sets of observed structure amplitudes plus their experimentally determined phases. It was then possible to calculate Fnat - Fempty difference Fourier syntheses using structure factors determined exclusively experimentally. These maps were entirely free of model influence and displayed (unlike conventional difference Fourier syntheses using phases calculated from the native model) difference density belonging exclusively to ordered nucleic acid. By using such a structure factor difference Fourier synthesis, a substantial portion of the structured RNA in TYMV was visualized. [83]
In cases where crystallographic symmetry is low, the symmetry of the icosahedral virus is often seen as pseudosymmetry in the diffraction pattern. Even in cases where it is not obvious to the eye, statistical searches such as the rotation function can detect it. This implies that there is a correlation among intensities in the diffraction pattern due to symmetry in the structure. The eye and the rotation function, however, examine only the structure amplitudes or intensities. If correlations can be detected among intensities, then it follows that correlations must also exist among phases, and of course they do. In principle, the relationship between phases of correlated structure amplitudes, as in direct methods, could provide additional phase information to a structure analysis. So far, however, this does not appear to have been explored or utilized.
The redundancy of structure that exists in real space due to icosahedral NCS has an exact analogue in reciprocal space, as is true of all properties of real and reciprocal space, though they may not always be obvious or intuitive. In the case of icosahedral viruses this means that there is a correlation between the magnitudes and phases of independent structure factors, Fhkl, that would not exist in the absence of NCS. [174] In practice, this means that when a virus model is refined, the Fhkl in an isolated test set participate indirectly in the refinement through their correlation with structure amplitudes in the working set. This occurs regardless of the approach used (nonlinear least squares, maximum likelihood, etc.) when general Fhkl are employed to minimize the working R factor, or an equivalent residual. It follows, then, that Rfree is not fully independent of R and is also minimized as R is reduced.
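For reference, the two residuals under discussion can be written out explicitly. The Python sketch below uses the conventional definition R = Σ| |Fobs| - k|Fcalc| | / Σ|Fobs|, evaluated separately over the working and test reflections. The function names, the fixed scale factor `k = 1` and the four amplitudes are assumptions made purely for illustration.

```python
def r_factor(f_obs, f_calc, scale=1.0):
    """Conventional crystallographic residual over one set of amplitudes."""
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("need two non-empty lists of equal length")
    numerator = sum(abs(o - scale * c) for o, c in zip(f_obs, f_calc))
    return numerator / sum(f_obs)


def r_work_and_r_free(f_obs, f_calc, is_test):
    """Split reflections by their test flag and return (R, Rfree)."""
    work = [(o, c) for o, c, t in zip(f_obs, f_calc, is_test) if not t]
    test = [(o, c) for o, c, t in zip(f_obs, f_calc, is_test) if t]
    if not work or not test:
        raise ValueError("both working and test sets must be non-empty")
    r_work = r_factor([o for o, _ in work], [c for _, c in work])
    r_free = r_factor([o for o, _ in test], [c for _, c in test])
    return r_work, r_free


# Hypothetical amplitudes; the last reflection is flagged as test.
f_obs = [100.0, 200.0, 150.0, 50.0]
f_calc = [90.0, 210.0, 150.0, 60.0]
flags = [False, False, False, True]

r_work, r_free = r_work_and_r_free(f_obs, f_calc, flags)
print(f"R = {r_work:.3f}, Rfree = {r_free:.3f}")
```

The calculation itself treats the two sets as separate. The dependence described in the text arises because NCS correlates the test amplitudes with working amplitudes, not because of anything in the formula.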
The correlation among structure factors is seen in practice by the tendency of Rfree to converge to a value close to R for many virus structures that have been refined with icosahedral constraints. Hence, one must be circumspect in evaluating the quality of refinements carried out on icosahedral virus crystals and not place more weight on the value of Rfree than it truly merits. The histogram in Figure 32 shows the distribution of R and Rfree for the virus structures in the PDB. The average difference between R and Rfree for all of the virus structures is significantly less than the average for most protein structures in the corresponding resolution ranges.
Figure 33. As with protein structures, a reliable and valuable way to assess the quality of a virus structure determination and its refinement is the Ramachandran plot that examines the distribution of phi and psi torsion angles for each amino acid. The plot shown here is for DYMV.
The problem of the correlation of R and Rfree can be obviated to a great extent by judicious selection of the structure amplitude test set. [187] The correlation between structure factors due to NCS exists only among reflections having the same sin θ. If test sets are chosen that include all reflections within narrowly defined (thin) shells of resolution, then no correlations will be present between the reflections actually used for refinement and those in the test set. If this approach is applied, then Rfree will not track R as refinement progresses. It should be noted that in addition to statistical residuals, other measures of model quality such as Ramachandran plots (Figure 33) are equally appropriate for virus structures as for protein structures.
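The thin-shell selection just described can be sketched as follows. This Python fragment is an assumption-laden illustration and not the procedure of reference [187] or of any specific program: it bins reflections into equal-width shells in 1/d and flags every reflection in every `every`-th shell as a test reflection. The function name, the number of shells and the sampling interval are hypothetical choices.

```python
def thin_shell_test_flags(d_spacings, n_shells=100, every=20):
    """Flag all reflections in every `every`-th thin resolution shell as test.

    Shells have equal width in 1/d, so reflections with the same
    sin(theta) always share a shell and therefore share a flag.
    """
    if not d_spacings or n_shells < 1 or every < 1:
        raise ValueError("need reflections and positive shell settings")
    s_values = [1.0 / d for d in d_spacings]
    s_min, s_max = min(s_values), max(s_values)
    width = (s_max - s_min) / n_shells or 1.0  # guard against a zero range
    flags = []
    for s in s_values:
        shell = min(int((s - s_min) / width), n_shells - 1)
        flags.append(shell % every == every - 1)
    return flags


# Hypothetical d-spacings (angstrom); two reflections share d = 3.08.
d_values = [10.0, 5.0, 3.08, 3.08, 2.0, 5.0 / 3.0]
test_flags = thin_shell_test_flags(d_values, n_shells=10, every=5)
print(test_flags)
```

The point of the design is that a whole shell goes into the test set at once, so no working reflection has the same sin θ as a test reflection. In practice the shell width and the fraction of shells withheld would need to be chosen so that the test set remains large enough to be statistically useful.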
It has been suggested that averaging of structure using the high NCS of icosahedral viruses effectively increases the resolution of virus electron density maps. This is erroneous. Although it may be true that the use of high NCS enhances the contrast, or the level of detail, or improves the general interpretability, the true resolution of a Fourier synthesis remains limited by the maximum Bragg angle of the structure factors included in the synthesis. Resolution, of course, is strictly defined as the distance at which two scattering centres appear as two individual points. The perception that the resolution has been increased by NCS averaging in virus electron density maps arises from the pleasant fact that the phases obtained by NCS averaging and density modification for icosahedral viruses are very good in comparison with the experimental phases obtained for most protein crystals using other phase determination approaches.
Averaging using high NCS and phase extension, [175] with flattening outside of a capsid envelope, [156] as is currently employed for virus structure determination, produces phases of exceptional accuracy, with lower average error than would probably be achieved with isomorphous replacement or even anomalous dispersion methods. It is the enhanced phase quality that produces the more detailed Fourier maps (Figure 34), but the resolution is not increased.
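The limit set by the maximum Bragg angle follows directly from Bragg's law, d = λ / (2 sin θ). The short Python sketch below computes the minimum Bragg spacing for a given wavelength and maximum scattering angle. The function name and the numerical values are hypothetical and serve only to show that the result depends on the data included in the synthesis and not on how the phases were obtained.

```python
import math


def minimum_d_spacing(wavelength, two_theta_max_deg):
    """Smallest Bragg spacing (angstrom) reachable at a maximum 2-theta angle."""
    if wavelength <= 0 or not 0 < two_theta_max_deg <= 180:
        raise ValueError("wavelength must be positive and 0 < 2-theta <= 180")
    theta = math.radians(two_theta_max_deg / 2.0)
    return wavelength / (2.0 * math.sin(theta))


# Hypothetical experiment: 1.0 angstrom radiation, data recorded to 2-theta = 60 degrees.
d_min = minimum_d_spacing(1.0, 60.0)
print(f"d_min = {d_min:.2f} angstrom")
```

No term for phase accuracy or NCS redundancy appears in this relation, which is the sense in which averaging improves the interpretability of a map without changing its nominal resolution.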
Criticism has, in some instances, been directed at the frequently low completeness of the data recorded in the higher resolution shells of reciprocal space for some virus crystals. This is usually a consequence of the general problems in collecting X-ray data from virus crystals as described above. The argument is made, for example, that 25% completeness in the 3.0-2.8 Å resolution shell does not permit the claim of 2.8 Å resolution for electron density maps calculated from those data, and that the true resolution is lower. There is some truth to this, and no one would argue that a better electron density map would not be obtained if 100% of the data in the 3.0-2.8 Å resolution shell were available. It is not necessarily true, however, that the resolution is reduced significantly by only partial completeness. This is again because redundancy and correlation of structure in real space is reflected as correlations between structure amplitudes and their phases in reciprocal space. Thus, if the asymmetric unit of the crystal was, for example, a quarter of a virion, implying 15-fold redundancy in real space, then 25% of the X-ray data in any resolution shell would represent an adequate sampling of reciprocal space.
Professor McPherson has written several books on macromolecular crystallization, and the analysis of macromolecular crystal structures by X-ray diffraction. He has taught in the Cold Spring Harbor course on X-ray crystallography of biological molecules for 28 years. He was also principal American investigator for macromolecular crystallization on the US Space Shuttle, Russian Space Station, and International Space Station NASA programmes. His principal interests are the study of enzyme and immunoglobulin structure, and the structures of viruses using the techniques of X-ray crystallography and atomic force microscopy.

Steven B. Larson was born in the USA (California