Structure at 2.3 Å resolution of the gene 5 product of bacteriophage fd: a DNA unwinding protein.

Abstract The structure of the gene 5 DNA unwinding protein from bacteriophage fd has been determined by X-ray diffraction analysis of single crystals to 2.3 Å resolution using six isomorphous heavy-atom derivatives. The essentially globular monomer appears to consist of three secondary structural elements, a radically twisted three-stranded antiparallel β sheet and two distinct antiparallel β loops, which are joined by short segments of extended polypeptide chain. The molecule contains no α-helix. A long groove, or arch, 30 Å in length is formed by the underside of the twisted β sheet and one of the two β ribbons. We believe this groove to be the DNA binding region, and this is supported by the assignment of residues on its surface implicated in binding by solution studies. These residues include several aromatic amino acids which may intercalate or stack upon the bases of the DNA. Two monomers are maintained as a dimer by the very close interaction of symmetry related β ribbons about the molecular dyad. About six residues at each of the amino and carboxyl termini are in extended conformation and both seem to exhibit some degree of disorder. The amino-terminal methionine is the locus for binding the platinum heavy-atom derivatives and tyrosine 26 for attachment of the major iodine substituent.

Introduction
The gene 5 product of bacteriophage fd is a small DNA binding protein of 9800 molecular weight containing 87 amino acids (Alberts et al., 1972; Oey & Knippers, 1972). Its primary physiological role is the stabilization and protection of single-strand DNA daughter virions from duplex formation following replication in the host (Salstrom & Pratt, 1971). Because of the highly co-operative nature of its binding to DNA, it further has the capacity to unwind or destabilize native DNA. This can be seen as a reduction by nearly 40 deg. C of the melting temperature for double-stranded DNA in the presence of the gene 5 product (Salstrom & Pratt, 1971). The protein is made in about 100,000 copies per infected Escherichia coli cell and can be isolated in substantial amounts by DNA cellulose chromatography. Its sequence (see Fig. 1) has been determined (Nakashima et al., 1974) and extensive biochemical and biophysical characterization has been carried out (Coleman et al., 1976; Anderson et al., 1975; Day, 1973; Pretorius et al., 1975). Evidence from these studies indicates that electrostatic interactions between basic residues of the protein and the phosphate groups of the polynucleotide backbone are involved in binding, and that aromatic residues are likely to intercalate or stack upon the purine and pyrimidine bases during complex formation (Coleman et al., 1976; Pretorius et al., 1975).
FIG. 1. Amino acid sequence of the gene 5 protein (from Nakashima et al., 1974).
Electron microscopy studies on the gene 5 protein (Alberts et al., 1972; Pratt et al., 1974), which exists predominantly as a dimeric species in solution (Cavalieri et al., 1976), suggest that the protein binds to DNA strands running in opposite directions so that it would crosslink opposing strands of a duplex or opposite sides of circular single strands of DNA. The mechanism for DNA duplex unwinding is simply a linear aggregation along the two opposite strands deriving from the highly co-operative nature of the lateral binding interaction (Dunker & Anderson, 1975). This high degree of co-operativity presumably is a product of strong protein-protein forces between adjacent molecules of the gene 5 protein along the DNA strands.
Approximately two years ago we were fortunate in being able to obtain this protein in a form suitable for single-crystal X-ray diffraction analysis (McPherson et al., 1976). Since these crystals marked the first time that a truly DNA interactive protein had presented itself for such analysis we began a three-dimensional study. This we hoped would allow us, both from the native structure and from crystals of gene 5-DNA complexes, which we have now obtained (unpublished results), to deduce with high precision the atomic interactions which mediate the non-specific recognition and binding of DNA by protein molecules.
The crystals are of monoclinic space group C2 with one gene 5 monomer per asymmetric unit, immediately implying that the gene 5 dimer contains a perfect molecular dyad axis relating its two halves. The unit cell dimensions are a = 76.5, b = 28.0, c = 42.5 Å and β = 108°. The resolution of the diffraction data extends to at least 1.2 Å Bragg spacings, and radiation damage to the crystals appears negligible for up to 60 hours or more.
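As a rough consistency check on the cell constants quoted above, the following sketch computes the monoclinic cell volume and the volume per dalton of protein. It assumes four asymmetric units per C2 cell and the 9800 molecular weight given in the Introduction; the 1.23 factor in the solvent estimate is a common approximation for proteins, not a figure from this work, and the function names are illustrative only.

```python
import math

# Unit cell parameters quoted in the text (monoclinic, space group C2).
a, b, c = 76.5, 28.0, 42.5      # Angstroms
beta_deg = 108.0                # degrees

# Assumptions: C2 has 4 general positions per cell, with one monomer
# of about 9800 daltons in each asymmetric unit.
ASYM_UNITS_PER_CELL = 4
MONOMER_MW = 9800.0             # daltons

def monoclinic_volume(a, b, c, beta_deg):
    """Volume of a monoclinic cell: V = a * b * c * sin(beta)."""
    return a * b * c * math.sin(math.radians(beta_deg))

def volume_per_dalton(volume, mw, n_molecules):
    """Crystal volume per dalton of protein, in A^3/Da."""
    return volume / (mw * n_molecules)

volume = monoclinic_volume(a, b, c, beta_deg)
vm = volume_per_dalton(volume, MONOMER_MW, ASYM_UNITS_PER_CELL)

# Approximate solvent fraction (assumed rule of thumb: 1 - 1.23 / Vm).
solvent_fraction = 1.0 - 1.23 / vm

print(f"V  = {volume:.0f} A^3")
print(f"Vm = {vm:.2f} A^3/Da")
print(f"solvent fraction ~ {solvent_fraction:.2f}")
```

Under these assumptions the volume per dalton comes out near 2.2 Å³, which is within the range usually seen for protein crystals.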
We report the structure determination of these crystals using the isomorphous replacement technique and describe some of the initially interesting features of the molecule. We will defer a detailed description of the orientations and interactions of the amino acid side-chains until some degree of refinement has been completed.

Methods
[...] to the mother liquor. The iodine derivative was produced by direct addition of 5 mM-KI and fi mM-I2 to crystals for a period of 5 days before data collection. The heavy-atom derivatives search was conducted using Buerger precession cameras with a crystal to film distance of 90 mm. The X-ray source was an Elliott rotating anode generator operated at 40 kV and 40 mA with a focal spot size of 200 μm². The X-radiation was nickel filtered (CuKα), with 18 h exposures at 25°C.
The photographic trials were converted to sets of integrated intensities via an Optronics P-1000 high-speed rotating-drum microdensitometer using a 100 μm raster size and interfaced directly to a PDP 11/40 computer. Three-dimensional X-ray diffraction data were collected on a Picker FACS-I diffractometer fitted with a 1600 W Philips fine focus X-ray tube to 2.3 Å using the step scan mode (Wyckoff et al., 1967). Friedel pairs were recorded at ±2θ for all compounds and, in general, the 3700 independent reflections could be obtained from 1 or at most 2 crystals. In addition, each data set was collected twice on different crystals and averaged with merging residuals of no more than 3.5% on F. Errors were measured from counting statistics (Arndt & Willis, 1966). Scaling of derivative to native structure amplitudes was carried out in shells of sin²θ/λ² and the R factors for each compound used in isomorphous replacement are shown in [...] , 1960) and is shown in Fig. 4. The major and minor iodine sites were readily found from difference Fourier maps as peaks 2 to 4 times background level and were confirmed by the corresponding difference Patterson syntheses.
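The figure of about 3700 independent reflections to 2.3 Å can be checked against the unit cell with a simple counting argument. The sketch below is an estimate, not part of the original data reduction: it counts reciprocal lattice points inside the resolution sphere and divides by 2 for the C-centring absences and by 4 for the order of Laue group 2/m. Both divisors and the function name are assumptions introduced here.

```python
import math

def expected_unique_reflections(volume, d_min, centering_factor, laue_order):
    """Estimate the number of unique reflections out to spacing d_min.

    The count of reciprocal lattice points inside a sphere of radius
    1/d_min is (4/3) * pi * V / d_min^3; systematic absences from lattice
    centring and Laue symmetry reduce the unique set.
    """
    points_in_sphere = (4.0 / 3.0) * math.pi * volume / d_min ** 3
    return points_in_sphere / (centering_factor * laue_order)

# Cell constants from the text: a, b, c in Angstroms, beta in degrees.
volume = 76.5 * 28.0 * 42.5 * math.sin(math.radians(108.0))

# Assumed: C-centred lattice (factor 2), Laue group 2/m (order 4).
n_unique = expected_unique_reflections(volume, 2.3,
                                       centering_factor=2, laue_order=4)
print(round(n_unique))
```

The estimate lands close to the 3700 reflections reported, which suggests the data set is essentially complete to 2.3 Å.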
Additional difference Fourier rounds using all compounds and various combinations permitted location of the minor sites for all derivatives. While the degree of substitution and the isomorphism of the derivatives appear to be quite good, the overall distribution of heavy-atom sites for the gene 5 crystals is rather poor in that most of the positions lie clustered very close to x = 0.00 and z = 0.00. Thus we do not expect that the quality of phasing by these derivatives is as good as might be expected if they were distributed at completely general sites. The iodine derivative shows obvious signs of non-isomorphism and was not used beyond 3.0 Å resolution. The phase angles for the native structure were calculated using the isomorphous phasing method of Dickerson et al. (1961) and the heavy-atom parameters were refined with the error treatment of Blow & Crick (1959) to the values seen in Table 2. The program employed for the calculations was that written by Rossmann and his colleagues (Adams et al., 1969). Phase angles were calculated at 5° intervals and cycles of phasing and least-squares minimization of lack of closure were alternated.
The refinement was carried out in shells of sin θ and the relevant statistics are shown in Table 3. Anomalous dispersion contributions were incorporated in the phasing for all derivatives except iodine as described by North (1965). The figure of merit and refinement residuals as a function of resolution are shown in Table 4 and the distribution of terms as a function of figure of merit in Table 5.
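The phasing step described above, in which phase probabilities are sampled at 5° intervals from the lack of closure of each derivative, can be illustrated with a minimal sketch. This is not the Rossmann program used in the work: it assumes a simple Gaussian error model in the spirit of Blow & Crick (1959), omits anomalous dispersion, and uses invented amplitudes and heavy-atom contributions purely for demonstration.

```python
import cmath
import math

def phase_probability(f_p, derivatives, step_deg=5):
    """Sample the native phase probability on a grid of trial phases.

    f_p         : native structure amplitude |F_P|
    derivatives : list of (|F_PH| observed, complex F_H, rms error E)
    Returns (angles in radians, normalized probabilities).
    """
    angles = [math.radians(a) for a in range(0, 360, step_deg)]
    log_p = []
    for alpha in angles:
        total = 0.0
        for f_ph_obs, f_h, err in derivatives:
            # Lack of closure: observed minus calculated derivative amplitude.
            f_ph_calc = abs(f_p * cmath.exp(1j * alpha) + f_h)
            closure = f_ph_obs - f_ph_calc
            total -= closure ** 2 / (2.0 * err ** 2)
        log_p.append(total)
    peak = max(log_p)
    weights = [math.exp(v - peak) for v in log_p]
    norm = sum(weights)
    return angles, [w / norm for w in weights]

def best_phase_and_fom(angles, probs):
    """Centroid of the distribution: its angle is the 'best' phase and
    its length is the figure of merit."""
    centroid = sum(p * cmath.exp(1j * a) for a, p in zip(angles, probs))
    return math.degrees(cmath.phase(centroid)) % 360.0, abs(centroid)

# Hypothetical reflection: true phase 60 degrees, two derivatives.
f_p, true_phase = 100.0, math.radians(60.0)
f_h1 = cmath.rect(30.0, math.radians(20.0))
f_h2 = cmath.rect(25.0, math.radians(150.0))
native = f_p * cmath.exp(1j * true_phase)
derivatives = [(abs(native + f_h1), f_h1, 5.0),
               (abs(native + f_h2), f_h2, 5.0)]

angles, probs = phase_probability(f_p, derivatives)
best_phase, fom = best_phase_and_fom(angles, probs)
print(f"best phase = {best_phase:.1f} deg, figure of merit = {fom:.2f}")
```

A single derivative leaves a twofold phase ambiguity; in this toy case the second derivative, with a heavy-atom vector pointing in a different direction, removes it. Heavy-atom sites clustered near one another give similar vectors and resolve the ambiguity less well, which is the concern raised above about the site distribution.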
The native Fourier for gene 5 protein was calculated on planes xz using the program GENFOUR written by George Reeke at Rockefeller University. The map was contoured using a program written by Peter Campbell-Smith at Hershey and displayed on a Calcomp plotting device interfaced directly to a PDP 11/40 computer. The sections were then contoured onto acetate sheets and displayed in the conventional manner for visualization.
The electron density map of the gene 5 DNA unwinding protein is of good quality and from our experience compares favourably with those used to interpret a number of other macromolecular structures.
One group of map sections containing the density of a strand of a β sheet running in the xz plane and a part of the β loop running more or less perpendicular to the sections is shown in Fig. 5 as an example of the map quality. The density map was interpreted in terms of the polypeptide backbone by examination of 9 in x 9 in mini maps. These were then photographed, projected and drawn onto 1 m x 1 m Mylar sheets and these used in a Richards Optical comparator (Richards, 1968) to construct a trial model of the structure with Kendrew model parts.

Results and Discussion
The monomer, whose gross shape is seen in the wooden model of Figure 6, is essentially globular with a protruding appendage of density lying close to the molecular dyad. It is roughly 45 Å long, 25 Å wide and 30 Å high. The entire course of the polypeptide chain derived from our present 2.3 Å electron density map is shown in the stereo photographs of Figures 7 and 8. The molecular dimer which exists in solution must have the form shown in Figure 9. The known sequence of the protein has been fitted to the tracing of the polypeptide with no serious inconsistencies. There are three short regions of the chain which either contain portions of weak density or appear to show some measure of disorder. The bend of the β loop containing residues 22 to 27 is seen to be somewhat disperse. This bend, however, freely protrudes into the solvent-filled interstices and might well be expected to show some positional flexibility even in the crystal. The other two sequences in question are 1 to 7 at the N terminus and 82 to 87 at the C terminus, both of which are partly in solvent regions. The N-terminal sequence appears to occupy either of two positions displaced slightly from one another. Those amino acids forming the C terminus simply show some degree of smearing out that we interpret as small statistical variations about a mean position. The tracing illustrated in Figures 7 and 8, therefore, is stated with a substantial degree of confidence.
The large peak at z = 0.06, z = 0.16 correlates exactly with that found in the isomorphous difference Patterson.
The protein is composed entirely of antiparallel β structure and short lengths of extended chain with no α-helix whatsoever. This is consistent with predictions based on physical-chemical data (Day, 1973) and sequence-structure rules (Coleman et al., 1976). There are three basic elements of secondary structure that form the framework of the molecule, a three-stranded antiparallel β sheet (I) arising from residues 12 to 49, a two-stranded antiparallel β ribbon (II) formed by residues 50 to 70, and a second two-stranded antiparallel β ribbon (III) derived from residues 71 to 82. The arrangement of these elements as they occur in tracing from NH2 to COOH terminus, and presumably the order in which they appear as the protein folds after synthesis, is shown in the series of drawings in Figures 10 and 11. The three-stranded sheet (I) emphasized in Figure 12 is not at all a flat surface but, as is characteristic of small β sheets with short strands, is radically twisted. This severe distortion of the sheet produces a concavity or corridor along the underside of the molecule and is seen in the gross density as an overhanging ledge or archway. The β ribbons (II, III) emphasized in Figure 13 are also extremely twisted in character so that a precise assignment of the hydrogen bonding scheme may be somewhat tenuous and must await refinement of the atomic positions.
The β ribbon (II) lying close to the molecular dyad and at an angle of about 120° to the β sheet has two effects on the distribution of density. First, it enhances the tunnel-like nature of the concavity running beneath the sheet by creating a cradle at the point where the β ribbon most closely passes that of the β sheet. In addition, it results in a deep indentation in the monomer at the point where the interior edge of the β ribbon passes closest to the edge of the β sheet. The long groove beneath the three-stranded β sheet (I), by its shape and extent (~30 Å), would suggest it to be the DNA binding region. There is no other passage through the density that appears to be consistent with a long polynucleotide binding site. This contention is supported by the particular amino acid side-chains that form its surface. These include tyrosine 26, phenylalanine 13, cysteine 33 and tyrosines 41 and 56, all of which have been implicated in the binding of DNA by nuclear magnetic resonance and chemical modification studies (Coleman et al., 1976; Anderson et al., 1975; Day, 1973; Pretorius et al., 1975). In addition, the sheet is comprised of residues from the NH2-terminal half of the polypeptide, again the portion of the sequence implicated in binding by these studies (Coleman et al., 1976).
FIG. 4. Section y = 0.00 of the difference Patterson synthesis of Pt(NH3)2Br2-K2ReO4 derivatives using as coefficients (|F_Pt| - |F_Re|)². The large negative peak near the lower right corner corresponds exactly to the vector between the platinum site and the rhenium position on the dyad axis. This peak, approximately 3 times more negative than any other on the map, serves to confirm both the platinum and rhenium co-ordinates.
Tyrosine 41, tyrosine 34 and phenylalanine 13 appear to stack upon one another at one end of the binding groove and look to be in position to interact with two consecutive bases of the bound DNA. Tyrosine 56 projects into the cradle region and is in position to stack upon bases of the polynucleotides as well. Tyrosine 26, on the other hand, is at the other end of the groove and is not paired with any other aromatic residues. There do not appear to be any additional aromatic side-chains extending into the binding region.
The single cysteine residue 33 is a part of the second strand of the β sheet (I) and therefore is at the edge of the molecule and part of the DNA binding groove. The side-chain, however, is turned up inside the density and sequestered in this conformation of the molecule. The adjacent tyrosine 34 is protected rather well from the solvent by other residues, consistent with its incapacity to be chemically modified. It does not appear to be in close contact with the sulphur of cysteine 33 but does seem to be interleaved between tyrosine 41 and phenylalanine 13. Our tentative assignment of the initial two thirds of the polypeptide chain was confirmed by the location of the major iodine binding site. This falls less than two angstroms from our choice for tyrosine 26, reported by Coleman et al. (1976) to be one of the three exposed tyrosine residues. Indeed it does occur on the underside of the three-stranded sheet (I) near the turn involving strands 1 and 2 and it is fully exposed. The minor iodine site is very close to the major rhenium site and does not fall near a tyrosine residue. Because ReO4- also binds there, this site may represent a nonspecific anion binding site in the molecule. Alternatively, this site is very near histidine 64, and this amino acid can also be iodinated.
The second major concavity in the electron density mass at the angle between the β ribbon (II) and the three-stranded β sheet (I) and seen best in Figure 7 is not an extended groove, and is furthermore not a candidate in the dimer for DNA binding. Because of the molecular dyad relating the two monomers in the dimer, the cavity is entirely filled by the symmetry related density of the β ribbon (II'). This symmetrical association between two dyad related loops II and II' is undoubtedly the interaction that maintains the dimer as the predominant species in solution. The actual contact area would seem to involve, at least to some extent, a large number of hydrophobic residues, although other kinds of side-chains are present as well. The amino acid sequence 54 to 59, Pro-Ala-Tyr-Ala-Pro-Gly, forms part of one strand of loop II closest to the dyad. Since this sequence would not be expected to readily form β structure it is unclear how much β character to ascribe to loop II. Loop II also contains the single histidine residue 64 which seems to have the imidazole group directed back into the central density of the loop. It too is unreactive in the crystal toward heavy metal compounds.
FIG. 9. A drawing of the 2 gene 5 polypeptide chains that comprise the dimer showing their association about the molecular dyad through interaction of the symmetry related β loops II and II'.
Coleman et al. (1976) pointed out that the gene 5 monomer contains a high proportion of hydrophobic residues and, because of its small size, an appreciable number of these must be exposed to solvent. The dimer interaction, however, will cause many of these hydrophobic residues that would otherwise be exposed to be shielded from solvent. This may be part of the driving force that maintains the dimer in solution.
Since additional residues will likely be shielded from solvent by lateral interactions when binding occurs along the DNA strands, this same mechanism may also be largely responsible for the highly co-operative nature of the protein binding. The second β ribbon (III) lies almost directly above and diagonally across the three-stranded sheet (I); it is, however, separated for the most part from the DNA binding region by the sheet. The bend is formed from residues 75 to 78, Ser-Leu-Met-Ile. The non-terminal methionine, although apparently accessible to solvent, could not be reacted with heavy metal reagents in the crystal. This seems to result from the close juxtaposition of the bend with electron density from a second molecule in the unit cell with concomitant obstruction to small compounds. We believe that the β loop III and the residues of the carboxy-terminal region could be primarily involved in the protein-protein interactions responsible for the co-operative, contiguous binding along the DNA strands.
The final six residues at the COOH terminus are in extended conformation but coiled back into the central mass of the molecule. The electron density corresponding to these amino acids is the most difficult in the map to interpret and suggests some disorder in this part of the polypeptide chain. The initial ten residues of the polypeptide chain are also in extended conformation and wind from the amino-terminal methionine over the top of the molecule and enter strand 1 of the β sheet (I). The sulphur atom of methionine 1 lies only about 5 Å from the molecular dyad and is the residue responsible for binding the platinum heavy-atom derivatives.
The electron density map shows that the N terminus occupies either of two positions which are separated by about 5 Å. Thus we see in the map two large density peaks for the sulphur but in each case they are joined back to the polypeptide chain at lysine 3 by two separate branches of density assigned to the two alternate positions for Ile 2. This disorder caused by two alternate positions in the crystal for the first three residues also explains the secondary sites for the platinum derivatives.
FIG. 11. A schematic drawing of the gene 5 secondary structural elements arranged in space when viewed along the z direction.
FIG. 12. Stereophotograph of the polypeptide backbone as in Fig. 8 with the 3-stranded β sheet (I) emphasized.
The gene 5 protein would seem, in structure-function terms, to be a very economical molecule, and we believe we can tentatively assign one of its three major functional responsibilities to each of its three prominent secondary structural elements. The three-stranded antiparallel β sheet (I) is primarily responsible for binding to and interacting with the DNA. The two-stranded antiparallel ribbon (II) maintains the dimer in solution by tightly interlocking with a symmetry related loop (II'). The second β ribbon (III) we believe to be involved in the co-operative protein interactions with neighbouring molecules. Thus, it would appear that in the gene 5 protein, we find the minimum amount of structure expended to fulfil the required functional demands.
FIG. 13. Stereophotograph of the polypeptide backbone as in Fig. 7 with the 2 antiparallel loops II and III emphasized.
The molecular packing is such that there are some extensive solvent regions separating dimers from each other and channels through the crystal as well. The molecules, however, are stacked very closely along the y direction by both translation and as a result of the 2₁ screw axis at x = 1/4, z = 1/2. This has the net effect of filling or blocking the presumptive DNA binding groove with density from neighbouring molecules, thereby explaining our inability to bind short oligonucleotides, from one to four in length, to the protein as it exists in these crystals. Thus it does not appear that we will be able to determine the structure of gene 5-oligonucleotide complexes by simple diffusion and difference Fourier synthesis.
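The screw axis mentioned above follows from combining the dyad of space group C2 with the C-centring translation. The sketch below lists the four general equivalent positions of C2 (unique axis b, as tabulated in standard references) and checks that the image generated by a 2₁ axis through x = 1/4, z = 1/2 is one of them. The fractional coordinates used are hypothetical and the function names are illustrative.

```python
from fractions import Fraction as Fr

def c2_equivalents(x, y, z):
    """General equivalent positions of space group C2 (unique axis b),
    reduced to the range [0, 1)."""
    half = Fr(1, 2)
    ops = [
        (x, y, z),                       # identity
        (-x, y, -z),                     # twofold axis along b at x = 0, z = 0
        (x + half, y + half, z),         # C-centring translation
        (-x + half, y + half, -z),       # twofold followed by centring
    ]
    return [tuple(coord % 1 for coord in op) for op in ops]

def screw_axis_image(x, y, z):
    """Image under a 2_1 screw axis parallel to b through x = 1/4, z = 1/2:
    rotate 180 degrees about the axis, then translate by b/2."""
    return ((Fr(1, 2) - x) % 1, (y + Fr(1, 2)) % 1, (1 - z) % 1)

# Hypothetical site in fractional co-ordinates.
site = (Fr(1, 10), Fr(1, 5), Fr(3, 10))
equivalents = c2_equivalents(*site)
image = screw_axis_image(*site)

for position in equivalents:
    print(tuple(str(coord) for coord in position))
print("screw image is a C2 equivalent:", image in equivalents)
```

Because the screw-related copy is displaced by half a cell edge along b, only 14 Å here, neighbouring molecules related by this operation stack closely along y, consistent with the packing described above.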