Ty3, a yeast retrotransposon associated with tRNA genes, has homology to animal retroviruses

Ty3, a retrotransposon of Saccharomyces cerevisiae, is found within 20 base pairs (bp) of the 5' ends of different tRNA genes. Determination of the complete nucleotide sequence of one Ty3 retrotransposon (Ty3-2) shows that the element is composed of an internal domain 4,748 bp long flanked by long terminal repeats of the 340-bp sigma element. Three open reading frames (ORFs) longer than 100 codons are present in the sense strand. The first ORF, TYA3, encodes a protein with a motif found in the nucleic acid-binding protein of retroviruses. The second ORF, TYB3, has homology to retroviral pol genes. The deduced amino acid sequence of the reverse transcriptase domain shows the greatest similarity to Drosophila retrotransposon 17.6, with 43% identical residues. The inferred order of functional domains within TYB3--protease, reverse transcriptase, and endonuclease--resembles the order in Drosophila element 17.6 and in animal retroviruses but is different from that found in yeast elements Ty1 and Ty2. A second Ty3 element (Ty3-1) from a standard laboratory strain was overexpressed and shown to transpose.

The genome of Saccharomyces cerevisiae contains at least three families of retrotransposons: Ty1, Ty2 (reviewed in references 24 and 54), and Ty3 (9). Retrotransposons are mobile genetic elements that transpose through an RNA intermediate and resemble retroviruses except for the apparent absence of an extracellular phase. The first retrotransposon identified in S. cerevisiae was named Ty, for transposable element in yeast (5). Two closely related forms of that element have been characterized and designated Ty1 and Ty2 (8,35,47,88). Ty1 and Ty2 are similar in nucleotide and amino acid sequences but have two large regions of heterogeneity, which were first demonstrated by heteroduplex analysis (47). Ty1 and Ty2 have an internal domain, called epsilon, which is 5.3 kilobase pairs (kbp) long. Epsilon is flanked by direct repeats of delta elements, which are 332 to 338 base pairs (bp) long (22,29). Transcription initiates in the upstream delta long terminal repeat (LTR), and the signal for polyadenylation occurs in the downstream LTR, 3' of the sequence corresponding to the initiation site (19). Thus, an almost full-length transcript with redundant termini is generated. Ty1 and Ty2 insertions can influence the transcription of neighboring genes positively (21,89) or negatively (6,18,67). The epsilon region contains two overlapping open reading frames (ORFs), which have similarity to the retroviral gag and pol genes. Boeke and co-workers (2) showed that Ty1 transposition is dependent on Ty1 transcription and occurs through an RNA intermediate. These results demonstrated the functional similarity of retrotransposons to retroviruses.
Ty3 is a more recently discovered element (9). It consists of a 4.7-kbp internal domain flanked by direct repeats of sigma elements, each 340 bp long. Characterization of one Ty3 element, now designated Ty3-1, showed that it has the following retroviruslike features: (i) flanking direct repeats of the insertion site sequence, (ii) LTR sequences which terminate in conserved inverted repeats, (iii) a potential primer-binding site for minus-strand DNA synthesis, (iv) a purine-rich region potentially involved in plus-strand DNA synthesis, and (v) an almost full-length transcript with redundant termini. Isolated sigma elements are found exclusively 16 to 19 bp upstream from the 5' ends of tRNA genes (4,13,69,70). This association is also seen for Ty3-1, which is found 16 bp upstream of a tRNACys gene. One possible explanation for the isolated sigma elements next to tRNA genes, then, is that they are the end products of LTR-LTR recombination events which occurred after position-specific Ty3 retrotransposition.
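The proposed origin of solo sigma elements can be illustrated with a toy string model. This is a minimal sketch, assuming the integrated element is represented as LTR + internal domain + LTR with identical LTRs, and treating a crossover between the two LTRs as deletion of one LTR plus the internal domain; the function name and all sequences are hypothetical stand-ins, not Ty3 sequence.

```python
def ltr_recombination(locus: str, ltr: str) -> str:
    """Toy model of recombination between two directly repeated LTRs.

    A crossover between identical LTRs excises the internal domain together
    with one LTR, leaving a single (solo) LTR at the original insertion site.
    """
    first = locus.find(ltr)
    second = locus.find(ltr, first + len(ltr)) if first != -1 else -1
    if second == -1:
        raise ValueError("locus does not contain two direct copies of the LTR")
    # Keep everything through the first LTR, then resume after the second LTR.
    return locus[: first + len(ltr)] + locus[second + len(ltr):]


# Hypothetical stand-ins: the real sigma LTR is 340 bp, the internal domain ~4.7 kbp.
sigma = "TGTTGTAT"
internal = "ACGT" * 5
host_flank, trna_gene = "AAAA", "GGGG"

provirus_locus = host_flank + sigma + internal + sigma + trna_gene
solo_locus = ltr_recombination(provirus_locus, sigma)
print(solo_locus)  # AAAATGTTGTATGGGG: one sigma left upstream of the tRNA gene
```

The model keeps the position of the surviving LTR fixed relative to the downstream tRNA gene, which is the feature that would preserve the tRNA-gene association of solo sigmas.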
Retrotransposons have organizational and structural features in common with the provirus form of retroviruses (reviewed in references 27 and 85). Retrotransposons encode proteins analogous to the group-specific antigens (gag) and the pol gene polyprotein of retroviruses. The retroviral gag gene is defined by the first ORF and encodes proteins that interact with the viral RNA to form a nucleocapsid. Viruslike particles have been isolated for copia, a Drosophila retrotransposon (74), and Ty1 (30,55). In the case of Ty1, the first ORF, TYA, was shown to encode proteins that make up the nucleocapsid (1,60). The retrovirus pol gene encodes a polyprotein with several enzymatic domains, including a reverse transcriptase (polymerase and RNase H) and an endonuclease. A protease-coding sequence occurs in gag in avian retroviruses but is found at the beginning of the pol gene in most other retroviruses. The reverse transcriptase polymerase and RNase H activities catalyze, respectively, DNA synthesis from the RNA template and degradation of the RNA portion of the resulting DNA-RNA heteroduplex. The second ORF of Ty1 and Ty2, TYB, encodes a polyprotein which has domains corresponding to the retroviral protease, endonuclease, and reverse transcriptase, in that order. Typically, the first and second ORFs of retroviruses and retrotransposons either overlap or are separated by a stop codon. In systems in which protein expression has been examined, translation of proteins encoded in the pol gene or in the second ORF of retrotransposons has been shown to be dependent on readthrough from the first ORF. This readthrough is mediated by nonsense suppression or frameshifting (reviewed in reference 12). A significant difference between retroviruses and retrotransposons is that the latter do not encode envelope proteins and appear to lack the extracellular phase of the retroviral life cycle.
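Readthrough by frameshifting can be made concrete with a toy translation. This is a minimal sketch, assuming a hypothetical mRNA in which a first ORF overlaps a second ORF in the plus-one frame and the ribosome is simply told where to slip; it does not model any real frameshift signal, and the function names and sequence are illustrative only. The codon table is the standard genetic code.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(seq: str, start: int = 0) -> str:
    """Translate from 0-based `start` until the first stop codon (not included)."""
    protein = []
    for pos in range(start, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[pos:pos + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def translate_with_frameshift(seq: str, shift_site: int, shift: int = 1) -> str:
    """Translate the first ORF up to `shift_site` (0-based, a multiple of 3),
    then let the ribosome slip `shift` nucleotides and continue in the new frame."""
    return translate(seq[:shift_site]) + translate(seq, shift_site + shift)


# Hypothetical mRNA: ORF1 (ATG GCT AAA TAG) overlaps ORF2 in the +1 frame.
mrna = "ATGGCTAAA" + "T" + "AGCTGGGAATAA"
print(translate(mrna))                        # MAK    (ORF1 alone, stops at TAG)
print(translate_with_frameshift(mrna, 9, 1))  # MAKSWE (ORF1-ORF2 fusion)
```

Without the slip, translation terminates at the first-ORF stop codon; with it, a single fusion polypeptide analogous to a gag-pol precursor is produced.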
The term retroid element will be used to refer to repeated elements and viruses which replicate through a reverse transcriptase-mediated process (27).
Here we report the complete nucleotide sequence of Ty3-2. Comparison of the deduced amino acid sequences of the ORFs suggests that Ty3 is more closely related to 17.6 and retroviruses than to the other yeast retrotransposons, Ty1 and Ty2. To examine the ability of Ty3 to transpose, Ty3-1 and Ty3-2 were overexpressed in a galactose-inducible system. The results of this experiment showed that whereas Ty3-1 is capable of transposition, Ty3-2 is not.

MATERIALS AND METHODS
Recombinant DNA manipulations. Bacterial culture conditions and recombinant DNA manipulations were as previously described (9), unless otherwise noted. Chromosomal DNA from S. cerevisiae AB972 (9) was digested with EcoRI and fractionated on a 1% agarose gel. DNA fragments 5.5 to 7.5 kbp long were isolated from low-gelling-temperature agarose (Bio-Rad Laboratories), subcloned into the EcoRI site of pIBI21 (International Biotechnologies, Inc. [IBI]), and transformed into Escherichia coli HB101 (F− hsdS20 [rB− mB−] recA13 leuB6 ara-14 proA2 lacY1 galK2 rpsL20 [Smr] xyl-5 mtl-1 supE44 λ−). Plasmid pIBI21 contains a cloning region downstream of the bacteriophage T7 promoter and the IBI primer sequence, which is followed by the M13 primer sequence. The bacteriophage f1 origin of replication is present in pIBI21 and allows production of single-stranded plasmid DNA when bacteria transformed with pIBI21 are superinfected with the helper phage M13K07 (IBI). Transformants containing the Ty3-2 insert were identified by colony hybridization (33) with a radiolabeled internal restriction fragment as the probe. This plasmid is designated pTy3-2. The Ty3 element cloned previously (pSBS12; 9) is contained on a HindIII-EcoRI restriction fragment ligated into the HindIII and EcoRI sites of pIBI20 and is designated pTy3-1.
To facilitate studies of Ty3 transposition, the GAL1-10 upstream activating sequence (UAS) was fused upstream of the TATAA sequences of Ty3-1 and Ty3-2. In step 1 of the construction, a 276-bp AluI fragment (nucleotide positions 123 to 398 in sigma; 9) containing the Ty3 TATAA sequences and transcription start site was subcloned into the SmaI site of the pIBI20 polylinker. This construct was cleaved with SalI in the polylinker and with XhoI in the downstream end of the sigma element to produce a sigma promoter fragment with XhoI-compatible ends. This fragment was then inserted into the XhoI site downstream of the GAL1-10 UAS in a derivative of pHZ18 (78) to produce pALG28. The HindIII-XhoI fragment from pALG28 containing the yeast URA3 gene and the GAL1-10 UAS-sigma promoter fusion was cloned into a site in pTy3-1 or pTy3-2 created by complete HindIII digestion and partial XhoI digestion. Screening of these constructs by restriction digestion identified plasmids pGTy3-1 and pGTy3-2, in which the GAL1-10 UAS-sigma promoter is fused to the Ty3-1 or Ty3-2 internal domain. pGTy3-1 and pGTy3-2 were converted to the high-copy yeast vectors pEGTy3-1 and pEGTy3-2, respectively, by insertion of the 2.2-kbp EcoRI fragment from the yeast 2μm episome.
Nucleotide sequencing strategy. The directed-deletion strategy of Henikoff (37) was used to create overlapping subclones for sequence analysis of the noncoding strand of Ty3-2 by the dideoxy-chain termination method (71). Single-stranded templates were made by superinfection of the pTy3-containing E. coli NM522 (Δ[lac-proAB] thi Δhsd-5 supE [F' proAB lacIqZΔM15]) with the helper phage M13K07 and were used for the sequencing reactions.
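Whether a set of overlapping subclones tiles the whole element, and where primer walking is still needed, reduces to simple interval arithmetic. This is a minimal sketch, assuming each sequencing read is summarized as an inclusive 1-based (start, end) interval on the insert; the function name and read coordinates are hypothetical, not data from this study.

```python
def coverage_gaps(length, reads):
    """Return the uncovered (start, end) intervals, 1-based inclusive, of a
    target `length` bp long, given read intervals in the same coordinates."""
    gaps = []
    covered_to = 0  # rightmost position covered by the reads examined so far
    for start, end in sorted(reads):
        if start > covered_to + 1:
            gaps.append((covered_to + 1, start - 1))
        covered_to = max(covered_to, end)
    if covered_to < length:
        gaps.append((covered_to + 1, length))
    return gaps


# Hypothetical nested-deletion reads on a 1,500-bp insert.
reads = [(1, 400), (300, 700), (650, 1100), (1200, 1500)]
print(coverage_gaps(1500, reads))  # [(1101, 1199)]: a region left for a synthetic primer
```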
The restriction enzymes AluI, RsaI, and Sau3A, which cut frequently within the Ty3-2 sequence, were used to make random subclones of suitable length for sequence analysis of the coding strand. Small-scale preparations of this DNA were obtained by the boiling method of Holmes and Quigley (39). Sequences of these inserts were determined from double-stranded plasmids in polymerase reactions primed with the M13 universal or IBI reverse sequencing primer (34) by the dideoxy-chain termination method (71). All sequence analyses used Sequenase (United States Biochemical Corp.) and [35S]dATP (1,000 Ci/mmol; Amersham Corp.). Six synthetic oligonucleotides (Operon Technologies, Inc.) which hybridize to Ty3-2 sequences were used to allow analysis of the remaining regions. The nucleotide sequence was compiled, edited, and translated by using the DNA sequence analysis programs of A. Goldin and G. Gutman (University of California, Irvine). Comparisons with the GenBank Nucleic Acid Data Base and the National Biomedical Research Foundation Protein Data Base were made with the University of Wisconsin Genetics Computer Group programs on a VAX computer (14). The amino acid comparisons of the reverse transcriptase and endonuclease were generated with the progressive-alignment programs of Feng and Doolittle (23) on a VAX computer.
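The reason four-base cutters give subclones of convenient size is that a given 4-bp site occurs on average about once every 4^4 = 256 bp in random sequence with equal base frequencies. This is a minimal sketch, assuming the standard recognition sequences AluI (AGCT), RsaI (GTAC), and Sau3A (GATC); the helper names and demo sequence are hypothetical, and real Ty3 sequence is not used.

```python
import re

# Four-base recognition sequences (5'->3'). All three are palindromic, so a
# search of one strand also finds every site on the complementary strand.
RECOGNITION_SITES = {"AluI": "AGCT", "RsaI": "GTAC", "Sau3A": "GATC"}


def site_positions(seq, site):
    """1-based start positions of every occurrence of `site` in `seq`."""
    # A zero-width lookahead lets overlapping occurrences be reported.
    return [m.start() + 1 for m in re.finditer(f"(?={site})", seq.upper())]


def expected_spacing(site_length):
    """Mean spacing (bp) between sites in random DNA with equal base frequencies."""
    return 4 ** site_length


demo = "AAGCTTGTACGGATCC"  # hypothetical sequence, not Ty3
for enzyme, site in RECOGNITION_SITES.items():
    print(enzyme, site_positions(demo, site))
print(expected_spacing(4))  # 256
```

Real sequences deviate from this expectation because base composition is not uniform, so actual fragment sizes vary widely around the mean.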
Transposition. Yeast strains were cultured by standard methodology (73). Strains yVB109 (MATa Δtrp1-901 ura3-52 his3-200 ade2-101 lys2-1 leu1-12 can1-100) and yVB110 (V. W. Bilanchone, personal communication) were transformed with plasmid pEGTy3-1 or pEGTy3-2 by a modification of the procedure of Ito et al. (42). Strain yVB110 is an isogenic derivative of yVB109 from which the three endogenous Ty3 elements were serially deleted. Three strains, each lacking a different Ty3 element, were obtained by URA3 disruption of individual Ty3 elements, followed by selection on 5-fluoro-orotic acid-containing medium for colonies with the ura3 mutation. Strains containing multiple Ty3 deletions were derived by standard genetic methods (Bilanchone, manuscript in preparation). yVB110 transformants containing the inducible plasmids were selected on synthetic complete medium lacking uracil on the basis of the uracil prototrophy conferred by the plasmid. These transformants were streaked onto the same medium or onto medium with 2% galactose substituted for glucose as the carbon source and incubated at 23 or 30°C for 10 days. At the end of that time, 10 colonies from each condition were streaked for single colonies on rich medium (1% yeast extract, 2% peptone, 2% glucose). Fifty isolates representing each original colony were patched onto 5-fluoro-orotic acid-containing medium to select cells which had lost the URA3 plasmid marker (3). These cells were streaked onto nitrocellulose filters on rich medium and grown for about 14 h. Filters were processed and hybridized as described previously (73). The probe was a radiolabeled fragment from the internal domain of Ty3. Strain yVB109 was similarly transformed and grown under inducing and noninducing conditions. Single colonies were isolated and cured of the plasmid as described above. Genomic DNA for Southern blot analysis was prepared by the method of Boeke et al. (2). DNA for the analysis displayed in Fig. 5 … pEGTy3-1 transformant.
We screened … DNA preparations, each representing … colonies. Genomic DNA from these pools was isolated, digested with EcoRI, fractionated by electrophoresis in 0.8% agarose buffered in TBE (2.5 mM EDTA, … borate, 133 mM Tris hydrochloride, pH 8.…), and transferred to nitrocellulose by the method of Southern (…). The bound DNA was hybridized, and filters were washed as previously described (9) and exposed to X-ray film in the presence of a Cronex Quanta-III intensifying screen (Du Pont Co.) at -70°C overnight. DNA was isolated and analyzed in the same way from individual colonies which contributed to pools that showed Ty3 rearrangements. In this way, clonal isolates were identified which are homogeneous with respect to a new Ty3-hybridizing fragment. DNAs from these isolates were analyzed by hybridization to probes specific for the Ty3 internal domain and for sigma (see Fig. 5, legend).

RESULTS
Isolation of Ty3-2 from S. cerevisiae AB972. Ty3-1 was originally identified in a lambda clone from S. cerevisiae AB972 and subcloned on a 6.5-kbp HindIII-EcoRI fragment into the E. coli vector pIBI20 (IBI) (9). Southern blot analysis of genomic DNA from AB972 with probes from the internal region of the cloned Ty3-1 element indicated the presence of a second Ty3 element. This element, Ty3-2, occurs on an EcoRI fragment 6.5 kbp long. To clone Ty3-2, EcoRI-digested AB972 genomic DNA 5.5 to 7.5 kbp long was inserted into the EcoRI site of pIBI21 and transformed into E. coli HB101. Positive clones were identified by colony hybridization to a probe from the internal domain of Ty3-1. These plasmids contained the expected 6.5-kbp inserts. The IBI plasmids containing the Ty3-1 and Ty3-2 clones are referred to as pTy3-1 and pTy3-2, respectively. Plasmids pTy3-1 and pTy3-2 were analyzed by restriction enzyme digestion and subsequent fractionation by gel electrophoresis. Figure 1 shows the resulting restriction maps of Ty3-1 and Ty3-2. The ends of the elements will be referred to as 5' or 3' to identify each with respect to the direction of Ty3 transcription (9). Overall, Ty3-1 and Ty3-2 have similar restriction patterns for the enzymes used in this analysis. There are some differences, however. Ty3-1 has an additional BglII site; Ty3-2 contains two additional PstI sites and one extra BstEII site. Ty3-2 is also somewhat longer than Ty3-1; this difference appears to be accounted for by the presence in Ty3-2 of a second copy of a 78-bp direct repeat that is missing in Ty3-1 (Fig. 2). The positions of the 5' ends and polyadenylation sites of the Ty3 5.2-kb transcripts are heterogeneous, with major endpoints within the sigma element (Fig. 2; 9). These transcript endpoints were determined with poly(A) RNA obtained from the same strain background that was the source of the Ty3-1 and Ty3-2 elements. The Ty3-1 and Ty3-2 transcripts contain a potential primer-binding site for minus-strand DNA synthesis and a purine-rich region implicated in plus-strand DNA synthesis (80,85).
ORF analysis of Ty3-2. The nucleotide sequence of Ty3-2 was translated in all three reading frames by using the DNA sequence analysis programs of A. Goldin and G. Gutman (University of California, Irvine). Three ORFs longer than 100 codons occur in the sense (5.2-kb RNA) strand. These will be referred to as TYA3-2, TYB3-2, and ORF3. These gene names were chosen to be consistent with yeast nomenclature and to indicate functional similarity with the TYA and TYB genes of Ty1 and Ty2 while distinguishing their source. The protein sequences predicted from these ORFs are shown in Fig. 2, below the nucleotide sequence. A computer-generated ORF analysis of Ty3-2 is displayed in Fig. 3 (48). TYA3-2 extends from the presumed initiator ATG codon (Fig. 2) to the first terminator, 290 codons downstream. The second ORF, TYB3-2, extends from nucleotides 1248 to 4910 and overlaps TYA3-2 by 13 codons in the plus-one frame. The first methionine codon in TYB3-2 does not occur until nucleotide positions 1362 to 1364, at codon 39 of the ORF. This ATG does occur in an acceptable context for translation initiation, but its position in the ORF suggests that initiation may not occur here. It is also possible that the initiator codon is supplied on a spliced 5' terminus. However, no yeast consensus splice sequences (GTAPyGT ... TACTAAC ... AG; 50, 63, 77) are found in Ty3-2. These observations and comparisons with other retroid elements (see Discussion) suggest that a plus-one translational frameshift is required for expression of TYB3-2.
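Because TYB3-2 is defined from nucleotide 1248 although its first ATG lies downstream, the ORFs here behave like stop-to-stop open stretches rather than ATG-to-stop genes. The sketch below illustrates that kind of three-frame scan together with a check for the yeast splice consensus cited above. It is a minimal sketch, not the Goldin and Gutman programs used in the study; it assumes the standard genetic code, the helper names are hypothetical, and the demo sequence is synthetic.

```python
import re

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
STOP_CODONS = {c for c, aa in CODON_TABLE.items() if aa == "*"}
# Yeast splice consensus as cited in the text: 5' site GTAPyGT (Py = C or T),
# branch point TACTAAC, 3' site AG.
YEAST_INTRON = re.compile(r"GTA[CT]GT.*TACTAAC.*AG")


def find_orfs(seq, min_codons=100):
    """Stop-to-stop ORFs longer than `min_codons` in the three frames of `seq`.

    Returns (frame, first_nt, last_nt, n_codons, protein); coordinates are
    1-based, inclusive, and exclude the terminating stop codon. Open stretches
    that run off the end of the sequence are not reported.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = frame  # 0-based start of the current stop-free stretch
        for pos in range(frame, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                n_codons = (pos - start) // 3
                if n_codons > min_codons:
                    protein = "".join(CODON_TABLE[seq[p:p + 3]]
                                      for p in range(start, pos, 3))
                    orfs.append((frame, start + 1, pos, n_codons, protein))
                start = pos + 3
    return orfs


def has_yeast_intron_signals(seq):
    """True if a 5' splice site is followed downstream by TACTAAC and then AG."""
    return YEAST_INTRON.search(seq.upper()) is not None


demo = "ATG" + "GCT" * 120 + "TAA"  # synthetic 121-codon ORF
print([orf[:4] for orf in find_orfs(demo)])  # [(0, 1, 363, 121)]
print(has_yeast_intron_signals(demo))        # False
```

A scan of this kind reports ORFs regardless of where the first methionine falls, which is why the start of translation has to be argued separately (initiator ATG, splicing, or frameshifting).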
ORF3 begins at nucleotide 4805 and ends at nucleotide 5134, 15 codons inside the 3' sigma element. The first methionine is encoded 49 codons into ORF3, at nucleotide positions 4949 to 4951, and does not conform to the consensus context for initiation. TYB3-2 and ORF3 overlap by 36 codons, with ORF3 in the minus-one frame (the same frame as TYA3-2) with respect to TYB3-2. The significance of ORF3 is not known. Sequence analysis of Ty3-1 in the region of the TYB3-2-ORF3 overlap shows that Ty3-1 contains a single-base insertion compared with Ty3-2 (unpublished data). Because an extra base would place the sequence corresponding to Ty3-2 ORF3 in the TYB3 reading frame, TYB3-1 would extend through that sequence if no other reading frame differences occur between Ty3-1 and Ty3-2. Further sequence analysis is required to explore this possibility.
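The frame and codon relationships quoted above follow from coordinate arithmetic on the 1-based nucleotide numbering of Fig. 2. This is a minimal sketch with hypothetical helper names; the start of the 3' sigma element (nucleotide 5089) is derived here from the 340-bp sigma and 4,748-bp internal-domain lengths rather than stated in the text.

```python
import math

# Coordinates from the text (1-based, numbered from the first base of the 5' sigma).
TYB3_2 = (1248, 4910)
ORF3 = (4805, 5134)
SIGMA_LEN, INTERNAL_LEN = 340, 4748
SIGMA_3_START = SIGMA_LEN + INTERNAL_LEN + 1  # 5089, derived from element lengths


def relative_frame(ref_start, other_start):
    """Reading frame of `other` relative to `ref`: 0, +1 or -1."""
    return {0: 0, 1: +1, 2: -1}[(other_start - ref_start) % 3]


def codon_number(orf_start, nt):
    """1-based number of the codon of an ORF that contains nucleotide `nt`."""
    return (nt - orf_start) // 3 + 1


def overlapping_codons(orf, other):
    """Codons of `orf` that share at least one nucleotide with `other`."""
    lo, hi = max(orf[0], other[0]), min(orf[1], other[1])
    return 0 if lo > hi else codon_number(orf[0], hi) - codon_number(orf[0], lo) + 1


def codons_from(orf, region_start):
    """Codons of `orf` lying entirely at or after `region_start`."""
    first_inside = math.ceil((region_start - orf[0]) / 3) + 1
    return codon_number(orf[0], orf[1]) - first_inside + 1


print(relative_frame(TYB3_2[0], ORF3[0]))  # -1: ORF3 is in the minus-one frame
print(codon_number(TYB3_2[0], 1362))       # 39: first ATG of TYB3-2
print(codon_number(ORF3[0], 4949))         # 49: first ATG of ORF3
print(overlapping_codons(ORF3, TYB3_2))    # 36
print(codons_from(ORF3, SIGMA_3_START))    # 15 codons inside the 3' sigma
```

The computed values agree with the numbers given in the text, which is a quick consistency check on the reported coordinates.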
Comparison of the amino acid sequences of Ty3-2 ORFs with those of proteins encoded by other retroid elements. Protein sequences predicted from TYA3-2, TYB3-2, and ORF3 nucleotide sequences were used to search the National Biomedical Research Foundation Protein Data Base (version 13) to identify sequence similarities between predicted Ty3-2 proteins and previously described proteins. The portion of retroid elements that encodes reverse transcriptase is characteristically the most conserved domain (52). The data base search revealed that a region between codons 348 and 632 of TYB3-2 has 43% identity with the polymerase domain of reverse transcriptase encoded by 17.6, a Drosophila retrotransposon. Inspection of TYA3-2 and TYB3-2 for conserved motifs found in retroid elements resulted in identification of domains with similarities to previously described nucleic acid-binding proteins, proteases, and endonucleases, in addition to reverse transcriptases. To determine the relatedness of Ty3-encoded proteins to those encoded by other retroid elements, the Ty3 ORFs were partitioned into putative functional domains based on previous comparisons (43,52,82) and compared with proteins from nine other retroid elements. The results of these comparisons are displayed in Fig. 4.
FIG. 2. Nucleotide sequence of Ty3-2 and the associated tRNA gene, and amino acid sequences deduced from the ORFs. The Ty3-2 nucleotide sequence is shown above the predicted amino acid sequence. Numbering of the nucleotide sequence begins at the first base of the upstream sigma element. Orientation is the same as in Fig. 1. Arrows above the ends of the sigma elements indicate the 8-bp perfect inverted repeats. Potential TATAA sequences (solid lines) and upstream pheromone control sequences (49) (broken lines) are boxed. The 5' ends of the 5.2-kb transcript are indicated by vertical bars over the sequence; the 3' ends are indicated by horizontal bars. Predicted minus- and plus-strand primer regions are underlined. Arrows over the internal domain indicate the position of the 78-bp direct repeat which occurs in Ty3-2. The copy of this repeat which is missing in Ty3-1 is indicated by the dashed arrow (unpublished data). The presumed initiator ATG codon in TYA3-2 is shaded. The sequence of the tRNA gene downstream of the Ty3-2 3' sigma element is boxed. The deduced amino acid sequences of the three ORFs longer than 100 codons are shown below the nucleotide sequence in the single-letter code. Numbering of the amino acids begins with the putative initiation methionine for TYA3-2 and with the first amino acid encoded in TYB3-2 and ORF3. Brackets indicate the amino acid sequences displayed in the alignments in Fig. 4 and also the RNase H domain described by Johnson et al. (43), which is not shown in an alignment in Fig. 4. Amino acid residues which are conserved among the sequences compared in Fig. 4 … (11). The (a) indicates the first of two metal finger motifs which occur in most retroviruses. (B) Conserved residues from retroid element proteases and cellular aspartyl proteases are compared with amino acids 53 to 68 predicted from the TYB3-2 sequence. Similar alignments of retrovirus proteases have been shown previously (46,62,90). Alignments of predicted retrotransposon protease sequences with aspartyl proteases were shown previously (59,82,83). (C) The polymerase domains of the reverse transcriptases from nine retroid elements are compared with amino acids specified by codons 348 to 632 of TYB3-2. Alignments of retroid and nonretroid polymerase sequences have been described previously (43, 45; … Biol., in press). (D) The endonuclease domains from seven retroid elements are compared with amino acids specified by codons 847 to 1040 of TYB3-2. Alignments of retroid element endonuclease sequences have also been described previously (43, 52, 59; Doolittle et al., in press). Polymerase and endonuclease sequences were displayed and printed by using the MULPUB program (23). Boldface circles indicate lines of the Ty3 sequence. Asterisks indicate residues conserved among all of the protein sequences in the comparison. The cellular protease sequences are human pepsin (76), bovine chymosin (25), penicillopepsin (40), and mouse (61) and human (38) renins. Abbreviations and sources of predicted retroid element protein sequences: HIV, human immunodeficiency virus (65); IAPm, mouse intracisternal type A particle (56); MMTV, mouse mammary tumor virus (58); MLV, Moloney murine leukemia virus (75); RSV, Rous sarcoma virus (72); CMV, cauliflower mosaic virus (26); Cop, copia (59); 17.6 (68); CERV, carnation etched-ring virus (41); BLV, bovine leukemia virus (66); Ty1 (8).
Nucleic acid-binding protein. The retroviral gag gene encodes the structural proteins of the nucleocapsid. The sequences of these proteins are not highly conserved among retroviruses, except for a small domain in the nucleocapsid protein which has been suggested to mediate nucleic acid binding (10,11,15,36). This region lies in the carboxy-terminal portion of the gag polyprotein and consists of cysteine and histidine residues arranged in a C-X2-C-X4-H-X4-C motif. This sequence is repeated in retroviruses, with the exception of Moloney murine leukemia virus (MLV) (75), but occurs once, if at all, in retrotransposons. The sequences of Ty3-1 and Ty3-2 are identical in the downstream end of the first ORF and predict that the protein made from TYA3 contains one copy of the nucleic acid-binding motif. The protein predicted from the copia DNA sequence also contains this short sequence, but the Ty1, Ty2, and 17.6 proteins do not. A translated sequence from TYA3 containing this motif is aligned in Fig. 4A with similar regions of other retroid elements.
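The C-X2-C-X4-H-X4-C motif is a fixed-spacing pattern, so it can be located in a predicted protein with a regular expression in which "." stands for any residue X. This is a minimal sketch; the function name and peptide are hypothetical and are not taken from TYA3 or any other element.

```python
import re

# C-X2-C-X4-H-X4-C nucleic acid-binding motif; "." matches any residue X.
# The lookahead lets adjacent or overlapping copies both be reported.
CCHC_MOTIF = re.compile(r"(?=(C.{2}C.{4}H.{4}C))")


def find_cchc(protein):
    """Return (1-based position, matched segment) for each motif occurrence."""
    return [(m.start() + 1, m.group(1)) for m in CCHC_MOTIF.finditer(protein)]


peptide = "GGCKNCGKPGHIARDCRKA"  # hypothetical peptide with one copy of the motif
print(find_cchc(peptide))  # [(3, 'CKNCGKPGHIARDC')]
```

Counting the hits distinguishes the single copy predicted for TYA3 from the repeated copies typical of retroviral nucleocapsid proteins.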
Protease. The protease encoded by retroviruses is required for processing the polyprotein precursors to mature proteins (87,91). More recently, a region in TYB with similarity to this protease was demonstrated to be required for processing Ty1 polyproteins (1,60,92). Retroid element proteases contain a highly conserved hexapeptide, (hydrophobic residue)2-D-T/S-G-A/S, which is also found at the active sites of aspartyl proteases (46,62,82,90). Thus, the retrovirus protease is hypothesized to be an aspartyl protease distantly related to these cellular proteases. The predicted protein sequence of TYB3-2 contains the conserved active-site hexapeptide close to its amino terminus. This region from the Ty3-2 protein is compared with proteases from other retroid elements in Fig. 4B. In the TYB3 protease sequence, unlike the protease sequences of most other elements, phenylalanine occurs immediately before D-S-G and serine follows it. These positions in the predicted Ty3-2 protease sequence resemble conserved positions in the cellular aspartyl proteases.
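The active-site hexapeptide can likewise be written as a pattern. This is a minimal sketch, assuming the hydrophobic set A, V, L, I, M, F, W (the text does not define which residues count as hydrophobic); the function name and peptides are hypothetical. The second example mimics the Ty3 arrangement, with phenylalanine before D-S-G and serine after it.

```python
import re

# Assumed hydrophobic residues; the text does not specify the set.
HYDROPHOBIC = "AVLIMFW"
# (hydrophobic)2-D-T/S-G-A/S, the conserved aspartyl protease active-site motif.
ACTIVE_SITE = re.compile(rf"(?=([{HYDROPHOBIC}]{{2}}D[TS]G[AS]))")


def find_active_site(protein):
    """Return (1-based position, hexapeptide) for each motif occurrence."""
    return [(m.start() + 1, m.group(1)) for m in ACTIVE_SITE.finditer(protein)]


print(find_active_site("KALLDTGADD"))  # [(3, 'LLDTGA')]  retrovirus-like variant
print(find_active_site("PGSVFDSGSK"))  # [(4, 'VFDSGS')]  F before D-S-G, S after
print(find_active_site("KKDTGA"))      # []  lysines are not hydrophobic
```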
Reverse transcriptase: polymerase and RNase H. Retroviral reverse transcriptase is encoded downstream of the protease. Johnson et al. (43) showed that domains with homology to E. coli polymerase and RNase H proteins occur, in that order, within retroviral reverse transcriptases and are likely to be responsible for the synthetic and nucleolytic activities of reverse transcriptase. Domains with similarity to the polymerase and RNase H domains of other retroid elements are predicted in the TYB3 protein. Figure 4C shows a comparison of the polymerase domain of the Ty3-2 reverse transcriptase with the polymerases encoded by other retroid elements. As noted above, in this region, Ty3-2 is most similar to 17.6. Within the polymerase domain, Ty3-2 and 17.6 have 43% identical residues in an alignment which requires introduction of only four gaps. The next highest similarities are with two plant DNA viruses which replicate through RNA intermediates, cauliflower mosaic virus (26) and carnation etched-ring virus (41), and with MLV (75). The polymerase domains of these retroid elements have 36, 33, and 26% identity, respectively, with the polymerase of Ty3-2. Weaker similarity is found between the Ty3-2 polymerase and those encoded by Ty1 and copia, 12 and 11%, respectively.
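Percent identity values such as the 43% for Ty3-2 and 17.6 depend on how aligned positions are counted. This is a minimal sketch, assuming identity is computed over columns in which both sequences have a residue (the text does not state the convention used) and counting each run of gap characters as one gap; the function names and toy alignment are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # gap columns are excluded under this convention
        compared += 1
        identical += x == y
    return 100.0 * identical / compared


def count_gaps(aligned, gap="-"):
    """Number of gaps (runs of consecutive gap characters) in one aligned sequence."""
    runs, in_gap = 0, False
    for ch in aligned:
        if ch == gap and not in_gap:
            runs += 1
        in_gap = ch == gap
    return runs


a, b = "MKT-LLV", "MKSALLI"  # hypothetical toy alignment
print(round(percent_identity(a, b), 1))  # 66.7  (4 identities in 6 ungapped columns)
print(count_gaps(a) + count_gaps(b))     # 1
```

Dividing by the full alignment length or by the length of one sequence instead would give lower values for the same alignment, so identities are comparable only under a single convention.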
The protein sequence predicted from TYB3-2 immediately downstream of the region that codes for the polymerase has similarity to conserved positions of RNase H from different retroid elements (43). Strictly conserved positions noted by that group are shaded in the TYB3-2 protein sequence presented in Fig. 2. The tether region separating polymerase and RNase H domains in the retroviral protein sequence (43) is absent in the protein sequence predicted for Ty3. Comparison of the RNase H region from the Ty3 protein with other retroid elements showed less conservation of the protein sequence overall than did the polymerase comparison (unpublished data).
Endonuclease. The endonuclease domain is less conserved among retroid elements than the reverse transcriptase domain. Nevertheless, a protein sequence is predicted from TYB3-2 which has distinct similarity to these endonucleases. Figure 4D shows an alignment of sequences from known retrovirus endonuclease domains with similar regions predicted from Ty3 and other retroid elements. Six residues are conserved among the eight retroid elements compared in this alignment. As noted in a similar alignment previously described by Johnson et al. (43), the occurrence in these endonuclease sequences of a pair of histidines, followed after 20 to 30 residues by a conserved pair of cysteines, is reminiscent of the metal fingers of some DNA-binding proteins. Johnson et al. (43) suggested that this domain mediates interactions between the endonuclease and substrate DNA. The occurrence of these conserved residues in the Ty3-2 protein sequence suggests that some aspects of the Ty3 insertion mechanism are in common with those of other integrating retroid elements.
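The asterisks in Fig. 4 and the count of six residues conserved among eight elements correspond to alignment columns in which every sequence carries the same residue. This is a minimal sketch of that bookkeeping, assuming a gapped multiple alignment given as equal-length strings; the function names and the toy alignment are hypothetical.

```python
def conserved_columns(alignment, gap="-"):
    """1-based alignment columns in which every sequence has the same residue."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must have equal length")
    columns = []
    for i in range(length):
        residues = {seq[i] for seq in alignment}
        if len(residues) == 1 and gap not in residues:
            columns.append(i + 1)
    return columns


def conservation_line(alignment):
    """Asterisk under each strictly conserved column, as in an alignment figure."""
    cols = set(conserved_columns(alignment))
    return "".join("*" if i + 1 in cols else " " for i in range(len(alignment[0])))


toy = ["HQDC-H", "HKDCAH", "HQDC-H"]  # hypothetical aligned fragments
print(conserved_columns(toy))  # [1, 3, 4, 6]
print(conservation_line(toy))  # "* ** *"
```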
FIG. 5. (A) … Plasmids containing galactose-inducible Ty3-1 and Ty3-2 sequences were constructed as described in Materials and Methods and are shown in generalized form as pEGTy3. The yeast sequences are marked as follows: GAL1-10 UAS, solid; sigma elements, lines with stippling; Ty3 internal domain, dark stippling; 2μm plasmid replication sequences, thin lines; and URA3 sequences, thick lines. E. coli sequences (ori and AmpR) are shown as open spaces. The scale is approximate. (B and C) Autoradiographs of Southern blot analysis of genomic DNA from yVB109 clonal isolates which underwent Ty3 rearrangements. The leftmost lane shows migration of lambda DNA digested with HindIII and 32P labeled with T4 polynucleotide kinase. The numbers on the left indicate molecular sizes in kilobases (kb). Panel B shows hybridization with a Ty3 internal domain-specific probe; panel C shows hybridization with a sigma-specific probe (see Materials and Methods). Lanes: 1, genomic DNA from the yVB109 host strain; 2 to 9, genomic DNAs from eight clonal isolates with Ty3 rearrangements. The samples shown in B and C were fractionated on the same agarose gel. This autoradiograph was produced by exposure of the hybridized blot to XAR-5 film overnight at room temperature.
De novo transposition of Ty3. Nucleotide sequence analysis showed that Ty3 has the structural properties of a retrotransposon, including a sequence which encodes a protein with similarity to reverse transcriptase. Nevertheless, it was not known whether Ty3-1 or Ty3-2 (or both) could transpose. Ty1 transposition has been shown to occur at elevated frequency when Ty1 sequences were fused to the GAL1-10 promoter and Ty1 transcription was induced on galactose-containing medium. We applied a similar strategy to induce high levels of Ty3 transcription.
To increase Ty3 transcription, high-copy yeast plasmids were constructed which contain the GAL1-10 UAS (44) fused to a position in sigma upstream of the putative Ty3-1 or Ty3-2 TATAA sequences (Fig. 5A). The plasmids were designated pEGTy3-1 and pEGTy3-2, for the Ty3-1- and Ty3-2-derived internal domains, respectively. This construction preserves the Ty3 transcription start site and should result in transcripts with the same 5' and 3' ends as those generated from endogenous Ty3 elements. Cells containing either of these plasmids and grown on synthetic complete medium containing galactose as the carbon source have high levels of a 5.2-kb Ty3 transcript (unpublished data). Because most tRNA isoacceptors in S. cerevisiae are encoded by redundant gene families, no selectable phenotype was anticipated, even if Ty3 transposed. In addition, it was desirable to monitor transposition without disrupting any of the Ty3 ORFs and without relying a priori on trans-activation of a marked Ty3. Evidence for de novo Ty3 transposition was therefore sought by use of a hybridization screen rather than by selection.
To determine the effect of Ty3 transcription on Ty3-related rearrangements, yeast cells containing the inducible Ty3 plasmid pEGTy3-1 or pEGTy3-2 were maintained on synthetic complete medium lacking uracil for 10 days with either glucose or galactose as the carbon source. The yeast strain used in these experiments, yVB110, has no endogenous Ty3 elements (Bilanchone, personal communication). Parallel sets of cultures were maintained at 23 and 30°C to investigate the effect of temperature on Ty3 rearrangements.
Ten colonies from each experimental condition were streaked for single colonies on rich medium. Fifty clonal isolates representing each of the original 10 colonies were patched onto 5-fluoro-orotic acid-containing medium to select cells which no longer contained the plasmid. The results of this analysis are presented in Table 1. Genomic DNA from pEGTy3-1-transformed cells grown on galactose, but not on glucose, showed hybridization to Ty3. Each of the original 10 colonies grown at 30°C showed evidence of Ty3 rearrangements, with an average frequency of 6.6% over the total of 500 colonies screened. Genomic DNA from 3.4% of the pEGTy3-1-transformed colonies grown on galactose at 23°C showed Ty3 hybridization. The pEGTy3-1-transformed cells grown on glucose did not show evidence of Ty3-related rearrangements. The pEGTy3-2-transformed cells showed no evidence of Ty3 rearrangements during growth on glucose or galactose after 10 days (Table 1) or 20 days (data not shown) at either temperature. These data are consistent with the predicted dependence of Ty3 transposition on transcription but suggest that, of the two elements tested, only Ty3-1 is capable of independent high-frequency transposition.
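The screening arithmetic behind these percentages is straightforward: 10 colonies per condition times 50 isolates per colony gives 500 isolates screened. This is a minimal sketch; the isolate counts below are back-calculated from the reported percentages and are not stated in the text, and the function name is hypothetical.

```python
def rearrangement_frequency(positives, screened):
    """Fraction of screened isolates showing a new Ty3-hybridizing signal."""
    return positives / screened


colonies, isolates_per_colony = 10, 50
screened = colonies * isolates_per_colony  # 500 isolates per condition

# Counts implied by the reported percentages (back-calculated, not reported).
implied_30C = round(0.066 * screened)  # 33 isolates at 30 degrees C
implied_23C = round(0.034 * screened)  # 17 isolates at 23 degrees C
print(screened, implied_30C, implied_23C)              # 500 33 17
print(rearrangement_frequency(implied_30C, screened))  # 0.066
```

Because the 50 isolates from one colony are descended from a common founder, they are not independent events, so these fractions describe the screen rather than a per-cell transposition rate.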
The acquisition of Ty3 sequences by host genomes could also be mediated by recombination unrelated to transposition. We consider this an unlikely explanation of the results described above, because the frequency of Ty3 acquisition depended on transcription of the Ty3 sequences. Rearrangements were not detected in the glucose-grown cells, in which Ty3 was not transcribed. It could be argued that such recombination is itself activated by transcription. This explanation, however, is not consistent with the absence of detectable Ty3 rearrangements in the pEGTy3-2-transformed, galactose-grown cells. Therefore, the results presented in Table 1 are not easily explained by a simple recombination mechanism.
Recombination between the plasmid and host chromosomal sequences could not by itself explain the Ty3 rearrangements. Nevertheless, these data do not rule out transcription-dependent synthesis of a full-length Ty3 DNA and subsequent integration of this intermediate by homologous recombination with endogenous sigma elements. Southern blot analysis of genomic DNA from cells in which Ty3-related rearrangements occurred was performed to investigate this possibility. Figure 5B and C shows the results of that analysis. The host strain yVB109, transformed with pEGTy3-1 and grown on galactose, contains three endogenous elements (Bilanchone, personal communication). Hybridization with a Ty3-specific probe (Fig. 5B) showed that there are four or five Ty3-hybridizing fragments in each of these genomes. In the 230 colonies screened by pooled DNA preparations for this analysis, at least 35 novel bands were observed. Comparison of the Ty3 hybridization pattern with the sigma-specific hybridization pattern (Fig. 5C) showed that there are additional sigma-hybridizing fragments in each genome which acquired a Ty3 element(s), but that in no case is a sigma-hybridizing fragment present in the control genome absent from a genome which has acquired Ty3 elements. These results suggest that Ty3 rearrangements are not commonly mediated by integration of Ty3 at endogenous sigma loci. Although our experiments do not specifically demonstrate an RNA intermediate, both sequence analysis and transposition data are consistent with a retrovirus-like mechanism of transposition for Ty3.
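The band-comparison logic can be stated as a set test: integration into an existing sigma locus by homologous recombination would be expected to change the size of that locus's fragment, so a control band would disappear, whereas insertion at new sites only adds bands. The Python sketch below encodes that reasoning. It assumes, as a simplification, that fragments are represented by hashable labels (for example, sizes in kb rounded to the resolution of the gel); the function names and fragment sizes are hypothetical.

```python
def compare_band_patterns(control, sample):
    """Compare sigma-hybridizing fragments between a control genome and a
    genome that acquired Ty3 sequences.

    Returns (novel, lost): fragments present only in the sample, and
    fragments present in the control but missing from the sample.
    """
    control_set = set(control)
    sample_set = set(sample)
    novel = sorted(sample_set - control_set)
    lost = sorted(control_set - sample_set)
    return novel, lost


def consistent_with_new_insertions(control, sample):
    """True if the sample gained at least one fragment and lost none, the
    pattern expected for insertion at new sites rather than recombination
    into an endogenous sigma locus."""
    novel, lost = compare_band_patterns(control, sample)
    return bool(novel) and not lost


# Hypothetical fragment sizes (kb), not data from Fig. 5C.
print(compare_band_patterns([2.1, 3.5, 6.0], [2.1, 3.5, 4.8, 6.0]))  # ([4.8], [])
```
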
The basis of the difference in transposition activity between the pEGTy3-1- and pEGTy3-2-transformed cells is not clear. Although Ty3-1 and Ty3-2 are highly similar over long portions of the sequences we compared (unpublished data), there are some differences. These include a 78-bp repeat within TYB3-2 and, potentially, a difference in end point between TYB3-2 and TYB3-1. Alternatively, there may be important differences in the levels or structures of the transcripts produced from these two plasmids. The pEGTy3-2 construct retains the tRNA gene downstream of the element. If termination of Ty3 transcription is affected by the tRNA gene, a less active Ty3 transcript may result. These possibilities can be distinguished by comparing the activities of chimeric elements.

DISCUSSION
Ty3 is a yeast retrotransposon which contains coding information for proteins with similarity to previously described retroviral nucleic acid-binding proteins, aspartyl proteases, reverse transcriptases, and endonucleases. The two Ty3 elements characterized, like previously described sigma elements, occur close to the 5' ends of tRNA genes. One Ty3 element isolated from a common laboratory strain was shown to be capable of high-frequency transposition. The results presented here suggest that Ty3 transposition is responsible for the unusual position specificity observed for isolated sigma elements. A surprising finding of this study is that Ty3 is apparently more closely related to a family of Drosophila retrotransposons and to animal retroviruses than to previously described yeast retrotransposons.
Relatedness of Ty3 to Ty1 and Ty2. DNA sequences involved in the regulation of expression of Ty3, Ty1, and Ty2 have some features in common. The sigma LTRs of Ty3 and the delta LTRs of Ty1 and Ty2 have a number of short regions of identity (31). Ty1 and Ty3 transcripts have been shown to start in the LTR about 100 bp upstream of the beginning of the internal domain in the respective elements (9,19). Transcription of Ty elements is under mating-type control and is down-regulated in diploids (20; V. W. Bilanchone, K. Y. Sato, and S. B. Sandmeyer, manuscript in preparation). The primer-binding site in the transcripts of all three types of elements has complementarity to initiator tRNAMet (9,17,88). As in retroviruses, the first and second ORFs of each of the Ty classes overlap. This overlap is 38, 44, and 38 nucleotides long for Ty3, Ty1, and Ty2, respectively (this work; 8,28,35,88). Nine nucleotides within the overlap are conserved in all three elements. A plus-one translational frameshift within the overlap is required for expression of proteins from the second ORFs of Ty1 and Ty2 RNAs (8,53). These similar features of Ty3, Ty1, and Ty2 could be the result of divergent evolution from some ancestral element or could result from common adaptation to cellular regulatory mechanisms that limit transposition.
In contrast to the similarities noted above, the proteins encoded by Ty3 are not highly similar to the analogous proteins encoded by Ty1 and Ty2. The DNA sequence of TYA3 predicts a domain containing the C-X2-C-X4-H-X4-C motif, which is also found in the nucleocapsid proteins of animal retroviruses. This motif does not occur in the gaglike proteins of Ty1 and Ty2 (1,8,35,88). The sequences of the predicted polymerase domains and the endonuclease domains from Ty3 and Ty1 were compared by using the progressive-alignment programs of Feng and Doolittle (23). This comparison revealed few identical positions: the overall identity between the Ty3 and Ty1 reverse transcriptases in the polymerase domains is 12%. The significance of the reverse transcriptase and endonuclease alignments was scored (23) to reflect positions at which equivalent, although not identical, amino acids are conserved and to reflect the number of gaps introduced with optimal alignment of the protein sequences. Despite the low percentage of identical residues, the score of the Ty3 and Ty1 reverse transcriptase alignment (Fig. 4C) was 8 standard deviations above the mean of the scores generated by 50 comparisons between random jumbles of the same two amino acid sequences. Therefore, the similarity of these sequences is still greater than that of random sequences of this composition. The comparison of endonucleases showed about the same level of identity as the comparison of reverse transcriptases but scored closer to the mean of the random jumbles. Thus, the proteins encoded in Ty3 have similarity to those encoded in Ty1 but are clearly distinct.
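The C-X2-C-X4-H-X4-C notation describes a fixed-spacing pattern (X being any amino acid), so it maps directly onto a regular expression. The Python sketch below scans a one-letter protein sequence for it; the lookahead lets overlapping occurrences be reported. The function name and the toy sequence in the usage line are hypothetical and are not taken from the TYA3 sequence.

```python
import re

# C-X2-C-X4-H-X4-C: Cys, any 2 residues, Cys, any 4, His, any 4, Cys.
# The zero-width lookahead allows overlapping matches to be reported.
CCHC_MOTIF = re.compile(r"(?=(C.{2}C.{4}H.{4}C))")


def find_cchc_motifs(protein):
    """Return (0-based start, matched residues) for every C-X2-C-X4-H-X4-C
    occurrence in a one-letter protein sequence."""
    seq = "".join(protein.split()).upper()  # drop whitespace/newlines
    return [(m.start(), m.group(1)) for m in CCHC_MOTIF.finditer(seq)]


# Toy sequence for illustration only.
print(find_cchc_motifs("MKCAACGGGGHAAAACLL"))  # [(2, 'CAACGGGGHAAAAC')]
```
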
In addition to protein sequence differences between Ty3 and the other yeast retrotransposons, there are also organizational differences. In the predicted polyprotein sequence of Ty3, the reverse transcriptase domain precedes the endonuclease domain, the reverse of the order in the Ty1 and Ty2 polyproteins. On the basis of these comparisons, Ty3 seems distantly related, if at all, to the other yeast elements. Thus, Ty3 could have diverged long ago from a common ancestor with Ty1 and Ty2 or might have been assembled independently from cellular genes (79).
Relatedness of Ty3 and Drosophila retrotransposons. A comparison of the predicted TYB3 protein sequence to the National Biomedical Research Foundation Protein Data Base showed, as mentioned above, that this sequence has the greatest similarity to the polyprotein encoded by the second ORF of 17.6 (68,82). The Ty3 polymerase domain showed 43% sequence identity with the analogous region of the 17.6 polyprotein. The alignment score of the Ty3 and 17.6 polymerase comparison was 19.3 standard deviations above the mean score of the random-jumble comparisons. This score shows that the Ty3 polymerase resembles the 17.6 polymerase much more closely than it resembles the Ty1 polymerase. The relatively high level of similarity between the polymerases encoded by Ty3 and 17.6 argues strongly that they are homologous.
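The "standard deviations above the mean of random jumbles" figures quoted above are z-scores: the real alignment score is compared with the distribution of scores obtained after shuffling the residues of both sequences, which preserves composition but destroys order. The Python sketch below illustrates the idea under loudly stated assumptions: a plain Needleman-Wunsch global alignment with hypothetical match, mismatch, and gap values stands in for the Feng-Doolittle scoring scheme actually used, and the function names are illustrative only.

```python
import random
import statistics


def global_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score with a linear gap penalty.

    The scoring values are hypothetical placeholders, not the
    Feng-Doolittle scheme referred to in the text.
    """
    prev = [j * gap for j in range(len(b) + 1)]  # row for empty prefix of a
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]


def jumble_z_score(a, b, n_jumbles=50, seed=0, **scoring):
    """Standard deviations by which the real score of a vs. b exceeds the
    mean score of n_jumbles comparisons between shuffled copies of a and b."""
    if n_jumbles < 2:
        raise ValueError("need at least two jumbles to estimate a deviation")
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    real = global_alignment_score(a, b, **scoring)
    jumbled = []
    for _ in range(n_jumbles):
        a_res, b_res = list(a), list(b)
        rng.shuffle(a_res)  # same composition, random order
        rng.shuffle(b_res)
        jumbled.append(global_alignment_score("".join(a_res), "".join(b_res), **scoring))
    sd = statistics.stdev(jumbled)
    if sd == 0:
        raise ValueError("jumbled scores have zero variance")
    return (real - statistics.mean(jumbled)) / sd
```

A sequence compared with itself, for example, should score far above its jumbles, whereas two genuinely unrelated sequences should give a z-score near zero. Because the scoring scheme here is a stand-in, the numbers it produces are not comparable to the 8, 11 to 17, or 19.3 standard deviations reported in the text.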
Our data are consistent with a common ancestor for Ty3 and 17.6, and the data of others suggest that Ty1 and copia also stem from a common lineage. If these pairs of elements are related, then multiple Tys might have existed before the divergence of single-celled fungi from other eucaryotes. It might also be expected that these yeast elements would have diverged for similar periods of time from their respective Drosophila homologs and possibly under similar selection. Why, then, are the polymerase sequences of Ty3 and 17.6 so much more similar than the polymerase sequences of Ty1 and copia? There are at least three possible explanations. (i) The lineages by which the genera Drosophila and Saccharomyces might be related may be much more complex than can be represented by the limited set of characterized elements. The present or past existence of additional elements, for instance, yeast elements more closely related to copia, could resolve the apparent discrepancies in the rates of divergence of these retrotransposons. (ii) The assumption that different retrotransposons carried through the same eucaryotic lineage should change at similar rates could be incorrect. Different functions could have evolved for the two types of elements, resulting in differences in the rates of permissible change. Alternatively, the two reverse transcriptases could have significantly different error rates, also leading to different rates of change between the lineages. (iii) The similarity between Ty3 and 17.6 could be explained by interspecies horizontal transmission of some ancestral retrotransposon sequence after the divergence of fungi from other eucaryotic lineages.
It is difficult to choose among these possibilities on the basis of the data available. Neither of the first two explanations (incomplete catalogs of retrotransposon lineages and different rates of change for the Ty1-copia and Ty3-17.6 lineages) can be easily verified. Several observations may, however, shed light on the likelihood of explanation iii, interspecies horizontal transmission of some ancestral element. Because Ty3 (sigma) and 17.6 are quite different, any interspecies transmission would presumably have occurred long ago. Although it is not known how widespread Ty3 is in different yeasts, the sigma element is found in at least two species closely related to S. cerevisiae, S. norbensis and S. carlsbergensis (13). Interestingly, retrotransposons in different Drosophila species are known to vary in both occurrence and copy number. In particular, Martin et al. (51) suggested that the widespread but variable occurrence of 412 and 297, two elements considered related to 17.6, is consistent with horizontal transmission. An appealing aspect of the hypothesis that horizontal transmission could involve completely different organisms is that fruit flies and yeasts occupy the same habitat.
In an effort to evaluate the probability of horizontal transmission by additional criteria, the codon usage of Ty3 and 17.6 second ORFs was compared with codon usage tables generated from Drosophila and Saccharomyces DNA sequences. Our comparison, performed with the University of Wisconsin Genetics Computer Group CORRESPOND program, showed that usage in both the copia ORF and the 17.6 second ORF resembles Saccharomyces usage more than Drosophila usage (unpublished data). This codon bias is consistent with, although it does not prove, horizontal transmission.
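The codon-usage comparison can be pictured as counting in-frame codons in an ORF, converting the counts to frequencies, and asking which reference table (for example, one built from Saccharomyces genes and one from Drosophila genes) lies closer. The Python sketch below shows that shape of analysis. It is not the CORRESPOND algorithm: it substitutes a simple sum-of-squared-differences distance on overall codon frequencies (not normalized within synonymous codon families), and the function names and reference tables are hypothetical.

```python
from collections import Counter


def codon_counts(orf):
    """Count in-frame codons in an ORF given as a DNA string (frame 0)."""
    seq = "".join(orf.split()).upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    return Counter(seq[i:i + 3] for i in range(0, usable, 3))


def codon_frequencies(counts):
    """Convert codon counts to frequencies that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("ORF contains no complete codons")
    return {codon: n / total for codon, n in counts.items()}


def usage_distance(freqs, reference):
    """Sum of squared frequency differences over all codons seen in either
    table; a stand-in metric, not the CORRESPOND statistic."""
    codons = set(freqs) | set(reference)
    return sum((freqs.get(c, 0.0) - reference.get(c, 0.0)) ** 2 for c in codons)


def closer_reference(orf, references):
    """Name of the reference table whose codon usage is closest to the ORF's.

    `references` maps a label to a {codon: frequency} table.
    """
    freqs = codon_frequencies(codon_counts(orf))
    return min(references, key=lambda name: usage_distance(freqs, references[name]))
```

With real data, each reference table would be compiled from many genes of one genus, and a result like the one reported (a Drosophila ORF whose usage lies closer to the yeast table) would, as the text notes, be consistent with but not proof of horizontal transmission.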
Relatedness of Ty3 to retroviruses. Ty3 encodes proteins and has a gene organization which resembles that found in animal retroviruses. The organizational similarities have been noted above. We aligned reverse transcriptase and endonuclease sequences from human immunodeficiency virus, bovine leukemia virus, Rous sarcoma virus, and MLV with the Ty3 reverse transcriptase (polymerase) and endonuclease sequences. These retroviruses have been compared to each other previously and are relatively distantly related (7,43,65,81). The MLV lineage is considered to have diverged relatively early from the lineages of other modern retroviruses. The similarity between Ty3 and animal retrovirus proteins is most clear in sequences from the polymerase domains of reverse transcriptase, in which the alignments show from 19% identity with Rous sarcoma virus to 26% identity with MLV and bovine leukemia virus (Fig. 4C). These similarities are not as compelling as the similarity between the Ty3 and 17.6 reverse transcriptases; nevertheless, the alignment scores of these comparisons were 11 to 17 standard deviations above the mean of the scores of the random-jumble comparisons. Similarity was also present, although to a lesser extent, in the endonuclease domains we compared. Among these retrovirus endonucleases, the sequence from Ty3 was least similar to human immunodeficiency virus, with 17% identity, and most similar to MLV, with 23% identity (Fig. 4D). Of the retroviral polymerase and endonuclease sequences we compared to the Ty3 proteins, similarity was greatest with those from MLV. The organizational and protein sequence similarities between Ty3 and these animal retroviruses are consistent with derivation of Ty3 and animal retroviruses from a common ancestral species.
Position specificity of Ty3. A striking feature of isolated sigma elements and the two Ty3 retrotransposon insertions which we characterized is close association with the 5' ends of different tRNA genes. Although we did not analyze the insertion sites of the newly transposed Ty3 elements for the presence of tRNA genes, one simple explanation of the existing insertion site data is that the Ty3 element transposes with position specificity for tRNA genes. If this hypothesis is correct, then one contributor to this position specificity could be the endonuclease. The endonuclease encoded by the retroid element is presumed to be responsible for cleaving the DNA transposition intermediate to produce the mature termini of the inserted element (reviewed in reference 84). Although it is not known whether this intermediate is circular or linear, there is in vitro evidence that the avian retrovirus endonuclease pp32 binds and cleaves at the ends of the LTRs (16,32,57). Because the length of the target site repeats corresponds to the retroid element rather than the "host," the retroid element endonuclease is inferred to cut the genomic target, as well as the transposition intermediate (85). We examined the sequence of the TYB3 protein in the region most similar to the sequences of retrovirus endonucleases. The TYB3 sequence contained the conserved residues at six of seven positions which were conserved among all of the other endonucleases we compared. This suggests that Ty3 and the other retroid elements we compared integrate through fundamentally similar mechanisms.
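The "six of seven conserved positions" check has a simple structure: find the alignment columns at which every reference endonuclease carries the same residue, then count how many of those residues the query (here, TYB3) also carries. The Python sketch below implements that count on gapped, equal-length aligned sequences. The function names and the toy alignment are hypothetical; an actual check would use the alignment of Fig. 4D.

```python
def conserved_columns(aligned_refs):
    """Map column index -> residue for columns where every reference
    sequence carries the same non-gap residue ('-' marks a gap)."""
    length = len(aligned_refs[0])
    if any(len(s) != length for s in aligned_refs):
        raise ValueError("aligned sequences must have equal length")
    conserved = {}
    for col in range(length):
        residues = {s[col] for s in aligned_refs}
        if len(residues) == 1 and "-" not in residues:
            conserved[col] = next(iter(residues))
    return conserved


def count_retained(query, aligned_refs):
    """Return (retained, total): how many reference-conserved residues the
    aligned query shares, out of all conserved columns."""
    if len(query) != len(aligned_refs[0]):
        raise ValueError("query must be aligned to the same length")
    conserved = conserved_columns(aligned_refs)
    retained = sum(1 for col, res in conserved.items() if query[col] == res)
    return retained, len(conserved)


# Toy alignment for illustration only.
print(count_retained("DAEAR", ["DDE-K", "DDEAK", "DNEAK"]))  # (2, 3)
```

In these terms, the observation in the text corresponds to a result of (6, 7) for TYB3 against the other endonucleases.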
The apparent position specificity of Ty3 is one of its most intriguing features. Site-directed mutagenesis of the endonuclease and tRNA gene target plasmids can now be used to examine the molecular basis of Ty3 integration. The similarity which exists between Ty3 and other retroid elements suggests that these studies may also offer insights into the mechanics of retrotransposition in systems in which position specificity has not been observed.