Expression and evolution of members of the Trypanosoma cruzi trypomastigote surface antigen multigene family.

The trypomastigote-specific surface antigens of Trypanosoma cruzi are encoded by a supergene family which includes the TSA family. The TSA family is characterized by the presence of a 27-bp tandem repeat array in the coding region. Here, we report the characterization and analysis of the three TSA family members in the Esmeraldo strain of the parasite. In this strain 2 distinct telomeric members are expressed abundantly as 3.7-kb mRNAs, while the remaining member is located at an internal chromosomal site and is expressed at less than 2% of the level seen for the telomeric members. Based on hybridization to DNA separated by PFGE, 3 chromosomes of sizes 1.8 Mb, 0.98 Mb, and 0.90 Mb each contain one of the telomeric members. In addition, the two smaller chromosomes also contain the single internal member. Since these two chromosomes contain similar TSA family members and vary only slightly in size, we suggest that they are homologues. Comparisons of the nucleotide sequences of the different members of the family show that the internal gene differs from the telomeric genes primarily in sequences found 3' of the repeat array. These comparisons also reveal that the three genes are analogous, supporting the hypothesis that short segments between the family members are exchanged by gene conversion events. We propose that similar conversion events between members of different gene families may generate some of the diversity found within the supergene family.


Introduction
Studies on the surface glycoproteins of the bloodstream trypomastigote stage of the parasitic protozoan Trypanosoma cruzi indicate that several of these molecules may be involved in processes of parasite recognition and penetration of the host cell [1][2][3][4][5][6][7][8]. Additional studies [9] have shown that many, and possibly all, of these trypomastigote surface glycoproteins are encoded by a supergene family. Members of the superfamily share about 30% amino acid identity and all possess a partially conserved sequence (VTVxNVfLYNR) near the COOH terminus of the protein. The family members also contain partial or complete copies of the motif SxDxGxTW. Interestingly, this motif was first identified in bacterial neuraminidases [10], suggesting that genes encoding neuraminidase may be found within the supergene family.
Indeed, the SAPA/TCNA (shed-acute phase antigen/Trypanosoma cruzi neuraminidase) gene family has been shown to contain most of the neuraminidase/trans-sialidase activities of the parasite. While criteria to definitively assign individual gene families within the superfamily have not been firmly established, it has been possible to define at least 2 gene families by the presence of tandemly repeated amino acid motifs. The SAPA/TCNA gene family shares a repeat motif of 12 amino acids [11], and one or more members encode the unique 160-kDa sialic acid-transferring enzyme, trans-sialidase [12,13]. Members of the first gene family identified in T. cruzi share a different repeat motif of 9 amino acids and encode trypomastigote-specific surface antigens (TSA) of 85-110 kDa [14].
In previous studies we demonstrated that the gene TSA-P1 in the Peru strain contains a tandemly repeated 27-bp sequence within the coding region which defines this 85-kDa gene family [15]. The repeat unit hybridizes to 4 genomic EcoRI fragments in the Peru and Esmeraldo strains of the parasite, and to one EcoRI fragment in the Silvio strain [16]. Bal 31 nuclease studies revealed that, of the four fragments defined by EcoRI digests of Peru and Esmeraldo DNA, one fragment in Peru and 3 fragments in Esmeraldo were Bal 31 nuclease sensitive, indicating that these fragments are located at or near a telomere. The remaining family members in Peru and Esmeraldo, and the solitary Silvio member, were insensitive to Bal 31 nuclease, suggesting that these fragments are found at internal chromosomal locations.
This report focuses on the relationship between the members of the TSA gene family in the Esmeraldo strain in order to elucidate the mechanisms underlying the evolution of the gene family and the larger superfamily. Based on the data presented, we hypothesize that concerted evolution of the TSA family is occurring, most likely by the process of gene conversion, and we propose that diversity within the superfamily may be generated by conversion events between members of different gene families.

Parasite strains and culture.
The T. cruzi Peru strain was obtained from Stuart M. Krassner, University of California, Irvine. Clonal lines of this strain were established from individual parasites which were isolated by micromanipulation using procedures provided by James Dvorak, National Institutes of Health, Bethesda, MD. Peru clone 3 was utilized for these studies. The cloned T. cruzi lines Esmeraldo clone 3 and Silvio X10 clone 1 were obtained from James Dvorak. Growth and maintenance of epimastigotes and tissue-culture derived trypomastigotes of these strains were as described elsewhere [17].

Pulsed field gel electrophoresis (PFGE).
DNA was prepared from late exponential phase epimastigotes in low-melting-point agarose (InCert Agarose, FMC Bioproducts) at a concentration of 2 × 10^9 cells ml^-1 as described [18]. Electrophoresis was performed in 1.0% agarose gels immersed in 0.5× TBE buffer (90 mM Tris-borate, 2.5 mM EDTA, pH 8.0) with a BioRad CHEF DR II unit (BioRad) for 24-48 h at 14°C using 200 volts with a switch time of 60 and 90 s. DNA molecular weights were estimated using standards prepared from Saccharomyces cerevisiae. PFGE gels were stained with ethidium bromide, photographed, destained for 20 min and blotted to nylon membranes following depurination with 0.25 M HCl.
Isolation of individual DNA bands from PFGE gels was accomplished by excision of the gel region and electroelution in the CHEF DR II system for 24 h using the conditions described above. The DNA was further purified by treatment with proteinase K followed by extraction with phenol-chloroform.

Nucleic acid isolation and analysis.
Parasite nuclear DNA and poly(A)+ RNA, phage lambda DNA, and bacterial plasmid DNA were isolated as described previously [19]. Agarose gel electrophoresis on 1.0% agarose was used to separate nucleic acids. Southern transfer to nylon membranes, prehybridization, hybridization, and wash conditions were carried out as previously reported. Hybridization with the 27 nucleotide repeat unit was performed at 37°C in 30% formamide. Hybridization of oligonucleotide probes specific to the three Esmeraldo genes was performed at 42°C in 30% formamide.
2.6. DNA sequencing and sequence analysis. DNA sequencing was performed by the dideoxy chain termination method [20] with a T7 polymerase sequencing kit from Pharmacia as recommended. All sequencing reactions were carried out on T. cruzi derived DNA fragments inserted into the Bluescript plasmid using [α-32P]dATP for incorporation. Nucleotide sequence data were compiled and analyzed using the IBI Pustell Sequence Analysis programs for the IBM computer (New Haven, CT), and the Clustal V programs [21].
2.4. Oligonucleotide synthesis, radiolabeling, and restriction enzymes. Oligonucleotides used for probes or as sequencing primers were synthesized on the Gene Assembler Plus (Pharmacia, Piscataway, NJ) according to the manufacturer's instructions.
Synthetic oligonucleotides were radiolabeled by end-labeling using [γ-32P]ATP and T4 polynucleotide kinase. DNA restriction fragments were radiolabeled using a nick translation kit (BRL, Gaithersburg, MD) as recommended with [α-32P]dCTP incorporation. All restriction enzymes were purchased from Boehringer-Mannheim and used as recommended.
2.5. Genomic and cDNA library construction and isolation of recombinant phage. Esmeraldo nuclear DNA was digested with the selected restriction enzymes and ligated into the λFIX replacement vector (Stratagene, La Jolla, CA). A cDNA library was constructed in phage λgt10 using trypomastigote poly(A)+ RNA from the Esmeraldo strain and cDNA synthesis kits from Pharmacia and BRL. Recombinant phage from the genomic and cDNA libraries were screened by plating the phage, transferring to nitrocellulose filters, and screening with a radiolabeled 27-mer representing one unit of the repeat array present in the coding region of the TSA-P1 gene from the Peru strain. Positive λ phage were plaque purified and T. cruzi DNA inserts from each phage were excised and subcloned into the Bluescript plasmid vector (Stratagene) for nucleotide sequencing and restriction enzyme analysis.

Genomic organization of the TSA family in Esmeraldo.
Previous studies identified 4 genomic EcoRI restriction fragments from the Esmeraldo strain which hybridize with the 27 nucleotide repeat unit [15]. Three of these EcoRI fragments, of sizes 3.2, 3.4 and 3.6 kb, are sensitive to Bal 31 nuclease digestion and are inferred to be located at telomeric sites. The fourth fragment, of size 6.0 kb, shows no sensitivity to Bal 31 nuclease digestion and is likely located at an internal chromosomal site. Since direct cloning of the telomeric genes as EcoRI fragments is not possible, cloning of DNA fragments containing all or part of these four EcoRI fragments was accomplished using an approach similar to that previously used for the cloning of the telomeric and internal members of the TSA genes in the Peru strain [16].
In the Peru strain, a single SalI restriction site separates the telomere from the EcoRI site found within the telomere associated gene, TSA-P1. A second SalI site is located several kilobases upstream of the 5' splice site of TSA-P1, allowing the entire coding region of the gene to be cloned as a single SalI fragment. To determine whether a similar pattern of restriction enzyme sites is present in the telomeric genes of the Esmeraldo strain, genomic DNA was digested with either SalI, EcoRI or SalI/EcoRI, Southern blotted and hybridized with the 27 nucleotide repeat unit. As shown in Fig. 1, 3 SalI fragments of sizes 9, 13 and 20 kb, and 4 EcoRI fragments of sizes 3.2, 3.4, 3.6 and 6.0 kb hybridized. In the SalI/EcoRI digest, only 2 fragments of sizes 0.9 and 1.7 kb are seen. Since both fragments are smaller than any of the EcoRI fragments, each telomeric gene must possess a SalI site between the EcoRI site and the telomere. Also, since only 3 SalI fragments hybridized, it is probable that at least 2 of the four members defined by EcoRI digestion share the same SalI restriction pattern.
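The double-digest argument above can be illustrated with a minimal sketch. The map below is purely hypothetical (arbitrary positions in kb on a toy linear molecule whose right-hand end stands for the telomere); it is not the Esmeraldo restriction map, and the function name `probe_fragment` is introduced only for this illustration. It shows that the probe-containing fragment shrinks in the double digest only when a SalI site lies between the last EcoRI site and the telomere.

```python
def probe_fragment(cut_sites, probe_pos, length):
    """Size of the fragment carrying probe_pos on a linear molecule 0..length
    that is cut at every position listed in cut_sites."""
    edges = sorted(set(cut_sites) | {0, length})
    for left, right in zip(edges, edges[1:]):
        if left <= probe_pos < right:
            return right - left
    raise ValueError("probe position lies outside the molecule")

# Hypothetical toy map (kb); position 10 plays the role of the telomere.
LENGTH = 10
ECORI = [2, 6]   # assumed EcoRI sites
SALI = [1, 8]    # assumed SalI sites, one of them between EcoRI (6) and the end
PROBE = 7        # assumed position of the repeat probe

ecori_only = probe_fragment(ECORI, PROBE, LENGTH)         # fragment 6..10
sali_only = probe_fragment(SALI, PROBE, LENGTH)           # fragment 1..8
double = probe_fragment(ECORI + SALI, PROBE, LENGTH)      # fragment 6..8

# Control: no SalI site between the last EcoRI site and the telomere.
no_site = probe_fragment(ECORI + [1, 5], PROBE, LENGTH)   # fragment 6..10

print(ecori_only, sali_only, double, no_site)
```

In the toy map the double-digest fragment is smaller than the EcoRI fragment, whereas in the control it is unchanged, which is the logic used in the text to place a SalI site distal to the EcoRI site.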
To clone the SalI fragments, a SalI genomic library was constructed by digesting Esmeraldo nuclear DNA with SalI, purifying fragments of size 9-23 kb, and ligating into the λFIX replacement vector. A total of 200 000 recombinant phage were screened with the repeat unit and 11 recombinants rescreened positive. SalI digests of the recombinant phage DNA isolated from the 11 positives revealed 1 containing a 9.0-kb insert, 2 containing a 20-kb insert, and the remaining 8 containing a 13-kb fragment. Restriction enzyme analysis of the two 20-kb inserts showed the SalI fragments to have identical restriction patterns. Restriction enzyme analysis of the eight 13-kb inserts showed 2 variants which differ by a single 400-bp deletion/insertion (Fig. 2).

Chromosome mapping and analysis.
In order to determine the relationship between the SalI and EcoRI fragments, mapping of these fragments to chromosomal size molecules was undertaken. The chromosomal map shown in Fig. 4 was constructed based on these results. The 1.8 Mb chromosome contains the 9.0-kb SalI fragment and the 3.6-kb telomeric EcoRI fragment. The 0.98 Mb and 0.90 Mb chromosomes contain both the 20-kb and 13-kb SalI fragments as well as the 6.0-kb internal EcoRI fragment. The maps of the two chromosomes differ in the EcoRI fragments found at each telomere, since the 0.98 Mb chromosome contains the 3.2-kb EcoRI fragment whereas the 0.90 Mb chromosome contains the 3.4-kb EcoRI fragment. The assignment of the 13-kb SalI fragment to an internal chromosomal location is based upon direct nucleotide sequence analysis of it as well as the 6.0-kb EcoRI fragment, as discussed below. Also, we have tentatively depicted the 0.98 Mb and 0.90 Mb chromosomes as homologues based upon their similarity in size and the fact that they possess similar TSA family members and restriction enzyme patterns.
3.3. Nucleotide and amino acid sequence. The nucleotide sequences of the TSA family members contained in each of the SalI fragments and the 6.0-kb EcoRI fragment were determined using synthetic oligonucleotide primers. Alignment of the nucleotide sequences of the three genes is shown in Fig. 5. The positions of 6-bp restriction enzyme recognition sites predicted by restriction mapping or sequencing were confirmed. The nucleotide sequence of the family member present in the 6.0-kb EcoRI fragment showed 100% identity with the family member present in the 13-kb SalI fragment, providing direct evidence for assignment of the 13-kb SalI fragment to an internal chromosomal site (Fig. 4). Translation of the gene sequences in each of the 6 possible reading frames revealed only one large open reading frame (ORF) in each sequence. The remaining 5 reading frames each contain numerous stop codons distributed throughout the predicted amino acid sequences. The putative initiation codon for the ORFs is at nucleotide 1 in all three sequences (Fig. 5   for a phosphatidylinositol linkage [28].
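The six-frame translation described in this section can be sketched as follows. This is a minimal illustration, not the IBI Pustell procedure used in the study: it assumes the standard genetic code, defines an ORF as an ATG-to-stop stretch, and uses a short invented test sequence rather than any TSA gene. All function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def translate(seq):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))

def longest_orf_per_frame(seq):
    """Length in codons (Met included, stop excluded) of the longest
    ATG-to-stop ORF in each of the six reading frames."""
    seq = seq.upper()
    result = {}
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            protein = translate(s[offset:])
            best = 0
            # Drop the last piece: it is not terminated by a stop codon.
            for segment in protein.split("*")[:-1]:
                start = segment.find("M")
                if start != -1:
                    best = max(best, len(segment) - start)
            result[f"{strand}{offset + 1}"] = best
    return result

# Invented toy sequence: ATG, ten alanine codons, TAA stop.
toy_gene = "ATG" + "GCT" * 10 + "TAA"
for frame, length in longest_orf_per_frame(toy_gene).items():
    print(frame, length)
```

For a real gene, a single frame with one long ORF and five frames broken by frequent stop codons would correspond to the pattern reported here.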

Transcription of the family.
To determine whether each of these three genes is actively transcribed and to assess their relative abundance in the stable poly(A)+ RNA population, two λgt10 cDNA libraries were constructed and 250 000 recombinants from each library were screened with the repeat unit. 129 plaques rescreened positive and, of these, 14 were selected for restriction enzyme and nucleotide sequence analysis. DNA from each was isolated, restricted with EcoRI/SalI, electrophoresed in agarose, Southern blotted and hybridized with the 27 nucleotide repeat. Thirteen of the DNAs showed positive hybridization to a fragment of length 0.9 kb while one DNA showed hybridization to a fragment of length 0.96 kb. Direct nucleotide sequence analysis of these DNA fragments indicated that the 0.9-kb fragment was an EcoRI/SalI fragment identical in sequence to either gene TSA-E1 or TSA-E2. The 0.96-kb fragment, however, was an EcoRI fragment from gene TSA-E3 which terminated in a stretch of 15 A residues that have no counterpart in the genomic sequence. To determine the transcriptional origin of the remaining cDNA isolates, oligonucleotide probes representing unique regions in each of the three genes were synthesized and hybridization conditions were determined such that each probe hybridized only to its homologous gene (Fig. 7). Of the 129 plaque purified recombinants, 25 hybridized with the TSA-E1 specific oligo, 102 hybridized with the TSA-E2 specific oligo, and 2 hybridized with the TSA-E3 specific oligo. Comparison of the relative abundance of each of the three members in the mRNA population shows that the internal member, E3, is represented at less than 2% of the level of that seen for the telomeric members. Also, there is a five-fold difference in the levels of the telomeric members with TSA-E2 more highly represented than TSA-E1.

Fig. 8. Nucleotide sequence mismatches in blocks of sequence from the three TSA family members in the Esmeraldo strain of T. cruzi. The columns represent regions of sequence with the numbering system based on the nucleotide sequence of TSA-E2 (see Fig. 4). The numbers in each cell are the total number of nucleotide sequence mismatches in that region between the two aligned family members. The boundaries between the regions were chosen as the midpoints between nonidentities which denote a shift in maximal sequence similarity from one gene pair to a different gene pair.
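The relative transcript levels in the transcription analysis follow directly from the cDNA plaque counts (25, 102 and 2 of 129). A minimal sketch of that arithmetic is given below; it assumes, as the analysis itself does, that the number of plaques hybridizing to each gene-specific oligonucleotide is proportional to transcript abundance. The variable and function names are illustrative only.

```python
# Plaque counts reported for the gene-specific oligonucleotide probes.
counts = {"TSA-E1": 25, "TSA-E2": 102, "TSA-E3": 2}

def fractions(counts):
    """Fraction of all positive plaques assigned to each gene."""
    total = sum(counts.values())
    return {gene: n / total for gene, n in counts.items()}

total = sum(counts.values())
telomeric = counts["TSA-E1"] + counts["TSA-E2"]

# Internal member relative to the combined telomeric members.
internal_vs_telomeric = counts["TSA-E3"] / telomeric

# Ratio of the two telomeric members to each other.
e2_vs_e1 = counts["TSA-E2"] / counts["TSA-E1"]

for gene, frac in fractions(counts).items():
    print(f"{gene}: {frac:.1%} of positive plaques")
print(f"E3 relative to telomeric members: {internal_vs_telomeric:.2%}")
print(f"E2:E1 ratio: {e2_vs_e1:.2f}")
```

The internal-to-telomeric figure computed this way falls below the 2% threshold quoted in the text, and the E2:E1 ratio can be compared with the approximate fold difference stated there.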

Discussion
In our previous studies, restriction enzyme analysis of genomic DNA indicated that the Esmeraldo strain of T. cruzi contained 4 members of the TSA-1 multigene family, 3 of which are located at telomeres. We have now determined the nucleotide sequence, chromosomal location and transcript abundance of each of these members. A primary finding of this study is the chromosomal organization of the gene family. The single family member located at an internal chromosomal site, TSA-E3, is present on 2 chromosomes of similar size, 0.98 and 0.90 Mb, each of which also contains a telomeric family member. The two telomeric members are distinguished because the COOH termini of their coding regions are found on distinct EcoRI restriction fragments (i.e., 3.4 and 3.2 kb), while the entire coding region of each telomeric member is present within either a 13.0- or 13.4-kb SalI fragment. A fine scale restriction enzyme map of these two SalI fragments showed that they are identical with the exception of a 0.4-kb insertion/deletion. In addition, partial nucleotide sequence analysis of the family member present in each SalI fragment showed no differences in their nucleotide sequence. These findings suggest that the chromosomes containing the two telomeric EcoRI fragments are homologues, with each homologue containing one telomeric and one internal family member. This interpretation is in keeping with previous studies which indicate that T. cruzi is a diploid organism of which the homologues often differ in size [29] and is also consistent with previous observations in the African trypanosome, Trypanosoma brucei, which show that restriction fragments containing telomeres with the same surface antigen gene may vary in length due to the nature of the replication of chromosomal ends [30,31].
Transcription analysis of the family in Esmeraldo shows that poly(A)+ RNAs from the internal gene, TSA-E3, are present at a level < 2% of that seen for the telomeric members. This is also the case in the Peru strain where a similar study revealed that of 27 cDNAs which contained the repeat unit, only one represented a transcript from the internal gene(s). An internal location is not strictly correlated with the absolute level of transcription, since the single member in the Silvio strain is internal and transcripts of this member are detected in Northern blots of poly(A)+ RNA [16]. Likewise, studies on related members of the multigene family demonstrated that genes from telomeric and internal locations can be expressed simultaneously [24]. However, within the TSA family it is clear that when both internal and telomeric genes are present in the same genome, transcripts from internal member(s) are much less abundant than transcripts from telomeric member(s). In addition, in the Esmeraldo strain a five-fold variation in the level of transcripts from the two telomeric members is seen, reaffirming our previous observation that the level of expression of different members of the gene family may vary and that such variation can be associated with the chromosomal location of the family members, possibly reflecting position effect.
Nucleotide and amino acid sequence comparisons show that the internal member differs from the two telomeric members primarily in the number of repeat units and in those sequences 3' of the repeat array. TSA-E1 and TSA-E2 each contain 4 tandemly repeated 27-bp sequences which are flanked by partial repeats, while TSA-E3 has only one complete repeat unit which is flanked by partial repeats. Most striking is the observation that the 345 bp immediately downstream of the degenerate repeat array in the telomeric genes (i.e. nucleotides 2056-2401, Fig. 5) are absent in the internal gene, TSA-E3. The simplest explanation for this observation is that the 345 bp have been deleted from the internal gene during evolution of the gene family. Consistent with this view is the observation that the remaining 23 amino acids in TSA-E3 show significant identity (i.e. 64%, Fig. 6) with the carboxyl terminus of both TSA-E1 and TSA-E2. It is curious, however, that the percent of sequence identity is significantly decreased from the >91% identity observed throughout the remainder of the protein. Also, the abrupt loss of nucleotide identity between TSA-E3 and the telomeric genes immediately 3' of the TGA stop codons is surprising (Fig. 5), particularly when viewed in comparison to the >92% sequence identity observed throughout the coding region of the genes as well as the nontranslated region upstream of the ATG initiation codon. One reasonable explanation for these observations may be found in the genetic mechanisms hypothesized to facilitate maintenance of multigene families [32,33]. Gene conversion can best explain both the maintenance of a family of related genes and the diversification of particular genes within a gene family in several organisms (see reviews [34][35][36][37][38]). If gene conversion events are occurring between the different TSA family members, certain features of the sequence structure of the TSA genes would be predicted.
In particular, gene conversion involving the internal gene TSA-E3 and the telomeric genes could account for the number of repeat units in TSA-E3 being less than the number present in either TSA-E1 or TSA-E2 if the 3' most region of heteroduplex formation occurred within a repeat unit. Also, if genetic exchange between E3 and either E1 or E2 does not occur downstream of the repeat units, then sequences downstream of the repeat units in E3 may diverge independently of E1 and E2 provided the primary constraint is retention of the biological function encoded by that region of the gene. Secondly, if blocks of sequences rather than entire genes are being converted, patchy homology might be observed among the three genes and maximal sequence identity between the genes would be obtained if the genes were treated as analogues. As shown in Fig. 8, this appears to be the case. Starting at 78 bp upstream of the ATG initiation codon and extending to bp 255, TSA-E1 has greater identity with TSA-E3 than with TSA-E2. However, at bp 256 TSA-E1 begins to show a greater similarity with TSA-E2 than with TSA-E3 and this similarity extends through bp 1070, at which a shift back to the E1-E3 similarity is observed. At bp 1412 maximum identity again changes, with E3 being more similar to E2 than to E1. This similarity extends through bp 2042, at which E2 again becomes most similar to E1, and this similarity extends through the nontranslated region of the two genes. This pattern of sequence similarity suggests that these genes are analogues, and is consistent with gene conversion events between the family members.
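The block-wise comparison summarized in Fig. 8 can be sketched in a few lines. The example below is an illustration under stated assumptions: the three sequences are short invented strings (not TSA sequences), they are assumed to be already aligned to equal length, the block boundaries are supplied by hand, and any column where the two characters differ (including a gap against a base) is counted as a mismatch. The function names are hypothetical.

```python
from itertools import combinations

def mismatches(a, b):
    """Number of aligned columns at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y)

def block_mismatch_table(aligned, block_starts):
    """For each block, count mismatches for every pair of sequences.

    aligned: dict mapping gene name to aligned sequence.
    block_starts: sorted 0-based column indices where blocks begin.
    Returns a list of (start, end, {(gene_a, gene_b): mismatches}).
    """
    length = len(next(iter(aligned.values())))
    edges = list(block_starts) + [length]
    table = []
    for start, end in zip(edges, edges[1:]):
        row = {(p, q): mismatches(aligned[p][start:end], aligned[q][start:end])
               for p, q in combinations(sorted(aligned), 2)}
        table.append((start, end, row))
    return table

def closest_pair(row):
    """Gene pair with the fewest mismatches in one block."""
    return min(row, key=row.get)

# Invented toy alignment with a mosaic pattern of similarity.
toy = {
    "TSA-E1": "ACGTAC" + "GGATCC",
    "TSA-E2": "ATGTCC" + "GGATCC",
    "TSA-E3": "ACGTAC" + "GGTTCA",
}

for start, end, row in block_mismatch_table(toy, [0, 6]):
    print(start, end, row, "closest:", closest_pair(row))
```

In the toy data the closest pair switches from E1/E3 in the first block to E1/E2 in the second, which is the kind of shift in maximal similarity that the text interprets as a footprint of segmental gene conversion.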
There is now substantial evidence that the natural propagation of T. cruzi is clonal, and that sexual reproduction is either absent or so rare as to leave no trace in the population structure of the parasite [39,40]. Thus, gene conversion in T. cruzi most likely takes place during mitotic growth. While somatic conversion events generally occur at a frequency lower than that observed for meiotic exchange, they do have important biological consequences in many organisms. In particular, somatic gene conversion in chickens serves to introduce diversity by the exchange of short segments between family members of the immunoglobulin λ light chain locus [41,42], and to provide diversification of the variable surface glycoprotein genes in Trypanosoma equiperdum and Trypanosoma brucei [43-45].
In summary, the results presented herein show that individual members of the TSA gene family do not diverge independently of each other, and therefore over evolutionary time the homogeneity of this family will not necessarily disappear by a slow accumulation of mutations in individual family members through genetic drift. At present, it is not possible to speculate on whether conversion-driven evolution of gene families, both tandemly repeated as well as dispersed, is a general phenomenon in this parasite. If this is the case, however, a pattern of sequence identity like that observed for the TSA family might be expected to be found within the SAPA/TCNA trans-sialidase multigene family as well as other families within the superfamily. Also, if gene conversion between members of T. cruzi gene families depends on tracts of sequence identity for initiation of recombination via homology-dependent strand transfer, as is the case for the chicken immunoglobulin genes, genetic diversity within the supergene family could be enhanced by exchange of gene segments between families which share even limited sequence identity. Such events would be expected to generate members of the superfamily which are dimorphic in that they would contain regions which have high sequence similarity to one family, while immediately adjacent regions of the gene would exhibit substantially less similarity. Since these new constructs likewise would be presented to the host defense mechanisms, their rate of fixation within the population may be accelerated.