The Origin of Antigenic Diversity in Plasmodium falciparum

Most studies of genetic variability of Plasmodium falciparum have focused on protein antigens and the genes that encode them. The consensus is that populations exhibit high levels of genetic polymorphism, most notably in the genes encoding surface proteins of the merozoite (Msp1, Msp2) and the sporozoite (Csp). The age and derivation of this variation is a subject that warrants further careful consideration, as discussed here by Stephen Rich, Marcelo Ferreira and Francisco Ayala.

to mutate rapidly coupled with natural selection favoring novel antigens might account for the seemingly great age of the alleles.

Merozoite and sporozoite surface antigens
It is proposed here that most of the variation in antigenic genes is attributable to duplication and/or deletion of the repeated segments within the genes. This process occurs by several mechanisms, each of which is well understood at the molecular level and might involve either intra- or interhelical exchange of DNA [17]. These mechanisms will be referred to by the generic term intragenic recombination (IGR), which increases or decreases the number of repeats within a genetic locus.
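As a rough illustration of how IGR changes repeat number, the following sketch treats a repeat region as an ordered list of units and applies random duplication and deletion events to it. The function names, the equal probability of duplication and deletion, and the one-unit event size are assumptions made for this example only; they are not parameters taken from the studies cited here.

```python
import random


def igr_step(repeats, rng):
    """Apply one duplication or deletion event to a tandem repeat array.

    Assumption for illustration: each event affects exactly one unit, and
    duplication and deletion are equally likely.
    """
    units = list(repeats)
    i = rng.randrange(len(units))
    if len(units) == 1 or rng.random() < 0.5:
        units.insert(i, units[i])  # duplication: unit i now appears twice in tandem
    else:
        del units[i]               # deletion: unit i is lost
    return units


def simulate(repeats, n_events, seed=0):
    """Apply n_events successive IGR events and return the resulting array."""
    rng = random.Random(seed)
    units = list(repeats)
    for _ in range(n_events):
        units = igr_step(units, rng)
    return units


# Hypothetical starting allele with four distinct repeat units.
ancestor = ["alpha", "beta", "gamma", "delta"]
for seed in range(3):
    descendant = simulate(ancestor, n_events=10, seed=seed)
    print(len(descendant), descendant)
```

Every descendant array contains only units already present in the ancestor, yet the arrays differ in length and order. This is the sense in which IGR generates size polymorphism without requiring new point mutations.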
The IGR process is often associated with the evolution of mini- or microsatellite DNA loci, such as those recently described in P. falciparum [18,19]. However, IGR has also been implicated in generating variability within coding regions in a variety of eukaryotic genes, including those encoding Drosophila yolk protein and human α2-globin [20,21]. The probable effects of IGR in antigen-encoding genes of P. falciparum have been demonstrated, with examples of the Csp, Msp1 and Msp2 genes. These loci were chosen because: (1) they are widely used in studies of epidemiology and population structure; (2) their polymorphisms are believed to be ancient [12,22]; (3) they contain repeated DNA segments; and (4) each is a prototypical example of the various stages in the differentiation of genes by IGR.
The Csp gene encodes the antigenic circumsporozoite protein, which has been investigated extensively because it is a likely target for vaccine development [23,24]. The gene comprises two end-regions that are not repetitive (5′ NR and 3′ NR), which flank a central region (CR) made up of a variable number (typically, between 40 and 50) of tandemly arranged 12-nucleotide repeats. There are no silent polymorphisms in the 5′ NR and 3′ NR regions, which is part of the evidence used to infer the recent origin of P. falciparum populations [1,4].
The repetitive amino acid sequences encoded within the CR are remarkably conserved (only two amino acid motifs are known in P. falciparum: NANP and NVDP), but there is a great deal of synonymous nucleotide polymorphism among the repeats. To quantify the degree of nucleotide difference among these motifs, Rich et al. [25] introduced the concept of the repeat allotype (RAT) to refer to the set of variant nucleotide sequences that encode a single amino acid motif. Using the RAT as the basic evolutionary unit, it is possible to achieve correct alignments between gene sequences and, hence, to determine their homologies [25]. Among the known Csp gene sequences of P. falciparum, there are ten RATs that encode the NANP motif and four that encode the NVDP motif (Fig. 1). Each RAT is identified by a Greek letter to distinguish its alignment from that of either nucleotides or amino acids (Fig. 1). The pattern of duplication/deletion of RATs clearly reflects the underlying IGR mechanisms that generate diversity in the CR. Identical symbols in the columns of this alignment indicate identical nucleotide sequences between alleles. Note that nearly all of the observed synonymous site differences in the CR are between RATs found within any single allele. This is a strong indication that, although RAT diversity might have an ancient origin, it has been maintained within individual alleles and can therefore withstand even the most constricted bottleneck. For example, all 25 Csp CR alleles contain at least one copy of each of the most common RATs (α, β, , δ, and γ), which constitute more than 93% of all NANP repeats. If any one of these sequences were the sole survivor following a bottleneck, it alone would possess nearly all the diversity currently known for the species. After some cell generations, IGR rearrangements of these RATs generate size polymorphisms in the resulting alleles. This process has presumably occurred numerous times in the evolution of the species, and might continue to do so, given the nature of the parasite life style and its propensity for being confronted by population bottlenecks. Interestingly, the single known Csp CR of P. reichenowi is more variable than all known P. falciparum alleles combined, in that it has three amino acid repeat motifs: NVNP as well as the two P. falciparum motifs (NANP and NVDP).
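The RAT concept can be sketched in a few lines of code: cut the central region into 12-nucleotide units, translate each unit, and collect the distinct nucleotide variants that encode the same amino acid motif. The example below is a minimal sketch of that idea, not the alignment procedure of Ref. 25. The function names are invented for illustration, and the toy sequence is hypothetical (it was constructed to encode NANP and NVDP and is not a published Csp allele).

```python
from collections import defaultdict
from itertools import product

# Standard genetic code, built in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(nt):
    """Translate a nucleotide string whose length is a multiple of three."""
    return "".join(CODON_TABLE[nt[i:i + 3]] for i in range(0, len(nt), 3))


def repeat_allotypes(central_region, unit=12):
    """Group distinct repeat units by the amino acid motif they encode.

    Returns {motif: sorted list of nucleotide variants}; each variant
    corresponds to one repeat allotype (RAT) of that motif.
    """
    seq = central_region.upper()
    if len(seq) % unit:
        raise ValueError("central region length must be a multiple of the unit")
    rats = defaultdict(set)
    for i in range(0, len(seq), unit):
        repeat = seq[i:i + unit]
        rats[translate(repeat)].add(repeat)
    return {motif: sorted(variants) for motif, variants in rats.items()}


# Hypothetical four-repeat central region (toy data, not a real allele).
toy_cr = "AATGCAAACCCA" "AACGCCAATCCT" "AATGTAGATCCT" "AATGCAAACCCA"
for motif, variants in repeat_allotypes(toy_cr).items():
    print(motif, variants)
```

In this toy region the NANP motif is encoded by two different nucleotide sequences, so it has two RATs, whereas NVDP has one. The amino acid sequence alone would hide that difference.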
The approach used to determine the evolution of the Csp CR is not applicable to all P. falciparum antigenic determinants. For example, the Msp2 of P. falciparum shows much greater variability in length, amino acid content and number of repeats; therefore, the number of nucleotide sequences encoding one given identical amino acid motif is limited. Nonetheless, the pattern of allele polymorphism in Msp2 is consistent with the IGR model. Like CSP, the MSP-2 protein has conserved N- and C-termini, of 43 and 74 residues, respectively [26]. Bracketed within these conserved segments is the highly variable repeat region. Two allelic families have been identified and named after the isolates in which they were first identified. The FC27 family is characterized by at least one copy of a 32-amino acid sequence and a variable number of copies of a 12-amino acid repeat; the 3D7/Camp family contains tandem amino acid repeats of 4-10 amino acids in length [27].
The 3D7/Camp alleles are more variable in length and sequence of repeat types than are those of the FC27 family [16]. Fenton et al. [28] proposed a model to explain the origin of repeat diversity within the 3D7/Camp family of alleles. The 3D7/Camp family was divided into distinct allelic subclasses, which included types A1 and A3, distinguished by amino acid repeats of different lengths. For example, A1 alleles possess four-amino acid motifs, whereas a repeating eight-amino acid motif occurs in A3. Fenton et al. have shown that the allelic subclasses within the 3D7/Camp family are derived from a common ancestral nucleotide sequence and that the diversity arises from duplication and deletion of repeat subunits [28].
Recently, Dubbeld et al. [29] cloned and sequenced the Msp2 gene of P. reichenowi (PrMsp2), which is a 'unique mosaic of P. falciparum allelic forms and species-specific elements'. The methods described in Ref. 28 have been used to determine whether PrMsp2 provides insight into the ancestry of the FC27 and 3D7/Camp families. Figure 2a shows the amino acid sequence alignment of two P. falciparum MSP-2 proteins with the PrMSP2. The P. falciparum alleles from the 3D7 and OKS isolates are representative of the 3D7/Camp and FC27 families, respectively. The two P. falciparum alleles are identical at nucleotide sites encoding the N- and C-termini, but exhibit little similarity, even at the amino acid level, in the intervening repeat region. A closer look at the nucleotides within this central portion reveals homology at three distinct regions, the repeat homology regions (RHRs). RHR1 shows common ancestry between the PrMsp2 and 3D7/Camp sequences (Fig. 2b). Diversity within this region results from proliferation of the GGTGCT hexamer, as described by Fenton et al. [28]. This hexamer is ancestral to the 3D7/Camp and PrMsp2 allelic repeats within this region. Although conservation of these codons is clear between these two alleles, it appears that they have been lost altogether in the FC27-like alleles. However, the region adjacent to RHR1 in the PrMsp2 sequence is similar to the first 21 amino acids of the 32-amino acid repeat found within the FC27 family, and this sequence is the basis for the inferred RHR2 (Fig. 2b). The last nine nucleotides of RHR2 also manifest homology between all three sequences, including the short stretch following the [actaccaaa]₄ repeat in 3D7. Note also the overlap between repeating nucleotides of PrMsp2 in both RHR1 and RHR2. A third RHR is located further downstream, and shows the relationship between the 12-amino acid repeats of OKS and PrMsp2 (Fig. 2c).
The repeat region in OKS is surrounded on either side by a 10-bp sequence (tacagaaagt), which occurs as only a single 5′ copy in the PrMsp2 allele. Despite the lengthy repeat insertion in the OKS sequence, the homology of OKS and PrMsp2 in the region downstream of this repeat is apparent. Therefore, it appears that the repeats were generated some time after the split between P. falciparum and P. reichenowi.
Analysis of the single P. reichenowi sequence allows us to approximate the ancestral sequence of the two P. falciparum Msp2 allele families. Indeed, comparison of the three RHRs discloses that, although the precursor sequences for the various repeats probably derive from the common P. falciparum-P. reichenowi ancestral species, the extant diversity among the Msp2 alleles has arisen since the divergence of the two species. The distinctive dimorphism of the two P. falciparum alleles results from proliferation of repeats in two different regions of the molecule. Presumably, because the overall MSP-2 molecule is constrained in size, the proliferation of repeats leads to loss of other regions; i.e. the 3D7/Camp repeat precursors were lost in FC27 alleles, and the FC27 repeat precursors were lost in the 3D7 alleles.
The repetitive DNA sequences found within the Csp and Msp2 genes, as well as those among other P. falciparum antigenic determinants, are clearly subject to much higher rates of mutation than are nonrepeat sequences found within the same locus. Indeed, the paucity of silent substitutions within the nonrepetitive regions indicates that IGR events have generated repeat diversity in a relatively short period of time. Empirical estimates of mutation rates among repetitive DNA sequences, such as satellite DNA, are as high as 10⁻² mutations per generation and therefore several orders of magnitude greater than rates for point mutations [30]. These high mutation rates, coupled with strong selection for immune evasion, yield an extremely accelerated evolutionary rate for P. falciparum antigens.
The Msp1 gene has been cited as an apparent exception to the rule of the association between extreme antigenic polymorphism and occurrence of repetitive DNA. Like Msp2, Msp1 exhibits considerable substitution and length variation between two allelic classes (Group I and Group II), but much less variation within each class [11,31]. The two classes are commonly designated by the strains in which they were originally identified: K1 (Group I) and MAD20 (Group II). Tanabe et al. [31] partitioned the MSP-1 protein into 17 blocks, based on the degree of amino acid polymorphism; seven are highly variable, five are semi-conserved and five are conserved. Table 1 reports non-synonymous nucleotide diversity (π) for each of these 17 blocks. Note that within either group, non-synonymous and synonymous polymorphisms are absent or rare in most regions, with the notable exception of Block 2, which encodes a set of repetitive tripeptides, and is thus subject to the same type of diversity-generating IGR found in Msp2 and Csp. However, most blocks exhibit far greater nucleotide polymorphisms between than within groups. Based on the diversity in the region encompassing Blocks 4-10, Hughes [22] concluded that the divergence between Group I and II alleles occurred about 35 million years ago. However, he inferred an age of 0.5 million years for a small region within Block 3 (which Hughes referred to as Region 4). Hughes contends that this 70-fold difference in age of allelic blocks, which are separated by <200 bp, is attributable to high recombination between blocks and a strong balancing selection that has maintained these alleles throughout half of the evolution of the genus. This scenario is extraordinarily improbable, and seems not to fit the observations. Specifically, if the Block 4-10 region was in fact tens of millions of years old, we would expect to see considerable within-group synonymous site polymorphism, but this is not the case.
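The contrast between within-group and between-group polymorphism can be made concrete with a simple calculation of nucleotide diversity as the mean proportion of differing sites over all pairs of aligned sequences. The sketch below is a simplification: it counts all differences together and does not separate synonymous from non-synonymous sites, which is what the block-by-block values discussed above require. The function names and the short sequences are hypothetical and serve only to show the arithmetic.

```python
from itertools import combinations, product


def p_distance(a, b):
    """Proportion of sites that differ between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def diversity_within(seqs):
    """Mean pairwise difference per site among sequences of one group."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


def diversity_between(group1, group2):
    """Mean pairwise difference per site across two groups."""
    pairs = list(product(group1, group2))
    if not pairs:
        raise ValueError("both groups must be non-empty")
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


# Toy alignment (hypothetical sequences, not Msp1 data).
group_i = ["ACGTACGT", "ACGTACGA"]
group_ii = ["TTGTACCC", "TTGTACCC"]
print("within I: ", diversity_within(group_i))
print("within II:", diversity_within(group_ii))
print("between:  ", diversity_between(group_i, group_ii))
```

A pattern of low within-group values and a high between-group value, as in this toy case, is the signature described for most Msp1 blocks.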
Rather, it is proposed that it is the rate of evolution, and not the age of these blocks, that is so vastly different. Here too, it is the repetitive DNA regions that are implicated in the rate difference. The dimorphism among Group I and II repeats within Block 2 has been shown to result from processes exactly analogous to those within the Msp2 repeat region [32,33]. The occurrence of repetitive DNA within other blocks has not been described to date. However, repeats within several of the most polymorphic Msp1 blocks have been identified, in particular, Blocks 4, 8 and 14, which were previously characterized as non-repetitive [35].
Work focused on the repeats detected within Block 8, which is the block identified by Tanabe et al. as showing the lowest amino acid similarity between groups (10%), and which, in our analysis, is the most polymorphic in terms of non-synonymous nucleotide diversity (π = 0.711) [35]. The presence of three group-specific repeats within this block (Fig. 3) was reported [35]. One 9 bp repeat (R2a) is found in all Group II alleles (the five uppermost alleles in Fig. 3); and two repeats, of 6 bp (R1a) and 7 bp (R1b), are present in all Group I alleles. It is hypothesized that the occurrence of these repeats within this very short stretch of DNA is a highly significant departure from chance, and this was tested by searching the recently completed genomic sequences of P. falciparum chromosomes 2 and 3. The nucleotide sequences of repeats R1a, R1b and R2a appear 25, 116 and 11 times, respectively, within the 947 kbp of chromosome 2. Within the 1060 kbp of chromosome 3, the R1a, R1b and R2a repeats are present 39, 52 and seven times, respectively. None of the three nucleotide repeats ever appears in tandem on either chromosome 2 or 3. Moreover, the average distance between each occurrence on these chromosomes is >20 kb, demonstrating that their repeated occurrence in the short 147 bp segment of Msp1 Block 8 is a strong departure from random expectation. The Msp1 gene is located on chromosome 9, which has not yet been assembled as a complete nucleotide sequence; nonetheless, the distribution of these nucleotide repeats is not likely to differ markedly between chromosomes by chance alone.
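One way to gauge how unexpected such clustering is uses the chromosome-wide counts given above to estimate the density of each motif and then asks how many copies a 147 bp window would contain by chance. The sketch below assumes that motif occurrences follow a Poisson process with uniform density along the chromosome. That model is an assumption introduced for this example and is not the statistical test used in the original analysis. The threshold of two copies per window is also illustrative, since the number of copies per allele is not stated in the text.

```python
import math


def expected_in_window(count, chrom_len_bp, window_bp):
    """Expected number of motif copies in a window, given a chromosome-wide count."""
    return count / chrom_len_bp * window_bp


def prob_at_least(k, lam):
    """P(X >= k) for a Poisson variable with mean lam."""
    below_k = sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))
    return 1.0 - below_k


# Counts and chromosome lengths as given in the text; window = Block 8 segment.
WINDOW_BP = 147
CHROMOSOMES = {
    "chromosome 2": (947_000, {"R1a": 25, "R1b": 116, "R2a": 11}),
    "chromosome 3": (1_060_000, {"R1a": 39, "R1b": 52, "R2a": 7}),
}

for name, (length_bp, counts) in CHROMOSOMES.items():
    for motif, count in counts.items():
        lam = expected_in_window(count, length_bp, WINDOW_BP)
        p = prob_at_least(2, lam)  # two or more copies in one window (illustrative)
        print(f"{name} {motif}: expected {lam:.4f} per window, P(>=2) = {p:.2e}")
```

Under this simplified model, even the most frequent motif is expected far less than once per 147 bp window, which is consistent with the argument that repeated copies within Block 8 are unlikely to be a chance arrangement.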
It is worth noting that R1a and R2a also exist as clustered repeats outside of Msp1, but they are in both cases located within encoded surface proteins. Thus, on chromosome 2: (1) five of the 11 R2a repeats are located within a 558 bp region corresponding to a predicted secreted antigen that appears similar to the glutamic acid-rich protein gene; and (2) within the pfEMP member of the var gene family, there are 67 repeats, each 39 bp long, and the 3′ terminus of each of the 67 repeats is an R1a sequence. The biological significance of the occurrence of these repeat motifs within multiple antigens is difficult to interpret, but these tantalizing observations lead us to wonder whether these repeats are random products of IGR events, or whether they play some important role in recombination, as would be the case if they were involved in site-specific recombinase activity. In any case, what is clear from the observation of highly significant repeats within regions of the Msp1 gene previously thought to be nonrepetitive is that the extensive polymorphism is attributable to the same kinds of repeat variation and rapid divergence known in the other antigenic determinants.

Conclusions
Homologous comparisons among allelic variants of antigenic genes reveal that most of the observed variation is directly attributable to rapid mutational processes associated with IGR. The increased rate of evolution among these genes reconciles the recent origin of extant P. falciparum populations with the abundance of antigenic diversity observed globally and locally. Conclusions regarding the evolutionary origin of antigenic diversity in P. falciparum have bearing on determining the mechanisms for generating the novel antigen alleles that ensure the long-term survival of the parasite [35]. What remains is to ascertain the relevance of the various IGR mechanisms that underlie the diversification process. It has been noted that IGR can result from either intra- or interhelical events. An example of intrahelical recombination is that of mitotic, slipped-strand mismatch repair (SSM), which is considered to be the principal source of variation in repetitive units such as satellite DNA. Interhelical recombination derives from the classic process of meiotic crossing over and recombination within or between loci on homologous chromosomes.
Both of these processes clearly occur in P. falciparum. Kerr et al. [34] have shown that meiotic, interhelical recombination occurs between mixed Msp2 genotype parasites passaged in laboratory animals. Indeed, this process constitutes the basis for generating linkage maps of P. falciparum chromosomes [18]. But it has been shown that, despite the abundant intragenic recombination within Csp CR, there is an apparent absence of recombination between 5′ and 3′ NR, suggesting that the duplication and deletion of RATs occur by mitotic processes such as SSM [25]. SSM has also been implicated [28] as the cause of repeat variation in Msp2. However, it is interesting to note that among >100 field isolates from which Msp2 has been sequenced and entered in GenBank, only six have hybrid 3D7/Camp-FC27 sequences, despite the strong bias towards sequencing isolates with unusual serotyping results.
The debate over the relevance of sexual recombination between P. falciparum types has been contentious and will probably remain so for some time. However, as with most controversies centering upon mutually exclusive, dichotomous viewpoints, the final resolution may come from conciliation. In any case, it is becoming increasingly clear that the population structure of P. falciparum might not be uniform throughout the species, but dependent upon local factors related to parasite, vector and host biology [36-39]. An accurate determination of these factors is contingent upon careful analysis of parasite genotypes and appropriate determination of homologous comparisons.

Focus, Parasitology Today, vol. 16, no. 9, 2000