Using mitochondrial and nuclear markers to evaluate the degree of genetic cohesion among Echinococcus populations

Based on the distinctiveness of their mitochondrial haplotypes and other biological features, several recent publications have proposed that some Echinococcus granulosus strains should be regarded as separate species. However, the genetic cohesion of these species has not been extensively evaluated using nuclear markers. We assess the degree of polymorphism of the partial mitochondrial cox1 (366 bp), the nuclear mdh (214 bp) and EgAgB4 (281–283 bp) genes of E. granulosus sensu lato isolates collected from areas where different strains occur sympatrically. Five distinct mitochondrial haplotypes were determined by direct sequencing (G1, G2, G5, G6 and G7). The mdh genotypes were first screened by SSCP: three alleles were identified (Md1–Md3), which were further confirmed by nucleotide sequencing. For EgAgB4, which was analysed by direct sequencing of the PCR products, two groups of sequences were found: EgAgB4-1 and EgAgB4-2. No haplotype-specific mdh or EgAgB4 sequences occur. Nevertheless, alleles Md1 and Md2 and type 1 sequences of EgAgB4 showed a higher frequency within the group of haplotypes G1–G2, while allele Md3 and EgAgB4-2 are most frequent in the G5–G7 cluster. AMOVA shows that 79% of the total genetic variability is found among haplotype groups. These findings are compatible with two not mutually exclusive evolutionary hypotheses: (a) that haplotypes share an ancestral polymorphism, or (b) that the reproductive isolation between parasites with distinct haplotypes is not complete, leading to gene introgression. The biological and epidemiological consequences of our findings are discussed.


Introduction
After a recent taxonomic review, the genus Echinococcus Rudolphi, 1801 (Cestoda; Taeniidae) now includes eight recognized species: Echinococcus granulosus, Echinococcus equinus, Echinococcus canadensis and Echinococcus ortleppi, which cause cystic echinococcosis (referred to here as E. granulosus sensu lato, for convenience); Echinococcus multilocularis, causing alveolar echinococcosis; the Neotropical Echinococcus oligarthrus and Echinococcus vogeli, causing polycystic echinococcosis; and Echinococcus shiquicus, which was recently identified in the Tibet mountains and seems more closely related to E. multilocularis. The parasite has a complex life cycle, requiring a carnivore definitive host and a herbivore intermediate host, between which transmission occurs through predator-prey interactions. During the larval stage Echinococcus spp. proliferate asexually and develop numerous protoscolices, each with the potential to mature into an adult worm. Worms are hermaphrodites and reproduce mainly by selfing in the carnivore intestine, but a low rate of outcrossing has been inferred from genetic polymorphisms (Thompson and Lymbery, 1996; Lymbery et al., 1997; Haag et al., 1999). Nevertheless, due to the clonal origin of the adult population derived from a single metacestode, cross-fertilization occurs most frequently between identical genotypes (geitonogamy), which is equivalent to selfing with regard to the population genetic consequences.
A remarkable feature of E. granulosus sensu lato is a large intra-specific diversity in mitochondrial haplotypes and some adult morphologic traits. Mitochondrial genetic variants differing in epidemiologically relevant characters used to be informally designated "strains" (Bowles et al., 1992, 1995). Most of the evidence used to evaluate whether strain designations correlate with morphology comes from larval and adult rostellar hook morphometry. For example, the number and length of hooks can be used as diagnostic features to distinguish the camel and the sheep strain (Ahmadi, 2004), but not the sheep and the Tasmanian sheep strain (Hobbs et al., 1990). More recently, the taxonomic status of some of these strains has been debated. The horse strain (haplotype G4) was the first to be proposed as a new species, E. equinus, due to its genetic and developmental distinctiveness (Thompson et al., 1995). Later, the cattle strain (haplotype G5) was also indicated as a separate species, E. ortleppi (Thompson and McManus, 2002). Finally, based on phylogenetic analyses of 11 complete mitochondrial genome sequences, Nakao et al. (2007) proposed that some major clades within the Echinococcus phylogeny should be considered distinct species. Combining their original results with other published mitochondrial sequences, and scattered observations on morphology, host specificities and biogeography, the authors suggested the following reclassification: the cluster G1-G3 should be considered E. granulosus sensu stricto, G4 E. equinus, G5 E. ortleppi and G6-G10 (a cluster including the camel, pig and cervid strains) E. canadensis.
It must be kept in mind, nevertheless, that the G5-G8 strains belong to a single cluster in the inferred tree, and that the genetic distances between mitochondrial genomes of this cluster average about 5%. Furthermore, although similar amounts of mtDNA divergence correlate with species circumscriptions in other animals, caution is required in interpreting mitochondrial barcodes. For example, several African cichlid fish species, which radiated in the last 15-200,000 years, differ by less than 1% in mitochondrial genes (Meyer et al., 1990). Salamanders of the species Ensatina eschscholtzii, on the other hand, form a ring of subspecies whose members diverged more than 5 million years ago, with many subspecies having more than 5% mtDNA divergence from each other. Yet, because intensive ecological research has demonstrated population connectivity, this complex is considered one species (Moritz et al., 1992).
Unfortunately, the genetic evidence behind the taxonomic debate is very limited. Population genetic variability has largely been neglected in Echinococcus.
Furthermore, the genetic studies have concentrated on the analysis of mitochondrial DNA, a non-recombining, fast evolving and maternally inherited genome in most animals. Little is known about the degree of polymorphism in nuclear loci, which would be valuable to test for reproductive isolation. In the present study, we verify the pattern and the amount of variation in a 214-bp fragment of the gene encoding malate dehydrogenase (mdh) and in a 281-283-bp segment of CDS encoding the fourth antigen B subunit (EgAgB4). Samples of E. granulosus sensu lato isolates were collected from geographic areas where strains occur sympatrically.

Parasite samples and strain determination
Echinococcus granulosus hydatid cysts were obtained from different host species and four geographic areas where the distribution of strains is known to overlap (Table 1). Strains were determined with partial sequences (366 bp) of the mitochondrial cytochrome oxidase 1 gene (cox1) using the procedures described by Bowles et al. (1992). Sequencing was performed by cycle sequencing, and the products were run on an ABI 3730XL machine (Applied Biosystems).

Genotyping the mdh locus
We amplified a fragment of 214 bp from the gene encoding malate dehydrogenase (mdh; GenBank Accession No. L08894), which includes parts of the second and third exons and the complete second intron. The PCR primers used were 5′-CGCTCCTTCCATTTCCGAAAG (forward) and 5′-TTGGTGACAACGGCGTGAGAC (reverse). The reactions contained approximately 30 ng of genomic DNA, 1 U Taq polymerase (Invitrogen), 1.5 mM MgCl2, 10 mM dNTPs and 20 pmol of each primer in a total volume of 50 µl. The amplification was carried out using the following cycling conditions: 94°C for 4 min, 40 cycles of 94°C for 1 min, 55°C for 1 min and 72°C for 1 min, and a final extension step of 72°C for 10 min. The PCR fragments were denatured at 95°C for 3 min and run in 12% polyacrylamide gels to separate the single-stranded DNA according to its conformation (single-strand conformation polymorphism, SSCP) using the GenePhor system (GE Healthcare). Gels were run in "buffer A" (pH 9.0; GE Healthcare) at 12°C and a constant voltage of 200 V for 1 h and 30 min. The gels were silver stained using standard protocols. Genotypes were attributed to the distinct SSCP patterns and confirmed by automated nucleotide sequencing. At least six independent PCR reactions corresponding to each SSCP pattern were sequenced.

EgAgB4 sequence analyses
We designed primers to specifically amplify the EgAgB4 gene, based on the sequence with GenBank Accession No. DQ152008. The forward primer PB4F_ol1 (5′-GGATGGAGTATAAGGAGCAG) anneals within the upstream flanking region, while the reverse primer AgBRev2 (5′-GACATATTTCTTCAACACTTCGTGAAC) anneals inside the second exon, generating a fragment of about 350 bp. The PCR reactions were similar to those described for mdh, except for the annealing temperatures, which followed a touchdown procedure, starting at 60°C and decreasing to 50°C over the first 20 cycles. The amplicons were purified through columns (GE Healthcare) and sequenced directly.
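The touchdown annealing profile can be written out cycle by cycle. The sketch below is only illustrative: it assumes a linear decrement of 0.5°C per cycle and a total of 40 cycles (as in the mdh protocol), neither of which is stated explicitly for EgAgB4, and the function name is hypothetical.

```python
def touchdown_schedule(start=60.0, end=50.0, touchdown_cycles=20, total_cycles=40):
    """Return the annealing temperature (deg C) for each PCR cycle.

    Assumption: the temperature drops by a constant step during the
    touchdown phase and then stays at `end` for the remaining cycles.
    """
    step = (start - end) / touchdown_cycles  # 0.5 deg C with the defaults
    temps = []
    for cycle in range(total_cycles):
        if cycle < touchdown_cycles:
            temps.append(start - step * cycle)
        else:
            temps.append(end)
    return temps


schedule = touchdown_schedule()
for number, temp in enumerate(schedule, start=1):
    print(f"cycle {number:2d}: anneal at {temp:.1f} C")
```

Other decrement schemes (for example 1°C every second cycle) would satisfy the same description, so the step size should be treated as a free parameter.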

Statistical analyses
Sequence trace quality was assessed with the SeqMan tool from the Lasergene software (version 7.0). The amount of sequence polymorphism in the mdh locus was estimated by the number of segregating sites (s) and nucleotide diversity (π) using the DnaSP software version 4.0 (Rozas et al., 2003). The Hardy-Weinberg genotype proportions and the population genetic structure were analyzed with Arlequin version 3.1 (Excoffier et al., 2005). For the genetic structure analyses, we used a hierarchical approach, first grouping the isolates by their mitochondrial haplotype (strain) and further subdividing the groups into populations according to the geographic location. This approach was used to decompose the genetic variation with AMOVA (Excoffier et al., 1992). Gene flow between strains in each geographic area was verified through the F_ST parameter (Reynolds et al., 1983), which can be used to estimate the number of migrants per generation (Nm) between populations (Slatkin and Maddison, 1989).
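To make the two polymorphism summaries concrete, the following minimal sketch computes the number of segregating sites and the nucleotide diversity (mean pairwise differences per site) for a set of aligned sequences. It is not the DnaSP implementation; the function names and the toy alignment are hypothetical, and gaps or ambiguous bases are assumed to be absent.

```python
from itertools import combinations


def segregating_sites(seqs):
    """Count alignment columns that contain more than one nucleotide state."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def nucleotide_diversity(seqs):
    """Mean number of pairwise differences per site over all sequence pairs."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    total_diff = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return total_diff / (len(pairs) * length)


# Hypothetical toy alignment (not data from this study).
toy = ["ACGTACGTAC", "ACGTACGTAC", "ACGAACGTAC", "ACGAACGTTC"]
print("s  =", segregating_sites(toy))
print("pi =", round(nucleotide_diversity(toy), 4))
```

For diploid genotypes, each individual would contribute its two allele sequences to the list before the summaries are computed.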

Strain polymorphism at the mdh locus
Five distinct haplotypes are present in our sample, as determined by the cox1 mitochondrial marker (G1, G2, G5, G6 and G7; Table 1). The sequence regions with traces of high quality never showed double peaks for cox1.
For mdh, SSCP discriminated three alleles segregating within our samples (Md1, Md2 and Md3; Fig. 1), and their distinctiveness was confirmed by nucleotide sequencing (GenBank Accession Nos. EF640368, EF640371 and EF640370, respectively). Alleles Md1 and Md2 differ by 3 nucleotide substitutions, while Md3 differs from Md1 and Md2 by 10 and 8 nucleotide substitutions, respectively. The mdh genotype and allele frequencies are displayed in Table 2. There are no exclusive alleles in the strains, but Md3 occurs at a higher frequency (at least 0.88) in the cattle, camel and pig strains (haplotypes G5, G6 and G7). Alleles Md1 and Md2, on the other hand, are present at low frequencies in these strains, but at a higher frequency in the sheep and Tasmanian sheep strains (G1 and G2). A single parasite of G1 in Brazil is heterozygous for Md2 and Md3, and two were homozygous for Md3.
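Genotype counts of the kind summarized in Table 2 can be converted to allele frequencies and compared with Hardy-Weinberg expectations as sketched below. The counts used here are hypothetical, the function names are illustrative, and the inbreeding coefficient is only a descriptive measure; it does not reproduce the test applied by Arlequin.

```python
from collections import Counter


def allele_frequencies(genotype_counts):
    """genotype_counts maps (allele_a, allele_b) to a number of diploid isolates."""
    alleles = Counter()
    for (a, b), n in genotype_counts.items():
        alleles[a] += n
        alleles[b] += n
    total = sum(alleles.values())
    return {allele: count / total for allele, count in alleles.items()}


def heterozygosity(genotype_counts):
    """Return (observed, expected) heterozygosity for one locus."""
    n = sum(genotype_counts.values())
    observed = sum(c for (a, b), c in genotype_counts.items() if a != b) / n
    freqs = allele_frequencies(genotype_counts)
    expected = 1.0 - sum(p * p for p in freqs.values())
    return observed, expected


def inbreeding_coefficient(genotype_counts):
    """F = 1 - Ho/He; positive values indicate a heterozygote deficiency."""
    observed, expected = heterozygosity(genotype_counts)
    return 1.0 - observed / expected


# Hypothetical counts for one population (not taken from Table 2).
counts = {("Md1", "Md1"): 12, ("Md1", "Md2"): 2, ("Md2", "Md2"): 6}
print(allele_frequencies(counts))
print(heterozygosity(counts))
print(round(inbreeding_coefficient(counts), 3))
```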
Large heterozygote deficiencies were found for the G1 populations from Argentina (p < 0.01), Algeria (p < 0.05) and Brazil (p < 0.01), and for G5 in Brazil (p < 0.01). Ten out of 214 nucleotide sites at the mdh locus are polymorphic in the surveyed populations; overall, the nucleotide diversity within each strain is below 1% (Table 1, Supplementary data). Of the 10 polymorphic sites, 3 are located inside the second exon, 3 in the intron and 4 in the third exon (data not shown). When the genetic diversity is decomposed into three hierarchical levels using AMOVA, 79.07% of the variation occurs among strains, 1.1% among populations from the same strain and 19.9% within populations. The pairwise F_ST estimates between populations of the G1-G2 cluster ranged from 0.79 to 0.90 (p < 0.01) when compared to the G5-G7 cluster (Table 3). Low, non-significant F_ST values occur within each cluster, except for the G1 populations of Southern Brazil × Argentina (F_ST = 0.13) and Southern Brazil × Algeria (F_ST = 0.03), which showed significant genetic differentiation (p < 0.01 and 0.05, respectively). Conversely, the estimates of gene flow are higher for populations of the same strain and for strains of the same cluster, but very low between strains of distinct clusters. Sympatric populations of different clusters seem to show a small amount of gene flow, ranging between 0.08 and 0.13 migrants per generation, as inferred from our results (Table 3).
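The conversion from F_ST to migrants per generation can be sketched under Wright's island model, in which F_ST ≈ 1/(k·Nm + 1). The function below is a hypothetical illustration, not the Arlequin routine: the scaling constant k is 4 in the usual diploid formulation and 2 in the haploid one, and the text does not state which convention underlies the values in Table 3, so both are shown.

```python
def migrants_per_generation(fst, scale=4):
    """Invert F_ST = 1 / (scale * Nm + 1) under the island model.

    scale=4 is the common diploid convention; scale=2 is the haploid one.
    The island model assumes equilibrium between drift and migration.
    """
    if not 0 < fst < 1:
        raise ValueError("fst must lie strictly between 0 and 1")
    return (1 - fst) / (scale * fst)


# F_ST range reported between the G1-G2 and G5-G7 clusters.
for fst in (0.79, 0.90):
    print(fst,
          round(migrants_per_generation(fst, scale=4), 3),
          round(migrants_per_generation(fst, scale=2), 3))
```

Under either convention, F_ST values of this size translate to well below one migrant per generation, the level usually taken as the threshold above which gene flow prevents differentiation by drift.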

Sequence variability for EgAgB4
Due to technical difficulties, only a fraction of our sample could be analyzed both for mdh and EgAgB4 sequence diversity. Overall, 176 isolates were typed both for mdh and EgAgB4 (Table 4). Furthermore, due to the high degree of nucleotide sequence variation in the EgAgB4 upstream flanking region, our analyses focus only on the polymorphisms found within a subset of 281-283 bp after the start codon (see Appendix A, Supplementary data). The sequences are separated into two highly divergent groups (s = 35, leading to a divergence of 12%), which we call EgAgB4-1 (GenBank Accession No. DQ152008) and EgAgB4-2 (GenBank Accession No. AY569358). Since the E. granulosus isolates from both strain clusters always show either a type 1 or a type 2 EgAgB4 sequence pattern, they are separated accordingly (Appendix A, Supplementary data). However, several segregating sites are found for each group (s = 3 within EgAgB4-1; s = 6 within EgAgB4-2, data not shown), including sites with double peaks within regions of high quality traces, which we interpret as polymorphisms. Since we cannot be sure that they belong to alleles from the same locus, we ignore these additional polymorphisms at the present stage of our analyses. The majority of isolates from the G1-G2 cluster harbor EgAgB4-1 sequences (137 out of 138 isolates, or 99%), while 28 out of 38 isolates (74%) belonging to the G5-G7 cluster contain the EgAgB4-2 pattern. However, as for mdh, several recombinant forms are found for this genetic marker as well (Table 4).

Discussion
We analyzed the genetic structure of E. granulosus sensu lato in four geographic areas where distinct strains occur in sympatry, and therefore could share the same definitive host, using one mitochondrial (cox1) and two nuclear markers (mdh and EgAgB4). The cox1 sequences were used to identify strains, while the mdh genotypes allowed an estimate of gene flow among populations. We already have indications that the EgAgB genes are highly redundant in the Echinococcus genome (Haag et al., 2006); thus the polymorphism within EgAgB4-1 and EgAgB4-2 was ignored, and the marker was not included in the population structure analyses. However, these two groups of EgAgB4 sequences are highly divergent, and all of the 176 isolates analysed with the three genetic markers show either a type 1 or a type 2 pattern. In Table 3, all isolates showing a recombinant genotype taking into account the three markers (mitochondrial × nuclear or nuclear × nuclear markers) are indicated by an asterisk, and correspond to 7.4% (n = 13) of our sample.
It has been suggested already that the heterozygote deficiency in E. granulosus populations is due to the fact that worms usually develop in the dog intestine in an aggregate manner, increasing the chance of crossing between clones derived from a single hydatid cyst, which is equivalent to selfing (Lymbery et al., 1989). In the present study, mdh heterozygote deficiencies were found in populations with statistically representative samples (n ≥ 20). This differs from our results in a previous study (Haag et al., 1999), which showed an excess of heterozygotes in the European population and equilibrium frequencies for Southern Brazil and Australia, for two polymorphic loci. Since the samples were considerably smaller and collected in different time periods, it is possible that this difference was due to sampling bias. Nevertheless, the previous study showed strong linkage disequilibrium for all populations, in agreement with the conclusion that selfing (or geitonogamy) is the prevalent mode of reproduction in E. granulosus populations.
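The expected relationship between selfing and heterozygote deficiency can be made explicit with the standard mixed-mating result, given here only as a sketch. The symbols are introduced for illustration and are not estimated in this study: σ is the fraction of offspring produced by selfing (including geitonogamy between clonal worms), F is the equilibrium inbreeding coefficient, and H_obs and H_exp are the observed and Hardy-Weinberg expected heterozygosities.

```latex
F = \frac{\sigma}{2 - \sigma},
\qquad
H_{\mathrm{obs}} = H_{\mathrm{exp}}\,(1 - F)
                 = H_{\mathrm{exp}}\,\frac{2\,(1 - \sigma)}{2 - \sigma}
```

For example, σ = 0.9 gives F ≈ 0.82, so only about 18% of the heterozygotes expected under random mating would be observed. The result assumes that the selfing rate is constant and that the population has reached equilibrium.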
The immediate consequence of such a reproductive mode is the genetic structuring of the population in separate groups, corresponding to the alternative homozygote genotypes and linkage groups (Hartl, 2000). Moreover, if inbreeding is associated with natural selection, such as the adaptation to a new host species, genetic changes in the population can occur rapidly, because the adaptive alleles will remain "arrested" in their linkage groups, increasing the differentiation among sub-populations. Rapid evolutionary changes are facilitated by the asexual amplification of parasite genotypes during the larval stage. This phenomenon was envisioned by Smyth and Smyth (1964), who proposed that "mutations" amplified during protoscolex formation could lead to the formation of new strains. Overall, it means that animal domestication by primitive human communities could have accelerated the Echinococcus evolutionary process, generating parasites adapted to their local livestock (Rausch, 1967). With the beginning of a worldwide animal trade, some E. granulosus variants (strains) came into contact. The consequences of this contact, which represents the current situation in several regions such as Argentina (Kamenetzky et al., 2002), are largely unpredictable. In the present study, we show that mdh alleles are shared among some E. granulosus strains. It indicates that the history described by the mitochondrial genes is only partial, and it illustrates how misleading it can be to rely only on a single kind of marker for population genetic studies (Anderson, 2001). Two hypotheses could explain our findings: (a) that strains share an ancestral polymorphism, and (b) that the reproductive isolation between strains is not complete, leading to gene introgression. Although the two hypotheses are not mutually exclusive, we believe that some gene flow is occurring among strains.
For this reason, we are now evaluating how the genetic diversity is distributed among strains in other nuclear loci (including those analysed in our previous study, Haag et al., 1999) and other stages of the parasite life cycle.
If a small amount of gene flow is occurring among strains, the population dynamics of E. granulosus in some regions can become very complex, with important epidemiologic consequences. Although the number of migrants per generation estimated with the mdh polymorphism is small, it suggests that adaptive alleles can spread from one strain to another, and be amplified in the population as envisioned by Smyth and Smyth (1964). For example, it is thought that the camel strain is less infective to humans (Lymbery and Thompson, 1988), but in Argentina it was found in 4 out of 9 human patients (Rosenzvit et al., 1999), and in 3 out of 33 human patients in Iran (Harandi et al., 2002). McManus and Thompson (2003) speculated that this could have been the result of a mutation in the G6 haplotype, but it could also be explained by introgression of unknown "human infectivity" genes from another strain.
Another implication of our findings concerns the Echinococcus taxonomic debate. Evolutionary theory predicts that the speciation process is accomplished if the differentiated populations, or the putative species, maintain their genetic identity when in geographic contact, due to reproductive isolation mechanisms (Mayr, 1942). Accordingly, it was proposed that a genetic yardstick should be applied to define Echinococcus species, based on the reciprocal monophyly and the genetic cohesion of the divergent clades on a phylogeny (Lymbery, 1992). The mdh locus does not differentiate between the cattle, camel and pig strain populations (haplotypes G5-G7), since their F_ST values are not statistically significant. However, our previous study (Haag et al., 1999) showed that in 3 out of 5 nuclear loci the pig and cattle strains segregate distinct alleles. For this reason, we suggest that an analysis of a larger number of nuclear loci should be taken into account when evaluating the degree of their genetic cohesion and, consequently, when identifying good Echinococcus species.