Functional constraints of the Cu,Zn superoxide dismutase in species of the Drosophila melanogaster subgroup and phylogenetic analysis.

The phylogenetic relationships among the Drosophila melanogaster subgroup species were analyzed using approximately 1550-nucleotide-long sequences of the Cu,Zn SOD gene. Phylogenetic analysis was performed separately on the whole region and on the intron sequences of the gene. The resulting phylogenetic trees show virtually the same topology, separating the species into distinct clusters. The inferred topology generally agrees with previously proposed classifications based on morphological and molecular data. The amino acid sequences of the Cu,Zn SOD of the D. melanogaster subgroup species are highly conserved. Only 3.9% of the total amino acid sites are variable, and none affects the major structural elements. Comparison of the Drosophila Cu,Zn SOD amino acid sequences with the Cu,Zn SOD of Bos taurus and Xenopus laevis (whose three-dimensional structure has been elucidated) reveals conservation of all the protein's functionally important amino acids and no substitutions that dramatically change the charge or the polarity of the amino acids.


Introduction
Superoxide dismutase (SOD) is a metalloenzyme which protects organisms from toxic reactive oxygen metabolites by catalyzing the conversion of the superoxide anion to hydrogen peroxide and molecular oxygen (McCord and Fridovich 1969). Reactive oxygen species (superoxide, O2−; hydrogen peroxide, H2O2; and the hydroxyl radical, ·OH) are generated as intermediates during normal metabolism (Halliwell and Gutteridge 1984), and even more so when organisms are exposed to environmental stressors such as active compounds (ozone, SO2, herbicides, acid rain, cigarette smoke, etc.), radiation, and other stress agents. Increased levels of oxygen radicals (oxidative stress) can damage cells by oxidizing many biomolecules (DNA, proteins, membrane lipids), which can lead to mutation, carcinogenesis, aging, increased male crossing-over frequencies in Drosophila melanogaster (Vontas et al. 1999), and cell death (Fleming et al. 1982; Harman 1982).
SOD is classified into Mn, Fe, and Cu,Zn SOD, depending on the metal in the active site. Cu,Zn SODs are found in the cytosol of all eukaryotic cells (Puget and Michelson 1974) but not generally in prokaryotic cells, with a few exceptions (e.g., Photobacterium leiognathi, Pseudomonas diminuta, P. maltophilia, Caulobacter crescentus, and several species of Brucella and Haemophilus). However, Benov and Fridovich (1994) found that Escherichia coli contains a monomeric variant of copper and zinc SOD in the periplasm, so it is likely that this enzyme is a common attribute of Gram-negative bacteria. The Mn SODs occur in bacteria, chloroplasts, eukaryotic algae, and mitochondria; the Fe enzymes occur in bacteria, chloroplasts, and prokaryotic algae (Fridovich 1979; Asada et al. 1980). Eukaryotic Cu,Zn SODs consist of two identical subunits, each of which contains one atom of copper and one atom of zinc in the active site of the enzyme, with a total molecular weight of 31.5 kDa. In Drosophila melanogaster the Cu,Zn SOD locus is on the left arm of chromosome 3 and encodes a polypeptide with 151 amino acids in the functional enzyme.
In addition to the essential physiological role of SOD and the high enzymatic activity of the protein, Sod is useful in phylogenetic studies. Cu,Zn Sod is a single-copy gene, and the protein is abundant in almost all eukaryotes and fairly conserved in amino acid sequence (Ayala 1986). In Drosophila it evolves at a fairly rapid rate, which makes it informative for the study of recent evolutionary events (Lee et al. 1985).
The family Drosophilidae is one of the most diverse and widely distributed dipteran families, consisting of more than 3000 species, which are divided into 61 genera (Wheeler 1986). Its taxonomy and phylogeny, however, remain controversial, due to the lack of knowledge of the phylogenetic relationships of the species. To this end, several studies have been done based either on internal morphology and biogeography (Throckmorton 1975) and morphological characters (Grimaldi 1990) or on molecular approaches including the microcomplement fixation technique (Beverley and Wilson 1984), DNA–DNA hybridization, mitochondrial DNA (mtDNA), ribosomal RNA (rRNA), and sequences of various nuclear genes.
The melanogaster subgroup is one of the most intensively studied groups in the genus Drosophila. However, questions remain about the phylogenetic relationship among D. simulans, D. melanogaster, D. mauritiana, and D. sechellia. According to Solignac and co-workers' (Solignac and Monnerot 1986) data (RFLP analysis of the mtDNA), the subgroup is polyphyletic, with D. mauritiana, D. simulans, and D. sechellia associating with D. melanogaster. D. yakuba and D. teissieri are placed outside of this cluster, while the species pair D. erecta/D. orena, which is also placed outside the D. melanogaster cluster, has appeared more recently. Caccone et al. (1988), based on scnDNA–DNA hybridization, found that D. mauritiana and D. sechellia are the most closely related species, followed by D. simulans and D. melanogaster. D. yakuba, D. teissieri, and D. erecta represent another cluster, while D. orena is outside all the former groups. In contrast, many other studies proposed the species pairs simulans/sechellia (Cariou 1987) and simulans/mauritiana (Lachaise et al. 1986; Kliman et al. 2000; Ting et al. 2000) as the most closely related ones. The phylogenetic tree topology was slightly different when the same method was applied to mtDNA. The latter was verified by Lachaise et al. (1988), who prepared a consensus tree using molecular, chromosomal, morphological, and behavioral data, in which D. mauritiana and D. simulans are most closely related, followed by D. sechellia and D. melanogaster. D. teissieri–D. yakuba and D. erecta–D. orena are sister taxa, and they cluster with the rest of the melanogaster complex separately.
Considering the circumstances mentioned above, a reexamination of the phylogenetic relationships of the species of the D. melanogaster subgroup is expected to provide valuable information. To assess these relationships we determined the nucleotide sequences of the SOD coding region in six species belonging to the melanogaster subgroup. Together with homologous sequences published previously, we used the information obtained from the analysis of these nucleotide sequences in an attempt to infer the phylogeny of the melanogaster subgroup. We also used the Cu,Zn SOD nucleotide sequences to investigate the amino acid homology of the protein among the Drosophila species and any inferred substitution affecting the three-dimensional (3D) structure. The functional constraints of the enzyme, reflected by the very low amino acid variation between the Drosophila species, are extensively discussed. The nucleotide sequences of D. melanogaster (Canton S), D. simulans, and D. willistoni were retrieved from the NCBI GenBank database under accession numbers X17332, X15685, and X13831, respectively.

Materials and Methods
DNA Preparation and Amplification. Insects were frozen in liquid nitrogen and stored at −80°C. Genomic DNA was extracted from about 20–30 flies on ice following the method of Henry et al. (1990). Two pairs of primers were designed for the PCR (McPherson et al. 1991) based on the conserved regions of the D. melanogaster Cu,Zn SOD nucleotide sequence.
First pair: The amplification was carried out using high-fidelity conditions (Kwiatowski et al. 1991b). Pwo polymerase (a proofreading enzyme) (Boehringer–Mannheim) was used to obtain high-fidelity amplification products. The total amplification product consists of approximately 1670 bp including the two exons of the Cu,Zn SOD gene, the intron, and a segment of about 350 bp of the 5′ untranslated region (Fig. 1).
DNA Cloning and Sequencing. The amplified samples were run on a 1% agarose gel and the bands were excised and purified using the Gel Extraction Kit (Boehringer–Mannheim, Catalog No. 1696505). Samples were then filled in with Klenow polymerase, ligated into the EcoRV restriction site of the vector pBluescript II SK(−) (Stratagene), and transformed into competent Escherichia coli XL1-Blue cells (Sambrook et al. 1989). Plasmid DNA was isolated using the QIAprep Spin Miniprep Kit (Qiagen, Catalog No. 27104). Sequencing of the double-stranded plasmid was carried out using an ABI PRISM 310 Genetic Analyzer (Perkin–Elmer). The method is based on the dideoxy termination method of Sanger et al. (1977). For the amplification reaction we used the ABI PRISM Dye Terminator Cycle Sequencing Ready Reaction Kit with AmpliTaq DNA Polymerase FS and four dideoxy terminators labeled with different fluorescent dyes (Perkin–Elmer). The PCR reaction products were electrophoresed in the ABI PRISM 310 Genetic Analyzer instrument, where they were separated according to size under a constant voltage in a capillary filled with POP6 polymer. Detection of the dye-labeled products was achieved with a laser detector and a lens system. For sequencing, we used the same PCR primers, SA, SB, SF, SR, SK, and the external −21 M13 primer. For each genomic region both strands were completely sequenced, and the consensus nucleotide sequence was obtained for two different clones from at least two sequencing reactions.
DNA Sequence Analysis and Phylogenetic Tree Construction. The sequence alignment was performed using the program Clustal W (Thompson et al. 1994). The phylogenetic trees were constructed with the neighbor-joining (NJ) (Saitou and Nei 1987), UPGMA (Sneath and Sokal 1973), maximum parsimony (MP) (Fitch 1971), and minimum evolution (ME) (Rzhetsky and Nei 1992) methods. All phylogenetic and molecular evolutionary analyses were conducted using MEGA Version 2.0 (Kumar et al. 2001) and the PHYLIP software package Version 3.5c (Felsenstein 1995). The confidence level of the clusters was evaluated with the bootstrap test with 1000 replications (Felsenstein 1985) and with the t test Nei 1992, 1993). The nucleotide sequences reported in this study have been deposited in the NCBI/GenBank nucleotide sequence database under accession numbers AF127155 to AF127160.
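The bootstrap test mentioned above can be illustrated with a minimal Python sketch. It is not the MEGA or PHYLIP implementation; it only shows the resampling idea: alignment columns are drawn with replacement to build pseudo-replicates, and support for a grouping is the fraction of replicates in which that grouping is recovered. The function names and the `clade_test` callback (which in a real analysis would rebuild a tree and check for the clade) are hypothetical.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Return one pseudo-replicate: alignment columns resampled with replacement.

    `alignment` maps a sequence name to its aligned sequence (equal lengths).
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(alignment[n]) != length for n in names):
        raise ValueError("all aligned sequences must have the same length")
    cols = [rng.randrange(length) for _ in range(length)]
    return {n: "".join(alignment[n][c] for c in cols) for n in names}

def bootstrap_support(alignment, clade_test, replicates=1000, seed=0):
    """Fraction of replicates for which clade_test(replicate) is true.

    `clade_test` is a hypothetical user-supplied callback; here it stands in
    for tree reconstruction followed by a check for the clade of interest.
    """
    rng = random.Random(seed)
    hits = sum(
        bool(clade_test(bootstrap_alignment(alignment, rng)))
        for _ in range(replicates)
    )
    return hits / replicates
```

A support value close to 1 would correspond to the "high bootstrap value" language used for well-supported clusters.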

Structure of the Sod Gene and Nucleotide Sequences
The structure of the Sod gene is outlined in Fig. 1. We determined the nucleotide sequence of the Cu,Zn SOD coding region for six species of the Drosophila melanogaster subgroup and a segment of the 5′ SOD flanking region for nine species (Fig. 2). In addition, for the sequence alignments we used two published sequences: Drosophila melanogaster (Kwiatowski et al. 1989a) and Drosophila simulans (Kwiatowski et al. 1989b). Cu,Zn SOD consists of two exons and one intron. The first exon is 66 bp long and codes for the first 22 amino acids including the N-terminal methionine. The second is 396 bp and codes for the other 131 amino acids plus the TAA termination signal. The intron ranges in length from 708 to 783 bp. The intron–exon insertion points exhibit the GT–AG consensus sequence (Breathnach et al. 1978). The sequence before the first methionine is CGAA (Fig. 2; boldface characters) in all species, including D. melanogaster and D. simulans, instead of C/A AA A/C as given by Cavener (1987) for Drosophila genes.
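The exon arithmetic above can be checked with a short Python sketch. The function name and the returned dictionary keys are illustrative assumptions, not part of the original analysis; the only inputs taken from the text are the exon lengths (66 and 396 bp) and the GT–AG rule for intron boundaries.

```python
def check_gene_structure(exon_lengths, intron_seq=None):
    """Summarize a coding region from its exon lengths (in bp).

    Assumes the exons are in frame and that the last codon is the stop codon.
    If an intron sequence is given, also test the GT...AG boundary rule.
    """
    coding_bp = sum(exon_lengths)
    if coding_bp % 3 != 0:
        raise ValueError("coding length is not a multiple of 3")
    codons = coding_bp // 3
    summary = {
        "coding_bp": coding_bp,
        "codons": codons,          # includes the stop codon
        "residues": codons - 1,    # encoded amino acids, stop excluded
    }
    if intron_seq is not None:
        s = intron_seq.upper()
        summary["gt_ag"] = s.startswith("GT") and s.endswith("AG")
    return summary

# Exon lengths reported in the text: 66 bp and 396 bp.
print(check_gene_structure([66, 396]))
```

With the reported lengths this gives 462 coding bp, 154 codons, and 153 encoded residues (22 + 131), which is consistent with a 151-residue mature protein once the N-terminal methionine and C-terminal valine are removed.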
The 5′ flanking region is poorly conserved. The variation of the segment that is transcribed but not translated is 15.8% (Fig. 2; light-shaded sequences). The AT content is high, 64.6% for the 5′ region up to the start of translation. According to Seto et al. (1989) the 5′ flanking region of D. melanogaster Cu,Zn SOD shows a putative imperfect TATA sequence, TATTTCT, and a consensus sequence, CCCAAT (Fig. 2; boldface characters), best matched to the CAAT box. These putative regulatory sequences are found in homologous positions in D. simulans (Kwiatowski et al. 1989). The putative TATA box is similar to that of D. simulans in all the species, except D. melanogaster, D. teissieri, and D. yakuba. Similarly, in the putative CAAT box the latter three species have different nucleotide sequences than the other eight species.

Phylogenetic Analysis
We determined the phylogenetic relationships of the melanogaster subgroup considering only the intron sequence, including the D. willistoni Cu,Zn SOD intron as an outgroup. The coding region is highly conserved and thus not phylogenetically informative.
In addition, we used the complete nucleotide sequence of 1673 bp, including the 5′ flanking region, the two exons, and the intron. The alignment of the eight SOD introns of the melanogaster subgroup consists of 812 nucleotide positions, of which 448 (55.17%) are variable. The percentage G+C contents of any given species are virtually identical, having a value of 35.0%. The transition/transversion ratio is 0.7. With the addition of the intron of D. willistoni, the nucleotide positions increase to 824, with 86.4% variation (712 variable positions). The transition/transversion ratio has an average value of 0.6. The G+C content becomes 35.6% because the G+C content of the outgroup species is high (42.2%). The observed variation in G+C content has been reported earlier in other dipterans (Kwiatowski et al. 1992, 1994) and probably reflects a bias in codon preferences that is revealed in well-expressed genes of other taxa (Sharp et al. 1988).
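The summary statistics quoted above (variable positions, G+C content, and transition/transversion counts) can be computed from an alignment as in the following Python sketch. It is a simplified illustration rather than the MEGA procedure: gaps and ambiguous characters are simply ignored, which is an assumption, and the function names are hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def variable_sites(seqs):
    """Count alignment columns showing more than one of A, C, G, T."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    count = 0
    for column in zip(*seqs):
        bases = {b for b in (c.upper() for c in column) if b in "ACGT"}
        if len(bases) > 1:
            count += 1
    return count

def gc_content(seq):
    """G+C fraction over unambiguous bases (gaps and N ignored)."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    return sum(b in "GC" for b in bases) / len(bases)

def ts_tv(seq1, seq2):
    """Return (transitions, transversions) between two aligned sequences."""
    ts = tv = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue
        same_class = ({a, b} <= PURINES) or ({a, b} <= PYRIMIDINES)
        if same_class:
            ts += 1   # purine<->purine or pyrimidine<->pyrimidine
        else:
            tv += 1   # purine<->pyrimidine
    return ts, tv

# Percentages reported in the text follow from the raw counts:
print(round(100 * 448 / 812, 2), round(100 * 712 / 824, 1))
```

The last line reproduces the reported 55.17% and 86.4% from the counts of variable positions.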
Given the A+T composition bias, we estimated nucleotide divergence using the Jukes–Cantor (JC), Kimura two-parameter, and Tamura–Nei models (unpublished results) as well as Tamura's three-parameter model (Table 1). All four methods give similar distance patterns. We used Tamura's distances for constructing phylogenetic trees. The NJ, UPGMA, ME, and MP methods all yield largely similar topologies. There is a deep split between two main clusters; one includes D. sechellia, D. simulans, D. mauritiana, and D. melanogaster. The phylogenetic trees consistently show that these four species form a monophyletic clade which is supported by a high bootstrap value. The other cluster is separated into two subclusters, one with the pair D. yakuba/D. teissieri and the other with the pair D. orena/D. erecta. The only difference among the phylogenies concerns the species D. simulans, D. mauritiana, and D. sechellia in the case where the MP method was applied. Due to the higher accuracy of the other three methods used, we can disregard this differing tree, which is characterized by a pairing of D. simulans with D. sechellia instead of D. mauritiana, as found in the trees constructed according to the other methods as well as from the thoroughly studied triplet consisting of D. simulans–D. mauritiana–D. sechellia (see Discussion). The UPGMA, NJ, and ME trees (Fig. 3) show that D. simulans and D. mauritiana are the most closely related species, followed by D. sechellia and, finally, D. melanogaster. That is, D. melanogaster is more distantly related to the D. simulans–D. mauritiana cluster than D. sechellia.
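For readers unfamiliar with the distance corrections named above, the following Python sketch gives the standard textbook formulas for the Jukes–Cantor, Kimura two-parameter, and Tamura three-parameter distances. It is an illustration under stated assumptions, not the MEGA code: `p` is the proportion of differing sites, `P` and `Q` are the proportions of transitional and transversional differences, and `gc` is the G+C fraction; the function names are hypothetical.

```python
import math

def jukes_cantor(p):
    """JC distance from the proportion p of differing sites."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def kimura_2p(P, Q):
    """Kimura two-parameter distance from transition (P) and
    transversion (Q) proportions."""
    return -0.5 * math.log(1.0 - 2.0 * P - Q) - 0.25 * math.log(1.0 - 2.0 * Q)

def tamura_3p(P, Q, gc):
    """Tamura (1992) three-parameter distance; gc is the G+C fraction.

    With gc = 0.5 this reduces to the Kimura two-parameter distance.
    """
    h = 2.0 * gc * (1.0 - gc)
    return -h * math.log(1.0 - P / h - Q) - 0.5 * (1.0 - h) * math.log(1.0 - 2.0 * Q)

# Example with made-up proportions and the intron G+C content from the text.
print(tamura_3p(0.05, 0.07, 0.35))
```

The Tamura model is the one that explicitly accounts for a G+C content different from 50%, which is why it is a natural choice for an A+T-rich intron.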
We aligned the complete region, consisting of 1673 sites that include the two exons, the intron, and approximately 350 bp of the 5′ flanking region. Nucleotide polymorphism occurs at 458 (27.4%) positions and the G+C content is 42.6%. The nucleotide distances according to the Kimura two-parameter, JC, and Tamura models were used in constructing phylogenetic trees using the NJ and UPGMA methods. Trees were also constructed with the ML method, assuming the molecular clock. The main clusters of these trees are the same as those arising from the intron (Fig. 3).

Sequence–Structure Selection of the Cu,Zn SOD Protein
The alignment of the amino acid sequences of the nine Drosophila Cu,Zn SODs, translated from the nucleotide sequences, plus C. capitata, Xenopus laevis, and Bos taurus, is given in Fig. 4. The two exons consist of 462 bp, of which 46 bp (9.9%) are variable. Of the 46 variable nucleotide positions, 38 (82.6%) are in the third-codon position, 3 (6.5%) are in the second-codon position, and 5 (10.9%) are in the first-codon position. Six amino acid sites (13%) contain nonsynonymous substitutions (three in the first-codon and three in the second-codon position), while the remaining 40 substitutions (86.9%) are synonymous. The alignment of the amino acid sequences of the eight species of the melanogaster subgroup plus that of D. willistoni (used as an outgroup species) shows high conservation, with only six differing amino acids in a total of 151 positions. The average G+C content for the eight species is 60.2% for the SOD coding region and even higher for the third-codon position (76.3%). The average transition bias for the SOD coding region is 1.8. None of these substitutions occur in functionally important regions of the enzyme.
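The breakdown of variable sites by codon position described above can be sketched in Python as follows. This is an illustrative tally only, assuming in-frame, gap-free coding sequences that start at the first codon position; the function name is hypothetical.

```python
def variable_sites_by_codon_position(seqs):
    """Count variable alignment columns at codon positions 1, 2, and 3.

    Assumes aligned, in-frame coding sequences of equal length.
    """
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    counts = {1: 0, 2: 0, 3: 0}
    for i, column in enumerate(zip(*seqs)):
        if len({c.upper() for c in column}) > 1:
            counts[i % 3 + 1] += 1
    return counts

# Toy example: one change at a first position, one at a third position.
print(variable_sites_by_codon_position(["ATGGCT", "ATGGCC", "ATGACC"]))

# The counts reported in the text (38, 3, 5 of 46) give these percentages:
print([round(100 * n / 46, 1) for n in (38, 3, 5)])
```

The strong excess of third-position changes is what one expects when most substitutions are synonymous.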
To determine the importance of the amino acid substitutions that occur between the melanogaster subgroup SOD and other species in relation to the protein function, we aligned the Drosophila SOD amino acid sequences with the sequences of the crystallographically determined 3D structures of the Cu,Zn SOD of the frog Xenopus laevis (Falconi et al. 1991) and the bovine Bos taurus (Tainer et al. 1982). They have, respectively, 63 and 60% identical amino acids with the Drosophila melanogaster sequence. In addition, we used the amino acid sequence of another dipteran species that is not classified in Drosophilidae, C. capitata, to estimate the conserved amino acid residues and speculate about their potential structural and functional role in the dipteran enzyme's activity. The percentage of identical amino acids between C. capitata and D. melanogaster was found to be 78.3%.
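Percent identity between two aligned protein sequences, as quoted above, can be computed as in this minimal Python sketch. The text does not state which denominator convention was used, so the choice here (columns where neither sequence has a gap) is an assumption, as is the function name.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identical residues between two aligned sequences.

    Columns where either sequence has a gap are excluded from the
    denominator (an assumed convention; others exist).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not pairs:
        raise ValueError("no comparable positions")
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)

# Toy aligned fragments (not real SOD sequences).
print(percent_identity("MKV-A", "MKIQA"))
```

Because the result depends on how gapped columns are counted, identity values from different studies are only comparable when the same convention is used.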

Discussion
The Cu,Zn superoxide dismutase is an abundant enzyme in eukaryotic organisms and exhibits highly specific activity, thus protecting aerobic cells against toxic free oxygen radicals (Fridovich 1986). The Cu,Zn Sod gene of the species analyzed in this study consists of a 462-bp coding region that is interrupted by one intron ranging from 708 bp in D. yakuba to 738 bp in D. teissieri. As expected, the second Sod intron (<100 bp) that is present in Scaptodrosophila (as well as in Ceratitis capitata and Chymomyza) is absent in the species under investigation. This finding is in accordance with the absence of a second Sod intron from other species of the subgenus Sophophora examined previously (Kwiatowski et al. 1994).
The 3D structure of the enzyme from Xenopus laevis and Bos taurus has been elucidated to 2-Å resolution (Tainer et al. 1982; Falconi et al. 1991). The enzyme crystallizes in space group C2, with two dimeric enzyme molecules per asymmetric unit. The functional and structural role of each domain has been well analyzed. Since SOD is a highly conserved enzyme overall and since the critical residues in the enzyme active site were found to be invariant in many known sequences, including Drosophila melanogaster, Ceratitis capitata, Gallus gallus, Oryctolagus cuniculus, Rattus norvegicus, Ovis aries, Sus scrofa, Bos taurus, Xiphias gladius, Prionace glauca, and Homo sapiens (unpublished data), the determined 3D structure of the Cu,Zn SOD of the frog and the cow was considered representative of the entire class of Cu,Zn SODs. These data are very useful in our attempt to define the respective domains of the Drosophila species. Moreover, comparisons among the available sequences of Drosophila, frog, and cow as well as the topography of the conserved amino acids may reveal selective or functional constraints that control the evolutionary history of SOD. Each subunit is composed of eight antiparallel β-strands, forming a flattened cylinder of Greek key topology containing 46% of the residues, and three external loops with 48% of the residues. Cu and Zn are held in the active site 6.3 Å apart by interaction with the imidazole ring of His61, which forms a bridge between the two metals. Important roles for SOD function and structure are played by histidine residues 44, 46, 69, 78, and 118, as well as Asp81, contributing to metal binding, and glycine residues that are essential to the formation of the β-strand barrel. The only insertions–deletions occur outside the β-strand regions forming the core of the protein, in loops I and II, and they are limited to two amino acids in length.
The N-terminal methionine and the C-terminal valine are removed in the mature protein (Lee et al. 1985). The amino acids at the protein active site are all conserved (Fig. 4). Thus, His44, His46, His61, and His118 and His61, His69, His78, and Asp81, the crucial amino acids for Cu and Zn binding, respectively, are conserved for all species examined.
The conservation rate is also high around the active site, at the positions where amino acids interact with the amino acids covalently linked to the metals. Residues that hydrogen-bond to the Cu-linked amino acids, such as His41, Gly42, Thr114, Val116, Asp122, and Gly139, are all conserved. Furthermore, Asn63, Lys67, Arg77, and His78, which are found in the Zn-binding region of loop IV (from residue 52 to residue 83) and make hydrogen bonds to Zn-linked amino acids, are also conserved. The only exception is Lys68. At position 68 the lysine found in bovine SOD changes to asparagine in the frog and to glutamic acid in Drosophila. The threonine 133 in bovine is found as leucine and lysine in frog and Drosophila, respectively. Both are located on the surface of the protein, far away from the active center or the subunit interface. The change from positive charge (Lys) to glutamic acid and to asparagine seems not to cause a significant change in the overall charge distribution as calculated using DelPhi (Honig and Nichols 1995). The degree of homology of SOD of the nine Drosophila species is 97% for this disulfide and zinc ligand-containing region. The amino acids that define the active site's electrostatic channel (Getzoff et al. 1983) are highly conserved (Fig. 4). The amino acid substitutions at positions Glu130, Leu131, Ser134, Thr135, Ala119, Asp120, His129, and Lys133 support the classification of the Drosophila electrostatic shell in the fourth category of the seven arrangements given by Bordo et al. (1994). All amino acids at the above-mentioned positions are conserved in the Drosophila species and are in agreement with the total arrangement of the charges in the electrostatic channel.
The high content of glycines that contribute to the active-site stability as well as to the formation of the characteristic β-strand barrel structure represents another conserved protein feature. Most glycines are conserved in all examined species not only in abundance but also in the positions where they are located. Twenty-four glycine residues were found in identical positions, of a total of 25, in the nine Drosophila species examined. The few exceptions noticed in the overall alignment in Fig. 4 (Gly39, Gly89, Gly90, Gly91) are located in external loops that contribute little to the total conformation of the enzyme.
There is also conservation in the amino acid positions that participate in subunit contact such as Lys151 and Asp50, Val146 and Ile111, Gly49 and Gly112, and, finally, Gly148. In contrast, substitutions occur at positions 107 (Pro, Glu, Ala) (which is in contact with the Arg113 at the opposite subunit), 17 (Val, Thr), 19 (His, Phe), 149 (Tyr, Ile), 150 (Ser, Ala), and 151 (Pro, Lys). The substitutions Val17Thr, His19Phe, Tyr149Ile, Ser150Ala, and Pro151Lys, although at the edge of the subunit interface, seem not to contribute significantly to the subunit association. The Pro at position 107 occupies the i + 2 position of a tight external β-bend. The substitution by Ala or Glu may not necessarily affect the conformation and certainly does not disrupt any local interactions. The substitutions found in the amino acid sequences of the Drosophila species with respect to the frog and bovine 3D structures seem not to make important contributions to the 3D structure and not to affect the enzyme's activity, as they are localized at the surface of the protein in parts not associated with the enzyme's function. None of these are found in any essential position of the active center or at the electrostatic channel.
These observations are in agreement with expectations. In general, the enzyme is well conserved over long time spans; thus, about 60% of the amino acid residues were found to remain identical between organisms from different kingdoms, such as humans and yeasts. The substitutions that have been established and maintained among the various species have passed through selective processes that depend on the crucial role of the substituted amino acids (Ayala 1986). We inferred the phylogenetic relationships of species of the melanogaster subgroup by using only the intron sequence. The coding region was found to be highly conserved and thus not informative for our phylogenetic analysis. In contrast, Kwiatowski et al. (1994) used exclusively the coding sequence of Sod for phylogenetic analysis, but the circumstances were quite different from those in our study. In their analysis of distantly related Drosophila species, the noncoding regions were so highly diverse that their alignment raised many uncertainties in several cases. In the case of closely related species such as D. mauritiana, D. simulans, and D. sechellia, we found that the Sod coding sequence does not contain enough phylogenetic information to resolve the branching pattern (data not shown). In addition, the selection of a proper species as an outgroup was of pivotal importance for our study. D. willistoni, which belongs to the subgenus Sophophora, is a fairly closely related but well-defined outgroup, and the split between D. willistoni and the D. melanogaster cluster seems to have occurred a long time ago (about 36 Mya).
The D. simulans complex has been the subject of many efforts to infer phylogeny, and indeed, all three possible pairs of species have been proposed as the most closely related species pair, including simulans/sechellia, simulans/mauritiana, and also the sechellia/mauritiana pairing, which seems less likely than the two alternative pairs. This difficult phylogenetic problem was approached by constructing cluster diagrams for 14 genes, thus grouping the species by various combinations (Kliman et al. 2000), in an attempt to clarify whether the sequences most similar to those of D. sechellia are from D. simulans or D. mauritiana. Hey and Kliman (1993) suggested that D. sechellia arose prior to the split that gave rise to the D. simulans and D. mauritiana species. Kliman et al. (2000) therefore presented an analysis involving 14 genes, in the framework of the divergence population genetics (DPG) approach, with the plurality of the cluster diagrams favoring the aforementioned explanation. This conclusion is also supported by a recent study by Ting et al. (2000) using the Odysseus (OdsH) locus. Taking into consideration all these data as well as the peculiarity of the analysis used in the phylogenetic model presented by Kliman et al. (2000), which does not employ any assumptions of instantaneous splitting among distinct homogeneous entities, we may conclude that D. simulans and D. mauritiana are more closely related to each other than either is to D. sechellia. What appears to be the most unlikely pairing according to the cluster analyses presented and on biogeographic grounds (D. sechellia and D. mauritiana) was the favored topology in the studies by Caccone et al. (1988, 1996), suggesting that D. mauritiana and D. sechellia are the most closely related species, followed by D. simulans and D. melanogaster.
All phylogenetic trees constructed with our sequences suggest that D. sechellia arose prior to the species pair D. mauritiana/D. simulans. D. erecta and D. orena cluster together, and, in parallel, D. yakuba and D. teissieri form another species pair. This topology is supported by the UPGMA, NJ, ME, and MP trees, with an exception in the clustering of D. sechellia in the last tree. The UPGMA and NJ trees are based on molecular distances. The MP tree is based on character-state differences. The ME trees were constructed with JC distances, Kimura's (1980) two-parameter distances, Tajima and Nei's (1984) distances, and Tamura's (1992) distances, and the topology remained the same. These findings are in accordance with the generally accepted phylogeny of the melanogaster subgroup.
It should be mentioned here that many questions have been raised about the ability of purely molecular information to provide definitive answers and lead to important conclusions concerning phylogenetic matters. Furthermore, there is some debate about the extent to which the genealogical reconstruction of single-copy genes accurately reflects "phylogeny." In fact, several sets of molecular data provide information not always compatible with other taxonomic knowledge or consistent with each other (Kwiatowski et al. 1994). From this point of view, nobody can claim that molecular data are necessarily better than morphological data. Molecular phylogenies are also known to be affected by various sources of errors (Nei 1991). Moreover, phylogenetic trees that have been constructed from different parts of DNA are, on several occasions, inconsistent with each other. Some sequence data such as those for mitochondrial DNA and nuclear rRNA seem to be less informative than other sequence data (e.g., Adh) except for some particular purposes (Russo et al. 1995). For example, it is well known that the low level of polymorphism in ribosomal genes can be attributed to the action of concerted evolution. The ribosomal genes that exist in many copies [400–600 in Xenopus laevis and 130–250 in D. melanogaster (Nei and Koehn 1983)] look almost identical to each other, thus lacking the plurality of information that could be used for phylogenetic work. It should always be kept in mind that studies based on data for a single gene may also have been subject to some undetected peculiarities of the gene which distort the phylogenetic trees. Therefore, it is preferable to examine the phylogeny by using several sets of DNA sequences, corresponding to various genes. But, unambiguously, the DNA sequences of any organism always carry the record of its evolutionary history. The phylogenetic trees constructed by Kwiatowski et al. (1994), based on Cu,Zn SOD, were also inconsistent with trees based on the Adh gene (Russo et al. 1995) as well as others. This is why the SOD tree was disregarded in the past. Notably, in the present study, we showed that the Sod gene can be considered a reliable gene contributing to the phylogenetic analysis of the melanogaster species subgroup.
According to Zuckerkandl and Pauling (1962) the rate of amino acid substitutions in proteins may be constant over evolutionary time, a hypothesis called the "molecular evolutionary clock." In the past, Lee et al. (1984) reported that SOD is not an acceptable evolutionary clock. The rates of amino acid substitutions grossly departed from constancy, yielding contradictory results concerning the rate of evolution in studies involving various organisms (human, horse, cow, Drosophila, yeast). However, further investigations revealed that the rate of amino acid substitutions in the Cu,Zn SOD of diverse organisms has been fairly constant during the last 60 million years (MY). This rate is approximately 15 aa/100 aa/100 MY for PAM-corrected data (PAM is the estimated percentage of amino acid differences corrected for superimposed and back replacements) (Kwiatowski et al. 1991a, 1992, 1994). Nevertheless, investigators should always take into account the warning of Lee et al. (1985), who concluded that suggestions or estimations about evolutionary events based on the primary sequence of a gene or protein may be subject to considerable errors.