Molecular evolution of the Est-6 gene in Drosophila melanogaster: contrasting patterns of DNA variability in adjacent functional regions.

We have investigated nucleotide polymorphism at the esterase 6 (Est-6) gene, including the complete coding region (1686 bp), as well as the 5′-flanking (1183 bp) and 3′-flanking (193 bp) regions of the gene, in 30 strains of Drosophila melanogaster and in one strain of Drosophila simulans. The level of silent variation is similar in the coding and in the 3′-flanking region, but smaller in the 5′-flanking region. Strong linkage disequilibrium occurs within each region and also, although less pronounced, between the 5′-flanking region and the rest of the gene, including the 3′-flanking region. We suggest that the pattern of nucleotide polymorphism of Est-6 may be shaped by: (1) directional and balancing selection acting on the promoter and the coding region; and (2) interactions between the two regions that involve variable degrees of hitchhiking. The patterns of linkage disequilibrium, as well as the statistics ZnS (Genetics 146 (1997) 1197) and B and Q (Genet. Res. 74 (1999) 65), may be interpreted as indicating multiple targets of selection within the gene. The previously reported Est-6 allozyme latitudinal clines may be accounted for by the interaction between selective processes in the promoter and coding regions. © 2002 Elsevier Science B.V. All rights reserved.


Introduction
The esterase 6 (Est-6) gene is located on the left arm of chromosome 3 of Drosophila melanogaster, at 32.5 on the genetic map and at 68F7-F8 on the cytogenetic map. The coding region is 1686 bp long and consists of two exons (1387 and 248 bp) and a small (51 bp) intron. The gene is duplicated (Collet et al., 1990), but there is evidence that the adjacent duplicate, referred to as Est-P by Collet et al. (1990), may be a pseudogene (ψEst-6; Balakirev and Ayala, 1996). The EST-6 protein is transferred by D. melanogaster males to females in the seminal fluid during copulation and affects the female's consequent behavior and mating proclivity.
Two main allozymes of EST-6 (fast, F, and slow, S) have been described; they respond differently to the organophosphate inhibitor, which raises questions about the adaptive significance of the polymorphism. The main allozymes show large-scale repeatable latitudinal clines, with the S allozyme more common at higher latitudes. This, together with other data on the temporal and geographic allozyme variation in natural populations and results of laboratory experiments, suggests that the EST-6 polymorphism is maintained by some form of positive selection (reviewed by Oakeshott et al., 1995; Richmond et al., 1990). Cooke and Oakeshott (1989) sequenced the complete coding region of Est-6 in 13 D. melanogaster lines from an Australian population (chosen so as to include all allozyme variants known in the population). They suggested that the main F and S allozymes differed by two amino acids (Asn/Asp at position 237 and Thr/Ala at position 247) (but see Hasson and Eanes, 1996; Balakirev et al., 1999) and considered these two amino acid replacements as the most likely targets for selection underlying the latitudinal clines previously detected for the F and S allozymes. Hasson and Eanes (1996) investigated the nucleotide polymorphism of the Est-6 coding region in 16 lines from disparate parts of the world, selected on the basis of the presence/absence of the cosmopolitan inversion In(3L)Payne, and detected shared polymorphisms between St and In(3L)Payne chromosomes, indicating extensive genetic exchange between arrangements. Balakirev et al. (1999) sequenced 15 alleles of the Est-6 coding region from a Californian population and found two highly differentiated haplotypes, one encompassing the F alleles and the other consisting of S alleles. They also detected a distinct peak of increased variation in the region surrounding the replacement site responsible for the EST-6 F/S allozyme polymorphism and suggested that balancing selection might be involved in the polymorphism. All these studies involve samples that are too small (Balakirev et al., 1999), non-random, or both (Cooke and Oakeshott, 1989; Hasson and Eanes, 1996) and thus unsuitable for many population genetic tests.
The expression of Est-6 in D. melanogaster has been investigated using P-element-mediated transformation (Tamarina et al., 1997, and references therein). Within the ~1.2 kb of the 5′-flanking region, several independently acting cis-regulatory promoter elements have been identified that control the expression of the gene in different tissues. Game and Oakeshott (1990) investigated restriction site polymorphism and its association with functional variation within a 21.5 kb region including the Est-6 gene, and found that a restriction polymorphism at an RsaI site in the 5′-flanking region of Est-6 shows a significant association with the amount and activity level of EST-6 in males. Given the evidence from other studies that differences in male EST-6 activity affect the reproductive success of their mates, Game and Oakeshott (1990) conclude that Est-6 cis-acting regulatory polymorphisms may be important contributors to adaptive variation. Indeed, significant associations have been detected between several fitness components (pre-adult viability, development time, time to mating, remating frequency, egg production, and fertility) and the EST-6 activity level. Odgers et al. (1995) sequenced 974 bp of the Est-6 5′-flanking region in D. melanogaster and identified a nucleotide substitution responsible for the RsaI polymorphism (T → G at −531). They also revealed the presence of two highly diverged haplotype groups and a peak of polymorphism around the RsaI site. By comparing their data with the results of Game and Oakeshott (1990), Odgers et al. (1995) showed that the RsaI+ haplotype group yields ~25% more EST-6 enzyme activity in adult males than the RsaI− one, and detected weak disequilibrium between the promoter polymorphism and the F/S allozyme polymorphism. However, Odgers et al. (1995) did not sequence the Est-6 coding region in the same lines of D. melanogaster for which they obtained the promoter region sequences, which would have allowed them to analyze the pattern and extent of the association between the regulatory and structural nucleotide polymorphism. Balakirev et al. (1999) studied Est-6 in 15 lines of D. melanogaster and found departures from neutral polymorphism, but the results were ambiguous due to the small size of the sample. We now increase the sample size and the length of the region sequenced, seeking also to test for linkage disequilibrium within the gene region, a possibility suggested by the patterns observed in our previous study (Balakirev et al., 1999). We investigate the 5′-flanking, coding, and 3′-flanking regions of the Est-6 gene (3062 bp total) in a random sample of 30 lines of D. melanogaster (a sample large enough for the population genetic tests) derived from a natural population in California. The detected pattern of variability is highly structured, with distinctive features in the coding and 5′-flanking regions. We suggest that the Est-6 nucleotide polymorphism is shaped by a combination of directional and balancing selection acting on the promoter and coding region polymorphisms, and by the interactions between the two regions due to different degrees of hitchhiking.

Drosophila strains
The 30 D. melanogaster strains were derived from a random sample of wild flies collected by F.J. Ayala (October 1991) in El Rio Vineyard, Acampo, CA (USA). The strains were made fully homozygous for the third chromosome by crosses with balancer stocks, as described by Seager and Ayala (1982). The strains were chosen because their Sod gene had been previously investigated in our laboratory (Hudson et al., 1994). They are named in accordance with the esterase-6 (the letter before the hyphen) and superoxide dismutase (the letter after the number) electrophoretic alleles they carry (Hudson et al., 1994); the esterase-6 alleles are ultra slow (US), S, and F (Table 1). Fifteen lines are new. Moreover, we have sequenced the 5′-flanking region (1183 bp) for the 15 lines previously analyzed by Balakirev et al. (1999) (as well as for the new lines), so that a complete sequence of 3062 bp has been obtained for the same 30 lines.

Allozyme electrophoresis
A total of 20 flies from each line were homogenized in 20 μl 0.1 M Tris-HCl buffer, pH 8.0. The homogenates were electrophoresed for 8-9 h using a Tris-borate-EDTA continuous buffer system, pH 8.6, at 100 V in an 11% starch gel. The gels were stained for EST-6 activity according to standard procedures.

DNA extraction, amplification, and sequencing
Total genomic DNA was extracted using the tissue protocol of the QIAamp Tissue Kit (QIAGEN®).
The D. melanogaster Est-6 sequence (GenBank accessions M33780, M33781; Collet et al., 1990) was used for designing polymerase chain reaction (PCR) and sequencing primers. The primers were designed using the computer program Primer Select (Lasergene, DNASTAR, Inc., 1994-1997). The amplified fragment included 1183 bp of the 5′-flanking region, the complete Est-6 gene (1686 bp), the intergenic region between Est-6 and ψEst-6 (193 bp), and 127 bp of ψEst-6. The primers used for the PCR amplification reactions were: 5′-aagcttgctatatatctatctgt-3′ (forward primer) and 5′-catagggaatcgattcgtagctgt-3′ (reverse primer). The PCR reactions were carried out in final volumes of 50 μl using TaKaRa Ex Taq™ in accordance with the manufacturer's description (Takara Biotechnology Co., Ltd.). The reaction mixtures were overlaid with mineral oil, placed in a DNA thermal cycler (Perkin Elmer Cetus), incubated for 5 min at 95 °C, and subjected to 32 cycles of denaturation, annealing, and extension: 95 °C for 30 s, 56 °C for 30 s, and 72 °C for 2.0 min (in the first cycle, with 3 s added at 72 °C in every subsequent cycle); with a final 7-min extension period at 72 °C.
One-tenth of each reaction volume was assayed on a 0.8% agarose gel. When the desired PCR product was detected, the remainder of the reaction was purified with the Wizard™ PCR Preps DNA Purification System (Promega Corporation). The purified PCR product was directly sequenced by the dideoxy chain-termination technique using Dye Terminator chemistry and separated with the ABI PRISM 377 automated DNA sequencer (Perkin Elmer). For each line, the sequence of both strands was determined, using 14 internal primers spaced, on average, 350 nucleotides apart. (See GenBank accessions AF147095-AF147102 and AF217624-AF217645 for the Est-6 sequences.) At least two independent PCR amplifications were sequenced for each polymorphic site in all D. melanogaster strains to guard against possible PCR or sequencing errors.

Table 1. DNA polymorphism in the Est-6 gene of D. melanogaster. The numbers above the top sequence represent the position of segregating sites and the start of a deletion or insertion. Nucleotides are numbered from the beginning of our sequence (position 32 in Collet et al., 1990). The transcription start site corresponds to position 1184. The coding regions (exon I and exon II) are underlined below the reference sequence (S-26F). Amino acid replacement polymorphisms are marked with asterisks. The RsaI restriction polymorphism is determined by site 653, where RsaI+ has T and RsaI− has G; the S-F allozyme polymorphism is determined by site 1955, where S has A (asparagine) and F has G (aspartic acid). These two sites (653 and 1955) are marked in boldface. The nucleotides in the D. simulans sequence are shown at the bottom (Sim); only those sites are shown that are polymorphic in D. melanogaster. N indicates the position where the homologous nucleotide of D. simulans could not be determined due to a 10-bp alignment gap. The hyphens represent deleted nucleotides. The S, US, and F letters before the line numbers refer to the EST-6 allozymes S, US, and F. (The S and F after the numbers refer to the allozyme polymorphism at the Sod locus and have been previously used to tag these lines.) The open boxes indicate putative gene-conversion tracts, detected by the method of Betrán et al. (1997). The question marks indicate missing data. O denotes a deletion; † denotes the absence of a deletion; P denotes an insertion; ‡ denotes the absence of an insertion. O1, a 5-bp deletion of CTTTT; O2, a 19-bp deletion of TTCT ATTT TGTC GCAA GCA; O3, a single nucleotide deletion of T; P, a 35-bp insertion of AGTAATTGTAATAATAATATAATAGTAATTTTGAT.

DNA sequence analysis
The Est-6 sequences were assembled using the programs DARWIN (developed by Dr Robert Tyler in our laboratory) and SeqMan (Lasergene, DNASTAR, Inc., 1994-1997). Multiple alignment was carried out manually and using the program CLUSTAL W. Linkage disequilibrium between polymorphic sites was evaluated using Fisher's exact test of independence. The computer programs DnaSP, version 3.4 (Rozas and Rozas, 1999) and PROSEQ, version 2.4 (Filatov and Charlesworth, 1999) were used to analyze the data by means of the 'sliding window' method (Hudson and Kaplan, 1988), and for most intraspecific analyses. Departures from neutral expectations were investigated using the corresponding tests (Hudson et al., 1987; Tajima, 1989; McDonald and Kreitman, 1991; Fu and Li, 1993; Hudson et al., 1994; McDonald, 1998; Kelly, 1997; Depaulis and Veuille, 1998; Wall, 1999). The non-coding sequences (intron and intergenic region) were used as a neutral reference in the HKA test. A permutation approach was used to estimate the significance of sequence differences between Est-6 haplotype families, treating them as geographical populations. Simulations based on the algorithms of the coalescent process with or without recombination were performed with the DnaSP and PROSEQ programs to estimate the probabilities of the observed values of Tajima's D, Kelly's ZnS, and Wall's B and Q statistics. The coalescent approach was also used to estimate confidence intervals of the nucleotide diversity values.
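As a rough illustration of the two basic diversity measures that such programs report, the following Python sketch computes nucleotide diversity (average pairwise differences per site) and Watterson's estimator from a toy alignment. It is not the DnaSP or PROSEQ implementation: the function names and the toy sequences are hypothetical, and gaps, missing data, and multiple hits are ignored for simplicity.

```python
from itertools import combinations


def segregating_sites(seqs):
    """Indices of alignment columns that carry more than one nucleotide."""
    return [i for i in range(len(seqs[0])) if len({s[i] for s in seqs}) > 1]


def nucleotide_diversity(seqs):
    """Average number of pairwise differences per site."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return total / len(pairs) / len(seqs[0])


def watterson_theta(seqs):
    """Per-site estimator based on the number of segregating sites."""
    n = len(seqs)
    a1 = sum(1.0 / i for i in range(1, n))  # harmonic correction for sample size
    return len(segregating_sites(seqs)) / a1 / len(seqs[0])


# Hypothetical toy alignment (four sequences, eight sites), not Est-6 data.
toy = ["ACGTACGT", "ACGTACGA", "ACGAACGT", "ACGTACGT"]
pi_toy = nucleotide_diversity(toy)
theta_toy = watterson_theta(toy)
print(segregating_sites(toy), pi_toy, theta_toy)
```

Under neutrality and constant population size the two estimates are expected to be similar; a marked difference between them is what tests such as Tajima's D evaluate.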

Allozyme polymorphism
Electrophoretic analysis reveals two common EST-6 allozymes: 20 lines are S, nine are F, and one exhibits a rare allozyme with slower mobility than S, denoted as US (US-255F; see Table 1). The frequencies observed (0.708 S + US and 0.292 F) are close to those from previous collections at the same site (Smit-McBride et al., 1988).

Nucleotide polymorphism and recombination
The 3062 bp sequenced comprise the 5′-flanking region (1183 bp), the complete Est-6 coding region (1387 bp exon I, 51 bp intron, 248 bp exon II), and 193 bp of the intergenic region between Est-6 and ψEst-6. Table 1 shows a total of 79 polymorphic sites in this sample of 30 Est-6 sequences: 28 sites in the promoter region (two associated with indels), 37 sites in the coding region (including one site in the intron), and 14 sites in the intergenic region. Four length polymorphisms (indels of 1, 5, 19, and 35 bp) are in the promoter region (Table 1). Two indels, a 5-bp deletion and a 35-bp insertion, have been previously observed by Odgers et al. (1995). The single nucleotide deletion (located at 383) occurs in the S-255S and S-377F strains; a 19-bp deletion (located at 348-366) occurs in the F-531F and F-611F strains.
There are 13 replacement and 23 synonymous polymorphic sites in the coding region. We have found a new replacement substitution in US-255F (G → C at 2372), which leads to a charge-altering amino acid replacement (amino acid position 397, neutral glycine replaced by positively charged arginine). All other detected replacement substitutions have been observed previously (Cooke and Oakeshott, 1989; Hasson and Eanes, 1996). Table 2 shows estimates of nucleotide diversity for the entire data set, and for different haplotype families separately. In the pooled sample, total nucleotide diversity is very similar in the promoter and coding regions, but higher in the intergenic region. The level of silent variation in the coding region is higher than in the promoter region, but similar to the variation in the intergenic region, which could indicate different degrees of selective constraint in the 5′- and 3′-flanking regions. Polymorphism is lower in S than in F haplotypes (coding region) and lower in RsaI− than in RsaI+ haplotypes (promoter region). The difference is significant by coalescent simulations for the coding region haplotypes (P < 0.05), but not for the promoter haplotypes. The level of divergence (K) is similar in different haplotype groups within the same functional region (Table 2).
The method of Hudson and Kaplan (1985) reveals a minimum of nine recombination events in the Est-6 region, in the intervals 433-494 (promoter); 690-1254 (promoter-exon I); 1254-1501, 1985-2032, 2032-2089, and 2093-2408 (exon I); 2408-2588 (exon I-intron); 2629-2756 (exon II); and 2803-2874 (exon II-3′-flanking region). The C estimator of recombination is 0.0023 for the entire sequenced region and 0.0075 for the combined sample of all published Est-6 sequences, including our data set (coding region) (Cooke and Oakeshott, 1989; Hasson and Eanes, 1996). Est-6 is located within subsection 68F7-F8, on the left arm of polytene chromosome 3, a region with high recombination. The crossover rate obtained from laboratory measurements of the exchange between flanking markers is 3.32 × 10⁻⁸/nucleotide site/generation, with maximum levels around 5-6 × 10⁻⁸ (Comeron et al., 1999; Josep M. Comeron, personal communication). Assuming no recombination in D. melanogaster males and an effective population size of 10⁶, the recombination parameter is 0.0664. We note that no inversion polymorphisms have been found in the third chromosome of the El Rio population, in spite of tens of thousands of individuals sampled from different collections, including the 1991 collection used in the present study (Smit-McBride et al., 1988; Tyler et al., 1993; and our unpublished data).
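The laboratory-based recombination parameter quoted above can be reproduced with a short calculation. The sketch below assumes the usual definition C = 4Nc, with the measured crossover rate halved because males do not recombine; this reading is our assumption (the text gives only the inputs and the result), and the function name is hypothetical.

```python
def recombination_parameter(effective_size, crossover_rate, male_recombination=False):
    """Population recombination parameter, C = 4 * N * c.

    Assumption: crossover_rate is the rate measured in females, so the
    sex-averaged rate is half of it when males do not recombine.
    """
    c = crossover_rate if male_recombination else crossover_rate / 2
    return 4 * effective_size * c


# Inputs taken from the text: N = 10^6 and 3.32e-8 per site per generation.
c_lab = recombination_parameter(effective_size=1e6, crossover_rate=3.32e-8)
print(round(c_lab, 4))
```

With these inputs the result agrees with the reported value of 0.0664, which suggests that the halving for the absence of male recombination is indeed how the figure was obtained.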

Haplotype structure
There are two promoter region haplotypes, denoted as RsaI+ and RsaI−, distinguished by a T → G transversion at position 653 (Table 1). The average number of nucleotide differences (K) between the two haplotypes is 6.720. The RsaI− haplotypes are fairly homogeneous (K = 1.584), while the RsaI+ haplotypes are more heterogeneous (K = 4.756). The RsaI− haplotypes are the most frequent in our data set (20 out of 30) and also in the data of Game and Oakeshott (1990) (20 out of 29). Odgers et al. (1995) consider RsaI+ to be the ancestral state, which is consistent with the higher polymorphism of the RsaI+ haplotypes and supported by the comparison with D. simulans (Table 1). Three putative gene conversion events between the RsaI− and RsaI+ haplotypes are revealed with the method of Betrán et al. (1997) (see Table 1).
There are also two coding region haplotypes (Table 1), denoted as the S and F allelic lineages (Balakirev et al., 1999). The average number of nucleotide differences between them is 11.809. The S group includes most haplotypes (21 out of 30), which are fairly homogeneous (K = 3.810), whereas the nine F haplotypes are significantly more heterogeneous (K = 16.722). This high level of polymorphism would suggest that the F lineage may be ancestral, a conclusion also reached by Cooke and Oakeshott (1989) and Hasson and Eanes (1996). However, D. simulans has an A at position 1985, the same as the S lineage, which would support the inference that the S lineage may have been the ancestral condition from which the F allelic lineage derived, a hypothesis favored by Balakirev et al. (1999), who had in their sample only two F alleles, quite similar to one another. Four putative gene conversion events between the S and F haplotypes are revealed with the method of Betrán et al. (1997) (see Table 1). The F-775F haplotype has 12 differences from all other sequences (Table 1); an identical haplotype appears in Cooke and Oakeshott (1989; haplotype number 5), in a sample from Coffs Harbour, Australia.
The permutation test is highly significant for the promoter haplotypes, Kst* = 0.4175 (Kst*(0.999) = 0.0966, P < 0.001), as well as for the coding region haplotypes, Kst* = 0.2846 (Kst*(0.999) = 0.0701, P < 0.001). Two F haplotypes (strains F-531F and F-611F) may have arisen by recombination between S and F coding region sequences; the difference between them and the other F haplotypes is significant (Kst* = 0.3352, Kst*(0.95) = 0.1290, P < 0.05).

Table 2, note a: π is the average number of nucleotide differences per site among all pairs of sequences. θ is the average number of segregating nucleotide sites among all sequences based on the expected distribution of neutral variants in a panmictic population at equilibrium. K is the average proportion of nucleotide differences between D. melanogaster and D. simulans, corrected according to the Jukes and Cantor method. Syn, synonymous; nsyn, non-synonymous; silent, silent in noncoding regions and synonymous in coding region. The coding region includes exons I and II. Nucleotide variability is calculated for the whole data set ('full sample'), as well as for the coding (S and F haplotypes) and promoter (RsaI− and RsaI+ haplotypes) regions separately. The segregating sites associated with indels are excluded from the π, θ, and K calculations.

Sliding window analysis
Fig. 1A,B shows the distribution of polymorphism and divergence within and between promoter haplotypes RsaI− and RsaI+ and S/F alleles, respectively. Both display noticeable peaks of variation that may reflect the effect of balancing selection (Hudson and Kaplan, 1988). Fig. 1A (see Table 1) displays a distinct peak in the promoter region at coordinates 329-690, mostly due to differences between the RsaI− and RsaI+ haplotypes rather than to variability within each type. The peak contains the RsaI site (position 653). Elevated nucleotide variation in the same promoter area of Est-6 was revealed by Odgers et al. (1995). Fig. 1B shows two noticeable peaks, one in the promoter and one in the coding region, but only one of them (sites 1955-2093) is mostly due to differences between the S and F haplotypes. This peak includes the F/S replacement site (position 1955). We detected this peak previously (Balakirev et al., 1999) in a set of fewer sequences obtained by us, and also in the Est-6 data of Hasson and Eanes (1996) and Cooke and Oakeshott (1989). Fig. 2 shows a sliding window plot of the distribution of nucleotide polymorphism in D. melanogaster and divergence between this species and D. simulans along the promoter (A) and coding (B) regions. The promoter peak of high polymorphism does not correspond to a region of high interspecific divergence (Fig. 2A), which is inconsistent with the neutral model. The peak contains a stretch of polymorphic sites at positions 494, 518, and 544 (Table 1) that are in significant linkage disequilibrium with the RsaI site (position 653). In the coding region (Fig. 2B) there is also discordance between polymorphism and interspecific divergence, within a segment that includes the polymorphic sites at positions 1955, 1985, 2032, 2089, 2092, and 2093 (Table 1), which are in significant linkage disequilibrium (position 1955 determines, as noted, the F/S polymorphism).
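The sliding-window idea itself is simple to sketch: count the polymorphic sites that fall inside a window of fixed width and move the window along the sequence in fixed increments. The Python fragment below is only an illustration with a hypothetical function name. The defaults of 80 nucleotides and 20-nucleotide increments echo the window sizes quoted for panel A in the figure legend, and the four positions used are just the promoter sites named in the text, not the full set of segregating sites in Table 1.

```python
def sliding_window_counts(positions, start, end, window=80, step=20):
    """Count polymorphic sites in each window [left, left + window - 1]."""
    counts = []
    left = start
    while left + window - 1 <= end:
        right = left + window - 1
        n_sites = sum(left <= p <= right for p in positions)
        counts.append((left, right, n_sites))
        left += step
    return counts


# Promoter coordinates 1-1183; positions 494, 518, 544, and 653 come from the text.
windows = sliding_window_counts([494, 518, 544, 653], start=1, end=1183)
peak = max(windows, key=lambda w: w[2])
print(len(windows), peak)
```

In a full analysis the per-window counts would be converted to diversity estimates and compared with the interspecific divergence in the same window.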
We measure heterogeneity in the distribution of silent polymorphic sites relative to fixed interspecific differences by means of the Goss and Lewontin (1996) and McDonald (1998) statistics and assess their significance by Monte Carlo simulations of the coalescent model incorporating recombination (McDonald, 1998). We use models with recombination because intragenic recombination predicts heterogeneity in the level of diversity along the sequence. Based on 10,000 simulations, with the recombination parameters varying from 1 to 64, the Goss and Lewontin (1996) statistics are significant for the whole Est-6 gene (3062 bp): interval length variance (VIL) is 0.000353, P = 0.031, and modified interval length variance (QIL) is 0.000947, P = 0.047. For the promoter region alone the tests do not reveal significant heterogeneity in the polymorphism-to-divergence ratio. However, for our data set combined with previously published Est-6 promoter sequences (Odgers et al., 1995), the McDonald (1998) maximum sliding G statistic (Gmax) as well as the Goss and Lewontin (1996) statistics are significant: Gmax = 18.0264, P = 0.025; VIL = 0.00232, P = 0.013; QIL = 0.00471, P = 0.030. The tests fail to reveal significant heterogeneity in the polymorphism-to-divergence ratio for the Est-6 coding region.

[Figure legend fragment: (Hudson et al., 1987). Window sizes are 80 nucleotides with 20-nucleotide increments for A, and 130 nucleotides with 25-nucleotide increments for B.]

Fig. 3 shows the significance levels of Fisher's exact test for linkage disequilibrium between non-singleton pairs of segregating sites. Out of 1225 pairwise comparisons, 445 (36.3%) show significant disequilibrium. With the Bonferroni correction for multiple comparisons, there remain 114 (9.3%) significant associations. There are two areas with significant associations between sites: one includes the promoter region; the other includes the rest of the gene. The significant associations within each region are 52.9 and 55.2%, respectively (or 33.3 and 11.5% with the Bonferroni correction). The significant associations between the two regions are substantially fewer, 15.1% (or 1.0% with the Bonferroni correction), and they are mostly due to the site at position 527. Site 527 is anomalous in that it is in linkage equilibrium with all other sites in the promoter region, but is in disequilibrium with sites in the rest of the gene.
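The pairwise test behind these counts can be sketched as follows. For every pair of biallelic sites, the 30 haplotypes are cross-classified into a 2 × 2 table and a two-sided Fisher's exact probability is computed; the Bonferroni threshold is the nominal level divided by the number of comparisons (1225 comparisons correspond to all pairs among 50 sites). The code is a self-contained illustration with hypothetical function names and toy haplotypes, not the procedure implemented in DnaSP or PROSEQ.

```python
from itertools import combinations
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    total = sum(prob(x) for x in range(low, high + 1)
                if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def pairwise_ld(haplotypes):
    """Fisher's exact P for every pair of (assumed biallelic) sites."""
    ref = haplotypes[0]
    pvalues = {}
    for i, j in combinations(range(len(ref)), 2):
        a = sum(h[i] == ref[i] and h[j] == ref[j] for h in haplotypes)
        b = sum(h[i] == ref[i] and h[j] != ref[j] for h in haplotypes)
        c = sum(h[i] != ref[i] and h[j] == ref[j] for h in haplotypes)
        d = sum(h[i] != ref[i] and h[j] != ref[j] for h in haplotypes)
        pvalues[(i, j)] = fisher_exact_two_sided(a, b, c, d)
    return pvalues


# Toy data: 30 haplotypes, three sites; sites 0 and 1 are completely associated.
toy = ["AA" + "TC"[k % 2] for k in range(20)] + ["GG" + "TC"[k % 2] for k in range(10)]
pvalues = pairwise_ld(toy)
bonferroni_alpha = 0.05 / comb(50, 2)  # 1225 comparisons, as in the text
significant = [pair for pair, p in pvalues.items() if p < bonferroni_alpha]
print(significant)
```

In the toy data only the completely associated pair survives the corrected threshold, while the third site is independent of the other two.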

Tests of neutrality
In our previous study we detected significant deviations from neutrality in the coding region of Est-6 using the McDonald and Kreitman (1991) test applied to four D. simulans (Karotam et al., 1995) and 15 D. melanogaster sequences (Balakirev et al., 1999). Similar results are obtained for the 30 D. melanogaster Est-6 coding-region sequences (Table 3). However, the McDonald and Kreitman (1991) test is not significant if we combine our data with published Est-6 coding sequences (Cooke and Oakeshott, 1989; Hasson and Eanes, 1996) (Fisher's exact test P = 0.074). The Hudson et al. (1987), Tajima (1989), Fu and Li (1993), and Depaulis and Veuille (1998) tests do not reveal any significant deviation from neutrality for the Est-6 region. However, the Tajima (1989) test applied to our data (exon I) combined with previously published Est-6 sequences (Cooke and Oakeshott, 1989; Hasson and Eanes, 1996) reveals a significant deviation from neutral expectations for the S alleles (D = −1.864, P < 0.05), but not for the F alleles (D = −0.181, P > 0.10), which is consistent with our previous results (Balakirev et al., 1999). In the promoter region we detect a significant deviation from neutrality for the RsaI− haplotypes using both the Tajima (D = −2.065, P < 0.05) and Fu and Li (D* = −2.672, P < 0.05; F* = −2.890, P < 0.05) tests. Negative values of Tajima's D statistic indicate an excess of polymorphisms segregating at low frequency, and negative values of Fu and Li's statistics indicate an excess of unique polymorphisms in the data set. We have also applied Kelly's (1997) ZnS and Wall's (1999) B and Q tests, which are based on linkage disequilibrium between segregating sites. For the entire Est-6 region both tests are highly significant (ZnS = 0.154, P = 0.004; B = 0.270, P = 0; Q = 0.427, P = 0) with C = 0.015, equal to the Cmin based on the inferred minimum number of recombination events (Hudson and Kaplan, 1985). The tests are also significant for the coding and promoter regions separately, with C ≥ 0.020 (promoter region) and C ≥ 0.010 (coding region) for Kelly's test and C ≥ 0.010 for Wall's tests. The areas of significant values of Kelly's and Wall's statistics coincide with the peaks of linkage disequilibrium (data not shown) and are centered around the RsaI site (position 653, promoter region), three replacement sites in the beginning of exon I (positions 1212, 1254, and 1356), and around the F/S polymorphism (position 1955) (Table 1, Fig. 3). Thus the neutrality tests incorporating recombination (Kelly, 1997; Wall, 1999) are significant for the Est-6 region, with recombination rate below the laboratory estimate Clab = 0.0664.
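To make the sign convention of Tajima's D concrete, the sketch below implements the standard formula from Tajima (1989), which contrasts the mean number of pairwise differences with the number of segregating sites scaled by sample size. The function name and the example inputs are illustrative; the Est-6 values reported above were obtained with DnaSP and PROSEQ and are not recomputed here.

```python
from math import sqrt


def tajima_d(n, seg_sites, mean_pairwise_diff):
    """Tajima's D from sample size, segregating sites, and mean pairwise differences."""
    if seg_sites == 0:
        raise ValueError("D is undefined without segregating sites")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    numerator = mean_pairwise_diff - seg_sites / a1
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return numerator / sqrt(variance)


# Illustrative inputs (not Est-6 data): 10 sequences, 16 segregating sites,
# and a mean of 3.888889 pairwise differences.
d_example = tajima_d(10, 16, 3.888889)
print(d_example)
```

A negative value arises when the mean pairwise difference is smaller than expected from the number of segregating sites, that is, when many variants are rare, which is the pattern described for the S alleles and the RsaI− haplotypes.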

Divergence time
The Est-6 average distances (nucleotide differences) between D. simulans and D. melanogaster for the promoter and coding plus 3′-flanking region are 75.9 and 97.7, respectively. If we assume that the species diverged 2.3-2.5 million years ago, the times of divergence for the RsaI− promoter haplotypes and S coding-region haplotypes, under the assumption of the molecular clock, are about 50 and 94 thousand years, respectively. We have also calculated the expected time of divergence within and between the haplotype families for both regions following the method of Hudson et al. (1997), which infers the time of a putative selective sweep from the amount of variation that has accumulated within the relatively homogeneous subset of sequences. For the promoter region the expected divergence time for the RsaI− haplotypes is about 25 thousand years (excluding the three recombinant sites at positions 338, 368, and 433). For the coding region the expected divergence time within the S haplotypes is about 64 thousand years (without the four recombinant sites at positions 1254, 1985, 2032, and 2408). The time of divergence between the two main promoter allelic lineages (RsaI+ and RsaI−) is 50 thousand years with the Hudson et al. (1997) method, but more than seven times greater, 380 thousand years, under the molecular clock assumption. These discrepancies can be accounted for by natural selection that accelerates the evolution of the selectively favored alleles. Parallel differences occur for the coding region, but the differences between the estimates obtained by the two methods are smaller. Thus, the estimated age of the Est-6 S allelic lineage is 64 thousand years or 94 thousand years, and the divergence between the S-F lineages is 162 thousand or 485 thousand years, depending on the method used. This is consistent with the hypothesis that the divergence between the two coding-region allelic lineages has been impelled by natural selection, but with lesser strength than in the case of the promoter allelic lineages.
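The molecular-clock ages of the two homogeneous haplotype groups can be approximated by scaling the within-group number of differences by the interspecific distance and the species divergence time. The sketch below is our reconstruction rather than a formula given in the text: it assumes simple proportionality, takes the midpoint of the 2.3-2.5 million year range, and uses the within-group K values reported in the haplotype section (1.584 for RsaI− and 3.810 for S). The function name is hypothetical.

```python
def clock_age(within_group_diff, interspecific_diff, species_divergence_years):
    """Age of a haplotype group under a strict molecular clock (proportional scaling)."""
    return within_group_diff / interspecific_diff * species_divergence_years


# Assumption: midpoint of the 2.3-2.5 million year divergence range.
divergence_midpoint = (2.3e6 + 2.5e6) / 2

age_rsai_minus = clock_age(1.584, 75.9, divergence_midpoint)  # promoter region
age_s_lineage = clock_age(3.810, 97.7, divergence_midpoint)   # coding plus 3'-flanking

print(round(age_rsai_minus / 1000), round(age_s_lineage / 1000))
```

Under these assumptions the two ages come out close to the 50 and 94 thousand years quoted above. The between-lineage estimates are not reproduced by this simple scaling and presumably rest on additional details not given in the text.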

Pattern of variability
The pattern of Est-6 nucleotide variability is highly structured and has distinctive features in the promoter and coding regions (Table 1). The promoter pattern of variation does not carry over into the coding region, but the pattern of variability in the 3′-flanking region is similar to the coding region. The level of silent diversity is similar in the coding and 3′-flanking regions, but substantially lower in the promoter region (Table 2), which may reflect stronger purifying selection for the promoter region. Strong linkage disequilibrium occurs within each of the regions, but is less pronounced between them (Fig. 3).

[Table 3 footnotes (fragment): Karotam et al. (1995). b Sites that are polymorphic in both species are counted only once. For the two-tailed Fisher's exact test, P = 0.025.]
We previously suggested that the pattern of nucleotide variability of the Est-6 coding region is shaped by the influence of both directional and balancing selection (Balakirev et al., 1999). The present analysis of the Est-6 coding region confirms our previous hypothesis. The pattern of variability in the promoter region of the gene also suggests involvement of both forms of selection (see Section 3). It has been repeatedly shown that positive selection is involved in the evolution of sex-related genes in Drosophila, as well as other organisms (reviewed by Civetta and Singh, 1999). A situation in which both directional and balancing selection might be involved in shaping nucleotide variability has been proposed for the Acp29AB, also a sex-related gene of D. melanogaster (Aguadé, 1999). Game and Oakeshott (1990) revealed that the RsaI promoter polymorphism significantly influences EST-6 activity: males of lines lacking the RsaI site show 25% lower EST-6 activity than lines with the site. This reduction impacts the reproductive fitness of D. melanogaster (see Oakeshott et al., 1995). The pattern of association between EST-6 activity and reproductive fitness is complex, but leaves no doubt about the adaptive nature of the EST-6 activity variation. We have observed a pronounced area of highly significant linkage disequilibrium around the RsaI site (Fig. 3), which is manifest also by Kelly's and Wall's statistics (data not shown). Thus the RsaI site (position 653, Table 1) might be a target of selection in the promoter region. However this site is in linkage disequilibrium with 11 other polymorphic sites (Fig. 3), which makes it impossible to ascertain the precise site that impacts EST-6 activity. Conceivably, selection could act on any site that is in linkage disequilibrium with the RsaI site, or on the whole stretch of linked sites. 
Seeking to localize the putative selection site more precisely, we examined the pattern of polymorphism and divergence in the promoter region using a sliding-window analysis. There is a region with an apparent excess of polymorphism, which is associated with a remarkably low level of interspecific divergence (Fig. 2A). This region includes the stretch of polymorphic sites at positions 494, 518, and 544 (Table 1), which are in linkage disequilibrium with the RsaI site (position 653) (Fig. 3).
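A sliding-window comparison of polymorphism and divergence can be sketched as follows. This is only an illustration of the general idea, not the procedure used for Fig. 2A: the function name `poly_div_profile`, the window size, the step, the assumed coordinate range, and the fixed-difference positions are all hypothetical. Only the four polymorphic positions are taken from the text, and they are a subset of the sites in Table 1.

```python
def poly_div_profile(poly_sites, fixed_diffs, region_length, window=100, step=25):
    """Count polymorphic sites and fixed interspecific differences per window.

    Coordinates are 1-based; each window covers [start, start + window - 1].
    Returns a list of (start, end, n_polymorphic, n_fixed) tuples.
    """
    profile = []
    start = 1
    while start + window - 1 <= region_length:
        end = start + window - 1
        n_poly = sum(1 for p in poly_sites if start <= p <= end)
        n_fixed = sum(1 for p in fixed_diffs if start <= p <= end)
        profile.append((start, end, n_poly, n_fixed))
        start += step
    return profile


# Polymorphic positions named in the text (a subset of Table 1).
poly_sites = [494, 518, 544, 653]
# HYPOTHETICAL fixed differences between species, for illustration only.
fixed_diffs = [120, 310, 880, 1010]

for start, end, n_poly, n_fixed in poly_div_profile(
        poly_sites, fixed_diffs, region_length=1183):
    if n_poly > 0 and n_fixed == 0:
        # Windows with polymorphism but no divergence are candidates
        # for an excess of polymorphism relative to divergence.
        print(start, end, n_poly, n_fixed)
```

A real analysis would compare the observed counts with the values expected under neutrality rather than simply flagging windows, but the windowing logic is the same.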

Target of selection
A possible interpretation is that directional selection affects the RsaI− haplotypes, but without complete exclusion of the RsaI+ haplotypes. This can be explained by the fact that the RsaI+ mutation is associated with a significant increase of EST-6 activity, which has important consequences for reproductive fitness. From the analysis of the correlations between EST-6 activity and reproductive fitness, it can be expected that both RsaI+ and RsaI− strains may have some advantages under particular conditions. Thus, the RsaI−/RsaI+ polymorphism may be a balanced polymorphic system, which is indirectly confirmed by the peak of nucleotide variability in the corresponding promoter region detected in our data (Fig. 1A), as well as in the data of Odgers et al. (1995).
There are some lines of evidence indicating that the Est-6 coding-region variation is also subject to positive selection (Richmond et al., 1990; Balakirev et al., 1999). Cooke and Oakeshott (1989) proposed that one or both of the polymorphisms at positions 1955 and 1985 (our coordinates) are the primary targets for the selection underlying the F/S allozyme latitudinal clines. Analysis of DNA sequence variation in Est-6 reveals an excess of polymorphism surrounding the two replacement substitutions (Balakirev et al., 1999; Fig. 1B). The area of noticeable differences between observed and expected levels of variation (Fig. 2B) includes site 1955, which determines the F/S allozyme polymorphism (Table 1). There is an area with an elevated level of linkage disequilibrium in the coding region (Fig. 3), accompanied by highly significant values of Kelly's and Wall's statistics (data not shown). The evidence also suggests that the S allelic group might be subject to directional selection (see Section 3; see also Cooke and Oakeshott, 1989; Oakeshott et al., 1995; Balakirev et al., 1999).

Evolutionary scenario
A possible interpretation starts with a favorable mutation in the promoter region spread by directional selection; a second promoter haplotype would persist in populations because of balancing selection involving both haplotypes. Similar processes may have occurred in the coding region. If both favorable mutations are present in the same DNA segment including the promoter and coding region, then the corresponding haplotype would be under a double selective sweep, with a greater effect in increasing its frequency ('double sweep' haplotypes). In our data, strains S-5F, S-26F, S-94F, S-174F, US-255F, S-483F, S-498F, S-521F, S-565F, S-581F, S-968F, S-1224F, S-438S, and S-521S have the mutations that are putatively under selection in both the promoter and the coding region. The 'double sweep' haplotypes are frequent (14 out of 30) and have minimum variability along the whole length of the DNA segment studied. The average number of nucleotide differences (K) between them is low (4.47). Tajima's D is negative and significant for this group of haplotypes: D = −1.809, P < 0.05. The haplotype test (Hudson et al., 1994) is also significant (P < 0.05) for the 'double sweep' haplotypes, including singleton polymorphisms, with recombination C ≥ 0.04. The haplotypes that have just one mutation under directional selection ('single sweep' haplotypes), located either in the promoter (F-517S, F-1461S, F-274F, F-357F, F-517F, F-611F) or in the coding (S-114S, S-255S, S-501S, S-510S, S-549S, S-2588S, S-377F) region, have lower frequency, higher variability (K is equal to 10.80 and 10.19 for the promoter and coding region 'single sweep' haplotypes, respectively), and a less pronounced indication of a selective sweep. The haplotypes that have no mutation subject to putative selection ('zero sweep' haplotypes) are the most variable (K = 28.00) and the least frequent (three out of 30: F-96S, F-531F, F-775F), which may reflect their old age.
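For readers unfamiliar with the two statistics quoted above, the following minimal sketch computes the average number of pairwise nucleotide differences (K) and Tajima's D from a set of aligned sequences, using the standard formulas of Tajima (1989). The function names and the toy alignment are assumptions for illustration; the sketch assumes equal-length sequences without gaps and does not reproduce the values reported for the Est-6 haplotype groups, which require the original sequence data.

```python
from itertools import combinations
from math import sqrt


def mean_pairwise_differences(seqs):
    """Average number of nucleotide differences (K) over all sequence pairs."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return total / len(pairs)


def segregating_sites(seqs):
    """Number of alignment columns with more than one nucleotide (S)."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def tajimas_d(seqs):
    """Tajima's D; returns None when it is undefined (n < 4 or S == 0)."""
    n = len(seqs)
    s = segregating_sites(seqs)
    if n < 4 or s == 0:
        return None
    k = mean_pairwise_differences(seqs)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # D contrasts K with the diversity expected from S under neutrality.
    return (k - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# Toy alignment (hypothetical): every variant is a singleton, so D is negative.
toy = ["AAAA", "AAAT", "AATA", "ATAA"]
print(mean_pairwise_differences(toy), segregating_sites(toy), tajimas_d(toy))
```

An excess of low-frequency variants, as expected after a selective sweep, makes K small relative to S and drives D below zero, which is the pattern reported for the 'double sweep' haplotypes.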
According to this scenario, the pattern of nucleotide variation in Est-6 is shaped by the superposition of the effects of directional and balancing selection in the promoter region and by analogous superposition effects in the coding region. The differential hitchhiking of the mutations incorporated in the different haplotypes of the two regions complicates the observed pattern of polymorphism. It is, nevertheless, possible to recognize some general pattern in the network constructed for the combined promoter and coding region (exon I) (Fig. 4). The left part of the network includes the S haplotypes, whereas the right part contains the F haplotypes. The RsaI+ and RsaI− haplotypes occupy the upper and lower parts of the network, respectively (divided by the horizontal dotted line). The 'double sweep' haplotypes are in the RsaI−/S compartment (lower left). The 'single sweep' haplotypes are in the RsaI+/S (coding region sweep; upper left) and RsaI−/F (promoter region sweep; lower right) compartments. The 'zero sweep' haplotypes are in the RsaI+/F compartment (upper right).
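The assignment of strains to sweep classes and network compartments can be summarized in a small lookup structure. The strain lists, K values, and compartment labels below are transcribed from the text; the dictionary layout and the helper name `class_sizes` are assumptions made for illustration, and the compartment labels are written with ASCII signs ("RsaI-" for RsaI−, "RsaI+" for RsaI+).

```python
# Sweep classes as described in the text; the layout is illustrative only.
SWEEP_CLASSES = {
    "double sweep": {
        "compartment": "RsaI-/S",   # lower left of the network
        "K": 4.47,
        "strains": ["S-5F", "S-26F", "S-94F", "S-174F", "US-255F", "S-483F",
                    "S-498F", "S-521F", "S-565F", "S-581F", "S-968F",
                    "S-1224F", "S-438S", "S-521S"],
    },
    "single sweep (promoter)": {
        "compartment": "RsaI-/F",   # lower right
        "K": 10.80,
        "strains": ["F-517S", "F-1461S", "F-274F", "F-357F", "F-517F",
                    "F-611F"],
    },
    "single sweep (coding)": {
        "compartment": "RsaI+/S",   # upper left
        "K": 10.19,
        "strains": ["S-114S", "S-255S", "S-501S", "S-510S", "S-549S",
                    "S-2588S", "S-377F"],
    },
    "zero sweep": {
        "compartment": "RsaI+/F",   # upper right
        "K": 28.00,
        "strains": ["F-96S", "F-531F", "F-775F"],
    },
}


def class_sizes(classes):
    """Return the number of strains in each sweep class."""
    return {name: len(info["strains"]) for name, info in classes.items()}


sizes = class_sizes(SWEEP_CLASSES)
for name, info in SWEEP_CLASSES.items():
    print(f"{name:24s} {info['compartment']:8s} n={sizes[name]:2d} K={info['K']:.2f}")
print("total strains:", sum(sizes.values()))  # total strains: 30
```

Laid out this way, the inverse relationship between class frequency and within-class variability (K) is easy to see: the most frequent class has the lowest K and the rarest class the highest.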
Note that the concentration of F haplotypes in the RsaI−/F compartment is maintained by selection in the promoter region, whereas the concentration of the RsaI+ haplotypes (RsaI+/S compartment) is maintained by selection in the coding region (Fig. 4, Table 1). This observation contradicts the conclusion of Odgers et al. (1995), who reject the possibility that previously reported latitudinal clines in the Est-6 allozyme frequencies are due to their hitchhiking along with selection on the promoter difference. But allozyme latitudinal clines could be generated by the interaction of selective processes in the promoter and coding regions and by the rate of recombination between them, because the S haplotypes are under a double sweep (RsaI−/S compartment) and a single sweep (RsaI+/S compartment) influence, whereas the F haplotypes are only under a single sweep (RsaI−/F compartment) influence.
Fig. 4. One-step network of the Est-6 sequences including the 5′-flanking region and exon I. To generate the network, the sequence data are preprocessed into binary characters. The haplotypes included within circles are observed haplotypes; small solid circles represent hypothetical intermediate haplotypes. Identical haplotypes are enclosed together. Solid connections between haplotypes indicate transitions that are unambiguous, whereas the thin dashed lines indicate hypothetical connections. Numbers along the lines represent mutational steps. There are four compartments, designated as RsaI+/S, RsaI+/F, RsaI−/S, and RsaI−/F, divided by the horizontal and vertical dotted lines.
There are, nevertheless, non-selective hypotheses that could account for the patterns of Est-6 polymorphism. In particular, population dynamics involving bottlenecks and founder effects and/or population admixture could be the explanatory processes. One way to distinguish between the selective and the demographic hypotheses may be to compare our results with those obtained for another locus (or other loci) investigated in the same lines of D. melanogaster, as well as in other populations. Whether or not such information will change the selective hypotheses favored by the current data remains to be seen.