Enzyme variability in the Drosophila willistoni group. VI. Levels of polymorphism and the physiological function of enzymes

We have studied allelic variation at 28 loci coding for soluble enzymes in three sibling species of Drosophila. The average proportion of polymorphic loci per population is 58% in D. willistoni, 71% in D. equinoxialis, and 60% in D. tropicalis. An individual of any one of these three species is heterozygous at 18–22% of its genes. The correlation between the amount of polymorphism at a given locus and the equilibrium constant of the reaction catalyzed by the corresponding enzyme is not significantly different from zero. We have separated enzymes into those involved in glucose metabolism (group I) and all other enzymes (group II). The genes coding for group II enzymes are about twice as polymorphic as those coding for group I enzymes.


INTRODUCTION
Recent studies have confirmed that natural populations of animals and plants have large amounts of genetic variation (Harris, 1970; Selander et al., 1970; Marshall and Allard, 1970; Kojima et al., 1970; Ayala et al., 1972; and references therein). These studies are for the most part concerned with the genetic variation underlying variable enzyme patterns exhibited by gel electrophoresis. It has now become possible, at least to a first approximation, to quantify the amount of genetic variation in a population and to measure the amount of genetic differentiation between different populations and between different species.
This investigation was supported by NSF grants GB 20694 and GB 30895.
1 Department of Genetics, University of California, Davis, California.
Only a relatively limited number of species and only a few dozen loci in any one of them have been studied so far. Yet certain patterns in the amount and kind of variation have started to emerge (Ayala, 1972). One pattern is that certain enzymes and the genes which code for them are generally more variable than others. Are there any general physiological reasons why some loci, in a given group of organisms, are more variable than others? Some authors have suggested that such reasons exist.
We have conducted an extensive survey of enzyme variation in several neotropical species of Drosophila. We summarize here our data about the genetic variation found in three sibling species, Drosophila willistoni, D. equinoxialis, and D. tropicalis. From 25 to 28 loci have been studied in each of these species. From several hundred to several thousand wild genomes have been studied at each locus. We ask whether the amount of genetic variation depends on the chemical properties and physiological role of the enzymes, as has been recently suggested (Gillespie and Kojima, 1968; Kojima et al., 1970; Johnson, 1971).

MATERIALS AND METHODS
The geographic distribution of D. willistoni, D. equinoxialis, and D. tropicalis extends through Central America, the Caribbean, and tropical and subtropical South America (Spassky et al., 1971). We have obtained samples from more than 40 natural populations of each species. The localities sampled are spread throughout most of the distribution of the species (Fig. 1). From a few localities, only one or a few laboratory strains were available for study. From most localities, we have secured freshly collected samples and studied either the flies collected in nature or one F1 progeny from each wild female.
Our techniques for starch gel electrophoresis and assay of enzymes have been described elsewhere. They are standard techniques with minor modifications to suit our materials.

RESULTS
The geographic distribution of D. willistoni extends from Mexico and southern Florida through Central America, the Caribbean, and most of South America down to La Plata in Argentina. We have studied 81 samples from natural populations spread throughout that immense territory (Fig. 1). The results are summarized in Table I. For each locus, the table shows the number of genomes sampled, two for each wild individual, except for sex-linked loci, which are carried by males in single dose; the number of populations sampled; the proportion of polymorphic populations, that is, populations in which the most common allele has a frequency no greater than 0.95; the mean frequency of heterozygous individuals estimated from the allelic frequencies assuming Hardy-Weinberg equilibrium; the actual frequency of the most common allele in the total sample for the species; and the number of alleles with a frequency not smaller than 0.01 over the whole species.
The mean frequency of heterozygous individuals is the unweighted average of their mean frequency in each sample. Only samples with 20 or more genomes have been used to estimate the average, except at the Ald, G3pdh, and Odh-2 loci, where all samples with ten or more genomes have been used. We have used the expected frequencies of heterozygotes, assuming Hardy-Weinberg equilibrium. The agreement between the expected and the observed frequency of heterozygotes is generally good.
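The calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the sample values are hypothetical and are not taken from Table I. The expected frequency of heterozygotes under Hardy-Weinberg equilibrium is taken as one minus the sum of the squared allelic frequencies, and the per-locus figure is the unweighted mean over the samples that meet the minimum size.

```python
def expected_heterozygosity(allele_freqs):
    """Expected heterozygote frequency under Hardy-Weinberg: 1 - sum(p_i^2)."""
    total = sum(allele_freqs)
    if abs(total - 1.0) > 1e-6:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in allele_freqs)


def mean_heterozygosity(samples, min_genomes=20):
    """Unweighted mean of the per-sample expected heterozygosity.

    `samples` is a list of (n_genomes, allele_freqs) pairs. Samples with
    fewer than `min_genomes` genomes are left out of the average.
    """
    values = [
        expected_heterozygosity(freqs)
        for n_genomes, freqs in samples
        if n_genomes >= min_genomes
    ]
    if not values:
        raise ValueError("no sample meets the minimum size")
    return sum(values) / len(values)


# Hypothetical samples at one locus: (genomes sampled, allelic frequencies).
samples = [
    (120, [0.90, 0.10]),  # H = 1 - (0.81 + 0.01) = 0.18
    (40, [0.50, 0.50]),   # H = 1 - (0.25 + 0.25) = 0.50
    (12, [1.00]),         # skipped: fewer than 20 genomes
]
print(round(mean_heterozygosity(samples), 4))
```

Passing `min_genomes=10` would correspond to the relaxed threshold used for the Ald, G3pdh, and Odh-2 loci.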
To estimate the frequency of populations in which a given locus is polymorphic, we have also used only those samples with at least 20 genomes, except for the three loci mentioned in the previous paragraph.
We have studied in D. willistoni 28 loci coding for a variety of enzymes. On the average, we have sampled 3940 genomes from 47 different populations per locus. There is a great wealth of genetic variation. More than half of the loci, 58.1%, are polymorphic in a given population. Using a less stringent criterion of polymorphism, namely, that a locus is polymorphic when the second most common allele has a frequency of 0.01 or higher, the proportion of polymorphic loci per population is 86%. The mean over all loci of the frequency of heterozygous individuals per locus estimates the proportion of loci at which an individual is heterozygous. On the average, a D. willistoni fly is heterozygous at 17.8 ± 3.0% of its genes.
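The two criteria of polymorphism and the average over loci can be written out as follows. This is a minimal sketch: the function names and the numbers are hypothetical, and it assumes that the figure following the ± sign is the standard error of the mean over loci, which the text does not state explicitly.

```python
import math


def is_polymorphic_095(allele_freqs):
    """'0.95' criterion: the most common allele has frequency <= 0.95."""
    return max(allele_freqs) <= 0.95


def is_polymorphic_001(allele_freqs):
    """'0.01' criterion: the second most common allele has frequency >= 0.01."""
    ordered = sorted(allele_freqs, reverse=True)
    return len(ordered) > 1 and ordered[1] >= 0.01


def mean_and_standard_error(values):
    """Mean of `values` and the standard error of that mean."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(variance / n)


# Hypothetical allelic frequencies at three loci in one population.
loci = [[0.60, 0.40], [0.97, 0.03], [0.995, 0.005]]
print([is_polymorphic_095(f) for f in loci])
print([is_polymorphic_001(f) for f in loci])

# Hypothetical per-locus heterozygosities, averaged over loci.
per_locus_h = [0.0, 0.1, 0.2, 0.3, 0.4]
mean_h, se_h = mean_and_standard_error(per_locus_h)
print(f"{100 * mean_h:.1f} +/- {100 * se_h:.1f}% of loci")
```

A locus such as the second one above counts as polymorphic by the "0.01" criterion but not by the "0.95" criterion, which is why the less stringent criterion always yields the larger proportion.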
The distribution of D. equinoxialis extends from southern Mexico through Central America, the Caribbean, and northern South America to central Ecuador and central Brazil. Unpublished data obtained in our laboratory indicate that the species consists of two main subspecies, one in most of Central America and the Caribbean, the other in South America. Crosses between the two subspecies give sterile male hybrids. Table II summarizes the genetic variation found in populations of the South American subspecies. A total of 41 samples of natural populations have been studied (Fig. 1). The proportions of polymorphic populations and of heterozygous individuals are calculated using only samples of at least 20 genomes, except for the Ald and Acph-2 loci, where all samples of ten or more genomes were taken into account.
We have studied 27 loci in D. equinoxialis. On the average, 1039 genomes and 18 populations have been sampled at each locus. On the average, 71.1% of the loci are polymorphic in a given population, or 88% if the less stringent criterion of polymorphism is used. An individual is heterozygous at 21.8 ± 3.0% of its loci. This figure is not significantly different from that obtained for D. willistoni. As in D. willistoni, populations of D. equinoxialis contain large amounts of genetic variation.
The geographic distribution of D. tropicalis is coextensive with that of D. equinoxialis throughout Mexico, Central America, and the Caribbean. In South America, D. tropicalis extends farther to the south of Brazil than D. equinoxialis. Table III summarizes our study of D. tropicalis. Forty-five samples of natural populations have been surveyed (Fig. 1). As before, the proportions of polymorphic populations and of heterozygous individuals are estimated from samples containing at least 20 genomes, except for Ald and G3pdh, where samples with ten or more genomes have been used. An average of 1248 genomes and 20 populations have been sampled for each of 25 loci of D. tropicalis. The amount of genetic variation is approximately the same as in the other two species; 59.6% of the loci are polymorphic in a given population by the "0.95" criterion, or 88% by the "0.01" criterion. An individual is heterozygous at 19.2 ± 3.1% of its loci, which is not significantly different from the heterozygosity found either in D. willistoni or in D. equinoxialis.
Not only the total variation but also the amount of variation at any given locus is generally quite similar in the three sibling species. The correlation in the proportion of heterozygous individuals per locus is 0.65 between D. willistoni and D. equinoxialis, 0.84 between D. willistoni and D. tropicalis, and 0.71 between D. equinoxialis and D. tropicalis. The three correlation coefficients are significantly greater than zero (P < 0.001). Similarly, positive correlations between species exist in the proportion of polymorphic populations per locus, in the frequency of the most common allele, and in the number of alleles.
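For readers who wish to reproduce this kind of comparison, the sketch below computes a product-moment correlation across loci and the usual t statistic for testing it against zero. It is an illustration only: the function names are hypothetical, and the number of loci shared by a pair of species (taken here as 27) is an assumption, since the text does not give it.

```python
import math


def pearson_r(xs, ys):
    """Product-moment correlation between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 3:
        raise ValueError("need two sequences of equal length >= 3")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n_loci):
    """t = r * sqrt((n - 2) / (1 - r^2)), with n - 2 degrees of freedom."""
    return r * math.sqrt((n_loci - 2) / (1.0 - r * r))


# Hypothetical per-locus heterozygosities in two species (same loci, same order).
species_a = [0.05, 0.10, 0.20, 0.40, 0.55]
species_b = [0.02, 0.15, 0.18, 0.35, 0.60]
print(round(pearson_r(species_a, species_b), 3))

# Reported r = 0.65; n_loci = 27 is an ASSUMED number of shared loci.
print(round(t_statistic(0.65, 27), 2))
```

The resulting t value is then compared with the tabulated critical value for n - 2 degrees of freedom at the chosen significance level.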
What is the significance of the correlation between species in the amount of variation at any given locus? Gillespie and Kojima (1968) and Kojima et al. (1970) have suggested that enzymes involved in the metabolism of glucose are likely to be less variable than other enzymes. Their rationale is that energy metabolism is essential for the survival of the organism. The substrates of most enzymes involved in glucose metabolism are usually restricted to a single species of molecule; variation in the substrates of such enzymes is mainly quantitative. On the other hand, the majority of all other enzymes studied in Drosophila as well as in other animals have broad substrate specificities. These substrates often originate outside the organism and thus vary qualitatively as well as quantitatively. Variation in these enzymes may reflect the heterogeneity of the environment.
In Table IV, we compare the amount of genetic variation between glucose-metabolizing enzymes and all other enzymes. We compare the same three parameters of genetic variation used by Kojima et al. (1970): proportion of loci which are polymorphic in a given population, proportion of heterozygous loci per individual, and average number of alleles per locus. In all three measures, loci coding for glucose-metabolizing enzymes are less variable than loci coding for other enzymes. The most precise measure of genetic variation is the proportion of heterozygous loci. On the average, in the three sibling species, loci not involved in glucose metabolism are somewhat more than twice as variable as those coding for glucose-metabolizing enzymes.
Johnson (1971) has argued that enzymes which catalyze irreversible and rate-limiting reactions along a pathway affect production of the end result of the pathway more severely than enzymes with equilibrium constants not very different from 1. Changes in the activity of the former type of enzymes will have larger effects on the pathway than changes in the latter, since the effect of the latter will be largely diluted by mass action. From these considerations, Johnson concludes that natural selection is more likely to promote stable polymorphisms among enzymes catalyzing physiologically irreversible reactions than among enzymes whose reactions are freely reversible or among enzymes with equilibrium favoring the reverse of the physiologically significant direction. According to his hypothesis, the regression of the amount of polymorphism at a given locus on the equilibrium constant of the enzyme should be positive.
In Table V, we give the coefficients of the linear regression of various measures of polymorphism on the logarithm of the equilibrium constant, Keq, of the enzymes. The following equilibrium constants are taken from Johnson (1971): alcohol dehydrogenase, 1 × 10¹¹; α-glycerophosphate dehydrogenase, 1 × 10⁴; hexokinase, 155; isocitrate dehydrogenase, 20; malic enzyme, 20; phosphoglucomutase, 17; aldolase, 2 × 10⁻⁴; malate dehydrogenase, 7 × 10⁻¹³. The following enzymes were not considered by Johnson; the equilibrium constants used are triosephosphate isomerase, 20 (Oesper and Myerhof, 1950); adenylate kinase, 1 (Bowen and Kerwin, 1956); glyceraldehyde 3-phosphate dehydrogenase, 8 × 10⁻¹ (Walsh and Sallach, 1965). Besides the proportion of heterozygous individuals, we have used two measures of polymorphism used by Johnson (1971) and a third one suggested by him (personal communication). The "index of polymorphism" is estimated by 1 - P, where P is simply the average frequency of the most common allele, weighted for sample size (Johnson, 1971). Het./Het. max. is the ratio of the observed heterozygosity to the heterozygosity that would occur if all alleles with a frequency no less than 0.01 occurred at equal frequencies (with genotypic frequencies satisfying Hardy-Weinberg expectation).
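Two of the measures just defined can be illustrated with a short sketch. The function names and sample values are hypothetical. The sketch assumes that, with k alleles at equal frequencies under Hardy-Weinberg expectation, the maximum heterozygosity is 1 - 1/k, and it sets the ratio to zero for a locus with fewer than two qualifying alleles, a convention the text does not specify.

```python
def index_of_polymorphism(samples):
    """1 - P, with P the sample-size-weighted mean frequency of the
    most common allele.

    `samples` is a list of (n_genomes, freq_of_most_common_allele) pairs.
    """
    total_genomes = sum(n for n, _ in samples)
    if total_genomes == 0:
        raise ValueError("no genomes sampled")
    p_bar = sum(n * p for n, p in samples) / total_genomes
    return 1.0 - p_bar


def het_over_het_max(heterozygosity, n_alleles):
    """Ratio of heterozygosity to its maximum for `n_alleles` equally
    frequent alleles (maximum = 1 - 1/n_alleles)."""
    if n_alleles < 2:
        return 0.0  # ASSUMED convention for a monomorphic locus
    het_max = 1.0 - 1.0 / n_alleles
    return heterozygosity / het_max


# Hypothetical locus: two samples, then three alleles with frequency >= 0.01.
print(round(index_of_polymorphism([(100, 0.90), (300, 0.70)]), 4))
print(round(het_over_het_max(0.30, 3), 4))
```

Because Het./Het. max. is scaled by the number of alleles, two loci with the same heterozygosity can differ in this ratio when one of them carries additional rare alleles.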
It is clear that our data do not support Johnson's hypothesis. Each of the four measures of genetic polymorphism for each of three species of Drosophila and for the three species combined gives coefficients of regression none of which is significantly different from zero.
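The form of the test can be sketched as follows. The equilibrium constants are those quoted above; the heterozygosity values, by contrast, are invented placeholders that only make the example runnable and must not be read as the values of Tables I to III. The function name is hypothetical. The slope of the least-squares line of polymorphism on log10 Keq is divided by its standard error, and the resulting t statistic (n - 2 degrees of freedom) is compared with the tabulated critical value.

```python
import math

# Equilibrium constants as quoted in the text.
KEQ = {
    "alcohol dehydrogenase": 1e11,
    "alpha-glycerophosphate dehydrogenase": 1e4,
    "hexokinase": 155.0,
    "isocitrate dehydrogenase": 20.0,
    "malic enzyme": 20.0,
    "phosphoglucomutase": 17.0,
    "aldolase": 2e-4,
    "malate dehydrogenase": 7e-13,
    "triosephosphate isomerase": 20.0,
    "adenylate kinase": 1.0,
    "glyceraldehyde 3-phosphate dehydrogenase": 0.8,
}

# PLACEHOLDER heterozygosities, invented for illustration only.
PLACEHOLDER_HET = {
    "alcohol dehydrogenase": 0.10,
    "alpha-glycerophosphate dehydrogenase": 0.03,
    "hexokinase": 0.15,
    "isocitrate dehydrogenase": 0.08,
    "malic enzyme": 0.05,
    "phosphoglucomutase": 0.20,
    "aldolase": 0.12,
    "malate dehydrogenase": 0.04,
    "triosephosphate isomerase": 0.06,
    "adenylate kinase": 0.25,
    "glyceraldehyde 3-phosphate dehydrogenase": 0.02,
}


def linear_regression(xs, ys):
    """Least-squares slope, intercept, and standard error of the slope."""
    n = len(xs)
    if n < 3 or n != len(ys):
        raise ValueError("need two sequences of equal length >= 3")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residual_ss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    se_slope = math.sqrt(residual_ss / (n - 2) / sxx)
    return slope, intercept, se_slope


names = sorted(KEQ)
log_keq = [math.log10(KEQ[name]) for name in names]
het = [PLACEHOLDER_HET[name] for name in names]
slope, intercept, se_slope = linear_regression(log_keq, het)
print(f"slope = {slope:.5f}, SE = {se_slope:.5f}, t = {slope / se_slope:.2f}")
```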

DISCUSSION
Studies of enzyme variation in Drosophila flies, house and field mice, man, and other animals and plants have shown that natural populations of these organisms contain a great deal of genetic variation. Much of this variation appears to be maintained by balancing natural selection (see, for instance, Prakash et al., 1969; Ayala et al., 1971, 1972; Allard and Kahler, 1972).
What forms of balancing selection are involved in each particular case largely remains to be ascertained. The occurrence of heterosis has been demonstrated for two different loci in Drosophila (Richmond and Powell, 1970; Wills and Nichols, 1971). That diversifying selection plays a major role has recently been demonstrated by Powell (1972), who found that levels of polymorphism are directly related to environmental heterogeneity. Kojima and his collaborators have shown that genotypic fitnesses are sometimes frequency dependent, a mechanism which underlies adaptation to environmental heterogeneity (Kojima and Yarbrough, 1967; Kojima and Tobari, 1969; Huang et al., 1971).
The number of loci studied in any particular organism and the number of organisms studied are still fairly limited. Yet several patterns are starting to emerge (Ayala, 1972): (1) the amount of polymorphism varies considerably from locus to locus; (2) at a given locus, the amount and pattern of the variation remain fairly constant from population to population throughout the whole distribution of a given species; (3) among closely related species, the amount of variation at a given locus is fairly constant from species to species. These generalizations suggest that the amount of variation at a given locus may be an attribute of the physiological role of the enzyme. Can we predict at least to a first approximation the degree of variation at a locus from knowledge of the physiological function or the biochemical properties of the enzyme? Johnson (1971) has suggested that the amount of polymorphism may be a direct, positive function of the equilibrium constant, Keq, of the enzyme in its physiological milieu. He analyzed data from several Drosophila species and found a positive linear regression of the frequency of the second most common allele on the logarithm of Keq. Similarly, there seemed to be a positive linear regression of the logarithm of his "index of polymorphism" (1 - P, see Results) on the logarithm of Keq. It should be noted, however, that Johnson uses Keq values which do not obtain at the physiologically significant pH, which must be around 7. For alcohol dehydrogenase, he has taken Keq = 10¹¹ and for malate dehydrogenase Keq = 7 × 10⁻¹³. These values obtain when the molarity of hydrogen ions, H+, is 1 (pH 0). At pH 7, the equilibrium constants are, approximately, 10⁴ and 10⁻⁵ for alcohol dehydrogenase and malate dehydrogenase, respectively (see Johnson, 1960).
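The pH correction can be checked with a line of arithmetic. The sketch below takes the constants listed in Results (1 × 10¹¹ and 7 × 10⁻¹³) and assumes that each reaction involves exactly one proton, written on the substrate side for alcohol dehydrogenase and on the product side for malate dehydrogenase. These reaction directions are an assumption made for illustration; they are chosen because they reproduce the approximate pH 7 values quoted in the text. The function and parameter names are hypothetical.

```python
def apparent_keq(keq_at_ph0, ph, net_protons_consumed):
    """Apparent equilibrium constant at a fixed pH.

    K' = K * [H+]**n, where K is defined with [H+] = 1 M (pH 0) and n is
    the number of protons on the substrate side minus the number on the
    product side (ASSUMED stoichiometry, see the lead-in).
    """
    hydrogen_ion_molarity = 10.0 ** (-ph)
    return keq_at_ph0 * hydrogen_ion_molarity ** net_protons_consumed


# Alcohol dehydrogenase: one proton assumed on the substrate side (n = +1).
adh_ph7 = apparent_keq(1e11, 7.0, +1)
# Malate dehydrogenase: one proton assumed on the product side (n = -1).
mdh_ph7 = apparent_keq(7e-13, 7.0, -1)

print(f"ADH at pH 7: {adh_ph7:.1e}")
print(f"MDH at pH 7: {mdh_ph7:.1e}")
```

Under these assumptions the two constants move toward each other by seven orders of magnitude each, so the range of log Keq spanned by the regression shrinks considerably once the physiological pH is used.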
Our data do not support Johnson's hypothesis. We have obtained extensive data on the natural polymorphisms of three closely related species of Drosophila. Using four different measures of genetic polymorphisms, we have attempted to correlate the amount of polymorphism with the logarithm of the equilibrium constant of the enzyme. In every case the regression of amount of polymorphism on log Keq was not significantly different from zero.
Lack of linear correlation between amount of polymorphism and equilibrium constant is not surprising. Johnson argues that "enzymes which exert acute control over flow through the pathway," that is, enzymes catalyzing physiologically irreversible reactions, should be more polymorphic. If this rationale is valid, the amount of polymorphism should be high in enzymes both with very high and with very low Keq. Both catalyze irreversible reactions, although the direction of the reaction is opposite. Yet he attempted a straight linear regression for enzymes whose Keq ranged from 7 × 10⁻¹³ to 1 × 10¹¹. A V-shaped function, with the vertex around Keq = 1, would have been more appropriate. The validity of his rationale may be questioned on other grounds as well. Even if natural selection acts strongly on enzymes catalyzing irreversible reactions, because of their crucial role in a pathway, it is not clear why the mode of selection should in such cases be balancing selection. It might be argued, at least with the same a priori probability, that we should expect directional selection to act on those enzymes. Because of their crucial role, natural selection might be expected to favor that form of the enzyme which optimizes for the organism the amount of flow in the reaction. Furthermore, controlling enzymes in a pathway often have allosteric sites as well as active sites of catalysis. Thus to maintain a functioning enzyme, at least two sites must be protected from radical change.
Kojima et al. (1970) studied enzyme variation in two natural populations of each of two species, D. melanogaster and D. ananassae, and in one natural population of each of three species, D. simulans, D. affinis, and D. athabasca. Six to 11 enzymes involved in glucose metabolism (group I) and four to eight other enzymes (group II) were studied in each population. On the average, somewhat more than 100 flies were studied from each population at each locus. They found that enzymes from group II were on the average more polymorphic than those from group I. Three measures of polymorphism were used. The ratio of the amount of polymorphism in group II to that in group I was 3.70, 4.95, and 3.27 for the number of polymorphic populations, the frequency of heterozygotes, and the number of alleles per locus, respectively.
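Returning briefly to the V-shaped function proposed earlier in this Discussion as the appropriate form for Johnson's rationale: one simple way to express it, offered here only as a sketch, is to regress the amount of polymorphism on the absolute value of log10 Keq, so that both very high and very low constants count as far from the vertex at Keq = 1. The data below are synthetic, constructed to be exactly V-shaped, and the function name is hypothetical.

```python
import math


def least_squares_slope(xs, ys):
    """Slope of the least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# SYNTHETIC data: polymorphism rises with distance of log10 Keq from zero.
keq_values = [1e-10, 1e-5, 1.0, 1e5, 1e10]
log_keq = [math.log10(k) for k in keq_values]
polymorphism = [0.1 + 0.02 * abs(x) for x in log_keq]

# Straight regression on log10 Keq: the two arms of the V cancel out.
straight_slope = least_squares_slope(log_keq, polymorphism)

# V-shaped regression: use |log10 Keq| as the predictor instead.
v_slope = least_squares_slope([abs(x) for x in log_keq], polymorphism)

print(f"straight slope = {straight_slope:.4f}")
print(f"V-shaped slope = {v_slope:.4f}")
```

With data of this shape a straight regression reports essentially no relationship even though polymorphism depends strongly on the irreversibility of the reaction, which is the reason a V-shaped function is the more appropriate test of the rationale.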
Our data confirm the observation of Kojima et al. (1970) that glucose-metabolizing enzymes are, on the average, less polymorphic than other enzymes. Richmond (1972) has made an extensive study of genetic variation in D. paulistorum, another sibling species of the D. willistoni group. He has also found less variation in glucose-metabolizing enzymes than in other enzymes. In our study, however, the amount of polymorphism is only about twice as large in enzymes from group II as in those from group I. (It should be pointed out that in the number of alleles per locus we have included all alleles with frequency ≥0.01, while Kojima et al. included only those with frequencies ≥0.05.) In our data, differences between the two groups of enzymes in the amount of polymorphism are similar to those found in the house mouse by Selander and Yang (1969) (see also Kojima et al., 1970). Gillespie and Kojima (1968) were the first to propose the hypothesis that enzymes involved in glucose metabolism should be less variable than other enzymes. This interesting hypothesis needs further examination and testing. If the rationale is correct, one should be able to find that, in general, enzymes involved in biochemical pathways essential to the organism and which act on a single species of substrate should be less variable than those involved in peripheral pathways and having broad specificities. A possible test for the hypothesis could be made by examining enzyme variation in plants. In green plants, enzymes involved in photosynthesis play an essential role analogous to that of the glucose-metabolizing enzymes in animals.