Understanding the genetic basis of speciation is one of the primary goals of evolutionary biology. However, despite recent and exciting progress in speciation genetics, particularly on postzygotic isolating mechanisms, surprisingly little is known about the genetic basis and evolutionary forces that are important early in speciation. In particular, the molecular mechanisms underlying the evolution of prezygotic isolating barriers remain poorly understood. While studies on the genetics of postzygotic isolating barriers are critical to our understanding of how species boundaries are maintained after speciation, such factors may not have been involved in driving the actual speciation event and may instead have evolved secondarily. By studying recently diverged populations, we increase the chances that the genomic differences we detect are directly responsible for driving reproductive isolation, and thus speciation. The widespread availability of whole-genome sequencing opens up the opportunity to examine speciation at the genomic level in non-model species that may be better suited to the study of early speciation.
This research takes advantage of whole-genome sequencing and a very young semispecies system, Drosophila athabasca. The D. athabasca species complex, which is composed of three overlapping semispecies (Western-Northern, Eastern-A, and Eastern-B), provides a unique system in which to study incipient speciation using population genomics. The three semispecies of D. athabasca are estimated to have diverged less than 25,000 years ago and are morphologically indistinguishable. Individuals will hybridize in the laboratory, but their geographic ranges and distinct courtship songs, which result in prezygotic isolation, differentiate the populations sufficiently for them to be designated as semispecies. The very young divergence time and unique population structure within D. athabasca make it an ideal system in which to study the genetics of prezygotic isolation and incipient speciation.
I first generated a de novo reference genome assembly for D. athabasca by sequencing the genome at 30X coverage using Illumina next-generation sequencing, and I annotated this reference genome using a combination of de novo, comparative, and mRNA-seq gene-finding methods. To examine the genome of D. athabasca at the population level, I established 404 iso-female lines of D. athabasca collected from across the species range, including the previously identified ranges of all three semispecies. I characterized courtship songs from a subset of these lines and sequenced the genomes of 28 individuals, distributed roughly evenly across the geographic range and among semispecies, each at 10X coverage.
Using these genome-wide population data, I quantified levels of diversity and differentiation within and between semispecies. Despite relatively low levels of divergence within the complex, principal component and phylogenetic analyses of the genomic data clearly separate individuals into distinct genetic groups corresponding to the three behaviorally defined semispecies. Furthermore, phylogenetic analysis places Eastern-A and Eastern-B as sister taxa, confirming previous research indicating that Eastern-A and Eastern-B are the most closely related semispecies, with Western-Northern being the earliest diverging of the three. To infer the speciation history of the D. athabasca complex, I fit the data to demographic models and estimated divergence times under a model of isolation with low levels of migration. This model estimates a divergence time of only 6,000 years ago for the Eastern-A/Eastern-B split and 16,000 years ago for the Western-Northern/Eastern split, consistent with a previous hypothesis of population expansion and colonization of North America following the last glacial maximum.
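As an illustration of what quantifying diversity within and differentiation between semispecies can involve, the following minimal sketch computes two standard summary statistics from aligned sequences: nucleotide diversity (the mean number of pairwise differences per site within a group) and absolute divergence (the same quantity for pairs drawn from two different groups). The abstract does not state which statistics were used, so this choice is an assumption, and the function names and toy sequences are hypothetical rather than data from D. athabasca.

```python
from itertools import combinations, product


def pairwise_diffs(a, b):
    """Count the sites at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def nucleotide_diversity(seqs):
    """Mean pairwise differences per site within one group."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    total = sum(pairwise_diffs(a, b) for a, b in pairs)
    return total / (len(pairs) * length)


def absolute_divergence(seqs_a, seqs_b):
    """Mean pairwise differences per site between two groups."""
    pairs = list(product(seqs_a, seqs_b))
    if not pairs:
        raise ValueError("both groups must be non-empty")
    length = len(seqs_a[0])
    total = sum(pairwise_diffs(a, b) for a, b in pairs)
    return total / (len(pairs) * length)


# Hypothetical toy alignments (10 sites each), for illustration only.
group_a = ["AAAAAAAAAA", "AAAAAAAAAT"]
group_b = ["CAAAAAAAAA", "CAAAAAAAAA"]

pi_a = nucleotide_diversity(group_a)          # one difference over 10 sites
pi_b = nucleotide_diversity(group_b)          # identical sequences
d_ab = absolute_divergence(group_a, group_b)  # averaged over 4 cross-group pairs
```

Comparing the between-group value with the within-group values gives a rough sense of how much of the observed variation separates groups rather than segregating within them.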
Overall divergence within the complex is low, with approximately 2 million sites variable within D. athabasca and only 1% of these variable sites being private and fixed within semispecies. Furthermore, I find that divergence is not evenly distributed across the genome, with the X chromosome exhibiting increased levels of divergence compared with the autosomes. Most interestingly, despite the low levels of overall divergence, genome-wide scans identify a single large spike of differentiation between the two youngest semispecies, which have an estimated divergence time of only 6,000 years. Scans for selection also show strong signatures of a selective sweep within semispecies at this same locus, indicating that divergence in this region has likely been driven by selection. Further analysis reveals that this region harbors a gene previously shown to be involved in courtship song in other Drosophila species, suggesting that it may play an important role in the evolution of prezygotic reproductive barriers between the D. athabasca semispecies.
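A genome-wide scan for a localized spike of differentiation can be sketched as follows. This is a minimal illustration that assumes Hudson's FST estimator, combined across sites in non-overlapping windows as a ratio of summed numerators to summed denominators. The abstract does not name the differentiation statistic or window scheme actually used, so both are assumptions, and the function names, window size, and allele counts are hypothetical.

```python
def hudson_terms(count1, n1, count2, n2):
    """Numerator and denominator of Hudson's FST for one biallelic site.

    count1 and count2 are counts of one allele among n1 and n2 sampled
    chromosomes in the two populations (n1 and n2 must exceed 1).
    """
    p1 = count1 / n1
    p2 = count2 / n2
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    den = p1 * (1 - p2) + p2 * (1 - p1)
    return num, den


def windowed_fst(sites, window_size):
    """FST per non-overlapping window, as a ratio of summed terms.

    sites is an iterable of (position, count1, n1, count2, n2) tuples.
    Returns a list of (window_start, fst) sorted by window start; windows
    whose summed denominator is zero are skipped.
    """
    sums = {}
    for pos, c1, n1, c2, n2 in sites:
        start = (pos // window_size) * window_size
        num, den = hudson_terms(c1, n1, c2, n2)
        totals = sums.setdefault(start, [0.0, 0.0])
        totals[0] += num
        totals[1] += den
    return [(start, num / den)
            for start, (num, den) in sorted(sums.items())
            if den > 0]


# Hypothetical allele counts: one undifferentiated site and one site
# fixed for different alleles in the two populations.
toy_sites = [
    (100, 5, 10, 5, 10),    # same frequency in both populations
    (1500, 10, 10, 0, 10),  # fixed difference
]

scan = windowed_fst(toy_sites, window_size=1000)
peak = max(scan, key=lambda window: window[1])  # most differentiated window
```

Summing numerators and denominators before dividing, rather than averaging per-site ratios, keeps low-frequency sites from dominating the window estimate. Note that the sample-size correction can make the estimate slightly negative in undifferentiated windows.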
This study provides one of the first genome-wide population genetic investigations of the molecular changes and population parameters important during incipient speciation, contributing important new information to our view of the genetics of speciation. With some of the youngest divergence times yet used to study speciation genetics at a genome-wide level, examining the patterns of divergence within and between the semispecies of D. athabasca has allowed me to identify a candidate gene that may play a role in the evolution of prezygotic isolation, and thus reproductive isolation, at the earliest stages of speciation.