Lawrence Berkeley National Laboratory
Widespread Discordance of Gene Trees with Species Tree in Drosophila: Evidence for
Incomplete Lineage Sorting
- Author(s): Pollard, Daniel A.
- Iyer, Venky N.
- Moses, Alan M.
- Eisen, Michael B.
- et al.
The phylogenetic relationship of the now fully sequenced species Drosophila erecta and D. yakuba with respect to the D. melanogaster species complex has been a subject of controversy. All three possible groupings of the species have been reported in the past, though recent multi-gene studies suggest that D. erecta and D. yakuba are sister species. Using the whole genomes of each of these species as well as the four other fully sequenced species in the subgenus Sophophora, we set out to investigate the placement of D. erecta and D. yakuba in the D. melanogaster species group and to understand the cause of the past incongruence. Though we find that the phylogeny grouping D. erecta and D. yakuba together is the best supported, we also find widespread incongruence in nucleotide and amino acid substitutions, insertions and deletions, and gene trees. The time inferred to span the two key speciation events is short enough that under the coalescent model, the incongruence could be the result of incomplete lineage sorting. Consistent with the lineage-sorting hypothesis, substitutions supporting the same tree were spatially clustered. Support for the different trees was found to be linked to recombination such that adjacent genes support the same tree most often in regions of low recombination and substitutions supporting the same tree are most enriched roughly on the same scale as linkage disequilibrium, also consistent with lineage sorting. The incongruence was found to be statistically significant and robust to model and species choice. No systematic biases were found. We conclude that phylogenetic incongruence in the D. melanogaster species complex is the result, at least in part, of incomplete lineage sorting. Incomplete lineage sorting will likely cause phylogenetic incongruence in many comparative genomics datasets. Methods to infer the correct species tree, the history of every base in the genome, and comparative methods that control for and/or utilize this information will be valuable advancements for the field of comparative genomics.