Selective sorting of ancestral introgression in maize and teosinte along an elevational cline

While often deleterious, hybridization can also be a key source of genetic variation and pre-adapted haplotypes, enabling rapid evolution and niche expansion. Here we evaluate these opposing selection forces on introgressed ancestry between maize (Zea mays ssp. mays) and its wild teosinte relative, mexicana (Zea mays ssp. mexicana). Introgression from ecologically diverse teosinte may have facilitated maize’s global range expansion, in particular to challenging high elevation regions (> 1500 m). We generated low-coverage genome sequencing data for 348 maize and mexicana individuals to evaluate patterns of introgression in 14 sympatric population pairs, spanning the elevational range of mexicana, a teosinte endemic to the mountains of Mexico. While recent hybrids are commonly observed in sympatric populations and mexicana demonstrates fine-scale local adaptation, we find that the majority of mexicana ancestry tracts introgressed into maize over 1000 generations ago. This mexicana ancestry seems to have maintained much of its diversity and likely came from a common ancestral source, rather than contemporary sympatric populations, resulting in relatively low FST between mexicana ancestry tracts sampled from geographically distant maize populations. Introgressed mexicana ancestry in maize is reduced in lower-recombination rate quintiles of the genome and around domestication genes, consistent with pervasive selection against introgression. However, we also find mexicana ancestry increases across the sampled elevational gradient and that high introgression peaks are most commonly shared among high-elevation maize populations, consistent with introgression from mexicana facilitating adaptation to the highland environment. In the other direction, we find patterns consistent with adaptive and clinal introgression of maize ancestry into sympatric mexicana at many loci across the genome, suggesting that maize also contributes to adaptation in mexicana, especially at the lower end of its elevational range. In sympatric maize, in addition to high introgression regions we find many genomic regions where selection for local adaptation maintains steep gradients in introgressed mexicana ancestry across elevation, including at least two inversions: the well-characterized 14 Mb Inv4m on chromosome 4 and a novel 3 Mb inversion Inv9f surrounding the macrohairless1 locus on chromosome 9. Most outlier loci with high mexicana introgression show no signals of sweeps or local sourcing from sympatric populations and so likely represent ancestral introgression sorted by selection, resulting in correlated but distinct outcomes of introgression in different contemporary maize populations.

maize also contributes to adaptation in mexicana, especially at the lower end of its elevational range. In sympatric maize, in addition to high introgression regions we find many genomic regions where selection for local adaptation maintains steep gradients in introgressed mexicana ancestry across elevation, including at least two inversions: the well-characterized 14 Mb Inv4m on chromosome 4 and a novel 3 Mb inversion Inv9f surrounding the macrohairless1 locus on chromosome 9. Most outlier loci with high mexicana introgression show no signals of sweeps or local sourcing from sympatric populations and so likely represent ancestral introgression sorted by selection, resulting in correlated but distinct outcomes of introgression in different contemporary maize populations.

Introduction 1
Interbreeding between partially diverged species or subspecies can result in admixed 2 individuals with low fitness, e.g. due to hybrid incompatibilities [1][2][3]. Consistent with 3 the view that hybridization is often deleterious, a growing number of species show 4 evidence of pervasive selection against introgressed ancestry [4][5][6][7][8][9][10][11][12][13]. At the same time, 5 introgression can be a source of novel genetic variation and efficiently introduce 6 haplotypes carrying sets of locally adapted alleles, with the potential for rapid 7 adaptation to new ecological challenges [14]. Indeed, admixture has been linked to 8 adaptive species radiations and/or rapid niche expansions in a number of natural 9 systems, including mosquitoes [15], Drosophila [16], butterflies [9], cichlids [17], 10 sunflowers [18], wild tomatoes [19] and yeast [20,21]. In addition, introgression from 11 wild relatives has facilitated the broad range expansions of multiple domesticated crops 12 (reviewed in [22] and [23]), and gene flow from crops back into their wild relatives has in 13 some cases opened up novel 'weedy' niches [24]. domesticated in the Balsas River Valley in Mexico from a lowland-adapted sub-species 21 of teosinte (Zea mays ssp. parviglumis [28]), which grows readily at sea level and lower 22 elevations of the Sierra Madre del Sur [29]. In contrast, Zea mays ssp. mexicana, which 23 diverged from parviglumis about 60 thousand years ago [30], is endemic to highland 24 regions in Mexico (∼1500-3000 meters in elevation) where it has adapted to a number of 25 ecological challenges: a cooler, drier climate with higher UV intensity, different soil 26 nutrient composition, and a shorter growing season necessitating earlier flowering 27 times [31][32][33][34][35]. 28 Maize was introduced as a crop to the mountains of Mexico around 6.2 thousand 29 years ago [36], and it is thought that gene flow from mexicana assisted in adaptation to 30 high elevation selection pressures. Highland maize and mexicana share a number of 31 putatively adaptive phenotypes [37,38], including earlier flowering times for the shorter 32 growing season [34], purple anthocyanin-based pigmentation which shields DNA from 33 UV damage [39] and increases solar heat absorption [40], and macrohairs on the leaf 34 and stem sheath, which are thought to increase herbivore defense [41] and/or heat 35 maintenance in colder environments [42]. Earlier studies using 50K SNP-chip data for 36 highland populations [43] or genomewide data for a small number of individuals [44,45], 37 have shown that highland maize populations have experienced significant admixture 38 from mexicana, reaching high frequency at some loci, consistent with adaptive 39 introgression. 40 While some highland and locally-adapted alleles may be beneficial to maize, many 41 introgressed mexicana alleles, especially those affecting domestication traits, should be 42 selected against by farmers growing maize. In addition, maize alleles introgressed into 43 mexicana should be selected against because maize has accumulated genetic load from 44 reduced population sizes during domestication [44] and because domestication traits 45 generally reduce fitness in the wild [46][47][48], e.g. loss of disarticulation and effective seed 46 dispersal [37]. 47 In this study, we generate whole genome sequencing data to investigate genomic 48 signatures of admixture and selection in paired sympatric maize and mexicana 49 populations, sampled from 14 locations across an elevational gradient in Mexico. 50 Mexicana was sampled from wild populations and maize was sampled from nearby fields 51 where traditional cultivation methods and open pollination have resulted in populations 52 with distinct local characteristics and high phenotypic and genetic diversity (often called 53 maize 'landraces'). This expanded sampling of sympatric maize and mexicana 54 populations across Mexico, combined with genomewide data and a well-parameterized 55 null model, improves our ability to more formally test for adaptive introgression and 56 identify likely source populations. The source of introgression is of interest, as teosinte 57 demonstrates local adaptation to different niches within the highlands and there is 58 significant genetic structure between mexicana ecotypes [32,37,[49][50][51]. Thus we can test 59 whether local mexicana populations are the ongoing source for geographically-restricted 60 locally adaptive haplotypes. We use this comprehensive genomic dataset to characterize 61 the bi-directional timing and origin of introgression and evaluate the patterns and scale 62 of natural selection for and against admixture between these taxa. 63 Results/Discussion 64 Genomewide mexicana ancestry is structured by elevation 65 We sampled paired sympatric populations from 14 geographically dispersed locations to 66 assess the extent of gene flow between maize and mexicana in Mexico. Maize today is 67 grown across the entire elevational range of its wild teosinte relatives, from sea-level up 68 to 4000 meters [52]. Our sampled sites range from 1547-2600 meters in elevation, which 69 spans a large portion of mexicana's range and exceeds the upper elevational range for 70 maize's wild ancestor, parviglumis (Fig 1). For each of 14 maize/mexicana sympatric 71 sample locations, we resequenced 7-15 individuals per subspecies. 72 We additionally sequenced 43 individuals from 3 mexicana reference populations, 73 totalling 348 low-coverage genomes (mean ∼1x). Two of these mexicana reference 74 populations are documented to have no adjacent maize agriculture within the past 50 75 years, while a third higher elevation population (Amecameca) was chosen because it 76 grows above the elevational range of parviglumis, and thus outside of the historical 77 range of maize. For a maize reference population, we added 55 previously published 78 high-coverage genomes from a population grown near Palmar Chico at 983 m [53,54], 79 well below the elevational range of mexicana. Because parviglumis is known to admix 80 historically with both of our focal subspecies in Mexico, we also included 50 previously 81 published high-coverage parviglumis genomes from the 'Mound' population at 1,008 m, 82 also near Palmar Chico [53][54][55]. Completely allopatric reference populations are not 83 available because maize has been grown at high density throughout Mexico across the 84 full elevational range of both teosintes. A priori, gene flow from maize into mexicana is 85 possible at Amecameca, and historically between maize and teosinte at all locations. We 86 therefore assess for possible gene flow into each reference population below.  To estimate genomewide ancestry proportions for each individual, we ran 93 NGSAdmix [56] with K=3 genetic clusters and genotype likelihoods for all mexicana, 94 maize, and parviglumis individuals. The three genetic clusters clearly map onto 95 mexicana, maize and parviglumis ancestry, with no significant admixture in the lowland 96 maize or parviglumis reference populations. We find minority parviglumis ancestry in 97 the two lower-elevation reference mexicana populations, but no evidence of introgression 98 from parviglumis or maize into the highest elevation population at Amecameca (Fig 2A). 99 Looking across samples from the 14 sympatric sites, we find a positive association 100 between ancestry proportion and elevation (km), with higher mexicana ancestry at 101 higher elevations in both sympatric maize (β = 0.22, P = 1.01 × 10 −31 ) and sympatric 102 mexicana (β = 0.32, P = 2.02 × 10 −38 ) individuals ( Fig 2B). 103 Increasing mexicana ancestry at higher elevations is consistent with selection 104 favoring mexicana ancestry at higher elevations, but could also be due to purely 105 demographic processes, e.g. a higher density of (wind-dispersed) mexicana pollen at 106 higher elevations, or increased gene flow from non-admixed maize populations at lower 107 elevations. While most populations have admixture proportions well-predicted by their 108 elevation, outlier populations may be the result of recent colonization histories for some 109 locations or adaptation to other environmental niches. Within teosintes, elevation is a 110 major axis of niche separation between parviglumis (the ancestor of maize) and 111 mexicana [50,57], but genetic differentiation also correlates with soil nutrient content 112 and at least four principal components constructed from climatic variables [33]. 113 Origin and timing of introgression 114 If mexicana ancestry found in contemporary maize genomes facilitated maize's 115 colonization of the highlands approximately 6.2 thousands years ago [36], we would 116 expect introgressed ancestry tracts to be short, due to many generations of 117 recombination, and possibly to be derived from an ancient source population common 118 to many present-day maize populations. To test these predictions, we estimated local substantially across populations (median: 173, range: 29-1006). Because the HMM fits 130 a single-pulse per ancestry to what was almost certainly multiple admixture events over 131 time, we caution against over-interpretation of exact dates. Multiple pulses or ongoing 132 gene flow biases estimates towards the more recent pulse(s) [59,60] and even old 133 estimates do not exclude the possibility of limited more recent admixture. These 134 single-pulse approximations do, however, provide evidence that a large proportion of the 135 introgression, especially mexicana ancestry into maize, is found on short ancestry tracts 136 and therefore relatively old.

137
To identify likely source population(s) for introgressed ancestry, we compared F ST 138 between all sympatric populations using only reads from high-confidence homozygous 139 ancestry tracts (posterior > 0.8) for maize and mexicana ancestry separately. We find 140 that most mexicana ancestry in maize resembles other mexicana ancestry introgressed 141 into other maize populations, rather that mexicana ancestry from the local sympatric  this interpretation, we have evidence that local maize at Ixtlan is at least partially 159 descended from recently introduced commercial seed (relayed by local farmers [43]).  When there is widespread selection against introgressing variants at many loci across 176 the genome, selection will more efficiently remove linked ancestry in regions of the 177 genome with lower recombination rates, which creates a positive relationship between 178 local recombination rate and the proportion of introgressed ancestry [4][5][6][7][8][9][10][11][12][13]61]. To test 179 whether such negative selection is shaping patterns of introgression genomewide in 180 sympatric maize and mexicana, we first divided the genome into quintiles based on the 181 local recombination rates for 1 cM windows. We then ran NGSAdmix on the SNPs 182 within each quintile separately, using K=3 clusters, to estimate ancestry proportions for 183 each quintile. We used a recombination map from maize [62], which is likely to be 184 correlated with other Zea subspecies at least at the level of genomic quintiles. A 185 limitation of this analysis, however, is that we do not have a recombination map for 186 hybrid populations, which means that e.g. segregating structural inversions will not 187 necessarily show low recombination rates.  Table). This is again consistent with low-recombination 201 rate regions having a stronger effect of linked selection reducing mexicana ancestry, with 202 higher elevation maize populations either experiencing larger amounts of gene flow or 203 retaining more ancestry due to adaptive processes in high recombination regions.

204
Because recombination rate is positively correlated with gene density in Zea [63], we 205 also tested the Spearman's rank correlation between quintiles defined by coding base  While maize ancestry in general is not predicted to provide adaptive benefits in 216 teosinte, invasive mexicana in Europe shows selective sweeps for maize ancestry at 217 multiple loci that have contributed to its establishment as a noxious weed [64] and we 218 speculate that maize could be a source of alleles adapted to human-modified landscapes. 219 We repeated these analyses using local ancestry calls as our introgression estimates 220 and found a non-significant Spearman's rank correlation between mexicana  However, we caution that local ancestry methods may also have subtle biases in 233 power that are sensitive to local recombination rates and make them less reliable for 234 comparing ancestry patterns across recombination rate quintiles. 235 Overall, we find support for widespread selection against introgression into maize 236 and mixed results from similar tests of this hypothesis in mexicana.

237
High introgression peaks shared across populations 238 To assess adaptive introgression in our sympatric populations, we identified 239 introgression 'peaks' where minor ancestry exceeds the genomewide mean by more than 240 2 standard deviations. We find no strong reduction in average diversity (π) for 241 mexicana ancestry at high introgression peaks ( S4 Fig). This maintenance of diversity 242 implies that selection at most peaks has favored multiple mexicana haplotypes, and 243 hard sweeps for recent beneficial mutations on a specific haplotype are rare. 244 We observe that many high mexicana ancestry peaks are shared across subsets of 245 our 14 maize populations (see e.g. chr4, Fig 5). While most outlier peaks are unique to 246 a single population, many peaks are shared across 7 or more of the populations (S10A 247 Fig). To a lesser extent, we also observe sharing of high-introgression peaks for maize  High introgression peaks in many independent populations would be very 250 unexpected by chance. However, our sampled populations do not provide independent 251 evidence for adaptive introgression, due to shared gene flow and drift post-admixture 252 (e.g. long-distance human-assisted dispersal of maize seed). To estimate the rate of peak 253 sharing we should expect from demographic processes alone, we simulated 100,000 This lack of local adaptive introgression is perhaps surprising given the genetic 271 structure in mexicana associated with different ecotypes [49] and evidence for local 272 adaptation within teosinte across elevation [31,50,51]. However, mexicana also has 273 substantial standing variation and we find little evidence for hard sweeps, so one 274 possibility is that local maize and local mexicana are adapting to the same environment 275 by different available genetic paths, or even the same causal SNP on a different set of 276 haplotype backgrounds. Older introgressed tracts may also offer more accessible paths 277 for maize adaptation, having already purged some of their linked deleterious variation. 278 Additionally, local exitinction and re-colonization by mexicana is common [37] and may 279 contribute to a lack of local sourcing of adaptive haplotypes from contemporary   teosintes [50], overlaps QTLs for leaf pigmentation and macrohairs [42], and is 303 associated with increased yield in maize at high elevations and decreased yield at low 304 elevations [67], but has thus far eluded functional characterization of genes within the 305 inversion [67]. 306 Our second strongest association co-localizes with macrohairless1 (mhl1 ), a locus on 307 chromosome 9 that controls macrohair initiation on the leaf blade [65] and is associated 308 with a major QTL explaining 52% of macrohair variation between high and low 309 elevation teosinte mapping parents [42]. Within teosintes, populations of the lowland 310 ancestor of maize, parviglumis, show convergent soft sweeps at the mhl1 locus not 311 shared by mexicana [32]. Macrohairs are characteristic highland phenotypes in teosinte 312 and maize and are thought to confer adaptive benefits through insect defence and/or 313 thermal insulation [41,42]. We identified a 3 Mb outlier region within the larger mhl1 314 QTL which we analyzed further using PCA. We found three genetic clusters along the 315 first principal component, evidence that an inversion polymorphism (hereafter Inv9f ) 316 maintains differentiation between maize/parviglumis and mexicana haplotypes across 317 this region (S33 and S34 Figs). Additionally, we found evidence that the mexicana-type 318 allele at the inversion segregates at low frequency within our lowland parviglumis 319 reference population. Based on reduced diversity, the lowland maize/parviglumis-type 320 allele at the inversion is likely derived (S6 Table). Thus mexicana-alleles at Inv9f could 321 have been inherited by maize either through introgression or incomplete lineage sorting 322  [50] and the mhl1 locus [65] were converted to the maize reference genome v4 coordinates using Assembly Converter (ensembl.gramene.org). Chromosome numbers are placed at the centromere midpoint (approximate centromere positions are from [66]). before selection pushed them to high frequency in highland populations.

323
The clinal patterns of admixture that we observe at inversions Inv4m and Inv9f  Selection at candidate domestication genes 333 We hypothesized that domestication genes will be barriers to introgression bilaterally 334 between maize and mexicana [43]. While we do not have power to identify individual 335 outlier genes that have low introgression, we can test for enriched overlap between 336 'introgression deserts' and a set of putative domestication genes spread across the 337 genome. 338 We examined introgression for a sample of 15 well-characterized domestication genes 339 from the literature (see S7 Table), and compared them to the regions of the genome 340 with the lowest 5% introgression of teosinte ancestry into sympatric maize and maize 341 ancestry into sympatric mexicana ('introgression deserts'). A small but enriched subset 342 of these domestication genes overlap with introgression deserts in sympatric maize (5, P 343 < 0.001) and likewise in sympatric mexicana (5, P = 0.001).

350
Another six domestication genes have low introgression in one direction only [73][74][75][76][77] 351 (see S7 Table). Among these, sugary1 (su1 ) in the starch pathway has low maize 352 ancestry in mexicana but shows a steep increase in introgressed mexicana ancestry 353 proportion with elevation in maize (+0.95 per km, < 5% FDR), which suggests this 354 gene has pleiotropic effects on non-domestication traits in maize, with fitness trade-offs 355 across elevation. Sugary1 mutations modify the sweetness, nutrient content and texture 356 of maize kernels (e.g. sweet corn), but also affect seed germination and emergence at 357 cold temperatures [78], candidate pleiotropic effects that could be more deleterious at 358 higher elevations.

359
The remaining seven domestication genes do not overlap introgression deserts in 360 either subspecies despite evidence for their roles in domestication: zfl2 (cob 361 rank) [79][80][81], pbf1 (storage protein synthesis) [82], ba1 (plant architecture) [83], ae1 362 (starch biosynthesis) [76], ra1 and ra2 (inflorescence architecture) [84,85] and 363 ZmSh1-5.1+ZmSh1-5.2 (seed shattering) [75]. Despite evidence of introgression at many 364 domestication loci, maize populations retain all of the classic domestication traits, and 365 mexicana populations maintain 'wild' forms. Epistasis for domestication traits [47]   We tested more broadly for enriched selection within the flowering time pathway 381 using a set of 849 candidate flowering time genes [87,88] and a more stringent 5% FDR 382 cutoff. No genes from the core flowering time pathway [87] and only 13/806 other Intrinsic genetic incompatibilities and partial temporal isolation (offset flowering times) 396 create an incomplete barrier to gene flow and F1 hybrids are commonly observed in the 397 field [37], suggesting that hybridization is frequent and ongoing. Yet, we find little hybrid seeds that do not disarticulate are farmer-dependent for successful dispersal and 405 reproduction, although first-generation backcrosses to mexicana have been observed [37]. 406 Consistent with domestication loci acting as barriers to introgression, in both maize 407 and mexicana an enriched subset of candidate domestication genes overlap Aire* (see S1 Table). A previous study of crop-wild admixture genotyped different 432 maize and mexicana individuals from 9 of these locations (marked with *), using the 433 Illumina MaizeSNP50 Genotyping BeadChip [43]. In addition, we chose three Malinalco were chosen because they have no record of contemporary maize agriculture 436 nearby and a third population, Amecameca, was added as a complement to these two 437 reference populations because it grows at a higher elevation, beyond the historical range 438 of parviglumis.

439
At each sampling location, multiple ears from maternal plants were collected for seed. 440 Population accessions varied in the number of maternal plants with viable seeds. When 441 available, we planted multiple seeds within each ear but only randomly selected one 442 individual for sequencing from the plants that successfully germinated in the greenhouse. 443 DNA extraction and sequencing 444 We extracted DNA from leaf tissue and then prepared sequencing DNA libraries using a 445 recently published high-throughput protocol ("Nextera Low Input, Transposase Enabled 446 protocol" [90]) with four main steps: (1) DNA shearing and tagmentation by the

469
To find local recombination rates, we converted marker coordinates from a published 470 0.2 cM genetic map [62] to the v4 maize genome using Assembly Converter 471 (ensembl.gramene.org). We removed any markers that mapped to a different First, we checked read quality using fastQ Screen (v0.14.0 [92]) and trimmed out 478 adapter content from raw sequencing reads using the trimmomatic wrapper for 479 snakemake (0.59.1/bio/trimmomatric/pe) [93]. We mapped trimmed reads to the maize 480 reference genome using bwa mem (v0.7.17 [94]). We then sorted reads using SAMtools 481 (v1.9 [95]), removed duplicates using picardtools (v2.7.1) MarkDuplicates and merged 482 libraries of the same individual sequenced on multiple lanes using SAMtools merge. In 483 all subsequent analyses in the methods below we filtered out reads with low mapping 484 scores (< 30) and bases with low base quality scores (< 20).  Table). These maize and 506 parviglumis reference populations were sampled about 1 km apart from each other but 507 maintain high F ST and are isolated by differences in flowering time [54]. Both maize 508 and parviglumis reference populations are allopatric to mexicana, growing well below its 509 elevational range. We mapped and filtered reads for these reference maize and 510 parviglumis individuals using the pipeline described above and capped base quality 511 scores using BAQ.

Global ancestry inference 528
To estimate genetic relationships between populations and genomewide ancestry 529 proportions, we used methods specific to low-coverage data that rely on genotype 530 likelihoods, rather than called genotypes. Because these methods are sensitive to SNPs 531 in high linkage disequilibrium (LD), we thinned genotype likelihoods to every 100th 532 SNP (∼4kb spacing) [99]. To first confirm that maize and mexicana ancestry form the 533 major axis of genetic variation in our sample, we estimated the genetic covariance 534 matrix between all individuals using PCAngsd (v0.98.2 [100]) and visualized principal 535 components computed using eigen() in R. We then estimated global ancestry 536 proportions using the same thinned genotype likelihood files as input to NGSAdmix [56], 537 using K = 3 clusters. Clusters clearly mapped onto the three reference groups, which we 538 used to label the three ancestry components as 'maize', 'mexicana' and 'parviglumis'.

539
Local ancestry and timing of admixture 540 We inferred local ancestry across the genome using a hidden Markov model that is 541 appropriate for low-coverage data because it models genotype uncertainty down to the 542 level of read counts for all admixed individuals (ancestry hmm [58]). This method relies 543 on allele counts from separate reference populations to estimate allele frequencies for 544 each ancestry. Because some of our reference individuals have too low of coverage to 545 accurately call genotypes, we randomly sampled one read per individual to get unbiased 546 frequency estimates for major and minor alleles at each site ('angsd -doCounts 1 547 -dumpCounts 3'). To maximize ancestry-informativeness of sites in this analysis, we 548 identified SNPs in the top 10% tail for any of the three possible configurations of the 549 population branch statistic between maize, mexicana and parviglumis reference 550 populations, thereby enriching for SNPs that distinguish one reference population from 551 the other two. We only considered SNPs with at least 44 reference maize, 12 reference 552 mexicana, and 40 reference parviglumis individuals with sequencing coverage at a site. 553 For these SNPs, we estimated reference population allele frequencies from angsd ('angsd 554 -doMajorMinor 3 -GL 1 -baq 2 -doMaf 1'), then estimated pairwise F ST 's using the 555 Hudson 1992 estimator [101] as implemented by Bhatia et al. 2013 [102] to calculate the 556 population branch statistic [103] (PBS equation from pg 8 of the supplement). We then 557 calculated genetic distances between SNPs using the maize recombination map and 558 filtered our enriched variants to have minimum 0.001 cM spacing between adjacent 559 SNPs.

560
Running ancestry hmm jointly infers local ancestry for each individual and the 561 timing of admixture. This HMM method assumes a neutral demographic history in 562 which a constant-size population receives a pulse of admixture t generations in the past, 563 and finds the t that maximize the likelihood of the observed read counts and hidden 564 local ancestry state across each admixed individual's genome. The timing of admixture 565 defines the generations for possible meiotic recombination between ancestry tracts, and 566 therefore scales the transition probabilities between hidden ancestry states. We ran 567 ancestry hmm under a three-way admixture model: a population is founded 10,000 568 generations ago by mexicana, then receives a pulse of parviglumis ancestry at t parv 569 (prior: 1000 generations ago, range: 0-10,000) and a pulse of maize ancestry at t maize 570 (prior: 100 generations ago; range: 0-10,000). Because these Zea mays subspecies are all 571 annual grasses, generations can equivalently be interpreted as years since admixture. In 572 addition to t parv and t maize , the HMM outputs the posterior probabilities for 573 homozygous maize, homozygous mexicana, homozygous parviglumis, and heterozygous 574 ancestry for each individual at every site. We analysed each sympatric maize and 575 mexicana population separately, using the population's mean NGSAdmix global ancestry 576 estimate as a prior for mixing proportions, an approximate effective population size (Ne) 577 of 10,000 individuals, genetic positions for each SNP based on the maize linkage map, 578 an estimated sequencing base error rate of 3 × 10 −3 , and the three-way admixture 579 model described above. We ran ancestry hmm with an optional setting to bootstrap 100 580 random samples of 1,000-SNP genomic blocks to estimate uncertainty around the 581 estimated generations since admixture (t parv and t maize ). To test the sensitivity of the 582 HMM to our choice of Ne, we re-ran ancestry hmm with two other Ne's that differ by 583 an order of magnitude (Ne = 1k, 100k), but did not analyze these results further after 584 finding high correspondence for both local ancestry and timing estimates (S38 Fig). we assumed that the estimated ancestry at a focal site extends halfway to the next site 591 with a local ancestry estimate.

592
Diversity within ancestry 593 Using local ancestry estimates from the HMM, we identified high-confidence 594 homozygous ancestry tracts (posterior > 0.8). We filtered individual bams for reads 595 that overlap these tracks and used the resulting filtered bams to calculate diversity 596 within maize, parviglumis, and mexicana ancestry, separately. We estimate diversity 597 using the ANGSD/realSFS framework which is appropriate for low-coverage sequence 598 data it takes into account uncertainty in both genotypes and variant sites. We created a 599 concensus fasta sequence for Tripsacum ('angsd -doFasta 2') to use as the ancestral 600 state for polarizing the unfolded site frequency spectrum in these analyses.

601
For each population and ancestry, we estimated the site allele frequencies ('angsd 602 -doSaf 1 -GL 1') and subsequently estimated the genomewide site frequency spectrum 603 (SFS). We then used this SFS as a prior to estimate within-ancestry pairwise diversity 604 (π) genomewide from the site allele frequencies ('realSFS saf2theta').

605
For each pair of populations and ancestry, we additionally used realSFS to estimate 606 the two dimensional SFS from the individual population site allele frequencies 607 genomewide. We then used this 2D SFS as a prior to estimate genomewide 608 within-ancestry F ST between the two populations ('realSFS fst index -whichFst 1').

609
This call uses Hudson's F ST estimator [101] as parameterized in [102].  To estimate ancestry proportions for each recombination rate quintile, we first reduced 623 LD by thinning to 1% of SNPs (every 100th) and ran NGSAdmix 5 times separately 624 (once per quintile) with K=3 clusters. We assigned 'maize', 'mexicana' and 'parviglumis' 625 labels to the ancestry clusters based on majority assignment to the respective reference 626 panels.

627
To bootstrap for uncertainty, we re-sampled 1 cM windows with replacement from 628 each quintile 100 times, and re-ran NGSAdmix on the resulting bootstrap SNP sets.

629
Genetic clusters could be unambiguously assigned to maize, parviglumis, and mexicana 630 ancestries in all but 2 bootstrap replicates for the lowest recombination quintile and 3 631 for the highest coding density quintile, so we dropped this small number of replicates 632 with unclear ancestry assignment from the bootstrap analysis. We calculated 95% 633 percentile bootstrap confidence intervals for the estimated admixture proportions, and 634 the Spearman's rank correlation between the recombination rate (or coding bp per cM) 635 and admixture proportion ranks for each quintile. 636 We also tested for a difference in ancestry slopes with elevation across different In a complementary analysis, we used a ratio of f 4 statistics as an alternative method to estimate ancestry proportions by quintile. The f 4 statistic measures shared genetic drift (allelic covariance) between populations in a phylogeny, due to either shared branch lengths or admixture events in the evolutionary history relating these populations. Excess shared drift with one population from a pair of sister populations in the tree is a signature of admixture, analogous to the ABBA-BABA test [105], and a ratio of two f 4 statistics can be used to quantify the admixture proportion. Assuming the basic phylogenetic tree relating our reference populations (((parviglumis, maize), mexicana), Tripsacum) in S5 Fig, we can estimate α, the proportion of ancestry inherited from mexicana in a focal sympatric population, as follows [105,106]: α = f 4 (Tripsacum, parviglumis; X, maize) f 4 (Tripsacum, parviglumis; mexicana, maize) .
The denominator of this statistic estimates the branch length leading to parviglumis and 643 maize that separates these sister subspecies from mexicana; the full ratio estimates the 644 proportion of this branch that separates sympatric population X from parviglumis and 645 maize, i.e. the mexicana ancestry in X. Because the f 4 statistic is sensitive to additional 646 unmodeled admixture within the tree, we limited our reference mexicana group to 647 individuals from just one of the three reference populations (Amecameca), which 648 showed no evidence of admixture in our global ancestry analysis (see Fig 2).

649
For each 1 cM window across the genome, we used ANGSD to calculate . 656 We then calculated the Spearman's rank correlation between the recombination rate 657 quintiles and admixture proportion ranks for these quintiles. We calculated simple 658 bootstrap confidence intervals for our ancestry estimates and correlations by 659 re-sampling 1 cM windows within quintiles with replacement 10,000 times and 660 re-calculating the f 4 ratios and resulting rank correlation across quintiles to construct 661 95% percentile confidence intervals. We repeated this analysis using quintiles based on 662 coding bp per cM in place of recombination rate (cM/Mbp). 663 iii. Local ancestry estimates 664 We also calculated the Spearman's rank correlation between local recombination rate 665 (or coding bp per cM) and local ancestry proportion at the level of individual 1 cM 666 windows. For each window, we averaged local ancestry estimates from the HMM across 667 all individuals within sympatric maize, and separately, sympatric mexicana. We then 668 calculated simple bootstrap confidence intervals for our local ancestry estimates and 669 local recombination rate (or coding bp per cM) by re-sampling 1 cM windows across the 670 genome with replacement 10,000 times and re-calculating the rank correlation across 671 windows to construct 95% percentile confidence intervals.

672
Local ancestry simulations 673 We simulated maize, mexicana and parviglumis ancestry population frequencies 674 marginally using a multivariate-normal null model, e.g. admixed populations with shared demographic history will also tend to have an excess 683 of introgression.

684
To construct K, we calculated the covariance in ancestry between each pair of 685 populations i and j using all L loci with local ancestry calls genomewide: Above, Anc i,l and Anc j,l are local ancestry frequencies at a locus l while α i and α j 687 are the mean local ancestry frequencies across the genome for populations i and j. To characterize introgression peak sharing between individual populations, we defined 696 'ancestry peaks' as sites where a population has over 2 standard deviations more 697 introgressed ancestry than the genomewide mean. We counted the number of peaks that 698 are shared between all pairs and combinations of populations. To compare these results 699 to our null model, we also counted the number of introgression peaks shared by 700 populations in our simulated dataset, using the 2 s.d. cutoff set by the empirical data to 701 define peaks.

702
Because mexicana ancestry shows significant diversity, we additionally characterized 703 diversity for mexicana ancestry peaks introgressed into maize. For all introgressed 704 ancestry outlier regions in a focal maize population, we used ANGSD to estimate 705 pairwise diversity within the population (π) and differentiation (F ST ) between the focal 706 sympatric maize populations and their local sympatric mexicana population. We 707 focused on the mexicana ancestry within peaks by limiting our diversity estimates to 708 only include high-confidence homozygous mexicana ancestry tracts (posterior > 0.8).

709
For these analyses, we pooled information across outlier peaks, but distinguish between 710 introgression peaks exclusive to the focal population and introgression peaks shared  We calculated 5% false-discovery-rate (FDR) cutoffs for high and low introgressed 723 ancestry using the Benjamini-Hochberg method [108] and simulation results to estimate 724 the expected frequency of false-positives under our null MVN model (one-tailed tests). 725 We repeated this approach to identify outlier loci with steep positive (or negative) 726 slopes for mexicana ancestry across elevation at a 5% FDR.

727
Test for reduced introgression at domestication genes 728 To test whether domestication genes are unusually resistant to introgression, we first 729 defined 'introgression deserts' as regions with the lowest 5% genomewide of teosinte 730 introgression into sympatric maize (or, separately, maize introgression into sympatric 731 mexicana). We then looked up v4 coordinates on Ensembl.org for genes associated with 732 maize domestication in the literature (S7 Table), and used bedtools 'intersect' to identify 733 which of these genes ±20 kb overlap introgression deserts. To test for significance, we 734 randomly shuffled the gene positions across the genome (bedtools 'shuffle') 1000 times 735 and re-calculated overlap with introgression deserts for each permuted data set.

736
Test for selection within the flowering time pathway 737 We identified a list of 48 core flowering time pathway genes from the literature [87], and 738 a broader list of 905 flowering time candidate genes [87,88]. From the combined set, we 739 included 849 total genes (43 core pathway) which we were able to localize on assembled 740 autosomes of the v4 reference genome using MaizeGDB gene cross-reference files [109]. 741 We counted the number of genes ±20 kb that intersected with outlier regions for steep 742 increases in mexicana introgression with elevation (and, separately, high mexicana 743 introgression) in sympatric maize populations (< 5% FDR) using bedtools 'intersect', 744 then tested for significance by repeating this analysis with 1000 randomly shuffled gene 745 positions. 746 Fig S1. PCA of maize, mexicana and parviglumis. First and second principal components from the genomewide genetic covariance matrix relating all maize, mexicana and parviglumis individuals. PC1 aligns with a maize to mexicana ancestry gradient while PC2 separates out parviglumis ancestry from the other two Zea subspecies.   Diversity (π) within ancestry Each point summarises pairwise genetic diversity (π) for genomic regions with high-confidence homozygous maize or mexicana ancestry, calculated separately for the sympatric maize and mexicana populations at each sampled location. For maize ancestry (top), only a genomewide π is estimated, using all regions with high-confidence homozygous maize ancestry. For mexicana ancestry (bottom), π is calculated and plotted separately for three subsets of the genome: introgression peaks (> 2 s.d. above the mean) found in the focal maize population only, introgression peaks shared between the focal maize and at least 3 other maize populations, and a genomewide estimate. Population tree Phylogenetic tree assumed when estimating the ratio of f 4 statistics. The pink branch represents the shared drift between maize and parviglumis that is introduced to the focal sympatric population via admixture of proportion 1 − α. We used only plants from the Amecameca site in our mexicana reference group for this analysis because that site showed no evidence of previous admixture. Only for sympatric maize do we find that larger sets of populations (brighter colored bars) commonly share peaks across the genome. Populations are ordered from high to low elevation (top to bottom), showing that introgression peak sharing is more common among combinations of the highest elevation maize populations.      [62] 0.2cM linkage map on reference genome v2 were successfully transferred to reference genome v4, using Assembly Converter (ensembl.gramene.org). We dropped markers automatically that mapped to the wrong chromosome or in reverse map order (presumably due to small contigs having corrected orientations in the newer version of the reference genome). A small number of markers were dropped by hand for more complex out-of-order mapping errors. (B) With few exceptions, the number of markers dropped in a row is small. (C) Visualization of the full linkage map on v4 of the reference genome. Included markers are in blue while excluded markers are highlighted in orange and red. (D) Zoomed in view of all 6 regions in the genome where out-of-order markers did not form simple reversals. For these more complex regions we identified markers to drop by hand (in red) to reach a monotonically increasing map solution.