Does habitat stability structure intraspecific genetic diversity? It’s complicated...

Regional phylogeographic studies have long been conducted in the southeastern United States for a variety of species. With some exceptions, many of these studies focus on single species or single clades of organisms, and those considering multiple species tend to focus on deep historical breaks causing differentiation. However, in many species more recent factors may be influencing genetic diversity. To understand the roles of historic and contemporary processes in structuring genetic diversity, we reanalyzed existing genetic data from Southeast of North America using approaches gleaned from phylogeographic and landscape genetic literature that were implemented across species including AMOVAs, PCoAs, Species Distribution Modelling, and tests of isolation by distance, environment, and habitat instability. Genetic variance was significantly partitioned by ecoregions, watersheds, and across phylogeographic breaks in the majority of species. Similarly, genetic variation was significantly associated with some combination of geographic or environmental distance or habitat instability in most species. Patterns of genetic variation were largely idiosyncratic across species. While habitat instability over time is significantly correlated with genetic diversity in some species, it appears generally less important than isolation by geographic or environmental distance. Our results suggest that many factors, both historical and contemporary, impact genetic diversity within a species, and more so, that these patterns aren’t always similar in closely related species. This supports the importance of species- specific factors and cautions against assumptions that closely related species will respond to historical and contemporary forces in similar ways.

• We used a variety of approaches to quantify factors that potentially influence genetic structure across multiple species in the Southeastern United States • Our analyses indicate that both contemporary and historical factors promote genetic divergence across multiple taxa • Machine learning analysis indicated that taxa are largely idiosyncratic in terms of which factors are most important, suggesting that specific factors regulate how particular taxa respond to external factors • Our results highlight the importance of considering both landscape and phylogeographic methodology when investigating genetic structure

Abstract
Regional phylogeographic studies have long been conducted in the southeastern United States for a variety of species. With some exceptions, many of these studies focus on single species or single clades of organisms, and those considering multiple species tend to focus on deep historical breaks causing differentiation. However, in many species more recent factors may be influencing genetic diversity. To understand the roles of historic and contemporary processes in structuring genetic diversity, we reanalyzed existing genetic data from Southeast of North America using approaches gleaned from phylogeographic and landscape genetic literature that were implemented across species including AMOVAs, PCoAs, Species Distribution Modelling, and tests of isolation by distance, environment, and habitat instability. Genetic variance was significantly partitioned by ecoregions, watersheds, and across phylogeographic breaks in the majority of species. Similarly, genetic variation was significantly associated with some combination of geographic or environmental distance or habitat instability in most species. Patterns of genetic variation were largely idiosyncratic across species. While habitat instability over time is significantly correlated with genetic diversity in some species, it appears generally less important than isolation by geographic or environmental distance. Our results suggest that many factors, both historical and contemporary, impact genetic diversity within a species, and more so, that these patterns aren't always similar in closely related species. This supports the importance of species-specific factors and cautions against assumptions that closely related species will respond to historical and contemporary forces in similar ways.

Introduction
Phylogeography aims to understand how climatic and geologic events have structured genetic diversity; while comparative phylogeography seeks this understanding in multiple species. Traditionally, this discipline interprets similar patterns (e.g., genealogies, STRUCTURE plots, etc.) across species as evidence for a shared species response to climate and/or geologic events (Avise 1987, Sullivan et al. 2000. One region that has served as the setting for many early phylogeographic studies (e.g., Avise 2000;Soltis 2006) and has continued to capture the interest of researchers (e.g., Satler and Carstens 2016; Barrow et al. 2017) is the Southeastern United States. The geologic and climatic history of this region is complex, involving diverse processes acting over millions of years, including mountain orogeny, changing river basins, and climate cycles, and species may have been affected in diverse ways by these factors. Soltis et al. (2006) reviewed phylogeographic patterns from the Southeastern US and suggested that these patterns could be categorized into three categories: populations structured by 1) river basins, 2) mountain ranges, and 3) the locations of glacial refugia. However, these categories are not independent or exclusive; for example, species with population structure across a given river or the Appalachian Mountains may have been isolated across these barriers in separate refugia during Pleistocene glaciation.
Studies of congruence in phylogeography have often aimed to identify identical breakpoints and population structure (e.g. Avise 2000, Satler andCarstens 2016), a task that is complicated by differences in species-specific attributes, such as dispersal ability and ecological niche (Papadopoulou and Knowles 2016). Another strategy for identifying similarity in pattern across species is to assess the relative importance of different factors in structuring genetic diversity. This approach is popular in landscape genetics (Rissler 2016), where many studies aim to evaluate the relative effects of isolation-by-distance (IBD; Wright 1943) and isolation-by-environment (IBE). By considering habitat stability in such an approach we can potentially disentangle the relative contribution of both contemporary and historical factors structuring genetic diversity (e.g., Vansconcellos et al. 2019).
Comparative phylogeographic investigations have long used climate modeling to identify regions of stable habitat that putatively served as climate refugia (e.g., Hugall et al. 2002, Carstens & Richards 2007, Carnaval et al. 2009). The general interpretation of such results is that similarity in genetic structure across species is a product of species persistence during periods of climatic change in these regions of stable habitat. The relationship between habitat and genetic diversity is a feature of landscape genetic investigations (e.g., Orsini et al. 2013), particularly in combination with species distribution modeling of habitat suitability (e.g., Geber 2011). Therefore, analytical methods that explicitly consider the suitability of the environment in the context of the genetic structure of species enable integration between phylogeography and landscape genetics (Rissler 2016). One promising analysis was proposed by Vasconcellos et al. (2019), who investigated the effects of climatic shifts on the genetic structure of a savanna frog from Brazil. Their strategy used an evaluation of the relative contributions of isolation by distance and isolation by habitat suitability modeled over evolutionary time (habitat instability). Vasconcellos et al. (2019) argue that the second measure, their 'isolation by instability (IBI)' model, is a special case of the isolation by resistance model developed by McRae (2006). Like McRae's model, which posits that genetic diversity is structured by the separation of individuals using corridors of suitable habitat rather than a straight line over geographic space, Vasconcellos et al.'s model predicts that genetic diversity is structured by the connectedness of individuals over evolutionary time. Use of a multiple matrix regression with randomization (MMRR) approach (Wang 2013) allowed Vasconcellos et al. to evaluate the relative contribution of habitat instability in comparison to other factors.
While habitat instability is one important component of historical effects on genetic diversity, it is not the only factor to consider. In addition to the effects of Pleistocene glacial cycles, the effects of river formation and mountain orogeny are evident in phylogeographic datasets from eastern North America (Soltis 2000, Avise 2006, Satler and Carstens 2016. To evaluate the effects of these and other phylogeographic breaks on structuring genetic diversity, we conduct analyses of molecular variance (AMOVAs; Excoffier 1992) and habitat stability in order to evaluate whether genetic diversity is significantly structured across river basins, ecoregions, or proposed phylogeographic breaks. In combination, these approaches allow us to evaluate the relative contributions of contemporary and various historic factors across species from eastern North America. Because they provide easily comparable metrics across species, the degree to which species have similar phylogeographic patterns and responses to the factors that lead to incongruence across species can be assessed. Here, we use repurposed data collected from a variety of sources to test the factors influencing genetic variation in a comparative framework in the Southeastern United States.

Data Processing
Fifty-seven species were identified from published phylogeographic studies in the Southeastern United States, with most of the taxa included by Soltis et al. (2006) supplemented by data from several more recent investigations (Table 1). Overall guidelines for processing are presented in Fig. S1. To start, genetic data were downloaded from GenBank for each species. Genetic data were then aligned using CLUSTAL (Thompson et al. 1994) before selecting appropriate models of sequence evolution using the Akaike Information Criterion implemented in jModeltest ver 2.1.10 (Darriba et al. 2012, Guindon andGascuel 2003) and/or MrModeltest ver 2.4 (Nylander 2004). Models implemented (Table S1) included K80 (Kimura  (Tamura and Nei 1993), with gamma correction for rate variation in some species. For species with multiple genes, we chose the model that applied to the greatest number of base pairs and concatenated all cases of multiple genes. FASTA formatted data alignments were used to analyze sequence diversity and polymorphism using DnaSP ver 6 software (Rozas et al. 2017). Prior to calculating genetic distance matrices, we removed outgroups and dealt with missing data as follows. For 44 / 57 species, we deleted sites with at least one missing base across all sequences. Because this procedure deleted most sites for 13 species with many missing data, for these species we instead performed pairwise deletion of missing bases, and, for 10 of these, also replaced remaining undefined pairwise distances with the mean distance across all other sequence comparisons for that site. Adopting procedures for addressing missing data (Jombart 2008, Jombart and Ahmed 2011) allowed us to define comparable distance matrices for all species. Geographic locality information was collected from source publications where possible. In cases where specific coordinates were not provided, we used sample accession numbers to check GenBank for sample-associated metadata. If specific coordinates could not be determined from either source, we used GEOLocate, an online platform for georeferencing data, to generate coordinates based on the most specific locality available (Rios and Bart 2010). Once the data was complied, all coordinates, including those from original sources, were double-checked using GEOLocate and then visually inspected using Google Earth. Generation length and substitution rate were obtained for each species from the literature (Supplementary Table S2).

Nucleotide Mapping
In order to develop a qualitative understanding of how nucleotide diversity is distributed across the landscape, we created a heat map of nucleotide diversity across all species. To calculate nucleotide diversity, we divided North America into a 1 o grid and, using geographic coordinates for each species, assigned individuals to a cell in this grid. As nucleotide diversity requires multiple individuals for comparison, grid cells with fewer than two individuals of a species were removed. In order to avoid sampling bias, species with fewer than three occupied cells were also removed from subsequent analyses. Nucleotide diversity (π) was calculated within each grid cell using the R package "pegas" (Emmanuel et al. 2018). Given the challenges associated with using different molecular markers, we normalized each individual raster to ensure that species richness did not exert an undue influence on nucleotide diversity (π) as follows: Subsequently, GIS layers for each species were summed to create a final heat map that included all species to indicate the overall level of nucleotide diversity across the region. While this conservative approach provides a simplistic impression of nucleotide diversity, it prevents overinterpretation that may result from differences in species richness or the use of genetic markers across species that differ in their rates of evolution and as such may exhibit different levels of haplotype diversity. In addition, Moran's I was calculated using the R package "ape" (Paradis et al. 2004) to assess autocorrelation under the assumption that significant positive autocorrelation in nucleotide diversity may be indicative of shared refugia across species.

Analysis of molecular variation
We used analysis of molecular variance (AMOVA; Excoffier et al. 1992) to compare genetic variation across geographic features thought to structure populations. For each georeferenced species, we matched GPS data for individuals to geographic features delineated by shapefiles (sp::over, Pebesma and Bivand 2005). We used tools in ArcMap 10.3 (ESRI 2014) to create shapefiles from preexisting spatial data for two current landscape features, Ecoregion (CECWG 1997;Spalding et al. 2007;USEPA 2012;AAFC 2017) and Watershed (Spalding et al. 2007;NRC, USGS, and INEG 2010a), and four phylogeographic breaks described in Soltis et al. (2006): the Atlantic-Gulf Coast discontinuity (Spalding et al. 2007), the Apalachicola River-Tombigbee River discontinuity (which separates rivers flowing into the Gulf of Mexico from those flowing into the Atlantic Ocean; Spalding et al. 2007;NRC, USGS, and INEG 2010a), the Appalachian Mountains (UNEP-WCMC 2002), and the Mississippi River (NRC, USGS, and INEG 2010b). We implemented AMOVA in Arlequin using arlecore with 1000 permutations to test for significance (Excoffier et al. 2005). Outgroups and species lacking geographic data were omitted from these analyses.

Population assignment
Intraspecific genetic variation is often structured by populations, and the characterization of population genetic structure was prominently featured by many of the original publications that described the data we use here ( Table 1). This type of information is also often necessary for other types of analysis, as was the case here. To facilitate comparison across species, we inferred the number of genetic populations using an assignment protocol implemented in the adegenet R package (Jombart 2008, Jombart and Ahmed 2011). This protocol utilizes principal components (PCs) to calculate successive K-means (adegenet::find.clusters, Jombart and Collins 2015) which are then used to infer the number of population groups. We modified this method such that it utilized principal coordinates analysis (PCoA, adegenet::dudi. pco) to analyze a distance matrix calculated according to a model of sequence evolution. Note that species with < 4 individuals (Emerita talpoida, Lampsilis ovata, Lampsilis teres) were not evaluated using this algorithm, but were assigned a value of K = 1 for other analyses.
We explored how well population assignments explained the genetic variation by performing a permutational multivariate analysis of variance (PERMANOVA, adonis::vegan, Oksanen et al. 2018). The dispersion of clusters relative to each other (vegan::betadisper) was also assessed. Here, a significant beta-dispersion value indicates violation of PERMANOVA's assumption of homogeneity of variance (Anderson 2006).

Species Distribution Modelling
Species distribution modelling (SDM) was used to characterize habitat suitability in the present by associating species locality data with maps of environmental variables (limited to climate and ocean depth), and to estimate habitat stability over time. We downloaded the GBIF Records in November 2018 using the R package "dismo"  and retained records that passed the following filters: preserved specimen, living specimen, material sample from 1900 to present, present in the United States and Canada. We combined the downloaded GBIF records with the specimen localities associated with the genetic data to create a complete presence record dataset. These were reduced to one occurrence per grid cell within the boundary of [-130, -60, 23, 50] at 5 arcmin resolution using the R package "raster" (Hijmans et al. 2019); species that had less than 25 occurrences were removed from the analysis following results from van LGM conditions (all models) for the marine species. All bioclimatic variables were downloaded at a resolution of 5 arcmin and trimmed to the same constraints as the occurrence records using the R packages "raster", "maptools" (Bivand et al. 2019a), "rgdal" (Bivand et al. 2019b), and "rgeos" (Bivand et al. 2019c). We performed a correlation analysis on the raster layers under current conditions using the "layerStats" function and subsequently removed highly correlated variables (r > 0.8; Mateo et al. 2013).
SDM analyses were conducted in R using the packages "randomForest" (Liaw and Wiener 2018), "raster", "rgeos", "maptools", "dismo", "sp" (Pevesma et al. 2018), "ecospat" (Cola et al. 2017), and "rJava" (Urbanek 2019). Pseudo absence points were drawn randomly from the background layers, and we partitioned the model into testing and training data using the "kfold" function. Distributional models were estimated for each species using the present environmental layers and cast onto the last glacial maximum layers using the MaxEnt, Random Forest, and GLM approaches . In order to combine model results for an ensemble model, we evaluated the AUC of each present-day model to weight the contribution of each to the final output. We calculated a separate binary output using a threshold determined using 'ecospat.mpa' (Cola et al. 2017), resulting in a total of six raster layers for each species: present-day continuous and binary rasters, mid-Holocene continuous and binary rasters, and last glacial maximum continuous and binary rasters.
As different species have different life history traits, species were combined into functional groups (Reptiles, Amphibians, Plants, Mammals, Freshwater, and Marine) in order to build cumulative maps of suitable habitat for the present day and the LGM using raster addition. Additionally, we determined potential refugia for each species through the identification of the areas of overlap between the present day and LGM binary models for each species. Using the functional groups defined as above, we took the overlapped species models and compared areas of interspecific overlap within each major grouping to determine areas of multi-species refugia, and we added these maps together for cumulative Terrestrial, Freshwater, and Marine groupings.

Multiple Matrix Regression with Randomization
Three factors are often highlighted in phylogenetic and landscape genomic studies: Isolation by Distance (IBD), Isolation by Environment (IBE), and Isolation by Instability (IBI). To examine the impact of each force as independent predictors on genetic distance (calculated above in the AMOVA), we used multiple matrix regression with randomization (MMRR; Wang 2013). We performed the MMRR as outlined in Legendre et al. (1994) and implemented by Wang (2013) using the MRM command in the 'ecodist' R package with 1000 permutations (Goslee and Urban 2007). One advantage of MMRR is the ability to account for small amounts of covariation (less than 0.5) in the data through inclusion of randomization (Wang 2013). For use with the MRM command, each independent factor (IBD, IBE, and IBI) was transformed to distance matrices using the 'dist' function present in the stats R package. Values for IBD were determined from geographic coordinates collected with the genetic data, IBE and IBI values were extracted at sampling geographic points from species distribution models in the present, and stability maps, respectively. To generate stability maps we averaged the species distribution models from the present, Mid-Holocene, and LGM for each species. Using the averaged model gives us the relative stability of the environment over this time period, and using the 'extract' function, we sampled this stability surface at each sampling point.
The rational for each metrics inclusion are as follows: IBD is the isolation by Euclidean distance between two geographic points (Wright 1943); two individuals that are closer together should be more similar genetically than those that are not as close, as there is a greater chance of gene flow between two close individuals. IBE implies that large differences in the environment prevent movement across the landscape and therefore gene flow between individuals (Nosil 2004). Under IBI, movement across the landscape is affected by the relative stability of different regions. Where there is high instability, movement is restricted, preventing gene flow between individuals separated by unsuitable areas (Vasconcellos et al. 2019). Due to the possibility of strong correlation (greater than 0.5) between some predictors, correlations were checked for all variables. Those that were found to be significant using MMRR and were strongly correlated were checked using partial mantel tests to more accurately determine which predictors were important. MMRR also gives a value for the correlation coefficient (R 2 ) for each species. This value gives the amount of genetic variation explained by IBD, IBE, and IBI.

Random Forest
To evaluate whether some factors were predictive of species-specific responses to geographic and environmental factors across species, we constructed a Random Forest (RF) classifier. RF is a machine learning approach that uses multiple decision trees to predict the response based on many predictor variables (Liaw andWiener 2002, Biau 2012). Random forest samples the data for each decision tree with replacement and uses the unsampled data to test the model. This is used to build a confusion matrix for the prediction and calculate the out of bag error rates (OOB). Each variable's importance is determined by measuring the mean decrease in accuracy of the prediction after the removal of a given variable.
We designed different models that attempted to predict whether or not a species would have significant genetic structuring based on the categories used in the AMOVA and MMRR tests (Appalachian, Apalachicola, Coastal splits and Mississippi phylogeographic breaks; Ecoregion or Watershed; IBD, IBE, and IBI) as response variables. For predictor variables, we included organismal traits of body size, aquatic vs terrestrial, and generation length, and we included broad taxonomic groups ( Table 1) in an attempt to identify results being driven by phylogeny. Additionally, we included the mean climatic variables for each species across their sampling range. Finally, we included the proportion of North America determined via SDM analysis to be suitable for both current and LGM timeframes, and the change in proportional area between LGM and the present.

Data processing
The number of sequences per species ranged from three to 615 and the length ranged from 255 to 2395 base pairs (Table S2). The best fitted models of sequence evolution identified for each sequence alignment in either jModeltest or MrModeltest as well as the diversity and polymorphism calculations (i.e., Tajima D, Pi, θ) are recorded in Table S2 for lowest AIC by jmodeltest or mrmodeltest, and a narrowed list in Table S1 for analyses that couldn't use all possible models of substitution. Geographic data for individuals with associated genetic data showed widely distributed sampling across the eastern US, with the highest density in the south east (Fig. 1), suggesting our dataset has coverage adequate for testing our hypotheses. This was regardless of sampling method (exact location vs estimated location), although there is a higher density of estimated localities in northern reaches. There is also no apparent effect of sampling based on taxon type (Fig. 1). While angiosperms clearly had the widest range, the data do not suggest that taxa are unevenly sampled across regions.

Nucleotide Mapping
Nucleotide diversity for the taxa analyzed in this study is heterogeneously distributed across southeastern North America. While some regions contain high levels of nucleotide diversity, there is no evidence for a geographical trend in our final diversity map (Fig. 2). Moran's I was found to be close to 0 (0.037, p=0.32), indicating little positive spatial autocorrelation. This indicates that grid cells that are near one another in geographic space are not any more similar in their nucleotide diversity than would be expected by chance. An out of refugia hypothesis would predict that areas of refugia would have higher nucleotide diversity. We also compared this to rasters based on predictions of trends (glacial refugia, null and random), and none were found to be significant.

AMOVA
We used analysis of molecular variance to explore the influence of ecological regions and putative phylogeographic breaks on intraspecific diversity. Samples were partitioned according to regions (i.e., Ecoregion, Watershed) or by putative phylogeographic breaks (i.e., Coast, Appalachian Mountains, Apalachicola, Mississippi River). For example, if population genetic structuring results from historical breaks, we expect significant differentiation across phylogeographic breaks, while if genetic structure results from contemporary ecological barriers, we expect that ecoregions would be an important promotor of genetic variance. Ecoregion AMOVA analyses indicate that 15 (37%) species had significant proportion of the observed variation occurring within groups and that 4 (10%) species had significant proportion of the observed variation occurring among groups ( Fig. 3; Table S3). The results of our watershed AMOVA analyses for groups assigned by watersheds indicate that 14 (34%) species had significant proportion of the variation within group and three species (7%) had significant proportion of the observed variation occurring among groups (Table S4). An AMOVA that tested the impact of phylogeographic breaks on genetic structure was conducted in 4 analyses that were named after the putative physical barrier: Coast, Appalachian Mountains, Apalachicola, and Mississippi River. For our Coast AMOVA analyses for groups either along the coast of the Gulf of Mexico or the coast of the Atlantic Ocean indicate that 10 species (34%) had significant portion of the variation within groups and 7 species (17%) had significant proportion of the observed variation among groups ( Table S5). Results of our Appalachian AMOVA analyses for groups either on the west or east side of the Appalachian Mountains indicate that 7 species (17%) had significant portion of the variation within groups and one species (2%) had significant proportion of the observed variation among groups ( Table S6). The AMOVA analyses for groups either on the west or east side of the Apalachicola River indicate that 10 species (24%) had significant portion of the variation within groups and four species (10%) had significant proportion of the observed variation among groups (Table S7). Lastly, the Mississippi River AMOVA split show that six species (15%) had significant portion of the variation within groups and five species (12%) had significant proportion of the observed variation among groups ( Table S8). Results of all AMOVA analyses are summarized in Table 2.

Population assignment
The K-means procedure typically produced population assignments that agreed with visual examinations of BIC vs. K plots (Fig. S2). For seven species, we chose a K that differed from that identified via the automated  . Pie charts summarizing AMOVA results with a variety of different potential breaks/factors. Grey indicates species that could not be tested due to not being in multiple regions or on both sides of a break (NA). Orange are those species that were tested but were found to not be significant (NS). Species that were significant among breaks are in yellow. Blue gives species that were found to be significant within a break or region. procedure. The majority of species were assigned to two or three populations ( Table 3). Results of PERMANOVA suggested population assignments were efficacious. For 42 of the 53 species that were split into populations (K > 1), we found that population assignments explained significant portions of the genetic variation (p < 0.05, Table 3). Those with non-significant population assignments had either low sample sizes or low overall genetic variation. Fifteen species had significant PERMANOVA results but had beta-dispersion values ( Table 3) that suggested assigned groupings were not significantly differentiated. About half (27/53) of the species split into populations which had significant assignments that were not over-dispersed.

Species Distribution Modelling
We retained 31 species for species distribution modelling after removing those species with <25 unique occurrence records (Table S9). Output maps for each of these species display habitat suitability indices for   Table 3. Summary of population assignment results and assessment of population assignment utility. Shown for each species are the number of individuals, the model of sequence evolution, the final 'K' chosen, the P-value from the PERMANOVA, and the mean population membership probability. For each species, populations were assigned by calculating a distance matrix based on an appropriate model of sequence evolution, analyzing this matrix with PCoA, using the principal components to calculate successive K-means, and choosing an optimum K based on BIC vs. K plots (Fig. S1). Efficacy of population assignments were assessed via PERMANOVA and DAPC. Complete records found in Table S11. the models based on these data (Individual maps in Fig. S4). The output maps displaying the cumulative stable habitat (including combinations of major functional groups of Amphibians, Reptiles, Terrestrial, Terrestrial & Freshwater), that existed in both the current day and LGM predictions are in (Fig. 4)

Multiple Matrix Regression with Randomization
The MMRR required both a genetic distance matrix (requiring more than 3 individuals that exhibit genetic differences) and a species distribution model, generated previously. Using these criteria, we were able to include 28 species in this analysis. The multiple matrix regression highlighted the differences between species, with varying combinations of our independent factors (IBD, IBE, and IBI) being found significant (Fig. 5). Additionally, those variables that were correlated (>0.5) and were both significant, were checked using partial mantel tests. This analysis was able to explain between 3% and 86% of the variation in genetic distances using varying combinations of IBD, IBE and IBI, averaging 27% and a median of 14% variation explained. In short, IBD was found to be a significant factor in 71% of species, IBE in 29%, and IBI in 32% of species. Interestingly, 14% of species' genetic diversity was not significantly influenced by any of these factors. There were no apparent trends based on taxonomy or other characteristics of species in terms of which factors explained genetic variation in a given species.

Random Forest
Across all tests the random forest analysis was generally not effective at building a predictive model that could correctly assign species to the predetermined categories (Table S11), a result that may result from the small number of species present in our data matrix (Table S12). The use of the machine learning framework provides a check against overinterpretation of similar results in particular species.

Discussion
Data repurposing (Sidlauskas et al. 2010) provides a more comprehensive approach than meta-analysis for comparative phylogeography because data collected from species that share a geographic distribution can be analyzed and interpreted in a common framework. While the species examined here have all been investigated previously (e.g., Soltis 2000, Avise 2006, the specific contexts of these studies differ, which makes it difficult to evaluate common factors. Our results support the viewpoint that many forces combine to influence genetic diversity within species (e.g., Zhang et al. 2018). While this result is not unexpected, it is striking to observe in a comparative context across disparate taxa. Three factors (geographic separation, habitat stability, and barriers to dispersal) largely explain gross patterns of intraspecific genetic variation in the species analyzed here.
The first factor that contributes to intraspecific genetic structure is the physical separation of individuals. Genetic isolation by geographic distance (e.g., Wright, 1943) is statistically significant in all but five cases (82%; Fig. 5), and these species generally contain fewer samples, which may lead to false negatives and is consistent with previously reported biases (Meirmans 2012, Pelletier andCarstens 2018). We identified contemporary climatic conditions as a significant factor in 8/28 (29%) species exhibited significant IBE (Fig. 5), a result that is broadly consistent with the AMOVAs that were conducted with species partitioned by 'ecoregions' (19/35 species, a higher proportion Figure 5. Results of multiple matrix regression across species. Shown for each species are the p-values and coefficients for three tests (IBD, IBE, IBI) with significant results with an *. Full detailed numbers can be found in Table S11. than were significantly partitioned by 'watershed'; Table 2). Within many species, genetic diversity is correlated with differences in the current environment across the range of the species, as assumed by many landscape genetic investigations (Sexton et al. 2013).
Following previous work, we also explored the impact of the phylogeographic breaks that are expected to structure genetic diversity over deeper time scales. In a slight majority of species (18/35; 51%), genetic variance was significantly partitioned across putative biogeographic barriers in our AMOVA results ( Table 2). Since these barriers were derived from the literature on species included in this comparison set, this result is circular for some species (see Soltis et al. 2006 for these species). However, many of these same species also exhibited isolation by habitat instability, which also speaks to the relative importance of historical processes in structuring genetic diversity. Unlike Vasconcellos et al. (2019), we found that habitat instability over time does not appear to explain how intraspecific genetic diversity is distributed across the landscape. While 9/28 species exhibited significant isolation by instability, in nearly all of these cases, other factors (i.e., IBD or IBE) were also found to be significant. The nucleotide diversity map (Fig.2) supports this idea. If habitat stability were an important predictor, we would see an "out of refugia" pattern due to those areas being the most stable over long periods of time. Although the combined regression models explained a large (>50%) proportion of the variation in several species, only one of these (Xerula furfuracea) included IBI. Overall, this suggests that IBI is not an important contributor to population genetic structure, at least in this region. However, it may play a role in other regions that were more directly impacted by glaciation.
In most cases, either geographic distance or physical barriers were important in determining genetic differentiation; this is supported by both broad-scale phylogeographic and fine-scale landscape approaches. Despite the near-ubiquitous importance of physical barriers and geographic distance, the influence of environmental factors in the process of genetic differentiation should not be ignored. On the contrary, our results implicate a complex mixture of processes related to geographic distance, environment, and instability, which are responsible for affecting within-species genetic differentiation. A more concise view of these complexities may be attained by observing species that were in both the AMOVA and the MMRR analyses (see Table S13). Each species with a significant inter-population AMOVA result (7 species; Table S13) also had significant MMRR results. In only one of these cases was IBD the only significant MMRR result (Apalone mutica), supporting the role of processes other than geographic distance or geographic barriers in structuring genetic diversity. In several cases (11; Table S14), the MMRR analyses found support for an environmental (IBE), distance-related (IBD), and/or instability-related (IBI) factor without support for a phylogeographic barrier hypothesis. Clearly, geographic distance/phylogeographic breaks are important facets to intraspecific genetic differentiation, but an overreliance on these types of hypotheses would ignore other variables explaining this differentiation, and in turn could prevent the recognition of patterns that may exist between taxonomic groups of interest. These results highlight the need for the integration of both phylogeographic and landscape genetic approaches in investigations that seek to discover processes affecting genetic differentiation within species. They also suggest that sampling bias is a potential limitation. For example, data from some of the species used here were able to be included in only some analyses due to low sampling numbers in some regions.
The mixed results presented here offer the possibility for hypotheses that could be tested in future studies in this region. In closely related species, we could observe differences between both time scales influencing genetic structuring but also the factors present in the same time scale. To us, this indicates that we cannot simply apply understanding from one species and expect it to hold for another species, even a closely related one. Evolution is a complex process and it can be difficult to measure every feature that may be under selection, leading to genetic structuring, especially for non-model systems. For example, two species of frogs may be evolutionarily similar, exist in the same time and space, but have experienced vastly different evolutionary pressures leading to variations in the factors influencing their genetic structure. One of the two species may be slightly more impacted by environmental pressures that are hard to detect, so it may be affected by IBE where the other species may not be, but these types of relationships are hard to detect when only using one type of analysis.
Our results suggest that both broad historical and fine scale contemporary processes are likely to influence genetic diversity within a given species. Rissler (2016) suggested that landscape genetics and phylogeography were headed towards a conceptual unification due to advances in sequencing technology and analytical approaches. As observed here, a more complete understanding of the forces that influence genetic diversity might be gained through a conceptual unification of these two methods, particularly if data from a given species is analyzed using approaches drawn from each. Our study highlights the multiple processes that likely influence genetic structure in a given species and suggests that any investigation conducted from a single perspective (i.e., historical or contemporary; phylogeographic or landscape genetic) is likely to provide an incomplete understanding of the focal system.

Data Accessibility
All code used is freely available for download at https://github.com/jgwieringa/IBI_and_others and all sequencing data can be collected from the source manuscripts listed in Table 1. by the OSU University Fellowship. MRB is supported by National Science Foundation Graduate Research Fellowships Program #DGE-1343012. Some analyses were conducted using computational resources from the Ohio Supercomputer Center (PAA0202).

Supplementary Materials
The following materials are available as part of the online article from https://escholarship.org/uc/fb Table S1. Population assignment analyses using four methods. Table S2. Data collection for all species and estimated parameter values. Table S3. AMOVA results for ecoregion analyses. Table S4. AMOVA results for watershed analyses. Table S5. AMOVA results for coastal analyses. Table S6. AMOVA results for Appalachian analyses. Table S7. AMOVA results for Apalachicola analyses. Table S8. AMOVA results for Mississippi analyses. Table S9. Georeferenced localities for all individuals used in manuscript. Table S10. AUC evaluation for each species from the SDM analyses. Table S11. Detailed results from the MMR analysis. Table S12. Detailed results from the Random Forest analysis. Table S13. Data matrix used in the Random Forest analysis. Table S14. Summary of results from MMR and AMOVA. Figure S1. Graphical summary of data analysis strategy. Figure S2. BIC vs K plots for all species giving the recommended number of K for each species. Figure S3. PCOA visualization for population groupings. Figure S4. Species distribution models for all species in present and last glacial maximum climates.
Appendix S1. Supplementary bibliography. Appendix S2. Analysis pipeline scripts and associated readme files.