Frontiers of Biogeography Do genetically informed distribution models improve range predictions in past climates? A case study with balsam poplar

Species distribution models (SDMs) are one of the most widely used approaches to predict changes in habitat suitability in response to climate change. However, as typically implemented, SDMs treat species as genetically uniform throughout their ranges and thereby ignore potentially important genetic differences between populations. While numerous studies have used SDMs to model genetically based subgroupings within species, the ability of such models to be transferred to new times has rarely been evaluated. Here, we used standard and genetically informed distribution models (gSDMs) to predict the future and past range of balsam poplar ( Populus balsamifera L.). We then assessed model transferability of standard SDMs and gSDMs using balsam poplar fossil pollen and macrofossil occurrences. In general, standard and gSDMs performed similarly through time, with both predicting a northward expanding range from refugia as glaciers receded over the past 22 ky BP and declining suitable area in future climates. Both standard and gSDMs showed moderate abilities to distinguish balsam poplar fossils from pseudo-absences but tended to predict lower suitability at fossil sites during the Pleistocene-Holocene transition. Although gSDMs applied to balsam poplar did not prove more transferable than standard SDMs, they provided numerous unique insights, such as the change in suitable area of genetic clusters through time and potential refugial locations. We argue more research is needed to determine which species may benefit most from the gSDM approach and to test gSDMs with temporally or spatially independent occurrences, as is often recommended for standard SDMs.

• We found little difference in the performance of genetically informed and standard distribution models, but genetically informed models offer insights not available from standard models, including the past and future trajectories of different genetic clusters in response to climate change.
• While more work is needed to determine which species benefit most from integrating genetic information to predict their future and past distributions, treating species as being comprised of genetically uniform populations ignores potentially relevant biological variation which may obscure important range dynamics in future climates.

Abstract
Species distribution models (SDMs) are one of the most widely used approaches to predict changes in habitat suitability in response to climate change. However, as typically implemented, SDMs treat species as genetically uniform throughout their ranges and thereby ignore potentially important genetic differences between populations. While numerous studies have used SDMs to model genetically based subgroupings within species, the ability of such models to be transferred to new times has rarely been evaluated. Here, we used standard and genetically informed distribution models (gSDMs) to predict the future and past range of balsam poplar (Populus balsamifera L.). We then assessed model transferability of standard SDMs and gSDMs using balsam poplar fossil pollen and macrofossil occurrences. In general, standard and gSDMs performed similarly through time, with both predicting a northward expanding range from refugia as glaciers receded over the past 22 ky BP and declining suitable area in future climates. Both standard and gSDMs showed moderate abilities to distinguish balsam poplar fossils from pseudo-absences but tended to predict lower suitability at fossil sites during the Pleistocene-Holocene transition. Although gSDMs applied to balsam poplar did not prove more transferable than standard SDMs, they provided numerous unique insights, such as the change in suitable area of genetic clusters through time and potential refugial locations.
We argue more research is needed to determine which species may benefit most from the gSDM approach and to test gSDMs with temporally or spatially independent occurrences, as is often recommended for standard SDMs.

Introduction
Geographic shifts in species ranges are a commonly predicted biotic response to climate change (Chen et al., 2011;Fei et al., 2017;Hickling et al., 2006;Parmesan, 2006;Scheffers et al., 2016) and are expected to alter ecosystems at multiple levels of biological organization (Montoya & Raffaelli, 2010;Schweiger et al., 2008;Walther, 2010). At the species-level, shifting geographic ranges may result in loss of range area, and extirpation of populations especially at the trailing range edge. In turn, loss of edge populations could reduce genetic diversity and adaptive capacity within species, with these losses being most pronounced if trailing edge populations are adapted to marginal environmental conditions within the range and/or harbor unique alleles not found elsewhere in the range (Alsos et al., 2012;Hampe & Petit, 2005). Accurate predictions of species range shifts, hence, are needed to understand where populations may be most at risk of extirpation and the associated potential loss of genetic diversity.
Species distribution models (SDMs) are among the most common approaches to predict species range shifts. SDMs, however, make multiple simplifying assumptions that can affect their performance and biological realism (Elith & Leathwick, 2009;Franklin, 2010;Wiens et al., 2009). For instance, SDMs typically implicitly assume genetic uniformity throughout the range (Fitzpatrick & Keller, 2015;Gotelli & Stanton-Geddes, 2015). The assumption of genetic uniformity can be problematic when species are structured into genetic clusters, encompass multiple lineages/ subspecies, or when populations are locally adapted to climate -all characteristics of many plant species (Leimu & Fischer, 2008;Savolainen et al., 2013).
The primary approach to accommodating genetic population structure in SDMs is by fitting individual models for each biologically relevant (e.g., genetic, morphological) subunit within a species range, and combining the individual model predictions into a single composite prediction for the entire species (i.e., genetically informed SDMs or gSDMs). The gSDM approach can be particularly beneficial when intraspecific groups (e.g., genetic clusters, subspecies, etc.) are differentiated along environmental gradients and may better fulfill the assumption of genetic uniformity when fitting models. While the gSDM approach has shown promising results when compared to models fit to the species as a whole (e.g., Ikeda et al., 2017;Marcer et al., 2016;Oney et al., 2013;Pearman et al., 2010), it is often unclear if gSDMs are better able to predict responses to climate change, compared to standard SDMs (Peterson et al., 2019). Assessing how well gSDMs can be transferred to time periods not used in model training is especially important given that a major goal of gSDMs studies is to predict species range shifts in response to future climates.
While direct evaluation of SDM-based forecasts decades into the future may not be possible, past climate simulations in concert with fossil data offer opportunities to test transferability of SDMs to new time periods (Maguire et al., 2016). If gSDM hindcasts have a greater ability to predict aspects of past distributions than standard SDMs, gSDMs may similarly be more reliable in future climates (though likely partially contingent on the divergence time of clusters being modelled, notably deep-time phylogenetic differences vs. more recent adaptation differences). While future climates may become more novel than past climates (compared to current climate), requiring greater model extrapolation (Fitzpatrick et al., 2018), pairing past climate simulations with fossil records offer one of the few ways to validate the temporal transferability of SDMs with independent data. Furthermore, projecting gSDMs to past climates could offer numerous insights not available from standard SDMs, such as where different genetic clusters may have originated on the landscape and potential migration routes used to fill the current range.
In this study, we used fossil pollen and macrofossil occurrence records to assess the ability of standard and gSDMs to predict changes in the distribution of balsam poplar (Populus balsamifera L.) since the Last Glacial Maximum (LGM). Because fossil records were not used during model calibration, they provide an independent test of the transferability of standard and gSDMs to new time periods. We also used the distribution models to estimate how the size of balsam poplar's range has changed since the LGM, and how it may change in the future. We show that while gSDMs did not enhance transferability relative to standard SDMs, they offered numerous insights not available from standard SDMs.

Genetic clusters
Several recent genetic studies have shown balsam poplar to be structured into three genetic clusters -one in the eastern part of the range, and a gradient of two clusters in the northern and central parts of the range (Keller et al., 2010;Meirmans et al., 2017). To create gSDMs, we used admixture proportions previously estimated from two different single-nucleotide polymorphism (SNP) datasets (Chhatre et al., 2019;Gougherty et al., 2020;Keller et al., 2010), which together covered 85 populations throughout the range of balsam poplar (Fig. S1). To ensure the SNP datasets were comparable, a minor allele frequency filter was applied to the more extensive SNP dataset (Chhatre et al., 2019) to match the sampling frequency of the less extensive dataset (Keller et al., 2010) (described in Gougherty et al., 2020) -which helped to ensure that potential differences in population structure between the two datasets were not due to differences in sampling density. While the number of SNPs and SNP loci were not the same between the two datasets, the admixture analyses on each dataset identified the same geographic pattern of the three genetic clusters, which were the units used to combine the sampling across datasets. Populations were assigned to the cluster with the highest average admixture coefficient for individuals within the population, resulting in 11, 62, and 12 populations in the northern, central, and eastern clusters respectively.

Balsam poplar occurrence records
Because each cluster had relatively few populations (i.e., fewer than needed to create robust distribution models), we followed methods used in other studies (e.g., Ikeda et al., 2017;Oney et al., 2013) and supplemented our field sampled population locations with occurrences from other sources, including the Global Biodiversity Information Facility (Gbif.org, 2019), forest inventory plots for the US and Canada (Gillis et al., 2005;Woudenberg et al., 2010), and other recent field sampling efforts (Chhatre et al., 2019;Elmore et al., 2017). These supplemental occurrences were assigned to the cluster of the nearest sampled population. Assigning genetic clusters to occurrences lacking genetic information introduces uncertainty in the distribution of the clusters, in particular near cluster margins where populations may be admixed. However, because clusters were spatially structured and admixture coefficients were spatially autocorrelated over hundreds of kilometers (see Results), this approach should provide a reasonable approximation of cluster affiliations. Because of strong spatial and climatic biases in the occurrences, we thinned occurrences in both geographic and environmental space. To do so, first, we selected one occurrence per climate grid cell (0.5 arc degree resolution, see below). Next, we extracted climate (variables described below) at the remaining occurrence locations and, to reduce the dimensionality of the data, conducted a principal components analysis on the climate variables. Finally, we plotted the scores for the first two components over a 0.2 resolution grid and we randomly selected one occurrence per grid cell. This process helped ensure that particular climates in balsam poplar's range were not overrepresented when fitting the models. Of the 4,173 original occurrence records, 475 remained after thinning. Of the 475 occurrences, 149 were assigned to the northern cluster, 274 to the central cluster, and 52 to the eastern cluster.
We used pollen and macrofossil records retrieved from the Neotoma paleoecological database (www. neotomadb.org; Goring et al., 2015, accessed March 2022 and the published literature (Mann et al., 2010) to evaluate models. Although pollen often cannot be identified below the genus-level, some researchers have used morphological differences in pollen of Populus species to identify pollen to balsam poplar (as in Brubaker et al., 1983Brubaker et al., , 2005Cwynar & Spear, 1991;Edwards et al., 1985 and various entries in the North American Pollen Database). We used only pollen identified as Populus balsamifera, but not Populus balsamifera-type or undifferentiated Populus pollen in our analyses. We assigned pollen and macrofossils to the nearest time period of the climate layers (±250 years; see below), based on their calibrated dates before present. In total, we included 291 site × climate layer fossil records in our analyses. It is important to note that the pollen record is incomplete, and sensitive to false absences (e.g., due to pollen not being preserved) and false presences (e.g., due to pollen being transported away from source populations), so precise pollen locations should be interpreted with caution.

Climate simulations
We used climate simulations from Lorenz et al. (2016) to hindcast and forecast our distribution models. The Lorenz et al. (2016) data include downscaled and debiased climate simulations for North America spanning from 22 ky BP to present in 500-year increments, and from present to 2100 in decadal increments, all at a resolution of 0.5 arc degrees. We parameterized the distribution models using five climate variables that lacked strong correlation (|r| < 0.75) with other variables: summer and winter mean temperature and precipitation, annual precipitation variability, and average evapotranspiration ratio (actual/potential evapotranspiration). We hindcasted distribution models to 22 ky BP in 500-year increments, and projected to four future periods (2030,2050,2070,2090) using simulations from twelve global circulation models (GCMs) and two emission scenarios (RCP 4.5 and 8.5).

SDM calibration and evaluation
We fit species distribution models for four entities: one each for the eastern, central, and northern clusters, and another for the species as a whole.
To provide a direct comparison to the species-wide model, we also combined the geographic predictions of the three individual cluster models into a single composite prediction (described below). We used the biomod2 package in R to create ensemble SDMs (Thuiller et al., 2009) composed of six algorithms, including generalized linear models, boosted regression trees, generalized additive models, flexible discriminant analysis, multiple adaptive regression splines, and random forest. We fit each algorithm with two replicates of five-fold cross evaluation (where 80% of data is used to train the models, and 20% is used to test the models, iterated 5 times, run twice) for a total of 60 models in each ensemble.
We created ensemble models separately for the northern, central, and eastern genetic clusters and for the species as whole. Because we lacked true absences for balsam poplar, 1,000 pseudo-absence points were selected from across North America, and were selected at least 2° from any occurrences used in model training (Barbet-Massin et al., 2012). Although the optimal number of pseudo-absences can vary by modelling algorithm, selecting 1,000 pseudo-absences helped streamline our model fitting and produced robust models (see Results). Pseudo-absences were selected independently for the species and eastern, central, and northern clusters in order to capture the individual climatic niches of each cluster and species as a whole. Pseudo-absences and presences were given equal weight in each of the models to ensure pseudo-absences did not have an outsized impact when fewer occurrences were used to train cluster models (compared to the species-wide model).
Because of a lack of consensus in the best approach to evaluate discriminatory ability of SDMs and limitations to any single statistic (e.g., sensitivity to prevalence and spatial extent, weighting of omission/commission errors, etc.; Leroy et al., 2018;Lobo et al., 2008), we used two commonly-used statistics to evaluate predictions of testing data: true skill statistic (TSS; Allouche et al., 2006) and area under the receiver operating characteristic curve (AUC-ROC). We used a committee averaging approach to create the ensemble predictions. To do so, we averaged binary predictions created using the threshold that maximized TSS across all models with TSS scores greater than 0.7. The resultant continuous map illustrates the proportion of models that predict presence of a genetic cluster or the species. To create binary maps of the ensemble predictions, we then applied a threshold of 0.5 to the committee averaged maps -illustrating areas where at least 50% of models predict presence of a genetic cluster or the species.
To compare the ability of standard SDMs and gSDMs to predict balsam poplar fossils in past climates, we first combined our individual cluster predictions into a single composite prediction. The composite prediction was calculated as the probability of at least one genetic cluster being present (Pearman et al., 2010), as given by: where P(x) is the probability of at least one genetic cluster being present, and P(x i ) is the probability of i th genetic cluster. Next, we extracted climatic suitability at fossil locations from both the composite predictions of the gSDM, and the standard SDM and from an equivalent number of pseudo-absence points from unglaciated parts of the landscape. Because most time periods had relatively few fossil records (Populus is often "palynologically silent"; Godwin, 1934;Pedersen et al., 2016), we calculated AUC and TSS for the fossils pooled over all time periods. We also assessed model extrapolation by quantifying the relationship between suitability and climatic novelty at fossil occurrences. This analysis informed whether suitability declined as fossil occurrences become more climatically distant from the climate used to train models. We quantified climate novelty using Mahalanobis distance, a scaleinvariant multidimensional distance metric, using the mahalanobis function in R (R Core Team, 2019). Mahalanobis distance was calculated from the average current North American climate to climate extracted at fossil sites. Distances were transformed to probabilities using the lower tail of a chi-square distributionillustrating the probability of climate at fossil sites falling outside current North American climate. We assessed variable importance for each ensemble distribution model to determine whether genetic clusters responded differently to climate, and whether important variables differed from the species-wide model. To do so, we used variable importance metrics from biomod2. Briefly, variable importance was calculated by testing the correlation between model predictions when a single variable is permuted, and when it is not, and then subtracting the value from 1.0. A high correlation between model predictions when a variable is permuted and when it is not, indicates a variable has a small effect on the model prediction, and hence, is not very important in the model. For each of the 60 models, each variable was permuted ten times, for a total of 600 permutations per variable. Variable importance was averaged over each of the models in the ensemble.

Analyses
We tested for multivariate spatial autocorrelation in admixture coefficients among the 85 populations using the mpmcorrelogram package (Matesanz et al., 2011) in R. Specifically, we quantified the relationship between the geographic distance between population locations and Euclidean distance between admixture proportion for each population. Pearson's correlation was calculated in 100 km bins up to 3,000 km (approximately half the maximum distance between populations). Significance was determined using 999 permutations, and a progressive Bonferroni correction to p-values. We also tested how well predicted suitability of the three clusters in current climate related to observed admixture at the 85 populations. To do so, we extracted suitability for each cluster model at the population locations and standardized the values to sum to 1 by dividing the suitability for each cluster by the summed suitability (because admixture proportions similarly sum to 1).
To assess past and future range dynamics of the genetic clusters, we quantified how the suitable area of each cluster varied over time. To calculate the area of suitable climate for hindcasts, continuous predictions were converted to binary (1/0) maps using the threshold that maximized TSS. Because each time-period included 60 predictions, a map cell was considered suitable if the majority (> 50%) of the binary models found it suitable (i.e., equal to 1.0). We repeated this procedure for future climate projections but averaged over the twelve GCMs. Once a composite binary map was calculated, the suitable area was calculated using the 'area' function in the raster package in R.

Evaluation statistics using cross-validation
Each of the ensemble models, whether fit species-wide or to individual genetic clusters, had good ability to discriminate between presences and pseudo-absences in current climate (Fig. 1). Across the 60 models in each of the ensembles (i.e., northern, central, eastern, and species-wide models), average AUC was greater than 0.90 and average TSS was above 0.80. There were, however, significant differences in evaluation statistics across the models. The specieswide model had the lowest overall TSS and AUC. The central cluster models were not significantly different from the species-wide models for either TSS or AUC (p > 0.05) and the northern cluster models were not significantly different than the central cluster.
The eastern cluster, in contrast, had the highest TSS.

Evaluation statistics using fossil records
Balsam poplar fossils tended to occur in areas where both the aggregated gSDM (i.e., combination of predictions from the northern, central, eastern Frontiers of Biogeography 2022, 14.3, e56931 © the authors, CC-BY 4.0 license 5 models) and standard SDM predicted high climatic suitability. The occurrence of fossils tended to track the receding ice sheet northward, often very near the southern margin of the glacier. Fossils extended nearly the entire length of balsam poplar's transcontinental range -from Beringia to the western Great Lakes to Atlantic Canada (Fig. 2). AUC scores, calculated by pooling fossil suitability over all time periods, were similar among gSDM and standard SDM (0.82 and 0.81 respectively), as was TSS (gSDM: 0.54, standard: 0.58). Despite moderately high evaluation statistics, fossil suitability tended to decline between 10 and 15 ky BP, for both the composite and standard models. Many fossils between 10-15 ky BP occurred in eastern North America along the southern edge of the receding Laurentide glacier. In contrast, older fossils in Beringia were consistently predicted to have high suitability. The relationship between climatic novelty and fossil suitability was weak, though significant, for the gSDM (r = -0.14, p = 0.02), and non-significant for the standard SDM (r = 0.02, p = 0.71) -indicating fossils in novel climates (compared to contemporary climates) tended to have marginally lower gSDM suitability than fossils in more analogous climates.

Predicting admixture
Admixture proportions were autocorrelated across the 85 populations over hundreds of kilometers. Multivariate spatial autocorrelation analyses showed admixture coefficients were positively correlated up to a maximum of 1,500 km, and random or negatively correlated at longer distances. SDMs for the three clusters had a good ability to predict cluster affiliation, as well as the relative mixing among clusters within populations. Generally, the climatic suitability of the northern, central, and eastern clusters at population locations was proportional to average admixture coefficients within populations (Fig. S2).

Variable importance
Variable importance differed between the gSDMs and standard SDMs (Figs. 3, 4). The most important variables in the species-wide model were each related to temperature, specifically average winter temperature, while variables related to precipitation (summer and winter precipitation and precipitation variability) had relatively low importance. Temperature variables were similarly important to each of the cluster models, in particular average summer temperature. The eastern cluster model was a notable exception and suggested high importance of annual average evapotranspiration ratio, which tended to be amongst the lowest ranking variables in the other models.

Change in area of suitable climate
The availability of climatically suitable area varied among the three genomic clusters through time, but some consistencies did emerge (Figs. 5, S3). In general, each cluster increased in area over the past 22 ky BP. The suitable area of each cluster at 22 ky BP ranged from 25 -75% smaller than the current suitable area, while the species as a whole (aggregated gSDM and standard SDM) was around 50% smaller than the current suitable area. As the glaciers receded in northern North America, each cluster was predicted to have expanded its range as it shifted northward (or eastward in the case of the northern cluster in Beringia), until eventually filling its contemporary range by ~7.5 ky BP (Fig. 2). The central cluster exhibited the greatest relative increase in suitable area over the past 22 ky BP, and by 12 ky BP had the greatest absolute available area among the three clusters. The eastern cluster similarly exhibited a gradual increase in suitable area over the past 22 ky BP but maintained among the lowest absolute area from 22 ky BP to present. In contrast, the northern cluster had the greatest absolute suitable area at 22 ky BP, maintaining approximately 75% of its current range. The suitable area available to the northern cluster oscillated at the end of the Pleistocene, before steadily increasing to its current suitable area over the past 10 ky BP. Suitable area of each of the genetic clusters was predicted to decline by 2090 (Fig. 5, S3). The central cluster is predicted to see a modest decrease in suitable area by 2030 (~15%), then maintain a stable suitable area through the end of the 21 st century (Fig. 5). Over the coming decades, the central cluster is predicted to gradually shift northward including into Alaska, where it currently has relatively low suitability (Fig. 2a, b). In contrast, the eastern cluster is predicted to decline by more than 25% of its current suitable area by 2030, and further decline by the end of the century. The suitable area for the northern cluster is projected to exhibit a different future trend. By 2030, the northern cluster is predicted to increase in suitable area by nearly 50%, followed by a steady decline through the end of the century, ultimately resulting in a modest (<5%) net loss in area by 2090 (Fig. 5). The modest loss in suitable area is the result of the northernmost parts of North America becoming climatically suitable for the northern cluster by 2030 and a declining northern landmass as the shift northward continues in later decades.

Discussion
We performed one of the few comparative evaluations of the transferability of gSDMs and standard SDMs using an independent set of fossil records across multiple time steps. We found that predictions from standard and gSDMs were similar through time, but, in contrast to other studies, gSDMs did not exhibit superior performance as compared to standard SDMs. Nonetheless, gSDMs provided unique information about balsam poplar's past and future range dynamics, such as changes in suitable area of genetic clusters through time and potential refugial locations during the LGM. More work is needed, however, to determine why the gSDM approach improves performance in some species (Ikeda et al., 2017), but not others.

Model comparison
We found when models were tested with an independent set of occurrences excluded from model training, the gSDMs did not perform better than standard SDMs. The lack of substantial improvement with gSDMs stands in contrast to other studies that have reported a multi-fold improvement in model accuracy with gSDMs when compared to standard SDMs (e.g., Ikeda et al., 2017). Most studies that compare standard and gSDMs, however, limit model training and testing to current climate (Ikeda et al., 2017;Marcer et al., 2016;Oney et al., 2013), and hence do not inform whether gSDMs are more transferable than standard SDMs to time periods outside those used to train models. This is an especially important distinction as a major motivation of integrating SDMs with genetic information is to improve predictions of species responses to future climate change through the accommodation of genetic differentiation within species (Ikeda et al., 2017;Maguire et al., 2018;Oney et al., 2013). Our findings suggest that to fully understand differences in performance of gSDMs and SDMs, model testing ideally should be conducted on independent data, including from areas or time periods not used to train models, as is recommended for standard SDM implementations (Araújo et al., 2019). Testing models on occurrences from multiple time periods provides a true test of model transferability and extrapolation to novel climates, which is especially important when predicting to future climates (Fitzpatrick et al., 2018).
There are numerous reasons why the gSDM approach may not have improved performance compared to standard SDMs when modeling balsam poplar. First, gSDMs assume that populations within clusters are not only identical and therefore respond similarly to climate but also that they are potentially differentiated from populations in other clustersinsofar that each cluster is represented by a separate, independent model. While the assumption of niche differentiation may be adequate for higher-level taxa or even subspecies that are strongly differentiated along climatic gradients, modeling the niche of genetic clusters has limitations in the presence of admixture, such as when individuals have mixed ancestries or when populations include individuals from multiple ancestries. In balsam poplar, populations in the western portion of the range show mixed ancestry involving the central and northern clusters that may actually reflect a more continuous population genetic structure of isolation by distance as opposed to discrete, spatially separated clusters (Keller et al., 2010;Meirmans et al., 2017). Modeling the northern and central clusters as distinct entities ignores this continuous variation and therefore gSDMs based on these two clusters may be quantifying a mixed signal that is not entirely representative of hypothetically "pure" clusters, which in turn could introduce some uncertainty into the models (e.g., potentially affecting parameter estimates or model structure). Furthermore, climate is likely not the sole factor limiting the distribution of genetic clusters, and climate could be aligned with population structure for a variety of reasons. Geographic isolation, limited dispersal/gene flow (Lecocq et al., 2019), or historical events (e.g., bottlenecks, founding events, genomic barriers) are examples of non-climatic factors that may affect the current distribution of clusters and hence influence the outcomes of correlative, climate-based SDMs. While using genotypic variants or trait combinations that represent adaptive variation (as opposed to neutral population structure) to subdivide the climatic niches of a species could help ensure clusters better represent functional climatic differences, clusters of different ancestry groups identified from neutral markers may represent better approximations of historically isolated groups that have inhabited, and possibly adapted to, local abiotic environments within the range and therefore occupy a unique niche space. Despite its limitations, neutral population variation remains among the most common way to split species climatic niches into multiple subsets (Smith et al., 2019), but explicitly accounting for adaptive variation and/or trait variation could provide one way to advance the gSDM approach.
It remains an open question as to when modeling subgroups within species will prove advantageous. Just as standard SDMs assume no intraspecific variability in the niche requirements within the species as a whole, gSDMs built on individual genetic subgroupings require a similar, problematic assumption of genetic uniformity within each genetic cluster and thereby also ignore fine scale population-level local adaptation (Fitzpatrick & Keller 2015). Most gSDM studies, including this one, consider a single species, which makes it difficult to generalize results, especially since gSDM studies utilize a variety of modeling techniques and evaluation methods. Some, though not all, gSDM studies that report superior performance over SDMs have been conducted on species with disjunct ranges. In these cases, genetic clusters may be more strongly differentiated due to geographic isolation and therefore better fulfill the gSDM assumption of complete differentiation. Balsam poplar, in contrast, despite being structured into multiple genetic clusters, has a large continuous range with few impediments to gene flow, which may limit any advantage of separating the range into discrete units. Future work focused on (i) when it is most advantageous to split a species ecological niche into multiple, genetically-informed subsets, (ii) the minimum/maximum number of subunits required to best capture the species entire niche and (iii) whether any functional or geographic traits affect gSDM performance (as has been done for standard SDMs, e.g., Hanspach et al., 2010;Syphard & Franklin, 2010) could help improve the biological realism and, potentially, the performance of the gSDM approach.
Reducing the need to assign genetic information to occurrence locations lacking genetic samples could also improve the gSDM approach. Gotelli & Stanton-Geddes (2015), for instance, advocate the gSDM approach on distinct clusters but propose weighting each population by the relative ancestry proportions rather than assuming a discrete assignment to an individual cluster. Similarly, Martínez-Minaya et al. (2019) have developed an approach to model the spatial distribution of genetic clusters which could reduce the need to assign occurrences to a single cluster. The gSDM approach could also benefit from using techniques designed for rare species with few known occurrences (Breiner et al., 2015;Shcheglovitova & Anderson, 2013), as the number of populations in genetic studies assigned to any individual cluster is often relatively small. Alternatively, explicitly modeling gene-climate relationships associated with adaptive variation (such as the approaches described in Fitzpatrick & Keller, 2015;Gougherty et al., 2021) could be a useful alternative to understanding the maladaptive impacts of climate change, without artificially dividing a species range into discrete units.

Balsam poplar's past distribution & potential refugia
While the gSDMs did not exhibit superior model performance -at least as measured by their ability to distinguish presences from pseudo-absences -when paired with fossil data, gSDMs revealed numerous insights into balsam poplar's range dynamics that standard SDMs did not. Despite the pollen record being undeniably incomplete, and the presence and absence of fossil pollen needing to be interpreted with caution, gSDMs and fossil records both suggest that following glacial retreat, balsam poplar's migration northward was broad-fronted. By 13 ky BP suitable climate and fossil records extended from Beringia to the center of the current range (Minnesota, Wisconsin) and to the easternmost part of the current range in Nova Scotia. gSDMs and fossil records both point to the possibility that balsam poplar filled its contemporary range from multiple refugia -specifically in Beringia and south of the Cordilleran and Laurentide ice sheets (Fig. 2). The suitable area south of the ice sheets during the LGM was nearly continuous from the Rocky Mountains to the Atlantic Coast and was likely the primary refugium for balsam poplar as each of the three clusters had suitable area in this region. This is consistent with Keller et al. (2010), who suggested a refugium in the Rockies based on a signature of range expansion and the phylogeographic relationship between the three clusters. While genetic studies to date have not detected a signature of a separate northern refugium or expansion of balsam poplar out of Beringia (Breen et al., 2012;Keller et al., 2010), both SDMs and fossils suggest balsam poplar was present in Beringia during the LGM. Others have also reported the presence of balsam poplar fossil pollen and macrofossils in Beringia. Brubaker et al. (2005), for instance, reported Populus pollen (the predominant species, they suggest, being balsam poplar) from at least 20 ky BP and an increase in occurrence and abundance of Populus pollen after 15 ky BP. They note that the increase in Populus pollen abundance is unlikely to be the result of migration from outside Beringia as an ice-free corridor to the southern ice margin had not yet opened. Our SDMs, however, suggest that as soon as the corridor was ice free (~13 ky BP) climate in the region was suitable for balsam poplar. Consistent with these predictions, recent work has shown Populus species, based on pollen and eDNA data, were likely among the dominant tree species in the corridor (MacDonald & McLeod, 1996;Pedersen et al., 2016). This lends to the possibility that populations south of the ice sheet came into contact with Beringial populations soon after the ice-free corridor opened. Long and continued contact between populations north and south of the ice sheet could have eroded any distinctive markers that would have been emblematic of a distinctive northern refugium and could explain the gradient in cluster affiliation throughout the western part of the range. These sorts of insights are not available from SDMs fit to the species as a whole.

Future trajectories
Like studies of other North American plant species, we found the range of balsam poplar is predicted to shift northward in future climates (e.g., Iverson et al., 2008;Morin et al., 2008;Oney et al., 2013). The largest increases in suitability tended to occur in the northernmost portions of North America, where, interestingly, recent studies report an expansion of balsam poplar's distribution and an increase in its abundance (Roland et al., 2016). Roland et al. (2016) suggest the expanding distribution and increasing abundance is primarily being driven by warming summer temperatures, which we also found to be the most important climatic driver for the northern cluster. The expansion of balsam poplar along its northern edge is likely facilitated by its ability to rapidly reach reproductive maturity, produce an ample annual seed crop, and disperse its seeds long distances. Although we cannot be sure whether balsam poplar will be able to track its suitable climate throughout its range, these findings illustrate balsam poplar's sensitivity to climate change and the need for accurate models to understand its responses.

Author Contributions
AVG and MCF conceived the study. SRK processed and provided the genetic information. AVG analyzed the data and led the writing, with contributions and discussion from all authors.

Supplementary Material
The following materials are available as part of the online article at https://escholarship.org/uc/fb Figure S1. Average cluster affiliation for 85 balsam poplar populations used in distribution models. Populations are ordered from west to east. Figure S2. Predicted suitability and average admixture proportions of 85 balsam poplar populations across three genetic clusters. Figure S3. Change in the area of climatically suitable habitat through time of three balsam poplar genetic clusters and the entire species.