Frontiers of Biogeography A simulation-based method for selecting calibration areas for ecological niche models and species distribution models

Ecological niche models and species distribution models (ENM and SDM, respectively) are tools that have seen massive use and considerable improvement during the last twenty years. The choice of calibration areas for such models has strong effects on model outcomes and model interpretation, as well as on model transfer to distinct environmental settings. However, approaches to selecting these areas remain simple and/or unlinked to biological concepts. Such models should be calibrated within areas that the species of interest has explored throughout its recent history, the accessible area ( M ). In this paper, we provide a simulation approach for estimating a species’ M considering processes of dispersal, colonization, and extinction in constant current climate or glacial-interglacial climate change frameworks, implemented within a new R package we developed called grinnell. Using the avian genus Aphelocoma , we explored different parameterizations of our simulation, and compared them to current approaches for M selection, in terms of model performance and risk of extrapolation using the algorithm Maxent and mobility-oriented parity analyses. Model calibration exercises from all approaches resulted in at least one model meeting optimal performance criteria for each species; however, we noted high variability among taxa and M selection methods. More importantly, M hypotheses derived directly from simulations of key biological processes, rather than being based on simple proxies of those processes, and as such are better suited to erecting biologically appropriate contrasts in model calibration, and to characterizing the potential for model extrapolation more rigorously. Major factors in our simulations were environmental layer resolution, dispersal kernel characteristics, and the inclusion of a changing framework of climatic conditions. This contribution represents the first simulation-based method for selecting calibration areas for ENM and SDM, offering a quantitative approach to estimate the accessible area of a species while considering its dispersal ability, along with patterns of change in environmental suitability across space and time. and extent from those created with other methods. ! Although M areas derived from simulations are still hypotheses, they are closer to the regions that could have been explored by a species; comparisons of model outputs showed that these differences have effects in model prediction and interpretation.


Abstract
Ecological niche models and species distribution models (ENM and SDM, respectively) are tools that have seen massive use and considerable improvement during the last twenty years. The choice of calibration areas for such models has strong effects on model outcomes and model interpretation, as well as on model transfer to distinct environmental settings. However, approaches to selecting these areas remain simple and/or unlinked to biological concepts. Such models should be calibrated within areas that the species of interest has explored throughout its recent history, the accessible area (M). In this paper, we provide a simulation approach for estimating a species' M considering processes of dispersal, colonization, and extinction in constant current climate or glacial-interglacial climate change frameworks, implemented within a new R package we developed called grinnell. Using the avian genus Aphelocoma, we explored different parameterizations of our simulation, and compared them to current approaches for M selection, in terms of model performance and risk of extrapolation using the algorithm Maxent and mobility-oriented parity analyses. Model calibration exercises from all approaches resulted in at least one model meeting optimal performance criteria for each species; however, we noted high variability among taxa and M selection methods. More importantly, M hypotheses derived directly from simulations of key biological processes, rather than being based on simple proxies of those processes, and as such are better suited to erecting biologically appropriate contrasts in model calibration, and to characterizing the potential for model extrapolation more rigorously. Major factors in our simulations were environmental layer resolution, dispersal kernel characteristics, and the inclusion of a changing framework of climatic conditions. This contribution represents the first simulation-based method for selecting calibration areas for ENM and SDM, offering a quantitative approach to estimate the accessible area of a species while considering its dispersal ability, along with patterns of change in environmental suitability across space and time.

Introduction
Ecological niche models (ENM) and species distribution models (SDM) have become a popular technique in the emerging area of distributional ecology (Araújo et al. 2019), with publications based on these tools in evolutionary biology (e.g. Warren et al. 2008, Varela et al. 2011, Saupe et al. 2018, ecology (e.g. Anderson 2017, Zurell 2017, Osorio-Olvera et al. 2020, conservation biology (e.g. Guisan et al. 2013, Freeman et al. 2018, and public health (e.g. Peterson 2014, Quiner andNakazawa 2017). Many aspects of this methodology have seen intensive analysis, including algorithm selection (Qiao et al. 2015), design of model transfers (Werkowska et al. 2017, Yates et al. 2018, model calibration (Muscarella et al. 2014, Cobos et al. 2019a, and model complexity and assembly (Warren and Seifert 2011, Zhang et al. 2015, Galante et al. 2018. However, approaches for selection of areas across which models are calibrated have remained at least partially neglected. The size of the calibration area and the environmental space it contains (i.e., the background data used for calibration) have been acknowledged to influence ENM and SDM results (Giovanelli et al. 2010, Cooper and Soberón 2018. Anderson and Raza (2010) pointed out that choice of calibration area makes a considerable difference in modeling outcomes, and indicated that a smaller area closer to known occurrence points yielded better distributional predictions. Barve et al. (2011) demonstrated that calibration area choice matters, and linked a biological concept (i.e., the set of sites that has been accessible to the species over relevant time periods) and the area over which models should be calibrated. Barve et al. (2011) also outlined a method by which such an accessible area could be simulated; however, they presented only a simple sketch of that method, which until now has not been implemented fully. ENM and SDM inevitably involve some level of model transfer from areas and conditions that were actually sampled to broader areas and sets of conditions (Yates et al. 2018), even when models are interpreted across the model calibration area only. Therefore, obtaining more realistic calibration areas is critical to achieve better interpretations of models in regions where results solely derive from extrapolation. Soberón and Peterson (2005) provided a heuristic framework for distributional ecology, the BAM diagram, in which the occupied distribution of a species (GO) is the intersection of the area presenting habitable abiotic conditions (A), the area suitable in terms of biotic conditions (B), and the area accessible to the species over relevant time periods (M). Crucially, M is the set of sites (i.e., cells in a raster data layer) that the species has "explored" over its history, and as such, M represents the area formed by sites, with suitable or unsuitable environmental conditions, across which the species' responses to environments can be estimated.

Introduction to concepts
On the basis of the definition of M, Barve et al. (2011) argued that the set of environments manifested across the region, termed η(M), is the range of environments that the species has explored and tested. Thus, in effect, the algorithms that are used for niche modeling or distribution modeling compare and contrast the subsets of η(M) that the species is known to occupy [termed η (G+)] and the full breadth of η (M). It follows also that the accessible area (i.e., M) is the geographic area across which these contrasts should be developed.

Purpose
The accessible area should include all sites to which the species has had access through dispersal, a subset of which would be the sites that are appropriate for the species in terms of abiotic and biotic environments. However, Barve et al. (2011) showed that this accessible area is often broader than simply the sites presently accessible to the species via dispersal, in view of climate changes deeper in history that made other, more distant sites accessible to the species. This point is demonstrated by the well-known phenomenon of "sky islands," in which species have presently isolated populations at sites well south of the main distributional area that were colonized by the species during glacial periods (Heald 1951, Mayr and Phelps 1967, Masta 2000, Sekar and Karanth 2013. Therefore, when estimating accessible areas, one should consider how different barriers have facilitated or constrained the dispersal of species into novel regions. The purpose of this contribution is to provide a simulation-based methodology for estimating the accessible area of a species under biologically realistic assumptions, offering the possibility of incorporating relevant climate changes into this estimation. That is, we take the concept of the fundamental ecological niche of a species (NF) and the area A that it defines, along with the known sites where the species occurs (G+) as a starting point (Peterson and Soberón 2012). We then simulate processes of dispersal, colonization, and extinction of four species of Aphelocoma jays in the face of Holocene-Pleistocene climate change, and thereby simulate their past range dynamics. Crucially for estimating M, we keep track of the full set of sites (i.e., cells) that the species are simulated to have explored and accessed-this set of cells is a reasonable hypothesis of the accessible area, a region of likely dispersal during relevant periods of history. We compare metrics and outputs of ENM derived from simulated Ms against models in which Ms are drawn from current methods, to explore implications of using different calibration area hypotheses. We developed scripts to create these simulations in a new R package (i.e., grinnell;Machado-Stredel et al. 2020), and offer this methodology as a means of identifying and delimiting calibration areas for ENM and SDM.

Description of simulation
The method presented is an extension of the one outlined by Barve et al. (2011), which was based on a cellular automata simulation where dispersers move based on fixed rules (e.g. movement is allowed only to neighboring cells; Gardner 1970, Wolfram 1984. Here we take a probabilistic view of dispersal incorporating dispersal kernels, and simulate colonization and extinction events in a cell grid that has suitable conditions drawn from a preliminary estimate of the fundamental ecological niche. These processes are simulated several times in replicates that generate a summary of accessed cells. The inputs needed for the simulations are occurrence data for the species and raster layers of environmental conditions (Fig. 1).

Figure 1.
Initial data preparation and analysis for simulations. Solid margin blocks and arrows show the required steps for simulations in a stable framework of climatic conditions: 1A) Obtaining current principal components (PCA), 2) NF envelope modeling, and 3A) Geographic projection of suitability matrix S for the current climatic scenario. Dashed margin blocks and arrows refer to additional steps needed for simulations in a changing framework of climatic conditions: 1B) LGM variable transformation with current PCA loadings, 1C) Interpolations for transitional scenarios, and 3B) Geographic projections of suitability matrices (S1-ST) for all T climatic scenarios. In the interpolation step a simplified graph of mean temperature versus time is included, showing a glacial-interglacial cycle with four periods (interpolations are shown as vertical red dashed lines).
In each simulation replicate, a subset of occurrences is selected to set occupied cells in the grid as "starting" populations, from which dispersers will start moving and accessing new cells. These cells, of course, are not the actual populations from where the species begun to spread in the past, but instead represent potential populations from which an area of potential dispersal can be simulated during recent history. After a disperser moves to a new cell, the cell is considered accessed and is saved in memory (i.e., the result of the dispersal, not the individual, is stored). Cells are colonized by dispersers if the environmental conditions for maintaining populations are suitable (i.e., inside NF), with new dispersers moving from old and newly colonized cells in subsequent dispersal events. Thus, cells can be accessed regardless of their conditions, but the environment constrains the colonization of cells. Additionally, users can choose between a stable (i.e., current conditions are kept constant) or a changing framework of climatic conditions. In a changing framework, interpolations between two sets of climatic variables, one for current and one for past conditions, are generated to characterize several climatic scenarios. Ultimately, simulation replicates are summarized in an estimate of M that houses the most frequently accessed cells.

Time
Step 1A Step 2 Step 1B

Fundamental ecological niche estimation
To define the places to which the species can disperse and colonize, as well as those where it may reach but not establish sustainable populations, suitable and unsuitable areas need to be defined. An initial estimate of the fundamental ecological niche (NF) of the species is constructed with an ellipsoid envelope model (Green 1974, Soberón and using occurrence data and principal components that represent the current climate (Fig. 1). The ellipsoid is defined by the centroid and the covariance matrix. The limits of what is suitable and unsuitable are found using a chi-squared cumulative probability distribution on the Mahalanobis distance from the ellipsoid's centroid (i.e., MD; Etherington 2019), and imposing a suitability threshold (equivalent to E in Peterson et al. 2008; see also Nuñez-Penichet et al. 2021) to exclude extreme data (95% as default, Table 1). For each cell, MDij defines the suitability Sij as in a multivariate normal distribution: The S matrix is calculated for the present or, if needed, ST matrices are obtained for past and current climatic scenarios, being projected on the geographic domain of interest G (Fig. 1). It is worth noting that use of envelope models for approximating fundamental ecological niches is somewhat perilous if the data include erroneous points, as they may create overly broad ellipsoid envelopes, which places a premium on careful quality control and error assessment of the occurrence data prior to fitting ellipsoids. 1 Accessibility threshold Lower tail (percentage) of the frequency distribution of to be discarded when selecting the subset of most frequently accessed cells.

5
a Parameters used only in a changing framework of climatic conditions.

Preparation of climatic scenarios
First, the dimensionality of the current abiotic variables over G, which likely contains M, is reduced with a principal component analysis (PCA) to improve transferability, and the first three PCs are used to obtain an ellipsoid for NF that is assumed to be conserved ( Fig. 1; Peterson et al. 1999, Peterson 2011. Climate scenarios are defined as time intervals during which the grid has specific and constant climatic conditions that correspond to an associated suitability matrix. If the framework of climatic conditions is stable, the simulation occurs in one current climatic scenario that lasts the amount of dispersal events (t) set by the user (Table 1, Supplementary Fig.  S1), and during which only one suitability matrix S is used (Fig. 1). As mentioned before, the S matrix shows which cells are outside the NF estimate (i.e., Sij = 0), that is, any cell which MDij is beyond the ellipsoid's limit defined by the suitability threshold.
In a changing framework of climatic conditions, principal components for the LGM are obtained by transforming LGM variables using the current PCA loadings (Fig. 1). Then, a simple algorithm generates interpolations for transitional scenarios, incorporating them in a sequence of glacial-interglacial scenarios (i.e., a cycle; Fig. 1, Supplementary Fig. S1). Climatic scenarios for transitions between glacial and interglacial climates are obtained using a simple method in which each interpolation (Ii) consists of a weighted mean of conditions of the two scenarios, calculated as: The weight (WDi) is given by the relative distance in time (ranging 0-1) from the closest glacial scenario to the transitional scenario of interest ( Supplementary Fig. S1). In the simulations, a cycle starts in a period of transition between current-like ("interglacial") and glacial scenarios, passes a glacial period, continues with a transition between glacial and interglacial, and ends in a current period (i.e., a cycle has four periods, Fig. 1, Supplementary  Fig. S1). The number of T scenarios is defined by the parameters "total simulation period" and "scenario span", as well as the selected times for each of the four periods within a cycle: (1) transition from past current-like conditions to glacial conditions; (2) glacial conditions; (3) transition from glacial to current conditions; and (4) current conditions (Table 1, Fig. 1, Supplementary Fig. S1). In this framework, ST suitability matrices are created for each T climatic scenario. For instance, using default parameters in a changing framework, the simulation will be run in 30 suitability matrices, corresponding to 30 climatic scenarios, given that the scenario span is 1 ka and the total simulation period is 30 ka (i.e., 5 scenarios in each transition, 10 glacial and 10 interglacial scenarios; Table 1).

Simulation design
At the start of each simulation, two empty matrices are created with the same extent as G and S. The first of these matrices (C) will contain the frequency of cells that are colonized during a climatic scenario, with dispersal starting from a set of initial populations ( Fig. 2A). A dispersal movement consists of a disperser moving from a colonized cell in C to a cell saved in the second matrix D, which accumulates the frequency of cells that are accessed by dispersal (Fig. 2). Before the first dispersal event (i.e., t = 0) the set of occurrences inside NF is randomly subsampled (without replacement) to obtain N different occurrence subsets of 50%, which are used as initial populations in the matrices C and D of each replicate (Table 1). Cells containing initial populations are considered to have been both colonized in C(t=0) and accessed in D(t=0) (Fig. 2). Having an appropriate number of N replicates ensures that most occurrences inside NF are considered as potential initial populations at least in one of the 50% subsets (Supplementary Table S1).
Movements of dispersers are set in polar coordinates, where the radial distance (d) is drawn from two possible dispersal kernels, normal or log-normal (fat tailed), and the angle (d) is drawn from a random distribution ( Fig. 2A). The dispersal kernel mean is set at zero (i.e., average dispersers stay in a cell or move to neighboring ones), and the standard deviation is defined by the user depending on the resolution of G and the dispersal ability of the species.
Given d and d, the rounded coordinates of the cell that will be accessed are given by sine and cosine expressions as number of cells to move ( Fig. 2A).
At the beginning of the simulation, the first dispersal event (t = 1) starts from colonized cells Cij(t=0) and the result of the movements is recorded in cells of D(t=1). If the accessed cells are suitable, they are colonized and recorded in C(t=1) (Fig. 2B). Users can indicate if more than one disperser moves from each cell per dispersal event by changing the maximum number of dispersers (NdMax), a parameter that splits the suitability range into a number of intervals (e.g. NdMax = 2 gives two intervals: [0.0-0.5] and [0.5-1.0]). From each colonized cell Cij, dispersers move in each dispersal event based on the suitability Sij of that cell (e.g. given NdMax = 2, two dispersers move from Cij if Sij > 0.5, and one disperser moves if Sij ≤ 0.5; Fig. 2B). This feature is based on evidence that suggests that species abundances are inversely correlated with distance to the fundamental niche centroid (Martínez-Meyer et al. 2013, Osorio-Olvera et al. 2019, Osorio-Olvera et al. 2020, hence, more dispersers might move from cells with more individuals. As mentioned before, an accessed cell Dij is only colonized if suitable (i.e., Sij > 0, inside NF), and the colonization is recorded in C before the next dispersal event (Cij(t+1) = Cij(t) + 1; Fig. 2B).

Figure 2.
Simulation setting and workflow for the estimation of the accessible area M. A) Initial setting for one replicate before dispersal starts. Initial populations (black dots) are shown in the suitability (S) and colonization (C) matrices. Colonized cells in C and accessed cells in D are shown in gray and blue, respectively. The dispersal angle ( ) and distance (d) determine the position of a cell that will be accessed (dashed blue cell). B) Simulation workflow for a stable framework of climatic conditions in which dispersal starts from each colonized cell in C(t) (Step 1), the result of the movement is recorded in D(t+1) (Step 2), and cells with suitable conditions are colonized in C(t+1) (Step 3; unsuitable cells are shown as X's). The dashed margin block comprises all simulated processes for each replicate within one climatic scenario (current conditions). In step 1, cells with Sij > 0.5 send two dispersers and cells with Sij ≤ 0.5 send one disperser in each dispersal event, given NdMax = 2 (arrows in Step 2 show the result of the movements). Darker cells in C and D reflect larger absolute frequencies. After the last dispersal event t, a final D matrix is obtained in each replicate (Step 4), a mean matrix is calculated (Step 5), and the upper 95% distribution of is made binary to obtain M (Step 6).
In a changing framework of climatic conditions, several suitability matrices ST are obtained for each of the T climatic scenarios ( Fig. 1). At the beginning of the framework, the initial populations experience a transitional suitability matrix (S1), facing extinction immediately

S
Step 1) Dispersal from Cij Step 2) Cells are accessed Step 3) Cells are colonized Step 4) Final Ds per replicate

M
Step 6) Accesible area binary 95% Step 5) Mean matrix D D if the cells' conditions are unsuitable in that scenario, or in subsequent ones, which eliminates populations from inhabitable cells in the past (Fig. 1, Fig. 3, Supplementary Fig. S1). Colonization and dispersal occur as in the stable framework during t dispersal events, after which the simulation passes to the next scenario (i.e., T+1). In the new scenario, the colonization matrix C is updated according to the conditions on the new suitability matrix (ST+1), and the dispersal matrix D is stored in memory (Fig. 3). Cells colonized in a previous scenario (CijT) are kept if their new suitability values (SijT+1) are greater than zero. As before, if the new conditions are outside NF (i.e., SijT = 0), populations in those cells face extinction (i.e., local extirpation; Fig. 3). These steps generate the initial matrix of colonized cells for the next scenario (C(t=0)T+1; Fig. 3).
Since the D matrix is cumulative, the initial matrix of accessed cells for the new scenario (D(t=0)T+1) matches the final matrix from the previous scenario (D(t)T).
In both frameworks of climatic conditions, final C and D constitute absolute frequency matrices, and after all t dispersal events occur along the defined number of climatic scenarios in each replicate, the mean of the D matrices from all replicates is calculated (i.e., ). The upper 95% distribution of accessed cells is selected from to exclude cells with low access frequencies (see accessibility threshold; Table 1). Finally, a binary version of this upper 95% subset of is selected as an estimate of M (Fig. 2B, 3). Steps within dashed margin blocks are done as in the stable framework (Fig. 2B). The first suitability matrix (S1) is used in the first scenario (T=1) to start simulations. After the last dispersal event t happens at the end of scenario T (Steps 1-3), the new scenario starts (T+1) with the corresponding suitability matrix (Step 4). The colonization matrix C is updated based on ST+1, and cells in new unsuitable conditions experience extinction (shown as X's; Step 5). The D matrix from scenario T is stored and used at the start of scenario T+1 (D(t)T = D(t=0)T+1; Step 6), and a new set of t dispersal events begins. After all T scenarios, the final three steps are executed as in the stable framework (i.e., Steps 7-9 are equivalent to Steps 4-6 in Fig. 2B).

Case example: Aphelocoma jays
The genus Aphelocoma was, until relatively recently, conceived as comprising three species, each relatively broadly distributed across North and Middle America: A. coerulescens in temperate oak and pinyon scrub, A. ultramarina in sub-tropical pine-oak forest, and A. unicolor in cloud forest (Pitelka 1951). However, with more in-depth, molecular genetic studies, it was Step 4) Scenario changes:T+1 Step 5) Update C matrix: C(t=0)T+1 (Steps 1-3) Step 6) Keep D: determined that the genus comprised more distinct lineages, such that a present-day 'best' concept for the genus would comprise as many as 14 species (McCormack et al. 2011). Here, we use a selection of four representative taxa from this genus, A. californica, A. coerulescens, A. ultramarina, and A. unicolor, with distributions that vary in range size and latitudinal position (Fig. 4), to illustrate effects of distinct patterns of distribution and environmental conditions on simulations of accessible areas.

Data preparation
We gathered occurrence data for the 4 taxa from the eBird database (Sullivan et al. 2009). Duplicates, records lacking coordinates, and records from ambiguous localities were removed to avoid problems in analyses . We also performed a spatial thinning of occurrences using a 10 km distance filter to reduce problems derived from autocorrelation (Anderson 2012). The distance used for thinning the records was selected considering the resolution of environmental variables (see below) and based on a visual inspection of the localities in the area of interest, given knowledge of the genus' natural history and experience with its populations rangewide by one of us (ATP). Data cleaning and thinning were done in R (R Core Team 2019), using basic functionalities and the package ellipsenm (Cobos et al. 2020). Occurrences used for the four Aphelocoma species are included as part of Appendix S1.

A. ultramarina
As current environmental variables, we used the bioclimatic variables from the WorldClim database v1.4 (resolution = 2.5'; Hijmans et al. 2005). To represent Last Glacial Maximum conditions, we used past bioclimatic variables developed under the general circulation model CCSM4, at the same resolution (variables available at https://www.worldclim.org/data/v1.4/worldclim14.html). We excluded four variables that combine temperature and precipitation information because they are known to present spatial artifacts (i.e., BIO8-9, BIO18-19; see Escobar et al. 2014). Current and LGM variables were cropped to relevant geographic extents (Gs) that include areas that are suitable for the species, but also other surrounding places with unsuitable conditions to permit and support interpretation ( Fig. 4; extents and sizes of all Gs can be found in Supplementary Tables S2-3).

Simulations of M
We performed simulations under two frameworks, one of stable conditions (current), and another one of changing glacial and interglacial conditions (see above, Preparation of climatic scenarios). For simulations performed under the stable framework, we varied dispersal kernel ("normal" and "log-normal"), kernel standard deviation (SD = 1, 2, 5), and number of dispersal events (250 and 450) to illustrate their effects (Table 1). To explore additional aspects of our simulation approach, for A. ultramarina, a species with a relatively small distributional area that facilitates development of larger number of replicate simulations, we performed a set of extra simulations under current conditions. To show the effects of number of replicates (5, 10, and 50; "normal" kernel, SD = 1) and raster resolution (30" and 5'; "normal" and "log-normal" kernels, SD = 1, and 0.6, respectively), we performed simulations only with 250 dispersal events. Lower values of standard deviation for log-normal kernels (0.1, 0.4, and 0.8) were also explored for this species in simulations with 250 and 450 dispersal events.
In our parameterizations for simulations under a changing framework, we varied dispersal kernel and kernel SD, as we did for simulations under the stable framework. Here, we did simulations under two types of cycle with the following duration for glacial, interglacial, and transition periods: (1) 45 ka for glacial (LGM) and interglacial (current) periods, and 5 ka for the two periods of transition (i.e., 45-5 cycles); and (2) 25 ka for each of the four periods (i.e., 25-25 cycles; 1 ka = 10 3 years). We obtained transitional climatic scenarios between LGM and current conditions using the algorithm explained in Preparation of climatic scenarios. Scenario span was set to 2.5 ka, with 25 dispersal events occurring during this time interval (i.e., one dispersal event per 100 years). We defined the total simulation period to be 1 Ma, which represents 10 cycles of 100 ka with 40 climatic scenarios per cycle. The maximum number of dispersers (NdMax) was set to 2, and the suitability threshold was the default value (Table 1); these two parameters were held constant for all simulations. Most simulations were run with the default number of replicates (Table 1, Supplementary Table S4). In all, 62 simulations under the stable framework (12 common processes for the four species, and 14 extra for A. ultramarina), and 48 under the changing framework were performed (110 simulations in total). All simulations were run as sequential processes (non-multi-core analyses) on a XEON 2.8 GH, 32 GB of RAM computer, which prevented high demands of RAM.

Implications of M hypotheses for ENM outputs
We explored the implications for ENM outcomes from Maxent 3.4.1 (Phillips et al. 2017) derived from using M areas defined with our approach versus M areas delimited with methods currently used in the field. To do so, we compared the range and frequency of environments represented in distinct M areas for all species using annual mean temperature and annual precipitation. We also obtained metrics of performance for Maxent models produced using the different M hypotheses for each species, including partial ROC tests, omission rates, and AICc values, which allowed selection of optimal models for each calibration area approach. The simulated M selected for these comparisons was obtained in a framework of changing conditions using 45-5 cycles (i.e., LGM and present periods each lasted 45 ka; transitions lasted 5 ka), a normal kernel (SD = 2), environmental rasters at 2.5' resolution, and 10 simulation replicates. Alternative M hypotheses were delimited using four approaches: (1) convex hull polygons of records with 500 km buffers; (2) polygons resulted from buffering occurrences with a 500 km distance; (3) concave hull polygons of records with 500 km buffers; and (4) a selection of ecoregions containing occurrence records plus a buffer of one ecoregion on all sides. These alternative accessible areas were created in R using the ellipsenm, rgdal , and rgeos (Bivand and Rundel 2019) packages. Spatial polygons representing ecoregions (Omernik andGriffith 2014, Commission for Environmental Cooperation 1997) are included as part of Appendix S1.
To obtain appropriate ENMs and compare their performance using distinct M areas, we performed model calibration exercises for all M hypotheses using the kuenm package (Cobos et al. 2019a). To reduce dimensionality and avoid multicollinearity-related issues, we performed PCAs with the 15 bioclimatic variables and used the first five principal components (PCs) in these explorations. We randomly partitioned occurrence data into a 50% training set and a 50% set for testing models. In all, 1300 candidate models were evaluated for each M during calibration, for a total of 26,000 models (i.e., 4 species x 5 M hypotheses x 1300 models). Candidate models for each M resulted from combining 5 feature classes (q, lq, lp, qp, and lqp; where l = linear, q = quadratic, and p = product), 10 regularization multipliers (i.e., 0.1, 0.25, 0.5, 0.75, 1-6), and 26 distinct sets of variables representing all unique combinations of more than two PCs (Cobos et al. 2019b; Supplementary Table S5).
We calibrated Maxent models using a total of 20,000 points randomly selected from each M for all species, except for A. unicolor (6,373 points in total), as this last species had the fewest number of pixels in one of the M hypotheses used; all occurrences were included among the background points. We measured model performance only on the background points mentioned above. Models with optimal performance were those that met the following criteria: (1) statistical significance (partial ROC; Peterson et al. 2008), (2) omission rate < 0.05 (Anderson et al. 2003), and (3) ΔAICc ≤ 2 (Akaike Information Criterion corrected for small sample sizes; Warren and Seifert 2011); we note that the ΔAICc metrics were among the reduced set of models that passed the first two criteria, rather than globally among the full set of candidate models. These metrics allowed us to select the best model for each species and M. We then produced geographic projections of each best parameterization in a final model using all occurrence data, 10 replicates by bootstrap, cloglog outputs, and free extrapolation for projections to assess the implications of using distinct M hypotheses. Extrapolation risks in ENMs were assessed through mobilityoriented parity (MOP) analyses (Owens et al. 2013) using the five PCs masked with each M. MOP analyses identify risks of strict or combinational extrapolation in the area of model projection (G) by comparing environments in that area against the closest 5% of the cloud of conditions inside M. Raster layer and spatial polygon processing were done in R using the package raster (Hijmans 2019), as well as rgdal and rgeos. PCAs, model calibration, and MOP analyses were done using the kuenm package. R code for replicating all analyses is presented in Appendix S2.

Simulation results
The calibration areas resulting from our simulations were consistent with the distribution of the studied Aphelocoma species, including potential unsuitable regions where these taxa have not been recorded. Simulations within a stable framework were considerably faster than simulations under a changing framework (Supplementary Fig. S2) Table S4). The length of the simulations was associated mostly with the number of occurrences used and the size of G (Fig. 4), as well as the dispersal kernel and the number of dispersal events (Supplementary Table S4).
The factors with major effects on the geographic characteristics of the M areas obtained with our simulations were raster pixel size, kernel type, and kernel standard deviation, as well as whether simulations were performed within a stable or a changing framework of climatic conditions (Fig. 5-6, Supplementary Fig. S3-13). Number of replicates, number of dispersal events (under stable conditions; Fig. 5), and the type of cycle within the changing framework (i.e., 45-5 or 25-25 cycles; Fig. 6), did not show strong effects on the geographic shape or size of our results ( Supplementary Fig. S3-13). Figure 5. Effect of parameter settings (number of replicates, raster resolution, number of dispersal events, and framework of climatic conditions) on simulated M areas obtained for Aphelocoma ultramarina. All simulations were performed using normal kernels and a kernel SD = 1. Occurrences inside NF used to start simulations are represented with black dots; annual mean temperature is used as a base map. Log-normal dispersal kernels produced broader M areas than normal kernels with equal SDs; increasing SD values made M areas broader. In the case of log-normal kernels with SD values ≥ 0.8, M outcomes presented scattered and small areas around the main region of dispersal in simulations of 2.5' resolution (see boundaries of log-normal example in Fig. 6; Supplementary Fig. S3-10, S12). Accessible areas simulated under the changing framework of climatic conditions were broader, including regions that were not reached with simulations under the stable framework of current conditions (e.g. Islas Tres Marías, Mexico, for A. ultramarina; Fig. 6, see also Supplementary Fig. S7-10). The effect of raster pixel size is related directly to the extent of resulting M areas (Supplementary Fig. 13). Figure 6. Effect of parameters on simulated M areas obtained for Aphelocoma ultramarina under a changing framework of climatic conditions. Time of periods in the type of glacial-interglacial cycle (25-25 and 45-5) is defined in ka. All simulations were done using 10 replicates, at a pixel size of 2.5', with 25 dispersal events during each scenario (scenario span = 2.5 ka). Occurrences inside NF used to start simulations are shown as black dots, and annual mean temperature is the base map. Note that the lognormal kernel example identified an extremely broad area with scattered boundaries, extending from northern Mexico south through much of Mesoamerica. Although simulations under the changing framework were done for a total simulation period of 1 Ma (10 glacial-interglacial cycles; 1 Ma = 10 6 years), the maximum extent of M, considering current geography, was reached after ~375 ka (150 scenarios, less than 4 cycles) for three of the four species. Hence, we presume that most simulations under this framework would have taken a third of the allocated time. This case did not hold for A. ultramarina, for which ~925 ka (370 scenarios, > 9 cycles) was needed for reaching stability in the resulting M. Regarding the frequency for which areas in the final M were visited in distinct replicates, we detected high variability in areas close to occurrences (Supplementary Fig. S14). Most areas close to the border of M showed zero or low variability between simulation replicates, except for areas with high suitability.

Implications of M hypotheses for ENM outputs
The M hypotheses that we explored differed in overall extent and complexity of borders. Areas resulting from polygons derived from convex hulls, concave hulls, and buffers were simpler than those derived from ecoregions and simulations, and were similar in size among each other (Supplementary Table S3). The size and shape of Ms derived from ecoregions and simulations were dependent on the species and the geographic domain of interest G. The largest Ms for A. californica and A. unicolor were generated from ecoregions and simulations, respectively. For A. coerulescens, all alternative calibration areas were larger than the simulated M, and all Ms had similar sizes for A. ultramarina (Supplementary Table S3). Range and frequency of environmental conditions inside Ms compared to conditions associated with occurrences varied depending on the M hypothesis of each species (Supplementary Fig. S15-16). Ranges of mean annual temperature in simulated Ms were closer to values in occurrences than the ranges presented in alternative M hypotheses for two species: A. californica and A. coerulescens ( Supplementary Fig. S15), hence, occupying smaller ranges in environmental space (see also annual precipitation ranges for A. ultramarina; Supplementary Fig. S16).
The variables used in ENM calibrations (first five PCs) explained >97% of the variance for all M hypotheses in each species (Supplementary Table S6). Regularization multipliers and variable sets of best models varied widely among M hypotheses in each taxon, however, feature classes were similar and most variable sets included the first principal component (Table 2,  Supplementary Tables S5, S7). All model calibration exercises resulted in at least one model that met the three selection criteria. Concave and convex Ms in most species produced just one optimal parameterization (Supplementary Table S7), which was also the case for all M hypotheses of A. californica (Table 2). Buffer-derived M areas had one or more parameterizations that coincided with those selected from simulated Ms of A. ultramarina and A. unicolor. Geographic patterns of model outputs transferred across G depended rather dramatically on the hypothesis of M used (Fig. 7, Supplementary Fig. S17-19). Although patterns observed cannot be generalized, suitable areas (suitability above a 5% modified minimum training presence threshold) were broader for A. californica and A. ultramarina when using Ms from convex hulls and ecoregions than when using other M hypotheses (Fig. 7, Supplementary Fig.  S17). The same pattern was observed in A. unicolor when using the M hypothesis based on ecoregions ( Supplementary Fig. S19).  Table 2. Gray pixels have suitability values below a modified least training presence threshold of 5%.
In general, simulated M hypotheses identified broader areas with risk of strict extrapolation (Fig. 8, Supplementary Fig. S20-22). This point was especially true for A. coerulescens, for which only small areas were detected with the M based on ecoregions, and no areas of strict extrapolation were identified when using Ms constructed with buffers, convex hulls, and concave hulls. However, considerably broader areas were noted using the simulated M ( Supplementary Fig. S21). In the case of A. ultramarina, areas of strict extrapolation in the central Caribbean coast of Mexico, the Yucatán Peninsula, and southern Guatemala were only detected while using our approach (Fig. 8). Areas of strict extrapolation were considerably reduced for A. unicolor and were null for A. californica when ecoregions were used to construct M. For A. unicolor, areas of strict extrapolation recovered using the simulated M were broader in the Florida Peninsula, a region outside of the accessible area for the species.

Discussion
The geographic extent and environmental diversity of the area across which correlative ENM and SDM are calibrated is one of the factors with major implications for modeling outputs and, therefore, model interpretation. The effects of such areas are seen especially in how well predictions fit to the data (Anderson and Raza 2010), but also, and depending on the response of suitability to distinct variables, in the ability of a model to produce appropriate projections to distinct environmental scenarios (i.e., model transfer). Barve et al. (2011)-based on statistical and biological implications of such areas-proposed that areas for model calibration should reflect a region that has been accessible to the species during relevant periods of time (M). The latter consideration in defining calibration areas is perhaps not yet a universal standard, but is increasingly becoming an explicit methodological step when using ENM and SDM tools. However, approaches to defining M in applications so far have been simple, and hypotheses about accessibility are usually erected using fixed-distance buffers or assumption-based delimitations considering known biogeographic barriers (e.g. Boria et al. 2014, de Andrade et al. 2020, Romero-Alvarez et al. 2020). These approaches have helped to improve model quality, but whether the difference between actual simulations of accessible areas and simpler hypotheses has significant effects on model outputs has not been explored until now.

Advantages of the simulation-based approach
Here, we presented a simulation-based method by which to define M hypotheses to be used as areas for ENM and SDM calibration. Our method reconstructs M by considering species' ability to disperse, as well as the geometry, pattern, and dynamics of suitable areas, through which we anticipate it could help delimit more appropriate M hypotheses. This realism is shown when comparing borders and extents of simulated Ms with those obtained from other methods, as limits of accessible regions from simulations are not always as distant from records as borders of other M hypotheses based on fixed-distance buffers applied to points or hull-polygonsbecause real-world environmental and geographic landscapes are complex, a simple, equal-rate spread from known populations is unlikely to be realistic.
We emphasize that our approach is desirable because it simulates M directly, based on a set of biological assumptions rather than on proxies of those processes (e.g. a buffer, ecoregions). Our case example results illustrated the effect of M characteristics on model performance and their outputs. Simulated M areas, which were not always smaller than alternative M hypotheses (Supplementary Table S3), allowed us to obtain models that performed well during calibration (i.e., better than random expectations, and with omission rates < 0.05), but that we surmise did not include distant regions with environments that the species has likely never explored. According to Peterson et al. (2008), partial ROC tests show when model predictions are significantly better than null random expectations, which was the case for all of our selected parameterizations (Table 2). In terms of lowest omission rates, the best models for A. ultramarina and A. unicolor were produced from simulated Ms (see also M based on ecoregions for the former species; Table 2).
However, the models that showed lowest omission rates for A. californica and A. coerulescens (ecoregions and convex hull, respectively) might not be appropriate when considering MOP results, since almost no areas of strict extrapolation were detected ( Supplementary Fig. S20-21; see also areas in the Yucatán Peninsula and Mesoamerica for A. ultramarina; Fig. 8). We note that all correlative models involve some degree of model transfer (if only from the sites sampled to the more continuous calibration region), and that out-of-range conditions force models to extrapolate, which often leads to interpretations and conclusions that lack biological reality (Owens et al. 2013). As such, we emphasize the importance of evaluating the potential for extrapolation, and of the crucial role that rigorous estimation of accessible areas plays in identifying such potential.
Overfitting is among the most common issues of ENM and SDM, and it can derive from inappropriate selection of M areas, producing high omission rates and reduced areas of predicted suitability in model predictions. Our selection criteria in model calibration assures that models were not overfitted (i.e., low omission, AICc), at least for the particular M hypotheses that we tested (Cobos et al. 2019a). However, some predictions obtained with models using Ms from other methods appeared to be under-fitted, with broader suitable areas than models constructed with simulated Ms (Fig. 7, Supplementary Fig. S17, S19). Interestingly, some of the distant predicted suitable areas obtained with Ms from other methods were located in regions that did not show strict extrapolation risks in their corresponding MOP analyses. Such resultsapparently appropriate models based on calibration metrics, but under-fitted considering actual distributions of species-are predictions that could lead to misinterpretation of suitable areas for species ( Supplementary Fig. S20-21). Extrapolative potential hinges on the difference between the accessible area and the full area of interest in model development (G) through time, such that the M areas reconstructed with our simulations are likely to guide this perilous step more appropriately.

Aphelocoma jay biogeography
Although we have used the genus Aphelocoma as a test case purely out of convenience, as it presents well-and poorly-sampled species across a complex geographic setting, our simulated Ms offer some interesting insights into the geographic history of the genus. For A. californica (see Supplementary Fig. S3, S7), the Mohave Desert appears to represent a more porous barrier for this species than, for example, the boreal forest regions to the north of the species' range (e.g. Washington state)-interestingly, the species' M area is broadest toward the east and southeast, which indeed is likely the region in which this species has most often been found as a vagrant, although east-to-west gene flow proved more frequent than west-to-east movement across the Mohave Desert region (Peterson 1991).
For A. coerulescens, the M area ( Supplementary Fig. S4, S8) embraced known distributions of the species closely, save for a somewhat broader area south to the tip of the Florida Peninsula, and north to northern Florida or southern Georgia. This species is extremely philopatric (Woolfenden and Fitzpatrick 1984), and probably would be best characterized by the low-dispersal scenarios that we explored. Nonetheless, this result points to this species as geographically isolated, and likely without much in the way of broad geographic potential for exploration and possible colonization.
The species A. ultramarina sensu stricto (i.e., splitting off the 2 or 3 northern populations of A. ultramarina sensu lato) is endemic to the Transvolcanic Belt of central Mexico. The different simulations that we developed (see Supplementary Fig. S5, S9) had the interesting quality of anticipating greater dispersal capacity and flow to the northwest compared to the northeast. This difference coincides with at least a few instances of apparent gene flow, documented in the last monographic treatment of the genus (Pitelka 1951), but this region is woefully lacking in sampling intensity.
Perhaps the most interesting of the simulations corresponds to A. unicolor ( Supplementary Fig. S6, S10). This species is restricted to Mesoamerican cloud forest patches, and as a consequence has a distribution that is highly fragmented. Our simulations suggest that populations of this species would have been highly connected under Last Glacial Maximum conditions, which contradicts recently published indications that the different geographic isolates of the species are highly differentiated (Venkatraman et al. 2019). As such, we suspect that dispersal ability in this species is minimal, perhaps even lower than the lowest dispersal setting that we explored in our simulations.
In sum, we noted considerable content in the outcomes of our simulations that was relevant to understanding the biogeography and natural history of the Aphelocoma jays. The comparison of the M hypotheses with the known distributions, and those two with the scanty evidence of gene flow out from species' known breeding distributions is intriguing. However, much more sampling is needed if anything quantitative is to be done with the known extralimital occurrences, beyond just anecdotal confirmation.

Assumptions and implications
Important assumptions are made when using our method to reconstruct M areas. Our simple approach to producing interpolations for glacial-interglacial cycles is not free of subjectivity. These interpolations are intended to represent scenarios that follow a simple trend (linear) between glacial and interglacial conditions; for this reason, they do not represent mid-Holocene conditions (for instance) as appropriately as variables derived from climatic models constructed based on historic climatic and geological considerations (e.g. Rangel et al. 2018). In the same sense, better approximations to integrate information from Pleistocene interglacials should be incorporated. Although other sets of variables could be used to represent various scenarios of changing conditions, use of past environmental scenarios will always be a source of uncertainty.
Another important assumption is the initial characterization of the species' fundamental ecological niche (NF) by means of an ellipsoid envelope. Although this type of model produces convex representations of ecological niches, which resemble physiological response curves better than more complex shapes (Angilletta et al. 2010), ellipsoids constructed based on environmental conditions in occurrences are sensitive to multiple biases derived specifically from sampling (Jiménez et al. 2019). We acknowledge that NF cannot be fully represented in real world geography Soberón 2012, Soberón and, and that other means to estimate it are needed. However, ellipsoid envelope models are fit using only data from occurrences and do not require a calibration or background area. These models are also less sensitive to extrapolation risks owing to their convexity; for this reason, we consider ellipsoid envelopes to be among the most appropriate hypotheses that can be used as a prior of the fundamental niche (see also Varela et al. 2011). Using three principal components as variables to build our ellipsoids aims to reduce dimensionality and improve transferability to past climates (Peterson 2011), but we are working on allowing the use of other raw relevant variables in future versions of the package (see below). In particular, we caution researchers that aim to use our ellipsoid approach for species with life history traits that suggest the lack of a convex fundamental niche (e.g. annual plants in seasonal environments; Soberón and Peterson 2020).
Several studies have focused on incorporating dispersal into simulations of range dynamics using SDM outcomes or known refugia to define "starting" populations in their approaches (Engler et al. 2012, Bocedi et al. 2014, Nobis and Normand 2014, but see De Marco et al. 2008. In this work, we aim to simulate the area of potential dispersal of a species to accessible sites incorporating the effects of climate along its recent history, that is, the accessible area M, which we consider an appropriate calibration extent for ENM and SDM. Hence, using outputs from such algorithms as inputs in our simulations becomes prohibitive. Moreover, we consider that setting a single origin (e.g. one glacial refugium) to start simulations might be misleading, since for instance, not all hypothesized refugia contribute equally to postglacial expansions, the existence of cryptic or microrefugia has been recognized, and expansions might be complex and dependent on species-specific responses (Magri et al. 2006, Mee and Moore 2014, Pedreschi et al. 2019. Initially, we intended to use single occurrences as "starting" populations in each replicate, but as it can be expected, the place of origin had a considerable effect on the simulation outcome ( Supplementary Fig. S23; see also low variance of simulations in Supplementary Fig. S14). Thus, we think that assuring that a species started spreading from one site in the face of climate change can be difficult, unless its history is well documented and/or there is evidence that the speciation event occurred in such site, which we are limited to assess in most instances.
Fossil occurrences represent an invaluable resource and a viable alternative for selecting "starting" populations, since they can depict a preliminary view of past species distributions (Varela et al. 2011). Although the fossil record has strong biases across time, geography, and taxonomic groups, as well as other potential issues that might limit its use in our approach (see Fernández-Jalvo et al. 2011, Varela et al. 2011, we are working into incorporating user-defined "starting" populations into grinnell. This option will serve researchers counting with robust evidence of species limited to one or a few sets of regions in the past. The Grinnellian fundamental niche holds the conditions in which populations have a positive intrinsic growth rate (i.e., source populations; Soberón 2007), and can serve as a link between the present and past distributions of a species if it remains conserved. Our ability to travel back in time and study past distributions is constrained by when such signals start to fade (e.g. niche evolution; see Lee and Gelembiuk 2008). Having an estimate of the fundamental niche, one might assume that sites in which source populations are maintained in the present (inside NF), have a higher probability of being past populations, if past conditions in those sites have remained inside NF. Therefore, in our research we have selected subsets of present occurrences per replicate, that are inside an NF estimate and occupy suitable cells to simulate an area of potential dispersal. We recognize that better approximations can be taken (see future directions below), but we think that is more appropriate to consider the potential history of populations in several suitable points from past climates, than to assume a single origin of dispersal.
Using only climatic raster layers in these simulations allows us to consider past climatic scenarios. However, dispersal-limiting factors may not always be related exclusively to climate (e.g. Pickering et al. 2011, Cobos andAlonso Bosch 2018). For instance, one might consider specific soil conditions that could prevent establishment of a species of plant even if appropriate climatic conditions are present (see Velazco et al. 2017). Other factors that could potentially prevent dispersal to and colonization of new areas include biotic interactions (Russo et al. 2006).
Although the role of biotic interactions in defining species' distributional limits has been argued to be minor because it often occurs at more local scales than other macroecological processes (Soberón 2007; but see, e.g. Wisz et al. 2013, Atauchi-Rojas et al. 2018, some key interactions could definitely influence the ability of populations to persist in certain places without migrational subsidies (e.g. Adams-Hosking et al. 2012, Newsome et al. 2017.

Future directions
Defining an M area that indeed approximates the geographic region that has been accessed by a species during relevant periods of time is a challenging endeavor, since M cannot actually be predicted, and no "truth" is likely ever to be available. Many aspects of the ecology, natural history, and geography of a species play roles in delimiting this area. Our simulation approach considers some of these aspects (e.g. suitable environmental conditions and dispersal ability) as part of the parameter settings that users define. However, defining appropriate values for those parameters is not an easy task, and even experts with substantial knowledge in a group of taxa could find these decisions to be challenging and subjective. We understand the subjectivity of defining some parameters, but argue that informed decisions ("educated guesses") can be made when the ecology and natural history of a species are understood, and effects of changing parameters on simulation outcomes can be assessed through sensitivity analysis.
The number of dispersal events is a parameter that ought to have significant implications on results from simulations, but once this number is sufficient to allow exploring most reachable areas-as in our examples-its effect on results in our simulation framework is not dramatic (see stabilization below). The dispersal distance is extracted from kernels that should be selected based on the natural history of the study species. In that sense, we recommend choosing standard deviation values considering the spatial resolution of the study area and the information available on the species' dispersal ability. Moreover, the maximum number of dispersers depends on the relationship between the fundamental niche and population fitness, but likely also on frequencydependent processes that are not considered within our approach yet, such that the effect of this parameter in the characteristics of simulated Ms remains to be tested.
Our approach certainly constitutes a work in progress, but we consider it a step forward in improving niche and distribution modeling methodologies. Some future steps for improving the current method include: (1) allowing use of different approaches and/or variables for fundamental niche estimation; (2) incorporating additional past climate layers to generate transitional scenarios, rather than just an interpolation between LGM and present climate layers; (3) improving how biogeographic barriers are considered in simulations; (4) identifying time points when accessible areas reach stabilization; (5) adding analyses of sensitivity to parameterizations; (6) adding functionalities for asymmetric dispersal patterns; and (7) general optimization of the software for faster processing.
To our knowledge, this paper presents the first simulation-based method to define calibration areas for ENM and SDM, for which we consider it a relevant contribution to the field of distributional ecology. The set of all sites ever accessed by a species across G is probably not knowable. However, our simulations constitute a method that summarizes a set of sites that have or have not been accessible to the species, over relevant periods of time. These sites represent an approximation to the M in the BAM framework, and as such offer a refined approach to choosing areas for model calibration in ENM and SDM applications. These simulation-based areas take into consideration the natural history of the species in question, the geography across which it is distributed, and the environmental variation across that area-as such, they lend to more rigorous contrasts in model fitting, as well as more realistic characterization of the potential for model extrapolation. Thus, we offer this methodology as an integral initial step in preparing for calibration of ecological niche models and species distribution models.

Data Availability
Aphelocoma occurrences and ecoregions spatial polygons are included in Appendix S1. R code for simulations, for creating alternative M hypotheses, and for all ENM analyses is presented in Appendix S2.

Supplementary Material
The following materials are available as part of the online article at https://escholarship.org/uc/fb Table S1. Number of suggested simulation replicates (N) according to the number of occurrences inside NF. Table S2. Extent of G areas used for running accessible area simulations for the four species of Aphelocoma. Table S3. Size in number of cells of geographic domains of interest (Gs) and accessible areas (Ms) per species.                    Figure S17. Geographic projections of suitability for Aphelocoma californica derived from Maxent models produced with selected parameters, but using distinct M hypothesis. Figure S18. Geographic projections of suitability for Aphelocoma coerulescens derived from Maxent models produced with selected parameters, but using distinct M hypothesis. Figure S19. Geographic projections of suitability for Aphelocoma unicolor derived from Maxent models produced with selected parameters, using distinct M hypothesis.