Species distribution modeling to inform transboundary species conservation and management under climate change: promise and pitfalls

Spatially explicit biogeographic models are among the most used methods in conservation biogeography, with correlative species distribution models (SDMs) being the most popular among them. SDMs can identify the potential for species’ and community range shifts under climate change, and thus can inspire, inform, and guide complex and adaptive conservation management planning efforts such as collaborative transboundary conservation frameworks. However, SDMs are rarely developed collaboratively, which would be ideal for conservation applications of such models. Further, SDMs that are applied to conservation often do not follow best practices of the field, which are particularly important for applications in climate change contexts for which model extrapolation into potentially novel climates is necessary. Thus, while there is substantial promise, particularly among machine-learning based SDM approaches, there are also many pitfalls to consider when applying SDMs to conservation, and especially in the context of transboundary management under climate change. Here, we summarize these pitfalls and the key steps to mitigate them and maximize the promise of applying SDMs to facilitate transboundary conservation planning under climate change. We argue that conservation modeling capacity must be elevated among practitioners such that they can easily implement best practices when using SDMs, especially regarding: 1) avoiding model overcomplexity, 2) addressing input data bias, and 3) accounting for uncertainty in model extrapolations and projections. While our discussion centers mainly on the pitfalls and opportunities of applying the most popular correlative SDM algorithm, Maxent, our suggestions can also be generalized to a range of other SDM tools. Overall, improved training in, tools for, and implementation of best practices in biogeographic models such as SDMs hold great promise to


Introduction
Conservation biogeography aims to apply biogeographic theories, principles, and analyses to management practices that sustain biodiversity (Ladle and Whittaker 2011, Franklin 2013). A key focus of the field is the investigation of global environmental change and its observed and predicted effects on biodiversity (Franklin 2013). For example, due to climate change, species' suitable areas are expected to shift, with some areas becoming less suitable for species' survival and others becoming more so. The resulting shifts in species' distributions, species' interactions, and biological community assembly and structure increase the likelihood of local and global extinctions (Raxworthy et al. 2008, Pecl et al. 2017, Brambilla et al. 2020). To date, several studies have documented observed impacts of climate change on species' distributions, abundance, phenology, and body size (e.g., Rosenzweig et al. 2008, Koleček et al. 2020) or predicted future impacts of climate change on species, such as extinction risk (e.g., Thomas et al. 2004, Li et al. 2013). The field of conservation biogeography has presented practical methods for incorporating climate change within conservation planning and assessments to inform forward-thinking conservation management (Carvalho et al. 2011, Blair et al. 2012, Crossman et al. 2012). Spatially explicit biogeographic models are among the most commonly used methods in conservation biogeography, with correlative species distribution models (SDMs, including ecological niche models or niche-based distribution models) the most popular among them (Franklin 2009, Franklin 2013, Peterson et al. 2011). These SDMs most typically estimate environmental suitability for a species in geographic space using associations between species' occurrence records and environmental variables (Peterson et al. 2011).
However, modeling efforts intended for conservation planning have been slow in applying best practice standards for correlative SDMs, in part because modeling goals may differ between academic and conservation applications of models (Araújo et al. 2019, Sofaer et al. 2019, Urbina-Cardona et al. 2019). Nonetheless, because SDMs can identify the potential for species range and community shifts under climate change, they have the ability to inspire, inform, and guide complex and adaptive conservation management planning efforts such as collaborative cross-border conservation frameworks (e.g., Blair et al. 2012, Middleton et al. 2020, Titley et al. 2021).
More than half of all terrestrial birds, mammals, and amphibians have distributions that cross national borders (hereafter termed 'transboundary'; Liu et al. 2020, Mason et al. 2020). In addition to widespread direct threats such as deforestation and hunting, these species may be directly threatened by the construction of border infrastructure, as well as indirectly by lack of coordination of conservation activities on either side of the border (e.g., differences in land use and policy and other sociopolitical contexts and histories; Linnell et al. 2016, Liu et al. 2020, Mason et al. 2020, Titley et al. 2021). Such threats are likely to be exacerbated by climate change. For example, one study predicts that most areas climatically suitable to be habitat for about one-third of mammal and bird species will have shifted to a different country by the 2070s (Titley et al. 2021). This highlights the need for strategic, coordinated approaches towards managing transboundary species and landscapes to prevent extinctions or further declines (Liu et al. 2020, Mason et al. 2020, Titley et al. 2021).

Transboundary conservation
Transboundary collaboration for conservation management will become even more important under climate change, as demonstrated by several other papers in this special issue (e.g., Blair et al. 2022a, Ngo et al. 2022, Tan et al. 2022) and as described further here. While in some areas, agreements for transboundary cooperation in biodiversity management may already be in place, in many cases coordination is slow because of limited capacity or other geopolitical and governance factors. Also, for most species and areas of interest, there is only limited information to document current or simulate potential future climate-driven habitat changes to assess their vulnerability and identify potential actions.
When agreements are not yet in place or when progress is stalled, generating information on current and potential future habitat for high-interest species (or developing capacity to do so) could help to establish the foundation for and boost progress on developing such agreements and collaborations and improve the information generated at the same time (Guisan et al. 2013). For example, in a case study of collaborative monitoring of Amur leopards between Russia and China, country-specific results were less accurate, with uncertainty twice as high, compared to integrated estimates (Vitkalova et al. 2018).
While such processes take more time to develop, successful transboundary conservation is necessarily collaborative. A robust collaborative process should inform conservation planning in complex transboundary contexts where synergies and trade-offs among diverse stakeholder needs must be balanced. For example, policies addressing the needs of marginalized border communities must be coordinated with those addressing the anticipated effects of climate change and shifts in biodiversity (Wilder et al. 2013). Cooperative agreements are more likely to lead to successful policies and actions that account for complex sociopolitical contexts (Hodgetts et al. 2018, Vitkalova et al. 2018). Training in and implementation of best practices in biogeographic models such as SDMs hold great promise to facilitate and help guide complex, transboundary collaborations for long-term planning of conservation under climate change. SDMs can be particularly helpful in illuminating the importance of transboundary conservation (e.g., Wang et al. 2021). In particular, SDMs can provide inputs for mitigation and adaptation strategies, such as 'climate-connecting' corridors for coordinated conservation of potential movement (Senior et al. 2019).
However, SDMs are often not collaboratively developed, which would be ideal for conservation applications of such models. Further, SDMs that are applied to conservation often do not follow best practices of the field (e.g., accounting for sampling bias in input datasets or taking steps to avoid overly complex and overfit models), which is particularly important in applications of climate change for which model extrapolation into future novel climates is necessary (Araújo et al. 2019, Sofaer et al. 2019).
Thus, while there lies substantial promise in applying SDMs to conservation, there are also many pitfalls to consider, especially in the context of transboundary management under climate change.
However, collaboration among researchers and practitioners during the development of models and implementation of policies can help to bridge the gap between research and application (e.g., Urbina-Cardona et al. 2019), and thus promote more sustainable and broader use of SDMs as well as their best practices.
Here, we summarize key steps to mitigate the pitfalls and maximize the benefits of applying SDMs to facilitate transboundary conservation planning under climate change. Our discussion centers mainly on the pitfalls and opportunities of applying the most popular correlative SDM algorithm, the machine-learning based Maxent (Phillips et al. 2017), to transboundary conservation under climate change. However, our suggestions can also be generalized to a range of other SDM tools and applications, as we discuss.

The promise of machine-learning based SDMs for transboundary conservation under climate change
The ability to predict species' distributions and relationships with their environment can be greatly enhanced by the application of machine learning to SDMs (Elith et al. 2011). Machine-learning based SDM algorithms now dominate the field (Urbina-Cardona et al. 2019) and include methods ranging from random forests and artificial neural networks (Deneu et al. 2021) to support vector machines (Drake et al. 2006, Ferrell et al. 2019) and boosted regression trees (Elith et al. 2006). However, among the machine-learning based SDM approaches that can be applied to presence-only datasets, the maximum entropy approach implemented in Maxent software has performed better than many others under a range of circumstances, especially if appropriate corrections for bias and overcomplexity are taken (Elith et al. 2006, 2011, Radosavljevic and Anderson 2014). As such, Maxent is now the most widely used algorithm for correlative SDMs (Urbina-Cardona et al. 2019). Maxent was born out of a public-private partnership between machine learning experts from AT&T labs and scientists at the American Museum of Natural History. The paper first documenting the implementation of Maxent to modeling species distributions has been cited more than 14,000 times (Phillips et al. 2006).
Machine-learning based SDMs are used for a wide variety of applications relevant to transboundary conservation efforts, including guiding field surveys to accelerate discovery of unknown range areas and species (e.g., Raxworthy et al. 2003), predicting invasive species risk (e.g., Peterson et al. 2008) or supporting conservation area priority-setting and reserve selection and related corridor networks (e.g., Senior et al. 2019), and more (as reviewed in Urbina-Cardona et al. 2019). Importantly in the context of this special issue, SDMs can identify the parts of a species' geographic range that are expected to be more susceptible to climate change (e.g., Blair et al. 2022a, Trinh-Dinh et al. 2022). Similarly relevant for transboundary conservation is that SDMs can update and inform range estimates used in formalized red-listing and threat assessment processes (Kass et al. 2021a, Merow et al. 2022), which are crucial for cross-border species as asymmetric listing across borders can hamper conservation efforts. Asymmetrical conservation statuses could pose a challenge for effective management for transboundary connectivity and climate change resilience in the face of species range shifts (Thornton and Branch 2019). For example, more than a quarter of mammals in the Americas have asymmetric listings across borders and many have mismatches between local, national, and global listings (Thornton and Branch 2019). Asymmetries in listing could indicate that species truly are under less threat in one region compared to another, or could reflect different levels of concern between the two regions although population status is similar.
SDMs have great potential to facilitate coordinating and improving range information for listing purposes both to correct asymmetries and to incorporate climate change concerns. In particular, presence-only approaches are very attractive because they can leverage new online databases and update estimates even for rare species with very few available occurrence records (e.g., Pearson et al. 2007, Kass et al. 2021a). Also, machine-learning based SDMs may sidestep concerns about the use of correlated input variables. For example, Maxent performs iterative internal predictions that learn from novel information in correlated variables, leaving out repeated information (Elith et al. 2011). Also, Maxent can account for complex variable interactions, and has an extrapolation approach to project models to environmental spaces that are outside of the range of model training data, for example under future climate and land use change scenarios (Phillips et al. 2006, 2017). While correlative SDMs often do not account for other factors that influence distributions including dispersal, demography, and biotic interactions, it is possible to couple SDMs with spatially explicit stochastic population models to explore the interactions of mechanisms causing population decline (e.g., Stanton et al. 2015).

Pitfalls and challenges
While SDMs have been widely applied to many fields, major challenges and pitfalls have presented themselves in the last decade, especially for machine-learning based correlative approaches like Maxent. Researchers are beginning to establish best practice standards (e.g., Araújo et al. 2019, Sofaer et al. 2019); however, among the largest concerns are issues with: 1) model overcomplexity, 2) input data bias, 3) uncertainty in model extrapolations and projections, and 4) modeling capacity among practitioners.
Common across these concerns is the tendency of Maxent and other machine-learning based approaches to overfit to the training data. As an example, overly complex models are very easy to build using Maxent. Overly complex models are those that include large numbers of features and can end up overfitting to random effects in a training dataset, with limited ability to generalize to new data. In fact, models run with default settings tend to be overfit because of the sophisticated way that Maxent allows for variable interactions and multiple feature classes (Radosavljevic and Anderson 2014). Maxent allows multiple feature classes in the same model, meaning that a single variable can be included in a model in multiple ways (e.g., as a linear, a quadratic, and a hinge feature). Thus, Maxent can potentially fit very tightly to the training data. This can lead to poor predictive ability, as withheld data, new data, or previously undetected occurrences may not be predicted.
Further, if our data are biased to begin with, then what are we even predicting? Occurrence data often suffer from biased sampling across geography and especially across geopolitical and administrative boundaries (Meyer et al. 2016), leading to biases in the representativeness of environments (Radosavljevic and Anderson 2014). This is especially important when projecting the model to different regions or time periods. In studies of potential climate change effects, these biases will be extrapolated and can lead to large over- or underestimations of suitable habitat that carry into downstream biodiversity change analyses, with the potential of misdirecting conservation efforts (Sofaer et al. 2018).
Similarly, the extrapolation feature of Maxent ('clamping'), as described above, can present a challenge, depending on the situation. Due to the likelihood of non-analog conditions in the future, choices about extrapolation are particularly important for these projections, especially given the role that climate change is expected to play in altering species' distributions. Fortunately, Maxent includes model exploration tools and features to help understand the effects of extrapolation (such as the multivariate environmental similarity surface (MESS) tool, described further below; see Elith et al. 2010). However, these tools are often overlooked, or resources on proper parameterization are inaccessible.
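To make the idea behind a multivariate environmental similarity surface (MESS) more concrete, the following Python sketch computes a MESS-style score for a single projection location, following the per-variable similarity definition of Elith et al. (2010) as we understand it. The function names, variable names, and toy values are hypothetical illustrations and are not part of Maxent; negative scores flag locations where at least one variable lies outside the range of the training (reference) data.

```python
def variable_similarity(p, reference):
    """Similarity of value p to the reference (training) values of one variable."""
    lo, hi = min(reference), max(reference)
    span = hi - lo
    if span == 0:
        raise ValueError("reference values must vary")
    # Percentage of reference values strictly below p.
    f = 100.0 * sum(1 for r in reference if r < p) / len(reference)
    if f == 0:
        return 100.0 * (p - lo) / span   # negative when p is below the minimum
    if f <= 50:
        return 2.0 * f
    if f < 100:
        return 2.0 * (100.0 - f)
    return 100.0 * (hi - p) / span       # negative when p is above the maximum


def mess(point, reference):
    """MESS-style score: the minimum similarity across all variables.

    point: dict mapping variable name -> value at the projection location
    reference: dict mapping variable name -> list of training values
    """
    return min(variable_similarity(point[v], reference[v]) for v in reference)


# Hypothetical training conditions (illustrative values only).
training = {
    "temp_c": [10, 12, 14, 16, 18],
    "precip_mm": [500, 600, 700, 800, 900],
}
analog = mess({"temp_c": 14, "precip_mm": 700}, training)
novel = mess({"temp_c": 22, "precip_mm": 700}, training)
```

In this toy case the second location is warmer than any training record, so its score is negative and projections there rely on extrapolation.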

Navigating the pitfalls and a path towards wider implementation of best practices
Luckily, as mentioned above, many machine-learning based SDM algorithms, including Maxent, include tools and guidance to help navigate potential pitfalls, and extensive advice on best practices in applying SDM exists (for further reading and elaborated details on best practices, please see Araújo et al. 2019). Applying best practice standards to navigating SDM pitfalls is especially important for studies of climate change and in conservation applications of SDMs (Sofaer et al. 2018). This is especially true in transboundary conservation contexts under climate change, where there may be inherent biases in training datasets such that overly complex, overfit models would be unable to extrapolate to areas beyond that of model training, which is an inherent goal of projecting SDMs under future climate change. Here, we summarize key steps to navigate SDM pitfalls, including a review of selected best practices for model training and application to transboundary conservation management under climate change and ways to lower entry barriers to using these best practices for practitioners:

Avoid model overcomplexity
A widely accepted SDM best practice to avoid overly complex models is to test multiple models with a range of parameter settings (e.g., in Maxent, regularization multiplier and feature classes) and choose the setting with optimal model complexity based on a combined set of evaluation metrics that provide slightly different types of information about predictive performance and model complexity (e.g., omission error, AUC, AIC; please see Warren and Seifert 2011, Radosavljevic and Anderson 2014).
Avoiding overly complex models is of particular importance for rare species of conservation concern that may have very small input sample sizes for occurrence data (Radosavljevic and Anderson 2014). In Maxent, trying a range of different regularization multiplier values is particularly important. The regularization multiplier limits the complexity of the model to generate a less localized prediction (Phillips and Dudík 2008): the default value of 1 tends to allow for more complexity and tends to lead to overfit models. Higher regularization values penalize complexity, so the best practice is to try a range of regularization values and then choose an optimal model for the species based on a set of evaluation metrics (e.g., see Kass et al. 2021b). Similarly, Maxent's default settings allow for multiple feature classes in the same model, based on the number of occurrence records, which can also lead to model overfitting and overcomplexity depending on the particular biological system. Conversely, avoiding overly simple models is equally important, but the previously mentioned evaluation metrics are usually sufficient to remove these models.
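As one illustration of how an information criterion can balance fit against complexity during tuning, the Python sketch below ranks a few candidate settings by the sample-size-corrected AIC (AICc). This is a minimal sketch under stated assumptions: the candidate settings, log-likelihoods, and parameter counts are hypothetical values invented for illustration, and in practice AICc would be considered alongside omission error and AUC rather than used alone.

```python
import math


def aicc(log_likelihood, k, n):
    """Sample-size-corrected AIC for a model with k parameters and n records."""
    if n - k - 1 <= 0:
        # Too many parameters for the sample size; treat as not selectable.
        return math.inf
    return 2 * k - 2 * log_likelihood + (2 * k * (k + 1)) / (n - k - 1)


def select_model(candidates, n):
    """Return (best_candidate, best_score) with the lowest AICc."""
    scored = [(aicc(c["log_likelihood"], c["k"], n), c) for c in candidates]
    best_score, best = min(scored, key=lambda pair: pair[0])
    return best, best_score


# Hypothetical tuning results for one species with 30 occurrence records.
# rm = regularization multiplier; fc = feature classes (L linear, Q quadratic, H hinge).
candidates = [
    {"rm": 1, "fc": "LQH", "log_likelihood": -95.0, "k": 18},
    {"rm": 2, "fc": "LQ", "log_likelihood": -100.0, "k": 6},
    {"rm": 4, "fc": "L", "log_likelihood": -110.0, "k": 3},
]
best, score = select_model(candidates, n=30)
```

With these invented numbers, the most complex candidate fits the training data best but is penalized heavily for its parameter count, while the simplest candidate fits too poorly, so the intermediate setting is selected.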

Address input data bias
Reducing bias in the input data is another best practice to avoid overfitting SDMs. If the occurrence records used to build a correlative species' distribution model do not provide unbiased information regarding the environmental requirements of the species, then the model cannot accurately estimate the species' environmental tolerances or, conversely, a given location's suitability as habitat for the species.
One strategy to address this issue is to improve the quality of species' occurrence datasets. A GBIF task group on Data Fitness for Use in Distribution Modeling outlined a number of ways to enable this, which have improved and will continue to greatly improve the quality of globally available input data (Anderson et al. 2020). One recommendation from this task group was that GBIF should serve indicators of precision, quality, and uncertainty of data, which has already been implemented. Another recommendation is to develop functionalities to enable users to annotate and communicate errors to data providers. An excellent example of a national biodiversity occurrence dataset that is collaboratively vetted and curated by both taxonomic experts and modelers is that of BioModelos in Colombia (Velásquez-Tibatá et al. 2019).
Indeed, even before starting to follow best practices to address remaining biases (as described below), modelers should do as much as possible to assemble a set of occurrence records that is as comprehensive as possible.
Where unbiased and comprehensive coverage of occurrence records cannot be assured, best practices to keep models from overfitting to biased data include spatially thinning occurrence points to remove spatial clustering, which reduces sampling bias and spatial autocorrelation.
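The thinning step can be sketched in a few lines of Python. The example below is a simple greedy procedure that keeps a record only if it lies at least a chosen distance from every record already kept; it is not the algorithm used by spThin, and the function names, coordinates, and 10 km threshold are hypothetical choices for illustration.

```python
import math
import random


def haversine_km(a, b):
    """Great-circle distance in km between two (latitude, longitude) pairs in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def thin_records(records, min_km, seed=0):
    """Greedy spatial thinning: visit records in random order and keep a record
    only if it is at least min_km from every record kept so far."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    kept = []
    for rec in shuffled:
        if all(haversine_km(rec, other) >= min_km for other in kept):
            kept.append(rec)
    return kept


# Hypothetical occurrence records: three clustered points and one distant point.
records = [(21.00, 105.00), (21.01, 105.01), (21.02, 105.00), (22.50, 104.00)]
thinned = thin_records(records, min_km=10)
```

Because the result of a greedy pass depends on visiting order, repeating the procedure with several seeds and keeping the largest retained set is one reasonable refinement.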
One can also mitigate the effects of sampling bias through approaches that quantify sampling effort by including a bias layer in model training (Phillips et al. 2009).
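One way to picture how a bias layer enters model training is as a set of weights for drawing background points, so that the background shares the sampling bias of the occurrence records. The Python sketch below is a minimal illustration under that assumption; the function name, cell identifiers, and effort values are hypothetical, and this is not a description of how Maxent itself consumes a bias file.

```python
import random


def sample_background(cells, effort, n_points, seed=0):
    """Draw background cells with probability proportional to sampling effort.

    cells: list of grid-cell identifiers
    effort: list of non-negative sampling-effort values, one per cell
    Sampling is with replacement, so heavily surveyed cells can repeat.
    """
    if len(cells) != len(effort):
        raise ValueError("cells and effort must have the same length")
    if any(w < 0 for w in effort) or sum(effort) == 0:
        raise ValueError("effort must be non-negative with a positive total")
    rng = random.Random(seed)
    return rng.choices(cells, weights=effort, k=n_points)


# Hypothetical effort surface: cell A was never surveyed, cell C was surveyed heavily.
cells = ["A", "B", "C"]
effort = [0, 1, 9]
background = sample_background(cells, effort, n_points=1000)
```

Under this design, unsurveyed areas contribute no background points, so the model is not asked to contrast occurrences against environments that were never sampled.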
Another best practice is selecting the proper background area used to train models, which can help account for both biased data and differences in dispersal capacities. The model assumes, following niche theory, that areas where the species is not present are due to unsuitable habitat, rather than just an artefact of sampling bias or a dispersal barrier (Soberón 2007). Thus, it is recommended to constrain the background training area to only include those areas to which the species might possibly disperse (Anderson and Raza 2010).
Data biases can also come from environmental input variables and how they are included in the model. One should carefully study all of the variable response curves for the completeness of predictor variable sampling (i.e., that the entire range of the species' suitability for the variable was sampled in model training). Further, models are commonly improved by including other abiotic factors beyond climate in model training. Vegetation cover, microclimate, water surface coverage, and geology (e.g., Blair et al. 2022b, Ngo et al. 2022, Tan et al. 2022) can all improve a model's predictive ability so long as they are relevant to the species' biology. These types of variables may present challenges to climate change projections because of data limitations, restrictions, or lack of interoperability in some areas. For example, microclimate data may be quite important for many species but relevant datasets may be challenging to obtain depending on the extent and resolution required. However, a study in this issue shows that variables that approximate microclimate reflect essential characteristics that result in predictions that are likely as informative as using microclimate itself for model training (Blair et al. 2022b this issue). Another study in this issue pointed out the need for more long-term ecological research to better understand microclimate and fine-scale habitat preferences and improve model projections for the purpose of adaptive conservation management plans (Blair et al. 2022a this issue). Note, however, that while including non-climate variables, such as topographic variables, may improve model fitting, doing so may also compromise model predictability under future climate change because of high correlations between topography and climate, such as between elevation and temperature, depending on the modeling algorithm applied.

Account for uncertainty in model extrapolations and projections in conservation recommendations
Even with suitable validation data (e.g., ground-truthing a model), SDM projections under climate change can have poor performance (Sofaer et al. 2018). One common reason for this poor performance is uncertainty in future climate projections, the effects of which can be exacerbated in areas with a low density of species' occurrence records or those that are topographically complex, which often exhibit rapid and systematic changes in temperature and precipitation over fine spatial scales (Kueppers et al. 2005). Unfortunately, these are often exactly the areas of highest interest for biodiversity conservation (biodiversity hotspots, topographically diverse areas, particularly in the tropics). Therefore, for conservation efforts, especially in complex transboundary contexts, accounting for uncertainty in model extrapolation and projection is key.
A large amount of variation in model projections is driven by uncertainty and variation among general circulation models (GCMs), especially for future projections under climate change (Blair et al. 2012). A typical strategy to address model uncertainty is to apply an ensemble approach (Araújo and New 2007, Beaumont et al. 2019, Woodman et al. 2019). It is very important not to choose just one GCM or only one emissions scenario for future climate change projections; it is best to choose a range and then summarize across them to get a sense of the overall trends and the extent of variation among models (Araújo and New 2007, Woodman et al. 2019, Blair et al. 2022a). Basing management decisions on agreements across a range of scenarios is a reasonable, conservative approach to guide conservation (Beaumont et al. 2019). Ensembling can also be used for current models of species to examine the potential effects of intuitive extrapolation, in addition to the other tools and strategies. For example, as mentioned above, many modelers choose the default setting in Maxent for intuitive extrapolation (clamping). The multivariate environmental similarity surface (MESS) tool included with Maxent can help to decide whether one wants to turn clamping on or off, change the training area given the purpose of the model, or remove non-analog extrapolation areas from the final projection (Elith et al. 2010).
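Summarizing agreement across projections can be as simple as counting, for each grid cell, the share of GCM and emissions-scenario combinations that predict suitable conditions. The Python sketch below illustrates this under stated assumptions: each projection has already been thresholded to 0 (unsuitable) or 1 (suitable), and the projection names and grid values are hypothetical.

```python
def agreement(projections):
    """Fraction of projections predicting suitability in each grid cell.

    projections: dict mapping a projection name to a list of 0/1 values,
    one per grid cell, with all lists the same length.
    """
    grids = list(projections.values())
    if not grids:
        raise ValueError("at least one projection is required")
    n_cells = len(grids[0])
    if any(len(g) != n_cells for g in grids):
        raise ValueError("all projections must cover the same cells")
    return [sum(g[i] for g in grids) / len(grids) for i in range(n_cells)]


def consensus(fractions, threshold=1.0):
    """Mark cells as suitable only where agreement meets the threshold."""
    return [1 if f >= threshold else 0 for f in fractions]


# Hypothetical binary projections for four grid cells under two GCMs
# and two emissions scenarios.
projections = {
    "gcm_a_low": [1, 1, 0, 0],
    "gcm_a_high": [1, 0, 0, 0],
    "gcm_b_low": [1, 1, 1, 0],
    "gcm_b_high": [1, 1, 0, 0],
}
fractions = agreement(projections)
strict = consensus(fractions, threshold=1.0)
majority = consensus(fractions, threshold=0.75)
```

Keeping the per-cell fractions alongside any consensus map preserves the extent of disagreement among models, which is itself useful information for managers.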

Build modeling capacity among practitioners
One reason that many SDM studies, especially for conservation, do not employ best practices to address the pitfalls discussed above is the "black box" nature of some machine-learning based tools, especially Maxent in its past versions. It is not easy to open the box, examine what is inside, and adjust the model to best fit the data for a specific purpose. Thus, it has been tempting for people to just use the default parameters Anderson 2017, Phillips et al. 2017). The new open-source release of Maxent (Phillips et al. 2017), as well as other open-source tools that facilitate best practices in model tuning and parameterization like Wallace (Kass et al. 2018), are changing this landscape to lower entry barriers into robust application of SDMs for conservation. For example, Wallace implements two state-of-the-art R packages, spThin (Aiello-Lammens et al. 2015) and ENMeval (Kass et al. 2021b), that facilitate some of the best practices outlined above to avoid pitfalls, and does so in a user-friendly graphical user interface (GUI) environment. The application guides modelers through a complete analysis, from the acquisition of data to choosing and evaluating optimal models, to visualizing model predictions on an interactive map, thus bundling complex workflows into a single, streamlined interface.
Increased openness, reproducibility, and transparency of modeling approaches is not only an ethical imperative but also necessary to build capacity for higher-quality SDMs, and thereby facilitate more robust, collaborative SDM research on global change in transboundary contexts.

SDMs as essential tools for improved capacity for transboundary conservation under climate change
When used in combination with other information, accounting for bias and uncertainty, and trained well, SDMs have great promise to help conservation managers be forward-thinking about the possibility of endangered species moving out of their current distributions and to prepare for coordinated transboundary management, among other strategies. For example, in this issue several studies identify potential areas suitable for population re-establishment and community monitoring (Blair et al. 2022a, Trinh-Dinh et al. 2022), and for refugia for species (Nguyen, T.A. et al. 2022, this issue). More indirectly, SDM approaches can inform monitoring and management for species highly threatened by international wildlife trade, a particularly challenging transboundary conservation issue (Nguyen, T.T., et al. In review). We have discussed why increased capacity to build robust SDMs and more collaborations across borders will be increasingly important for forward-thinking conservation management under climate change. We also show how, conversely, when SDMs are poorly done or when partnerships are not strong or communication is not happening, it can negatively affect important decisions.
Partnerships are a strong foundation for strengthening transboundary work and communication, including the development of SDMs for conservation across borders. Collaborative efforts provide a better understanding of range dynamics, build trust, and can lead to cooperative agreements to coordinate conservation policies for endangered species conservation across borders (Vitkalova et al. 2018). Thus, the guiding principles of stakeholder engagement and science diplomacy, which can facilitate transboundary communication and consultation, are especially crucial here. The need for strong partnerships and communication to assure evidence-based conclusions and decisions is at the foundation of the concept of science diplomacy, that is, the use of scientific collaborations among nations to address common problems and to build constructive international partnerships (CGSPSD 2011). Stakeholder engagement processes, in turn, are critical for successful biodiversity conservation outcomes, especially to ground project goals and activities in particular social-cultural-political contexts (Sterling et al. 2017). Review), and have more potential to best fit the goals and preferred outcomes of diverse stakeholder communities in complex border contexts (Villero et al. 2016, Sterling et al. 2017). Collaboration and engagement of input from a variety of stakeholders in the SDM development process will also improve results by better facilitating inclusion of human responses to climate change in assessments, which are often overlooked (Segan et al. 2015).
Further, we argue that transboundary work should be transboundary in authorship and a collaborative process from the beginning of the work. We note that in addition to analyses about how important transboundary conservation and collaborative management across boundaries is and will continue to be, it is perhaps even more important to engage in collaborative processes from the outset of research activities themselves, as in the case of our authorship group on this paper and in many of the papers in this special issue.
Collaborative frameworks that include multiscale and multisector partnerships are even stronger, including coordination for science policy and management across borders. Such frameworks are vitally important for more complex situations such as management of migratory species across different jurisdictions (e.g. ungulate migrations in Greater Yellowstone Ecosystem; Middleton et al. 2020).
Transboundary science, policy, and management frameworks may consist of widespread mapping and assessment of distributions and migrations, improved coordination of policy and management across jurisdictional lines, increased investments, and strong engagement of local stakeholders.
Borders often indicate complex sociopolitical contexts and histories between countries. We argue that conservation scientists, practitioners, and managers, including SDM modelers, have an obligation to understand those contexts and know how they relate to our work and the goals of conservation (Hodgetts et al. 2018, Murphy 2021). Biodiversity and landscapes do not follow sociopolitical borders; thus, especially as global changes including climate change continue, we will have to increasingly work across borders to achieve biodiversity conservation goals. Fortunately, conservation and science can build bridges between societies where official relationships may be difficult, and through approaches like those detailed here and others in this special issue, we hope to model ways to strengthen interactions and partnerships between both scientific and diplomatic communities.

Author Contributions
MEB, MDL, and MX co-led study design and writing of the manuscript.