Integration of dispersal data into distribution modeling: What have we done and what have we learned?

Inclusion of dispersal data in models of species’ distributions in response to environmental change has been advocated for more than 15 years. We investigated whether there has been a shift in recent publications to include dispersal processes and how dispersal estimates explicitly change the conclusions of analyses. To address these questions, we conducted a systematic review of the literature to assess what kinds of dispersal data and methods are being included in species distribution models across taxa. We collected metadata on 6,406 publications, 907 of which included dispersal data. The proportion of papers that included dispersal data in estimates of the species’ range increased from 8% to 20% from 1991 to 2017. Evaluation of a subsample of 200 papers showed no evidence for differences in taxa studied between dispersal and non-dispersal publications, with most studies focused on North America or Europe. Dispersal was incorporated at a higher frequency in studies from South America, Africa, and island systems. We found that forecasting models predicting range shifts with climate change rarely used dispersal data, but when they did, range shift projections were greatly affected. Our simulation models, in which a range of dispersal estimates was included, showed that projections were greatly influenced by dispersal distance assumptions. We summarize best practices for future research on distributions, including potential methodologies for dispersal integration, and highlight the problems that arise if dispersal is ignored.


Introduction
The use of species distribution models (SDMs) to estimate species' ranges has been widely implemented by biogeographers (Phillips et al. 2006, Phillips and Dudik 2008, Franklin 2013). This is done by modelling the current relationship between known presence, absence, or pseudo-absence locations and the values of the environmental variables at those locations. Once the current association is understood, a projection into the future or the past is accomplished by estimating areas with suitable environment across space and time. These models, also called climate-envelope, bio-climate, species-habitat, or habitat distribution models, can then be used to project where a species' climatic niche may be in the future or was in the past to better understand species range dynamics (Loarie et al. 2008, Carroll 2010, Carvalho et al. 2010, Milanovich et al. 2010, Rebelo et al. 2010, Sork et al. 2010). Conservation strategies have incorporated these correlative models to determine the extent and quality of habitat reserves (Carroll 2010, Kaky & Gilbert 2016, Mata et al. 2019), discover new populations (Rhoden et al. 2017, Allen & McMullin 2019), identify population threats and set priorities (Ryberg et al. 2017, Thorn et al. 2009), estimate abundance values (Ferrier 2002; VanDerWal et al. 2009, but see Dallas & Hastings 2018), and explore shifts under climate change or gene flow among populations.
One important consideration when generating species' distribution models is the inclusion of dispersal estimates. As described in the Biotic-Abiotic-Movement (BAM) framework (Peterson 2005, Peterson 2011), range shifts are only going to occur when suitable biotic, abiotic, and movement regions overlap. These mechanisms are especially critical for understanding the dynamics of species' range shifts, as species' ranges will expand and contract in response to these elements across time in different areas of the range (Pearman et al. 2008). Methods for developing dispersal estimates include tracking individuals to quantify movement (Wikelski et al. 2007) and landscape genetic methods using correlative models of gene flow (Manel et al. 2003, Holderegger and Wagner 2006). Parameter estimates, such as long-distance dispersal rates, distances, and likelihoods of establishment, are difficult to obtain (Nathan 2001, 2005). Long-distance dispersal, for example, is defined and studied in multiple ways, with methodologies varying with the definition used and taxa studied (Nathan et al. 2003, Kinlan et al. 2005, Fonte et al. 2019). Colonization is also a complicated process, with a wide range of factors influencing success across systems (García-Valdés et al. 2015, Southwell et al. 2016). Despite these challenges, methodological improvements may have increased incorporation of dispersal data into studies of distribution over time. Alternatively, limitations of the technology used to track animals could bias the types of organisms for which dispersal estimates are being used within SDMs (e.g., exclusion of small organisms or fossorial animals).
Another limitation of including dispersal estimates in these models is that dispersal rates across a species range can be complex. For example, climate-based SDMs often do not consider variation in movement among populations across the species range due to differences in topography (Kormann et al. 2012); predatory (Sih & Wooster 1994), symbiotic (Barlow & Schodde 1993), or competitive interactions (Brom et al. 2016, Liang et al. 2018); or standing genetic variation or plasticity for dispersal (e.g., due to local adaptation, Oliver et al. 2009). Some models that incorporate dispersal limitation find that plants and animals may not be able to track the rapid movements of the suitable environment across the landscape (Schloss et al. 2012, Cunze et al. 2013, García et al. 2017). Therefore, climate-based range-shift models represent potential species distributions that may not be realized due to limitations imposed by biotic interactions, dispersal modalities, local adaptation, human impacts, and geographic barriers to movement (Peterson 2005, Sinclair et al. 2010). In addition, movement may change across time through adaptation or genetic drift. For example, dispersal capabilities may decrease when fitness cost increases with dispersal distance and when the local environment is stable and less chaotic (Murrell et al. 2002). It has been argued that models of range expansion and delineation based only on dispersal estimates may be the most parsimonious compared to climate-determined distribution models (Rodríguez-Rey 2013, Kubisch et al. 2014), in particular when range size can be explained simply by dispersal capabilities of a species (Laube et al. 2013, Penner and Rödel 2017, but see Lester et al. 2007). Yet the data needed to appropriately estimate dispersal are often elusive, which limits their use in models despite their presumed importance, and the patterns and consequences of incorporating dispersal across the literature are not well established.
Incorporating dispersal within distribution models can take many forms, ranging from zero to unlimited dispersal (Miller and Holloway 2015), and can lead to different projections in range shifts from those generated by climate-only models (Franklin 2010, Engler et al. 2012, Bateman et al. 2013, Holloway et al. 2016, Singer et al. 2018, LaRue et al. 2019, Maiorano et al. 2019). A large number of model frameworks and tools now exist to incorporate dispersal estimates into range predictions. The frameworks range from population-level (e.g., RangeShifter, Bocedi et al. 2014) to individual-level (MigClim, Engler et al. 2012; CDmetaPOP, Landguth et al. 2017). To illustrate how dispersal estimates can affect projected species range shifts with climate change, we constructed distribution models for Rana luteiventris, a frog species in northwestern North America, in which we incorporated a span of dispersal estimates (Figure 1). When dispersal estimates were not included in the model, the range was predicted to shrink because colonization is ignored. At the other extreme, assuming unlimited dispersal in the model led to range expansion because dispersal barriers and the likelihood of climate change out-pacing migration rate are ignored. When we included an estimate of dispersal rate based on the average dispersal rates of anurans, we found an intermediate result, although qualitatively closer to the zero dispersal scenario for this species. Thus, incorporating realistic dispersal scenarios from empirical data is needed to better understand species' range dynamics. Besides realistic range contraction or expansion, this incorporation can also lead to more fine-grained projections such as the extent of population fragmentation or the importance of dispersal barriers.
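The contrast between zero, limited, and unlimited dispersal can be illustrated with a minimal sketch. It is not the Maxent workflow behind Figure 1: it assumes the current and projected suitable areas have already been reduced to sets of grid cells, and the function name `project_range`, the toy landscape, and the distance budget are all hypothetical choices made for illustration.

```python
import math

def project_range(current, future, max_dist=None):
    """Return the future suitable cells that count as colonised.

    current, future: sets of (x, y) grid cells (hypothetical inputs).
    max_dist: None for unlimited dispersal, 0 for zero dispersal, or a
    straight-line dispersal budget in cell units for the whole period.
    """
    if max_dist is None:
        return set(future)  # every suitable cell is assumed reachable
    # Keep suitable cells within max_dist of any currently occupied cell.
    # Barriers and the suitability of intervening cells are ignored here.
    return {
        cell for cell in future
        if any(math.dist(cell, occ) <= max_dist for occ in current)
    }

# Contrived one-row landscape: the suitable band shifts east and widens.
current = {(x, 0) for x in range(0, 5)}    # 5 occupied cells
future = {(x, 0) for x in range(3, 10)}    # 7 suitable cells

zero = project_range(current, future, max_dist=0)
limited = project_range(current, future, max_dist=2)
unlimited = project_range(current, future)

print(len(current), len(zero), len(limited), len(unlimited))  # prints: 5 2 4 7
```

In this toy case the zero dispersal range shrinks relative to the current range, the unlimited range expands, and the limited scenario falls in between. A realistic analysis would replace the straight-line budget with a dispersal kernel and account for barriers.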
Incorporation of movement and dispersal data into species range research has been called for multiple times over the past 15 years (Soberón and Peterson 2005, Kokko and López-Sepulcre 2006, Brooker et al. 2007, Broquet and Petit 2009, Jønsson et al. 2016), but how the field has responded has not been characterized. To fill this knowledge gap, we used established systematic literature review methodologies (Pullin and Stewart 2006, Moher et al. 2009, Lortie 2014, Lortie and Bonte 2016) to assess how dispersal data have been integrated within species range research and to evaluate how this inclusion has affected predictions of species range shifts relative to those made by climate-only models. Specifically, we asked whether inclusion of dispersal data has proportionally increased over time in relation to the publication of any call-to-action papers (by identifying publication spikes), what kinds of dispersal data have been incorporated into models, and whether constraints on collecting dispersal data have limited the taxonomic breadth of such analyses.

Methods
Literature Search. We conducted a literature review following methods of Lortie and Bonte (2016) and others for species' range papers that incorporated dispersal data, following the preferred reporting items for systematic reviews and meta-analyses (PRISMA) format (Moher et al. 2009, Figure 2). Searches were conducted in the ISI Web of Science database during December 2017. We used the search string Topic: "specie* rang*". The following subjects were excluded: Engineering (all types), Geography Physical, Materials Science Multidisciplinary, Computer Science Information Systems, Chemistry (all types), Agronomy, Public Environmental Occupational Health, Biotechnology, Applied Microbiology, Automation Control Systems, Instruments Instrumentation, Pharmacology, Food Science Technology, Medical General Internal, Thermodynamics, Surgery, Water Resources, Physics (all types), Computer Science, Construction Building Technology, Mathematics (all types), Dentistry, Horticulture, Orthopedics, Chemistry, Polymer Science, Operations Research Management, Health Care Sciences, Acoustics, and Telecommunication. Next, we removed review papers, editorials, book chapters, news items, and proceedings papers. This resulted in 6,406 publications (referred to as the "full dataset"; Supplementary Material Table S1). We then created the second dataset from a subset of these papers. This "dispersal dataset" was derived by using Search within Results and the term "dispers*", which resulted in 970 papers that included dispersal (Supplementary Material Table S2). We focused on dispersal as it is the most common term used when discussing movement within species' distribution models (Holloway and Miller 2017). Dispersal is also the term for movement needed for successful colonization of an area, as opposed to the terms "migration" or just "movement". No other limitations were given in the search parameters. The subject exclusions and the choice of search terms may have biased our results.
Bibliographic Analysis. For these first two datasets, the species range dataset ("full dataset") and the papers from that dataset which include dispersal ("dispersal dataset"), we counted the number of publications per year and the cumulative number in total. We then calculated the proportion of papers that included dispersal data. From these datasets, we then determined the top 15 publishing journals and the proportion of the total journal output these papers represent over the period of time represented in the "full dataset". The first year returned by the literature search was 1991, and no papers were removed from any dataset based on date. We also completed a keyword co-occurrence network to look for differences between these two datasets with the R bibliometrix package and the Fruchterman-Reingold layout (R Core Team 2013, Aria and Cuccurullo 2017). This is a force-directed mapping method, where the keywords act as nodes and crossing of links is minimized to visualize relationships.
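The two proportions described above (within-year and cumulative) can be computed as in the following minimal sketch. It assumes each dataset has been reduced to a list of publication years; the function name `dispersal_proportions` and the example counts are hypothetical and are not values from our datasets.

```python
from collections import Counter
from itertools import accumulate

def dispersal_proportions(full_years, dispersal_years):
    """Map each year to (within-year proportion, cumulative proportion).

    full_years: publication year of every paper in the full dataset.
    dispersal_years: publication year of every paper in the dispersal subset.
    """
    full = Counter(full_years)
    disp = Counter(dispersal_years)   # missing years count as zero
    years = sorted(full)
    cum_full = accumulate(full[y] for y in years)
    cum_disp = accumulate(disp[y] for y in years)
    return {
        y: (disp[y] / full[y], cd / cf)
        for y, cf, cd in zip(years, cum_full, cum_disp)
    }

# Hypothetical counts: 4 papers in 1991 (1 with dispersal), 6 in 1992 (3 with dispersal).
example = dispersal_proportions(
    [1991] * 4 + [1992] * 6,
    [1991] * 1 + [1992] * 3,
)
print(example)  # {1991: (0.25, 0.25), 1992: (0.5, 0.4)}
```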
Systematic Review. We randomly subsampled 200 papers from the "full dataset" of 6,406 papers to create a third dataset ("reviewed dataset") to analyze differences in publication patterns when dispersal data are not included in species range research. We categorized abstracts by method used, taxa involved, study location, and inclusion of future or past predictions. If this information was unclear from the abstract, which was the case for most papers, we read the entire article. Method types were: surveys, simulations, molecular, meta-analysis, species distribution model, habitat model, or some combination. Habitat models were distinguished from distribution models by the use of habitat-species interactions through empirical data collection at different populations, which did not always result in models being applied to the entirety of the range or being spatially-explicit. Animal taxonomic categories were: squamates, testudines, anurans, ichthyes, mammals, aves, arthropods, and non-arthropod invertebrates. Plant taxonomic categories were angiosperm, gymnosperm, multiple, or other. For study location, we categorized at the continental scale: North America, South America, Europe, Asia, Africa, Australia, or island systems. Distribution model prediction categories were: none, future, or historic. Due to low numbers of papers including future or historical distribution predictions, we created a new dataset by subsampling 100 additional papers from the "full dataset" with either future or historical predictions of species' ranges. This was done by using R to filter for papers with "climate change" or "historic range", and "distribution" or "range", within the text of the abstract. This allowed us to look for trends in predictive methodologies that may have been overlooked in the original 200-paper subsample. This fourth dataset is referred to as the "range shift dataset".
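The abstract filter and subsampling step can be sketched as follows. The filtering was done in R; this Python version is only an illustrative paraphrase of the rule stated above, and the record structure (a dict with an "abstract" field), the function names, and the random seed are hypothetical.

```python
import random

def is_range_shift_candidate(abstract):
    """Apply the stated keyword rule to one abstract (case-insensitive)."""
    text = abstract.lower()
    has_time_term = "climate change" in text or "historic range" in text
    has_range_term = "distribution" in text or "range" in text
    return has_time_term and has_range_term

def subsample_candidates(records, n, seed=0):
    """Randomly draw up to n records whose abstracts pass the filter."""
    candidates = [r for r in records if is_range_shift_candidate(r["abstract"])]
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    return rng.sample(candidates, min(n, len(candidates)))

# Hypothetical records for illustration only.
records = [
    {"abstract": "Climate change alters the distribution of montane frogs."},
    {"abstract": "Seed dispersal by birds in a fragmented forest."},
    {"abstract": "Reconstructing the historic range of a grassland bird."},
]
picked = subsample_candidates(records, n=2)
print(len(picked))  # prints: 2
```

Simple substring matching of this kind also accepts words that merely contain a search term (for example "arrange"), so candidates still need manual screening.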
We did not quantify the dispersal data for each paper as the goal was not to provide a meta-analysis of published data, but rather we focused on the categories of research and their frequencies.
Statistical Analyses. In addition to qualitative evaluation of the "reviewed dataset" papers, we performed Pearson's χ² tests to look for differences in count data of the methods, study location, taxa, and the incorporation of future or historical distribution models and predictions.
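For readers unfamiliar with the test, the following minimal sketch computes the Pearson χ² statistic and its degrees of freedom for a contingency table of counts. The function name `pearson_chi2` and the example table are hypothetical, no continuity correction is applied, and all row and column totals are assumed to be non-zero. Obtaining a P value additionally requires the χ² distribution, which statistical packages provide.

```python
def pearson_chi2(table):
    """Return (statistic, degrees of freedom) for a table of observed counts.

    table: list of rows, each a list of counts; rows might be dispersal
    versus non-dispersal papers and columns the method categories.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof

# Hypothetical 2 x 2 table of counts, for illustration only.
stat, dof = pearson_chi2([[10, 20], [20, 10]])
print(round(stat, 3), dof)  # prints: 6.667 1
```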

Results and Discussion
Bibliometric analysis and systematic review of the literature highlighted several patterns regarding the state of the field. We compared the top 15 publishing journals between the "full dataset" and the "dispersal dataset" to highlight potential differences in the frequency at which different research fields were incorporating our definition of dispersal. The journal lists were similar between publication groups, with the top 5 being identical (Figure 3A). Molecular Ecology and Conservation Genetics were more frequently encountered for the dispersal group, driven mostly by the fields of phylogeography and landscape genetics, both of which are primarily used to infer connectivity among populations at varying temporal scales. PLoS ONE was near the top for both publication groups, but this was driven by the large number of papers the journal publishes each year (Figure 3B). The variety of biological fields and disciplines represented in the journal list also highlights the interdisciplinary nature of species' range research, spanning multiple disciplines within ecology and evolution.
Over the period covered by our literature searches, the proportion of papers that included dispersal data slowly increased, both when looking at the proportion of total papers per year (from 0.08 to 0.20) and the within-year proportions (from 0.06 to 0.15), with both increasing by about 0.10 over the last decade (Figure 4). There were no sudden changes of the kind that might have been expected in response to a specific call-to-action paper or other event. These small, steady increases may instead be due to incremental decreases in methodology costs coupled with improvements in molecular and tracking methods. As specific examples, Segelbacher et al. (2010) reviewed the declining costs of genetic data for landscape genetics, while Lennox et al. (2017) highlighted improving animal tracking technology in aquatic environments. These methodological improvements can occur quickly by following the trajectory of ever-increasing data availability, as observed in island biogeography (Fernández-Palacios et al. 2015). There is one major caveat to these patterns: the scope and primary goals of much of this research may not be predictive in nature, and such research is therefore less focused on the dispersal abilities of the organism. Much of the research described in our literature search focused on describing present-day species ranges. Even though species' range limits are a result of dispersal patterns, if authors assume that a species currently occupies all of its suitable area, then incorporation of dispersal estimates is not necessary.
The keyword co-occurrence clustering analysis found similar terms used between the "full dataset" and "dispersal dataset" (Figure 5). One exception was that "speciation" and "local adaptation" were not found in the top 25 terms of the "full dataset." In addition, the clustering analysis found no clear pattern within the "full dataset." In the "dispersal dataset," two primary clusters were found, with phylogeography and population structure clustering, along with their related methodologies (e.g., mitochondrial DNA), in their own group separate from the rest (Figure 5A).
During the systematic review, several patterns were observed between dispersal and non-dispersal papers within the subsample from the full dataset ("reviewed dataset"). Methods varied between the dispersal and non-dispersal papers in this dataset (Figure 6A, Pearson's χ², P < 0.001). The most common method used to estimate dispersal was "molecular methods", with the fields of phylogeography and landscape genetics being well represented. The higher proportion of molecular methods in the dispersal subsample was accompanied by lower proportions of papers using only traditional survey and tracking methods, or distribution modelling by itself. There were no differences in the taxonomic groups represented across the two datasets (Figure 6B, Pearson's χ², P = 0.37).
There was an association between method and taxonomic category in the "reviewed dataset" (Pearson's χ², P = 0.03), although no difference was detected when further stratifying by dispersal inclusion (Cochran-Mantel-Haenszel, P = 0.31). This implies that some methods may be more common for certain taxa, but we might not have had the statistical power to detect the relationship after adding another layer of categorization. Differences in taxonomic-method categories may be due to the applicability of available tracking methods to specific organisms, the number of species within a taxonomic group, or the amount of research focused on a specific taxon. For example, radio-collars with longer battery life are best suited for larger animals (e.g., coyotes, Hinton et al. 2015), and this was reflected by two of the three radio-collar studies within the subsampled papers (i.e., wolf, Chadwick et al. 2010; sage grouse, Taylor et al. 2017). However, the third radio-collar paper was on the Mt. Graham red squirrel, which weighs about 240 grams (Koprowski 2005). Smaller animals, such as frogs, limit the available options, as even small devices may impact movement (Bull 2000). Although size is limiting, attachment options also vary with the skin of the organism, so that tracking a small amphibian presents unique challenges compared to tracking a small mammal (e.g., Merrick and Koprowski 2016). In addition, it can be difficult to determine whether collected tracking data represent dispersal or some other migratory behavior, in particular for animals with high site fidelity. In the subsampled papers, the only invertebrate papers that included dispersal data relied on molecular methods.
Overall geographic biases were observed, with most studies taking place in Europe or North America (Figure 7). There was a weak, non-significant trend toward a difference between dispersal and non-dispersal papers in the geographic location of studies (Figure 6, Pearson's χ², P = 0.09). Dispersal was incorporated at a higher proportion in South America, Africa, and island systems. For South America and Africa, this may reflect a research focus on movement due to concerns about habitat fragmentation (e.g., Cramer et al. 2007, Van Houtan et al. 2007) and interest in wide-ranging migratory species, respectively. Dispersal of organisms among island systems was studied more frequently and in greater detail than dispersal among contiguous populations in our dataset, and this focus has been driven by the field of island biogeography (Jønsson et al. 2016). This could be due to the scale or discreteness of populations, which makes them easier to identify among island systems and mainland locations. In addition, dispersal within a small island may not be as important because the movement capabilities of individuals may exceed island size.
Dispersal, although important, was rarely incorporated in studies predicting range shifts. In the "reviewed dataset", predictions of changes in species ranges, either in the future or in the past, were not frequently included (21/200), and incorporation of dispersal was unrelated to whether researchers created predictions (10/21 included dispersal; Pearson's χ², P = 0.15). Within the additional papers sampled for a detailed analysis focusing on range predictions ("range shift dataset"), 19/100 included dispersal. The low number of prediction studies in the original sample is likely due to predictions either not being one of the intended goals or the logistic difficulty of creating quality models, both of which would be taxon, system, and researcher specific. It is possible that researchers acknowledged the limitations of their data and chose not to create predictive models with dispersal due to the assumptions that would need to be accepted. For example, it may be inappropriate to build range-wide models if large variation in movement exists across the range (Austin et al. 2004). The pattern of future range predictions occurring more often than past range predictions may be driven by the need for reliable models and predictions of range expansions in conservation planning or by the assumptions and challenges of large-scale historic range modelling. Increased timescales for range predictions also increase the potential problems with certain assumptions, such as no climatic adaptation or changes in movement modalities; however, the scale can also make hindcasting easier, as aspects of the model can be averaged out over large periods of time and individual stochastic processes may be less important.
In our literature search, we found that 80% of papers did not include dispersal despite the history of calls to include it, and when it was included, it greatly shifted the inferences on species' ranges and dynamics. For example, Bush and Hoskins (2017) highlighted that the rate and type of dispersal had significant impacts on projected areas for freshwater species. This pattern was clearest when increasing mean yearly dispersal from 1 to 3 km for crayfish, dragonflies, and frogs. When Radinger et al. (2017) included dispersal parameters in their models projecting range shifts of river fish, they found that barriers like dams were associated with reduced colonization of new suitable habitat, resulting in smaller predicted ranges. Broadly, Radinger et al. (2017) found that habitat may shift faster than the dispersal capabilities of some species. Similarly, Ofori et al. (2017) found that populations of Cunningham's skinks will not be able to reach newly suitable habitat due to dispersal constraints. Carroll et al. (2018) found that connectivity of populations with climate change is influenced directly by climatic and topographic factors, suggesting dispersal data may need to be collected across the range. These are only some of the examples highlighting the problems that arise if dispersal is not incorporated.
Aligned with our simple simulations (Figure 1), we found that studies incorporated dispersal in three ways. First, there were models that assumed zero dispersal, where distributions were confined to the current range and the overall ranges could only decrease in size (e.g., Luo et al. 2015). Ignoring dispersal capabilities in this way may lead to overestimation of range declines with climate change, which may shift the conservation status and management decisions for a species toward being overly cautious. Second, and potentially more problematic, were models that assumed an unlimited dispersal scenario. The assumption of unlimited dispersal also leads to problems, ranging from over-estimating range increases to mis-estimating niche differences and similarities among taxa (Thuiller 2005, Peterson 2011). This could lead to insufficient conservation efforts. Working with scenarios between these two extremes may improve projections even when data are lacking (Bateman et al. 2013), which leads to the third option: using biologically feasible distances between zero and unlimited (Figure 1). In some cases, this can be based on available literature, for example, by using the intermediate value of reported dispersal ranges to set the maximum distance the range may shift (Reside et al. 2012).
While there have been methodological advances in collecting dispersal data, such as telemetry, mark-recapture surveys, and molecular connectivity analyses, limitations on effective incorporation of biologically feasible distances in forecasting or hindcasting models still exist. Depending on the methods used to measure dispersal, estimates of colonization rate must also be calculated to be incorporated in SDMs (e.g., Singer et al. 2018). When collecting new dispersal data from the field, we recommend that dispersal, colonization, and connectivity be estimated through methods appropriate to the study system so that the meaningful aspects of individual movement may be parsed out. One example of methods being determined by study system would be the use of radio-collar data for large mammals, where battery life may be less problematic than for light-weight birds or small animals (Bridge et al. 2011). For some species, using molecular methods to estimate landscape connectivity or number of migrants may be most appropriate, depending on the dispersal parameters of interest (Broquet and Petit 2009, Congrains et al. 2016). However, clear methods may not currently exist to bridge the gaps between these existing molecular data metrics and direct estimates of dispersal parameters. This is an area where novel computational methods would have a great impact on the ability to incorporate realistic dispersal rates in models projecting future range shifts for many species. While, in some situations, it may not be feasible to gather dispersal information (Barve et al. 2011), incorporation of realistic ranges of dispersal parameters reported in the literature for similar species may still improve model performance over assuming no or unlimited dispersal.
It is therefore important for researchers to evaluate the quality of the data available or obtainable for each of their systems. One potential solution to missing or low-quality dispersal data is to create multiple models and assess their sensitivity to model parameters under different environmental scenarios in order to evaluate model uncertainty. Even in scenarios where species-specific dispersal is not well known, multiple models of dispersal and aspects of the environment can be explored to better understand the range of results (e.g., water flow: Nickols et al. 2015). When parameterizing models, it is important to note that shifts in environmental variables may cause movement behaviors to shift as well, so caution must be exercised when making inferences with a shifting environment but static dispersal (Travis et al. 2013). Dispersal capabilities can change through time in areas of range expansion (Simmons and Thomas 2004) and be influenced by the configuration of the landscape and the perception of the organism (Baguette and Van Dyck 2007). In these cases, modelling and estimating the patterns and processes of species' range changes will need to take a dynamic approach, consistently updating as new data become available, to minimize under- or overestimating range shift sizes. We recommend, at minimum, that models which estimate the distribution of a species should address three dispersal scenarios: no dispersal, unlimited dispersal, and some intermediate value(s) using dispersal kernel distributions pulled from the literature.
The incorporation of dispersal data in species' range research is important to avoid incorrect or incomplete inferences, especially when models are used to predict shifts in response to climate change. Despite calls for the inclusion of dispersal in models predicting species range dynamics at least 15 years ago (e.g., Soberón and Peterson 2005), our systematic literature search and example models (Figure 1) have shown that its use is still in its early stages and implementation is far from being widespread. There are still technological and computational constraints that restrict the inclusion of dispersal data in models, but even when scant data are available, there are approaches that can be used to incorporate this important process.

Figure 1. Demonstration of the consequences of not including dispersal data, by using a species distribution model to predict changes for the Columbia spotted frog (Rana luteiventris) with climate change. This species distribution model was created using Maxent (Phillips et al. 2006) in conjunction with 609 presence points from GBIF.org and the 19 Bioclimatic layers from WorldClim.org. We provide three example scenarios for predicting the shifts in the range to 2070: 1) Range constrained to within the current range representing zero dispersal; 2) Range after limiting movement to ~1 km per year, which is within maximum dispersal distances reported for anurans (Smith and Green 2005); 3) unlimited dispersal.

(following a modified PRISMA diagram format, Moher et al. 2009). Screening of records was done within Web of Science using a stepwise method, except for the abstract screening of the "range shift dataset" which was done in R by filtering text within the abstracts of the "full dataset" (R Core Team 2013). The "full range dataset" and "dispersal dataset" were used to compare publication trends through time and journal location of studies. The "reviewed dataset" was used for categorizing papers based on geographic location, taxonomic group, and methods used. An additional 100 prediction papers were subsampled for analysis of papers using forecasting or hindcasting due to the low number of such papers in the original 200-paper dataset that was reviewed. The first published record

Figure 4. Proportion of publications from a Web of Science search focused on species' range research that include dispersal data, calculated from "dispersal dataset" divided by the "full dataset" across all journals. Total publication proportion is calculated from the cumulative number of papers published from 1991, the first year in the Web of Science search, through and including the given year. Within year count publication proportion represents the proportion of papers with dispersal within that single given year.

that did (Yes) and did not (No) include dispersal data (N=100 for each). Methods used in the studies were different between the dispersal and non-dispersal sets (Pearson's χ², P < 0.001).