The importance of defining measures of stability in macroecology and biogeography

Stability, the continuity of environments or habitats through space and time, is a widely used concept in macroecology and biogeography and is often invoked in studies attempting to explain the uneven spatial distribution of biodiversity. Stability can be measured in various ways and at various spatiotemporal scales; however, few studies explicitly define their use of the term. This makes interpreting and comparing studies difficult. We suggest an integrated approach to defining measures of stability in macroecology and biogeography. This approach addresses five key challenges concerning the biological, environmental and spatiotemporal scales at which stability is assessed, and how the complexity of change across time and space is summarised into a metric of stability. Using this approach allows for clarity around the choice, conceptualisation, communication and comparison of measures of stability.

The term "stability" appears widely, and in many different contexts, across ecology and evolutionary biology (Ives and Carpenter 2007, Grimm and Wissel 1997). In macroecology and biogeography, a wide range of studies have linked stability to the accumulation of biodiversity in specific areas and to processes such as evolution of the abiotic niche. For example, the relative climate stability of an area through time has been linked to high richness and endemism of species and genetic lineages, compared to less stable areas (Carnaval et al. 2009, Gavin et al. 2014, Cowling et al. 2015, Rosauer et al. 2015). In this context, stability is defined broadly as the continuity of environments, habitats, or populations through space and time. For instance, a site where a single habitat has occurred across millennia (e.g., rainforest) would be considered more stable than a site that has experienced multiple habitat switches (such as repeated shifts between rainforest and grassland) (Costa et al. 2018). Within this broad definition, measures of stability can vary in terms of the entity being measured (e.g., climate, species, or habitat), the spatiotemporal scale used (e.g., global studies over millions of years to local interannual studies), and the method of calculating it (e.g., the variance, mean or extremes). However, studies do not always clearly define their measure, leading to ambiguity in the interpretation of results. This is the issue we seek to address. We focus on stability as measured over millennia across regional to landscape scales, a topic of many studies, although the framework we describe can be applied to many different spatial and temporal scales.
Most studies on stability as a potential cause of diversity seek to identify landscapes exhibiting higher stability, as these areas tend to accumulate more biological diversity than areas with higher stochasticity or variability. This can occur through processes of speciation (as stability may promote speciation) and persistence (as stability may protect taxa from extinction). Stability can promote speciation over long time scales by allowing taxa more time to adapt to their local environment (Klopfer 1959, Fischer 1960) or by isolating populations in separate stable areas, allowing them to diverge (Haffer 1997). High species and functional diversity can then help to stabilise communities and biomes by buffering them against climatic changes over time, for example by increasing resistance as shown by Isbell et al. (2015).
The relatively short time scale of the analyses of stability we consider, typically from the Last Glacial Maximum (LGM, 21 kya) or Last Interglacial (LIG, 120 kya) to the present, means they are usually focused on persistence rather than speciation. Continuity of environments and habitats can allow diversity to persist by protecting older communities and lineages from extinction whilst they are lost in less stable areas. For example, stable regions in Africa are believed to have protected Gondwanan lineages such as ricinuleid spiders from extinction during the climatic changes following the breakup of Gondwana (Murienne et al. 2013). Persistence can be assessed explicitly, for instance using population genetic tests for sustained high population size or range expansion, often within the context of refugia, that is, climatically stable regions that allow taxa to persist while the climate in surrounding areas is unsuitable (e.g., Carnaval et al. 2009). Such insights may be relevant to deriving recommendations and policies on how to manage ecosystems for resilience to future climate change (Reside et al. 2013), imbuing the assessment of stability with important practical implications. However, it is important to note that attributes such as increased diversity over time may constitute a biological response to stability but are not measures of stability in themselves.

Measures of stability
The broad perspective on stability adopted across macroecology and biogeography still leaves open a vast array of possible measures of stability that can be employed in any given study. Few authors justify their particular measure, and the measures from different studies vary markedly in terms of the attribute of the system for which stability is being assessed (e.g., climate, habitat), the spatial and temporal scales of analysis, and the strategy used to synthesise complex spatiotemporal results into a summary metric. This leads to ambiguity about what is being measured and how to interpret results. For example, terms such as "climate stability" and "habitat stability" are often used without definition and sometimes interchangeably (e.g., Faye et al. 2016). This issue has been identified before in relation to community ecology (e.g., Grimm and Wissel 1997), but no clear solution has emerged. With a rising number of macroecological studies invoking stability in its various forms (Figure 1), it is important to strive for greater clarity around the definition of measures of stability employed in such studies.
Here, we propose a framework for conceptualising stability within the context of particular biogeographic or macroecological hypotheses and for better understanding the choices which need to be made in defining an appropriate measure of stability in any given context. These choices correspond to five questions that must be answered when conceptualising stability:
1. What are we measuring stability of?
2. What is the spatial scale?
3. What is the temporal scale?
4. How is the interaction between space and time addressed?
5. How do we summarise temporal variation into a single measure of stability for a site?
We hope this framework will assist authors in making informed decisions in selecting and defining measures of stability employed in their studies and will allow for more effective comparison of results across the body of research on this topic.
Figure 1. The number of macroecological studies invoking stability is rising. This shows the publication date for papers in Web of Science using the term "stability" and either "macroecolog*" or "biogeograph*" in their title, abstract, or keywords on 13 August 2019. The line depicts the loess regression as a visual aid.

A framework for defining measures of stability
Confusion around the concept of stability has been acknowledged by various authors (e.g., Pimm 1984, Grimm and Wissel 1997, Donohue et al. 2013), but despite this there is little consistency in how stability is defined and measured.
Our framework focuses on one stability property: spatial continuity through time within geographic or climatic space, as it may be used for predicting alpha and beta diversity at biogeographic scales. It builds on the checklist described by Grimm and Wissel (1997), which sought to help ecologists clarify how the term stability was used. That paper defined three fundamental categories of stability concepts: persistence (persistence through time), resilience (returning to reference state after disturbance), and constancy (staying essentially unchanged). However, we view these as the biological manifestations of a single stability concept, that is, spatial continuity through time. Spatiotemporal continuity is then the driver of other forms of stability.
Additionally, in the two decades since the publication of Grimm and Wissel's (1997) checklist, there have been many studies focusing on the methodology of operationalising the concept of stability. However, this literature has yet to be unified. For example, recent studies have revealed the importance of using fine temporal resolution to capture climate fluctuations (Fordham et al. 2018) and the importance of considering the temporal extent when defining areas of stability (Ashcroft et al. 2012). Other studies have focused on describing metrics for quantifying stability, including climate velocity (Loarie et al. 2009, Brito-Morales et al. 2018) and how the relationship between space and time changes with different metrics (Garcia et al. 2014). Here, we combine each of these components to provide a comprehensive conceptual framework for selecting and defining measures of stability in future studies.
The general concept of stability employed in our framework is illustrated in Figure 2, which shows a variable changing through time, for example species' ranges expanding and contracting through time but maintaining continuity in space. To derive a basic measure of stability for any given point in this region we can create a line graph showing change over time at that point (Figure 2b). From Figure 2, we can see there are several questions and challenges that arise in relation to key features of any given measure of stability, summarised in Table 1.
Figure 2. (a) Shows a variable (such as temperature or habitat suitability) changing across space (x and y axes) and through time (vertical axis). The planes show a region at different time points, and the shading represents the variable being measured, for example habitat suitability. The arrows track a single site through time. The choices in metric design are shown in italics. In order to measure stability we need to choose (1) the variable being measured, (2) the spatial scale it is being measured across, (3) the temporal scale being measured across, and (4) the way of measuring the interaction between space and time. (b) Shows stability for the site tracked in (a), summarised into a line graph. To do so, we need to choose (4) the interaction between space and time and (5) the metric used to summarise patterns to a single measure of stability for that site (vertical dashed line), for example the arithmetic mean.

What are we measuring stability of?
Stability can be measured in relation to many different attributes of a system or of a particular entity within that system. At the scales we are concerned with here, we can measure the variability of an environmental parameter such as annual mean temperature or the range of a species, assemblage or biome, measured for example using the bioclimatic envelope.
The simplest way to measure stability, and one of the most commonly used, is to examine it across one or more environmental parameters, such as a measurement of temperature or rainfall. This allows for the measurement of how much, or how fast, the environment has changed over time using measures such as climate velocity (the displacement rate of climate through time divided by the rate through space (Loarie et al. 2009)). Environmental or climate stability therefore refers to continuity in environmental variables at a specific location. For example, southern Africa has high climate stability as the rate of change of mean annual temperature over the last 21 ky has been low, while in contrast central Europe has low climate stability as it has experienced large changes in its temperature since the LGM. Measures of environmental stability can be used to encapsulate variation in the past or projected future (Garcia et al. 2014), or novel climates that have arisen, or are expected to arise (Williams et al. 2007). Environmental stability can also be used as a proxy for changes in the potential distribution of species or biomes if there are no distributional data available for the biological group of interest (Garcia et al. 2014).
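The climate velocity measure mentioned above can be illustrated with a minimal sketch that divides the temporal rate of change at a cell by the local spatial gradient. The temperature grids, cell size, time span, and function names below are hypothetical, and published implementations differ in how they estimate and smooth both gradients, so this should be read as an illustration of the ratio rather than as any particular study's method.

```python
import math

def spatial_gradient(grid, row, col, cell_km):
    """Central-difference gradient magnitude (units per km) at an interior cell."""
    d_dx = (grid[row][col + 1] - grid[row][col - 1]) / (2 * cell_km)
    d_dy = (grid[row + 1][col] - grid[row - 1][col]) / (2 * cell_km)
    return math.hypot(d_dx, d_dy)

def climate_velocity(grid_then, grid_now, row, col, cell_km, years):
    """Temporal rate (units/yr) divided by spatial gradient (units/km), in km/yr."""
    temporal_rate = (grid_now[row][col] - grid_then[row][col]) / years
    gradient = spatial_gradient(grid_now, row, col, cell_km)
    if gradient == 0:
        # Flat surface: any change implies an unbounded displacement rate.
        return math.inf if temporal_rate != 0 else 0.0
    return abs(temporal_rate) / gradient

# Hypothetical mean annual temperature (deg C) on a 3x3 grid of 10 km cells.
temp_lgm = [[10.0, 11.0, 12.0],
            [10.0, 11.0, 12.0],
            [10.0, 11.0, 12.0]]
temp_now = [[14.0, 15.0, 16.0],
            [14.0, 15.0, 16.0],
            [14.0, 15.0, 16.0]]

velocity = climate_velocity(temp_lgm, temp_now, 1, 1, cell_km=10.0, years=21000)
print(f"{velocity:.5f} km/yr")
```

Low values indicate that only a short displacement per year is needed to stay within the same conditions, which is how a low-velocity cell comes to be read as climatically stable.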
Models of species-level stability rest on estimates of the changing spatial distribution of a species' abiotic niche over time or the continuity in the spatial location of a species range through time. For example, desert pupfish have stable ranges because they inhabit a limited number of desert ponds that have moved little (Brown and Feldmeth 1971). This contrasts with species whose ranges are shifting rapidly either due to direct human intervention (introductions) or to track climate change. Species-level stability can be used to look at questions of extinction or migration under past or future climates (Nogués-Bravo 2009). It is usually measured using correlative ecological niche models (ENMs), which are fitted to the current realised niche then projected into the time periods of interest. There are several issues with this method, including the assumptions that species are in equilibrium with their environment and that a taxon's realised and fundamental niches are equivalent. Correlative ENMs also do not generally account for biotic interactions, non-analog climates, or niche shifts (e.g., Pearson and Dawson 2003, Nogués-Bravo 2009, Fitzpatrick and Hargrove 2009). Hence, there is strong interest in applying more mechanistic models of species stability (e.g., Fordham et al. 2012, Mathewson et al. 2016), but this approach remains difficult to scale up to large numbers of taxa. For now, it seems that practicality dictates use of correlative models in most cases despite their well-known limitations (Wiens et al. 2009).
Compositional or assemblage stability relates to changes in community composition (beta diversity) over time. For example, the Serengeti Plains in eastern Africa have high compositional stability as the community has changed little over time, possibly due to the low rainfall and small species pool (Anderson 2008). This contrasts with areas where the community composition has changed rapidly, for example through species introductions or species range shifts associated with climate change. Compositional stability is usually measured using macroecologically constrained "stacked" species distribution models (Guisan and Rahbek 2011), but distance matrix-based modelling techniques such as generalised dissimilarity models (GDM) are also used. These models assess the degree to which community composition has been stable over time.
Biome or ecosystem stability is analogous to species stability, but here the goal is to estimate the stability of the range of a biome rather than a species. It is measured with a particular regional assemblage in mind, usually by fitting ENMs to the realised niche of the biome, or sometimes using mechanistic dynamic vegetation models (Thuiller et al. 2008). These methods use models fitted in the present and projected into other time periods to assess the continuity and, hence, stability of the biome or vegetation type (e.g., Costa et al. 2018). Biome stability has been studied in a variety of systems, with clearest results for those with well-defined climatic boundaries such as rainforests (Graham et al. 2010, Rosauer et al. 2015) or regional forest to savanna transitions (Hirota et al. 2011).
These different types of biological stability are interlinked. Compositional stability is impacted by biome stability, as when a biome retreats or expands it affects the community composition at a site. Habitat stability is in turn affected by environmental stability, depending on how broad a climatic tolerance the ecosystem has (West and Salm 2003). This close interaction may explain why many studies looking at climate or habitat stability are unclear about which they are studying despite the concepts being quite distinct (Ashcroft 2010).

Environmental variables
Most studies of stability include a measure of environment, whether explicitly or in models such as ENMs. The term "environment" is very broad. For the current purpose, it comprises the abiotic variables describing a region, including its climate, geology, and topography. These variables can be looked at in two ways: as raw or as transformed variables. Raw variables are those directly measured in the environment, for example annual precipitation as measured by a weather station, or inferred through a model or proxy, such as annual mean temperature derived from a paleoclimate model. Estimating stability using these variables would directly measure changes in the abiotic environment. Alternatively, measures of stability can be derived using environmental variables which have first been statistically transformed to better reflect observed present-day patterns in the turnover of the species composition of communities across these gradients. For example, methods such as GDM and Gradient Forest use available biological data to statistically transform each of a set of raw environmental variables such that distances within the multivariate space defined by these transformed variables correlate as closely as possible with observed dissimilarities in present-day species composition between sampled sites (Ferrier et al. 2007, Ellis et al. 2012). This approach scales the relative effect that changes in different environmental variables are expected to have on compositional turnover (e.g., the relative importance of temperature versus precipitation), along with scaling variation in this effect at different points along any given gradient (e.g., a higher rate of turnover per unit change in precipitation at the low versus high end of a precipitation gradient).
This scaling of environmental space also allows changes over time to be expressed in terms of the compositional dissimilarity expected between two time points as a function of changes in multiple environmental variables (Blois et al. 2013).
Using either raw or transformed variables, the variables that are most important will depend on the physiology, niche, and ecological interactions of the biological entity of interest (Williams et al. 2012). Regions that are stable for one species or entity may not be stable for another. The best way to identify informative variables, at least for studying single entities, is to include physiological and ecological data, such as those obtained from performance trials and experimental or extensive field studies. However, for many systems these data are not available and are impractical to obtain, such that realised distributions are used as a surrogate. When direct physiological data are not available, data on the ecology of the taxa can be combined with environmental layers and presence/absence data (Williams et al. 2012).

What is the spatial scale?
The issue of scale has been discussed widely in ecological literature since at least the 1970s, with several comprehensive reviews published (e.g., Wiens 1989, Levin 1992, Chave 2013). The importance of conducting studies at an appropriate spatial scale is well-known (e.g., Chase and Leibold 2002, Williams et al. 2002, Cavender-Bares et al. 2006) as processes and correlates that are important at one scale may not be important at others. For example, biotic interactions tend to be important in describing species distributions at local scales, with decreasing importance as the scale increases. In contrast, climate is classically viewed as being an important driver of diversity at regional scale and above and less so at a local scale. However, recent work has shown the importance of microclimates for environmental filtering at local scales, with the mechanisms by which drivers influence biogeographic patterns also changing with scale (Chase and Leibold 2002, Hortal et al. 2010, D'Amen et al. 2017). Our framework recognises two major components of spatial scale, extent and resolution, both of which need to be chosen carefully based on the patterns and processes being studied. Spatial resolution, also known as grain or focus, relates to the size of the individual spatial units being analysed (Turner et al. 1989, Whittaker et al. 2001). These may be plots of a few square metres or grid cells of 100 kilometres. As the size of the spatial units increases, variation between cells decreases because more variation is captured in each individual cell (Levin 1992). This means that some patterns, such as micro-refugia, will be more apparent at a fine resolution that captures more variation between cells (e.g., Ashcroft et al. 2012).
Extent refers to the overall size of the analysis region, such as a protected area, biogeographic region, country or global scale (Wiens 1989). A greater extent generally captures more variance between the cells. It is important to note that very few systems are completely closed, so processes and patterns outside the chosen extent may still impact the results (Wiens 1989). Taxa perceive and interact with their environment at different scales, so using a priori behavioural and ecological data will assist in choosing an appropriate scale (Wiens 1989, Rahbek 2004, Anderson et al. 2010).
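The effect of resolution described above can be sketched by aggregating a fine grid into coarser blocks and comparing the variation between cells. The suitability values and helper names are hypothetical, and block averaging is only one of several possible aggregation rules.

```python
import statistics

def coarsen(grid, factor):
    """Aggregate a square grid into factor x factor blocks using block means."""
    size = len(grid) // factor
    return [[statistics.fmean(grid[r * factor + i][c * factor + j]
                              for i in range(factor) for j in range(factor))
             for c in range(size)]
            for r in range(size)]

def between_cell_variance(grid):
    """Population variance of all cell values in the grid."""
    return statistics.pvariance([value for row in grid for value in row])

# Hypothetical habitat suitability on a 4x4 fine-resolution grid.
fine = [[0.1, 0.9, 0.2, 0.8],
        [0.9, 0.1, 0.8, 0.2],
        [0.3, 0.7, 0.9, 0.9],
        [0.7, 0.3, 0.9, 0.9]]

coarse = coarsen(fine, 2)  # 2x2 grid of block means

print(between_cell_variance(fine), between_cell_variance(coarse))
```

In this toy grid the isolated high-suitability cells in the upper blocks are averaged away at the coarser resolution, which is the sense in which micro-refugia can be missed when cells are large.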

What is the temporal scale?
Like spatial scale, the temporal scale of a study needs to be defined in terms of both extent and resolution. The temporal extent considered will depend largely on the question being considered. For instance, looking at the stability of an area over a month would give a very different response to looking over a millennial timescale, with the location of areas of stability varying based on the time frame considered (Ashcroft et al. 2012). Without attention to the temporal scale, studies addressing the same question may be mistakenly compared despite measuring very different things. Most studies invoking stability focus on millennial time scales, often since the last interglacial or LGM, although some consider smaller temporal extents, including down to intra-annual time scales (e.g., Martin and Ferrer 2015, Gainsbury and Meiri 2017).
Temporal resolution refers to the number and spacing of time periods considered, represented in Figure 2 by the number of time slices included. A study comparing only the LGM to the present would have a different result to one considering the same temporal extent, but with modelled data for every 100 years, with higher temporal resolution leading to greater accuracy (Fordham et al. 2018). If, for example, the modelled range of a population became regionally extinct at one time but was later re-established, it would not have maintained continuity through time, so it would not be considered stable. However, if one considered only two time points, before and after this discontinuity, this break in continuity would not be identified.
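The discontinuity example above can be made concrete with a small sketch. The suitability series, its 1,000-year spacing, and the zero threshold are all hypothetical; the point is only that the same site can be classed as continuous or not depending on how many time slices are sampled.

```python
def is_continuous(series, threshold=0.0):
    """True if every sampled value stays above the threshold."""
    return all(value > threshold for value in series)

# Hypothetical habitat suitability at one site, one value per 1,000 years,
# from 21 kya (index 0) to the present (index 21).
suitability = [0.8, 0.7, 0.7, 0.6, 0.4, 0.2, 0.0, 0.0, 0.1, 0.3, 0.5,
               0.6, 0.6, 0.7, 0.7, 0.8, 0.8, 0.8, 0.9, 0.9, 0.9, 0.9]

two_points = [suitability[0], suitability[-1]]  # LGM and present only
coarse = suitability[::10]                      # every 10,000 years
full_series = suitability                       # every 1,000 years

for label, sample in [("two points", two_points),
                      ("coarse", coarse),
                      ("full", full_series)]:
    print(label, is_continuous(sample))
```

Only the full series registers the period of zero suitability, so only it would flag the site as having lost continuity.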
Studies at different temporal scales may not be comparable (Wiens 1989). Different processes operate at different scales, with a gradual shift from ecological to evolutionary processes as the temporal extent lengthens (Chave 2013). Yet, there is a link between variation at different scales, such as between annual temperature range and longer term temperature fluctuations (Janzen 1967). Studies at a large spatial scale often (though not always) use a large temporal scale as well (Wiens 1989). This means that the appropriate temporal scale for a study will depend on the processes being studied and the spatial scale chosen, as well as any time lags between the process and response (Anderson et al. 2010) and the generation times of the organisms being studied, if any (Levin 1992).

How is the interaction between space and time addressed?
Another challenge in describing stability in a region is considering how changes over both space and time interact. How can changes through time across the surrounding landscape be addressed in assessing stability for a single site? Three possible ways of doing this are local stability, neighbourhood stability, or dynamic landscape stability.
The simplest case is local or static stability, where a single site in a region is compared to itself through time (Graham et al. 2010). A stable area would be one that has remained continuously suitable or similar through time. This approach does not take the conditions in adjacent cells into account, although the spatial scale is still important. Local stability is the most commonly measured type of stability.
Neighbourhood stability considers the spatially dynamic nature of environments, whereby a species or biome may persist by moving locally to track changes in the environment. In neighbourhood stability, a single cell is compared to the surrounding cells in the region, to look for analogous environments. Climate change velocity uses this method, comparing change in climate over time to that over space (Sandel et al. 2011).
In more complex dynamic landscape stability models, entities such as species or biomes can shift to track changes across the landscape through time. The size of the surrounding region considered can be scaled depending on the question and organism of interest. The maximum distance allowed from the original cell of interest to a surrounding analogous cell depends on the capacity of the organism or biome to disperse, being larger for a high-dispersal organism such as a bird compared to a low-dispersal organism such as a lizard (Sandel et al. 2011).
The method chosen to combine space and time will have a significant impact on the final measure of stability, as shown in Figure 3. In this example, a site becomes completely unsuitable at one time, suggesting local extinction using a static stability model. However, when using a dynamic stability model (Graham et al. 2010), which allows species or biomes to track contiguous suitable environments through the landscape, changes are much less pronounced.
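The three ways of combining space and time can be contrasted in a minimal sketch. The grids, radius, step size, and function names are hypothetical, and the tracking rule used here (move at most a fixed number of cells per time slice into any suitable cell) is a deliberate simplification, not the dynamic model of Graham et al. (2010).

```python
def neighbours(cell, radius, n_rows, n_cols):
    """All cells within `radius` (Chebyshev distance) of `cell`, inside the grid."""
    r0, c0 = cell
    return [(r, c)
            for r in range(max(0, r0 - radius), min(n_rows, r0 + radius + 1))
            for c in range(max(0, c0 - radius), min(n_cols, c0 + radius + 1))]

def local_series(grids, site):
    """Static stability: the site is compared only with itself."""
    return [g[site[0]][site[1]] for g in grids]

def neighbourhood_series(grids, site, radius):
    """Best value found within a fixed radius of the original site."""
    n_rows, n_cols = len(grids[0]), len(grids[0][0])
    cells = neighbours(site, radius, n_rows, n_cols)
    return [max(g[r][c] for r, c in cells) for g in grids]

def dynamic_series(grids, site, step, threshold=0.0):
    """Track suitable cells through time, moving at most `step` cells per slice."""
    n_rows, n_cols = len(grids[0]), len(grids[0][0])
    occupied, series = {site}, []
    for g in grids:
        reachable = {n for cell in occupied
                     for n in neighbours(cell, step, n_rows, n_cols)}
        occupied = {(r, c) for r, c in reachable if g[r][c] > threshold}
        # An empty set means the entity was lost and cannot recolonise.
        series.append(max((g[r][c] for r, c in occupied), default=0.0))
    return series

# Hypothetical suitability along a 1x5 transect at four time slices;
# the suitable patch shifts one cell eastward per slice.
grids = [[[0.9, 0.2, 0.0, 0.0, 0.0]],
         [[0.0, 0.8, 0.1, 0.0, 0.0]],
         [[0.0, 0.0, 0.7, 0.0, 0.0]],
         [[0.0, 0.0, 0.0, 0.9, 0.0]]]
site = (0, 0)

local = local_series(grids, site)
neighbourhood = neighbourhood_series(grids, site, radius=1)
dynamic = dynamic_series(grids, site, step=1)
print(local, neighbourhood, dynamic, sep="\n")
```

Under these assumptions the local series drops to zero after the first slice, the neighbourhood series after the second, and the dynamic series never does, mirroring the contrast between static and dynamic models described above.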

How do we summarise temporal variation into a single measure of stability for a site?
Having resolved the first four challenges, a final decision is choosing a metric to summarise temporal variation for a site into a single measure. There are six commonly used classes of metric (see Table 2): difference between time periods, mean, rate of change, extremes, presence in all time periods, and variance between time periods. Different metrics emphasise different biological processes, so their choice should be driven by the system and question being studied. For example, extremes such as very low suitability may indicate bottlenecks in a population, while the geometric mean is useful in showing whether a region was continuously suitable through time. Some metrics rely on decisions made in other steps. Climate velocity, for example, is a measure of the rate of change of the environment but assumes some form of dynamic stability (where entities can track changes across the landscape) (Ma et al. 2016).
Figure 3. (a) As in Figure 2, showing a variable (for example, habitat suitability for a species) changing over space (x and y axes) and time (vertical axis). Here we show different methods of combining space and time when measuring stability for a site: local stability (red arrow), neighbourhood stability within a radius of the original site (red arrow combined with the circle around the site), and fully dynamic landscape stability allowing for tracking across the landscape (yellow arrow). (b) Shows how stability for that site would be measured across time using all three methods for combining space and time. The shaded bar represents the value of the variable being measured (e.g., habitat suitability for the site), and each line in the plot represents a method of combining space and time. (c) The final step of measuring stability is to obtain a single value for the stability at each site. This illustrates some metrics for doing this. Possible metrics include (1) extremes (shown as the minimum), (2) difference or anomalies (shown as the difference between one end of the time series and the extremes), (3) geometric mean, (4) arithmetic mean, or (5) percentage of time in a given range of values (with the bracket indicating a hypothetical range of values).
Table 2. Commonly used metrics for summarising stability. Biological meanings are defined assuming that stability is being measured for climate, but similar interpretations apply for other levels of stability.
Metric | Definition | Examples of specific metrics | Biological question
Difference between time periods | The amount of change that has occurred between time periods. | Climate anomalies (e.g., Sonne et al. 2016) | How similar is the current environment/available niche to environments in other time periods?
Mean | The climate or suitability of a location averaged across time. | Arithmetic mean; geometric mean (e.g., Graham et al. 2010) | What climatic conditions have taxa had to adapt to?
Rate of change of environment | The speed at which the environment has changed over time. | Climate velocity (Ma et al. 2016) | How well can taxa track the changes in climate?
Extremes | The most extreme conditions or suitability experienced over time. | |
Variance between time periods | | Standard deviation (e.g., Brown et al. 2014) | How much climatic variability have taxa experienced?
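The classes of metric listed in Table 2 can be sketched for a single site as follows. The suitability series, the time step, the range limits, and the dictionary keys are hypothetical, and each entry is just one simple representative of its class rather than a recommended formula.

```python
import math
import statistics

def summarise(series, years_per_step, low, high):
    """Collapse one site's time series (oldest value first) into candidate metrics."""
    steps = list(zip(series, series[1:]))
    return {
        # Difference between time periods: change between the two end points.
        "difference": series[-1] - series[0],
        # Mean: arithmetic and geometric (the latter is zero if any value is zero).
        "arithmetic_mean": statistics.fmean(series),
        "geometric_mean": math.prod(series) ** (1 / len(series)),
        # Rate of change: mean absolute change per year between slices.
        "rate_of_change": sum(abs(b - a) for a, b in steps) / (len(steps) * years_per_step),
        # Extremes: shown here as the minimum.
        "minimum": min(series),
        # Presence in all time periods.
        "present_throughout": all(value > 0 for value in series),
        # Variance between time periods, expressed as a standard deviation.
        "std_dev": statistics.pstdev(series),
        # Share of time slices falling within a chosen range of values.
        "share_in_range": sum(low <= value <= high for value in series) / len(series),
    }

# Hypothetical habitat suitability at one site, one value per 5,000 years.
suitability = [0.8, 0.4, 0.0, 0.4, 0.8]
metrics = summarise(suitability, years_per_step=5000, low=0.3, high=1.0)

for name, value in metrics.items():
    print(name, value)
```

For this series the end-point difference is zero and the arithmetic mean is moderate, yet the minimum, geometric mean, and presence test all reveal a complete loss of suitability mid-series, which illustrates why the choice of metric should follow from the question being asked.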

Implementing the framework
Together, these five challenges make up a framework for designing and communicating measures of stability at the biogeographic scale. By working through each of these challenges sequentially, a more robust measure of stability that is relevant to the hypothesis being tested will be designed and communicated. Explicitly considering the variable being measured will ensure that the results can be interpreted in a biologically meaningful way. The choice of spatial and temporal scales will affect the drivers and mechanisms that can be tested for. How stability is summarised into a single number for each site, through both the choice of how space and time interact and the choice of metric, will change the biological meaning of the result and which hypotheses can be tested. Figure 4 summarises the challenges and the options available for each.
Unfortunately, while there have been a few studies measuring the impact of one specific aspect of stability, for example temporal resolution (Fordham et al. 2018) or dynamic and static stability (Graham et al. 2010), there have been no studies systematically altering how stability is measured across the five dimensions of stability. This gap in the literature means that, while explicitly considering how stability is measured is important from a conceptual and communication perspective, it is difficult to know what impact the current lack of clarity has on the results of studies. Future studies that systematically investigate this will allow the impact of the choice of stability measure to be quantified.
Despite this gap, some insight can be gained by comparing the results of studies investigating the same region but using different measures of stability. For example, there has been a lot of research on stability of the Australian Wet Tropics rainforests, starting with some of the earliest spatial models of paleoclimate (Nix and Switzer 1991). While this is an intensively studied region, with broad patterns of stability well-established from both paleomodeling and paleoecological data (Vanderwal et al. 2009), variation in the details of results occurs. Much of this is due to differences in the stability metrics used. For example, using dynamic stability consistently shows greater connectivity between refugial areas compared to using static stability (Graham et al. 2010, Rosauer et al. 2015). Changing the spatial extent can make a large difference to predictions of refugia (e.g., Vanderwal et al. 2009). Similarly, the differences in the refugia identified by Bell et al. (2010) and Moussalli et al. (2009) are likely due to a combination of the taxa chosen (widespread versus montane skinks) and the metrics used to summarise across time, specifically the geometric mean versus the product of suitability.
As can be seen in the Wet Tropics example above and in Box 1, the framework offers a clear foundation for choosing the most appropriate way of measuring stability for a given hypothesis. Doing so, and clearly communicating the choices made and the reasons behind them, will help to enhance interpretation and comparison across multiple studies in this field, while future research will help clarify the quantitative importance of these decisions.

Box 1. An example of the framework
Here, we give two examples of how this framework can be used to determine possible approaches to measuring stability (see Figure 5).
Our first example uses stability to test the hypothesis "Areas that acted as refugia through the last glacial-interglacial period have shaped patterns of phylogenetic diversity (PD) in the Australian Wet Tropics (AWT) Rainforest". To test this hypothesis, the variable being measured would be the biome as a whole, particularly as tropical rainforest as a biome is well-defined climatically and so can be readily modelled using a small number of environmental variables (Hilbert et al. 2007). The spatial scale would ideally be a local to regional resolution, to allow for the identification of fine patterns of PD at the same resolution, with an extent slightly larger than the AWT, including a buffer to allow for past climatic changes and reduce edge effects. The temporal scale would be an extent of the present to the LGM, with as fine a resolution as possible given the available data and the generation time of the taxa. A common practice is to use only a few time periods (the present, the LGM, and one or two intermediary points in the Holocene), representing the variability observed in pollen records (Kershaw and Nix 1988). While this reduces computation time, such a low resolution means that key features, such as periods of high velocity, could be missed. Thus, temporal resolution would ideally be of centuries or even decades (e.g., Fordham et al. 2018). Allowing space and time to interact through dynamic landscape stability allows the biome to shift and track suitable climatic conditions (Graham et al. 2010). There are several appropriate metrics that could be used to identify refugia, for example the rate of change (e.g., climate velocity) or the minimum suitability over time. In contrast, the average suitability over time would not be appropriate, as areas that have been moderately unsuitable but stable could get the same score as areas that have fluctuated between being highly suitable and highly unsuitable.
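The point about the average can be illustrated with a minimal sketch. The two suitability series below are hypothetical, and the mean absolute change between successive time slices is used here only as a simple temporal proxy for a rate of change; it is not a calculation of climate velocity.

```python
import numpy as np

# Hypothetical suitability (0-1) at five time slices for two sites.
# Values are illustrative assumptions only.
stable_site = np.array([0.4, 0.4, 0.4, 0.4, 0.4])
fluctuating_site = np.array([0.8, 0.0, 0.8, 0.0, 0.4])

def summarise(series):
    """Summarise a suitability time series with three candidate metrics."""
    return {
        # Average suitability over time.
        "mean": float(np.mean(series)),
        # Minimum suitability over time.
        "minimum": float(np.min(series)),
        # Mean absolute change between successive time slices
        # (a simple proxy for rate of change, not climate velocity).
        "mean_abs_change": float(np.mean(np.abs(np.diff(series)))),
    }

stable_summary = summarise(stable_site)
fluctuating_summary = summarise(fluctuating_site)

print("stable site:", stable_summary)
print("fluctuating site:", fluctuating_summary)
```

In this toy case both sites have the same average suitability, so the average cannot separate them, whereas the minimum and the mean absolute change both distinguish the stable site from the fluctuating one.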
Our second example uses stability to identify current microrefugia for a low-dispersal endangered species with a shrinking range induced by climate change. Here, the variable being measured is species stability. The spatial scale would be a local extent with fine resolution in order to incorporate microclimate observations (e.g., Ashcroft et al. 2012). Temporal scale would likely be an extent of fifty to one hundred years, possibly including future projections, with a resolution of years (e.g., Cheddadi et al. 2017). Static stability may be appropriate here as the aim is to identify areas to focus conservation efforts on. Finally, the most appropriate metric would likely be the presence of the species at a site in all time periods.

Figure 5. Two examples of how the methodological framework for stability in macroecology and biogeography can be used. (a) Shows appropriate choices for measuring stability when testing the hypothesis "Areas that acted as refugia through the last glacial-interglacial period have shaped patterns of phylogenetic diversity in the Australian Wet Tropics". (b) Shows appropriate choices for measuring stability over much smaller spatiotemporal scales when aiming to identify current microrefugia for an endangered species with a shrinking range.

Implications for future projections and conservation
While the concept of stability has traditionally been used to study the past, an increasing number of studies use the concept of stability to identify areas that may act as refugia under future climate change. These can then be used to evaluate current reserve systems and incorporated into conservation planning (Reside et al. 2013), with refugia now being considered in the creation of government policy as well. For example, the Australian Government's Biodiversity Conservation Strategy explicitly references the need to "identify and protect climate change refugia" (Natural Resource Management Ministerial Council 2010).
With such direct, practical implications, it is even more vital that stability is clearly defined and that an appropriate measure is used. Multiple studies have shown that the identification of future refugia and, hence, appropriate reserve choices are heavily dependent on the methodological choices made (Ashcroft et al. 2012, Keppel et al. 2012, Reside et al. 2013). Employing our framework in studies of future climate change will help ensure that sound conservation recommendations can be made.