Big data suggest migration and bioregion connectivity as crucial for the evolution of Neotropical biodiversity

Abstract. Tropical America (the Neotropics) is the most biodiverse realm on Earth and might harbour more species than tropical Asia and Africa combined. The evolutionary history generating this outstanding diversity remains poorly understood partly because data on the geographic distribution of species are scarce. Collections from museums and herbaria can overcome this gap, but uncertain data quality hampers their use, especially in historical biogeography. Here, I highlight the results from recent studies quantifying diversification and bioregion connectivity in the Neotropics using large-scale species occurrence data, and argue that (i) recently developed software to analyse large-scale data provides a methodological route forward for biogeography, and (ii) biotic connectivity within and among bioregions is a common, but underappreciated, process in the evolution of Neotropical diversity.


Introduction
The tropical regions of Earth are prime examples of a fundamental question in biogeography: Why are there more species in some areas than in others despite comparable environments? Tropical America (the Neotropics) is a global biodiversity centre, harbouring three times as many flowering plant species as tropical Africa and potentially more species than tropical Africa and Asia combined (Antonelli et al. 2011). The diversity of this region is outstanding both in the species richness of many individual clades (Richards 1973, Gentry 1982) and in the number of species co-occurring in the large grid cells used in global richness analyses (Kreft et al. 2008, Slik et al. 2015) and in size-standardized vegetation plots (Gentry 1988, de Cáceres et al. 2012, Ricklefs et al. 2012, Terborgh et al. 2016). Most data available for comparisons across realms are from evergreen forests, but the Neotropics seem particularly species-rich across biomes (Fig. 1) and taxonomic groups (Fig. 2; Ceballos and Ehrlich 2006, Wiens 2007, Somveille et al. 2013). The drivers of Neotropic hyperdiversity and the difference in species richness across tropical realms have puzzled biologists for decades (Raven et al. 1974, Gentry 1982) but are surprisingly little studied (Couvreur 2015). While there is evidence that African lineages in particular have suffered increased extinction (e.g., Raven et al. 1974, Morley 2000), recent studies highlight the importance of increased speciation rates in the Neotropics, potentially linked to the uplift of the Andes, past marine incursions, or biotic interactions (Gentry 1982, Antonelli et al. 2011, Lagomarsino et al. 2016, Rangel et al. 2018). Additionally, biotic connectivity (here broadly defined as the movement of evolutionary lineages through geographic space and across bioregion and biome borders) emerges as potentially important for Neotropic diversification. This concerns (i) the relation of Neotropic (and in particular South American) biota with biota in other parts of the world and (ii) the dispersal of lineages within and among major bioregions and biomes in the Neotropics.
Once considered a continent in splendid isolation (Simpson 1980), South America might, in fact, have exchanged evolutionary lineages with other regions for much longer than previously expected. Furthermore, the Neotropics comprise different bioregions (regions of similar species composition; Morrone 2014) and biomes (large-scale habitat types; Olson et al. 2001), but it is unclear how these are connected on evolutionary time-scales. In general, the exchange of evolutionary lineages among bioregions and biomes can be important for plant diversification (Donoghue et al. 2014, Onstein et al. 2016), but it is considered limited due to lineages' tendencies to retain their ancestral ecological niches ("phylogenetic niche conservatism"; Wiens 2004). Indeed, biome shifts have been shown to be relatively rare in plants across the southern hemisphere (Crisp et al. 2009). In contrast, exchange of evolutionary lineages among bioregions and biomes might be relatively common in the Neotropics (Simon et al. 2009, Lohmann et al. 2013, Souza-Neto et al. 2016) and potentially be linked to species-rich regions (Simon et al. 2009) and changes in diversification rates (Areces-Berazain et al. 2017). Hence, the question emerges to what extent biotic connectivity has contributed to the accumulation of Neotropic biodiversity.
Frontiers of Biogeography 2019, 11.2, e40617
Biological fieldwork to collect data on species ecology and evolution in the tropics is difficult and time-consuming. Despite recent advances, a lack of data on species geographic distribution ("Wallacean shortfall") [...] understanding of the relative importance of extinction, speciation, and biotic connectivity for Neotropical diversification. Over the last decade, the digitization of natural history collections and occurrence data from observation networks has made unprecedented amounts of global species distribution information publicly available (e.g., via the Global Biodiversity Information Facility, GBIF¹). GBIF provides access to more than 140 million geo-referenced collection records from almost 200,000 angiosperm species and includes more than 1.3 billion occurrence records in total (as of May 2019). These data aggregate hundreds of years of scientific collection effort and, despite considerable biases and caveats (Meyer et al. 2016), have the potential to reduce at least the Wallacean shortfall, especially in the tropics (Kier et al. 2005, Feeley et al. 2011). However, their use for global-scale, data-driven biogeography is hampered by unclear data quality and the inability of recent software to process large amounts of data.
Figure 1. (a) The major biomes in the Neotropics, Afrotropics, and comparable latitudes in Asia (Indomalaya + Australasia) and (b) the raw species richness estimates of flowering plants in these biomes and biome area. For the majority of biomes, the species richness is highest in the Neotropics, a pattern especially prominent in forest biomes. Realms, biomes and biome area from Olson et al. (2001), species estimates from GBIF (2016), geographically cleaned and taxonomically scrubbed. Species can occur in multiple areas. The grey dots show the area of each biome in each realm.
Zizka (2018) provided a large-scale perspective on the evolution of Neotropical biodiversity based on large amounts of species occurrence data and fossil occurrences using novel software with a special focus on improving geographic precision and the reproducibility of analyses. Specifically, Zizka (2018) addressed three broad questions:
1) How to process large-scale species distribution data in biogeographic research in a scalable and reproducible manner?
2) What is the role of speciation and extinction rates for the outstanding diversity of the Neotropics?
3) How connected are biomes and bioregions within the Neotropics and Neotropic biota with other regions of the world?

Scalable and reproducible processing of large-scale species distribution data
Big data call for automated and scalable data processing. This raises three major practical challenges to the use of large-scale species distribution data in biogeography. (I) Data curation. Geo-referencing errors, taxonomic misidentification, and sampling biases are common data quality issues in public databases of species distributions (Meyer et al. 2016) but are rarely explicitly accounted for in biogeographic analyses. Of these three, the latter two are difficult to address at the record level without revision of the original collection material or additional targeted data collection. However, geo-referencing errors can be addressed, to a certain extent, automatically for large amounts of data. For instance, a case study in the Neotropic plant tribe Cinchoneae (Rubiaceae) demonstrated that GBIF data, after the careful removal of erroneously geo-referenced records guided by automated flagging, can accurately represent large-scale diversity patterns and ecological preferences of species (Maldonado et al. 2015). (II) Area delimitation. Discrete areas (e.g., bioregions) are often necessary for analyses, but finding biologically meaningful delimitations is difficult, limiting analyses to the continental scale or to arbitrary areas based on expert knowledge. (III) Assignment of species to discrete areas. Assigning species to discrete areas based on expert knowledge or checklist data is time-consuming, difficult to reproduce, and not scalable to large data sets.
¹ www.gbif.org
² See https://ropensci.github.io/CoordinateCleaner for detailed documentation
³ http://bioregions.mapequation.org/
⁴ https://github.com/azizka/speciesgeocodeR
⁵ From www.gbif.org
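To make challenge (III) concrete, the following minimal Python sketch assigns occurrence records to discrete areas with a ray-casting point-in-polygon test and a minimum-record threshold. It is an illustration only and not the SpeciesGeoCoder implementation; the function names, the rectangular areas, and the records are all hypothetical, and the planar test ignores map projections and polygons crossing the antimeridian.

```python
from collections import Counter, defaultdict


def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: True if (lon, lat) lies inside the polygon.

    `polygon` is a list of (lon, lat) vertices; the last vertex is
    implicitly connected back to the first.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that horizontal line.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def assign_species_to_areas(records, areas, min_records=1):
    """Map each species to the areas holding at least `min_records` records."""
    counts = defaultdict(Counter)
    for species, lon, lat in records:
        for name, polygon in areas.items():
            if point_in_polygon(lon, lat, polygon):
                counts[species][name] += 1
    return {
        species: sorted(a for a, n in per_area.items() if n >= min_records)
        for species, per_area in counts.items()
    }


# Hypothetical areas (simple rectangles) and records: (species, lon, lat).
areas = {
    "area_A": [(-70, -10), (-50, -10), (-50, 5), (-70, 5)],
    "area_B": [(-50, -25), (-40, -25), (-40, -10), (-50, -10)],
}
records = [
    ("sp1", -60.0, -3.0),
    ("sp1", -61.5, 0.5),
    ("sp1", -45.0, -15.0),
    ("sp2", -44.0, -20.0),
]

print(assign_species_to_areas(records, areas, min_records=2))
```

Raising `min_records` is one simple way to keep a single stray record from placing a species in an area; species without any record inside an area are omitted from the result in this sketch.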
In a recently developed workflow, Zizka (2018; Fig. 3) explicitly addressed these three challenges and facilitated the curation of large-scale occurrence data and their use in biogeography with three software packages:
1) CoordinateCleaner², an R package to automatically identify geo-referencing errors common to biological and fossil collection databases, such as zero coordinates, coordinates in the sea, coordinates assigned to capitals, to the geographic centroids of political units, or to biodiversity institutions, and spatial and temporal outliers, among others (Zizka et al. 2019);
2) Infomap Bioregions³, an extremely fast and user-friendly web app to identify taxon-specific biogeographic regions from species occurrence information based on the map equation algorithm (Edler et al. 2017);
3) SpeciesGeoCoder⁴, an R package to facilitate point-to-polygon classification, visualization of diversity patterns, and automated conservation assessments based on large amounts of species occurrence records (Töpel et al. 2016).
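The kind of automated flagging listed for CoordinateCleaner can be illustrated with a short Python sketch. This is not the package's code or API (CoordinateCleaner is an R package); it only mimics three of the tests named above (invalid coordinates, zero coordinates, and proximity to a reference point such as a capital), and the function names, the 10 km buffer, and the approximate coordinates used for Brasília are assumptions made for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance between two lon/lat points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def flag_record(lon, lat, reference_points, buffer_km=10.0):
    """Return the names of all tests that a record fails (empty list = clean)."""
    # Coordinates outside the valid range cannot be tested any further.
    if not (-180 <= lon <= 180 and -90 <= lat <= 90):
        return ["invalid"]
    flags = []
    # Exact zero coordinates usually indicate a failed geo-referencing step.
    if lon == 0 and lat == 0:
        flags.append("zero")
    # Records within the buffer around a reference point (e.g., a capital)
    # may have been geo-referenced to that place rather than the collection site.
    for label, (ref_lon, ref_lat) in reference_points.items():
        if haversine_km(lon, lat, ref_lon, ref_lat) <= buffer_km:
            flags.append(f"near_{label}")
    return flags


# Hypothetical reference point: approximate coordinates of Brasília.
reference_points = {"capital_BR": (-47.88, -15.79)}

for lon, lat in [(-60.0, -3.0), (0.0, 0.0), (-47.9, -15.8), (200.0, 10.0)]:
    print((lon, lat), flag_record(lon, lat, reference_points))
```

Returning flags instead of deleting records keeps the decision with the analyst, in line with the "removal guided by automated flagging" described for the Cinchoneae case study.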

Speciation, extinction and the differences in species richness among tropical regions
To quantify differences in speciation and extinction rates among tropical regions, Antonelli et al. (2015) combined c. 20 million occurrence records of angiosperms⁵ with a large-scale phylogeny (Zanne et al. 2014) using the software described above. From these data, the authors estimated area-specific diversification rates for different tropical and temperate regions, covering c. 22,000 species, using state-specific speciation and extinction models (FitzJohn 2012).
Interestingly, net diversification rates (speciation − extinction) were on average not higher in Neotropic plant lineages than in those of other tropical realms. This result contradicted the idea of increased speciation as a driver for the assembly of Neotropic biodiversity, but an improved phylogenetic sampling of evolutionary lineages would be necessary to clarify this finding. Furthermore, the results did show significantly elevated speciation and extinction rates in the Neotropics (c. 2-2.5 times higher than in the Afrotropics or tropical Asia), indicating exceedingly rapid evolutionary turnover. This suggests that Neotropical species are formed and replaced by one another at unparalleled rates, reflecting the many recent radiations characterizing South American plant diversity. The causes underlying this high species turnover might be associated with the substantial landscape dynamics that have affected northern South America since the Miocene, among other continent-specific differences such as biome sizes, niche space, and climatic history (Antonelli et al. 2011).
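The distinction between net diversification and turnover can be made explicit with a small numerical sketch. Under a constant-rate birth-death process, the expected number of lineages grows as N0 · exp((speciation − extinction) · t), so two regions can share the same net rate while differing strongly in how many speciation and extinction events occur. The rates below are invented purely for illustration and are not estimates from Antonelli et al. (2015); the function names are likewise hypothetical.

```python
import math


def net_diversification(speciation, extinction):
    """Net diversification rate: speciation minus extinction."""
    return speciation - extinction


def total_event_rate(speciation, extinction):
    """Summed per-lineage rate of speciation and extinction events."""
    return speciation + extinction


def expected_richness(n0, speciation, extinction, time):
    """Expected lineage count under a constant-rate birth-death process."""
    return n0 * math.exp((speciation - extinction) * time)


# Hypothetical rates in events per lineage per million years.
slow = {"speciation": 0.10, "extinction": 0.05}
fast = {"speciation": 0.25, "extinction": 0.20}

for label, rates in [("slow turnover", slow), ("fast turnover", fast)]:
    print(
        label,
        round(net_diversification(**rates), 3),
        round(total_event_rate(**rates), 3),
        round(expected_richness(10, **rates, time=20), 1),
    )
```

Both hypothetical regions accumulate the same expected richness, but lineages in the "fast turnover" region experience three times as many events, which corresponds to the pattern of species being formed and replaced at a high rate.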

The role of biotic connectivity in the evolution of Neotropical diversity
To quantify the connectivity of Neotropic biota with other regions of the world, Antonelli et al. (2015) reconstructed the evolutionary history of angiosperm lineage migration among tropical and temperate regions based on stochastic character mapping (Huelsenbeck et al. 2003). The results suggest comparable mean rates of immigration and emigration for tropical Africa and tropical Asia but consistently higher emigration from tropical America throughout most of the Cenozoic (the last 66 Ma). Hence, the Neotropics might have functioned as a "species pump" to the rest of the world. An ongoing biotic connection of South America with the North American continent was supported by a cross-taxonomic analysis of 415 predominantly Neotropical clades comprising 4,450 species from six major clades across the tree of life (amphibians, birds, ferns, flowering plants, mammals, and Squamata; Antonelli et al. 2018). An ancestral area estimation for these groups using different biogeographic models showed more than 1,400 successful shifts between Central America and South American bioregions since at least 50 million years ago (Fig. 4).
To quantify the biotic connectivity among bioregions and biomes within the Neotropics, Antonelli et al. (2018) reconstructed evolutionary shifts of lineages in the same six taxa among ten major bioregions and two biome types (closed canopy and open canopy) throughout the Cenozoic. The results suggested that biotic interchange had been common within the Neotropics, with more than 4,500 dispersal events among bioregions (Fig. 4), of which more than 2,000 were related to biome shifts. All regions had served as source and recipient of lineages, and there was generally high congruence in the directionality of dispersal events across taxa. For instance, all taxa showed a substantial interchange between Amazonia and Mesoamerica, the Atlantic Forests, the Cerrado and Chaco, and the Andean Grasslands (Fig. 4). These results contrast with the view that bioregion and biome shifts over evolutionary time are rare events and imply that even very dissimilar regions (in terms of climatic and environmental variables and inherent biota) do not evolve in isolation but were biologically interconnected over evolutionary time-scales. Amazonia (including the Andean slopes) emerged as the primary source region, supplying over 2,800 lineages to other regions, and hence could be considered the primary source of Neotropical biodiversity: not only did it generate enormous in situ diversity, it also supplied lineages to all other Neotropical regions.
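How source and recipient roles are read from a set of inferred shifts can be sketched in a few lines of Python. The example assumes that an ancestral area estimation has already produced a list of (source, target) region pairs; the shifts below are invented for illustration, and the function name is hypothetical rather than part of any tool used by Antonelli et al. (2018).

```python
from collections import Counter


def summarize_shifts(shifts):
    """Count emigration and immigration events per region.

    `shifts` is an iterable of (source_region, target_region) pairs;
    pairs with identical source and target are not dispersal events
    and are skipped.
    """
    emigration = Counter()
    immigration = Counter()
    for source, target in shifts:
        if source == target:
            continue
        emigration[source] += 1
        immigration[target] += 1
    return emigration, immigration


# Hypothetical shifts, e.g. pooled across clades.
shifts = [
    ("Amazonia", "Mesoamerica"),
    ("Amazonia", "Atlantic Forests"),
    ("Amazonia", "Cerrado and Chaco"),
    ("Mesoamerica", "Amazonia"),
    ("Atlantic Forests", "Amazonia"),
    ("Amazonia", "Amazonia"),
]

emigration, immigration = summarize_shifts(shifts)
for region in sorted(set(emigration) | set(immigration)):
    # Positive values mark net sources, negative values net recipients.
    print(region, emigration[region] - immigration[region])
```

A region with many more outgoing than incoming events would be interpreted as a net source in this tally.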
Figure 3. Exemplary workflow to use large-scale species occurrence or fossil records in historical biogeography. Tools presented by Zizka (2018) are marked in red. (A) CoordinateCleaner to clean common geo-referencing errors in collection data, (B) Infomap Bioregions to infer taxon-specific bioregions, and (C) SpeciesGeoCoder to assign records to discrete areas in a format ready to use for commonly used tools in historical biogeography, accounting for further details such as elevation or minimum occurrence thresholds. Solid lines: recent occurrences and DNA; dashed lines: fossils. Tools and methods that were used for data analysis but not further developed are shown in light grey.
© the authors, CC-BY 4.0 license
Finally, to test the connectivity of locations within bioregions, Zizka et al. (2018) used large-scale distribution information of angiosperms to identify putatively rare species (species with fewer than three collection records available from www.gbif.org) throughout the Neotropics and investigated their distribution. The fraction of these putatively rare species was low (below 5% of all collections in 100 × 100 km grid cells) and homogeneous throughout most parts of the lowland Neotropics and, in particular, lowland Amazonia. The two known localities of a given putatively rare species could be far apart (more than 200 km apart for 20% of the species, and more than 1,700 km apart for 5%), suggesting that a considerable proportion of rare plant species had large distribution ranges, including across bioregion borders. In lowlands, given one occurrence record, the location of the second record for many of these rare species was largely unpredictable, highlighting the need for intensive and broad biological sampling. In highlands above 1,000 m, especially in the Andes and the Guiana Shield, the fraction of putatively rare species was significantly higher, and the record distribution was more predictable. For instance, rare species were more often confined within the Andes. A virtually independent dataset from vegetation plots supported these results. The results showed that some species, while having very few collections, had surprisingly large distribution ranges and that disjunct distributions of rare species are in many cases largely unpredictable in lowland areas.
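The two steps described above (selecting species with fewer than three records and measuring how far apart their records lie) can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code of Zizka et al. (2018): the function names and the records are hypothetical, and distances are great-circle distances on a spherical Earth.

```python
import math
from collections import defaultdict

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance between two lon/lat points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def rare_species_spans(records, max_records=2):
    """Return {species: distance in km between its records} for rare species.

    A species counts as putatively rare if it has at most `max_records`
    records (fewer than three by default). Species with a single record
    get a span of 0.0.
    """
    by_species = defaultdict(list)
    for species, lon, lat in records:
        by_species[species].append((lon, lat))
    spans = {}
    for species, points in by_species.items():
        if len(points) <= max_records:
            spans[species] = max(
                (haversine_km(*a, *b)
                 for i, a in enumerate(points)
                 for b in points[i + 1:]),
                default=0.0,
            )
    return spans


# Hypothetical records: (species, lon, lat).
records = [
    ("sp_rare_far", -60.0, 0.0),
    ("sp_rare_far", -60.0, 1.0),
    ("sp_single", -55.0, -5.0),
    ("sp_common", -62.0, -2.0),
    ("sp_common", -62.5, -2.5),
    ("sp_common", -63.0, -3.0),
]

for species, span in rare_species_spans(records).items():
    print(species, round(span, 1), "km")
```

Summarizing these spans per grid cell or per elevation band would then give the kind of comparison between lowlands and highlands described in the text.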

Conclusions
Scalable and reproducible approaches for data curation and analysis are essential for biogeography in the age of 'big data'. Recently developed open-source software, including CoordinateCleaner, SpeciesGeoCoder, and Infomap Bioregions, facilitates scalable, reproducible, and data-driven analyses of large-scale occurrence data, in particular in combination with phylogenetic trees. These and similar tools help to unlock the full potential of datasets from digitized herbarium and museum collections around the world for biogeographic research, while at the same time addressing the errors in geographic information that are common in these datasets.
The outstanding species richness found today in Neotropic angiosperms is potentially associated with a high species turnover, an ongoing interchange of lineages with other regions of the world, and high biotic connectivity within and among bioregions and biomes. Overall, the observed distribution of species over large geographic distances and the exchange of evolutionary lineages among geographic realms, bioregions, and biomes seem prevalent across taxonomic groups in the Neotropics. These findings contrast with the idea of a dominant role of dispersal limitation and phylogenetic niche conservatism in the Neotropics and suggest that the continuous exchange of evolutionary lineages among realms, biomes, and bioregions is an important process in the evolution of the outstanding Neotropic biodiversity observed today.