Biogeographia – The Journal of Integrative Biogeography
A dataset of Tanaidacea from the Iberian Peninsula and surrounding areas

tanaidaceans in the area and, by making it open access, it will allow comparisons of the distribution of tanaidaceans in zoogeographic studies.


INTRODUCTION
Databases are efficient tools to compile useful information, and are increasingly used in marine ecology (Gerovasileiou et al. 2016, Hudson et al. 2016). A rising number of open access databases has been released during the last decade, many of them focused on particular animal groups, such as rotifers (Garlaschè et al. 2020), polychaete annelids (Pagliosa et al. 2014), corals (Madin et al. 2016), amphipods (Horton et al. 2013), and fishes (Froese and Pauly 2019). These databases gather and organise valuable information on target organisms that would otherwise be scattered in the literature, making the data readily available to address different biological and ecological questions based on spatially explicit occurrence data. Research in biogeography, conservation science, and the analysis of historical trends has already benefited greatly from these databases (Stein 2003).
Tanaidaceans (tanaids) represent an order of peracaridan crustaceans with around 1400 described species. This relatively low diversity, particularly when compared to the more than 6000 described species for other groups of peracaridans such as amphipods and isopods, most likely reflects the low attention that the group has received historically rather than its actual diversity (Appeltans et al. 2012, Błażewicz-Paszkowycz et al. 2012). It has been estimated that 22,600-56,500 species of tanaids might be waiting to be described (Appeltans et al. 2012), a hard task given the small number of zoologists currently engaged with the systematics of the group. Furthermore, tanaids are important from an ecological perspective, as they are part of the hyperbenthos, act as shallow burrowers, and play a key role in marine food webs (Mees & Jones 1997). Due to their ecological preferences, tanaids have also been used as bioindicators in several ecological studies (Vizzini et al. 2002, Ambrosio et al. 2014). Despite this importance, no worldwide tanaid databases are freely available to date. At the regional level, a checklist of tanaids occurring in Greek Seas is available (Koulouri et al. 2020).
Here we present the first open access database for tanaids from the Iberian Peninsula and adjacent archipelagos, including geographical and ecological data, along with remarks on different sampling methods. The database encompasses mainly marine waters of Spain and Portugal, but also part of the Moroccan and Atlantic French coastal areas due to their geographic proximity to the Iberian Peninsula. We expect that this database will facilitate future research on large-scale diversity patterns of the group, possibly helping to disentangle complex historical processes such as the colonisation of the Mediterranean after the Messinian Crisis or the drivers of shallow water endemism in the Macaronesian archipelagos.

Geographical and ecological data
To allow the inclusion of all references related to the Iberian Peninsula, a bounding box of 6°E, 48°N, 34°W, 20°N was established as the geographic limit of the survey. The Gulf of Biscay is thereby included, with the Celtic Sea as the northern limit and the Porcupine Bight as the northwestern one; the Balearic Sea is included as far as the Menorca Slope; the Portuguese and Spanish Atlantic archipelagos (Azores, Madeira, Savage Islands, and Canary Islands) are also included within these limits.
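As a minimal sketch of how such a bounding box can be applied, the following Python function tests whether a coordinate pair falls within the survey limits. The function name and the signed decimal-degree convention (east and north positive) are assumptions made for illustration, and the southern limit is taken here as 20°N, in keeping with the Western Sahara limit described for the dataset.

```python
# Survey limits in signed decimal degrees (east and north positive).
# SOUTH = 20.0 (20°N) is an assumption based on the Western Sahara limit.
WEST, EAST = -34.0, 6.0
SOUTH, NORTH = 20.0, 48.0


def in_survey_area(lat: float, lon: float) -> bool:
    """Return True if the point lies inside (or on the edge of) the box."""
    return SOUTH <= lat <= NORTH and WEST <= lon <= EAST


# Approximate coordinates, for illustration only.
print(in_survey_area(40.4, -3.7))  # a point in the central Iberian Peninsula
print(in_survey_area(60.0, -3.7))  # a point north of the box
```

A box test of this kind is only a first filter: records inside the box may still fall outside every ecoregion polygon.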
The geographical information was coded to allow the use of different spatial scales in diversity pattern analyses; thus, the study area was subdivided first into oceans, then into (marine) provinces, and finally into ecoregions, included as additional geographical information. This geographically nested structure is based on Marine Ecoregions of the World (MEOW), a classification built on climatic and ecological data and developed for marine ecosystem monitoring and conservation, covering all coasts and shelf waters to 200 nautical miles offshore (Spalding et al. 2007, 2012). Given the specificity of the area, with the presence of different archipelagos, we further divided some of the MEOW ecoregions to gain a finer resolution for future studies related to the Iberian Peninsula, obtaining ten ecoregions from the five MEOW ecoregions. The subdivision was performed using the software QGIS 3.10 (QGIS Development Team 2020) and the shape file is reported as Supplementary File 1. Nested within "Western Mediterranean", an additional ecoregion called "Balearic Islands" was used to enable studies on the specific diversity of the archipelago, distinguishing it from the peninsular coast. The Macaronesian ecoregion known as "Azores Canaries Madeira" was divided into its archipelagos ("Azores", "Madeira", and "Canary and Savage Islands") following the same rationale (see Freitas et al. 2019). The ecoregion known as "South European Atlantic Shelf" was divided into "Gulf of Biscay", "Portugal", and "Gulf of Cadiz" in order to obtain a clear separation between the gulfs, which differ in oceanographic conditions. "Alboran Sea" and "Saharan Upwelling" were not modified.
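The subdivision described above can be summarised as a simple lookup table. The sketch below is only an illustration in Python: the dictionary names are hypothetical and do not correspond to fields in the dataset or the shape file.

```python
# Each original MEOW ecoregion mapped to the ecoregions used in the dataset.
MEOW_TO_DATASET = {
    "South European Atlantic Shelf": ["Gulf of Biscay", "Portugal", "Gulf of Cadiz"],
    "Saharan Upwelling": ["Saharan Upwelling"],
    "Alboran Sea": ["Alboran Sea"],
    "Western Mediterranean": ["Western Mediterranean", "Balearic Islands"],
    "Azores Canaries Madeira": ["Azores", "Madeira", "Canary and Savage Islands"],
}

# Inverted mapping: dataset ecoregion -> parent MEOW ecoregion.
PARENT_MEOW = {
    sub: meow for meow, subs in MEOW_TO_DATASET.items() for sub in subs
}

n_ecoregions = sum(len(subs) for subs in MEOW_TO_DATASET.values())
print(len(MEOW_TO_DATASET), n_ecoregions)
print(PARENT_MEOW["Balearic Islands"])
```

The inverted mapping makes it possible to aggregate records back to the original MEOW units when a comparison with other MEOW-based studies is needed.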
After these additional divisions, the ecoregions of the dataset were: "Alboran Sea", "Azores", "Balearic Islands" (nested in "Western Mediterranean"), "Canary and Savage Islands", "Gulf of Biscay", "Gulf of Cadiz", "Madeira", "Portugal", "Saharan Upwelling", and "Western Mediterranean" (without the Balearic Islands) (Figure 1).
Figure 1. Ecoregion boundaries as used to cluster the records of the dataset. The map shows the ecoregions with a legend for their acronyms, together with the georeferenced location of the records described in this paper, coloured differently for published (Bibliographic survey) and unpublished data.
Depth limits were categorised following the zonation by Templado et al. (2012), developed to analyse the species assemblages of marine habitats in Spain. These groups are categorised with the discrete variable "depthZonation", including: "Mediolittoral", "Infralittoral", "Circalittoral", "Bathyal", and "Abyssal". In parallel to this detailed depth zonation, we also used a coarser separation into "Deep" and "Shallow" water through a variable named "deepShallow", with a boundary at 200 m depth, as this is generally regarded as the limit between the continental shelf and slope and was commonly used in previous tanaid studies (Błażewicz-Paszkowycz et al. 2012).
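The coarse "deepShallow" coding can be illustrated with a short Python sketch. The function name is hypothetical, and the treatment of a record at exactly 200 m (counted here as "Shallow") is an assumption, since the text does not state which side of the boundary it falls on. The "depthZonation" classes are not reproduced because their depth limits are given in Templado et al. (2012) rather than here.

```python
from typing import Optional

SHELF_BREAK_M = 200.0  # shallow/deep boundary stated in the text


def deep_shallow(depth_m: Optional[float]) -> Optional[str]:
    """Classify a depth in metres (positive downwards) as 'Shallow' or 'Deep'.

    Assumption: exactly 200 m counts as 'Shallow'.
    Records with no depth value return None.
    """
    if depth_m is None:
        return None
    if depth_m < 0:
        raise ValueError("depth must be non-negative")
    return "Shallow" if depth_m <= SHELF_BREAK_M else "Deep"


for depth in (0, 37, 200, 2114, 5370):
    print(depth, deep_shallow(depth))
```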

Bibliographic survey
All published records of tanaids within the geographic limits of the study area were gathered and analysed. For that purpose, we searched Google Scholar for all references published from 1828 to 2019 inclusive with the keywords "Tanaidacea", "Anisopoda" and "Tanaidae", retaining those reporting records for the selected area. All the available information was extracted and included in a dataset.

Unpublished records
The dataset was complemented with other records of tanaids from material collected between 1967 and 2016. [...] (Higgins and Thiel 1988, Sørensen and Pardos 2008). Individuals retained in a 62 µm mesh sieve were either bulk fixed in formalin and subsequently preserved in 20% ethylene glycol, or sorted alive and preserved in 100% ethanol. Tanaid specimens were mounted individually in a modified Hoyer's medium or in Fluoromount-G® and then examined with an Olympus BX51 microscope equipped with differential interference contrast (DIC) optics and an Olympus DP70 camera at the Meiofauna Laboratory (Universidad Complutense, Madrid).

Summary statistics
A total of 137 published sources (Table 2) were found and included in the dataset, providing 2706 records. The published sources included 122 research articles in peer-reviewed journals, 7 doctoral theses, and 9 sources of various types (e.g. environmental reports or regional inventories). In addition, the newly collected material preserved at the Meiofauna Laboratory in Universidad Complutense de Madrid added another 52 unpublished shallow water records (coded as source number 137: shallow water samples), and the deep-sea oceanographic cruises added 698 records (coded as source number 138: deep-sea cruises). The dataset from published and unpublished sources gathered a total of 3456 records (Supplementary File 2). The records cover all ten ecoregions and a wide bathymetric range, from 0 to 5370 m in depth (Supplementary File 3).
Table 2. List of bibliographic sources for each ecoregion. Acronyms are explained in Figure 1.
Ecoregion: 10 levels: "Gulf of Biscay", "Portugal", "Gulf of Cadiz", "Saharan Upwelling", "Alboran Sea", "Western Mediterranean", "Balearic Islands" (nested in "Western Mediterranean"), "Azores", "Madeira", "Canary and Savage Islands".
If "Substratum nature" is coded as: 1) "Inorganic", 3 levels: "Hard bottom" (more than 2 mm of granulometric size), "Soft bottom" (less than 2 mm of granulometric size), "Organic origin"; 2) "Vegetal organic", 2 levels: "Algae", "Plantae"; 3) "Animal organic": description of the faunal substratum as reported in the original publication.
Of the total number of records, 3001 (86.8%) are provided at species level, encompassing a total of 186 species; of the 455 remaining records, 40 correspond to individuals identified at family level and 402 to individuals identified at genus level, whereas 13 potentially address 5 species that were published as doubtful (e.g. flagged as cf. in the source publication). Overall, the records correspond to 22 families, in addition to 14 species that are considered incertae sedis (Supplementary File 3). The dataset includes records from both extant tanaid suborders, Tanaidomorpha Sieg, 1980 and Apseudomorpha Sieg, 1980. Additionally, 49 records of 24 species were within our geographical boundary but outside the ecoregions we defined (records outside ecoregions in Figure 1). Tanais dulongii is the only species recorded in all the ecoregions. Ecoregions covered by each bibliographic source can be found in Table 2.
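The record counts reported above can be cross-checked with a few lines of arithmetic. The Python sketch below uses only the figures given in the text; the variable names are illustrative.

```python
# Records by origin, as reported in the text.
published, shallow_unpublished, deep_cruises = 2706, 52, 698
total = published + shallow_unpublished + deep_cruises

# Records by taxonomic resolution.
species_level = 3001
family_level, genus_level, doubtful = 40, 402, 13
remaining = family_level + genus_level + doubtful

assert total == 3456
assert remaining == 455
assert species_level + remaining == total

# Share of records identified to species level.
species_share = f"{100 * species_level / total:.1f}%"
print(species_share)  # 86.8%
```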

The dataset
This dataset is composed of a single table (an xlsx file), in which each row represents a single record of a tanaid species at one geographical point. A total of 45 variables are reported for each record (Table 3). These include a unique identifier for each record, 11 variables addressing taxonomic ranks (e.g. Order, Family, etc.), 9 referring to sampling (e.g. method, area, volume, etc.), 10 representing geographic information (e.g. province, ecoregion, coordinates, etc.), 10 detailing ecological data (e.g. type of substratum, water temperature, etc.), and 4 describing exclusively bibliographic information (author, year of publication, etc.).
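Because each row is one record of one taxon at one point, simple summaries such as species richness per ecoregion reduce to grouping rows. The following Python sketch assumes the table has been exported to CSV; the column names ("recordId", "acceptedName", "ecoregion") and the four toy rows are hypothetical and are not taken from the dataset.

```python
import csv
import io
from collections import defaultdict

# Toy rows with hypothetical column names; not real records from the dataset.
SAMPLE = """recordId,acceptedName,ecoregion
1,Tanais dulongii,Portugal
2,Tanais dulongii,Azores
3,Apseudopsis latreillii,Portugal
4,Tanais dulongii,Portugal
"""


def species_per_ecoregion(handle):
    """Return {ecoregion: set of species names} from a table of records."""
    species = defaultdict(set)
    for row in csv.DictReader(handle):
        species[row["ecoregion"]].add(row["acceptedName"])
    return dict(species)


by_ecoregion = species_per_ecoregion(io.StringIO(SAMPLE))
richness = {eco: len(names) for eco, names in by_ecoregion.items()}
print(richness)
```

Collecting names into sets means that repeated records of the same species in an ecoregion are counted once.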

Geographic data
Geographic range: The dataset covers the Iberian Peninsula, the Spanish and Portuguese archipelagos (Azores, Madeira, Savage Islands, Canary Islands, and Balearic Islands), and surrounding areas. This includes the Moroccan and Saharan coasts, as well as Algeria as far as the coastal city of El Aouana. The geographical range was built by using and modifying MEOW (Spalding et al. 2007). In the North, the Gulf of Biscay was included up to the Celtic Sea northern limit and the Porcupine northwest limits; in the East, the Balearic Sea was included as far as the Menorca Slope; in the South, the limit was the Western Sahara limit; in the West, the limit was the Azores Exclusive Economic Zone boundary.
Quality control for geographic data: Quality control was performed using QGIS 3.10, by displaying coordinates within the MEOW boundaries. Anomalous records were individually analysed and amended.
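The check described above was carried out visually in QGIS. As a rough programmatic equivalent, a point-in-polygon test can flag records whose coordinates fall outside the polygon they are assigned to. The Python sketch below uses a standard ray-casting test on planar coordinates; the rectangular outline, record identifiers, and coordinates are hypothetical stand-ins, not the ecoregion boundaries of Supplementary File 1.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: True if (lon, lat) lies inside the polygon.

    `polygon` is a list of (lon, lat) vertices; the last vertex is
    implicitly joined to the first. Coordinates are treated as planar.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that line.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


# Hypothetical rectangular outline standing in for an ecoregion polygon.
TOY_ECOREGION = [(-10.0, 36.0), (-6.0, 36.0), (-6.0, 42.0), (-10.0, 42.0)]

# Hypothetical records: (identifier, longitude, latitude).
records = [("rec-1", -8.5, 39.0), ("rec-2", 2.0, 39.0)]
anomalous = [rid for rid, lon, lat in records
             if not point_in_polygon(lon, lat, TOY_ECOREGION)]
print(anomalous)
```

Flagged records would still need to be examined individually, as in the manual procedure, since a mismatch may come from either the coordinates or the assigned ecoregion.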

Ecological data
Habitat type: Habitats were reported as they were found in the original literature. Some examples include gravel, coarse sand, mud, Posidonia oceanica meadows, Zostera spp. meadows, or algae.
Depth: Depth ranges from intertidal to 5370 meters. Depth limits were categorised following the zonation by Templado et al. (2012). The boundary between shallow waters and deep sea was set at 200 meters.

Quality control for ecological data: The assignment of each record to a depth category and habitat was verified against current knowledge of the ecology of each species, where available.

Literature search
Literature search method: Google Scholar was used to search all the available literature from 1849 to 2019 containing the words "Tanaidacea", "Anisopoda" and "Tanaidae". From the resulting literature, the records for the selected area were retained.
Literature list: See Table 2.
Quality control for literature data: The completeness of the literature was confirmed by repeating the search twice and cross-checking with the literature lists reported in each paper.

Taxonomy
Taxonomic ranks: All extant Tanaidacea taxa were considered in this database, including the two currently accepted suborders Tanaidomorpha and Apseudomorpha.
Species names: Both the currently accepted name (according to WoRMS) and the species name as originally reported in each source have been compiled in the dataset, in separate columns.
Taxonomic methods: Field-sampled tanaids were identified to the lowest possible taxonomic rank following the available literature and WoRMS resources.
[...] (Dollfus, 1898) recorded at 2114 and 2704 meters. Furthermore, Sampaio et al. (2016) reported the shallow apseudomorph species Apseudopsis latreillii (Milne Edwards, 1828) across a depth range from 37 to 140 meters, without clarifying whether the species was found across the whole range or only in its upper part, which seems more likely given the other records in the area. While we compiled the information here as shown in the original literature, these remarks should be considered when the data are used in further analyses.

AUTHOR CONTRIBUTION
AGH, GGG, NS, AM and FP planned the study and sampled the shallow water tanaids. AGH identified the shallow water tanaids, surveyed the literature, and compiled the information needed for the dataset. GB sampled and identified the material from the deep-sea cruises. FP and DF provided facilities and support in both laboratory and field sampling. All authors contributed to the writing with additions and comments to the text.

ACKNOWLEDGEMENTS
[...] highly appreciated. Samples from Cova des Coll cave were collected by Dr. Thomas M. Iliffe (Texas A&M University), always happy to help in any cavernous matter. Samples from Garrucha and Denia were obtained and sorted by Dr. Jesús Benito (Universidad Complutense de Madrid), pioneer of meiofaunal studies in Spain.
A Graña Biological Station (Universidade de Santiago de Compostela), Toralla Marine Science Station (Universidade de Vigo), and Santa Pola Marine Research Center (CIMAR, Universidad de Alicante), and especially Andrés Izquierdo and Alfonso Ramos, allowed us to use their research facilities to perform different sampling campaigns. We also thank the Buceo Carboneras diving club for its warm welcome in Almería. AGH wishes to acknowledge Patricia Esquete (Universidade de Aveiro) for her invaluable help and mentoring in the world of tanaids. Lastly, the authors are grateful to an anonymous reviewer whose comments improved the manuscript.