- Braun, Camrin D;
- Arostegui, Martin C;
- Farchadi, Nima;
- Alexander, Michael;
- Afonso, Pedro;
- Allyn, Andrew;
- Bograd, Steven J;
- Brodie, Stephanie;
- Crear, Daniel P;
- Culhane, Emmett F;
- Curtis, Tobey H;
- Hazen, Elliott L;
- Kerney, Alex;
- Lezama‐Ochoa, Nerea;
- Mills, Katherine E;
- Pugh, Dylan;
- Queiroz, Nuno;
- Scott, James D;
- Skomal, Gregory B;
- Sims, David W;
- Thorrold, Simon R;
- Welch, Heather;
- Young‐Morse, Riley;
- Lewison, Rebecca L
Species distribution models (SDMs) are becoming an important tool for marine conservation and management. Yet while the diversity and volume of marine biodiversity data available for training SDMs are increasing, little practical guidance is available on how to leverage distinct data types to build robust models. We explored the effect of different data types on the fit, performance and predictive ability of SDMs by comparing models trained with four data types for a heavily exploited pelagic fish, the blue shark (Prionace glauca), in the Northwest Atlantic: two fishery-dependent (conventional mark-recapture tags, fisheries observer records) and two fishery-independent (satellite-linked electronic tags, pop-up archival tags). We found that all four data types can result in robust models, but differences among spatial predictions highlighted the need to consider ecological realism in model selection and interpretation regardless of data type. Differences among models were primarily attributed to biases in how each data type, together with its associated representation of absences, sampled the environment and summarized the resulting species distribution. Outputs from model ensembles and from a model trained on all pooled data both proved effective for combining inferences across data types, and both provided more ecologically realistic predictions than individual models. Our results provide valuable guidance for practitioners developing SDMs. With increasing access to diverse data sources, future work should further develop truly integrative modeling approaches that explicitly leverage the strengths of individual data types while statistically accounting for their limitations, such as sampling biases.