Comparing and Integrating Fish Surveys in the San Francisco Estuary: Why Diverse Long-Term Monitoring Programs are Important

Many fishes in the San Francisco Estuary have suffered declines in recent decades, as shown by numerous long-term monitoring programs. A long-term monitoring program, such as the Interagency Ecological Program, comprises a suite of surveys, each conducted by a state or federal agency or academic institution. These types of programs have produced rich data sets that are useful for tracking species trends over time. Problems arise from drawing conclusions based on one or few surveys because each survey samples a different subset of species or reflects different spatial or temporal trends in abundance. The challenges in using data sets from these surveys for comparative purposes stem from methodological differences, magnitude of data, incompatible data formats, and end-user preference for familiar surveys. To improve the utility of these data sets and encourage multi-survey analyses, we quantitatively rate these surveys based on their ability to represent species trends, present a methodology for integrating long-term data sets, and provide examples that highlight the importance of expanded analyses. We identify areas and species that are under-sampled, and compare fish salvage data from large water export facilities with survey data. Our analysis indicates that while surveys are redundant for some species, no two surveys are completely duplicative. Differing trends become evident when considering individual and aggregate survey data, because they imply spatial, seasonal, or gear-dependent catch. Our quantitative ratings and integrated data set allow for improved and better-informed comparisons of species trends across surveys, while highlighting the importance of the current array of sampling methodologies.


Dylan K. Stompe*, Peter B. Moyle, Avery Kruger, John R. Durand
Volume 18, Issue 2, Article 4

INTRODUCTION
The San Francisco Estuary (estuary) is an anthropogenically altered, geographically complex estuary that drains a watershed of more than 194,000 square kilometers in northern California (Conomos et al. 1985). Historically, the estuary supported productive commercial and recreational fisheries for both native and introduced species (Scofield 1931; Yoshiyama et al. 1998). Rapid human population growth and increasing demands for water resulted in overharvest of many fish species, invasions of non-native species, and widespread habitat alteration (Nichols et al. 1986; Cloern and Jassby 2012). These factors led in turn to the decline of some native and long-established non-native species, as well as some extinctions (Kohlhorst 1999; Moyle 2002; Sommer et al. 2007). From 1959 to the present, state and federal agencies and the University of California-Davis established numerous surveys to document the status of important estuary fish species. At least 14 of these surveys have been conducted continuously for 17 years or more (Table 1; Appendix A, Table A1). Survey methods include the use of a variety of trawls, beach seines, gill nets, and fyke traps. Most surveys were initiated to track the abundance of either juvenile Striped Bass (Morone saxatilis) or juvenile Chinook Salmon (Oncorhynchus tshawytscha). Since their inception, many have shifted emphasis to track the abundance of Delta Smelt (Hypomesus transpacificus) and other endangered species. Methodologies remained largely consistent, and survey crews generally recorded all species captured, resulting in a long-term record of trends in fish abundance and diversity.
The challenges in using data sets from these surveys for comparative purposes result from the magnitude of data from each survey paired with incompatible data formats (e.g., species coding, units, and file type). Problems arise in drawing conclusions based on one or a few surveys, because each survey samples a different subset of species or reflects different spatial or temporal trends in fish abundance. Because of disparate data formats and species coding, researchers and managers rarely conduct analyses across the breadth of data sets from these surveys. We identified these issues through our own exploratory analysis of trends in abundance of estuary fish species across these surveys, which proved difficult and time-consuming.
Here, we compare relative catch of different fish species and assemblages across 14 of these surveys, and then provide methods to integrate their data sets for analysis of broad species trends. We use the integrated data set to provide examples of disparities in catch for select species, and to make comparisons of results with fish salvage data (referred to hereafter as 'salvage') from the State Water Project (SWP) and Central Valley Project (CVP) water export facilities in the South Delta. Comparisons were made with salvage data to explore the utility of this data-rich yet often overlooked resource to estimate fish abundance. Finally, we selected a subset of surveys that can be easily compared because of consistency of effort over time. We use these to evaluate the long-term record of trends in abundance of four important fish species identified with the Pelagic Organism Decline (POD; Sommer et al. 2007). Our study should complement recent work that the Interagency Ecological Program has taken toward making data sets of the San Francisco Estuary more accessible. To integrate data sets, we reformatted fish and water-quality data to provide consistency across all surveys. Patterns derived from the integrated data set are valid at population scales and can be used to compare relative abundance of fish caught in each survey. Integrated data allow basic questions posed by managers to be answered quickly and efficiently, and results can suggest the need for further in-depth analysis. For example, Dahm et al. (2019) used an early version of our approach of identifying relative survey selectivity to suggest improved monitoring in the Delta by using whole fish assemblages rather than just endangered native fishes. To demonstrate the utility of the integrated data, we address the following questions:
• How much redundancy is there across surveys?
• What areas and species are inadequately sampled?
• What are the abundance trends for POD species across surveys?
• Are salvage data consistent with other surveys?

METHODS
We evaluated and integrated the data from 14 surveys in a series of steps. First, we estimated which species and assemblages were best represented in the surveys, producing what we termed "species-survey ratings." We then combined the data from these surveys into one open-access data set with associated water-quality and catch data, which we call the "SFE Integrated Data Set" (SFE IDS). Using the SFE IDS, we compared differences in catch of POD species among all surveys as well as salvage. Finally, to more confidently evaluate trends in species abundance across multiple surveys, we selected a subset of eight surveys from the SFE IDS that were most comparable in terms of longevity and consistency of effort. We combined these eight surveys into what we termed the "8-Survey Index" and used it to evaluate trends in POD species abundance.

Species-Survey Ratings
As an exploratory effort to quantify which individual survey data were best suited for analysis of trends in species abundance, we constructed an equation to rate species-survey relationships. We developed these ratings using the equation:

$$R = \sqrt{\frac{f_{sp}}{n} \times \sqrt[3]{\frac{T_c}{M_c}}} \qquad (1)$$

where $R$ represents the species-survey rating, $f_{sp}$ is the number of years in which a given species was caught in the survey, $n$ is the total number of years in which a survey has operated, $T_c$ is the total catch of a given species over the life of the survey, and $M_c$ is the total catch of the most caught species over the life of the survey. R-values were calculated for 36 species (Appendix A, Table A2) that were selected based on current or historical prevalence within the Delta (Dahm et al. 2019; Table 2). Higher R-values indicate better species representation in the survey. Newer estuary surveys were omitted because of limited data, but they will become increasingly useful as their durations increase.
Equation 1 was constructed iteratively to maximize spread of R-values between zero and one. The first portion of the equation ($f_{sp}/n$) penalizes surveys that do not consistently catch a species, while the second portion ($\sqrt[3]{T_c/M_c}$) standardizes catch in relation to the maximum individual species catch for a given survey. The square and cube root portions of the equation are applied so that highly abundant species, such as Threadfin Shad (Dorosoma petenense), do not overwhelm those species that exist at intrinsically lower population levels. An R-value of one corresponds to the species that has been caught in the highest cumulative numbers and frequency for a given survey, and a zero corresponds to any species that was not caught over the life of a given survey. We also evaluated selectivity of surveys for certain fish assemblages (pelagic, benthic, fringe, submerged aquatic vegetation [SAV]; Appendix A, Table A2). We used mean R-values per assemblage for all surveys and salvage to compare the overall relative sampling selectivity of assemblages (Table 3).
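To illustrate how the rating behaves, the following Python sketch computes R for a few hypothetical species. It assumes that Equation 1 has the form R = sqrt((f_sp / n) × cbrt(T_c / M_c)), which is inferred from the description of its frequency, cube root, and square root portions. The function name and all catch numbers are invented for illustration and are not taken from Table 2.

```python
def species_survey_rating(years_caught, years_operated, total_catch, max_species_catch):
    """Species-survey rating R under the assumed form
    sqrt((f_sp / n) * cbrt(T_c / M_c))."""
    if years_operated <= 0 or max_species_catch <= 0:
        raise ValueError("survey must have operated and caught at least one fish")
    frequency = years_caught / years_operated                        # f_sp / n
    relative_catch = (total_catch / max_species_catch) ** (1 / 3)   # cube root of T_c / M_c
    return (frequency * relative_catch) ** 0.5


# Hypothetical survey that ran for 40 years; its most-caught species totals 100,000 fish.
MAX_CATCH = 100_000
examples = {
    "most-caught species": (40, 40, 100_000),
    "common species": (40, 40, 1_000),
    "rare species": (10, 40, 100),
    "never caught": (0, 40, 0),
}

ratings = {
    name: species_survey_rating(f_sp, n, t_c, MAX_CATCH)
    for name, (f_sp, n, t_c) in examples.items()
}

for name, r in ratings.items():
    print(f"{name}: R = {r:.2f}")
```

Under this assumed form, a species caught every year at only 1% of the top species' total still rates about 0.46, which shows how the roots keep very abundant species from compressing all other ratings toward zero.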

San Francisco Estuary Integrated Data Set
We used data from the surveys in Table 1. We also included year, date, time of sampling, method, survey, station name, and station coordinates.
Many surveys report their findings through unique indexing methods, such as reporting catch per area or volume of water sampled. Given the differences in area sampled and catch efficiency among gear types, in addition to the fact that not all surveys report volume or flow meter readings, we chose not to index catch against volume sampled. Instead, we report catch per unit effort (CPUE) of all surveys as catch per trawl/seine. Similarly, rather than index salvage catch against volume of water exported, we treated salvage CPUE as catch per day. Our approach with these data does not allow for direct catch comparisons between surveys and/or salvage because of differential gear efficiencies. However, it does provide an accessible aggregative data set that can be cautiously analyzed while recognizing the potential comparability issues associated with our methodological decisions. Full R code for data set integration is available in Appendix B.
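The catch-per-tow convention can be sketched as follows. This is an illustrative Python example, not the authors' R code from Appendix B. The record layout and catch numbers are hypothetical; FMWT and STN are used only as familiar survey labels. Tows with zero catch are kept, because they count toward effort.

```python
from collections import defaultdict

# Hypothetical records: (survey, year, catch of one species in a single tow).
tows = [
    ("FMWT", 2000, 4), ("FMWT", 2000, 0), ("FMWT", 2000, 2),
    ("FMWT", 2001, 1), ("FMWT", 2001, 0),
    ("STN", 2000, 10), ("STN", 2000, 6),
]


def mean_annual_cpue(records):
    """Mean catch per tow for each (survey, year), with no volume standardization."""
    totals = defaultdict(lambda: [0, 0])  # (survey, year) -> [total catch, number of tows]
    for survey, year, catch in records:
        totals[(survey, year)][0] += catch
        totals[(survey, year)][1] += 1
    return {key: catch / n_tows for key, (catch, n_tows) in totals.items()}


cpue = mean_annual_cpue(tows)
for (survey, year), value in sorted(cpue.items()):
    print(survey, year, value)
```

Because gear efficiency differs among surveys, the resulting values are best read as trends within a survey rather than as absolute differences between surveys.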
Using the SFE IDS, we visualized sampling distribution and species trends. Current sampling distribution for the 14 SFE IDS surveys (2017) was plotted as a heat map (Figure 1). We then visualized differences in trends for fishes identified in the POD as mean yearly CPUE across all 14 surveys (Figure 2). Species of the POD are Striped Bass, Threadfin Shad, Longfin Smelt (Spirinchus thaleichthys), and Delta Smelt. We also visualized CPUE of Sacramento Splittail (Pogonichthys macrolepidotus), a species native to the estuary that appears to have maintained a healthy, if isolated, population (Sommer et al. 1997; Moyle et al. 2004, 2020). Through coding in program R (Chang et al. 2018; R Core Team 2019), we created a "shiny" application that allows for simple exploratory visualization of temporal and spatial species trends using the SFE IDS. We added data-filtering tools to aid in survey comparison, and plots and data can be downloaded directly from the application, which is published on the internet and can be accessed by researchers, managers, and the public.

Delta Salvage
To understand whether salvage tracks species abundance trends, we compared mean annual CPUE for the POD species among four key surveys and salvage using a scatterplot matrix (Figure 3). Within the scatterplot matrix, we plotted relative density as the number of observations of mean annual catch for each survey and species (Figure 3). We also tested the relationship in POD species mean annual catch between surveys and salvage using Spearman rank correlation. Spearman rank correlation was chosen because it can describe monotonic but non-linear relationships, given that surveys and salvage catch may scale differently under different environmental and operational conditions. Correlations of individual species are color-coded; the correlation of all POD species combined is given in black (Figure 3).

Table 1. As a final measure to increase the validity of trends identified using the 8-Survey Index, we controlled for changes in annual sampling intensity by equally weighting each of the eight surveys. Surveys were equally weighted by averaging the mean CPUE of each survey by year. We did this because, although we had constrained spatial and temporal variability, sampling intensity varied considerably between years for the CIMWT and the BSS. Equally weighting surveys produces a metric of annual CPUE in which aggregate gear efficiency does not change over time.
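The effect of equal weighting can be seen in a small Python sketch. The survey names and catch values below are hypothetical, and the code is an illustration of the averaging step described above rather than the authors' implementation. One survey triples its effort in the second year while the catch per tow of both surveys stays the same.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical tows: (survey, year, catch). "survey_A" triples its effort in 2001.
tows = [
    ("survey_A", 2000, 2), ("survey_A", 2000, 4),
    ("survey_B", 2000, 10), ("survey_B", 2000, 14),
    ("survey_A", 2001, 2), ("survey_A", 2001, 4), ("survey_A", 2001, 3),
    ("survey_A", 2001, 3), ("survey_A", 2001, 2), ("survey_A", 2001, 4),
    ("survey_B", 2001, 10), ("survey_B", 2001, 14),
]


def pooled_cpue(records):
    """Mean catch per tow by year, pooling every tow regardless of survey."""
    by_year = defaultdict(list)
    for _survey, year, catch in records:
        by_year[year].append(catch)
    return {year: mean(catches) for year, catches in by_year.items()}


def equal_weight_cpue(records):
    """Mean by year of each survey's own mean CPUE, so every survey counts once."""
    by_survey_year = defaultdict(list)
    for survey, year, catch in records:
        by_survey_year[(survey, year)].append(catch)
    by_year = defaultdict(list)
    for (_survey, year), catches in by_survey_year.items():
        by_year[year].append(mean(catches))
    return {year: mean(survey_means) for year, survey_means in by_year.items()}


pooled = pooled_cpue(tows)
weighted = equal_weight_cpue(tows)
print("pooled:", pooled)
print("equally weighted:", weighted)
```

In this example the pooled mean falls from 7.5 to 5.25 only because the lower-catching survey contributes more tows, whereas the equally weighted value stays at 7.5 in both years.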
To explore the utility of the 8-Survey Index data set and examine differences in trends of POD species, we plotted stacked bar graphs of mean yearly CPUE values using the 8-Survey Index and the FMWT data sets (Figure 4).

RESULTS
Species-Survey Ratings
R-values are presented in Table 2, showing the relative selectivity of surveys for Delta fishes. No two surveys had the same rank order of species R-values, and most of the 36 Delta species showed high catches in at least one survey (Table 2). Table 2 shows that while species may be well represented in some surveys, they may also be nearly or totally absent in others. For example, Mississippi Silverside (Menidia audens) is the most frequently caught species in the three beach seine surveys, and nearly the most caught species in the Mossdale Kodiak Trawl (MKT); but it is mostly absent from the two Bay Study surveys, and only marginally represented in the FMWT, the Sacramento Midwater Trawl (SMWT), and the CIMWT (Table 2). Similarly, Sacramento Splittail are well represented in the MKT and SMOT, but relatively poorly represented in the Sacramento Kodiak Trawl (SacKT) (Table 2).
When we consider mean R-values by assemblage (Table 3), pelagic species (R = 0.56) are best represented across all the surveys, followed by fringe and benthic species (R = 0.26 and 0.25, respectively); SAV-oriented species were the least well represented (R = 0.20). As in Table 2, individual survey R-values are not in total agreement across assemblage groups, and agreement by gear type is mixed. For example, R-values indicate that the Yolo Bypass Beach Seine (YBBS) is most effective at capturing SAV-oriented fishes, while a survey with similar gear type, the Suisun Marsh Beach Seine (SMBS), has a very low R-value (R = 0.06) for the same assemblage group. Conversely, the two surveys using otter trawls (the BSOT and the SMOT) were both more effective at sampling benthic fishes than any other gear type (Table 2).

San Francisco Estuary Integrated Data Set
We successfully integrated 14 estuary surveys into the SFE IDS. The SFE IDS is organized horizontally, with each row representing a single trawl or seine pull. Survey identifier and method columns allow for discrimination of catch by survey and gear type, across the 167 fish species that the 14 surveys have captured. Of these 167 fish species, 120 have been captured at least ten times (Appendix A, Table A3). While some recorded environmental variables differ and were omitted (channel vs. shoal, presence of debris, weather, etc.), most of the surveys consistently recorded major water-quality metrics such as water temperature, water depth, Secchi depth, and salinity, and these are included in the SFE IDS.
(A ReadMe file in .docx format, the SFE IDS in .csv format, and the code associated with its construction can be downloaded as Appendices C and D or by request from the corresponding author. In addition, a program for exploratory visualization of these data can be found at the following link: https://baydeltalive.com/fishsurveystudy/fish-survey-study.) Using the stations that the SFE IDS surveys currently sample, we mapped the density of stations as a metric of sampling intensity (Figure 1). This Figure

Delta Salvage
The R-values for a majority of species captured in the salvage facilities are high, and all  Table 2). In contrast to the majority of other surveys, most species are at least moderately well represented by salvage, and only five species have an R-value of less than 0.2 (Table 2). This evenness is apparent when considering species assemblages as well, and is only surpassed by the BSS and the YBBS when measured as the difference between the best-represented and least-represented assemblage group (Table 3).
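The evenness measure described here, the difference between the best-represented and least-represented assemblage group, can be sketched in Python as follows. The species labels, assemblage assignments, and R-values are hypothetical placeholders rather than values from Tables 2 or 3, and the function names are chosen only for this illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical R-values for one survey: species -> (assemblage, R).
ratings = {
    "species_1": ("pelagic", 0.9),
    "species_2": ("pelagic", 0.7),
    "species_3": ("benthic", 0.4),
    "species_4": ("benthic", 0.2),
    "species_5": ("fringe", 0.5),
    "species_6": ("SAV", 0.1),
    "species_7": ("SAV", 0.3),
}


def assemblage_means(species_ratings):
    """Mean R-value for each assemblage group."""
    grouped = defaultdict(list)
    for assemblage, r in species_ratings.values():
        grouped[assemblage].append(r)
    return {assemblage: mean(values) for assemblage, values in grouped.items()}


def assemblage_range(means):
    """Difference between the best- and least-represented assemblage means.
    A smaller range indicates more even representation across assemblages."""
    return max(means.values()) - min(means.values())


means = assemblage_means(ratings)
spread = assemblage_range(means)
print(means)
print(f"range = {spread:.2f}")
```

A survey that samples all assemblages similarly would have a range near zero, while a survey specialized for one assemblage would have a range approaching its highest assemblage mean.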
When salvage is compared to a subset of SFE IDS surveys, correlation of mean annual catch between salvage and the surveys appears to be no more variable than correlation between surveys. For example, mean annual salvage of Striped Bass is strongly correlated with mean annual catch by the STN (cor = 0.68) and the FMWT (cor = 0.60; Figure 3). While this is a lower level of correlation in mean annual catch of Striped Bass than between the FMWT Survey and the STN (cor = 0.895), it is considerably higher than the correlation between the FMWT Survey and the BSS (cor = 0.02; Figure 3). This incongruity in correlation of POD species catch remains constant across the surveys included in Figure 3.
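For readers unfamiliar with rank correlation, the following Python sketch shows how a Spearman coefficient like those reported in Figure 3 can be computed: each series is converted to ranks (ties receive their average rank), and a Pearson correlation is then taken on the ranks. The two series are hypothetical, and this standard-library implementation is an illustration rather than the authors' R code.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical mean annual catch: salvage (per day) rises tenfold each step,
# while the survey (per tow) rises linearly.
salvage = [1, 10, 100, 1000, 10000]
survey = [0.5, 1.0, 1.5, 2.0, 2.5]

rho = spearman(salvage, survey)
linear_r = pearson(salvage, survey)
print(f"Spearman = {rho:.2f}, Pearson = {linear_r:.2f}")
```

The two series always rise together, so the rank correlation is 1 even though the relationship is far from linear and the Pearson coefficient is noticeably lower.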
Similarly, we may examine the density of POD species catch for salvage and the subset of surveys included in the SFE IDS in Figure 3 as a way to investigate their agreement with one another. The plots running diagonally in Figure 3 represent the density of observations of annual catch, with the x-axis corresponding to the number of a given species caught per year and the y-axis the number of observations. Given this, species that are caught in high numbers in a given survey will be clustered around the right side of a plot, and low catch on the left side of a plot. Species caught in consistent numbers will be represented by a single peak in the density plot, whereas species with a high annual variability in catch will have a lower peak and wider density distribution.
Using the density plots, we can see that salvage catch of Threadfin Shad and Striped Bass is consistent and in high numbers (Figure 3). This is supported by R-values, which identify Striped Bass and Threadfin Shad as the two most well-represented species in the salvage data (Table 2). The SMOT, which also has a high peak in mean annual Striped Bass density of catch, has low correlation in catch with salvage (cor = 0.09; Figure 3).

8-Survey Index and Pelagic Organism Decline Species Trends
We increased the validity of considering SFE IDS surveys in aggregate by turning a subset of them into the 8-Survey Index data set. This data set includes only surveys that have run consistently since 1980, and has been spatially constrained to include only continuously operated stations and temporally constrained to consistent seasonal periods. Our subsetting and filtering procedures resulted in an aggregate data set that can be leveraged to analyze estuary species trends with considerably expanded seasonal and spatial coverage.
Through equal weighting of annual 8-Survey Index catch data, we analyzed trends in POD species abundance in comparison to trends identified using the FMWT (Figure 4). We show that the decline around the year 2000 associated with the POD is far less pronounced in the 8-Survey Index than in the FMWT. For example, Threadfin Shad, which shows a dramatic decline after the year 2000 in the FMWT, remains at relatively stable population levels before and after the start of the POD when 8-Survey Index data are considered (Figure 4). Striped Bass, which also shows a decline around the year 2000 in the FMWT, seems to remain at relatively stable population levels between the mid-1980s and the present in the 8-Survey Index data. When the two smelt species are considered, the trends shown by the 8-Survey Index generally agree with the FMWT. However, the decline around the year 2000 appears to follow a slight rebound in 1993 after a period of drought, rather than being a prolonged decline (Figure 4). It would appear, based on both the 8-Survey Index and FMWT data sets, that the principal decline in Delta Smelt, Longfin Smelt, and Striped Bass occurred in the early to mid-1980s, rather than around the year 2000 (Figure 4). This apparent decline in these three species occurred outside of a drought period and before the introduction of Potamocorbula amurensis, an invasive species and ecosystem engineer that has often been credited with driving native species decline in the estuary (Mac Nally et al. 2010; Thomson et al. 2010).

DISCUSSION
When tasked with describing particular species abundance trends or implementing environmental regulations, researchers and managers often choose one or a few surveys based on preference or convention (Sommer et al. 2007; Mac Nally et al. 2010; Thomson et al. 2010; Fisch et al. 2011; Miller et al. 2012). However, the R-values from our Species-Survey Ratings show differences in selectivity (Table 2); this is likely a result of gear type, sampling sites, and seasonality. For example, surveys that sample with midwater trawls preferentially capture pelagic species, whereas otter trawls were relatively more effective at sampling benthic species. Identification of species selectivity by location and season is beyond the scope of this paper; however, this type of analysis will be possible using the SFE IDS and 8-Survey Index data sets.
Visualizations from the integrated data set show that the single-survey approach is not appropriate for many species (Figures 2-4). For example, while the POD is evident from the FMWT data, it appears to be muted when the aggregated 8-Survey Index data set is considered (Figure 4). Acknowledging these disparities is important in the management of the estuary, given the richness of available data and the investment of resources in mitigation and restoration. Even a survey, such as the FMWT, that produces high-quality data on diverse species cannot adequately capture all trends in species abundance.
The species-survey rating table (Table 2), when combined with simple plots of CPUE trend data and survey spatial extent, allows for a first look at trends in all species across surveys. Given the enormous differences in sampling gear among surveys, lengths of the sampling programs, diversity and number of sampling locations, and annual timing of surveys, there may be limitations to this analysis. Nevertheless, the data can be used to answer questions such as:
• Is there high redundancy among surveys?
• What areas and species are inadequately sampled?
• What are the trends in fish species identified as part of the Pelagic Organism Decline, in diverse surveys?
• Do the salvage data show the same species trends as shown in surveys?

Is There High Redundancy Among Surveys?
The estuary is most extensively surveyed for pelagic fishes (Table 3), with the greatest intensity of sampling being in the North Delta, West Delta, Suisun Bay, Suisun Marsh, and San Pablo Bay (Figure 1). Although some surveys have similar target species and regions, no one survey entirely duplicates another because sampling occurs at different frequencies, locations, and time periods, and with different gear types (Table A1). Species found in large numbers in multiple surveys, such as Striped Bass and Threadfin Shad, do not show the same trends in abundance across all surveys (Figures 2-4). Likewise, trends in annual POD species CPUE vary among surveys (Figure 3). These instances highlight the importance of maintaining multiple surveys that comprise long-term monitoring programs. Differences in catch among surveys may be a result of poorly understood drivers such as changes in species distribution, behavior, or the characteristics of sampling stations (Schroeter 2008; Sommer et al. 2011). Surveys often track these changes differently based on unique responses to spatial, seasonal, or gear type differences. Monthly variation in effort is relatively evenly distributed, aside from an increase in effort during summer months. However, further analysis of the SFE IDS is needed to truly disentangle seasonal effects on catch.

What Areas and Species are Inadequately Sampled?
Fishes associated with SAV, particularly in the southern and central Delta, are inadequately sampled (Figure 1; Tables 2 and 3). For example, Largemouth Bass (Micropterus salmoides) has low species-survey ratings (Table 2) even though it is known to be an abundant species within the southern and central Delta, where it supports an important recreational fishery. The low rating is likely because Largemouth Bass, as well as a suite of centrarchid species, are most commonly associated with environments dominated by SAV (Durocher et al. 1984), which are poorly sampled by the trawls and seines that are the most widely used survey gear.
Historically, there has also been poor survey coverage of northern San Pablo Bay, as well as the central and southern portions of the San Francisco Bay. Newer surveys have increased coverage in some of these areas (e.g., those conducted by the UC Davis Otolith Geochemistry and Fish Ecology Laboratory), but were not included in our analyses because of limited temporal span. These surveys fill some spatial gaps and will prove increasingly valuable in future data sets.
The poor representation of these areas and fishes by the surveys (except by salvage and some beach seine surveys) relates to the initial purpose of most of the current sampling programs. Surveys were primarily begun to track trends in abundance for Chinook Salmon and Striped Bass, species that are not associated with SAV and that occur primarily (at least as juveniles) in the corridor between San Pablo Bay and the Sacramento River. University and agency programs have conducted intermittent surveys that effectively sample these fishes, mostly using electrofishing. However, because these surveys have not operated continuously for long periods of time, their usefulness is limited for tracking species trends. The establishment of long-term monitoring of these fishes through appropriate sampling methods, such as boat electrofishing, would more adequately allow populations of fishes associated with SAV to be tracked.

What are the Trends in Fish Species Identified as Part of the Pelagic Organism Decline in Diverse Surveys?
Exploratory analysis of POD species trends using the SFE IDS and 8-Survey Index data sets challenges some of the trends identified using the FMWT (Sommer et al. 2007). Threadfin Shad do not show the longer-term decline seen for the other POD species, which declined beginning in the early 1980s, with a brief, slight recovery in the early 1990s. If data from the 8-Survey Index are used, as opposed to just data from the FMWT, the subsequent decline, identified as the POD (Sommer et al. 2007), is less dramatic (Figure 4). The timeline shown by the 8-Survey Index data is more consistent with known step-changes to the ecology of the upper estuary (Mac Nally et al. 2010; Thomson et al. 2010), particularly after the invasion and spread of two ecosystem engineers: the benthic clam Potamocorbula amurensis in Suisun Bay (Carlton et al. 1990; Nichols et al. 1990) and the aquatic weed Egeria densa in the Delta (Durand et al. 2016).

Do the Salvage Data Show the Same Species Trends as Shown in Surveys?
Salvage data should be used with caution because catch depends on variable water project export operations; however, the richness of this data set should not be overlooked. Salvage data for some species reflect abundance trends seen in other surveys, particularly for Delta Smelt and Striped Bass, which correlate well with the STN and FMWT data (Figure 3). This is potentially driven by the pelagic life history and (historically) the estuary-wide distribution of these two species, making them vulnerable to capture both by surveys and salvage operations.
The results of our limited investigation into differences in salvage between the SWP and the CVP in the South Delta indicate that these two facilities may not return complementary results. This may stem from differences in operation as well as the effects of predation in Clifton Court Forebay at the SWP. Although some surveys and combined South Delta salvage are highly correlated, caution should be exercised when considering SWP and CVP salvage data separately.

CONCLUSIONS
Our analyses demonstrate the necessity of long-term sampling programs that employ a suite of surveys to evaluate fish trends in the estuary. Using individual or aggregate survey data provides different lenses through which to view ecosystem dynamics, which are often cryptic. Because the estuary is a diverse and dynamic ecosystem, no single survey will adequately inform ecosystem-wide management needs or resolve scientific uncertainties. The species-survey ratings, data-aggregation procedures, and the readily accessible SFE IDS, along with visualization software, allow researchers and managers to more fully exploit the breadth of sampling programs within the estuary. Given the increased spatial and temporal breadth of these data, researchers may more effectively identify long-term or broad spatial trends in the abundance and distribution of estuary fishes. This will aid in the generation of hypotheses about the status and trends of fishes, both native and non-native, and will strengthen estuary fish management. We hope this exercise encourages survey managers to continue working to adopt universal procedures and coding to facilitate future collaboration and data set integration.
Our analysis of spatial and species coverage suggests that no two surveys agree for all species, so elimination of any survey should be done with great caution, especially when declining species are involved. To more holistically survey the estuary, sampling should be expanded beyond what is necessary to describe trends in listed species abundance. This is particularly true for under-sampled regions, such as the southern Delta and southern San Francisco Bay, and for SAV-associated and marine fishes, which are poorly understood in the estuary and subject to accelerating changes from global warming, water management, restoration practices, and infrastructure development.
Our analysis identifies potential pitfalls of relying on limited data to inform ecosystem management. More intensive analyses should build upon the SFE IDS to help identify drivers of differences in species trends, which may be hidden in the seasonal, spatial, and environmental aspects unique to each survey. These drivers should be further analyzed both to reveal factors important to species management and to identify improvements that are needed to sample fishes within the estuary.