Relative Bias in Catch Among Long-Term Fish Monitoring Surveys Within the San Francisco Estuary

Fish monitoring gears rarely capture all available fish, an inherent bias in monitoring programs referred to as catchability. Catchability is a source of bias that can be affected by numerous aspects of gear deployment (e.g., deployment speed, mesh size, and avoidance behavior). Thus, care must be taken when multiple surveys, especially those using different sampling methods, are combined to answer spatio-temporal questions about population and community dynamics. We assessed relative catchability differences among four long-term fish monitoring surveys from the San Francisco Estuary: the Bay Study Otter Trawl (BSOT), the Bay Study Midwater Trawl (BSMT), the Fall Midwater Trawl (FMWT), and the Suisun Marsh Otter Trawl (SMOT). We used generalized additive models with a spatio-temporal smoother and survey as a fixed effect to predict gear-specific estimates of catch for 45 different fish species within large and small size classes. We used estimates of the fixed effect coefficients for each survey (e.g., BSOT) relative to the reference gear (FMWT) to develop relative measures of catchability among taxa, surveys, and fish-size classes, termed the catch-ratio. In the large size class, relative catchability was higher in the FMWT than in the BSMT, BSOT, or SMOT for 27%, 22%, and 57% of fish species, respectively. In the small size class, relative catchability was higher in the FMWT than in the BSMT, BSOT, or SMOT for 50%, 18%, and 25% of fish species, respectively. As expected, relative catchability of demersal species was higher in the otter trawls (BSOT, SMOT), while relative catchability of pelagic species was higher in the midwater trawls (FMWT, BSMT). Our results demonstrate that catchability is a source of bias among monitoring efforts within the San Francisco Estuary, and that assuming equal catchability among surveys, species, and size classes could result in significant bias when describing spatio-temporal patterns in catch.

INTRODUCTION
The status and trends of fish populations help shape environmental regulations in the San Francisco Estuary (estuary) and can often drive substantial changes to water operations. However, justification for the implementation of policy and actions intended to protect the ecosystem depends on the quality of information provided. The sampling equipment used by the long-term fish surveys in the estuary can only sample a fraction of the water present in the system. Thus, at a given sampling location, the number of fish caught by sampling equipment may not reflect their true density, and a species may go undetected even if it is truly present (i.e., false negative/type II error).
Catchability is a term commonly used to describe this inherent bias in fish surveys, and it has been a key parameter of interest in the field of fisheries (Walsh 1997; Somerton et al. 1999). Catchability in fisheries can be broken down into two components: the probability that a fish is available to the sampling gear, and the conditional probability that a fish is retained by the gear given that it is available (also referred to as gear efficiency; Walsh 1997). Consequently, species, fish size, gear, and environmental conditions can all affect components of catchability, and thus should be incorporated into estimates of abundance or distribution, or both. A substantial amount of work has been done to account for observation error from differences in catchability and to provide measures of uncertainty (Walsh 1997; Royle 2004; MacKenzie and Royle 2005; Kéry and Royle 2016). Yet, work to estimate catchability in the estuary to date has either included a select few species (Perry et al. 2016; Mitchell et al. 2017; Mitchell et al. 2019; Huntsman et al. 2021a, 2021b) or focused on occupancy rather than abundance (Mahardja et al. 2017; Peterson and Barajas 2018).
Trawling gear is used by most fish surveys in the estuary (Stompe et al. 2020). Data from these surveys have been used to understand patterns and environmental drivers ranging from individual species abundance to the entire fish community (Sommer et al. 2007; Mac Nally et al. 2010; Feyrer et al. 2015; Colombano et al. 2020; Mahardja et al. 2021). However, differences in catchability can influence trends based on count data from these surveys. Even with highly standardized field protocols, such as those implemented by the estuary's fish surveys, detection efficiency can vary considerably by species and over space and time (Schmidt 2005; Kéry et al. 2009; Mitchell and Baxter 2021). Furthermore, inferring fish population and community dynamics for the full range of habitats available within the estuary (Stompe et al. 2020) often requires integrating multiple data sets into a single analysis. Consequently, spatio-temporal patterns in catch may better reflect differences in catchability between spatially stratified surveys than true abundance patterns of fishes.
Our goal was to evaluate the relative differences in catchability among surveys and fish species in the estuary using three long-term trawl surveys in the region: the California Department of Fish and Wildlife's (CDFW's) San Francisco Bay Study, the CDFW's Fall Midwater Trawl survey (FMWT), and the University of California-Davis' Suisun Marsh fish study (Stompe et al. 2020). We used generalized additive models (GAMs) to account for the spatio-temporal changes in species catch, and therefore allow the relative gear efficiency (retention efficiency) for each species and size class to be estimated, as has been done in other studies (Walker et al. 2017; Moriarty et al. 2020). Unlike those studies, our study incorporates depth into the analysis by including gear types that more commonly sample demersal fishes (otter trawls) and pelagic fishes (midwater trawls). Consequently, our relative estimates of gear efficiency are also affected by the availability of fish to each gear type; thus, we refer to our gear efficiency estimates as relative catchability and discuss limitations of this approach later. Through this approach, we identified the relative catchability bias among trawl surveys for each species and size class, and estimated the overall magnitude of relative bias among surveys that sample the estuary fish community.

Survey Descriptions
The FMWT is an active survey conducted by the CDFW that began in 1967 (CDFW 2020a). Although originally developed as a survey for juvenile Striped Bass (Morone saxatilis), the FMWT has collected data on a number of species, and has been used as a way to evaluate the effects of the state (State Water Project) and federal (Central Valley Project) water projects on estuarine fishes. Currently, the FMWT samples a total of 122 stations, mostly in the open water of the upper estuary from the San Pablo Bay to the uppermost extent of the Sacramento-San Joaquin Delta (Figure 1). A subset of stations has been sampled since the start of the survey, with stations visited monthly primarily during the fall months (September through December; Figure 1). The FMWT survey is conducted as an oblique tow through the water column with the current for 12 minutes, using a midwater trawl net with a 3.7-m x 3.7-m opening, with mesh decreasing in size from 20.3 cm at the mouth to 0.6 cm at the cod end. In our analyses of catch-ratios, the FMWT was used as the reference gear type because it is one of the longest-running surveys and is often used to infer patterns in the status and trends of key species for estuarine management (Sommer et al. 2007; Mac Nally et al. 2010; Stompe et al. 2020; see Statistical Analysis below).
The CDFW also conducts the San Francisco Bay Study (Bay Study), which fulfills aquatic community monitoring requirements mandated as part of the state and federal water projects (CDFW 2020b). The Bay Study focuses more on the downstream portion of the estuary, extending from the South and San Francisco bays to the lower reaches of the Sacramento-San Joaquin Delta. The Bay Study uses two types of trawling gear: a midwater trawl (BSMT) with similar dimensions and trawl duration as the FMWT, but with cod end mesh made of narrow twine with a 1.05- to 1.2-cm opening; and an otter trawl (BSOT) to sample demersal fishes and macroinvertebrates. Like the FMWT, sampling for the Bay Study is conducted monthly but includes all months of the year. Sampling with both trawl types began in 1980 and is ongoing, whereby 52 stations are currently sampled from the San Francisco Bay up through the lower Sacramento and San Joaquin rivers (Figure 1). The BSOT has a 4.67-m-wide x 2.31-m-tall opening with 2.5-cm mesh in the body and 0.6-cm mesh at the cod end. Unlike the FMWT or BSMT, the BSOT is generally towed against the current for 5 minutes per tow.
The Suisun Marsh Otter Trawl (SMOT) is a study managed by researchers at the University of California-Davis for the operation of the Suisun Marsh Salinity Control Gates (Matern et al. 2002; O'Rear et al. 2020). All samples for this survey are collected within the Suisun Marsh, a large, brackish tidal wetland complex in the estuary (Figure 1). The survey has been active since 1979 and samples are collected every month. A total of 24 stations have been sampled as part of this survey. The otter trawl is towed for either 5 or 10 minutes, depending on whether the sample is collected within a small or large slough. The net's mouth is 4.3 m wide x 1.5 m tall, and the mesh is 3.5 cm in the body and 0.6 cm at the cod end. Unlike the FMWT and Bay Study, which sample the deep open water habitat of the estuary, the SMOT often samples shallow and narrow tidal sloughs within the Suisun Marsh. For this reason, the SMOT likely samples a larger proportion of the water column at its sites than the BSOT. More detailed metadata on all surveys can be found in a recent Interagency Ecological Program (IEP) survey review report (IEP 2021).

Data Preparation
We obtained survey data sets directly from the principal investigators (PIs) and then created a combined data set in consultation with the PIs. In each survey, fish length was measured for a subset of the catch of each species. We used this measured subset to calculate proportional length frequencies, which we then multiplied by the total catch of the species to assign lengths to every counted fish from each sample. Fish below a certain size were not counted in many samples, and these minimum cutoffs varied over time and among surveys (IEP 2021). For the purpose of this analysis, we chose to retain recorded fish of all lengths, and we address the implications of this discrepancy among surveys in our discussion.
VOLUME 20, ISSUE 1, ARTICLE 3
Although FMWT, BSOT, and BSMT measured fish length as fork length (or total length for species with no fork length), SMOT used standard length. For the 20 species with conversion equations (see IEP 2021), we converted these standard lengths to fork length (or total length if no fork). We left the remaining species as standard length in the SMOT. In addition, we filled in zero catches for each species that did not appear in a sample.
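The length-assignment step described above can be sketched as follows. This is a minimal illustration, not the code in the LTMRdata package: the function name `expand_lengths` and the example lengths and total catch are hypothetical.

```python
from collections import Counter

def expand_lengths(measured_lengths_mm, total_catch):
    """Scale the length frequencies of a measured subsample up to the total catch.

    Hypothetical helper: returns the estimated number of fish at each length.
    """
    if not measured_lengths_mm:
        return {}
    counts = Counter(measured_lengths_mm)
    n_measured = len(measured_lengths_mm)
    # Proportion of the subsample at each length, multiplied by the total catch
    return {length: total_catch * n / n_measured
            for length, n in sorted(counts.items())}

# Four fish measured out of a total catch of 20 (made-up numbers)
expanded = expand_lengths([42, 42, 55, 60], total_catch=20)
print(expanded)  # {42: 10.0, 55: 5.0, 60: 5.0}
```

The expanded counts sum to the total catch, so no counted fish is lost when lengths are assigned.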
We quantified sampling effort as the estimated volume (cubic meters) of water sampled in a tow. For FMWT and BSMT, we calculated tow volume with flowmeter (General Oceanics mechanical  For SMOT, we calculated the tow volume from the tow duration, the tow speed (4 km/hour), and the dimensions of the net while towing. For BSOT, we calculated tow volume from the distance towed and net size estimates. We prepared all data analyzed for this project as part of an IEP pilot review for these surveys (IEP 2021), and they have been packaged into the R package LTMRdata v1.0.0 (Bashevkin 2020).
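As a rough illustration of the SMOT effort calculation, the sketch below multiplies tow distance (duration times speed) by the net mouth area. It assumes the static mouth dimensions given in Survey Descriptions (4.3 m x 1.5 m); the source uses the dimensions of the net while towing, which may differ, so the resulting volumes are illustrative only. The function name is hypothetical.

```python
def tow_volume_m3(duration_min, speed_km_per_h, mouth_width_m, mouth_height_m):
    """Approximate volume swept by a trawl: distance towed x mouth area."""
    distance_m = speed_km_per_h * 1000.0 * (duration_min / 60.0)
    return distance_m * mouth_width_m * mouth_height_m

# Assumption: static mouth dimensions stand in for the net's shape while towing
vol_small_slough = tow_volume_m3(5, 4.0, 4.3, 1.5)   # about 2,150 cubic meters
vol_large_slough = tow_volume_m3(10, 4.0, 4.3, 1.5)  # about 4,300 cubic meters
print(round(vol_small_slough), round(vol_large_slough))
```

In the models, the logarithm of this volume enters as an offset, so catch is effectively modeled per unit volume sampled.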

Statistical Analysis
One approach to estimating relative differences in gear efficiency is by fitting generalized additive models (GAMs; see Wood 2017) with spatio-temporal smoothers that account for variability in abundance (Walker et al. 2017; Moriarty et al. 2020). The remaining variability in survey data can then be explained by including gear type as either a fixed (Walker et al. 2017) or random effect (Moriarty et al. 2020), which represents variability in gear efficiency among gear types for size-structured fish community data when gear types sample similar depths in the water column. This method can also capture some differences in availability when gear types that sample different depths in the water column are compared, and thus can better represent an estimate of relative catchability.
The full data set included BSMT and BSOT from the Bay Study, the SMOT, and the FMWT. Before analysis, we passed the full data set through a filtering process to prepare the data for GAM analysis. First, we removed all samples that did not include fish length measurements, stations that did not have latitude or longitude recorded, and tows that did not record the necessary values to calculate tow volume. Next, we assigned fish to a large (> 50 mm FL) or small size (≤ 50 mm FL) category. We adopted this size threshold because a gear efficiency study of the FMWT found that Delta Smelt (Hypomesus transpacificus) greater than approximately 50 mm were more likely to be retained in the cod end of the trawl (Mitchell et al. 2017). We only fit models when the species-size category was present in at least two gear types, because a catch-ratio cannot be made if the fish is present in only one gear. Lastly, we did not fit the model if the species-size category was present in less than 1% of all tows for all surveys, to facilitate model convergence and improve model fit (see Predictive Performance below). The final data set included 3,763,319 observations of 68 species and size classes from 1980 to 2018.
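The size-class assignment and the two model-fitting rules above can be sketched as follows. This is a simplified illustration with hypothetical function names, and it assumes the 1% rule applies to the pooled proportion of tows across all surveys, which is one reading of the text.

```python
def size_class(fork_length_mm, threshold_mm=50):
    """Assign the large (> 50 mm) or small (<= 50 mm) size category."""
    return "large" if fork_length_mm > threshold_mm else "small"

def should_fit(tows, min_gears=2, min_prevalence=0.01):
    """Decide whether to fit a model for one species-size category.

    tows: list of (gear, count) pairs, one pair per tow.
    Assumption: prevalence is pooled across all surveys.
    """
    gears_with_catch = {gear for gear, count in tows if count > 0}
    prevalence = sum(1 for _, count in tows if count > 0) / len(tows)
    return len(gears_with_catch) >= min_gears and prevalence >= min_prevalence

# Made-up examples: caught in two gears in 2% of tows, then in only one gear
two_gears = [("FMWT", 0)] * 97 + [("FMWT", 3), ("BSMT", 1), ("BSOT", 0)]
one_gear = [("FMWT", 2)] + [("BSOT", 0)] * 9
print(should_fit(two_gears), should_fit(one_gear))  # True False
```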
We fit a model structure similar to that of Walker et al. (2017) and Moriarty et al. (2020) for all species-size categories. Because count data were overdispersed, as a result of excess zero catch events, we assumed that species-size category counts ($C_i$) for each individual tow ($i$) followed a negative binomial distribution with the following form:

$$C_i \sim \mathrm{NB}(\mu_i, \theta)$$

and

$$\log(\mu_i) = \beta_0 + \beta_{BSOT}\,BSOT_i + \beta_{BSMT}\,BSMT_i + \beta_{SMOT}\,SMOT_i + te(\mathrm{latitude}_i, \mathrm{longitude}_i, \mathrm{Julian\ day}_i, \mathrm{year}_i) + \log(\mathrm{volume}_i)$$

where $\mu_i$ is the expected count, $\theta$ is the shape parameter representing over-dispersion, a four-dimensional tensor product (te; Wood 2008) was used to account for the spatial and temporal patterns in catch data (latitude, longitude, Julian day, and year) bound to the estuary, log(volume) was included as an offset, and survey was a fixed effect with the FMWT as the reference ($\beta_0$) condition, $\beta_{BSOT}$, $\beta_{BSMT}$, and $\beta_{SMOT}$ as the BSOT, BSMT, and SMOT effects, and $BSOT_i$, $BSMT_i$, and $SMOT_i$ as the corresponding indicator variables. Within the four-dimensional tensor product, the spatial component (latitude and longitude) was modeled with a soap-film smooth (a smoothed surface within a boundary; Wood et al. 2008), while Julian day was modeled with a cyclic cubic spline and year was modeled with a cubic spline. A model was fit to each species-size category when sufficient data were available. When the FMWT did not capture a particular species-size category, we assigned the reference condition in the following order of gear-survey combinations: BSMT, BSOT, and the SMOT. (Note that SMOT was the last gear type and thus never used as the reference.) We extracted gear coefficient estimates and back-transformed them from the log scale (e.g., $\exp(\beta_{SMOT})$) to provide a relative measure of catchability (i.e., catch-ratio) for each gear type relative to the reference gear (FMWT). We fit all models using the bam function within the "mgcv" package (Wood 2011) in program R (R Core Team 2020), and we used basis dimensions of 25 for the latitude and longitude smooth, 5 for Julian day, and 5 for year (k = 625 total; Wood 2017). We chose these basis dimensions to ensure reasonable computational time while equally weighting the spatial and temporal components. For the soap film, we used 17 manually placed internal knots and boundaries (Figure 1). Significant differences between gear comparisons in relative catchability of species-size categories were based on whether 95% confidence intervals of the catch-ratio overlapped 1.
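The back-transformation and significance rule can be illustrated with a short sketch. It assumes a Wald-type interval (estimate plus or minus 1.96 standard errors on the log scale), which the text does not specify; the function name and the coefficient values are hypothetical.

```python
import math

def catch_ratio(beta, se, z=1.96):
    """Back-transform a log-scale gear coefficient to a catch-ratio with a 95% CI.

    Assumption: Wald interval on the log scale, then exponentiated.
    """
    ratio = math.exp(beta)
    lower = math.exp(beta - z * se)
    upper = math.exp(beta + z * se)
    if lower > 1:
        verdict = "higher than reference"
    elif upper < 1:
        verdict = "lower than reference"
    else:
        verdict = "not different"  # interval overlaps 1
    return ratio, lower, upper, verdict

# Made-up coefficients for a comparison gear relative to the FMWT
print(catch_ratio(math.log(2.0), 0.2))  # ratio of about 2, interval above 1
print(catch_ratio(-0.1, 0.2))           # interval spans 1
```

A catch-ratio of 2 would mean the comparison gear is expected to catch twice as many fish as the reference gear per unit volume at the same place and time.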

Predictive Performance
We assessed model performance by following methods described in Drexler and Ainsworth (2013). We fit GAMs with the same structure as the main analysis to a training data set and then predicted fish counts in the testing (out-of-sample) data sets for each species and size class. We compiled the training data set by randomly selecting 70% of data from each survey into one data set, with the testing data set constructed from the remaining 30% of data. We compared predicted counts to observed counts in the testing data set to determine model predictive performance for all species and size class combinations with sufficient data to fit models. We evaluated model accuracy by using root-mean-squared-error (RMSE; Walther and Moore 2005):

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(C_i - \hat{C}_i\right)^2}$$

where $C_i$ is the observed count, $\hat{C}_i$ is the predicted count, and $n$ is the number of tows in the testing data set. We also assessed model performance by the percent of observations in the testing data set that fall under the coverage of the 95% confidence intervals (Amundson et al. 2014). Additionally, we used an adjusted coverage metric, where the lower 95% confidence limit was changed to -0.1 for any prediction in which the upper 95% confidence limit was less than 1. We calculated this adjusted coverage metric because observed counts are integers, and observations of 0 often resulted in both upper and lower 95% confidence limits being less than 1 but not 0, thus not capturing any integer values (e.g., lower 95% CI = 0.1 and upper 95% CI = 0.7).
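The three performance metrics described above can be sketched in a few lines. The function names and the example counts and intervals are hypothetical; the adjustment rule follows the text (lower limit set to -0.1 whenever the upper limit is below 1).

```python
import math

def rmse(observed, predicted):
    """Root-mean-squared error between observed and predicted counts."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted))
                     / len(observed))

def coverage(observed, lower, upper, adjust_for_zero=False):
    """Percent of observations falling inside their 95% confidence interval."""
    hits = 0
    for o, lo, hi in zip(observed, lower, upper):
        if adjust_for_zero and hi < 1:
            lo = -0.1  # lets an observed 0 fall inside an interval such as (0.1, 0.7)
        if lo <= o <= hi:
            hits += 1
    return 100.0 * hits / len(observed)

# Made-up testing data: two zero catches and two positive catches
obs = [0, 0, 3, 10]
pred = [0.4, 0.2, 2.0, 6.0]
low = [0.1, 0.05, 1.0, 4.0]
high = [0.7, 0.5, 4.0, 9.0]
print(round(rmse(obs, pred), 2))                         # 2.07
print(coverage(obs, low, high))                          # 25.0
print(coverage(obs, low, high, adjust_for_zero=True))    # 75.0
```

In this toy case the two zero catches are missed by the unadjusted metric even though the model predicts near-zero counts, which is the situation the adjusted metric is meant to handle.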
To ensure that the spatio-temporal smooth component of our model adheres to the current understanding of the ecology of the estuary's fishes, we plotted the predicted catch counts for two native species of interest: the Delta Smelt and Starry Flounder (Platichthys stellatus). Delta Smelt is a pelagic species that we expect to be better captured by a midwater trawl, while Starry Flounder is a demersal species that we expect to be more easily captured by an otter trawl. We constructed spatial prediction maps for a relatively high catch year close to the midpoint in the time-series (2 December 1993) and a low catch year (2 December 2017) for Delta Smelt. Subsequently, to illustrate the general trend of the species catch over time, we calculated temporally smoothed predictions for the full time-series covered in the study for one location in Suisun Bay (where both species are known to occur).

RESULTS
The number of individual tows analyzed for this study was highest from the Bay Study (BSOT = 19,075 and BSMT = 16,782), followed by the FMWT (16,782); the fewest tows were from the SMOT (9,134). We identified a total of 45 fish species with sufficient data for GAM fitting (Table 1). The majority of species were classified as demersal fishes (n = 25), followed by pelagic (n = 14) and littoral (n = 6; Table 1). Based on data conditioning, fewer fishes in the small size class (n = 25) than in the large size class (n = 43) could be analyzed.

Catch-Ratios
Estimated catch-ratios for fishes in the small size class demonstrated distinct patterns among gear types. Out of the 25 species in which catch-ratios could be estimated for the small size class, 22 had the FMWT as the reference gear type (Figure 2, Tables A1-A3). The remaining three species-  Table A1). Comparisons of catch-ratios between the FMWT and BSMT indicated that the relative catchability for 11 of 22 species in the small size class was higher in the FMWT than in the BSMT (5 demersal, 5 pelagic, 1 littoral), relative catchability for three species was higher in the BSMT than in the FMWT (2 demersal, 1 pelagic), and relative catchability for eight species was no different between the FMWT and the BSMT (6 demersal, 3 pelagic; Figure 2, Table A1).
Comparisons between the FMWT and both otter trawls (BSOT and SMOT) showed more distinct patterns in catch-ratios for small size classes based on fish life-history characteristics. Relative catchability was higher in the FMWT than in the BSOT and SMOT for four and five fish species, respectively (Figure 2, Tables A2 and A3). All but one of the fish species with higher relative catchability in the FMWT than in either otter trawl survey were pelagic; the exception was one demersal species with higher relative catchability in the FMWT than in the SMOT (Plainfin Midshipman, Porichthys notatus; Figure 2, Table A3). Relative catchability was higher in the BSOT than in the FMWT for all 13 demersal species captured in high enough numbers to be compared between the two gears. Relative catchability was also higher for most demersal fishes in the SMOT than in the FMWT, where 78% of small demersal fishes had higher relative catchability in the SMOT (7 of the 9 demersal fish estimable between the two gears; Figure 2, Table A3).
Catch-ratios were fit for more species in the large size class than in the small size class. A total of 43 species had sufficient counts among surveys to estimate catch-ratios (Table 1), with 41 catch-ratios estimated with the FMWT as the reference and the remaining 2 with the BSMT as the reference (see Tables A1-A3). Comparisons of catch-ratios between the FMWT and BSMT indicated that the relative catchability for 11 of 41 species in the large size class was higher in the FMWT than in the BSMT (5 demersal, 5 pelagic, 1 littoral), relative catchability for 11 species was higher in the BSMT than in the FMWT (8 demersal, 3 pelagic), and catch-ratios for 19 species were no different between the FMWT and the BSMT (9 demersal, 6 pelagic, 4 littoral; Figure 3).
species with higher relative catchability in both otter trawls than the FMWT for fishes in the large size class. The relative catchability was higher in the FMWT than in the BSOT and SMOT for 57% (8 of 14 estimable between the two gears) and 75% (6 of 8 estimable between the two gears) of large pelagic fish species, respectively (Figure 3, Tables A2 and A3). Only 1 large demersal fish had higher relative catchability in the FMWT than in the SMOT (White Croaker, Genyonemus lineatus), and none had higher relative catchability in the FMWT than in the BSOT (Figure 3, Tables A2 and A3). When compared to the FMWT, relative catchability was higher for 100% of large demersal fishes in the BSOT (22 of 22 demersal fish estimable between the two gears) and 79% in the SMOT (11 of 14 demersal fish estimable between the two gears; Figure 3, Tables A2 and A3). Large littoral fishes had higher relative catchability in otter trawls than in the FMWT for 80% of fishes (4 of 5 littoral fish estimable between the two gears; Figure 3, Tables A2 and A3).

Figure 2
Heatmap depicting catch-ratio results for fishes in small size classes within the San Francisco Estuary, California. White panels (NA) indicate insufficient data were available to construct a catch-ratio that compared counts from the Fall Midwater Trawl (FMWT) with the comparison gear type. Colors represent catch-ratio estimates where the comparison gear was significantly higher (blue), lower (red) or not different (light blue) than the reference gear type (FMWT). Significance was based on whether 95% confidence intervals of the catch-ratio overlapped 1. Colors of the taxa on the y-axis represent habitat associations, with blue for benthic fishes, orange for littoral fishes, and red for pelagic fishes.

Predictive Performance
Model predictive performance analysis indicated that confidence in catch estimates from GAMs varied among fish species and size classes.

10.6%). After coverage adjustments for zero, similar patterns among life-history classification were observed but coverage increased (Table 1).

Figure 3
Heatmap depicting catch-ratio results for fishes in large size classes within the San Francisco Estuary, California. White panels (NA) indicate insufficient data were available to construct a catch-ratio that compared counts from the Fall Midwater Trawl (FMWT) with the comparison gear type. Colors represent catch-ratio estimates where the comparison gear was significantly higher (blue), lower (red) or not different (light blue) than the reference gear type (FMWT). Significance was based on whether 95% confidence intervals of the catch-ratio overlapped 1. Colors of the taxa on the y-axis represent habitat associations, with blue for benthic fishes, orange for littoral fishes, and red for pelagic fishes.
Here, we illustrate GAM results for large Delta Smelt and Starry Flounder, two native species of conservation importance within the estuary. Model predictive performance indicated that confidence could be placed in GAMs fit to each species, with relatively low RMSE (Starry Flounder = 3.1 counts, Delta Smelt = 4.1 counts) and high coverage (Starry Flounder adjusted for 0 = 90.2%, Delta Smelt adjusted for 0 = 90.5%; Table 1). Predicted counts (on Julian day 336 during 1993 and 2017, corresponding to 2 December 1993 and 2017) from GAMs fit to Delta Smelt and Starry Flounder were highest for both fishes near the Suisun Bay region of the estuary, as well as portions of the South and San Pablo bays for the Starry Flounder (Figure 4). Temporally, predicted counts within the Suisun Bay region were highest for Delta Smelt from fall to spring (October to April), and in early years (before 1986) and middle years (1992 to 2005) of surveys (Figure 4). Starry Flounder predicted counts were similarly highest in the early years of the survey, but counts were predicted highest during summer and fall months within the Suisun Bay region (Figure 4).

DISCUSSION
We demonstrated that considerable variability in catch-ratios exists among fish species, and among the different surveys commonly used to inform management and research in the estuary. Although differences in catch-ratios followed predictable patterns based on gear deployment within the water column (midwater vs. otter trawl) and the life-history characteristics of different fish species (demersal vs. pelagic), our results indicated that even surveys that target fish species with similar life-history characteristics could also differ in relative catchability. For example, catch-ratios suggested that relative catchability of Delta Smelt by the FMWT was greater than by the BSMT, even though both gear types are midwater trawls. The differences among these surveys in spatial and temporal coverage are important because researchers and managers often combine survey data sets for a more holistic understanding of species status within the estuary (Polansky et al. 2019; Stompe et al. 2020). Unless direct gear efficiency experiments have been conducted (see Mitchell et al. 2017, 2019; Mitchell and Baxter 2021), studies that combine gear types must either include gear type in model structure to account for some variability in catch as a result of gear type (as fixed or often random effects) or assume equal catchability among gear types for each target species. Although this assumption may not be a critical flaw for some species with little difference in catchability among gear types, strong violations of this assumption can lead to significant bias in interpretations of spatial and temporal trends in population and community dynamics. We explore in more detail the consequences of these assumptions and future direction for the use of this approach below.
Detection efficiency, or catchability as it is commonly referred to in fisheries, can be complicated to estimate for any sampling method because it is not a single value (Nichols et al. 2009; Hostetter et al. 2019). In fact, catchability from trawling surveys is most commonly defined as the product of two probabilities: the probability that a fish is available to the gear when deployed; and the probability that a fish is captured by the gear, conditional on the fish being available to the gear (gear efficiency or retention probability) (Walsh 1997). Many of the differences in relative catchability among surveys in this study likely reflect the probability that fish are available to the gear more so than differences in gear efficiency. This is most obvious when comparing catch-ratios for fish species with different life-history classifications between midwater trawls and otter trawls. For example, most fishes with higher relative catchability for the otter trawls than for the midwater trawls were demersal fishes, which are associated with the benthos. Likewise, relative catchability for pelagic fishes was generally higher for midwater trawls than for otter trawls, reflecting again the differences between where the fishes and sampling gear are in the water column. The catch-ratio estimates of Walker et al. (2017) reflected gear efficiency because the spatial and temporal smoother they used captured availability, and all gear types sampled demersal fishes (beam and otter trawls). If we compared catch-ratios only between gear types that sampled similar parts of the water column (i.e., FMWT with BSMT and BSOT with SMOT), our catch-ratios would similarly reflect relative gear efficiency differences.
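A small numerical sketch may help show why the catch-ratios here blend availability and gear efficiency. All probabilities below are invented for illustration, and the function names are hypothetical; nothing in this block is an estimate from the study.

```python
def catchability(p_available, p_retained_given_available):
    """Catchability as the product of availability and gear efficiency."""
    return p_available * p_retained_given_available

def ratio_from_components(q_comparison, q_reference):
    """Catch-ratio implied by two catchabilities at equal fish density."""
    return q_comparison / q_reference

# Hypothetical demersal fish: equal gear efficiency, very different availability
q_otter = catchability(0.8, 0.5)
q_midwater = catchability(0.1, 0.5)
ratio_depth = ratio_from_components(q_otter, q_midwater)

# Hypothetical pair of midwater trawls: equal availability, different efficiency
q_gear_a = catchability(0.6, 0.5)
q_gear_b = catchability(0.6, 0.25)
ratio_efficiency = ratio_from_components(q_gear_a, q_gear_b)

print(round(ratio_depth, 2), round(ratio_efficiency, 2))  # 8.0 2.0
```

In the first case the ratio is driven entirely by availability; in the second, availability cancels and the ratio equals the ratio of gear efficiencies, which is the situation in which a catch-ratio can be read as relative gear efficiency.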
Gear efficiency is likely responsible for differences in overall counts between the large and small size classes for many species in the data set. As a result of insufficient sample size, we were able to estimate only half the number of catch-ratios for the small size class compared to the large size class. Given the lower gear efficiency for fish < 50 mm in fork length seen in a previous study (Mitchell et al. 2017), it is not surprising that fewer small size classes of fishes (defined in this study as fish ≤ 50 mm in fork length) had sufficient counts to be analyzed. Even when using the same gear (e.g., midwater trawl), certain surveys may tow against the current while another tows with the current. Fish may be more capable of avoiding the cod end of the net with one method over another, depending on the species and their size. Minor inconsistency in methods such as this may partially explain why relative catchability can differ between surveys that use the same gear (e.g., midwater trawl).
Beyond size class breakdowns, there are multiple explanations why surveys sampling the same habitat (e.g., pelagic with midwater trawl) can have different relative catchability estimates. First, our study did not account for how total fish biomass within the trawl affected gear efficiency in the cod end of trawls. Biomass in the trawl is positively correlated with gear efficiency because of the reduction in mesh openings (usually toward the cod end), which prevents smaller fishes from passing through the net (Mitchell et al. 2017; Peterson and Barajas 2018; Huntsman et al. 2021b). This may explain some of the variability in relative catchability observed for some fish species captured by similarly operated gears (e.g., small Striped Bass between the FMWT and BSMT). Second, each survey has been conducted over multiple years in which changes in sampling crews, vessels, and gear have potentially affected the efficiency by which each survey captured fishes (IEP 2021). For example, before 2001, field operators of the FMWT were inconsistent in their deployment of gear, sometimes deploying it before sunrise and other times after. Additionally, to reduce Northern Anchovy bycatch, the FMWT used a larger meshed cod end to sample San Pablo Bay (Figure 1) in the 1990s and early 2000s, which may explain why the relative catchability of the pelagic Northern Anchovy in the small size class was lower in the FMWT than in the otter trawl surveys. A final example is that minimum size cutoffs were used to determine which fishes were processed in each sample, but these minimum cutoffs varied over time and among surveys (IEP 2021). Consequently, differences in catch among surveys for the small size class may reflect discrepancies in sample processing more than differences in gear catchability. These are but a few of many examples for why differences in relative catchability may have occurred in this study, and further investigation into survey metadata may clarify these issues.
Predictive performance analyses indicated that model-predicted catches were reliable for many fish species and size classes, suggesting that our catch-ratio estimates were likewise reliable in those cases. However, there were a few notable exceptions for some species of conservation interest. Three species in particular (large size classes of Longfin Smelt, Northern Anchovy, and Striped Bass) had poor model performance, with coverage below 70% for each species. The reasons these models performed poorly are uncertain and will require further investigation. For example, anecdotal evidence from acoustic cameras suggests that large Striped Bass may avoid capture by midwater trawls because they can swim out of the trawl's sampling path (2021 in-person conversation between Fred Feyrer and BMH, unreferenced, see "Note"). Consequently, capture efficiency of large Striped Bass may reflect their ability to escape capture.
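As an illustration of the coverage criterion mentioned above, the following minimal sketch computes the proportion of observed catches that fall inside their prediction intervals and compares it with the 70% threshold. It assumes that coverage was defined this way; the function name, the interval bounds, and all numeric values are hypothetical and are not taken from the study.

```python
def interval_coverage(observed, lower, upper):
    """Proportion of observations inside their [lower, upper] prediction interval.

    Assumption: coverage is the share of held-out observed catches that fall
    within the model's prediction intervals (bounds inclusive).
    """
    if not (len(observed) == len(lower) == len(upper)):
        raise ValueError("observed, lower, and upper must have equal length")
    if len(observed) == 0:
        raise ValueError("at least one observation is required")
    hits = sum(1 for y, lo, hi in zip(observed, lower, upper) if lo <= y <= hi)
    return hits / len(observed)


# Hypothetical catches and prediction intervals for one species and size class.
observed = [0, 3, 12, 1, 7]
lower = [0, 1, 2, 0, 8]
upper = [2, 6, 10, 4, 15]

coverage = interval_coverage(observed, lower, upper)
poor_performance = coverage < 0.70  # threshold used in the text
print(f"coverage = {coverage:.2f}, poor performance = {poor_performance}")
```

In this made-up example, three of the five observations fall inside their intervals, so the species and size class would be flagged as having poor predictive performance.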

CONCLUSIONS
Future work using catch-ratios in the estuary could expand on these analyses in multiple ways. First, we used only three of the many surveys that collect information on fishes in the estuary (Stompe et al. 2020), and expanding the data set to include more surveys would likely improve predictive performance and provide insight into the relative efficiency of different gears at capturing targeted fish species. Expanding these analyses could help identify which gears and methods are most compatible for each species in terms of catchability, limiting the catchability bias that would affect inferences about fish population and community dynamics drawn from analyses of multiple data sets. Additionally, analyses could include random effects for survey if more surveys were added, making it possible to decompose catch variance into spatial, temporal, and survey effects via variance component analysis (see Moriarty et al. 2020). Furthermore, the catch-ratios estimated here could be used to estimate true gear efficiency among gears, similar to Walker et al. (2017). This would be possible if true gear efficiency estimates of the FMWT are available from covered cod end experiments (Mitchell et al. 2017; Mitchell and Baxter 2021). Catch-ratios with FMWT as the reference could then be used in combination with estimates of the true gear efficiency of the FMWT to convert relative catchability (e.g., of the BSMT) to true catchability for the gear types we analyzed. True catchability could then be used to provide less biased population estimates for fishes in the estuary, similar to the approach taken by Polansky et al. (2019), who used catchability adjustments made available from paired gear deployments and covered cod end experiments (Mitchell et al. 2017). Currently, abundance estimates adjusted for catchability from trawling surveys have been made only for Delta Smelt (Newman et al. 2008; Polansky et al. 2019), but, as true gear efficiency estimates become available, this approach can be expanded to other taxa of conservation interest.
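The conversion described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the procedure of any cited study: it assumes a log link, so that the catch-ratio is the exponentiated survey coefficient; it assumes the catch-ratio is expressed as survey catch divided by FMWT catch; and it assumes both gears sample the same fish density. The function names and all numeric values are hypothetical.

```python
import math


def catch_ratio_from_coefficient(beta):
    """Catch-ratio (survey / reference) from a survey fixed-effect coefficient.

    Assumption: the model uses a log link, so the ratio is exp(beta).
    """
    return math.exp(beta)


def true_efficiency(catch_ratio, reference_efficiency):
    """Scale a relative catch-ratio by the reference gear's true efficiency.

    Assumption: both gears sample the same fish density, so
    efficiency_survey = catch_ratio * efficiency_reference.
    """
    if catch_ratio < 0:
        raise ValueError("catch_ratio must be non-negative")
    if not 0 <= reference_efficiency <= 1:
        raise ValueError("reference_efficiency must be a proportion in [0, 1]")
    return catch_ratio * reference_efficiency


# Hypothetical inputs: a survey coefficient of log(0.5) relative to the FMWT,
# and a covered cod end estimate of 0.4 for FMWT true efficiency.
beta_survey = math.log(0.5)
fmwt_efficiency = 0.4

ratio = catch_ratio_from_coefficient(beta_survey)
survey_efficiency = true_efficiency(ratio, fmwt_efficiency)
print(f"catch-ratio = {ratio:.2f}, implied true efficiency = {survey_efficiency:.2f}")
```

An implied efficiency greater than 1 would indicate that the assumptions (for example, equal fish density in the water sampled by both gears) do not hold for that species and size class. In practice, uncertainty in both the catch-ratio and the reference efficiency would also need to be propagated.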
These analyses provide valuable information for researchers and managers who use these data by identifying which gear types and species may be subject to greater catchability bias than others, as well as the problems that catchability differences may cause when surveys are combined into one analysis. For example, our catch-ratios suggest that catchability would have less of an impact on analyses that combine FMWT and BSMT data for small American Shad (Alosa sapidissima) because the two surveys did not differ significantly in relative catchability. In contrast, the FMWT had a higher relative catchability than the BSMT for small Delta Smelt, indicating that gear efficiency for this species would be important to address when data from these two similar gear types are combined. Unbiased estimates of the abundance and distribution of fishes are integral to fisheries management, and, by evaluating relative catchability bias among surveys, fish species, and size classes, our study provides an approach to facilitate integration of fish community data sets collected within the San Francisco Estuary.