This work surveys the performance of several empirical models, developed over the past 25 years and here recalibrated to a common data set, that relate freshwater flow to salinity in the San Francisco Estuary (estuary). The estuary’s salinity regime, broadly regulated to meet urban, agricultural, and ecosystem beneficial uses, is managed in spring and certain fall months to meet ecosystem objectives by controlling the position of the 2 parts per thousand bottom-salinity isohaline (referred to as X2). We tested five empirical models for accuracy and for mean and transient behavior. A sixth model, which employs a machine learning framework and variables other than outflow, was included to compare fitting skill but was not subjected to the full suite of tests applied to the other five. Model performance varied with hydrology, year, and season, and some models exhibited unique limitations as a result of their mathematical formulation. No single model formulation was consistently superior across a wide range of tests and applications. One test revealed that the models performed equally well when recalibrated to a uniformly perturbed input time series. Thus, while the models may be used to identify anomalies or seasonal biases (the latter being the subject of a companion paper), their use as inverse models to infer freshwater outflow to the estuary from salinity observations is not expected to improve upon the absolute accuracy of existing outflow estimates. This survey suggests that, for analyses that span a long hydrologic record, an ensemble approach, rather than reliance on any individual model, may be preferable because it exploits the strengths of the individual models.
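The reasoning behind the perturbation result can be illustrated with a minimal sketch. It assumes a hypothetical steady-state log-linear relation, X2 = a + b log10(Q), which is not necessarily the form of any model surveyed here, and it reads "uniformly perturbed" as multiplying every outflow value by one constant factor. The data are synthetic and the names `fit_log_linear`, `rmse`, and `bias` are illustrative only. Under these assumptions, recalibration absorbs the perturbation into the intercept, so the fit quality is unchanged and carries no information about the absolute magnitude of outflow.

```python
import math
import random


def fit_log_linear(flows, x2):
    """Ordinary least squares for x2 = a + b * log10(flow)."""
    logs = [math.log10(q) for q in flows]
    n = len(logs)
    mean_l = sum(logs) / n
    mean_x = sum(x2) / n
    sxx = sum((l - mean_l) ** 2 for l in logs)
    sxy = sum((l - mean_l) * (x - mean_x) for l, x in zip(logs, x2))
    b = sxy / sxx
    a = mean_x - b * mean_l
    return a, b


def rmse(flows, x2, a, b):
    """Root-mean-square error of the fitted relation."""
    resid = [x - (a + b * math.log10(q)) for q, x in zip(flows, x2)]
    return math.sqrt(sum(r * r for r in resid) / len(resid))


rng = random.Random(0)

# Synthetic, illustrative data only (not estuary observations).
true_flows = [10 ** rng.uniform(2.0, 4.0) for _ in range(200)]
x2_obs = [120.0 - 15.0 * math.log10(q) + rng.gauss(0.0, 2.0) for q in true_flows]

# Hypothetical uniform perturbation: every outflow value overstated by 20%.
bias = 1.2
biased_flows = [bias * q for q in true_flows]

# Calibrate once to the unperturbed flows and once to the perturbed flows.
a0, b0 = fit_log_linear(true_flows, x2_obs)
a1, b1 = fit_log_linear(biased_flows, x2_obs)
rmse0 = rmse(true_flows, x2_obs, a0, b0)
rmse1 = rmse(biased_flows, x2_obs, a1, b1)

print(f"slope: {b0:.4f} vs {b1:.4f}")
print(f"intercept shift: {a1 - a0:.4f} (expected {-b0 * math.log10(bias):.4f})")
print(f"RMSE: {rmse0:.4f} vs {rmse1:.4f}")
```

In this sketch the slope and the error are identical for both calibrations, and the intercept shifts by exactly -b log10(bias). Inverting either calibrated relation therefore returns whatever outflow scale it was fitted to.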