Application of biogeochemical models to the study of marine ecosystems is pervasive, yet objective quantification of these models' performance is rare. Here, 12 lower trophic level models of varying complexity are objectively assessed in two distinct regions (equatorial Pacific and Arabian Sea). Each model was run within an identical one-dimensional physical framework. A consistent variational adjoint implementation assimilating chlorophyll-a, nitrate, export, and primary productivity was applied and the same metrics were used to assess model skill. Experiments were performed in which data were assimilated from each site individually and from both sites simultaneously. A cross-validation experiment was also conducted whereby data were assimilated from one site and the resulting optimal parameters were used to generate a simulation for the second site. When a single pelagic regime is considered, the simplest models fit the data as well as those with multiple phytoplankton functional groups. However, those with multiple phytoplankton functional groups produced lower misfits when the models are required to simulate both regimes using identical parameter values. The cross-validation experiments revealed that as long as only a few key biogeochemical parameters were optimized, the models with greater phytoplankton complexity were generally more portable. Furthermore, models with multiple zooplankton compartments did not necessarily outperform models with single zooplankton compartments, even when zooplankton biomass data are assimilated. Finally, even when different models produced similar least squares model-data misfits, they often did so via very different element flow pathways, highlighting the need for more comprehensive data sets that uniquely constrain these pathways.