We present a generalized framework for assessing the skill of global upper ocean ecosystem-biogeochemical models against in-situ field data and satellite observations. We illustrate the approach using a multidecadal (1979-2004) hindcast experiment conducted with the Community Climate System Model (CCSM-3) ocean carbon model. The CCSM-3 ocean carbon model incorporates a multi-nutrient, multi-phytoplankton functional group ecosystem module coupled with a carbon, oxygen, nitrogen, phosphorus, silicon, and iron biogeochemistry module embedded in a global, three-dimensional ocean general circulation model. The model is driven by physical climate forcing from atmospheric reanalysis and satellite data products and by time-varying atmospheric dust deposition. Data-based skill metrics are used to evaluate the simulated time-mean spatial patterns, seasonal cycle amplitude and phase, and subannual to interannual variability. Evaluation data include: sea surface temperature and mixed layer depth; satellite-derived surface ocean chlorophyll, primary productivity, phytoplankton growth rate and carbon biomass; large-scale climatologies of surface nutrients, pCO2, and air-sea CO2 and O2 fluxes; and time-series data from the Joint Global Ocean Flux Study (JGOFS). Where the data are sufficient, we construct quantitative skill metrics using model-data residuals, time-space correlation, root mean square error, and Taylor diagrams. © 2008 Elsevier B.V. All rights reserved.
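The statistics named above can be computed from paired model and observation samples. The following is a minimal sketch, not the authors' code. It assumes the model output has already been regridded and co-located with the observations, that all samples are weighted equally (no area or time weighting), and that population (1/N) statistics are used. The function name `skill_metrics` and its output keys are hypothetical. With these conventions the Taylor diagram identity E'^2 = σ_m^2 + σ_o^2 − 2σ_mσ_oR holds exactly, where E' is the centered (bias-removed) RMS difference. The total RMS error also decomposes as RMSE^2 = bias^2 + E'^2.

```python
import math
from typing import Dict, Sequence


def skill_metrics(model: Sequence[float], obs: Sequence[float]) -> Dict[str, float]:
    """Hypothetical helper: model-data skill statistics for paired samples.

    Assumes model and obs are already co-located, equally weighted samples.
    Population (1/N) statistics are used, so the Taylor (2001) relation
    E'^2 = sigma_m^2 + sigma_o^2 - 2*sigma_m*sigma_o*R holds exactly.
    """
    n = len(obs)
    if n != len(model) or n < 2:
        raise ValueError("model and obs must be paired, with at least 2 samples")

    mean_m = sum(model) / n
    mean_o = sum(obs) / n

    # Anomalies of each field about its own mean
    am = [m - mean_m for m in model]
    ao = [o - mean_o for o in obs]

    sigma_m = math.sqrt(sum(a * a for a in am) / n)
    sigma_o = math.sqrt(sum(a * a for a in ao) / n)
    if sigma_m == 0.0 or sigma_o == 0.0:
        raise ValueError("zero variance: correlation is undefined")

    # Pearson correlation coefficient R
    r = sum(x * y for x, y in zip(am, ao)) / (n * sigma_m * sigma_o)

    # Centered (pattern) RMS difference E': the mean bias is removed first
    crmsd = math.sqrt(sum((x - y) ** 2 for x, y in zip(am, ao)) / n)

    # Total RMS error of the raw model-data residuals
    rmse = math.sqrt(sum((m - o) ** 2 for m, o in zip(model, obs)) / n)

    return {
        "bias": mean_m - mean_o,
        "rmse": rmse,
        "r": r,
        "sigma_ratio": sigma_m / sigma_o,   # radial coordinate on a Taylor diagram
        "crmsd_norm": crmsd / sigma_o,      # E' normalized by the observed std. dev.
    }


if __name__ == "__main__":
    obs = [1.0, 2.0, 3.0, 4.0]
    model = [2.0, 3.0, 4.0, 5.0]  # observations plus a constant offset
    print(skill_metrics(model, obs))
```

Consider a model field that differs from the data only by a constant offset, as in the usage example. It has R = 1, a standard-deviation ratio of 1, and zero centered RMS difference, so all of its error appears as bias. Because a Taylor diagram displays only R, σ_m/σ_o and E'/σ_o, it is normally paired with a separate bias or RMSE measure.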