Dynamically and Observationally Constrained Estimates of Water-Mass Distributions and Ages in the Global Ocean

AbstractA data-constrained ocean circulation model is used to characterize the distribution of water masses and their ages in the global ocean. The model is constrained by the time-averaged temperature, salinity, and radiocarbon distributions in the ocean, as well as independent estimates of the mean sea surface height and sea surface heat and freshwater fluxes. The data-constrained model suggests that the interior ocean is ventilated primarily by water masses forming in the Southern Ocean. Southern Ocean waters, including those waters forming in the Antarctic and subantarctic regions, make up about 55% of the interior ocean volume and an even larger percentage of the deep-ocean volume. In the deep North Pacific, the ratio of Southern Ocean to North Atlantic waters is almost 3:1. Approximately 65% of interior ocean waters make first contact with the atmosphere in the Southern Ocean, further emphasizing the central role played by the Southern Ocean in the regulation of the earth’s climate. Results of the a...


Introduction
This work aims to provide quantitative answers to two related questions: 1) What regions of the surface ocean are the most important in terms of ventilating the interior ocean? 2) What are the ventilation time scales of the interior ocean? A useful framework with which to answer these questions is to partition each Eulerian water parcel according to where and when its fluid elements were last in contact with the sea surface. The partitioning produces a normalized distribution function, P(r s , tjr), in which (r s , t) is the time and place of last contact with the sea surface and in which the conditioning argument r denotes the location of the fluid parcel in the interior of the ocean (e.g., Holzer and Primeau 2010, and references therein).
Many useful diagnostics of ocean ventilation can be obtained by computing the generalized moments of P, hg(r)i P 5 where V denotes the sea surface (e.g., Haine and Hall 2002;Primeau 2005;Primeau and Holzer 2006;Holzer and Primeau 2010). In particular, setting g(t, r s ) [ t yields the mean age of the water parcel, and setting g(t, r s ) 5 1 if r s is in a particular surface patch or outcrop V i and zero otherwise yields the fraction of the water mass that last ventilated from the V i patch. Viewed from the perspective of the above framework, many studies of ocean ventilation can be interpreted as attempts to provide quantitative answers to questions 1 or 2 by computing or estimating particular moments of P. Such studies fall into two distinct groups: those that have used ocean general circulation models (OGCMs) to simulate P or some of its moments (e.g., Thiele and Sarmiento 1990;England 1995;Deleersnijder et al. 2001;Khatiwala et al. 2001;Haine and Hall 2002;Primeau 2005;Peacock and Maltrud 2006;Primeau and Holzer 2006;Bryan et al. 2006) and those that have used tracer data to directly estimate moments of P (e.g., Broecker et al. 1998;Matsumoto 2007;Johnson 2008;Holzer et al. 2010;Gebbie and Huybers 2010;Holzer and Primeau 2010;Gebbie and Huybers 2012, hereafter GH12;Khatiwala et al. 2011, manuscript submitted to Earth Planet. Sci. Lett., hereafter KPH). Studies in the first group have made use of dynamical constraints encoded in the circulation model forced by climatological surface wind stress and buoyancy forcing but have ignored information provided by hydrographic tracer observations. In contrast, studies in the second group have used the hydrographic tracer data but have largely ignored important dynamical constraints provided by, for example, geostrophy and wind-driven Ekman transports. The goal of the present study is to combine a dynamical model with tracer data to obtain estimates of the moments of P that are mutually consistent with dynamical and observational constraints and thereby better quantify the surface ventilation patterns and time scales that renew the ocean's deep water masses.
There is a large body of work focused on combining dynamical models with data to investigate various aspects of the global ocean circulation, but to the best of our knowledge none has focused on providing quantitative answers to the questions posed in the first paragraph. For example, box-inverse models (Wunsch 1996) have been used to combine hydrographic observations with the dynamical information provided by geostrophic balance and wind-driven Ekman transports to infer the diapycnal mixing and circulation patterns that maintain the observed structure of water masses on global scales (e.g., Ganachaud andWunsch 2000, 2003). However, because box-inverse models do not resolve the regions separating the hydrographic sections used in the model, it is difficult to use them to simulate P or its moments. The studies performed as part of the Estimating the Climate and Circulation of the Ocean (ECCO) project (e.g., Stammer et al. 2002Stammer et al. , 2004Menemenlis et al. 2008;Mazloff et al. 2010) do provide a gridded estimate of the circulation, which can then be used to compute the ventilation diagnostics provided by the moments of P. (Khatiwala (2007) describes an efficient computational framework with which to compute ventilation diagnostics in the context of an ECCO-derived advection-diffusion transport operator and gives an illustrative example for a patch situated in the southeastern subtropical Pacific.) However, the ECCO data-assimilation effort, which is focused on climate variability on interannual to decadal time scales, is not necessarily appropriate for examining questions related to the ventilation of the deep ocean. The assimilation runs, which include the initial state of the model as part of the control variables, are not long enough for the deep ocean to reach equilibrium with the surface forcing. Information provided by natural radiocarbon, which integrates transport rates over thousands of years, is therefore not used as a constraint.
In contrast to the above model-data synthesis efforts, the adjoint methodology developed by Schlitzer (1993Schlitzer ( , 2007 combines simplified dynamical constraints and tracer observations in the context of a steady-state circulation model assumed to represent the climatologically averaged circulation. Because the model's water masses are in equilibrium with the surface forcing, Schlitzer (2007) was able to make use of radiocarbon observations together with other tracers to constrain deep and bottom water transports. The model of Schlitzer (2007) would therefore be well suited to estimate various moments of P in the deep ocean, but this has not yet been done.
In this paper, we adapt the adjoint methodology developed by Schlitzer (1993Schlitzer ( , 2007 to constrain an ocean circulation model using salinity, temperature, and radiocarbon tracer data and then use the resulting model to infer the moments of P. In section 2, we outline our approach and highlight briefly what we believe are the main similarities and differences in the formulation of our model and the one developed by Schlitzer (1993Schlitzer ( , 2007. The heat, freshwater, and mass transports in our optimal solution are presented briefly in section 3. The main results of the paper are found in sections 4 and 5 where we quantify, in terms of the moments of P and its adjoint P y , the time scales and regions with which the deep ocean communicates with the atmosphere (P partitions the water parcels according to the time and place of last contact with the atmosphere, whereas P y partitions the water parcels according to the time and place of their next, or first, contact with the atmosphere). The results for ventilation volume fractions and mean ages (i.e., mean last-passage times) should improve on existing data-based estimates that ignored dynamical constraints (e.g., Broecker et al. 1998;Matsumoto 2007;Johnson 2008;Holzer et al. 2010;Holzer and Primeau 2010;Gebbie and Huybers 2010;GH12;KPH). The results quantifying the regions where water masses are first exposed to the atmosphere as well as the mean time to make first contact with the atmosphere (i.e., mean first-passage times) should improve on the results of Primeau (2005) that were based on a simulation using an ocean general circulation model without taking advantage of constraints from actual temperature, salinity, and radiocarbon data. The results for exposure volumes and first-passage times are to our knowledge the first data-constrained estimates.

a. Overview of the model formulation and inference approach
We use a linearized dynamical ocean circulation model in which the baroclinic pressure terms are computed from observed climatological temperature and salinity data and to which we have added adjustable forcing terms to the horizontal momentum equations to correct for model errors (section 2b). The velocity field computed by the dynamical model is used in a tracer transport model (section 2c) to simulate the temperature, salinity, and radiocarbon distributions. All quantities simulated by our model are implicit functions of the adjustable parameters in the horizontal momentum balance. The temperature and salinity fields are also implicit functions of additional adjustable parameters used to control the air-sea exchange of heat and moisture. These adjustable parameters are determined by minimizing an objective function (section 2d) that measures the difference between (i) observed and simulated tracer fields, (ii) simulated air-sea fluxes of heat and moisture and independent data-constrained estimates of the same quantities, and (iii) the simulated mean dynamic topography and an independent data-constrained estimate of the same quantity.
Like Schlitzer (1993Schlitzer ( , 2007, our model satisfies the discretized steady-state tracer budget equations exactly and imposes only approximate dynamical constraints. However, our approaches differ in the way the dynamical constraints are imposed. Schlitzer's model includes the horizontal components of the velocity as control parameters and imposes dynamical constraints by including separate terms in the objective function that penalize deviations from the initial geostrophic shear computed from hydrographic data and from the linear vorticity balance. Schlitzer's model also includes separate penalty terms for lack of smoothness in the three components of the velocity fields. Apart from the already mentioned constraint on the mean dynamic topography, dynamical constraints in our model are imposed by adding to the objective function a term that penalizes large deviations from the model's horizontal momentum balance. Because our dynamical model encodes in a consistent way wind-driven Ekman dynamics, geostrophy, large-scale linear vorticity balances, and explicit eddyviscosity terms, we can capture with a single term the constraints that are enforced with separate terms in Schlitzer's model. Another difference between our model and the one developed by Schlitzer is that his model is constrained using nutrient, oxygen, dissolved inorganic carbon, and chlorofluorocarbon (CFC) data in addition to temperature, salinity, and natural radiocarbon data.
Although it is technically possible to include the additional tracers in our model, we have not included them in the present study.
Once we have obtained our optimized dynamical model, we use the resulting tracer transport model to compute the moments of P and P y to make inferences about ocean ventilation. Uncertainties in our inferences are quantified by propagating the uncertainty in the tracer data into our estimates of the moments of P and P y using a Monte Carlo approach and by varying the relative weight in the objective function of modeldata misfits and deviations from the dynamical model's explicit momentum balance (section 2f).

b. Dynamical model formulation
The dynamical model governing the advective velocity field used for the tracer transport equations is based on the linearized Navier-Stokes equations with the hydrostatic and Boussinesq approximations. The continuous model equations are given in appendix A. The grid for the discretized model is horizontally uniform with a 48 resolution and with 24 vertical levels ranging in thickness from 30 m at the surface to 650 m in the deepest layer. Friction due to subgrid-scale processes is parameterized with Laplacian viscous terms. The horizontal viscosity A h is set by the constraint that the width of viscous western boundary layers exceed the grid spacing (Large et al. 2001), which in our 48 resolution is satisfied by A h 5 400 3 10 3 m 2 s 21 . The vertical viscosity is set to A y 5 10 23 m 2 s 21 , which is typical of coarse-resolution models (e.g., Stammer et al. 2002).
The bottom topography was determined using the 2-minute gridded elevations/bathymetry for the world (ETOPO2v2) relief dataset by horizontal smoothing with a 28 averaging filter and interpolating to the model grid. Modifications were made to the topography after interpolation to deepen the Drake Passage to a depth of 3600 m and the Greenland-Norwegian overflow to a depth of 1200 m. The model grid does not include the Mediterranean or other marginal seas.
Using finite differences to discretize the model equations and appropriate boundary conditions at the basin boundaries leads to a matrix representation of the dynamical model where M is a sparse square matrix and s [ [u T v T w T h T ] T is the dynamical model's discretized state vector including the zonal (u), meridional (v), and vertical (w) velocities and the mean dynamic topography (h). The right-hand side d is a forcing vector that contains the baroclinic pressure forces plus the wind stress applied as a body force in the top layer of the model. The wind stress is obtained from the European Centre for Medium-Range Weather Forecasts (ECMWF) climatological wind stress fields (Trenberth et al. 1989), and the baroclinic pressure forces are computed directly from data using the World Ocean Atlas 2009 (WOA09) objectively analyzed temperature ) and salinity ) and the United Nations Educational, Scientific and Cultural Organization (UNESCO) seawater equation of state (Fofonoff and Millard 1983). We solve Eq.
(2) under the assumption that the circulation is in a climatological steady state. Because of errors in Eq. (2) due to the steady-state assumption, the neglected nonlinear inertial terms, subgrid-scale processes, errors in the data, and because of discretization errors, we add an error term e to the equation so that the steady-state dynamical balance can be written in matrix form, where e 5 [e T u e T y 0 T 0 T ] T . The error terms appear as forces in the horizontal momentum equations; the continuity and depth-integrated continuity conditions are satisfied exactly. The values of the additional forcing variables are determined by minimizing an objective function, as described in section 2d.

c. Tracer budget equations
Equilibrium tracer distributions are found by solving the discretized steady-state tracer transport equation (Schlitzer 2007) Tc 5 q(c), where T is an advection-diffusion transport operator; c is either potential temperature Q, salinity S, or radiocarbon D 14 C; and q(c) is a linear function of c that describes the sources and sinks of tracer due to air-sea fluxes and due to radioactive decay in the case of radiocarbon. The advection-diffusion operator is based on the dynamical model's velocity field using second-order centered finite differences, along with Laplacian diffusive fluxes with a horizontal diffusivity of 10 3 m 2 s 21 and a vertical diffusivity of 10 25 m 2 s 21 , which are typical values for coarse-resolution models (e.g., Stammer et al. 2002).
To simulate surface heat fluxes Q H and freshwater fluxes Q F , we restore to an effective temperature T e and salinity S e at the air-sea interface with a time scale t as 5 30 days. Here, T e and S e are included as control parameters in the minimization of the objective function. For radiocarbon, we simulate the effect of air-sea fluxes by restoring to the background radiocarbon estimates from the Global Ocean Data Analysis Project (GLODAP) (Key et al. 2004), which represent radiocarbon concentrations in the surface ocean prior to 1955, when nuclear bomb tests introduced elevated levels of radiocarbon into the atmosphere. We use a restoring time scale of t as 5 30 days in order to force surface concentrations to closely match the GLODAP estimates, although we take into account uncertainty in these estimates as described in section 2f.

d. Objective function
To determine the value for the adjustable control parameters, p 5 [e T u e T y T T e S T e ] T , we minimize an objective function, where ] T is the model state vector and y obs is a corresponding vector of observations; G yy is the covariance matrix for the observations; G pp is the prior covariance matrix for the control parameters; and v is a hyperparameter that controls the relative strengths of the data constraints and the parameter constraints. For the elements of y obs for which we do not include any observations (i.e., u, y, and w), we use a prior mean of 0 m s 21 and choose large variances for the corresponding entries of G yy (1 m s 21 ) 2 for u and y and (10 25 m s 21 ) 2 for w. These uncertainties are deliberately chosen to be about an order of magnitude larger than typical large-scale mean currents in the ocean, so as not to bias the model toward unnecessarily low velocities. Nevertheless, the variances are small enough to maintain the invertibility of G yy , which is useful for the numerical implementation of some minimization algorithms.
We assume a simple covariance structure that allows us to write the objective function as a summation of the contributions due to each state variable and control parameter where, for example, J S 5 (S 2 S obs ) T G 21 SS (S 2 S obs ) and analogously for the other terms in Eq. (6). We also appended a nonlinear constraint to J Q that prevents temperatures lower than the freezing point of seawater: The data constraints for our model are based on the WOA09 objectively analyzed annual-mean potential temperature ) and salinity ; the GLODAP background D 14 C estimates (Key et al. 2004), which are derived from an empirical relationship between D 14 C and potential alkalinity; the Archiving, Validation, and Interpretation of Satellite Oceanographic data (AVISO) mean dynamic topography (release MDT-CNES_CLS09), which estimates the mean of the sea surface height above the geoid for the period 1993-99 from satellite altimetry and gravity measurements and in situ (drifter and hydrographic) data; and the National Centers for Environmental Prediction (NCEP) 30-yr reanalysis data for air-sea heat and salt fluxes (Behringer and Xue 2004), which are averaged over the period 1980-2009. Construction of the covariance matrices for the observations is discussed in appendix B.
The prior expected values for e u and e y are set to 0 m s 22 and the prior expected values for T e and S e are set to the annually averaged values in the top layer of the WOA09 dataset. For G pp , we assigned a diagonal covariance matrix with a variance of (0.2 m s 22 ) 2 for e u and e y , of (58C) 2 for T e , and of (2 psu) 2 for S e . The variance for e u and e y corresponds to 1% of the spatial variance of the baroclinic pressure forces diagnosed from the climatological density field. The large prior variances for T e and S e allow the surface temperature and salinity to deviate from the annual-mean values, because surface temperature and salinity are more likely to represent end-of-winter conditions when mixed layers are deepest and when the bulk of surface waters enter the interior ocean.

e. Optimal solution
We define an optimal solution as the value of p that minimizes the objective function (5) and that satisfies the observational constraints within their uncertainty. The process of finding an optimal solution therefore involves finding a minimum of the objective function and finding the value of v that allows the model to fit the observations within their uncertainty, without overfitting the data.
For a given value of v, we use a limited-memory quasi-Newton algorithm to find the set of control parameters p that minimizes the objective function. The quasi-Newton algorithm computes an approximate Hessian matrix using successive gradient calculations and at each iteration uses that information to compute a descent direction in parameter space. Gradient calculations are performed using the adjoint model, similar to the approach taken by Schlitzer (1993Schlitzer ( , 2000Schlitzer ( , 2002Schlitzer ( , 2004Schlitzer ( , 2007 and in the ECCO data-assimilation effort (e.g., Stammer et al. 2002Stammer et al. , 2004. The minimization algorithm used here is publicly available (http://www.caam.rice.edu/heinken/ software/matlab_impl_constr), and a discussion of its application in nonlinear implicitly constrained optimal control problems is given by Heinkenschloss (2008). Using this method, as well as starting with an initial guess of e u 5 e y 5 0 and T e and S e set to annual-mean WOA09 values, we find a minimum of the objective function typically in about 1500 iterations.
To find the optimal value of v, we repeated the optimization for a range of v values starting with v 5 1 and gradually increasing its value until the observations were well fit, as judged by the J values for each of the observational constraints in the objective function (6). Using this process, we determined that the optimal value of v is about ffiffi ffi 5 p . At v 5 ffiffi ffi 5 p , the value of the objective function at its minimum is 11.4, with J Q 5 (1.2) 2 , J S 5 (1.2) 2 , J D 14 C 5 (0:7) 2 , J h 5 (1.6) 2 , J Q H 5 (0:5) 2 , and J Q F 5 (0:4) 2 . We tolerate J values above 1 (i.e., beyond the data uncertainty) for temperature and salinity because there is likely to be some model-data misfit due to the fact that the model does not resolve the seasonal cycle, and the errors of the annual-mean gridded datasets do not include this variability. As shown in Fig. 1, most of the excess model-data misfit for potential temperature and salinity, beyond the expected data errors, occurs in the upper 1000 m, where the seasonal cycle is influential. On the other hand, for radiocarbon, which is not appreciably affected by the seasonal cycle, the model-data misfit is generally of the same order of magnitude as the data uncertainty throughout the water column.
The spatial pattern of tracer model-data misfit shows that the largest errors in temperature and salinity occur in the upper ocean and mostly in the Southern Ocean, along the Antarctic Polar Front (APF), and in the region of the western boundary currents (Figs. 2a,b). These are all regions where the data uncertainty is large due to either undersampling (in the Southern Ocean) or large eddy variability (in the APF and along the western boundaries). The misfit between modeled and observed temperature and salinity in the deep ocean is very small and generally confined to the Southern Ocean region (Figs. 2d,e). For radiocarbon, the largest model-data difference in the upper ocean occurs in the North Pacific and north Indian Oceans, where modeled radiocarbon concentrations are higher than the GLODAP estimates (Fig. 2c). In the subtropical South Pacific, modeled radiocarbon concentrations are significantly lower than the GLODAP estimates. In the deep ocean, the largest model-data misfit occurs in the Pacific and Indian sectors of the Southern Ocean where the GLODAP background D 14 C estimates suggest waters more depleted in radiocarbon than in the model (Fig. 2f).
The modeled sea surface height field clearly captures the main features of the sea surface topography, including the subtropical high pressure zones, the large pressure gradients along western boundaries, and the eastwardsloping topography across the Atlantic and Pacific basins (Fig. 3a). The relatively large J value for sea surface height (J h 5 1.6) is likely due to the fact that our coarseresolution model is inadequate to fit all of the finescale features of the mean dynamic topography, particularly where the gradients are very steep such as at the APF and along the western boundaries of ocean basins (Fig. 3b). The model-data misfit is also large in the Arctic Ocean and along the Antarctic margin, but the data coverage in those particular areas is patchy so that large errors in these regions are not surprising.
It is also important to examine the model error terms e u and e y from the optimal solution to see if their magnitudes and spatial patterns are reasonable. The global-mean magnitude of the error terms in the optimal solution is 6.3 3 10 27 m s 22 . Much of this can be attributed to uncertainty in the density field used to compute the baroclinic pressure force. Propagating uncertainty in temperature and salinity through the seawater equation of state and the hydrostatic equation yields a global-mean uncertainty in the pressure force of about 3.7 3 10 27 m s 22 , over half of the mean model error. Inspection of the error fields shows that the largest errors are found in regions of large eddy variability and strong density gradients, such as along the APF and along the western boundaries of ocean basins ( Fig. 4), where the baroclinic pressure forces are highly variable.
Another factor that may cause model errors is the neglect of nonlinear inertial terms in the momentum equations. The model produces strong, narrow currents on the order of 40 cm s 21 near the surface in the regions of the western boundary currents and the Antarctic Circumpolar Current (ACC). Given that the width of these currents is as small as 200-400 km, the magnitude of the nonlinear terms u Á $u could be as large as 4-8 3 10 27 m s 22 , which could explain some of the large model errors in these regions (Fig. 4). Also, adjustments to the momentum balance may be required where the model is unable to resolve narrow currents flowing along steep topographical gradients. Model errors in the subarctic eastern Pacific and western tropical Pacific could be due to the steep topography there (Fig. 4). On the other hand, in the center of the ocean basins, away from regions with steep topography, fast currents, or strong eddy variability, the required adjustments to the momentum balance are negligible (Fig. 4).

f. Uncertainty
The previous section presented what we consider the optimal solution to our data-assimilated model, but we also wish to compute the uncertainty in this solution so that we might propagate the uncertainty to the modelcalculated water-mass distributions and ages. We consider two sources of uncertainty, uncertainty due to errors in the data and uncertainty due to prior assumptions about the model parameters (i.e., model error). To account for observational error, we replace y obs in the objective function (5) with a random sample from a probability density function with mean y obs and covariance G yy . To account for model error, we use a range of values for v in the objective function. Larger values of v correspond to smaller objective function values and a better fit to the observations. The range ffiffi ffi 4 p # v # ffiffi ffi 6 p produces a 20% range in the value of the objective function at its minimum, which accounts for a reasonable range of uncertainty in the magnitude of the model error terms e u and e y . (The final values of T e and S e are not affected by the chosen value of v, because their prior uncertainty is already quite large). Note that this does not account for all model errors, such as those incurred by possible misspecification of diffusive parameters or by aspects of the coarse resolution that cannot be fixed by adjustment of the horizontal momentum balance, and the influence of errors in these variables should be explored in future applications. In all, we obtain 93 different optimized solutions with different combinations of y obs and v. In the following analysis, all model-calculated diagnostics represent the mean of each quantity calculated with the 93 different model circulations, and uncertainties are one standard deviation of each quantity.

Heat, freshwater, and mass transports
Here we briefly describe the large-scale heat, freshwater, and mass transports of the model. We then proceed to the main focus of the paper, a discussion of our estimates of the moments of P and P y used to characterize where water masses communicate with the atmosphere at the sea surface (section 4) and the associated time scales for transport to and from the sea surface (section 5). Figure 5 shows the spatial distribution of the modeled heat and salt fluxes, as well as the net northward transports of heat and freshwater. The most significant differences between the modeled and NCEP mean annual air-sea heat fluxes occur in the Southern Ocean region, and this can be seen in comparing the northward heat transport, which shows a stronger poleward transport in our data-assimilated model (Fig. 5a). However, our dataassimilated model is in better agreement with the estimates from the box-inverse model of Ganachaud and Wunsch (2003), and the discrepancy between our model and the NCEP climatology is not necessarily a shortcoming, given the large uncertainties in the Southern Ocean (e.g., Bromwich et al. 2007). Peak northward heat transport occurs at about 208N in both our model and the NCEP climatology, although the NCEP northward heat transport is slightly stronger and both the data-assimilated model and the NCEP climatology predict weaker northward heat transport than does the box-inverse model of Ganachaud and Wunsch (2003). The implied air-sea freshwater fluxes from the data-assimilated model show a pattern of net evaporation in the subtropical gyre regions, as well as net precipitation in the intertropical convergence zone and the midlatitude storm tracks (Fig. 5d). The implied northward freshwater transport of our model and the NCEP climatology are qualitatively similar, although our model better matches the data-based estimates of Wijffels (2001) in the far Northern Hemisphere (Fig. 5b).
A general sense of the large-scale global ocean circulation is given by the meridional overturning circulation (MOC) for the four major ocean basins (Fig. 6) below about 3500 m, there is a weak northward flow of waters originating in the Southern Ocean. In the Pacific Ocean, the overturning circulation is much more sluggish than that in the Atlantic, with a broad counterclockwiserotating cell ventilating the deep ocean and upwelling deep waters along the equator and in the North Pacific. Circulation in the deep Indian Ocean is similarly sluggish, with a maximum transport of 4 Sv below 1000 m and with deep waters upwelling along the APF. Some small-scale noise is visible in the Pacific and Indian MOCs, because of the low viscosity and weak overturning circulation in those basins. The Southern Ocean MOC is dominated by two vigorous overturning cells, one associated with waters sinking along the Antarctic margin and flowing northward at depth and the other associated with deep and intermediate waters upwelling in the Southern Ocean and flowing northward at the surface. The separation between the two cells, which is an important biogeochemical divide (Marinov et al. 2006), occurs at about 658S in our model.

Global water-mass ventilation patterns
Figure 7a presents a regional breakdown of the volume fraction of global ocean waters in terms of where they were last in contact with the sea surface. Figure 7b shows a corresponding breakdown in terms of where interior waters will make their first contact with the sea surface. To obtain this decomposition, the ocean surface is divided into seven broad zonal bands: the Antarctic, the subantarctic, the southern subtropics, the tropics, the northern subtropics, the subarctic, and the Arctic. Each zonal band is further divided by ocean basin, resulting in 17 unique regions. These regions are similar to those defined by Gebbie and Huybers (2010), to facilitate comparison with their results, except that we also separate the tropics and subtropics along the 258C isotherm. The separation between Antarctic and subantarctic waters is defined by the 1027 kg m 23 isopycnal, and the separation between subantarctic and subtropical waters is defined by the 34.8-psu isohaline. Separation between the subtropics and subarctic regions is along the polar front, which in the North Atlantic is marked by the 35.4-psu isohaline and in the North Pacific is marked by the 34.0-psu isohaline. Separation of the Arctic and subarctic regions is along geographical boundaries.
The results show that the Antarctic is the most important ventilation region, ventilating 40% 6 3% of the interior ocean (Fig. 7a). This number agrees well with  (2010), which found that 36% of the interior ocean is ventilated from the Antarctic region and with KPH, which found 39% for the same quantity. However, we find that ventilation occurs throughout the Antarctic, with roughly 11% in from the Atlantic, 9% from the Pacific sector, and 20% from the Indian sector, supporting the idea that there are multiple sources of deep and bottom waters in the Southern Ocean (e.g., Hellmer and Beckman 2001). This is an interesting finding, considering that deep waters are traditionally thought to form mainly in the Weddell Sea and Ross Sea areas in the Southern Ocean, and is quite different from the studies of Gebbie and Huybers (2010) and KPH, which found that very little ventilation occurs in the Indian Ocean sector of the Antarctic. To what extent these differences can be attributed to the neglect of dynamical constraints in the studies of Gebbie and Huybers (2010) and KPH or to the neglect of constraints from nutrient and oxygen data in our model is unclear and will require further study. After the Antarctic, the most important ventilation regions are the subarctic North Atlantic, which ventilates 26% 6 1% of the interior ocean, the subantarctic (16% 6 1%), the southern subtropics (10%), the northern subtropics (5%), and the subarctic Pacific (2%). The tropics and the Arctic regions combined ventilate only 1% of the ocean (Fig. 7a). Note that this model does not resolve water masses originating in the Mediterranean Sea, which have been estimated to ventilate 2% of the ocean (Gebbie and Huybers 2010).
The Antarctic is also the main region where interior water masses are exposed at the surface (Fig. 7b). Over half (52% 6 2%) of interior ocean waters make first contact with the atmosphere in the Antarctic region, with the bulk of these waters resurfacing in the Indian Ocean sector (25% 6 2%). Most of the remainder of the interior ocean waters is upwelled rather broadly across the Pacific Ocean. These are the first such estimates from a dataconstrained model. Several studies have presented dataconstrained estimates of upwelling rates (e.g., Ganachaud and Wunsch 2000;Lu and Stammer 2004;Lumpkin and Speer 2007), but, in an advective-diffusive flow, the upwelling rate does not account for the full exposure rate nor does the upwelling rate scale in any general way with FIG. 5. (a) Net northward heat transport by the ocean calculated with our data-constrained model, the NCEP climatology, and the values estimated from the box-inverse model of Ganachaud and Wunsch (2003). (b) Net northward freshwater transport by the ocean calculated with our data-constrained model, the NCEP climatology, and the values from the data-based estimate of Wijffels (2001). Shading around the model estimates represents 1-s uncertainty estimates. Also shown are the spatial patterns of (c) the air-sea heat flux and (d) the air-sea freshwater flux from the data-constrained model. the total volume of waters exposed at the surface. The only other estimate of where interior waters are first exposed to the atmosphere is from the OGCM calculations done by Primeau (2005), which found that about 40% of interior ocean waters were first exposed to the atmosphere in the Southern Ocean and another 30% were first exposed in the equatorial and North Pacific. Compared to these estimates, we find a much more significant fraction of waters being exposed in the Southern Ocean (almost 70%, including waters being upwelled in both the Antarctic and subantarctic regions) and less in the equatorial and North Pacific. [That the circulation model of Primeau (2005) produced too much equatorial upwelling in the Pacific is also evident from its deficiencies when used for biogeochemical studies (Kwon andPrimeau 2006, 2008).] Our result is also an important piece of evidence supporting the argument that the Southern Ocean plays a key role in the global carbon cycle, because it is the place where much of the ocean's carbon-rich old waters are first exposed to the atmosphere (e.g., Toggweiler 1999;Sigman et al. 2010;Skinner et al. 2010;Kwon et al. 2011).
Looking at the ventilation volume fractions in more detail, the distribution of water masses in the interior ocean clearly shows the dominant influence of Antarctic waters and, secondarily, North Atlantic waters in both the Atlantic (Fig. 8) and the Pacific (Fig. 9) basins. In these plots, North Atlantic and North Pacific waters are composed of water masses forming in the northern subtropical and subarctic regions of their respective basins, whereas subtropical waters represent waters forming in the subtropics of the Southern Hemisphere. In the Atlantic basin, North Atlantic waters flow southward at depth and are gradually diluted by northward-flowing Antarctic intermediate and bottom waters (Fig. 8). Subantarctic waters penetrate northward at intermediate depths, whereas subtropical mode waters are confined mainly to the thermocline. Our results suggest that more than 20% of the Atlantic water north of 308N in the depth range between 250 and 1200 m originated in the Southern Ocean. This result is significantly higher than the estimates of Holzer et al. (2010), who found that only between 5% and 10% of the water north of 308N originated in the Southern Ocean. The estimate of Holzer et al. (2010) is based on a maximum entropy regularization of the underdetermined water-mass analysis problem. Their estimate was constrained using phosphate, oxygen, and CFC data in addition to the same tracers we used, but without taking any dynamical constraints into consideration. To reconcile these differences it will be important to add CFC data to our model in the future because CFCs provided an important constraint in the study of Holzer et al. (2010).
In the Pacific basin, Antarctic waters are dominant, with secondary contributions from North Atlantic and subantarctic waters (Fig. 9). The influence of subtropical and North Pacific waters is mainly confined to the thermocline and above. The large-scale distribution of water masses presented here is mostly in agreement with the decomposition of Gebbie and Huybers (2010), lending confidence to the conclusions of both studies. The agreement also suggests that, for the purpose of a watermass analysis, the dynamical constraints that we used for our estimation but were not used in the study of Gebbie and Huybers (2010) can act as a surrogate for the constraints provided by the nutrient and oxygen tracer data that were used in the study of Gebbie and Huybers (2010) but not in our study.
In the deep ocean, the dominant water masses are Antarctic and North Atlantic waters, with subantarctic waters also present in the deep Indian and Pacific Oceans (Fig. 10). Globally, the ratio of Antarctic to North Atlantic waters is about 1.5:1 in the deep ocean and slightly more than 2:1 in the deep Indian and Pacific Oceans. This is similar to the mixture determined from the inverse model of Gebbie and Huybers (2010) and from the water-mass decomposition of Johnson (2008). However, it is significantly greater than the 50/50 mix of Antarctic and North Atlantic deep waters inferred by Broecker et al. (1998) on the basis of the distribution of oxygen and phosphate in the deep ocean. Including subantarctic waters, the ratio of water masses originating in the Southern Ocean to those originating in the North Atlantic is nearly 3:1 in the deep Pacific Ocean.

Global water-mass ages and first-passage times
As discussed in the introduction, a useful measure of the ventilation time scale is the mean age. Note that the mean age is also referred to as the mean ventilation age (e.g., DeVries and Primeau 2010), the ideal age (e.g., Thiele and Sarmiento 1990), or the mean last-passage time (Primeau 2005). Using our data-constrained circulation model, we calculated the first moment of P with respect to t using the method described in Primeau (2005) to estimate the mean age. Zonally averaged mean ages for the Atlantic, Pacific, and Indian basins are shown in Figs. 11a-c, and a depth-averaged plot of the age for waters below 2000 m is shown in Fig. 12a and summarized in Table 1. The oldest waters are found between 2500 and 3000 m in the northeast Pacific basin, where the maximum age is approximately 1450 yr.
Simulations of the mean age in numerical ocean models also consistently produce the oldest waters in the northeast Pacific basin, but the depth of this maximum is usually shallower than our estimate, ranging from 1750 to 2500 m, depending on the model (England 1995;Deleersnijder et al. 2001;Primeau 2005;Peacock and Maltrud 2006). The value of the maximum ages in these models also varies considerably between models but is consistently older than our estimate: ;1500 yr (England 1995), 16001 yr (Peacock and Maltrud 2006), ;2000 yr (Primeau 2005), and more than 2200 yr (Deleersnijder et al. 2001). Data-based estimates averaged over depths greater than 1500 m are available from the studies of Matsumoto (2007) and KPH. The study of Matsumoto (2007) finds a maximum depth-averaged age of ;1100 yr, whereas the study of KPH obtains a maximum depth-averaged age of ;1250 yr. Over the same depth interval, we find a maximum depth-averaged age of 1350 6 25 yr, about 250 yr older than the Matsumoto (2007) estimate and 100 yr older than the KPH estimate. The Pacific zonally averaged age estimates of KPH obtained using the maximum entropy deconvolution method are broadly consistent with our estimates in terms of the depth of the maximum ages, but the ages of KPH are approximately 100 yr younger than our estimates. The deconvolution of tracer data for a depth profile in the eastern North Pacific of Holzer and Primeau (2010), also based on the maximum entropy deconvolution method, yielded a most probable value of approximately 1350 yr with a half probability uncertainty interval of (1250 yr, 1650 yr) as the maximum age. Although this last result is also 100 yr younger than our estimate of the maximum age (;1450 yr), the results are consistent given the large error bars. Recently, GH12 used a decomposition of ocean water masses based on the Total Matrix Intercomparison (TMI) method combined with radiocarbon data to estimate the age. They show a maximum age of FIG. 9. As in Fig. 8, but for the Pacific Ocean. The North Pacific (subpolar and subtropical regions combined) is also shown here because it contributes significantly to ventilating the Pacific basin.
;1500 yr in the North Pacific. For the 2500-m depth, we obtain a maximum age of 1430 6 30 yr, which is only slightly younger than the result of GH12. In the Atlantic Ocean (Fig. 11a), mean ages reach a maximum of about 800 yr in the deepest Atlantic south of the equator, but the depth-averaged ages below 2000 m are generally between 200 and 400 yr. These results are consistent with the maximum entropy inversions of Holzer and Primeau (2010). Again we see that the dynamical constraints can act as a surrogate for the additional tracer constraints provided by phosphate, oxygen, 39 Ar, and CFC data that were used in Holzer and Primeau (2010). In contrast, the model-based ages of Primeau (2005) were in excess of 800 yr for all depths greater than 3000 m in the North Atlantic.
In the Indian basin, our age estimates reach a maximum of about 1100 yr in the deepest north Indian Ocean, which is about twice as old as the model-based calculations of Primeau (2005) but very close to the model-calculated ages of Peacock and Maltrud (2006), which were around 1200 yr.
Our age estimates for the Southern Ocean ranged between 300 and 600 yr and were much more uniform with depth compared with the other basins. Similar results were obtained in Holzer and Primeau (2010) and KPH. As suggested in Holzer and Primeau (2010), the uniformity of the ages in the Southern Ocean indicates that it is an important region for the blending of different water masses and, as shown in section 4, an important region for the re-exposure of old waters to the surface.
In addition to the age, an important time scale for the renewal of interior water masses is the first t moment of the P y distribution, which yields the mean first-passage time, or the mean time that will elapse before water parcels in the interior ocean will be exposed at the surface. We used the method described in Primeau (2005) to compute this quantity with our data-constrained circulation model. The results, summarized in Table 1, are shown in Figs. 11d-f and 12b.
The mean first-passage time is relevant for a particular class of geoengineering solutions to the problem of rising anthropogenic CO 2 , which seek to sequester carbon in the deep ocean by either direct injection (e.g., Herzog et al. 2003) or fertilization of marine biota (e.g., Strong et al. 2009). Once the carbon is dissolved in the deep ocean, it will be returned to the surface ocean (and released to the atmosphere) with a time scale determined by the first-passage time distribution. In an Intergovernmental Panel on Climate Change (IPCC) special report on carbon sequestration, Metz et al. (2005) cited a ventilation age of 700-1000 yr for the age of deep North Pacific waters, but our calculation of the mean first-passage time shows that the relevant time scale for carbon sequestration is over 1000 yr everywhere below about 2000 m in the North Pacific, reaching a maximum of about 1500 yr in the bottom waters (Fig. 11d). It is also interesting to note that the mean first-passage times in the North Atlantic are much older than the mean lastpassage times there (Figs. 11a,d). This shows that the waters in the deep North Atlantic are recently ventilated but will not resurface for another 500-1000 yr. As discussed in section 4, most of these waters will resurface in the Southern Ocean or much later in the Pacific Ocean.
These calculations represent the first data-constrained estimates of mean first-passage times. Calculating these directly from data is difficult because most age tracers (e.g., radiocarbon, CFCs) better constrain the surfaceto-interior transport time scale rather than the return interior-to-surface flow. Here, the addition of dynamical constraints allows us to infer the absolute velocity and direction of ocean currents, from which the first-passage times are calculated. These data-constrained mean firstpassage times show some significant difference from the model-based calculations of Primeau (2005). Significant differences occur in the Atlantic Ocean, where mean first-passage times in deep waters are about 400-500 yr younger in our data-constrained model compared to the results of Primeau (2005). Presumably, this is because our model has more waters upwelling in the Southern Ocean and less in the equatorial and North Pacific, compared to the model of Primeau (2005). The data-constrained model also shows mean first-passage times of around 400-800 yr in the deep Southern Ocean, which is much younger than the 1000-1600 yr calculated by Primeau (2005). Our calculations suggest a vigorous upwelling in the deep Southern Ocean and rapid transport of these deep waters to the surface. The global distribution of mean last-passage times in the deep ocean shows the gradual aging of deep waters, which flow from the Atlantic basin eastward into the Indian and Pacific Oceans (Fig. 12a). It is interesting to note that both the oldest last-passage times and oldest first-passage times are found in the deep North Pacific. The fact that firstpassage times are so large in the North Pacific runs counter to the concept of the ''great ocean conveyor'' (GOC) (Broecker 1991) with a terminus in the deep North Pacific, which would suggest large last-passage times but short first-passage times. Rather, the deep North Pacific is more aptly characterized as a ''holding pen'' of old waters, which have not only taken a long time to reach the North Pacific but will also recirculate in the interior ocean for over 1000 yr before being returned to the surface. This result is qualitatively consistent with the deep North Pacific pattern of the ''diffusive ocean conveyor'' described in the modeling studies of Holzer and Primeau (2006a,b). Table 1 presents a summary of the age analysis for deep-ocean waters grouped by ocean basin. Although uncertainties in water-mass ages at the grid scale can be significant (.100 yr), we find that uncertainties on largescale integrated quantities are quite small, between 20 and 30 yr. It is interesting to compare the mean last-passage times in Table 1 for the various ocean basins with the radiocarbon ''circulation age'' calculated by Matsumoto (2007) from the distributions of radiocarbon, phosphate, and oxygen in the deep ocean. Both methods yield a qualitatively similar picture of the distribution of ventilation ages, but the ages from our model are generally older than the ages computed by Matsumoto (2007). Results from our data-assimilated model suggest that the basin-wide average ventilation age of waters in the deep Pacific Ocean (omitting the Southern Ocean) is in the range 1250-1300 yr, compared to the 916-yr radiocarbon circulation age calculated by Matsumoto (2007) for the deep Pacific Ocean. For the Indian Ocean basin, mean age from our data-assimilated model is about 950-1000 yr, compared to 769 yr calculated by Matsumoto (2007). In the Atlantic Ocean basin, we estimate a mean ventilation age in the range of 310-380 yr, slightly greater than the 274 yr calculated by Matsumoto (2007). The last column in Table 1 gives the total surface-to-surface residence time of waters in the interior ocean, which represents the sum of the first-and last-passage times. The total residence time of waters in the deep ocean is more than 2000 yr throughout most of the Indian and Pacific Oceans. The general picture of the deep-ocean circulation outlined here clearly shows that communication between the surface and interior ocean occurs on millennial time scales.

Summary and conclusions
This study used a global data-constrained ocean circulation model to characterize transport between the ocean's surface and interior. Unlike other studies, which have used tracer data to obtain global water-mass decompositions (Broecker et al. 1998;Johnson 2008;Gebbie and Huybers 2010;KPH) or ventilation ages (Matsumoto 2007;KPH;GH12), this study used not only tracer data but also dynamical constraints to obtain a complete water-mass decomposition and age analysis of the global ocean. This is also the first study to use a dataconstrained model to characterize the ocean's interior-tosurface transport, providing estimates of where interior  waters are first exposed to the atmosphere and the time for deep waters to be transported back to the surface. Results from our model indicate that the interior ocean is composed mainly of waters originating in the Antarctic region of the Southern Ocean south of the APF (which ventilates about 40% of the interior ocean), the North Atlantic (;25%), and the subantarctic region of the Southern Ocean (;15%). The ratio of Antarctic to North Atlantic waters in the deep Pacific Ocean is about 2.5:1. Interior ocean waters are first exposed to the atmosphere mainly in the Southern Ocean (primarily the Antarctic region), where about 60% of the interior ocean waters are first exposed to the atmosphere, and more broadly in the Pacific Ocean north of about 408S, where about 25% of interior ocean waters are exposed at the sea surface. Results of the age analysis suggest that the oldest waters in the ocean are found in the middepth North Pacific, where mean last-passage times (i.e., mean ventilation ages) are on the order of 1400-1500 yr. Mean first-passage times (the time for deep waters to be reexposed at the surface) also reach a maximum of about 1500 yr in the deep North Pacific, suggesting that the North Pacific can be characterized as a ''holding pen'' of old and stagnant water masses.
We expect that the main results of this study are robust and will hold up to further refinements of the model and to the addition of new data constraints, provided that the assumption that the ocean can be interpreted as being in a climatological steady state holds true. We propagated uncertainty in the data through our model and found that the associated uncertainty in the large-scale integrated quantities considered here was very small, mostly less than 10%. Furthermore, the magnitudes and spatial patterns of the model error terms were found to be reasonable and in accord with prior expectations. Based on these observations, we have no reason to suspect large inaccuracies in the water-mass distributions and ages presented here. One important point to make is that, where direct comparisons were possible, we found that our results were in good agreement with other data-based estimates that used more observational constraints than us but that did not use the dynamical constraints used in this study (e.g., Gebbie and Huybers 2010;KPH;GH12). This suggests that the dynamical constraints used here can effectively serve as a surrogate for the missing data. This further suggests that the additional information provided by nutrient and oxygen data could in our model be used to constrain uncertain biogeochemical fluxes and parameters (Schlitzer 2000(Schlitzer , 2002(Schlitzer , 2004. In addition to incorporating more observational constraints, some refinements of the dynamical model are likely to be helpful if a finer-scale water-mass census or age analysis is desired. In particular, resolution of the seasonal cycle is likely to be important in order to obtain a more accurate representation of the distribution of water masses in the upper ocean and thermocline. Refining the grid size should also help to obtain more realistic flows in areas of steep topography and to resolve water masses originating from marginal seas such as the Mediterranean Sea. We plan to address these issues in future work. and r 0 is a constant reference density; (t l , t f ) is the wind stress vector applied as a body force on the top layer of the model (the d 1k operator); Dz 1 is the depth of the top layer of the model; f is the Coriolis parameter; g is gravitational acceleration; a is the radius of the earth; A H and A V are the horizontal and vertical eddy-viscosity coefficients; $ h is a horizontal Laplacian operator; l is longitude and f is latitude; and e u and e y are adjustable parameters to account for data and model errors.
Additional constraints for the model are obtained from the continuity equation, 1 a cosf [(y cosf) f 1 u l ] 1 w z 5 0 (A2) together with the kinematic surface boundary condition h t 5 w at z 5 0, which given the rigid-lid approximation w 5 0 can be enforced in terms of a condition on the depth-integrated continuity equation, One final constraint is necessary to remove a null space of the dynamical equations, and this is provided by the constraint that the area integral of h be 0, so that the total volume of the ocean is conserved. The condition (A3) eliminates a dynamical-free mode that was present in the equations of Schlitzer (1993) and that necessitated computing a depth-independent velocity at each model grid point as part of the solution to the inverse model (Schlitzer 1993). In comparison to the data-assimilation method of Schlitzer (1993), a further advantage of applying condition (A3) is that it allows application of mean dynamic topography estimates as additional constraints on the model solution (see sections 2b and 2e).

Error Covariance of Observations
To obtain the diagonal elements in the error covariance matrix G yy for the objective function (5), we used a Monte Carlo procedure in which synthetic datasets were produced by adding spatially correlated noise to the original data. To produce spatially correlated noise, we solved a diffusion equation on the original data grid, r 2 = 2 2 1 z 1 n 5 0, where = 2 is the Laplacian operator with no-flux boundary conditions at lateral boundaries, r is the correlation length scale, and n is a random draw from a Gaussian distribution with zero mean and variance estimated from the original dataset (see below). The smoothed error field z was then returned to the original variance by multiplying by an appropriate scale factor and then interpolated to the model grid. We repeated this procedure 100 different times to yield 100 synthetic datasets. The variance of the observations on the model grid was then determined by taking the variance of the resulting 100 synthetic datasets. The full covariance matrix was then calculated using Eq. (B1) on the model grid to determine the off-diagonal elements of the covariance matrix, and the resulting matrix for each dataset was factored and stored to facilitate calculations with G 21 yy . For temperature and salinity, the data variance was estimated as the sum of a contribution from the raw data and from the objective mapping procedure (e.g., Schmittner et al. 2009). The variance due to the objective mapping was estimated by calculating the deviation between the objectively analyzed and mean temperature (or salinity) fields, squaring the result, and averaging over each depth level. This depth-dependent error was added to the spatially variable squared standard error to produce the total variance for each field. We assumed an error correlation length scale of 450 km for the gridded datasets, as was done by Gebbie and Huybers (2010). For background D 14 C, we used the error estimate supplied with the GLODAP dataset, which includes both sampling and mapping error. However, we multiplied the quoted errors by a factor of 2, to take into account the fact that the GLODAP errors are most likely underestimates of the true error (Key et al. 2004). We assumed an error correlation length scale of 1550 km in the zonal direction and 740 km in the meridional direction, following the correlation length scales used in the objective mapping procedure used to create the GLODAP gridded datasets (Key et al. 2004).
No formal error estimates were provided with the NCEP reanalysis data, but for the air-sea heat and freshwater fluxes we estimated the variance by calculating the interannual variability of the NCEP reanalysis data for the 30-yr period 1980-2009. Inspection of the patterns of interannual variability revealed that the interannual variability is highly correlated over large length scales, and so we chose 450 km as a reasonable estimate of the correlation length scale of the errors in heat and freshwater fluxes. For sea surface height, we used the variance estimate supplied by AVISO with the MDT-CNES_CLS09 release. No error correlation length scale was provided with the AVISO dataset, but we used an estimate of 100 km based on the results of Knudsen and Tscherning (2005).