This paper presents the ECCO v4 non-linear inverse modeling framework and its baseline solution for the evolving ocean state over the period 1992–2011. Both components are publicly available and subjected to regular, automated regression tests. The modeling framework includes sets of global conformal grids, a global model setup, implementations of data constraints and control parameters, an interface to algorithmic differentiation, as well as a grid-independent, fully capable Matlab toolbox. The baseline ECCO v4 solution is a dynamically consistent ocean state estimate without unidentified sources of heat and buoyancy, which any interested user will be able to reproduce accurately. The solution is an acceptable fit to most data and has been found to be physically plausible in many respects, as documented here and in related publications. Users are being provided with capabilities to assess model–data misfits for themselves. The synergy between modeling and data synthesis is asserted through the joint presentation of the modeling framework and the state estimate. In particular, the inverse estimate of parameterized physics was instrumental in improving the fit to the observed hydrography, and becomes an integral part of the ocean model setup available for general use. More generally, a first assessment of the relative importance of external, parametric and structural model errors is presented. Parametric and external model uncertainties appear to be of comparable importance and dominate over structural model uncertainty. The results generally underline the importance of including turbulent transport parameters in the inverse problem.

The history of inverse modeling in oceanography goes back at least 4 decades

Three possible approaches to gridding the globe. (Left) LL maps the
earth to a single rectangular array (one

The original implementation has been extended substantially over subsequent
decades, and some of the key technical developments are worth recalling

Time dependency and the use of Lagrange multipliers (i.e., the adjoint
method) were first introduced in ocean inverse modeling by

The MITgcm AD capabilities remain exceptional amongst general circulation
models. Over the last decade, in the context of the Estimating the
Circulation and Climate of the Ocean (ECCO) project, the MITgcm non-linear
inverse modeling framework (using the adjoint method and algorithmic
differentiation) has become a common tool for data synthesis, applied by many
investigators to derive ocean state estimates

General circulation models implement the primitive equations, which extend far beyond the physics and numerics used in common inverse box models. On the one hand, they readily provide a versatile tool for dynamical interpolation of virtually all types of observations. On the other hand, numerical modeling has to be regarded as an integral part of non-linear inverse modeling, and as a primary responsibility of groups carrying out ocean state estimation. Indeed, the quality of the model and the adequacy of its settings determine the physical consistency of ocean state estimates. Hence the state estimation group at MIT has become a main contributor of MITgcm code, including but not limited to the implementation of the estimation framework. Furthermore, the development of the new ECCO version 4 (ECCO v4) estimate described here started with an extensive revisit of MITgcm settings.

These considerations prompt the joint depiction of forward model setup and estimation framework developments as part of ECCO v4, and of the baseline solution of the non-linear inverse model. The overarching goal, which is essential to the oceanographic community, is the unification of the two pillars of science, namely observations (the emphasis here is on data of global coverage) and theory (of which general circulation models are a vehicle). Thus, the synergy between data analysis and modeling is a guiding thread of this paper.

As a complement to this paper, and a number of associated publications, the setup and baseline solution of ECCO v4 are thoroughly documented by an extended suite of diagnostics (the “standard analysis” provided as the Supplement) that users can readily download or reproduce. Daily and monthly regression tests are run for, respectively, a few time steps and 20 years. This will allow, for the foreseeable future, any user to generate additional output that may be needed for extended data and model analyses. Thus, the authors aim to provide ECCO v4 as a fully integrated non-linear inverse modeling framework, including its baseline time-dependent solution, that any interested user can readily analyze and/or accurately re-run.

Average grid spacing for LLC90 (in km) computed as the square root of grid cell area. LLC90 denotes the LLC grid with 90 grid points as the common face dimension (i.e., along one-quarter of the earth's circumference at the Equator).
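As a rough illustration of the two quantities named in this caption, the sketch below computes a nominal grid spacing as the square root of a cell area, and the nominal LLC90 spacing along the Equator. The circumference value and the function names are assumptions made for this example; nothing here is read from the ECCO v4 grid files.

```python
import math

# Approximate equatorial circumference of the earth in km (assumed value).
EQUATOR_KM = 40075.0


def spacing_from_area_km(cell_area_m2):
    """Nominal grid spacing (km), defined as the square root of cell area (m^2)."""
    return math.sqrt(cell_area_m2) / 1000.0


def equatorial_spacing_km(face_dimension):
    """Spacing (km) when `face_dimension` points span one quarter of the Equator."""
    return EQUATOR_KM / 4.0 / face_dimension


# LLC90: 90 points along one quarter of the Equator, i.e., close to 1 degree.
llc90_equator_km = equatorial_spacing_km(90)

# A hypothetical 100 km x 100 km cell, used only to exercise the definition.
square_cell_km = spacing_from_area_km(1.0e10)
```

With these assumptions the LLC90 spacing along the Equator comes out near 111 km, consistent with a nominal 1 degree grid.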

The foundation of the ECCO v4 model setup is a set of global grids of the earth's surface (Sect. 2). The design, implementation and specification of the forward model setup and of the estimation framework are presented in Sects. 3 and 4, respectively. The baseline ECCO v4 solution (the ECCO v4, release 1 state estimate) is the subject of Sect. 5, which is followed by conclusions and perspectives (Sect. 6).

The most visible grid improvement, as compared with earlier ECCO
configurations, is the extension of the gridded domain to the
Arctic. This limitation of ECCO estimates produced until 2008 was due
to the use of a latitude–longitude grid (LL; left panel of
Fig.

The cubed-sphere grid (CS; center panel in Fig.

However, a number of shortcomings of CS have been noted. First, loss of
orthogonality near the cube corners is exacerbated when increasing horizontal
resolution. Second, some of the vertices have to be placed on ocean-covered
areas, and have an exceedingly high resolution, requiring unnecessarily small
time steps. Third, such grids represent an obstacle for new users who are
accustomed to latitude–longitude grids. These considerations led to the
design of the Lat-Lon-Cap grid (LLC; right panel in Fig.

the grid reverts to a simple LL sector between 70

grid vertices are located over land; and

grid heterogeneities remain acceptable at

An alternative version of LLC that remains locally isotropic in the tropics is also available.

Poleward of 57

For any given Arctic face dimension, LLC has the added
advantage of an increased resolution in the Arctic as compared with CS, which
has vertices at 45

Grid spacing details for LLC90 as a function of latitude, in
km. Between 70

Looking beyond the immediate need for a truly global coarse-resolution grid,
we chose to generate a parent

The resolution along the Equator is quoted as

The parent

Number
17 280 is known as the compositorial of 10, i.e., the product of composite
numbers less than or equal to 10
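The arithmetic behind this remark can be checked with a short sketch. The helper names below are hypothetical, and the divisor count is included only to illustrate that such a number can be split evenly in many ways.

```python
def is_composite(n):
    """True if n has a divisor other than 1 and itself."""
    return n > 3 and any(n % d == 0 for d in range(2, int(n ** 0.5) + 1))


def compositorial(n):
    """Product of all composite numbers less than or equal to n."""
    product = 1
    for k in range(4, n + 1):
        if is_composite(k):
            product *= k
    return product


def divisors(n):
    """All positive integers that divide n exactly."""
    return [d for d in range(1, n + 1) if n % d == 0]


value = compositorial(10)          # 4 * 6 * 8 * 9 * 10
n_divisors = len(divisors(value))  # number of ways to split evenly
```

Here `value` evaluates to 17280, which factors as 2^7 x 3^3 x 5.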

Advanced gridding has clear advantages from the standpoint of numerical ocean
modeling. It can however put additional burdens on users of ocean model
output, who may find themselves coding the same diagnostics over and over
again to accommodate different grids. One common approach is to distribute
fields that were interpolated to a simpler grid (e.g., LL). This approach,
however, tends to introduce sizable errors (e.g., in areal integrals and
transports). A different and simple approach to the analysis of model output
is chosen here that does not alter the results but alleviates the burden of
grid specifics when analyzing model output – the gcmfaces analysis framework
that mimics the gridded earth decomposition of general circulation models in
Matlab (Appendix

Selected interior and boundary model parameters. A more exhaustive list of model parameter settings is available within the model standard output (text file). For each group of parameters, the file where it is defined at run-time is indicated in square brackets in the last column. Parameters reported as “first guess” are further adjusted as part of state estimation (see Sects. 4 and 5).

The model configuration presented below is the ECCO v4 setup used in state
estimation (Sects. 4 and 5) and based on the LLC90 grid (Sect. 2). Variants
of the ECCO v4 setup are also used in un-optimized model simulations

The MITgcm, as configured in ECCO v4, solves the hydrostatic, Boussinesq
equations

The relative importance of various model settings generally depends on the
ocean state characteristic of interest. Here, a selection of ocean state
characteristics is made amongst squared model–data distances (see Sect. 4),
monthly time series of global mean quantities, and time-averaged meridional
transports (Table

Ocean state characteristics used to verify 20-year model solutions
(Appendix

For a water column that extends from the bottom at

For practical reasons, the
vertical velocity calculated by the model (

The

Apart from the horizontal grid and the vertical coordinate

ECCO v4 uses a non-linear free surface combined with

The forcing terms

With the non-linear free surface, the water column thickness varies as the
free surface goes up and down (as is apparent in
Eqs.

Each

Earlier ECCO configurations relied on the linear free-surface method (LFS),
where column thickness and grid-cell thickness are fixed in time. The LFS
version of Eqs. (

The symmetry between continuity (Eq.

Regression testing of (top three rows;
Appendix

Ocean tracers are advected by the residual mean velocity

Root mean squared vertical velocity at 2000

Diffusion includes diapycnal and isopycnal components, the GGL mixed-layer
turbulence closure

Parameters of the momentum Eq. (

Previous ECCO configurations used the C-D scheme

A comparable damping of the barotropic circulation could be obtained through
a large increase in viscosity (not shown). Also, vertical velocity noise is
most intense near the ocean floor, which led us to infer that adding
viscosity more selectively near topography could suffice to damp the vertical
velocity noise (Fig.

Upward buoyancy, and radiative and mass fluxes (latent, sensible and
radiative contributions to

Earlier ECCO configurations using the

Open ocean rain, evaporation and runoff simply carry (advect through the free
surface) the local SST and zero salinity in the model. When sea ice is
present, buoyancy and mass fluxes

The implementation of mass, buoyancy and momentum exchanges through the
sea-ice–ocean interface in the rescaled

In centennial ocean model simulations, it is customary to add a Newtonian
relaxation of surface salinity to a gridded observational product

Wind stress, also from ERA-Interim, is applied directly as part of

The state estimation problem is defined here by a squared model–data
distance (

The state estimate (Sect. 5) is a solution of the forward model (Sects. 2 and
3) at an approximate minimum of

Note that the existence of a unique global minimum of

The degree of non-linearity may depend on the process of interest and increases substantially upon inclusion of meso-scale eddies.

, one can only aim to find at least one approximate minimum of

State estimation consists in minimizing a squared distance,

Organization and roles of MITgcm estimation packages. A more
complete presentation of MITgcm packages can be found
in the manual. The algorithmic differentiation (AD) tool
currently being used is TAF. The handling of checkpoints and active files
is described in

Generic model–data comparison capabilities provided by the “ecco”
package (Sect. 4.3). The corresponding terms in Eqs. (

Model counterparts (

The control problem, as implemented in ECCO v4, is non-dimensional, as
reflected by the omission of weights in control penalties
(

The specification of (always approximate) error covariances (e.g.,

For problems as massive as ECCO v4 (see
Tables

Within pure linear least squares theory, under the unrealistic assumption of
perfect error covariance specifications, multipliers

The method of Lagrange multipliers (i.e., the adjoint method) and its
application to numerical models being stepped forward in time are well
documented elsewhere. In particular, the interested reader is referred to

Note that this desirable property does not hold in the case of sequential data assimilation schemes (whether or not using an adjoint model), but this is not a case of interest here. In particular, it does not hold in 4DVar as practiced in numerical weather prediction.

. Adjoint models have many useful applications in their own right, and we shall list a few that are particularly relevant to ECCO.

Non-dimensional adjoint sensitivity (

Integrating adjoint models over extended periods of time allows diagnosis of
the sensitivity of model dynamics to various parameters. Two examples are
provided in Fig.

Unlike the simple case treated in

During the early development stages of ECCO v4, the adjoint handling
of exchanges and storage was extended (partly hand-coded) to allow for
elaborate grids such as CS and LLC (Fig.

Overwhelmingly expensive recomputations of non-linear terms in the adjoint
are treated by adding TAF storage directives

TAF adopts a “recompute-all” strategy by default; OpenAD in contrast uses “store-all” by default.

. These directives take the form of Fortran comments (starting with “CADJ”) embedded in the forward model code, which TAF transforms into code for storage operations

The non-linear free surface, the AB-3 time stepping scheme, and implicit vertical advection were thus added as adjoint capabilities as part of ECCO v4. Including the non-linear free surface, along with the real freshwater flux boundary condition, in the ocean state estimate is regarded as a major improvement in physical realism. The AB-3 and implicit vertical advection schemes have a minor impact on the forward model solution but provide additional stability also in adjoint mode.
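The trade-off between storing and recomputing non-linear terms can be illustrated with a toy problem. The sketch below is not TAF output and does not reproduce MITgcm code: the scalar model, the cost function and all names are assumptions chosen for the example. It differentiates a cost function through a non-linear time-stepping loop twice, once storing the forward trajectory and once recomputing each state from the initial condition, and counts forward steps in each case.

```python
def step(x, dt):
    """One forward step of a toy non-linear model, dx/dt = -x^2."""
    return x - dt * x * x


def forward(x0, n_steps, dt):
    """Integrate the toy model for n_steps and return the final state."""
    x = x0
    for _ in range(n_steps):
        x = step(x, dt)
    return x


def cost(x0, n_steps, dt):
    """Toy cost function: squared final state."""
    return forward(x0, n_steps, dt) ** 2


def gradient_store_all(x0, n_steps, dt):
    """Adjoint sweep using a stored trajectory (memory grows with n_steps)."""
    trajectory = [x0]
    for _ in range(n_steps):
        trajectory.append(step(trajectory[-1], dt))
    adjoint = 2.0 * trajectory[-1]           # d(cost)/d(x_N)
    for n in range(n_steps - 1, -1, -1):
        adjoint *= 1.0 - 2.0 * dt * trajectory[n]  # d(x_{n+1})/d(x_n)
    return adjoint, n_steps                  # gradient, forward steps used


def gradient_recompute_all(x0, n_steps, dt):
    """Adjoint sweep recomputing each state from x0 (no trajectory stored)."""
    forward_steps = n_steps
    adjoint = 2.0 * forward(x0, n_steps, dt)
    for n in range(n_steps - 1, -1, -1):
        x_n = forward(x0, n, dt)             # recomputation costs n steps
        forward_steps += n
        adjoint *= 1.0 - 2.0 * dt * x_n
    return adjoint, forward_steps


grad_store, steps_store = gradient_store_all(1.0, 10, 0.1)
grad_recompute, steps_recompute = gradient_recompute_all(1.0, 10, 0.1)
```

Both sweeps return the same gradient, but in this toy setting the recomputation variant uses a number of forward steps that grows quadratically with the length of the integration. This is the kind of cost that storage directives are meant to avoid.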

Exactness and completeness of the adjoint is the general goal of the MITgcm
adjoint development. Exactness can be of particular importance in carrying
out quantitative analyses of adjoint sensitivities

In ECCO v4, the

Beyond the removal of unstable adjoint dependencies, other alterations of the
adjoint are of practical value for optimization purposes. In particular, it
is common practice to increase viscosity parameters to add stability to
MITgcm adjoint simulations

Selected ocean state characteristics (defined in
Table

Ocean state estimation involves constraining ocean model solutions to data.
Model–data comparison (i.e., computing Eq.

In situ data are handled by the “profiles” package. A model profile is
computed at the time step and grid point nearest to each observed profile
(see Appendix

Gridded data

By “gridded” we mean either interpolated (e.g., for monthly sea surface temperature) or simply bin-averaged (e.g., for along-track altimetry).

are commonly based upon monthly or daily averaged fields and handled by the “ecco” package. Many features have been added to “ecco” over the course of the ECCO v4 development. In preparation for this paper, these features were generalized so they can immediately be applied, when adequate, to any gridded data set. As of MITgcm's checkpoint65h, the generic “ecco” capabilities are those listed in Table

Model counterparts to observed variables are diagnosed from model state
variables via operator

The basic steps in constraining a model solution to data using the ecco
package are the following.

Mapping data (whether along satellite tracks,
gridded, or interpolated) to the model grid, which is easily done, e.g., in
Matlab using gcmfaces (Appendix

Specifying error covariances (

Carrying out optimization until convergence to an approximate
minimum of

In situ data to which the state estimate has been constrained. XBT,
CTD, and ITP stand for expendable bathythermograph,
conductivity–temperature–depth sensors, and ice-tethered profilers,
respectively. SEaOS is data collected by Southern Ocean elephant seals. The
CLIMODE field campaign focused on the North Atlantic subtropical gyre

Gridded data to which the state estimate has been constrained.

Within the MITgcm, the “ctrl” package (Fig.

Most features in “ctrl” were recently generalized so they can
readily be applied, when adequate, to any set of controls. The generic
pre-processor

Control parameters that have been adjusted as part of the state estimation.

Most generally, complete and accurate error covariance estimates are lacking
for control parameters. For all controls used in the state estimate
(Table

For atmospheric re-analyses fields, in the absence of formal error estimates,
ad hoc specifications of

Bootstrap distribution of an index of cosensitivity between ocean
state characteristics. For each pair of characteristics

For

The ECCO v4, release 1 state estimate covers the period from 1992 to 2011 and
is the baseline solution of the ECCO v4 forward model setup (Sects. 2 and 3),
using control parameter adjustments guided by data constraints (Sect. 4). The
solution fits altimetry

Sensitivity of ocean state characteristics
(Table

Ocean state estimation is by definition a multi-faceted problem, as reflected
by the selection of ocean state characteristics in
Table

The various squared model–data distances (the first seven characteristics)
show contrasting levels of sensitivity to control parameter adjustments
(Table

Mean squared distance to in situ observations
(Table

High correlations between meridional transports and squared model–data
distances (top and middle right panels) provide evidence that Argo and
altimetry may efficiently constrain heat and freshwater transports

Model–data misfits for salinity at 300

A related concern is that global mean time series show outstanding
sensitivity not only to atmospheric and oceanic control parameters
(Table

Note that mT, mS (top to bottom global means) and mH may react to any change in ocean model controls and settings, since oceanic heat and freshwater uptake is determined by bulk formulae.

Meridional heat and freshwater transports, in particular, appear much less sensitive than corresponding global mean time series (Fig.

In developing and producing the ECCO v4 state estimate, a primary goal was to
improve the fit to observed in situ profiles of

The contrasts in jT and jS amongst solutions reflect large-scale misfits as
illustrated in Fig.

The contrast in misfit amplitude between ECCO v4 and earlier solutions
(Figs.

Model error categories as discussed in this paper.

Within ECCO v4, jT and jS are particularly sensitive to estimated turbulent
transport parameter adjustments and generally less sensitive to estimated
atmospheric control adjustments, with the exception of expectedly high
salinity sensitivity to precipitation adjustments (see Table

Amongst turbulent transport control parameters in ECCO v4, jT and jS are most
sensitive to the

The

Comparison of Tables

In this section, the focus is on model uncertainty and controllability, which
directly impacts the possibility of fitting a model to data. Random data
errors and model representation errors are left out of the discussion, which
are comparatively well studied

The interplay of external, structural and parametric ocean model errors has
never been tackled in any systematic and quantitative manner. To distinguish
amongst model uncertainties associated with ECCO v4 settings, we propose the
simple, practical category definitions in Table

A first assessment of the relative importance of external, parametric and
structural model uncertainty in ECCO v4 can then be made from
Table

A ratio

Bootstrap distribution of a controllability index

It is therefore encouraging that

Increasing model controllability is a priori favorable to state estimation.
To this end, one may seek to replace discrete choices and switches with
continuous parameter specifications that enable smooth state
transitions

At this point it is assumed, for the sake of a simple
preliminary discussion, that an

If algorithmic differentiation is the method of choice to this end, then schemes that have fewer discrete switches are preferable over other comparable schemes.
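A minimal sketch of why discrete switches are awkward for differentiation is given below. The functions, the tanh blending and all parameter values are assumptions chosen for illustration; they do not correspond to any MITgcm parameterization.

```python
import math


def hard_switch(x, threshold=0.0, low=1.0, high=10.0):
    """Discrete choice: the output jumps from `low` to `high` at the threshold."""
    return high if x > threshold else low


def smooth_switch(x, threshold=0.0, low=1.0, high=10.0, width=0.1):
    """Continuous alternative: tanh blending between `low` and `high`."""
    weight = 0.5 * (1.0 + math.tanh((x - threshold) / width))
    return low + (high - low) * weight


def smooth_switch_derivative(x, threshold=0.0, low=1.0, high=10.0, width=0.1):
    """Analytic derivative of smooth_switch with respect to x."""
    t = math.tanh((x - threshold) / width)
    return (high - low) * 0.5 * (1.0 - t * t) / width


def central_difference(f, x, eps=1e-6):
    """Finite-difference derivative, used here only as a check."""
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)


# Away from the threshold the hard switch carries no sensitivity at all,
# whereas the smooth version has a finite, well-defined derivative everywhere.
hard_slope_away = central_difference(hard_switch, 0.5)
smooth_slope_at_threshold = smooth_switch_derivative(0.0)
```

In this toy case the derivative of the hard switch is zero everywhere except at the jump, so a gradient-based search receives no information about the transition. The smooth version has a finite derivative that can guide parameter adjustments.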

State estimation should aim towards universality and completeness

Firstly, the state estimate would benefit from further optimization, with
additional data, controls, and refined error covariance specifications.
Remaining misfits seen in the top left panel of Fig.

Secondly, the lack of “posterior” error estimates is regarded as the most
outstanding issue with ECCO v4, release 1. Producing formal error estimates,
at a reasonable computational expense and with acceptable precision, for the
full, evolving ocean state would be another major breakthrough. In principle,
a number of methods are available to this end. In practice, however, most of
them are intractable for problems of size

Thirdly, the ECCO v4 model setup could be extended and improved, with
possibly important implications for the state estimate. The lack of
atmospheric, land, and bio-geochemistry components is an obvious limitation
of ECCO v4 at this stage. The surface boundary conditions and sea-ice model
settings require further assessment. Issues such as the use of the Boussinesq
approximation (in Eqs.

This paper emphasizes the synergy between ocean modeling and data analysis.
The entanglement of models and observations is nothing new –

List of the ECCO v4 framework components, which are fully integrated with(in) the MITgcm and its adjoint.

Each component of the framework is being (re)designed to be modular and of
general applicability, since each is thought to have stand-alone value, to
different degrees. Standardized in situ data sets in
particular, while a by-product of carrying out ECCO v4, allow for a variety
of scientific analyses in their own right. For example, they are used for
analyses of observed variance that is never fully represented in numerical
model solutions

As another example, the gcmfaces Matlab framework
(Appendix

The state estimate and the MITgcm are highly integrated with each other.
Beyond the few aspects of the solution that have been investigated in some
detail, the MITgcm provides numerous prognostic and diagnostic capabilities
that remain to be applied to, or employed within, ECCO v4. The “ctrl”,
“ecco” and “profiles” packages are just examples of the many MITgcm
packages. The last two diagnose model–data misfits and statistics. In
contrast, the “ctrl” package defines control parameters that act upon the
forward prognostic equations. It also lends itself to development of new
parameterizations. Note that the roles of these packages (diagnosing or
acting on the solution) are reversed in the adjoint. Amongst forward
prognostic MITgcm packages not yet used in ECCO v4, biogeochemistry and
simplified atmospheres

Furthermore, the MITgcm provides a convenient platform for parallel computing
and variational estimation that allows for, but is not limited to, ocean data
synthesis and analysis

It is expected that all of the ECCO v4 components listed in
Table

At the present time, taking full advantage of the ECCO v4 framework
(Table

Gridded observational products (such as hydrography climatologies, ocean
state estimates, etc.) are commonly used as a practical shorthand to data. It
should be stressed that a gridded field in itself does not provide any
information about its errors. Therefore, and since data coverage is uneven
and restricted to a few variables, state estimate users are strongly
encouraged to consider the underlying data base. This being said, and despite
the need for continued improvement, the usefulness and scientific value of
the ECCO v4 solution are by now largely documented in a number of papers

As compared with earlier ECCO solutions, the state estimate benefits from an
extensive revisit of model settings. The improved fit to in situ observations
(Argo profiles of

Looking to the future, the need for associating formal error estimates with
the full, evolving ocean state remains of utmost importance. Aside from this
aspect, extensions of the state estimation framework to include other climate
components (atmosphere, land, cryosphere) and different variables (biology,
chemistry) would be desirable

The overarching scientific problem (setting aside technicalities) for
data–model combination lies in the attribution of errors amongst the various
elements of Eq. (

Alleviating structural model errors is a prerequisite to improved dynamical
interpolation of observations. In this regard, the main improvements compared
with previous ECCO estimates may be the extension of the gridded domain to
the Arctic, the addition of the non-linear free surface, and the switch to
real freshwater flux (Sect. 3). These specific

Parametric and external model uncertainty (Table

Parametric model uncertainty (associated here with interior turbulent
transports) and external model uncertainty (associated here with surface
forcing fields) appear to be of comparable magnitude (Table

At high latitude, the LLC mesh is generated numerically by adapting the
two-dimensional conformal mapping algorithm developed by Zacharias and Ives
in the 1980s

To numerically mesh each sub-domain it is first conformally projected onto
a plane, using a polar stereographic transformation. The result is then
conformally mapped to a rectangular shape by iteratively applying the
so-called “hinge-point” or “power” transformation to each of the four arc
segments that make up the sub-domain edges. The transformation works with
points

The result of the transformation is a rectangular shape in a new coordinate
space denoted by coordinates

The time-discretized version of
Eqs. (

Momentum advection and the Coriolis term are evaluated at time

Simple Eulerian time-stepping (first-order, forward in time) is used in

The updated

The tracer Eqs. (

Isopycnal diffusivity (

The MITgcm “diagnostics” package is generally used to generate binary
output for offline analysis of the solutions. In the case of the LLC90 grid,
a two-dimensional field is thus output as an array of size

Example of a field (ocean bathymetry) mapped to the LLC90
grid (Fig.

Gridded earth variable (two-dimensional) represented in Matlab as
a gcmfaces object (a set of connected arrays) when the LLC90 grid is used.
See also Fig.

The need for nctiles files stems from the fact that there is no simple, robust and general way to re-arrange global model output into a single two-dimensional map. For LLC fields, only the LL sector can readily be re-assembled as a single two-dimensional array. To this end, a simple Matlab script is provided (eccov4_lonlat.m; see Sect. “Code availability”). It is mainly intended for users of earlier non-global ECCO estimates who may want to re-use their old analysis codes. ECCO v4 users are generally advised against interpolating, which introduces errors and often precludes accurate transport computations. Instead, mimicking the gridded earth decomposition of general circulation models is regarded as the most convenient, robust and general way to carry out offline analyses of the solutions.
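The idea of mimicking the gridded earth decomposition can be sketched outside Matlab as follows. This is a minimal Python analogue and not the gcmfaces API: the `Faces` class, its methods and the face shapes assumed for LLC90 are all hypothetical choices made for the example.

```python
class Faces:
    """A gridded field stored as a list of 2-D faces (each a list of rows)."""

    def __init__(self, faces):
        self.faces = faces

    @classmethod
    def full(cls, shapes, value):
        """Build a field filled with `value`, one face per (nx, ny) shape."""
        return cls([[[value] * ny for _ in range(nx)] for nx, ny in shapes])

    def _combine(self, other, op):
        """Apply `op` point by point, face by face (other: Faces or scalar)."""
        if isinstance(other, Faces):
            return Faces([[[op(a, b) for a, b in zip(row_a, row_b)]
                           for row_a, row_b in zip(face_a, face_b)]
                          for face_a, face_b in zip(self.faces, other.faces)])
        return Faces([[[op(a, other) for a in row] for row in face]
                      for face in self.faces])

    def __add__(self, other):
        return self._combine(other, lambda a, b: a + b)

    def __mul__(self, other):
        return self._combine(other, lambda a, b: a * b)

    def sum(self):
        """Sum over all points of all faces."""
        return sum(v for face in self.faces for row in face for v in row)

    def count(self):
        """Total number of grid points."""
        return sum(len(row) for face in self.faces for row in face)


def area_weighted_mean(field, area):
    """Global mean written without any reference to array sizes."""
    return (field * area).sum() / area.sum()


# Face shapes assumed here for LLC90 (two LL faces, one cap, two rotated faces).
LLC90_SHAPES = [(90, 270), (90, 270), (90, 90), (270, 90), (270, 90)]

temperature = Faces.full(LLC90_SHAPES, 2.0)  # placeholder field
area = Faces.full(LLC90_SHAPES, 1.0)         # placeholder cell areas
global_mean = area_weighted_mean(temperature, area)
```

Because `area_weighted_mean` never refers to face shapes, the same diagnostic would apply unchanged to a field defined on a different set of faces.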

This approach is readily implemented in Matlab by the

Transport and budget computations are coded with the same degree of
generality within gcmfaces. Hard-coding array sizes or exploiting specific
grid symmetries (e.g., the zonal symmetry of the LL grid) is excluded, in
order to avoid having to re-code the same diagnostics on different grids. Two
basic elements are instrumental to the generality of gcmfaces codes, which
are worth noting here. First, any transport is computed following a grid line
path, as illustrated in Fig.

It is commonly
called

Example of a grid line path (in red) that approximates
a great circle between 45

From the state estimate output made available online, users can readily
re-compute the gcmfaces standard analysis. The standard analysis document
serves as a general documentation of the state estimate, and allows for
a direct comparison with other MITgcm simulations regardless of grid
specifics. It proceeds in two steps:

The computational loop (i.e., diags_driver.m) uses model output in “release1/nctiles/” and results are stored to files in “release1/mat/”. The display phase (i.e., diags_driver_tex.m) then generates “release1/tex/standardAnalysis.tex”.

Diagnosing mass, heat, and salt budgets requires snapshots of the
ocean

The full specification of the MITgcm “diagnostics” package (“data.diagnostics”) is available online for ECCO v4, along with the gcmfaces (Matlab) codes that assemble the budgets and compute the standard analysis. They can be readily applied to re-runs of the state estimate, or to most perturbation experiments. Re-running the state estimate after editing “data.diagnostics” is the recommended method for users who desire output that is not readily available online.

The MITgcm “profiles” package subsamples the model solution, while it is
being computed, at the locations and times of observed in situ profiles. It
uses input files in the “MITprof” format described below. At model
initialization, observed profile dates and locations are read from file and
each profile is allocated to the processor corresponding to its sub-domain
tile. The latter is generally facilitated by a pre-processing step: observed
profiles are collocated with grid points using gcmfaces (see
Appendix

MITprof files contain in situ profiles (prof_T and prof_S) as well as
corresponding state estimate profiles (prof_Testim and prof_Sestim) and
least square weights (prof_Tweight and prof_Sweight) as illustrated in
Fig.

Netcdf file header illustrating the MITprof format used in MITgcm/pkg/profiles.
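To show how the three MITprof variables fit together, the sketch below computes a weighted squared misfit from small in-memory arrays. It assumes that the weights act as least-squares weights, and that a zero weight marks a level without usable data. The function name and the sample values are hypothetical, and no file reading is attempted.

```python
def weighted_misfit(prof, prof_estim, prof_weight):
    """Return (sum of weight * (estimate - observation)^2, number of points used).

    Each argument is a list of profiles; each profile is a list of values on
    standard depth levels. Points with zero weight are skipped.
    """
    total = 0.0
    count = 0
    for obs_profile, est_profile, w_profile in zip(prof, prof_estim, prof_weight):
        for obs, est, weight in zip(obs_profile, est_profile, w_profile):
            if weight > 0.0:
                total += weight * (est - obs) ** 2
                count += 1
    return total, count


# Two made-up temperature profiles on three depth levels (illustrative values).
prof_T = [[10.0, 8.0, 5.0], [12.0, 9.0, 0.0]]
prof_Testim = [[10.5, 8.0, 4.0], [11.0, 9.5, 0.0]]
prof_Tweight = [[4.0, 4.0, 1.0], [4.0, 4.0, 0.0]]  # last level has no data

misfit_sum, n_points = weighted_misfit(prof_T, prof_Testim, prof_Tweight)
misfit_mean = misfit_sum / n_points
```

Keeping observations, model counterparts and weights side by side on the same levels is what makes such a computation a single pass over the arrays.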

The MITprof format contains a limited amount of ancillary information:
profile locations, dates, and an identifying code (prof_descr). This choice,
along with the use of standard depth levels, yields data sets that are both
more compact and simpler than most data center formats (e.g., the Argo
format), providing easy access to vast collections of profiles of various
origins (Table

As part of the MITprof Matlab toolbox, the pre-processing of in situ profiles
consists of four basic steps: (1) applying relevant data quality flags, if
provided by the data center, (2) converting in situ to potential temperature
or pressure to depth, if needed, (3) interpolating to standard depth
levels

An option also exists to interpolate to standard density
levels, which was used in

The MITgcm “smooth” package is an implementation of recipes presented in
detail by

When the smoother is applied to uncorrelated grid-scale noise, the resulting
fields have a Gaussian correlation (Fig.

Diffusion applied to grid-scale noise (set to unit variance) introduces correlation (contours, drawn for select points) and yields a reduced noise variance (color shading). The smoothing scale was set to three grid points.
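The behavior described in this caption can be reproduced qualitatively with a one-dimensional toy example. The sketch below is not the “smooth” package. The periodic domain, the explicit time stepping and the mapping between the smoothing scale and the diffusivity are assumptions made for illustration.

```python
import random


def diffuse_periodic(field, scale, n_steps=50):
    """Smooth a 1-D periodic field with explicit diffusion steps.

    The diffusivity per step is chosen (by assumption) so that the equivalent
    Gaussian kernel has a standard deviation of `scale` grid points.
    """
    kappa = scale ** 2 / (2.0 * n_steps)
    if kappa > 0.5:
        raise ValueError("increase n_steps to keep the explicit scheme stable")
    x = list(field)
    n = len(x)
    for _ in range(n_steps):
        # x[i - 1] wraps around for i = 0, which makes the domain periodic.
        x = [x[i] + kappa * (x[i - 1] - 2.0 * x[i] + x[(i + 1) % n])
             for i in range(n)]
    return x


def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def lag_correlation(values, lag):
    """Correlation between the field and itself shifted by `lag` points."""
    n = len(values)
    mean = sum(values) / n
    cov = sum((values[i] - mean) * (values[(i + lag) % n] - mean)
              for i in range(n)) / n
    return cov / variance(values)


rng = random.Random(0)                        # fixed seed for repeatability
noise = [rng.gauss(0.0, 1.0) for _ in range(2000)]
smooth = diffuse_periodic(noise, scale=3.0)   # three grid points, as in the caption
```

In this setting the smoothed field has a much smaller variance than the input noise, and neighboring points become strongly correlated. A smoother used to specify covariances therefore needs to account for the variance reduction.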

This method is used for all control parameter covariances (see Sect. 4.4;

While MITgcm evolves continuously, its results are subjected to regression
testing

Automated daily regression tests are carried out using the “CVS” and “testreport” capabilities for short runs (a few time steps), on a small number of processors (or just one), and exclude optimization by compilers. This design is suited to detect mistakes in code revisions and distinguish them from truncation errors. The ECCO v4 model setup (Sects. 2 and 3) takes full advantage of that framework, which makes it both portable and stable (“Code availability” section).
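One simple way to separate code mistakes from truncation errors is to count how many leading digits of a result agree with a reference value. The sketch below illustrates that idea only. It is not the “testreport” script, and the function names and the 16-digit cap are assumptions.

```python
import math


def relative_difference(value, reference):
    """Absolute difference scaled by the larger of the two magnitudes."""
    scale = max(abs(value), abs(reference))
    return 0.0 if scale == 0.0 else abs(value - reference) / scale


def matching_digits(value, reference, max_digits=16):
    """Approximate number of leading decimal digits shared by two results."""
    rel = relative_difference(value, reference)
    if rel == 0.0:
        return max_digits
    return max(0, min(max_digits, int(math.floor(-math.log10(rel)))))


identical = matching_digits(1.0, 1.0)     # bitwise identical results
small_change = matching_digits(1.0, 1.001)
large_change = matching_digits(1.0, 2.0)
```

Under this convention, a result that matches the reference to many digits is compatible with truncation error, whereas agreement to only a few digits suggests a change in the computation itself.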

Advanced usage of ECCO v4 may include re-running forward model solutions (the
state estimate in particular) or its adjoint. Computational requirements are
modest – the 20-

While the “testreport” tool is very useful and practical, it does not
directly apply to the state estimate, but rather to the underlying model code
and setup. An extension to the regression testing framework is therefore
proposed that is suited for the full state estimate solution. It is
implemented as a self-contained Matlab routine (testreport_ecco.m). It
relies upon squared model–data distances and monthly mean model output
(Table

For any given model run, squared model–data distances are simply read from
a summary text file (typically named costfunction0011) that MITgcm generates
at the end of the model integration. Reference values are then read from
a Matlab file (typically named testreport_release1.mat) and relative
differences are reported as shown in Table

The ECCO v4, release 1 state estimate was produced in several phases over the course of the ECCO v4 development. In total, 45 iterations were performed, and a summary of the different phases is provided below. We should stress that the documented solution history reflects the progressive development of ECCO v4 – as opposed to a systematic or advocated approach to the optimization of model solutions.

The first series of 14 adjoint iterations was carried out (with the MITgcm's checkpoint62k) using a non-synchronous time step (3 h for tracers, and 20 min for momentum), sea surface salinity relaxation to climatological values, and the linear free surface method. Revision 1 was the switch to the 1 h time step (for both tracers and momentum) and to the non-linear free surface, followed by 14 adjoint iterations (with checkpoint62y).

In revision 2, the

Up to this point (revision 4, iteration 8), time-variable global mean sea
level had been omitted from the altimetry constraint – letting the other
data constraints, primarily from in situ hydrography, SST and regional
altimetry, determine the solution variability. Then, revision 4 iteration 9
consisted in estimating a time-variable global mean precipitation adjustment
under the sole constraint of fitting the time-variable global mean altimetry.
This operation had very little influence on the rest of the model–data
misfits – consistent with the analysis presented in Sect. 5.1. This solution
is used in

Revision 4 iteration 10 consisted in a trimming of atmospheric control
parameter adjustments to reduce irregularities in the forcing that had
appeared during adjoint iterations. To this end, the four leading empirical
orthogonal functions were subtracted from atmospheric control parameter
adjustments. To further reduce dynamical imbalances during the first years of
integration, the initial state of 1 January 1992 as adjusted during the
adjoint iterations was replaced with the state of 1 January 1995. This
solution is used in

Revision 4 iteration 11 is the ECCO v4, release 1 state estimate, which
originally ran with MITgcm's checkpoint64t. For regression testing purposes
(Appendix

The MITgcm is developed and maintained within the Concurrent Versions System
(CVS). This framework allows users to download frozen versions of the model
code (checkpoint65i at the time of writing) or to keep their local copy up to
date. The evolving code is subjected to regression tests on a daily basis
using the “testreport” capability (Appendix

Major support for this work was provided through NASA's Physical Oceanography Program. The bulk of the calculations was performed on the NASA Advanced Supercomputing (NAS) division's Pleiades supercomputer at NASA/ARC. The authors wish to acknowledge the various groups that carry out and promote ocean state estimation, at IFM-UH, SIO and JPL in particular. The authors also wish to give much credit to John Marshall for his leadership of, and continued commitment to, the development of the MITgcm; Detlef Stammer for his leadership of the German ECCO project, and his continued commitment to, and promotion of the MITgcm state estimation framework; Fastopt for providing the TAF algorithmic differentiation tool, and the support that was provided to facilitate adjointing the exch2 package in particular.

Edited by: J. Annan