Global biogeochemical ocean models contain a variety of different
biogeochemical components and often much simplified representations of
complex dynamical interactions, which are described by many (

Here we present a framework for the calibration of global biogeochemical ocean models on short and long timescales. The framework combines an offline approach for transport of biogeochemical tracers with an estimation of distribution algorithm (Covariance Matrix Adaptation Evolution Strategy, CMA-ES). We explore the performance and capability of this framework with five different optimizations of six biogeochemical parameters of a global biogeochemical model, simulated over 3000 years. First, a twin experiment explores the feasibility of this approach. Four optimizations against a climatology of observations of annual mean dissolved nutrients and oxygen then determine the extent to which different setups of the optimization influence model fit and parameter estimates. Because the misfit function applied focuses on the large-scale distribution of inorganic biogeochemical tracers, parameters that act on large spatial and temporal scales are determined earliest, and with the least spread. Parameters more closely tied to surface biology, which act on shorter timescales, are more difficult to determine. In particular, the search for optimum zooplankton parameters can benefit from a sound knowledge of maximum and minimum parameter values, leading to a more efficient optimization. It is encouraging that, although the misfit function does not contain any direct information about biogeochemical turnover, the optimized models nevertheless provide a better fit to observed global biogeochemical fluxes.

Global ocean models that simulate biogeochemical interactions are subject to
many uncertainties, among them those related to initial conditions, forcing,
and parameterizations of physical and biological processes, as well as the
adequacy of the chosen model complexity with respect to the scientific
problem under investigation. It is generally assumed that all these “input”
factors affect the simulation results in ways that may be different for
different models, but a thorough understanding of how uncertainties in input
map onto model output (residuals, i.e., deviations from the true state) is
still lacking. Quantitative estimates of the effect of model uncertainty on
model residuals are generally obtained from individual sensitivity studies
and model intercomparison or model ensemble studies, where the spread of
model results is regarded as a measure of model uncertainty. This procedure
is, for example, followed in the assessment reports of the Intergovernmental
Panel on Climate Change (IPCC). The Ocean Carbon Model Intercomparison
Project (OCMIP) applied a strict protocol regarding the description of
biogeochemical processes to a suite of different ocean circulation models to
show that the effect of uncertainties in the simulated circulation on
biogeochemical tracer distributions and their residuals can be considerable

Uncertainties in biogeochemical model setup partly arise from sparse
observations, particularly in the open ocean and during winter season in the
high latitudes

An under-sampled ocean, together with a large variety of timescales and space
scales and a high level of structural model complexity, poses a challenge for
optimization, and for a full, and dense enough, scan of the parameter space
on a global scale. Therefore, optimization of marine biogeochemical models
has mostly been carried out in a local, zero- or one-dimensional setting

In this paper we first test the global biogeochemical model optimization in a so-called twin experiment, i.e., against synthetic data derived from a previous model experiment with perturbed model parameters. We then present four optimizations against a global, synoptic data set of observed phosphate, nitrate, and oxygen.
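The logic of a twin experiment can be illustrated with a minimal sketch. It assumes a volume-weighted root-mean-square error for a single tracer as the misfit; the toy model, box depths, box volumes, and parameter values are invented for illustration and are not the biogeochemical model or grid used in this study.

```python
import math


def weighted_rmse(model, data, volumes):
    """Volume-weighted root-mean-square error for one tracer."""
    total_volume = sum(volumes)
    squared = sum(v * (m - d) ** 2 for m, d, v in zip(model, data, volumes))
    return math.sqrt(squared / total_volume)


def toy_model(params, depths):
    """Hypothetical stand-in for a model run: a tracer profile that
    decays exponentially with depth (NOT the model of this study)."""
    surface_value, length_scale = params
    return [surface_value * math.exp(-z / length_scale) for z in depths]


depths = [50.0, 200.0, 500.0, 1000.0, 3000.0]  # box mid-depths (m), invented
volumes = [1.0, 2.0, 4.0, 8.0, 16.0]           # relative box volumes, invented

# Twin experiment: the "observations" are output of the same model,
# generated with known target parameters.
target = (2.0, 800.0)
synthetic_obs = toy_model(target, depths)

misfit_at_target = weighted_rmse(toy_model(target, depths), synthetic_obs, volumes)
misfit_perturbed = weighted_rmse(toy_model((2.5, 600.0), depths), synthetic_obs, volumes)
print(misfit_at_target, misfit_perturbed)
```

Because the synthetic data are produced by the model itself, the misfit vanishes at the target parameters, so any remaining misfit after optimization reflects the optimizer and the shape of the misfit function rather than model or data error.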

For easy and generic coupling between different biogeochemical models and
circulation fields, as well as fast and efficient computation, we use the
Transport Matrix Method (TMM),
developed by Samar Khatiwala

For optimization, we use the TMM with monthly mean transport matrices derived
from a 2.8

The biogeochemical model employed as representative of current
state-of-the-art models is the same as presented by

Sinking of detritus is simulated using a sinking speed increasing with depth

Because it simulates both surface processes (primary production, grazing, egestion and excretion by zooplankton) and deep processes (sinking and decay of organic matter) against the background of ocean circulation and seasonally varying forcing, the model encompasses processes that act on a variety of timescales, from hours to days (surface) to months and years.

A general EA (left) and EDA (right) schematic. Circles represent sets of solutions (vectors of BGC parameters in our case) or an explicit probability distribution from which new solutions can be drawn. Rectangles depict operations. Operations displayed in red font depend on random decisions. EA: a set of candidate solutions (population) is iteratively updated. In each generation, candidate solutions compete to form a mating pool, which is realized by a random selection operator. Offspring solutions are produced by recombining mates and/or introducing some mutation. Finally, there is a fitness-based insertion back into the population, which is usually trimmed to a predefined population size. The random operators selection, recombination, and mutation imply an implicit probability distribution on the search space that determines which solutions are likely to appear in the next generation. EDA: candidate solutions of the current iteration's population are used to update an explicit probability distribution such that the likelihood of sampling good solutions increases. New candidate solutions are sampled directly from the probability distribution. Usually, the update of the probability distribution ensures that information from former solutions fades out slowly, persisting for several iterations. Therefore, the population may be smaller than an EA population and may even be replaced with the entire set of new samples, which is the case for the CMA-ES algorithm we use.

The TMM as described above is fast enough to be used together with meta-heuristic methods for parameter optimization, such as Evolutionary Algorithms (EAs) or Estimation of Distribution Algorithms (EDAs). Although these methods require more function evaluations than gradient-based methods to converge to a local optimum, they are advantageous in complicated, irregular “search landscapes” with discontinuities or with local optima that may be far worse than the global optimum.
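The general idea of an EDA can be sketched in a few lines. The following is a deliberately simplified, axis-parallel Gaussian EDA applied to a multimodal test function; it is not CMA-ES (it has no covariance matrix, evolution paths, or step-size control), and the population size, number of selected individuals, learning rate, and test function are assumptions chosen for illustration only.

```python
import math
import random


def rastrigin(x):
    """Multimodal test function: many local minima, global minimum 0 at the origin."""
    return sum(xi * xi - 10.0 * math.cos(2.0 * math.pi * xi) + 10.0 for xi in x)


def gaussian_eda(misfit, dim, pop_size=40, n_best=10, generations=60,
                 learning_rate=0.5, seed=1):
    """Minimize `misfit` with an axis-parallel Gaussian EDA (illustrative only)."""
    rng = random.Random(seed)
    mean = [rng.uniform(-4.0, 4.0) for _ in range(dim)]
    var = [9.0] * dim
    best_value, best_x, history = float("inf"), None, []
    for _ in range(generations):
        # Sample a completely new population from the explicit distribution.
        population = [[rng.gauss(m, math.sqrt(v)) for m, v in zip(mean, var)]
                      for _ in range(pop_size)]
        scored = sorted((misfit(x), x) for x in population)
        elite = [x for _, x in scored[:n_best]]
        if scored[0][0] < best_value:
            best_value, best_x = scored[0]
        history.append(best_value)
        # Blend old and new estimates so that information from former
        # generations fades out gradually instead of being discarded.
        for i in range(dim):
            elite_mean = sum(x[i] for x in elite) / n_best
            # Spread is measured around the OLD mean, so a shift of the
            # mean also widens the distribution in that direction.
            elite_var = sum((x[i] - mean[i]) ** 2 for x in elite) / n_best
            mean[i] = (1.0 - learning_rate) * mean[i] + learning_rate * elite_mean
            var[i] = max(1e-12, (1.0 - learning_rate) * var[i] + learning_rate * elite_var)
    return best_value, best_x, history


best_value, best_x, history = gaussian_eda(rastrigin, dim=2)
print(best_value, best_x)
```

A sketch of this kind can still converge prematurely to a local minimum, which is one reason why more elaborate update rules for the distribution are used in practice.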

The common goal of such population based meta-heuristics is to strike a good
balance of both search properties, exploration (search for promising
solutions in a wide area of the search space), and exploitation (search
within small regions around good solutions to quickly reach local optima).
Classical evolutionary algorithms as depicted on the left of
Fig.

In contrast to classical EAs, Estimation of Distribution Algorithms (sketched
on the right of Fig.

Here we use a state-of-the-art EDA for optimization of (firstly) six
parameters. Our task can be classified as a continuous optimization problem
with bound constraints, i.e., boundaries for the parameters. One appropriate
EA/EDA tool is the Covariance Matrix Adaptation Evolution Strategy

We essentially follow the description of the

Iterations of the CMA-ES applied to test functions. Left: a
uni-variate Griewank-type function

In CMA-ES the distribution from which candidate solutions (BGC parameter
vectors in our application) are sampled is a multi-variate normal
distribution. It generalizes the usual normal distribution, also known as
Gaussian distribution, from

A measure of the “diversity” of a probability distribution is the so-called
(differential) entropy. For a given variance, the normal distribution has the
maximum entropy amongst all distributions with the same variance
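As a concrete illustration of this maximum-entropy property (a standard result, written here for the uni-variate case with variance \sigma^2; the comparison with a uniform distribution is our choice of example), the differential entropies in natural units are:

```latex
h_{\mathrm{normal}} = \tfrac{1}{2}\ln\!\left(2\pi e\,\sigma^{2}\right), \qquad
h_{\mathrm{uniform}} = \tfrac{1}{2}\ln\!\left(12\,\sigma^{2}\right), \qquad
2\pi e \approx 17.08 > 12 \;\Rightarrow\; h_{\mathrm{normal}} > h_{\mathrm{uniform}} .
```

For equal variance, the normal distribution is thus the more "diverse" of the two, which is why it makes the fewest additional assumptions about where good solutions are located.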

An EDA that works with Gaussian distributions should carefully update
both defining distribution parameters, mean and variance, in order to balance
its exploration and exploitation abilities. This update process is illustrated
in Fig.

Similarly to the definition of the uni-variate Gaussian distribution by mean
and variance, a multi-variate normal distribution can be uniquely identified
by a mean vector

Sampling a multi-variate normal distribution

Note that for our problem there are bound constraints on the parameters such
that samples of a normal distribution might be infeasible, regardless of
whether the distribution mean is feasible or not. However, a boundary
handling procedure (see Sect.

Operational constants of the CMA-ES algorithm (cf. the initialization in Algorithm 1).

Empirical (re)estimates

Clearly, the estimates become more reliable the larger

As mentioned above, reliable distribution estimates require a sufficiently
large number of samples. However, for a competitive computational performance
we must make do with a rather small number of samples. CMA-ES therefore
incorporates the information of former populations by updating the covariance
matrix

Another feature that facilitates small population sizes

While

Finally, there is an additional explicit adaptation of the overall scale (the
step size) of the distribution by adapting a scaling factor

In order to consider boundary constraints, we use the procedure proposed in

In our implementation of CMA-ES, the feasible box we operate on is the unit
cube

The CMA-ES approach described in Sect.

Here,

The

Together with the problem dimension

The algorithm details are summarized in Algorithm 1.

It starts with the identity matrix

Our current technical implementation of the parallel framework can be easily
transferred to other EAs/EDAs. The iterative optimization process is carried
out via a series of chain jobs, where short serial jobs (the actual
optimizer) that update the population of model evaluations (“individuals”;
i.e., parameter sets for biogeochemistry) alternate with parallel jobs of
function evaluations (“generations”), i.e., forward integrations of the
coupled ocean model with different parameter sets. Parameters of the
optimizer are population size

As noted above, the framework presented here is set up such that a serial
script

Experimental setup of optimization. “low” and “high” indicate the lower and upper boundary constraints of the optimizations, respectively.

As a first approach to optimization, we have calculated the root-mean-square
error RMSE between simulated and observed (or twin) annual mean phosphate,
nitrate, and oxygen concentrations on a global scale, weighted by the volume

Although the model contains more than 20 parameters

Four parameters are more relevant for biological interactions at the sea
surface. Phytoplankton growth is controlled by the half-saturation for light
(

For each parameter we initially chose a rather wide range of possible
parameter values (Table

Ranges of parameters related to
surface processes were more difficult to assign. Due to the highly aggregated
form of the organic biological components in the model, these parameters are
supposed to reflect a variety of processes such as species shift and
adaptation

Optimization results (evaluations, i.e., number of individuals,

Using the combined framework described above, i.e.,
TMM

First we tested the ability of CMA-ES to recover known parameters of a model
simulation that applied the same biogeochemical parameters as MOPS-RemHigh of

Four further optimizations were carried out against observations of annual
mean phosphate, nitrate, and oxygen

Experiment OBS-WIDE differs from TWIN only with respect to the observations
that enter the misfit function. In OBS-WIDE we encountered an unlikely (with
respect to biological tracer concentrations) solution, pointing towards a
potential local minimum in the misfit function. We therefore set up two
experiments to investigate strategies to improve the performance of CMA-ES
with respect to more plausible solutions. The experiments both increase the
search density in the parameter space with respect to OBS-WIDE. In experiment
OBS-WIDE-20 search density is increased by doubling the population size of
CMA-ES to

Global annual fluxes of primary production (PP), grazing (GRAZ),
aerobic and anaerobic remineralization of detritus and DOM to nutrients
(REM), excretion by zooplankton (EXCR), export production (

Because optimization OBS-NARR showed the best results with respect to misfit
function, biogeochemical fluxes, and optimization performance (see
below; Tables

The internal termination criterion of CMA-ES was reached after 95, 173, 182,
and 140 generations for OBS-WIDE, OBS-WIDE-20, OBS-NARR, and OBS-NARR-R,
respectively. For the twin experiment, we restricted the maximum number of
generations to 200, at which TWIN had approached the target parameters, the
misfit declined to

Optimization trajectory for six parameters of the twin experiment.
The thick black line shows the average parameter of all 10 individuals of a
generation. Red lines indicate their maximum and minimum parameter values.
Horizontal black lines indicate the target parameter. Note that we restrict
the

Model misfit, its variance, calculated from individuals of each
population (both transformed logarithmically by log

Model misfit, plotted for each pair of parameter combinations of the
twin experiment. Color indicates misfit (see the color bars on the right). A
cross indicates the target value, i.e., the value of the reference
experiment. A circle indicates the parameter of one individual of the last
generation. Note that for better visibility we restrict the parameter range
to its boundaries (see Table

As Fig.

The optimization starts with a wide range of potential parameters (see
Fig.

The misfit function, its variance, and the parameter variance do not decrease
monotonically throughout the optimization trajectory. In particular, after an
initial decline over ca. 60 generations, parameter and misfit variance
increase again. Further increases in variance can be seen around generation
100, and, at the end, when the algorithm widens its search area again,
probably in search of an optimal

The largest fraction of the misfit function is related to oxygen, followed by
the misfit to nitrate, and then phosphate. The dominance of oxygen and
nitrate is not surprising, as these tracers are not conservative; i.e., their
global inventory might change due to air–sea gas exchange, denitrification,
and nitrogen fixation

As Fig.

In Fig.

Summarizing, CMA-ES seems capable of dealing even with our irregular search
landscape, when iterated for a long enough time and with a sufficiently large
population size. A problem remains with regard to the half-saturation
constant of phytoplankton for phosphate uptake: zooming into the scatter plot
presented in Fig.

As Fig.

One reason for this low sensitivity of the misfit function to

When optimizing the model against observed concentrations with exactly the
same setup as for experiment TWIN, optimization OBS-WIDE reaches the internal
termination criterion of the CMA-ES at generation 95. Instead of declining
exponentially towards zero, the misfit only declines from an average initial
value of

Surface (first) layer concentrations (in mmol C m

As Fig.

Model deviations from observations of vertically integrated
phosphate (top), nitrate (middle), and oxygen (bottom) for the reference run,
and three generations (61, 110, 182) of OBS-NARR. See the blue lines in
Fig.

Some parameters diverge strongly from those of the reference run. In particular, the phytoplankton's
half-saturation constant for light,

Therefore, although optimization OBS-WIDE against observations has decreased
the misfit to observations to

To examine whether this optimization became trapped in a local minimum, in
experiment OBS-WIDE-20 we increased the population size of CMA-ES from

Summarizing, using a larger population size and thus a denser scan of the
parameter space (see Fig.

Optimizations with a population size of

To enforce live zooplankton, we restricted the range of zooplankton
parameters to

As for OBS-WIDE-20, the quadratic mortality of zooplankton,

A closer look at the topography of the misfit function shows that the misfit
is quite insensitive to changes in some parameters
(Fig.

However, variations in parameters after

Except for deep particle fluxes, all biogeochemical fluxes are increased
compared to the reference run or experiment OBS-WIDE, but similar to that of
OBS-WIDE-20 (Table

Repeating optimization OBS-NARR with a different random selection of
parameters from the parameter distribution in each generation (OBS-NARR-R)
yields the same, or very similar, best values for most of the parameters (see
Table

Our results suggest that the CMA-ES optimization algorithm performs well,
particularly for the twin experiment, even though the parameters to be
estimated involve diverse temporal and spatial scales. CMA-ES manages to set
up curved search paths in parameter space, and therefore is capable of
approaching an optimum within a rather complex topography of the misfit
function. Its sometimes elongated and/or curved shape resembles many of those
resulting from earlier one-dimensional

As the computational effort remains a challenge in parameter optimization of
global ocean BGC models, further possibilities to accelerate model
evaluations within the optimization process are desirable. Surrogate-assisted
approaches use meta-models to approximate model evaluations within
optimization

In our study we chose annual means of dissolved nutrients and oxygen on a
rather coarse spatial grid as a measure for model skill. By doing so, we
avoid problems associated with time lags (e.g., in phytoplankton blooms,
which would result in time lags of nutrient depletion) or meso- and
submeso-scale spatial structures

Our optimizations against observations with wide and narrow boundaries for
zooplankton parameters produced two solutions with quite similar misfit, but
with very different biological parameters, and consequently different fluxes
and concentrations of organic components in the surface layers. Using wide
boundary constraints for zooplankton parameters resulted in a solution where
zooplankton is almost extinct, while phytoplankton and DOM concentration are
far too high. Solutions of optimizations with unrealistic parameter values or
concentrations for zooplankton have been observed earlier

Another possibility to avoid undesired effects like nearly extinct
zooplankton is to introduce further criteria that take account of this issue.
A technically easy approach would be to add further objective terms to the
misfit function. But facing complex model interactions, it can become
difficult to find suitable weights for the different terms in order to force
solutions to become a desired compromise of objectives. An alternative is to
deal with more than one objective function, say

Nevertheless, even for the more realistic optimizations OBS-WIDE-20,
OBS-NARR, and OBS-NARR-R, we find similar misfits for a rather wide range of
some phytoplankton and zooplankton
parameters, pointing towards an indeterminacy of these parameters when using
the current misfit function. While it cannot be ruled out that this arises
from a correlation among these parameters, even simpler biogeochemical models
with fewer degrees of freedom might be difficult to constrain from nutrient
data alone: problems were also encountered by

Even the use of observations more closely related to surface biology may not
resolve the problem of indeterminacy, as shown by

The above-mentioned problems may even increase if we move towards more
sparsely sampled, biased, or noisy data. So far, for the twin experiment as
well as for the optimization against observations, we assume perfect data
coverage. However, sparse data sets (as usually available from cruises or
time series stations) as well as the influence of noise have been shown to be
very influential for the ability of an optimization to recover results from
zero-

While we found a decrease in the twin experiment's misfit to almost zero, the
misfit of the optimization against observations remained relatively high (on
average, about 15 % of global mean tracer concentrations). Potential
reasons for this are an inappropriate biogeochemical model structure, wrong
choice of parameters to be optimized, or flaws in the physical model. For
example, it is well known that coarse-resolution models do not resolve
physical processes of the Equatorial Pacific current system

To summarize, any global model study that aims to inversely determine
parameters of a global biogeochemical ocean model in an attempt to find the
model setup “best” suited for a particular application (and circulation)
has to consider five tasks: (1) investigate model solutions on the
appropriate (depending on tunable parameters) timescales, possibly including
long, millennial simulations; (2) address the potential of local minima
(depending on the topography of the misfit function); (3) investigate
different parameter combinations and boundaries, including the misfit
function's sensitivity to them; (4) disentangle the effects of physical and
biogeochemical models on model–data misfit; and (5) investigate the effect
of misfit function, including data distribution and availability in model
assessment. This last point also includes decisions about weights applied to
different data sets, or for a particular form of misfit function, which may
be very influential for the optimal parameter choice

We have presented a framework for the optimization of global biogeochemical
ocean models that combines an offline approach for transport of
biogeochemical tracers with an estimation of distribution algorithm
(Covariance Matrix Adaptation Evolution Strategy, CMA-ES). A twin experiment
revealed a good performance of this algorithm with respect to recovering six
parameters that are associated with various timescales and space scales.
Optimizations against observations of annual mean nutrients and oxygen could
reduce the misfit of the model to some extent; however, even for the “best”
model solution the remaining misfit is still

Encouragingly, parameter sets associated with the lowest misfit to dissolved inorganic tracers also show the best fit to global mean tracer fluxes not considered during optimization. This increases our confidence in the method presented here. Some parameter estimates are associated with a rather high level of uncertainty. Incorporating different or additional data sets that relate more closely to the parameters to be optimized can help to improve estimates for these parameters. Likewise, observations that provide information about the upper and lower bounds of biological parameters – such as zooplankton grazing and mortality rates – will provide good guidance for future optimization studies and lower their computational demand.

The source code of MOPS coupled to TMM, as well as the optimization
framework, is available in the Supplement. The most recent TMM source code,
forcing, etc., are available at

As research questions (and therefore also user groups, hardware, biogeochemical models, and circulations) may differ strongly, we aimed to construct a tool that is as generic and universally applicable as possible, with a high level of portability among different architectures. The model-optimization framework of TMM comprises new subroutines for data assimilation and misfit function evaluation, as well as monitor routines to facilitate run-time checks of model state, and a more generic coupling interface for biogeochemistry. It can thus easily be applied within an optimization framework. While we focus here on the coarse-resolution model, we note that the generic structure of the TMM framework allows the user to easily switch between transport matrices, once these are available. Likewise, coupling different biogeochemical models to the framework only requires editing a few interface subroutines. Finally, in principle it should be possible to replace the optimization algorithm with any other algorithm that requires only model misfit as input and provides a set of parameter files as output.
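The file-based exchange between optimizer and model described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the framework's implementation: the file names, parameter bounds, the quadratic stand-in for the model, and the simple shrinking random search used as "optimizer" are all hypothetical. Only the pattern is taken from the text: a serial step writes parameter files, model evaluations run in parallel, and the next serial step reads nothing but the misfit.

```python
import random
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

LOWER = [0.0, 1.0]   # hypothetical lower parameter bounds
UPPER = [10.0, 5.0]  # hypothetical upper parameter bounds


def to_physical(unit_vector):
    """Map a point of the unit cube onto the bounded parameter range."""
    return [lo + u * (hi - lo) for u, lo, hi in zip(unit_vector, LOWER, UPPER)]


def write_parameter_files(workdir, generation, candidates):
    """Optimizer output: one plain-text parameter file per individual."""
    paths = []
    for k, unit_vector in enumerate(candidates):
        path = workdir / f"parameters_{generation}_{k}.txt"
        path.write_text("\n".join(repr(p) for p in to_physical(unit_vector)))
        paths.append(path)
    return paths


def run_model(parameter_file):
    """Stand-in for one forward integration: reads a parameter file and
    writes a misfit file next to it. The quadratic misfit is invented."""
    params = [float(line) for line in parameter_file.read_text().split()]
    misfit = (params[0] - 3.0) ** 2 + (params[1] - 2.0) ** 2
    misfit_file = parameter_file.with_suffix(".misfit")
    misfit_file.write_text(repr(misfit))
    return misfit_file


def optimize(generations=20, pop_size=10, seed=0):
    rng = random.Random(seed)
    centre, width = [0.5, 0.5], 0.5
    best = (float("inf"), None)
    with tempfile.TemporaryDirectory() as tmp:
        workdir = Path(tmp)
        for generation in range(generations):
            # Serial step: propose a population inside the unit cube
            # (clipping is a crude boundary handling, assumed here).
            candidates = [[min(1.0, max(0.0, c + rng.uniform(-width, width)))
                           for c in centre] for _ in range(pop_size)]
            files = write_parameter_files(workdir, generation, candidates)
            # Parallel step: one model evaluation per individual.
            with ThreadPoolExecutor(max_workers=pop_size) as pool:
                misfit_files = list(pool.map(run_model, files))
            # Serial step: the optimizer only sees the misfit values.
            misfits = [float(f.read_text()) for f in misfit_files]
            k = min(range(pop_size), key=misfits.__getitem__)
            if misfits[k] < best[0]:
                best = (misfits[k], to_physical(candidates[k]))
                centre = candidates[k]
            width *= 0.85
    return best


best_misfit, best_params = optimize()
print(best_misfit, best_params)
```

Because the two sides communicate only through parameter and misfit files, either the optimizer or the model could in principle be swapped without touching the other, which is the design property the paragraph above describes.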

Besides the stand-alone, forward integration of a global biogeochemical
model, two additional tasks are required for optimization: computation and
output of misfit, and input of trial sets of parameters passed to the model
by the optimizer. In the following, files relevant for input of parameter
vectors and computation of misfit that have been added or changed

As noted in

Communication between the different modules is carried out mainly via several header files:

Finally, one may want to prevent computation of a simulation if during spinup
some parameter values or concentrations lead to erroneous (e.g., negative)
tracer concentrations. Routine

As noted above, the framework presented here is set up such that a serial
script

This work is a contribution to DFG-supported project SFB754 and to the research platforms of DFG cluster of excellence The Future Ocean. We thank Nikolaus Hansen for support on and open access to the CMA-ES code. Parallel supercomputing resources have been provided by the North-German Supercomputing Alliance (HLRN). The authors wish to acknowledge use of the Ferret program of NOAA's Pacific Marine Environmental Laboratory for analysis and graphics in this paper. We thank Momme Butenschön and an anonymous reviewer for their very constructive and helpful comments.

Edited by: C. Sierra
Reviewed by: M. Butenschön and one anonymous referee