This paper presents the technical
implementation of a new, probabilistic version of the NEMO ocean–sea-ice
modelling system. Ensemble simulations with

Probabilistic approaches, based on large ensemble simulations,
have been helpful in many branches of Earth-system modelling sciences to
tackle the difficulties inherent to the complex and chaotic nature of the
dynamical systems at play. In oceanography, ensemble simulations were first
introduced for data assimilation purposes, in order to explicitly
simulate and, given observational data, reduce the uncertainties associated
with, for example, model dynamics, numerical formulation, initial states,
atmospheric forcing

Performing ensemble simulations can be seen as a natural way to take into
account the internal variability inherent to any chaotic and turbulent
system, by sampling a range of possible, independent and identically
distributed trajectories of this system. For example, long-term climate
projections, or short-term weather forecasts, rely on large ensembles of
atmosphere–ice–ocean coupled model simulations to simulate the
probabilistic response of the climate system to various external forcing
scenarios, or to perturbed initial conditions, respectively

The ocean is, like the atmosphere or the full climate system, a chaotic
system governed by non-linear equations which couple various spatio-temporal
scales. A consequence is that, in the turbulent regime (i.e. for

On the other hand, NEMO climatological simulations at

Simulating, separating, and comparing these two components of the oceanic
variability requires an ensemble of turbulent ocean hindcasts, driven by the
same atmospheric forcing, and started from perturbed initial conditions. The
high computational cost of performing such ensembles at global or basin scale
explains why only a small number of studies have followed this approach so
far, and then only with small ensemble sizes

Building on the results obtained from climatological simulations, the ongoing
OCCIPUT project

This paper presents the technical implementation of the new, fully
probabilistic version of the NEMO modelling system required for this project.
It lies at the interface between scientific objectives and the new technical
developments implemented in the model. The OCCIPUT project is presented here
as an application, to illustrate the system requirements and numerical
performance. The mathematical background supporting our probabilistic
approach is detailed in Sect.

The classical, deterministic ocean model formulation can be written as
follows:

Computing a solution to Eq. (

In addition to uncertainties in the initial condition, it is sometimes useful
to assume that the model dynamics themselves are uncertain. This leads to a
non-deterministic ocean model formulation, in which model uncertainties are
described by stochastic processes. One possibility is, for instance, to modify
Eq. (

In this equation,

Schematic of an ensemble simulation (red trajectories), as an approximation to the simulation of an evolving probability distribution (in blue).

However, since Eqs. (

This Monte Carlo approach is very general and can also be applied to any kind
of stochastic parameterization (not only the particular case described by
Eq.

In summary, Eq. (

The NEMO model (Nucleus for a European Model of the Ocean), described in

The standard NEMO code is parallelized with MPI (message-passing interface)
using a domain decomposition method. The model grid is divided into rectangular
subdomains (

In practice, upon initialization one MPI communicator is defined with as many processors as subdomains, and each processor is associated with a subdomain and knows which are its neighbours.
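The bookkeeping behind this association can be sketched as follows. This is an illustrative Python analogue, not NEMO code: it shows how a flat MPI rank could be mapped to an (ensemble member, subdomain) pair, and how the two "colors" for the corresponding `MPI_Comm_split` calls would be derived.

```python
# Illustrative sketch (not NEMO code): mapping a flat MPI rank to an
# (ensemble member, subdomain) pair in a double parallelization.
# With n_sub subdomains per member, ranks 0..n_sub-1 form member 0,
# ranks n_sub..2*n_sub-1 form member 1, and so on.

def rank_to_member_subdomain(rank, n_sub):
    """Return (member index, subdomain index) for a flat MPI rank."""
    return rank // n_sub, rank % n_sub

def communicator_colors(rank, n_sub):
    """Colors for two MPI_Comm_split calls: one communicator per member
    (grouping its subdomains, for neighbour exchanges) and one per
    subdomain (grouping the same subdomain across all members, for
    ensemble statistics)."""
    member, subdomain = rank_to_member_subdomain(rank, n_sub)
    return member, subdomain  # (intra-member color, intra-subdomain color)

# Example: 3 members x 4 subdomains = 12 processors; rank 7 is
# subdomain 3 of member 1.
print(rank_to_member_subdomain(7, 4))
```

The same arithmetic, expressed with MPI integer division in Fortran, is all that is needed to define both families of communicators.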

Schematic of the double parallelization introduced in NEMO: each
processor (black squares) is dedicated to the computations associated with one
model subdomain and one ensemble member. There is one MPI communicator within
each ensemble member (in blue) to allow communications between neighbouring
subdomains as in the standard NEMO parallelization, and there is one MPI
communicator within each model subdomain (in red) to allow communication
between ensemble members (e.g. to compute ensemble statistics online if
needed). The total number of processors is thus equal to the product of the
ensemble size and the number of subdomains (

Ensemble simulations may be performed with NEMO by a direct generalization of
the standard parallelization procedure described above. In other words, our
ensemble simulations are performed from one single call to the NEMO
executable, simply using more processors to run all members in parallel. This
technical option is both natural and unnatural. It is natural since an
ensemble simulation provides an approximate description of the probability
distribution; it is thus conceptually appealing to advance all members
together in time. It is unnatural since independent ensemble members may be
run separately (in parallel, or successively) using independent calls to
NEMO. However, the solution we propose is so straightforward that there is
virtually no implementation cost, and it is more flexible since the ensemble
members may be run independently, by groups of any size, or all together.
Furthermore, running all ensemble members together provides an interesting new
capability: the characteristics of the probability
distribution

In practice, this implementation option only requires that at the beginning
of the NEMO simulation, one MPI communicator is defined for each ensemble
member, each one with as many processors as subdomains, so that each
processor knows to which member it belongs, on which subdomain it is going to
compute and what its neighbours are. Inside each of these communicators, each
ensemble member may be run independently from the other members, without
changing anything else in the NEMO code. However, the members are obviously
not all meant to behave exactly the same way: the index of the ensemble member
must have some influence on the simulation. This influence may be in the name
of the files defining the initial condition, parameters or forcing, or in the
seeding of the random number generator (if a random forcing is applied, as in
Eq.
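How the member index might differentiate the members can be sketched as follows. This is a hedged illustration: the filenames and the seeding rule below are hypothetical, not the actual NEMO conventions.

```python
# Hedged sketch: differentiating ensemble members by their index.
# The naming scheme and seeding rule are hypothetical illustrations,
# not the actual NEMO conventions.
import random

def member_inputs(member, base_seed=12345):
    restart = f"restart_{member:03d}.nc"     # hypothetical file naming
    output = f"output_{member:03d}.nc"       # hypothetical file naming
    rng = random.Random(base_seed + member)  # distinct random stream per member
    return restart, output, rng

# Two members initialized this way draw different random forcings:
r0 = member_inputs(0)[2].random()
r1 = member_inputs(1)[2].random()
assert r0 != r1
```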

In summary, the NEMO ensemble system relies on a double parallelization, over
model subdomains and over ensemble members, as illustrated in
Fig.

As mentioned above, one important novelty offered by the ensemble NEMO
parallelization is the ability to compute online any feature of the
probability distribution

the mean of the distribution:

where

the variance of the distribution:

where

the covariance between two variables at the same model grid point:

where

This is directly generalizable to the computation of higher-order moments
(skewness, kurtosis), which likewise reduce to MPI sums in the ensemble
communicators. Moreover, simple MPI algorithms can also be designed to
compute many other probabilistic diagnostics online, such as the rank
of each member in the ensemble, and from there, estimates of quantiles of the
probability distribution. Specific applications of this feature are discussed
in Sect.
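The statistics above can be illustrated with a minimal serial analogue. In NEMO these reductions are MPI sums within each subdomain's ensemble communicator; here plain Python sums stand in for the MPI reductions, applied to one grid point across a small synthetic ensemble.

```python
# Minimal serial analogue of the online ensemble statistics: plain
# Python sums stand in for the MPI sums performed within each
# subdomain's ensemble communicator.

def ens_mean(values):
    return sum(values) / len(values)

def ens_variance(values):
    m = ens_mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def ens_covariance(xs, ys):
    """Ensemble covariance between two variables at one grid point."""
    mx, my = ens_mean(xs), ens_mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def ens_ranks(values):
    """Rank of each member in the ensemble (0 = smallest), from which
    quantiles of the distribution can be estimated."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for r, i in enumerate(order):
        ranks[i] = r
    return ranks

sst = [14.8, 15.1, 15.0, 14.9, 15.2]  # one grid point, 5 members (synthetic)
print(ens_mean(sst), ens_variance(sst), ens_ranks(sst))
```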

This online estimation of the probability distribution, via the computation of ensemble statistics, opens another interesting new capability: the solution of the model equations may now depend on ensemble statistics, available at each time step if needed. For instance, it may be interesting to relax the modelled forced variability towards reference (e.g. reanalysed or climatological) fields, with no explicit damping of the intrinsic variability: the nudging term would involve the current ensemble mean and be applied identically to all members at the next time step, resulting in a simple “translation” of the entire ensemble distribution toward the reference field.

Other applications, such as ensemble data assimilation, may also require an online control of the ensemble spread, which is hereby made possible within NEMO.

Ensemble simulations are directly connected to stochastic parameterizations
(as introduced in Eq.

Ensemble model simulations are also key in ensemble data assimilation
systems: they propagate, in time, uncertainties in the model initial
condition, and provide a description of model uncertainties in the
assimilation system (e.g. using stochastic perturbations). Data assimilation
can then be carried out by conditioning this probability distribution to the
observations whenever they are available. The ensemble data assimilation
method that is currently most commonly used in ocean applications is the
Ensemble Kalman filter

Another important benefit of the probabilistic approach is that it
consolidates, and makes more objective, statistical comparisons between actual
observations and model-derived ensemble synthetic observations. Probabilistic assessment
metrics are commonly used in the atmospheric community

In OCCIPUT, such probabilistic scores will be computed from real observations and from the ensemble synthetic observations (along-track Jason-2 altimeter data and ENACT–ENSEMBLES temperature and salinity profile data) generated online using the existing NEMO observation operator (NEMO-OBS module). NEMO-OBS is used exactly as in standard NEMO within each member of the ensemble, thereby providing an ensemble of model equivalents for each observation rather than a single value. Probabilistic metrics (e.g. the continuous ranked probability score, CRPS) will then be computed to assess the reliability and resolution of the OCCIPUT simulations.
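For reference, the CRPS of a finite ensemble against a single observation can be computed with the standard empirical estimator; this is a textbook formula (Gneiting–Raftery form), not code from the paper.

```python
# Standard empirical CRPS estimator for a finite ensemble (not code
# from the paper):
#   CRPS = mean |x_i - y|  -  0.5 * mean |x_i - x_j|
# where y is the observation and x_i the ensemble of model equivalents.

def crps(ensemble, obs):
    m = len(ensemble)
    term1 = sum(abs(x - obs) for x in ensemble) / m
    term2 = sum(abs(x - y) for x in ensemble for y in ensemble) / (2 * m * m)
    return term1 - term2

# A sharper, well-centred ensemble scores lower (better):
print(crps([1.0, 2.0, 3.0], 2.0))
```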

Our implementation of ensemble NEMO using enhanced parallelization is technically not independent from the NEMO I/O strategy. Indeed, in NEMO, the input and output of data are managed by an external server (XIOS, for XML IO Server), which is run on a set of additional processors (not used by NEMO). The behaviour of this server is controlled by an XML file, which governs the interaction between XIOS and NEMO, and which defines the characteristics of input and output data: model fields, domains, grid, I/O frequencies, time averaging for outputs, etc. To exchange data with disk files, every NEMO processor makes a request to the XIOS servers, consistently with the definitions included in the XML file. In this operation, the XIOS servers buffer data in memory, with the decisive advantage of not interrupting NEMO computations with reads from or writes to disk files. One peculiarity of this buffering is that each XIOS server reads and writes one stripe of the global model domain (along the second model dimension), and thus exchanges data with the processors of several model subdomains. To optimize the system, it is obviously important that the number of XIOS servers (and thus the size of these stripes) be correctly dimensioned according to the amount of I/O data, which may depend heavily on the model configuration and on the definition of the model outputs.

To use XIOS with our implementation of ensemble NEMO for OCCIPUT, we thus had to take care of the following two issues. First, different ensemble members must write different files. This problem could be solved because XIOS was already designed to work with coupled models, and can thus deal with multiple contexts, i.e. one for each of the coupled model components. It was thus directly possible to define one context for each ensemble member, just as if they were different components of a coupled model. Second, in ensemble simulations, the amount of output data is proportional to the ensemble size, so that the number of XIOS servers must be increased accordingly, albeit with some care, because the size of the data stripe that is processed by each server should not be reduced too much.

The implementation of this ensemble configuration of NEMO was motivated to a
large extent by the scientific objectives of the OCCIPUT project, described
in the introduction. In this section, we present two ensemble simulations,
E-NATL025 and E-ORCA025, performed in the context of this project. We focus
on the model set-up, the integration strategy, and the numerical performance of
the system, followed by a few illustrative preliminary results in
Sect.

E-ORCA025 is the main ensemble simulation planned for
OCCIPUT. It is a 50-member ensemble
of global ocean–sea-ice hindcasts at

DRAKKAR-ORCA025 website:

Main characteristics of the NEMO 3.5 set-up used for the regional and global OCCIPUT ensembles.

A one-member spin-up simulation is first performed for each ensemble. For the
regional ensemble (E-NATL025), it is performed from 1973 (cold start) to
1992, forced with DFS5.2 atmospheric conditions

The

The regional ensemble (E-NATL025) was performed to test the system
implementation and to calibrate the global configuration. The global ensemble
simulation E-ORCA025 represents, in total, 2821 cumulative years of simulation
(56 years

All simulations were performed between 2014 and 2016 on the French Tier-0
Curie supercomputer, supported by PRACE (Partnership for Advanced Computing
in Europe) and GENCI (Grand Equipement National de Calcul Intensif, French
representative in PRACE) grants (

Preliminary tests showed that the one-member ORCA025 configuration scales well
up to 400 cores on Curie-TN (not shown). In order to test the
ensemble global configuration on Curie-TN, short 180-step experiments were
run, disregarding the first and last steps (which correspond to reading and
writing steps, respectively, that are performed only once during production
jobs). The performance of the system was measured in steps per minute by
analysing the 160 steps in between (steps 10 to 170).
Figure

Based on these performance tests, a domain decomposition with relatively few
cores was chosen in order to maintain a manageable I/O rate. The
decomposition with 128 cores per member was retained (corresponding to
the red line in Fig.

In order to optimize the I/O data flow and keep its management flexible, 40
XIOS servers were run as independent MPI tasks in detached mode,
allowing the overlap of I/O operations with computations. Compared to the
10-member regional case, the 50-member global case required a larger XIOS
buffer size. For this reason, each of the 40 XIOS instances was run on a
dedicated and exclusive Curie-TN, allowing each server to use the
entire memory available on each 16-core node (i.e. 64 GB); the 40 XIOS
servers thus used 16

XIOS makes use of parallel file system capabilities via the Netcdf4-HDF5 format, which allows both online data compression and parallel I/O. Therefore, XIOS is used in “multiple file” mode where each XIOS instance writes a file for one stripe of the global domain, yielding 40 files times 50 members for each variable and each time. At the end of each job, the 40 stripes are recombined on-the-fly into global files.
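The recombination step can be pictured schematically: each server writes one stripe of rows of the global domain, and the stripes are concatenated back along that dimension. This toy sketch only illustrates the data layout; the actual on-the-fly recombination tool used for E-ORCA025 is not shown here.

```python
# Schematic of the "multiple file" recombination: each XIOS server
# writes one stripe of the global domain along the second model
# dimension; the stripes are concatenated back into the global field.
# (Toy illustration of the layout, not the actual recombination tool.)

def recombine(stripes):
    """stripes: list of 2-D arrays (lists of rows), ordered along the
    striped dimension; returns the recombined global 2-D field."""
    field = []
    for stripe in stripes:
        field.extend(stripe)
    return field

stripe0 = [[1, 2], [3, 4]]  # rows 0-1 of a 4x2 global field
stripe1 = [[5, 6], [7, 8]]  # rows 2-3
assert recombine([stripe0, stripe1]) == [[1, 2], [3, 4], [5, 6], [7, 8]]
```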

Preliminary tests have shown that the 50-member E-ORCA025 global
configuration performs about 20 steps min

The final E-ORCA025 global database is saved in Netcdf4-HDF5 format (chunked
and compressed, compression ratio in

We now present some preliminary results from the regional and global OCCIPUT
ensemble simulations described in Sect.

Figure

Ensemble statistics of the monthly temperature anomalies from the
regional ensemble E-NATL025, at depth 93 m at two grid points:

These temperature anomalies were computed by first removing the long-term
non-linear trend of the time series derived from a local regression model

The ensemble-mean time series (hereafter E-mean, also denoted

The dispersion of individual time series about the ensemble mean indicates the
amount of intrinsic chaotic variability generated by the model. Its
time-varying magnitude may be estimated by the ensemble standard deviation
(hereafter E-SD, also denoted

Unlike in short-range ensemble forecast exercises, we do not seek here to maximize the growth rate of the initial dispersion; we let the model generate the spread and control its evolution according to its own physical laws.
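The forced/intrinsic decomposition implied by E-mean and E-SD can be sketched numerically: the temporal variance of the ensemble-mean time series estimates the forced variability, while the time-averaged ensemble variance estimates the intrinsic variability. The data below are synthetic, purely for illustration.

```python
# Hedged sketch of the forced/intrinsic decomposition: the ensemble
# mean tracks the atmospherically forced signal, while the ensemble
# spread (E-SD) measures the intrinsic, chaotic variability.
# Data are synthetic.

def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

# ensemble[t][n]: member n at time t (3 times x 4 members, synthetic)
ensemble = [
    [1.0, 1.2, 0.8, 1.0],
    [2.0, 2.1, 1.9, 2.0],
    [3.0, 3.3, 2.7, 3.0],
]
e_mean = [mean(members) for members in ensemble]       # E-mean time series
forced_var = variance(e_mean)                          # forced variability
intrinsic_var = mean([variance(m) for m in ensemble])  # time-mean E-SD^2
print(forced_var, intrinsic_var)
```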

Figure

Figure

E-SD (shading) for year 2012 of the regional ensemble simulation
E-NATL025, computed from annual means of

Comparing Fig.

This is expected from the design of these ensemble simulations: each ensemble
member is driven through bulk formulae by the same atmospheric forcing
function, but turbulent air–sea heat fluxes differ somewhat among the
ensemble because SSTs do so. This approach induces an implicit relaxation of
SST toward the same equivalent air temperature

Figure

The E-SD can also be compared to the ensemble distribution of the Time-SDs of
the

The variability of the Atlantic meridional overturning circulation (AMOC)
transport has a major influence on the climate system

The simulated AMOC time series are in good agreement with the observed AMOC variations at both monthly and annual timescales (Fig. 6a and c). The total (i.e. combination of forced and intrinsic) AMOC variability is computed as a Time-SD from the observed time series and from each ensemble member, and plotted in Fig. 6b and d as grey lines. At both timescales, the total AMOC variability simulated by E-ORCA025 lies below the observed variability, consistent with the fact that the model seems to miss a few observed peaks (e.g. 2005, 2009, and 2013 on the annual time series). Figure 6b and d also highlight the substantial imprint of chaotic intrinsic variability on this climate-relevant oceanic index at both timescales: at the interannual timescale, the AMOC intrinsic variability is weaker than the forced variability, but amounts to about 30 % of the latter. A more in-depth investigation of the relative proportion of intrinsic and forced variability in the AMOC, and of the variations of the intrinsic contribution with time, is currently underway and will be the subject of a dedicated publication.

Same as Fig.

We have presented in this paper the technical implementation of a new,
probabilistic version of the NEMO ocean modelling system. Ensemble
simulations with

The OCCIPUT project was presented here as an example application of these new
modelling developments. Its scientific focus is on studying and comparing the
intrinsic/chaotic and the atmospherically forced parts of the ocean
variability at monthly to multidecadal timescales

The members are all driven by the same realistic atmospheric boundary
conditions (DFS5.2) through bulk formulae, and represent

Our probabilistic NEMO version includes several new features. The generic stochastic parameterization, used here on the equation of state to trigger the growth of the ensemble spread, can be applied to other parameters to simulate model or subgrid-scale uncertainties. The MPI communication between members allows the online computation of ensemble statistics (PDFs, variances, covariances, quantiles, etc.) across the ensemble members, which may be saved at any frequency and location and for any variable thanks to the flexible XIOS servers.

The size

More generally, this numerical system computes the temporal evolution of the full PDF of the 3-D, multivariate states of the ocean and sea ice. A very interesting perspective is the online use of the PDF of any state variable or derived quantity (or other statistics such as ensemble means, variances, covariances, skewnesses, etc.) for the computation of the next time step during the integration. This would allow, for instance, distinct treatments of the ensemble mean (forced variability) or the ensemble spread (intrinsic variability) during the integration, e.g. for data assimilation purposes. This NEMO version can therefore solve the oceanic Fokker–Planck equation, which may open new avenues in terms of experimental design for operational, climate-related, or process-oriented oceanography.

The ensemble simulations described in this paper have been performed using a
probabilistic ocean modelling system based on NEMO 3.5. The model code for
NEMO 3.5 is available from the NEMO website (

The ensemblist features of the model are based on a generic tool implemented in the NEMO parallelization module.

The computer code includes one new FORTRAN routine (mpp_ens_set; see Algorithm 1) which defines the MPI communicators required to perform simultaneous simulations, and to compute online ensemble diagnostics. This routine returns to each NEMO instance: (i) the MPI communicator that it must use to run the model, and (ii) the index of the ensemble member to be run. This index can then be used by NEMO to modify (i) the input filenames (initial condition, forcing, parameters), (ii) the output filenames (model state, restart file, diagnostics), and (iii) the seed of the random number generator used in the stochastic parameterizations.

The online computation of ensemble diagnostics requires additional routines, for instance to compute the ensemble mean or standard deviation of model variables (mpp_ens_ave_std, see Algorithm 2). This routine uses the diagnostic communicators defined by mpp_ens_set to perform summations over all ensemble members.

As can be seen from these routines, this implementation is generic and can be ported to any model that is already parallelized using a domain decomposition method.

mpp_ens_set

mpp_ens_ave_std

The authors declare that they have no conflict of interest.

This work is mainly a contribution to the OCCIPUT project, which is supported by the Agence Nationale de la Recherche (ANR) through contract ANR-13-BS06-0007-01. We acknowledge that the results of this research have been achieved using the PRACE Research Infrastructure resource Curie based in France at TGCC. The support of the TGCC-CCRT hotline from CEA, France, to the technical work is gratefully acknowledged. Some of the computations presented in this study were performed at TGCC under allocations granted by GENCI. This work also benefited from many interactions with the DRAKKAR ocean-modelling consortium, with the SANGOMA and CHAOCEAN projects. DRAKKAR is the International Coordination Network (GDRI) established between the Centre National de la Recherche Scientifique (CNRS), the National Oceanography Centre in Southampton (NOCS), GEOMAR in Kiel, and IFREMER. SANGOMA is funded by the European Community's Seventh Framework Programme FP7/2007-2013 under grant agreement 283580. CHAOCEAN is funded by the Centre National d'études Spatiales (CNES) through the Ocean Surface Topography Science Team (OST/ST). The authors are grateful for useful comments from three anonymous reviewers; they also thank the NEMO System Team and Yann Meurdesoif for interesting discussions about the development of the probabilistic version of NEMO. Laurent Bessières and Stéphanie Leroux are supported by ANR. Jean-Michel Brankart, Jean-Marc Molines, Pierre-Antoine Bouttier, Thierry Penduff, and Bernard Barnier are supported by CNRS. Marie-Pierre Moine and Laurent Terray are supported by CERFACS, and Guillaume Sérazin by CNES and Région Midi-Pyrénées. Edited by: David Ham Reviewed by: three anonymous referees