A community diagnostics and performance metrics tool for the evaluation of
Earth system models (ESMs) has been developed that allows for routine
comparison of single or multiple models, either against predecessor versions
or against observations. The priority of the effort so far has been to target
specific scientific themes focusing on selected essential climate variables
(ECVs), a range of known systematic biases common to ESMs, such as coupled
tropical climate variability, monsoons, Southern Ocean processes, continental
dry biases, and soil hydrology–climate interactions, as well as atmospheric
CO2.
Earth system model (ESM) evaluation with observations or reanalyses is
performed both to understand the performance of a given model and to gauge
the quality of a new model, either against predecessor versions or a wider
set of models. Over the past decades, the benefits of multi-model
intercomparison projects such as the Coupled Model Intercomparison Project
(CMIP) have been demonstrated. Since the beginning of CMIP in 1995,
participating models have been further developed, with more complex and higher-resolution models joining in CMIP5 (Taylor et al., 2012), which
supported the Intergovernmental Panel on Climate Change (IPCC) Fifth
Assessment Report (AR5) (IPCC, 2013). The main purpose of these
internationally coordinated model experiments is to address outstanding
scientific questions, to improve the understanding of climate, and to provide
estimates of future climate change. Standardization of model output in a format that follows the Network Common Data Form (netCDF) Climate and Forecast (CF) Metadata Convention has been an important prerequisite for such multi-model analyses.
An important new aspect of the next phase of CMIP (i.e. CMIP6; Eyring et al., 2015) is a more distributed organization under the oversight of the CMIP Panel, in which a set of standard model experiments common across CMIP cycles (the Diagnostic, Evaluation and Characterization of Klima (DECK) experiments and the CMIP6 historical simulations) will be used to broadly characterize model performance and sensitivity to standard external forcing. Standardization, coordination, common infrastructure, and documentation functions that make the simulation results and their main characteristics available to the broader community are envisaged to be a central part of CMIP6.

The Earth System Model Evaluation Tool (ESMValTool) presented here is a community development that can be used as one of these documentation functions in CMIP to help diagnose and understand the origin and consequences of model biases and inter-model spread. Our goal is to develop an evaluation tool that users can run to produce well-established analyses of the CMIP models as soon as the output becomes available on the Earth System Grid Federation (ESGF). This is realized through text files that we refer to as standard namelists, each calling a certain set of diagnostics and performance metrics to reproduce analyses that have been demonstrated to be of importance in ESM evaluation in previous peer-reviewed papers or assessment reports. Through this approach, routine and systematic evaluation of model results becomes more efficient, and the framework enables scientists to focus on developing innovative analysis methods rather than constantly having to "re-invent the wheel". An additional purpose of the ESMValTool is to facilitate model evaluation at individual modelling centres, in particular to rapidly assess the performance of a new model against predecessor versions. Righi et al. (2015) and Jöckel et al. (2016) have applied a subset of the namelists presented here to evaluate a set of simulations using different configurations of the global ECHAM/MESSy Atmospheric Chemistry model (EMAC). In this paper we also highlight the integration of the ESMValTool into modelling workflows – including models developed at NOAA's Geophysical Fluid Dynamics Laboratory (GFDL), the EMAC model, and the NEMO ocean model – through the ESMValTool's reformatting routines.
In addition to standardized model output, the ESGF hosts observations for Model Intercomparison Projects (obs4MIPs; Ferraro et al., 2015; Teixeira et al., 2014) and reanalysis data (ana4MIPs), both provided in the same standardized format as the model output.
Schematic overview of the ESMValTool (v1.0) structure. The primary input to the workflow manager is a user-configurable text namelist file (orange). Standardized libraries/utilities (purple) available to all diagnostics scripts are handled through common interface scripts (blue). The workflow manager runs diagnostic scripts (red) that can be written in several freely available scripting languages. The output of the ESMValTool (grey) includes figures, binary files (netCDF), and a log file with a list of relevant references and processed input files for each diagnostic.
For the model evaluation we apply diagnostics and in several cases also performance metrics. Diagnostics (e.g. the calculation of zonal means or derived variables in comparison to observations) provide a qualitative comparison of the models with observations. Performance metrics are defined as a quantitative measure of agreement between a simulated and an observed quantity, which can be used to assess the performance of individual models or generations of models. Quantitative performance metrics are routinely calculated for numerical weather forecast models, and have been increasingly applied to atmosphere–ocean general circulation models (AOGCMs) and ESMs. Performance metrics used in these studies have mainly focused on climatological mean values of selected ECVs (Connolley and Bracegirdle, 2007; Gleckler et al., 2008; Pincus et al., 2008; Reichler and Kim, 2008), and only a few studies have developed process-based performance metrics (SPARC-CCMVal, 2010; Waugh and Eyring, 2008; Williams and Webb, 2009). The implementation of performance metrics in the ESMValTool enables a quantitative assessment of model improvements, both for different versions of individual ESMs and for different generations of model ensembles used in international assessments (e.g. CMIP5 versus CMIP6). Application of performance metrics to multiple models helps highlight when and where one or more models represent a particular process well. While quantitative metrics provide a valuable summary of overall model performance, they usually do not give information on how particular aspects of a model's simulation interact to determine the overall fidelity. For example, a model could simulate a mean state (and trend) in global mean surface temperature that agrees well with observations, but this could be due to compensating errors.
To learn more about the sources of errors and uncertainties in models, and thereby highlight specific areas requiring improvement, an evaluation of the underlying processes and phenomena is necessary. A range of diagnostics and performance metrics focusing on a number of key processes is therefore also included in the ESMValTool.
This paper describes ESMValTool version 1.0 (v1.0), which is the first release of the tool to the wider community for application and further development as open-source software. It demonstrates the use of the tool by showing example figures for each namelist for either all or a subset of CMIP5 models. Section 2 describes the technical aspects of the tool, and Sect. 3 the type of modelling and observational data currently supported by the ESMValTool (v1.0). In Sect. 4 an overview of the namelists of the ESMValTool (v1.0) is given along with their diagnostics and performance metrics and the variables and observations used. Section 5 describes the use of the ESMValTool in a typical model development cycle and evaluation workflow and Sect. 6 closes with a summary and an outlook.
In this section we give a brief overview of the ESMValTool (v1.0) which is schematically depicted in Fig. 1. A detailed user's guide is provided in the Supplement.
The ESMValTool consists of a workflow manager and a number of diagnostic and
graphical output scripts. It builds on a previously published diagnostic tool
for chemistry–climate model evaluation (CCMVal-Diag Tool; Gettelman et al.,
2012), but is different in its focus. In particular, it extends to ESMs by
including diagnostics and performance metrics relevant for the coupled Earth
system, and also focuses on evaluating models with a common set of diagnostics rather than prioritizing flexibility as the CCMVal-Diag Tool does. In addition, several technical and structural changes have been made that facilitate development by multiple users. The workflow manager is written in Python, while multi-language support is provided in the diagnostic and graphic routines. The current version supports Python, NCL, and R.
Within the workflow, the input data are checked for compliance with the CF and Climate Model Output Rewriter (CMOR) standards, and common minor flaws are fixed automatically where possible.
To facilitate the development of new namelists and diagnostics by multiple
developers from various institutions while preserving code quality and
reliability, an automated testing framework is included in the package. This
allows the developers to verify that modifications and new code are
compatible with the existing code and do not change the results of existing
diagnostics. Automated testing within the ESMValTool is implemented on two
complementary levels:
unittests are used to verify that small code units (e.g. functions/subroutines) provide the expected results;

integration testing is used to verify that a diagnostic integrates well into the ESMValTool framework and provides the expected results. This is verified by comparing the results against a set of reference data generated during the implementation of the diagnostic.

Each diagnostic is expected to produce a set of well-defined results, i.e. files in a variety of formats and types (e.g. graphics, data files, ASCII files). When testing the results of a diagnostic, a special namelist file is executed by the ESMValTool which runs the diagnostic on a limited set of test data only, minimizing execution time while ensuring that the diagnostic produces the correct results. The tests implemented include:

file availability: a check that all required output data have been successfully generated by the diagnostic. A missing file is always an indicator of a failure of the program;

file checksum: currently the MD5 checksum is used to verify that the contents of a file are the same;

graphics check: for graphics files, an additional test verifies that two graphical outputs are identical. This is particularly useful to verify that the outputs of a diagnostic remain the same after code changes.

Unittests are implemented for each diagnostic independently using nose.
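The file-availability and checksum tests can be sketched as follows; this is a minimal illustration in Python, and the function names are ours, not the ESMValTool's actual testing API:

```python
import hashlib
import os
import tempfile

def file_available(path):
    """File-availability check: a missing output file always indicates
    a failure of the diagnostic."""
    return os.path.isfile(path)

def md5_checksum(path, chunk_size=8192):
    """Compute the MD5 checksum of a file, reading it in chunks so that
    large netCDF or graphics files do not need to fit in memory."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()

# Simulate a diagnostic writing an output file, then verify it against a
# reference checksum of the kind stored when the diagnostic was implemented.
with tempfile.TemporaryDirectory() as outdir:
    out_file = os.path.join(outdir, "diag_output.nc")
    with open(out_file, "wb") as f:
        f.write(b"dummy diagnostic payload")
    reference = md5_checksum(out_file)  # would normally be stored reference data
    assert file_available(out_file)
    assert md5_checksum(out_file) == reference
```

A byte-for-byte checksum is deliberately strict: any change in the diagnostic's numerical output, however small, changes the MD5 digest and fails the test.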
For the documentation of the code, Sphinx is used.
The open-source release of ESMValTool (v1.0) that accompanies this paper is intended to work with CMIP5 model output, but the tool is compatible with arbitrary model output, provided that it is in CF-compliant netCDF format and that the variables and metadata follow the CMOR tables and definitions. The namelists are designed such that it is straightforward to execute the same diagnostics with CMIP DECK or CMIP6 model output rather than CMIP5 output, and updated namelists will be provided once the new simulations become available. As mentioned in the previous section, routines are provided for checking CF/CMOR compliance and for fixing the most common minor flaws in the model output submitted to CMIP5. More substantial deviations from the required standards may be corrected via project- and model-specific procedures defined by the user and automatically applied within the workflow. The current reformatting routines are, however, not able to convert arbitrary model output to the full CF/CMOR standard; in such cases, it is the responsibility of the individual modelling groups to perform that conversion. Currently, model-specific reformatting routines are provided for EMAC (Jöckel et al., 2016, 2010), the GFDL CM3 and ESM models (Donner et al., 2011; Dunne et al., 2012, 2013), and for NEMO (Madec, 2008), the ocean model used, for example, in EC-Earth (Hazeleger et al., 2012). Users can develop similar reformatting routines specific to their model, using the existing reformatting routines as a template. This allows the tool to run directly on the original model output rather than requiring the output to be reformatted to CF/CMOR beforehand.
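To illustrate what such a model-specific reformatting step involves, the following sketch renames model-native variables to their CMOR counterparts and converts units. The variable mapping is hypothetical (not taken from the actual EMAC, GFDL, or NEMO routines), and a plain Python structure stands in for the netCDF data:

```python
# Hypothetical mapping from model-native variable names to CMOR short
# names, together with the factor needed to reach the CMOR unit.
VARIABLE_MAP = {
    # native name: (CMOR name, CMOR unit, conversion factor)
    "t2m": ("tas", "K", 1.0),
    "precip": ("pr", "kg m-2 s-1", 1.0 / 86400.0),  # assumed native unit: mm day-1
}

def reformat_variable(native_name, values, native_units):
    """Return (cmor_name, converted_values, cmor_units) for one variable."""
    cmor_name, cmor_units, factor = VARIABLE_MAP[native_name]
    converted = [v * factor for v in values]
    return cmor_name, converted, cmor_units

# Example: convert a precipitation field from mm day-1 to kg m-2 s-1.
name, data, units = reformat_variable("precip", [86400.0, 43200.0], "mm day-1")
assert name == "pr"
assert units == "kg m-2 s-1"
assert abs(data[0] - 1.0) < 1e-9 and abs(data[1] - 0.5) < 1e-9
```

A real reformatting routine would additionally rewrite coordinate variables and global attributes to the CF/CMOR standard; the name/unit mapping shown here is the core of what a user must supply for a new model.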
Overview of standard namelists implemented in ESMValTool (v1.0)
along with the quantity and ESMValTool variable name for which the namelist
is tested, the corresponding observations or reanalyses, the section and
example figure in this paper, and references for the namelist. Some namelists are named after the specific paper whose analysis they reproduce.
The observations are organized in tiers. Where available, observations from the obs4MIPs and reanalyses from the ana4MIPs archives at the ESGF are used in the ESMValTool. These data sets form "Tier 1". Tier 1 data are freely available for download and can be used directly by the tool, since they are formatted following the CF/CMOR standard and do not need any additional processing. For other observational data sets, the user has to retrieve the data from their respective sources and reformat them into the CF/CMOR standard. To facilitate this task, we provide specific reformatting routines for a large number of such data sets, together with detailed information on the data source as well as download and processing instructions (see Table 1). "Tier 2" includes other freely available data sets, and "Tier 3" includes restricted data sets (e.g. requiring the user to accept a license agreement issued by the data owner). For Tier 2 and 3 data, links and helper scripts are provided so that these observations can be easily retrieved from their respective sources and processed by the user. A collection of all observational data used in ESMValTool (v1.0) is hosted at DLR and at the ESGF nodes at BADC and DKRZ, but depending on the license terms of the observations these might not be publicly available.
A number of namelists have been included in the ESMValTool (v1.0) that group a set of performance metrics and diagnostics for a given scientific topic. Namelists that focus on the evaluation of physical climate processes for, respectively, the atmosphere, ocean, and land surface are presented in Sects. 4.1, 4.2, and 4.3. These can be applied to simulations with prescribed sea surface temperatures (SSTs, i.e. AMIP runs) or to the CMIP5 historical simulations (simulations from 1850 to the present day conducted with the best estimates of natural and anthropogenic climate forcing) run by either coupled AOGCMs or ESMs. Another set of namelists has been developed to evaluate biogeochemical biases present in ESMs when additional components of the Earth system such as the carbon cycle, atmospheric chemistry, or aerosols are simulated interactively (Sects. 4.4 and 4.5 for the carbon cycle and aerosols/chemistry, respectively).
In each subsection, we first scientifically motivate the inclusion of the namelist by reviewing the main systematic biases in current ESMs and their importance and implications. We then give an overview of the namelists that can be used to evaluate such biases along with the diagnostics and performance metrics included, and the required variables and corresponding observations that are used in ESMValTool (v1.0). For each namelist we provide one to two example figures that are applied to either all or a subset of the CMIP5 models. An assessment of CMIP5 models is however not the focus of this paper. Rather, we attempt to illustrate how the namelists contained within ESMValTool (v1.0) can facilitate the development and evaluation of climate model performance in the targeted areas. Therefore, the results of each figure are only briefly described in each figure caption.
Overview of the diagnostics included for each namelist along with specific calculations, the plot type, settings in the configuration file (cfg-file), and comments. See also Annex C in the Supplement for additional information.
Relative space–time root-mean square error (RMSE) calculated from
the 1980–2005 climatological seasonal cycle of the CMIP5 historical
simulations. A relative performance is displayed, with blue shading
indicating performance being better and red shading worse than the median of
all model results. A diagonal split of a grid square shows the relative error
with respect to the reference data set (lower right triangle) and the
alternate data set (upper left triangle). White boxes are used when data are
not available for the given model and variable or no alternate data set has
been used. The figure shows that performance varies across CMIP5 models and
variables, with some models comparing better with observations for one
variable and another model performing better for a different variable. Except
for global average temperatures at 200 hPa where most but not all models
have a systematic bias, the multi-model mean outperforms any individual
model. Similar to Gleckler et al. (2008) and Fig. 9.7 of Flato et al. (2013), produced with the performance metrics namelist.
Table 1 provides a summary of all namelists included in ESMValTool (v1.0) along with information on the quantities and ESMValTool variable names for which the namelist is tested, the corresponding observations or reanalyses, the section and example figure in this paper, and references for the namelist. Table 2 then provides an overview of the diagnostics included for each namelist along with specific calculations, the plot type, settings in the configuration file (cfg-file), and comments.
A starting point for the calculation of performance metrics is to assess the
representation of simulated climatological mean states and the seasonal cycle
for essential climate variables (ECVs, GCOS, 2010). This is supported by a
large observational effort to deliver long-term, high-quality observations
from different platforms and instruments (e.g. obs4MIPs and the ESA Climate Change Initiative, CCI).
Following Gleckler et al. (2008) and similar to Fig. 9.7 of Flato et
al. (2013), a namelist has been implemented in the ESMValTool that produces a
“portrait diagram” by calculating the relative space–time root-mean square
error (RMSE) from the climatological mean seasonal cycle of historical
simulations for selected variables (Fig. 2).
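The relative error underlying the portrait diagram can be sketched as follows. This is a simplified illustration that omits the area and seasonal weighting of the full space–time RMSE, and all names are ours:

```python
import math
from statistics import median

def rmse(model, obs):
    """Root-mean-square error between two flattened space-time fields."""
    return math.sqrt(sum((m - o) ** 2 for m, o in zip(model, obs)) / len(obs))

def relative_errors(models, obs):
    """Relative error of each model with respect to the median RMSE of all
    models: negative values (blue shading in the portrait diagram) mean
    better than the median, positive values (red shading) mean worse."""
    errors = {name: rmse(field, obs) for name, field in models.items()}
    med = median(errors.values())
    return {name: (e - med) / med for name, e in errors.items()}

obs = [1.0, 2.0, 3.0, 4.0]
models = {
    "model_a": [1.1, 2.1, 3.1, 4.1],  # small bias
    "model_b": [1.5, 2.5, 3.5, 4.5],  # larger bias
    "model_c": [1.2, 2.2, 3.2, 4.2],
}
rel = relative_errors(models, obs)
assert rel["model_a"] < 0 < rel["model_b"]  # blue vs. red shading
```

Normalizing by the median across models, rather than by the observations themselves, is what makes the diagram a *relative* performance display: a grid square only says how a model compares with the rest of the ensemble for that variable.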
Left panel: Zonally averaged temperature profile difference between
MPI-ESM-LR and the ERA-Interim reanalysis data with masked non-significant
values. MPI-ESM-LR has generally small biases in the troposphere.
Tested variables in ESMValTool (v1.0) that are shown in Fig. 2 are selected levels of temperature (ta), eastward (ua) and northward wind (va), geopotential height (zg), and specific humidity (hus), as well as near-surface air temperature (tas), precipitation (pr), all-sky longwave (rlut) and shortwave (rsut) radiation, longwave (LW_CRE) and shortwave (SW_CRE) cloud radiative effects, and aerosol optical depth (AOD) at 550 nm (od550aer). The models are evaluated against a wide range of observations and reanalysis data: ERA-Interim and NCEP (Kistler et al., 2001) for temperature, winds, and geopotential height, AIRS (Aumann et al., 2003) for specific humidity, CERES-EBAF for radiation (Wielicki et al., 1996), the Global Precipitation Climatology Project (GPCP, Adler et al., 2003) for precipitation, the Moderate Resolution Imaging Spectrometer (MODIS, Shi et al., 2011), and the ESA CCI aerosol data (Kinne et al., 2015) for AOD. Additional observations or reanalyses can be provided by the user for these variables and easily added. The tool can also be applied to additional variables if the required observations are made available in an ESMValTool compatible format (see Sect. 2 and Supplement).
Annual-mean surface air temperature (upper row) and precipitation rate (mm day^-1; lower row).
Near-surface air temperature (tas) and precipitation (pr) are the two variables most commonly requested by users of ESM simulations. Often, diagnostics for tas and pr are shown for the multi-model mean of an ensemble. Both of these variables are the end result of numerous interacting processes in the models, making it challenging to understand and improve biases in these quantities. For example, near-surface air temperature biases depend on the models' representation of radiation, convection, clouds, land characteristics, surface fluxes, as well as atmospheric circulation and turbulent transport (Flato et al., 2013), each with their own potential biases that may either augment or oppose one another.
Monsoon systems represent the dominant seasonal climate variation in the tropics, with profound socio-economic impacts. Current ESMs still struggle to capture the major features of both the South Asian summer monsoon (SASM, Sect. “South Asian summer monsoon (SASM)”) and the West African monsoon (WAM, Sect. “West African Monsoon Diagnostics”). Sperber et al. (2013) and Roehrig et al. (2013) provide comprehensive assessments of the ability of CMIP5 models to represent these two monsoon systems. By implementing diagnostics from these two studies into ESMValTool (v1.0), we aim to facilitate continuous monitoring of progress in simulating the SASM and WAM systems in ESMs.
While individual models vary in their simulations of the SASM, there are known biases in ESMs that span a range of temporal and spatial scales. The namelists in the ESMValTool are targeted toward analysing these biases in a systematic way. Climatological mean biases include excess precipitation over the equatorial Indian Ocean, too little precipitation over the Indian subcontinent, and excess precipitation over orography such as the southern slopes of the Himalayas (Annamalai et al., 2007; Bollasina and Nigam, 2009; Sperber et al., 2013); see also Fig. 4. The monsoon onset is typically too late in the models, and the boreal summer intraseasonal oscillation (BSISO), which has a particularly large socio-economic impact in South Asia, is often weak or not present (Sabeerali et al., 2013). Monsoon low-pressure systems, which generate many of the most intense rain events during the monsoon (Krishnamurthy and Misra, 2011), are often too infrequent and weak (Stowasser et al., 2009). In coupled models, biases in SSTs, evaporation, precipitation, and air–sea coupling are common (Bollasina and Nigam, 2009) and have been shown to affect both present-day simulations and future projections (Levine et al., 2013). Interannual teleconnections with El Niño-Southern Oscillation (ENSO, Lin et al., 2008) and the Indian Ocean Dipole (Ashok et al., 2004; Cherchi and Navarra, 2013) are also not well captured (Turner et al., 2005).
Monsoon precipitation intensity (upper panels) and monsoon
precipitation domain (lower panels) for TRMM and an example of deviations
from observations from three CMIP5 models (EC-Earth, HadGEM2-ES, and
GFDL-ESM2M). The models have difficulties representing the eastward extent of
the monsoon domain over the South China Sea and western Pacific, and several
models (e.g. HadGEM2-ES) underestimate the latitudinal extent of most of the
monsoon regions. The monsoon precipitation intensity tends to be
underestimated in the South Asian, East Asian and Australian monsoon regions,
while in the African and American monsoon regions the sign of the intensity
bias varies between models. Similar to Fig. 9.32 of Flato et al. (2013).
Three SASM namelists for the basic climatology, seasonal cycle, intraseasonal
and interannual variability, and key teleconnections have been implemented in
the ESMValTool focusing on SASM rainfall and horizontal winds in
June–September (JJAS).
Tested variables in ESMValTool (v1.0), some of which are illustrated in Figs. 5 and 6, include precipitation (pr), eastward (ua) and northward wind (va) at various levels, and skin temperature (ts). The primary reference data sets are ERA-Interim for horizontal winds, Tropical Rainfall Measuring Mission 3B43 version 7 (TRMM-3B43-v7; Huffman et al., 2007) for rainfall, and HadISST (Rayner et al., 2003) for SST, although the models are evaluated against a wide range of other observational precipitation data sets (see Table 1) and an alternate reanalysis data set, the Modern-Era Retrospective Analysis for Research and Applications (MERRA; Rienecker et al., 2011).
Seasonal cycle of monthly rainfall averaged over the Indian region (5–30° N).
West Africa and the Sahel are highly dependent on seasonal rainfall
associated with the WAM. Rainfall in the region exhibits strong inter-decadal
variability (Nicholson et al., 2000), with major socio-economic impacts (Held
et al., 2005). Projecting the future response of the WAM to increasing
concentrations of greenhouse gases (GHG) is therefore of critical importance,
as is the ability to make dependable forecasts of the WAM evolution on
monthly to seasonal timescales. Current ESMs exhibit biases in their
representation of both the mean state (Cook and Vizy, 2006; Roehrig et al.,
2013) and temporal variability (Biasutti, 2013) of the WAM. Such biases can
affect the skill of monthly to seasonal predictions of the WAM as well as
long-term future projections. CMIP5 coupled models often exhibit warm SST
biases in the equatorial Atlantic, which induce a southward shift of the WAM
in summer (Richter et al., 2014). Because of the zonal symmetry of the WAM, zonally averaged quantities are commonly used to characterize its structure.
Precipitation (mm day^-1).
To evaluate key aspects of the WAM, two namelists have been implemented in the ESMValTool (v1.0).
The Pacific Decadal Oscillation (PDO) as simulated by 41 CMIP5 models (individual panels labelled by model name) and observations (upper left panel) for the historical period 1900–2005. These patterns show the global SST anomalies associated with the PDO.
Modes of natural climate variability from interannual to multi-decadal timescales are important as they have large impacts on the regional and even global climate with attendant socio-economic impacts. Characterization of internal (i.e. unforced) climate variability is also important for the detection and attribution of externally forced climate change signals (Deser et al., 2012, 2014). Internally generated modes of variability also complicate model evaluation and intercomparison. As these modes are spontaneously generated, they do not need to exhibit the same chronological sequence in models as in nature. However, their statistical properties (e.g. timescale, autocorrelation, spectral characteristics, and spatial patterns) are captured to varying degrees of skill among climate models. Despite their importance, systematic evaluation of these modes remains a daunting task given the wide time range to consider, the length of the data record needed to adequately characterize them, the importance of sub-surface oceanic processes, and uncertainties in the observational records (Deser et al., 2010).
In order to assess natural modes of climate variability in models, the NCAR
Climate Variability Diagnostics Package (CVDP, Phillips et al., 2014) has
been implemented into the ESMValTool. The CVDP has been developed as a
standalone tool. To allow for easy updating of the CVDP once a new version is
released, the structure of the CVDP is kept in its original form and it is called through a single namelist.
Depending on the climate mode analysed, the CVDP package uses the following variables: precipitation (pr), sea level pressure (psl), near-surface air temperature (tas), skin temperature (ts), snow depth (snd), and the basin-average ocean meridional overturning mass stream function (msftmyz). The models are evaluated against a wide range of observations and reanalysis data, for example NCEP for near-surface air temperature, HadISST for skin temperature, and the NOAA-CIRES Twentieth Century Reanalysis Project (Compo et al., 2011) for sea level pressure. Additional observations or reanalyses can be added by the user for these variables. The ESMValTool (v1.0) namelist runs on all CMIP5 models. As an example, Fig. 8 shows the representation of the PDO as simulated by 41 CMIP5 models and observations (HadISST), and Fig. 9 the mean AMOC from 13 CMIP5 models.
Long-term annual mean Atlantic Meridional Overturning Streamfunction
(AMOC; Sv) as simulated by 13 CMIP5 models (individual panels labelled by
model name) for the historical period 1900–2005. AMOC annual averages are
formed, weighted by the cosine of the latitude and by the depth of the
vertical layer, and then the data are masked by setting to missing all areas where the variance is less than 1.
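The weighting described in the caption can be sketched as follows; this is illustrative only (the actual diagnostic operates on the full netCDF stream function fields), and all names are ours:

```python
import math

def weighted_mean(values, lats, layer_depths):
    """Average a (lat, depth) field using cos(latitude) and layer-depth
    weights, as done when forming the AMOC annual means."""
    total, wsum = 0.0, 0.0
    for i, lat in enumerate(lats):
        for j, dz in enumerate(layer_depths):
            w = math.cos(math.radians(lat)) * dz
            total += w * values[i][j]
            wsum += w
    return total / wsum

lats = [0.0, 30.0, 60.0]
depths = [100.0, 400.0]  # layer thicknesses in metres
field = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
mean = weighted_mean(field, lats, depths)
# Low latitudes carry more weight, so the result lies below the
# unweighted mean of 2.0.
assert mean < 2.0
```

The cosine factor accounts for the convergence of meridians (grid cells cover less area at high latitudes), and the layer-depth factor gives thick ocean layers proportionally more influence on the vertical average.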
The Madden–Julian Oscillation (MJO) is the dominant mode of tropical intraseasonal variability (30–80 days) and has wide impacts on numerous regional climate and weather phenomena (Madden and Julian, 1971). Associated with enhanced convection in the tropics, the MJO exerts a significant influence on monsoon precipitation, e.g. on the South Asian monsoon (Pai et al., 2011) and on the West African monsoon (Alaka and Maloney, 2012). The eastward propagation of the MJO into the West Pacific can trigger the onset of some El Niño events (Feng et al., 2015; Hoell et al., 2014). The MJO also influences tropical cyclogenesis in various ocean basins (Klotzbach, 2014). Increased vertical resolution in the atmosphere and better representation of stratospheric processes have led to an improvement in MJO fidelity in CMIP5 compared to CMIP3 (Lin et al., 2006). However, current-generation models still struggle to adequately capture the eastward propagation of the MJO (Hung et al., 2013), and the variance intensity is typically too weak. Identifying and reducing such biases will be important for ESMs to accurately represent important climate phenomena, such as regional precipitation variability in the tropics arising through the differing impact of MJO phases on ENSO and ENSO-forced regional climate anomalies (Hoell et al., 2014).
To assess the main MJO features in ESMs, a namelist with a number of
diagnostics developed by the US CLIVAR MJO Working Group (Kim et al., 2009;
Waliser et al., 2009) has been implemented in the ESMValTool (v1.0).
Observation and reanalysis data sets include GPCP-1DD for precipitation,
ERA-Interim and NCEP-DOE reanalysis 2 for wind components (Kanamitsu et al.,
2002) and NOAA polar-orbiting satellite data for outgoing longwave radiation (OLR; Liebmann and Smith, 1996). The majority of the scripts are based on publicly available example scripts.
May–October wavenumber-frequency spectra.
In addition to the previously discussed biases in precipitation, many ESMs that rely on parameterized convection exhibit biases in the diurnal cycle and timing of precipitation. Over land, ESMs tend to simulate a diurnal cycle of continental convective precipitation in phase with insolation, whereas observed precipitation peaks in the early evening. This constitutes one of the endemic biases of ESMs, in which convective precipitation intensity is often tied directly to atmospheric instability. This bias can have important implications for the simulated climate, as the timing of precipitation influences subsequent surface evaporation, and convective clouds affect radiation differently around noon than in the late afternoon. The biases in the diurnal cycle are most pronounced over land, where the daytime diurnal cycles of convection and clouds contribute to the continental warm bias (Cheruy et al., 2014). Similar biases in the diurnal cycle also exist over the ocean (Jiang et al., 2015).

Another motivation for looking at the diurnal cycle in models is that its representation is more closely linked to the parameterizations of surface fluxes, boundary-layer, convection, and cloud processes than most other diagnostics. The phase of precipitation and radiative fluxes during the day is the consequence of surface warming, boundary-layer turbulent mixing, and moistening by cumulus clouds, as well as of the triggering criteria used to activate deep convection and the closure used to compute convective intensity. The evaluation of the diurnal cycle thus provides direct insight into the representation of physical processes in a model. Recent efforts to improve the representation of the diurnal cycle of precipitation in models include modifying the convective entrainment rate, revisiting the quasi-equilibrium hypothesis for shallow and deep convection, and adding representations of key missing processes such as boundary-layer thermals or cold pools.
We envisage that ESMValTool will help to quantify the impact of those improvements in the next generation of ESMs.
Mean diurnal cycle of precipitation (mm h⁻¹)
To help document progress made in the representation of the diurnal cycle of
precipitation (pr) in models, a set of diagnostics has been implemented in
the ESMValTool. After regridding all data on a common
2.5° × 2.5° grid
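The diurnal-cycle diagnostics composite sub-daily precipitation into a mean 24 h cycle and characterize its timing. A minimal sketch of such a composite and a first-harmonic phase estimate (the function names and the 3-hourly sampling are illustrative assumptions, not the actual ESMValTool implementation):

```python
import numpy as np

def mean_diurnal_cycle(pr, hours_per_step=3):
    # Composite a 1-D sub-daily precipitation series (one value per
    # 3 h step by default) into the mean diurnal cycle.
    steps = 24 // hours_per_step
    pr = np.asarray(pr, dtype=float)
    n = (len(pr) // steps) * steps  # drop any incomplete trailing day
    return pr[:n].reshape(-1, steps).mean(axis=0)

def diurnal_phase(cycle, hours_per_step=3):
    # Local time (h) of the diurnal maximum, estimated from the first
    # harmonic of the composite; time stamps at interval centres.
    t = np.arange(len(cycle)) * hours_per_step + hours_per_step / 2.0
    omega = 2.0 * np.pi / 24.0
    a = np.sum(cycle * np.cos(omega * t))
    b = np.sum(cycle * np.sin(omega * t))
    return (np.arctan2(b, a) / omega) % 24.0
```

The first-harmonic fit makes the phase estimate robust to the coarse 3-hourly sampling of typical CMIP output.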
Climatological (1985–2005) annual-mean cloud radiative effects from
the CMIP5 models against CERES EBAF (2001–2012) in W m⁻².
Clouds are a key component of the climate system because of their large impact on the radiation budget as well as their crucial role in the hydrological cycle. The simulation of clouds in climate models has been challenging because of the many non-linear processes involved (Boucher et al., 2013). Simulations of long-term mean cloud properties from the CMIP3 and CMIP5 models show large biases compared to observations (Chen et al., 2011; Klein et al., 2013; Lauer and Hamilton, 2013). Such biases have a range of implications, as they affect the application of these models to investigate chemistry–climate and aerosol–cloud interactions, while also having an impact on the climate sensitivity of the model.
The namelist
The cloud namelist focuses on precipitation (pr) and four cloud parameters that largely determine the impact of clouds on the radiation budget and thus on climate in the model simulations: total cloud amount (clt), liquid water path (lwp), ice water path (iwp), and the TOA cloud radiative effect (CRE), consisting of the longwave and shortwave CRE, which can also be evaluated separately with the performance metrics namelist (see Sect. 4.1.1). Precipitation is evaluated against GPCP data, total cloud amount against MODIS, liquid water path against passive-microwave satellite observations from the University of Wisconsin (O'Dell et al., 2008), and ice water path against MODIS data processed for the Cloud Feedback Model Intercomparison Project (MODIS-CFMIP; Pincus et al., 2012; King et al., 2003).
The cloud–climate radiative feedback process remains one of the largest
sources of uncertainty in determining the climate sensitivity of models
(Boucher et al., 2013). Traditionally, clouds have been evaluated in terms of
their impact on the mean top of atmosphere fluxes. However, it is possible to
achieve good performance on these quantities through compensating errors; for
example, boundary layer clouds may be too reflective but have insufficient
horizontal coverage (Nam et al., 2012). Williams and Webb (2009) proposed a
Cloud Regime Error Metric (CREM) which critically tests the ability of a
model to simulate both the relative frequency of occurrence and the radiative
properties correctly for a set of cloud regimes determined by the daily mean
cloud top pressure, in-cloud albedo and fractional coverage at each grid box.
Having previously identified the regimes by clustering joint cloud-top
pressure-optical depth histograms from the International Satellite Cloud
Climatology Project (ISCCP, Rossow and Schiffer, 1999) as per Williams and
Webb (2009), each daily model grid box is assigned to the regime cluster
centroid with the closest cloud top pressure, in-cloud albedo and fractional
coverage as determined by the three-element Euclidean distance. The fraction
of grid points assigned to each of the regimes and the mean radiative
properties of those grid points are then compared to the observed values.
This routine also uses a bilinear regridding method with a
2.5° × 2.5° grid.
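The regime-assignment step described above can be sketched as follows. Note that the centroid values below are placeholder numbers for illustration only; the actual cluster centroids are tabulated in Williams and Webb (2009):

```python
import numpy as np

# Hypothetical regime centroids in (scaled cloud-top pressure,
# in-cloud albedo, cloud fraction) space -- placeholders, NOT the
# Williams and Webb (2009) values.
CENTROIDS = np.array([
    [0.30, 0.40, 0.90],   # stratocumulus-like regime
    [0.70, 0.30, 0.30],   # shallow-cumulus-like regime
    [0.20, 0.65, 0.95],   # frontal/deep regime
])

def assign_regimes(ctp, albedo, cf):
    # Assign each daily grid box to the nearest centroid using the
    # three-element Euclidean distance.
    X = np.stack([ctp, albedo, cf], axis=-1)              # (..., 3)
    d = np.linalg.norm(X[..., None, :] - CENTROIDS, axis=-1)
    return np.argmin(d, axis=-1)

def regime_frequencies(labels, n_regimes=len(CENTROIDS)):
    # Fraction of grid boxes assigned to each regime.
    counts = np.bincount(labels.ravel(), minlength=n_regimes)
    return counts / labels.size
```

The same labels can then be used to composite the mean radiative properties of each regime for comparison against the ISCCP-derived observations.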
This metric is now implemented in the ESMValTool (v1.0), with references in
the code to tables in the Williams and Webb (2009) study defining the cluster
centroids [
Cloud Regime Error Metric (CREM) from Williams and Webb (2009)
applied to some CMIP5 AMIP simulations with the required data in the archive.
The results show that MIROC5 is the best-performing model on this metric,
with the other models performing slightly worse. The red dashed line shows the
observational uncertainty estimated from applying this metric to independent
data from MODIS. An advantage of the metric is that its components can be
decomposed to investigate the reasons for poor performance. This requires
extra print statements compared to the default code but might help to
identify, for instance, cloud regimes that are too reflective or simulated
too frequently at the expense of some of the other regimes. Produced with
Analysis of ocean model data from ESMs poses several unique challenges.
First, in order to avoid numerical singularities in their
calculations, ocean models often use irregular grids where the poles have
been rotated or moved to be located over land areas. For example, the global
configuration of the Nucleus for European Modelling of the Ocean (NEMO)
framework uses a tripolar grid (Madec, 2008), with the three poles located
over Siberia, Canada, and Antarctica. Second, transports of scalar quantities
(e.g. overturning stream functions and heat transports) can only be
calculated accurately on the original model grids as interpolation to other
grids introduces errors. This means that, for example, for the calculation of water
transport through a strait, both the horizontal and vertical extent of the
grids on which the
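On such irregular grids, global or regional averages must use the true cell areas supplied with the model output rather than assuming a regular latitude–longitude spacing. A minimal sketch (illustrative helper, not the ESMValTool code):

```python
import numpy as np

def area_weighted_mean(field, cell_area, mask=None):
    # Mean of a 2-D field on an arbitrary (e.g. tripolar NEMO) grid:
    # weight every value by its true cell area instead of assuming
    # regular lat-lon spacing; mask=True marks valid (ocean) points.
    field = np.asarray(field, dtype=float)
    w = np.asarray(cell_area, dtype=float)
    if mask is not None:
        w = np.where(mask, w, 0.0)
    return np.sum(field * w) / np.sum(w)
```

The cell-area array (e.g. NEMO's `e1t * e2t` metric terms) carries all the grid distortion, so the same helper works unchanged on regular and rotated-pole grids.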
Earth system models often show large biases in the Southern Ocean mixed
layer. For example, Sterl et al. (2012) showed that in EC-Earth/NEMO the
Southern Ocean is too warm and its salinity too low, while the mixed layer is too
shallow. These biases are not specific to EC-Earth, but are rather
widespread. At the same time, values for Antarctic Circumpolar Current (ACC)
transport vary between 90 and 264 Sv in CMIP5 models, with a mean of
155 Sv.
Annual-mean difference between EC-Earth/NEMO and ERA-Interim sea
surface temperatures
A namelist has been implemented in the ESMValTool to analyse these biases
[
One leading cause of SST biases in the Southern Ocean is systematic biases in surface radiation fluxes (Trenberth and Fasullo, 2010) coupled with systematic errors in macrophysical (e.g. cloud amount) and microphysical (e.g. frequency of mixed-phase clouds) cloud properties (Bodas-Salcedo et al., 2014).
A namelist has been implemented in the ESMValTool that compares model
estimates of cloud, radiation, and surface turbulent flux variables over the
Southern Ocean with suitable observations
[
The following diagnostics are calculated with accompanying plots:
(i) seasonal mean absolute-value and difference maps for model data versus
observations covering the Southern Ocean region (30–65° S);
Upper panel: covariability between incoming surface shortwave
radiation (rsds) and total cloud cover (clt). Lower panel: fraction
occurrence histograms of binned cloud cover: observations are CERES-EBAF
(radiation) and CloudSat (cloud cover). The CanESM2 model from the CMIP5
archive is shown as an example for comparison to observations (the namelist
runs on all CMIP5 models). CanESM2 generally reproduces the observed slope of
rsds as a function of clt, although there is a systematic positive bias in
the amount of shortwave radiation reaching the surface for most cloud cover
values. A positive bias is also seen in the CanESM2 histogram of cloud
occurrence, with a strong peak in seasonal cloud fraction of 90 % in most
seasons. Produced with
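The covariability and occurrence diagnostics shown in the figure amount to binning the surface shortwave flux by total cloud cover and counting how often each cloud-cover bin occurs. A sketch of that binning (function name and bin choices are illustrative):

```python
import numpy as np

def binned_covariability(clt, rsds, bin_edges):
    # Mean incoming surface shortwave (rsds) in each total-cloud-cover
    # (clt) bin, plus the fractional occurrence of each bin.
    clt = np.asarray(clt, dtype=float).ravel()
    rsds = np.asarray(rsds, dtype=float).ravel()
    idx = np.digitize(clt, bin_edges) - 1
    nbins = len(bin_edges) - 1
    mean_rsds = np.full(nbins, np.nan)
    occurrence = np.zeros(nbins)
    for b in range(nbins):
        sel = idx == b
        occurrence[b] = sel.mean()
        if sel.any():
            mean_rsds[b] = rsds[sel].mean()
    return mean_rsds, occurrence
```

Applied to model and observed (CERES-EBAF/CloudSat) fields, the first output gives the rsds-vs-clt slope in the upper panel and the second the occurrence histogram in the lower panel.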
An accurate representation of the tropical climate is fundamental for ESMs. The majority of the solar energy received by the Earth arrives in the tropics, and the potential for thermal emission of absorbed energy back into space is also largest in the tropics, owing to the high column concentrations of water vapour at low latitudes (Pierrehumbert, 1995; Stephens and Greenwald, 1991). Coupled interactions between equatorial SSTs, surface wind stress, precipitation, and upper-ocean mixing are central to many tropical biases in ESMs. This is the case both for the mean state and for key modes of variability influenced by, or interacting with, the mean state (e.g. ENSO; Choi et al., 2011). Such biases are often reflected in a “double ITCZ” seen in the majority of CMIP3 and CMIP5 coupled models (Li and Xie, 2014; Oueslati and Bellon, 2015). The double ITCZ bias occurs when models fail to simulate a single, year-round ITCZ rainfall maximum north of the Equator; instead, an unrealistic secondary maximum south of the Equator is present for part or all of the year. Such biases are particularly prevalent in the tropical Pacific, but can also occur in the Atlantic (Oueslati and Bellon, 2015). The double ITCZ is often accompanied by an overextension of the eastern Pacific equatorial cold tongue into the central Pacific, collocated with a positive bias in easterly near-surface wind speeds and a shallow bias in ocean mixed-layer depth (Lin, 2007). Such biases can directly impact the ability of an ESM to accurately represent ENSO variability (An et al., 2010; Guilyardi, 2006) and its potential sensitivity to climate change (Chen et al., 2015), with negative consequences for a range of simulated features, such as regional tropical temperature and precipitation variability, monsoon dynamics, and ocean and terrestrial carbon uptake (Iguchi, 2011; Jones et al., 2001).
To assess such tropical biases with the ESMValTool, we have implemented a
namelist with diagnostics motivated by the work of Li and Xie (2014):
Latitude cross section of seasonal and zonally averaged values of
SSTs and precipitation for the tropical Pacific (zonal averages are made
between 120
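Such latitude cross sections reduce to a zonal average over a fixed longitude window. A sketch with hypothetical Pacific-sector bounds (the namelist's actual averaging window follows Li and Xie, 2014):

```python
import numpy as np

def zonal_mean_section(field, lons, lon_bounds=(120.0, 280.0)):
    # Average a (lat, lon) field over a longitude window (hypothetical
    # default bounds shown), yielding a latitude cross section.
    lons = np.asarray(lons, dtype=float)
    sel = (lons >= lon_bounds[0]) & (lons <= lon_bounds[1])
    return np.asarray(field, dtype=float)[:, sel].mean(axis=1)
```

Applying this to seasonal-mean SST and precipitation for each model and for the observations produces the paired curves in the cross-section figure.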
Sea ice is a key component of the climate system through its effects on radiation and seawater density. A reduction in sea ice area results in increased absorption of shortwave radiation, which warms the sea ice region and contributes to further sea ice loss. This process is often referred to as the sea ice albedo feedback and is part of the Arctic amplification phenomenon. CMIP5 models tend to underestimate the decline in summer Arctic sea ice extent observed by satellites during the last decades (Stroeve et al., 2012), which may be related to an underestimation of the sea ice albedo feedback in the models (Boé et al., 2009). Conversely, in the Antarctic, observations show a small increase in March sea ice extent, while the CMIP5 models simulate a small decrease (Flato et al., 2013; Stroeve et al., 2012). It is therefore important that model sea ice processes are evaluated and improvements regularly assessed. Caveats have been noted regarding the limitations of using only sea ice extent as a metric of model performance (Notz et al., 2013), since sea ice concentration, volume, drift, thickness, and surface albedo, as well as processes such as melt pond formation and summer sea ice melt, are all important sea-ice-related quantities. In addition, atmospheric forcings (e.g. wind, clouds, and snow) and ocean forcings (e.g. salinity and ocean transport) impact the sea ice state and evolution.
In ESMValTool (v1.0) the sea ice namelist includes diagnostics that cover sea
ice extent and concentration [
Time series (1960–2005) of September mean Arctic sea ice extent
from the CMIP5 historical simulations. The CMIP5 ensemble mean is highlighted
in dark red and the individual ensemble members of each model (coloured
lines) are shown with different line styles. The model results are compared to
observations from the NSIDC (1978–2005, black solid line) and the Hadley
Centre sea ice and sea surface temperature (HadISST, 1960–2005, black dashed
line). Consistent with observations, most CMIP5 models show a downward trend
in sea ice extent over the satellite era. The range in simulated sea ice is
however quite large (between 3.2 and 12.1
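Sea ice extent is conventionally defined as the total area of all grid cells whose sea ice concentration is at or above a threshold, commonly 15 %. A minimal sketch of that calculation (illustrative helper, not the namelist code):

```python
import numpy as np

def sea_ice_extent(sic, cell_area, threshold=15.0):
    # Total area (same units as cell_area) of cells whose sea ice
    # concentration sic (in %) meets the 15 % threshold.
    sic = np.asarray(sic, dtype=float)
    return float(np.sum(np.where(sic >= threshold, cell_area, 0.0)))
```

Evaluating this for every September of 1960–2005 in each ensemble member yields the time series plotted in the figure.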
The representation of land surface processes and fluxes in climate models critically affects the simulation of near-surface climate over land. In particular, energy partitioning at the surface strongly influences surface temperature, and it has been suggested that temperature biases in ESMs can be in part related to biases in evapotranspiration. The most notable feature in the majority of CMIP3 and CMIP5 models is a tendency to overestimate evapotranspiration globally (Mueller and Seneviratne, 2014).
A diagnostic to analyse the representation of evapotranspiration in ESMs has
been included in the ESMValTool
[
Bias in evapotranspiration (mm day⁻¹)
Evaluation of precipitation is a challenge due to potentially large errors and uncertainty in observed precipitation data (Biemans et al., 2009; Legates and Willmott, 1990). An alternative, or an addition, to the direct evaluation of precipitation over land (such as the global precipitation evaluation included in Sect. 4.1.2) is the evaluation of river runoff, which can in principle be measured with comparatively small errors for most rivers. Routine measurements are performed for many large rivers, generating a large global database (e.g. available from the Global Runoff Data Centre; GRDC, Dümenil Gates et al., 2000). The length of the available time series, however, varies between rivers, with large data gaps especially in recent years. The evaluation of runoff against river gauge data can provide a useful independent measure of the simulated hydrological cycle. If both river flow and precipitation are given with reasonable accuracy, it also provides an observational constraint on model surface evaporation, provided that the averaging periods are long enough for changes in surface water storage to be negligible (Hagemann et al., 2013), e.g. climatological means of 20 years or more. For present climate conditions, ESMs often exhibit a dry and warm near-surface bias during summer over mid-latitude continents (Hagemann et al., 2004). Continental dry biases in precipitation exist in the majority of CMIP5 models over South America, the Midwest of the US, the Mediterranean region, central and eastern Europe, and western and South Asia (Fig. 4 of this paper and Fig. 9.4 of Flato et al., 2013). These precipitation biases often transfer into dry biases in runoff, but dry biases in runoff can also be caused by excessive evapotranspiration (Hagemann et al., 2013). 
In order to relate biases in runoff to biases in precipitation and evapotranspiration, the catchment-oriented evaluation in this section considers biases in all three variables. This means that the respective variables are considered to be spatially averaged over the drainage basins of large rivers.
Besides bias maps, a set of diagnostics producing basin-scale comparisons of
runoff (mrro), evapotranspiration (evspsbl), and precipitation (pr) has also
been implemented in ESMValTool [
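The basin-scale comparison reduces to area-weighted catchment averages and a runoff coefficient (runoff/precipitation) for each basin. A sketch of the bias calculation (illustrative helpers, not the ESMValTool routine):

```python
import numpy as np

def basin_mean(field, basin_mask, cell_area):
    # Area-weighted average of a field over one river catchment.
    w = np.where(basin_mask, cell_area, 0.0)
    return np.sum(np.asarray(field, dtype=float) * w) / np.sum(w)

def runoff_coefficient_bias(mrro_mod, pr_mod, mrro_obs, pr_obs,
                            basin_mask, cell_area):
    # Runoff coefficient = basin-mean runoff / basin-mean precipitation;
    # the bias is the model coefficient minus the observed one.
    coeff_mod = (basin_mean(mrro_mod, basin_mask, cell_area)
                 / basin_mean(pr_mod, basin_mask, cell_area))
    coeff_obs = (basin_mean(mrro_obs, basin_mask, cell_area)
                 / basin_mean(pr_obs, basin_mask, cell_area))
    return coeff_mod - coeff_obs
```

A negative bias in the coefficient with unbiased precipitation, as seen for MPI-ESM-LR in the figure, points to an overestimated evapotranspiration fraction.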
Biases in runoff coefficient (runoff/precipitation) and
precipitation for major catchments of the globe. The MPI-ESM-LR historical
simulation is used as an example. Even though positive and negative
precipitation biases exist for MPI-ESM-LR in the various catchment areas, the
bias in the runoff coefficient is usually negative. This implies that the
fraction of evapotranspiration generally tends to be overestimated by the
model independently of whether precipitation has a positive or negative bias.
Produced with
Relative space–time RMSE calculated from the 1986–2005
climatological seasonal cycle of the CMIP5 historical simulations over
different sub-domains for net biosphere productivity (nbp), leaf area index
(lai), gross primary productivity (gpp), precipitation (pr) and near-surface
air temperature (tas). The RMSE has been normalized with the maximum RMSE in
order to have a skill score ranging between 0 and 1. A score of 0 indicates
poor performance in reproducing the phase and amplitude of the reference mean
annual cycle, while a perfect score is equal to 1. The
comparison suggests that there is no clearly superior model for all
variables. All models have significant problems in representing some key
biogeochemical variables such as nbp and lai, with the largest errors in the
tropics, mainly because of too weak a seasonality. Similar to Fig. 18 of Anav
et al. (2013) and produced with
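The normalization described in the caption (RMSE scaled by the ensemble maximum so that 1 is perfect and 0 is the worst model) can be sketched as follows; the RMSE is shown for a 1-D climatological cycle for simplicity:

```python
import numpy as np

def seasonal_cycle_rmse(model_cycle, obs_cycle):
    # RMSE of a climatological seasonal cycle (1-D over the 12
    # climatological months here; the full diagnostic is space-time).
    d = (np.asarray(model_cycle, dtype=float)
         - np.asarray(obs_cycle, dtype=float))
    return float(np.sqrt(np.mean(d ** 2)))

def skill_scores(rmses):
    # Normalize each model's RMSE by the ensemble maximum:
    # 1 = perfect, 0 = worst model (cf. Anav et al., 2013).
    rmses = np.asarray(rmses, dtype=float)
    return 1.0 - rmses / rmses.max()
```

Because the worst model sets the scale, the scores measure relative rather than absolute performance within the analysed ensemble.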
A realistic representation of the global carbon cycle is a fundamental
requirement for ESMs. In the past, climate models were directly forced by
atmospheric CO2
The diagnostics implemented in the ESMValTool to evaluate simulated
terrestrial biogeochemistry are based on the study of Anav et al. (2013) and
span several timescales: climatological means, and intra-annual (seasonal
cycle), interannual, and long-term trends
[
For land, diagnostics of the land carbon sink net biosphere productivity
(nbp) are essential. Although direct observations are not available, nbp can
be estimated from atmospheric CO2
Marine biogeochemistry models form a core component of ESMs and require
evaluation for multiple passive tracers. The increasing availability of
quality-controlled global biogeochemical data sets for the historical period
(e.g. Surface Ocean CO2
Error-bar plot showing the 1986–2005 CMIP5 integrated nbp for
different land subdomains. Positive values of nbp correspond to land uptake;
vertical bars represent the interannual variation. The models
are compared to JMA inversion estimates. The models' range is very large and
results show that ESMs fail to accurately reproduce the global net land
CO2
A namelist is provided that includes diagnostics to support the evaluation of
ocean biogeochemical cycles at global scales, as simulated by both ocean-only
and coupled climate–carbon cycle ESMs
[
A diagnostic for oceanic net primary production (npp) is also implemented in
the ESMValTool for the climatological annual mean and seasonal cycle, as well
as for interannual variability over the 1986–2005 period
[
Tropospheric aerosols play a key role in the Earth system and have a strong influence on climate and air pollution. The global aerosol distribution is characterized by large spatial and temporal variability, which makes its representation in ESMs particularly challenging (Ghan and Schwartz, 2007). In addition, aerosol interactions with radiation (direct aerosol effect; Schulz et al., 2006) and with clouds (indirect aerosol effects; Lohmann and Feichter, 2005) need to be accounted for. Model-based estimates of anthropogenic aerosol effects are still affected by large uncertainties, mostly due to an incorrect representation of aerosol processes (Kinne et al., 2006). Myhre et al. (2013) report a substantial spread in simulated aerosol direct effects among 16 global aerosol models and attribute it to diversity in aerosol burden, aerosol optical properties, and aerosol optical depth (AOD). Diversity in black carbon (BC) burden of up to a factor of three, related to model disagreements in simulating deposition processes, was also found by Lee et al. (2013). Model meteorology can be a further source of diversity, since it affects atmospheric transport and aerosol lifetime; this in turn is related to simulated essential climate variables such as winds, humidity, and precipitation (see Sect. 4.1). Large biases also exist in simulated aerosol indirect effects (IPCC, 2013) and are often a result of systematic errors in both model aerosol and cloud fields (see Sect. 4.1.6).
Interannual variability in de-trended annual mean surface
To assess current biases in global aerosol models, the aerosol namelist of
the ESMValTool comprises several diagnostics to compare simulated aerosol
concentrations and optical depth at the surface against station data,
motivated by the work of Pringle et al. (2010), Pozzer et al. (2012), and
Righi et al. (2013) [
In the past, climate models were forced with prescribed tropospheric and stratospheric ozone concentrations, but since CMIP5 some ESMs have included interactive chemistry and are capable of representing prognostic ozone (Eyring et al., 2013; Flato et al., 2013). This allows models to simulate important chemistry–climate interactions and feedback processes. Examples include the increase in oxidation rates in a warmer climate, which leads to decreases in methane and its lifetime (Voulgarakis et al., 2013), or the increase in tropical upwelling (associated with the Brewer–Dobson circulation) in a warmer climate and the corresponding reductions in tropical lower-stratospheric ozone as a result of faster transport and less time for ozone production (Butchart et al., 2010; Eyring et al., 2010). It is thus becoming important to evaluate the simulated atmospheric composition in ESMs. A common high bias in the Northern Hemisphere and a low bias in the Southern Hemisphere have been identified in tropospheric column ozone simulated by chemistry–climate models participating in the Atmospheric Chemistry and Climate Model Intercomparison Project (ACCMIP), which could partly be related to deficiencies in the ozone precursor emissions (Young et al., 2013). Analysis of CMIP5 models with respect to trends in total column ozone shows that the multi-model mean of the models with interactive chemistry is in good agreement with observations, but that significant deviations exist for individual models (Eyring et al., 2013; Flato et al., 2013). Large variations in stratospheric ozone in models with interactive chemistry drive large variations in lower-stratospheric temperature trends. The results show that both ozone recovery and the rate of GHG increase determine future Southern Hemisphere summertime circulation changes and are important to consider in ESMs (Eyring et al., 2013).
Time series of global oceanic mean aerosol optical depth (AOD) from
individual CMIP5 models' historical (1850–2005) and RCP 4.5 (2006–2010)
simulations, compared with MODIS and ESACCI-AEROSOL satellite data. All
models simulate a positive trend in AOD starting around 1950. Some models
also show distinct AOD peaks in response to major volcanic eruptions, e.g. El
Chichón (1982) and Pinatubo (1991). The models simulate quite a wide range of
AODs, between 0.05 and 0.20 in 2010, which largely deviates from the observed
values from MODIS and ESACCI-AEROSOL. A significant difference, however,
exists also between the two satellite data sets (about 0.05), indicating an
observational uncertainty. Similar to Fig. 9.29 of Flato et al. (2013) and
produced with
The namelists implemented in the ESMValTool to evaluate atmospheric chemistry
can reproduce the analysis of tropospheric ozone and precursors of Righi et
al. (2015) [
Climatological annual-mean tropospheric column ozone averaged
between 2000 and 2005 from the CMIP5 historical simulations compared to
MLS/OMI observations (2005–2012). The values on top of each panel show the
global (area-weighted) average, calculated after regridding the data to the
horizontal grid of the model and ignoring the grid cells without available
observational data. The comparison shows a high bias in tropospheric column
ozone in the Northern Hemisphere and a low bias in the Southern Hemisphere in
the CMIP5 multi-model mean. Similar to Fig. 13 of Righi et al. (2015) and
produced with
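The masked, area-weighted global average described in the caption (cos-latitude weights on a regular grid, ignoring cells without observational coverage) can be sketched as follows (illustrative helper, not the namelist code):

```python
import numpy as np

def masked_global_mean(field, lats, obs_valid):
    # Cos-latitude area weights on a regular lat-lon grid; cells
    # without observational coverage get zero weight and are excluded.
    field = np.asarray(field, dtype=float)
    w = np.cos(np.deg2rad(np.asarray(lats, dtype=float)))[:, None]
    w = np.broadcast_to(w, field.shape).copy()
    w[~np.asarray(obs_valid)] = 0.0
    return float(np.sum(np.where(w > 0.0, field, 0.0) * w) / np.sum(w))
```

Using the same observational mask for both model and observations ensures the two global averages quoted above each panel are directly comparable.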
The relatively new research field of emergent constraints aims to link model
performance evaluation with future projection feedbacks. An emergent
constraint refers to the use of observations to constrain a simulated future
Earth system feedback. It is referred to as emergent because a relationship
between a simulated future projection feedback and an observable element of
climate variability emerges from an ensemble of ESM projections, potentially
providing a constraint on the future feedback. Emergent constraints can help
focus model development and evaluation onto processes underpinning
uncertainty in the magnitude and spread of future Earth system change.
Systematic model biases in certain forced modes, such as the seasonal cycle
of snow cover or interannual variability of tropical land CO2
To reproduce the analysis of Wenzel et al. (2014) that provides an emergent
constraint on future tropical land carbon uptake, a namelist is included in
ESMValTool (v1.0) to perform an emergent constraint analysis of the carbon
cycle–climate feedback parameter (
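In its simplest form, an emergent-constraint analysis is an inter-model linear regression of the simulated future feedback against an observable quantity, evaluated at the observed value. A sketch (the actual Wenzel et al. (2014) analysis additionally propagates observational and regression uncertainties):

```python
import numpy as np

def emergent_constraint(x_models, y_models, x_obs):
    # Least-squares regression of a future feedback (y) against an
    # observable quantity (x) across an ESM ensemble, then evaluation
    # of the fit at the observed value -- the constrained estimate.
    slope, intercept = np.polyfit(np.asarray(x_models, dtype=float),
                                  np.asarray(y_models, dtype=float), 1)
    return slope * x_obs + intercept, slope, intercept
```

A tight inter-model relationship (high correlation) is a prerequisite; without it, the regression line carries little constraining power.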
As new model versions are developed, standardized diagnostics suites as presented here allow model developers to compare their results against previous versions of the same model or against other models, e.g. CMIP5 models. Such analyses help to identify different aspects in a model that have either improved or degraded as a result of a particular model development. The benchmarking of ESMs using performance metrics (see Sect. 4.1.1) provides an overall picture of the quality of the simulation, whereas process-oriented diagnostics help determine whether the simulation quality improvements are for the correct underlying physical reasons and point to paths for further model improvement.
Total column ozone time series for
The ESMValTool is intended to support modelling centres with quality control of their CMIP DECK experiments and the CMIP6 historical simulation, as well as other experiments from CMIP6-Endorsed Model Intercomparison Projects (Eyring et al., 2015). A significant amount of institutional resources go into running, post-processing, and publishing model results from such experiments. It is important that centres can easily identify and correct potential errors in this process. The standardized analyses contained in the ESMValTool can be used to monitor the progress of CMIP experiments. While the tool is designed to accommodate a wide range of time axes and configurations, and many of the diagnostics may be run on control or future climate experiments, ESMValTool (v1.0) is largely targeted to evaluate AMIP and the CMIP historical simulations.
The ESMValTool can be run as a stand-alone tool or integrated into existing modelling workflows. The primary challenge is to provide CF/CMOR-compliant data. Not all modelling centres produce CF/CMOR-compliant data directly as part of their workflow, although more are doing so as the potential benefits are realized. For many groups, conversion to the CF/CMOR standards involves significant post-processing of native model output. This may require some groups to run the ESMValTool on their model output after conversion to CF/CMOR, or to create intermediate “CMOR-like” versions of the data. Users who wish to use native model output can take advantage of the flexibility of the reformatting routines (see Sect. 3) to create scripts that convert this data into the CF/CMOR standard. As an example, reformat scripts for the NOAA-GFDL, EMAC, and NEMO models are included with the initial release. These scripts are used to convert native model output for direct use with the ESMValTool. The reformatting capability may provide an alternative to the more expensive and complete “CMORization” processes usually required to formally publish model data on the ESGF.
Large international model inter-comparison projects such as CMIP stimulated the development of a globally distributed federation of data providers, supporting common data provisioning policies and infrastructures. ESGF is an international open-source effort to establish a distributed data and computing platform, enabling worldwide access to Peta- (in the future Exa-) byte-scale scientific climate data. Data can be searched via a globally distributed search index with access possible via HTTP, OpenDAP, and GridFTP. To efficiently run the ESMValTool on CMIP model data and observations alongside the ESGF, the necessary data hosted by the ESGF have to be made locally accessible at the site where ESMValTool is executed. There are various ways this might be achieved. One possibility is to run ESMValTool separately at each site holding data sets required by the analysis; then, combine the results. However, this is limited by the extent to which calculations can be performed without requiring data from another site. A more practical possibility is running ESMValTool alongside a large store of replica data sets gathered from across the ESGF, so that all the required data are in one location. Certain large ESGF sites (e.g. DKRZ, BADC, IPSL, PCMDI) provide replica data set stores, and ESMValTool has been run in such a way at several of these sites.
Replica data set stores do not provide a complete solution however, as it is
impossible to replicate all ESGF data sets at one site, so circumstances will
arise when one or more required data sets are not available locally. The
obvious solution is to download these data sets from elsewhere in the ESGF,
and store them locally whilst the analysis is carried out. The indexed search
facility provided by the ESGF makes it easy to identify the download URL of
such “remote” data sets, and a prototype of the ESMValTool (not included in
v1.0) has been developed that performs this search automatically using
esgf-pyclient (
The Earth System Model Evaluation Tool (ESMValTool) is a diagnostics package
for routine evaluation of Earth System Models (ESMs) with observations and
reanalysis data, or for comparison with results from other models. The
ESMValTool has been developed to facilitate the evaluation of complex ESMs at
individual modelling centres and to help streamline model evaluation
standards within CMIP. Priorities to date that are included in ESMValTool
(v1.0) described in this paper concentrate on selected systematic biases that
were a focus of the European Commission's 7th Framework Programme “Earth
system Model Bias Reduction and assessing Abrupt Climate change” (EMBRACE)
project, the DLR Earth System Model Evaluation (ESMVal) project, and other
collaborative projects, in particular: performance metrics for selected ECVs,
coupled tropical climate variability, monsoons, Southern Ocean processes,
continental dry biases and soil hydrology–climate interactions, atmospheric
CO2
Schematic overview of the coupling of the ESMValTool to the ESGF.
ESMValTool (v1.0) can be used to compare new model simulations against CMIP5 models and observations for the selected scientific themes much faster than was previously possible. Model groups who wish to perform this comparison before submitting their CMIP6 historical simulations or AMIP experiments to the ESGF can do so, since the tool is provided as open-source software. In order to run the tool locally, observations need to be downloaded and, for tiers 2 and 3, reformatted with the help of the reformatting scripts that are included. Model output needs to be either in CF-compliant netCDF, or a reformatting routine needs to be written by the modelling group, following the given examples for EMAC, the GFDL models, and NEMO.
Users of the ESMValTool (v1.0) need to be aware that it covers only a subset of the wide range of model behaviour that the community aims to characterize, and its results need to be interpreted accordingly. Over time, the ESMValTool will be extended with additional diagnostics and performance metrics. A particular focus will be to integrate additional diagnostics that can reproduce the analysis of the climate model evaluation chapter of IPCC AR5 (Flato et al., 2013) as well as of the projections chapter (Collins et al., 2013). We will also extend the tool with diagnostics to quantify forcings and feedbacks in the CMIP6 simulations and to calculate metrics such as the equilibrium climate sensitivity (ECS), the transient climate response (TCR), and the transient climate response to cumulative carbon emissions (TCRE) (IPCC, 2013). While the inclusion of these diagnostics is straightforward, the evaluation of processes and phenomena to improve understanding of the sources of errors and uncertainties in models, which we also plan to enhance, remains a scientific challenge. The field of emergent constraints remains in its infancy, and more research is required on how to better link model performance to projections (Flato et al., 2013). In addition, an improved consideration of the interdependency of models in the evaluation of a multi-model ensemble (Sanderson et al., 2015a, b), as well as of internal variability in ESM evaluation, is required.
A critical aspect in ESM evaluation is the availability of consistent, error-characterized global and regional Earth observations, as well as accurate globally gridded reanalyses that are constrained by assimilated observations. Additional or longer records of observations and reanalyses will be used as they become available, with a focus on using obs4MIPs – including new contributions from the European Space Agency's Climate Change Initiative (ESA CCI) – and ana4MIPs data. The ESMValTool can consider observational uncertainty in different ways, e.g. through the use of more than one observational data set to directly evaluate the models, by showing the difference between the reference data set and alternative observations, or by including an observed uncertainty ensemble that spans the observed uncertainty range (e.g. available for the surface temperature data set compiled for HadISST). Often, however, uncertainties in the observations are not readily available. Reliable and robust error characterization and estimation of observations is a high priority throughout the community, and obs4MIPs and other efforts that create data sets for model evaluation should encourage the inclusion of such uncertainty estimates as part of each data set.
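The first of these approaches, evaluating a model against several reference data sets, can be sketched as follows. The scalars stand in for area-weighted means, and all names and numbers are illustrative, not taken from any actual data set.

```python
# Hypothetical sketch of exposing observational uncertainty by
# comparing a model value against several reference data sets.
# Scalars stand in for area-weighted means; names are illustrative.

def biases(model_value, references):
    """Model-minus-observation bias against each reference data set."""
    return {name: model_value - obs for name, obs in references.items()}

def obs_spread(references):
    """Range spanned by the reference data sets themselves: if the
    inter-model bias is smaller than this, the choice of reference
    data set matters as much as the model error."""
    vals = list(references.values())
    return max(vals) - min(vals)

refs = {"obs_A": 287.1, "obs_B": 286.8}  # hypothetical global-mean tas (K)
model_biases = biases(287.5, refs)
spread = obs_spread(refs)
```

In practice the same comparison is done field-by-field on gridded data, but the interpretation is identical: the spread among references bounds how precisely the bias can be stated.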
The ESMValTool will be contributed to the analysis code catalogue being
developed by the WGNE/WGCM climate model metrics panel. The purpose of this
catalogue is to make the diversity of existing community-based analysis
capabilities more accessible and transparent, and ultimately for developing
solutions to ensure they can be readily applied to the CMIP DECK and the
CMIP6 historical simulation in a coordinated way. We are currently exploring
options to interface with complementary efforts, e.g. the PCMDI Metrics
Package (PMP, Gleckler et al., 2016) and the Auto-Assess package that is
under development at the UK Met Office. An international strategy for
organising and presenting CMIP results produced by various diagnostic tools
is needed, and this will be a priority for the WGNE/WGCM climate metrics
panel in collaboration with the CMIP Panel.
This paper presents ESMValTool (v1.0), which allows users to repeat all the analyses shown. Additional updates and improvements will be included in subsequent versions of the software, which are planned to be released on a regular basis. The ESMValTool works on CMIP5 simulations and, given that the CMIP DECK and CMIP6 simulations will be in a similar format, it will be straightforward to run the package on these simulations. A limiting factor at present is the need to download all data to a local cache. This limitation has spurred development that allows the ESMValTool to run alongside the ESGF at one of the data nodes; a prototype that couples the tool to the ESGF already exists (see Sect. 5.3). An additional limiting factor is that the model output from all CMIP models has to be mirrored to the ESGF data node where the tool is installed. This is facilitated by providing a listing of the variables and time frequencies that are used in ESMValTool (v1.0), which amounts to a significantly smaller volume than the full data request for the CMIP DECK and CMIP6 simulations. This reduced set of data could be mirrored with priority.
Several technical improvements are required to make the software package more efficient. One current limitation is the lack of parallelization: given the huge amount of data involved in a typical CMIP analysis, running on a single processor can be highly CPU-time-intensive. In future releases, the possibility of parallelizing the tool will be explored. Additional development work is ongoing to create a more flexible pre-processing framework, which will add operations such as ensemble averaging and regridding to the current reformatting procedures, as well as an improved coupling to the ESGF. Here, future versions of the ESMValTool will build as much as possible on existing efforts for the backend that reads and reformats data. In this regard, it would be helpful if an application programming interface (API) could be defined, for example by the WGCM Infrastructure Panel (WIP), that allows for flexible integration of diagnostics across different tools and programming languages in CMIP with this backend.
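The two pre-processing operations named above can be illustrated in miniature. The following sketch works on plain Python lists rather than gridded netCDF fields, uses nearest-neighbour interpolation in one dimension, and is not ESMValTool code; it only shows the shape of the operations the planned framework would provide.

```python
# Hypothetical sketch of two planned pre-processing operations:
# ensemble averaging and (1-D, nearest-neighbour) regridding.
# Plain lists stand in for gridded fields; names are illustrative.

def ensemble_average(members):
    """Point-wise mean over ensemble members (lists of equal length)."""
    n = len(members)
    return [sum(vals) / n for vals in zip(*members)]

def regrid_nearest(values, src_grid, dst_grid):
    """Map values from src_grid onto dst_grid, taking each target
    point's value from the nearest source grid point."""
    out = []
    for x in dst_grid:
        i = min(range(len(src_grid)), key=lambda j: abs(src_grid[j] - x))
        out.append(values[i])
    return out

field = ensemble_average([[1.0, 2.0], [3.0, 4.0]])      # -> [2.0, 3.0]
coarse = regrid_nearest(field, [0.0, 1.0], [0.0, 0.4, 1.0])
```

A production framework would of course use area-weighted or conservative regridding on two-dimensional grids; the design point is that such operations belong in a shared backend rather than being re-implemented inside each diagnostic.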
We aim to move ESM evaluation beyond the state of the art by investing in operational evaluation of physical and biogeochemical aspects of ESMs and in process-oriented evaluation, and by identifying the processes most important to the magnitude and uncertainty of future projections. Our goal is to support model evaluation in CMIP6 by contributing the ESMValTool as one of the standard documentation functions and by running it alongside the ESGF. In collaboration with similar efforts, we aim for a routine evaluation that provides comprehensive documentation of broad aspects of model performance and its evolution over time, and that makes evaluation results available on a timescale that was not possible in CMIP5. This routine evaluation is not meant to replace further in-depth analysis of model performance, and to date it cannot strongly reduce uncertainties in global climate sensitivity, which remains an active area of research. However, the ability to routinely perform such evaluation will drive the quality and realism of ESMs forward and will leave more time to develop innovative process-oriented diagnostics – especially those related to feedbacks in the climate system that link to the credibility of model projections.
ESMValTool (v1.0) is released under the Apache License, version 2.0. The
latest version of the ESMValTool is available from the ESMValTool webpage.
The development of the ESMValTool (v1.0) was funded by the European Commission's 7th Framework Programme, under Grant Agreement number 282672, the “Earth system Model Bias Reduction and assessing Abrupt Climate change (EMBRACE)” project and the DLR “Earth System Model Validation (ESMVal)” and “Klimarelevanz von atmosphärischen Spurengasen, Aerosolen und Wolken: Auf dem Weg zu EarthCARE und MERLIN (KliSAW)” projects. In addition, financial support for the development of ESMValTool (v1.0) was provided by ESA's Climate Change Initiative Climate Modelling User Group (CMUG). We acknowledge the World Climate Research Program's (WCRP's) Working Group on Coupled Modelling (WGCM), which is responsible for CMIP, and we thank the climate modelling groups for producing and making available their model output. For CMIP the US Department of Energy's Program for Climate Model Diagnosis and Intercomparison provides coordinating support and led development of software infrastructure in partnership with the Global Organization for Earth System Science Portals. We thank Björn Brötz (DLR, Germany) for his help with the release of the ESMValTool and Clare Enright (UEA, UK) for support with development of the ocean biogeochemistry diagnostics. We are grateful to Patrick Jöckel (DLR, Germany), Ron Stouffer (GFDL, USA) and to the two anonymous referees for their constructive comments on the manuscript. The article processing charges for this open-access publication were covered by a Research Centre of the Helmholtz Association. Edited by: S. Easterbrook