Model calibration (or “tuning”) is a necessary part of developing and testing coupled ocean–atmosphere climate models regardless of their main scientific purpose. There is an increasing recognition that this process needs to become more transparent for both users of climate model output and other developers. Knowing how and why climate models are tuned and which targets are used is essential to avoiding possible misattributions of skillful predictions to data accommodation and vice versa. This paper describes the approach and practice of model tuning for the six major US climate modeling centers. While details differ among groups in terms of scientific missions, tuning targets, and tunable parameters, there is a core commonality of approaches. However, practices differ significantly on some key aspects, in particular, in the use of initialized forecast analyses as a tool, the explicit use of the historical transient record, and the use of the present-day radiative imbalance vs. the implied balance in the preindustrial era as a target.
Simulation has become an essential tool for understanding processes in the
Earth system, for interpreting observations, and for making predictions over short
(weather), medium (seasonal), and long (climate) terms. The complexity of
this system is evident in the myriad processes involved (such as the
microphysics of cloud nucleation, land surface heterogeneity, convective
plumes, and ocean mesoscale eddies) and in the dynamic views provided by
remote sensing. This complexity and wide range of scales that need to be
incorporated imply that simulations will necessarily include approximations
to well-understood physics and empirical formulations for unresolved effects.
The simulations are neither a straightforward encapsulation of some
well-known theory, nor are they laboratory experiments probing the real
world, though they have features of both
Since the pioneering work in climate modeling in the mid-20th century
It is worth expanding on why this matters: first, model development involves
expert judgments which are inevitably subjective, and with different choices
there would be differences in emergent responses. For instance, in the MPI
model,
Thus it has become increasingly clear that a more transparent process is
necessary. A survey of modeling groups involved in CMIP5
Climate and weather models consist of three levels of representation of
physical processes: fundamental physics (such as conservation of energy, mass
and momentum), approximations to well-known physical theories (the
discretization of the Navier–Stokes equations, broadband approximations to
line-by-line radiative transfer codes, etc.), and empirical approximations
(“parameterizations”) needed to match the phenomenology of unresolved or
poorly understood subgrid-scale or excluded processes
Parameters in climate models vary widely in their physical interpretation.
Some are well-determined physical values, such as the Coriolis parameter, the
acceleration due to gravity, or the Stefan–Boltzmann constant. Some, such
as reaction rates for chemical or microphysical processes, may be inferred
from laboratory or field measurements (with some uncertainty). Some emerge
from the construction of parameterizations but do not correspond directly to
well-defined physical processes, e.g., “erosion rates” for clouds
Individual parameterizations for a specific phenomenon are generally
calibrated to process-level data using high-resolution modeling and/or field
campaigns to provide constraints. For instance, boundary layer
parameterizations might be tuned to well-observed case studies.
A number of parameters remain that are not strongly constrained by process-level
observations or theory but that nonetheless have large impacts on emergent properties of
the simulation. It is these additional degrees of freedom that are used to “tune” or
calibrate the emergent properties of the model against a selected set of target
observations. The decisions on what to tune, and especially what targets to tune for,
undoubtedly involve value judgments
Additionally, climate simulations depend not only on parameter choices within
an established model structure but also on the structural choices made in the
parameterization itself. Examples include experimentation with alternate
closures and triggers for the cumulus parameterization at GFDL during the
development of GFDL AM3
Targets for possible tuning fall into three classes. First there are targets that need to be
satisfied in order for useful numerical experiments to be performed in the first place (usually related
to the equilibration of model components with long timescales). The most
important of these is a requirement of near energy balance at the top of the atmosphere (TOA) and surface
in an initial state of a coupled model to prevent temperature drifts over time. Strictly speaking this is not
tuning to an observed quantity, but rather is a tuning to a situation that was approximately
inferred to hold in the “preindustrial” (PI) period. Note that while the concept of a preindustrial
period is a little elusive
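As a schematic illustration of this first class of target (and not any center's actual diagnostic), the sketch below computes the area-weighted global-mean net TOA flux that developers drive toward zero in a control state; the CMIP-style variable names rsdt, rsut, and rlut (incoming shortwave, reflected shortwave, and outgoing longwave) and the synthetic data are assumptions for illustration.

```python
import numpy as np

def global_mean_toa_imbalance(rsdt, rsut, rlut, lat):
    """Area-weighted global-mean net TOA flux (W m-2): incoming SW minus reflected
    SW minus outgoing LW, for 2-D (lat, lon) time-mean fields."""
    net = rsdt - rsut - rlut                          # positive = net energy gain
    w = np.cos(np.deg2rad(lat))[:, None] * np.ones_like(net)
    return float(np.sum(net * w) / np.sum(w))

# Toy usage with uniform synthetic fields on a 2-degree grid
lat = np.arange(-89.0, 90.0, 2.0)
shape = (lat.size, 180)
rsdt = np.full(shape, 340.0)   # incoming solar
rsut = np.full(shape, 100.0)   # reflected shortwave
rlut = np.full(shape, 239.5)   # outgoing longwave
print(f"Global-mean TOA imbalance: {global_mean_toa_imbalance(rsdt, rsut, rlut, lat):+.2f} W m-2")
```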
A second class of tuning targets comprises well-characterized climatological observations, which might include
annual means, average seasonal cycles, or interannual variance. A third potential class consists of observations of
transient events (on daily to centennial scales) or trends. Some
observational targets have important (and sometimes unrecognized) structural uncertainties
and therefore any tuning to those targets risks over-fitting the model to imperfect data,
potentially reducing skill in “out-of-sample” predictions (those for which the evaluation
data either did not exist at the time of the prediction or were not used in model development
or tuning). This is a particular problem for transient observations such as estimates of early 20th-century temperature changes
Models that are equipped for data assimilation or used for operational forecasts have the additional possibility of tuning parameters to improve skill scores in those forecasts on multiple timescales – whether they be 6-hourly, daily, weekly, or even many months ahead for seasonal forecasts of, for instance, the state of the tropical Pacific.
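One standard verification metric of this kind is the anomaly correlation coefficient; the sketch below is a minimal, illustrative implementation and is not tied to any particular center's scoring system (field names and the synthetic data are assumptions).

```python
import numpy as np

def anomaly_correlation(forecast, verification, climatology):
    """Anomaly correlation coefficient of a forecast field against the verifying
    analysis, with anomalies taken relative to a fixed climatology."""
    f = (forecast - climatology).ravel()
    o = (verification - climatology).ravel()
    return float(np.dot(f, o) / np.sqrt(np.dot(f, f) * np.dot(o, o)))

# Toy usage with synthetic 500 hPa-height-like fields
rng = np.random.default_rng(1)
clim = np.zeros((73, 144))
analysis = clim + rng.standard_normal(clim.shape)
forecast = analysis + 0.5 * rng.standard_normal(clim.shape)
print(f"ACC = {anomaly_correlation(forecast, analysis, clim):.2f}")
```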
We note here a distinction between fields that are closely monitored during
the model development process (many examples are given below) and specific
tuning targets. Monitored diagnostics tend to be complex emergent properties
that do not depend in any simple way on adjustable parameters, and thus are
difficult (or impractical) to tune for. For example, note that the range of
preindustrial global temperatures in CMIP5 is 12.0 to 14.8 °C
The limitations of tuning are well known
Most discussions of tuning deal with explicit calibration of parameters to
match a target observation. However, analysis of the CMIP3 ensemble
Model selection can also act as an implicit form of tuning, even though this
might be seen by others as simple model development. In deciding between two
versions of a dynamical core or convection parameterizations, skill in El
Niño–Southern Oscillation (ENSO) variability or reductions of ocean drifts
may play an important role. Conceivably, a modeling center may decide not to
release or use a particular version because it fails to meet certain criteria
perceived to be essential, though more generally this will simply spur further
development. One candidate criterion would be a realistic simulation of the
20th century; however, the wide spread in 20th-century trends in the CMIP5
ensemble
Within climate models, there is always a choice as to whether to tune a specific component (such as the atmosphere, sea ice, land surface, or ocean) with tightly constrained boundary conditions or to tune the coupled model as a whole. In practice, both approaches are taken, though the relative importance and computational resources available vary across groups. Tuning components is generally fast and efficient, but does not necessarily prove robust when those components are coupled. However, coupled models take a very long time to equilibrate, and their quasi-stable states may be too far from the observed climate to be useful. Assuming that models conserve energy appropriately, all control runs will eventually drift to a quasi-steady state with a near-zero energy balance at the TOA and at the surface of the ocean. However, the realism of the final state is not guaranteed and, indeed, given the long time constants in the ocean, might require many thousands of years of integration to get to the wrong answer. Thus a balance must be struck between approaches.
Each of the six US modeling centers described below has specific missions and foci that drive different aspects of its modeling. For instance, NASA GMAO and NCEP have operational data assimilation products for short-term weather, longer seasonal forecasts, and reanalyses that form the core of their tasks. NCAR CESM, GFDL, and NASA GISS have more long-term climate change issues at the forefront of their research, but each with different mandates – respectively, to be a community model, to advance NOAA's mission goal to understand and predict changes in climate, and to help interpret and use NASA remote sensing products. The DOE's Accelerated Climate Modeling for Energy (ACME) project has been tasked with a very specific role to serve DOE's energy planning and computational resource needs.
For each modeling group, we describe the principal targets and tuning
strategies for their atmosphere-only GCM (general circulation model), their coupled ocean–atmosphere GCM, and additional Earth system
components as relevant. The specific models referred to are described in
Table
Climate models discussed in the text.
The prototype version of DOE ACME v0 is closely related to the CESM. The
initial version ACME v1, currently under development, incorporates new ocean
and sea-ice components (Model for Prediction Across Scales: MPAS)
Tuning is performed iteratively at the component levels and on the fully
coupled system. Most of the component-level tuning takes place in the
atmosphere. The atmosphere is primarily tuned using short simulations (2 to
10 years) with climatological SSTs and sea-ice boundary conditions, either
for present-day (circa 2000) or preindustrial conditions. The tuning targets
a near-zero TOA radiation balance for 1850 by adjusting cloud-related
parameters. Overall simulation fidelity is another important aspect of the
tuning process, with the goal of minimizing errors in important climatological
fields such as sea level pressure, short- and longwave cloud radiative
effects, precipitation, near-surface land temperature, surface wind stress,
300 hPa zonal wind, aerosol optical depth, zonal mean temperature, and
relative humidity. The magnitude of the aerosol indirect effects is also
evaluated and adjusted if deemed to be inconsistent with the observed
historical warming (specifically if it has a magnitude greater than
1.5 W m⁻²).
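As a schematic of how such multi-field fidelity measures can be constructed, the sketch below combines area-weighted RMSEs across several fields into a single score; the field list, weights, and data are illustrative assumptions, not the metric actually used in ACME.

```python
import numpy as np

def area_weighted_rmse(model, obs, lat):
    """Area-weighted RMSE of a 2-D (lat, lon) field against observations."""
    w = np.cos(np.deg2rad(lat))[:, None] * np.ones_like(model)
    return float(np.sqrt(np.sum(w * (model - obs) ** 2) / np.sum(w)))

def fidelity_score(model_fields, obs_fields, lat, weights=None):
    """Weighted sum of per-field RMSEs (e.g., SLP, cloud radiative effects,
    precipitation); smaller is better."""
    weights = weights or {name: 1.0 for name in model_fields}
    return sum(weights[name] * area_weighted_rmse(model_fields[name], obs_fields[name], lat)
               for name in model_fields)

# Toy usage with two synthetic fields on a 4-degree grid
lat = np.arange(-88.0, 90.0, 4.0)
shape = (lat.size, 90)
rng = np.random.default_rng(0)
obs = {"slp": rng.normal(1013.0, 10.0, shape), "precip": rng.gamma(2.0, 1.5, shape)}
mod = {k: v + rng.normal(0.0, 1.0, shape) for k, v in obs.items()}
print(f"fidelity score: {fidelity_score(mod, obs, lat):.3f}")
```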
Most of the tuning is performed using the low-resolution atmosphere. However,
cloud parameterizations need to be retuned separately for the high-resolution
atmosphere. Because of the cost of the high-resolution atmosphere, it is more effective to use
short hindcast simulations
Tuning is also performed with the fully coupled system using perpetual
preindustrial or present-day forcing. Ocean and sea-ice initial conditions
are either from rest
In developing the GFDL atmospheric model AM3 and coupled model CM3, parameter
choices and some structural choices as to how to deploy parameterizations
were guided by multiple goals. In addition to choosing parameters within
plausible ranges suggested by observations, experiments, theory, or
higher-resolution modeling, these goals included simulating thermodynamic and
dynamical fields, as well as TOA regional shortwave and longwave fluxes, as
realistically as possible. The global and annual mean net TOA radiative flux
in integrations with specified, present-day (1981–2000) SSTs was tuned to a slight positive imbalance
(0.8 W m⁻²).
The choices of a closure based on convective available potential energy
(CAPE) for the
Aspects of AM3 related to variability, including stationary wave patterns,
relationships between the Niño-3 index and regional precipitation,
relationships between the Northern Hemisphere annular mode and regional pressure
and temperature patterns, tropical cyclones, and the tropical wave spectrum, were
monitored during AM3 development
AM3 includes prognostic aerosols based on emissions, transport, chemical processes, and dry and wet removal. An important aerosol tuning parameter is the strength of wet scavenging. In-cloud condensate fractions were prescribed to provide a reasonable simulation of the global mean and regional distribution of aerosol optical depth. These condensate fractions maintain relative solubilities among the various aerosols in AM3.
AM3 is the first GFDL model to include cloud–aerosol interactions. At the
outset of this aspect of AM3 development, estimates of climate forcing by
cloud–aerosol interactions ranged to
The TOA radiative flux perturbation (RFP; see Sect. 2) was
monitored during AM3 development, as was the Cess climate sensitivity. A
configuration for which the ratio of the RFP to the Cess sensitivity was
about 15 % less than the value for AM2
The CM3 coupled model was initialized from present-day ocean conditions and
allowed to adjust to a preindustrial, quasi-steady state with a small TOA
energy imbalance (0.2–0.3 W m⁻²).
Note that the above description applies only to AM3 and CM3.
For CM4
Tuning strategies in GISS ModelE2 are described in
Upon coupling the ocean and atmosphere models, there is an initial drift to a quasi-stable equilibrium which is judged in overall terms for realism, including the overall skill in the climatological metrics for zonal mean temperature, surface temperatures, sea level pressure, short- and longwave radiation fluxes, precipitation, lower stratospheric water vapor, and seasonal sea-ice extent. For the configuration to be acceptable, drifts have to be relatively small, and quasi-stable behavior of the North Atlantic meridional overturning circulation and other ocean metrics, including the Antarctic Circumpolar Circulation, is required. While ENSO metrics are also monitored, they are not specifically tuned for. In practice, longer spin-up integrations help reduce drift, and the model state, once stabilized, can be assessed for suitability. Large drifts at the start of an integration have often been reduced by different tuning choices that either affect surface atmospheric fluxes or (more usually) ocean mixing.
Subsequent to CMIP5, further tuning exercises and development have occurred
for the production of the E2.1 version of the model. One important tuning
success was due to the adjustments made to the convection scheme in order to allow
for the simulation of the Madden–Julian Oscillation
Further fine tuning in the coupled models, for instance for the exact
global-mean surface temperature, is effectively precluded by the long spin-up
times and limited resources available. No tuning is done for climate
sensitivity or for performance in a simulation with transient forcing or
hindcasts. In transient simulations without an explicit aerosol indirect
effect, the aerosol indirect effect was preset to have a value of
In simulations with interactive atmospheric composition, there are two
specific tunings for ozone chemistry: the photolysis rate in the atmospheric
window region for incoming solar radiation and the temperature threshold for the
formation of polar stratospheric clouds (and hence the heterogeneous
chemistry associated with them)
For the E2.1 model and subsequent CMIP6 submissions, all tuning is being done with preindustrial and present-day fully interactive simulations (including chemistry and aerosols and indirect effects) and the noninteractive versions will use the composition derived from those simulations and the same tuning.
The Goddard Earth Observing System (GEOS-5) model is currently in use at the NASA GMAO at a wide range of resolutions and for a wide range of applications. The range of resolutions and applications for the atmospheric model includes global mesoscale simulations and forecasts at approximately 7 km, atmospheric data assimilation and forecasts at 12 km (with ensemble members running at 50 km), seasonal coupled atmosphere–ocean forecasts at approximately 50 km, present-day climate simulations at 100 km, and present-day coupled chemistry climate simulations at resolutions from 12 to 100 km. The tuning of the GEOS-5 AGCM physical parameterizations, therefore, is designed to allow the model to function across this range of uses and requires fidelity in many aspects of the simulation. The tuning also includes appropriate resolution dependence. Tuning targets differ among the many types of experiments that are conducted as part of the model validation suite. The tuning suite includes present-day (AMIP-style) climate simulations, “replay” experiments at different resolutions (similar to nudging towards a reanalysis), coupled atmosphere–ocean experiments, coupled atmosphere–chemistry simulations, short-term forecasts, and data assimilation experiments.
The tuning of the current version of the GEOS-5 AGCM is described in
The cloudy-sky contribution to the TOA fluxes is addressed by adjusting the
parameters that control the cloud radiative effect (cloud particle size and
autoconversion rates). The clear-sky portion of the TOA fluxes is matched by
tuning the parameters that govern the mean atmospheric humidity and surface
albedo over ice-covered surfaces. The free atmosphere specific humidity is
quite sensitive to the “critical relative humidity” specified in the cloud
macrophysical scheme
The boreal winter mean circulation, compared to reanalyses (as seen by the 200 hPa eddy height or by the 300 hPa velocity potential), was found to be quite sensitive to the intensity of the hydrological cycle, largely dictated by the rates of re-evaporation or sublimation of rain and snow. These parameters are chosen so as to ensure agreement of the seasonal mean circulation with reanalysis, the seasonal mean precipitation with observations from GPCP and TRMM, and the cloud radiative effects with CERES and with SRB at the surface. The behavior of the atmosphere–ocean coupled system is particularly sensitive to the geographical distribution of the surface shortwave cloud radiative forcing in the tropics.
Additional observations of aerosol optical depth (from MODIS) and other
chemical constituents (e.g., CO from the Tropospheric Emission Spectrometer:
TES) are used in the GMAO to validate the simulated turbulent and convective
transport. Data from MERRA-2, for example, include an aerosol assimilation to
assess errors in turbulent and convective transport
The GEOS-5 AGCM includes some resolution-dependent parameters that govern the
behavior of the moist processes. The two most important parameters that are
specified to change with resolution in an ad hoc manner are chosen based on
physical arguments and based on results from GEOS-5 global mesoscale
simulations. The first of these is the critical relative humidity for
condensation and evaporation, which accounts for
subgrid-scale variations of total water. Critical RH increases with
resolution based on the expectation and evidence from global mesoscale model
results that subgrid-scale variations of total water decrease with increasing
resolution
At the higher resolutions (25 km and better) the tuning parameters are chosen based on short-term forecasts and on behavior as part of the data assimilation system. Forecast skill scores, the fidelity of the spin-up of tropical cyclones, and the innovation vector for data assimilation (observation minus forecast statistics) are critical metrics for new tuning choices, and any new choices of tuning parameters are evaluated with an ensemble of forecasts. The analysis increments during both data assimilation and replay experiments provide the key guidance for choosing the parameters to tune. Under the general assumption that the mean analysis increments indicate systematic errors in the model physics (which is not always valid), correlations between the tendency term from any individual physical parameterization and the analysis increment reveal errors due to the behavior of that parameterization, and parameters of that scheme are adjusted so as to minimize the mean analysis increments.
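A minimal sketch of this attribution step, correlating a single scheme's tendency with the analysis increment at each grid point, is shown below; the array shapes and synthetic data are assumptions for illustration rather than the GMAO diagnostic itself.

```python
import numpy as np

def increment_correlation(param_tendency, analysis_increment):
    """Temporal correlation, at each grid point, between one parameterization's
    tendency and the analysis increment; both arrays have shape (time, lat, lon)."""
    pt = param_tendency - param_tendency.mean(axis=0)
    ai = analysis_increment - analysis_increment.mean(axis=0)
    cov = (pt * ai).mean(axis=0)
    denom = pt.std(axis=0) * ai.std(axis=0)
    return np.where(denom > 0, cov / denom, 0.0)

# Toy usage: a tendency that partly explains the increment correlates with it
rng = np.random.default_rng(2)
tend = rng.standard_normal((120, 20, 40))
incr = 0.7 * tend + 0.5 * rng.standard_normal(tend.shape)
print(f"mean correlation: {increment_correlation(tend, incr).mean():.2f}")
```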
High-resolution forecasts are also evaluated and tuned based on comparisons with spatial and temporal variability of high-resolution top-of-the-atmosphere fluxes and radar-derived precipitation. As with the lower resolutions, the parameters which are adjusted to meet the tuning targets are the autoconversion rates, ice-fall rates, and the cloud droplet size. In addition to these parameters, high-resolution tuning also includes adjustments of the Tokioka limit and the timescale of adjustment in the convective parameterization. As an aside, we note that resolution decisions almost always affect tunings (and development), and the goal that parameterized physics or models can be independent of resolution, while a noble aim, is not yet a reality.
The ability to spin up tropical cyclones and match the correct track was
found to be quite sensitive to the magnitude of low-level drag. Based on
theoretical considerations and the results of laboratory experiments, the
model's function which relates surface stress to roughness height over the
oceans (the “Charnock coefficient”) was adjusted to decrease the drag at
high wind speeds and resulted in substantial improvements in the simulation
of tropical cyclones
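For reference, the sketch below shows the Charnock relation and the implied neutral drag coefficient over the ocean; the specific parameter values and the reduction of the Charnock parameter at high wind speed are purely illustrative and are not the GEOS-5 formulation.

```python
import numpy as np

KAPPA = 0.4   # von Karman constant
G = 9.81      # gravitational acceleration (m s-2)

def charnock_z0(u_star, alpha=0.018):
    """Ocean roughness length (m) from friction velocity via the Charnock relation."""
    return alpha * u_star ** 2 / G

def neutral_drag_coefficient(u10, alpha=0.018, z_ref=10.0, n_iter=20):
    """Neutral 10 m drag coefficient, iterating CD = (kappa / ln(z_ref/z0))**2
    with z0 given by the Charnock relation."""
    cd = 1.2e-3
    for _ in range(n_iter):
        u_star = np.sqrt(cd) * u10
        cd = (KAPPA / np.log(z_ref / charnock_z0(u_star, alpha))) ** 2
    return cd

# Purely illustrative reduction of the Charnock parameter at high wind speed,
# mimicking a cap on drag for tropical-cyclone-strength winds
for u10 in (10.0, 30.0, 60.0):
    alpha = 0.018 if u10 < 30.0 else 0.010
    print(f"u10 = {u10:4.0f} m/s  CD = {neutral_drag_coefficient(u10, alpha):.2e}")
```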
In addition to the tuning based on physical reasoning and diagnosis of errors using comparisons with observations, some tuning choices are based on trial-and-error experimentation. These include parameters that govern the magnitude of the different types of surface drag (more drag increases forecast skill score) and the adjustment timescale of mid-latitude parameterized convection (more mid-latitude convection increases forecast skill score).
The suite of different types of experiments with the GEOS-5 GCM at different resolutions is run iteratively as part of the overall tuning process, and the result is a model which meets the variety of tuning targets described here. Trade-offs among the parameter choices to meet the different targets exist and necessitate prioritization of the tuning targets, but in general this process results in a robust model that functions well in the various applications needed to fulfill the GMAO's goals and mission.
The Community Earth System model (
Tuning begins as a generally separate activity for each component within the working groups. During this initial phase of tuning, periodic preindustrial control coupled simulations are performed as a check on the impact of each component's developments to date on the whole coupled system, and to ensure that features of the simulation have not significantly degraded.
The atmosphere model tuning strategy initially performs “stand-alone”
experiments using the AMIP protocol with interactive land and atmosphere
components and with prescribed observed SSTs and sea-ice distributions.
Initial development testing is performed using SSTs of the climatological
period centered around the year 2000 for 5–10-year periods. This length of
simulation is necessary due to the high Arctic variability. The first key measure of a simulation that will be appropriate to
the fully coupled simulation is the TOA energy balance. Estimates of the
observed present-day energy imbalance are on the order of
0.5–1.0 W m⁻².
In parallel to the atmosphere component activities, the ocean and ice working
groups perform equivalent “stand-alone experiments” with forcing provided
by multiple cycles of the CORE forcing protocol
Ideally, an atmosphere that was well-tuned in a configuration with SSTs, sea ice, and land conditions relevant to the preindustrial period would in principle translate well to a coupled system close to energy balance, i.e., with no net increase or decrease in energy into the whole coupled system. However, coupled-system biases in the surface distribution of SSTs and sea ice mean that tuning also needs to be performed in the fully coupled system.
Coupled model tuning brings together the individual fully active “tuned”
components and their associated working groups to perform a series of
preindustrial climate experiments. The same performance metrics that are
applied in atmosphere AMIP simulations apply to the coupled simulation,
namely a near-zero top-of-atmosphere energy imbalance. Achieving an equilibrated
energy balance is the most challenging task in coupled CESM tuning. The difficulty
lies in spin-up and drift of the system. Two ocean initialization approaches
are used. The first is to use an observed Levitus temperature and salinity
state with the ocean at rest. The second approach is to initialize from an
ocean state of a previously run simulation. This has the advantage of a
spun-up ocean state, in particular in the deep ocean, that is more
“familiar” with the overlying atmosphere component.
However, it is undesirable from the perspective of simulation provenance. A
combination of the two is used. If the equilibrium energy imbalance is
greater than 0.1–0.2 W m⁻², further adjustments are made.
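The kind of equilibration check described here can be illustrated schematically as below; the 0.1–0.2 W m⁻² imbalance threshold follows the text, while the drift tolerance, function names, and synthetic data are assumptions.

```python
import numpy as np

def control_run_check(toa_annual, tas_annual, imbalance_tol=0.1, drift_tol=0.05, last_n=100):
    """Mean TOA imbalance (W m-2) and surface-temperature drift (K per century)
    over the last `last_n` years of a control run, plus a pass/fail flag."""
    toa = np.asarray(toa_annual, dtype=float)[-last_n:]
    tas = np.asarray(tas_annual, dtype=float)[-last_n:]
    mean_imbalance = float(toa.mean())
    drift = float(np.polyfit(np.arange(tas.size), tas, 1)[0] * 100.0)
    passed = abs(mean_imbalance) <= imbalance_tol and abs(drift) <= drift_tol
    return mean_imbalance, drift, passed

# Toy usage with a synthetic 300-year control run
rng = np.random.default_rng(3)
years = 300
toa = 0.05 + 0.2 * rng.standard_normal(years)                    # residual imbalance
tas = 286.5 + 0.0003 * np.arange(years) + 0.1 * rng.standard_normal(years)
print(control_run_check(toa, tas))
```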
For the coupled simulations to be considered successful, they have to satisfy
many of the requirements outlined above and, in addition, capture the dominant ENSO mode
of variability – also a very challenging task. For instance, the initial
implementation of more advanced convection parameterizations in CAM6 gave
rise to a degradation in ENSO performance, but with some tuning to those
schemes, ENSO skill was enhanced. Another example of a coupled
issue that arose in constructing the CMIP6 version of the code was a
persistent cold bias and excessive sea ice in the Labrador Sea, which were
mitigated by more accurate routing of local river runoff. In previous
versions (such as CCSM4), there were evaluations of the coupled model in
historical transient mode, specifically of the September Arctic sea-ice trend
from 1979, which was improved after adjustments to the sea-ice albedo
formulation to affect the PI ice thickness
In recent history two fully coupled climate models have become operational at
NCEP, the CFS version 1
The daily verification skill scores are the dominant source for tracking model improvement. This is a powerful target for tuning which confronts the model with real-time observations in evolving data assimilation systems and then verifies the forecasts, from the initial conditions provided by these data assimilation systems, against independent observations.
A new CFS is built by taking a snapshot of the latest state-of-the-art GFS as
its atmospheric component, along with state-of-the-art ocean, sea-ice, and
land models which are available at that time. In developing CFSv1 in 2002, a
“large” (
Having achieved some success in the prediction of ENSO in seasonal forecasts
up to 9 months ahead in CFSv1, the goal for CFSv2 was to tackle subseasonal
predictions, mainly of the MJO in the tropics. MJO prediction was successfully
extended from 5 days to nearly 21 days by improving model physics
and having a high-resolution state-of-the-art data assimilation system to
assimilate direct satellite radiance data. Also, greenhouse gas
concentration changes were implemented in the NCEP forecast system. While the
NCEP focus is short-term (seasonal) climate prediction, it has been
recognized that even for these predictions, the forecast needs to be warmer
than a “normal” that, by necessity, is based on past data. The increase in
GHGs also played an important role in improving the data assimilation of
satellite radiance data. Each satellite over the 1979–present history was
calibrated using GHG concentrations observed at the time these satellites were
operational. The result was a reasonable upward temperature trend over the
1979–present period, much better than at the time of CFSv1, when the upward
trend over land was brought about only by the warming in initial global ocean
conditions. As described in
Development is now underway for the next model, CFSv3. NCEP/EMC has a strategic plan to unify the global forecast systems and develop a Unified Global Coupled System (UGCS) for both weather and seasonal climate prediction. This system will have six fully coupled model components, namely the atmosphere, ocean, sea ice, land, waves, and aerosols. It will also have a strongly coupled data assimilation system for each of these six components.
As might be expected, the broad picture of tuning across the climate model groups is consistent. The key adjustable parameters are those associated with uncertain and poorly constrained processes such as clouds, convection, gravity wave drag, and ocean-mixing parameters. Common too is the broad array of targets against which the skill of the models is judged, e.g., the TOA shortwave and longwave radiation, 500 hPa geopotential height, surface temperatures, sea level pressure, and precipitation. However, it is also abundantly clear that the procedures at each group are quite distinct and can reasonably be surmised to reflect different scientific priorities and missions and thus will produce different outcomes.
The model groups also differ in whether they focus on preindustrial
conditions or present-day simulations. The former has the benefit of being
closer to climate stability, while the latter has substantially more
observational data. The groups focusing on the preindustrial are judging
(mostly correctly) that the errors in the control simulation (whether run for
preindustrial or present-day periods) are larger than the trends between those periods.
A stark difference does exist between the models that have operational data
assimilation products (NCEP and GMAO) and those that do not. The ability to
assess improvements in fast physics based on short forecasts is an excellent
resource that, even if the climate models were not run operationally in this
way, should become a more widely used methodology
There are also some clear commonalities in approaches. All groups focus first on their atmospheric models, either in an AMIP-style mode (annually varying modern SST and sea ice), using a climatological approach (decadal mean observed ocean conditions and forcings), or in weather forecast mode. Tunings for atmospheric composition and key atmospheric diagnostics use these experiments, which have the advantage of fast equilibration times and reduced computational load. Tunings for ocean components can be done in stand-alone experiments but are often done within the fully coupled framework, with at least some model groups tuning sea-ice and ocean-mixing parameterizations to produce acceptable sea-ice cover and ocean circulation metrics.
Use of historical period trends and imbalances during the tuning process.
Because of the high importance and visibility of climate models' simulation of the historical period (preindustrial to present day), model groups have to be particularly clear about how information that reflects the ongoing trends in temperature and ocean heat content has been used in the tuning process.
The descriptions above suggest that increasing knowledge over time about the current radiative
imbalance has clearly influenced model development. Developers prior to CMIP3 (circa 2004)
had a general expectation that net radiative forcing over the 20th century was positive,
but they were not able to use a specific value for the present-day energy imbalance
because oceanic analyses were not accurate enough: compare
We summarize the results in Table
As discussed above, the radiative imbalance can be affected in two ways: by
adjusting internal parameters (mostly associated with clouds) and/or by
using a different historical forcing. Four models adjust their historical
aerosol forcing: GISS, though only in its noninteractive runs, aims for an
indirect aerosol forcing of
At least three of the model groups discussed here find a difference between
the energy imbalance using year 2000 forcings together with observed SST and
sea ice, and the transient coupled simulations for the same time period and
forcings. However, the differences in how this calculation is done can be
important, and the implications for the coupled model simulations are unclear.
For example, in the GISS-E2 model, the decadal mean imbalance (1996–2005) in
AMIP simulations, including all forcings and annually varying observed SST
and sea ice, is 1.25 W m⁻²
As models are continually evaluated at the process level against an increasing number of observations, analyses often show that existing parameterizations lack enough flexibility to represent the coupling between the subgrid scale and the environment in all relevant climate regimes. The response is often to increase the complexity of a parameterization, which comes at the cost of an increased number of tunable parameters. With that increase, the challenges faced by the developers also rise, as does the potential for “local minima” to occur, i.e., different parameter combinations that yield similarly good agreement according to standard GCM validation metrics (e.g., Taylor diagrams, climate state mean biases, spatial correlations).
If these distinct and separate volumes of tuning parameter space lead to simulations that exhibit similarly good agreement with observations, there is no clear scientific reason to prefer one over another. But will our decisions on parameter combinations today have a noticeable impact on the simulated climate several centuries from now or on climate sensitivity more broadly? Specifically, does choosing different local minima in parameter phase space “matter”?
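One way in which such "equally good" parameter combinations could be compared is with a Taylor-style pattern statistic (combining pattern correlation and normalized standard deviation); the sketch below is illustrative only, with synthetic candidate fields standing in for tuned model runs.

```python
import numpy as np

def taylor_skill(model, obs, r0=1.0):
    """A common Taylor (2001)-style skill score, bounded by 1 for a perfect match
    of pattern correlation and normalized standard deviation."""
    m, o = model.ravel(), obs.ravel()
    r = np.corrcoef(m, o)[0, 1]
    sigma_hat = m.std() / o.std()
    return 4.0 * (1.0 + r) / ((sigma_hat + 1.0 / sigma_hat) ** 2 * (1.0 + r0))

# Toy usage: three hypothetical parameter combinations scored against the same "observations"
rng = np.random.default_rng(4)
obs = rng.standard_normal((90, 180))
candidates = {f"params_{i}": obs + 0.3 * rng.standard_normal(obs.shape) for i in range(3)}
for name, field in candidates.items():
    print(name, f"skill = {taylor_skill(field, obs):.3f}")
# Nearly identical scores would be consistent with distinct but equally acceptable minima.
```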
Code availability at each center.
With more combinations, is there room for improving regional biases in
simulations while simultaneously making the tuning process more automated?
These questions have motivated an effort, using the GISS model as a test bed,
for developing a more robust framework for assessing the true existence of
local minima in a multidimensional space (see also
More generally, the large variety of approaches demonstrated among just these
six models indicates that the documentation of tuning procedures across a
multimodel ensemble like CMIP6 will be quite challenging. What role should
the degree of tuning play when assessing coupled model skill? Should
simulations be up-weighted in the ensemble because of a closer climatology to
observations, or down-weighted because this is partly due to accommodation?
Should models that are tuned differently but have similar physics be treated
as independent or not?
At the minimum, we recommend that all future model description papers (or
systematic documentation projects such as ES-DOC
While we have only discussed tuning in the context of historical and modern
simulations, it is vital to assess the credibility of models by examining
their performance in out-of-sample situations. This is easy for the models
with an operational weather forecast mode (at least for some aspects of the
climate system), and participation in paleoclimate model tests by NCAR and
GISS are also invaluable. Medium-term climate forecasts based on anticipated
changes in forcings (such as the eruption of Mount Pinatubo (1991) or the rise
in greenhouse gases) have been shown to have skill
The availability of code for the models discussed in
this paper is laid out in Table
The authors declare that they have no conflict of interest.
Climate modeling at GISS and GMAO is supported by the NASA Modeling, Analysis and Prediction program and resources supporting this work were provided by the NASA High-End Computing (HEC) Program through the NASA Center for Climate Simulation (NCCS) at Goddard Space Flight Center. Work at LLNL was supported by the U.S. Department of Energy, Office of Science E3SM project under contract DE-AC52-07NA27344. NCEP is a division of the National Weather Service in NOAA within the Department of Commerce. Discussions at the US Climate Modeling Summit, convened by USGCRP in February 2016, were instrumental for putting this paper together. Manuscript reviews by Larry Horowitz and Levi Silvers (GFDL) are appreciated. We would like to thank Steve Sherwood and an anonymous reviewer for constructive comments on the discussion version of the paper.

Edited by: James Annan
Reviewed by: Steven Sherwood and one anonymous referee