Overview of the Coupled Model Intercomparison Project Phase 6 (CMIP6) experimental design and organization



Veronika Eyring, Sandrine Bony, Gerald A. Meehl, Catherine A. Senior, Bjorn Stevens, Ronald J. Stouffer, Karl E. Taylor

To cite this version: Veronika Eyring, Sandrine Bony, Gerald A. Meehl, Catherine A. Senior, Bjorn Stevens, et al. Overview of the Coupled Model Intercomparison Project Phase 6 (CMIP6) experimental design and organization. Geoscientific Model Development, European Geosciences Union, 2016, 9 (5), pp.1937-1958. <10.5194/gmd-9-1937. <hal-01339069>

Abstract. By coordinating the design and distribution of global climate model simulations of the past, current, and future climate, the Coupled Model Intercomparison Project (CMIP) has become one of the foundational elements of climate science. However, the need to address an ever-expanding range of scientific questions arising from more and more research communities has made it necessary to revise the organization of CMIP. After a long and wide community consultation, a new and more federated structure has been put in place. It consists of three major elements: (1) a handful of common experiments, the DECK (Diagnostic, Evaluation and Characterization of Klima) and CMIP historical simulations (1850-near present) that will maintain continuity and help document basic characteristics of models across different phases of CMIP; (2) common standards, coordination, infrastructure, and documentation that will facilitate the distribution of model outputs and the characterization of the model ensemble; and (3) an ensemble of CMIP-Endorsed Model Intercomparison Projects (MIPs) that will be specific to a particular phase of CMIP (now CMIP6) and that will build on the DECK and CMIP historical simulations to address a large range of specific questions and fill the scientific gaps of the previous CMIP phases. The DECK and CMIP historical simulations, together with the use of CMIP data standards, will be the entry cards for models participating in CMIP. Participation in CMIP6-Endorsed MIPs by individual modelling groups will be at their own discretion and will depend on their scientific interests and priorities. With the Grand Science Challenges of the World Climate Research Programme (WCRP) as its scientific backdrop, CMIP6 will address three broad questions:
- How does the Earth system respond to forcing?
- What are the origins and consequences of systematic model biases?
- How can we assess future climate changes given internal climate variability, predictability, and uncertainties in scenarios?

Introduction
The Coupled Model Intercomparison Project (CMIP) organized under the auspices of the World Climate Research Programme's (WCRP) Working Group on Coupled Modelling (WGCM) started 20 years ago as a comparison of a handful of early global coupled climate models performing experiments using atmosphere models coupled to a dynamic ocean, a simple land surface, and thermodynamic sea ice (Meehl et al., 1997). It has since evolved over five phases into a major international multi-model research activity (Meehl et al., 2000, 2007; Taylor et al., 2012) that has not only introduced a new era to climate science research but has also become a central element of national and international assessments of climate change (e.g. IPCC, 2013). An important part of CMIP is to make the multi-model output publicly available in a standardized format for analysis by the wider climate community and users. The standardization of the model output in a specified format, and the collection, archival, and access of the model output through the Earth System Grid Federation (ESGF) data replication centres have facilitated multi-model analyses.
The objective of CMIP is to better understand past, present, and future climate change arising from natural, unforced variability or in response to changes in radiative forcings in a multi-model context. Its increasing importance and scope are a tremendous success story, but this very success poses challenges for all involved. Coordination of the project has become more complex as CMIP includes more models with more processes all applied to a wider range of questions. The need to meet this new interest and to address a wide variety of science questions from more and more scientific research communities, reflecting the expanding scope of comprehensive modelling in climate science, has put pressure on CMIP to become larger and more extensive. Consequently, there has been an explosion in the diversity and volume of requested CMIP output from an increasing number of experiments, causing challenges for CMIP's technical infrastructure (Williams et al., 2015). Cultural and organizational challenges also arise from the tension between expectations that modelling centres deliver multiple model experiments to CMIP yet at the same time advance basic research in climate science.
In response to these challenges, we have adopted a more federated structure for the sixth phase of CMIP (i.e. CMIP6) and subsequent phases. Whereas past phases of CMIP were usually described through a single overview paper, reflecting a centralized and relatively compact CMIP structure, this GMD special issue describes the new design and organization of CMIP, the suite of experiments, and its forcings, in a series of invited contributions. In this paper, we provide the overview and backdrop of the new CMIP structure as well as the main scientific foci that CMIP6 will address. We begin by describing the new organizational form for CMIP and the pressures that it was designed to alleviate (Sect. 2). It also contains a description of a small set of simulations for CMIP which are intended to be common to all participating models (Sect. 3), details of which are provided in the Appendix. We then present a brief overview of CMIP6 that serves as an introduction to the other contributions to this special issue (Sect. 4), and we close with a summary.

CMIP design - a more continuous and distributed organization
In preparing for CMIP6, the CMIP Panel (the authors of this paper), which traditionally has the responsibility for direct coordination and oversight of CMIP, initiated a 2-year process of community consultation. This consultation involved the modelling centres whose contributions form the substance of CMIP as well as communities that rely on CMIP model output for their work. Special meetings were organized to reflect on the successes of CMIP5 as well as the scientific gaps that remain or have since emerged. The consultation also sought input through a community survey, the scientific results of which are described by Stouffer et al. (2015). Four main issues related to the overall structure of CMIP were identified. First, we identified a growing appreciation of the scientific potential to use results across different CMIP phases. Such approaches, however, require an appropriate experimental design to facilitate the identification of an ensemble of models with particular properties drawn from different phases of CMIP (e.g. Rauser et al., 2014). At the same time, it was recognized that an increasing number of Model Intercomparison Projects (MIPs) were being organized independent of CMIP, the data structure and output requirements were often inconsistent, and the relationship between the models used in the various MIPs was often difficult to determine, in which context measures to help establish continuity across MIPs or phases of CMIP would also be welcome.
Second, the scope of CMIP was taxing the resources of modelling centres making it impossible for many to consider contributing to all the proposed experiments. By providing a better basis to help modelling centres decide exactly which subset of experiments to perform, it was thought that it might be possible to minimize fragmented participation in CMIP6. A more federated experimental protocol could also encourage modelling centres to develop intercomparison studies based on their own strategic goals.
Third, some centres expressed the view that the punctuated structure of CMIP had begun to distort the model development process. Defining a protocol that allowed modelling centres to decouple their model development from the CMIP schedule would offer additional flexibility, and perhaps encourage modelling centres to finalize their models and submit some of their results sooner on their own schedule.
Fourth and finally, many groups expressed a desire for particular phases of CMIP to be more than just a collection of MIPs, but rather to reflect the strategic goals of the climate science community as, for instance, articulated by WCRP. By focusing a particular phase of CMIP around specific scientific issues, it was felt that the modelling resources could be more effectively applied to those scientific questions that had matured to a point where coordinated activities were expected to have substantial impact. A variety of mechanisms were proposed and intensely debated to address these issues. The outcome of these discussions is embodied in the new CMIP structure, which has three major components. First, the identification of a handful of common experiments, the Diagnostic, Evaluation and Characterization of Klima (DECK) experiments (klima is Greek for "climate"), and CMIP historical simulations, which can be used to establish model characteristics and serve as a model's entry card for participating in one of CMIP's phases or in other MIPs organized between CMIP phases, as depicted in Fig. 1. Second, common standards, coordination, infrastructure, and documentation that facilitate the distribution of model outputs and the characterization of the model ensemble, and third, the adoption of a more federated structure, building on more autonomous CMIP-Endorsed MIPs.

Figure 1. CMIP evolution. CMIP will evolve but the DECK will provide continuity across phases.

Geosci. Model Dev., 9, 2016, www.geosci-model-dev.net/9/1937/2016/
Realizing the idea of a particular phase of CMIP being centred on a collection of more autonomous MIPs required the development of procedures for soliciting and evaluating MIPs in light of the scientific focus chosen for CMIP6. These procedures were developed and implemented by the CMIP Panel. The responses to the CMIP5 survey helped inform a series of workshops and resulted in a draft experiment design for CMIP6. This initial design for CMIP6 was published in early 2014 (Meehl et al., 2014) and was open for comments from the wider community until mid-September 2014. In parallel to the open review of the design, the CMIP Panel distributed an open call for proposals for MIPs in April 2014. These proposals were broadly reviewed within WCRP with the goal to encourage and enhance synergies among the different MIPs, to avoid overlapping experiments, to fill gaps, and to help ensure that the WCRP Grand Science Challenges would be addressed. Revised MIP proposals were requested and evaluated by the CMIP Panel in summer 2015. The selection of MIPs was based on the CMIP Panel's evaluation of ten endorsement criteria (Table 1). To ensure community engagement, an important criterion was that enough modelling groups (at least eight) were willing to perform all of the MIP's highest priority (Tier 1) experiments and provide all the requested diagnostics needed to answer at least one of its leading science questions. For each of the selected CMIP6-Endorsed MIPs it turned out that at least ten modelling groups indicated their intent to participate in Tier 1 experiments at least, thus attesting to the wide appeal and level of science interest from the climate modelling community.

Table 1. The ten MIP endorsement criteria.
1. The MIP and its experiments address at least one of the key science questions of CMIP6.
2. The MIP demonstrates connectivity to the DECK experiments and the CMIP6 historical simulations.
3. The MIP adopts the CMIP modelling infrastructure standards and conventions.
4. All experiments are tiered, well defined, and useful in a multi-model context and do not overlap with other CMIP6 experiments.
5. Unless a Tier 1 experiment differs only slightly from another well-established experiment, it must already have been performed by more than one modelling group.
6. A sufficient number of modelling centres (~8) are committed to performing all of the MIP's Tier 1 experiments and providing all the requested diagnostics needed to answer at least one of its science questions.
7. The MIP presents an analysis plan describing how it will use all proposed experiments, any relevant observations, and specially requested model output to evaluate the models and address its science questions.
8. The MIP has completed the MIP template questionnaire.
9. The MIP contributes a paper on its experimental design to the GMD CMIP6 special issue.
10. The MIP considers reporting on the results by co-authoring a paper with the modelling groups.

The DECK and CMIP historical simulations
The DECK comprises four baseline experiments: (a) a historical Atmospheric Model Intercomparison Project (amip) simulation, (b) a pre-industrial control simulation (piControl or esm-piControl), (c) a simulation forced by an abrupt quadrupling of CO2 (abrupt-4×CO2) and (d) a simulation forced by a 1 % yr⁻¹ CO2 increase (1pctCO2). CMIP also includes a historical simulation (historical or esm-hist) that spans the period of extensive instrumental temperature measurements from 1850 to the present. In naming the experiments, we distinguish between simulations with CO2 concentrations calculated and anthropogenic sources of CO2 prescribed (esm-piControl and esm-hist) and simulations with prescribed CO2 concentrations (all others). Hereafter, models that can calculate atmospheric CO2 concentration and account for the fluxes of CO2 between the atmosphere, the ocean, and biosphere are referred to as Earth System Models (ESMs).
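The two idealized CO2 pathways can be illustrated with a minimal sketch. The function names and the starting concentration of 284 ppm are assumptions made for illustration only; the actual 1850 value is specified in the forcing data sets, not here.

```python
import math


def co2_1pct(c0_ppm, year):
    """CO2 concentration after `year` years of 1 % per year compound growth."""
    return c0_ppm * 1.01 ** year


def co2_abrupt4x(c0_ppm):
    """CO2 concentration in the abrupt quadrupling experiment (constant after the jump)."""
    return 4.0 * c0_ppm


def years_to_multiple(factor, rate=0.01):
    """Years of compound growth at `rate` needed to multiply the concentration by `factor`."""
    return math.log(factor) / math.log(1.0 + rate)


C0_PPM = 284.0  # assumed illustrative control value, not taken from the text

if __name__ == "__main__":
    print(f"doubling after    ~{years_to_multiple(2.0):.1f} years")
    print(f"quadrupling after ~{years_to_multiple(4.0):.1f} years")
    print(f"1pctCO2 at year 70: {co2_1pct(C0_PPM, 70):.1f} ppm")
    print(f"abrupt-4xCO2:       {co2_abrupt4x(C0_PPM):.1f} ppm")
```

Under compound growth of 1 % per year the concentration doubles after roughly 70 years and quadruples after roughly 140 years, which is why the gradual experiment reaches the abrupt experiment's CO2 level only late in the run.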
The DECK experiments are chosen (1) to provide continuity across past and future phases of CMIP, (2) to evolve as little as possible over time, (3) to be well established, and incorporate simulations that modelling centres perform anyway as part of their own development cycle, and (4) to be relatively independent of the forcings and scientific objectives of a specific phase of CMIP. The four DECK experiments and the CMIP historical simulations are well suited for quantifying and understanding important climate change response characteristics. Modelling groups also commonly perform simulations of the historical period, but reconstructions of the external conditions imposed on historical runs (e.g. land-use changes) continue to evolve significantly, influencing the simulated climate. In order to distinguish among the historical simulations performed under different phases of CMIP, the historical simulations are labelled with the phase (e.g. "CMIP5 historical" or "CMIP6 historical"). A similar argument could be made to exclude the AMIP experiments from the DECK. However, the AMIP experiments are simpler, more routine, and the dominating role of sea surface temperatures and the focus on recent decades means that for most purposes AMIP experiments from different phases of CMIP are more likely to provide the desired continuity.
The persistence and consistency of the DECK will make it possible to track changes in performance and response characteristics over future generations of models and CMIP phases. Although the set of DECK experiments is not expected to evolve much, additional experiments may become sufficiently well established as benchmarks (routinely run by modelling groups as they develop new model versions) that in the future they might be migrated into the DECK. The common practice of including the DECK in model development efforts means that models can contribute to CMIP without carrying out additional computationally burdensome experiments. All of the DECK and the historical simulations were included in the core set of experiments performed under CMIP5 (Taylor et al., 2012), and all but the abrupt-4×CO2 simulation were included in even earlier CMIP phases.
Under CMIP, credentials of the participating atmosphere-ocean general circulation models (AOGCMs) and ESMs are established by performing the DECK and CMIP historical simulations, so these experiments are required from all models. Together these experiments document the mean climate and response characteristics of models. They should be run for each model configuration used in a CMIP-Endorsed MIP. A change in model configuration includes any change that might affect its simulations other than noise expected from different realizations. This would include, for example, a change in model resolution, physical processes, or atmospheric chemistry treatment. If an ESM is used in both CO2-emission-driven mode and CO2-concentration-driven mode in subsequent CMIP6-Endorsed MIPs, then both emission-driven and concentration-driven control and historical simulations should be done, and they will be identical in all forcings except the treatment of CO2.
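The entry card requirement can be read as a simple checklist per model configuration. The sketch below is illustrative only: the function names are hypothetical, and it assumes a configuration that runs the concentration-driven set and optionally also the emission-driven mode. The experiment labels follow the short names used in this paper.

```python
# Experiment short names as used in the text above.
DECK = ("amip", "piControl", "abrupt-4×CO2", "1pctCO2")
HISTORICAL = "historical"
EMISSION_DRIVEN_EXTRAS = ("esm-piControl", "esm-hist")


def entry_card_experiments(also_emission_driven=False):
    """Return the entry card experiments for one model configuration.

    Assumption: the configuration is run in concentration-driven mode, and
    `also_emission_driven` marks an ESM that is additionally used in
    CO2-emission-driven mode in a CMIP6-Endorsed MIP.
    """
    required = list(DECK) + [HISTORICAL]
    if also_emission_driven:
        required += list(EMISSION_DRIVEN_EXTRAS)
    return required


def missing_experiments(completed, also_emission_driven=False):
    """List required entry card experiments that are not in `completed`."""
    done = set(completed)
    return [e for e in entry_card_experiments(also_emission_driven) if e not in done]


if __name__ == "__main__":
    finished = ["amip", "piControl", "historical"]
    print(missing_experiments(finished))
    print(missing_experiments(finished, also_emission_driven=True))
```

Because any change in resolution, physics, or chemistry counts as a new configuration, such a check would be repeated for each configuration rather than once per modelling centre.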
The forcing data sets that will drive the DECK and CMIP6 historical simulations are described separately in a series of invited contributions to this special issue. These articles also include some discussion of uncertainty in the data sets. The data will be provided by the respective author teams and made publicly available through the ESGF using common metadata and formats.
The historical forcings are based as far as possible on observations and cover the period 1850-2014. These include:
- emissions of short-lived species and long-lived greenhouse gases (GHGs),
- GHG concentrations,
- global gridded land-use forcing data sets,
- solar forcing,
- stratospheric aerosol data set (volcanoes),
- AMIP sea surface temperatures (SSTs) and sea ice concentrations (SICs),
- for simulations with prescribed aerosols, a new approach to prescribe aerosols in terms of optical properties and fractional change in cloud droplet effective radius to provide a more consistent representation of aerosol forcing, and
- for models without ozone chemistry, time-varying gridded ozone concentrations and nitrogen deposition.
Some models might require additional forcing data sets (e.g. black carbon on snow or anthropogenic dust). Allowing model groups to use different forcing (see footnote 1) data sets might better sample uncertainty, but makes it more difficult to assess the uncertainty in the response of models to the best estimate of the forcing available to a particular CMIP phase. To avoid conflating uncertainty in the response of models to a given forcing with uncertainty in the forcing itself, it is strongly preferred for models to be integrated with the same forcing in the entry card historical simulations, and for forcing uncertainty to be sampled in supplementary simulations that are proposed as part of DAMIP. In any case it is important that all forcing data sets are documented and are made available alongside the model output on the ESGF. Likewise, to the extent modelling centres simplify forcings, for instance by regridding or smoothing in time or some other dimension, this should also be documented.

Footnote 1. Here, we distinguish between an applied input perturbation (e.g. the imposed change in some model constituent, property, or boundary condition), which we refer to somewhat generically as a "forcing", and radiative forcing, which can be precisely defined. Even if the forcings are identical, the resulting radiative forcing depends on a model's radiation scheme (among other factors) and will differ among models.
For the future scenarios selected by ScenarioMIP, forcings are provided by the integrated assessment model (IAM) community for the period 2015-2100 (or until 2300 for the extended simulations). For atmospheric emissions and concentrations as well as for land use, the forcings are harmonized across IAMs and scenarios using a similar procedure as in CMIP5 (van Vuuren et al., 2011). This procedure ensures consistency with historical forcing data sets and between the different forcing categories. The selection of scenarios and the main characteristics are described elsewhere in this special issue, while the underlying IAM scenarios are described in a special issue in Global Environmental Change.
An important gap identified in CMIP5, and in previous CMIP phases, was a lack of careful quantification of the radiative forcings from the different specified external forcing factors (e.g. GHGs, sulphate aerosols) in each model (Stouffer et al., 2015). This has impaired attempts to identify reasons for differences in model responses. The effective radiative forcing or ERF component of the Radiative Forcing MIP (RFMIP) includes fixed SST simulations to diagnose the forcing (RFMIP-lite), which are further detailed in the corresponding contribution to this special issue. Although not included as part of the DECK, in recognition of this deficiency in past phases of CMIP we strongly encourage all CMIP6 modelling groups to participate in RFMIP-lite. The modest additional effort would enable the radiative forcing to be characterized for both historic and future scenarios across the model ensemble. Knowing this forcing would lead to a step change in efforts to understand the spread of model responses for CMIP6 and contribute greatly to answering one of CMIP6's science questions.
An overview of the main characteristics of the DECK and CMIP6 historical simulations appears in Table 2. Here we briefly describe these experiments. Detailed specifications for the DECK and CMIP6 historical simulations are provided in Appendix A and are summarized in Table A1.

The DECK
The AMIP and pre-industrial control simulations of the DECK provide opportunities for evaluating the atmospheric model and the coupled system, and in addition they establish a baseline for performing many of the CMIP6 experiments. Many experiments branch from, and are compared with, the pre-industrial control. Similarly, a number of diagnostic atmospheric experiments use AMIP as a control. The idealized CO2-forced experiments in the DECK (abrupt-4×CO2 and 1pctCO2), despite their simplicity, can reveal fundamental forcing and feedback response characteristics of models.

Table 2. Overview of DECK and CMIP6 historical simulations providing the experiment short names, the CMIP6 labels, brief experiment descriptions, the forcing methods, as well as the start and end year and minimum number of years per experiment and its major purpose. The DECK and CMIP6 historical simulation are used to characterize the CMIP model ensemble. Given resource limitations, these entry card simulations for CMIP include only one ensemble member per experiment. However, we strongly encourage model groups to submit at least three ensemble members for the CMIP historical simulation as requested in DAMIP. Large ensembles of AMIP simulations are also encouraged. In the "forcing methods" column, "All" means "volcanic, solar, and anthropogenic forcings". All experiments are started on 1 January and end on 31 December of the specified years.

For nearly 3 decades, AMIP simulations (Gates et al., 1999) have been routinely relied on by modelling centres to help in the evaluation of the atmospheric component of their models. In AMIP simulations, the SSTs and SICs are prescribed based on observations. The idea is to analyse and evaluate the atmospheric and land components of the climate system when they are constrained by the observed ocean conditions. These simulations can help identify which model errors originate in the atmosphere, land, or their interactions, and they have proven useful in addressing a great variety of questions pertaining to recent climate changes. The AMIP simulations performed as part of the DECK cover at least the period from January 1979 to December 2014. The end date will continue to evolve as the SSTs and SICs are updated with new observations. Besides prescription of ocean conditions in these simulations, realistic forcings are imposed that should be identical to those applied in the CMIP historical simulations. Large ensembles of AMIP simulations are encouraged as they can help to improve the signal-to-noise ratio (Li et al., 2015).
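The benefit of larger ensembles can be made concrete with a back-of-the-envelope sketch. It assumes, as a simplification not stated in the text, that the members share the same forced signal and that their internal variability is independent with a common standard deviation, so that the noise of the ensemble mean falls as one over the square root of the ensemble size. All function names are illustrative.

```python
import math


def ensemble_mean_noise(sigma, n_members):
    """Standard deviation of the ensemble-mean noise for independent members."""
    return sigma / math.sqrt(n_members)


def signal_to_noise(signal, sigma, n_members):
    """Signal-to-noise ratio of the ensemble mean."""
    return signal / ensemble_mean_noise(sigma, n_members)


def members_needed(signal, sigma, target_snr):
    """Smallest ensemble size whose mean reaches `target_snr`."""
    return math.ceil((target_snr * sigma / signal) ** 2)


if __name__ == "__main__":
    # Hypothetical numbers: a forced signal half as large as the internal noise.
    for n in (1, 4, 16):
        print(n, signal_to_noise(0.5, 1.0, n))
    print("members for SNR of 2:", members_needed(0.5, 1.0, 2.0))
```

In this idealized setting, quadrupling the ensemble size doubles the signal-to-noise ratio of the ensemble mean.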
The remaining three experiments in the DECK are premised on the coupling of the atmospheric and oceanic circulation. The pre-industrial control simulation (piControl or esm-piControl) is performed under conditions chosen to be representative of the period prior to the onset of large-scale industrialization, with 1850 being the reference year. Historically, the industrial revolution began in the 18th century, and in nature the climate in 1850 was not stable as it was already changing due to prior historical changes in radiative forcings. In CMIP6, however, as in earlier CMIP phases, the control simulation is an attempt to produce a stable quasi-equilibrium climate state under 1850 conditions. When discussing and analysing historical and future radiative forcings, it needs to be recognized that the radiative forcing in 1850 due to anthropogenic greenhouse gas increases alone was already around 0.25 W m⁻² (Cubasch, 2013) although aerosols might have offset that to some extent. In addition, there were other pre-1850 secular changes, for example, in land use, and as a result, global net annual emissions of carbon from land use and land-use change already were responsible in 1850 for about 0.6 Pg C yr⁻¹ (Houghton, 2010). Under the assumptions of the control simulation, however, there are no secular changes in forcing, so the concentrations and/or sources of atmospheric constituents (e.g. GHGs and emissions of short-lived species) as well as land use are held fixed, as are Earth's orbital characteristics. Because of the absence of both naturally occurring changes in forcing (e.g. volcanoes, orbital or solar changes) and human-induced changes, the control simulation can be used to study the unforced internal variability of the climate system.
An initial climate spin-up portion of a control simulation, during which the climate begins to come into balance with the forcing, is usually performed. At the end of the spin-up period, the piControl starts. The piControl serves as a baseline for experiments that branch from it. To account for the effects of any residual drift, it is required that the piControl simulation extends as far beyond the branching point as any experiment to which it will be compared. Only then can residual climate drift in an experiment be removed so that it is not misinterpreted as part of the model's forced response. The recommended minimum length for the piControl is 500 years.
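One common way to apply this requirement is to fit a linear trend to the segment of the piControl that runs parallel to the experiment and to subtract that fit from the experiment. The sketch below assumes annual-mean global time series held in plain Python lists and a linear drift; both the function names and the choice of a linear fit are illustrative assumptions, not a prescription from this paper.

```python
def linear_fit(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    if n < 2:
        raise ValueError("need at least two points for a linear fit")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def remove_control_drift(experiment, control_parallel):
    """Return experiment anomalies relative to the fitted control drift.

    `control_parallel` must start at the branch point and be at least as
    long as `experiment`, mirroring the requirement stated in the text.
    """
    n = len(experiment)
    if len(control_parallel) < n:
        raise ValueError("control must extend at least as far as the experiment")
    years = list(range(n))
    slope, intercept = linear_fit(years, control_parallel[:n])
    return [experiment[t] - (intercept + slope * t) for t in years]


if __name__ == "__main__":
    # Synthetic example: control drifts by 0.001 K/yr, experiment adds 0.02 K/yr.
    control = [14.0 + 0.001 * t for t in range(150)]
    experiment = [14.0 + 0.021 * t for t in range(100)]
    anomalies = remove_control_drift(experiment, control)
    print(round(anomalies[-1], 3))
```

The length check makes the protocol requirement explicit: without a control segment covering the whole experiment, the drift over the later years could not be estimated and might be mistaken for a forced response.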
The two DECK climate change experiments branch from some point in the 1850 control simulation and are designed to document basic aspects of the climate system response to greenhouse gas forcing. In the first, the CO2 concentration is immediately and abruptly quadrupled from the global annual mean 1850 value that is used in piControl. This abrupt-4×CO2 simulation has proven to be useful for characterizing the radiative forcing that arises from an increase in atmospheric CO2 as well as changes that arise indirectly due to the warming. It can also be used to estimate a model's equilibrium climate sensitivity (ECS, Gregory et al., 2004). In the second, the CO2 concentration is increased gradually at a rate of 1 % per year. This experiment has been performed in all phases of CMIP since CMIP2, and serves as a consistent and useful benchmark for analysing model transient climate response (TCR). The TCR takes into account the rate of ocean heat uptake which governs the pace of all time-evolving climate change (e.g. Murphy and Mitchell, 1995). In addition to the TCR, the 1 % CO2 integration with ESMs that include explicit representation of the carbon cycle allows the calculation of the transient climate response to cumulative carbon emissions (TCRE), defined as the transient global average surface temperature change per unit of accumulated CO2 emissions (IPCC, 2013). Despite their simplicity, these experiments provide a surprising amount of insight into the behaviour of models subject to more complex forcing (e.g. Bony et al., 2013; Geoffroy et al., 2013).
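As an illustration of how these two metrics are often diagnosed, the sketch below regresses the top-of-atmosphere net radiative imbalance N against the global mean warming ΔT from abrupt-4×CO2 (the regression approach of Gregory et al., 2004) and averages the 1pctCO2 warming around the time of CO2 doubling. Several choices are assumptions not specified in the text: the inputs are annual global mean anomalies relative to piControl, the ECS is taken as half the extrapolated equilibrium warming for quadrupled CO2, and the TCR window is years 61-80. The function names are hypothetical.

```python
def linear_fit(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def gregory_ecs(delta_t, net_toa):
    """Estimate (ECS, 4xCO2 forcing, feedback parameter) from abrupt-4×CO2.

    Fits N = F - lambda * dT. The intercept is the forcing F for quadrupled
    CO2, and dT at N = 0 is the equilibrium warming for 4xCO2, halved here
    (assumed convention) to give the value per CO2 doubling.
    """
    slope, intercept = linear_fit(delta_t, net_toa)
    forcing_4x = intercept
    feedback = -slope
    return forcing_4x / feedback / 2.0, forcing_4x, feedback


def tcr(delta_t_1pct):
    """Mean warming over years 61-80 of 1pctCO2 (assumed averaging window)."""
    if len(delta_t_1pct) < 80:
        raise ValueError("need at least 80 years of the 1pctCO2 run")
    window = delta_t_1pct[60:80]
    return sum(window) / len(window)


if __name__ == "__main__":
    # Synthetic data only, chosen to lie exactly on a straight line.
    warming = [1.0, 2.0, 3.0, 4.0, 5.0]
    imbalance = [7.4 - 1.2 * dt for dt in warming]
    print(gregory_ecs(warming, imbalance))
    print(tcr([0.025 * year for year in range(140)]))
```

The regression uses only the transient years of the run, so no equilibrated simulation is needed; the price is the assumption that the feedback parameter stays constant as the model warms.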

CMIP historical simulations
In addition to the DECK, CMIP requests models to simulate the historical period, defined to begin in 1850 and extend to the near present. The CMIP historical simulation and its CO2-emission-driven counterpart, esm-hist, branch from the piControl and esm-piControl, respectively (see details in Sect. A1.2). These simulations are forced, based on observations, by evolving, externally imposed forcings such as solar variability, volcanic aerosols, and changes in atmospheric composition (GHGs and aerosols) caused by human activities. The CMIP historical simulations provide rich opportunities to assess model ability to simulate climate, including variability and century timescale trends (e.g. Flato et al., 2013). These simulations can also be analysed to determine whether climate model forcing and sensitivity are consistent with the observational record, which provides opportunities to better bound the magnitude of aerosol forcing (e.g. Stevens, 2015). In addition they, along with the control run, provide the baseline simulations for performing formal detection and attribution studies (e.g. Stott et al., 2006) which help uncover the causes of forced climate change.
As with performing control simulations, models that include representation of the carbon cycle should normally perform two different CMIP historical simulations: one with prescribed CO2 concentration and the other with prescribed CO2 emissions (accounting explicitly for fossil fuel combustion). In the second, CO2 concentrations are predicted by the model. The treatment of other GHGs should be identical in both simulations. Both types of simulation are useful in evaluating how realistically the model represents the response of the carbon cycle to anthropogenic CO2 emissions, but the prescribed concentration simulation enables these more complex models to be evaluated fairly against those models without representation of carbon cycle processes.

Common standards, infrastructure, and documentation
A key to the success of CMIP and one of the motivations for incorporating a wide variety of coordinated modelling activities under a single framework in a specific phase of CMIP (now CMIP6) is the desire to reduce duplication of effort, minimize operational and computational burdens, and establish common practices in producing and analysing large amounts of model output. To enable automated processing of output from dozens of different models, CMIP has led the way in encouraging adoption of data standards (governing structure and metadata) that facilitate development of software infrastructure in support of coordinated modelling activities. The ESGF has capitalized on this standardization to provide access to CMIP model output hosted by institutions around the world. As the complexity of CMIP has increased and as the potential use of model output expands beyond the research community, the evolution of the climate modelling infrastructure requires enhanced coordination. To help in this regard, the WGCM Infrastructure Panel (WIP) was set up, and is now providing guidance on requirements and establishing specifications for model output, model and simulation documentation, and archival and delivery systems for CMIP6 data. In parallel to the development of the CMIP6 experiment design, the ESGF capabilities are being further extended and improved. In CMIP5, with over 1,000 different model/experiment combinations, a first attempt was also made to capture structured metadata describing the models and the simulations themselves. Based upon the Common Information Model (CIM, Lawrence et al., 2012), tools were provided to capture documentation of models and simulations. This effort is now continuing under the banner of the international ES-DOC activity, which establishes agreements on common Controlled Vocabularies (CVs) to describe models and simulations. Modelling groups will be required to provide documentation following a common template and adhering to the CVs.
With the documentation recorded uniformly across models, researchers will, for example, be able to use web-based tools to determine differences in model versions and differences in forcing and other conditions that affect each simulation. Further details on the CMIP6 infrastructure can be found in the WIP contribution to this special issue. A more routine benchmarking and evaluation of the models is envisaged to be a central part of CMIP6. As noted above, one purpose of the DECK and CMIP historical simulations is to provide a basis for documenting model simulation characteristics. Towards that end an infrastructure is being developed to allow analysis packages to be routinely executed whenever new model experiments are contributed to the CMIP archive at the ESGF. These efforts utilize observations served by the ESGF contributed from the obs4MIPs (Teixeira et al., 2014) and ana4MIPs projects. Examples of available tools that target routine evaluation in CMIP include the PCMDI metrics software and the Earth System Model Evaluation Tool (ESMValTool, Eyring et al., 2016), which brings together established diagnostics such as those used in the evaluation chapter of IPCC AR5 (Flato et al., 2013). The ESMValTool also integrates other packages, such as the NCAR Climate Variability Diagnostics Package (Phillips et al., 2014), or diagnostics such as the cloud regime metric (Williams and Webb, 2009) developed by the Cloud Feedback MIP (CFMIP) community. These tools can be used to broadly and comprehensively characterize the performance of the wide variety of models and model versions that will contribute to CMIP6. This evaluation activity can, compared with CMIP5, more quickly inform users of model output, as well as the modelling centres, of the strengths and weaknesses of the simulations, including the extent to which long-standing model errors remain evident in newer models.
Building such a community-based capability is not meant to replace how CMIP research is currently performed but rather to complement it. These tools can also be used to compute derived variables or indices alongside the ESGF, and their output could be provided back to the distributed ESGF archive.
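One elementary ingredient of such routine evaluation is a summary statistic comparing a simulated field with an observational reference. The sketch below computes a root-mean-square error weighted by the cosine of latitude, a common approximation of grid-cell area on a regular latitude-longitude grid. It is an illustration only and is not taken from the PCMDI metrics software or the ESMValTool; the function name and the flat-list data layout are assumptions.

```python
import math
from typing import Sequence


def weighted_rmse(lats_deg: Sequence[float],
                  model: Sequence[float],
                  obs: Sequence[float]) -> float:
    """Cosine-latitude-weighted RMSE between a model field and observations.

    All three sequences hold one entry per grid point, in the same order.
    """
    if not (len(lats_deg) == len(model) == len(obs)):
        raise ValueError("inputs must have the same length")
    weights = [math.cos(math.radians(lat)) for lat in lats_deg]
    total = sum(weights)
    if total <= 0.0:
        raise ValueError("weights must sum to a positive value")
    mse = sum(w * (m - o) ** 2 for w, m, o in zip(weights, model, obs)) / total
    return math.sqrt(mse)
```

The weighting prevents the many small grid cells near the poles from dominating the score, so that errors are counted roughly in proportion to the area they cover.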

Scientific focus of CMIP6
In addition to the DECK and CMIP historical simulations, a number of additional experiments will colour a specific phase of CMIP, now CMIP6. These experiments are likely to change from one CMIP phase to the next. To maximize the relevance and impact of CMIP6, it was decided to use the WCRP Grand Science Challenges (GCs) as the scientific backdrop of the CMIP6 experimental design. By promoting research on critical science questions for which specific gaps in knowledge have hindered progress so far, but for which new opportunities and more focused efforts raise the possibility of significant progress on the timescale of 5-10 years, these GCs constitute a main component of the WCRP strategy to accelerate progress in climate science (Brasseur and Carlson, 2015). They relate to (1) advancing understanding of the role of clouds in the general atmospheric circulation and climate sensitivity, (2) assessing the response of the cryosphere to a warming climate and its global consequences, (3) understanding the factors that control water availability over land (Trenberth and Asrar, 2014), (4) assessing climate extremes, what controls them, how they have changed in the past and how they might change in the future, (5) understanding and predicting regional sea level change and its coastal impacts, (6) improving near-term climate predictions, and (7) determining how biogeochemical cycles and feedback control greenhouse gas concentrations and climate change.
These GCs will be using the full spectrum of observational, modelling and analytical expertise across the WCRP, and in terms of modelling most GCs will address their specific science questions through a hierarchy of numerical models of different complexities. Global coupled models obviously constitute an essential element of this hierarchy, and CMIP6 experiments will play a prominent role across all GCs by helping to answer the following three CMIP6 science questions: How does the Earth system respond to forcing? What are the origins and consequences of systematic model biases? How can we assess future climate change given internal climate variability, climate predictability, and uncertainties in scenarios?
These three questions will be at the centre of CMIP6. Science topics related specifically to CMIP6 will be addressed through a range of CMIP6-Endorsed MIPs that are organized by the respective communities and overseen by the CMIP Panel (Fig. 2). Through these different MIPs and their connection to the GCs, the goal is to fill some of the main scientific gaps of previous CMIP phases. This includes, in particular, facilitating the identification and interpretation of model systematic errors, improving the estimate of radiative forcings in past and future climate change simulations, facilitating the identification of robust climate responses to aerosol forcing during the historical period, better accounting of the impact of short-term forcing agents and land use on climate, better understanding the mechanisms of decadal climate variability, along with many other issues not addressed satisfactorily in CMIP5 (Stouffer et al., 2015). In endorsing a number of these MIPs, the CMIP Panel acted to minimize overlaps among the MIPs and to reduce the burden on modelling groups, while maximizing the scientific complementarity and synergy among the different MIPs.

The CMIP6-Endorsed MIPs
Close to 30 suggestions for CMIP6 MIPs have been received so far, of which 21 MIPs were eventually endorsed and invited to participate (Table 3). Of those not selected some were asked to work with other proposed MIPs with overlapping science goals and objectives. Of the 21 CMIP6-Endorsed MIPs, 4 are diagnostic in nature, which means that they define and analyse additional output, but do not require additional experiments. In the remaining 17 MIPs, a total of around 190 experiments have been proposed resulting in 40 000 model simulation years with around half of these in Tier 1. The CMIP6-Endorsed MIPs show broad coverage and distribution across the three CMIP6 science questions, and all are linked to the WCRP Grand Science Challenges (Fig. 3).
Each of the 21 CMIP6-Endorsed MIPs is described in a separate invited contribution to this special issue. These contributions will detail the goal of the MIP and the major scientific gaps the MIP is addressing, and will specify what is new compared to CMIP5 and previous CMIP phases. The contributions will include a description of the experimental design and scientific justification of each of the experiments for Tier 1 (and possibly beyond), and will link the experiments and analysis to the DECK and CMIP6 historical simulations. They will additionally include an analysis plan to fully justify the resources used to produce the various requested variables, and if the analysis plan is to compare model results to observations, the contribution will highlight possible model diagnostics and performance metrics specifying whether the comparison entails any particular requirement for the simulations or outputs (e.g. the use of observational simulators). In addition, possible observations and reanalysis products for model evaluation are discussed and the MIPs are encouraged to help facilitate their use by contributing them to the obs4MIPs/ana4MIPs archives at the ESGF (see Sect. 3.3). In some MIPs, additional forcings beyond those used in the DECK and CMIP6 historical simulations are required, and these are described in the respective contribution as well.

Table 3. List of CMIP6-Endorsed MIPs along with the long name of the MIP, the primary goal(s) and the main CMIP6 science theme as displayed in Fig. 2. Each of these MIPs is described in more detail in a separate contribution to this special issue. MIPs marked with * are diagnostic MIPs.

(a) Improving understanding of physical processes in global monsoons system; (b) better simulating the mean state, interannual variability, and long-term changes of global monsoons.
Regional phenomena

HighResMIP
High-Resolution Model Intercomparison Project
Assessing the robustness of improvements in the representation of important climate processes with weather-resolving global model resolutions (∼ 25 km or finer), within a simplified framework using the physical climate system only with constrained aerosol forcing.

Quantifying the effects of land use on climate and biogeochemical cycling (past-future), and assessing the potential for alternative land management strategies to mitigate climate change.
Impacts

DynVarMIP *
Dynamics and Variability Model Intercomparison Project
Defining and analysing diagnostics that enable a mechanistic approach to confront model biases and understand the underlying causes behind circulation changes with a particular emphasis on the two-way coupling between the troposphere and the stratosphere.
Clouds/Circulation

SIMIP *
Sea Ice Model Intercomparison Project
Understanding the role of sea ice and its response to climate change by defining and analysing a comprehensive set of variables and process-oriented diagnostics that describe the sea ice state and its atmospheric and ocean forcing.
Ocean/Land/Ice

VIACS AB *
Vulnerability, Impacts, Adaptation and Climate Services Advisory Board
Facilitating a two-way dialogue between the CMIP6 modelling community and VIACS experts, who apply CMIP6 results for their numerous research and climate services, towards an informed construction of model scenarios and simulations and the design of online diagnostics, metrics, and visualization of relevance to society.
Impacts
A number of MIPs are developments and/or continuation of long-standing science themes. These include MIPs specifically addressing science questions related to cloud feedback and the understanding of spatial patterns of circulation and precipitation (CFMIP), carbon cycle feedback, and the understanding of changes in carbon fluxes and stores (C4MIP), detection and attribution (DAMIP) that newly includes 21st-century GHG-only simulations allowing the projected responses to GHGs and other forcings to be separated and scaled to derive observationally constrained projections, and paleoclimate (PMIP), which assesses the credibility of the model response to forcing outside the range of recent variability. These MIPs reflect the importance of key forcing and feedback processes in understanding past, present, and future climate change and have developed new experiments and science plans focused on emerging new directions that will be at the centre of the WCRP Grand Science Challenges. A few new MIPs have arisen directly from gaps in understanding in CMIP5 (Stouffer et al., 2015), for example, poor quantification of radiative forcing (RFMIP), better understanding of ocean heat uptake and sea level rise (FAFMIP), and understanding of model response to volcanic forcing (VolMIP).
Since CMIP5, other MIPs have emerged as the modelling community has developed more complex ESMs with interactive components beyond the carbon cycle. These include the consistent quantification of forcings and feedback from aerosols and atmospheric chemistry (AerChemMIP), and, for the first time in CMIP, modelling of sea level rise from land ice sheets (ISMIP6).
Some MIPs specifically target systematic biases focusing on improved understanding of the sea ice state and its atmospheric and oceanic forcing (SIMIP), the physical and biogeochemical aspects of the ocean (OMIP), land, snow and soil moisture processes (LS3MIP), and improved understanding of circulation and variability with a focus on stratosphere-troposphere coupling (DynVarMIP). With the increased emphasis in the climate science community on the need to represent and understand changes in regional circulation, systematic biases are also addressed on a more regional scale by the Global Monsoon MIP (GMMIP) and a first coordinated activity on high-resolution modelling (HighResMIP).
For the first time, future scenario experiments, previously coordinated centrally as part of the CMIP5 core experiments, will be run as an MIP ensuring clear definition and well-coordinated science questions. ScenarioMIP will run a new set of future long-term (century timescale) integrations engaging input from both the climate science and integrated assessment modelling communities. The new scenarios are based on a matrix that uses the shared socioeconomic pathways (SSPs, O'Neill et al., 2015) and forcing levels of the Representative Concentration Pathways (RCP) as axes. As a set, they span the same range as the CMIP5 RCPs (Moss et al., 2010), but fill critical gaps for intermediate forcing levels and questions, for example, on short-lived species and land use. The near-term experiments (10-30 years) are coordinated by the decadal climate prediction project (DCPP) with improvements expected, for example, from the initialization of additional components beyond the ocean and from a more detailed process understanding and evaluation of the predictions to better identify sources and limits of predictability.
Other MIPs include specific future mitigation options, e.g. the land use MIP (LUMIP) that is for the first time in CMIP isolating regional land management strategies to study how different surface types respond to climate change and direct anthropogenic modifications, or the geoengineering MIP (GeoMIP), which examines climate impacts of newly proposed radiation modification geoengineering strategies.
The diagnostic MIP CORDEX will oversee the downscaling of CMIP6 models for regional climate projections. Another historic development in our field that provides, for the first time in CMIP, an avenue for a more formal communication between the climate modelling and user community is the endorsement of the vulnerability, impacts, and adaptation and climate services advisory board (VIACS AB). This diagnostic MIP requests certain key variables of interest to the VIACS community be delivered in a timely manner to be used by climate services and in impact studies.
All MIPs define output streams in the centrally coordinated CMIP6 data request for each of their own experiments as well as the DECK and CMIP6 historical simulations (see the CMIP6 data request contribution to this special issue for details). This will ensure that the required variables are stored at the frequency and resolution required to address the specific science questions and evaluation needs of each MIP and to enable a broad characterization of the performance of the CMIP6 models.
We note that only the Tier 1 MIP experiments are overseen by the CMIP Panel, but additional experiments are proposed by the MIPs in Tiers 2 and 3. We encourage the modelling groups to participate in the full suite of experiments beyond Tier 1 to address in more depth the scientific questions posed.
The call for MIP applications for CMIP6 is still open and new proposals will be reviewed at the annual WGCM meetings. However, we point out that the additional MIPs suggested after the CMIP6 data request has been finalized will have to work with the already defined model output from the DECK and CMIP6 historical simulations, or work with the modelling group to recover additional variables from their internal archives. We also point out that some experiments proposed by CMIP6-Endorsed MIPs may not be finished until after CMIP6 ends.

Summary
CMIP6 continues the pattern of evolution and adaptation characteristic of previous phases of CMIP. To centre CMIP at the heart of activities within climate science and encourage links among activities within the World Climate Research Programme (WCRP), CMIP6 has been formulated scientifically around three specific questions, amidst the backdrop of the WCRP's seven Grand Science Challenges. To meet the increasingly broad scientific demands of the climate science community, yet be responsive to the individual priorities and resource limitations of the modelling centres, CMIP has adopted a new, more federated organizational structure.
CMIP has now evolved from a centralized activity involving a large number of experiments to a federated activity, encompassing many individually designed MIPs. CMIP6 comprises 21 individual CMIP6-Endorsed MIPs and the DECK and CMIP6 historical simulations. Four of the 21 CMIP6-Endorsed MIPs are diagnostic in nature, meaning that they require additional output from models, but not additional simulations. The total amount of output from CMIP6 is estimated to be between 20 and 40 petabytes, depending on model resolution and the number of modelling centres ultimately participating in CMIP6. Questions addressed in the MIPs are wide ranging, from the climate of the distant past to the response of turbulent cloud processes to radiative forcing, from how the terrestrial biosphere influences the uptake of CO2 to how much predictability is stored in the ocean, from how to best project near-term to long-term future climate changes while considering interdependence and differences in model performance in the CMIP6 ensemble, and from what regulates the distribution of tropospheric ozone, to the influence of land-use changes on water availability. The last 3 years have been dedicated to conceiving and then planning what we now call CMIP6. Starting in 2016, the first modelling centres are expected to begin performing the DECK and uploading output on the ESGF. Forcings for the DECK and CMIP6 historical simulations will be ready before mid-2016 so that these experiments can be started, and by the end of 2016 the diverse forcings for different scenarios of future human activity will become available. Past experience suggests that most centres will complete their CMIP simulations within a few years while the analysis of CMIP6 results will likely go on for a decade or more (Fig. 4).
Through an intensified effort to align CMIP with specific scientific questions and the WCRP Grand Science Challenges, we expect CMIP6 to continue CMIP's tradition of major scientific advances. CMIP6 simulations and scientific achievements are expected to support the IPCC Sixth Assessment Report (AR6) as well as other national and international climate assessments or special reports. Ultimately scientific progress on the most pressing problems of climate variability and change will be the best measure of the success of CMIP6.

Data availability
The model output from the DECK and CMIP6 historical simulations described in this paper will be distributed through the Earth System Grid Federation (ESGF) with digital object identifiers (DOIs) assigned. As in CMIP5, the model output will be freely accessible through data portals after registration. In order to document CMIP6's scientific impact and enable ongoing support of CMIP, users are obligated to acknowledge CMIP6, the participating modelling groups, and the ESGF centres (see details on the CMIP Panel website at http://www.wcrp-climate.org/index.php/wgcm-cmip/about-cmip). Further information about the infrastructure supporting CMIP6, the metadata describing the model output, and the terms governing its use are provided by the WGCM Infrastructure Panel (WIP) in their invited contribution to this special issue. Along with the data, the provenance of the data will be recorded, and DOIs will be assigned to collections of output so that they can be appropriately cited. This information will be made readily available so that published research results can be verified and credit can be given to the modelling groups providing the data. The WIP is coordinating and encouraging the development of the infrastructure needed to archive and deliver this information. In order to run the experiments, data sets for natural and anthropogenic forcings are required. These forcing data sets are described in separate invited contributions to this special issue. The forcing data sets will be made available through the ESGF with version control and DOIs assigned.

Appendix A: Experiment specifications

A1 Specifications for the DECK
Here we provide information needed to perform the DECK, including specification of forcing and boundary conditions, initialization procedures, and minimum length of runs. This information is largely consistent with but not identical to the specifications for these experiments in CMIP5 (Taylor et al., 2009).
The DECK and CMIP6 historical simulations are requested from all models participating in CMIP. The expectation is that this requirement will be met for each model configuration used in the subsequent CMIP6-Endorsed MIPs (an entry card). For CMIP6, in the special case where the burden of the entry card simulations is prohibitive but the scientific case for including a particular model simulation is compelling (despite only partial completion of the entry card simulations), an exception to this policy can be granted on a model-by-model basis by the CMIP Panel, which will seek advice from the chairs of the affected CMIP6-Endorsed MIP.
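The entry card requirement lends itself to a simple automated check. The sketch below is illustrative only: the experiment labels are assumed to follow the CMIP6 naming (with `esm-piControl` and `esm-hist` taken as acceptable alternatives for emission-driven models), and the function name `missing_entry_card` is hypothetical.

```python
from typing import Iterable, List

# Each set lists alternative experiments that satisfy one entry card slot.
# The labels are assumptions for illustration, not an official registry.
ENTRY_CARD = [
    {"amip"},
    {"piControl", "esm-piControl"},
    {"abrupt-4xCO2"},
    {"1pctCO2"},
    {"historical", "esm-hist"},
]


def missing_entry_card(completed: Iterable[str]) -> List[List[str]]:
    """Return the entry card slots not yet satisfied by a model configuration.

    Each returned item is the sorted list of alternatives for one unmet slot.
    An empty result means the entry card is complete.
    """
    done = set(completed)
    return [sorted(alternatives) for alternatives in ENTRY_CARD
            if not (alternatives & done)]
```

A non-empty result would not by itself exclude a model, since the CMIP Panel can grant exceptions on a model-by-model basis as described above.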
CMIP6 is a cooperative effort across the international climate modelling and climate science communities. The modelling groups have all been involved in the design and implementation of CMIP6, and thus have agreed to a set of best practices proposed for CMIP6. Those best practices include having the modelling groups submit the DECK experiments and the CMIP6 historical simulations to the ESGF, as well as any CMIP6-Endorsed MIP experiments they choose to run. Additionally, the modelling groups decide what constitutes a new model version. The CMIP Panel will work with the MIP co-chairs and the modelling groups to ensure that these best practices are followed.

A1.1 AMIP simulation
As in the first simulations performed under the Atmospheric Model Intercomparison Project (AMIP, Gates et al., 1999), SSTs and SICs in AMIP experiments are prescribed consistent with observations (see details on this forcing data set in the corresponding contribution to this special issue). Land models should be configured as close as possible to the one used in the CMIP6 historical simulation including transient land use and land cover. Other external forcings including volcanic aerosols, solar variability, GHG concentrations, and anthropogenic aerosols should also be prescribed consistent with those used in the CMIP6 historical simulation (see Sect. A2 below). Even though in AMIP simulations models with an active carbon cycle will not be fully interactive, surface carbon fluxes should be archived over land.
AMIP integrations can be initialized from prior model integrations or from observations or in other reasonable ways. Depending on the treatment of snow cover, soil water content, the carbon cycle, and vegetation, these runs may require a spin-up period of several years. One might establish quasi-equilibrium conditions consistent with the model by, for example, running with ocean conditions starting earlier in the 1970s or cycling repeatedly through year 1979 before simulating the official period. Results from the spin-up period (i.e. prior to 1979) should be discarded, but the spin-up technique should be documented.
For CMIP6, AMIP simulations should cover at least the period from January 1979 through December 2014, but modelling groups are encouraged to extend their runs to the end of the observed period. Output may also be contributed from years preceding 1979 with the understanding that surface ocean conditions were less complete and in some cases less reliable then.
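The treatment of the spin-up and of the official AMIP period can be sketched as a simple filter on monthly output. This is a minimal illustration, assuming output is held as (year, month, value) tuples; the function names and the data layout are hypothetical.

```python
from typing import List, Tuple

MonthlyRecord = Tuple[int, int, float]  # (year, month, value)

AMIP_START_YEAR = 1979    # official period begins in January 1979
AMIP_MIN_END_YEAR = 2014  # and must extend at least to December 2014


def official_amip_period(records: List[MonthlyRecord]) -> List[MonthlyRecord]:
    """Keep only months from January 1979 onwards, discarding the spin-up."""
    return [r for r in records if r[0] >= AMIP_START_YEAR]


def covers_minimum_period(records: List[MonthlyRecord]) -> bool:
    """True if every month from January 1979 to December 2014 is present."""
    present = {(year, month) for year, month, _ in records}
    return all((year, month) in present
               for year in range(AMIP_START_YEAR, AMIP_MIN_END_YEAR + 1)
               for month in range(1, 13))
```

The filter treats everything before 1979 as spin-up. A group that wishes to contribute genuine pre-1979 simulation years, as permitted above, would need to distinguish those from spin-up output by other means.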
The climate found in AMIP simulations is largely determined by the externally imposed forcing, especially the ocean conditions. Nevertheless, unforced variability (noise) within the atmosphere introduces some non-deterministic variations that hamper unambiguous interpretation of apparent relationships between, for example, the year-to-year anomalies in SSTs and their consequences over land. To assess the role of unforced atmospheric variability in any particular result, modelling groups are encouraged to generate an ensemble of AMIP simulations. For most studies, a three-member ensemble, where only the initial conditions are varied, would be the minimum required, with larger size ensembles clearly of value in making more precise determination of statistical significance.
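As an illustration of how an ensemble helps separate the forced response from atmospheric noise, the following sketch computes the ensemble mean and the inter-member standard deviation at each time step. It assumes each member is a plain list of values on a common time axis; the function name is hypothetical, and the three-member minimum is enforced only to mirror the recommendation above.

```python
from statistics import mean, stdev
from typing import List, Sequence, Tuple

MIN_MEMBERS = 3  # minimum ensemble size suggested for most studies


def ensemble_mean_and_spread(
    members: Sequence[Sequence[float]],
) -> Tuple[List[float], List[float]]:
    """Return (ensemble mean, sample standard deviation) per time step."""
    if len(members) < MIN_MEMBERS:
        raise ValueError(f"need at least {MIN_MEMBERS} ensemble members")
    if len({len(m) for m in members}) != 1:
        raise ValueError("all members must cover the same time steps")
    # zip(*members) groups the values of all members at each time step.
    means = [mean(values) for values in zip(*members)]
    spreads = [stdev(values) for values in zip(*members)]
    return means, spreads
```

The ensemble mean estimates the response to the imposed ocean conditions and other forcings, while the spread gives a rough measure of unforced variability against which an apparent signal can be judged.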

A1.2 Multi-century pre-industrial control simulations
Like laboratory experiments, numerical experiments are designed to reveal cause and effect relationships. A standard way of doing this is to perform both a control experiment and a second experiment where some externally imposed experiment condition has been altered. For many CMIP experiments, including the rest of the experiments discussed in this Appendix, the control is a simulation with atmospheric composition and other conditions prescribed and held constant, consistent with best estimates of the forcing from the historical period.
Ideally the pre-industrial control (piControl) experiment for CMIP would represent a near-equilibrium state of the climate system under the imposed conditions. In reality, simulations of hundreds to many thousands of years would be required for the ocean's depths to equilibrate and for biogeochemical reservoirs to fully adjust. Available computational resources generally preclude integrations long enough to approach equilibrium, so in practice shorter runs must suffice. Usually, a piControl simulation is initialized from the control run of a different model or from observations, and then run until at least the surface climate conditions stabilize using 1850 forcings (see Stouffer et al., 2004, for further discussion). This spin-up period can be as long as several hundred years and variables that can document the spin-up behaviour should be archived (under the experiment labels piControl-spinup or esm-piControl-spinup). At the very least the length of the spin-up period should be documented. Although equilibrium is generally not achieved, the changes occurring after the spin-up period are usually found to evolve at a fairly constant rate that presumably decreases slowly as equilibrium is approached. After a few centuries, these drifts of the system mainly affect the carbon cycle and ocean below the main thermocline, but they are also manifest at the surface in a slow change in sea level. The climate drift must be removed in order to interpret experiments that use the pre-industrial simulation as a control. The usual procedure is to assume that the drift is insensitive to CMIP experiment conditions and to simply subtract the control run from the perturbed run to determine the climate change that would occur in the absence of drift.
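The subtraction described above can be sketched as follows. The example assumes annual-mean time series stored as lists and a known branch point, i.e. the index in the control run at which the perturbed run was initiated; the function name and the `branch_index` parameter are hypothetical.

```python
from typing import List, Sequence


def remove_drift(perturbed: Sequence[float],
                 control: Sequence[float],
                 branch_index: int = 0) -> List[float]:
    """Subtract the concurrent control segment from a perturbed run.

    The control segment starts at `branch_index` and must be at least as
    long as the perturbed run, which is why the control simulation has to
    extend to the end of every run initiated from it.
    """
    segment = control[branch_index:branch_index + len(perturbed)]
    if len(segment) < len(perturbed):
        raise ValueError("control run does not reach the end of the perturbed run")
    return [p - c for p, c in zip(perturbed, segment)]
```

This follows the stated assumption that the drift is insensitive to the experiment conditions, so that whatever drift is present in the control is also present, unchanged, in the perturbed run.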
Besides serving as controls for numerical experimentation, the piControl and esm-piControl are used to study the naturally occurring, unforced variability of the climate system. The only source of climate variability in a control arises from processes internal to the model, whereas in the more complicated real world, variations are also caused by external forcing factors such as solar variability and changes in atmospheric composition caused, for example, by human activities or volcanic eruptions. Consequently, the physical processes responsible for unforced variability can more easily be isolated and studied using the control run of models, rather than by analysing observations.
A DECK control simulation is required to be long enough to extend to the end of any perturbation runs initiated from it so that climate drift can be assessed and possibly removed from those runs. If, for example, a historical simulation (beginning in 1850) were initiated from the beginning of the control simulation and then were followed by a future scenario run extending to year 2300, a control run of at least 450 years would be required. As discussed above, control runs are also used to assess model-simulated unforced climate variability. The longer the control, the more precisely can variability be quantified for any given timescale. A control simulation of many hundreds of years would be needed to assess variability on centennial timescales. For CMIP6 it is recommended that the control run should be at least 500 years long (following the spin-up period), but of course the simulation must be long enough to reach to the end of the experiments it spawns. It should be noted that those analysing CMIP6 simulations might also require simulations longer than 500 years to accurately assess unforced variability on long timescales, so modelling groups are encouraged to extend their control runs well beyond the minimum recommended number of years.
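The length requirement can be written as a small calculation. The sketch below reproduces the arithmetic of the example in the text (a run from 1850 to 2300 requires 450 control years) and applies the recommended 500-year minimum; the function names and the `branch_offset_years` parameter, which allows for runs initiated later than the start of the control, are hypothetical.

```python
RECOMMENDED_MIN_YEARS = 500  # recommended minimum length after spin-up


def years_to_cover(first_year: int, last_year: int,
                   branch_offset_years: int = 0) -> int:
    """Control years needed to reach the end of a run spawned from it.

    Follows the arithmetic of the text: a run from 1850 to 2300 that
    branches from the start of the control needs 2300 - 1850 = 450 years.
    """
    if last_year < first_year:
        raise ValueError("last_year must not precede first_year")
    return branch_offset_years + (last_year - first_year)


def recommended_control_length(first_year: int, last_year: int,
                               branch_offset_years: int = 0) -> int:
    """Larger of the recommended minimum and the length needed for coverage."""
    return max(RECOMMENDED_MIN_YEARS,
               years_to_cover(first_year, last_year, branch_offset_years))
```

For the example in the text the coverage requirement (450 years) is below the recommended minimum, so 500 years applies; if the historical run instead branched 100 years into the control, 550 years would be needed.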
Because the climate was very likely not in equilibrium with the forcing of 1850 and because different components of the climate system differentially respond to the effects of the forcing prior to that time, there is some ambiguity in deciding on what forcing to apply for the control. For CMIP6 we recommend a specification of this forcing that attempts to balance two conflicting objectives: (1) to minimize artificial climate responses to discontinuities in radiative forcing at the time a historical simulation is initiated, and (2) to minimize artefacts in sea level change due to thermal expansion caused by unrealistic mismatches in conditions in the centennial-scale averaged forcings for the pre- and post-1850 periods. Note that any pre-industrial multi-centennial observed trend in global-mean sea level is most likely to be due to slow changes in ice sheets, which are likely not to be simulated in the CMIP6 model generation.
The first consideration above implies that radiative forcing in the control run should be close to that imposed at the beginning of the CMIP historical simulation (i.e. 1850). The second implies that a background volcanic aerosol and time-averaged solar forcing should be prescribed in the control run, since to neglect it would cause an apparent drift in sea level associated with the suppression of heat uptake due to the net effect of, for instance, volcanism after 1850, and this has implications for sea level changes (Gregory, 2010; Gregory et al., 2013). We recognize that it will be impossible to entirely avoid artefacts and artificial transient effects, and practical considerations may rule out conformance with every detail of the control simulation protocol stipulated here.
With that understanding, here is a summary of the recommendations for the imposed conditions on the spin-up and control runs, followed by further clarification in subsequent paragraphs:
-Conditions must be time invariant except for those associated with the mean climate (notably the seasonal and diurnal cycles of insolation).
-Unless indicated otherwise (e.g. the background volcanic forcing), experiment conditions (e.g. greenhouse gas concentrations, ozone concentration, surface land conditions) should be representative of Earth around the year 1850.
-Orbital parameters (eccentricity, obliquity, and longitude of the perihelion) should be held fixed at their 1850 values.
-Land use should not change in the control run and should be fixed according to reconstructed agricultural maps from 1850. Due to the diversity of model approaches in ESMs for land carbon, some groups might deviate from this specification, and again this must be clearly documented.
-The solar constant should be fixed at its mean value (no 11-year solar cycle) over the first two solar cycles of the historical simulation (i.e. the 1850-1873 mean).
-A background volcanic aerosol should be specified that results in radiative forcing matching, as closely as possible, that experienced, on average, during the historical simulation (i.e. 1850-2014 mean).
-Models without interactive ozone chemistry should specify the pre-industrial ozone fields from a data set produced from a pre-industrial control simulation that uses 1850 emissions and a mean solar forcing averaged over solar cycles 8-10, representative of the mean mid-19th century solar forcing.
-For models with interactive chemistry and/or aerosols, the CMIP6 pre-industrial emissions dataset of reactive gases and aerosol precursors should be used. For models without internally calculated aerosol concentrations, a monthly climatological dataset of aerosol physical and optical properties should be used.
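The two time-averaged forcings in the list above (the solar constant averaged over 1850-1873 and the background volcanic forcing averaged over 1850-2014) can be sketched as simple window means. This is a minimal illustration only: the function name `window_mean` and both input series are hypothetical, and the numbers are synthetic placeholders rather than the CMIP6 forcing data sets.

```python
from statistics import mean


def window_mean(series, start_year, end_year):
    """Mean of an annual series {year: value} over [start_year, end_year], inclusive."""
    values = [v for y, v in series.items() if start_year <= y <= end_year]
    if not values:
        raise ValueError("no data in the requested window")
    return mean(values)


# Synthetic total solar irradiance (W m-2) with a crude 11-year cycle.
# Placeholder values, NOT the CMIP6 solar forcing data set.
tsi = {year: 1360.0 + 0.5 * ((year - 1850) % 11 < 5) for year in range(1850, 2015)}

# Synthetic volcanic forcing (W m-2): zero except for two made-up eruption years.
volcanic = {year: 0.0 for year in range(1850, 2015)}
volcanic[1883] = -3.0
volcanic[1991] = -3.0

# Fixed solar constant for the control run: mean over 1850-1873.
solar_constant = window_mean(tsi, 1850, 1873)

# Background volcanic forcing for the control run: mean over 1850-2014.
background_volcanic = window_mean(volcanic, 1850, 2014)
```

Holding both quantities at these window means keeps the control run time invariant while limiting the discontinuity at the branch point of the historical simulation.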
In the CO2-concentration-driven piControl, the value of the global annual mean 1850 atmospheric CO2 concentration is prescribed and held fixed during the entire experiment. There are some special considerations that apply to control simulations performed by emission-driven ESMs (i.e. runs with atmospheric concentrations of CO2 calculated prognostically rather than being prescribed). In the esm-piControl simulation, emissions of CO2 from both fossil fuel combustion and land-use change are prescribed to be zero. In this run any residual drift in atmospheric CO2 concentration that arises from an imbalance in the exchanges of CO2 between the atmosphere and the ocean and land (i.e. by the natural carbon cycle in the absence of anthropogenic CO2 emissions) will need to be subtracted from perturbation runs to correct for a control state not in equilibrium. It should be emphasized that the esm-piControl is an idealized experiment and is not meant to mimic the true 1850 conditions, which would have to include a source of carbon of around 0.6 Pg C yr−1 from the already perturbed state that existed in 1850. Due to the wide variety of ESMs and the techniques they use to compute land carbon fluxes, it is hard to make statements that apply to all models equally well. A general recommendation, however, is that the land carbon fluxes in the emission- and concentration-driven control simulations should be stable in time and in approximate balance so that the net carbon flux into the atmosphere is small (less than 0.1 Pg C yr−1). Further details on ESM experiments with a carbon cycle are provided in the C4MIP contribution to this special issue.
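The drift correction described above can be sketched as follows. The text does not prescribe how the drift should be estimated; this sketch assumes a linear fit to the control run over the same model years as the perturbation run. The function names and the synthetic concentrations (in ppm) are hypothetical.

```python
def linear_fit(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def remove_control_drift(years, perturbed, control):
    """Subtract the linear drift of the control run from a perturbation run.

    All three lists cover the same model years, starting at the branch point.
    Assumption: the control drift is well described by a straight line.
    """
    slope, _ = linear_fit(years, control)
    branch_year = years[0]
    return [p - slope * (t - branch_year) for t, p in zip(years, perturbed)]


# Synthetic example: the control drifts by 0.01 ppm per year and the
# perturbation run carries the same drift plus a forced signal of 0.5 ppm per year.
years = list(range(100))
control = [284.0 + 0.01 * t for t in years]
perturbed = [284.0 + 0.01 * t + 0.5 * t for t in years]

corrected = remove_control_drift(years, perturbed, control)
```

After the correction, only the forced part of the perturbation run remains; a higher-order or smoothed drift estimate could be substituted where a straight line is a poor description of the control.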
The historical time-average volcanic forcing stipulated above for the control run is likely to approximate the much longer term mean. The volcanic aerosol radiative forcing estimates of Crowley (2000) for the historical period and the last millennium are −0.18 and −0.22 W m−2, respectively. Because the mean volcanic forcing between 1850 and 2014 is small, the discontinuity associated with transitioning from a mean forcing to a time-varying volcanic forcing is also expected to be small. Even though this is the design objective, it is likely that it will be impossible to eliminate all artefacts in quantities such as historical sea level change. For this reason, and because some models may deviate from these specifications, it is recommended that groups perform an additional simulation of the historical period but with only natural forcing included. With this additional run, which is already called for under DAMIP, the purely anthropogenic effects on sea level change can be isolated.
The forcing specified in the piControl also has implications for simulations of the future, when solar variability and volcanic activity will continue to exist, but at unknown levels. These issues need to be borne in mind when designing and evaluating future scenarios, as a failure to include volcanic forcing in the future will cause future warming and sea level rise to be over-estimated relative to a piControl experiment in which a non-zero volcanic forcing is specified. This is accounted for by introducing a time-invariant nonzero volcanic forcing (e.g. the mean volcanic forcing for the piControl) into the scenarios. This is further specified in the ScenarioMIP contribution to this special issue.
These issues, and the potential of different modelling centres adopting different approaches to account for their particular constraints, highlight the paramount importance of adequately documenting the conditions under which this and the other DECK experiments are performed.

A1.3 Abruptly quadrupling CO2 simulation
Until CMIP5, there were no experiments designed to quantify the extent to which forcing differences might explain differences in climate response. It was also difficult to diagnose and quantify the feedback responses, which are mediated by global surface temperature change. In order to examine these fundamental characteristics of models (CO2 forcing and climate feedback), an abrupt 4×CO2 simulation was included for the first time as part of CMIP5. Following Gregory et al. (2004), the simulation branches from the CO2-concentration-driven piControl in January, at which point the global annual mean 1850 atmospheric CO2 concentration prescribed in piControl is abruptly quadrupled and then held fixed. As the system subsequently evolves toward a new equilibrium, the imbalance in the net flux at the top of the atmosphere can be plotted against global temperature change. As Gregory et al. (2004) showed, it is then possible to diagnose both the effective radiative forcing due to a quadrupling of CO2 and the effective equilibrium climate sensitivity (ECS). Moreover, by examining how individual flux components evolve with surface temperature change, one can learn about the relative strengths of different feedbacks, notably quantifying the importance of the various feedbacks associated with clouds.
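The regression of top-of-atmosphere flux imbalance against global temperature change can be sketched as below. This is a minimal illustration under stated assumptions: the inputs are annual global means expressed as anomalies relative to piControl, the fit is an ordinary least-squares line, and the sensitivity to a doubling of CO2 is taken as half the equilibrium warming under quadrupling (a common convention that assumes forcing scales with the logarithm of concentration). The function names and the synthetic numbers are hypothetical.

```python
def linear_fit(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def gregory_diagnostics(delta_t, net_toa_flux):
    """Diagnose forcing and sensitivity from an abrupt-4xCO2 run.

    delta_t:      global mean surface temperature change (K)
    net_toa_flux: net downward top-of-atmosphere flux imbalance (W m-2)
    """
    slope, intercept = linear_fit(delta_t, net_toa_flux)
    equilibrium_warming_4x = -intercept / slope  # where the fitted line reaches zero flux
    return {
        "forcing_4x": intercept,             # effective radiative forcing (W m-2)
        "feedback_parameter": slope,         # W m-2 K-1, negative for a stable climate
        "equilibrium_warming_4x": equilibrium_warming_4x,
        "ecs_2x": equilibrium_warming_4x / 2.0,  # assumption: halve the 4xCO2 value
    }


# Synthetic, noise-free annual means (hypothetical numbers, not model output).
delta_t = [0.5 * k for k in range(1, 11)]
net_flux = [7.4 - 1.2 * t for t in delta_t]

result = gregory_diagnostics(delta_t, net_flux)
```

The intercept of the fitted line gives the effective forcing, and the point where the line crosses zero imbalance gives the equilibrium temperature change. In real model output the relationship is noisy and not always linear, so the fitted values depend on the years included.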
In the abrupt-4×CO2 experiment, the only externally imposed difference from the piControl should be the change in CO2 concentration. All other conditions should remain as they were in the piControl, including any background volcanic aerosols. By changing only a single factor, we can unambiguously attribute all climatic consequences to the increase in CO2 concentration.
The minimum length of the simulation should be 150 years, but longer simulations would enable investigations of longer-timescale responses. Also there is value, as in CMIP5, in performing an ensemble of short (∼ 5-year) simulations, all prescribing global annual mean 1850 atmospheric CO2 concentration but initiated at different times throughout the year (in addition to the abrupt-4×CO2 simulation initiated from the piControl in January). Such an ensemble would reduce the statistical uncertainty with which the effective CO2 radiative forcing could be quantified and would allow more detailed and accurate diagnosis of the fast responses of the system under an abrupt change in forcing (Bony et al., 2013; Gregory and Webb, 2008; Kamae and Watanabe, 2013; Sherwood et al., 2015). Different groups will be able to afford ensembles of different sizes, but in any case each realization should be initialized in a different month and the months should be spaced evenly throughout the year.
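The requirement of evenly spaced start months can be illustrated with a small helper. This is a sketch under two assumptions that the text does not state: the ensemble size divides 12 exactly, and January (the start month of the main abrupt-4×CO2 run) is counted as the first member. The function name `start_months` is hypothetical.

```python
def start_months(n_members):
    """Return evenly spaced start months (1 = January) for n_members realizations.

    Assumption: n_members divides 12, so the spacing is a whole number of months.
    """
    if n_members < 1 or 12 % n_members != 0:
        raise ValueError("ensemble size must be one of 1, 2, 3, 4, 6, or 12")
    step = 12 // n_members
    return [1 + k * step for k in range(n_members)]


# A four-member ensemble starts in January, April, July, and October.
four_member_starts = start_months(4)
```

For ensemble sizes that do not divide 12, groups would need to choose the closest available months and document the choice.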

A1.4 1 % CO2 increase simulation
The second idealized climate change experiment was introduced in the early days of CMIP (Meehl et al., 2000). It is designed for studying model responses under simplified but somewhat more realistic forcing than an abrupt increase in CO2. In this 1pctCO2 experiment, the simulation is branched from the piControl, and the global annual mean CO2 concentration is gradually increased at a rate of 1 % yr−1 (i.e. exponentially), starting from its 1850 value that is prescribed in the piControl. A minimum length of 150 years is requested so that the simulation goes beyond the quadrupling of CO2 after 140 years. Note that in contrast to previous definitions, the experiment has been simplified so that the 1 % CO2 increase per year is applied throughout the entire simulation rather than keeping the concentration constant after 140 years as in CMIP5. Since the radiative forcing is approximately proportional to the logarithm of the CO2 increase, the radiative forcing increases approximately linearly over time. Drawing on the estimates of effective radiative forcing (for definitions see Myhre et al., 2013) obtained in the abrupt-4×CO2 simulations, analysts can scale results from each model in the 1 % CO2 increase simulations to focus on the response differences in models, largely independent of their forcing differences. In contrast, in CMIP6 historical simulations (see Sect. A2), the forcing and response contributions to model differences in simulated climate change cannot be easily isolated.
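The arithmetic behind the 140-year quadrupling time and the near-linear forcing can be checked with a short sketch. The compound-growth formula follows directly from the 1 % yr−1 rate. The logarithmic forcing law is the approximation mentioned in the text, and the value of 3.7 W m−2 per doubling used below is an illustrative assumption, not a number taken from this paper. All function names are hypothetical.

```python
import math

RATE = 0.01  # 1 % per year compound increase


def concentration_ratio(years):
    """CO2 concentration relative to its 1850 value after the given number of years."""
    return (1.0 + RATE) ** years


def years_to_multiply(factor):
    """Years of 1 % per year growth needed to multiply the concentration by factor."""
    return math.log(factor) / math.log(1.0 + RATE)


def forcing(years, forcing_per_doubling):
    """Approximate radiative forcing, assuming it scales with log2 of the ratio."""
    return forcing_per_doubling * math.log2(concentration_ratio(years))


doubling_time = years_to_multiply(2.0)      # a little under 70 years
quadrupling_time = years_to_multiply(4.0)   # a little under 140 years

# Under the logarithmic assumption the forcing grows linearly in time:
# the value at year 100 is twice the value at year 50.
f50 = forcing(50, 3.7)    # 3.7 W m-2 per doubling is an illustrative value
f100 = forcing(100, 3.7)
```

Because the quadrupling point falls just before year 140, a 150-year simulation extends beyond it, as the protocol requires.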
As in the abrupt-4×CO2 experiment, the only externally imposed difference from the piControl should be the change in CO2 concentration. The omission of changes in aerosol concentrations is the key to making these simulations easier to interpret.
Models with a carbon cycle component will be driven by prescribed CO2 concentrations, but terrestrial and marine surface fluxes and stores of carbon will become a key diagnostic from which one can infer emission rates that are consistent with a 1 % yr−1 increase in model CO2 concentration. This DECK baseline carbon cycle experiment is built upon in C4MIP to diagnose the strength of model carbon-climate feedback and to quantify contributions to disruption of the carbon cycle by climate and by direct effects of increased CO2 concentration.

A2 The CMIP6 historical simulations
CMIP6 historical simulations of climate change over the period 1850-2014 are forced by common data sets that are largely based on observations. They serve as an important benchmark for assessing model performance through evaluation against observations. The historical integration should be initialized from some point in the control integration (with historical branching from the piControl and the esm-hist branching from esm-piControl) and be forced by time-varying, externally imposed conditions that are based on observations. Both naturally forced changes (e.g. due to solar variability and volcanic aerosols) and changes due to human activities (e.g. CO2 concentration, aerosols, and land use) will lead to climate variations and evolution. In addition, there is unforced variability which can obscure the forced changes and lead to expected differences between the simulated and observed climate variations (Deser et al., 2012).
The externally imposed forcing data sets that should be used in CMIP6 cover the period 1850 through the end of 2014 and are described in detail in various other contributions to this special issue. In the CO2-concentration-driven historical simulations, time-varying global annual mean concentrations for CO2 and other long-lived greenhouse gases are prescribed. If a modelling centre decides to represent additional spatial and seasonal variations in prescribed greenhouse gas forcings, this needs to be adequately documented.
Recall from Sect. A1.2 that the conditions in the control should generally be consistent with the forcing imposed near the beginning of the CMIP historical simulation. This should minimize artificial transient effects in the first portion of the CMIP historical simulation. An exception is that for the CO2-emission-driven experiments, the zero CO2 emissions from fossil fuel and the land-use specifications for 1850 in the esm-piControl could cause a discontinuity in land carbon at the branch point.
As described in Sect. A1.2, the 1850 esm-piControl should be developed for an idealized case that is stable in time and in balance so that the net carbon flux into the atmosphere is small. Meanwhile, the start of the esm-hist in 1850 should be as realistic as possible and attempt to account for the fact that the land surface was not in equilibrium in 1850 due to prior land-use effects (Houghton, 2010; Hurtt et al., 2011). Some modelling groups have developed methods to achieve these twin goals in a computationally efficient manner, for example, by performing pre-1850 off-line land model simulations to account for the land carbon cycle disequilibrium before 1850 and to adequately simulate carbon stores at the start of the historical simulation (Sentman et al., 2011). Due to the wide diversity of modelling approaches for land carbon in the ESMs, the actual method applied by each group to account for these effects will differ and needs to be well documented.
As discussed earlier, there will be a mismatch in the specification of volcanic aerosols between control and historical simulations that especially affects estimates of ocean heat uptake and sea level rise in the historical period. This can be minimized by prescribing a background volcanic aerosol in the pre-industrial control that has the same cooling effect as the volcanoes included in the CMIP6 historical simulation. Any residual mismatch will need to be corrected, which requires a special supplementary simulation (see Sect. A1.2) that should be submitted along with the CMIP6 historical simulation.

Given that the historical simulations start in 1850, the piControl should have fixed 1850 atmospheric composition, not true pre-industrial
esm-piControl | As in piControl | As in piControl | As in piControl but with CO2 concentration calculated, rather than prescribed. CO2 from both fossil fuel combustion and land-use change are prescribed to be zero.
abrupt-4×CO2 | As in piControl | As in piControl | As in piControl except CO2 that is 4 times that of piControl
1pctCO2 | As in piControl | As in piControl | As in piControl except CO2 that is increasing at 1 % yr−1
historical | Time-dependent observations | Time-dependent observations | Time-dependent observations
esm-hist | As in historical | As in historical | As in historical but with CO2 emissions prescribed and CO2 concentration calculated (rather than prescribed)
For model evaluation and for detection and attribution studies (the focus of DAMIP) there would be considerable value in extending the CMIP6 historical simulations beyond the nominal 2014 ending date. To include the more recent observations in model evaluation, modelling groups are encouraged to document and apply forcing data sets representing the post-2014 period. For short extensions (up to a few years) it may be acceptable to simply apply forcing from one of the future scenarios defined by ScenarioMIP. To distinguish the portion of the historical period when all models will use the same forcing data sets (i.e. 1850-2014) from the extended period where different data sets might be used, the experiment for 1850-2014 will be labelled historical (esm-hist in the case of the emission-driven run) and the period from 2015 through near-present will likely be labelled historical-ext (esm-hist-ext).
Even if the CMIP6 historical simulations are extended beyond 2014, all future scenario simulations (called for by ScenarioMIP and other MIPs) should be initiated from the end of year 2014 of the CMIP6 historical simulation since the "future" in CMIP6 begins in 2015.
Due to interactions within and between the components of the Earth system, there is a wide range of variability on various time and space scales (Hegerl et al., 2007). The timescales vary from shorter than a day to longer than several centuries. The magnitude of the variability can be quite large relative to any given signal of interest depending on the time and space scales involved and on the variable of interest. To more clearly identify forced signals emerging from natural variability, multiple model integrations (comprising an ensemble) can be made where only the initial conditions are perturbed in some way, which should be documented. A common way to do this is to simply branch each simulation from a different point in the control run. Longer intervals between branch points will ensure independence of ensemble members on longer timescales. By averaging many different ensemble members together, the signal of interest becomes clear because the natural variations tend to average out if the ensemble is large enough and the averaging period is long enough. If the variability in the models is realistic, then the spread of the ensemble members around the ensemble average is caused by unforced (i.e. internal) variability. To minimize the number of years included in the entry card simulations, only one ensemble member is requested here. However, we strongly encourage model groups to submit at least three ensemble members of their CMIP historical simulation as requested in DAMIP.
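The effect of ensemble averaging on unforced variability can be illustrated with a toy calculation. This is a sketch with synthetic data only: each "member" is a hypothetical linear forced trend plus independent Gaussian noise standing in for internal variability, which is far simpler than real model output. The function names, trend, noise level, and ensemble size are all assumptions chosen for illustration.

```python
import random
import statistics

TREND = 0.005    # forced signal in K per year (hypothetical)
NOISE_SD = 0.15  # standard deviation of internal variability in K (hypothetical)


def make_member(seed, n_years=165):
    """One synthetic realization: shared forced trend plus member-specific noise."""
    rng = random.Random(seed)
    return [TREND * t + rng.gauss(0.0, NOISE_SD) for t in range(n_years)]


def residual_sd(series):
    """Standard deviation of a series about the known forced trend."""
    residuals = [value - TREND * t for t, value in enumerate(series)]
    return statistics.pstdev(residuals)


# Thirty members that differ only in their random seed, standing in for
# simulations branched from different points in the control run.
members = [make_member(seed) for seed in range(30)]

# Ensemble mean, computed year by year across members.
ensemble_mean = [statistics.fmean(values) for values in zip(*members)]

single_member_noise = residual_sd(members[0])
ensemble_mean_noise = residual_sd(ensemble_mean)
```

For independent members, the noise in the ensemble mean shrinks roughly with the square root of the ensemble size, so `ensemble_mean_noise` is much smaller than `single_member_noise`, while the spread of the members about the mean still reflects the internal variability.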