The PMIP4 contribution to CMIP6 – Part 1: Overview and over-arching analysis plan

This paper is the first of a series of four GMD papers on the PMIP4-CMIP6 experiments. Part 2 (Otto-Bliesner et al., 2017) gives details about the two PMIP4-CMIP6 interglacial experiments, Part 3 (Jungclaus et al., 2017) about the last millennium experiment, and Part 4 (Kageyama et al., 2017) about the Last Glacial Maximum experiment. The mid-Pliocene Warm Period experiment is part of the Pliocene Model Intercomparison Project (PlioMIP) – Phase 2, detailed in Haywood et al. (2016). The goal of the Paleoclimate Modelling Intercomparison Project (PMIP) is to understand the response of the climate system to different climate forcings for documented climatic states very different from the present and historical climates. Five periods are simulated for CMIP6: the last millennium (past1000), the mid-Holocene, ∼ 6000 years ago (midHolocene), the Last Glacial Maximum, ∼ 21 000 years ago (lgm), the Last Interglacial, ∼ 127 000 years ago (lig127k), and the mid-Pliocene Warm Period, ∼ 3 million years ago (midPliocene-eoi400). These climatic periods are well documented by palaeoclimatic and palaeoenvironmental records, with climate and environmental changes relevant for the study and projection of future climate changes. This paper describes the motivation for the choice of these periods and the design of the numerical experiments and database requests, with a focus on their novel features compared to the experiments performed in previous phases of PMIP and CMIP. It also outlines the analysis plan that takes advantage of the comparisons of the results across periods and across CMIP6 in collaboration with other MIPs.


Introduction
Instrumental meteorological and oceanographic data show that the Earth has undergone a global warming of ∼ 0.85 °C since the beginning of the Industrial Revolution (Hartmann et al., 2013), largely in response to the increase in atmospheric greenhouse gases. Concentrations of atmospheric greenhouse gases are projected to rise significantly during the 21st century, reaching levels well outside the range of recent millennia. In making future projections, models are operating beyond the conditions under which they have been developed and validated. Changes in the recent past provide only limited evidence for how climate responds to changes in external factors and internal feedbacks of the magnitude expected in the future. Palaeoclimate states radically different from those of the recent past provide a way to test model performance outside the range of recent climatic variations and to study the role of forcings and feedbacks in establishing these climates. Although palaeoclimate simulations strive for verisimilitude in terms of forcings and the treatment of feedbacks, none of the models used for future projection have been developed or calibrated to reproduce past climates.
We have to look back 3 million years to find a period of Earth's history when atmospheric CO2 concentrations were similar to the present day (the mid-Pliocene Warm Period, mPWP). We have to look back several tens of millions of years further (the early Eocene, ∼ 55 to 50 million years ago) to find CO2 concentrations similar to those projected for the end of this century. These periods can offer key insights into climate processes that operate in a higher-CO2, warmer world, although their geographies are different from today (e.g. Lunt et al., 2012; Caballero and Huber, 2010). During the Quaternary (2.58 million years ago to present), Earth's geography was similar to today and the main external factors driving climatic changes were the variations in the seasonal and latitudinal distribution of incoming solar energy arising from cycles in Earth's orbit around the Sun. The feedbacks from changes in greenhouse gas concentrations and ice sheets acted as additional controls on the dynamics of the atmosphere and the ocean. In addition, rapid climate transitions, on human-relevant timescales (decades to centuries), have been documented for this most recent period (e.g. Marcott et al., 2014; Steffensen et al., 2008).
By combining several past periods, the credibility of climate projections can be assessed by using information about longer-term palaeoclimate changes that are as large as the anticipated future change. Replicating the totality of past climate changes with state-of-the-art climate models, driven by appropriate forcings (e.g. insolation, atmospheric composition) and boundary conditions (e.g. ice sheets), is a challenge (Harrison et al., 2015). It is challenging, for example, to represent the correct amplitude of past climate changes such as glacial-interglacial temperature differences (e.g. between the Last Glacial Maximum, LGM, ∼ 21 000 years ago, and the pre-industrial temperatures; cf. Harrison et al., 2014) or to represent the northward extension of the African monsoon during the mid-Holocene (MH, ∼ 6000 years ago) (Perez-Sanz et al., 2014). Interpreting palaeoenvironmental data can also be challenging, particularly disentangling the relationships between changes in large-scale atmospheric or oceanic circulation, broad-scale regional climates, and local environmental responses to these changes during climate periods when the relative importance of various climate feedbacks cannot be assumed to be similar to today. This challenge is paralleled by concerns about future local or regional climate changes and their impact on the environment. Therefore, modelling palaeoclimates is a means to understand past climate and environmental changes better, using physically based tools, as well as a means to evaluate model skill in forecasting the responses to major drivers.
These challenges are at the heart of the Paleoclimate Modeling Intercomparison Project (PMIP) and the new set of CMIP6-PMIP4 simulations (Jungclaus et al., 2017; Kageyama et al., 2017; Haywood et al., 2016) has the ambition of tackling them. Palaeoclimate experiments for the Last Glacial Maximum, the mid-Holocene, and the last millennium were formally included in CMIP during its fifth phase (CMIP5, Taylor et al., 2012), equivalent to the third phase of PMIP (PMIP3, Braconnot et al., 2012). This formal inclusion made it possible to compare the mechanisms causing past and future climate changes in a rigorous way (e.g. Izumi et al., 2015) and to evaluate the models used for projections (e.g. Harrison et al., 2014, 2015). More than 20 modelling groups took part in PMIP3 and many of the PMIP3 results are prominent in the fifth IPCC assessment report (Masson-Delmotte et al., 2013; Flato et al., 2013). PMIP3 also identified significant knowledge gaps and areas where progress is needed. PMIP4 has been designed to address these issues.
The five periods chosen for PMIP4-CMIP6 (Table 1) were selected because they contribute directly to the CMIP6 objectives; in particular, they address the key CMIP6 question "How does the Earth system respond to forcing?" (Eyring et al., 2016) for multiple forcings and climate states different from the current or historical climates. They are characterized by greenhouse gas concentrations, astronomical parameters, ice sheet extent and height, and volcanic and solar activities different from the current or historical ones (Table 2). This is consistent with the need to provide a large sample of the climate responses to important forcings. The choice of two new periods, the Last Interglacial (LIG, ∼ 127 000 years ago) and the mid-Pliocene Warm Period, was motivated by the desire to explore the relationships between the climate-ice sheet system and sea level and to expand analyses of climate sensitivity and polar amplification. For each target period, comparison with environmental observations and climate reconstructions enables us to determine whether the modelled responses are realistic, allowing PMIP to address the second key CMIP6 question "What are the origins and consequences of systematic model biases?". PMIP simulations and data-model comparisons will show whether or not the biases in the present-day simulations are found in other climate states. Also, analyses of PMIP simulations will show whether or not present-day biases have an impact on the magnitude of simulated climate changes. Finally, PMIP is also relevant to the third CMIP6 question "How can we assess future climate changes given climate variability, predictability, and uncertainties in scenarios?" because the simulation of the last millennium's climate includes more processes (e.g. volcanic and solar forcings) to describe natural climate variability than the piControl experiment.
The detailed justifications of the experimental protocols and analysis plans for each period are given in a series of companion papers: Otto-Bliesner et al. (2017) for the midHolocene and lig127k experiments, Kageyama et al. (2017) for the lgm, Jungclaus et al. (2017) for the past1000, and Haywood et al. (2016) for the midPliocene-eoi400 experiment. These papers also explain how the boundary conditions for each period should be implemented and include the description of sensitivity studies using the PMIP4-CMIP6 simulation as a reference. Here, we provide an overview of the PMIP4-CMIP6 simulations and highlight the scientific questions that will benefit from the CMIP6 environment. In Sect. 2, we give a summary of the PMIP4-CMIP6 periods and the associated forcings and boundary conditions. The analysis plan is outlined in Sect. 3. Critical points in the experimental set-up are briefly described in Sect. 4. A short conclusion is given in Sect. 5.

The PMIP4 experiments for CMIP6 and associated palaeoclimatic and palaeoenvironmental data
The choice of the climatic periods for CMIP6 is based on past PMIP experience and is justified by the need to address new scientific questions, while also allowing the evolution of the models and their ability to represent these climate states to be tracked across the different phases of PMIP (Table 1). The forcings and boundary conditions for each PMIP4-CMIP6 palaeoclimate simulation are summarized in Table 2. All the experiments can be run independently and have value for comparison to the CMIP6 DECK (Diagnostic, Evaluation and Characterization of Klima) and historical experiments (Eyring et al., 2016). They are therefore all considered Tier 1 within CMIP6. It is not mandatory for groups wishing to take part in PMIP4-CMIP6 to run all five PMIP4-CMIP6 experiments. It is however mandatory to run at least one of the two entry cards, i.e. the midHolocene or the lgm.
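The participation rule stated above (any subset of the five experiments, provided at least one entry card is included) can be written as a short check. This is only an illustrative sketch: the function name `check_participation` and the idea of validating a group's plan in code are assumptions made here, not part of the PMIP4-CMIP6 protocol. The experiment names are the CMIP6 identifiers used in this paper.

```python
# CMIP6 experiment identifiers as used in the text.
PMIP4_CMIP6_EXPERIMENTS = {
    "midHolocene", "lgm", "past1000", "lig127k", "midPliocene-eoi400",
}
# The two "entry cards": at least one of them must be run.
ENTRY_CARDS = {"midHolocene", "lgm"}


def check_participation(planned):
    """Check a (hypothetical) list of planned experiments against the rule.

    Returns a tuple (ok, message).
    """
    planned = set(planned)
    unknown = planned - PMIP4_CMIP6_EXPERIMENTS
    if unknown:
        return False, f"unknown experiment(s): {sorted(unknown)}"
    if not planned & ENTRY_CARDS:
        return False, "at least one entry card (midHolocene or lgm) is required"
    return True, "ok"


# A group running only the LGM and the last millennium satisfies the rule;
# a group running only the last millennium does not.
print(check_participation(["lgm", "past1000"]))
print(check_participation(["past1000"]))
```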

PMIP4-CMIP6 entry cards: the mid-Holocene (midHolocene) and Last Glacial Maximum (lgm)
The MH and LGM periods are strongly contrasting climate states. The MH provides an opportunity to examine the response to orbitally induced changes in the seasonal and latitudinal distribution of insolation (Fig. 1). It is a period of strongly enhanced Northern Hemisphere summer monsoons, extra-tropical continental aridity, and much warmer summers. The LGM provides an opportunity to examine the climatic impact of changes in ice sheets, continental extent (land area is expanded relative to present due to the lower sea level, Fig. 2), and lower atmospheric greenhouse gas concentrations. It is a particularly relevant time period for understanding near-future climate change because the magnitude of the forcing and temperature response from the LGM to present is comparable to that projected from present to the end of the 21st century. Evaluation of the PMIP3-CMIP5 MH and LGM experiments has demonstrated that climate models simulate changes in the large-scale features that are governed by the energy and water balance reasonably well (Li et al., 2013), including changes in land-sea contrast and high-latitude amplification of temperature changes. These results confirm that the simulated relationships between large-scale patterns of temperature and precipitation change in future projections are credible. However, the PMIP3-CMIP5 simulations of MH and LGM climates show a limited ability to predict reconstructed patterns of climate change overall (Hargreaves et al., 2013; Hargreaves and Annan, 2014; Harrison et al., 2014, 2015). At least in part, this likely arises from persistent problems in simulating regional climates. For example, state-of-the-art models cannot adequately reproduce the northward penetration of the African monsoon in response to the MH orbital forcing (Perez-Sanz et al., 2014; Pausata et al., 2016), which has been noted since PMIP1 (Joussaume et al., 1999). While this likely reflects inadequate representation of feedbacks, model biases could also contribute to this mismatch (e.g. Zheng and Braconnot, 2013). Systematic benchmarking of the PMIP3-CMIP5 MH and LGM also shows that better performance in palaeoclimate simulations is not consistently related to better performance under modern conditions, stressing that the ability to simulate modern climate regimes and processes does not guarantee that a model will be good at simulating climate changes.
For PMIP4-CMIP6, we have modified the experimental design of the midHolocene and lgm experiments with the aim of obtaining more realistic representations of these climates (Table 2, Otto-Bliesner et al., 2017, for midHolocene and Kageyama et al., 2017, for lgm). One of these modifications is the inclusion of changes in atmospheric dust loading (Fig. 3), which can have a large effect on regional climate changes. For midHolocene, realistic values of the concentration of atmospheric CO2 and other trace gases will be used. This makes this experiment more realistic than in PMIP3, where it was designed as a simple test of the effect of changes in insolation forcing. The PMIP3-CMIP5 lgm experiments considered a single ice sheet reconstruction (Abe-Ouchi et al., 2015), which was created by merging three ice sheet reconstructions that were then available. Two of those three reconstructions have since been updated, yet uncertainty about the geometry of the ice sheets at the Last Glacial Maximum remains. The protocol for the PMIP4-CMIP6 lgm simulations accounts for this uncertainty by permitting modellers to choose between the old PMIP3 ice sheet (Abe-Ouchi et al., 2015) or one of the two new reconstructions: ICE-6G_C (Argus et al., 2014; Peltier et al., 2015) and GLAC-1D (Tarasov et al., 2012; Briggs et al., 2014; Ivanovic et al., 2016). The impact of the ice sheet and dust forcings will specifically be tested in the lgm experiments by (i) using different ice sheet reconstructions for Tier 1 simulations (Fig. 2), (ii) performing Tier 2 dust sensitivity experiments (Sect. 3.2.1, Kageyama et al., 2017), and (iii) performing Tier 2 individual forcing sensitivity experiments (Sect. 3.2.2, Kageyama et al., 2017). The inclusion of dust forcing in these simulations is new for PMIP4.
[Figure caption fragment, truncated in the source: [...] Lisiecki and Raymo, 2005, scale on the left), and sea level (blue line, Rohling et al., 2014; blue shading, a density plot of 11 mid-Pliocene sea-level estimates from Dowsett and Cronin, 1990; Wardlaw and Quinn, 1991; Krantz, 1991; Raymo et al., 2009; Dwyer and Chandler, 2009; Naish and Wilson, 2009; Masson-Delmotte et al., 2013; Rohling et al., 2014; Dowsett et al., 2016; scale on the right); (f) and [...] (Kopp et al., 2016, scale on the right); (i) CO2 for the interval 3.0-3.3 Ma shown as a density plot of eight mid-Pliocene estimates (Raymo et al., 1996; Stap et al., 2016; Pagani et al., 2010; Seki et al., 2010; Tripati et al., 2009; Bartoli et al., 2011; Seki et al., 2010; Kurschner et al., 1996); (j, k) CO2 measurements [...]]

The last millennium (past1000)
The millennium prior to the industrial era, 850-1849 CE, provides a well-documented (e.g. PAGES2k-PMIP3 group, 2015) period of multi-decadal to multi-centennial changes in climate, with contrasting periods such as the Medieval Climate Anomaly and the Little Ice Age. This interval was characterized by variations in solar, volcanic, and orbital forcings (Fig. 1), which acted under climatic background conditions similar to today. This interval provides a context for analysing earlier anthropogenic impacts (e.g. land-use changes) and the current warming due to increased atmospheric greenhouse gas concentrations. It also helps constrain the uncertainty in the future climate response to a sustained anthropogenic forcing.
The PMIP3-CMIP5 past1000 simulations show relatively good agreement with regional climate reconstructions for the Northern Hemisphere, but less agreement for the Southern Hemisphere. They also provided an assessment of climate variability on decadal and longer scales and information on predictability under forced and unforced condition experiments (Fernández-Donado et al., 2013). Single-model ensembles have provided improved understanding of the importance of internal versus forced variability and of the individual forcings when compared to reconstructions at both global and regional scales (Man et al., 2012; Schurer et al., 2014; Otto-Bliesner et al., 2016). Other studies focused on the temperature difference between the warmest and coldest centennial or multi-centennial periods and their relationship with changes in external forcing, in particular variations in solar irradiance (e.g. Hind and Moberg, 2013).
The PMIP4-CMIP6 past1000 simulation (Jungclaus et al., 2017) builds on the DECK experiments, in particular the pre-industrial control (piControl) simulation as an unforced reference, and the historical simulations (Eyring et al., 2016). The past1000 simulations provide initial conditions for historical simulations that can be considered superior to the piControl state, as they integrate information from the forcing history (e.g. large volcanic eruptions in the early 19th century). It is therefore mandatory to continue the past1000 simulations into the historical period when running this simulation. The PMIP4-CMIP6 past1000 protocol uses a new, more comprehensive reconstruction of volcanic forcing and ensures a more continuous transition from the pre-industrial past to the future. The final decisions resulted from strong interactions with the groups producing the different forcing fields for the historical simulations (Jungclaus et al., 2017).

The Last Interglacial (lig127k)
The Last Interglacial (ca. 130-115 ka before present) was characterized by a Northern Hemisphere insolation seasonal cycle that was even larger than for the mid-Holocene. This resulted in a strong amplification of high-latitude temperatures and reduced Arctic sea ice. Global mean sea level was at least 5 m higher than now for at least several thousand years (e.g. Dutton et al., 2015). Both the Greenland and Antarctic ice sheets contributed to this sea-level rise, making it an important period for testing our knowledge of climate-ice sheet interactions in warm climates. The availability of quantitative climate reconstructions for the Last Interglacial (e.g. Capron et al., 2014) makes it feasible to evaluate these simulations and assess regional climate changes.
Climate model simulations of the Last Interglacial, reviewed in the IPCC AR5 (Masson-Delmotte et al., 2013), varied in their forcings and were not necessarily made with the same model as the CMIP5 future projections. There are large differences between simulated and reconstructed mean annual surface temperature anomalies compared to present, particularly for Greenland and the Southern Ocean, and in the temperature trends in transient experiments run for the whole interglacial (Lunt et al., 2013). Part of this discrepancy stems from the fact that the climate reconstructions comprised the local maximum interglacial warming, and this was not globally synchronous, an issue which is addressed in the PMIP4-CMIP6 protocol.
[Figure caption fragment, truncated in the source: [...] (Hopcroft et al., 2015), and reconstructed from a global interpolation of palaeodust data (Lambert et al., 2015).]
The PMIP4-CMIP6 lig127k experiment will help to determine the interactions between a warmer climate (higher atmospheric and oceanic temperatures, changed precipitation, and changed surface mass and energy balance) and the ice sheets (specifically, their thermodynamics and dynamics). The major changes in the experimental protocol for lig127k, compared to the pre-industrial DECK experiment, are changes in the astronomical parameters and greenhouse gas concentrations (Table 2 and Otto-Bliesner et al., 2017). Meaningful analyses of these simulations are now possible because of the concerted effort to synchronize the chronologies of individual records and thus provide a spatial-temporal picture of Last Interglacial temperature change (Capron et al., 2014), and also to document the timing of the Greenland and Antarctic contributions to sea level (Winsor et al., 2012; Steig et al., 2015). Regional responses of tropical hydroclimate and polar sea ice to the climate forcing can be assessed and compared to the mid-Holocene. Outputs from the lig127k experiment will be used by ISMIP6 to force stand-alone ice sheet experiments (lastIntergacialforcedism) in order to quantify the potential sea-level change associated with this climate.

The mid-Pliocene Warm Period (midPliocene-eoi400)
The midPliocene-eoi400 experiment focuses on the last time in Earth's history when atmospheric CO2 concentrations approached current values (∼ 400 ppmv) with a continental configuration similar to today (Table 2, Figs. 1, 2). Vegetation reconstructions indicate that there was less desert, and boreal forests were present in high northern latitude regions that are covered by tundra today (Salzmann et al., 2008). Climate model simulations from PlioMIP (concomitant with PMIP3) produced global mean surface air temperature anomalies ranging from +1.9 to +3.6 °C (relative to each model's pre-industrial control) and an enhanced hydrological cycle with strengthened monsoons. These simulations also show that meridional temperature gradients were reduced (due to high-latitude warming), which has significant implications for the stability of polar ice sheets and sea level in the future (e.g. Miller et al., 2012). Model-data comparisons provide high confidence that mean surface temperature was warmer than pre-industrial temperature (Dowsett et al., 2012; Masson-Delmotte et al., 2013). However, as is the case for the Last Interglacial, the PlioMIP simulations were not always performed using the same models that were used in PMIP3-CMIP5. The PMIP4-CMIP6 midPliocene-eoi400 experiment is designed to elucidate the long-term response of the climate system to a concentration of atmospheric CO2 close to the present one: 400 ppm (long-term climate sensitivity or Earth system sensitivity). It will also be used to assess the response of ocean circulation, Arctic sea ice, modes of climate variability (e.g. El Niño-Southern Oscillation), the global hydrological cycle, and regional monsoon systems to elevated concentrations of atmospheric CO2. The simulations have the potential to be informative about which emission reduction scenarios are required to keep the increase in global annual mean temperatures below 2 °C by 2100 CE. Boundary conditions (Table 2) include modifications to global ice distributions (Fig. 2), topography/bathymetry, vegetation, and CO2 and are provided by the US Geological Survey Pliocene Research and Synoptic Mapping Project (PRISM4: Dowsett et al., 2016).

Palaeoclimatic and palaeoenvironmental data for the PMIP4-CMIP6 periods
The choice of the time periods for the PMIP4-CMIP6 simulations has been made bearing in mind the availability of palaeoenvironmental and/or palaeoclimate reconstructions that can be used for model evaluation and diagnosis. Past environmental and climatic changes are typically documented at specific sites, whether on land, in ocean sediments or in corals, or from ice cores. The evaluation of climate simulations such as those conducted for PMIP4-CMIP6 requires these palaeoclimatic and palaeoenvironmental data to be synthesized for specific time periods. A major challenge in building such syntheses is to synchronize the chronologies of the different records. There are many syntheses of information on past climates and environments. Table 3 lists some of the sources of quantitative reconstructions for the PMIP4-CMIP6 time periods, but it is not our goal to provide an extensive review of these resources here.
Much of the information on palaeoclimates comes from the impact of climatic changes on the environment, such as on fires, dust, marine microfauna, and vegetation. Past climatic information is also contained in isotopic ratios of oxygen and carbon, which can be found in ice sheets, speleothems, or the shells of marine organisms. Ocean circulation can be documented by geochemical tracers in marine sediments from the sea floor (e.g. ¹⁴C, δ¹³C, ²³¹Pa/²³⁰Th, εNd). The fact that these physical, chemical, or biological indicators are indirect records of the state of the climate system and can also be sensitive to other factors (for example, vegetation is affected by atmospheric CO2 concentrations) has to be taken into account in model-data comparisons. Comparisons with climate model output can therefore be performed from different points of view: either the climate model output can be directly compared to reconstructions of past climate variables or the response of the climatic indicator itself can be simulated from climate model output and compared to the climate indicator. Such "forward" models include dynamical vegetation models, tree ring models, or models computing the growth of foraminifera, for which specific output is needed (cf. Sect. 4.3). Some palaeoclimatic indicators, such as meteoric water isotopes and vegetation, are computed by the climate model as it is running and are also examples of this forward modelling approach. Modelling the impacts of past climate changes on the environment is key to understanding how climatic signals are transmitted to past climate records. It also provides an opportunity to test the types of models that are used in the assessment of the impacts of future climate changes on the environment.

Analysing the PMIP4-CMIP6 experiments
The community using PMIP simulations is very broad, from climate modellers and palaeoclimatologists to biologists studying recent changes in biodiversity and archaeologists studying potential impacts of past climate changes on human populations. Here, we highlight several topics of analyses that will benefit from the new experimental design and from using the full PMIP4-CMIP6 ensemble.

Comparisons with palaeoclimate and palaeoenvironmental reconstructions, benchmarking, and beyond
Model-data comparisons for each period will be one of the first tasks conducted after completion of these simulations.
One new feature common to all periods is that we will make use of the fact that modelling groups must also run the historical experiment, in addition to the piControl one. Indeed, existing palaeoclimate reconstructions have used different modern reference states (e.g. climatologies for different time intervals) for their calibration, and this has been shown to have an impact on the magnitude of changes reconstructed from climate indicators (e.g. Hessler et al., 2014). These reconstructed climatic changes were usually compared to simulated climate anomalies with respect to a piControl simulation because running the historical simulations was not systematic in previous phases of PMIP. This prevented investigation of the impact of these reference state assumptions on model-data comparisons. More precisely, understanding the impact of the reference states is important for quantifying the uncertainties in interpretations of climate proxies and hence for evaluating model results.
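The effect of the reference state on a simulated anomaly can be illustrated with a few lines of arithmetic. The sketch below is a minimal example under stated assumptions: the temperature values are synthetic, the single-site setting is a simplification, and the helper name `anomaly` is hypothetical. It only shows that the same palaeoclimate simulation yields different anomalies depending on whether the piControl or the historical climatology is subtracted, and that the difference equals the offset between the two reference climatologies.

```python
from statistics import fmean


def anomaly(paleo, reference):
    """Time-mean of the palaeo run minus time-mean of the reference run."""
    return fmean(paleo) - fmean(reference)


# Synthetic annual-mean temperatures (degC) at one site; illustrative only.
pi_control = [14.0, 14.1, 13.9, 14.0]
historical = [14.3, 14.4, 14.2, 14.3]    # assumed warmer modern reference
mid_holocene = [14.5, 14.6, 14.4, 14.5]

anom_vs_pi = anomaly(mid_holocene, pi_control)    # about 0.5 degC
anom_vs_hist = anomaly(mid_holocene, historical)  # about 0.2 degC

# The gap between the two anomalies is the difference between the
# reference climatologies themselves (about 0.3 degC here).
reference_effect = anom_vs_pi - anom_vs_hist
print(round(anom_vs_pi, 3), round(anom_vs_hist, 3), round(reference_effect, 3))
```

In this toy case the choice of reference changes the anomaly by more than half its magnitude, which is why a reconstruction calibrated against a modern climatology should be compared with an anomaly computed against a matching reference.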
The PMIP4-CMIP6 simulations make a unique contribution to CMIP6 because they enable us to evaluate model performance for different climates against palaeoclimatic reconstructions, and thus identify possible model biases or other problems (e.g. Hopcroft and Valdes, 2015a). An ensemble of metrics has already been developed for the PMIP3-CMIP5 midHolocene and lgm simulations. Applied to the PMIP4-CMIP6 midHolocene and lgm "entry card" simulations, these will provide a rigorous assessment of model improvements compared to previous phases of PMIP. Furthermore, for the first time, thanks to the design of the PMIP4-CMIP6 experiments, we will be able to consider the impact of forcing uncertainties on simulated climate in the benchmarking. The benchmarking metrics will also be expanded to other periods and data sets so that systematic biases for different periods and for the present day can be compared. Benchmarking the ensemble of the PMIP4-CMIP6 simulations for all the periods will therefore allow quantification of the climate-state dependence of the model biases, a topic which is highly relevant for a better assessment of potential biases in the projected climates in CMIP6. In addition, it will be possible to analyse the potential relationships between model biases in different regions and/or in different variables (such as temperature vs. hydrological cycle) across the PMIP ensemble, as well as for the recent climate. One further objective for the PMIP4-CMIP6 benchmarking will be to develop more process-oriented metrics, making use of the fact that palaeoclimatic data document different aspects of climate change. There are many aspects of the climate system that are difficult to measure directly, and which are therefore difficult to evaluate using traditional methods. The "emergent constraint" (e.g. Sherwood et al., 2014) concept, which is based on identifying a relationship with a more easily measurable variable, has been successfully used by the carbon-cycle and modern climate communities and holds great potential for the analysis of palaeoclimate simulations. Using multiple time periods to examine "emergent" constraints will ensure that they are robust across climate states.
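The emergent-constraint idea can be sketched as a regression across a model ensemble: each model supplies a pair (x, y), where x is a quantity that can also be reconstructed from data and y is a quantity that is hard to observe. The example below is a minimal illustration and not the method of any cited study. The ensemble values are invented and exactly linear for clarity, the reconstructed value is hypothetical, and the function name `fit_line` is an assumption. A real application would also propagate the uncertainty in the fit and in the reconstruction.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical ensemble: simulated change in an observable quantity (x),
# e.g. a regional cooling in degC, versus a hard-to-observe target (y).
x_models = [-1.0, -2.0, -3.0, -4.0]
y_models = [2.0, 3.0, 4.0, 5.0]

slope, intercept = fit_line(x_models, y_models)

# A (hypothetical) reconstructed value of x then constrains y.
x_reconstructed = -2.5
y_constrained = intercept + slope * x_reconstructed
print(slope, intercept, y_constrained)
```

Repeating the fit separately for several periods and comparing the slopes would be one simple way to check whether the relationship holds across climate states.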

Analysing the response of the climate system to multiple forcings
Multi-period analyses provide a way of determining whether systematic model biases affect the overall response and the strength of feedbacks independently of climate state. One challenge will be to develop new approaches to analyse the PMIP4-CMIP6 ensemble so as to separate the impacts of model structure (including choice of resolution, parameterizations, and complexity) on the simulated climate. Similarly, the uncertainties in boundary conditions will be addressed for periods with proposed alternative forcings. Quantifying the role of forcings and feedbacks in creating climates different from today has been a focus of PMIP for many years. Many CMIP6 models will include representations of new forcings, such as dust, or improved representations of major radiative feedback processes, such as those related to clouds. This will allow a broader analysis of feedbacks than was possible in PMIP3-CMIP5. We will evaluate the impact of these new processes and improved realizations of key forcings on climates at global, large (e.g. polar amplification, land-sea contrast), and regional scales, together with the mechanisms explaining these impacts. A particular emphasis will be put on the modulation of the climate response to a given forcing by the background climate state and how it affects changes in cloud feedbacks, snow and ice sheets (e.g. Yoshimori et al., 2011), vegetation, and ocean deep water formation. Identification of similarities between past climates and future climate projections, such as those found for land-sea contrast or polar amplification (Masson-Delmotte et al., 2006), or for snow and cloud feedbacks for particular seasons, will be used to provide better understanding of the relationships between patterns and timescales of external forcings and patterns and timing of the climate responses.
These analyses should provide new constraints on climate sensitivity. Previous attempts that used information about the LGM period have been hampered by the fact that there were too few lgm experiments to draw statistically robust conclusions (Crucifix et al., 2006;Hargreaves et al., 2012;Harrison et al., 2014;Hopcroft and Valdes, 2015b). These attempts also ignored uncertainties in forcings and boundary conditions. PMIP4-CMIP6 is expected to result in a much larger ensemble of lgm experiments. The issue of climate sensitivity and Earth system sensitivity (PALEOSENS Project Members, 2012) will also be examined through joint analysis of multiple palaeoclimate simulations and climate reconstructions from different archives.
The PMIP4-CMIP6 ensemble will allow new analyses of the impact of smaller (mPWP) or larger (LGM) ice sheets. The ocean and sea ice feedbacks will also be analysed. The representation of sea ice and Southern Ocean circulation proved to be problematic in previous simulations of colder (LGM; Roche et al., 2012) and warmer climates (LIG; Bakker et al., 2013; Lunt et al., 2013), and we are eager to analyse improved models for this area, which is key for atmosphere-ocean carbon exchanges. For the LGM, there is evidence of a shallower and yet active overturning circulation in the North Atlantic (e.g. Lynch-Stieglitz et al., 2007; Böhm et al., 2015). Understanding this oceanic circulation for the LGM and the other PMIP4 periods, as well as its links to surface climate, is a topic of high importance since the Atlantic Meridional Overturning Circulation could modulate future climate changes at least in regions around the North Atlantic. The PMIP4 multi-period ensemble, for which we require improved simulations in terms of spin-up, will strengthen the analyses for this particular topic compared to previous phases of PMIP (Marzocchi and Jansen, 2017).
Multi-period analyses will also be useful for understanding the relationship between mean climate state and modes of natural variability (e.g. Liu et al., 2014; Saint-Lu et al., 2015). Analyses of multiple long simulations with different forcings should provide a better understanding of changes in ENSO behaviour (Zheng et al., 2008; An and Choi, 2014) and help determine whether state-of-the-art climate models underestimate low-frequency variability (Laepple and Huybers, 2014). Analyses will focus on how models reproduce the relationship between changes in seasonality and interannual variability (Emile-Geay et al., 2016), the diversity of El Niño events (Capotondi et al., 2015; Karamperidou et al., 2015; Luan et al., 2015), and the stability of teleconnections within the climate system (e.g. Gallant et al., 2013; Batehup et al., 2015).

Interactions with other CMIP6 MIPs and the WCRP Grand Challenges
PMIP has already developed strong links with several other CMIP6 MIPs (Table 4). CFMIP includes an idealized experiment that allows the investigation of cloud feedbacks and associated circulation changes in a colder versus warmer world. This will assist in disentangling the processes at work in the PMIP4 simulations. We have also required CFMIP-specific output to be implemented in the PMIP4-CMIP6 simulations so that the same analyses can be carried out for both the PMIP4 and CFMIP simulations. This will ensure that the simulated cloud feedbacks in different past and future climates can be directly compared.
Interactions between PMIP and other CMIP6 MIPs have mutual benefits: PMIP provides (i) simulations of large climate changes that have occurred in the past and (ii) evaluation tools that capitalize on extensive data syntheses, while other MIPs will employ diagnostics and analyses that will be useful for analysing the PMIP4 experiments. We are eager to establish collaborations with the CMIP6 MIPs listed in Table 4 and have ensured that all the outputs necessary for the application of common diagnostics between PMIP and these MIPs will be available (see Sect. 4.3). Links with CFMIP and ISMIP6 mean that PMIP will contribute to the World Climate Research Programme (WCRP) Grand Challenges on "Clouds, Circulation and Climate Sensitivity" and "Cryosphere and Sea Level" respectively. PMIP will also provide input to the WCRP Grand Challenge on "Regional Climate Information" through a focus on evaluating the mechanisms of regional climate change in the past.
4 Model configuration, experimental set-up, documentation, and required output
To achieve the PMIP4 goals and benefit from other simulations in CMIP6, particular care must be taken with model versions and the implementation of the experimental protocols. Here we summarize the guidelines that are common to all the experiments, focusing on the requirements to ensure strict consistency between CMIP6 and PMIP4 experiments. These concern model complexity, forcings, and mineral dust, which is a new feature in the PMIP4 experiments. This section also provides guidelines for the documentation and required output. The reader is referred to the PMIP4 companion papers on the specific periods for details of the set-up of each PMIP4-CMIP6 experiment.

Model version, set-up, and common design of all PMIP4-CMIP6 experiments
The climate models taking part in CMIP6 are very diverse: some represent solely the physics of the climate system, some include the carbon cycle and other biogeochemical cycles, and some include interactive natural vegetation and/or interactive dust cycle/aerosols. It is mandatory that the model version used for the PMIP4-CMIP6 experiments is exactly the same as for the DECK and historical simulations. It is highly preferable that it is also exactly the same as for any other CMIP6 experiments, for ease and robustness of comparison between the MIPs. The experimental set-up for each simulation is based on the DECK pre-industrial control (piControl) experiment (Eyring et al., 2016), i.e. the piControl forcings and boundary conditions are modified to obtain the forcings and boundary conditions necessary for each PMIP4-CMIP6 palaeoclimate experiment (Table 2). No additional interactive component should be included in the model unless it is already included in the DECK version. Such changes would prevent rigorous analyses of the responses to forcings across multiple time periods or between MIPs (Sect. 3) because the differences between the experiments could then arise from both the models' characteristics and the response to changes in external forcings. Adding an interactive component usually affects the piControl simulation as well as simulations of past climates (Braconnot et al., 2007), so it is very important that experiments for PMIP4-CMIP6 and DECK are run with exactly the same model version.
Because of this, even though environmental records show that natural vegetation patterns during each of the PMIP4-CMIP6 periods were different from today, the PMIP4-CMIP6 palaeoclimate simulations should use the same model configuration as the DECK and historical simulations. If the DECK and historical simulations use dynamic vegetation, then the PMIP4-CMIP6 palaeoclimate simulations should do so too. If the DECK and historical simulations use prescribed vegetation, then the same vegetation should be prescribed in the PMIP4-CMIP6 palaeoclimate simulations. One exception to this is the midPliocene-eoi400 experiment, where models that prescribe vegetation in the DECK and historical simulations should prescribe the mid-Pliocene vegetation. The other exception is for models including an interactive dust cycle for the LGM, which should impose vegetation that allows dust emissions over LGM dust emission regions. Sensitivity experiments to prescribed vegetation are encouraged for each period, as described in the companion papers.
Table 3. Examples of data syntheses for the PMIP4-CMIP6 periods. MAT: mean annual temperature; MAP: mean annual precipitation; α: ratio of the actual evaporation over potential evaporation; MTCO: mean temperature of the coldest month; MTWA: mean temperature of the warmest month; SST: sea-surface temperature.
Two experiments, lgm and midPliocene-eoi400, require modified ice sheets (Fig. 2), which also implies consistent modification of the coastlines, ocean bathymetry (if feasible for midPliocene-eoi400), topography, and land surface types over the continents, as well as ensuring that rivers reach the ocean in order to close the global freshwater budget. The initial global mean ocean salinity should be adjusted for these ice volume changes, and modelling groups are advised to ensure that the total mass of the atmosphere remains the same in all experiments.
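The salinity adjustment mentioned above can be illustrated with a simple salt-conservation argument. The following is a minimal sketch, not the protocol's prescribed method: it assumes that the total salt content of the ocean is conserved, that water density is constant (so that volume stands in for mass), and that the function name, argument names, and all numerical values are hypothetical and chosen only for illustration.

```python
def adjusted_salinity(reference_salinity, reference_ocean_volume, volume_change):
    """Global mean salinity after the ocean volume changes, conserving total salt.

    Assumption (not from the protocol): salt mass is conserved and density is
    constant, so salinity scales inversely with ocean volume.
    volume_change is negative when water is locked up in larger ice sheets
    and positive when ice sheets are smaller than in the reference state.
    """
    new_volume = reference_ocean_volume + volume_change
    if new_volume <= 0:
        raise ValueError("ocean volume must remain positive")
    return reference_salinity * reference_ocean_volume / new_volume


# Illustrative placeholder values only (units: psu and km^3).
reference_salinity = 34.7
reference_volume = 1.335e9
glacial_volume_change = -4.3e7  # water transferred to ice sheets

print(adjusted_salinity(reference_salinity, reference_volume, glacial_volume_change))
```

A larger ice volume (negative volume change) raises the global mean salinity, and a smaller ice volume lowers it. The values to apply in a given experiment should be taken from the companion protocol papers.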
For each experiment, the greenhouse gases and astronomical parameters should be modified from the DECK piControl experiment (Table 2). Spin-up procedures will differ according to the model and type of simulation, but the spin-up should be long enough to avoid significant drift in the analysed data. Initial conditions for the spin-up can be taken from an existing simulation. The model should be run until the absolute value of the trend in global mean sea-surface temperature is less than 0.05 K per century and the Atlantic Meridional Overturning Circulation is stable. A parallel requirement for carbon-cycle models and/or models with dynamic vegetation is that the 100-year mean global carbon uptake or release by the biosphere is < 0.01 Pg C yr⁻¹.
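The two quantitative spin-up criteria above can be checked with a few lines of code. The following is a minimal sketch, assuming that annual global means of sea-surface temperature (in K) and of net biosphere carbon flux (in Pg C yr⁻¹) are already available as plain lists. The function names are hypothetical, the trend is estimated by ordinary least squares over the whole series supplied, and the criterion on the stability of the Atlantic Meridional Overturning Circulation is not covered because the text gives no numerical threshold for it.

```python
def linear_trend(values):
    """Least-squares slope of values against their index (units per time step)."""
    n = len(values)
    mean_x = (n - 1) / 2.0
    mean_y = sum(values) / n
    cov = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    var = sum((i - mean_x) ** 2 for i in range(n))
    return cov / var


def sst_is_equilibrated(annual_sst_k, max_trend_k_per_century=0.05):
    """True if |trend| of annual global mean SST is below the threshold."""
    trend_per_century = linear_trend(annual_sst_k) * 100.0
    return abs(trend_per_century) < max_trend_k_per_century


def carbon_flux_is_equilibrated(annual_flux_pgc, max_flux=0.01, window=100):
    """True if the mean flux over the last `window` years is below max_flux
    in absolute value (uptake or release)."""
    last = annual_flux_pgc[-window:]
    return abs(sum(last) / len(last)) < max_flux


# Synthetic example: a slow drift of 0.01 K per century passes the SST test.
sst = [290.0 + 0.0001 * year for year in range(200)]
print(sst_is_equilibrated(sst))
```

In practice the trend would be computed over the final centuries of the spin-up rather than over the full run, and the choice of window is left to the modelling group.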

A new feature of the PMIP simulations: mineral dust
Natural aerosols show large variations on glacial-interglacial timescales, with glacial climates having higher dust loadings than interglacial climates (Kohfeld and Harrison, 2001; Maher et al., 2010). Dust emissions from northern Africa were significantly reduced during the MH (McGee et al., 2013). As is the case with vegetation, the treatment of dust in the midHolocene, lig127k, and lgm simulations should parallel the treatment in the piControl. However, for models with interactive dust schemes, maps of soil erodibility that account for changes in the extension of possible dust sources are provided for the midHolocene, lig127k, and lgm experiments. Dust anomalies/ratios compared to the pre-industrial background should be used for consistency with the DECK piControl simulation. As there have been instances of runaway climate-vegetation-dust feedback, leading to unrealistically cold LGM climates (Hopcroft and Valdes, 2015a), it is advisable to test the atmosphere model behaviour before running the fully coupled lgm simulation.
Table 4.
Dedicated common idealized sensitivity experiment to be run in aquaplanet set-up, AMIPminus4K, to be co-analysed in CFMIP and PMIP.
ISMIP6 (Nowicki et al., 2016), Ice Sheet Model Intercomparison Project for CMIP6: Assessment of the climate and cryosphere interactions and the sea-level changes associated with large ice sheets. In particular, the lig127k simulation will be used to force ice sheet models in ISMIP6. Additional experiments co-designed by the PMIP and ISMIP groups are foreseen outside the CMIP6 exercise: transient interglacial experiments, with climate model output forcing an ice sheet model, and coupled climate-ice sheet experiments.
OMIP (Griffies et al., 2016), Ocean Model Intercomparison Project: Mutual assessment of the role of the ocean in low-frequency variability, e.g. multi-decadal changes in ocean heat content or heat transport. Provide initial conditions for the ocean, including long-term forcing history.
SIMIP (Notz et al., 2016), Sea Ice Model Intercomparison Project: Assessment of the role of sea ice in climate changes.
AerChemMIP (Collins et al., 2017), Aerosols and Chemistry Model Intercomparison Project: Assessment of the role of aerosols in climate changes (very helpful since this is a new aspect in PMIP experiments for the mid-Holocene, Last Interglacial, and LGM).
C4MIP, Coupled Climate Carbon Cycle Model Intercomparison Project: Assessment of carbon-cycle evolution and feedbacks between sub-components of the Earth system. Evaluation of palaeo-reconstructions of carbon storage.
LUMIP, Land-Use Model Intercomparison Project: Analysis of climate changes associated with land-use changes (past1000 experiment).
VolMIP (Zanchettin et al., 2016), Volcanic Forcings Model Intercomparison Project: Analysis of specific volcanic events (very useful for critical analysis of past1000 simulations). VolMIP will systematically assess uncertainties in the climate response to volcanic forcing, whereas past1000 simulations describe the climate response to volcanic forcing in long transient simulations where related uncertainties are caused by chosen input data for volcanic forcing: mutual assessment of forced response.
DAMIP (Gillett et al., 2016), Detection and Attribution Model Intercomparison Project: past1000 simulations provide a long-term reference background, including natural climate variability for detection and attribution.
Model Intercomparison Project: Compare radiative forcing from LGM GHG as computed by climate models and by offline fine-scale radiative transfer codes.
To allow experiments with prescribed dust changes, three-dimensional monthly climatologies of dust atmospheric mass concentrations are provided for the piControl, midHolocene, and lgm. These are based on two different models (Albani et al., 2014, 2015, 2016; Hopcroft et al., 2015; Fig. 3) and modelling groups are free to choose between these data sets. Additional dust-related fields (dust emission flux, dust load, dust aerosol optical thickness, short- and long-wave radiation, surface and top of the atmosphere dust radiative forcing) are also available from these simulations. Implementation should follow the same procedure as for the historical experiment. The implementation for the lig127k experiment should use the same data set as for the midHolocene one. Since dust plays an important role in ocean biogeochemistry (e.g. Kohfeld et al., 2005), three dust maps are provided for the lgm experiment. Two of these are consistent with the climatologies of dust atmospheric mass concentrations; the other is primarily derived from palaeoenvironmental observations (Lambert et al., 2015; Fig. 3). The modelling groups should use consistent data sets for the atmosphere and the ocean biogeochemistry. The Lambert et al. (2015) data set can therefore be used for models that cannot include the changes in atmospheric dust according to the other two data sets.
... information about the initial conditions and spin-up technique used. A measure of the changes in key variables (Table 5) should be provided in order to assess remaining drift.

Documentation and required model output
Documentation should be provided via the ESDOC website and tools provided by CMIP6 (http://es-doc.org/) to facilitate communication with other CMIP6 MIPs. This documentation should also be provided for the PMIP4 website to facilitate linkages with non-CMIP6 simulations. The PMIP4 special issue, shared between Geoscientific Model Development and Climate of the Past, provides a further opportunity for modelling groups to document specific aspects of their simulations. We also require the groups to document the spin-up phase of the simulations by saving a limited set of variables during this phase (Table 5). The data stored in the CMIP6 database should be representative of the equilibrium climates of the MH, LGM, LIG, and mPWP periods, and of the transient evolution of climate between 850 and 1849 CE for the past1000 simulations. A minimum of 100 years' output is required for the equilibrium simulations but, given the increasing interest in analysing multi-decadal variability (e.g. Wittenberg, 2009), modelling groups are encouraged to provide outputs for 500 years or more if possible. Daily values should also be provided and will allow the calendar issue to be accounted for (see Appendix). The list of variables required to analyse the PMIP4-CMIP6 palaeoclimate experiments (https://wiki.lsce.ipsl.fr/pmip3/doku.php/pmip3:wg:db:cmip6request) reflects plans for multi-time period analyses and for interactions with other CMIP6 MIPs. We have included relevant variables from the data requests of other MIPs, including the CFMIP-specific diagnostics on cloud forcing, as well as land surface, snow, ocean, sea ice, aerosol, carbon cycle, and ice sheet variables from LS3MIP, OMIP, SIMIP, AerChemMIP, C4MIP, and ISMIP6 respectively. Some of these variables are also required to diagnose how climate signals are recorded by palaeoclimatic sensors via models of, for example, tree growth, vegetation dynamics, or marine planktonic foraminifera (e.g. Lombard et al., 2011).
The only set of variables defined specifically for PMIP are those describing oxygen isotopes in the climate system. Isotopes are widely used for palaeoclimatic reconstruction and are explicitly simulated in several models. We have asked that mean annual cycles of key variables are included in the PMIP4-CMIP6 data request for equilibrium simulations, as these proved exceptionally useful for analyses in PMIP3-CMIP5.

Conclusions
The PMIP4-CMIP6 simulations provide a framework to compare current and future anthropogenic climate change with past natural variations of the Earth's climate. PMIP4-CMIP6 is a unique opportunity to simulate past climates with exactly the same models as are used for simulations of the future. This approach is only valid if the model versions and implementation of boundary conditions are consistent for all periods, and if these boundary conditions are seamless for overlapping periods.
PMIP4-CMIP6 simulations are important in terms of model evaluation for climate states significantly different from the present and historical climates. We have chosen climatic periods well documented by palaeoclimatic and palaeoenvironmental records, with climate and environmental changes relevant for the study and projections of future climate changes: the millennium prior to the industrial epoch (past1000), 6000 years ago (midHolocene), the Last Glacial Maximum (lgm), the Last Interglacial (lig127k), and the mid-Pliocene Warm Period (midPliocene-eoi400).
Figure 4. The PMIP4-CMIP6 experiments in the framework of CMIP6 (a), with associated MIPs; and in the framework of PMIP4, with its working groups (b).
The PMIP4-CMIP6 experiments will also constitute reference simulations for projects developed in the broader PMIP4 initiative. The corresponding sensitivity experiments, or additional experiments, are embedded in the PMIP4 project and are described in the companion papers to this overview (Otto-Bliesner et al., 2017; Jungclaus et al., 2017; Kageyama et al., 2017). They are essential for a deeper understanding of the drivers of past climate changes for the PMIP4-CMIP6 climates or as initial conditions for transient simulations (e.g. Ivanovic et al., 2016, for the last deglaciation; Otto-Bliesner et al., 2017, for the Last Interglacial and the Holocene), or for examining time periods from deeper time with high atmospheric CO2 concentrations. Figure 4 summarizes the position of the PMIP4-CMIP6 experiments with respect to the other PMIP4 initiatives (right-hand side). The left-hand side of Fig. 4 shows how the PMIP4-CMIP6 experiments relate to the CMIP6 DECK and some other CMIP6 MIPs. PMIP4-CMIP6 experiments have been designed to be analysed by both communities.
The PMIP community anticipates major benefits from analysis techniques developed by the other CMIP6 MIPs, in particular in terms of learning about the processes of past climate changes in response to forcings (e.g. greenhouse gases, astronomical parameters, ice sheet and sea-level changes) as well as the role of feedbacks (e.g. clouds, ocean, sea ice). PMIP4-CMIP6 has the potential to benefit both palaeoclimate and present/future climate scientists, allowing them to learn about large natural climate changes and the mechanisms at work in the climate system for climate states that, like the projected future climate, are different from today.
Data availability. All data mentioned in the present paper can be downloaded following the instructions given in the companion PMIP4-CMIP6 protocol papers (Jungclaus et al., 2017; Kageyama et al., 2017; Haywood et al., 2016).
Appendix A: Justification of the requirement to save high-frequency output (daily and 6-hourly)
Variations in the shape of the Earth's orbit govern the latitudinal and seasonal distribution of insolation, and also produce variations in the lengths of individual "months" (where months are defined alternatively as either (a) the duration in days for the Earth to complete one-twelfth of its orbit, i.e. the "celestial" or "angular" calendar; or (b) a specific number of days, for example 31 days in January, 30 days in June, i.e. the "conventional" or "modern" calendar). For example, at 6 ka, perihelion occurs in August and aphelion occurs in February. Those months were approximately 1.5 days shorter and longer than at present respectively (Fig. A1). The effect of the changing calendar on the calculation of long-term means can be as large as the potential differences among the means themselves (Joussaume and Braconnot, 1997; Pollard and Reusch, 2002; Timm et al., 2008; Chen et al., 2011). Therefore, variations in the lengths of months (or seasons) must be taken into consideration when examining experiment-minus-control long-term mean differences. The size of the potential calendar effect (or "bias") is illustrated in Fig. A1, and is even larger for lig127k, when eccentricity was large. This figure shows the difference between present-day long-term means for October temperature and precipitation, and those calculated using the appropriate celestial month lengths for 6 and 127 ka. Modifications to month length have not usually been taken into account in the post-processing procedures for model output (but see Harrison et al., 2014). An approach to deal with the calendar issue is to use bias-correction, such as that of Pollard and Reusch (2002), with the mean-preserving daily interpolation approach of Epstein (1991).
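The dependence of "angular" month lengths on the orbit can be illustrated with Kepler's equation. The following is a minimal sketch and not the formulation used for Fig. A1: it takes definition (a) literally, computing the time the Earth needs to sweep each of twelve 30° arcs of true longitude counted from the vernal equinox. The function names, the 365-day year, the convention that `perihelion_lon_deg` is the Earth's true longitude at perihelion measured from the vernal equinox, and all numerical values in the usage lines are assumptions made for illustration; they are not the PMIP4 protocol values.

```python
import math


def mean_anomaly(true_anomaly, ecc):
    """Mean anomaly (radians) for a true anomaly (radians), via the eccentric anomaly."""
    ecc_anomaly = 2.0 * math.atan2(
        math.sqrt(1.0 - ecc) * math.sin(true_anomaly / 2.0),
        math.sqrt(1.0 + ecc) * math.cos(true_anomaly / 2.0),
    )
    # Kepler's equation: M = E - e sin(E)
    return ecc_anomaly - ecc * math.sin(ecc_anomaly)


def angular_month_lengths(ecc, perihelion_lon_deg, year_days=365.0):
    """Durations (days) of twelve successive 30-degree arcs of true longitude,
    starting at the vernal equinox."""
    two_pi = 2.0 * math.pi
    lengths = []
    for k in range(12):
        # True anomaly is true longitude minus the longitude of perihelion.
        start = math.radians(30.0 * k - perihelion_lon_deg)
        end = math.radians(30.0 * (k + 1) - perihelion_lon_deg)
        swept = (mean_anomaly(end, ecc) - mean_anomaly(start, ecc)) % two_pi
        # Mean anomaly grows uniformly in time, so the time fraction is swept / 2 pi.
        lengths.append(year_days * swept / two_pi)
    return lengths


# Illustrative orbits only: a reference orbit and a hypothetical past orbit.
reference = angular_month_lengths(0.0167, 283.0)
past = angular_month_lengths(0.019, 181.0)
anomalies = [p - r for p, r in zip(past, reference)]
print([round(a, 2) for a in anomalies])
```

The arc containing perihelion is always the shortest, because the Earth moves fastest there, and the twelve durations sum to the length of the year. Published calendar corrections define month boundaries differently (for example by mapping present-day month boundaries onto the orbit), so this sketch shows only the mechanism.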
For the PMIP4-CMIP6 simulations, we strongly recommend that daily data are provided for the calculation of monthly or seasonal means, and so we include those in the PMIP4-CMIP6 data request for some key variables. Daily or 6-hourly data are also useful for running regional models. It is important to test the use of regional models for climate model projections at the regional scale. Regional models are also used to produce fine-scale palaeoclimate scenarios for use by the impact community, for example, to study past climate impacts on biodiversity via ecological niche modelling. Palaeoclimate indicators often respond to climate features not adequately captured with monthly data alone (such as growing season length). Daily weather variables are therefore required for some forward models, as well as for the computation of bioclimatic variables that are reconstructed from pollen data, for example (e.g. Bartlein et al., 2011).
Figure A1. The calendar effect: (a) month-length anomalies, 140 ka to present, with the PMIP4 experiment times indicated by vertical lines. The month-length anomalies were calculated using the formulation in Kutzbach and Gallimore (1988). (b, c) The calendar effect on October temperature at 6 and 127 ka, calculated using Climate Forecast System Reanalysis near-surface air temperature (https://www.earthsystemcog.org/projects/obs4mips/), 1981-2010 long-term means, and assuming the long-term mean differences in temperature are zero everywhere. (e, f) The calendar effect on October precipitation at 6 and 127 ka, calculated using the CPC Merged Analysis of Precipitation (CMAP) enhanced precipitation (http://www.esrl.noaa.gov/psd/data/gridded/data.cmap.html), 1981-2010 long-term means, and again assuming that the long-term mean differences in precipitation are zero everywhere. Calendar effects were calculated by interpolating present-day monthly temperature or precipitation to a daily time step as in Pollard and Reusch (2002, but using a mean-preserving algorithm for pseudo-daily interpolation for monthly values; Epstein, 1991), and then recalculating the monthly means using the appropriate palaeo-calendar (Bartlein and Shafer, 2016). Note that the 6 and 127 ka map patterns for both variables, while broadly similar, are not simply rescaled versions of one another.
Author contributions. MK, PB, and SH organized the main text. The text for each period was initially provided by the leaders of the companion papers, who collated and summarized the contribution of their groups: JJ for the last millennium, BOB for the interglacials, MK for the Last Glacial Maximum, and AH for the mid-Pliocene Warm Period. All authors agreed on the experimental design and contributed text on the background, forcing data sets, or topics in the analysis plan. In addition, the consistency of the forcing data sets for all periods was carefully checked by the contributors of the data sets. The Appendix was written and illustrated by PJB. The final text was refined by RI.
Competing interests. The authors declare that they have no conflict of interest.