PMIP4-CMIP6: the contribution of the Paleoclimate Modelling Intercomparison Project to CMIP6. Geoscientific Model


Abstract.
The goal of the Palaeoclimate Modelling Intercomparison Project (PMIP) is to understand the response of the climate system to changes in different climate forcings and to feedbacks. Through comparison with observations of the environmental impacts of these climate changes, or with climate reconstructions based on physical, chemical or biological records, PMIP also addresses the issue of how well state-of-the-art models simulate climate changes. Palaeoclimate states are radically different from those of the recent past documented by the instrumental record and thus provide an out-of-sample test of the models used for future climate projections and a way to assess whether they have the correct sensitivity to forcings and feedbacks. Five distinctly different periods have been selected as the focus for the core palaeoclimate experiments that are designed to contribute to the objectives of the sixth phase of the Coupled Model Intercomparison Project (CMIP6). This manuscript describes the motivation for the choice of these periods and the design of the numerical experiments, with a focus upon their novel features compared to the experiments performed in previous phases of PMIP and CMIP as well as the benefits of common analyses of the models across multiple climate states. It also describes the information needed to document each experiment and the model outputs required for analysis and benchmarking.

1 Introduction

Why model paleoclimates?
Instrumental meteorological and oceanographic data, available for the period extending from the middle of the 19th century, describe the manner in which Earth's surface climate has evolved since the beginning of the industrial revolution. These data show that a global warming of ~0.85°C has occurred since this time, a warming that is more intense over land than over the oceans, and more intense at high latitudes than in the tropics (Hartmann et al., 2013; Sutton et al., 2007). This recent climate change has been substantially controlled by the increase of atmospheric greenhouse gases due to human activities, amplified by the action of feedbacks associated with atmospheric water vapor and clouds (e.g. Dufresne and Bony, 2008), with the albedos of snow and ice, and with changes in land cover or in ocean properties and circulation (Cubasch et al., 2013). This process-based understanding of the climate system is embedded within the climate models used to project changes in future climates. The skill of these climate models is most commonly evaluated in comparison to the present climate and climate change since the pre-industrial age (1850 CE). However, concentrations of atmospheric greenhouse gases are projected to increase significantly during the 21st century, reaching levels well outside the range of recent millennia. Thus, in making future projections, models are operating well outside the conditions for which they have been validated. The credibility of climate projections needs to be assessed using information on longer-term palaeoclimate changes, particularly for intervals when the climate change compared to present was as large as the anticipated future change.
We have to look back several million years to find a period of Earth's history when atmospheric CO2 concentrations were similar to the present day (the mid-Pliocene warm period, 3.2 million years ago) and several tens of million years (e.g. the early Eocene, ~55 to 50 million years ago) for much higher levels. During these ancient periods, topography, bathymetry, land-ocean distributions and/or ice sheets were different from today, and the mechanisms for increasing atmospheric CO2 were likely much slower than anthropogenic fossil fuel emissions. However, although these periods are not perfectly analogous to the future, they offer key insight into climate processes that operate in a higher-CO2, warmer world (e.g. Lunt et al., 2010; Caballero and Huber, 2010). On the other hand, the main drivers of climatic changes in Earth's most recent period, the Quaternary (2.5 million years ago to present), are the astronomical parameters driving the seasonal and latitudinal distribution of incoming solar energy, as well as greenhouse gas fluctuations, with levels much lower than present. During this period, the Earth's geography was more similar to today and some of the more rapid climate transitions that took place occurred on human-relevant timescales (decades to centuries; e.g. Marcott et al., 2014; Steffensen et al., 2008). By combining several past periods, we can provide a broad picture of the climate response to external forcings, and benefit from the rich resource of paleoclimates and paleoenvironments.

There are numerous palaeoclimate records documenting the evolution of Earth's climate before instrumental records (Masson-Delmotte et al., 2013). Some of these records are based on physical and chemical properties of the atmosphere, vegetation and ocean, such as oxygen and carbon isotopes, which have been preserved in various geological archives such as ice, speleothems or microscopic plankton shells (e.g. Caley et al., 2014, for a model-isotopic data comparison). Other records, such as changes in marine and terrestrial floral and faunal assemblages and distributions (MARGO Project Members, 2009; Prentice et al., 2000) or changes in surface hydrology and water storage (Kohfeld and Harrison, 2000), reflect the impact of climate changes on the ambient environment but can be used to reconstruct climate parameters either qualitatively or statistically (e.g. MARGO Project Members, 2009; Bartlein et al., 2011). Overall, there is a wealth of palaeoclimatic and palaeoenvironmental data showing large variations in the Earth's climate prior to the industrial era, commensurate with the magnitude of projected changes in the future.

Replicating the totality of those climate changes with state-of-the-art climate models is a challenge (Braconnot et al., 2012). It is challenging, for example, to represent the correct amplitude of past climate changes such as glacial-interglacial temperature differences (e.g. the temperatures at the Last Glacial Maximum, ~21,000 years ago, vs. the pre-industrial temperatures, cf. Harrison et al., 2014) or the correct spatial patterns such as the northward extension of the African monsoon during the mid-Holocene, ~6,000 years ago (Perez-Sanz et al., 2014). Interpreting palaeoenvironmental data can also be challenging, in particular disentangling the relationships between changes in large-scale atmospheric or oceanic circulation, broad-scale regional climates and local environmental responses to these changes. This challenge is paralleled by concerns about future local or regional climate changes and their impact on the environment. Modelling palaeoclimates is therefore a means to understand past climate and environmental changes better, using physically based tools, as well as a means to evaluate model skill in forecasting the responses to major drivers.

The Palaeoclimate Modelling Intercomparison Project (PMIP)
The Palaeoclimate Modelling Intercomparison Project (PMIP) was established in the 1990s in order to understand the mechanisms of past climate changes, in particular the role of the different climate feedbacks, and to evaluate how well climate models used for climate projections simulate well-documented climates outside the range of present and recent climate variability. To achieve these goals, PMIP has actively fostered paleo-data syntheses, model-data comparisons and multi-model analyses. PMIP provides a forum for discussion of experimental design and appropriate techniques for comparing model results with palaeoclimatic reconstructions.

Since its initial phase, the evolution of PMIP has closely followed model developments for the Atmospheric Model Intercomparison Project (AMIP) and then the Coupled Model Intercomparison Project (CMIP). The initial focus was on the results from Atmospheric General Circulation Models (PMIP1; Joussaume and Taylor, 1995) and was extended to coupled Atmosphere-Ocean General Circulation Models (AOGCMs) and AOGCMs including representations of the carbon cycle feedbacks in PMIP2 (Braconnot et al., 2007) and PMIP3. Two climatic periods have been a major focus in PMIP since its initial phase: the mid-Holocene (MH, ~6,000 years ago) and the Last Glacial Maximum (LGM, ~21,000 years ago). The rationale for studying the Last Glacial Maximum was to evaluate model performance in a well-documented cold climatic extreme and to examine the role of forcings and feedbacks in creating this climate state. The rationale for the mid-Holocene was to evaluate and analyse the models during a period when the northern hemisphere was characterized by enhanced monsoons, extra-tropical continental aridity and much warmer summers.
These two periods are considered as reference points for assessing the sensitivity of the climate system to changes in atmospheric CO2 concentration and orbitally-induced changes in tropical circulation and the monsoons, respectively. Evaluations of the simulations of these two periods made in successive phases of PMIP provide a unique overview of the evolution of the ability of climate models to reproduce large changes compared to today (Flato et al., 2013).

Geosci. Model Dev. Discuss., doi:10.5194/gmd-2016-106, 2016. Manuscript under review for journal Geosci. Model Dev. Published: 26 May 2016. © Author(s) 2016. CC-BY 3.0 License.
Palaeoclimate experiments were included for the first time in the ensemble of simulations made during the fifth phase of CMIP (Taylor et al., 2012). In addition to the MH and LGM simulations described above, transient simulations of the millennium prior to the industrial epoch (LM, 850-1850 CE) were also included in CMIP5 (Schmidt et al., 2011, 2012), to study the mechanisms of decadal to centennial climate variability (natural variability vs. impact of solar, volcanic and anthropogenic forcings). Simulations of the LM have used models of varying complexity, evolving from energy balance models (e.g. Crowley, 2000), via Earth system models of intermediate complexity (Goosse et al., 2005), to complex coupled atmosphere-ocean general circulation models (AOGCM, e.g. Gonzalez-Rouco et al., 2006) and Earth System Models that include components like the carbon cycle (Jungclaus et al., 2010). The focus in CMIP5 has been on coupled model evaluation based on a common protocol describing a variety of suitable forcing boundary conditions (Schmidt et al., 2011) and process understanding (e.g. Lehner et al., 2013; Sicre et al., 2013; Jungclaus et al., 2014), including the assessment of variability modes (e.g. Raible et al., 2014) and comparisons with reconstructions (e.g. Bothe et al., 2013; Fernandez-Donado et al., 2013). Single-model ensembles of simulations have provided an understanding of the importance of internal versus forced variability and of the individual forcings when comparing to reconstructions (Schurer et al., 2014; Otto-Bliesner et al., 2016). Thanks to this formal inclusion of the LM, MH and LGM simulations in the CMIP5 exercise, it was possible to compare the mechanisms causing past and future climate changes in a rigorous way and to evaluate the models used for projections under climate states very different from the present one.
In its third phase, PMIP became an umbrella for analyses of other time periods and provided a framework for analyses across multiple time periods. PlioMIP (Haywood et al., 2011) coordinates climate model experiments for the mid-Pliocene Warm Period (mPWP, ca. 3.3 to 3 million years ago). The mPWP had CO2 levels similar to today, but vegetation reconstructions (Salzmann et al., 2008) indicate that the area of deserts decreased and boreal forests replaced tundra. Climate model simulations produce global mean surface air temperature anomalies ranging from +1.9°C to +3.6°C (relative to each model's pre-industrial control) and an enhanced hydrological cycle, with strengthened monsoons. These simulations also show that meridional temperature gradients were reduced (due to high latitude warming), which has significant implications for the stability of polar ice sheets and sea level in the future (e.g. Miller et al., 2012).
PMIP3 also saw the initiation of comparisons of available simulations and reconstructions for the last interglacial period, and of discussions about the ability of climate models to produce a rate of ice-sheet melting in agreement with a global sea level at least 5 m higher than now (Masson-Delmotte et al., 2013; Dutton et al., 2015). First discussions on transient simulations of climate behaviour, focusing on the last interglacial period and the last deglaciation (Ivanovic et al., 2015), were also initiated. A measure of the success of PMIP3 is provided by the number of participating groups (more than 20) and the fact that PMIP results were used for ten figures in the last IPCC report (Masson-Delmotte et al., 2013; Flato et al., 2013). However, the project also identified significant knowledge gaps and areas where progress is needed; PMIP4 has been designed to address these.

PMIP4 experiments in CMIP6
The design of PMIP4 simulations to be included as part of CMIP6 was built on the recognition that PMIP simulations naturally address the key CMIP6 question "How does the Earth System respond to forcing" for multiple forcings and in climate states very different from the current or historical climates. Comparisons with observations enable us to determine whether the modelled responses are realistic. PMIP also addresses key question 2 "What are the origins and consequences of systematic model biases?" PMIP simulations and data-model comparisons will show whether the biases in the present-day simulations are also found in other climate states. More importantly, analyses of PMIP simulations will show whether present-day biases have an impact on the magnitude of simulated climate changes. Finally, PMIP is also relevant for question 3 "How can we assess future climate changes given climate variability, predictability and uncertainties in scenarios?" through examination of these questions for documented past climate states and via the use of the last millennium simulations as reference state for natural variability.

All the experiments can be run independently and have value for comparison to the CMIP6 DECK and historical experiments. We have therefore given them equal priority, Tier 1, within CMIP6 (Table 1). It is not mandatory for groups wishing to take part in PMIP4-CMIP6 to run all five PMIP4-CMIP6 Tier 1 experiments. It is however mandatory to run at least one of the experiments that were run in previous phases of PMIP, i.e. the midHolocene or the lgm. These are considered as "entry cards" for participation in PMIP4-CMIP6.

Intercomparisons of simulated responses to specific drivers across models are interesting as sensitivity experiments, but the true power of PMIP is the connection to the observations, which allows an assessment of model skill to be made.
As the choice of these periods and of the experimental design was also motivated by the fact that model-observational comparisons are as essential to the project as the comparisons across the model ensemble, it is important to assess all the issues that might make those comparisons difficult. Uncertainties in the observations, or perhaps more broadly, in the inferences from those observations, are a key part of PMIP analyses, as is the structural uncertainty across the model responses. Both of these factors have been part of the PMIP approach from the beginning. What has only recently become more apparent is the importance of understanding the uncertainty in the drivers themselves. This encompasses time-uncertainty for reconstructions (i.e. what are the appropriate orbital parameters to use for the last interglacial or mid-Pliocene?) as well as structural uncertainty in the boundary conditions applied (e.g. in the continental reconstructions, ice sheet height and extent, vegetation cover), or the transient forcings (for instance in the last millennium simulations for solar, volcanic aerosol or land use/land cover change). Different reconstructions of these aspects have clear differences that can impact assessment of model skill. Attitudes to this do vary across the author team, and compromises have had to be made in the experimental designs: in the lgm and past1000 experiments, alternative forcings are thus possible.

In section 2, we give more background on these periods and the associated forcings and boundary conditions. The experimental set-up of the experiments is described in section 3. The analysis plan is outlined in section 4.
A short conclusion is given in section 5.
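The participation rule stated above (five Tier 1 experiments of equal priority, of which at least one of the two "entry cards" must be run) can be illustrated with a minimal Python sketch. The experiment names are those used in this manuscript; the function `has_entry_card` and its behaviour on unknown names are hypothetical and are not part of any PMIP or CMIP6 tool.

```python
# Tier 1 experiment names as used in this manuscript.
TIER1_EXPERIMENTS = {
    "midHolocene",
    "lgm",
    "past1000",
    "lig127k",
    "midPliocene-eoi400",
}

# Experiments run in previous phases of PMIP ("entry cards").
ENTRY_CARDS = {"midHolocene", "lgm"}


def has_entry_card(planned_experiments):
    """Return True if the plan contains at least one entry-card experiment.

    Hypothetical helper: raises ValueError for names that are not
    PMIP4-CMIP6 Tier 1 experiments.
    """
    planned = set(planned_experiments)
    unknown = planned - TIER1_EXPERIMENTS
    if unknown:
        raise ValueError(f"Not PMIP4-CMIP6 Tier 1 experiments: {sorted(unknown)}")
    return bool(planned & ENTRY_CARDS)
```

For instance, a group planning only `past1000` and `lig127k` would not satisfy the rule, whereas adding `lgm` or `midHolocene` would.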

The PMIP4-CMIP6 simulations
2.1 PMIP4-CMIP6 entry cards: the mid-Holocene (midHolocene) and last glacial maximum (lgm)

As discussed above, the MH and the LGM provide examples of strongly contrasted climate states (Figure 1, Table 1). There are extensive syntheses of marine and terrestrial data for both intervals, documenting environmental responses to changing climate. The MH provides an opportunity to examine the response to orbitally-induced changes in the seasonal and latitudinal distribution of insolation. The LGM provides an opportunity to examine the impact of changes in ice sheets, land-sea distribution and greenhouse gases on climate. The LGM is particularly relevant because the forcing and temperature response were as large as (although of opposite sign to) those projected for the end of the 21st century. Both periods constitute test cases for our understanding of mechanisms of climate change, such as the interplay between circulation changes and radiation/cloud changes, the respective strengths of feedbacks from different components of the climate system, and for our understanding of the connections between global and regional climate changes. Because these periods have been studied in earlier phases of PMIP, they provide the opportunity to evaluate whether increased model resolution and complexity have led to improvement in the representation of circulation patterns and in the fidelity of regional climate changes.

Evaluation of the PMIP3-CMIP5 MH and LGM experiments has demonstrated that climate models simulate changes in large-scale features of climate that are governed by the energy and water balance reasonably well, including changes in land-sea contrast (Figure 2a) and high-latitude amplification of temperature changes (Izumi et al., 2015). They also simulate the scaling of precipitation changes with respect to temperature changes at a hemispheric scale realistically (Li et al., 2013).
Thus, evaluation of the PMIP3-CMIP5 MH and LGM simulations confirms that the relationships between large-scale patterns of temperature and precipitation change in future projections are believable. However, the PMIP3-CMIP5 simulations of MH and LGM climates show only moderate skill in predicting observed patterns of climate change overall (Hargreaves et al., 2013; Hargreaves and Annan, 2014; Harrison et al., 2014; Harrison et al., 2015), and this arises because of persistent problems in simulating regional climates (e.g. Mauri et al., 2014; Perez-Sanz et al., 2014; Harrison et al., 2015). State-of-the-art models still cannot reproduce the northward penetration of the African monsoon in response to MH orbital forcing (Figure 2b; Perez-Sanz et al., 2014), for example. Both inadequate representation of feedbacks and model biases could contribute to this mismatch (see e.g. Zheng and Braconnot, 2013) but are unlikely to be sufficient to reconcile the PMIP3-CMIP5 simulations with observations.

Systematic biases in the simulation of regional climates mean that state-of-the-art models are generally better at simulating mean values of any climate variable than at simulating the spatial variability or the geographical patterning in that variable. Although the benchmarking of the PMIP3-CMIP5 MH and LGM experiments shows that some models consistently perform better than others, better performance in palaeo-simulations is not consistently related to better performance under modern conditions. The ability to simulate modern climate regimes and processes does not guarantee that a model will be good at simulating climate changes, emphasising the importance of testing models against the palaeorecord to increase confidence in projections of future climate (Hargreaves and Annan, 2014; Schmidt et al., 2014).

There are small differences in the boundary conditions to be used for PMIP4-CMIP6 compared to those used in PMIP3. In PMIP3, the MH CO2 concentration was prescribed to be the same as in the pre-industrial control simulation because the focus was on testing the impact of the insolation forcing on meridional climate gradients and seasonality. Realistic values of the concentrations of CO2 and other trace gases will be used in PMIP4-CMIP6 (Table 2). This will allow the midHolocene experiment to be used as the initial state for transient simulations of [...] The impact of these different ice-sheet forcings will be a focus for sensitivity experiments in PMIP4. There are uncertainties about other boundary conditions for the midHolocene and lgm experiments, including dust and vegetation (section 3.5), and these will also be investigated as part of the analysis of the entry-card simulations.

The last millennium (past1000)
The millennium before the industrial era provides a well-documented (e.g. PAGES2k-PMIP3 group, 2015) period of multi-decadal to multi-centennial changes in climate, with contrasting periods such as the Medieval Climate Optimum and the Little Ice Age. This interval was characterised by variations in solar, volcanic and orbital forcings (Figure 1). Investigating the response to (mainly) natural forcing under climatic background conditions not too different from today is crucial for an improved understanding of climate variability, circulation, and regional connectivity. This interval also provides a context for earlier anthropogenic impacts (e.g. land-use changes) and the current warming by increased greenhouse gas concentrations, and helps constrain uncertainty in the future climate response to a sustained anthropogenic impact.

[...] scales (Schurer et al., 2014; Man et al., 2012; Otto-Bliesner et al., 2016). The LM simulations show relatively good agreement with regional climate reconstructions for the northern hemisphere, but less agreement with southern hemisphere records. The simulations exhibit more regional coherence than shown by southern hemisphere records, though it is not clear whether this is due to deficiencies in the southern hemisphere records, or to poor representation of internal variability and/or an overestimation of the forced response in the simulations.

The PMIP4-CMIP6 past1000 simulations will be based on experience gained in PMIP3-CMIP5, in which more than a dozen modelling groups participated and a total of 15 past1000 experiments were stored in the ESGF database. The PMIP4-CMIP6 past1000 simulations build on the DECK experiments, in particular the pre-industrial control (piControl) simulation as unforced reference, and the historical simulations (Eyring et al., 2015).
Moreover, past1000 simulations provide initial conditions for historical simulations starting in the 19th century that are considered superior to the piControl state because they include integrated information from the forcing history (e.g. large volcanic eruptions in the early 19th century). The PMIP4-CMIP6 past1000 simulation will benefit from a new, more comprehensive reconstruction of volcanic forcing (Sigl et al., 2015) and an experimental protocol that ensures a more continuous transition from the pre-industrial past to the future.
Higher-resolution simulations will allow a greater range of regional processes, such as the role of storm-tracks and blocking on regional precipitation, to be analyzed.

The last interglacial (lig127k)
The Last Interglacial (ca. 130-115 ka) was characterized by a northern hemisphere insolation seasonal cycle even larger than for the mid-Holocene (Figure 1, Table 1), resulting in a strong polar amplification of temperatures and reduced Arctic sea ice, and global sea level was at least 5 m higher than now for at least several thousand years (Masson-Delmotte et al., 2013; Dutton et al., 2015). Both the Greenland and Antarctic ice sheets contributed to this sea level rise, making it an important period for testing our knowledge of climate-ice sheet interactions in warm climates. There are more quantitative climate reconstructions available for the Last Interglacial than for earlier interglacials, despite challenges in establishing reliable chronologies, making it feasible to assess regional climate changes.
Climate model simulations of the Last Interglacial, reviewed and assessed in the AR5, varied in their forcings and were not necessarily made with the same model/same resolution as the CMIP5 future projections.
Quantitative reconstructions of annual surface temperature change were available for comparison to these simulations (Figure 4), though with the caveat that the warmest phases were not necessarily globally synchronous (Masson-Delmotte et al., 2013). Nevertheless, comparison exercises showed large-scale discrepancies between simulations and reconstructions, particularly in regard to temperature trends over Greenland and the Southern Ocean (Bakker et al., 2013).

[Figure 4 caption, fragment: as in (c) but with symbols representing terrestrial proxy records as compiled from published literature (Table 5.A.5). Observed seasonal terrestrial anomalies larger than 10°C or less than -6°C are not shown. In (c) and (d), JJA denotes June-July-August and DJF denotes December-January-February.]
The PMIP4-CMIP6 lig127k experiment will help to determine the interplay of warmer atmospheric and oceanic temperatures, changed precipitation, and changed surface energy balance on ice sheet thermodynamics and dynamics (Table 1). The major changes in the experimental protocol for lig127k, compared to the pre-industrial DECK experiment, are changes in astronomical parameters and greenhouse gases (Table 2; Otto-Bliesner et al., 2016). Analyses of these simulations will benefit from the concerted effort by the paleodata community to provide a spatial-temporal picture of last interglacial temperature change (Capron et al., 2014) as well as the phasing of the contributions of Greenland and Antarctica to global sea level (Winsor et al., 2012; Steig et al., 2015). Regional responses of tropical hydroclimate and of polar sea ice can be assessed and compared to the mid-Holocene. Outputs from the lig127k experiment will be used by ISMIP6 to force standalone ice sheet experiments (lastIntergacialforcedism). The lig127k experiment will also be the starting point of a transient experiment covering the interglacial to be run within PMIP4.

The mid-Pliocene Warm Period (midPliocene-eoi400)
The Pliocene epoch was the last time in Earth history when atmospheric CO2 concentrations approached modern values (~400 ppmv) whilst at the same time retaining a near-modern continental configuration (Figure 1, Table 1). The IPCC 5th Assessment Report, chapter 5 (Masson-Delmotte et al., 2013), states that model-data comparisons for the Pliocene provide high confidence that mean surface temperature was warmer than pre-industrial (Dowsett et al., 2012). However, as was the case for the Last Interglacial, the mid-Pliocene simulations were not always derived from the same model at the same resolution as the CMIP5 future projections.

The PMIP4-CMIP6 midPliocene-eoi400 experiment is designed to understand the long-term response of the climate system to a near-modern concentration of atmospheric CO2 (longer-term climate sensitivity or Earth System Sensitivity), and to understand the response of ocean circulation, Arctic sea-ice, and modes of climate variability (e.g. El Niño Southern Oscillation), as well as the global response in the hydrological cycle and regional changes in monsoon systems (Table 1). The simulation has societal relevance because of its potential to inform policy makers on required emission reduction scenarios designed to prevent an increase in global annual mean temperatures of more than 2 to 3°C beyond 2100 AD.

Experimental set up and model configuration
The modified forcings and boundary conditions for each PMIP4-CMIP6 palaeoclimate simulation are summarised in Table 2.

Model version and set-up
The climate models taking part in CMIP6 are very diverse: some representing solely the physics of the climate system; some including the carbon cycle and other biogeochemical cycles; some even including interactive [...]

For each experiment, the greenhouse gases and astronomical parameters should be modified from the DECK piControl experiment according to Table 2. In the following sections, we give more detail on the implementation of the boundary conditions which require specific attention to ensure consistency within CMIP6 and PMIP4.

Implementation of ice sheets
The mid-Pliocene and Last Glacial Maximum experiments require changes in ice sheets. This implies changes in ice sheet height, land surface type, sea level and hence land-sea mask, and ocean bathymetry (Figure 6). These changes in boundary conditions should be implemented as follows:
1. The land-sea mask should be implemented in the ocean and atmosphere/land surface models. This step is optional for the midPliocene-eoi400 experiment, but mandatory for the lgm. It is important to check the newly glaciated areas in the lgm experiment to ensure that grid cells under the grounded ice sheets (e.g. in the Hudson Bay area and over present-day Barents-Kara seas) are not specified as ocean cells.
2. The ice sheet mask should be implemented in the atmosphere/land surface model.
Some ice-sheet related changes must be implemented in the initial conditions:
- The atmospheric mass must be the same as today. For some models, this means that the initial surface pressure field has to be adjusted to the change in surface elevation.
- The mean ocean salinity has to be increased by 1 PSU everywhere at the beginning of the lgm simulation, to account for the lowering of sea level. Alkalinity also needs to be adjusted if an ocean biogeochemistry model is used.

Land-use changes have to be implemented for the past1000 simulation in the same manner as for the historical simulation (Hurtt et al., in prep.), using the land-use forcing provided by the Land Use Model Intercomparison Project and the CMIP6 Land Use Harmonization dataset (https://cmip.ucar.edu/lumip; Hurtt et al., in prep.; Jungclaus et al., in prep.). This data set is derived from the HYDE3.2 (Klein Goldewijk et al., in prep.) estimates of the area of cropland, managed pasture, rangeland, urban, and irrigated land. Different crop types are treated separately and estimates of wood harvest are also provided.
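Two of the ice-sheet related steps described above for the lgm experiment (checking that no grid cell under grounded ice is flagged as ocean, and raising ocean salinity uniformly by 1 PSU) can be sketched as follows. This is a minimal illustration using NumPy arrays, not the procedure of any particular modelling group: the function names, the mask conventions (non-zero means "ice" or "ocean") and the assumption that masks and fields share the same horizontal grid are all hypothetical.

```python
import numpy as np


def grounded_ice_over_ocean(grounded_ice_mask, ocean_mask):
    """Flag cells that are marked both as grounded ice and as ocean.

    Both masks are assumed (hypothetically) to be arrays on the same
    horizontal grid, with non-zero values meaning "ice" / "ocean".
    """
    ice = np.asarray(grounded_ice_mask).astype(bool)
    ocean = np.asarray(ocean_mask).astype(bool)
    return np.logical_and(ice, ocean)


def add_salinity_offset(salinity, ocean_mask, offset_psu=1.0):
    """Return salinity with a uniform offset added on ocean cells.

    The 2-D ocean mask broadcasts over any leading (e.g. depth)
    dimension of the salinity array; non-ocean cells are unchanged.
    """
    salinity = np.asarray(salinity, dtype=float)
    ocean = np.asarray(ocean_mask).astype(bool)
    return np.where(ocean, salinity + offset_psu, salinity)
```

Any cell returned as `True` by the first function would need its land-sea mask corrected before the run is started. Corresponding adjustments to alkalinity and to the initial surface pressure field are model-specific and are not covered by this sketch.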

Mineral Dust
Natural aerosols show large variations on glacial-interglacial time scales, with glacial climates having higher dust loadings than interglacial climates (Kohfeld and Harrison, 2001; Maher et al., 2010). Dust emissions from northern Africa were significantly reduced during the MH (McGee et al., 2013). As is the case with vegetation, the treatment of dust in the midHolocene and lgm simulations should parallel the treatment in the piControl.
However, some of the models in CMIP6 include representations of interactive dust. For those models, maps of soil erodibility, accounting for changes in the extent of possible dust sources, will be provided from recent simulations (Albani et al., 2014, 2015; Hopcroft et al., 2015) for the pre-industrial, mid-Holocene and LGM periods. Dust anomalies/ratios compared to the pre-industrial background should be used, for consistency with the DECK piControl simulation. As there have been instances of runaway climate-vegetation-dust feedback, leading to unrealistically cold LGM climates (Hopcroft and Valdes, 2015), it is advisable to test model behaviour before running the lgm simulation. To allow experiments with prescribed dust changes, a three-dimensional monthly climatology of dust atmospheric mass concentrations will be provided for the pre-industrial, MH, and LGM based on two different modelling studies (Albani et al., 2014; Hopcroft et al., 2015). Additional dust-related fields (dust emission flux, dust load, dust aerosol optical thickness, and short- and long-wave dust radiative forcing at the surface and at the top of the atmosphere) will also be available from these simulations. Implementation should follow the same procedure as for the historical run (Albani et al., 2014, 2015). Since dust plays an important role in ocean biogeochemistry (e.g. Kohfeld et al., 2005), three dust maps will be provided. Two of these are consistent with the climatologies of dust atmospheric mass concentrations; the other is primarily derived from observations (Lambert et al., 2015).
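One possible reading of the recommendation to use dust ratios relative to the pre-industrial background is sketched below: the model's own piControl dust field is multiplied by the ratio of the provided palaeo climatology to the provided pre-industrial climatology. This is an illustrative sketch only. The function name, the `floor` threshold and the choice to leave the field unchanged where the reference pre-industrial value is near zero are assumptions, not part of the protocol.

```python
import numpy as np


def scale_dust_to_period(model_pi_dust, ref_pi_dust, ref_period_dust, floor=1e-12):
    """Scale a model's pre-industrial dust field by the ratio of two reference fields.

    Where the reference pre-industrial field is below `floor`, the ratio is
    undefined; this sketch assumes a ratio of 1 there (field left unchanged).
    """
    ref_pi = np.asarray(ref_pi_dust, dtype=float)
    ref_period = np.asarray(ref_period_dust, dtype=float)
    ratio = np.ones_like(ref_pi)
    valid = ref_pi > floor
    ratio[valid] = ref_period[valid] / ref_pi[valid]
    return np.asarray(model_pi_dust, dtype=float) * ratio


# Hypothetical values in arbitrary units, for illustration only.
model_pi = np.array([2.0, 4.0, 1.0])
ref_pi = np.array([1.0, 2.0, 0.0])
ref_lgm = np.array([3.0, 5.0, 0.5])
lgm_dust = scale_dust_to_period(model_pi, ref_pi, ref_lgm)
```

Using a ratio rather than the absolute palaeo field keeps the prescribed dust consistent with whatever dust background the model uses in its piControl simulation.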

Volcanoes and stratospheric aerosols
The past1000 experiment includes changes in volcanic aerosols, although these are not included in other PMIP4-CMIP6 experiments. The estimates of sulphur injections are derived from a recent compilation of synchronized Antarctic and Arctic ice core records, which provides an improved history of the timing and magnitude of eruptions over the last 2500 years (Sigl et al., 2013). Ice core sulphate fluxes are translated into a time series of stratospheric sulphur injection via linear scaling (similar to Gao et al., 2008) and by matching the ice-core signals to historically confirmed eruptions. Unidentified eruptions are assigned as tropical when there are matching northern and southern hemisphere signals; signals registered only in the northern or southern hemisphere are considered to be extratropical in origin. Modelling groups using interactive aerosol modules and sulphur injections in their historical simulations will follow the same method for the past1000 experiment and use sulphur injection estimates directly. However, estimates of aerosol radiative properties as a function of latitude, height, and wavelength will be provided for other modelling groups using the Easy Volcanic Aerosol (EVA) module, which is a parameterized three-box model of stratospheric transport that uses simple scaling relationships to derive mid-visible aerosol optical depth (AOD) and aerosol effective radius (r_eff) from stratospheric sulphate mass. EVA uses model-specific information (grid, wavelength distribution) to produce annual volcanic aerosol forcing files for wavelength-dependent aerosol extinction (EXT), single scattering albedo (SSA) and scattering asymmetry factor (ASY) as a function of time, latitude, height and wavelength. There are uncertainties associated with this approach, so additional sensitivity experiments to assess the impacts of these uncertainties on the past1000 simulations will be made as part of PMIP4 (see Jungclaus et al., in prep.).

Geosci. Model Dev. Discuss., doi:10.5194/gmd-2016-106, 2016. Manuscript under review for journal Geosci. Model Dev. Published: 26 May 2016. © Author(s) 2016. CC-BY 3.0 License.
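The rule for assigning unidentified eruptions and the linear scaling from ice-core sulphate flux to sulphur injection can be illustrated with the short sketch below. It is not the procedure used to build the PMIP4 forcing: the class and function names are hypothetical, the fluxes and years are made-up values in arbitrary units, and the `scale` factor is a placeholder that must be supplied by the user, since no scaling coefficient is given in the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IceCoreSignal:
    year: int
    nh_flux: Optional[float]  # Arctic ice-core sulphate flux; None if no signal
    sh_flux: Optional[float]  # Antarctic ice-core sulphate flux; None if no signal


def classify_origin(signal: IceCoreSignal) -> str:
    """Assign an unidentified eruption to a source region from its bipolar signature."""
    has_nh = signal.nh_flux is not None
    has_sh = signal.sh_flux is not None
    if has_nh and has_sh:
        return "tropical"  # matching signals in both hemispheres
    if has_nh:
        return "nh_extratropical"
    if has_sh:
        return "sh_extratropical"
    raise ValueError("no ice-core signal in either hemisphere")


def sulphur_injection(signal: IceCoreSignal, scale: float) -> float:
    """Linear scaling of total ice-core flux; `scale` is a placeholder coefficient."""
    total_flux = (signal.nh_flux or 0.0) + (signal.sh_flux or 0.0)
    return scale * total_flux


# Hypothetical events, for illustration only.
events = [
    IceCoreSignal(1000, 12.0, 8.0),
    IceCoreSignal(1100, 5.0, None),
    IceCoreSignal(1200, None, 4.0),
]
origins = [classify_origin(e) for e in events]
injections = [sulphur_injection(e, scale=0.5) for e in events]
```

Eruptions that can be matched to historically confirmed events would bypass `classify_origin`, since their location is already known.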

Spin-up and duration of experiments
The data stored in the CMIP6 database should be representative of the equilibrium climates of the mid-Holocene, Last Glacial Maximum, Last Interglacial and mid-Pliocene Warm Period, and of the transient evolution of climate between 850 and 1850 CE for the past1000 simulations. Spin-up procedures will differ for different models and time periods, but the spin-up should be long enough to avoid significant drift in the analysed data. Initial conditions can be taken from an existing simulation. A minimum of 100 years of output is required for the equilibrium simulations but, given the increasing interest in analysing multi-decadal variability (e.g. Wittenberg, 2009), modelling groups are encouraged to provide outputs for a longer period of 500 years.
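A simple way to check whether residual drift remains in the analysed period is to fit a linear trend to annual means of a global diagnostic, as in the sketch below. The text does not define what counts as "significant drift", so the threshold `max_drift_per_century` is a user-supplied assumption, as are the function names and the synthetic series; only the 100-year minimum length comes from the protocol.

```python
import numpy as np


def drift_per_century(annual_means):
    """Least-squares linear trend of an annual-mean series, in units per 100 years."""
    values = np.asarray(annual_means, dtype=float)
    years = np.arange(values.size)
    slope, _intercept = np.polyfit(years, values, 1)
    return slope * 100.0


def drift_is_acceptable(annual_means, max_drift_per_century, min_years=100):
    """True if the series is long enough and its trend is within the chosen threshold."""
    if len(annual_means) < min_years:
        return False
    return bool(abs(drift_per_century(annual_means)) <= max_drift_per_century)


# Synthetic global-mean temperature with a drift of 0.1 degC per century.
series = 14.0 + 0.001 * np.arange(100)
trend = drift_per_century(series)
```

The same check could be applied to deep-ocean temperature or salinity, which typically adjust more slowly than surface fields.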

Documentation
Detailed documentation of the PMIP4-CMIP6 simulations is required. This should include:
- a description of the model and its components;
- information about the boundary conditions used, particularly when alternatives are allowed (Table 2) [...] website to facilitate linkages with non-CMIP6 simulations to be carried out in PMIP4.
A PMIP4 special issue, shared between Geoscientific Model Development and Climate of the Past, will provide a further opportunity for modelling groups to document specific aspects of their simulations.

Plan of Analyses
The compatibility of past, historical and future climate simulations, through the use of seamless forcings and identical model versions, will allow benchmarking based on extensive syntheses of palaeoclimate data to be applied to models used for future projections. Planned analyses of the PMIP4-CMIP6 palaeoclimate simulations will make full use of the fact that modelling groups must also run the piControl, historical and abrupt4xCO2 DECK experiments, by focusing on analyses that link past and future climates. The piControl and the historical simulations provide two alternative reference states for palaeoclimate simulations. Existing palaeoclimate reconstructions have used different modern reference states, and this has been shown to have an impact on the magnitude of reconstructed changes (e.g. Hessler et al., 2014). Comparisons of the simulated piControl and historical climates will provide a way of quantifying this source of reconstruction uncertainty. Furthermore, links established with other CMIP6 MIPs (Section 4.3 and Table 3) will make it possible to capitalise on their analyses to improve understanding of specific aspects of past climates and vice versa.

Making use of PMIP4-CMIP6 multi time period
Systematic benchmarking of each of the PMIP4-CMIP6 simulations will be a major aspect of the planned multi-period approach. This will require the development of new data syntheses, assessments of the regional-scale consistency of different sources of information, as well as the use of new tools that simulate the palaeoclimate sensors explicitly. Forward modelling of specific palaeoenvironmental records provides a way to quantify uncertainties in the climate reconstructions used for benchmarking. The ensemble of metrics developed in PMIP3-CMIP5 will be expanded to include more process-oriented metrics. Multi-period analyses will be particularly helpful for analyses of the hydrological cycle and the monsoons, including how changes in land hydrology affect freshwater inputs to the ocean and water mass properties. Multi-period analyses will also help to address the role of vegetation feedbacks, particularly given the ambiguity as to whether these feedbacks are reproduced appropriately in simulations of the mid-Holocene.

There are many aspects of the climate system which are difficult to measure directly, and which are therefore difficult to evaluate using traditional methods. The "emergent constraint" concept (e.g. Sherwood et al., 2014), which is based on identifying a relationship to a more easily measurable variable, has been successfully used by the carbon-cycle and modern climate communities and holds great potential for the analysis of palaeoclimate simulations. This could be particularly valuable for examining the realism of cloud feedbacks in the simulations or the contribution of seasonal climate changes to hydrological budgets.

Joint analysis of multiple palaeoclimate simulations and climate reconstructions from different archives will be used to address the issue of climate sensitivity (sensu stricto) and earth-system sensitivity (PALEOSENS Project Members, 2012). The relationship between radiative forcing and global temperature is not straightforward (Crucifix, 2006; Yoshimori et al., 2011), partly because the nature of the forcings that drive the Earth into a cold climate differs from that of the forcings that drive it into a warmer state. Nevertheless, estimates of climate sensitivity based on past climate states provide a starting point to establish the bounds of climate sensitivity to CO2 doubling (Hargreaves, 2012). The multi-period approach will bring new constraints to this analysis. Additional constraints can be obtained by using perturbed-physics experiments, in which different members differ by the values of the parameters (Annan et al., 2005; Yoshimori et al., 2011). The 'perturbed forcing' approach (Bounceur et al., 2015; Araya-Melo, 2015), using sensitivity experiments carried out in PMIP4, could provide a way to chart the sensitivity of the climate system in a multi-dimensional space of forcing conditions.

The equilibrium palaeoclimate experiments in PMIP4-CMIP6 provide an opportunity to sample simulations for long enough, at least 250 years, to obtain robust estimates of ENSO changes (Stevenson et al., 2010). Analyses of multiple long simulations with different forcings should provide a better understanding of changes in ENSO behaviour (Zheng et al., 2008) and help to determine whether state-of-the-art climate models underestimate low-frequency noise (Laepple and Huybers, 2014). The PMIP Paleovariability Working Group will develop diagnostics for climate variability (Philips et al., 2014) to be applied to all the PMIP4-CMIP6 simulations. Analyses will focus on how models reproduce the relationship between changes in seasonality and interannual variability (Emile-Geay et al., 2016), the diversity of El-Niño events (Capotondi et al., 2015; Karamperidou et al., 2015; Luan et al., 2015), and the stability of teleconnections within the climate system (e.g. Gallant et al., 2013; Batehup et al., 2015).
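The emergent-constraint idea mentioned above can be illustrated in its simplest form: a linear relationship is fitted across an ensemble of models between a quantity that can be reconstructed from data and a quantity that cannot, and the reconstruction is then used to constrain the latter. The sketch below uses ordinary least squares and entirely synthetic numbers; the function and variable names are hypothetical, and a real application would also need to propagate the uncertainty in both the regression and the reconstruction.

```python
import numpy as np


def emergent_constraint(observable, target, observed_value):
    """Fit target = a * observable + b across models and evaluate at the observed value.

    Returns the constrained estimate and the across-model correlation coefficient.
    """
    x = np.asarray(observable, dtype=float)
    y = np.asarray(target, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    correlation = np.corrcoef(x, y)[0, 1]
    return slope * observed_value + intercept, correlation


# Synthetic ensemble of four models (illustrative values, not model output).
simulated_observable = np.array([1.0, 2.0, 3.0, 4.0])
simulated_target = 2.0 * simulated_observable + 1.0
estimate, r = emergent_constraint(simulated_observable, simulated_target, observed_value=2.5)
```

A weak across-model correlation would indicate that the chosen observable does not constrain the target, in which case the estimate should not be used.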

Interactions with other CMIP6 MIPs and the WCRP Grand Challenges
Interactions between PMIP and other CMIP6 MIPs have mutual benefits: PMIP provides simulations of large climate changes that have occurred in the past and evaluation tools capitalizing on extensive data syntheses, while other MIPs will employ diagnostics and analyses which will be useful for analyzing the PMIP4 experiments. This is the case for AerChemMIP for the aerosol forcings, SIMIP (Notz et al., 2016) and OMIP (Griffies et al., 2016) for the sea-ice and ocean components, LS3MIP (van den Hurk, 2016) for the land surface, C4MIP (Jones et al., 2016) for the carbon cycle, ISMIP for ice sheets, and CFMIP for the cloud forcing and feedback analyses. VolMIP (Zanchettin et al., 2016) and LUMIP analytical tools will be relevant for the analyses of the impacts of volcanic and land-use forcings in the past1000 simulation. The past1000 experiment also offers a long time series perturbed by natural forcings and observed land-use changes for detection and attribution exercises and is therefore relevant for DAMIP (Gillett et al., 2016). We have ensured that all the outputs necessary for the application of common diagnostics across PMIP and other CMIP6 MIPs will be available (see section 4.4).
PMIP has already developed strong links with several other CMIP6 MIPs (Table 3), including VolMIP for the study of the impact of large past volcanic eruptions and ISMIP6 for the impact of the last interglacial climate on the Greenland ice sheet. Links with CFMIP and ISMIP6 mean that PMIP will also contribute to the WCRP Grand Challenges "Clouds, Circulation and Climate Sensitivity" and "Cryosphere and Sea Level" respectively. PMIP will also provide input to the WCRP Grand Challenge on "Regional Climate Information", through a focus on evaluating the mechanisms of regional climate change in the past.

Implications: required variables for the PMIP4-CMIP6 database
The list of variables required to analyse the PMIP4-CMIP6 palaeoclimate experiments (https://wiki.lsce.ipsl.fr/pmip3/doku.php/pmip3:wg:db:cmip6request) reflects plans for multi-time period analyses and for interactions with other CMIP6 MIPs. We have included pertinent variables from the data requests of other MIPs, including the CFMIP-specific diagnostics on cloud forcing, and the land surface, snow, ocean, sea ice, aerosol, carbon cycle and ice sheet variables from LS3MIP, OMIP, SIMIP, AerChemMIP, C4MIP, and ISMIP6 respectively. Some of these variables are also required to diagnose how climate signals are recorded by palaeoclimatic sensors via models of e.g. tree growth, vegetation dynamics (Prentice et al., 2011) or marine micro-flora/fauna (e.g. planktonic foraminifera: Lombard et al., 2011; Kageyama et al., 2013).
The only set of variables defined specifically for PMIP are those describing oxygen isotopes in the climate system. Isotopes are widely used for palaeoclimatic reconstruction and are explicitly simulated in several models.
We have asked that average annual cycles of key variables be included in the PMIP4-CMIP6 data request for equilibrium simulations, as these proved exceptionally useful for analyses in PMIP3-CMIP5. Daily values of some variables are required for analyzing simulations with large changes in astronomical parameters (midHolocene and lig127k), as these changes modify the duration of each month of the year (Braconnot and Joussaume, 1997). Modifications to month length are not usually taken into account in model output post-processing procedures. Daily values are also useful for running regional models. It is important to test the use of regional models for climate projections at the regional scale. These models are also used to produce fine-scale palaeoclimate scenarios for use by the impact community, for example to study past climate impacts on biodiversity via ecological niche modelling.

The PMIP community anticipates major benefits from analysis techniques developed by the other CMIP6 MIPs, in particular in terms of learning about the processes of past climate changes in response to forcings (e.g. greenhouse gases, astronomical parameters, ice sheet and sea level changes) as well as feedbacks (e.g. clouds, ocean, sea ice). Collaborations have already been developed with e.g. CFMIP, ISMIP6 and VolMIP, and the hope is to build additional collaborations with other CMIP6 MIPs. PMIP4-CMIP6 has the potential to benefit both palaeoclimate and present/future climate scientists, by allowing them to learn about large natural climate changes and the mechanisms at work in the climate system for climate states as different from today as future climate is projected to be.
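The reason daily values are needed when month lengths change can be shown with a small sketch that recomputes monthly means from daily data using a user-supplied set of month lengths. This is a simplification for illustration: the function name is hypothetical, the 360-day toy calendar and the "modified" month lengths are made-up values, and a real application would derive month boundaries from the astronomical parameters of each experiment.

```python
import numpy as np


def monthly_means_from_daily(daily, month_lengths):
    """Average daily values (time on axis 0) into months of the given lengths."""
    daily = np.asarray(daily, dtype=float)
    if sum(month_lengths) != daily.shape[0]:
        raise ValueError("month lengths must add up to the number of days")
    edges = np.concatenate(([0], np.cumsum(month_lengths)))
    return np.array(
        [daily[edges[i]:edges[i + 1]].mean(axis=0) for i in range(len(month_lengths))]
    )


# Toy daily series on a 360-day calendar (value = day index).
daily_values = np.arange(360.0)
fixed_lengths = [30] * 12
# Hypothetical modified month lengths, for illustration only.
modified_lengths = [32, 31, 30, 29, 28, 28, 29, 30, 31, 32, 30, 30]

fixed_means = monthly_means_from_daily(daily_values, fixed_lengths)
modified_means = monthly_means_from_daily(daily_values, modified_lengths)
```

Comparing `fixed_means` with `modified_means` shows how much of an apparent seasonal anomaly can arise purely from the definition of the months, which is why this information cannot be recovered from standard monthly output.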

Data availability
All data mentioned in the present manuscript can be found on the following web sites:
- http://pmip4.lsce.ipsl.fr
- http://geology.er.usgs.gov/egpsc/prism/7_pliomip2.html
They will also be provided via the ESGF system when this is set up, along with forcing files for other CMIP6 experiments.

[...] 3.2 Ma ago; (a) Earth System response to a long-term CO2 forcing analogous to that of the modern; (b) significance of CO2-induced polar amplification for the stability of the ice sheets, sea ice and sea level; Tier 1*.
Table 1: Characteristics, purpose and CMIP6 priority of the five PMIP4-CMIP6 experiments. * All experiments can be run independently. It is not mandatory to perform all Tier 1 experiments to take part in PMIP4-CMIP6, but it is mandatory to run at least one of the PMIP4-CMIP6 entry cards.

[...] Analysis of specific volcanic events very useful for critical analysis of past1000 simulations. VolMIP would systematically assess uncertainties in the climate response to volcanic forcing, whereas past1000 simulations describe the climate response to volcanic forcing in long transient simulations where related uncertainties are due to chosen input data for volcanic forcing: mutual assessment of forced response.
DAMIP: past1000 simulations provide long-term reference background including natural climate variability for detection and attribution.

[...] LGM, last glacial maximum; MH, mid-Holocene; LM, last millennium; H, CMIP6 historical simulation): (a)-(d) insolation anomalies (differences from 1950 CE) for July at 65°N, calculated using the programs of Laskar et al. (2004, panel (a)) and Berger (1978, panels (b)-(d)); (e) δ18O (magenta, Lisiecki and Raymo, 2005, scale at left) and sea level (blue line, Rohling et al., 2014; blue shading, a density plot of eleven mid-Pliocene sea level estimates (Dowsett and Cronin, 1990; Wardlaw and Quinn, 1991; Krantz, 1991; Raymo et al., 2009; Dwyer and Chandler, 2009; Naish and Wilson, 2009; Masson-Delmotte et al., 2013; Rohling et al., 2014; Dowsett et al., 2016)); [...] (g); (h) sea level (Kopp et al., 2016, scale at right); (i) CO2 for the interval 3.0-3.3 Ma shown as a density plot of eight mid-Pliocene estimates (Raymo et al., 1996; Stap et al., 2016; Pagani et al., 2010; Seki et al., 2010; Tripati et al., 2009; Bartoli et al., 2011; Seki et al., 2010; Kurschner et al., 1996); (j) and (k) CO2 measurements (Bereiter et al., 2015, scale at left); (l) CO2 measurements (Schmidt et al., 2011, scale at right); (m) and (n) CH4 measurements ([...], scale at left); (o) CH4 measurements (Schmidt et al., 2011, scale at right); (p) volcanic radiative forcing (Schmidt et al., 2012, scale at right); (q) total solar irradiance (Schmidt et al., 2012, scale at right).

[...] regression line (magenta) shows that land-ocean contrasts are maintained across different climate states and are also consistent with palaeoclimatic data. (b) Boxplots of reconstructions based on fossil-pollen data (gray, Bartlein et al., 2011) and simulations (at the locations of the data) for the difference in mean annual precipitation (MAP) for the mid-Holocene (relative to present) in northern Africa (20°W-30°E; 5-30°N). The comparison shows that although all models simulated wetter-than-present conditions in northern Africa for the mid-Holocene, they systematically underestimated the magnitude of the precipitation difference.

[...] As in (c) but with symbols representing terrestrial proxy records as compiled from published literature (Table 5.A.5). Observed seasonal terrestrial anomalies larger than 10°C or less than -6°C are not shown. In (c) and (d), JJA denotes June-July-August and DJF December-January-February.

[...] LGM (Albani et al., 2014). Maps of dust deposition (g m-2 a-1) for the LGM d. simulated with the Hadley Centre Global Environment Model 2-45