The Model Intercomparison Project on the climatic response to Volcanic forcing (VolMIP): experimental design and forcing input data for CMIP6

The enhancement of the stratospheric aerosol layer by volcanic eruptions induces a complex set of responses causing global and regional climate effects on a broad range of timescales. Uncertainties exist regarding the climatic response to strong volcanic forcing identified in coupled climate simulations that contributed to the fifth phase of the Coupled Model Intercomparison Project (CMIP5). In order to better understand the sources of these model diversities, the Model Intercomparison Project on the climatic response to Volcanic forcing (VolMIP) has defined a coordinated set of idealized volcanic perturbation experiments to be carried out in alignment with the CMIP6 protocol. VolMIP provides a common stratospheric aerosol data set for each experiment to minimize differences in the applied volcanic forcing. It defines a set of initial conditions to assess how internal climate variability contributes to determining the response. VolMIP will assess to what extent volcanically forced responses of the coupled ocean–atmosphere system are robustly simulated by state-of-the-art coupled climate models and identify the causes that limit robust simulated behavior, especially differences in the treatment of physical processes. This paper illustrates the design of the idealized volcanic perturbation experiments in the VolMIP protocol and describes the common aerosol forcing input data sets to be used.


Introduction
Volcanic eruptions that eject substantial amounts of sulfur dioxide (SO2) into the atmosphere have been one of the dominant natural causes of externally forced annual to multidecadal climate variability during the last millennium (Hegerl et al., 2003; Myhre et al., 2013; Schurer et al., 2014). Significant advances have been made in recent years in our understanding of the core microphysical, physical, and chemical processes that determine the radiative forcing resulting from volcanic sulfur emissions and the consequent dynamical responses of the coupled ocean-atmosphere system. However, the fifth phase of the Coupled Model Intercomparison Project (CMIP5) has demonstrated that climate models' capability to accurately and robustly simulate observed and reconstructed volcanically forced climate behavior remains poor.
For instance, the largest uncertainties in radiative forcings (Driscoll et al., 2012) and in lower troposphere temperature trends (Santer et al., 2014) from historical CMIP5 simulations occur during periods of strong volcanic activity. CMIP5 models tend to overestimate the observed post-eruption global surface cooling and subsequent warming (Marotzke and Forster, 2015), although the discrepancy decreases when accounting for the post-eruption phase of the El Niño-Southern Oscillation (ENSO) (Lehner et al., 2016). Driscoll et al. (2012) and Charlton-Perez et al. (2013) found large uncertainty across CMIP5 models concerning the average dynamical atmospheric response during the first two post-eruption winters, especially the post-eruption strengthening of the northern hemispheric (NH) winter polar vortex and its tropospheric signature. Climate models reproduce the main features of the observed precipitation response to volcanic forcing but significantly underestimate the magnitude of the regional responses in particular seasons (Iles and Hegerl, 2014).
Volcanic events during the instrumental period are, however, few and of limited magnitude, and their associated dynamical climatic response is very noisy (e.g., Hegerl et al., 2011). Furthermore, there is inter-model disagreement about post-eruption oceanic evolution, particularly concerning the response of the thermohaline circulation (e.g., Mignot et al., 2011; Hofer et al., 2011; Zanchettin et al., 2012; Ding et al., 2014). Substantial uncertainties still exist about decadal-scale climate variability during periods of strong volcanic forcing and about the role of the ocean in determining the surface air temperature response to volcanic eruptions.
Climate-proxy-based reconstructions covering the last millennium are a major source of information about how the climate system responds to volcanic forcing (e.g., D'Arrigo et al., 2009; Corona et al., 2010; Gennaretti et al., 2014). Recent studies have explored new reconstruction methods applied to high-quality proxy records to produce more rigorous regional climate reconstructions and allow for an improved evaluation of climate models (e.g., Luterbacher et al., 2016). However, discrepancies exist between simulated and reconstructed climate variability during periods of the last millennium characterized by strong volcanic activity, concerning, for instance, the magnitude of post-eruption surface cooling (e.g., Mann et al., 2012, 2013; Anchukaitis et al., 2012; Stoffel et al., 2015; Luterbacher et al., 2016) and the interdecadal response of tropical precipitation (Winter et al., 2015) and of large-scale modes of atmospheric variability (Zanchettin et al., 2015a) to volcanic clusters.
The lack of robust behavior in climate simulations likely has several causes. First, inter-model spread can be caused by differences in the models' characteristics, such as the spatial resolution, and in the imposed volcanic forcing. The latter stems from choices about the employed data set describing climatically relevant parameters related to the eruption source, especially the mass of emitted SO2, and about the stratospheric aerosol properties such as spatial extent of the cloud, optical depth, and aerosol size distribution. As instrumental observations of volcanic eruptions are limited, with the 1991 eruption of Mt. Pinatubo being the best documented event (e.g., Minnis et al., 1993), forcing characteristics must often be reconstructed based on indirect evidence such as ice-core measurements (e.g., Devine et al., 1984; Sigl et al., 2014). These reconstructions rely on a simplified hypothesis of scaling between ice-core sulfate concentrations and aerosol optical depths based on the relation observed for the 1991 eruption of Mt. Pinatubo (Crowley and Unterman, 2013). The consideration of aerosol microphysical processes also produces substantial inconsistencies between available volcanological data sets. Furthermore, even when the same volcanic aerosol forcing is used to force different models, these may generate different radiative forcing due to the model-specific implementation of the volcanic forcing (Toohey et al., 2014).
The simulated climatic response to individual volcanic eruptions also critically depends on the background climate, including the mean climate state (Berdahl and Robock, 2013; Muthers et al., 2014, 2015), the ongoing internal climate variability (e.g., Thomas et al., 2009; Pausata et al., 2015a, 2016; Swingedouw et al., 2015; Zanchettin et al., 2013a; Lehner et al., 2016), and the presence of additional forcing factors such as variations in solar irradiance (Zanchettin et al., 2013a; Anet et al., 2014). As a result, different models, forcing inputs, and internal climate variability similarly contribute to simulation-ensemble spread. This can be seen, for instance, by comparing hemispheric temperature evolution from a multi-model ensemble and a single-model ensemble of last-millennium simulations during the early 19th century (Fig. 1), a period characterized by the close succession of two strong tropical volcanic eruptions in 1809 and 1815. The individual impact of these sources of uncertainty can be hard to distinguish in transient climate simulations. Therefore, the Model Intercomparison Project on the climatic response to Volcanic forcing (VolMIP), an endorsed contribution to CMIP6 (Eyring et al., 2016, this issue), provides the basis for a coordinated multi-model assessment of climate models' performances under strong volcanic forcing conditions.
It defines a set of idealized volcanic-perturbation experiments where volcanic forcing, defined in terms of volcanic aerosol optical properties, is well constrained across participating models. VolMIP will therefore assess to what extent responses of the coupled ocean-atmosphere system to the same applied strong volcanic forcing are robustly simulated across state-of-the-art coupled climate models and identify the causes that limit robust simulated behavior, especially differences in their treatment of physical processes. Ensemble simulations sampling appropriate initial conditions and using the same volcanic forcing data set accounting for aerosol microphysical processes can help assess the signal-to-noise ratio and reduce uncertainties regarding the magnitude of post-eruption surface cooling (Stoffel et al., 2015). Careful sampling of initial climate conditions and the opportunity to consider volcanic eruptions of different strengths will allow VolMIP to better assess the relative role of internally generated and externally forced climate variability during periods of strong volcanic activity. VolMIP also contributes toward more reliable climate models by helping to identify the origins and consequences of systematic model biases affecting the dynamical climatic response to volcanic forcing. As a consequence, VolMIP will improve our confidence in the attribution and dynamical interpretation of reconstructed post-eruption regional features and provide insights into regional climate predictability during periods of strong volcanic forcing.
Figure 1 (caption, partially recoverable): panels (a) and (b) cite Gao et al. (2008), Crowley et al. (2008), and Braconnot et al. (2012); (c) comparison between simulated (PMIP3, 11-year smoothing, colors) and reconstructed (black line: mean; shading: 5th-95th percentile range) Northern Hemisphere average summer temperature anomalies (relative to 1799-1808); (d) same as (c) but for a pre-PMIP3 single-model ensemble (ECHAM5/MPIOM; Zanchettin et al., 2013a, b). Reconstructed data are the full raw calibration ensemble by Frank et al. (2010).
VolMIP experiments will provide context to CMIP6-DECK AMIP and "historical" simulations, the decadal climate prediction experiments of the Decadal Climate Prediction Panel (DCPP) (Boer et al., 2016), and the "past1000" simulations of the Paleoclimate Model Intercomparison Project (PMIP) (Kageyama et al., 2016) where volcanic forcing is among the dominant sources of climate variability and inter-model spread. The importance of VolMIP is enhanced as the specification of the volcanic stratospheric aerosol for the CMIP6 historical experiment is based on "time-dependent observations", and some modeling groups may therefore perform the simulations using online calculation of volcanic radiative forcing based on SO2 emissions. This paper is organized as follows. First, in Sect. 2 we provide a general description of the individual experiments included in the VolMIP protocol. Then, Sect. 3 provides details about the volcanic forcing for each experiment, including implementation and the forcing input data to be employed, for which this paper also serves as a reference. We discuss the limitations of VolMIP and potential follow-up research in Sect. 4, before summarizing the most relevant aspects of this initiative in Sect. 5.

Experiments: rationale and general aspects
The VolMIP protocol consists of a set of idealized volcanic perturbation experiments based on historical eruptions. In this context, "idealized" means that the volcanic forcing is derived from radiation or source parameters of documented eruptions but the experiments generally do not include information about the actual climate conditions when these events occurred. The experiments are designed as ensemble simulations, with sets of initial climate states sampled from the CMIP6-DECK "piControl" (i.e., preindustrial control) simulation describing unperturbed preindustrial climate conditions, unless specified otherwise.
VolMIP experiments are designed based on a multifold strategy. A first set of experiments ("volc-pinatubo") focuses on the systematic assessment of uncertainty and inter-model differences in the seasonal-to-interannual climatic response to an idealized 1991 Pinatubo-like eruption, chosen as representative of the largest magnitude of volcanic events that occurred during the observational period. volc-pinatubo experiments highlight the role of internal interannual variability for volcanic events characterized by a rather low signal-to-noise ratio in the response of global-average surface temperature. The short-term dynamical response is sensitive to the particular structure of the applied forcing. Using carefully constructed forcing fields and sufficiently large simulation ensembles, VolMIP allows us to investigate the inter-model robustness of the short-term dynamical response to volcanic forcing and elucidate the mechanisms through which volcanic forcing leads to changes in atmospheric dynamics. The proposed set of volc-pinatubo experiments includes sensitivity experiments designed to determine the different contributions to such uncertainty that are due to the direct radiative (i.e., surface cooling) and to the dynamical (i.e., stratospheric warming) response.
A second set of experiments ("volc-long") is designed to systematically investigate inter-model differences in the long-term (up to the decadal timescale) dynamical climatic response to volcanic eruptions that are characterized by a high signal-to-noise ratio in the response of global-average surface temperature. A third set of experiments ("volc-cluster") is designed to investigate the climatic response to a close succession of strong volcanic eruptions. The main goal of volc-long and volc-cluster experiments is to assess how volcanic perturbation signals propagate within the simulated climates, e.g., into the subsurface ocean, the associated determinant processes, and their representation across models.
The VolMIP protocol defines criteria for sampling desired initial conditions whenever this is necessary to ensure comparability across different climate models. Desired initial conditions and hence ensemble size are determined based on the state of dominant modes of climate variability, which are specifically defined for each experiment. The ensemble size must be sufficiently large to account for the range of climate variability concomitantly depicted by such modes. As a general rule, three initialization states are determined for each given mode based on an index describing its temporal evolution. Specifically, the predetermined ranges for the sampling are: the lower tercile (i.e., the range of values between the minimum and the 33rd percentile) for the negative/cold state, the mid-tercile (i.e., the range of values between the 33rd and 66th percentiles) for the neutral state, and the upper tercile (i.e., the range of values between the 66th percentile and the maximum) for the positive/warm state. If n modes are sampled concomitantly, this yields an ensemble with 3^n members. For instance, in the case of two modes, an ensemble of at least nine simulations is requested. The choice of the climate modes to be considered for initialization essentially depends on the timescales of interest: seasonal to interannual modes for volc-pinatubo experiments and interannual and decadal modes for volc-long experiments (selection of initialization states is less important for volc-cluster experiments). The sampled years refer to the second integration year of the VolMIP experiment, when the volcanic forcing is generally strongest. Therefore, if, for instance, year Y of the control integration matches the desired conditions for the sampling, then the corresponding VolMIP simulation should start with restart data from year Y-1 of the control, for the day of the year specified for the experiment.
Restart files from piControl must be accordingly selected and documented in the metadata of each simulation. If no restart data are available for the day of the year when the experiment starts, the control simulation must be re-run based on the first (backward in time) available restart file until the start date of the VolMIP experiment. All experiments except the decadal prediction experiment (Sect. 2.1.4) and the millennium cluster experiment (Sect. 2.4.4) maintain the same constant boundary forcing as the piControl integration, except for the volcanic forcing.
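The tercile-based sampling described above can be illustrated with a short Python sketch. It is a minimal example under stated assumptions, not part of the VolMIP protocol: the function names (`classify_terciles`, `select_restart_years`), the choice of the first matching control year for each combination, and the synthetic random index series standing in for ENSO and NAO are all hypothetical.

```python
import numpy as np

def classify_terciles(index):
    """Label every value of a mode index by the tercile it falls in."""
    index = np.asarray(index, dtype=float)
    p33, p66 = np.percentile(index, [33, 66])
    return np.where(index <= p33, "low", np.where(index <= p66, "mid", "high"))

def select_restart_years(years, mode_indices):
    """Return {state combination: restart year} for up to 3**n combinations.

    years        : control-run years, one per index value
    mode_indices : dict of mode name -> index series aligned with `years`
    A control year Y that matches a combination gives restart year Y - 1.
    Taking the first matching year is an assumption of this sketch.
    """
    labels = [classify_terciles(idx) for idx in mode_indices.values()]
    n_combinations = 3 ** len(labels)
    restart = {}
    for pos in range(1, len(years)):  # the first year has no preceding restart
        combo = tuple(str(lab[pos]) for lab in labels)
        if combo not in restart:
            restart[combo] = int(years[pos]) - 1
        if len(restart) == n_combinations:
            break
    return restart

# Synthetic stand-ins for two mode indices from a 200-year control run.
rng = np.random.default_rng(0)
years = np.arange(1850, 2050)
indices = {"enso": rng.standard_normal(200), "nao": rng.standard_normal(200)}
restart_years = select_restart_years(years, indices)
print(len(restart_years), "of", 3 ** len(indices), "combinations sampled")
```

With two modes the dictionary holds at most 3^2 = 9 entries, one per combination of cold/neutral/warm and negative/neutral/positive states.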
Some experiments are designed in cooperation with the Dynamics and Variability of the Stratosphere-Troposphere System Model Intercomparison Project (DynVarMIP) (Gerber and Manzini, 2016, this issue). DynVarMIP defines requirements for diagnosing the atmospheric circulation and variability in the context of CMIP6. DynVarMIP diagnostics include a refinement of the vertical resolution of standard variables archived as daily and monthly means, zonal mean diagnostics focused on the transport and exchange of momentum within the atmosphere and between the atmosphere and surface, and zonal mean diagnostics describing the interaction between radiation, moisture, and the circulation. For a detailed description of these diagnostics and the output format requested by DynVarMIP see Gerber and Manzini (2016).
An overview of the experimental design of the proposed experiments is provided in Tables 1, 2, and 3, where they are summarized according to their prioritization: Tier 1 experiments are mandatory; Tier 2 and Tier 3 experiments have decreasing priority. The experiments are individually described in the following subsections. Figure 2 sketches how the different experiments included in CMIP6 tackle different aspects of the climatic response to volcanic forcing. The codes for the naming conventions of the experiments are in Tables 1-3.
VolMIP has defined a new group of variables (volcanic instantaneous radiative forcing, or VIRF; see Table 4), which includes additional variables that were not in the CMIP5 data request and are necessary to generate the volcanic forcing for the "volc-pinatubo-surf"/"strat" experiments (see Sect. 3.3). In particular, all VIRF diagnostics are instantaneous 6 h data, so some interpolation in time may be required.

volc-pinatubo-full
Tier 1 experiment based on a large ensemble of short-term "Pinatubo" climate simulations aimed at accurately estimating simulated responses to volcanic forcing that may be comparable to the amplitude of internal interannual climate variability (Table 1). Initialization is based on equally distributed predefined states of ENSO (cold/neutral/warm states) and of the North Atlantic Oscillation (NAO, negative/neutral/positive states). Sampling of an easterly phase of the Quasi-Biennial Oscillation (QBO), as observed after the 1991 Pinatubo eruption, is preferred for those models that spontaneously generate this mode of stratospheric variability. VIRF diagnostics must be calculated for this experiment for the whole integration and for all ensemble members, as these are required for the "volc-pinatubo-strat"/"surf" experiments (see Sect. 2.1.2). For models participating in DynVarMIP, DynVarMIP diagnostics shall be calculated for all simulations and for the whole integration period. A minimum length of integration of 3 years is requested.
The recommended ENSO index is the NH winter (DJF, with January as reference for the year) Nino3.4 sea-surface temperature index, defined as the spatially averaged, winter-average sea-surface temperature over the region bounded by 120-170° W and 5° S-5° N. The recommended NAO index is calculated based on the latitude-longitude two-box method by Stephenson et al. (2006) applied to Z500 data, i.e., as the pressure difference between spatial averages over (20-55° N; 90° W-60° E) and (55-90° N; 90° W-60° E).
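As an illustration of the two box-average indices, the following Python sketch computes them on a regular latitude-longitude grid. Several choices here are assumptions rather than protocol requirements: the function names, the cos(latitude) area weighting, longitudes expressed in degrees east within [-180, 180), and the sign convention of the NAO index (southern box minus northern box).

```python
import numpy as np

def box_mean(field, lat, lon, lat_bounds, lon_bounds):
    """Area-weighted mean of field[lat, lon] over a latitude-longitude box.

    cos(latitude) weighting is an assumption of this sketch.
    """
    lat_mask = (lat >= lat_bounds[0]) & (lat <= lat_bounds[1])
    lon_mask = (lon >= lon_bounds[0]) & (lon <= lon_bounds[1])
    sub = field[np.ix_(lat_mask, lon_mask)]
    weights = np.cos(np.deg2rad(lat[lat_mask]))[:, None] * np.ones(lon_mask.sum())
    return float(np.sum(sub * weights) / np.sum(weights))

def nino34_index(sst_djf, lat, lon):
    """DJF-mean SST averaged over 5S-5N, 170W-120W."""
    return box_mean(sst_djf, lat, lon, (-5.0, 5.0), (-170.0, -120.0))

def nao_index(z500, lat, lon):
    """Two-box NAO index: southern box mean minus northern box mean (assumed sign)."""
    south = box_mean(z500, lat, lon, (20.0, 55.0), (-90.0, 60.0))
    north = box_mean(z500, lat, lon, (55.0, 90.0), (-90.0, 60.0))
    return south - north

# Small synthetic grid to exercise the functions.
lat = np.arange(-88.0, 90.0, 4.0)
lon = np.arange(-180.0, 180.0, 5.0)
sst = np.full((lat.size, lon.size), 300.0)
z500 = np.where(lat[:, None] < 55.0, 10.0, 0.0) * np.ones((1, lon.size))
print(nino34_index(sst, lat, lon), nao_index(z500, lat, lon))
```

Applying `nino34_index` to each winter of the control run gives the index series from which the cold, neutral, and warm terciles are drawn.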

volc-pinatubo-surf and volc-pinatubo-strat
Tier 1 simulations aimed at investigating the mechanism(s) connecting volcanic forcing and short-term climate anomalies (Table 1). These experiments aim to disentangle the dynamical responses to the two primary thermodynamic consequences of aerosol forcing: stratospheric heating (volc-pinatubo-strat) and surface cooling (volc-pinatubo-surf). Both experiments are built upon "volc-pinatubo-full" and use the VIRF diagnostics calculated from the different realizations of this experiment. Integration length, ensemble size, and restart files are the same as for volc-pinatubo-full. For models participating in DynVarMIP, DynVarMIP diagnostics shall be calculated for both experiments, for all simulations and for the whole integration period.

volc-pinatubo-slab
Non-mandatory slab-ocean experiment, which is proposed to clarify the role of coupled atmosphere-ocean processes (most prominently linked to ENSO) in determining the dynamical response (Table 3). A reference simulation ("control-slab") shall be set up using the spatially nonuniform annual-average mixed layer depth climatology of the coupled model. control-slab should include a minimum of 20-year spin-up followed by a 10-year control integration. A minimum length of integration of 3 years and at least 25 ensemble members are requested for "volc-pinatubo-slab". VIRF diagnostics shall be calculated for all simulations and for the whole integration period. For models participating in DynVarMIP, DynVarMIP diagnostics shall be calculated for all simulations and for the whole integration period.

volc-pinatubo-ini
Non-mandatory experiment to address the impact of volcanic forcing on seasonal and decadal climate predictability (Table 3). The experiment will address the climatic implication of a future Pinatubo-like eruption. The experiment is designed in cooperation with DCPP and is the same as DCPP experiment C3.4 (Boer et al., 2016). It complies with the VolMIP protocol about the forcing and its implementation. The experiment is initialized on 1 November 2015 or on any other date in November or December for which initialized hindcasts are available (depending on the modeling center). Ten decadal simulations are requested for this experiment. Calculation of DynVarMIP diagnostics is recommended for the first 3 years of integration for at least one realization, but preferably for all of them. DCPP diagnostics must be calculated for all realizations and for the whole integration period.
Figure 2 (caption, partially recoverable): 1 is volc-long-eq; 2 is volc-pinatubo-full; 3 is volc-pinatubo-surf; 4 is volc-pinatubo-strat; 5 is volc-long-hlN/-hlS; 6 is volc-cluster-ctrl/mill/-21C; 7 is volc-pinatubo-slab; 8 is volc-pinatubo-ini. The red box encompasses the processes related to the climatic response to volcanic forcing that are accounted for in VolMIP; the green box encompasses the processes regarding volcanic forcing that are neglected by VolMIP.

volc-long-eq
Tier 1 experiment designed to understand the long-term response to a single volcanic eruption with radiative forcing comparable to that estimated for the 1815 eruption of Mt. Tambora, Indonesia (e.g., Oppenheimer, 2003) (Table 1). A recent review paper (Raible et al., 2016) describes the 1815 Tambora eruption as a test case for high impacts on the Earth system. Initialization spans cold/neutral/warm states of ENSO and weak/neutral/strong states of the Atlantic Meridional Overturning Circulation (AMOC), resulting in a nine-member ensemble. A minimum length of integration of 20 years is requested to cover the typical duration of the simulated initial post-eruption AMOC anomaly (e.g., Zanchettin et al., 2012). Longer integration times (50 years) are recommended to capture the later AMOC evolution (Pausata et al., 2015b) and related climate anomalies. The recommended AMOC index is defined as the annual-average time series of the maximum value of the zonally integrated meridional stream function in the North Atlantic Ocean in the latitude band 20-60° N. VIRF diagnostics shall be calculated for the first 3 years of integration and for just one realization. For models participating in DynVarMIP, DynVarMIP diagnostics shall be calculated for the first 3 years of integration and for all realizations.
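The AMOC index definition lends itself to a brief Python sketch. It assumes, hypothetically, that the annual-mean Atlantic stream function is available as an array with dimensions (year, depth, latitude) and that the maximum is taken over both depth and latitude within the band; the function name and the synthetic data are illustrative only.

```python
import numpy as np

def amoc_index(psi, lat, lat_min=20.0, lat_max=60.0):
    """Annual AMOC index from a stream function array.

    psi : annual-mean zonally integrated meridional stream function in the
          North Atlantic, shape (year, depth, lat); this layout is an
          assumption of the sketch
    lat : latitudes in degrees north, shape (lat,)
    Returns one value per year: the maximum over depth and over the
    latitudes inside [lat_min, lat_max].
    """
    band = (lat >= lat_min) & (lat <= lat_max)
    return psi[:, :, band].max(axis=(1, 2))

# Synthetic example: 3 years, 4 depth levels, latitudes 30S-90N every 10 degrees.
lat = np.arange(-30.0, 91.0, 10.0)
psi = np.ones((3, 4, lat.size))
psi[0, 1, lat == 40.0] = 18.0   # maximum inside the band in year 0
psi[1, 2, lat == 10.0] = 50.0   # outside the band, so ignored in year 1
psi[2, 3, lat == 60.0] = 22.0   # on the band edge, included in year 2
index = amoc_index(psi, lat)
print(index)
```

The resulting annual series can then be split into terciles to define the weak, neutral, and strong AMOC states used for initialization.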

volc-long-hlN and volc-long-hlS
Non-mandatory experiments that apply the same approach as "volc-long-eq" and allow extending the investigation to the case of idealized strong high-latitude volcanic eruptions (Tables 2 and 3). "volc-long-hlN" and "volc-long-hlS" are designed as a NH and a southern hemispheric (SH) extratropical eruption, respectively, both with SO2 injection equal to half the total amount injected for the volc-long-eq experiment. This choice was based on the assumption that for an equatorial eruption the injected mass is roughly evenly distributed between the two hemispheres, increasing comparability between volc-long-eq and volc-long-hlN/hlS as both should yield similar forcing over the eruption's hemisphere (but see Sect. 3.3). The initialization procedure and required integration length are the same as for volc-long-eq. Both experiments are expected to contribute to open questions about the magnitude of the climatic impact of high-latitude eruptions, especially concerning the interhemispheric response. VIRF diagnostics shall be calculated for the first 3 years of integration and for just one realization. For models participating in DynVarMIP, DynVarMIP diagnostics shall be calculated for the first 3 years of integration, for all realizations.
The eruption is about 4 times stronger than that estimated for the Mt. Katmai/Novarupta eruption in 1912 (Oman et al., 2005). The eruption used in volc-long-hlN should not be considered directly comparable to the 1783-84 Laki eruption, one of the strongest high-latitude eruptions that occurred in historical times, since the experiment does not try to reproduce the very specific characteristics of Laki, including multistage releases of large SO2 mass paced at short temporal intervals (e.g., Thordarson and Self, 2003; Schmidt et al., 2010; Pausata et al., 2015b).
Table 1 (recoverable fragments, volc-long-eq row). Description: The eruption magnitude corresponds to recent estimates for the 1815 Tambora eruption (Sigl et al., 2015), the largest tropical eruption of the last 5 centuries, which was linked to the so-called "year without a summer" in 1816. Start: piControl, 1 April; ensemble size: 9; years per simulation: 20; total years: 180. Knowledge gaps addressed: Uncertainty in the climatic response to strong volcanic eruptions, with focus on coupled ocean-atmosphere feedbacks and interannual-to-decadal global as well as regional responses. The mismatch between reconstructed and simulated climatic responses to historical strong volcanic eruptions, with focus on the role of simulated background internal climate variability. Abbreviations: volc is volcano; long is long-term simulation; pinatubo is short-term simulation of the 1991 Pinatubo eruption; eq is Equator; full is full-forcing simulation; surf is shortwave forcing only; strat is stratospheric thermal forcing only.
Geosci. Model Dev., 9, 2701-2719, 2016, www.geosci-model-dev.net/9/2701/2016/

volc-cluster-ctrl
This non-mandatory experiment investigates the climatic response to a close succession of strong volcanic eruptions, a so-called "volcanic cluster" (Table 2). The experiment is motivated by the large uncertainties in the multidecadal and longer-term climate repercussions of multiple eruptions, including volcanic double events (e.g., Toohey et al., 2016b) and prolonged periods of strong volcanic activity (e.g., Miller [...] (Zanchettin et al., 2015a; Winter et al., 2015). In addition, long-term repercussions may be relevant for the initialization of CMIP6 historical simulations. An ensemble of at least three 50-year simulations is requested. Due to the long-term focus of the experiment, selection of initialization states is of second-order importance. Nonetheless, it is recommended to sample initial states at intervals of at least 50 years. Initial states shall be sampled from the piControl for consistency with the volc-long experiments.

volc-cluster-mill
A parallel experiment to "volc-cluster-ctrl" using restart files from PMIP-past1000 instead of from piControl (see Table 3). Starting from a climate state that experienced realistic past natural forcing, this experiment allows us to explore the sensitivity of the ocean response to the initial state (e.g., Gregory, 2010;Zanchettin et al., 2013a). "volc-cluster-mill" is more suitable for a direct comparison with early instrumental data and paleoclimate reconstructions and allows one to explore the role of ocean initial conditions on sea ice response, ocean response, and surface temperature response by comparison with volc-cluster-ctrl.
This non-mandatory experiment requires that at least one PMIP-past1000 realization has been performed. One simulation is requested, but an ensemble of three simulations is recommended. The experiment proper starts in the year 1809, as in volc-cluster-ctrl. However, the simulation must be initialized on 1 January 1790 to avoid interference from the decadal drop in solar activity associated with the Dalton Minimum. Hence, the experiment proper lasts 50 years, as in volc-cluster-ctrl, but a total of 69 years (19 years from 1790 to the start of 1809, plus 50 years) is actually requested for each ensemble member. Different members of the volc-cluster-mill ensemble can be obtained either by using restart files from different ensemble members of PMIP-past1000, if available, or by introducing small perturbations to the same restart file. All external forcings, except volcanic forcing, are set as a perpetual repetition of the year 1790 for the full duration of the experiment.

volc-cluster-21C
A parallel experiment to volc-cluster-ctrl using restart files from the end of the historical simulation instead of from piControl, and boundary conditions from the 21st-century SSP2-4.5 scenario experiment of ScenarioMIP (O'Neill et al., 2016), except for volcanic forcing during the volcanic cluster period (see Table 3). The experiment is designed to explore the climatic response to volcanic eruptions under warmer background conditions compared to preindustrial climates and to investigate the potential uncertainties in future climate projections due to volcanic activity. The experiment uses the same volcanic forcing used in volc-cluster-ctrl/mill, with the first eruption of the cluster (i.e., the 1809 eruption) placed in the year 2015. Simulations shall be run to the end of the 21st century for full comparability with the corresponding scenario simulation. At the end of the volcanic cluster, volcanic forcing input shall be kept at the same constant value prescribed for the piControl simulation, for consistency with the SSP2-4.5 scenario experiment. We encourage modeling groups that are interested in both VolMIP and ScenarioMIP to also coordinate experiments where the same volcanic cluster is placed later on in the scenario integration (e.g., with the first eruption in the year 2050).
Table 3 (recoverable fragments). Knowledge gaps addressed: Outstanding questions about the magnitude of the climatic impact of high-latitude eruptions. Abbreviations: volc is volcano; long is long-term simulation; pinatubo is short-term simulation of the 1991 Pinatubo eruption; eq is Equator; slab is slab ocean simulation; ini is simulation initialized for decadal prediction; mill is initial conditions from full forcing transient simulation of the last millennium; 21C is 21st-century scenario experiment; hlS is southern hemispheric high-latitude eruption.

Implementation: general aspects
VolMIP identifies a volcanic forcing data set for each experiment included in the protocol. The forcing parameters are either provided in terms of aerosol optical properties and their distributions in time and space, when available data were identified as a consensus reference, or calculated with the tool and guidelines described in the protocol. The latter is the case for the volc-long and volc-cluster experiments, which use forcing input data specifically generated for VolMIP.
In addition, the implementation of the forcing (e.g., spectral interpolation) is constrained to ensure that the imposed radiative forcing is consistent across the participating models. Surface albedo changes due to tephra deposition and indirect cloud radiative effects are neglected in all the experiments.

volc-pinatubo
volc-pinatubo-full will use the CMIP6 stratospheric aerosol data set (Thomason et al., 2016) for the volcanic forcing of the 1991 Pinatubo eruption, which is compiled for the CMIP6 historical experiment. Specifically, the reference stratospheric aerosol forcing data set for the CMIP6 historical experiment includes model-specific data for aerosol extinction, single scattering albedo, and asymmetry factor, all as a function of latitude, height, and the spectral bands of the model (see ftp://iacftp.ethz.ch/pub_read/luo/CMIP6 and https://pcmdi.llnl.gov/projects/input4mips). We recommend implementing the forcing as in the historical experiment, and therefore replacing forcing input data below the model tropopause with climatological or other values of tropospheric aerosol used by the models.
volc-pinatubo-surf and volc-pinatubo-strat will not impose forcing through aerosol optical properties, as is otherwise the usual approach in VolMIP. Instead, they will use output from the corresponding volc-pinatubo-full experiment. Specifically, volc-pinatubo-surf will specify a prescribed perturbation to the shortwave flux to mimic the attenuation of solar radiation by volcanic aerosols, and therefore the cooling of the surface. The goal is to isolate the impact of shortwave reflection from the impact of aerosol heating in the stratosphere. The changes must be prescribed at the top of atmosphere under clear sky conditions (variable swtoafluxaerocs of VIRF). Similarly, volc-pinatubo-strat will specify a prescribed perturbation to the total (longwave plus shortwave) radiative heating rates, seeking to mimic the local impact of volcanic aerosol (variables zmlwaero and zmswaero of VIRF). This must be implemented as an additional temperature tendency.
VolMIP does not enforce the same perturbation across all models in volc-pinatubo-surf and volc-pinatubo-strat, as for both experiments priority is given to the consistency with the corresponding volc-pinatubo-full experiment.
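The volc-pinatubo-strat implementation described above (summing the longwave and shortwave aerosol heating rates and adding the result as a temperature tendency) might be sketched as follows. This is a minimal illustration, not a prescribed implementation: the heating rates are assumed to be in K s-1, the numbers are invented, and both function names are hypothetical. In practice the zmlwaero and zmswaero fields would come from the model's own volc-pinatubo-full output.

```python
def total_aerosol_heating(zmlwaero, zmswaero):
    """Sum longwave and shortwave aerosol heating rates, level by level."""
    return [lw + sw for lw, sw in zip(zmlwaero, zmswaero)]


def add_temperature_tendency(temperature, heating_rate, dt_seconds):
    """Apply the extra tendency over one time step (heating rate assumed in K/s)."""
    return [t + q * dt_seconds for t, q in zip(temperature, heating_rate)]


# Illustrative values only, on two stratospheric levels.
lw = [1.0e-6, 2.0e-6]         # longwave aerosol heating (assumed K/s)
sw = [3.0e-6, 1.0e-6]         # shortwave aerosol heating (assumed K/s)
temperature = [210.0, 220.0]  # K

heating = total_aerosol_heating(lw, sw)
updated = add_temperature_tendency(temperature, heating, dt_seconds=1800.0)
print(updated)
```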

volc-long and volc-cluster
These experiments are based on pre-industrial volcanic events for which no direct observation is available. VolMIP recognizes the need to overcome the uncertainties and the limitations of currently available volcanic forcing data sets for the pre-industrial period (see Fig. 1a), which creates the need to identify a single, consensus forcing data set for each one of the volc-long and volc-cluster experiments. Therefore, for the volc-long-eq experiment, coordinated simulations of the 1815 eruption of Mt. Tambora (see Table 5) were performed with different climate models including modules for stratospheric aerosol microphysics and chemistry (chemistry-climate models). The imposed SO2 injection of 60 Tg at the Equator used in these simulations is deduced from reanalysis of bipolar ice-core data used in recent volcanic forcing reconstructions (Stoffel et al., 2015; Gao et al., 2008) and calculations based on geological data (Self et al., 2004). The easterly QBO phase and altitude of injection are based on satellite and lidar observations of QBO, SO2, and sulfate after the Pinatubo eruption (McCormick and Veiga, 1992; Read et al., 1993; Herzog and Graf, 2010). The results show large uncertainties in the estimate of volcanic forcing parameters derived from different state-of-the-art chemistry-climate models perturbed with the same sulfur injections (Fig. 3a). How these results are traced back to the different treatment of aerosol microphysics and climate physical processes in the different models is the subject of a dedicated study. Here, we only conclude that existing uncertainties prevent the identification, within the time constraints of the CMIP6 schedule, of a single consensus forcing estimate for a given volcanic eruption based on a multi-model ensemble with current chemistry-climate models.
Therefore, VolMIP proposes for the volc-long and volc-cluster experiments forcing data sets constructed with the Easy Volcanic Aerosol (EVA) module version 1.0 (Toohey et al., 2016a). EVA provides an analytic representation of volcanic stratospheric aerosol forcing, prescribing the aerosol's radiative properties and primary modes of spatial and temporal variability. It creates volcanic forcing from a given eruption sulfur injection and latitude with an idealized spatial and temporal structure, constructed so as to produce good agreement with observations of the aerosol evolution following the 1991 Pinatubo eruption. Scaling to larger eruption magnitudes is performed in a manner similar to the forcing reconstruction of Crowley and Unterman (2013). EVA is also used to construct the volcanic forcing data set used for the PMIP-past1000 experiment (Kageyama et al., 2016). This improves the comparability between PMIP and VolMIP results for those eruptions that feature in both MIPs. The EVA module outputs data resolved for given latitudes, heights, and wavelength bands. It is therefore an improvement compared to previously available volcanic forcing data sets for the pre-observational period. The forcing sets produced with EVA have the same format as the CMIP6 standard forcing files, i.e., aerosol extinction, single scattering albedo, and asymmetry factor, all as a function of latitude, height, and the spectral bands of the model. The aerosol forcing produced by EVA decays to 0 around the tropopause. Therefore, unlike the forcing used in the volc-pinatubo experiments, no clipping of the forcing is necessary at the tropopause for experiments using EVA forcing. Toohey et al. (2016a) provide technical details about EVA.
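The contrast drawn above between EVA forcing (no clipping needed) and the volc-pinatubo forcing (values below the model tropopause replaced by the model's own tropospheric aerosol) can be illustrated with a small sketch for a single column. The function name, the treatment of a level lying exactly at the tropopause, and all numbers are assumptions made for illustration; they are not part of the VolMIP protocol.

```python
def replace_below_tropopause(heights_km, extinction, tropopause_km, tropospheric_values):
    """Keep the prescribed extinction at and above the tropopause.

    Below the tropopause, use the model's own tropospheric aerosol values.
    Counting a level exactly at the tropopause as stratospheric is an assumption.
    """
    return [
        ext if z >= tropopause_km else tropo
        for z, ext, tropo in zip(heights_km, extinction, tropospheric_values)
    ]


# Illustrative single-column profile (height in km, extinction in arbitrary units).
heights = [5.0, 10.0, 15.0, 20.0]
prescribed = [0.9, 0.8, 0.05, 0.02]
model_troposphere = [0.1, 0.1, 0.0, 0.0]

print(replace_below_tropopause(heights, prescribed, 12.0, model_troposphere))
# [0.1, 0.1, 0.05, 0.02]
```

For EVA-generated forcing this step would be skipped, since the aerosol forcing already decays to 0 around the tropopause.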
VolMIP requests that all modeling groups use EVA to generate the specific forcing input data set for their model, using the same sulfur emission estimates to be specified for use in the PMIP-past1000 experiment. Figure 3 provides an overview of the EVA forcing for an estimated SO2 injection for the 1815 Tambora eruption of 56.2 Tg to be used in volc-long-eq and volc-cluster experiments. The volc-cluster experiments also include all eruptions represented in the PMIP-past1000 experiment for the overlapping period.
[Figure caption fragment: … (Sheng et al., 2015), UM-UKCA (Dhomse et al., 2014), and CAMB-UPMC-M2D (Bekki, 1995; Bekki et al., 1996). For models producing an ensemble of simulations, the line and shading are the ensemble mean and ensemble standard deviation, respectively.]
The reference SO2 emission for the volc-long-hlN/hlS experiments is equal to one-half the Tambora value. The evolution of aerosol optical depth (AOD) produced by EVA for a NH high-latitude injection of 28.1 Tg of SO2 is illustrated in Fig. 4. The NH average AODs for the volc-long-hlN and volc-long-eq experiments are quite similar in magnitude and temporal structure. Differences occur mainly due to the seasonal dependence of the tropical-to-extratropical transport parameterized in EVA. The reduced stratospheric transport into the NH in the summer months after the April eruptions leads to a time lag in the peak NH mean AODs for volc-long-eq compared to volc-long-hlN. It also leads to generally somewhat less aerosol transported to the Northern compared to the Southern Hemisphere for volc-long-eq, which explains the lower peak AOD for this experiment than for volc-long-hlN. Similar considerations apply to volc-long-hlS.
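To make the quantities in this paragraph concrete, the sketch below halves the Tambora injection to obtain the high-latitude value and computes a hemispheric mean of zonal-mean AOD. The cosine-of-latitude weighting is a common approximation for area weighting on a regular latitude grid and is an assumption here, as are the function name and the sample values; EVA itself is not called.

```python
import math

TAMBORA_SO2_TG = 56.2                  # EVA injection for volc-long-eq (from the text)
HIGH_LAT_SO2_TG = TAMBORA_SO2_TG / 2   # one-half the Tambora value, i.e. 28.1 Tg


def hemispheric_mean(lat_deg, values, hemisphere="N"):
    """Cosine-latitude-weighted mean of zonal-mean values over one hemisphere."""
    weighted_sum = 0.0
    weight_total = 0.0
    for lat, value in zip(lat_deg, values):
        in_hemisphere = lat > 0 if hemisphere == "N" else lat < 0
        if in_hemisphere:
            weight = math.cos(math.radians(lat))
            weighted_sum += weight * value
            weight_total += weight
    return weighted_sum / weight_total


# Illustrative zonal-mean AOD values, not EVA output.
lats = [-60.0, -30.0, 30.0, 60.0]
aod = [0.1, 0.2, 0.3, 0.5]

print(HIGH_LAT_SO2_TG)
print(hemispheric_mean(lats, aod, "N"))
print(hemispheric_mean(lats, aod, "S"))
```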

Follow-up research and synergies with other modeling activities
We expect the VolMIP experiments not only to generate broad interest within the climate modeling community but also to stimulate research across many different branches of climate sciences. Cooperation between VolMIP and other ongoing climate modeling initiatives and MIPs increases VolMIP's relevance for climate model evaluation. In particular, synergies between VolMIP and the WCRP/SPARC Stratospheric Sulfur and its Role in Climate (SSiRC) coordinated multimodel initiative (Timmreck et al., 2016b) as well as between VolMIP and the Radiative Forcing Model Intercomparison Project (RFMIP) (Pincus et al., 2016, this issue) will help to build a scientific basis to distinguish between differences in volcanic radiative forcing data and differences in climate model response to volcanic forcing. VolMIP provides a well-defined set of forcing parameters in terms of aerosol optical properties and is thus complementary to SSiRC, which uses global aerosol models to investigate radiative forcing uncertainties associated with given SO2 emissions. Precise quantification of the forcing to which models are subject is central for both RFMIP and VolMIP: RFMIP has planned transient volcanic and solar forcing experiments with fixed preindustrial sea-surface temperature to diagnose volcanic and solar effective forcing, instantaneous forcing, and adjustments, which is complementary to the volc-pinatubo experiments of VolMIP.
VolMIP has synergies with the Geoengineering Model Intercomparison Project (GeoMIP; Kravitz et al., 2015), which includes proposals to simulate a long-duration stratospheric aerosol cloud to counteract global warming. Furthermore, PMIP and VolMIP provide complementary perspectives on one of the most important and least understood factors affecting climate variability during the last millennium. Specifically, VolMIP systematically assesses uncertainties in the climatic response to volcanic forcing associated with different initial conditions and structural model differences. In contrast, the PMIP-past1000 experiment describes the climatic response to volcanic forcing in long transient simulations where related uncertainties are due to the reconstruction of past volcanic forcing, the implementation of volcanic forcing within the models, initial conditions, the presence and strength of additional forcings, and structural model differences. The "past1000_volc_cluster" experiment of PMIP consists of an ensemble of full-forcing simulations covering the early 19th century whose design is aligned with VolMIP volc-cluster experiments (Jungclaus et al., 2016, this issue). This hierarchy of volcanic cluster experiments will allow us to investigate the interactions between different natural forcing factors and the role of background climate conditions during one of the coldest periods of the last millennium, when discrepancies exist between information from available climate simulations and reconstructions (e.g., Winter et al., 2015; Zanchettin et al., 2015a). Modeling groups who participate in both VolMIP and PMIP are encouraged to output the VIRF diagnostics for the following tropical eruptions simulated in the past1000 experiment: 1257 Samalas, 1453 Kuwae, 1600 Huaynaputina, 1809 Unidentified, and 1815 Tambora. VIRF diagnostics will be calculated for a period of 5 years starting from the eruption year and will be useful for future studies to expand the investigation based on volc-pinatubo-strat and volc-pinatubo-surf.
VolMIP and the Detection and Attribution Model Intercomparison Project (DAMIP) (Gillett et al., 2016, this issue) share the CMIP6 science theme of characterizing forcing. The experiments "histALL", "histNAT", "histVLC", and "histALL_estAER2" of DAMIP include the 1991 Pinatubo eruption within transient climate situations and therefore provide context to the volc-pinatubo set of VolMIP experiments. The experiment "volc-cluster-21C" is built on and complements the SSP2-4.5 scenario experiment of ScenarioMIP.
Geosci. Model Dev., 9, 2701–2719, 2016, www.geosci-model-dev.net/9/2701/2016/
VolMIP and DCPP are working closely together on the impact of volcanic eruptions on seasonal and decadal predictions, and they have designed a common experiment ("volc-pinatubo-ini" and the DCPP experiment C3.4 are different labels for the same experiment). DynVarMIP puts a particular emphasis on the two-way coupling between the troposphere and the stratosphere, and it is therefore deeply involved in the design and analysis of the volc-pinatubo-full/strat/surf experiments.
We envisage follow-up research stimulated by VolMIP's links to the Grand Challenges of the World Climate Research Program (Brasseur and Carlson, 2015) focusing on the following.
-"Clouds, circulation, and climate sensitivity," in particular through improved characterization of volcanic forcing and improved understanding of how the hydrological cycle and the large-scale circulation respond to volcanic forcing. Volcanic sulfate aerosols can affect clouds also by acting as cloud condensation nuclei (Graf et al., 1997; see also Mather et al., 2004; Seifert et al., 2011; Schmidt et al., 2012; Meyer et al., 2015), thereby affecting regional precipitation (e.g., . Volcanic eruptions are among the natural aerosol sources producing the strongest simulated cloud albedo effect (Rap et al., 2013). Assessments of cloud responses to volcanic forcing in VolMIP must take into account that in all experiments only the radiative effects of volcanic aerosols are represented (see Sect. 3). VolMIP further contributes to the initiative on leveraging the past record through planned experiments describing the climatic response, in an idealized context, to historical eruptions that are not (or not sufficiently) covered by CMIP6-DECK, historical, or other MIPs.
-"Water availability," in particular through the assessment of how strong volcanic eruptions affect the monsoon systems and the occurrence of extensive and prolonged droughts.
-"Melting ice and global consequences," in particular concerning the onset of volcanically forced long-term feedbacks involving the cryosphere, which is suggested by recent studies (e.g., Miller et al., 2012, Berdahl andZanchettin et al., 2014).
Ocean heating and circulation, annual-to-decadal timescales, and short-lived climate forcers were identified among those areas where the WCRP's grand challenges seem most in need of broadened or expanded research (Brasseur and Carlson, 2015). VolMIP is expected to advance knowledge in all such areas. VolMIP is designed based on a limited number of idealized volcanic forcing experiments. We recognize that an eruption's characteristics are a major source of uncertainty for its climatic impacts. We encourage modeling groups interested in performing sensitivity experiments based on the experiments proposed here and concerning, e.g., the magnitude and the season of the eruption to use VolMIP as a platform for coordinating such efforts within a multi-model framework. The flexibility of the EVA module is, in this regard, a valuable advantage.
Follow-up research must take into account that the design of the simulations reflects necessary constraints on the overall resources required to perform the whole set of mandatory experiments. This implies limitations such as the possibly insufficient representation of the whole range of variability of climate modes not explicitly accounted for in the design. This includes, for instance, the SH annular mode (e.g., Karpechko et al., 2010; Zanchettin et al., 2014) and modes of internal stratospheric variability like the QBO. VolMIP's experiments are designed based on observed or reconstructed forcing characteristics of historical volcanic eruptions (1815 Tambora and 1991 Pinatubo for the Tier 1 experiments). Comparison with observational or reconstructed evidence must, however, take into account the idealized character of VolMIP's experiments, including the simplified setting for generating volcanic forcing parameters provided by the EVA module. Specifically, the evolution of the volcanic aerosol cloud in EVA does not account for the meteorological conditions at the time of the eruption and cannot represent the aerosol properties at anything other than the largest scales. Eccentricities of the aerosol evolution, due to variations in stratospheric transport such as the QBO, midlatitude mixing, and the polar vortex, cannot be reliably included in any reconstruction of aerosol forcing which relies only on sparse proxy records. Additionally, observation-simulation assessments need to include the identification of the origins and consequences of systematic model biases affecting the dynamical climatic response to volcanic forcing.

Summary
VolMIP is a coordinated climate modeling activity to advance our understanding of how the climate system responds to volcanic forcing. VolMIP contributes to identifying the causes that limit robustness in simulated volcanically forced climate variability, especially concerning differences in models' treatment of physical processes. It further allows for the evaluation of key climate feedbacks in coupled climate simulations following relatively well-observed eruptions.
The protocol detailed in this paper aims at improving comparability across the participating climate models by (i) constraining the applied radiative forcing, prescribing for each experiment a consensus set of forcing parameters to be employed, and (ii) constraining the background climate conditions upon which the volcanic forcing is applied. The protocol entails three main sets of experiments: the first focusing on the short-term (seasonal to interannual) atmospheric response, the second focusing on the long-term (interannual to decadal) response of the coupled ocean-atmosphere system, and the third focusing on the climatic response to close successions of volcanic eruptions (so-called volcanic clusters). Experiments are further prioritized into three tiers. Careful sampling of initial climate conditions and the opportunity to consider volcanic eruptions of different strengths will allow a better understanding of the relative role of internal and externally forced climate variability during periods of strong volcanic activity, hence improving both the evaluation of climate models and our ability to accurately simulate past and future climates.

Code and data availability
The model output from all the simulations described in this paper will be distributed through the Earth System Grid Federation (ESGF) with digital object identifiers (DOIs) assigned. As in CMIP5, the model output will be freely accessible through data portals after registration. In order to document CMIP6's scientific impact and enable ongoing support of CMIP, users are obligated to acknowledge CMIP6, the participating modeling groups, and the ESGF centers (see details on the CMIP Panel website at http://www.wcrp-climate.org/index.php/wgcm-cmip/about-cmip). Further information about the infrastructure supporting CMIP6, the metadata describing the model output, and the terms governing its use are provided by the WGCM Infrastructure Panel (WIP) in their invited contribution to this Special Issue. Along with the data, the provenance of the data will be recorded, and DOIs will be assigned to collections of output so that they can be appropriately cited. This information will be made readily available so that published research results can be verified and credit can be given to the modeling groups providing the data. In order to run the experiments, data sets for natural and anthropogenic forcings defined for the DECK and the CMIP6 historical simulations are required. These forcing data sets are described in separate invited contributions to this special issue. In addition, specific volcanic forcings are required for the VolMIP experiments that are described in this paper. The forcing data sets for the volc-pinatubo experiments will be made available through the ESGF with version control and DOIs assigned. EVA version 1.0 code, a user's manual, sample input data files, and driver scripts are included as a Supplement by Toohey et al. (2016a). The data request, which contains the list of all variables requested for each model intercomparison project, is available at https://www.earthsystemcog.org/projects/wip/CMIP6DataRequest.

Acknowledgements
VolMIP is dedicated to the memory of Thomas Crowley (1948, whose pioneering work on volcanic forcing on climate has inspired many researchers and strongly contributed to the foundation upon which VolMIP was built. We thank the broad scientific community for the stimulating discussions that motivated VolMIP and for their contribution to the definition of the experiments and the comments on this draft. We thank the climate modeling groups who have committed to perform the VolMIP experiments. We are grateful to the CMIP6 Panel who guided our work throughout the endorsement process, in particular concerning their recommendation to upgrade the volc-pinatubo-strat/surf experiments, which led to a stronger Tier 1 experimental palette. We thank Christoph Raible and an anonymous reviewer for their helpful comments on the manuscript and on the VolMIP protocol. The volc-cluster-21C experiment was added to the VolMIP protocol following a suggestion by Ingo Bethke. We thank Martin Juckes for his assistance in preparing the CMIP6 data request and Karl Taylor for his assistance throughout the endorsement process. We thank Andrew Schurer for discussion about solar forcing. We acknowledge the support provided by the World Climate Research Programme (WCRP), which is responsible for CMIP5. M. Khodri acknowledges grant support from the LABEX L-IPSL, funded by the French Agence Nationale de la Recherche under the "Programme d'Investissements d'Avenir" (grant no. ANR-10-LABX-18-01), a grant from the Agence Nationale de la Recherche MORDICUS, under the "Programme Environnement et Société" (grant no. ANR-13-SENV-0002-02) and benefited from the IPSL data access PRODIGUER. C. Timmreck acknowledges support from the German Federal Ministry of Education (BMBF), research program "MiKlip" (FKZ: 01LP1517B/01LP1130A) and the European Project 603557-STRATOCLIM under program FP7-ENV.2013.6.1-2; STRATOCLIM also partially supported S. Bekki's work. M. Toohey acknowledges support from the BMBF, research program "MiKlip" (FKZ: 01LP1130B). A. Robock is supported by US National Science Foundation (NSF) grant AGS-1430051. E. P. Gerber acknowledges NSF grant AGS-1264195. A. Schmidt was supported by an Academic Research Fellowship from the University of Leeds and NERC grant NE/N006038/1. W. T. Ball was funded by the Swiss National Science Foundation projects 149182 and 163206. G. Hegerl is supported by the ERC project TITAN (EC-320691), by NCAS and the Wolfson Foundation and the Royal Society as a Royal Society Wolfson Research Merit Award (WM130060) holder. E. Rozanov was partially supported by the Swiss National Science Foundation under grant CRSII2_147659 (FUPSOL II).
Edited by: S. Valcke
Reviewed by: C. C. Raible and one anonymous referee