The Interactive Stratospheric Aerosol Model Intercomparison Project (ISA-MIP): Motivation and experimental design

The Stratospheric Sulfur and its Role in Climate (SSiRC) Interactive Stratospheric Aerosol Model Intercomparison Project (ISA-MIP) explores uncertainties in the processes that connect volcanic emission of sulfur gas species and the radiative forcing associated with the resulting enhancement of the stratospheric aerosol layer. The central aim of ISA-MIP is to constrain and improve interactive stratospheric aerosol models and reduce uncertainties in the stratospheric aerosol forcing by comparing results of standardized model experiments with a range of observations. In this paper we present four co-ordinated inter-model experiments designed to investigate key processes which influence the formation and temporal development of stratospheric aerosol in different time periods of the observational record. The Background (BG) experiment will focus on microphysics and transport processes under volcanically quiescent conditions, when the stratospheric aerosol is controlled by the transport of aerosols and their precursors from the troposphere to the stratosphere. The Transient Aerosol Record (TAR) experiment will explore the role of small- to moderate-magnitude volcanic eruptions, anthropogenic sulfur emissions, and transport processes over the period 1998–2012 and their role in the warming hiatus. Two further experiments will investigate the stratospheric sulfate aerosol evolution after major volcanic eruptions. The Historical Eruptions SO2 Emission Assessment (HErSEA) experiment will focus on the uncertainty in the initial emission of recent large-magnitude volcanic eruptions, while the Pinatubo Emulation in Multiple models (PoEMS) experiment will provide a comprehensive uncertainty analysis of the radiative forcing from the 1991 Mt Pinatubo eruption.

Published by Copernicus Publications on behalf of the European Geosciences Union. C. Timmreck et al.: ISA-MIP: motivation and experimental design


Introduction
Stratospheric aerosol is an important component of the Earth system, which influences atmospheric radiative transfer, composition, and dynamics, thereby modulating the climate. The effects of stratospheric aerosol on climate are especially evident when the opacity of the stratospheric aerosol layer is significantly increased after volcanic eruptions. Through changes in the radiative properties of the stratospheric aerosol layer, volcanic eruptions are a significant driver of climate variability (e.g. Myhre et al., 2013; Zanchettin et al., 2016). Major volcanic eruptions inject vast amounts of SO2 into the stratosphere, where it is converted into sulfuric acid aerosol with an e-folding time of about a month; this conversion may be slowed by OH depletion within the dense SO2 cloud in the first weeks following a large volcanic eruption.
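As a rough illustration of what an e-folding time of about a month implies, the sketch below assumes that SO2 is removed by a single first-order oxidation process with a constant e-folding time τ, so that the remaining mass is M(t) = M0 exp(−t/τ). Taking τ = 30 days for "about a month", using 45 days to mimic slower conversion under OH depletion, and injecting 15 Tg SO2 are illustrative assumptions only; interactive models compute the conversion from the simulated OH field rather than from a fixed τ, and the function names are hypothetical.

```python
import math


def so2_remaining(so2_initial_tg: float, days: float, efold_days: float = 30.0) -> float:
    """SO2 mass (Tg) left after `days`, assuming first-order loss with a constant e-folding time."""
    return so2_initial_tg * math.exp(-days / efold_days)


def so2_converted(so2_initial_tg: float, days: float, efold_days: float = 30.0) -> float:
    """SO2 mass (Tg) already oxidized on the way to sulfate after `days` (same assumption)."""
    return so2_initial_tg - so2_remaining(so2_initial_tg, days, efold_days)


if __name__ == "__main__":
    injection_tg = 15.0  # illustrative value within the 10-20 Tg SO2 range used for Pinatubo
    for tau in (30.0, 45.0):  # 45 days: hypothetical slower conversion under OH depletion
        left = so2_remaining(injection_tg, 60.0, tau)
        print(f"tau = {tau:.0f} d: {left:.2f} Tg SO2 left after 60 days")
```

Under these assumptions roughly 2.0 Tg of the 15 Tg remains as SO2 after 60 days with τ = 30 days, compared with roughly 4.0 Tg with τ = 45 days, which shows how OH depletion can delay sulfate formation.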
Observations show that the stratospheric aerosol layer remains enhanced for several years after major eruptions (SPARC, 2006). Such long-lasting volcanic perturbations cool the Earth's surface by scattering incoming solar radiation and warm the stratosphere by absorbing infrared solar and long-wave terrestrial radiation, which affects both the dynamical structure and the chemical composition of the atmosphere (e.g. Robock, 2000; Timmreck, 2012). The consequent heating of the stratospheric sulfate layer impacts stratospheric dynamics in various ways. It amplifies the Brewer-Dobson circulation (BDC) and modifies the equator-to-pole temperature gradient, driving changes in geostrophic zonal winds and the propagation of atmospheric waves (e.g. Bittner et al., 2016; Toohey et al., 2014) and strengthening the polar vortex (e.g. Charlton-Perez et al., 2013). The heating from continued SO2 injection into the stratosphere may further disturb or even "shut down" the quasi-biennial oscillation (QBO) (e.g. Aquila et al., 2014; Niemeier and Schmidt, 2017). The radiatively driven changes also influence the transport and the lifetime of long-lived species (N2O, CH4) (Pitari et al., 2016a; Visioni et al., 2017). The enhanced stratospheric aerosol layer after large volcanic eruptions also causes large variations in the mean age of air on timescales of several years (e.g. Ray et al., 2014; Muthers et al., 2016; Garfinkel et al., 2017).
As the ocean has a much longer memory than the atmosphere, large volcanic eruptions could have a long-lasting impact on the climate system that extends beyond the duration of the volcanic forcing (e.g. Zanchettin et al., 2012; Swingedouw et al., 2017).
The chemical and radiative effects of the stratospheric aerosol are strongly influenced by its particle size distribution. Heterogeneous chemical reactions, which most notably lead to substantial ozone depletion (e.g. WMO/UNEP, 2007, chap. 3), take place on the surface of the stratospheric aerosol particles and depend on the aerosol surface area density. Aerosol particle size determines the scattering efficiency of the particles (e.g. Lacis et al., 1992) and their atmospheric lifetime (e.g. Pinto et al., 1989; Timmreck et al., 2010).
Eruptions of smaller magnitude than the 1991 Mt Pinatubo eruption can also have significant impacts on climate. Based on ground-based and satellite-borne observations, it is now established that a series of relatively small-magnitude volcanic eruptions caused the increase in stratospheric aerosol observed between 2000 and 2010 (Vernier et al., 2011b; Neely III et al., 2013). Studies have suggested that this increase in stratospheric aerosol partly counteracted the warming due to increased greenhouse gases over that period (e.g. Solomon et al., 2011; Ridley et al., 2014; Santer et al., 2015). Small- to moderate-magnitude volcanic eruptions after 2008 also affected the stratospheric circulation in the Northern Hemisphere, in particular the pattern of decadal mean age-of-air variability and its trends during 2002–2011 (Diallo et al., 2017).
Since the 2006 SPARC Assessment of Stratospheric Aerosol Properties report (SPARC, 2006, hereafter referred to as ASAP2006), the increase in observations of stratospheric aerosol and its precursor gases, and in the number of models which treat stratospheric aerosol interactively, has advanced scientific understanding of the stratospheric aerosol layer and its effects on the climate (Kremser et al., 2016, hereafter referred to as KTH2016). In particular, research findings have given the community a greater awareness of the role of the tropical tropopause layer (TTL) as a distinct pathway for transport into the stratosphere, of the interactions between stratospheric composition and dynamics, and of the importance of moderate-magnitude eruptions in influencing the stratospheric aerosol loading. In addition, over the last decade several new satellite instruments producing observations relevant to the stratospheric aerosol layer have become operational. For example, we now have a 2002–2012 record of global altitude-resolved SO2, carbonyl sulfide (OCS), and aerosol volume density measurements provided by the Michelson Interferometer for Passive Atmospheric Sounding on the Environmental Satellite (MIPAS ENVISAT; Höpfner et al., 2013; Glatthor et al., 2015; Günther et al., 2018). Furthermore, aerosol extinction vertical profiles are available from limb-profiling instruments, such as the Scanning Imaging Absorption Spectrometer for Atmospheric Chartography (SCIAMACHY; 2002–2012; Bovensmann et al., 1999; von Savigny et al., 2015), the Optical Spectrograph and InfraRed Imager System (OSIRIS; 2001–present; Bourassa et al., 2007), and the Ozone Mapping and Profiler Suite Limb Profiler (OMPS-LP; 2011–present; Rault and Loughman, 2013), and from active lidar measurements such as the Cloud-Aerosol Transport System (CATS; 2015–present; Yorks et al., 2015) and the Cloud-Aerosol Lidar with Orthogonal Polarization (CALIOP; 2006–present; Vernier et al., 2009).
Geosci. Model Dev., 11, 2581–2608, 2018, www.geosci-model-dev.net/11/2581/2018/
Existing measurements have become more robust, for example through the homogenization of aerosol properties derived from optical particle counter (OPC) and satellite measurements during stratospheric aerosol background periods (Kovilakam and Deshler, 2015), which previously showed large differences (Thomason et al., 2008). Other efforts include combining and comparing different satellite data sets (e.g. Rieger et al., 2015). However, some notable discrepancies still exist between different measurement data sets. For example, aircraft-borne Focused Cavity Aerosol Spectrometer (FCAS) measurements of the particle size distribution during the late 1990s have been shown to yield surface area densities a factor of 1.5 to 3 higher than those derived from Stratospheric Aerosol and Gas Experiment (SAGE-II) measurements. On the modelling side, an increasing number of global three-dimensional stratospheric aerosol models have been developed in recent years and are used by research teams around the world (KTH2016). The majority of these global models explicitly simulate aerosol microphysical processes and treat the full life cycle of stratospheric aerosol, from the initial injection of sulfur-containing gases and their transformation into aerosol particles to their final removal from the stratosphere. Several of these models also include the interactive coupling between aerosol microphysics, atmospheric chemistry, dynamics, and radiation.
Given the improvements in observations and modelling of stratospheric aerosol since ASAP2006, we anticipate further advances in our understanding of stratospheric aerosol by combining the recent observational record with results from the current community of interactive stratospheric aerosol models. An Interactive Stratospheric Aerosol Model Intercomparison Project (ISA-MIP) has therefore been developed within the Stratospheric Sulfur and its Role in Climate (SSiRC) framework. The SPARC activity SSiRC (http://www.sparc-ssirc.org, last access: 26 June 2018) was initiated with the goal of reducing uncertainties in the properties of stratospheric aerosol and assessing its climate forcing. In particular, constraining simulations of historical eruptions with available observational data sets offers the potential to evaluate and substantially improve the accuracy of the volcanic forcing data sets used in climate models. This will not only enhance consistency with observed stratospheric aerosol properties and the underlying microphysical, chemical, and dynamical processes but also improve our conceptual understanding of these processes. The use of such new volcanic forcing data sets has the potential to increase the reliability of the simulated climate impacts of volcanic eruptions, which have been identified as a major influence on decadal global mean surface temperature trends in climate models (Marotzke and Forster, 2015). The first international intercomparison of global stratospheric aerosol models was carried out within ASAP2006 and indicated that model simulations and satellite observations of stratospheric background aerosol extinction agree reasonably well at visible wavelengths but not in the infrared.
It also highlighted systematic differences between modelled and retrieved aerosol size, which were later linked to shortcomings in the retrieval methods with regard to the detection of Aitken-mode-sized particles (R < 50 nm) in the lower stratosphere (Thomason et al., 2008; Hommel et al., 2011). While only five global two- and three-dimensional stratospheric aerosol models were included in the ASAP2006 analysis, today more than 15 global three-dimensional models are available worldwide (KTH2016). No comprehensive model intercomparison has yet been carried out to identify differences in stratospheric aerosol properties amongst these new interactive models. The models often show significant differences in their simulated transport, chemistry, and removal of aerosols, with inter-model differences in stratospheric circulation, radiative-dynamical interactions, and exchange with the troposphere likely to play an important role (e.g. Aquila et al., 2012; Niemeier and Timmreck, 2015). The formulation of microphysical processes is also important (e.g. English et al., 2013), as are differing assumptions regarding the sources of stratospheric aerosols and their precursors. A combination of these effects likely explains the large inter-model differences seen in Fig. 1 among the global stratospheric aerosol models which participated in the Tambora intercomparison, a precursor to the "consensus volcanic forcings" aspects of the CMIP6 Model Intercomparison Project on the climatic response to volcanic forcing (VolMIP; Zanchettin et al., 2016; Marshall et al., 2018). Even for the relatively recent 1991 Mt Pinatubo eruption, interactive stratospheric aerosol models have used a wide range of SO2 injection amounts to reach the best agreement with observations, from as low as 10 Tg SO2 (Dhomse et al., 2014; Mills et al., 2016) to as high as 20 Tg SO2 (e.g. Aquila et al., 2012; English et al., 2013).
Volcanic eruptions are commonly taken as a real-world analogue for hypothesized geoengineering via stratospheric sulfur solar radiation management (SS-SRM). Indeed, many of the assumptions and uncertainties related to simulated volcanic perturbations of stratospheric aerosol are also frequently given as caveats to research findings from modelling studies which seek to quantify the likely effects of SS-SRM (e.g. National Research Council, 2015), since the mechanistic steps between sulfur injection and radiative cooling are common to both. We expect the analysis of the ISA-MIP experiments to improve our understanding of model sensitivities to key sources of uncertainty and to inform the interpretation of coupled climate model simulations and the next Intergovernmental Panel on Climate Change (IPCC) assessment. It will also provide a foundation for co-operation to assess the atmospheric and climate changes when the next large-magnitude eruption takes place.
In this paper, we introduce the new model intercomparison project ISA-MIP developed within the SSiRC framework. In Sect. 2 we provide an overview of the current state of stratospheric sulfur aerosol modelling and its greatest challenges. In Sect. 3 we describe the scopes and protocols of the four model experiments planned within ISA-MIP. A concluding summary is provided in Sect. 4.
Figure 1. Uncertainty in estimates of radiative forcing parameters for the 1815 eruption of Mt Tambora: global average aerosol optical depth (AOD) in the visible band from an ensemble of simulations with chemistry-climate models forced with a 60 Tg SO2 equatorial eruption, from the Easy Volcanic Aerosol (EVA; Toohey et al., 2016b) module with a 56.2 Tg SO2 equatorial eruption (thick dashed magenta line), from Stoffel et al. (2015), from Crowley and Unterman (2013), and from Gao et al. (2008; aligned so that the eruption starts in April 1815). The estimate for the Pinatubo eruption as used in the CMIP6 historical experiment is also shown for comparison. The black triangle shows the latitudinal position and timing of the eruption. Chemistry-climate models are CESM (WACCM), MAECHAM5-HAM, SOCOL (Sheng et al., 2015a), UM-UKCA (Dhomse et al., 2014), and CAMB-UPMC-M2D (Bekki, 1995; Bekki et al., 1996). For models producing an ensemble of simulations, the line and shading are the ensemble mean and ensemble standard deviation, respectively.

Modelling stratospheric aerosol: overview and challenges
Before we discuss the current state of stratospheric aerosol modelling and its greatest challenges in detail, we briefly describe the main features of the stratospheric sulfur cycle. We note that the stratospheric aerosol layer also contains organics and inclusions of meteoritic dust (Ebert et al., 2016) and, after volcanic events, also co-exists with volcanic ash (e.g. Pueschel et al., 1994). However, the focus of the ISA-MIP experiments described here is on a comparison with measurements of the overall optical and physical properties of the stratospheric aerosol layer, which are mainly determined by sulfate.

The stratospheric aerosol life cycle
The stratospheric aerosol layer and its temporal and spatial variability are determined by the transport of aerosol and aerosol precursors in the stratosphere and their modification by chemical and microphysical processes (Hamill et al., 1997; ASAP2006; KTH2016). Volcanic eruptions can inject sulfur-bearing gases directly into the stratosphere, which significantly enhances the stratospheric aerosol load for years. A number of observations show that stratospheric aerosol increased over the first decade of the 21st century (e.g. Hofmann et al., 2009; Vernier et al., 2011b; Ridley et al., 2014). Although this increase was tentatively attributed to rising Asian anthropogenic emissions (Hofmann et al., 2009), later studies have shown that small- to moderate-magnitude volcanic eruptions are likely to be the major source of this recent increase (Vernier et al., 2011b; Neely III et al., 2013; Brühl et al., 2015). Besides major volcanic eruptions, the photochemical oxidation of OCS, an insoluble gas that is mainly inert in the troposphere, is a source of stratospheric sulfur. Tropospheric aerosols and aerosol precursors also enter the stratosphere through the tropical tropopause and through convective updrafts in the Asian and North American monsoons (Hofmann et al., 2009; Hommel et al., 2011; Vernier et al., 2011a; Bourassa et al., 2012; Yu et al., 2015). In the stratosphere, new sulfate aerosol particles are formed by binary homogeneous nucleation (Vehkamäki et al., 2002), a process in which sulfuric acid vapour (H2SO4(g)) and water vapour condense simultaneously to form a liquid droplet. The condensation of H2SO4(g) onto pre-existing aerosol particles and the coagulation among particles shift the aerosol size distribution to larger radii. This growth is especially pronounced under volcanically perturbed conditions, when aerosol concentrations in the stratosphere are higher (e.g. Deshler, 2008).
From the tropics, where most of the tropospheric aerosol enters the stratosphere and the OCS chemistry is most active, the stratospheric aerosol particles are transported poleward within the large-scale BDC and removed through gravitational sedimentation and cross-tropopause transport in the extratropics. Internal variability associated with the QBO alters the isolation of the tropical stratosphere and hence the poleward transport of tropical stratospheric aerosol, modifying its global dispersal, particle size distribution, and residence time (e.g. Trepte and Hitchman, 1992; Hommel et al., 2015; Pitari et al., 2016b). In general, under volcanically perturbed conditions with larger amounts of injected SO2, aerosol particles grow to much larger radii than in volcanically quiescent conditions (e.g. Deshler, 2008). Simulations of extremely large, sulfur-rich volcanic eruptions show a shift to particle sizes even larger than observed after the Pinatubo eruption and predict a reduced cooling efficiency compared to eruptions with moderate sulfur injections (e.g. Timmreck et al., 2010; English et al., 2013).

Global stratospheric aerosol models: current status and challenges
A comprehensive simulation of the spatio-temporal evolution of the particle size distribution is a continuing challenge for stratospheric aerosol models. Due to computational constraints, the formation of the stratospheric aerosol and the temporal evolution of its size distribution are usually parameterized with various degrees of complexity in global models. The simplest way to simulate the stratospheric aerosol distribution in global climate models is the mass-only (bulk) approach (e.g. Timmreck et al., 1999a, 2003; Aquila et al., 2012), where only the total sulfate mass is prognostically simulated and chemical and radiative processes are calculated assuming a fixed typical particle size distribution. More complex methods are size-segregated approaches, such as the modal approach (e.g. Niemeier et al., 2009; Toohey et al., 2011; Brühl et al., 2012; Dhomse et al., 2014; Mills et al., 2016), where the aerosol size distribution is simulated using one or more modes, usually of log-normal shape. The mean radius of each mode varies in time and space. Another common approach is the sectional method (e.g. English et al., 2011; Hommel et al., 2011; Sheng et al., 2015a; for references prior to 2006 see ASAP2006, chap. 5), where the particle size distribution is divided into distinct size sections. The number and width of the size sections depend on the specific model configuration but are fixed in time and space. Size sections may be defined by an average radius or by an average mass of sulfur, and are often spaced geometrically. The choice of method influences the simulated stratospheric aerosol size distributions and therefore their radiative and chemical effects. While previous model intercomparison studies with a box model (Kokkola et al., 2009) or in a two-dimensional framework (Weisenstein et al., 2007) were very useful for evaluating the microphysical schemes, they could not address uncertainties in the spatial transport pattern, e.g. transport across the tropopause and the subtropical transport barrier, or regional and local differences in wet and dry removal. These uncertainties can only be addressed in a global three-dimensional model framework with careful validation against a variety of observational data.
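The difference between the modal and sectional representations can be illustrated with a small sketch. It assumes a single log-normal mode, dN/d ln r = N / (√(2π) ln σg) · exp(−(ln r − ln rm)² / (2 ln² σg)), and fixed sections whose edges are spaced geometrically so that particle volume doubles from one section to the next. The volume ratio of 2, the mode parameters in the example, and all function names are illustrative assumptions, not the configuration of any ISA-MIP model. Binning the mode onto the sections uses the log-normal cumulative distribution, so the section numbers sum to the modal total when the sections span the distribution.

```python
import math


def lognormal_dn_dlnr(r: float, n_total: float, r_median: float, sigma_g: float) -> float:
    """Modal approach: number density per unit ln(r) of one log-normal mode."""
    ln_sigma = math.log(sigma_g)
    z = math.log(r / r_median) / ln_sigma
    return n_total / (math.sqrt(2.0 * math.pi) * ln_sigma) * math.exp(-0.5 * z * z)


def geometric_section_edges(r_min: float, n_sections: int, volume_ratio: float = 2.0) -> list[float]:
    """Sectional approach: fixed bin edges; each section holds particles `volume_ratio` times larger in volume."""
    radius_ratio = volume_ratio ** (1.0 / 3.0)  # particle volume scales with r**3
    return [r_min * radius_ratio**i for i in range(n_sections + 1)]


def lognormal_to_sections(n_total: float, r_median: float, sigma_g: float,
                          edges: list[float]) -> list[float]:
    """Number of particles of one log-normal mode falling into each fixed section."""
    ln_sigma = math.log(sigma_g)

    def cumulative_fraction(r: float) -> float:
        # Log-normal CDF: fraction of particles with radius below r
        return 0.5 * (1.0 + math.erf(math.log(r / r_median) / (math.sqrt(2.0) * ln_sigma)))

    return [n_total * (cumulative_fraction(hi) - cumulative_fraction(lo))
            for lo, hi in zip(edges[:-1], edges[1:])]


if __name__ == "__main__":
    # Illustrative values only: N = 10 cm^-3, median radius 0.07 um, sigma_g = 1.6
    edges_um = geometric_section_edges(r_min=0.001, n_sections=40)
    bins = lognormal_to_sections(10.0, 0.07, 1.6, edges_um)
    print(f"{len(bins)} sections, total number {sum(bins):.3f} cm^-3")
```

A modal scheme carries only the mode parameters (N, rm, σg) and lets rm evolve, whereas a sectional scheme carries one number (or mass) per fixed section; the binning function shows the same distribution in both representations.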
The June 1991 eruption of Mt Pinatubo, with the extensive network of observations that tracked the evolution of the volcanic aerosol, provides a unique opportunity to test and validate global stratospheric aerosol models and their ability to simulate stratospheric transport processes. Previous model studies (e.g. Timmreck et al., 1999b; Aquila et al., 2012) highlighted the importance of an interactive online treatment of stratospheric aerosol radiative heating for the simulated transport of the volcanic cloud. A crucial point is the simulation of the tropical stratospheric aerosol reservoir (i.e. the tropical pipe; Plumb, 1996) and the meridional transport through the subtropical transport barrier. Some models show a very narrow tropical maximum in comparison to satellite data (e.g. Dhomse et al., 2014), while others show too fast a transport to higher latitudes and fail to reproduce the long persistence of the tropical aerosol reservoir (e.g. Niemeier et al., 2009; English et al., 2013). Sulfate geoengineering studies confirm the importance of the model-dependent meridional transport through the subtropical barrier (e.g. Niemeier and Timmreck, 2015; Visioni et al., 2018; Kleinschmitt et al., 2018). The reasons for these differences need to be investigated in a multi-model comparison study, as suggested for example by Tilmes et al. (2015).

The ISA-MIP experiments
Many uncertainties remain in the model representation of stratospheric aerosol. Figure 2 summarizes the main processes that determine the stratospheric sulfate aerosol mass load, size distribution, and the associated optical properties. The four experiments in ISA-MIP are designed to address these key processes under a well-defined experiment protocol with prescribed boundary conditions (sea surface temperatures (SSTs), emissions). All simulations will be compared to observations to evaluate model performance and understand model strengths and weaknesses. The Background (BG) experiment focuses on microphysics and transport (Sect. 3.1) under volcanically quiescent conditions, when stratospheric aerosol is only modulated by seasonal
4. What is the confidence interval for the volcanic forcing of the Pinatubo eruption simulated by interactive stratospheric aerosol models, and to which parameter uncertainties are the predictions most sensitive? (Sect. 3.4)
Table 1 gives an overview of all ISA-MIP experiments, which are described in detail below. In general, each experiment will include several simulations, of which only a subset is mandatory (Tier1). The modelling groups are free to choose which of the experiments they would like to participate in; however, the BG Tier1 simulation is mandatory for all groups and is the entry requirement for the ISA-MIP intercomparison. All model results will be saved in a consistent format (netCDF), made available via https://cera-www.dkrz.de/WDCC/ui/cerasearch/ (last access: 26 June 2018), and compared to a set of benchmark observations. More detailed technical information about data requests can be found in the Supplement and on the ISA-MIP webpage: http://www.isamip.eu (last access: 26 June 2018).
It is mandatory for participating models to run with interactive sulfur chemistry (see review in SPARC ASAP2006) in order to capture the oxidation pathway from precursors to aerosol particles, including aerosol growth due to condensation of H2SO4. Chemistry-climate models (CCMs) with full interactive chemistry follow the Chemistry-Climate Model Initiative (CCMI) hindcast scenario REF-C1 (Eyring et al., 2013; http://www.met.reading.ac.uk/ccmi/?page_id=11, last access: 26 June 2018) for the treatment of chemical fields and emissions of greenhouse gases (GHGs), ozone-depleting substances (ODSs), and very short-lived substances (VSLSs). Sea surface temperatures and sea ice extent are prescribed as monthly climatologies from the Met Office Hadley Centre Observational Dataset (Rayner et al., 2003). An overview of the boundary conditions is included in the Supplement (Table S1). Table S2 reports the inventories to be used for tropospheric emissions of aerosols and aerosol precursors. Anthropogenic sulfur emissions and biomass burning emissions are taken from the Monitoring Atmospheric Composition and Climate (MACC)-CITY climatology (Granier et al., 2011). Sulfur emissions from continuously degassing volcanoes are taken into account using Dentener et al. (2006), which is based on Andres and Kasgnoc (1998). OCS concentrations are fixed at the surface at a value of 510 pptv (Montzka et al., 2007; ASAP2006). Modelling groups are encouraged to include a set of passive tracers to diagnose the atmospheric transport independently of emissions, mostly following the CCMI recommendations (Eyring et al., 2013). These tracers are listed in Table S3. Models diagnose aerosol parameters as specified in Tables S4 and S5. Additionally, volume mixing ratios of specified precursors are diagnosed.
Table 1. Overview of the ISA-MIP experiments (table body not reproduced here). Notes: a Each model will need to include an appropriate initialization and spin-up time for each ensemble member (∼ 3–6 years depending on model configuration). b As explained in the caption to Table 11 and Sect. 3.4, models will need to restrict the PoEMS parameter scaling to volcanically enhanced air masses (either via a total-sulfur vmr (volume mixing ratio) threshold or a passive volcanic SO2 tracer). c Although the Pinatubo enhancement to the stratospheric aerosol layer remained apparent until 1997 (e.g. Wilson et al., 2008) and the HErSEA experiments will run for longer, the PoEMS analysis requires only 3 post-eruption years, as this gives sufficient time after the peak aerosol loading to characterize decay timescales robustly (e.g. ASAP2006, Sect. 5).
3.1 Stratospheric background aerosol (BG)

Summary of experiment
The overall objective of the BG experiment is to better understand the processes involved in maintaining the stratospheric background aerosol layer, i.e. stratospheric aerosol not resulting from direct volcanic injections into the stratosphere. The simulations prescribed for this experiment are time-slice simulations for the year 2000 with prescribed SSTs, including all sources of aerosols and aerosol precursors except explosive volcanic eruptions. The result of BG will be a multi-model climatology of aerosol distribution, composition, and microphysical properties in the absence of volcanic eruptions. By comparing models with different aerosol microphysics parameterizations and different simulated background circulations against a variety of observational data (Table 2), we aim to assess how these processes affect the simulated aerosol characteristics.

Motivation
The total net sulfur mass flux from the troposphere into the stratosphere is estimated to be about 181 Gg S yr−1 based on simulations by Sheng et al. (2015a) using the SOCOL-AER model, 1.5 times larger than reported in ASAP2006 (KTH2016). This estimate, however, could be highly dependent on the specific characteristics of the model used, such as the strength of convective systems, scavenging efficiency, and the occurrence of stratosphere-troposphere exchange. Therefore, especially in the lower stratosphere, the simulated distribution of stratospheric background aerosol could show very large inter-model variability.
OCS is still considered the largest contributor to the aerosol loading in the middle stratosphere. Several studies have shown that the transport of tropospheric aerosol and aerosol precursors into the stratosphere constitutes an important source of stratospheric aerosol (KTH2016 and references therein), although new in situ measurements indicate that the cross-tropopause SO2 flux is negligible over Mexico and Central America (Rollins et al., 2017). Observations of the Asian Tropopause Aerosol Layer (ATAL; Vernier et al., 2011a) show that, particularly in the upper troposphere and lower stratosphere (UTLS), aerosol of tropospheric origin can significantly enhance the aerosol burden in the stratosphere. This tropospheric aerosol has a more complex composition than traditionally assumed for stratospheric aerosol: Yu et al. (2015), for instance, showed that carbonaceous aerosol makes up as much as 50 % of the aerosol loading within the ATAL. The rate of stratosphere-troposphere exchange (STE) is influenced by the seasonality of the circulation and by the frequency and strength of convective events, both in large-scale phenomena such as the Asian and North American monsoons and in small-scale phenomena such as strong storms. Model simulations by Hommel et al. (2015) also revealed significant QBO signatures in aerosol mixing ratio and size in the tropical middle stratosphere (Fig. 3). Hence, the model-specific implementation of the QBO (nudged or internally generated) could affect stratospheric transport and, subsequently, the stratospheric aerosol layer.
In this experiment, we aim to assess the inter-model variability of the background stratospheric aerosol layer and of the sulfur mass flux from the troposphere to the stratosphere and vice versa. We will exclude changes in emissions and focus on the dependence of stratospheric aerosol concentrations and properties on stratospheric transport and STE. The BG experiment aims to understand how model-specific transport characteristics (e.g. isolation of the tropical pipe, representation of the QBO, and the strength of convective systems) and aerosol parameterizations (e.g. aerosol microphysics and scavenging efficiency) affect the representation of the background aerosol.

Experiment set-up and specifications
The BG experiment prescribes one mandatory (BG_QBO) and two recommended (BG_NQBO and BG_NAT) simulations (see Table 3). BG_QBO is a time-slice simulation with conditions characteristic of the year 2000¹, with the goal of understanding the sources, sinks, composition, and microphysical characteristics of stratospheric background aerosol under volcanically quiescent conditions. The time-slice simulation should be at least 20 years long, after a spin-up period of at least 10 years to equilibrate stratospherically relevant quantities such as OCS concentrations and the age of air. This period should be sufficient to study differences in the aerosol properties but needs to be extended if dynamical changes, e.g. in Northern Hemisphere (NH) winter variability, are to be analysed. Modelling groups should run this simulation with a varying QBO, either internally generated or nudged to the 1981–2000 period. If resources allow, each model should perform the sensitivity experiments BG_NQBO and BG_NAT. The specifications of these two experiments are the same as for BG_QBO, except that BG_NQBO is performed without a varying QBO² and BG_NAT without anthropogenic emissions of aerosol and aerosol precursors, as indicated in Table S1. The goals of these sensitivity experiments are to understand the effect of the QBO on the background aerosol characteristics and the contribution of anthropogenic sources to the background aerosol loading in the stratosphere.
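The requirements for the three BG simulations can be summarized as a small configuration check. This is a hypothetical helper, not part of the ISA-MIP data request: the class, constant, and function names are invented for illustration, and the sketch encodes only the constraints stated above (at least 10 years of spin-up, a time slice of at least 20 years, a varying QBO except in BG_NQBO, and anthropogenic emissions except in BG_NAT).

```python
from dataclasses import dataclass

MIN_SPINUP_YEARS = 10    # to equilibrate e.g. OCS concentrations and the age of air
MIN_ANALYSIS_YEARS = 20  # length of the year-2000 time slice after spin-up

# Expected switches per simulation: (varying QBO, anthropogenic emissions)
BG_SWITCHES = {
    "BG_QBO": (True, True),    # mandatory
    "BG_NQBO": (False, True),  # recommended: no varying QBO
    "BG_NAT": (True, False),   # recommended: no anthropogenic aerosol/precursor emissions
}


@dataclass
class BGRun:
    """Hypothetical description of one planned BG time-slice simulation."""
    name: str
    spinup_years: int
    analysis_years: int
    varying_qbo: bool
    anthropogenic_emissions: bool


def check_bg_run(run: BGRun) -> list[str]:
    """Return the protocol problems found for a planned run (empty list if none)."""
    if run.name not in BG_SWITCHES:
        return [f"unknown BG simulation {run.name!r}"]
    problems = []
    if run.spinup_years < MIN_SPINUP_YEARS:
        problems.append(f"spin-up {run.spinup_years} yr < {MIN_SPINUP_YEARS} yr")
    if run.analysis_years < MIN_ANALYSIS_YEARS:
        problems.append(f"time slice {run.analysis_years} yr < {MIN_ANALYSIS_YEARS} yr")
    qbo, anthro = BG_SWITCHES[run.name]
    if run.varying_qbo != qbo:
        problems.append(f"varying QBO should be {qbo}")
    if run.anthropogenic_emissions != anthro:
        problems.append(f"anthropogenic emissions should be {anthro}")
    return problems


if __name__ == "__main__":
    plan = BGRun("BG_NAT", spinup_years=10, analysis_years=20,
                 varying_qbo=True, anthropogenic_emissions=True)
    print(check_bg_run(plan))
```

For example, a planned BG_NAT run that still includes anthropogenic emissions is flagged with a single problem, while a correctly configured BG_QBO run returns an empty list.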

Summary of experiment
The aim of the TAR experiment is to investigate the relative contributions of volcanic and anthropogenic sources to the temporal evolution of the stratospheric aerosol layer between 1998 and 2012. Observations show a transient increase in stratospheric aerosol loading, particularly after 2003, to which small- to moderate-magnitude volcanic eruptions contributed significantly (e.g. Solomon et al., 2011; Vernier et al., 2011b; Neely III et al., 2013; Ridley et al., 2014; Santer et al., 2015; Brühl et al., 2015). TAR model simulations will be performed using specified dynamics, prescribed sea surface temperatures, and time-varying SO2 emissions. The simulations are suitable for any general circulation or chemistry transport model that simulates the stratospheric aerosol interactively and can nudge meteorological parameters to reanalysis data. The TAR protocol covers the period from January 1998 to December 2012, during which the only volcanic eruptions affecting the UTLS aerosol layer emitted about an order of magnitude less SO2 than Pinatubo. Time-varying surface emission data sets contain anthropogenic and natural sources of sulfur aerosol and its precursor species. The volcanic SO2 emission inventories contain information on all known eruptions that emitted SO2 into the UTLS during this period, comprising the geolocation of each eruption, the amount of SO2 emitted, and the emission height. SO2 emissions from continuously degassing volcanoes are also included.

Experiment set-up and specifications
Participating models are encouraged to perform up to seven experiments, based on five different volcanic SO2 emission databases (hereafter referred to as VolcDB). Four experiments are mandatory; three others are optional. The volcanic experiments are compared with a reference simulation (TAR_base) that uses none of the volcanic emission databases and includes only emissions from continuously degassing volcanoes. The aim of the reference simulation is to simulate the state of the stratospheric aerosol layer without volcanic perturbations. In contrast to the BG protocol (Sect. 3.1), which intercompares model simulations under climatological mean conditions using constant year-2000 conditions, TAR applies time-varying surface boundary conditions (SST/SIC). An overview of the volcanic emission inventories is given in Table 4 and Fig. 4. VolcDB1/2/3 are new compilations (Bingen et al., 2017; Neely and Schmidt, 2016; Carn et al., 2016), whereas a fourth inventory (VolcDB4; Diehl et al., 2012), provided earlier for the AeroCom community modelling initiative, is optional. The databases use SO2 observations from different sources and apply different techniques to estimate injection heights and the amount of emitted SO2. The four inventories are provided as tabulated point sources, and each modelling group must translate the emitted SO2 mass of each eruption onto the model levels spanning the lower and upper emission altitudes. To test the effect of the implementation strategy (point source vs. cloud), an additional non-mandatory experiment has been set up: TAR_db1_3D, whose corresponding data set VolcDB1_3D provides a series of discrete 3-D gridded SO2 injections at specified times. The integral SO2 mass of each injection is consistent between the two versions of VolcDB1.
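Because the point-source inventories give only a total SO2 mass and a lower and upper emission altitude, each group has to decide how to spread that mass over its own vertical grid. The sketch below shows one simple, mass-conserving choice, assuming the mass is distributed uniformly in altitude so that each model layer receives mass in proportion to its overlap with the emission range. The function name `distribute_injection` and the ascending-altitude layout of the layer interfaces are illustrative assumptions, not part of the TAR protocol; models ordered top-down would need to reverse their interface array first.

```python
import numpy as np


def distribute_injection(mass, z_bottom_km, z_top_km, level_edges_km):
    """Split one eruption's SO2 mass over model layers (hypothetical helper).

    Assumes level_edges_km holds ascending layer-interface altitudes (km) and
    that the mass is spread uniformly in altitude between the lower and upper
    emission heights, so each layer gets mass in proportion to its overlap.
    """
    edges = np.asarray(level_edges_km, dtype=float)
    if z_top_km < z_bottom_km:
        raise ValueError("upper emission altitude is below the lower one")
    layer_mass = np.zeros(len(edges) - 1)
    if z_top_km == z_bottom_km:
        # Single injection altitude: put all mass in the layer containing it.
        k = np.searchsorted(edges, z_bottom_km, side="right") - 1
        if not 0 <= k < len(layer_mass):
            raise ValueError("injection altitude outside the model column")
        layer_mass[k] = mass
        return layer_mass
    # Thickness of the overlap between each layer and [z_bottom, z_top].
    overlap = np.minimum(edges[1:], z_top_km) - np.maximum(edges[:-1], z_bottom_km)
    overlap = np.clip(overlap, 0.0, None)
    if overlap.sum() == 0.0:
        raise ValueError("emission range does not intersect the model column")
    return mass * overlap / overlap.sum()


# Example: 10 units of SO2 injected between 19 and 23 km on 2 km thick layers.
print(distribute_injection(10.0, 19.0, 23.0, [16, 18, 20, 22, 24]))
```

Spreading by overlap thickness conserves the tabulated mass exactly; weighting by layer air mass or using a Gaussian profile would be equally valid choices, which is one reason the protocol also offers the gridded VolcDB1_3D alternative.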
We recommend performing one additional non-mandatory experiment, TAR_sub, to quantify and isolate the effects of eight volcanic eruptions that either had a statistically significant effect on, for instance, tropospheric temperatures (Santer et al., 2014, 2015) or emitted significant amounts of SO2 over the 1998 to 2012 period. This experiment uses a subset of volcanic emissions (VolcDBSUB) derived from the average mass of SO2 emitted in VolcDB1, VolcDB2, and VolcDB3 for the following eruptions: (Table S6) although these are not discernible in climate proxies (Kravitz et al., 2010; Santer et al., 2014, 2015). In summary, four TAR experiments are mandatory (TAR_base with no volcanic emissions, and TAR_db1/2/3), one additional experiment is recommended (TAR_sub), and two others are optional (TAR_db4 and TAR_db1_3D; see Table 5 for an overview).

Volcanic SO2 emission databases
VolcDB1 (Bingen et al., 2017; Brühl, 2018) is an update of Brühl et al. (2015) based on MIPAS and OMI satellite data. For TAR, VolcDB1 has been extended using data from Global Ozone Monitoring by Occultation of Stars (GOMOS), SAGE II, the Total Ozone Mapping Spectrometer (TOMS), and the Smithsonian database. The VolcDB1_3D data set, for the optional experiment TAR_db1_3D, contains volume mixing ratio distributions of the injected SO2 clouds on a T42 Gaussian grid with 90 levels; the integral SO2 mass of each injection is the same as in the point-source version. VolcDB2 (Neely and Schmidt, 2016) contains volcanic SO2 emissions and plume altitudes for eruptions detected by satellite instruments including TOMS, OMI, OMPS, the Infrared Atmospheric Sounding Interferometer (IASI), the Global Ozone Monitoring Experiment (GOME/2), the Atmospheric Infrared Sounder (AIRS), the Microwave Limb Sounder (MLS), and MIPAS. The database is compiled from published estimates of eruption source parameters and from reports of the Smithsonian Global Volcanism Program (http://volcano.si.edu/, last access: 26 June 2018), NASA's Global Sulfur Dioxide Monitoring website (http://so2.gsfc.nasa.gov/, last access: 26 June 2018), and the Support to Aviation Control Service (http://sacs.aeronomie.be/, last access: 26 June 2018). This tabulated point-source database includes volcanic eruptions that emitted SO2 only into the troposphere as well as those with direct stratospheric emissions, and it has been used and compared with observations in Mills et al. (2016) and Solomon et al. (2016).
VolcDB3 uses the most recent compilation of the volcanic degassing database of Carn et al. (2016). It considers observations from the satellite instruments TOMS, the High-resolution Infrared Radiation Sounder (HIRS/2), AIRS, OMI, MLS, IASI, and OMPS, which measure in the UV, IR, and microwave spectral bands. Like VolcDB1/2, VolcDB3 also includes tropospheric eruptions.
VolcDB4 is an older data set that relies on information from TOMS, OMI, the Global Volcanism Program (GVP), and other observations from the literature, covering the period from 1979 to 2010. In contrast to the other inventories, VolcDB4 has previously been applied by a range of models within the AeroCom community (http://aerocom.met.no/emissions.html, last access: 26 June 2018; Diehl et al., 2012; Dentener et al., 2006). Hence, it adds valuable information to the TAR experiments because it allows an estimate of how advances in observational methods affect modelling results. Note that VolcDB4 already contains the inventory of Andres and Kasgnoc (1998) for sulfur emissions from continuously erupting volcanoes; these emissions must not be counted twice when running this experiment.
Geosci. Model Dev., 11, 2581-2608, 2018 www.geosci-model-dev.net/11/2581/2018/
Figure caption (fragment): … (Table 8). The model tropical mean extinction in the mid-visible (550 nm) and near-infrared (1020 nm) is compared with SAGE II measurements. Only two of the five injection realizations inject below 20 km, and the impact on the timing of the peak and on the general evolution of the aerosol optical properties is apparent. In this model, growth to larger particle sizes and subsequent sedimentation to lower altitudes can explain certain signatures seen in the satellite data (see also Mann et al., 2015).

Boundary conditions, chemistry, and forcings
To reduce uncertainties associated with model differences in the reproduction of synoptic and large-scale transport processes, models are strongly encouraged to perform the TAR experiments with specified dynamics, in which meteorological parameters are nudged to a reanalysis such as ECMWF ERA-Interim (Dee et al., 2011). This allows models to reasonably reproduce the QBO and the planetary wave structure in the stratosphere and to replicate the state of the BDC in the simulation period as closely as possible. Nudging also allows direct comparison with available observations of stratospheric aerosol properties (Table 2), such as extinction profiles and aerosol optical depth (AOD), and should enable the models to simulate the ATAL (Vernier et al., 2011a; Thomason and Vernier, 2013), which so far has been studied in detail by only a few global models (e.g. Neely III et al., 2014; Yu et al., 2015).
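Nudging is commonly implemented as a Newtonian relaxation term added to the model tendencies. The generic form below is an illustration only; the set of nudged variables (e.g. vorticity, divergence, temperature, surface pressure) and the relaxation timescale τ are assumptions belonging to each model's own specified-dynamics configuration and are not prescribed by the TAR protocol.

```latex
\frac{\partial X}{\partial t} = F(X) + \frac{X_{\mathrm{rean}} - X}{\tau}
```

Here X is a nudged prognostic variable, F(X) is the model's own tendency, and X_rean is the reanalysis field interpolated to the model grid and time step. A shorter τ ties the simulated circulation (and hence the QBO and BDC) more tightly to the reanalysis.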

Summary of experiment
The HErSEA experiment will involve each participating model running a limited ensemble of simulations for each of the three largest volcanic perturbations to the stratosphere in the last 100 years: 1963 Mt Agung, 1982 El Chichón, and 1991 Mt Pinatubo. The main aim is to use a wide range of stratospheric aerosol observations to constrain uncertainties in the SO2 emitted by each eruption (amount and injection height). Several aerosol metrics will be intercompared to assess how effectively the emitted SO2 translates into perturbations of stratospheric aerosol properties and simulated radiative forcings across interactive stratospheric aerosol CCMs of differing complexity. Whereas the TAR simulations (see Sect. 3.2) use specified dynamics and are suitable for chemistry transport models, simulations for this experiment must be free-running with radiative coupling to the volcanically enhanced stratospheric aerosol, ensuring that the composition-radiation-dynamics interactions associated with the injection are resolved. We are aware that this specification inherently excludes chemistry transport models, which must impose atmospheric dynamics. However, since the aim is to use stratospheric aerosol observations together with the models to re-evaluate current best estimates of the SO2 input, and given the first-order impact of stratospheric heating on the hemispheric dispersion from these major eruptions (e.g. R. E. Young et al., 1994), we consider this exclusion justified in this case.
As well as analysing and evaluating individual model skill and identifying model consensus and disagreement for these three specific eruptions, we also seek to learn more about major eruptions that occurred before the era of satellite and in situ stratospheric measurements. Our understanding of the effects of these earlier eruptions relies on volcanic forcings derived from proxies such as sulfate deposition to ice sheets (Gao et al., 2007; Sigl et al., 2015; Toohey et al., 2013), from photometric measurements at astronomical observatories (Stothers, 1996, 2001), or from documentary evidence (Stothers, 2002; Stothers and Rampino, 1983; Toohey et al., 2016a). Although HErSEA has no specific experiment to understand the relationship between ice core sulfate deposition and the stratospheric aerosol layer enhancements that drive surface cooling, there is potential for a systematic inter-model study (e.g. similar to Marshall et al., 2018) to identify how uncertain historic volcanic forcings derived from ice core sulfate deposition may be.

Motivation
In the days following the June 1991 Pinatubo eruption, satellite SO2 measurements (e.g. Guo et al., 2004a) show that the peak gas-phase sulfur loading was 7 to 11.5 Tg S (or 14-23 Tg SO2). The chemical conversion to sulfuric acid aerosol in the tropical reservoir over the following weeks and the subsequent transport to mid- and high latitudes caused a major enhancement of the stratospheric aerosol layer. Through this global dispersion phase, the peak particle sulfur loading reached only around half of the initial SO2 emission; the maximum particle sulfur loading was measured as 3.7 to 6.7 Tg S (Baran and Foot, 1994), based on an aqueous sulfuric acid composition range of 59 to 77 % by weight.
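The paragraph above switches between sulfur mass (Tg S), SO2 mass (Tg SO2), and the mass of aqueous sulfuric acid droplets. The short sketch below makes those conversions explicit using standard atomic weights. The helper names are hypothetical, and the aerosol-mass conversion assumes, for illustration, that all particle sulfur is present as H2SO4 and that the remainder of each droplet is water.

```python
# Molar masses in g/mol (standard atomic weights, rounded).
M_S = 32.06
M_O = 15.999
M_H = 1.008
M_SO2 = M_S + 2 * M_O               # ~64.06 g/mol
M_H2SO4 = 2 * M_H + M_S + 4 * M_O   # ~98.07 g/mol


def tg_s_to_tg_so2(tg_s):
    """Convert a sulfur mass (Tg S) to the equivalent SO2 mass (Tg SO2)."""
    return tg_s * M_SO2 / M_S


def tg_s_to_aerosol_mass(tg_s, h2so4_weight_fraction):
    """Mass (Tg) of aqueous sulfuric acid aerosol that contains tg_s of sulfur.

    Assumes all particle sulfur is H2SO4 and the rest of the droplet is water.
    """
    if not 0.0 < h2so4_weight_fraction <= 1.0:
        raise ValueError("weight fraction must be in (0, 1]")
    return tg_s * (M_H2SO4 / M_S) / h2so4_weight_fraction


for tg_s in (7.0, 11.5):
    print(f"{tg_s} Tg S -> {tg_s_to_tg_so2(tg_s):.1f} Tg SO2")
```

The loop prints 14.0 and 23.0 Tg SO2, reproducing the 14-23 Tg SO2 range quoted above (one Tg S corresponds to almost exactly 2 Tg SO2). Because the droplets are only 59-77 % H2SO4 by weight, the total particle mass is larger than the H2SO4 mass by a factor of 1/0.77 to 1/0.59.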
Whereas some model studies with aerosol microphysical processes find consistency with observations for SO2 injections of 8.5 Tg S (e.g. Niemeier et al., 2009; Toohey et al., 2011; Brühl et al., 2015), several recent microphysical model studies (Dhomse et al., 2014; Sheng et al., 2015a; Mills et al., 2016) find the best agreement for an injected sulfur amount at, or even below, the lower end of the range of the satellite SO2 measurements; see also Fig. 5. Model predictions are known to be sensitive to the assumed injection height (e.g. Sheng et al., 2015b; Jones et al., 2016), and whether models resolve radiative heating and "self-lofting" also affects subsequent transport pathways (e.g. R. E. Young et al., 1994; Timmreck et al., 1999b; Aquila et al., 2012). Another mechanism that could explain part of the apparent model-observation discrepancy is that a substantial proportion of the sulfur may have been removed from the plume in the first months after the eruption by accommodation onto co-emitted ash/ice (Guo et al., 2004b) and subsequent sedimentation.
This ISA-MIP experiment will explore these issues further, with the participating models carrying out co-ordinated experiments for the three most recent major eruptions with specified common SO2 amounts and injection heights (Table 6). This design ensures that the analysis can focus on key inter-model differences such as stratospheric circulation/dynamics, the impacts of radiative-dynamical interactions, and the effects of aerosol microphysical schemes. Analysing how the vertical profile of the enhanced stratospheric aerosol layer evolves during global dispersion and decay will provide a key indicator of why the models differ and which mechanisms drive the differences. Furthermore, the actual response of the BDC and the mean age of air to Pinatubo is poorly constrained by existing reanalysis data (Garfinkel et al., 2017). While some modelling studies reported a decreasing mean age of air throughout the stratosphere following volcanic eruptions (Garcia et al., 2011; Garfinkel et al., 2017), others show an increase in mean age (Diallo et al., 2017). Moreover, Muthers et al. (2016) found a decreasing mean age of air in the middle and upper stratosphere and an increasing mean age below, while Pitari et al. (2016a) found a decreasing mean age above 30 hPa in the tropics and above 10 hPa at mid-latitudes after the Pinatubo eruption. The HErSEA experiment, in combination with a passive volcanic tracer, might therefore help to better constrain the response of the BDC to volcanic eruptions using observations and to clarify the uncertainties in the age-of-air changes after the Pinatubo eruption. For all three major eruptions, we have identified key observational data sets (Table 7) that will provide benchmark tests to evaluate the vertical profile, covering a range of different aerosol metrics.
Table 7 (fragment). Observational data sets for the three major eruptions.
Pinatubo: … Avdyushin et al. (1993); Nardi et al. (1993); Stevens et al. (1994)
El Chichón: satellite extinction/AOD at 1000 nm (SAM-II): Hamill and Brogniez (SPARC, 2006, and references therein); balloon-borne particle concentration profiles: Rosen (1983, 1987); ground-based lidar: NDACC archive
Agung: surface radiation measurements: Dyer and Hicks (1965), Pueschel et al. (1972), Moreno and Stock (1964), Flowers and Viebrock (1965) (global data set gathered in Dyer and Hicks, 1968); balloon-borne measurements: Rosen (1964, 1966, 1968), Pittock (1966); ground-based lidar, searchlight, and twilight measurements: Clemesha et al. (1966), Grams and Fiocco (1967), Kent et al. (1967), Elterman et al. (1969), Volz (1964, 1965, 1970); aircraft measurements: Mossop (1963, 1964), Friend (1966)

Experiment set-up and specifications
Each modelling group will run a mini-ensemble of transient AMIP-type simulations for the three eruptions with upper- and lower-bound SO2 emissions and three different injection height settings: two shallow (e.g. 19-21 and 23-25 km) and one deep (e.g. 19-25 km) (see Table 8). The seasonal cycle of the BDC affects the hemispheric dispersion of the aerosol plume (e.g. Toohey et al., 2011), and the phase of the QBO is also known to be a key control for tropical eruptions (e.g. Trepte and Hitchman, 1992). To quantify the contribution of tracer transport, we recommend additionally initializing and transporting a passive tracer Volc (Table S3). Because the AMIP-type simulations are transient, with prescribed time-varying sea surface temperatures, the models will automatically match the surface climate state (ENSO, NAO) through each post-eruption period. Where possible, models should re-initialize (if they have an internally generated QBO) or use specified dynamics approaches (e.g. Telford et al., 2008) to ensure that the model dynamics are consistent with the QBO evolution through the post-eruption period. General circulation models should use GHG concentrations appropriate for the period, and models with interactive stratospheric chemistry should ensure that the ODS loading matches that of the time period. Table 8 shows the SO2 injection settings for each eruption. Experience with interactive stratospheric aerosol simulations shows that the vertical extent of the enhanced stratospheric aerosol will differ from the altitude range in which the SO2 is injected. These sensitivity simulations will therefore allow us to assess the behaviour of the individual models with identical SO2 injection settings.
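The size of the HErSEA mini-ensemble follows directly from the design: three eruptions, two SO2 bounds, and three injection-height settings, each run as a three-member ensemble (as requested in the diagnostics paragraph that follows). The sketch below enumerates the runs for one model. The run-name pattern and the setting labels are hypothetical placeholders, not the protocol's naming convention; the actual SO2 amounts and height ranges for each eruption come from Table 8.

```python
from itertools import product

# Hypothetical labels; the real SO2 amounts and height ranges per eruption
# are given in Table 8 of the protocol.
ERUPTIONS = ("Agung1963", "ElChichon1982", "Pinatubo1991")
SO2_BOUNDS = ("lowSO2", "highSO2")
HEIGHT_SETTINGS = ("shallowLow", "shallowHigh", "deep")  # e.g. 19-21, 23-25, 19-25 km


def hersea_runs(n_members=3):
    """List every (eruption, SO2 bound, height setting, member) run name."""
    runs = []
    for eruption, so2, height in product(ERUPTIONS, SO2_BOUNDS, HEIGHT_SETTINGS):
        for member in range(1, n_members + 1):
            runs.append(f"HErSEA_{eruption}_{so2}_{height}_r{member}")
    return runs


runs = hersea_runs()
print(len(runs))  # 3 eruptions x 2 SO2 bounds x 3 heights x 3 members = 54
```

Each model therefore contributes 18 distinct injection settings and 54 simulations in total, which is the budget groups need to plan for before adding the optional passive tracer.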
For these major eruptions, where the perturbation is much larger than in TAR, model diagnostics include AOD and extinction at multiple wavelengths, and heating rates (K day−1) in the lower stratosphere to identify the stratospheric warming induced by the simulated volcanic enhancement, including compensating effects from other constituents (e.g. Kinne et al., 1992). To allow the global variation in size distribution to be intercompared, models will also provide 3-D monthly effective radius, as well as cumulative number concentrations at several size cuts for direct comparison with balloon measurements. Examining how the particle size distribution co-varies with extinction at different wavelengths will be of particular interest for approaches used to interpret astronomical measurements of eruptions in the era before in situ measurements (Stothers, 1996, 2001). A three-member ensemble will be submitted for each injection setting.
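For models with modal aerosol schemes, both size diagnostics can be computed from the lognormal mode parameters. The sketch below assumes lognormal number distributions described by number concentration N, median radius r_g, and geometric standard deviation σ_g. The effective radius is the ratio of the third to the second radius moment, which for a single mode equals r_g·exp(2.5 ln²σ_g), and the number above a size cut follows from the complementary error function. Function names, mode values, and size cuts are illustrative assumptions; sectional (bin) schemes would instead sum the moments directly over their bins.

```python
import math


def lognormal_moment(n, r_g, sigma_g, k):
    """k-th radius moment N * <r^k> of a lognormal number distribution."""
    ln_s = math.log(sigma_g)
    return n * r_g ** k * math.exp(0.5 * k ** 2 * ln_s ** 2)


def effective_radius(numbers, median_radii, sigmas):
    """Effective radius <r^3>/<r^2> over all modes (same units as the radii)."""
    modes = list(zip(numbers, median_radii, sigmas))
    m3 = sum(lognormal_moment(n, r, s, 3) for n, r, s in modes)
    m2 = sum(lognormal_moment(n, r, s, 2) for n, r, s in modes)
    return m3 / m2


def number_above(n, r_g, sigma_g, r_cut):
    """Number concentration of particles larger than r_cut in one lognormal mode."""
    z = math.log(r_cut / r_g) / (math.sqrt(2.0) * math.log(sigma_g))
    return 0.5 * n * math.erfc(z)


# Hypothetical two-mode example (radii in micrometres, numbers in cm^-3).
modes_n, modes_rg, modes_sigma = [100.0, 5.0], [0.08, 0.4], [1.6, 1.4]
print(effective_radius(modes_n, modes_rg, modes_sigma))
for r_cut in (0.15, 0.25):  # illustrative balloon-counter size cuts
    total = sum(number_above(n, r, s, r_cut)
                for n, r, s in zip(modes_n, modes_rg, modes_sigma))
    print(r_cut, total)
```

Reporting cumulative numbers at fixed cuts, rather than full distributions, keeps the diagnostic comparable across modal and sectional schemes and matches how optical particle counters on balloons report their data.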

Summary of experiment
The PoEMS experiment will involve each interactive stratospheric aerosol model running a perturbed parameter ensemble (PPE) of simulations through the 1991-1995 Pinatubo-perturbed period. Variance-based sensitivity analysis will derive a probability distribution function (PDF) for each model's predicted Pinatubo forcing, following techniques applied successfully to quantify and attribute sources of uncertainty in tropospheric aerosol forcings (e.g. Carslaw et al., 2013). The approach will show which aspects of the radiative forcing from major eruptions are most uncertain and will enable us to identify how sensitive model predictions of key features (e.g. timing and value of peak forcing, and decay timescales) are to uncertainties in several model parameters. Comparing the time signatures of different underlying aerosol metrics (mid-visible AOD, effective radius, particle number) between models, and crucially also against observations, may also help to reduce the natural forcing uncertainty, potentially making the next generation of climate models more robust.
Table 8. Settings to use for initializing the mini-ensemble of interactive stratospheric aerosol simulations for each eruption in the HErSEA experiment. For Pinatubo, the upper range of SO2 emission is based on TOMS/TOVS SO2 observations (Guo et al., 2004a). The SO2 emission ranges and central values (in parentheses) are intended specifically for interactive stratospheric aerosol (ISA) models rather than as a new data compilation. The lower range and the central values are defined according to recent Pinatubo studies (Dhomse et al., 2014; Mills et al., 2016; Sheng et al., 2015a), which identified a modest downward adjustment of the initially observed SO2 amounts to agree with HIRS/ISAMS measurements of peak sulfate aerosol loading (Baran and Foot, 1994). The adjustment assumes either uncertainties in the satellite measurements or that loss pathways in the first few weeks after these eruptions are underpredicted (e.g. due to coarse spatial resolution) or omitted completely (accommodation onto ash/ice) in the ISA models. The El Chichón SO2 central estimate is taken from Krueger et al. (2008), with an emission range based on an assumed ±33 %, while for Agung the SO2 emission estimate is from Self and King (1996). For Pinatubo, injection height ranges for the two shallow and one deep realization are taken from Antuña et al. (2002). The El Chichón values are based on the tropical lidar signal from Fig. 4.34 of Hamill and Brogniez (2006), whereas for Agung we considered the measurements presented in Dyer and Hicks (1968), including balloon soundings (Rosen, 1964) and ground-based lidar (Grams and Fiocco, 1967).

Motivation
The sudden global cooling from major eruptions is a key signature in the historical climate record, and a natural warming signal follows the peak cooling as volcanic aerosol is slowly removed from the stratosphere. Quantitative information on the uncertainty range of volcanic forcings is therefore urgently needed. The wealth of data collected by satellite, ground-based, and airborne instruments in the period following the 1991 eruption of Mount Pinatubo (see, e.g., Sect. 3.3.2, Table 7) provides an opportunity to test model capabilities in simulating large perturbations of stratospheric aerosol and their effect on the climate. Recent advances in quantifying uncertainty in climate models (e.g. Rougier et al., 2009; Lee et al., 2011) involve running ensembles of simulations that systematically explore combinations of different external forcings to scope the range of possible realizations. A large number of general circulation models (GCMs) with prognostic aerosol modules have now been used to simulate the stratospheric aerosol perturbation through the Pinatubo-perturbed period (see Table 9). Although these models achieve reasonable agreement with the observations, this consistency of skill is achieved with considerable diversity in the values assumed for the initial magnitude and distribution of the SO2 injection. The SO2 injections prescribed by different models range from 5 to 10 Tg S, and the upper edge of the injection altitude varies among models from as low as 18 km to as high as 29 km, as shown in Table 9. The simulations also differ in the vertical distribution of the SO2 injection (e.g. uniform, Gaussian, or triangular) and in the horizontal injection area (one to several grid boxes). The fact that different choices of injection parameters lead to similar results in different models points to differences in the models' internal treatment of aerosol evolution.
Accurately capturing microphysical processes such as coagulation, growth, and the resulting sedimentation rates has been shown to be important for volcanic forcings, but some studies (e.g. Mann et al., 2015) find that these processes are also coupled to aerosol-radiation interactions, whose associated dynamical effects change the fate of the volcanic sulfur and its removal into the troposphere. The PoEMS experiment will specifically assess this issue by adjusting the rates of specific microphysical processes in each model simultaneously with perturbations to SO2 emission and injection height, thereby assessing the footprint of their influence on the subsequent volcanic forcing in aerosol schemes of differing complexity and the relative contributions of emissions and microphysics to the uncertainty.

Experiment set-up and specifications
For each model, an ensemble of simulations will be performed varying SO2 injection parameters and a selection of internal model parameters within a realistic uncertainty distribution. A maximin Latin hypercube sampling strategy will be used to define the parameter values set in each PPE member in order to obtain good coverage of the parameter space. The maximin Latin hypercube is designed such that the range of every single parameter is well sampled and the sampling points are well spread through the multidimensional uncertainty space. This is achieved by splitting the range of every parameter into N intervals, where N is the total number of model simulations, ensuring that precisely one point lies in each interval in every dimension, and maximizing the minimum distance between any pair of points. Figure 6 shows the projection onto two dimensions of a Latin hypercube built in eight dimensions with 50 model simulations. The size of the Latin hypercube needed will depend on the number of model parameters to be perturbed: the number of simulations will be 10 times the number of parameters, 7 per parameter to build the emulator and 3 per parameter to validate it. All parameters are perturbed simultaneously in the Latin hypercube. To include modelling groups with less computing time available and different types of aerosol schemes, we define three options of experimental design with different numbers of perturbed parameters and thus of ensemble members.
Table 9 (fragment). Injection altitude (km) and injected SO2 (Tg S) assumed in previous Pinatubo model studies.
Mills et al. (2016): 18-20 km, 7 Tg S
Sheng et al. (2015a, b): 17-30 km, 8.5 Tg S
Timmreck et al. (1999a, b): 20-27 km, 8.5 Tg S
Niemeier et al. (2009): 24 km, …
Toohey et al. (2011): …, 8.5 Tg S
Brühl et al. (2015): 18-26* km, 10 Tg S
Pitari and Mancini (2002): 18-25 km, 10 Tg S
Oman et al. (2006): 19-29 km, 10 Tg S
Aquila et al. (2012): 16-18 and 17-27 km, 10 Tg S
English et al. (2013): …
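The maximin Latin hypercube construction described in the set-up above can be sketched as follows. This is a minimal illustration, not the design code that will actually be distributed to groups: it builds random Latin hypercubes on the unit cube (one point per interval in every dimension) and keeps the candidate whose smallest pairwise point spacing is largest. Random search is the simplest maximin strategy; dedicated optimizers (e.g. exchanging points within columns) usually find better-spread designs. The function names and the random hold-out split into 7 build and 3 validation runs per parameter are illustrative assumptions.

```python
import numpy as np


def latin_hypercube(n_runs, n_params, rng):
    """Random Latin hypercube on [0, 1)^n_params with one point per interval."""
    # Column j: a random permutation of the N interval indices,
    # jittered uniformly within each interval.
    perms = np.column_stack([rng.permutation(n_runs) for _ in range(n_params)])
    return (perms + rng.random((n_runs, n_params))) / n_runs


def min_pairwise_distance(x):
    """Smallest Euclidean distance between any two design points."""
    diff = x[:, None, :] - x[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    upper = np.triu_indices(len(x), k=1)
    return dist[upper].min()


def maximin_lhs(n_params, runs_per_param=10, n_candidates=200, seed=0):
    """Keep the candidate Latin hypercube whose minimum point spacing is largest."""
    rng = np.random.default_rng(seed)
    n_runs = runs_per_param * n_params
    best, best_score = None, -np.inf
    for _ in range(n_candidates):
        candidate = latin_hypercube(n_runs, n_params, rng)
        score = min_pairwise_distance(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best


def split_build_validate(design, n_params, build_per_param=7, seed=1):
    """Randomly hold out 3 runs per parameter for emulator validation."""
    order = np.random.default_rng(seed).permutation(len(design))
    n_build = build_per_param * n_params
    return design[order[:n_build]], design[order[n_build:]]


design = maximin_lhs(n_params=8)  # standard set: 80 runs
build, validate = split_build_validate(design, n_params=8)
print(design.shape, build.shape, validate.shape)  # (80, 8) (56, 8) (24, 8)
```

Each column of `design` lies in [0, 1) and still has to be mapped onto the physical range of the corresponding parameter before the simulations are configured.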
The three options involve varying all eight (standard set), five (reduced set), or three (minimum set) of the uncertain parameters, resulting in ensembles of 80 (standard), 50 (reduced), or 30 (minimum) PPE members. The parameters to be varied are listed in Table 10 and include variables related to the volcanic injection, such as its magnitude, height, latitudinal extent, and composition, and to the life cycle of the volcanic sulfate, such as the sedimentation rate, its microphysical evolution, and the SO2 to SO4^2− conversion rate. Before performing the full PPE, modelling groups are encouraged to run "one-at-a-time" (OAT) test runs with each process parameter increased/decreased to its maximum/minimum value. Submission of these OAT test runs is encouraged (following the naming convention in Table 11): as well as being an important check that the model parameter scaling is implemented as intended, the results will enable intercomparison of single-parameter effects between participating models ahead of the full ensemble. When imposing the parameter scalings, models must enact the change only in grid boxes with volcanically enhanced air masses. These can be identified either via a total sulfur volume mixing ratio threshold suitable for the particular model or via the passive tracer Volc recommended in Sect. 3.3.3. Restricting the perturbation to the Pinatubo sulfur leaves pre-eruption conditions and tropospheric aerosol properties unchanged, ensuring a clean "uncertainty pdf" for the "volcanic forcing".
Verifying that this restriction of the parameter scalings works as intended is an important preparatory exercise and must be completed when running the OAT test runs.
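Turning one Latin hypercube row into model settings involves three steps: mapping unit-cube values onto the Table 10 ranges, converting the 0-1 latitude-spread factor into per-bin injection fractions, and applying process scalings only in volcanically enhanced grid boxes. The sketch below illustrates all three under stated assumptions. The parameter names are hypothetical. Mapping the 0.5-2 multiplicative factors log-uniformly (so that a factor of 1 is the median) is an assumption, not a protocol requirement. The latitude bins are assumed to be ordered from the equator northward, with the last bin adjacent to 15° N, and the linear interpolation follows the description in the Table 10 caption.

```python
import numpy as np

# Ranges from Table 10; parameter names are hypothetical.
LINEAR_RANGES = {
    "so2_mass_tg_s": (5.0, 10.0),
    "injection_midpoint_km": (18.0, 30.0),
    "latitude_extent": (0.0, 1.0),
    "subgrid_h2so4_fraction": (0.0, 0.10),
}
# Multiplicative factors 0.5-2, mapped log-uniformly (an assumption) so 1 is the median.
FACTOR_RANGES = {
    "sedimentation_factor": (0.5, 2.0),
    "so2_oxidation_factor": (0.5, 2.0),
    "nucleation_factor": (0.5, 2.0),
    "coagulation_factor": (0.5, 2.0),
}
# Table 10 order: the minimum set is the first three, the reduced set the first five.
STANDARD_ORDER = ("so2_mass_tg_s", "injection_midpoint_km", "latitude_extent",
                  "sedimentation_factor", "so2_oxidation_factor", "nucleation_factor",
                  "subgrid_h2so4_fraction", "coagulation_factor")


def to_physical(unit_row, order=STANDARD_ORDER):
    """Map one Latin hypercube row in [0, 1) to physical parameter values."""
    values = {}
    for name, u in zip(order, unit_row):
        if name in FACTOR_RANGES:
            lo, hi = FACTOR_RANGES[name]
            values[name] = lo * (hi / lo) ** u
        else:
            lo, hi = LINEAR_RANGES[name]
            values[name] = lo + (hi - lo) * u
    return values


def latitude_fractions(extent, n_bins):
    """Interpolate between a one-box injection next to 15N and an even spread."""
    one_box = np.zeros(n_bins)
    one_box[-1] = 1.0
    even = np.full(n_bins, 1.0 / n_bins)
    return (1.0 - extent) * one_box + extent * even


def scale_where_volcanic(rate, factor, volc_tracer, threshold):
    """Apply a process scaling only in grid boxes flagged as volcanically enhanced."""
    return np.where(volc_tracer > threshold, rate * factor, rate)


params = to_physical([0.5] * 8)
print(params["sedimentation_factor"], latitude_fractions(0.3, 6))
```

For a 2.5° grid the six bins between 0 and 15° N reproduce the end members given in the Table 10 caption, and the masking threshold (on total sulfur or on the passive tracer Volc) is the model-specific quantity that the OAT test runs should verify.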
Once a modelling group has performed the PPE as defined by the Latin hypercube, a statistical analysis will be performed. Emulators for each of a selection of key metrics will be built, following the approach described by Lee et al. (2011), to examine how the parameters lead to uncertainty in key features of the Pinatubo-perturbed stratospheric aerosol. An emulator is a statistical model relating the ensemble design to the key model output; once validated, it allows the whole parameter space to be sampled to derive a PDF of each key model output.
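To illustrate the emulator step, the sketch below implements a minimal Gaussian-process mean emulator in NumPy with a squared-exponential kernel, a constant mean, and fixed hyperparameters. It is a simplified stand-in for the approach of Lee et al. (2011), not a reproduction of it: a production emulator would estimate the length scales and variance (e.g. by maximum likelihood) and would also report predictive variance. The class name, the fixed hyperparameter values, and the toy two-parameter metric standing in for a model output such as peak AOD are all assumptions for illustration.

```python
import numpy as np


def sq_exp_kernel(a, b, length_scale, variance):
    """Squared-exponential covariance between two sets of points."""
    d2 = (((a[:, None, :] - b[None, :, :]) / length_scale) ** 2).sum(axis=-1)
    return variance * np.exp(-0.5 * d2)


class GPEmulator:
    """Minimal Gaussian-process mean emulator with fixed hyperparameters."""

    def __init__(self, length_scale=0.3, variance=1.0, nugget=1e-6):
        self.length_scale = length_scale
        self.variance = variance
        self.nugget = nugget

    def fit(self, x, y):
        self.x = np.asarray(x, dtype=float)
        y = np.asarray(y, dtype=float)
        self.y_mean = y.mean()
        k = sq_exp_kernel(self.x, self.x, self.length_scale, self.variance)
        chol = np.linalg.cholesky(k + self.nugget * np.eye(len(self.x)))
        # Solve K alpha = (y - mean) with the Cholesky factor K = L L^T.
        self.alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y - self.y_mean))
        return self

    def predict(self, x_new):
        k_star = sq_exp_kernel(np.asarray(x_new, dtype=float), self.x,
                               self.length_scale, self.variance)
        return self.y_mean + k_star @ self.alpha


def toy_metric(x):
    """Hypothetical stand-in for a key model output on a 2-parameter design."""
    return np.sin(3.0 * x[:, 0]) + x[:, 1] ** 2


grid = np.linspace(0.0, 1.0, 8)
x_build = np.array([[a, b] for a in grid for b in grid])
emulator = GPEmulator().fit(x_build, toy_metric(x_build))
rng = np.random.default_rng(0)
x_check = 0.05 + 0.9 * rng.random((24, 2))  # held-out "validation runs"
print(np.abs(emulator.predict(x_check) - toy_metric(x_check)).max())
samples = emulator.predict(rng.random((10000, 2)))  # emulated output distribution
print(np.percentile(samples, [5, 50, 95]))
```

Validation against held-out runs comes first; only once the emulator reproduces those runs adequately is it sampled densely to build the PDF of the output.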
Table 10. Uncertain parameters perturbed in the PoEMS PPE, the parameter sets (minimum, reduced, standard) in which each is varied, and the uncertainty range to be sampled. Groups will need to translate the 0-1 latitude-spread parameter into a sequence of fractional injections into all grid boxes between the equator and 15° N. For example, for a model with 2.5° latitude resolution, the relative injection in the six latitude bins between 0 and 15° N would take the form [0, 0, 0, 0, 0, 1] for extent factor = 0 and [0.167, 0.167, 0.167, 0.167, 0.167, 0.167] for extent factor = 1. Injection ratios for intermediate values of the spread factor would be calculated by interpolation between these two end-member cases.
1. Injected SO2 mass (minimum, reduced, and standard sets): 5-10 Tg S
2. Mid-point height of a 3 km thick injection (minimum, reduced, and standard sets): 18-30 km
3. Latitudinal extent of the injection (minimum, reduced, and standard sets): factor 0-1, varying from a one-box injection at 15° N (factor = 0) to an equator-to-15° N injection (factor = 1)
4. Sedimentation velocity (reduced and standard sets): multiply the model-calculated velocity by a factor of 0.5 to 2
5. SO2 oxidation scaling (reduced and standard sets): scale the gas-phase oxidation of SO2 by a factor of 0.5 to 2
6. Nucleation rate of sulfate particles (standard set): scale the model-calculated rate by a factor of 0.5 to 2
7. Sub-grid particle formation factor (standard set): emit a fraction (0 to 10 %) of the SO2 as sulfuric acid particles formed at the sub-grid scale
8. Coagulation rate (standard set): scale the model-calculated rate by a factor of 0.5 to 2
Variance-based sensitivity analysis will then be used to decompose the resulting probability distribution into its sources, providing information on the key sources of uncertainty in any model output. The two sensitivity indices of interest are the main effect and the total effect. The main effect measures the percentage of uncertainty in the simulated metric due to each parameter variation individually. The total effect measures the percentage of uncertainty in the key model output due to each parameter, including the additional contribution from its interactions with other uncertain parameters. The sources of parametric uncertainty (i.e. the sensitivity indices) will be identified for each model and discussed with each group to check the results. By then comparing the sensitivity to the uncertain parameters across the participating models, we will learn how the models' differing treatments of aerosol processes, together with the dynamical and chemical processes resolved in the host model, determine the uncertainty in the predicted Pinatubo radiative forcings. The probability distributions of observable key model outputs will also be compared with observations in order to constrain the key sources of uncertainty and thereby reduce the parametric uncertainty in individual models. The resulting model constraints will be compared between models, quantifying both parametric and structural uncertainty for key variables such as AOD, effective radius, and radiative flux anomalies. This sensitivity analysis will also identify the variables for which better observational constraints would yield the greatest reduction in model uncertainties.

Conclusions
The ISA-MIP experiments will improve our understanding of stratospheric aerosol processes, chemistry, and dynamics and will constrain the climate impacts of background aerosol "variability", small volcanic eruptions, and large volcanic eruptions. The experiments will also help to resolve some disagreements among global aerosol models, for instance the difference in volcanic SO2 forcing efficacy for Pinatubo (see Sect. 3.3.2). The results of this work will help to constrain the contribution of stratospheric aerosols to the early-21st-century global warming hiatus, the effects of hypothetical geoengineering schemes, and other climate processes influenced by the stratosphere. Overall, the experiments provide an excellent opportunity to answer some of these questions as part of the wider WCRP SPARC and CMIP6 efforts. For example, the CMIP6 Geoengineering Model Intercomparison Project (GeoMIP; Kravitz et al., 2015) investigates how climate models treat various geoengineering scenarios, some of them based on sulfate aerosols. However, there is a large inter-model spread in the cooling efficiency of sulfate aerosol, i.e. the normalized cooling per unit of injected sulfur (Moriyama et al., 2016). ISA-MIP is therefore of special importance for GeoMIP, as it could help to understand the reasons for these uncertainties, to better constrain the forcing efficiency, and to improve future scenarios. Furthermore, it is not yet clear whether the large inter-model spread of the CMIP5 models in the simulated post-volcanic climate response depends mostly on uncertainties in the imposed volcanic forcing or on an insufficient representation of climate processes. To discriminate between the individual uncertainty factors, it is useful to develop standardized experiments that systematically address specific uncertainty factors.
Hence, ISA-MIP, which covers the uncertainties in the pathway from the eruption source to the volcanic radiative forcing, will complement the CMIP6 VolMIP project (Zanchettin et al., 2016), which addresses the pathway from the forcing to the climate response and its feedbacks by studying the uncertainties in the post-volcanic climate response to a well-defined volcanic forcing. ISA-MIP also complements the Chemistry-Climate Model Initiative (CCMI; Eyring et al., 2013), the Aerosol Comparison (AeroCom) initiative, and the Aerosol Chemistry Model Intercomparison Project (AerChemMIP; Collins et al., 2017), since it concentrates on stratospheric aerosol, which is not the main focus of any of these activities.
As well as identifying areas of agreement and disagreement among models of differing complexity in top-level comparisons focussing on fields such as zonal-mean mid-visible AOD and extinction profiles at different latitudes, ISA-MIP also intends to explore relationships between key parameters. For example, how does sulfate deposition to the polar ice sheets relate to volcanic forcing in the different interactive stratospheric aerosol models that predict the transport and sedimentation of the particles? Or how do model "spectral extinction curves" evolve through the different volcanically perturbed periods, and how do they relate to the simulated effective radius compared with the effective radius derived using the theoretical approach of Stothers (1997, 2001)? There is also considerable potential to apply the model uncertainty analysis to inform our confidence in volcanic forcings derived from ice-core and astronomical measurements for eruptions that predate the in situ measurement era.
Code and data availability. The model output from all the simulations described in this paper will be distributed through the World Data Center for Climate (WDCC; https://cera-www.dkrz.de/WDCC/ui/cerasearch/) with digital object identifiers (DOIs) assigned. The model output will be freely accessible through this data portal after registration.
Author contributions. CT, GWM, VA, RH, LAL, AS, CB, SC, MC, SSD, TD, JME, MJM, RN, JXS, MT, and DW designed the experiments. CT and GWM co-ordinated the writing and drafted the paper. All authors contributed to the writing and approved the final version of the paper.
Competing interests. The authors declare that they have no conflict of interest.
Acknowledgements. The authors thank their SSiRC colleagues for continuing support and discussion. We acknowledge the scientific guidance (and sponsorship) of the World Climate Research Programme in motivating this work, which is co-ordinated in the framework of SPARC. Claudia Timmreck, Matthew Toohey and Rene Hommel acknowledge support from the German Federal Ministry of Education and Research (BMBF) through the research programme "MiKlip" (FKZ: 01LP1517 (CT) / 01LP1130B (MT)) and ROMIC-ROSA (FKZ: 01LG1212A (RH)). Claudia Timmreck is also supported by the European Union project StratoClim (FP7-ENV.2013.6.1-2). Christoph Brühl's PhD student Jennifer Schallock, who contributed to the compilation of the volcano inventory, is also supported by StratoClim. Anja Schmidt was funded by an Academic Research Fellowship from the School of Earth and Environment, University of Leeds, and by NERC grant NE/N006038/1. Matthew Toohey acknowledges support from the Deutsche Forschungsgemeinschaft (DFG) in the framework of the priority programme "Antarctic Research with comparative investigations in Arctic ice areas" through grant TO 967/1-1. The National Center for Atmospheric Research is funded by the National Science Foundation. Lindsay Lee is a Leverhulme Early Career Fellow funded under the Leverhulme Trust grant ECF-2014-524.
The article processing charges for this open-access publication were covered by the Max Planck Society.
Edited by: Slimane Bekki
Reviewed by: two anonymous referees