Solar Forcing for CMIP6 (v3.1)

This paper describes the recommended solar forcing dataset for CMIP6 and highlights changes with respect to CMIP5. The solar forcing is provided for radiative properties, namely total solar irradiance (TSI), solar spectral irradiance (SSI), and the F10.7 index as well as particle forcing, including geomagnetic indices Ap and Kp, and ionization rates to account for effects of solar protons, electrons, and galactic cosmic rays. This is the first time that a recommendation for solar-driven particle forcing has been provided for a CMIP exercise. The solar forcing datasets are provided at daily and monthly resolution separately for the CMIP6 preindustrial control, historical (1850–2014), and future (2015–2300) simulations. For the preindustrial control simulation, both constant and time-varying solar forcing components are provided, with the latter including variability on 11-year and shorter timescales but no long-term changes. For the future, we provide a realistic scenario of what solar behavior could be, as well as an additional extreme Maunder-minimum-like sensitivity scenario. This paper describes the forcing datasets and also provides detailed recommendations as to their implementation in current climate models. 
 
For the historical simulations, the TSI and SSI time series are defined as the average of two solar irradiance models that are adapted to CMIP6 needs: an empirical one (NRLTSI2–NRLSSI2) and a semi-empirical one (SATIRE). A new and lower TSI value is recommended: the contemporary solar-cycle average is now 1361.0 W m−2. The slight negative trend in TSI over the three most recent solar cycles in the CMIP6 dataset leads to only a small global radiative forcing of −0.04 W m−2. In the 200–400 nm wavelength range, which is important for ozone photochemistry, the CMIP6 solar forcing dataset shows a larger solar-cycle variability contribution to TSI than in CMIP5 (50 % compared to 35 %). 
 
We compare the climatic effects of the CMIP6 solar forcing dataset to its CMIP5 predecessor by using time-slice experiments of two chemistry–climate models and a reference radiative transfer model. The differences in the long-term mean SSI in the CMIP6 dataset, compared to CMIP5, impact on climatological stratospheric conditions (lower shortwave heating rates of −0.35 K day−1 at the stratopause), cooler stratospheric temperatures (−1.5 K in the upper stratosphere), lower ozone abundances in the lower stratosphere (−3 %), and higher ozone abundances (+1.5 % in the upper stratosphere and lower mesosphere). Between the maximum and minimum phases of the 11-year solar cycle, there is an increase in shortwave heating rates (+0.2 K day−1 at the stratopause), temperatures ( ∼  1 K at the stratopause), and ozone (+2.5 % in the upper stratosphere) in the tropical upper stratosphere using the CMIP6 forcing dataset. This solar-cycle response is slightly larger, but not statistically significantly different from that for the CMIP5 forcing dataset. 
 
CMIP6 models with a well-resolved shortwave radiation scheme are encouraged to prescribe SSI changes and include solar-induced stratospheric ozone variations, in order to better represent solar climate variability compared to models that only prescribe TSI and/or exclude the solar-ozone response. We show that monthly-mean solar-induced ozone variations are implicitly included in the SPARC/CCMI CMIP6 Ozone Database for historical simulations, which is derived from transient chemistry–climate model simulations and has been developed for climate models that do not calculate ozone interactively. CMIP6 models without chemistry that perform a preindustrial control simulation with time-varying solar forcing will need to use a modified version of the SPARC/CCMI Ozone Database that includes solar variability. CMIP6 models with interactive chemistry are also encouraged to use the particle forcing datasets, which will allow the potential long-term effects of particles to be addressed for the first time. The consideration of particle forcing has been shown to significantly improve the representation of reactive nitrogen and ozone variability in the polar middle atmosphere, eventually resulting in further improvements in the representation of solar climate variability in global models.

future (2015-2300) simulations. For the preindustrial control simulation, both constant and time-varying solar forcing components are provided, with the latter including variability on 11-year and shorter timescales but no long-term changes. For the future, we provide a realistic scenario of what solar behavior could be, as well as an additional extreme Maunderminimum-like sensitivity scenario. This paper describes the forcing datasets and also provides detailed recommendations as to their implementation in current climate models.
For the historical simulations, the TSI and SSI time series are defined as the average of two solar irradiance models that are adapted to CMIP6 needs: an empirical one (NRLTSI2-NRLSSI2) and a semi-empirical one (SATIRE). A new and lower TSI value is recommended: the contemporary solar-cycle average is now 1361.0 W m −2 . The slight negative trend in TSI over the three most recent solar cycles in the CMIP6 dataset leads to only a small global radiative forcing of −0.04 W m −2 . In the 200-400 nm wavelength range, which is important for ozone photochemistry, the CMIP6 solar forcing dataset shows a larger solar-cycle variability contribution to TSI than in CMIP5 (50 % compared to 35 %).
We compare the climatic effects of the CMIP6 solar forcing dataset to its CMIP5 predecessor by using timeslice experiments of two chemistry-climate models and a reference radiative transfer model. The differences in the long-term mean SSI in the CMIP6 dataset, compared to CMIP5, impact on climatological stratospheric conditions (lower shortwave heating rates of −0.35 K day −1 at the stratopause), cooler stratospheric temperatures (−1.5 K in the upper stratosphere), lower ozone abundances in the lower stratosphere (−3 %), and higher ozone abundances (+1.5 % in the upper stratosphere and lower mesosphere). Between the maximum and minimum phases of the 11-year solar cycle, there is an increase in shortwave heating rates (+0.2 K day −1 at the stratopause), temperatures (∼ 1 K at the stratopause), and ozone (+2.5 % in the upper stratosphere) in the tropical upper stratosphere using the CMIP6 forcing dataset. This solar-cycle response is slightly larger, but not statistically significantly different from that for the CMIP5 forcing dataset.
CMIP6 models with a well-resolved shortwave radiation scheme are encouraged to prescribe SSI changes and include solar-induced stratospheric ozone variations, in order to better represent solar climate variability compared to models that only prescribe TSI and/or exclude the solarozone response. We show that monthly-mean solar-induced ozone variations are implicitly included in the SPARC/CCMI CMIP6 Ozone Database for historical simulations, which is derived from transient chemistry-climate model simulations and has been developed for climate models that do not calculate ozone interactively. CMIP6 models without chemistry that perform a preindustrial control simulation with time-varying solar forcing will need to use a modified version of the SPARC/CCMI Ozone Database that includes solar variability. CMIP6 models with interactive chemistry are also encouraged to use the particle forcing datasets, which will allow the potential long-term effects of particles to be addressed for the first time. The consideration of particle forcing has been shown to significantly improve the representation of reactive nitrogen and ozone variability in the polar middle atmosphere, eventually resulting in further improvements in the representation of solar climate variability in global models.

Introduction
Solar variability affects the Earth's atmosphere in numerous and often intricate ways through changes in the radiative and energetic particle forcing (Lilensten et al., 2015). For many years, the role of the Sun in climate model simulations was reduced to its sole total radiative output, named total solar irradiance (TSI), and this situation prevailed in the assessment reports of the IPCC until 2007 (Alley et al., 2007). However, there has been growing evidence that other aspects of solar variability are major players for climate, in particular solar spectral irradiance (SSI) variations and, more recently, energetic particle precipitation (EPP).
For about a decade, studies involving stratospheric resolving (chemistry) climate models have included SSI variations (e.g., Haigh, 1996;Matthes et al., 2003Matthes et al., , 2006Austin et al., 2008;Gray et al., 2010). Whereas relative TSI variations in the 11-year solar cycle are small, about 0.1 %, SSI changes are wavelength-dependent, and may vary by up to 10 % at 200 nm in the ultraviolet (UV) wavelength range (Lean, 1997). Variations in UV radiation over the solar cycle have significant impacts on the radiative heating and ozone budget of the middle atmosphere (Haigh, 1994).
Through dynamical feedback mechanisms, solar forcing can also influence the lower atmosphere and the ocean (e.g., Gray et al., 2010). Therefore, its importance is becoming increasingly evident, in particular for regional climate variability (e.g., Gray et al., 2010;. Together with volcanic activity, solar variability is an important external source of natural climate variability. Because of its prominent 11-year cycle, solar variability on timescales of years and beyond may offer a degree of predictability for regional climate and could therefore help reduce uncertainties in decadal climate predictions. However, there are still uncertainties in the observed atmospheric signals of solar variability (Mitchell et al., 2015a) and its transfer mechanism(s) to the surface. Proposed transfer mechanisms include changes in TSI and SSI, as well as in solar-driven energetic particles (e.g., . In addition, recent work suggests a lagged response in the North Atlantic and European sector due to atmosphere-ocean coupling (e.g., Gray et al., 2013;Scaife et al., 2013), as well as a synchronization of decadal variability in the North Atlantic Oscillation (NAO) by the solar cycle (Thieblemont et al., 2015). Lagged responses have been also attributed to particle effects , and hence the observed solar surface signal could be a combination of topdown solar UV and particle mechanisms as well as bottomup atmosphere-ocean mechanisms.
Since some of the climate models that were run under the previous fifth Coupled Model Intercomparison Project (CMIP5) included the stratosphere and the mesosphere for the first time, and were thus able to capture the so-called "top-down" mechanism for solar-climate coupling, both TSI and SSI variations were recommended by the WCRP/SPARC SOLARIS-HEPPA activity (http://solarisheppa.geomar.de/ cmip5). Recent modeling efforts have made progress in defining the prerequisites to simulate solar influence on regional climate more realistically (e.g., Gray et al., 2013;Scaife et al., 2013;Thieblemont et al., 2015), but the lessons learned from CMIP5 show that a more process-based analysis of climate models within CMIP6 is required to better understand the differences in model responses to solar forcing (e.g., Mitchell et al., 2015b;Misios et al., 2016;Hood et al., 2015). In particular, the role of solar-induced ozone changes and the need for a suitable resolution of climate model radiation schemes to capture SSI variations is becoming increasingly evident, and will be touched upon in this paper. In addition we will, for the first time, provide the solar-driven energetic particle forcing together and consistent with the radiative forcing.
The quantitative assessment of radiative solar forcing has been systematically hampered so far by the large uncertainties and the instrumental artifacts that plague SSI observations, and to a lesser degree TSI observations (e.g., Ermolli et al., 2013;Solanki et al., 2013). Another problem is the sparsity of the observations, which only started in the late 1970s with the satellite era. These problems have deprived us of the hindsight that is needed to properly assess variations on timescales that are relevant for climate studies. Another issue is the uncertainty regarding their absolute level. Since CMIP5, the nominal TSI has been reduced to 1361.0 ± 0.5 W m −2 (see Prša et al., 2016, and also Kopp and Lean, 2011). This adjustment has inevitable implications for understanding the Earth's radiation budget.
On multidecadal timescales, proxy reconstructions of solar activity reveal occasional phases of unusually low or high solar activity, which are respectively called grand solar minima and maxima (Usoskin et al., 2014). Of particular interest in this regard is the future evolution of long-term solar activity. Solar activity reached unusually high levels in the second half of the twentieth century, so that one could expect subsequent activity to fall back to levels closer to the historical mean, an expectation buttressed by the low amplitude of current activity cycle 24. Moreover, some recent empirical long-term forecast even predict a phase of very low activity in the second half of the twenty-first century -perhaps akin to the 1645-1715Maunder Minimum (Abreu et al., 2008Barnard et al., 2011;Steinhilber et al., 2012). However, how deep and how long such phase of low solar activity would be is still largely uncertain. Recent studies have investigated the climate impacts of a large reduction in solar forcing over the 21st century, revealing only a small impact on a global scale (Feulner and Rahmstorf, 2010;Anet et al., 2013;Meehl et al., 2013). However, a systematic assessment of the regional impacts of a more realistic future solar forcing is still to be done. For example, on regional scales, a future grand solar minimum could potentially reduce Arctic amplification  and reduce long-term warming trends over western Europe .
The above-mentioned uncertainty in the SSI is particularly challenging in the UV band (Ermolli et al., 2013). All climate model intercomparison studies relied so far on the NRLSSI1 dataset (Lean, 2000). However, it is becoming increasingly evident that its solar-cycle variability in the UV part of the spectrum may be too low compared to updated and more recent SSI reconstructions by models such as NRLSSI2 (Coddington et al., 2016) and SATIRE (Yeo et al., 2014). Recent studies have emphasized the sensitivity to UV forcing changes due to top-down effects (Ermolli et al., 2013;Langematz et al., 2013;Thieblemont et al., 2015;Maycock et al., 2015;Ball et al., 2016), thereby stressing the need for a state-of-the-art representation of the SSI, and in particular the UV band, in the CMIP6 solar forcing recommendation. For that reason, we will focus on the SSI uncertainty and possible impacts of the higher SSI variability in CMIP6 with respect to the CMIP5 solar forcing recommendation (http://solarisheppa.geomar.de/cmip5).
Analysis of model simulations and observations have shown a response of global surface temperature to TSI variations over the 11-year solar cycle of about 0.1 K Misios et al., 2016). However, the observed lag and the spatial pattern of the solar-cycle response are poorly represented in CMIP5 models (e.g., Mitchell et al., 2015b;Misios et al., 2016;Hood et al., 2015).
In addition, Gray et al. (2010) report that previous longterm variations in solar forcing used in some experiments (Alley et al., 2007) may be too weak due to an unfortunate choice of epoch (around 1750) for the preindustrial (PI) solar forcing, as this was a period of relatively high solar activity.
More recently, it has become better established that there is a solar response in the Arctic Oscillation (AO) and NAO from the top-down mechanism (Shindell et al., 2001;Kodera, 2002;Matthes et al., 2006;Woollings et al., 2010;Lockwood et al., 2010;Ineson et al., 2011;Langematz et al., 2013;Maycock et al., 2015;Thieblemont et al., 2015). Earlier models often employed a lower vertical domain, missing key physical processes by which solar signals in the stratosphere couple to surface winter climate. However, some of the more recent studies using stratosphere-resolving (chemistry) climate models confirm a stratospheric downward influence on the NAO from solar variability, which is particularly associated with changes in UV radiation and possibly through interaction with stratospheric ozone (e.g., Matthes et al., 2006;Ineson et al., 2011;Chiodo et al., 2012;Langematz et al., 2013;Thieblemont et al., 2015;. Some of these studies also suggest weaker model responses than are apparent in observations, although with large uncertainty (e.g., Gray et al., 2013;Scaife et al., 2013).
Another very important solar forcing mechanism after electromagnetic radiation is energetic particle precipitation Lilensten et al., 2015). Although the impact of EPP on the atmosphere is well documented, it had been ignored in solar forcing recommendations for earlier phases of CMIP. The term EPP encompasses particles with very different origins: solar, magnetospheric, and from beyond the solar system. These particles are mainly protons and electrons, and occasionally α-particles and heavier ions.
Solar protons with energies of 1 MeV to several hundred mega-electron volts are accelerated in interplanetary space during large solar perturbations called coronal mass ejections (Reames, 1999;Richardson and Cane, 2010). These sporadic events, also known as solar proton events (SPEs), are associated with the presence of complex sunspots, and are therefore more frequent during solar maximum.
Auroral electrons originate from the Earth's magnetosphere, and are accelerated to energies of 1-30 keV during auroral substorms (Fang et al., 2008). Sudden enhancements of their flux occur during geomagnetic active periods, which are more frequent 1-2 years after the peak of the 11-year solar cycle. Medium-energy electrons are accelerated to energies of a few hundred keV during geomagnetic storms in the terrestrial radiation belts (Horne et al., 2009). Precipitation of medium-energy electrons can be triggered by both solar coronal mass ejections and high-speed solar wind streams, leading to more frequent events near solar maximum and during the declining phase of the solar cycle. Particle precipitation, regardless of its origin, is thus modulated by solar activity, and varies with the solar cycle. However, these intermittent variations take place on different timescales, and at regions of varying altitude. Their sources and variability have recently been reviewed by Mironova et al. (2015).
EPP affects the ionization levels in the polar middle and upper atmosphere, leading to significant changes of the chemical composition. In particular, the production of odd nitrogen and odd hydrogen species causes changes in ozone abundances via catalytic cycles, potentially affecting temperature and winds (see, e.g., the review by Sinnhuber et al., 2012). Recent model studies and the analysis of meteorological data have provided evidence for a dynamical coupling of this signal to the lower atmosphere, leading to particleinduced surface climate variations on a regional scale (e.g., Seppälä et al., 2009;Baumgaertner et al., 2011;Rozanov et al., 2012;Maliniemi et al., 2014).
The third and most energetic component of EPP is represented by galactic cosmic rays (GCRs), which mainly consist of protons with energies ranging from hundreds of megaelectron volts to tera-electron volts. This continuous flux of particles is the main source of ionization in the troposphere and lower stratosphere. GCRs are deflected by the solar magnetic field, and hence their flux is anticorrelated with the solar cycle. Laboratory-based studies have confirmed the existence of ion-mediated aerosol formation and growth rates; however, the connection between GCR ionization and cloud production, and therefore convection, may be weak (Dunne et al., 2016) but this is still under debate. Meanwhile, the chemical impact via ozone-depleting catalytic cycles and subsequent dynamical forcing is rather well understood Rozanov et al., 2012).
The effect of various components of EPP on surface climate is an emerging research topic. However, the particle impact on regional climate may add to that of the UV forcing . One of the major challenges here is to quantify the long-term climate impact of such local and mostly intermittent particle precipitations.
The uncertainties in the solar forcing itself are compounded by possible errors in the simulated climate response to this forcing in models (e.g., Stott et al., 2003;Scaife et al., 2013). Possible errors in climate model responses could be related to biases in the representation of dynamical processes and dynamical variability, the inability of model radiation schemes to properly resolve SSI changes (Forster et al., 2011), or to the missing or inadequate representation of UV and particle-induced ozone signals (Hood et al., 2015). Any comparison of climate model simulations with observations could be affected by a combination of these possible sources of error. In addition, the comparison of models with observations is inhibited by the insufficient length of the observational records, and in some cases model simulations.
This paper will provide the first complete overview on solar forcing (radiative, particle, and ozone forcing) recommendations for CMIP6 from preindustrial times to the future and provides in this respect an advance to earlier model intercomparison projects (CMIP5, CCMVal, CCMI) as it gives a complete and state-of-the-art overview on our current understanding of solar variability and provides the dataset in a user-friendly way.
Section 2 presents the historical to present-day solar forcing dataset with individual subsections on solar irradiance (Sect. 2.1) and particle forcing (Sect. 2.2). Section 3 provides a description of the future solar forcing recommendation, Sect. 4 describes the PI control forcing, and finally Sect. 5 is comprised of a description of the solar induced ozone signal. A summary with respect to differences to the CMIP5 recommendation is given in Sect. 6.
2 Historical (to present) forcing data  In this section we first describe the solar irradiance dataset (including the TSI, the F10.7 decimetric radio index, and the SSI; see Sect. 2.1) and subsequently address the energetic particle datasets (including solar protons, auroral electrons, medium-energy electrons, and galactic cosmic rays; see Sect. 2.2).

Solar irradiance (TSI, SSI, and F10.7)
This subsection starts with a description of the available TSI and SSI datasets from two different solar irradiance models (NRLSSI and SATIRE), and one observational estimate (SOLID), before introducing the CMIP6 recommendation. Afterwards a recommendation on how to implement the solar irradiance forcing in CMIP6 models is provided. An evaluation of the comparison between different SSI forcing datasets, with a focus on CMIP5 and CMIP6 solar irradiance recommendations in a line-by-line model and two state-ofthe-art CCMs (i.e., CESM1(WACCM) and EMAC), is performed at the end to highlight the effects of solar irradiance variability on the atmosphere and possible effects on atmospheric dynamics all the way to the ocean.

Description of solar irradiance datasets NRLTSI2 and NRLSSI2
The Naval Research Laboratory (NRL) family of SSI models (Lean, 2000;Lean et al., 2011) is based on the premise that changes in solar irradiance from background-quiet Sun conditions can be described by a balance between bright facular and dark sunspot features on the solar disk. These two contributions are determined by linear regression between solar proxies, and direct observations of TSI and SSI by satellite missions such as SORCE (Rottman, 2005). These models are thus empirical.
Both the TSI and the SSI consist of a baseline solar contribution, with a wavelength-dependent contribution. The Magnesium (MgII) index, for example, represents the contribution of bright faculae, whereas the sunspot area represents the contribution of sunspots. The time dependency in TSI and SSI thus emerges from the temporal variability in the solar proxies. SORCE measurements at solar-minimum conditions are the basis for the adopted quiet Sun irradiance (Kopp and Lean, 2011) The recently updated version of the NRL models, named NRLTSI2 (for TSI) and NRLSSI2 (for SSI), have been transitioned to the National Centers for Environmental Information (NCEI) as part of their Climate Data Record (CDR) program (see http://www.ngdc.noaa.gov), and operational updates are provided on a near-quarterly basis. Coddington et al. (2016) describe the model algorithm, the uncertainty estimation approach, and comparisons to observations in detail. Please note that our version differs slightly from the one published by Coddington et al. (2016) by using a different scaling factor between the sunspot area as measured by the Royal Greenwich Observatory (from 1874 to 1976) and the NOAA/USAF Solar Observing Optical Network (SOON) since 1966. The future release of NRLSSI2 will use the same scaling factor as the version we use.
In NRLSSI2, a multiple linear regression approach of solar proxy inputs with observations of TSI from SORCE/TIM (Kopp et al., 2005), and observations of SSI from the SORCE/SOLSTICE (McClintock et al., 2005) and SORCE/SIM (Harder et al., 2005) instruments is used to determine the scaling coefficients that convert the proxy indices to irradiance variability. Because the wavelength-dependent scaling coefficients used in the NRLSSI2 model are derived for solar rotation timescales (i.e., the SSI observations and the proxy indices are detrended with an 81-day run-ning mean), concerns with respect to the long-term stability of the SORCE SSI observations (Lean and DeLand, 2012) are not shared with regard to the SORCE TSI record. However, because regression coefficients derived from detrended SSI time series differ from those developed from nondetrended SSI time series, a further adjustment is required to extend the SSI variability from solar-rotational to solar-cycle timescales. In NRLSSI2, this adjustment is made by a linear scaling that is constrained by the TSI variability. This adjustment is made in the separate facular and sunspot proxy records and the magnitude of the adjustment is smaller than the assumed uncertainty in the proxy indices themselves. In this approach, the integral of the SSI tracks the TSI; however, the relative facular and sunspot contributions at any given wavelength are not constrained to match their specific TSI contributions.
The NRLTSI2 and NRLSSI2 irradiances also include a speculated long-term facular contribution that produces a secular (i.e., underlying the solar activity cycle) net increase in irradiance from a small accumulation of total magnetic flux. This secular impact is specific to historical timescales (i.e., prior to 1950) and is consistent with simulations from a magnetic flux transport model .

SATIRE
The SATIRE (spectral and total irradiance reconstruction) family of semi-empirical models assumes that the changes in the solar spectral irradiance are driven by the evolution of the photospheric magnetic field (Fligge et al., 2000;Krivova et al., 2003Krivova et al., , 2011. The model makes use of the calculated intensity spectra of the quiet Sun, faculae, and sunspots generated from model solar atmospheres with a radiative transfer code (Unruh et al., 1999). SSI at a particular time is given the sum of these spectra, weighted by the fractional solar surface that is covered by faculae and sunspots, as apparent in solar observations.
The implementation of SATIRE employing solar images in visible light and solar magnetograms (magnetic field intensity and polarity) is termed SATIRE-S (Wenzler et al., 2005;Ball et al., 2012;Yeo et al., 2014), and that based on the sunspot number (SSN) is SATIRE-T (Krivova et al., 2010). Individual records are accessible at http://www2.mps.mpg. de/projects/sun-climate/data.html. We use here SATIRE-S for the satellite era (available from 1974 to 2015).
Prior to 1974 the CMIP6 SATIRE data were calculated with the SATIRE-T model (Krivova et al., 2010), although this was done using annual SSN (Version 1) instead of daily Group SSN (GSSN) (Hoyt and Schatten, 1998) while keeping all other inputs (including sunspot area) identical to the original version. SSI variability on subannual timescales, taken from the SATIRE-T model, was added afterwards. We shall henceforth call this model SATIRE in the following.
On decadal to centennial timescales, SATIRE reproduces observations such as the composite of the Lyman-α line at 121.6 nm (since 1947(since , Woods et al., 2000, the measured solar photospheric magnetic flux (since 1967), the empirically reconstructed solar open magnetic flux (since 1845, Lockwood et al., 2014), and the 44 Ti activity in stony meteorites (Krivova et al., 2010;Vieira et al., 2011;Yeo et al., 2014). SATIRE and NRLSSI2 are internally consistent, in the sense that the integral of the modeled spectral irradiances equals the TSI. These are among the best model reconstructions we currently have. Note, however, that both models reconstruct the SSI prior to the satellite era by assuming the relationship between sunspot number and SSI to be timeinvariant. The model uncertainty associated with this assumption is difficult to quantify. For that reason, it is not included in the uncertainties that are provided with the model datasets, which may therefore be underestimated.

Proxies used
Both NRLSSI2 and SATIRE rely on the sunspot number when no other solar proxies are available. For the CMIP6 composite, we decided to rely on version 1.0 of the international sunspot number (from http://www.sidc.be/silso), even though a newer version 2.0 recently came out (Clette et al., 2014). Indeed, SSI models have not yet been thoroughly trained and tested with this new sunspot number. Recent results by Kopp et al. (2016) suggest that this revision has little impact after 1885, and leads to greater solar-cycle fluctuations prior to that. Note that the NRLSSI version employed here uses the GSSN (Hoyt and Schatten, 1998), while SATIRE uses the annually averaged international sunspot number (v1.0). This affects the long-term trend of the final product in the presatellite era (see Sect. 2.1.2).
In NRLSSI2 the proxy index for facular brightening is the composite MgII index from the University of Bremen. The MgII index (Viereck et al., 2001) is the core-to-wing ratio of the disk-integrated MgII emission line at 280 nm. This quantity is used by many models as a UV proxy. The MgII index is available from 1978 onwards; values prior to that are estimated from the sunspot number.
In NRLSSI2 and in SATIRE, the proxy index for sunspot darkening is the sunspot area as recorded by ground-based observatories in white light images since 1882 (Lean et al., 1998). The sunspot darkening prior to 1882 is estimated from the sunspot number.

SOLID composite
The task at hand -to determine the most likely temporal variation in SSI -is challenged by the paucity of direct SSI observations, and the numerous instrumental artifacts that affect these observations. Recently, this task has been addressed by an international consortium, which has produced an observational SSI composite (Haberreiter et al., 2017). This SOLID 1 composite is the first of its kind to include a large ensemble of observations, which are listed in Table 1. These observations are combined by using a probabilistic approach, without any model input. We consider this observational composite here mainly as an independent means for comparing the SSI reconstructions. While it is premature to use this composite as a benchmark for testing models, it definitely represents the most comprehensive description to date of SSI observations (Haberreiter et al., 2017). The making of this composite involves several steps. First, the SSI datasets provided by the instrument teams (see the list of instruments in Table 1) are preprocessed, e.g., corrected for outliers and aligned in time. Furthermore, the short-term and long-term uncertainties of the SSI time series are determined. These steps are detailed in Schöll et al. (2016).
Second, for each individual dataset all data gaps are filled by expectation-maximization (Dudok de Wit, 2011). This approach makes use of observed proxies representing the SSI variation of different wavelength ranges in the solar spectrum, as listed in Table 2. We emphasize here that the gapfilling is required here to decompose the records into different timescales; at the end, the interpolated values are excluded from the composite. Third, each individual time series is decomposed by wavelet transform into 13 timescales a = 2 j , with j being the level of the scale. These scales go from 1 day (Level 0) to 11.2 years (Level 12). For each timescale, the uncertainty is determined by taking into account the short-term and long-term uncertainties. Fourth, the decomposed records are recombined by calculating the Our aim was to keep this composite fully independent from existing models. This means that no SSI models have been used to correct the observational data, which are taken at their face value, without any correction.
One challenge of this -as with any statistical approachis its reliance on the number of independent datasets. While for the past decades several missions were dedicated to measuring the UV band of the solar spectrum, the picture becomes bleaker when considering recent observations in the visible and near-UV parts of the spectrum. After 2003, the only remaining observations that are continuous are from SORCE/SIM, whose out-of-phase behavior (Harder et al., 2009) is controversial (Lean and DeLand, 2012;Ermolli et al., 2013). Let us therefore stress that the SOLID composite is based on observations only, and will necessarily undergo revisions as new physical constraints are incorporated, or new versions of the datasets are released.

CMIP6-recommended solar irradiance forcing
NRLSSI and SATIRE are not the only available models for reconstructing the SSI (Ermolli et al., 2013). However, they are the only ones that have been widely tested, and can easily cover the 1850-2300 time span for CMIP6 with one single and continuous record. The resulting homogeneity in time is a major asset of our reconstructions, and a necessary condition for obtaining a realistic solar forcing.
NRLTSI2 and SATIRE-TSI agree well on timescales of days to months and show the same long-term trend before 1986 in their original versions. Note, however, that in the CMIP-adapted version of SATIRE (see Sect. 2.1.1), the longterm change over this period is slightly weaker. In contrast to NRLTSI2, SATIRE-TSI declines after 1986. NRLSSI2 and SATIRE-SSI show significantly different spectral profiles of the variability between about 250 and 400 nm. This has fueled a debate (e.g., Yeo et al., 2015) that is unlikely to settle soon. The two models have been derived independently, and as of today there is no consensus regarding their relative performance. In this context, and for the time being, the most reasonable approach (in a maximum-likelihood sense) consists of averaging their reconstructions, weighted by their uncertainty. Since, in addition, we are lacking uncertainties that can be meaningfully compared, our current recommendation is to simply take the arithmetic mean of the two model datasets: (i) the empirical model NRLTSI2 and NRLSSI2 (Coddington et al., 2016) and (ii) the semi-empirical model SATIRE (Yeo et al., 2014;Krivova et al., 2010). Note that multimodel averaging is a widely used practice in climate modeling (e.g., Smith et al., 2013).
For historical data (1 January 1850-31 December 2014) both models rely, as described above, on one or several of the following: the international sunspot number V1.0, sunspot area distribution (after 1882), solar photospheric magnetic field (after 1974), and the MgII index (after 1978). Since NRLSSI2 and NRLTSI2 have yearly averages only before 1882, we reconstructed subyearly variations by using an AR-MAX (autoregressive-moving-average with exogenous input) model (Box et al., 2015) that uses the sunspot number as input.
The extreme ultraviolet (EUV) band (10-121 nm) is required for CMIP6 but is not provided by NRLSSI2 and SATIRE, whose shortest wavelength is 115.5 nm. We thus added it with spectral bins from 10.5 to 114.5 nm by using a nonlinear regression from the SSI in the 115.5-123.5 nm band, trained with TIMED/SEE data from 2002 to 2011. This is further detailed in Appendix A.
In some climate models the EUV flux is parameterized as a function of the F10.7 index, which is the daily radio flux at 10.7 cm from Penticton Observatory, adjusted to 1 AU, and measured daily since 1947 (Tapping, 2013). For practical purposes, we also provide this index. Values prior to 1947 are obtained by multilinear regression to the first 20 principal components of the SSI and application of minor nonlinear adjustments. Let us note that while the F10.7 index is a good proxy for EUV variability on daily to yearly timescales, this may not be true anymore on multidecadal timescales. As of today, the lack of direct EUV observations does not allow us to constrain its long-term evolution, whereas the F10.7 index at solar minimum has not significantly varied since 1947, when its first measurements started.
The dataset, together with a technical description, and a routine for how to read and integrate the SSI data to the radiation bands used in climate models can be found at http: //solarisheppa.geomar.de/cmip6. In addition, a recommendation on how to implement the SSI changes in the models is provided in the Appendix B. A detailed description of the CMIP6 solar irradiance forcing in TSI and SSI and a comparison to the CMIP5 recommendation are presented in the following.
K. Matthes et al.: CMIP6 solar forcing Total solar irradiance (TSI) Figure 1 presents time series of the TSI from the CMIP5, CMIP6, and the CMIP6-adapted versions of NRLTSI2 and SATIRE datasets, along with one observational composite from PMOD (version 42.64.1508) 2 . We stress that all the data are taken at their face value, using their latest version, without any adjustments or scaling, except for NRLTSI1, whose value we uniformly reduced by 5 W m −2 to account for the new recommendation for average TSI (see below).
All TSI records agree well on daily to yearly timescales, and in some cases (e.g., NRLTSI1 and NRLTSI2) they match as well on multidecadal timescales. The major difference arises in the long-term behavior of SATIRE and NRLTSI2 (see Sect. 2.1.1), which impacts the CMIP6 composite, and leads to a weaker trend compared to the CMIP5 recommendation (which was based on NRLTSI1 only). In both models, the historical reconstructions are sensitive to the assumptions made when constraining them to direct (satellite era) observations that suffer from large uncertainties. There is no consensus yet as to which one better represents long-term solar variability, and this is what motivated us to average them for making the CMIP6 composite.
More subtle differences between the different TSI datasets arise in the satellite era, especially with the unusually deep solar minimum that occurred in 2008-2010: the NRLTSI2 model has a weak negative trend between successive solar minima, whereas the SATIRE reconstruction exhibits a larger one. The resulting trend in the CMIP6 composite is comparable to the observational TSI composite from PMOD (grey area). Figure 1 does not show any model uncertainties, because these are either absent or difficult to compare. We do provide uncertainties, however, for the observational PMOD composite, based on an instrument-independent approach that is described in . Note that both models are mostly within the ±1σ confidence interval, which highlights how delicate it is to constrain them by observational data.
After CMIP5, the recommended value of the average TSI during solar minimum was reduced from 1365.4 ± 1.3 W m −2 to a lower value of 1360.8 ± 0.5 W m −2 after reexamination by Kopp and Lean (2011), later confirmed independently by Schmutz et al. (2013). Based on this, the International Astronomical Union recently recommended 1361.0 ± 0.5 W m −2 as the nominal value of the TSI, averaged over solar cycle 23, which lasted from 1996 to 2008 (Prša et al., 2016). Our CMIP6 composite complies with this recommendation.
To summarize for the TSI, the CMIP6 and CMIP5 recommendations are comparable on decadal and subdecadal timescales. They differ, however, by a weaker secular trend in CMIP6. Between 1980 and 1880, the difference be-tween TSI(CMIP6) and TSI(CMIP5) progressively increases from 0.1 to 0.4 W m −2 (after correcting the aforementioned 5 W m −2 offset in CMIP5). This results in a weaker change in solar forcing, which will be detailed in Sect. 2.1.3.
To estimate the impact of these different trends on the radiative forcing, we have conducted a high-spectral-resolution calculation using a single profile with a line-by-line radiative transfer code (libradtran) described in more detail below. This indicated an instantaneous change in downward solar flux of −0.16 W m −2 over the 1986-2009 period for the combined CMIP6 dataset. A crude estimate of the global mean forcing from this change is −0.04 W m −2 , which is relatively small in comparison to other forcings over this period.

Solar spectral irradiance (SSI)
To investigate differences and similarities between the SSI datasets, following Ermolli et al. (2013), we concentrate on four specific wavelength ranges: 120-200 nm (UV1), 200-400 nm (UV2), 400-700 nm (VIS), and 700-1000 nm (NIR), with special emphasis on the CMIP5 (i.e., NRLSSI1) and the CMIP6 (average of NRLSSI2 and SATIRE) datasets. These ranges are relevant for climate studies; see for example Table 3 below. Figure 2 shows the SSI time series from 1880 through 2014. Note that we added vertical offsets by adjusting the mean values to facilitate their comparison, using CMIP6 as a reference. We note the following: -The long-term increase from 1880 to 1980 is similar in NRLSSI2, SATIRE, and CMIP6, but NRLSSI2 predicts a slightly larger increase in the VIS and NIR. NRLSSI1 predicted a larger increase in the VIS, compensated by a smaller increase in the NIR and UV2.
-As already described for the TSI behavior above ( Fig. 1), SATIRE predicts a significant downward trend of the baseline for the last three solar cycles, as can be seen by comparing the SSI at solar minima between cycles 21-22 (1985), 22-23 (1995), and 23-24 (2008). NRLSSI2 does not predict significant variations and therefore the recommended CMIP6 time series has a slower downward trend than SATIRE in the recent cycles. This trend was not apparent in the dataset recommended for CMIP5.
-The solar-cycle variability in CMIP6 exceeds that of CMIP5, particularly in the UV2 and NIR ranges, while it is approximately the inverse in the VIS. The change in the NRLSSI model can be explained by the use of new and higher-quality data from the SORCE mission on the rotational timescale in NRLSSI2, while NRLSSI1 was based on data from older satellite missions. In the UV2, SATIRE predicts larger solar-cycle amplitudes, which can be explained by a larger weight of the network at these wavelengths. 1850 1860 1870 1880 1890 1900 1910 1920 1930 1940 1950 1960 1970 1980 1990 2000   In Fig. 2, the apparently less regular solar-cycle reconstruction by NRLSSI2 between 1940 and 1960 is most likely caused by the transition from one sunspot record to another in that model (see Coddington et al., 2016). Figure 3 compares our CMIP6 dataset with the observational SOLID composite (see description above) and some direct SSI satellite observations. Generally speaking, the observations and observation-based composite agree very well with each other, and the CMIP6 dataset up to 200 nm. Larger cycle variations than in the CMIP6 SSI occur above about 200 nm in the observations. Such discrepancies are inherent to the observation of small variations over 11 years. On the -- Figure 3. CMIP6-recommended SSI time series (black) from 1980 to 2015 together with the SOLID.beta data composite (green) and relevant instrument observations for the following wavelength bins: 120-200 nm (left) and 200-400 nm (right). The SOLID and instrument time series have been adjusted to match the average level of the CMIP6 time series. Note that the longest wavelength observed by TIMED/SEE is 189 nm, the longest observed wavelength by SORCE/SOLSTICE is 309 nm and the shortest observed wavelength by SORCE/SIM is 240 nm. All time series are running averages over 2 years.
. Figure 4. Contribution, in percent, of various wavelength ranges to the TSI variability between the maximum of cycle 22 and the minimum between cycles 22 and 23. Contributions between 120 and 200 nm have been multiplied by 10 for improved visibility. Maximum and minimum values have been taken over an 81-day period centered on November 1989 and on November 1994, respectively. right panel of Fig. 3, one can notice the different influences of the various datasets on the SOLID composite, as a consequence of their uncertainty at different scales. For example, the SORCE/SIM data have only a minor (but significant) effect on the long-term variations of the composite. In the VIS and NIR part of the spectrum, the only available measurements are from the SORCE/SIM instrument, whose solarcycle variation is controversial (Lean and DeLand, 2012) and hence should be considered with great caution. Figure 4 shows the contribution of the different wavelength ranges to TSI variations between solar maximum on November 1989 (solar cycle 22) and solar minimum on November 1994 (between cycles 22 and 23) for the different solar irradiance models. Both extrema are averaged over 81 days. We use the same spectral bands and color coding as in Fig. 2 of the review by Ermolli et al. (2013). The latter figure, though, applies to the next solar cycle, when SORCE/SIM is operating. Our dates coincide with the ones chosen in the CCM time-slice experiments; see Sect. 2.1.3. Please note that the sum of the SSI variability of the various models is not equal to the TSI variability because the IR part is missing in Fig. 4.
Both SSI models agree very well for the 120-200 nm wavelength range. Discrepancies arise for wavelengths longer than 200 nm, as already discussed in Fig. 2. In the 200-400 nm range, the SATIRE model shows the largest variability, followed by NRLSSI2 and NRLSSI1. This results in a CMIP6 variability that is larger than for CMIP5, 45 % compared to 32 % (Fig. 4). In the VIS range this reverses, with CMIP6 showing a smaller variability than CMIP5 (30 % compared to 40 %). Also remarkable is the very good agreement between NRLSSI2 and SATIRE. In the NIR, CMIP6 shows slightly larger variability than CMIP5. The implications of these different spectral variabilities on the atmospheric heating and ozone chemistry and subsequent thermal and dynamical effects, with respect to both climatological differences between CMIP5 and CMIP6 and the solar cycle signals in CMIP5 and CMIP6, will be discussed in Sect. 2.1.3. Figure 5 illustrates the reconstruction of the EUV band by comparing spectra obtained at high and low levels of solar activity, and by showing the historical reconstruction of the band-integrated flux. As explained in Sect. 2.1.2, we estimate the EUV flux by nonlinear regression from the SSI at longer UV wavelengths, using the first 7 years of observations from TIMED/SEE. Not surprisingly, this reconstruction agrees well with the observations from TIMED/SEE. However, due to a lack of other long-duration EUV observations that are of sufficient radiometric quality, it is very difficult to assess the quality of our reconstruction. For the same reason, multidecadal variations are poorly constrained, and in particular, the presence of trends remains largely unknown. Note that wavelengths below 28 nm require more caution, since they rely on TIMED/XPS observations that were partly degraded . One future improvement of our dataset involves reconstructions of the EUV band that are based on more advanced models such as NRLEUV2 .

Evaluation of SSI datasets in climate models
Providing a first assessment of implications employing the SSI recommended for CMIP6 in comparison to CMIP5, we present results of two state-of-the-art chemistry-climate models (CCMs): the Whole Atmosphere Community Climate Model (CESM1(WACCM); Marsh et al., 2013) and the ECHAM/MESSy atmospheric chemistry model (EMAC; Jöckel et al., 2010Jöckel et al., , 2016. Additionally, we include results of single-profile radiative transfer calculations performed with the line-by-line radiative transfer code "libradtran" (Mayer and Kylling, 2005). We use the latter to present estimates of direct shortwave (SW) radiative heating impacts neglecting the ozone chemistry feedback which is included in the CCM results.

Chemistry-climate model descriptions
WACCM, the Whole Atmosphere Community Climate Model (version 4; Marsh et al., 2013), is an integrative part of the Community Earth System Model (CESM) suite (version 1.0.6; Hurrell et al., 2013). CESM1(WACCM) is a "hightop" CCM covering an altitude range from the surface to the lower thermosphere, i.e., up to 5 × 10 −6 hPa equivalent to approx. 140 km. It is an extension of the Community Atmospheric Model (CAM4; Neale et al., 2013) with all its physical parameterizations. For this study the model is integrated with a horizontal resolution of 1.9 • latitude × 2.5 • longitude and 66 levels in the vertical. CESM1(WACCM) contains a middle-atmosphere chemistry module based on the Model for Ozone and Related Chemical Tracers (MOZART3; Kinnison et al., 2007). It contains all members of the O x , NO x , HO x , ClO x , and BrO x chemical groups as well as tropospheric source species N 2 O, H 2 O, and CH 4 as well as CFCs and other halogen components (59 species and 217 gas-phase chemical reactions in total). Its photolysis scheme resolves 100 spectral bands in the UV and VIS range (121-750 nm; see also Table 3). The SW radiation module is a combination of different parameterizations. Above approx. 70 km the spectral resolution is identical to the photolysis scheme (plus the parameterization of Solomon and Qian, 2005, based on F10.7 solar radio flux to account for EUV irradiances). Below approx. 60 km the SW radiation of CAM4 is retained, employing 19 spectral bands between 200 and 5000 nm (Collins, 1998). For the transition zone (60-70 km) SW heating rates are calculated as weighted averages of the two approaches. Table 3 contains an overview of the SW radiation and photolysis schemes in comparison to EMAC, the second CCM utilized for this study. CESM1(WACCM) features relaxation of stratospheric equatorial winds to an observed or idealized Quasi-Biennial Oscillation (QBO; Matthes et al., 2010).
EMAC, The ECHAM/MESSy atmospheric chemistry (EMAC) model, is a CCM that includes submodels describing tropospheric and middle atmospheric processes and their interaction with oceans, land, and human influences (Jöckel et al., 2010). It uses the second version of the Modular Earth Submodel System (MESSy2) to link multiinstitutional computer codes. The core atmospheric model is the fifthgeneration European Centre Hamburg general circulation model (ECHAM5, Roeckner et al., 2006). For the present study we applied EMAC (ECHAM5 version 5.3.02, MESSy version 2.51, Jöckel et al., 2016) in the T42L47MA resolution, i.e., with a spherical truncation of T42 (corresponding to a quadratic Gaussian grid of approx. 2.8 × 2.8 • in latitude and longitude) with 47 hybrid pressure levels up to 0.01 hPa (∼ 80 km). The applied model setup comprises, among oth-ers, the following submodels: MECCA, JVAL, RAD/RAD-FUBRAD, and QBO. MECCA (Module Efficiently Calculating the Chemistry of the Atmosphere) (R.  provides the atmospheric chemistry model. JVAL (Sander et al., 2014) provides photolysis rate coefficients based on updated rate coefficients recommended by JPL (S. P. . RAD/RAD-FUBRAD (Dietmüller et al., 2016) provides the parameterization of radiative transfer based on Fouquart and Bonnel (1980) and Roeckner et al. (2003) (RAD). For a better resolution of the UV-VIS spectral band, RAD-FUBRAD is used for pressures lower than 70 hPa, increasing the spectral resolution in the UV-VIS from 1 band to 106 bands (Nissen et al., 2007;Kunze et al., 2014). Table 3 presents more details of the SW radiation and photolysis schemes in comparison to WACCM. The submodel QBO is used to relax the zonal wind near the equator towards the observed zonal wind in the lower stratosphere (Giorgetta and Bengtsson, 1999).

CCM experimental design
The CCM simulations with CESM1(WACCM) and EMAC are identically conducted in an atmosphere-only time-slice configuration. This means that the external forcings such as the solar and the anthropogenic forcing are fixed for the whole simulation period, i.e., 45 model years plus spin-up (∼ 5 years for EMAC, ∼ 3 years for CESM1(WACCM)). Concentrations of greenhouse gases (GHGs) and ozonedepleting substances (ODSs) are set to constant conditions representative for the year 2000. The lower-boundary forcing is specified by the mean annual cycle of SSTs and sea ice of the decade 1995-2004 derived from the HadISST1.1dataset (Rayner et al., 2003). All simulations are nudged towards an observed (EMAC) or idealized 28-month varying (CESM1(WACCM)) QBO. The only difference between the simulations is in the solar forcing. Four simulations for each of the following SSI datasets have been performed with EMAC and WACCM: CMIP6-SSI, its constituent datasets NRLSSI2 (Coddington et al., 2016), and SATIRE (Krivova et al., 2010;Yeo et al., 2014), as well as NRLSSI1 (Lean, 2000). The latter was recommended as solar forcing for CMIP5 including a uniform scaling of the spectrum to match TSI measurements of the Total Irradiance Monitor (TIM) instrument. As one emphasis of this study is to highlight differences to the previous phase of CMIP, we employed NRLSSI1 (including this scaling) and refer to it as NRLSSI1(CMIP5) in the following. Runs for each of the four datasets have been performed with both CCMs for a solar-minimum time slice and a solar-maximum time slice, respectively. For solarmaximum time slices, SSIs averaged over November 1989 are used (maximum of solar cycle 22) while for the solarminimum time slices averages over November 1994 are chosen. The latter does not match the absolute minimum of solar cycle 21-22 (June 1996). However, solar activity in November 1994 was already close to the minimum. The dif-ferences in solar activity between our solar-minimum and solar-maximum time slices for the respective datasets are within a range of 0.988 W m −2 for NRLSSI1(CMIP5) to 1.057 W m −2 for NRLSSI2.
It should be noted that these experiments will illustrate only one part of solar influence on climate. Given the atmosphere-only set-up of the runs, oceanic absorption of (mainly visible) solar irradiance and subsequent heating and feedbacks to the atmosphere -the so-called bottom-up mechanism (see Gray et al., 2010, and references therein) -is not represented in our simulations. Therefore we focus only on stratospheric signals and "top-down" dynamically induced responses in the troposphere. A second constraint of this study's experimental set-up is the choice of one solar cycle. Solar activity and hence spectral irradiance vary between different solar cycles. However, these differences are relatively small compared to a typical solar-cycle amplitude and will probably not affect the main results of this study. It should also be noted that the time-slice simulations were designed as a sensitivity study to test the impact of the different solar input datasets. They do not represent the full feedbacks of transient CMIP6 simulations.

Radiative transfer model libradtran
Radiative transfer calculations were performed with the high-resolution model libradtran (Mayer and Kylling, 2005), which is a library of radiative transfer equation solvers widely used for UV and heating-rate calculations (www. libradtran.org). Libradtran was configured with the pseudospherical approximation of the DISORT solver, which accounts for the sphericity of the atmosphere, running in a sixstreams mode. Calculations pertain to a cloud-and aerosolfree tropical atmosphere (0.56 • N), the surface reflectivity is set to a constant value of 0.1 and effects of Rayleigh scattering are enabled. The atmosphere is portioned into 80 layers extending from the surface to 80 km. The model output is annual averages of spectral heating rates from 120 to 700 nm in 1 nm spectral resolution, calculated according to the recommendations for the Radiation Intercomparison of the Chemistry-Climate Model Validation Activity (CCMVal) (Forster et al., 2011). As for the CCM simulations described above, calculations of the heating rates were performed for CMIP6-SSI, SATIRE, NRLSSI2, and NRLSSI1(CMIP5). The same climatological ozone profile is specified for both solar-maximum and solar-minimum conditions in order to assess the direct effects in atmospheric heating by SSI variations only. As such, the line-by-line calculations do not take into account the positive ozone feedback with the solar cycle, and SW heating-rate changes are expected to be weaker compared to the signatures in the two CCM simulations. Table 3. Summary of spectral resolution of the SW radiation and photolysis schemes in EMAC and CESM1(WACCM). Boundaries of spectral intervals and further refinement in brackets when larger than 1.

Methods
The analyses presented in the following consist of differences between climatologies derived from the various simulations. Given the time-slice configuration of the CCM runs with all external forcings equal except for the SSI dataset, we assume that statistically significant differences of two climatologies are the result of the differing solar irradiance forcings. Confidence intervals (95 %) as presented in Figs. 6 and 8, as well as statistical significance (p < 5 %) as marked in Figs. 10 and 11, are based on 1000-fold bootstrapping. Confidence intervals in Figs. 6 and 8 are only given for the CCM results related to CMIP6 SSI.

Climatological differences to CMIP5
Although all solar irradiance reconstructions subject to this analysis agree fairly well in TSI (see Fig. 1), they disagree significantly with respect to the spectral distribution of energy input, i.e., the shape of the solar spectrum. This is obvious from the offsets noted in Fig. 2 for the different spectral regions above 200 nm. Hence, we focus first on the climatological differences between the solar forcing in CMIP5 and CMIP6. We therefore compare the minimum time-slice simulations from the two CCMs and libradtran in Fig. 6 with respect to the climatological annual mean SW heating rates, as well as the temperatures and ozone concentrations between the two CCMs resulting from CMIP6-SSI, NRLSSI2, and SATIRE, respectively, as differences to equivalent simulations forced by NRLSSI1(CMIP5). The profiles represent the tropical (averaged over 25 • S-25 • N) stratosphere and mesosphere (100-0.01 hPa) for annual mean conditions for the CCMs and libradtran. Employing CMIP6-SSI results in significantly decreased radiative heating of large parts of the mesosphere and stratosphere (above 10 hPa) compared to NRLSSI1(CMIP5). Whereas the largest differences can be found at the stratopause with approx. −0.35 K day −1 according to both CCMs, and even more, −0.42 K day −1 , for libradtran (without any ozone chemistry feedback), libradtran and EMAC yield slightly increased SW heating rates below ∼ 7 and 10 hPa, respectively. This weaker SW heating in the new CMIP6 SSI dataset in the upper stratosphere and the stronger heating in the lower stratosphere are confirmed by the wavelength-dependent percentage changes between the CMIP6 and CMIP5 SSI datasets with respect to the radiation and photolysis schemes (Fig. 7). Regardless of the number of bands in the radiation code, both models show a smaller percentage difference of −5 % below about 300 nm and weaker or negligible differences above 300 nm (Fig.7).
Significant differences in radiative heating throughout the stratosphere related to the three state-of-the art SSI recon-structions are produced only with radiation codes of high spectral resolution such as in libradtran or -to a lesser degree -in EMAC (for the middle to lower stratosphere). Comparisons between CMIP6-SSI and its constituents NRLSSI2 and SATIRE in WACCM and EMAC lead to the conclusion that the choice of the CCM and its specific radiation and photolysis scheme is more important than the choice of the SSI dataset with respect to SW heating rates. In addition the ozone chemistry damps the SW heating response in the CCMs compared to libradtran, which misses the ozone feedback. Less SW radiation below 300 nm reduces ozone production (note also the reduced photolysis rates around 240 nm in Fig. 7), and hence less ozone is available to absorb SW radiation and results in a relative cooling of the upper stratosphere.
Corresponding to the SW heating-rate differences, large parts of the stratosphere and mesosphere are significantly cooler (up to −1.6 K at the stratopause) in simulations using CMIP6-SSI compared to NRLSSI1(CMIP5) irradiances. Note that libradtran results are shown for the SW heating-rate differences only, as temperature and ozone profiles are prescribed for the radiative transfer calculations. No significant differences in temperature are found when employing NRLSSI2 or SATIRE instead of CMIP6-SSI in CESM1(WACCM) which has a coarser spectral resolution in the SW heating parameterization than EMAC (Table 3 and Fig. 7). EMAC instead simulates significantly lower (higher) temperatures in the stratosphere when using NRLSSI2 (SATIRE) than CMIP6-SSI forcing and in general a warmer stratosphere (and cooler stratopause and mesosphere) than CESM1(WACCM).
The impact of CMIP6-SSI, compared to NRLSSI1(CMIP5) irradiance changes on ozone, is more complicated. In the middle tropical stratosphere, ozone concentrations are significantly lower (peaking at ∼ 7 hPa with approx. −3.2 %). In contrast, ozone concentrations around the stratopause are significantly higher for CMIP6-SSI (+0.8 and +1.6 % according to EMAC and CESM1(WACCM), respectively) than under NRLSSI1(CMIP5) irradiances. Despite the considerable differences in spectral resolution of the photolysis schemes (Table 3 and Fig. 7), for larger parts of the stratosphere below about 3 hPa, CESM1(WACCM) and EMAC agree fairly well. For both models the SATIRE irradiances show larger signals than NRLSSI2 irradiances, with the signal for CMIP6 in between. The ozone signals start to differ at and above the stratopause, probably due to the more detailed photolysis code and the higher model top in CESM1(WACCM) compared to EMAC. The ozone signal is much more uncertain with respect to the different SSI forcings than the SW heating rate and the temperature signals.
In summary, the CMIP6-SSI irradiances lead to lower SW heating rates and lower temperatures as well as smaller ozone signals in the lower stratosphere and larger ozone signals in the upper stratosphere and lower mesosphere than the CMIP5-SSI irradiances. Differences between the three tested SSI datasets occur in the SW heating rates only with a very high spectral resolution of the radiation code (libradtran, EMAC), and the differences are more prominent for ozone in a similar way for both CCMs, i.e., stronger effects occur for SATIRE than NRLSSI2. These direct radiative effects in the tropical stratosphere lead to a weakening of the meridional temperature gradient and hence to a statistically significant weakening of the stratospheric polar night jet in early winter (not shown).

Impacts of solar-cycle variability
The second question tackled by this evaluation is the atmospheric impact of the 11-year solar cycle using different SSI irradiance reconstructions. A special focus lies on the comparison of the new CMIP6 dataset with its predecessor NRLSSI1(CMIP5). Figure 8 provides annual mean tropical (25 • S-25 • N) profiles analogous to Fig. 6 but now illustrating differences between perpetual solar-maximum and perpetual solar-minimum conditions according to simulations forced by the various SSI-datasets.
All models and SSI-forcings produce the well-known solar-cycle impact of enhanced SW heating at solar maximum throughout the upper stratosphere and mesosphere. Differences to solar-minimum forcing peak at the stratopause with approx. +0.19 to +0.23 K day −1 . Only the libradtrancalculations -that do not include any ozone feedback -yield considerably weaker responses.
According to libradtran and CESM1(WACCM), CMIP6-SSI produces slightly higher SW heating-rate differences than NRLSSI1(CMIP5). However, for EMAC this is not the case. For both CCMs and libradtran, the usage of SATIRE leads to the strongest solar-cycle-induced SW heating-rate signals, while NRLSSI2 is associated with the weakest response (though not significantly different from NRLSSI1(CMIP5) for EMAC and libradtran).
Temperatures in the tropical stratosphere and mesosphere are generally higher during solar maximum than during phases of low solar activity. A local maximum of temperature differences is found at the stratopause with positive differences of 0.8-1.0 K compared to solar minimum. According to both CCMs, CMIP6-SSI forcing yields slightly higher temperatures (up to +0.2 K in the mesosphere in CESM1(WACCM)) for the stratopause region and the (lower) mesosphere than NRLSSI1(CMIP5). However, most of these differences are not statistically significant. Comparing CMIP6-SSI-forced results with its components NRLSSI2 and SATIRE yields heterogeneous results. According to EMAC, NRLSSI2 leads to a slightly weaker solarcycle response throughout the stratosphere, while the mesospheric response is stronger than SATIRE and CMIP6-SSI. CESM1(WACCM)-results show that the stratospheric (up to approx. 2 hPa) solar-cycle response to CMIP6-SSI-forcing in temperature is slightly weaker than in both NRLSSI2-and Figure 6. Impact of solar forcing for perpetual solar-minimum conditions according to CMIP6 (black) as well as constituent NRLSSI2 (red) and SATIRE (blue) datasets on climatological (annual mean) profiles of SW heating rates (top), temperature (center), and ozone concentrations (bottom) averaged over the tropics (25 • S-25 • N) when compared to NRLSSI1(CMIP5) solar forcing; derived from simulations with CESM1(WACCM) (long-dashed), EMAC (shortdashed), and libradtran radiative transfer calculations (solid, only top panel) only shown for SW heating rates; 95 % confidence intervals for CMIP6 simulations (hatched) estimated by bootstrapping. SATIRE-driven simulations. As opposed to that, simulations forced by SATIRE and CMIP6-SSI yield very similar warming signals in the mesosphere while NRLSSI2 produces a (significantly) weaker response in the mesosphere.
The solar-cycle signal in ozone is very consistent for most parts of the stratosphere and mesosphere, with respect to the SSI datasets. More important for the solar ozone signals seems to be the choice of the CCM (with its specific photolysis scheme; see also Fig. 9), especially for the lower stratosphere (10 hPa and below). In the lower mesosphere, however, the dataset-induced differences are larger than the model-induced ones. All analyzed combinations of CCMs and forcing datasets agree very well on the (relative) peak of the ozone response (+2.3-2.5 %) to the solar cycle at 3-5 hPa. In the lower mesosphere (0.2-1 hPa), CMIP6-SSI (and SATIRE) leads to a significantly weaker solar-cycle ozone response (+0.3-0.5 % at 0.5 hPa) than NRLSSI1(CMIP5) (and NRLSSI2; +0.6-0.8 % at 0.5 hPa). For the lower stratosphere (below 7 hPa), both CCMs agree that SATIRE leads to the strongest solar-cycle ozone signals, though still within the uncertainty associated with CMIP6-SSI-forced simulations. The comparison between CMIP6-SSI and NRLSSI1(CMIP5) yields no unequivocal result: CESM1(WACCM) exhibits a secondary maximum ozone response at approx. 70 hPa that is weaker with CMIP6-SSI than with NRLSSI1(CMIP5) while the opposite is seen in EMAC. Given the large uncertainty in the lower stratospheric solar ozone signal, we can only conclude that the signal is positive.
In summary, the CMIP6-SSI irradiance forcing leads to slightly enhanced solar-cycle signals in SW heating rates, temperatures, and ozone than the CMIP5-SSI irradiance forcing. In general, differences between the different SSI datasets are not statistically significant. Note that statistically significant differences in the irradiance amplitude between CMIP5 and CMIP6-SSI irradiances are observed between 300 and 350 nm in particular, a wavelength region important for ozone destruction (below 320 nm), consistently in both CCMs (Fig. 9).
The direct radiative effects in the tropical stratosphere from the CMIP6-SSI dataset, i.e., enhanced solar-cycle signals in SW heating rates, temperatures, and ozone in the tropical upper stratosphere lead to the expected strengthening of the meridional temperature gradient and hence to a statistically significant stronger stratospheric polar night jet which propagates poleward and downward during winter from December through January (Fig. 10) and significantly affects the troposphere with a positive AO-like signal developing in late winter, i.e., January and February (Fig. 11). This signal is very similar and statistically significant for both CCMs, and therefore the ensemble mean of both models is shown. Besides the radiative impact of the solar cycle, energetic particles also have an impact on the atmosphere and will be discussed in the following.

Particle forcing
Precipitating energetic particles ionize the neutral atmosphere leading to the formation of NO  (Porter et al., 1976;Rusch et al., 1981;Solomon et al., 1981) as well as some more minor species (Verronen et al., 2008;Funke et al., 2008;Winkler et al., 2009;Verronen et al., 2011a;Funke et al., 2011) due to both dissociation and ionization of the most abundant species, as well as due to complex ion chemistry reaction chains. The formation of NO x and HO x radicals leads to catalytic ozone loss that further triggers changes of the thermal and dynamical structure of the middle atmosphere. Energetic particle precipitation (EPP) thus introduces chemical changes to the middle atmospheric composition and can therefore only be considered explicitly in climate simulations that employ interactive chemistry. In the following we provide recommendations for the consideration of EPP effects in CCMs separately for auroral and radiation belt electrons (Sect. 2.2.1), for solar protons (Sect. 2.2.2), and for galactic cosmic rays (Sect. 2.2.3). In most cases, particle forcing can be expressed in terms of ion pair production rates. Recommendations for their implementation into chemistry schemes are provided in Sect. 2.2.4.  . Zonal mean zonal wind response to the 11-year solar cycle according to CMIP6-SSI in December and January as "ensemble mean" of CESM1(WACCM) and EMAC simulations; hatched areas denote statistical significance (p < 5 %) of shown differences.

Geomagnetic forcing (auroral and radiation belt electrons)
Energetic particles are trapped in the space around the Earth dominated by the geomagnetic field (known as the magneto- Figure 11. 500 hPa geopotential height response to the 11-year solar cycle according to CMIP6-SSI in January and February as "ensemble mean" of CESM1(WACCM) and EMAC simulations; hatched areas denote statistical significance (p < 5 %) of shown differences.
sphere). The loss of electrons into the atmosphere is termed "electron precipitation". Due to the Earth's magnetic field configuration, electron precipitation occurs mainly in the polar auroral and subauroral regions, i.e., at geomagnetic latitudes typically higher than 50 • . Enhanced loss fluxes are associated with geomagnetic storms, which can occur randomly, and also with periodicities ranging from the ∼ 27 day solar rotation to the 11-year solar cycle, and even to multidecadal timescales. The altitudes at which precipitating electrons deposit their momentum are dependent on their energy spectrum, with lower energy particles impacting the atmosphere at altitudes higher than those with higher energies (e.g., Turunen et al., 2009). Auroral electrons, originating principally from the plasma sheet, have energies < 10 keV and affect the lower thermosphere (95-120 km). Processes that occur in the outer radiation belt typically generate midenergy electron (MEE) precipitation within the energy range ∼ 10 keV to several MeV, affecting the atmosphere at altitudes of ∼ 50-100 km (Codrescu et al., 1997). Odd nitrogen, produced by precipitating electrons, is longlived during polar winter and can then be transported down from its source region into the stratosphere, to altitudes well below 30 km. This has been postulated already by Solomon et al. (1982) and observed many times (Callis et al., 1996; Stratospheric ozone loss due to electron-induced NO x production in the upper mesosphere-lower thermosphere and subsequent downward transport has been postulated by model experiments many times (Solomon et al., 1982;Schmidt et al., 2006;Marsh et al., 2007;Baumgaertner et al., 2009;Reddmann et al., 2010;Semeniuk et al., 2011;Rozanov et al., 2012). However, observational evidence for EPP-induced variations of stratospheric ozone linked to geomagnetic activity, characterized by a negative anomaly moving down with time during polar winter, have been given only very recently (Fytterer et al., 2015a;Damiani et al., 2016).
In addition, mesospheric ozone effects have been observed (Andersson et al., 2014a;Fytterer et al., 2015b) which are caused by HO x increases during MEE precipitation (Verronen et al., 2011b). Although the HO x -driven response is short-lived, the frequency of MEE events is large enough to cause solar-cycle variability in ozone (Andersson et al., 2014a). HO x response is seen at magnetic latitudes connected to the outer radiation belts, with, for example, the yearly amount of HO x varying with the observed magnitude of precipitation (Andersson et al., 2014b). The consideration of the effects of MEE on atmospheric species other than NO x , HO x , and ozone have not been investigated in detail to date, but they can be expected to be qualitatively similar to those caused by solar proton events (Verronen and Lehmann, 2013).
The impact of magnetospheric particles on the atmosphere is strongly linked to the strength of geomagnetic activity; this has been shown both for the direct production of NO in the thermosphere (Marsh et al., 2004;Hendrickx et al., 2015) and mesosphere (Sinnhuber et al., 2016), for mesospheric OH production (Fytterer et al., 2015b), and for the EPP indirect effect Funke et al., 2014a). Geomagnetic activity can be constrained over centennial timescales by means of proxy data provided by geomagnetic indices. Since our forcing dataset for magnetospheric particle precipitation relies on these indices, their reconstruction and homogenization is discussed first.

Reconstruction of geomagnetic indices
Geomagnetic indices provide a measure of the level of geomagnetic activity resulting from the response of the magnetosphere-ionosphere system to variability in the solar and near-Earth solar wind forcings. Many geomagnetic indices have been constructed and different indices are sensitive to different aspects of magnetospheric and ionospheric dynamics (Mayaud, 1980). The Kp and Ap geomagnetic indices (Bartels, 1949) are directly related by a quasilogarithmic conversion; they are proxies for the global level of geomagnetic activity, and are used as inputs to parameterizations of magnetospheric particle precipitation. For the historical solar forcing data, daily values of the Kp and Ap indices from 1850 to 2014 are required. However, these indices, provided by the International Service of Geomagnetic Indices (http://isgi.unistra.fr/), have only been produced from 1932 onwards. It is not possible to directly and consistently extend the Kp and Ap indices prior to 1932, as they use data from 13 geomagnetic observatories around the globe, and these data are unavailable further back in time. So, before 1932 the Kp and Ap indices must be estimated from other geomagnetic indices. The aa index (Mayaud, 1972) is the most appropriate choice, as it was constructed to be as similar as possible to the Ap index on annual timescales . However, the original aa index only extends back to 1868 (also available from http://isgi.unistra.fr/), and so an extension (Nevanlinna, 2004) to the aa index is also employed, extending it back to 1844 by use of the Ak indices from the Helsinki geomagnetic observatory, spanning 1844-1912. In addition, we implement a correction to the aa index to account for a change in the derivation of the index in 1957; see Lockwood et al. (2014).
On larger than annual timescales, the response of the aa and Ap indices is similar, and the indices are positively linearly correlated. However, on daily timescales the relationship between aa and Ap is not linear, and also displays a regular annual variation. Therefore, to estimate the daily Ap indices during the period 1868-1931, we used piecewise polynomial fits between the daily Ap and aa values for the period 1932-present, for each calendar month. These fits were then extrapolated to estimate the Ap values between 1868 and 1931 from the aa values. This process was repeated to estimate the relationship between the Ak indices provided by Nevanlinna (2004), and the Ap values estimated from the aa index. The piecewise polynomial fits for each calendar month were calculated using the overlap period between the Ak and estimated Ap records, 1868-1912. These were then extrapolated to estimate Ap in the period 1850-1867. Figure 12 shows the time series of the reconstructed Ap index and the aa and Ak indices used for extension, back to 1850.
The daily Kp index for the period 1868-1931 was estimated by using the monthly piecewise polynomial aa-Ap fits to estimate the 3-hourly ap index values from the aa index values. These 3-hourly ap values were then converted to the corresponding Kp indices, from which the daily mean was calculated. Since only daily Ak data are available, such an approach is not possible for the period 1850-1867, and so here the daily estimates of Ap, derived from Ak, are directly converted into daily Kp. The quasi-logarithmic nature of the conversion between the hourly Kp and ap indices, means that calculating daily values of Kp in this manner results in lower values than the standard method of averaging the eight 3hourly values in a day, resulting in a slight bias in the Kp estimates. A statistical correction for this bias was employed by estimating the bias using the difference between the aaderived Kp and the Ak-derived Kp for the period 1868-1912. The estimated bias was then subtracted from the Ak-derived Kp estimates for the period 1850-1867.

Auroral electrons
Lower thermospheric nitric oxide production by auroral electron precipitation can only be considered explicitly in CCMs extending up to 120 km or higher. There were only a few Earth system models of this characteristic in CMIP5 and it is expected that the number of such models will not increase significantly within CMIP6. Most of the models falling into this category use parameterizations for the calculation of auroral ionization rates or NO productions in the polar cusp and polar cap (Schmidt et al., 2006;Marsh et al., 2007). Those parameterizations are typically driven by geomagnetic indices and we hence recommend the use of the extended Ap or Kp time series described above. Figure 13 demonstrates the improvement in 2004-2009 wintertime polar NO y modeling when production due to electron precipitation is included. The simulations are from the SD-WACCM model version 4  nudged to the NASA Global Modeling and Assimilation Office Modern-Era Retrospective Analysis for Research and Applications (MERRA) (Rienecker et al., 2011) dynamics, and they are compared to observations from the ACE-FTS instrument (Jones et al., 2011). The auroral electron contribution was calculated with a Kp-based parameterization and was further controlled through eddy diffusion affecting the NO x descent from lower thermosphere. MEE ionization and NO x production was calculated using electron flux observations from the NOAA SEM-2 medium energy proton and electron detector (MEPED) instrument onboard the POES spacecraft (Evans and Greer, 2000), using methods described in more detail in the following MEE section. Enhancing the transport of auroral NO x from the lower thermosphere and including the mesospheric NO x production by MEE clearly improves the wintertime NO y near the stratopause. Around 0.1 hPa, modeled NO y increases by 100 % in both hemispheres, which leads to better agreement with ACE-FTS. Both auroral electrons and MEE have a clear impact, although the auroral contribution is larger. However, it should be noted that 2004-2009 was a period of weak MEE in general, and during other periods of stronger MEE the contributions become more equal such that the effect on model NO y is stronger (not shown).

Mid-energy electrons (MEEs) from the radiation belts
Highly energetic particles trapped in the radiation belts mainly consist of electrons and protons, forming inner and outer belts separated by a "slot" region (Van Allen and Frank, 1959). The outer radiation belt (located 3.5-8 Earth radii from the Earth's center) is highly dynamic, with electron fluxes changing by several orders of magnitude on timescales of hours to days (e.g., Morley et al., 2010). These changes are caused by the acceleration and loss of energetic electrons, through enhancements in radial diffusion and waveparticle interactions, during and after geomagnetic storms (e.g., Reeves et al., 2003). Storm-driven dynamic variations in the underlying cold plasma density influence the effectiveness of such processes in different regions of the inner magnetosphere (e.g., Summers et al., 2007).
In order to characterize the electron precipitation into the atmosphere since 1850, it is necessary to develop a model that uses in situ satellite observations from the modern era. The most comprehensive, long-duration, and appropriate set of observations is provided by the NOAA SEM-2 MEPED instrument onboard the POES spacecraft (Evans and Greer, 2000;Rodger et al., 2010). The MEPED instrument covers an energy range from 50 eV to 2700 keV. In this study we are primarily concerned with measurements made with the three medium energy integral electron detectors, i.e., > 30, > 100, and > 300 keV, as the lower "auroral" energy range has been well characterized in previous work. The SEM-2 instrument has been flown on low-Earth-orbiting (∼ 800 km) Sun-synchronous satellites since 1998, with up to six instruments operating simultaneously on occasion. Electron precipitation fluxes from the outer radiation belt are measured with the 0 • detectors, which are mounted approximately parallel to the Earth-center-to-satellite vector.
Improved calibration of the SEM-2 detectors has been undertaken by Yando et al. (2011) using modeling techniques contained in the GEANT-4 code to determine the detector geometric conversion factor, or detector efficiency (following the original work described in Evans and Greer, 2000).
Further treatment of the data requires correction for the false counts caused by incident proton fluxes, which we undertake using the technique described in Lam et al. (2010). Solid black line is ACE, black dots are the average standard deviation of the monthly means. The grey line is WACCM with weak transport of auroral NO y from the lower thermosphere and no mesospheric production by medium energy electrons (MEEs), the dotted red line is stronger NO x transport but no production by MEEs, and the red line is stronger NO x transport and production by MEEs.
These calibrations and corrections have been tested through comparison with other satellite (e.g., Whittaker et al., 2014b) and ground-based observations (e.g., Rodger et al., 2013;Neal et al., 2015). We convert the satellite position into the geomagnetic latitude parameter L (McIlwain, 1961) using the International Geomagnetic Reference Field (IGRF; see Appendix C) and bin the precipitating flux data into zonal means with 0.25 L resolution from L = 2-10 (40-75 • geomagnetic latitude).
Using observed electron flux data in 2002-2012, a precipitation model for radiation belt electrons was created by van de Kamp et al. (2016). The precipitation model was fitted to the corrected observations of the MEPED/POES detectors following the approach outlined in Whittaker et al. (2014a). In the CMIP6 application of this model, the Ap index is used as the driving input parameter. Ap defines the level of magnetospheric disturbance and the location of the plasmapause, both of which are needed to calculate precipitating-electron fluxes at different magnetic latitudes. Thus, the reconstructed Ap record, as described earlier, can be readily used to create a continuous electron precipitation time series for the whole CMIP6 period. As output, the model provides daily spectral parameters of precipitation: integrated flux at energies above 30 keV and a power-law spectral gradient. A test of high-energy resolved precipitating electron flux measurements made by the DEMETER satellite found that the power-law fit consistently provides the best representation of the flux (Whittaker et al., 2013). The model output has been shown to compare well with the spectral parameters derived from POES satellite data (van de Kamp et al., 2016).
An atmospheric ionization dataset has been calculated based on the Ap-based precipitation model, using a computationally fast ionization parameterization (Fang et al., 2010) and atmospheric composition from the NRLMSISE-00 model (Picone et al., 2002). This calculation considered MEE (30-1000 keV) with maximum energy deposition at altitudes between about 60 and 90 km (van de Kamp et al., 2016). Note that the ionization parameterization does not consider the contribution of Bremsstrahlung, which could be significant only at altitudes below 50 km (Frahm et al., 1997). Figure 14 shows examples of solar-cycle variability of the modeled atmospheric MEE ionization rates at ≈ 80 km altitude. At 68 • magnetic latitude (L shell 7.25), MEE precipitation is mostly driven by magnetic substorms and the solarcycle variability is relatively weak, except in around 2009 and the mid-1960s when extended periods of very low geomagnetic activity occurred. At 64 • (L shell 5.25), precipitation is driven by high-speed solar wind streams. A more clear solar-cycle variability can be seen with maximum ionization lagging the sunspot maximum by 1-2 years. At 56 • (L shell 3.25), precipitation is mainly driven by coronal mass ejections which lead to more of an event-type behavior. Relatively infrequent ionization peaks are contrasted with long periods of very low ionization. Similar behavior is seen at other altitudes as well (not shown).
In the following, we demonstrate with examples the MEE impact in WACCM simulations. The purpose is to present a proof of concept, i.e., show that the MEE ionization dataset can be used in chemistry-climate modeling and is producing the expected direct effect in the mesosphere. We simulated the 2002-2012 period, including the Ap-driven MEE ionization rates, and analyzed mesospheric OH and ozone responses at 0.040-0.015 hPa (approx. 70-80 km in altitude). This altitude region was selected because of the clear and direct MEE impact seen in satellite observations (e.g., Andersson et al., 2014a, b;Fytterer et al., 2015b;Sinnhuber et al., 2016). WACCM version 4 (see above) was used with 1.9 • × 2.5 • horizontal resolution extending from the surface to 5.9 × 10 −6 hPa (≈ 140 km geometric height) in the specified dynamics mode, nudged to MERRA reanalysis at every dynamics time step below about 50 km. Figure 15a and b shows global differences in yearly median OH mixing ratios due to MEE. Distinct features on the map are the stripes of enhanced values at magnetic latitudes between 55 and 75 • (both hemispheres) which connect through the magnetic field to the outer radiation belt. The impact decreases from 2005 to 2009 due to the decline in geomagnetic activity and MEE precipitation (as shown in Fig. 14). These features are of expected quality and magnitude, and similar to those based on Microwave Limb Sounder (MLS) data analysis (Andersson et al., 2014b). Figure 15c and d show relative differences in wintertime mean ozone due to MEE in the Southern Hemisphere. As expected, ozone is affected at high polar latitudes. In 2009, when MEE precipitation was weak, a maximum of 5-10 % decrease is seen near the South Pole relative to a reference WACCM simulation. In 2005, with much stronger MEE precipitation, the effect reaches up to 10-20 % and covers the whole polar cap above about 60 • latitude. The magnitude of the 2005 response, tens of percent, is comparable to that seen in MLS observations (Andersson et al., 2014a).

The EPP indirect effect: odd nitrogen upper-boundary condition
Those models with their upper lid in the mesosphere, i.e., those which do not represent the entire EPP source region, require an odd nitrogen upper-boundary condition, accounting for EPP productions higher up, in order to allow for simulating the introduced EPP indirect effect in the model domain. Odd nitrogen UBCs have been previously used in CCMs. In some model studies, the UBC was taken directly from NO x observations (e.g., Reddmann et al., 2010;Salmi et al., 2011), which, however, implies the restriction to the relatively short time period spanned by the observations. In other cases, a simple parameterization in dependence of the seasonally averaged Ap index (Baumgaertner et al., 2009) was employed (e.g., Baumgaertner et al., 2011;Rozanov et al., 2012), enabling extended simulations over multidecadal time periods. We recommend the use of the UBC model described in Funke et al. (2016) which is designed for the latter application and represents an improved parameterization due to its more detailed representation of geomagnetic modulations, latitudinal distribution, and seasonal evolution. This semi-empirical model for computing time-dependent global zonal mean NO y concentrations (in units of cm −3 ) or EPP-NO y molecular fluxes (in units of cm −2 s −1 ) at pressure levels within 1-0.01 hPa is available at http://solarisheppa.geomar.de/solarisheppa/cmip6. The UBC model has been trained with the EPP-NO y record inferred from Michelson Interferometer for Passive Atmospheric Sounding (MIPAS) observations (Funke et al., 2014a). Inter-annual variations of the EPP indirect effect at a given time of the winter are related to variations of the EPP source strength, the latter being considered to depend linearly on the Ap index. A finite impulse response approach is employed to describe the impact of vertical transport on this modulation. Interannual variations of the EPP-NO y seasonal dependence, driven by variations of chemical losses and transport patterns, are not considered in the standard mode of the UBC model. Optionally, episodes of accelerated descent associated with elevated stratopause (ES) events in Arctic winters can be considered by means of a dedicated parameterization, taking into account the dependence of the EPP-NO y amounts and fluxes on the event timing (Holt et al., 2013). Although its application is recommended in principle, we note that it requires the implementation of the UBC model into the climate model system since ES events cannot be predicted in free-running model simulations. Further, the ES detection criterion might need to be tuned for each individual model system.
We recommend prescribing NO y concentrations, as this has already been tested successfully in a CCM. As an exam-ple, Fig. 16  ] in order to avoid model artifacts at the upper boundary (primarily triggered by the loss reaction of NO 2 with atomic oxygen). The simplest way to achieve this is to set [NO] = [NO y ] while forcing the concentrations of all other NO y species to be zero. Note that below the vertical domain where NO y is prescribed, MEE ionization still might occur and its consideration (as described before) is recommended. However, its consideration should be strictly limited to this vertical range since at and above the UBC, MEE is already implicitly accounted for by the prescribed NO y from the observationbased UBC model.
The UBC was tested in the EMAC CCM version 2.50 (see also Sect. 2.1.2 and Jöckel et al., 2010) with a T42L90 resolution. NO y concentrations were prescribed as NO in the uppermost four model boxes at pressure levels from 0.09 to 0.01 hPa. There, NO 2 was set to zero to suppress artificial NO 2 buildup. The model was run from 1999 to 2010 in the specified dynamics mode, nudged to ERA-Interim reanalysis data (Dee et al., 2011) below 1 hPa. A special treatment of ES events was disabled in the UBC model and SPEs were not considered. A comparison of polar NO y from EMAC with MIPAS observations is shown in Fig. 17 for 0.1 hPa (just below the prescription altitudes) and 1 hPa. A very good agreement between model predictions and observations is found at 0.1 hPa in both hemispheres, with the exception of periods of large SPEs (October/November 2003) in both hemispheres and ES events (January 2004 and February 2009) in the Northern Hemisphere. At 1 hPa, the agreement is still very good during winter, but EMAC underestimates the summer maximum of NO y slightly. This is also observed in the base model run without employing the UBC (see Fig. 17).
The interannual variation of ozone in the stratosphere and lower mesosphere has been investigated in this model in a similar way to a three-satellite composite (Fytterer et al., 2015a). The ozone difference between austral winters with high and low geomagnetic activity during 2005-2010 is shown in Fig. 18 for 27-day running means relative to the mean of all years. This period has been chosen because of its low SSI variability. Cross-correlations between SSI and particle impact are thus minimized. EMAC results are in excellent agreement with the observations provided in Fytterer et al. (2015a, Fig. 5), showing a clear negative ozone anomaly of 5-10 % moving down from the upper stratosphere to below 10 hPa (∼ 30 km) from July to October. Below, a positive anomaly of smaller amplitude is observed both in EMAC and the three-satellite composite, which might be due to a combination of self-healing, dynamical feedbacks, and chemical feedbacks (NO x -induced chlorine buffering in the processed "ozone hole" area).

Solar protons
Solar eruptive events sometimes result in large fluxes of highenergy solar protons at the Earth, especially near the maximum and declining periods of activity of a solar cycle. This disturbed time, wherein the solar proton flux is generally elevated for a few days, is known as a solar proton event. Solar protons are guided by the Earth's magnetic field and impact both the northern and southern polar-cap regions (> 60 • geomagnetic latitude, e.g., see Jackman and McPeters, 2004). These protons can impact the neutral middle atmosphere (stratosphere and mesosphere) and produce both hydrogen radicals and reactive nitrogen constituents.
Decreases in mesospheric and upper stratospheric ozone are mostly caused by SPE-induced HO x increases, which were predicted to occur over 42 years ago (e.g., see Swider and Keneshea, 1973). Direct measurements of SPE-caused OH and HO 2 enhancements have confirmed these early predictions (e.g., Damiani et al., 2008;Jackman et al., 2011Jackman et al., , 2014. Other observations of increased H 2 O 2  and of chlorine-containing constituents HOCl (an increase; see von Jackman et al., 2008;Damiani et al., 2008Damiani et al., , 2012Funke et al., 2011) and HCl (a decrease; see Winkler et al., 2009;Damiani et al., 2012) support the SPE-caused HO x enhancement theory. Since HO x constituents have relatively short lifetimes (hours), these SPE-enhanced species have only a short-term impact on ozone.
The SPE-induced NO y enhancements, on the other hand, cause a much lengthier reduction in ozone, given their much longer atmospheric lifetime (months) in the stratosphere. SPE-caused NO x increases have been shown in several studies (e.g., McPeters, 1986;Zadorozhny et al., 1992Zadorozhny et al., , 1994Randall et al., 2001;López-Puertas et al., 2005a;Jackman et al., 1995Jackman et al., , 2005bJackman et al., , 2008Jackman et al., , 2011Jackman et al., , 2014Funke et al., 2011;Friederich et al., 2013). Other NO y constituents like HNO 3 , HNO 4 , N 2 O 5 , and ClONO 2 (e.g.,   (Jackman et al., 1990) and IMP 8 was used for the fluxes from 1974 to 1993 (Vitt and Jackman, 1996). The National Oceanic and Atmospheric Administration (NOAA) Geostationary Operational Environmental Satellites (GOES) were used for proton fluxes from 1994 to 2014 (e.g., Jackman et al., 2005aJackman et al., , 2014. Other precipitating particles are associated with SPEs, besides protons. These include alpha particles, which comprise, on average (but this value may vary from event to event) about 10 % of the positively charged solar particles; other ions, which account for less than 1 % of the remainder; and electrons (e.g., Mewaldt et al., 2005). Only solar protons are included in energy deposition computations given in this paper. Please note that other charged particles could add modestly to this energy deposition in the middle atmosphere during SPEs.
The proton fluxes of energies 1-300 MeV were used to compute daily average ion pair production profiles using an energy deposition scheme first discussed in Jackman et al. (1980). The scheme includes the deposition of energy by the protons and assumes 35 eV is required to produce one ion pair (Porter et al., 1976). Note that this approach misses development of the atmospheric cascade (Sect. 2.2.3). This pro-cess, crucial for GCRs, is minor for SPEs in the upper atmosphere but may contribute modestly to the energy deposition in the lower stratosphere.
The dataset for daily average ion pair production rates at 60-90 • geomagnetic latitudes from SPEs was computed over a 52-year time period , when proton flux measurements from satellites were available. A longer-term dataset for these SPE-caused ion pair production rates was created for the 1850-1962 time period using activity levels of the measured sunspots over the solar cycles. SPEs are much more frequent during years of maximum solar activity and vice versa. This longer-term dataset was reconstructed for years 1850-1962 in a random way using solar activity levels combined with the 52-year calculated SPE-caused ion pair production. Thus, an historical record of atmospheric forcing by SPEs in the form of a daily average ion pair production rate is available over the entire period 1850-2014 for use in global models.

Galactic cosmic rays
The Earth's atmosphere is continuously irradiated by galactic cosmic rays, which consist mostly of protons and α-particles with a small amount of heavier fully ionized species up to iron and beyond. These cosmic rays originate from galactic (mostly supernova shocks) and exotic extra-galactic sources and may have an energy up to 10 20 eV but the bulk energy is in the range of several GeV per nucleon. While the GCR flux can be assumed (at timescales shorter than thousands of years) constant and isotropic in the interstellar space, it is subject to strong modulations within the heliosphere (the region of about 200 AU across, hydromagnetically controlled by the solar wind and the heliospheric magnetic field). This modulation is driven by solar magnetic activity -the stronger the solar activity, the lower the GCR flux near the Earth. This flux is often described by the so-called force-field model (Caballero-Lopez and Moraal, 2004) parameterized via the time-variable modulation potential φ and the fixed shape of the local interstellar spectrum (see, e.g., Usoskin et al., 2005, for more details). Typically, the value of the modulation potential is defined by fitting data from the worldwide network of ground-based neutron monitors calibrated to fragmentary space-borne measurements of GCR energy spectra. These data are available since 1951 or, with caveats of using the ground-based ionization chambers, since 1936 .
Before impinging on the Earth's atmosphere, GCR are additionally deflected by the geomagnetic field.
This shielding is usually parameterized in the form of the effective geomagnetic rigidity cutoff, so that only particles with rigidity (momentum per charge) exceeding the cutoff can penetrate to the atmosphere at a given location, while less energetic particles are fully rejected (Cooke et al., 1991). When energetic cosmic rays enter the atmosphere, they initiate a nucleonic-muon-electromagnetic cascade in the atmosphere, ionizing ambient air. A subproduct of this cascade is the production of cosmogenic isotopes such as 14 C, 10 Be, and others. These cosmogenic isotopes are long-lived and can be used for the reconstruction of solar activity over several thousand years (see Sect. 3).
Between the surface and 25-30 km, cosmic rays are the main source of atmospheric ionization (Mironova et al., 2015) causing the production of NO x and HO x . The influence of GCRs on atmospheric chemistry has been investigated in several model studies (Krivolutsky et al., 2002;Rozanov et al., 2012;Mironova et al., 2015;Jackman et al., 2016). GCR-induced ozone reductions of more than 10 % in the tropopause region and up to a few percent in the polar lower stratosphere have been reported. The potential impact on surface climate has been studied by Rozanov et al. (2012).
The process of development of the atmospheric cascade, initiated by energetic cosmic rays, is complicated and needs to be modeled using direct Monte Carlo simulations of all the processes involved in the development of the cascade, including all types of interactions, scattering, and decay of various species. We note that older models based on empirical parameterizations or on solution of Boltzmann-type equations may introduce significant biases in the results, especially in the lower atmosphere. Accordingly, we use a full Monte Carlo model, CRAC:CRII (Usoskin and Kovaltsov, 2006;Usoskin et al., 2010), based on the CORSIKA Monte Carlo package. A similar result can be obtained with the PLANETOCOSMIC (Desorgher et al., 2005) based on the GEANT package. The agreement between the two models has been verified (Usoskin et al., 2009) to be within 10 %. GCR ion pair production rates are provided as a function of the barometric pressure and geomagnetic latitude and were calculated from the modulation potential values φ of the 9400-year long record by Steinhilber et al. (2012). Since this dataset has a 22-year time resolution, it has been interpolated to interannual timescales to resolve individual solar cycles, based on the sunspot numbers (see Fig. 19). One can see that this agrees well with the values of φ reconstructed using data from the worldwide network of neutron monitors . An example of the calculated ionization rate is shown in Fig. 20. The ionization maximizes in polar regions at heights of 15-20 km, while in the equatorial region the maximum of ionization occurs at about 12 km (note that in case of using ionization per cubic centimeter, the ionization maximizes at about 10 km in the equatorial region and 12 km over the poles).

Implementation of chemical changes induced by particle-induced ionization
MEE, SPE, and GCR-induced atmospheric ionization is expressed in the CMIP6 forcing dataset in terms of ion pair production rates (IPRs). Note that IPR data are provided in units of ion pairs per gram per second as a function of the barometric pressure. These units are natural for the ionization processes and are mostly independent of the atmospheric conditions. Conversion into units of ion pairs per cubic centimeters per second (cm −3 s −1 ), by multiplying with mass density, should be done on the model grid ideally at each time step, but at least once per day. Recommendations for the projection of ion pair production rates onto geographic coordinates can be found in Appendix C. Particle-induced ionization causes, along with the generation of the ion pairs, the production of NO x and HO x . As a basic approach, we recommend considering these NO x and HO x productions in CCMs with interactive chemistry by using the parameterizations provided by Porter et al. (1976) and Solomon et al. (1981), respectively. More detailed information about these approaches is provided in Appendix D and E. Recommendations for the implementation of EPP effects on minor species are given in Appendix F.

Future scenarios (2015-2300)
One of the key questions in the CMIP6 project is our ability to assess future climate changes given climate variability, predictability, and uncertainties in scenarios. In CMIP5, climate projections were based on a stationary-Sun scenario, obtained by simply repeating solar cycle 23, which ran from April 1996 to June 2008 (Lean and Rind, 2009). Clearly, such a stationary scenario is not representative of true solar activity, which exhibits cycle-to-cycle variations, and trends. Therefore, in CMIP6 we decided to replace it by a more realistic scenario for future solar activity exhibiting variability at all timescales. As will become clear below, this scenario provides a plausible course of solar activity until 2300, given what has been observed in the past, and does not aim at predicting what the level of solar activity will actually be.
As of today, predicting solar activity up to 2300 is very challenging, if not impossible. Ever since the solar cycle was first observed, people have been trying to predict what future cycles may look like. Prediction methods were empirical, and at best could give some clue of what the amplitude of the next cycle could be (Petrovay, 2010). This situation prevailed until the early 21st century, when physical models of the magnetic dynamo that drives solar activity started unveiling a more realistic picture (Charbonneau, 2010). Many were confident that in a near future one would be able to predict the solar cycle several decades ahead. The unusually long solar cycle number 23 that ended in 2009, and the weak one (no. 24) that followed came as a surprise, and manifested our evident lack of understanding of the solar cycle.
Given the difficulty in predicting solar activity even one cycle ahead (Pesnell, 2012;Cameron et al., 2013), one may wonder whether it even makes sense to consider longer horizons. The solar cycle is driven by the solar dynamo, by which the dynamical interactions of flows and magnetic fields in the solar convection zone lead to periodic reversals of polarity of the solar magnetic field (Charbonneau, 2010). One of its consequences is the emergence of regions with enhanced magnetic field, namely sunspots, whose numbers are the most widely known proxy for solar activity. During that emergence process, the predominantly toroidal magnetic field generates a dipole moment, which in turns generates a new toroidal magnetic component through rotational shearing. Because these inductive processes are operating in the turbulent environment of the solar convection zone, memoryless stochastic forcing of the dynamo is certainly present. Nonetheless, memory effects associated with these periodic reversals play a major role in determining solar variability on multidecadal timescales, and to some degree are decoupled from the short-term variability. This is our prime motivation for attempting to estimate future solar activity on multidecadal timescales as a basis for our scenario construction.
There are two possible approaches for constructing future scenarios of solar activity. One is to learn from solar dynamo models, and the other is to infer from past variations of solar activity. Recent years have witnessed significant advances in solar dynamo modeling, and the development of several physical models . In stochastically forced kinematic dynamo models, persistence in the solar-cycle-averaged level of activity (hereafter "memory") can extend from less than one and up to three cycles, depending on details of the models and of the physical parameter regime in which they operate (e.g., St-Jean and Charbonneau, 2007;Yeates et al., 2008;Cameron et al., 2013;Muñoz-Jaramillo et al., 2013). In nonkinematic dynamo models incorporating the magnetic back reaction on large-scale inductive flows, deterministic modulation of the primary cycle amplitude can be produced, amounting to a form of memory that can extend over tens of activity cycles in a wide range of parameter regimes (e.g., Tobias, 1997;Bushby, 2006). Among models that do succeed in producing deep activity minima similar to the Maunder minimum, most show onsets occurring surprisingly fast, typically within one or two cycles. At present these models are still not detailed enough to warrant their use in producing physics-based forecasts.
The second approach consists of making a probabilistic statement about future solar activity based on present conditions and by learning from past variations of solar activity. The latter are not totally random and exhibit some degree of regularity which can be exploited by means of time series analysis techniques (Brockwell and Davis, 2010), assuming that the statistical behavior of the Sun is invariant on the timescales under consideration. This enables us to build an ensemble of empirical forecasts and define from these what we call in the following a scenario, namely a plausible course of solar activity, based on assumptions about how this activity will develop. Let us stress that our scenarios are meant to provide a reasonable evolution of solar forcing up to 2300: they are forecasts of what could happen, and do not aim at describing what will happen.
We construct two different scenarios for future solar activity: a reference (REF) scenario with a plausible level of solar activity and its variability; an extreme (EXT) scenario with an exceptionally low level of solar activity. This extreme scenario is meant to be used for sensitivity studies.
Two extreme scenarios would have been preferable for bracketing the possible range of future solar variability. However, the enormous computational effort to analyze such sen-sitivity scenarios within CMIP6 makes it necessary to restrict ourselves to one single extreme scenario. There are several reasons why our extreme scenario is a low one. First, the Sun just exited a period of high activity, called grand solar maximum, and several empirical studies suggest that it is likely to be low or moderate in the near future (Abreu et al., 2008;Barnard et al., 2011;Steinhilber et al., 2012). Secondly, empirical studies indicate that grand maxima are more likely to be followed by a grand minimum than by another grand maximum (Inceoglu et al., 2016). Entry into grand minima typically involves extended excursions at the edge of the attractor defining normal cyclic behavior, with associated higher-than-average cycle amplitudes, until collapse to the trivial solution (or transition to another lowamplitude attractor) is triggered; this behavior is known as intermittency, and in this context a grand maximum is usually more likely to be followed by a grand minimum than by another grand maximum (e.g., Passos et al., 2012). And thirdly, when we generated an ensemble of 1000 scenarios with the empirical models to be described below (the different runs were based on different training intervals and model parameters) none of the scenarios constructions or analyses gave rise to a grand maximum within the next century.
The best gauge of past solar variability is the production rate of the 14 C and 10 Be cosmogenic isotopes (Usoskin, 2013), as already described in Sect. 2.2.3. The level of activity is sometimes expressed in terms of the modulation potential , which is intimately related to the open solar magnetic flux. There exist today different records of cosmogenic isotopes, which are gradually improving as new observations are being added, and underlying assumptions, such as the strength of the geomagnetic dipole, are better constrained. Here, we consider the 9400-year-long record by Steinhilber et al. (2012), which is a composite of 14 C and 10 Be data, and is available from http://www.ncdc.noaa.gov/paleo/forcing.html. The record is sampled every 22 years, and runs from 7439 BC to 1977 AD. For constructing the most likely scenario, we want our historic observations to end as close as possible to the present. This record was extended from 1977 to 1999, using the geomagnetic reconstruction of the open solar flux (Lockwood et al., 2014). The geomagnetic reconstruction provides annual values of the open solar flux back to 1845, and we extrapolate the linear regression between the 22-year boxcarsmoothed geomagnetic reconstruction and the cosmogenic reconstruction to provide an estimate of the cosmogenic in 1999. Figure 21 displays the complete modulation potential record, which exhibits occasional periods of low solar activity (i.e., grand solar minima) separated by periods during which the fluctuations seem more erratic. It is noteworthy that the Sun was more active during recent decades (the modern grand maximum) than during most of the other periods (Solanki et al., 2004).
In the following, we consider three independent (and mostly complementary) forecast methods for extending up to 300 years ahead, and use their weighted average as the most likely value. To convert these 22-year averages of into quantities that are relevant for climate forcing, we first convert the 22-year averaged modulation potential into an average sunspot number using the method described in Usoskin et al. (2014). Historic solar cycles are then scaled to match this average sunspot number and are subsequently stitched together to obtain a future sunspot record. Using the latter, we estimate the SSI, and particle forcing, as described in Sect. 3.5.
The solar-cycle averaged modulation potential (and the sunspot number) cannot be meaningfully predicted more than a few solar cycles ahead (e.g., Kremliovsky, 1995;Petrovay, 2010). Thus, whatever is discussed further is only a plausible scenario that is not pretending to be a prediction with any degree of confidence.

Statistical methods
Here we construct the REF and EXT solar activity scenarios by applying three empirical time series techniques to the heliospheric modulation potential record produced by Steinhilber et al. (2012). The reason for choosing only three techniques out of many is motivated by our desire to build an ensemble of reasonable scenarios that involve different assumptions. We consider these three techniques to reflect a fair range of possibilities for the future evolution of the heliospheric modulation potential, and it would be impractical to include an exhaustive set of techniques. The ones we consider are widely used in different contexts (Brockwell and Davis, 2010). The first one (analogue forecast, AF) is nonparametric and does not make any assumptions on linearity, the second one (autoregressive (AR) model) is parametric and linear, and the third one (harmonic model, HM) is parametric too, but can handle nonlinear systems. Below we describe each of them, before detailing how the two scenarios were constructed.

Analogue forecast
The analogue forecast is calculated with a simple nonparametric technique, known across disciplines by various names including compositing, superposed-epoch analysis, conditional sampling, and Chree analysis. In a data sequence that exhibits a low-amplitude response to a specific trigger event, the response may be obscured by sources of random variability. The AF technique aims to reveal the response to a specific trigger event by averaging the responses to many occurrences of the trigger event, such that over many events random variability will be suppressed and the response will emerge (Laken andČalogović, 2013). Barnard et al. (2011) used this technique with the Steinhilber et al. (2012) record to estimate the possible future evolution given the expected decline from the grand solar maximum that persisted through the late 20th century. Here we perform an updated version of the procedure employed by Barnard et al. (2011).
Defining grand solar maxima in the record as any period above the 90th percentile of the distribution (462 MV) identifies 23 grand solar maxima in the record prior to the most recent one. Here the declines from the grand solar maxima are used as the event triggers from which the AF is calculated, and these times are marked on the time series shown in Fig. 21 by red stars. Figure 21 also shows that the most recent values in the record have not yet fallen below the grand solar maxima threshold. Therefore, as the end date of the most recent grand solar maxima is not known, it must be estimated, to provide a date from which the AF applies. The

Autoregressive model
Autoregressive models are widely used in time series analysis (Box et al., 2015). These linear parametric models assume that variations can be described by means of a linear stochastic difference equation, so that future values are expressed as a linear combination of present and past values. In our context, we havê k+h = a 1 k−1 + a 2 k−2 + . . . + a p k−p , where k is the heliospheric modulation potential (after subtracting its time average) at the kth time step, andˆ k+h is its value predicted h ≥ 0 time steps ahead. Since k is measured with a cadence of 22 years, each value of k corresponds to a 22-year time step. AR models are capable of describing a variety of dynamical behavior, including oscillations, red noise, etc. The main free parameter is the model order p, for which there exist several selection criteria (Ljung, 1997). In our case, we obtain p = 20. According to this value, our scenarios are based on observations that go back at most 440 years into the past. Because AR models are linear, they cannot properly describe nonlinear dynamical effects such as the occasional occurrence of grand solar minima, which appear as a different mode of solar activity (Usoskin et al., 2014). To partly overcome this limitation, we train the model by considering time intervals whose conditions are similar to those prevailing at the end of the 20th century. More specifically, we train the model by using only observations that belong to either of the 23 time intervals [t GSM −1100 years, t GSM +1100 years] that are centered on the same occurrences, t GSM , of the 23 grand solar maxima as in the analogue forecast (see Sect. 3.1.1 and Fig. 21). We exclude observations that follow t GSM by up to 300 years in order to give us a means for testing the predic- tion on a time interval that is (mostly) independent of the one the model has been trained on. The only exception in this list is the last grand solar maximum of the late 20th century, for which we do not have future observations available.
The 2200-year duration of the time interval is the shortest one, below which the performance of the AR model starts degrading. This independence of the intervals on which the model is trained and then tested (called cross-validation; Hastie et al., 2009) is essential for it and allows the testing of the performance of the model and define confidence intervals that truly reflect the difference between the constructed and actual course of solar activity.
Using AR models, we now construct the heliospheric potential 22, 44, . . . , 308 years ahead by training a different model for each value of the scenario horizon h in Eq. (1). The error, which is the usual metric for describing the performance of the scenario construction, is classically defined as where the ensemble average . . . runs over all 23 grand solar maxima. Clearly, the AR model can be improved in several ways. One of them consists of modeling the full record of the heliospheric potential and using threshold AR models to account for mode changes. These issues will be addressed in a forthcoming publication.

Harmonic model
Several studies have reported the existence of periodicities in cosmogenic solar proxies, with outstanding periods of approximately 87 years (known as the Gleissberg cycle), 208 years (de Vries cycle), 350 years, and more (McCracken et al., 2013). The origin of these elusive periodicities has been hotly debated, and is beyond the scope of our study. Steinhilber and Beer (2013) successfully used them to model solar activity on multidecadal timescales, and produced a 500-year extension of the heliospheric potential. We consider the same approach, and thus assume that the dynamical evolution of the heliospheric potential obeys a deterministic model.
We parameterize and train this harmonic model in a way that is similar to the preceding AR model. First, we select 2600year intervals that are centered on the timings, t GSM , of each of the 23 grand solar maxima, and exclude the 300 years that follow each t GSM . In contrast to the AR model, however, we estimate the model coefficients separately for each interval in order to account for possible phase drifts. To select the periods T , and reduce their number, we start from an initial set of N = 19 periods of less than 2200 years, taken either from McCracken et al. (2013) or obtained from spectral analysis. We then estimate the error of this method after discarding one period at a time, only keeping those that do not lead to a significant increase of the error. Finally, we end up with a set of 12 periods of {88, 105, 130, 150, 197, 208, 233, 285, 353, 509, 718, 974} years. Likewise, the error is used to fix the 2600-year duration of the intervals. Longer intervals give a better statistic, but result in a poorer fit because of possible phase drifts in short-period oscillations. Figure 22a shows the results of the AF, AR, and HM methods, as well as the observed record from 1845 to 1999. All three methods reveal a decrease in solar activity until approximately 2100. In the HM model, oscillations with largest amplitudes occur, on average, at 88, 208, and 285 years, and so periodicities are clearly present in the HM. In contrast, scenario constructions obtained from the AF and AR models tend to converge toward a climatological mean.

Errors of statistical methods
The error of each method was assessed with a bootstrap approach. Defining grand solar maxima as any period in the record larger than the 90th percentile of the distribution, there are 23 other grand solar maxima in the record prior to the one that persisted through the late 20th century. For each method, hindcasts were made for the 308 years following the decline from each prior grand solar maximum. For each method and each grand solar maximum, the models were trained analogously to the descriptions above, such that no data from within the prediction window are used to generate each hindcast. The typical error in each method as a function of prediction horizon was then calculated as the root mean square of the error of the 23 hindcasts at each prediction horizon.
Although not used in the scenario construction, a simple persistence forecast and the corresponding error was also calculated, to serve as a benchmark to compare the AF, AR, and HM methods against. The typical error as a function of prediction horizon for the AF, AR, HM, and persistence (PS) methods is shown in Fig. 22c. The AF, AR, and HM methods have similar error levels and each quickly outperform the simple persistence model. For most of the prediction window, the AR method shows the lowest error, although the error in the HM decreases near a prediction horizon of 220 years, arguably due to the strength of the de Vries cycle, a 208-year periodicity observed in the power spectrum of the record, and an important component of the HM model.

Scenario construction
The REF scenario was calculated as the weighted average of the AF, AR, and HM results for the current grand solar maximum, where the errors shown in Fig. 22c were used as the weightings. Here again, the maximum likelihood estimate of the average scenario is obtained simply by making a weighted average of the AF, AR, and HM results. The REF scenario can thus be considered as a reasonable description of what future solar activity could be, without claiming to be an actual prediction of solar activity. We used a different approach to calculate the EXT scenario: the AF, AR, and HM methods were used to generate hindcasts of the 23 prior grand solar maxima, also for a 308-year prediction window. The extreme scenario was then calculated as the 5th percentile of the 3 × 23 hindcasts at each prediction horizon. The REF and EXT scenarios are shown in Fig. 22b. Let us stress again that EXT scenario is meant to be used primarily for sensitivity studies, in contrast to the REF scenario, which is the reference one. Figure 22b shows that both scenarios start with a phase of low solar activity, which extends from approximately 2050 to 2110. In the reference scenario, the deepest level is comparable to the Gleisberg minimum that occurred in the late 19th century, whereas in the extreme scenario, it is considerably deeper, and reaches a Maunder-type minimum. The extreme scenario lingers in that state, whereas the reference one recovers to a climatological mean that is comparable to levels observed during the 1st half of the 20th century. Let us stress that none of the constructed scenarios exhibits a grand solar maximum similar to the one that just ended.

Future solar-cycle definition and scaling procedure
Future cycles are constructed from historical cycles by projecting them into the future. The average solar activity level of the projected historical cycles was thereby scaled in accordance with the predicted activity level of the scenarios. Solar activity variations on timescales shorter than a solar cycle are hence preserved. This strategy ensures consistency between the different types of radiative and particle forcing on all timescales also in the future. The historical cycles used for projection into the future are listed in Table G1 of Appendix G. We assume a linear dependence of the 22-year average sunspot number <SSN> 22 on for the scaling of future solar cycles: The coefficients of Eq. (4) have been obtained from a regression fit, based on SSN and in the time period 1768-2010 (see Fig. 23). We use international sunspot number version 1.0, because most SSI models rely on that version (see Sect. 2.1.1). The resulting SSN time series of both future scenarios have then been used to calculate the SSI with the SATIRE and NRLSSI2 models with annual time resolution. As for the historical CMIP6 dataset, we took for each scenario the arithmetic mean of the two model results. SSI variations on shorter timescales are taken from the corresponding past solar cycles, and are scaled to a comparable cycle-average level of activity by means of a dedicated scaling procedure, as described in Appendix H. F10.7 radio flux data have been constructed from the resulting future SSI record as described in Sect. 2.1.2.
A similar approach has also been chosen for the future particle forcing. Magnetospheric particle forcing (Sect. 2.2.1) relies on the geomagnetic indices Ap and Kp, being closely related to sunspot number on decadal timescales (e.g., Cliver et al., 1998). The scaling of these indices in past solar cycles into the future on the basis of <SSN> is described in Appendix I. The 2015-2300 Ap time series we obtained have then been used to calculate MEE ionization rates for the REF and EXT scenarios. Similarly, odd nitrogen upper-boundary conditions for the consideration of the EPP indirect effect in climate models with their upper lid in the mesosphere can be computed on the basis of the future Ap index with the recommended UBC model . Future GCR-induced ionization (see Sect. 2.2.3) is calculated from the of the respective scenarios and interpolated to interannual timescales by using the future SSN time series. The proton forcing of past solar cycles (Sect. 2.2.2) has also been projected into the future; however, no scaling of the proton ionization in dependence of the future cycles' activity level has been made. This is primarily motivated by the lack of knowledge on long-term variations of proton fluxes, related to the short availability of observational records (since 1962).

Solar forcing in future scenarios
As mentioned before, we provide two scenarios of future solar activity: the reference one is based on the most likely evolution of solar activity from 2015 to 2300, while the extreme one corresponds to the lower 5th percentile of all forecasts. We first forecast the modulation potential with the three approaches described in Sect. 3.1.1 to 3.1.3, and then define the reference scenario as their average, weighted by their inverse forecast error (Eq. 2). Note that the analogue forecast, and to a lesser degree, the AR forecast tend to converge toward a climatological mean, whereas the harmonic forecast keeps on oscillating. Because of that, our forecasts are likely to exhibit somewhat less variability than the observed . Figures 24 and 25 present an overview of the entire daily CMIP6 solar forcing file from 1850 through 2300, respectively for the reference and extreme scenarios. Both show the TSI, the F10.7 solar radio flux, which is a good proxy for Lyman-α line, and three different SSI wavelength ranges in the UV, VIS, and NIR. Also shown are the Ap index as a proxy for auroral electron precipitation, and the ionization rates due to solar protons and galactic cosmic rays. In Fig. 25, MEE instead of proton ionization rates are shown, as the latter are identical in both scenarios.
As explained in Appendix G, our scenarios are built out of past solar cycles; therefore, both the solar cycles and their daily variations are consistent with the average level of heliospheric potential. In this sense, the future scenarios for CMIP6 are much more realistic than the stationary Sun scenario that went into CMIP5.

Preindustrial control forcing
For the PI control experiment, we recommend using one constant (solar-cycle-averaged) value for the TSI and SSI spectrum representative for 1850 conditions (Fig. 26). The average in TSI, SSI, Ap, Kp, and F10.7, as well as the ion-pair production rate by GCRs covers the time period from 1 January 1850 to 28 January 1873, which is two full solar cycles. For the ion-pair production rates by SPEs and MEEs, median values representative for the background are provided in order to avoid the occurrence of large sporadic events in the PI control experiment. As usual the PI control run is supposed to provide an estimate of the unforced climate system to understand internal model variability. It is also used for detection and attribution studies to disentangle contributions from different natural and anthropogenic forcings (some of which include a long-term trend, such as GHGs, aerosols, or solar forcing).
For those groups that are interested, we also provide a 1000-year solar forcing time series with 11-year solar-cycle variability included but without long-term trends (Fig. 26). This time series still has slightly different solar-cycle amplitudes and also preserves the variable phase of the solar cycle; however, the solar-cycle mean activity level is held constant compared to the reference scenario in Fig. 24. By running a second PI control experiment with solar-cycle variability, this provides one additional periodic forcing on top of the seasonal cycle. Since the PI control is also used to determine model variability at decadal timescales, including a solar cycle would certainly change the mean climate and the variance of the control experiment compared to the "standard" control experiment with constant 1850 solar forcing. However, not including the solar-cycle variability may underestimate the variance of the climate system and may lead to climate sys-tem biases. Ideally the groups would do two PI control experiments: one with and one without solar-cycle variability.
Note that the variable PI control dataset is meant solely for sensitivity experiments in order to understand physical mechanisms for internal natural climate variability such as a potential synchronization of North Atlantic climate variability by the 11-year solar cycle (Thieblemont et al., 2015) in the atmosphere-ocean system. It purposely avoids any long-term trend in solar activity, and should therefore not be used for historical model simulations and/or solar forcing reconstructions. More realistic solar forcing time series for the past 1000 years are provided within the PMIP (Paleoclimate Modeling Intercomparison Project) exercise (Kageyama et al., 2017), and the solar forcing is described in more detail in Jungclaus et al. (2016).
The radiative part of the variable PI control forcing has been generated by scaling the annual and subannual components of the REF forcing dataset to a constant solar-cycle mean activity level. The scaling procedure for SSI and F10.7 is described in Appendix H. Constant background compo- Figure 27. A preliminary estimate of the fractional (%) monthly solar-ozone response per 130 units of the F10.7 solar flux in the CMIP6 ozone database for the period 1960-2011 diagnosed using the ozone mixing ratio files downloaded from http://esgf-node.llnl.gov/projects/input4mips (vmro3_input4MIPs_ozone_CMIP_UReading-CCMI-1-0_gr_195001-199912.nc and vmro3_input4MIPs_ozone_CMIP_UReading-CCMI-1-0_gr_200001-201412.nc). The hatching denotes regions where the solar-ozone response, diagnosed using multiple regression analysis, is found to be not significantly different from zero at the 95 % confidence level. Please note that the CMIP6 ozone database for the future simulations was not released at the time of writing. Adapted from Maycock et al. (2017). nents have been added. These have been adjusted such that the mean values of the resulting SSI and F10.7 time series are consistent with the constant PI control forcing. In order to enhance the contrast for this sensitivity experiment, we scale the annual and subannual components of SSI at wavelengths greater than 115 nm to the mean activity level of solar cycles 18-22 (grand solar maximum) rather than to 1850-1873 conditions. For the EUV channels (10-115 nm) and for F10.7, such an enhanced mean activity level would result in unreasonably low background values (due to the adjustment to the constant PI control forcing). Therefore, the annual and subannual components of these quantities were scaled to 1850-1873 conditions.
Similarly, the particle part of the variable PI control forcing has been generated by scaling the annual and subannual components of the REF forcing dataset to the 1850-1873 mean activity level. The scaling of geomagnetic Ap and Kp indices is described in Appendix I. MEE-induced ion-pair production rates for the variable PI control forcing have then been calculated from the scaled Ap data. GCRinduced ion-pair production rates have been calculated using a constant value of representative of the 1850-1873 period. Solar-cycle variations have been added, however, scaled to the 1850-1873 mean activity level. The variable PI control proton forcing is identical to the REF forcing since it does not include any long-term trend. Note that the temporal averages of SSI, TSI, F10.7, Ap, and Kp, as well as GCR-induced ionpair production rates are fully consistent with the values provided in the constant PI control forcing dataset. This, however, is not the case for the proton and MEE forcings, which, in the latter case, do not account for large, sporadic events.
The variable PI control dataset (see Fig. 26) covers the time period from 1 January 1850 until 9 September 2053 (end of solar cycle 27). The dataset can be extended to cover 1000 years by multiple repetition of the solar cycle sequence 12-27. The first 450 years of the resulting forcing time se-ries are consistent in solar-cycle phase and short-term fluctuations with the REF and EXT datasets. Solar forcing only experiments based on variable PI control, REF, and EXT forcing data would therefore be ideally suited to address the impact of long-term solar activity variations on the climate system.

Solar-cycle signal in stratospheric ozone
The climate response to solar variability depends not only on the "direct" impact of changes in TSI and SSI on atmospheric and surface heating rates, but also on the "indirect" effects on stratospheric and mesospheric ozone abundances (e.g., Haigh, 1994). In some regions, the associated solar-ozone response can contribute to more than 50 % of the change in stratospheric heating rates between solar-cycle minimum and maximum (Shibata and Kodera, 2005;Gray et al., 2009). It is therefore important to include the solar-ozone response in global model simulations in order to capture the total impact of solar variability on climate.
In reality, the "direct" and "indirect" parts of the atmospheric heating are highly coupled, since they reflect the same fundamental process (i.e., absorption of solar photons by molecules). In (chemistry-)climate models, the effects of these processes on atmospheric heating rates and temperatures are included in models as a result of variations in the ozone field and the specified values of TSI and SSI (see Table 3). The ozone field in a model can be produced by an interactive photochemical scheme, as presented above for CESM1(WACCM) and EMAC, or it can be externally prescribed in models that do not have a chemistry scheme. Models with a chemistry scheme must adequately represent SSI variability in their photolysis schemes (e.g., in the UV part of the spectrum) to simulate a realistic solar-ozone response. As described above, variations in EPP also affect stratospheric and mesospheric ozone abundances. These effects will be implicitly captured in CCMs with the capability of prescribing EPP and/or their effects on chemical processes (e.g., NO x ).
Several CMIP5 models included a stratospheric chemical scheme (Hood et al., 2015), and it seems likely that more models will have this capability in CMIP6. However, there will be CMIP6 models that do not include chemistry but which resolve the stratosphere and specify SSI, and thus have some of the major ingredients for simulating a topdown pathway for solar-climate coupling (Mitchell et al., 2015b). For these models, the simulated climate response to solar variability will partly depend on the representation of the solar-ozone response in their prescribed ozone field.
CMIP5 models without chemistry were recommended to use the SPARC/AC&C ozone database (Cionni et al., 2011). The historical part of this dataset for the stratosphere provided monthly and zonal mean ozone concentrations based on a multiple regression analysis of measurements from the Stratospheric Aerosol and Gas Experiment (SAGE) satellite instruments. The regression coefficients for various key drivers (e.g., ODS, GHG, solar forcing) were used to reconstruct stratospheric ozone values back to 1850 as a function of latitude and pressure. The historical part of the CMIP5 ozone dataset therefore implicitly included a solar-ozone response derived from satellite observations. However, uncertainties in the amplitude of the solar-ozone response in SAGE II measurements, which cover only around two solar cycles, have been recently documented by Maycock et al. (2016). It is therefore important to document the representation of the solar-ozone response in the WCRP/SPARC Chemistry Climate Model Initiative (CCMI) ozone database for CMIP6, which has been constructed using existing CCM simulations. At the time of writing, the paper describing the CMIP6 ozone database and its construction has not been published; however, for illustrative purposes, Fig. 27 shows the monthly-mean fractional solar-ozone response per 130 units of the F10.7 solar flux that has been diagnosed using a multiple linear regression analysis (see Maycock et al., 2016) for the period 1960-2011 from the CMIP6 historical ozone files (files: vmro3_input4MIPs_ozone_CMIP_UReading-CCMI-1-0_gr_195001-199912.nc and vmro3_input4MIPs_ozone _CMIP_UReading-CCMI-1-0_gr_200001-201412.nc) downloaded from input4MIPs (https://esgf-node.llnl.gov/ projects/input4mips/). At the time of writing, the ozone files for the CMIP6 future simulations have not been released on the input4MIPs server, and hence we cannot provide information about how the solar-ozone response is represented in the ozone field for the future period. Figure 27 shows a solar-ozone response of up to ∼ 2 % in the tropical mid-stratosphere, which peaks at ∼ 5 hPa. Since the peak amplitude of the solar-ozone response in Fig. 27 is considerably smaller, and exhibits a different vertical structure compared to the CMIP5 ozone database (Maycock et al., 2017), we anticipate that the peak magnitude of the stratospheric temperature response over the solar cycle may also be smaller in models using the CMIP6 ozone database (Maycock et al., 2017).
We recommend that CMIP6 models without interactive chemistry use the SSI forcing dataset described above and the recommended CMIP6 ozone database to ensure consistency in the representation of the solar-cycle forcing across models. If CMIP6 models opt to use an alternative ozone dataset containing a different representation of the solar-ozone response, it would be very valuable for this to be documented by modeling groups, so that differences in simulated responses to solar forcing might be better understood (Mitchell et al., 2015b).

Conclusions
This paper provides a comprehensive description of the recommended solar forcing dataset for CMIP6. The dataset con-sists of time series covering 1850-2300 of solar radiative (TSI, SSI, F10.7) and particle (Ap, Kp, ionization rates due to SPEs, MEEs, and GCRs) forcings. This is the first time that solar-driven particle forcing has been included as part of the CMIP recommendation and represents a new capability for CMIP6. TSI and SSI time series for the historical period are defined as averages of two solar irradiance models: the empirical NRLTSI2-NRLSSI2 model and the semi-empirical SATIRE model, which have been adapted to CMIP6 needs as described above. Since this represents a change from the CMIP5 recommended NRLTSI1 and NRLSSI1 dataset, this paper places special emphasis on the comparison between the radiative properties of the CMIP5 and CMIP6 solar forcing recommendations. The solar forcing components are provided separately at daily and monthly resolutions for the historical simulations, i.e., 1850-2014; for the future period, i.e., 2015-2300, including an additional extreme Maunder Minimum-like sensitivity scenario; and as constant and timedependent variants for the preindustrial control simulation. The particle forcing is only included in the daily resolution files. The dataset as well as a metadata description and a number of tools to convert and implement the solar forcing data can be found here: http://solarisheppa.geomar.de/cmip6. In the following we summarize the key features of the CMIP6 solar forcing dataset in comparison to CMIP5 that provide the reader with an overview without reading the paper in detail.
6.1 Radiative forcing -A new and lower TSI value is recommended: the contemporary solar-cycle average is now 1361.0 ± 0.5 W m −2 (Prša et al., 2016).
-Over the last three solar cycles in the satellite era there is a slight negative TSI trend in the CMIP6 dataset. A recent reconstruction of the TSI, with a proper estimation of its uncertainties, suggests that this downward trend between the solar minima of 1986 and 2009 is not statistically significant . The TSI trend leads to an estimated radiative forcing on a global scale of −0.04 W m −2 , which is small in comparison with other forcings over this period.
-The new CMIP6 SSI dataset is the arithmetic mean of the empirical NRLSSI2/TSI2 and the semi-empirical SATIRE irradiance models and covers wavelengths from 10 to 10 000 nm. Note that the SATIRE and the NRLSSI2/TSI2 datasets are CMIP6-adapted and not the original datasets. While SATIRE uses the annually averaged international sunspot number (v1) and applies daily variability from the SATIRE-T model, NRLSSI2/TSI2 uses the daily group sunspot number. This leads to different long-term trends in the presatellite era (see Sect. 2.1.1).
-The CMIP6 SSI dataset agrees very well with available satellite measurements (i.e., those which are used for building the SOLID observational composite) in the contribution of solar-cycle variability to TSI in the 120-200 nm wavelength range. In the 200-400 nm range, which is also important for ozone photochemistry, the CMIP6 dataset shows a larger contribution to solarcycle TSI variability than in CMIP5 (50 % compared to 35 %). However, there is a lack of accurate satellite measurements to validate variations in this spectral region. In the VIS part of the spectrum, the CMIP6 dataset shows smaller solar-cycle variability than in CMIP5 (25 % compared to 40 %). In the NIR, the CMIP6 dataset shows slightly larger variability than in CMIP5.
The implications of the differences in the spectral characteristics of SSI between CMIP5 and CMIP6 for climatological and solar-cycle variability in atmospheric heating rates and ozone photochemistry have been tested using two stateof-the art CCMs, EMAC and CESM1(WACCM), and a lineby-line radiative transfer model, libradtran.
-When comparing differences in annual mean climatologies under perpetual solar-minimum conditions, the CMIP6-SSI irradiances lead to lower SW heating rates (−0.35 K day −1 at the stratopause), cooler stratospheric temperatures (−1.5 K in the upper stratosphere), lower ozone abundances in the lower stratosphere (−3 %), and higher ozone abundances (+1.5 %) in the upper stratosphere and lower mesosphere, compared to the CMIP5-SSI irradiances. These radiative effects lead to a weakening of the meridional temperature gradient between the tropics and high latitudes and hence to a statistically significant weakening of the stratospheric polar night jet in early winter.
-The differences in irradiances between 11-year solarcycle maximum and minimum in the CMIP6-SSI dataset result in increases in SW heating rates (+0.2 K day −1 at the stratopause), temperatures (∼ 1 K at the stratopause), and ozone (+2.5 % in the upper stratosphere) in the tropical upper stratosphere. These direct radiative effects lead to a strengthening of the meridional temperature gradient between the tropics and high latitudes and a statistically significant strengthening of the stratospheric polar night jet in early winter, which propagates poleward and downward during mid-winter and affects tropospheric weather, with a positive Arctic Oscillation signal in late winter. This regional surface climate response is similar and statistically significant in both CCMs. The CMIP6-SSI irradiances lead to slightly enhanced solar-cycle signals in SW heating rates, temperatures, and ozone, compared to the CMIP5-SSI. However, the differences in the 11year solar-cycle signals between the two SSI datasets are generally not statistically significant and are smaller than the differences in climatological conditions between CMIP6 and CMIP5 described above.
6.2 Particle forcing -The reconstruction of geomagnetic Ap and Kp indices backwards in time (starting in 1850) enabled a consistent historical dataset of geomagnetic particle forcing to be created in order to capture the atmospheric impact of precipitating auroral and radiation belt electrons. Regarding the latter, we employed a novel precipitation model for mid-energy electrons, based on the Ap index. Computed MEE ionization rates have been successfully tested in the (CESM1)WACCM model. To capture the effects of polar winter descent of EPPgenerated NO x in climate models that have an upper lid in the mesosphere (i.e., below the EPP source region), recommendations for the implementation of an odd nitrogen upper-boundary condition are provided.
The UBC has been successfully tested in the EMAC model by comparison to observations. Inclusion of the CMIP6-recommended magnetospheric particle forcing in climate model simulations significantly improves the agreement with observed NO x , HO x , and ozone distributions in the polar stratosphere and mesosphere.
-Solar proton and galactic cosmic ray forcings have been built from well-established datasets that have been used in many atmospheric model studies. However, observed proton fluxes are only available since 1963. Therefore, prior to this date the proton forcing included in our dataset is fictitious, although it broadly captures the expected variation in overall strength and distribution throughout the 11-year solar cycle.
-In most cases, particle forcing can be expressed in terms of ion pair production rates. We have provided detailed recommendations for their implementation into atmospheric chemistry schemes.
-CMIP6 model simulations utilizing the recommended particle forcing for the historical period (1850-2014) will enable an assessment of the potential long-term effects of solar particles on the atmosphere and climate as planned in upcoming coordinated WCRP/SPARC SOLARIS-HEPPA studies.

Future forcing scenarios
In CMIP5, future solar irradiances assumed no long-term changes in the Sun and were obtained by simply repeating solar cycle 23 into the future. In CMIP6, we include a more realistic evolution for future solar forcing based on the weighted average of three statistical models constrained by past long-term solar proxy data; this shows a moderate decrease to a Gleissberg-type level of solar activity until 2100 for the REF scenario. We ignore scenarios with high levels of solar activity because the Sun just left such an episode (called a grand solar maximum), and several studies suggest that it is very unlikely to return to it in the next 300 years. In addition, we provide an EXT scenario for the future that can be used for sensitivity studies, which includes an evolution to an exceptionally low level of solar activity during the 21st century similar to that estimated for the Maunder Minimum.

PI control forcing
For the PI control experiment, we recommend using one constant (solar-cycle-averaged) value for the TSI and SSI spectrum representative of 1850 conditions. The average values in TSI, SSI, Ap, Kp, F10.7, as well as the ion-pair production rate by GCRs are derived from the time period 1 January 1850 to 28 January 1873, which is two full solar cycles. For the ion-pair production rates by SPEs and MEEs, median values representative of background conditions are provided in order to avoid the occurrence of large sporadic events in the PI control experiment. We also provide a second PI control forcing time series that includes variations in solar forcing on timescales of the 11-year solar cycle and shorter, but without any long-term trend. This time series contains some variation in 11-year solar-cycle amplitude, and also preserves the variable phase of the solar cycle; however, the mean level of solar activity is held constant. The PI control experiment with solarcycle variability included may better reproduce decadal scale climate variability. Ideally CMIP6 modeling groups will run two PI control experiments: one with and one without solarcycle variability.

Solar ozone forcing
CMIP6 models with interactive chemistry are recommended to include a consistent prescription of the CMIP6-SSI variations in their radiation and photolysis schemes, so that they include an internally consistent representation of the solarozone response. Climate models that do not calculate ozone interactively are recommended to use the SPARC/CCMI ozone database for CMIP6, which has been constructed from existing CCM simulations (https://esgf-node.llnl.gov/ projects/input4mips/). This differs from the representation of the solar-ozone response in the CMIP5 ozone database, which was based on satellite ozone measurements (Cionni et al., 2011). Multiple linear regression analysis of the CMIP6 ozone database over the period 1960-2011 reveals that an 11-year solar-cycle ozone response is implicitly included in the dataset and resembles previous results from CCM studies. An analysis of the representation of particleinduced ozone anomalies in the CMIP6 ozone database has not been performed. CMIP6 models that include both CMIP6-SSI and solar induced-ozone variations are expected to show a better representation of solar climate variability compared to models that exclude the solar-ozone response.
Data availability. The CMIP6 solar forcing dataset described in this paper and the metadata description have been published at http://solarisheppa.geomar.de/cmip6 and linked to the Earth System Grid Federation (ESGF), https://esgf-node.llnl.gov/projects/ input4mips/, with version control and digital objective identifiers (DOIs) assigned. An overview of the CMIP6 Special Issue can be found in Eyring et al. (2016).

K. Matthes et al.: CMIP6 solar forcing Appendix A: Reconstruction of the SSI in the EUV
The EUV spectrum is a complex mix of spectral lines and continua that are associated with different elements, and each have their specific variability in time. In spite of this, there is strong observational evidence for the spectral variability in the EUV having remarkably few degrees of freedom (Amblard et al., 2008). We make use of this property to reconstruct the EUV at wavelengths shorter than 105 nm (in a range that is not covered by NRLSSI2 and SATIRE) as a function of the SSI provided by these same models between 115.5 and 123.5 nm (i.e., including the intense Lymanα line).
Let (λ, t) denote the logarithm of the SSI in the EUV, and¯ (t) its value averaged between 115.5 and 123.5 nm. The logarithm is used mainly to guarantee that the SSI, whose amplitude spans several orders of magnitude, never goes negative. The ratio between solar-cycle amplitude and short-term variability is strongly wavelength-dependent. For that reason, the SSI is frequently decomposed into shortand long-timescale terms, with a cutoff around 81 days (e.g., Woods et al., 2005). Let the 81-day average be 81 . Our empirical model then reduces to The model coefficients A(λ), B(λ), and C(λ) are estimated from 9 years of EUV observations by the TIMED/SEE instrument , spanning the period from February 2002 to February 2011. This simple model matches the SOLID observational composite well within its uncertainty range. Note, however, that TIMED/SEE data below 28 nm, and between 115 and 129 nm are partly modeled, and thus the variability of the EUV spectrum prior to the satellite era should be considered with great care.

Appendix B: Recommendations for model implementation of SSI
The SSI dataset recommended for CMIP6 covers the solar spectrum from 10 to 100 000 nm. It is provided as irradiance averages for 3890 spectral bins (in W m −2 nm −1 ). Sampling and equivalently bin width range from 1 nm (UV and VIS) to 50 nm (NIR). Table B1 contains more details regarding the resolution changing with wavelength.
Most climate models prescribe either TSI or SSI in their respective radiation schemes. If the model's radiation code is able to handle spectrally resolved solar irradiance changes, SSI needs be integrated over the specific wavelength bands to generate top-of-the-atmosphere fluxes. When using CMIP6-SSI, this is done for a given spectral band by simply summing up the irradiances of all (partially) contained bins, each multiplied by the bin width (subtracting potential bin Table B1. Sampling and bin width of CMIP6-SSI. Spectral range Sampling/bin width 10-750 nm 1 nm 750-5000 nm 5 nm 5000-10 000 nm 10 nm 10 000-100 000 nm 50 nm parts that reach beyond the boundaries of the target band). A sample routine for the integration can be found here: http: //solarisheppa.geomar.de/cmip6. Climate models that calculate ozone interactively also have to integrate SSI to the respective wavelength bands in their photolysis code. An example of the numbers of bands in the radiation and photolysis schemes of two state-of-the-art CCMs, WACCM and EMAC, is shown in Sect. 2.1.3.

Appendix C: Recommendations for geographic projection of IPR data
In order to characterize the electron precipitation into the atmosphere since 1850, we must take into account the offset between geographic and magnetic field coordinates, and how the relationship between them changes with time. We recommend the following approach. For years 1850-1900, the gufm1 model may be used (Jackson et al., 2000). Note that this model would allow calculations earlier in time (1590). From 1900 to 2015 magnetic field conversions should use the current IGRF, which at the time of writing is IGRF-12 (Thébault, 2015). It is highly likely that the magnetic field will continue to evolve in the future, and as such we do not recommend fixing the field in any set configuration based on a specific year. Physics-based simulations are now providing representations of future geomagnetic field changes. For the years 2015-2115, Gauss coefficients based on the predicted evolution of the geodynamo can be used from the modeling of Aubert (2015). For years after 2115, secular variation values from 2115 (also from the Aubert, 2015 model) are used to extrapolate forward in time, though obviously with increasing uncertainty. MATLAB modeling code implementing the above recommendations is available on the SOLARIS-HEPPA CMIP6 website, which allows users to calculate the geomagnetic latitude for any given date, geographic location and altitude for the period 1590 onwards.
Appendix D: NO x production by particle-induced ionization Following Porter et al. (1976) it is assumed that 1.25 N atoms are produced per ion pair. This study also further divided the proton impact of N atom production between the ground state N( 4 S) (45 % or 0.55 per ion pair) and the excited state N( 2 D) (55 % or 0.7 per ion pair). Ground state nitrogen atoms, N( 4 S), can create other NO y constituents, such as NO, through (e.g., Rusch et al., 1981;Rees, 1989) and do not cause significant destruction of NO y . If a model does not include the excited state of atomic nitrogen in their computations, the NO y production from EPP can still be included by assuming that its production is instantaneously converted into NO, resulting in a N( 4 S) production of 0.55 per ion pair and a NO production of 0.7 per ion pair.
Appendix E: HO x production by particle-induced ionization The production of HO x relies on complicated ion chemistry that takes place after the initial formation of ion pairs (Swider and Keneshea, 1973;Frederick, 1976;Solomon et al., 1981;Sinnhuber et al., 2012). Solomon et al. (1981) computed HO x production rates as a function of altitude and ion pair production rate. Each ion pair typically results in the production of around two HO x constituents in the stratosphere and lower mesosphere. Sinnhuber et al. (2012) have shown that HO x is formed as H and OH in nearly equal amounts, with small differences of less than 10 % due to different ion reaction chains. In the middle and upper mesosphere, one ion pair is computed to produce less than two HO x constituents per ion pair because water vapor decreases sharply with altitude there, and is no longer available as a source of HO x . For models which do not include D-region ion chemistry, we recommend using the parameterization of Solomon et al. (1981), which is summarized following Jackman et al. (2005b) in Table E1. If the partitioning between HO x species is considered in the model, H and OH should be formed in equal amounts. Below 40 km altitude and for ionization rates less than 10 2 cm −3 s −1 below 70 km altitude, two HO x can be formed per ion pair. Above 90 km altitude, HO x production can be set to zero; between 70 and 90 km, values need to be extrapolated for ion pair production rates smaller than 10 2 cm −3 s −1 and larger than 10 4 cm −3 s −1 , taking care not to exceed zero and two. Table E1. HO x production per ion pair as a function of altitude and ion pair production rate (IPR). Appendix F: Minor constituent changes due to particle-induced ionization If available, the use of more comprehensive parameterizations for productions of individual HO x (OH and H) and NO y (N( 4 S), N( 2 D), NO, NO 2 , NO 3 , N 2 O 5 , HNO 2 , and HNO 3 ) compounds (e.g., Verronen and Lehmann, 2013;Nieder et al., 2014) is encouraged. Similarly, if atmospheric models include detailed cluster ion chemistry of the lower ionosphere (D region), then the ionization rates should be used to drive the production rates of the primary ions (N + 2 , N + , O + 2 , O + ) and neutrals (N, O) produced in particle impact ionization or dissociation (Sinnhuber et al., 2012). Since such a comprehensive treatment of EPP effects on minor species may introduce more sensitive composition changes via chemical feedbacks, it would be important to document the adopted approaches.
Appendix G: Projection of historical solar cycles in future scenarios Appendix H: Scaling of SSI in future scenarios and variable PI control forcing SSI variability is closely linked to solar magnetic activity variations, and hence sunspot number. However, the form of this relationship may differ significantly at different wavelengths and timescales (i.e., decadal, annual, and subannual). Indeed, the contribution to the SSI from different solar features such as faculae, sunspots, the network, and ephemeral regions, show different temporal responses (e.g., Vieira and Solanki, 2010). As a consequence, a simple, wavelengthindependent scaling of historic SSI sequences for projection into the future or for the generation of the variable PI control forcing would lead to unrealistic results Instead, we first decompose the SSI time series at individual wavelength bins λ into components corresponding to the background variability SSI bg (λ) (i.e., long-term variations of the SSI at solar minima), to facular brightening-related variability SSI + (λ), and to sunspot-darkening-related variability SSI − (λ). The latter two components are further decomposed into annual (A + and A − ) and subannual (D + and D − ) contributions (see Fig. H1).
For the projection of past solar cycles into the future only the D + and D − components need to be scaled since the annually resolved SSI is already provided by the SSI models. These components are shown in Fig. H2 for selected wavelength bins. The distributions of D + values within a given solar cycle are rather symmetric around zero and show a close-to-normal distribution. Variability differences between solar cycles are therefore well represented by the corresponding standard deviations SD(D + ) SC of individual cycles. This quantity also shows good correlation with <SSN> and we therefore use it to construct time-resolved scaling functions. The distributions of D − values are largely skewed towards negative values and do not exhibit the characteristics of a normal distribution. This behavior is expected because of the more intermittent response of SSI to sunspot darkening, compared to facular brightening. Variability differences between solar cycles are therefore best represented by the corresponding median absolute deviations MAD(D − ) SC , which are therefore used to construct the time-resolved scaling functions for the D − components.
The coefficients for the scaling of SSI variability with <SSN> have been obtained from linear regression fits of SD(D + ) SC and MAD(D − ) SC to <SSN> for each individual wavelength bin (see Fig. H3). In all fits, a nonzero offset is obtained, with particularly large values in the case of MAD(D − ) SC . Subannual SSI variations are expected to be very low in the absence of sunspots, with variations mainly coming from the solar network (Bolduc et al., 2014). For that reason, we apply a nonlinear correction in the scaling for low <SSN> values in order to obtain realistic results for solar cycles with very low activity in the EXT scenario. This has been achieved by multiplying a <SSN>-dependent ex-  Figure H1. Decomposition of SSI in the scaling procedure (shown for wavelength bins centered at 150.5, 300.5, 550.5, and 852.5 nm, from left to right). Upper panels: daily (grey) and annually (black) resolved SSI. Lower panels: Individual components after decomposition: background SSI bg (black solid); facular-brightening-related SSI + annual, A + (dark-blue), and subannual, D + (light-blue); sunspot-darkeningrelated SSI − annual, A − (dark-red), and subannual, D − (light-red).
For the construction of the variable PI control forcing, annual, and subannual SSI components are scaled individually to a constant solar-cycle mean activity level at each wavelength bin. The scaling has been performed for the D + and D − components based on SD(D + ) and MAD(D − ), respectively, as in the future scenario construction. For A + and A − , we use the corresponding solar-cycle averages to construct time-resolved scaling functions. The background contribu-tions SSI bg were set to constant values. The same procedure was also applied to the F10.7 radio flux.
Appendix I: Scaling of geomagnetic indices in future scenarios and variable PI control forcing The geomagnetic activity level is strongly linked to solar activity by the solar wind-magnetosphere interaction. The relationship of geomagnetic activity (and hence geomagnetic indices) and SSN depends strongly on the considered timescales. Therefore, we decompose the Ap index in components corresponding to different timescales in order  to scale them individually for projection of historic Ap sequences into the future or for the generation of the variable PI control forcing (see Fig. I1). The magnitude of subannual Ap variations is large and ruled by the mid-term geomagnetic activity level. Since there is strong evidence for Ap to be described by a multiplicative process (Watkins et al., 2005), we decompose Ap as follows: where Ap bg is the background component obtained from the Ap solar-cycle averages, Ap a the annually averaged component, and D a multiplicative daily component, the latter characterized by a nearly constant magnitude of variability on decadal to secular timescales. The Ap a variability shows only a weak dependence on the long-term geomagnetic activity level. We use its standard deviation from individual solar cycles for construction of a time-dependent scaling function SD(Ap a ). For the projection of past solar cycles into the future, the components Ap bg and Ap a need to be scaled in relation to <SSN>. Linear regression fits of Ap bg and SD(Ap a ) to <SSN> are shown in Fig. I2. The correlation of Ap bg and <SSN> is very tight, with a correlation coefficient of 0.94. As expected, this is not the case for SD(Ap a ). The pronounced offsets of the regression fits at <SSN>=0 suggest residual geomagnetic activity for very low solar activity (i.e., Maunder minimum) conditions in agreement with previous studies (e.g., Cliver et al., 1998).  Figure I2. Regression of Ap bg (red symbols, left) and SD(Ap a ) SC (red symbols, right) to <SSI>. The resulting linear fit is shown by black lines, the grey shaded areas reflect the RMS errors.
For the construction of the variable PI control forcing, the Ap index is scaled to 1850-1873 average conditions. The scaling has been performed for the Ap a component on the basis of SD(Ap a ). The background contributions Ap bg was set to a constant value corresponding to the 1850-1873 average.
In both future scenario and variable PI control forcing constructions, Ap has been converted into Kp using a statistical correction to account for biases related to the conversion from hourly to daily indices as described in Sect. 2.2.1.