Exploring precipitation pattern scaling methodologies and robustness among CMIP5 models

Pattern scaling is a well-established method for approximating modeled spatial distributions of changes in temperature by assuming a time-invariant pattern that scales with changes in global mean temperature. We compare two methods of pattern scaling for annual mean precipitation (regression and epoch difference) and evaluate which method is “better” in particular circumstances by quantifying their robustness to interpolation/extrapolation in time, inter-model variations, and inter-scenario variations. Both the regression and epoch-difference methods (the two most commonly used methods of pattern scaling) have good absolute performance in reconstructing the climate model output, measured as an area-weighted root mean square error. We decompose the precipitation response in the RCP8.5 scenario into a CO2 portion and a non-CO2 portion. Extrapolating RCP8.5 patterns to reconstruct precipitation change in the RCP2.6 scenario results in large errors due to violations of pattern scaling assumptions when this CO2-/non-CO2-forcing decomposition is applied. The methodologies discussed in this paper can help provide precipitation fields to be utilized in other models (including integrated assessment models or impacts assessment models) for a wide variety of scenarios of future climate change.


1 Introduction
Quantifying uncertainties in projections of climate change is one of the cornerstone investigative areas in climate science. There are numerous sources of uncertainty, including parametric (which parameter values are the "right" ones), structural (which key processes are missing or poorly characterized), and scenario (how climate-forcing agents will change in the future). One commonality among these sources is that uncertainties in each of them can be explored using climate models.
Atmosphere-ocean general circulation models (AOGCMs) are the gold standard of climate models used for projections of global change, as they incorporate many of the processes fundamental to the climate system, including atmosphere, land, ocean, and sea ice responses and feedbacks, as well as interactions among these components. However, their complexity makes these models computationally expensive, so any sensitivity studies or uncertainty quantification efforts using them are necessarily limited. No modern uncertainty quantification technique is capable of fully characterizing the space of AOGCM uncertainties and how they affect projections of climate change (Qian et al., 2016).
Emulators of AOGCMs are often an effective compromise for exploring uncertainty, sacrificing precision for vastly improved computational efficiency. This allows other models, such as integrated assessment models or impacts assessment models, to include an AOGCM-emulating climate component and incorporate feedbacks between the climate and other sectors. There are many methods of building emulators (see MacMartin and Kravitz, 2016, for a discussion of different linear, time-invariant approaches), but one of the most commonly used is pattern scaling, described in more detail in Sect. 2. This methodology involves computing a time-invariant pattern of change in a variable in response to a change in global mean temperature, which vastly reduces the dimensionality of the input needed to produce projections of climate change.
Pattern scaling has a fairly long history of research (e.g., Mitchell, 2003) and has been shown to be reasonably accurate for a variety of purposes. Lynch et al. (2017) provided a review of pattern scaling of temperature, as well as an in-depth exploration of two commonly used pattern scaling methods (the regression and epoch-difference methods, described in Sect. 2.1). Both of these methods perform quite well in reproducing the actual model output for temperature. By contrast, comparatively little work has been done on pattern scaling of annual mean precipitation. Ruosteenoja et al. (2007) found that local precipitation changes are generally linear in global mean temperature change, with errors of 15–30 % over 90 years of simulation. Holden and Edwards (2010) identified the importance of covariance between local temperature change and local precipitation change, and Frieler et al. (2012) extended this finding, concluding that no single fit (e.g., set of regression coefficients) will be applicable to all grid points. Herger et al. (2015) used a novel method of piecing together results associated with the desired global mean temperature change and found excellent agreement with model output (errors rarely exceed 0.3 mm day⁻¹). In a different style of emulation, Castruccio et al. (2014) trained a statistical model on pre-computed climate model simulations and found that it was capable of capturing nonlinearities in the response in ways that pattern scaling inherently cannot. Xu and Lin (2017) compared several different methods (akin to what we do in Sect. 4) to assess pattern scaling of temperature, precipitation, and potential evapotranspiration in the CESM Large Ensemble project (Kay et al., 2015). To the best of our knowledge, no previous study has compared different methods of pattern scaling of precipitation, particularly with a focus on the robustness of the model response.
Here we provide a systematic (although non-exhaustive) assessment of the robustness of pattern scaling of precipitation. Section 3 focuses on pattern scaling the response to temperature changes due solely to carbon dioxide increases, looking at interpolation in time, extrapolation in time, and inter-model robustness. Section 4 explores inter-scenario robustness, i.e., whether the patterns obtained for CO2 are useful for pattern scaling other scenarios.
Through these investigations, we aim to clarify the circumstances in which methods of pattern scaling of precipitation perform well. We also provide some (limited) guidance as to the situations in which pattern scaling is likely to provide a computationally efficient, reasonably accurate result vs. those that require actual simulation using AOGCMs.
2 Pattern scaling methods

2.1 Two methods of pattern scaling for precipitation
Pattern scaling involves approximating a time series of the pattern of change in a field of interest, B(x, t), by a reconstruction $\hat{B}(x, t)$:

$$\hat{B}(x, t) = B(x, 0) + P(x)\,\Delta T(t), \tag{1}$$

where P(x) describes a time-invariant spatial pattern (the spatial dimension is denoted by x), and ΔT(t) describes a time-varying series (the time dimension is denoted by t) of the change in global mean temperature, starting from a reference period t = 0 (often the preindustrial era). This notation is used throughout the paper. There are two commonly used methodologies for ascertaining P(x): regression and epoch differencing (Barnes and Barnes, 2015).
In the regression method, P(x) is obtained by regressing ΔB(x, t) = B(x, t) − B(x, 0) against ΔT(t) at each point x. In the epoch-difference method,

$$P(x) = \frac{\langle B(x, t)\rangle_{[k,\,n+k]} - \langle B(x, t)\rangle_{[0,\,n]}}{\langle \Delta T(t)\rangle_{[k,\,n+k]} - \langle \Delta T(t)\rangle_{[0,\,n]}}, \tag{2}$$

where the intervals [0, n] and [k, n + k] indicate averaging over n-year time periods at the beginning and end of the simulation, respectively. All quantities are calculated from the multi-model mean; Ruosteenoja et al. (2007) showed that pattern scaling of precipitation based on a model mean outperforms pattern scaling based on single models. Frieler et al. (2012) argued that no single set of regression coefficients will be applicable to all grid points. We circumvent this issue by computing a separate pattern value at each grid point (e.g., by regressing ΔB against ΔT independently at each grid point).
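The two estimators in Eqs. (1) and (2) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the code used for this study: the fields are assumed to be already averaged across models and flattened to shape (years, grid points); the regression is assumed to be an ordinary least-squares fit with an intercept; and the epoch defaults n = 25, k = 115 mirror the years 1–25 and 116–140 used in this paper. The function names are hypothetical.

```python
import numpy as np


def regression_pattern(precip, delta_t):
    """P(x) as the least-squares slope of ΔB(x, t) on ΔT(t) at every grid point.

    precip  : array (n_years, n_points) of multi-model-mean annual precipitation B(x, t)
    delta_t : array (n_years,) of global mean temperature change ΔT(t)
    """
    delta_b = precip - precip[0]                  # ΔB(x, t) = B(x, t) - B(x, 0)
    # Assumed model: ΔB = P ΔT + c, fitted for all grid points in one lstsq call.
    design = np.column_stack([delta_t, np.ones_like(delta_t)])
    coeffs, *_ = np.linalg.lstsq(design, delta_b, rcond=None)
    return coeffs[0]                              # first row holds the slopes P(x)


def epoch_pattern(precip, delta_t, n=25, k=115):
    """Eq. (2): difference of epoch means of B divided by difference of epoch means of ΔT."""
    early, late = slice(0, n), slice(k, k + n)    # years 1..n and k+1..k+n
    d_precip = precip[late].mean(axis=0) - precip[early].mean(axis=0)
    d_temp = delta_t[late].mean() - delta_t[early].mean()
    return d_precip / d_temp


def reconstruct(baseline, pattern, delta_t):
    """Eq. (1): B̂(x, t) = B(x, 0) + P(x) ΔT(t), returned with shape (n_years, n_points)."""
    return baseline + np.outer(delta_t, pattern)
```

On noise-free synthetic data of the form B(x, 0) + P(x)ΔT(t), both functions recover P(x) exactly; with real model output they differ, because the regression uses every year while the epoch difference uses only the two end windows.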

2.2 Methodology
In the following sections, we quantify differences between the reconstruction $\hat{B}$ and the actual model output B via the area-weighted root mean square (rms) of the difference $\hat{B} - B$, calculated as

$$\mathrm{rms} = \sqrt{\frac{\sum_x A(x)\,\big[\hat{B}(x) - B(x)\big]^2}{\sum_x A(x)}}, \tag{3}$$

where A(x) is the area of grid box x, and sums are calculated over all x.
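Equation (3) can be computed directly. The sketch below assumes a regular latitude–longitude grid, for which cell area is approximately proportional to the cosine of latitude; the helper names are hypothetical, and model-native cell areas should be preferred where they are available.

```python
import numpy as np


def area_weighted_rms(reconstruction, actual, area):
    """Eq. (3): area-weighted rms of B̂ - B; all three arguments share one spatial shape."""
    diff_sq = (np.asarray(reconstruction, dtype=float) - np.asarray(actual, dtype=float)) ** 2
    area = np.asarray(area, dtype=float)
    return float(np.sqrt(np.sum(area * diff_sq) / np.sum(area)))


def cell_areas(lat_deg, n_lon):
    """Relative cell areas on a regular lat-lon grid, approximated as cos(latitude)."""
    weights = np.cos(np.deg2rad(lat_deg))
    return np.repeat(weights[:, None], n_lon, axis=1)   # shape (n_lat, n_lon)
```

Because the weights are normalized by their sum, only relative areas matter; a uniform error of 0.2 mm day⁻¹ gives an rms of 0.2 mm day⁻¹ regardless of the grid.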
All of the analysis conducted here uses simulations from AOGCMs contributed to CMIP5. The models used in the bulk of the analysis in this study (Table 1, group 1) are identical to those used by Lynch et al. (2017) with two exceptions (due to model output availability):

1. The present study used NorESM1-ME instead of NorESM1-M. NorESM1-ME includes prognostic biogeochemical cycling and has the capability of being emissions driven, but when using concentration-driven scenarios (as is the case here), the two versions of the model produce nearly identical results (Bentsen et al., 2013).
2. … et al., 2012).

Knutti et al. (2013) provide an excellent description of these models and their provenance. … (Davini et al., 2014; Sanna et al., 2013). Cagnazzo et al. (2013) described some of the differences between these two models. In general, the models agree on qualitative climate features, although, as might be expected, CMCC-CMS better matches observations in situations where a fully resolved stratosphere is important for capturing the effects, including dynamical feedbacks of stratospheric circulation and ozone chemistry on surface climate. Although these effects are non-negligible, they are generally of lower order than the changes that occur over the course of the scenarios analyzed in this study (discussed below); therefore, we anticipate that differences between these two models will not substantially affect results for the model mean.
These models were chosen to be representative of the CMIP5 ensemble while retaining model independence; Lynch et al. (2017) described in more detail why those models were chosen. Throughout this study, we evaluate three scenarios. The 1pctCO2 scenario involves a 1 % per year increase in the CO2 concentration, beginning at its preindustrial value. This simulation is run for 140 years, to an approximate quadrupling of the CO2 concentration. The RCP8.5 and RCP2.6 scenarios (Representative Concentration Pathways, or RCPs; Moss et al., 2010; Meinshausen et al., 2011) are derived from two socioeconomic narratives that produce particular concentration profiles of greenhouse gases, aerosols, and other climatically relevant forcing agents over the 21st century. The RCP8.5 scenario reflects a "no policy" narrative, in which total anthropogenic forcing reaches approximately 8.5 W m⁻² in the year 2100. Conversely, the RCP2.6 scenario involves aggressive decarbonization, causing radiative forcing to peak at approximately 3 W m⁻² around 2050 and decline to approximately 2.6 W m⁻² by the end of the 21st century. Table 2 provides additional forcing details for the two RCP scenarios, as calculated by Hector (Hartin et al., 2015), a simple climate and carbon-cycle model that is used as the climate component of the Global Change Assessment Model (GCAM), a state-of-the-art integrated assessment model. Both RCPs are appended to simulations of the historical period, for total simulation lengths of 251 years (1850–2100).
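A quick arithmetic check of the 1pctCO2 design, assuming compound annual growth at exactly 1 % per year (the function name is hypothetical): the concentration doubles after roughly 70 years and reaches about four times its initial value after 140 years.

```python
import math


def concentration_ratio(years, rate=0.01):
    """CO2 concentration relative to its starting value after compound annual growth."""
    return (1.0 + rate) ** years


doubling_time = math.log(2.0) / math.log(1.01)     # years to double at 1 % per year
quadrupling_ratio = concentration_ratio(140)        # ratio at the end of 1pctCO2
print(f"doubling after {doubling_time:.1f} years; ratio at year 140 = {quadrupling_ratio:.2f}")
# prints: doubling after 69.7 years; ratio at year 140 = 4.03
```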
Throughout the remainder of the paper, subscripts on P, ΔT, B, and $\hat{B}$ denote the scenario (e.g., RCP8.5), the model group (e.g., group 2), or the years over which the patterns are computed (e.g., 1–50). If no subscript is specified, the associated value corresponds to the group 1 (see Table 1) multi-model mean of the 1pctCO2 simulation, averaged over years 116–140 of the simulation (the last 25 years of the 1pctCO2 simulation, at approximately quadruple the preindustrial CO2 concentration).
Statistical significance was calculated using Welch's t test, which is analogous to Student's t test except that the variances $s_1^2$ and $s_2^2$ of the two samples $x_1$ and $x_2$, respectively, need not be equal. We use this test because the ensemble for each method is small and the distribution of the ensemble pattern is assumed to be normal. The test statistic is defined by

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}},$$

where $n_1$ and $n_2$ are the number of models in each sample, respectively. Once the t statistic is calculated for each grid box, the value in any given grid box is determined to be statistically significant if the magnitude of the test value exceeds a threshold computed from the inverse of the Student's t cumulative distribution function at the 97.5 % level (which corresponds to the 95 % confidence level for a two-tailed test). The number of degrees of freedom df used to generate that threshold is approximated by the Welch–Satterthwaite equation,

$$\mathrm{df} \approx \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}.$$

Table 3. The rms error values calculated over the entire globe (Eq. 3) for each of the figures. All units are mm day⁻¹ for differences (Figs. 3, 6, 7, 10, 11, and 12) or mm day⁻¹ K⁻¹ for patterns (Fig. 5).
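The grid-box Welch test can be sketched with NumPy and SciPy, where `scipy.stats.t.ppf` supplies the inverse Student's t distribution. The function name and the array layout (models along the first axis, grid points along the second) are assumptions made for illustration.

```python
import numpy as np
from scipy import stats


def welch_significance(sample1, sample2, quantile=0.975):
    """Grid-box Welch t test between two model ensembles.

    sample1 : array (n1, n_points); sample2 : array (n2, n_points)
    Returns (t, df, significant), each with one value per grid point.
    """
    x1 = np.asarray(sample1, dtype=float)
    x2 = np.asarray(sample2, dtype=float)
    n1, n2 = x1.shape[0], x2.shape[0]
    v1 = x1.var(axis=0, ddof=1) / n1                 # s1^2 / n1
    v2 = x2.var(axis=0, ddof=1) / n2                 # s2^2 / n2
    t = (x1.mean(axis=0) - x2.mean(axis=0)) / np.sqrt(v1 + v2)
    # Welch–Satterthwaite approximation for the degrees of freedom
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    threshold = stats.t.ppf(quantile, df)            # 97.5 % quantile -> two-tailed 95 % test
    return t, df, np.abs(t) > threshold
```

The statistic can be cross-checked against `scipy.stats.ttest_ind(..., equal_var=False)`, which implements the same Welch test.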
3 Comparisons between pattern scaling methods for CO2-only forcing

3.1 Pattern scaling for CO2 concentration changes
Figure 1 shows the baseline (preindustrial) annual mean precipitation pattern B(x, 0) and the scaling patterns P(x) for both of the pattern scaling methods, generated from the group 1 (see Table 1) model average for the 1pctCO2 simulation.
The regression and epoch-difference methods produce very similar scaling patterns: no difference between them exceeds 0.05 mm day⁻¹ K⁻¹ in magnitude, and no difference is statistically significant (not shown). Both patterns show similar broad features: an increase in tropical precipitation with global warming, particularly over the oceans; increases at high latitudes, again over the oceans; and decreases in the South Pacific, North Atlantic, and southern Indian oceans, as well as over Central America and the Mediterranean basin.
Figure 2 shows a comparison between the actual model output (group 1 averaged over years 116–140 of the 1pctCO2 simulation) and the two methods of reconstruction. Both methods show qualitatively similar features. In general, they reproduce the actual model output well, with possible exceptions in the tropics. Tebaldi and Arblaster (2014) noted that pattern scaling methodologies have difficulty representing convective processes; therefore, departures in these areas might be expected.
Figure 3 shows a more quantitative comparison between the different reconstruction methods and the actual model output. The overall errors (rms; Eq. 3) of the regression and epoch-difference methods are very small (0.04 and 0.03 mm day⁻¹, respectively; see Table 3), and no region in either reconstruction is statistically different from the actual model output.

3.2 Interpolation/extrapolation
In this section, we examine the robustness of the methods to interpolation or extrapolation in time. If the scaling pattern P(x) were truly time invariant, the results presented in this section would be identical to those discussed previously.
Supplement Fig. S1 shows the patterns P(x) obtained by conditioning only on years 1–50 of the 1pctCO2 simulation. In the epoch-difference method, the second epoch is calculated over years 26–50 instead of years 116–140. In the regression method, the regression coefficients are calculated using only the first 50 years of simulation. The patterns calculated using the regression and epoch-difference methods differ only slightly between the two conditioning periods, and virtually none of these differences is statistically significant.
Despite these similarities, using patterns conditioned on the earlier period to reconstruct the precipitation in the later period (years 116–140) results in considerably poorer performance for both methods (Supplement Fig. S2) than the results shown in Fig. 3. The rms error increases by an order of magnitude (not shown), although few areas show statistically significant differences from the actual model output over this time period. This is likely due to the noise introduced by building P(x) on the early years of the simulation, when the climate change signal is weak.
Figure S3 in the Supplement shows results for interpolation in time, where the patterns are conditioned on the full 1pctCO2 simulation (years 116–140), but the reconstruction predicts the average precipitation change in years 58–82 (halfway through the 1pctCO2 simulation). More specifically, $\Delta\hat{B}_{58-82}(x) = P_{116-140}(x)\,\Delta T_{58-82}$. In general, the patterns for interpolation show qualitative features similar to those of the reconstruction of the later period, years 116–140 (Fig. 3). However, the error increases by a factor of 2 for both methods, which potentially indicates the presence of nonlinearity. As before, no difference is statistically significant.
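The interpolation and extrapolation experiments amount to conditioning P(x) on one window and applying it with ΔT from another. A minimal sketch with the epoch-difference estimator, assuming 1-based inclusive year windows and a years 1–25 baseline (the helper names are hypothetical):

```python
import numpy as np


def window_mean(field, first_year, last_year):
    """Mean over simulation years first_year..last_year (1-based, inclusive)."""
    return field[first_year - 1:last_year].mean(axis=0)


def epoch_pattern(precip, global_t, base=(1, 25), late=(116, 140)):
    """Epoch-difference pattern conditioned on the `late` window."""
    d_precip = window_mean(precip, *late) - window_mean(precip, *base)
    d_temp = window_mean(global_t, *late) - window_mean(global_t, *base)
    return d_precip / d_temp


def predict_change(pattern, global_t, base=(1, 25), target=(58, 82)):
    """Predicted ΔB̂ for the target window: pattern times ΔT averaged over that window."""
    d_temp = window_mean(global_t, *target) - window_mean(global_t, *base)
    return pattern * d_temp
```

For extrapolation, the pattern would be built with `late=(26, 50)` and applied with `target=(116, 140)`; for interpolation, the defaults apply. If precipitation responds nonlinearly to ΔT, predicted and actual window changes diverge even in the absence of noise, which is one way an increase in error of the kind noted above could arise.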

3.3 Inter-model robustness
In this section, we explore the role of the number of models in improving the robustness of the prediction, as well as the inter-model robustness of pattern scaling, by comparing reconstructions with actual model output where the scaling pattern P(x) is conditioned on an entirely different set of models. More specifically, we examine two questions: (1) how does the prediction fidelity vary with the number of models used in the average? (2) If one conditions the pattern scaling on the average of group 1, can one predict the response of group 2 (or vice versa)?

Figure 1.The components needed for pattern scaling of the precipitation response to CO 2 forcing, averaged over the 13 models in group 1 (Table 1).Top shows the baseline precipitation pattern for the multi-model average: B(x, 0) in Eq. (1) (mm day −1 ; averaged over years 1-25 of the 1pctCO2 simulation).The other panels show the time-invariant pattern P (x) in Eq. (1) (mm day −1 K −1 ) for the regression method (middle) and the epoch-difference method (bottom).
Figure 4 shows the rms error in the reconstruction (1pctCO2 simulation, averaged over years 116–140) as a function of the number of models used in the comparison. This figure was created by randomly sampling the space of all 26 models listed in Table 1 and … this figure parallel those discussed in previous sections: both methods have similar magnitudes of error (except for small numbers of models). The rms error values (Table 3) for group 1 (13 models) are consistent with the rms error ranges depicted in Fig. 4, indicating that group 1 is not an outlier.

The rms error of the regression method depends on the number of models, whereas the epoch-difference method shows much weaker dependence, except for small ensembles (fewer than 10 models). However, except for small ensembles, none of the boxes/whiskers is substantially different from any of the others, leading us to conclude that both methods are largely robust to changes in the number of models used to carry out pattern scaling. Supplement Sect. S2 and the associated figures provide additional comparisons between the patterns generated for groups 1 and 2.
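The description of how Fig. 4 was built is truncated in this copy, so the following is only a hypothetical sketch of a generic random-subsampling loop of the kind it refers to. For each ensemble size N it draws N distinct models at random (sampling without replacement is an assumption), evaluates a user-supplied score, such as the area-weighted rms of a reconstruction built from those models, and summarizes the spread as percentiles for a box-and-whisker plot. The number of draws, the seed, and the percentile choices are illustrative.

```python
import numpy as np


def subsample_scores(n_models, sizes, score, n_draws=200, seed=0):
    """Evaluate `score` on random ensembles of N distinct models for each N in `sizes`.

    score : callable taking an index array of models and returning a float
    Returns {N: array of n_draws scores}.
    """
    rng = np.random.default_rng(seed)
    results = {}
    for n in sizes:
        results[n] = np.array([
            score(rng.choice(n_models, size=n, replace=False))  # draw without replacement
            for _ in range(n_draws)
        ])
    return results


def box_summary(results):
    """5th, 25th, 50th, 75th, and 95th percentiles for each ensemble size."""
    return {n: np.percentile(v, [5, 25, 50, 75, 95]) for n, v in results.items()}
```

With a toy score such as the rms of the mean of independent per-model biases, the spread shrinks as N grows, and at N = 26 every draw selects the same full ensemble.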

3.4 Discussion of pattern scaling the precipitation response to CO2
Both the regression and epoch-difference methods show considerable promise as precipitation pattern scaling methods. Both are able to reconstruct the changes in precipitation due to CO2 increases with errors of less than 5 % in every region of the globe (Fig. 3). However, for interpolation in time, the error increases for both methods, indicating issues with robustness to timescale (Supplement Sect. S1). In addition, the pattern shows increased error in many places when different models are used (Supplement Sect. S2), indicating issues with inter-model robustness.
Like the temperature pattern scaling results of Lynch et al. (2017), we find that the regression and epoch-difference methods have similar performance. In the present work, the epoch-difference method slightly outperforms the regression method, but the differences are relatively minor. Given its slight advantages in computational expense and reduced data input requirements, we express a slight preference for using the epoch-difference method to generate scaling patterns for the precipitation response to CO2-induced global warming.

4 Pattern scaling for additional forcings
In this section, we compare the patterns and reconstructions between scenarios, primarily the RCP8.5 and 1pctCO2 simulations. We do this first as a test of robustness: do the pattern scaling methods perform "better" for CO2-only simulations than for RCP8.5? If the fidelity of the reconstruction to the actual model output is similar for the two scenarios, then subtracting the reconstructions conditioned on RCP8.5 and 1pctCO2 could reveal a scaling pattern for non-CO2 forcing. We note that this is one of the few ways of ascertaining the non-CO2 response pattern without running separate simulations both with and without CO2 forcing: without a scaling method to normalize for similar climate conditions, there is no way of obtaining meaningful results from directly subtracting a 1pctCO2 simulation from an RCP8.5 simulation. (The approach discussed here is analogous to the methodology of Herger et al., 2015, but whereas they sought similarities between patterns for a given change in global mean temperature, we are interested in the differences.) We note several caveats with this approach. One is that, based on the results of Herger et al. (2015), the reconstructions of RCP8.5 and 1pctCO2 are likely to have some similarities for a given temperature change because the dominant forcing in RCP8.5 is CO2 (see Table 2). Therefore, ascertaining the non-CO2 signal could be limited by low signal-to-noise ratios. A second caveat, more germane to pattern scaling, is whether the non-CO2 pattern obtained from RCP8.5 can be used to reconstruct the non-CO2 precipitation change for a different scenario. There is no a priori reason to expect that this will work, as different scenarios have different combinations of forcings. In Sect. 4.3, we investigate this problem using an extreme case, where we ascertain the scaling patterns from an RCP8.5 simulation and use them to attempt to reconstruct the RCP2.6 simulation.
We acknowledge that the non-CO2 component is a combination of both non-CO2 greenhouse gases and aerosols, which have opposite effects on global mean temperature. These two categories of forcing have different local responses as well. An alternative approach would be to split the RCP8.5 response into a CO2 component, a non-CO2 greenhouse gas component, and a non-greenhouse gas component. Supplement Sect. S3 discusses the necessary calculations for both of these approaches. The CO2/non-CO2 approach proved to be quite amenable to pattern scaling. By contrast, the CO2/other greenhouse gas/non-greenhouse gas approach is not, due to distinct nonlinear relationships between the precipitation response and the derived temperature responses for these particular forcing categories. Therefore, we have chosen to proceed with a CO2/non-CO2 division for the purpose of pattern scaling.

4.1 Inter-scenario differences
Figure 5 shows the RCP8.5 scaling pattern $P_{\text{RCP8.5}}(x)$ and its difference from the CO2-only pattern. The patterns are nearly identical to those in Fig. 1: neither the regression nor the epoch-difference method shows differences exceeding 0.1 mm day⁻¹ K⁻¹ in magnitude, and neither shows statistically significant differences of any magnitude. This figure reinforces the finding of Herger et al. (2015) that patterns generated from commonly used scaling methods (regression and epoch difference) do not differ appreciably between scenarios; therefore, pattern scaling can be accomplished by using periods in different scenarios with the same global mean temperature change.
Figures 6 and 7 show this in practice: the reconstruction $\hat{B}$ of the historical/RCP8.5 simulation is built on the RCP8.5 pattern, multiplied by ΔT averaged over years 227–251 (2076–2100) and 116–140 (1965–1989), respectively. The reconstructed precipitation response in Fig. 6 is generally too strong in the tropics and too weak in the midlatitudes (the same pattern as in Fig. 3), but Fig. 7 shows the opposite pattern. None of these differences is statistically significant, and the rms error is approximately the same in both figures (0.09–0.10 mm day⁻¹, 2–3 times greater than the error in Fig. 3), but the differences suggest that there is a distinct non-CO2 pattern that, while small, is still important in explaining precipitation differences in periods with large temperature change.

4.2 Non-CO2-forcing pattern
Here we calculate a non-CO2 pattern for use in pattern scaling. We begin by assuming that the effects of CO2 forcing and non-CO2 forcing are separable, i.e., there are no nonlinear interactions between the two forcings that would produce a non-additive response. Although this assumption is not strictly true, it holds approximately, to a degree sufficient for such calculations to be useful (MacMartin et al., 2015; MacMartin and Kravitz, 2016). Following the notation in Eq. (1), separability means that

$$\hat{B}_{\text{RCP8.5}}(x, t) = B(x, 0) + P_{\text{CO}_2}(x)\,\Delta T_{\text{CO}_2}(t) + P_{\text{non-CO}_2}(x)\,\Delta T_{\text{non-CO}_2}(t).$$

We set $P_{\text{CO}_2}$ equal to $P_{\text{1pctCO2}}$ (from Sect. 3), because if pattern scaling holds, the time-invariant pattern of CO2 forcing should be identical regardless of the scenario from which it is derived. $P_{\text{non-CO}_2}$ is defined to be $4P_{\text{RCP8.5}} - 3P_{\text{CO}_2}$ (see Supplement Sect. S3 for the derivation of this expression). Embedded in this expression are inherent assumptions about the validity of a linear pattern scaling approach. If the approach fails, it is either because this pattern does not represent actual non-CO2 forcing or because the pattern is too difficult to estimate accurately, perhaps due to internal variability. We also note that because this end result is the difference of two large quantities, it may be sensitive to noise in either $P_{\text{RCP8.5}}$ or $P_{\text{CO}_2}$.
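The expression for the non-CO2 pattern can be checked numerically. Its derivation is in Supplement Sect. S3; the sketch below only verifies one consistency property under an illustrative assumption made here, not taken from the text: if $\Delta T_{\text{non-CO}_2}$ is one quarter of $\Delta T_{\text{RCP8.5}}$, then substituting $P_{\text{non-CO}_2} = 4P_{\text{RCP8.5}} - 3P_{\text{CO}_2}$ into the separable form returns $P_{\text{RCP8.5}}\,\Delta T_{\text{RCP8.5}}$. It also illustrates the noise sensitivity noted above: a perturbation in $P_{\text{RCP8.5}}$ is multiplied by 4. The function names are hypothetical.

```python
import numpy as np


def non_co2_pattern(p_rcp85, p_co2):
    """P_non-CO2 = 4 P_RCP8.5 - 3 P_CO2, as defined in the text."""
    return 4.0 * np.asarray(p_rcp85, dtype=float) - 3.0 * np.asarray(p_co2, dtype=float)


def separable_change(p_co2, p_non_co2, dt_co2, dt_non_co2):
    """Separable precipitation change P_CO2 ΔT_CO2 + P_non-CO2 ΔT_non-CO2 for one period."""
    return np.asarray(p_co2) * dt_co2 + np.asarray(p_non_co2) * dt_non_co2
```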
To calculate $\Delta T_{\text{CO}_2}$, we assume that global mean temperature scales linearly with radiative forcing (e.g., Gregory et al., 2004), and radiative forcing is known to scale logarithmically with the CO2 concentration (Myhre et al., 1998). Performing linear regression of global mean temperature change against $\log_2([\mathrm{CO_2}])$ in the 1pctCO2 simulation yields a slope of α = 2.40 K, an intercept of β = −19.89 K, and an R² value of 0.99. (Brackets [·] indicate the CO2 concentration in ppmv.) Then $\Delta T_{\text{CO}_2} = \alpha \log_2([\mathrm{CO_2}]) + \beta$, and $\Delta T_{\text{non-CO}_2}$ is calculated as the residual $\Delta T_{\text{RCP8.5}} - \Delta T_{\text{CO}_2}$. We note that this formulation does not explicitly account for lags in the climate response to radiative forcing, such as those due to ocean thermal inertia.
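The temperature split can be written as two short functions using the fitted values α = 2.40 and β = −19.89 reported above. The function names are hypothetical, and concentrations are assumed to be in ppmv. Because the fit is in $\log_2$, each doubling of [CO2] adds exactly α to $\Delta T_{\text{CO}_2}$.

```python
import numpy as np

ALPHA, BETA = 2.40, -19.89   # slope (K per doubling) and intercept (K) of the 1pctCO2 fit


def delta_t_co2(co2_ppmv):
    """CO2-attributed global mean warming: α log2([CO2]) + β."""
    return ALPHA * np.log2(np.asarray(co2_ppmv, dtype=float)) + BETA


def delta_t_non_co2(delta_t_total, co2_ppmv):
    """Residual warming attributed to non-CO2 forcing: ΔT_total - ΔT_CO2."""
    return np.asarray(delta_t_total, dtype=float) - delta_t_co2(co2_ppmv)
```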
Figure 8 shows all of the aforementioned ΔT values, plotted as a function of the CO2 concentration. Both $\Delta T_{\text{CO}_2}$ and $\Delta T_{\text{non-CO}_2}$ increase monotonically with time in the RCP8.5 simulation. This is consistent with the design of the RCP8.5 scenario, in which non-CO2 radiative forcing increases over the period 2000–2100 (Table 2), largely due to a doubling of the methane concentration over this period. This change in forcing corresponds to a non-CO2-induced temperature change (green line in Fig. 8) from 0.31 to 1.36 K.
Figure 9 shows the precipitation responses attributed to CO2 and to non-CO2 forcing. Although the two portions of the reconstruction generally show similar features, the regional effects have quite different magnitudes in many regions. In particular, the non-CO2 response is weaker than the CO2 response over the tropical Pacific and stronger over much of the Northern Hemisphere. One distinct difference between the two patterns is that precipitation decreases over East Asia and India in the non-CO2 response but increases in the CO2 response. This is likely a result of global dimming from heavy aerosol emissions. Another difference, potentially attributable to dust, is the Saharan outflow over the Atlantic Ocean extending into the Amazon. These features give us confidence that although the non-CO2 response is likely dominated by non-CO2 greenhouse gases (most prominently methane), the derived pattern has captured an aerosol signature. A useful future area of investigation would be to conduct pattern scaling studies on single-forcing simulations (e.g., Marvel et al., 2016) to reveal more robust signals and determine which forcings are amenable to pattern scaling, with a particular eye on inter-model variations in the responses to identical forcings. The results in Fig. 9 also reinforce the conclusions of Frieler et al. (2012), who argue that the scaling patterns from one scenario are not in general translatable to another scenario if the two scenarios are driven by different forcings. Even though Fig. 5 shows that the patterns $P_{\text{RCP8.5}}$ and $P_{\text{1pctCO2}}$ are nearly identical, even small differences can affect reconstructions of precipitation change for large values of ΔT.
This decomposition of the scaling patterns into CO2 and non-CO2 components performs rather well in reconstructing the actual model output (Table 3). For the period 2076–2100 at the end of the RCP8.5 simulation (Fig. 10), the epoch-difference method has smaller errors than the regression method, especially in the tropics and Northern Hemisphere midlatitudes. No error in the regression method exceeds 0.4 mm day⁻¹ in magnitude, and no error in the epoch-difference method exceeds 0.2 mm day⁻¹ in magnitude. For …

4.3 Scaling to predict other scenarios
The final stage of inter-scenario exploration is to see how well the CO2 and non-CO2 patterns generated from one scenario can be used on another scenario. Here we choose the extreme case of predicting the pattern of precipitation change in RCP2.6 from the patterns calculated from RCP8.5. In RCP2.6, the CO2 concentration peaks and then drops slightly (Table 2). The non-CO2 forcing comprises 29 % of the total forcing in RCP8.5 in 2100 and 32 % of the total forcing in RCP2.6 in 2100, according to simulations using Hector (Hartin et al., 2015).
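The cross-scenario test combines the pieces: patterns derived from RCP8.5 are applied with the concentration and temperature time series of the target scenario. The sketch below shows only the mechanics; the function name is hypothetical, and the default α and β are the 1pctCO2 fit reported above, assumed here to carry over unchanged. As the following paragraph reports, this reconstruction performs poorly for RCP2.6.

```python
import numpy as np


def reconstruct_scenario(baseline, p_co2, p_non_co2, dt_total, co2_ppmv,
                         alpha=2.40, beta=-19.89):
    """B̂(x, t) = B(x, 0) + P_CO2(x) ΔT_CO2(t) + P_non-CO2(x) ΔT_non-CO2(t).

    baseline, p_co2, p_non_co2 : arrays (n_points,), patterns from the source scenario
    dt_total, co2_ppmv         : arrays (n_years,) for the target scenario
    """
    dt_co2 = alpha * np.log2(np.asarray(co2_ppmv, dtype=float)) + beta
    dt_non_co2 = np.asarray(dt_total, dtype=float) - dt_co2      # residual warming
    return (np.asarray(baseline, dtype=float)
            + np.outer(dt_co2, p_co2)
            + np.outer(dt_non_co2, p_non_co2))                   # shape (n_years, n_points)
```

If the CO2 and non-CO2 patterns happen to be equal, the two ΔT terms sum back to the total ΔT and Eq. (1) is recovered, which is a convenient sanity check.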
Figure 12 shows the effectiveness of this reconstruction process. Both the epoch-difference and regression methods show strong differences that are consistent with the patterns displayed in Fig. 9. Supplement Sect. S4 provides additional derivations that explore the sources of these biases. There are two main conclusions from this section. First, the non-CO2 pattern built on RCP8.5 was not effective for explaining non-CO2 behavior in RCP2.6, indicating that there are limits to the applicability of a "universal" non-CO2 forcing pattern. A future area of investigation could explore these limits: for example, would the non-CO2 pattern built on RCP8.5 work for RCP6.0 or RCP4.5 instead of the extreme RCP2.6 case? The second conclusion is that the ability to perform this CO2/non-CO2 decomposition is itself limited. Supplement Sect. S4 goes through detailed calculations showing that if one assumes that such a decomposition is possible, contradictions and inconsistencies arise. Determining why this decomposition failed for RCP2.6 would require a more thorough investigation, possibly including single-forcing simulations, which is beyond the scope of this study. Such research could lead to an understanding of which scenarios are more amenable to separable forcing treatments than others.

4.4 Discussion of pattern scaling for non-CO2 forcings
In general, the pattern scaling results described in Sect. 4 are consistent with previous studies. Herger et al. (2015) found that the patterns for different scenarios are rather similar, which Fig. 5 confirms. However, the results of pattern scaling may be scenario dependent (Fig. 9) if the global mean temperature change (ΔT) is sufficiently large, which confirms the conclusions of Frieler et al. (2012).
In particular, we found limited ability to reconstruct the RCP2.6 model behavior from the RCP8.5 run, indicating limits to deriving scaling patterns from one scenario and applying them to a different scenario. We deliberately chose an extreme case to test whether such universal application is possible. The ability to do this might be improved for "closer" scenarios, such as RCP8.5 and RCP6.0 or RCP4.5.

5 Conclusions
We have explored two different, commonly used methods of pattern scaling for annual mean precipitation, with a focus on robustness to interpolation/extrapolation in time, inter-model variations, and inter-scenario differences. Both the regression and epoch-difference methods perform well, and comparably.
Most of the errors that arise for either method are in areas dominated by convection (predominantly over the tropical oceans) or at high latitudes. Both of these areas are large sources of nonlinear responses to global mean temperature change; therefore, pattern scaling might not be expected to perform well in these areas. The approach of Tebaldi and Arblaster (2014) of using zonal mean temperature as a scaling parameter may prove useful in accounting for errors at high latitudes.
In terms of the usefulness of pattern scaling of precipitation, because the regression and epoch-difference methods perform well over most land regions, they are likely quite suitable for a variety of applications, including societal models (like integrated assessment models or impacts models) that mostly deal with land areas.If one's application requires good performance over tropical oceans, then pattern scaling may no longer be appropriate, and instead output from the full AOGCM may be required.However, given the difficulties that many climate models have with proper representations of convective processes and the resulting precipitation biases those difficulties cause (e.g., Song and Zhang, 2009), there may be doubts as to how well AOGCMs represent precipitation in these areas in the first place.
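Performance here is measured as an area-weighted root mean square error, and restricting that metric to land is one way to check suitability for land-focused applications. The sketch below assumes a regular latitude-longitude grid, where cell area is proportional to the cosine of latitude, and a boolean land mask supplied by the user; the function name is hypothetical.

```python
import numpy as np


def area_weighted_rmse(recon, actual, lat_deg, mask=None):
    """Area-weighted RMSE on a regular latitude-longitude grid.

    recon, actual: (nlat, nlon) fields (e.g. mm day-1); lat_deg: (nlat,) cell-center latitudes.
    mask: optional boolean (nlat, nlon) array, True where cells count (e.g. land).
    """
    err2 = (np.asarray(recon, dtype=float) - np.asarray(actual, dtype=float)) ** 2
    # Cell area on a regular grid is proportional to cos(latitude)
    w = np.broadcast_to(np.cos(np.deg2rad(lat_deg))[:, None], err2.shape)
    if mask is not None:
        # Cells outside the mask get zero weight
        w = np.where(mask, w, 0.0)
    return float(np.sqrt(np.sum(w * err2) / np.sum(w)))
```

On non-rectangular or irregular grids, the model's own cell-area field would be the better weight.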
The results presented in Sect. 4 indicated that while some scenarios are amenable to a broad separation of forcings for pattern scaling, others are not. Much more systematic work is needed in this area to determine the usefulness of pattern scaling for different forcings. Single-forcing experiments would be particularly useful, as they would allow a determination of which forcings work best for pattern scaling, as well as whether any nonlinear effects result from applying multiple forcings simultaneously. Another potential approach would be to use the "hybrid-pattern" method described by Xu and Lin (2017), in which a simple energy balance model is used to build separate forcings, circumventing the need for expensive single-forcing AOGCM simulations.
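The hybrid idea can be sketched, under stated assumptions, with a one-box energy balance model: because the model is linear, the temperature responses to separate forcings add up to the response to their sum, so each forcing can scale its own pattern. The one-box form, the parameter values, and the function names are hypothetical and do not reproduce the Xu and Lin (2017) configuration.

```python
import numpy as np


def one_box_temperature(forcing, lam=1.2, heat_capacity=8.0, dt=1.0):
    """Forward-Euler solution of C dT/dt = F - lam * T, starting from T = 0.

    forcing: (nyears,) in W m-2; lam in W m-2 K-1; heat_capacity in W yr m-2 K-1.
    Parameter values are hypothetical placeholders.
    """
    f = np.asarray(forcing, dtype=float)
    temp = np.zeros_like(f)
    for i in range(1, f.size):
        temp[i] = temp[i - 1] + dt * (f[i - 1] - lam * temp[i - 1]) / heat_capacity
    return temp


def hybrid_reconstruction(forcings, patterns):
    """Sum over forcings of pattern(x) times that forcing's temperature contribution.

    forcings: dict name -> (nyears,) forcing; patterns: dict name -> (ncell,) pattern.
    """
    total = None
    for name, f in forcings.items():
        term = np.outer(one_box_temperature(f), patterns[name])
        total = term if total is None else total + term
    return total
```

The additivity that makes this convenient is a property of the simple model, not of AOGCMs, which is why single-forcing simulations would still be needed to test it.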
The results presented here have applications that extend beyond providing libraries of scaling patterns for integrated assessment models (Lynch et al., 2017). Another, more speculative application involves the efficacy of climate forcings. Kravitz et al. (2015) developed a method of comparing forcing agents via analyses of their rapid adjustments (fast responses), i.e., their responses in the absence of global mean temperature change. If our method of decomposing the response into CO2 and non-CO2 components could be extended to single forcings, then one could isolate the feedback responses (slow responses), which are the portions of the responses that depend on global mean temperature change. Thus, there is potential to provide a more quantitative intercomparison of the different effects of climate forcing agents.
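One common way to separate the two parts is a Gregory-style regression of the local response against global mean temperature change: the intercept estimates the response at zero warming (the fast response), and the slope gives the temperature-dependent (slow) response per kelvin. The sketch below assumes that linear form and is not necessarily the method of Kravitz et al. (2015); the function name is hypothetical.

```python
import numpy as np


def fast_slow_split(response, t_global):
    """Per-cell OLS fit response(t) = fast + slow_per_k * T(t).

    response: (ntime, ncell); t_global: (ntime,) in K.
    Returns (fast, slow_per_k), each with shape (ncell,).
    """
    t = np.asarray(t_global, dtype=float)
    # Columns: constant term (intercept) and global mean temperature change
    design = np.column_stack([np.ones_like(t), t])
    coef, *_ = np.linalg.lstsq(design, np.asarray(response, dtype=float), rcond=None)
    return coef[0], coef[1]
```

The intercept is an extrapolation to zero warming, so its reliability depends on how linear the response is over the sampled temperature range.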

Figure 2. Comparison between the actual group 1 multi-model average precipitation output (top) and the reconstructions produced by pattern scaling ($\hat{B}$ in Eq. 1). All values are in mm day$^{-1}$ and represent averages over years 116-140 of the 1pctCO2 simulation. The middle panel shows the regression method, and the bottom panel shows the epoch-difference method.

Figure 3. Differences between the reconstructions produced by pattern scaling ($\hat{B}$) and the actual model output for precipitation ($B$). (a) shows absolute differences $\hat{B} - B$ (mm day$^{-1}$), and (b) shows percent change. The top row shows results for the regression method, and the bottom row shows the epoch-difference method. All values are calculated for a group 1 multi-model average for the 1pctCO2 simulation over years 116-140. Stippling indicates a lack of statistical significance in the pattern of differences (Sect. 2.2).

Figure 5 .Figure 6 .Figure 7 .
Figure 5. Values of (left) and differences in (right) the precipitation scaling pattern $P(x)$ (Eq. 1) when different scenarios are used to construct the pattern (RCP8.5 vs. 1pctCO2). Panels (a, c) show values of $P_{\text{RCP8.5}}$, and (b, d) show values of $P_{\text{RCP8.5}} - P_{\text{1pctCO2}}$ (mm day$^{-1}$ K$^{-1}$). Panels (a, b) show results for the regression method, and panels (c, d) show results for the epoch-difference method. All values are calculated for a group 1 multi-model average for the 1pctCO2 simulation. Stippling indicates a lack of statistical significance in the pattern of differences (Sect. 2.2).

Figure 8. Decomposition of global mean temperature change (as a function of the CO2 concentration) into its components, as described in Sect. 4.2.

Figure 9. The CO2 (a, b) and non-CO2 (c, d) responses over years 227-251 (2076-2100) of the RCP8.5 simulation, as well as the difference between the two (e, f). The CO2 response is calculated as $\hat{B} = P_{\text{1pctCO2}}\,\overline{T}_{\text{RCP8.5}}(227\text{-}251)$, and the non-CO2 response is calculated as $\hat{B} = P_{\text{non-CO}_2}\,\overline{T}_{\text{RCP8.5}}(227\text{-}251)$ (see Eq. 1 and the discussion surrounding Eq. 6 for further details). The left column shows results for the regression method, and the right column shows the epoch-difference method.

Table 1. Models used in the present analysis. Most of the analysis was conducted using the models in group 1.

Table 2. Forcing values (W m$^{-2}$) for RCP8.5 and RCP2.6 in 2000, 2050, and 2100. CO2 forcing and total forcing were calculated using the simple climate model Hector (Hartin et al., 2015). Non-CO2 forcing is calculated as the difference between total and CO2 forcing. Percentages in parentheses indicate the percentage of the total forcing.
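The arithmetic described in the Table 2 caption can be written out as a small helper. The inputs in the usage note are illustrative numbers, not values from the table, and the function name is hypothetical.

```python
def forcing_breakdown(total, co2):
    """Split total forcing (W m-2) into CO2 and non-CO2 parts.

    Non-CO2 forcing is the difference total - CO2; percentages are relative to total.
    """
    non_co2 = total - co2
    return {
        "co2": co2,
        "non_co2": non_co2,
        "co2_pct": 100.0 * co2 / total,
        "non_co2_pct": 100.0 * non_co2 / total,
    }
```

For example, `forcing_breakdown(4.0, 3.0)` attributes 1.0 W m$^{-2}$ (25 %) to non-CO2 agents. If the net non-CO2 forcing is negative, its percentage is negative and the CO2 share exceeds 100 %.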