Earth system models (ESMs) are the gold standard for producing future projections of climate change, but running them is difficult and costly, and thus researchers are generally limited to a small selection of scenarios. This paper presents a technique for detailed emulation of ESM temperature output, based on the construction of a deterministic model for the mean response to global temperature. The residuals between the mean response and the ESM output temperature fields are used to construct variability fields that are added to the mean response to produce the final product. The method produces grid-level output with spatially and temporally coherent variability. Output fields include random components, so the system may be run as many times as necessary to produce large ensembles of fields for applications that require them. We describe the method, show example outputs, and present statistical verification that it reproduces the ESM properties it is intended to capture. This method, available as an open-source R package, should be useful in the study of climate variability and its contribution to uncertainties in the interactions between human and Earth systems.

There are a variety of scientific applications that use data from future
climate scenarios as an input. Examples include crop and agricultural
productivity models (

This limited selection of scenarios may be inadequate for many types of studies. Users might need customized scenarios following some specific future climate pathway not covered by the scenario library, or they might need many realizations of one or more future climate scenarios.

Examples of research areas for which archival runs might be insufficient
include uncertainty studies in which the multiple realizations are used to
compute a statistical distribution of outcomes in the downstream model

In these situations, researchers typically turn to

Most of these methods are deterministic functions of their inputs, and thus
their outputs can be viewed as expectation values for the ESM output. Real
ESM output, however, would have some distribution around these mean response
values. We will refer to these departures from the mean response generically
as “variability”. Many of the applications described above are sensitive to
climate variability. For example,

There have been some attempts to add variability to emulators, but producing
realistic variability is difficult due to the complicated correlation
structure exhibited by climate model output over both space and time.
Typically, methods deal with this difficulty by either placing a
priori limits on the form of the correlation function

In this paper we describe a computationally efficient method for producing climate scenario realizations with realistic variability. The realizations are constructed so as to have the same variance and time–space correlation structure as the ESM data used to train the system. The variability produced by this method includes random components, so the system may be run many times with different random number seeds to produce an ensemble of independent realizations. The results in this study are limited to temperature output at annual resolution. Future papers will extend the method to additional output variables, such as precipitation, and to subannual time resolution.

In the text that follows, we use bold symbols
(e.g.,

Occasionally we will add a matrix and a vector; e.g.,
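To make one common version of this convention concrete, the following minimal numpy sketch adds a vector to every column of a matrix via broadcasting (the grid dimensions and the column-wise broadcast are illustrative assumptions, not necessarily the paper's exact notation):

```python
import numpy as np

# Illustrative convention (an assumption, not the paper's definition):
# "matrix plus vector" means adding the vector to every column.
M = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])   # e.g., 2 grid cells x 3 time steps
v = np.array([10.0, 20.0])        # one entry per grid cell

result = M + v[:, np.newaxis]     # broadcast v across the columns
```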

Our method requires a collection of ESM model output to train on. Any model
can be used, and by switching out the input data the method can be tuned to
produce results representative of any desired ESM. For all of the results in
this paper we have used output of the Community Earth System Model running the Community Atmosphere Model version 5 (CESM(CAM5)) from the CMIP5 archive

To keep clear the distinction between the data produced by the emulator and the ESM data used to train the emulator, we will refer to the ESM data as “synthetic measurements” (when referring to the data as a whole) or “cases” (when referring to individual frames in the data), while the terms “results” and “model output” will be reserved for the data produced by the emulator.

Throughout the discussion, we will treat each temperature state as a vector,
with each grid cell providing one entry in the vector. The ordering of the
grid cells within the vector is arbitrary but consistent throughout the
entire calculation. The entire set of synthetic measurements will be grouped
into the input matrix

We will also derive from the input an operator for computing the
area-weighted mean of a grid state. We denote this vector by
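For a concrete picture of such an operator, the sketch below builds a weighting vector for a hypothetical regular latitude–longitude grid; the grid size, cell-center latitudes, and cosine-latitude area weighting are illustrative assumptions:

```python
import numpy as np

# Hypothetical area-weighting operator for a toy regular lat-lon grid.
# Grid layout and weights are illustrative, not the paper's exact grid.
nlat, nlon = 4, 8
lats = np.linspace(-67.5, 67.5, nlat)    # cell-center latitudes (degrees)
cell_w = np.cos(np.deg2rad(lats))        # cell area scales ~ cos(latitude)
w = np.repeat(cell_w, nlon)              # one weight per grid cell
w /= w.sum()                             # normalize so the weights sum to 1

rng = np.random.default_rng(0)
T = rng.normal(288.0, 5.0, nlat * nlon)  # a temperature state vector
global_mean = w @ T                      # area-weighted global mean
```

Applying `w` to a spatially uniform field returns that field's constant value, which is the defining property of a mean operator.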

Our basic procedure will be to construct a deterministic model for the mean response to global temperature. The residuals between the mean response and the synthetic temperature fields will be taken as representative of the variability in the ESM and used to construct variability fields that will be added to the mean response to produce the final product.

Schematic of the residual calculation showing the shapes of
the matrices involved. The result of the outer product

In principle, the mean response could be calculated using any of the emulation
techniques described in Sect.

The matrix of residuals,

To capture the time correlation we will make use of the Wiener–Khinchin
theorem
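The discrete form of the theorem can be checked numerically. In this sketch (using an arbitrary random series rather than ESM residuals), the circular autocovariance computed directly agrees with the inverse DFT of the power spectrum:

```python
import numpy as np

# Numerical check of the discrete Wiener-Khinchin relation: the circular
# autocovariance of a series equals the inverse DFT of its power spectrum
# |DFT(x)|^2 (scaled by 1/N). The series here is illustrative, not ESM data.
rng = np.random.default_rng(0)
x = rng.standard_normal(256)
x -= x.mean()

power = np.abs(np.fft.fft(x)) ** 2
acov_wk = np.fft.ifft(power).real / len(x)   # via Wiener-Khinchin

# Direct circular autocovariance for comparison
acov_direct = np.array([np.mean(x * np.roll(x, -k)) for k in range(len(x))])
```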

In theory we could use a similar technique to capture the spatial
correlation; however, in practice the spherical geometry of the
spatial domain makes this difficult. Moreover, it is not just the
spatial correlation properties that matter, but also the locations at
which spatially correlated phenomena occur.
Therefore, we capture spatial correlations by using principal
components analysis (PCA) to express the
grid state as a linear combination of basis vectors that diagonalize
the covariance matrix of the system.
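A minimal sketch of such a decomposition via the SVD, applied to toy residuals rather than real ESM output (the dimensions and data are illustrative):

```python
import numpy as np

# Toy EOF decomposition via SVD (dimensions and data are illustrative).
rng = np.random.default_rng(1)
R = rng.standard_normal((100, 30))   # 100 time steps x 30 grid cells
R -= R.mean(axis=0)                  # residuals about the mean response

U, s, Vt = np.linalg.svd(R, full_matrices=False)
eofs = Vt                            # rows: EOF basis vectors
pcs = U * s                          # time series of projection coefficients

# The basis diagonalizes the residual covariance matrix:
cov = R.T @ R
diagonalized = eofs @ cov @ eofs.T   # equals diag(s**2)
```

Because the basis is orthonormal, `pcs @ eofs` reconstructs the residual matrix exactly when all components are kept.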

In practice, it is convenient to force all of the basis vectors except for
one to have area-weighted global means of zero, so that all of the
variability in the global mean is carried by a single component. This
property is useful because it allows us to control how much the generated
variability distorts the global properties of the mean response field it is
being added to. To accomplish this, we introduce a small modification to the
EOF decomposition procedure. We define the zeroth basis vector
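One way to realize this constraint is to let the zeroth vector be a uniform field whose coefficient is each field's area-weighted mean, and to remove that component before computing the EOFs. The sketch below, with made-up weights and residuals, illustrates the idea; it is not the package's exact implementation:

```python
import numpy as np

# Illustrative construction: isolate the global mean in a single basis
# vector so that every remaining field has zero area-weighted mean.
rng = np.random.default_rng(2)
ncell, nt = 30, 100
w = rng.random(ncell); w /= w.sum()   # area weights summing to 1
R = rng.standard_normal((nt, ncell))  # toy residual fields

v0 = np.ones(ncell)                   # uniform "zeroth" vector; w @ v0 == 1
coef0 = R @ w                         # area-weighted mean of each field
Rp = R - np.outer(coef0, v0)          # remove the global-mean component
# Each row of Rp now has zero weighted mean, so every EOF computed from
# Rp (a linear combination of these rows) inherits the same property.
```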

The typical use of PCA in many fields, including climate modeling, is for
dimensionality reduction. In such applications the next step after computing
the EOFs would be to identify and keep a small set of EOFs that capture the
majority of the variability and to throw away the rest. In this case,
dimensionality reduction is

At this point we are ready to apply the Wiener–Khinchin theorem. We compute
the discrete Fourier transform (DFT) of the
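The essence of the generation step can be sketched as DFT phase randomization: keep each frequency's magnitude, draw new random phases, and invert the transform. The toy example below uses an arbitrary series standing in for one of the transformed time series; details such as the treatment of the mean and Nyquist terms are our assumptions, not necessarily the package's choices:

```python
import numpy as np

# Phase randomization sketch: generate a new real series with the same
# power spectrum as the input by keeping DFT magnitudes and drawing
# new phases. The input series here is illustrative.
rng = np.random.default_rng(3)
x = rng.standard_normal(128)            # stand-in for one time series

X = np.fft.rfft(x)                      # DFT of a real-valued series
phases = rng.uniform(0.0, 2.0 * np.pi, X.size)
phases[0] = 0.0                         # keep the mean (DC) term real
if x.size % 2 == 0:
    phases[-1] = 0.0                    # keep the Nyquist term real
Xnew = np.abs(X) * np.exp(1j * phases)
xnew = np.fft.irfft(Xnew, n=x.size)     # new realization, same power spectrum
```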

The steps in the variability generation algorithm are summarized in
Table

Summary of steps in the variability generation algorithm
described in Sect.

To illustrate the algorithm, we have produced four
independent variability fields by applying the algorithm to the input data
described in Sect.

Figure

Year 2025 snapshot for variability fields generated using the
procedure described in Sect.

Relative power for each EOF. Roughly half of the total power is contained in the first 10 EOFs. The aggregate power for all EOFs beyond 400 is 1 % of the total.

The spatial structure in the variability is apparent. Temperature perturbations occur on scales of roughly 40–60 arcdeg. Some features, such as the one seen in the low-latitude eastern Pacific, appear in all of the frames with greater or lesser strength, or, in one case, with an opposite sign. Other features, such as the cool patch over northern Europe in the third frame, have no apparent analog in the other realizations.

We can get a sense of the behavior of the variability fields over time by
looking at the power spectral density of the EOFs (Fig.

The second observation is that the power spectrum whitens (becomes
more uniform across frequencies)
considerably (Fig.

Heat map of power spectral density (PSD) for the first 50 EOFs. The trend of decreasing total power and more uniform spectral density continues for the remaining EOFs beyond EOF 50.

Smoothed power spectral density (PSD) for the first nine EOF basis functions. EOFs 2, 3, and 5 show peaks in the PSD, indicating quasiperiodic behavior on 3–5-year timescales. EOF 1 has most of its power at low frequencies, indicating that this component is approximately (though not exactly) constant over the course of a single ESM run.

In Fig.

Spatial visualizations of the basis functions for EOFs 1–6. EOF
grid cell values are scaled such that the magnitude of the largest
value is 1. These components capture large-scale patterns of variability. EOFs 2,
3, and 5 all feature a temperature anomaly in the eastern
Pacific. These same components can be seen in
Fig.

Figure

Spatial visualizations of higher EOF basis functions. EOF
grid cell values are scaled such that the magnitude of the largest
value is 1. The characteristic scale of temperature fluctuations decreases for
functions later in the series. Thus, EOFs 25 and 50 show features
at about half the scale of those shown in
Fig.

The time series produced by this method are designed to match three key
statistical properties of the ESM data used to train the emulator:

distribution of values in a grid cell over time and between realizations;

correlation between values in different grid cells; and

time autocorrelation of spatially correlated patterns of grid cells.

The generation procedure described in this paper does not strictly guarantee
that the generated fields have the desired statistical properties; therefore,
we turn to statistical tests of some of the key properties. Testing for the

The statistic being tested is the same in the generated data as in the input data.

The statistic being tested differs in the generated data by some de minimis value from the input data.

All of the statistical tests described in this section were performed on an
ensemble of 20 generated fields, each with 95 one-year time steps, for a
total of 1900 model outputs in the tests that operated directly on the
generated data. For the test that operates on the

The first property we will examine is the variance of the distribution of
grid cells. We used the

It may seem surprising that the fraction of positive results was so much
smaller than the fraction expected from the

Our model results, on the other hand, are

Pearson test power for several hypothetical correlation
coefficients between

Our second test concerns the covariance between grid cells. Testing for
equal, nonzero covariances directly is challenging, but we can transform the
results into a form that is more readily testable. Starting from
Eq. (

To test the condition in Eq. (

Comparison of the beta(5,5) distribution and a normal distribution with equal variance. The beta distribution is zero outside of the depicted range, while the normal distribution asymptotically approaches zero. Although the difference between these two distributions is small, the Shapiro–Wilk test can easily distinguish them.

The final statistical test concerns whether the generated residuals are
normally distributed. Apart from being necessary to ensure the validity of
the

To arrive at such a distribution, consider how the generated residual fields
are calculated. The value

We used the Shapiro–Wilk test of normality to evaluate the normality
of the grid cell distribution. For this sample size, the power of the
test for distinguishing between a
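As an illustration of this kind of power comparison, the sketch below (with a sample size of our choosing, not necessarily the one used in the paper) applies the Shapiro–Wilk test to a beta(5,5) sample and to a normal sample of equal variance:

```python
import numpy as np
from scipy import stats

# Illustrative Shapiro-Wilk comparison of a beta(5,5) sample against a
# normal sample of equal variance. Sample size is our assumption.
rng = np.random.default_rng(4)
n = 4000
beta_sample = rng.beta(5, 5, n)

# var of beta(a, b) = a*b / ((a+b)**2 * (a+b+1)) = 25 / 1100 for a = b = 5
sigma = np.sqrt(25.0 / (100.0 * 11.0))
normal_sample = rng.normal(0.5, sigma, n)

w_beta, p_beta = stats.shapiro(beta_sample)    # typically rejects normality
w_norm, p_norm = stats.shapiro(normal_sample)
```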

Property 3 deserves additional comment because it is explicitly

The properties enumerated above ensure that, when the generated data are used to drive an ensemble of downstream models and statistics are computed on those results, the scale of the fluctuations produced, their spatial location and extent, and their periodic character, if any, will be faithfully reproduced. This fidelity permits reliable calculations of variance in outcomes, return times of extremes, and regional differences in impacts. Therefore, we expect a technique like this to be invaluable for studies of the contribution of variability to uncertainty in climate effects and feedbacks.

Supporting such uncertainty studies was our primary purpose in developing
this tool, but the analysis in Sect.

Comparison between coefficients of the mean response model
for the RCP85 and MULTI emulators. For both the linear term

There is one important pitfall to watch out for when using this method to
learn the behavior of an ESM; viz., one must take care not to allow the mean
response model to overfit the ESM data. The more complex the model, the
greater the danger of overfitting, but even simple models like the linear
regression used here can overfit. Consider EOF 1 and its power spectrum,
depicted in Fig.

Therefore, it is essential to include enough independent ESM runs in the training data to ensure that the mean response model will not capture fluctuations that are idiosyncratic to a particular run. Exactly how many runs are needed will depend on the complexity of the mean field response model. For a relatively simple model, such as the linear model used in this paper, as few as three independent runs (i.e., one more than the number of parameters per grid cell) should provide reasonable protection against absorbing variability features into the mean response model. Conversely, mean response models with many parameters per grid cell would require more independent inputs. In case of doubt, cross-validation should be used to diagnose possible overfitting. Along similar lines, the input data should include runs for scenarios that span the entire range of future scenarios that the system will be used to emulate. This practice ensures that the mean response model will not be called upon to extrapolate beyond the range of conditions it was trained on.
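As a sketch of what such a cross-validation check might look like, the toy example below fits a per-grid-cell linear mean-response model on all but one run and scores it on the held-out run; the data, model form, and run count are all illustrative assumptions:

```python
import numpy as np

# Leave-one-run-out cross-validation sketch for a per-grid-cell linear
# mean-response model. All data here are synthetic and illustrative.
rng = np.random.default_rng(5)
nruns, nt, ncell = 4, 50, 20
Tg = np.cumsum(rng.random((nruns, nt)) * 0.05, axis=1) + 14.0  # global mean T
true_a = rng.normal(1.0, 0.2, ncell)                           # per-cell slopes
fields = Tg[..., None] * true_a + rng.standard_normal((nruns, nt, ncell)) * 0.5

errs = []
for held in range(nruns):
    train = [r for r in range(nruns) if r != held]
    X = Tg[train].ravel()
    A = np.vstack([X, np.ones_like(X)]).T          # slope + intercept design
    Y = fields[train].reshape(-1, ncell)
    coef, *_ = np.linalg.lstsq(A, Y, rcond=None)   # fit all cells at once
    Ah = np.vstack([Tg[held], np.ones(nt)]).T
    pred = Ah @ coef
    errs.append(np.sqrt(np.mean((pred - fields[held]) ** 2)))

cv_rmse = np.mean(errs)   # compare against in-sample RMSE to detect overfitting
```

A held-out RMSE much larger than the in-sample RMSE would suggest the mean-response model is absorbing run-specific variability.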

Several readers of early versions of this work questioned the decision to fit
the mean response model over the entire range of RCP scenarios, speculating
that this practice would result in a mean response model that represented a
sort of compromise amongst the various RCPs in the input data, fitting none
of them particularly well. If the mean response model were to be underfit in
this way, then the residuals from the misfitting would be lumped in with the
variability and subjected to the randomization procedure described in
Sect.

Throughout the rest of this section we will refer to this collection of
hypotheses as the

To investigate this question, we fit two more emulators to subsets of
the data. The first of these used only the three ensemble members for
the RCP8.5 scenario. We designated this emulator “RCP85”. The
second fit used three ESM runs covering the RCP2.6, RCP6.0, and
RCP8.5 scenarios. We designated this emulator “MULTI”. Our first
test was to compare the mean field models for these two emulators.
Figure

We can quantify just how similar the two models are by fitting linear models
predicting the RCP85 coefficients from the corresponding MULTI coefficients.
When we do this, we find that the average ratio between the RCP85 and MULTI

From this result alone, we see that the mean response models for these two emulators are virtually identical, making it extremely unlikely that CC effects are an appreciable source of error in the MULTI emulator. For this reason, description of additional tests of CC effects, along with source code and results, has been relegated to the data and analysis code archive cited in the code and data availability section.

As with most emulation schemes, this one makes certain assumptions about the
models it is trying to emulate. The most important assumption is that the ESM
outputs can be linearly separated into a temperature-dependent component
(what we have been calling the “mean field response”) and a time-dependent
component (the “variability”). Notably, we assume that the temperature
response is independent of the temperature history. This assumption, though
common in emulator studies, is dubious. The assumption can be partially
relaxed by including additional predictor variables in the mean field model

A related assumption is the assumption of stationarity. The variability
fields produced by this method have stationary statistical properties. Some
research has suggested that the variability is likely to change with
increasing global mean temperature

Having a computationally efficient method for generating realizations of future climate pathways is a key enabler for research into uncertainties in climate impacts. In order to be fit for this purpose, a proposed method must produce data with statistical properties that are similar to those of Earth system models, which are currently the state of the art in projecting future climate states.

In the preceding sections we have described such a method, and we have shown that it reproduces key statistical properties of the Earth system model on which it was trained. Specifically, it produces equivalent distributions of residuals to the mean field response and equivalent space and time correlation structure. The method is computationally efficient, requiring under 10 min to train on the input data set used for the results presented here. Once training is complete, generating temperature fields takes just a few seconds per field generated.

As a result, we believe the method will be extremely useful for the impact studies it was designed to support. Currently, the method is limited to producing temperature only, and at annual resolution. However, we believe that the method can be readily extended to other climate variables and to shorter timescales. These extensions will be the subject of follow-up work.

Software implementing this technique is available as an
R package released under the GNU General Public License. Full source and
installation instructions can be found in the project's GitHub repository
(

The data and analysis code for the results presented in this paper are
archived at

RL designed the algorithm, developed the fldgen package, and performed the analysis of the results. CL ran early versions of the algorithm and performed analysis on those results. AS performed the theoretical analysis of the algorithm's properties, which informed the statistical analysis. CH, BK, and BBL advised the project and provided feedback on early drafts of the paper. RL prepared the manuscript with contributions from all coauthors.

The authors declare that they have no conflict of interest.

This research is based on work supported by the US Department of Energy, Office of Science, as part of research in the Multi-Sector Dynamics, Earth and Environmental System Modeling Program. The Pacific Northwest National Laboratory is operated for the Department of Energy by the Battelle Memorial Institute under contract DE-AC05-76RL01830.

This research was supported in part by the Indiana University Environmental Resilience Institute and the Prepared for Environmental Change grand challenge initiative.

This paper was edited by Fiona O'Connor and reviewed by three anonymous referees.