Earth system models are expected to simulate well both the climate mean state and climate variability. To test this expectation, we decompose the monthly mean near-surface air temperature for the years 1901–2005 from two 20th century reanalysis data sets and 12 CMIP5 model simulations using randomised multi-channel singular spectrum analysis (RMSSA). Due to the relatively short time span, we concentrate on the representation of multi-annual variability, which the RMSSA method effectively captures as separate and mutually orthogonal spatio-temporal components. This decomposition is a unique way to separate statistically significant quasi-periodic oscillations from one another in high-dimensional data sets.

The main results are as follows. First, the total spectra of the two reanalysis data sets are remarkably similar on all timescales, except that the spectral power in ERA-20C is systematically slightly higher than in 20CR. Apart from the slow components related to multi-decadal periodicities, ENSO oscillations with approximately 3.5- and 5-year periods are the most prominent forms of variability in both reanalyses; in 20CR they are somewhat more pronounced than in ERA-20C. Since about the 1970s, the amplitudes of the 3.5- and 5-year oscillations have increased, presumably due to some combination of forced climate change, intrinsic low-frequency climate variability, and changes in the global observing network. Second, none of the 12 coupled climate models closely reproduces all aspects of the reanalysis spectra, although some models represent many aspects well. For instance, the GFDL-ESM2M model has two nicely separated ENSO periods, although these are too prominent compared with the reanalyses. An extensive Supplement and YouTube videos illustrate the multi-annual variability of the data sets.

The ultimate goal in developing Earth system models (ESMs) is to enable exploitation of the inherent Earth system predictability, and hence to reduce weather- and climate-related uncertainties in our daily life and to guide societies in making sustainable choices (e.g. Slingo and Palmer, 2011; Meehl et al., 2014). For the predictions to be useful and usable, the expectation is that the climate mean state and climate variability are well simulated by these tools. Due to the complexity of the models and the data they produce, testing this expectation poses a challenge: many aspects of model performance are gathered under the variability concept, and no single diagnostic alone can exhaust all its facets. Yet, understanding the discrepancies between observed and simulated variability is crucial feedback for model development.

Representation of climate variability among models participating in climate model inter-comparisons, such as CMIP5, has been studied by, for example, Bellenger et al. (2014), Knutson et al. (2013), Ba et al. (2014) and Fredriksen and Rypdal (2016). We add to this literature by comparing a representative set of contemporary coupled climate models with reanalysis data, focusing on spatio-temporal modes of climate variability. One century of global reanalysis data is of course a very short period for this purpose and severely constrains inter-comparison studies (e.g. Wittenberg, 2009; Stevenson et al., 2010). First, the time series should cover a sufficient number of recurring “events” for the findings to reach statistical significance; decadal-to-multi-decadal variability is therefore of interest, but less informative than shorter cycles of variability. Second, the applied methods have to be very effective in extracting information from short but high-dimensional data sets. For these reasons, we concentrate on the representation of multi-annual variability in reanalyses and coupled climate models, applying randomised multi-channel singular spectrum analysis (RMSSA; Seitola et al., 2014, 2015), which effectively separates mutually orthogonal spatio-temporal components from our high-dimensional data sets.

The aim of this study is to decompose the 20th century climate variability into its multi-annual modes, and to assess how these modes are represented by contemporary climate models. We hope this provides guidance for model development through a better understanding of the deficiencies in representing reanalysed modes of multi-annual climate variability. Ultimately, interpreting the hints about model deficiencies as development topics is left to the development teams themselves; our role is to point towards potential error sources. To reassure the teams that high-dimensional time series analysis is possible today, we emphasise the methodological aspect of this study. RMSSA can, under very weak assumptions about the data, decompose high-dimensional data sets in a unique way and separate statistically significant quasi-periodic spatio-temporal oscillations from one another. This is in contrast to many other approaches, which either make assumptions about the oscillation structures, such as Fourier or spherical harmonic decomposition, or resolve only the spatial or only the temporal aspects of variability. RMSSA can detect spatially evolving “chains of events” by resolving eigenmodes of spatio-temporal covariance data. This is a significant advantage over, say, PCA, which only resolves eigenmodes of spatial covariances and often projects the temporal evolution of an “event” onto a number of different eigenmodes. In addition, the novel data compression based on random projections vastly increases the tractable problem size (i.e. data dimension) – even multi-variate decomposition is now possible, although not included here.

Multi-channel singular spectrum analysis (MSSA; Broomhead and King, 1986a, b) can be characterised as a time series analysis method for high-dimensional problems. It effectively identifies spatially and temporally coherent patterns of a data set by decomposing a lag-covariance data matrix into its eigenvectors and eigenvalues (e.g. Ghil et al., 2002) using singular value decomposition (SVD). The lag window in MSSA is a user choice, typically recommended to be shorter than approximately one third of the length of the time series (Vautard and Ghil, 1989). A long lag window enhances the spectral resolution, i.e. the number of frequencies that can be identified, but distributes the variance over a larger set of components. Here, the MSSA eigenvectors are called space–time empirical orthogonal functions (ST-EOFs), and the projections of the data set onto these ST-EOFs are called space–time principal components (ST-PCs). Because of the lag window, the ST-PCs have a reduced length and cannot be placed in the same index space as the original time series. However, they can be represented in the original coordinate system by the reconstructed components (RC; Plaut and Vautard, 1994).
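
The core MSSA computation – building the lag-augmented (trajectory) matrix and taking its SVD – can be sketched in a few lines. The study's implementation uses R; the following is a minimal NumPy illustration under our own naming, not the authors' code.

```python
import numpy as np

def mssa(X, M):
    """Minimal multi-channel SSA sketch.

    X : (T, D) array of T monthly time steps and D grid-point channels.
    M : lag window length (240 months, i.e. 20 years, in the study).
    """
    T, D = X.shape
    N = T - M + 1
    # Augmented (trajectory) matrix: each row stacks an M-month lagged
    # window from every channel -> shape (N, D * M).
    A = np.hstack([np.column_stack([X[j:j + N, d] for j in range(M)])
                   for d in range(D)])
    # The SVD of the augmented matrix yields the same eigenmodes as
    # diagonalising the lag-covariance matrix A.T @ A / N.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    eigvals = s**2 / N        # variance associated with each mode
    st_eofs = Vt.T            # space-time EOFs (ST-EOFs)
    st_pcs = A @ st_eofs      # space-time PCs (ST-PCs), length N < T
    return eigvals, st_eofs, st_pcs
```

Note that the ST-PCs have length N = T − M + 1, which is why they cannot be indexed like the original series and the reconstructed components are needed instead.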

MSSA is computationally expensive and practical limits are easily exceeded for large data sets and long lag windows. In order to overcome this limitation, the computationally more efficient variant called RMSSA is applied here. The RMSSA algorithm, in a nutshell, (1) reduces the dimension of the original data set by using so-called random projections (RP; Bingham and Mannila, 2001; Achlioptas, 2003), (2) decomposes the data set by calculating standard MSSA in the low-dimensional space and (3) reconstructs the components in the original high-dimensional space.

In RP, the original data set is projected onto a matrix of Gaussian-distributed random numbers (zero mean and unit variance) in order to
construct a lower dimensional representation. In this study, we reduce the
data volume to about 5 % of the original volume. Since the computational
complexity of RP is low, involving only a matrix multiplication, it can be
applied to very high-dimensional data sets. Although RP is not a lossless
compression, it has the important property that the lower-dimensional data
set has essentially the same structure as the original high-dimensional data
set. This has been demonstrated for climate model data in Seitola et
al. (2014). The RMSSA algorithm is briefly presented in the
Appendix.
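
As a concrete illustration of step (1), a plain Gaussian random projection can be written as below. The function name and the 1/√k scaling (which preserves norms in expectation) are our choices; the study's R implementation may differ in detail.

```python
import numpy as np

def random_project(X, k, seed=0):
    """Project D-dimensional data onto k << D random directions.

    X : (T, D) data matrix; k : target dimension (about 5 % of D here).
    By the Johnson-Lindenstrauss lemma, pairwise distances - and hence
    the broad covariance structure - are approximately preserved.
    """
    T, D = X.shape
    rng = np.random.default_rng(seed)
    # Gaussian entries with zero mean; dividing by sqrt(k) makes the
    # projection norm-preserving in expectation.
    R = rng.standard_normal((D, k)) / np.sqrt(k)
    return X @ R
```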

The ST-PCs represent the different oscillatory modes extracted from the data set. In order to estimate the dominant frequencies associated with each ST-PC, the power spectrum is calculated with the multitaper spectral analysis method (MTM; Thomson, 1982; Mann and Lees, 1996). To further compare the variability modes and their intensities in different data sets, the power spectra of all the ST-PCs of each data set are summed up to obtain the so-called total spectrum. The ST-PCs are already weighted by their explanatory power, i.e. multiplied by the corresponding eigenvalue, so the components with more explanatory power also have higher spectral densities than those explaining only a small fraction of the variance. No extra weighting is therefore needed in this step.
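
A bare-bones version of this spectral step, using Slepian (DPSS) tapers from SciPy, might look as follows. The taper parameters (NW = 4, 7 tapers) are illustrative assumptions; the study uses the MTM of Thomson (1982) with the robust noise background of Mann and Lees (1996), which this sketch does not reproduce.

```python
import numpy as np
from scipy.signal.windows import dpss

def multitaper_psd(x, nw=4.0, k=7):
    """Simple multitaper (MTM) power spectrum estimate of one ST-PC."""
    n = len(x)
    tapers = dpss(n, nw, k)                    # (k, n) Slepian tapers
    # Average the k eigenspectra (periodograms of the tapered series).
    spectra = np.abs(np.fft.rfft(tapers * x, axis=1))**2
    return spectra.mean(axis=0)

def total_spectrum(st_pcs):
    """Sum the MTM spectra over all ST-PCs (columns) of one data set.

    The ST-PCs are assumed to be already scaled by their eigenvalues,
    so no extra weighting is applied here.
    """
    return sum(multitaper_psd(pc) for pc in st_pcs.T)
```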

The uncertainty related to the explanatory power of each ST-PC (i.e. the confidence interval of the respective eigenvalue) is estimated using North's rule of thumb for sampling errors (North et al., 1982). The sampling error of the eigenvalue λ_k is approximately δλ_k ≈ λ_k (2/N)^{1/2}, where N is the number of independent samples.

In data sets of dynamical systems, the ST-PCs/ST-EOFs of MSSA often appear as quadrature pairs that explain approximately the same variance and are phase-shifted by about a quarter of a cycle; together, such a pair represents one oscillatory mode.

Significance testing in MSSA requires solving conventional PCs of the original data set. In the case of very high-dimensional problems this easily exceeds practical computational limits. The RMSSA implementation in Seitola et al. (2015) contains the Allen–Robertson test such that the PCs are solved in the dimension-reduced space, which keeps the test affordable even in very high-dimensional problems. The Appendix describes the procedure in more detail.

The data consist of the monthly mean near-surface air temperature from the
historical 20th Century simulations of 12 different climate models
(Table

The historical (1901–2005) simulations were extracted from the CMIP5 data
archive and they follow the CMIP5 experimental protocol (Taylor et al.,
2012). The 20th Century simulations use the historical record of climate
forcing factors such as greenhouse gases, aerosols, solar variability, and
volcanic eruptions. We used a single ensemble member of each model and the
model data sets were interpolated into a common grid of

Climate models used in the study. For more details of the models, see Table 9.1. in IPCC (2013).

As a reference, we used two reanalysis data sets: the 20th Century Reanalysis
V2 data (hereafter 20CR) provided by the NOAA/OAR/ESRL PSD (Compo et al.,
2011), and ERA-20C data provided by ECMWF (Poli et al., 2013). The data sets
are produced using an ensemble of perturbed reanalyses, and the final data
set corresponds to the ensemble mean. In 20CR, only surface pressure
observations are assimilated, and the observed monthly sea-surface
temperature and sea-ice distributions from HadISST1.1 (Rayner et al., 2003)
are used as boundary conditions (Compo et al., 2011). In ERA-20C,
observations of surface pressure and surface marine winds are assimilated
(Poli et al., 2013). Unlike 20CR, it uses a more recent sea-surface
temperature and sea-ice cover analysis from HadISST2 (Rayner et al., 2006).
Both 20CR and ERA-20C are forced by the historical record of changes in climate
forcing factors (greenhouse gases, volcanic aerosols and solar variations).
In order to be consistent with the climate model simulations, the same time
period is used (1901–2005, i.e. 1260 monthly mean fields) and the
reanalysis data sets were interpolated into the same grid as the model
simulations (

Some pre-processing of the data was needed before applying RMSSA. At each
grid point the data sets were processed as follows:

linear trend was fitted and removed,

annual cycle was estimated using seasonal-trend decomposition (STL; Cleveland et al., 1990) and removed,

resulting values were mean-centred and divided by the average standard
deviation of all the data sets (see Fig.

Map of the common normalisation factor. Shown is the mean standard
deviation of 2 m temperature (

The reanalysis and climate model data sets have different temperature standard deviations, reflecting differences in temperature variability from inter-annual to multi-decadal timescales (e.g. Thompson et al., 2015). To retain these differences, we have used a common normalisation factor (i.e. the average standard deviation of all the data sets). This procedure reduces the weight of grid points with high variance, typically at higher latitudes, and hence adds weight to lower-latitude features. After the pre-processing, the dimension reduction step of RMSSA was applied so that approximately 5 % of the original data dimensions were retained. The lag window in the analysis was 20 years (240 months). The total spectra were obtained from this analysis, and they are comparable because of the normalisation with the common standard deviation of the data sets.
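
The pre-processing chain above can be sketched as follows. For simplicity the annual cycle is removed here with a monthly climatology rather than the STL decomposition used in the study, and the function signature is our own.

```python
import numpy as np

def preprocess(fields, common_std=None):
    """Pre-processing sketch for one data set.

    fields : (T, D) monthly fields at D grid points, T a multiple of 12.
    common_std : (D,) average standard deviation over all data sets
        (the common normalisation factor); if None, the data set's own
        standard deviation is used, as in the significance testing.
    """
    T, D = fields.shape
    # 1. fit and remove a linear trend at each grid point
    t = np.arange(T)
    coef = np.polyfit(t, fields, 1)                  # (2, D): slope, intercept
    x = fields - (np.outer(t, coef[0]) + coef[1])
    # 2. remove the annual cycle (monthly climatology stands in for STL)
    x = x - np.tile(x.reshape(-1, 12, D).mean(axis=0), (T // 12, 1))
    # 3. mean-centre and divide by the (common) standard deviation
    x = x - x.mean(axis=0)
    sd = x.std(axis=0) if common_std is None else common_std
    return x / sd
```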

The statistical significance test uses a red noise null hypothesis. In the test, we have used data sets normalised by their own standard deviations, since a common normalisation would interfere with generating the red noise surrogates corresponding to each data set. The first 50 PCs of each data set were retained as input; these explain 79 % of the variability in 20CR, 75 % in ERA-20C, and 70–80 % in the climate model data sets. A total of 1000 realisations of red noise surrogate data sets were generated, and confidence intervals (95 %) for the oscillatory modes were estimated. We note that the transformation to PCs may interfere with the detection of weak signals, as demonstrated by Groth and Ghil (2015).
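
The red noise null hypothesis can be illustrated by generating AR(1) surrogates fitted to each retained PC. This sketch covers only the surrogate generation; the full Monte Carlo MSSA test (Allen and Robertson, 1996) additionally projects the surrogates onto the data eigenbasis to derive the confidence limits.

```python
import numpy as np

def rednoise_surrogates(pc, n_surr=1000, seed=0):
    """AR(1) ('red noise') surrogates matching one PC time series."""
    pc = pc - pc.mean()
    phi = np.corrcoef(pc[:-1], pc[1:])[0, 1]    # lag-1 autocorrelation
    sigma = np.sqrt(pc.var() * (1 - phi**2))    # innovation std
    rng = np.random.default_rng(seed)
    n = len(pc)
    out = np.empty((n_surr, n))
    # start from the stationary distribution, then iterate the AR(1) rule
    out[:, 0] = rng.standard_normal(n_surr) * pc.std()
    for t in range(1, n):
        out[:, t] = phi * out[:, t - 1] + sigma * rng.standard_normal(n_surr)
    return out
```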

We used reconstructed components (RC; see Appendix

To summarise the animations, we have calculated composite maps of the modes.
The compositing procedure follows the one described in Plaut and
Vautard (1994). The idea is to choose the grid point time series
(RC

Reanalysis ST-PC time series (columns 1 and 3) of monthly near-surface temperature 1901–2005 and their spectra (columns 2 and 4) for 20CR and ERA-20C. The components are ordered according to the explained variance (%).

The main outcome of the RMSSA method, the space–time principal components
(ST-PCs), characterise both the spatial and temporal structure of the modes of
variability. Sections 3.1–3.4 focus on their temporal aspects. The leading
30 ST-PC time series and the corresponding power spectra are displayed in
Fig.

components with predominantly multi-decadal periodicity (1, 2, 5, and 6) explain a total of 7.2 and 5.9 % of the variance in 20CR and ERA-20C, respectively, with clear similarities in their time series and spectra;

multi-annual components (3, 4, 7, and 8) explain 4.2 and 3.2 % of the variance in 20CR and ERA-20C, respectively;

there is a broad multi-annual peak centred at 5 years and a narrower peak at 3.5 years in both reanalyses; these are clearly separated in ERA-20C into the component pairs 3 and 4 vs. 7 and 8, whereas the separation in 20CR is less clear;

there are many spectral peaks in the reanalyses at 2–3 year periods with little explained variance but some are well separated and distinct.

The conclusion based on Fig. 2 is that the leading sources of near-surface air temperature variability at multi-decadal and multi-annual periods are well identifiable in the reanalysis data sets. 20CR and ERA-20C decompose into very similar components explaining the variance in the two data sets. This is of course expected, but it is also reassuring from the methodological viewpoint: despite its complexity, the RMSSA decomposition is consistent.

It is noteworthy in Fig.

Global patterns of 2 m temperature for the components 3, 4, 7 and 8
in 20CR (left column) and ERA-20C (right column). Snapshots are taken from
January 1987 (top row) and January 1998 (bottom row). Unit

Figure

As Fig.

As Fig.

Statistical significance tests are presented in Figs.

The total spectra for the 12 CMIP5 models are shown in Fig.

Here we will concentrate on the multi-annual aspects but note in passing that
the level of multi-decadal variability (

On multi-annual scales, model performance varies considerably. There is a group of models (a, b, d and e) with high spectral density at about 3–7 year periods. Models d and e have a bi-modal spectral structure, as in the reanalyses, while models a and b have a broad unimodal peak. The decompositions (available in the Supplement, Sect. S1) partly explain the reasons leading to these total spectra.

In model a, for instance, there is a unimodal broad peak at 3.5–4 year
periods (Fig.

In model e, there is a bimodal total spectrum (Fig.

In most other models, the multi-annual variability is less prominent than in
the reanalyses. In model c (Fig.

Finally, Fig.

In the reanalyses (Fig.

In summary, there are 5–15 statistically significant periods in the models,
except for model k (Fig.

ERA-20C phase (1–8) composites of the 3–4 year variability mode.
Unit

The ST-PCs can be represented in the original coordinate system as so-called reconstructed components, which can be visualised. In this section, some visualisation results are presented and discussed.

In ERA-20C, there is a spectral peak at the 3.5-year period, which is significant
at 5 % level (Fig.

In phase 1 (Fig.

Next, the behaviour of 20CR and the CMIP5 models is studied. The 3.5-year mode is significant in both 20CR and ERA-20C. For illustration, we have chosen
component pairs from the model decompositions (Fig. S1 in the Supplement) that have
spectral peaks between 3 and 4 years and do not express substantial
variability on other timescales. In most climate models, such a
corresponding mode exists, except in models g and k. In model c this mode is
not significant at 5 % level, but it is illustrated anyway. The
Supplement reveals how these modes are represented in different data sets
(Figs. S3–S14). The format is the same as in Fig.

In 20CR (Fig. S3), the anomalies are weaker compared to ERA-20C (Fig. S4).
This is mainly because the 3–4 year mode is distributed on two component
pairs in 20CR whereas in ERA-20C it is concentrated on one pair.
Nevertheless, a similar although weaker signal is evident in 20CR, such as the
northeast propagation of the North Pacific temperature anomaly. (Note that in
Fig.

We note that a substantial portion of variance at inter-annual to inter-decadal timescales can be attributed to “climate noise” associated with processes with timescales much shorter than the inter-annual scale (Wunsch, 1999; Feldstein, 2000). If the amplitude of the variability mode exceeds some noise threshold (such as red noise), then the variability mode is also likely driven by some process external to the atmosphere, in addition to the climate noise. For example, a large part of the inter-annual atmospheric ENSO pattern is presumably driven by anomalies of tropical diabatic heating associated with sea surface temperature anomalies (Feldstein, 2000). We assume that for this reason the multi-annual patterns related to ENSO clearly exceed the noise threshold in the results of this study.

The aim of this study is to decompose the 20th century climate variability into its multi-annual modes, and to assess how these modes are represented by the contemporary climate models. To this end, two 20th century reanalysis data sets and 12 CMIP5 model simulations for the years 1901–2005 of the monthly mean near-surface air temperature have been decomposed using RMSSA. The statistical significance of the identified modes has been estimated with Monte Carlo simulations. The main conclusions are as follows.

Spectral properties of the 20CR and ERA-20C reanalysis data appear remarkably similar. The most prominent forms of variability in both data sets are related to the approximately 3.5- and 5-year modes, which are significant at the 5 % level. The spectral power in ERA-20C is systematically slightly higher than in 20CR. The 3.5-year mode is illustrated in more detail. In ERA-20C, the mode is associated with a typical ENSO pattern of temperature anomalies in the equatorial Pacific Ocean, South America, and northwestern North America. In addition to these, the mode contains a northeast-propagating temperature anomaly over northernmost North America, and another eastward-propagating anomaly in the vicinity of western Antarctica. Since about the 1970s, the amplitude of this 3.5-year global mode has increased.

None of the 12 coupled climate models closely reproduces all aspects of the reanalysis spectra, although some models represent many aspects well. For instance, the GFDL-ESM2M model has two nicely separated ENSO-related periods, although these are too prominent compared with the reanalyses. Also, a number of models represent the propagating temperature anomalies in the 3–4 year range. Some suggestions for potential model development aspects are provided in the text.

An extensive Supplement is available, presenting the results in visual format for each reanalysis and model data set. In the future, relaxing the uni-variate nature of the present study would seem a natural extension. This is now possible since the use of random projections allows efficient, structure-preserving compression of the data. Of special interest would be the behaviour of variables directly linked with atmosphere–ocean coupling processes, such as heat, momentum and moisture fluxes over the oceans.

All data used in this study were downloaded from open sources. The RMSSA
algorithm and the statistical significance testing are implemented using GNU
licensed free software from the R Project for Statistical Computing
(

The RMSSA algorithm and the significance test are briefly presented here. The
original data matrix is

The next step is to construct an augmented data matrix

The ST-PCs can be represented in the original coordinate system by the RCs (Plaut and Vautard, 1994; Ghil et al., 2002). This transformation is given by

RMSSA with significance testing is briefly presented in the following. Testing the MSSA components against a red noise null hypothesis requires orthogonal input vectors, which are obtained by first calculating a conventional PCA and retaining a set of dominant PCs. Therefore, some additional calculation steps are included in the RMSSA algorithm:

SVD of lower dimensional matrix

Finally, a large number of red noise processes (i.e. surrogate data sets) are generated, and the confidence limits for the MSSA eigenmodes are determined. This significance test (Monte Carlo MSSA) is described in detail in Allen and Robertson (1996).

Heikki Järvinen suggested the study and mostly wrote the article. Teija Seitola implemented the methods, performed all computations, and wrote the data and method descriptions; Johan Silén supported the method development; and Jouni Räisänen supported the climate model data analysis.

This research has been funded by the Academy of Finland (project number 140771), Academy of Finland Centre of Excellence Programme (project number 272041) and the Fortum Foundation (grant number 201500127). Edited by: R. Neale Reviewed by: three anonymous referees