A non-linear Granger causality framework to investigate climate–vegetation dynamics

Satellite Earth observation has led to the creation of global climate data records of many important environmental and climatic variables. These come in the form of multivariate time series with different spatial and temporal resolutions. Data of this kind provide new means to further unravel the influence of climate on vegetation dynamics. However, as advocated in this article, commonly used statistical methods are often too simplistic to represent complex climate–vegetation relationships due to linearity assumptions. Therefore, as an extension of linear Granger-causality analysis, we present a novel non-linear framework consisting of several components, such as data collection from various databases, time series decomposition techniques, feature construction methods, and predictive modelling by means of random forests. Experimental results on global data sets indicate that, with this framework, it is possible to detect non-linear patterns that are much less visible with traditional Granger-causality methods. In addition, we discuss extensive experimental results that highlight the importance of considering non-linear aspects of climate–vegetation dynamics.


Introduction
Vegetation dynamics and the distribution of ecosystems are largely driven by the availability of light, temperature, and water; they are therefore strongly sensitive to climate conditions (Nemani et al., 2003; Seddon et al., 2016; Papagiannopoulou et al., 2017). Meanwhile, vegetation also plays a crucial role in the global climate system. Plant life alters the characteristics of the atmosphere through the transfer of water vapour, the exchange of carbon dioxide, the partitioning of surface net radiation (e.g. via albedo), and its impact on wind speed and direction (Nemani et al., 2003; McPherson et al., 2007; Bonan, 2008; Seddon et al., 2016; Papagiannopoulou et al., 2017). Because of this strong two-way relationship between terrestrial vegetation and climate variability, predictions of future climate can be improved through a better understanding of the vegetation response to past climate variability.
The current wealth of Earth observation data can be used for this purpose. Nowadays, independent sensors on different platforms collect optical, thermal, microwave, altimetry, and gravimetry information, which is used to monitor vegetation, soils, oceans, and the atmosphere (e.g. Su et al., 2011; Lettenmaier et al., 2015; McCabe et al., 2017). The longest composite records of environmental and climatic variables already span up to 35 years, enabling the study of multidecadal climate–biosphere interactions. Simple correlation statistics and multilinear regressions using some of these data sets have led to important steps forward in understanding the links between vegetation and climate (e.g. Nemani et al., 2003; Barichivich et al., 2014; Wu et al., 2015). However, these methods are in general insufficient for assessing causality, particularly in systems such as the land–atmosphere continuum, in which complex feedback mechanisms are involved. A commonly used alternative is Granger-causality modelling (Granger, 1969). Analyses of this kind have been applied in climate attribution studies to investigate the influence of one climatic variable on another, e.g. the Granger-causal effect of CO₂ on global temperature (Triacca, 2005; Kodra et al., 2011; Attanasio, 2012), of vegetation and snow coverage on temperature (Kaufmann et al., 2003), of sea surface temperatures on the North Atlantic Oscillation (Mosedale et al., 2006), or of the El Niño–Southern Oscillation on the Indian monsoon (Mokhov et al., 2011). Nonetheless, Granger causality should not be interpreted as "real causality": one says that a time series A Granger causes a time series B if the past of A is helpful in predicting the future of B (see Sect. 2 for a more formal definition). Furthermore, the statistical model commonly used in this context is a vector autoregressive model, which is linear by definition; see, e.g. Shahin et al. (2014) and Chapman et al. (2015).
In this article, we present new experimental evidence that supports the need for non-linear methods to study climate–vegetation dynamics, given the non-linear nature of these interactions (Foley et al., 1998; Zeng et al., 2002; Verbesselt et al., 2016). To this end, we have assembled a large, comprehensive database comprising various global data sets of temperature, radiation, and precipitation, originating from multiple online resources. To characterize vegetation, we use the Normalized Difference Vegetation Index (NDVI), which is commonly used as a proxy of plant productivity (Myneni et al., 1997; Nemani et al., 2003). We followed an inclusive data collection approach, aiming to consider all available data sets with worldwide coverage, a time span of at least 30 years, and monthly temporal resolution (Sect. 3). Our novel non-linear Granger-causality framework is used for finding climatic drivers of vegetation and consists of several steps (Sect. 2). In a first step, we apply time series decomposition techniques to the vegetation and the various climatic time series to isolate seasonal cycles, trends, and anomalies. Subsequently, we explore various techniques for constructing more complex features from the decomposed climatic time series. In a final step, we run a Granger-causality analysis on the NDVI anomalies, replacing traditional linear vector autoregressive models with random forests. This framework allows for modelling non-linear relationships, while out-of-sample evaluation guards against overfitting. The results of the global application of our framework are discussed in Sect. 4.

Linear Granger causality revisited
We start with a formal introduction to Granger causality for the case of two time series, denoted as $x = [x_1, x_2, \ldots, x_N]$ and $y = [y_1, y_2, \ldots, y_N]$, with $N$ being the length of the time series. In this work, $y$ refers to the NDVI anomaly time series at a given pixel, whereas $x$ can represent the time series of any climatic variable at that pixel (e.g. temperature, precipitation, radiation). Granger causality can be interpreted as predictive causality, for which one attempts to forecast $y_t$ (at a specific timestamp $t$) given the values of $x$ and $y$ at previous timestamps. Granger (1969) postulated that $x$ causes $y$ if the autoregressive forecast of $y$ improves when information of $x$ is taken into account. In order to make this definition more precise, a performance measure is needed to evaluate the forecast. Below, we work with the coefficient of determination $R^2$, which is here defined as follows:

$$R^2(y, \hat{y}) = 1 - \frac{\sum_{t=P+1}^{N} (y_t - \hat{y}_t)^2}{\sum_{t=P+1}^{N} (y_t - \bar{y})^2}, \tag{1}$$

where $y$ represents the observed time series, $\bar{y}$ is the mean of this time series, $\hat{y}$ is the predicted time series obtained from a given forecasting model, and $P$ is the length of the lag-time moving window. The $R^2$ can therefore be interpreted as the fraction of variance explained by the forecasting model: it increases as the performance of the model improves, reaches the theoretical optimum of 1 for an error-free forecast, and becomes negative when the predictions are less representative of the observations than the mean of the observations. Using $R^2$, one can now define Granger causality more formally.

Definition 1. We say that time series $x$ Granger causes $y$ if $R^2(y, \hat{y})$ increases when $x_{t-1}, x_{t-2}, \ldots, x_{t-P}$ are included in the prediction of $y_t$, in contrast to considering $y_{t-1}, y_{t-2}, \ldots, y_{t-P}$ only, where $P$ is the lag-time moving window.
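As a minimal sketch of the coefficient of determination defined above and of Definition 1, the following plain-Python snippet computes $R^2$ and the gain of a full forecast over a baseline forecast. The function names are hypothetical, and both prediction lists are assumed to be already aligned with the forecast timestamps $t = P+1, \ldots, N$ (so the mean $\bar{y}$ is taken over that same span).

```python
def r_squared(y_obs, y_pred):
    """Coefficient of determination: 1 - SSE / SST.

    Assumes y_obs and y_pred are aligned and cover only the forecast
    timestamps t = P+1, ..., N; the mean y_bar is taken over the same span.
    """
    if len(y_obs) != len(y_pred) or len(y_obs) < 2:
        raise ValueError("expected two aligned sequences of length >= 2")
    y_bar = sum(y_obs) / len(y_obs)
    sse = sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))
    sst = sum((o - y_bar) ** 2 for o in y_obs)
    return 1.0 - sse / sst


def granger_gain(y_obs, pred_baseline, pred_full):
    """R^2 of the full forecast minus R^2 of the baseline forecast.

    Under Definition 1, a positive gain means that adding the lags of x
    improved the forecast of y (hypothetical helper for illustration).
    """
    return r_squared(y_obs, pred_full) - r_squared(y_obs, pred_baseline)


obs = [0.2, -0.1, 0.4, 0.0, -0.3]
mean_forecast = [sum(obs) / len(obs)] * len(obs)
print(r_squared(obs, obs))                     # error-free forecast -> 1.0
print(r_squared(obs, mean_forecast))           # forecasting the mean -> 0.0
print(granger_gain(obs, mean_forecast, obs))   # -> 1.0
```

A negative value of `r_squared` signals a forecast worse than the mean, which can occur for out-of-sample predictions.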
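In climate sciences, linear vector autoregressive (VAR) models are often employed to make forecasts (Stock and Watson, 2001; Triacca, 2005; Kodra et al., 2011; Attanasio, 2012). A linear VAR model of order $P$ boils down to the following representation:

$$\begin{aligned} y_t &= \beta_{10} + \sum_{p=1}^{P} \beta_{11p}\, y_{t-p} + \sum_{p=1}^{P} \beta_{12p}\, x_{t-p} + \epsilon_1, \\ x_t &= \beta_{20} + \sum_{p=1}^{P} \beta_{21p}\, y_{t-p} + \sum_{p=1}^{P} \beta_{22p}\, x_{t-p} + \epsilon_2, \end{aligned} \tag{2}$$

with $\beta_{ij}$ (further indexed by the lag $p$) being parameters that need to be estimated, and $\epsilon_1$ and $\epsilon_2$ referring to two white noise error terms. This model can be used to derive the predictions required to determine Granger causality. In that sense, time series $x$ Granger causes time series $y$ if at least one of the parameters $\beta_{12p}$, for any $p$, differs significantly from 0. Specifically, since we focus on the vegetation time series as the only target, the following two models are compared:

$$y_t = \beta_{10} + \sum_{p=1}^{P} \beta_{11p}\, y_{t-p} + \sum_{p=1}^{P} \beta_{12p}\, x_{t-p} + \epsilon_1, \tag{3}$$

$$y_t = \beta_{10} + \sum_{p=1}^{P} \beta_{11p}\, y_{t-p} + \epsilon_1. \tag{4}$$

We will refer to Eq. (3) as the "full model" and to Eq. (4) as the "baseline model", since the former incorporates all available information and the latter only information of $y$.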
Comparing the above two models, $x$ Granger causes $y$ if the full model shows a substantially better predictive performance in terms of $R^2$ than the baseline model. Statistical tests can be employed for this purpose; they typically assume that the model errors follow a Gaussian distribution (Maddala and Lahiri, 1992). Our definition, however, departs from the perspective of research papers that develop statistical tests for Granger causality (Hacker and Hatemi-J, 2006): we intend to move away from statistical hypothesis testing, because the assumptions behind such testing are typically violated for climate data, where in most cases neither the variables nor the observational techniques are fully independent of each other, and the errors are not normally distributed (see Sect. 2.4 for further discussion).
In climate studies, the Granger-causal relationship between two time series $x$ and $y$ has often been investigated in the bivariate setting (Elsner, 2006, 2007; Kodra et al., 2011; Attanasio, 2012; Attanasio et al., 2012). However, such an analysis might lead to incorrect conclusions, because additional (confounding) effects exerted by other climatic or environmental variables are not taken into account (Geiger et al., 2015). This problem can be mitigated by considering time series of additional variables. For example, assume that one has observed a third variable $w$, which might act as a confounder in deciding whether $x$ Granger causes $y$. The above definition then naturally extends as follows.

Definition 2. We say that time series $x$ Granger causes $y$ conditioned on time series $w$ if $R^2(y, \hat{y})$ increases when $x_{t-1}, x_{t-2}, \ldots, x_{t-P}$ are included in the prediction of $y_t$, in contrast to considering $y_{t-1}, y_{t-2}, \ldots, y_{t-P}$ and $w_{t-1}, w_{t-2}, \ldots, w_{t-P}$ only, where $P$ is the lag-time moving window.
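As above, we refer to the two models as the full and the baseline model, respectively. In the trivariate setting, Granger causality might therefore be tested using the following linear VAR model:

$$\begin{aligned} y_t &= \beta_{10} + \sum_{p=1}^{P} \beta_{11p}\, y_{t-p} + \sum_{p=1}^{P} \beta_{12p}\, x_{t-p} + \sum_{p=1}^{P} \beta_{13p}\, w_{t-p} + \epsilon_1, \\ x_t &= \beta_{20} + \sum_{p=1}^{P} \beta_{21p}\, y_{t-p} + \sum_{p=1}^{P} \beta_{22p}\, x_{t-p} + \sum_{p=1}^{P} \beta_{23p}\, w_{t-p} + \epsilon_2, \\ w_t &= \beta_{30} + \sum_{p=1}^{P} \beta_{31p}\, y_{t-p} + \sum_{p=1}^{P} \beta_{32p}\, x_{t-p} + \sum_{p=1}^{P} \beta_{33p}\, w_{t-p} + \epsilon_3, \end{aligned} \tag{5}$$

where a causal relationship between $x$ and $y$ exists if at least one $\beta_{12p}$ differs significantly from 0. As previously mentioned, the time series $w$ might also have a causal effect on $y$ and be correlated with $x$. For this reason, $w$ should be included in both models (baseline and full), so that the method can cope with cross-correlations between predictors or, in our case, between the climatic drivers of vegetation anomalies. The extension of this definition to more than three time series is straightforward.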
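To illustrate Definition 2 and the role of the confounder $w$, the following sketch (Python with NumPy; all function names and the synthetic data are hypothetical) fits the $y$-equations of Eqs. (3), (4) and (5) by ordinary least squares on simulated series in which $w$ drives both $x$ and $y$, while $x$ has no direct effect on $y$. The $R^2$ values are computed in sample and are therefore optimistic (see Sect. 2.2); the point is only the contrast between the bivariate and the conditional comparison.

```python
import numpy as np


def lagged(series, P):
    """Matrix whose row for t = P, ..., N-1 (0-based) is [s_{t-1}, ..., s_{t-P}]."""
    s = np.asarray(series, dtype=float)
    N = len(s)
    return np.column_stack([s[P - p:N - p] for p in range(1, P + 1)])


def in_sample_r2(y, predictor_series, P):
    """OLS fit of y_t on an intercept plus P lags of each predictor series."""
    target = np.asarray(y, dtype=float)[P:]
    X = np.column_stack([np.ones(len(target))] + [lagged(s, P) for s in predictor_series])
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ beta
    return 1.0 - resid @ resid / np.sum((target - target.mean()) ** 2)


# Synthetic example: w is an AR(1) confounder, x closely tracks w,
# and y responds to w one month later; x has no direct effect on y.
rng = np.random.default_rng(0)
N, P = 1000, 2
w = np.zeros(N)
for t in range(1, N):
    w[t] = 0.5 * w[t - 1] + rng.normal()
x = w + 0.1 * rng.normal(size=N)
y = np.zeros(N)
y[1:] = 0.8 * w[:-1] + 0.5 * rng.normal(size=N - 1)

# Definition 1 (bivariate): the baseline uses lags of y only.
gain_bivariate = in_sample_r2(y, [y, x], P) - in_sample_r2(y, [y], P)
# Definition 2 (conditional on w): w enters both the baseline and the full model.
gain_conditional = in_sample_r2(y, [y, w, x], P) - in_sample_r2(y, [y, w], P)
print(f"bivariate gain: {gain_bivariate:.3f}, conditional gain: {gain_conditional:.3f}")
```

In this simulation the bivariate comparison credits $x$ with predictive power that actually stems from $w$, whereas conditioning on $w$ reduces the gain to nearly zero.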

Overfitting and out-of-sample testing
It is well known in the statistical literature that predictions made on in-sample data, i.e. the same data that were used to fit the statistical model, tend to be optimistic. This phenomenon is referred to as overfitting: by construction, the fitting process yields parameter values that make the model mimic the observed data as closely as possible, including their noise (Friedman et al., 2001). In the context of Granger-causality analysis, overfitting becomes more prominent in the multivariate case, as the number of considered time series increases. The results in Sect. 4 are based on multivariate analysis and are thus vulnerable to overfitting; the problem is further aggravated when switching from linear to non-linear models, because the number of parameters then typically increases to allow for a more flexible functional form.
To prevent overfitting, out-of-sample data should be used to evaluate the predictive performance in Granger-causality studies (Gelper and Croux, 2007). The most straightforward procedure for creating out-of-sample data is to split the time frame into two parts, a training set and a test set, which typically constitute the first and second halves of the time frame. A few authors have adopted this approach for climatic attribution (Attanasio et al., 2012; Pasini et al., 2012); however, satellite Earth observation time series are usually too short to allow for train–test splitting in that fashion. An alternative approach, which uses the available data more efficiently, is cross-validation. To this end, the time frame is divided into a number of short intervals, typically a few years of data each; one interval serves as the test set, while all remaining data are used for parameter fitting. This procedure is repeated until every interval has served once as the test set, and the prediction errors obtained in each round are aggregated so that one global performance measure can be computed. We refer the reader to Michaelsen (1987) and Von Storch and Zwiers (2001) for further discussion.
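The cross-validation scheme described above can be sketched with contiguous (blocked) folds, so that each test set is a coherent stretch of time. This is a minimal illustration in plain Python; the function name is hypothetical, and the example assumes 30 years of monthly data split into five 6-year intervals.

```python
def blocked_folds(n_samples, n_folds):
    """Yield (train_idx, test_idx) pairs for contiguous, non-overlapping test blocks.

    Every index appears in exactly one test block; all other indices are used
    for fitting. Block sizes differ by at most one when n_samples is not a
    multiple of n_folds.
    """
    if not 1 < n_folds <= n_samples:
        raise ValueError("require 1 < n_folds <= n_samples")
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for k in range(n_folds):
        stop = start + base + (1 if k < extra else 0)
        test_idx = list(range(start, stop))
        train_idx = list(range(0, start)) + list(range(stop, n_samples))
        yield train_idx, test_idx
        start = stop


# 30 years of monthly data split into five 6-year test intervals.
for train_idx, test_idx in blocked_folds(360, 5):
    print(f"test months {test_idx[0]}-{test_idx[-1]}, {len(train_idx)} training months")
```

The prediction errors collected on all test blocks can then be pooled into a single $R^2$, as described above.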
Including a regularization term in the fitting process of over-parameterized linear models also helps to avoid overfitting. Typical regularizers that shrink the parameter vectors of linear models towards 0 are the L2 norm (as in ridge regression), the L1 norm (as in least absolute shrinkage and selection operator (LASSO) models), or a combination of the two (as in elastic nets) (Friedman et al., 2001). Translated to VAR models, this implies that one should impose restrictions on the parameter matrix of Eq. (5), as done in the recent theoretical paper of Gregorova et al. (2015). In this work, we want to identify causal relationships between a vegetation time series and various climatic time series. Hence, there is only one target variable of interest, and a simpler approach can be adopted. Denoting the vegetation time series by $y$, one can mimic a VAR model in the trivariate setting by means of three autoregressive ridge regression models:

$$y_t = \beta_{10} + \sum_{p=1}^{P} \beta_{11p}\, y_{t-p} + \sum_{p=1}^{P} \beta_{12p}\, x_{t-p} + \sum_{p=1}^{P} \beta_{13p}\, w_{t-p} + \epsilon_1, \tag{6}$$

$$x_t = \beta_{20} + \sum_{p=1}^{P} \beta_{21p}\, y_{t-p} + \sum_{p=1}^{P} \beta_{22p}\, x_{t-p} + \sum_{p=1}^{P} \beta_{23p}\, w_{t-p} + \epsilon_2, \tag{7}$$

$$w_t = \beta_{30} + \sum_{p=1}^{P} \beta_{31p}\, y_{t-p} + \sum_{p=1}^{P} \beta_{32p}\, x_{t-p} + \sum_{p=1}^{P} \beta_{33p}\, w_{t-p} + \epsilon_3. \tag{8}$$

In this article, we aim to detect the climate drivers of vegetation and not the feedback of vegetation on climate (see, e.g. Green et al., 2017). It therefore suffices to retain Eq. (6), i.e. the first row of the trivariate VAR model in Eq. (5), in our analysis. Concatenating all parameters of this model into a vector $\beta = [\beta_{10}, \beta_{11p}, \ldots, \beta_{13p}]$, one fits the parameters in ridge regression by solving the following optimization problem:

$$\min_{\beta} \sum_{t=P+1}^{N} \left( y_t - \beta_{10} - \sum_{p=1}^{P} \beta_{11p}\, y_{t-p} - \sum_{p=1}^{P} \beta_{12p}\, x_{t-p} - \sum_{p=1}^{P} \beta_{13p}\, w_{t-p} \right)^2 + \lambda \lVert \beta \rVert^2, \tag{9}$$

with $\lambda$ being a regularization parameter, which is tuned using a validation set or nested cross-validation, and $\lVert \beta \rVert^2$ being a penalty term, i.e. the squared L2 norm of the coefficient vector. The sum only starts at $P + 1$ because a moving window of $P$ lags is considered. For simplicity, we describe the above approach for the trivariate setting, even though the total number of variables used in our study is much larger (see Sect. 3); the extension to the multivariate setting is straightforward.
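The ridge estimate of Eq. (9) has the closed form $\hat{\beta} = (X^\top X + \lambda I)^{-1} X^\top y$, where each row of $X$ holds an intercept term followed by the $P$ lags of $y$, $x$ and $w$. The sketch below (NumPy; the function names, the fixed $\lambda$ and the synthetic data are assumptions) fits the full model and a baseline that keeps $w$ but drops $x$, using contiguous 5-fold cross-validation, and compares their out-of-sample $R^2$. For simplicity, the intercept is penalized together with the other coefficients, mirroring $\lVert \beta \rVert^2$ in Eq. (9); in practice, one often centres the data and leaves the intercept unpenalized.

```python
import numpy as np


def lagged(series, P):
    """Rows [s_{t-1}, ..., s_{t-P}] for t = P, ..., N-1 (0-based)."""
    s = np.asarray(series, dtype=float)
    return np.column_stack([s[P - p:len(s) - p] for p in range(1, P + 1)])


def ridge_fit(X, y, lam):
    """Minimizer of ||y - X b||^2 + lam * ||b||^2, i.e. Eq. (9) in matrix form."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)


def ridge_cv_r2(y, predictor_series, P, lam, n_folds=5):
    """Out-of-sample R^2 of a ridge autoregression using contiguous folds."""
    target = np.asarray(y, dtype=float)[P:]
    X = np.column_stack([np.ones(len(target))] + [lagged(s, P) for s in predictor_series])
    idx = np.arange(len(target))
    pred = np.empty_like(target)
    for test in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, test)
        pred[test] = X[test] @ ridge_fit(X[train], target[train], lam)
    return 1.0 - np.sum((target - pred) ** 2) / np.sum((target - target.mean()) ** 2)


# Synthetic monthly series (30 years): y responds to x one month later.
rng = np.random.default_rng(42)
N, P, lam = 360, 12, 1.0
x = rng.normal(size=N)
w = rng.normal(size=N)
y = np.zeros(N)
for t in range(1, N):
    y[t] = 0.3 * y[t - 1] + 0.6 * x[t - 1] + 0.5 * rng.normal()

r2_full = ridge_cv_r2(y, [y, x, w], P, lam)     # lags of y, x and w
r2_baseline = ridge_cv_r2(y, [y, w], P, lam)    # w kept in both models
print(f"full R2: {r2_full:.2f}, baseline R2: {r2_baseline:.2f}")
```

In a real analysis $\lambda$ would not be fixed but tuned, e.g. by nested cross-validation.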

Non-linear Granger causality
The methodology that we develop in this paper is closely connected to the methods explained in the previous section. However, as we hypothesize that the relationships between climate and vegetation can be highly non-linear (Foley et al., 1998; Zeng et al., 2002; Verbesselt et al., 2016), we replace the linear VAR models in the Granger-causality framework with non-linear machine learning models. In other fields, such as neuroscience, kernel methods and other non-linear models have been used to investigate non-linear Granger-causal relationships between time series (Ancona et al., 2004; Marinazzo et al., 2008). In our analysis, we use simple non-linear methods that are applicable to large data sets, since more sophisticated approaches typically do not scale well to global climate–vegetation data sets. We therefore choose random forests as the machine learning algorithm, because of their excellent computational scalability (Breiman, 2001). Random forests are a well-known method that has shown its merits in diverse application domains and has successfully been applied to Earth observation data in both classification and regression problems (Dorigo et al., 2012; Rodriguez-Galiano et al., 2012; Loosvelt et al., 2012a, b). Briefly summarized, the random forest algorithm combines multiple decision trees, each of which contributes a single vote to the final output; this output is the most frequent class (for classification problems) or the average prediction (for regression problems).
Unlike in most domains where random forests are applied, we employ the algorithm as an autoregressive non-linear method for time series forecasting. In practice, this means that we replace the full and baseline linear models of Sect. 2.1 with random forest models. At each pixel, the vegetation time series is still considered the response variable, and the various climate time series serve as predictor variables (see Sect. 3.1 for an overview of our database). For a given value of the NDVI time series $y$ at timestamp $t$, we investigate properties of the different predictor time series (temperature, radiation, etc.) by considering a moving window that includes a number of previous months (Fig. 1). In this way, the definition of Granger causality in Sect. 2.1 is adopted: a climatic time series $x$ Granger causes the vegetation time series $y$ if the predictive performance in terms of $R^2$ improves when the moving window $x_{t-1}, x_{t-2}, \ldots, x_{t-P}$ is incorporated in the random forest, in contrast to considering $y_{t-1}, y_{t-2}, \ldots, y_{t-P}$ and $w_{t-1}, w_{t-2}, \ldots, w_{t-P}$ only. Analogous to the linear case, we will speak of a full random forest model when all variables are taken into account and of a baseline random forest model when only the moving window $y_{t-1}, y_{t-2}, \ldots, y_{t-P}$ of $y$ is considered as a predictor. In Fig. 1, this principle is extended to four time series: the baseline random forest prediction of NDVI at $t_1$ is based on the observations from the green moving window only, whereas the full random forest model includes the three red moving windows as well.
In our experiments, we treat each continental pixel as a separate problem and use the Scikit-learn library (Pedregosa et al., 2011) for the random forest regressor implementation, with the number of trees set to 100 and the number of predictor variables considered at each node split set to the square root of the total number of predictor variables. Changes in these parameters or in the randomness of the algorithm do not cause substantial changes in the results (not shown). Model performance is assessed by means of 5-fold cross-validation. The window length is fixed to 12 months, because initial experiments revealed that longer time windows did not improve the predictions (results omitted). Finally, we also experimented with techniques that exploit spatial correlations to improve the predictive performance of the model (see Sect. 4.3).

Figure 1. NDVI takes the role of the time series $y$ in Eq. (3); in addition, three climate predictor time series are shown. The baseline random forest model only considers the green moving window, whereas the full random forest model includes the red moving windows as well. The pixel corresponds to a location in North America (lat: 37.5°, long: −87.5°).
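A possible sketch of the full and baseline random forest models with the settings listed above (100 trees, the square root of the number of predictors considered at each split, a 12-month window, and 5-fold cross-validation) is given below. It assumes scikit-learn's `RandomForestRegressor`, `KFold` and `cross_val_predict`; the unshuffled `KFold` yields contiguous folds. The helper names and the synthetic series, in which $y$ depends on the square of $x$ one month earlier (a dependence without any linear component), are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import KFold, cross_val_predict


def lagged(series, P):
    """Rows [s_{t-1}, ..., s_{t-P}] for t = P, ..., N-1 (0-based)."""
    s = np.asarray(series, dtype=float)
    return np.column_stack([s[P - p:len(s) - p] for p in range(1, P + 1)])


def rf_cv_r2(y, predictor_series, P=12, seed=0):
    """5-fold CV R^2 of a random forest on P-month windows of each predictor."""
    target = np.asarray(y, dtype=float)[P:]
    X = np.column_stack([lagged(s, P) for s in predictor_series])
    model = RandomForestRegressor(n_estimators=100, max_features="sqrt", random_state=seed)
    pred = cross_val_predict(model, X, target, cv=KFold(n_splits=5))
    return r2_score(target, pred)


rng = np.random.default_rng(7)
N = 360                                   # 30 years of monthly anomalies
x = rng.normal(size=N)                    # synthetic climate predictor anomalies
y = np.zeros(N)
y[1:] = x[:-1] ** 2 - 1.0 + 0.3 * rng.normal(size=N - 1)   # non-linear response

r2_full = rf_cv_r2(y, [y, x])             # NDVI window plus climate window
r2_baseline = rf_cv_r2(y, [y])            # NDVI window only
print(f"full R2: {r2_full:.2f}, baseline R2: {r2_baseline:.2f}")
```

Because the simulated response is symmetric in $x$, a linear model would find almost no relationship here, while the random forest can approximate it from the lagged window.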

Granger-causal inference
Generally, the null hypothesis ($H_0$) of Granger causality is that the baseline model has the same prediction error as the full model. If, instead, the full model predicts the target variable $y$ significantly better than the baseline model, $H_0$ is rejected. In some applications, inference in VAR models is drawn by testing individual model parameters for significance. Other studies have used likelihood-ratio tests, in which the full and baseline models are nested (Mosedale et al., 2006). However, in both cases, the models are trained and evaluated on the same in-sample data. As discussed above, the performance of any Granger-causal model should be validated on out-of-sample data to avoid overfitting (see Sect. 2.2). Therefore, the null hypothesis of non-causality as formulated above should be tested by comparing out-of-sample prediction errors. Statistical tests for this purpose have been proposed and applied both in the econometric literature and in Granger-causality studies in climate science. Such tests, which compare out-of-sample prediction errors, are available for models whose parameters are estimated through ordinary least squares or maximum likelihood estimation (Attanasio et al., 2013). Moreover, the asymptotic and finite-sample properties of a battery of tests for comparing the forecasting accuracy of different models have been studied, and, more recently, further tests aimed specifically at nested models have been proposed (Clark and McCracken, 2001).
Unfortunately, all the tests mentioned above were designed to compare the out-of-sample prediction errors of linear parametric models (McCracken, 2007). In climate, relations between variables are highly non-linear and tend to become even more non-linear as the temporal resolution of the data becomes finer (Attanasio et al., 2013). It would therefore be convenient to have a statistical test at our disposal to assess the significance of any quantitative evidence of climate (Granger) causing vegetation anomalies. Ideally, the test would be model independent, so that any non-linear model could be used. One well-known model-independent test for comparing the accuracy of two forecasts is the Diebold–Mariano (DM) test (Diebold, 2015). Although its application to Granger causality is promising, the test does not hold for nested models, because under $H_0$ the prediction errors of two nested models are exactly the same and perfectly correlated (McCracken, 2007). An alternative approach for comparing the predictive performance of different models is to use resampling methods such as the bootstrap, or schemes such as 5×2 cross-validation (Dietterich, 1998). Methods based on the bootstrap have been used before in Granger-causality studies with climate data (Diks and Mudelsee, 2000; Attanasio et al., 2013). However, these results need to be interpreted with care: as the number of bootstrap samples increases, so does the power of any paired test (such as the Wilcoxon signed-rank test) to detect significant differences between the error distributions of the two models (full and baseline), so that even negligible differences can be declared significant simply by drawing more samples. For these reasons, we conclude that developing a statistical test that can handle non-stationary time series and non-linear models is not a trivial task; to the best of our knowledge, no such test exists in the current literature. In this paper, we therefore express Granger causality in a quantitative rather than a qualitative way and stress the improvement gained by using a non-linear model.

Global data sets
Our non-linear Granger-causality framework is used to disentangle the effect of past climate variability on global vegetation dynamics. To this end, climate data sets of an observational nature (mostly based on satellite and in situ observations) have been assembled to construct time series (see Sect. 3.3) that are then used to predict NDVI anomalies following the linear and non-linear causality frameworks described in Sect. 2. Data sets have been selected from the current pool of satellite and in situ observations on the basis of a series of requirements: (a) expected relevance of the variable for driving vegetation dynamics, (b) availability of a multidecadal record with global coverage, and (c) adequate spatial and temporal resolution. The selected data sets can be classified into three categories: water availability (including precipitation, snow water equivalent, and soil moisture data sets), temperature (both for the land surface and the near-surface atmosphere), and radiation (considering different radiative fluxes independently). Rather than using a single data set for each variable, we have collected all data sets meeting the above requirements. This has led to a total of 21 different data sets, which are listed in Table 1. They span the study period 1981–2010 at the global scale and have been converted to a common monthly temporal resolution and a 1° × 1° latitude–longitude spatial resolution. To do so, we used averaging to resample data sets with a finer native resolution and linear interpolation to resample coarser-resolution ones.
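The aggregation of finer-resolution grids to the common 1° × 1° grid by averaging can be sketched as a block mean over non-overlapping cells. This NumPy sketch assumes that the native grid is regular and that its dimensions are exact multiples of the aggregation factor (e.g. 0.25° data with a factor of 4); missing values and latitude-dependent cell areas are ignored, and the linear interpolation used for coarser data sets is not shown. The function name is hypothetical.

```python
import numpy as np


def block_average(field, factor):
    """Average non-overlapping factor x factor blocks of a 2-D (lat, lon) grid."""
    field = np.asarray(field, dtype=float)
    nlat, nlon = field.shape
    if nlat % factor or nlon % factor:
        raise ValueError("grid dimensions must be multiples of factor")
    # Axis 1 and axis 3 index the cells inside each coarse block.
    blocks = field.reshape(nlat // factor, factor, nlon // factor, factor)
    return blocks.mean(axis=(1, 3))


# A global 0.25-degree grid (720 x 1440 cells) becomes a 1-degree grid (180 x 360).
fine = np.random.default_rng(0).normal(size=(720, 1440))
print(block_average(fine, 4).shape)  # (180, 360)
```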
Finally, as a proxy for the state and activity of vegetation, we use the third-generation (3G) Global Inventory Modeling and Mapping Studies (GIMMS) satellite-based NDVI (Tucker et al., 2005), a commonly used long-term global record of NDVI (Beck et al., 2011). Note that this data set is used to derive the response variable in our approach (seasonal NDVI anomalies; see Sect. 3.2), while all other data sets are converted to predictor variables. The length of the NDVI record sets the study period to an interval of 30 years.

Anomaly decomposition
In climate studies, Granger causality has already been applied to time series of seasonal anomalies (Attanasio, 2012; Tuttle and Salvucci, 2016). These anomalies may be obtained in a two-step decomposition procedure by subtracting both the seasonal cycle and the long-term trend from the raw time series. Several competing decomposition methods have been proposed in the literature, including additive models, multiplicative models, and more sophisticated methods based on break points (see, e.g. Cleveland et al., 1990; Grieser et al., 2002; Verbesselt et al., 2010). In our framework, we used the following approach. In a first step, at each pixel, the raw time series of the target variable $y_t$ and of the climate predictors ($x_t$, $w_t$, ...) are detrended linearly, based on a simple linear regression applied to the entire study period with the timestamp $t$ as the predictor variable. For the target variable, this can be denoted as follows:

$$y^{Tr}_t = \alpha_0 + \alpha_1 t, \tag{10}$$

with $\alpha_0$ and $\alpha_1$ being the intercept and the slope of the linear regression, respectively. In this way, we obtain the detrended time series $y^{D}_t = y_t - y^{Tr}_t$. This detrending is needed to remove non-stationary signals in climatic time series, and it allows us to put the emphasis on shorter-term, multi-month dynamics. Detrending ensures that the mean of the probability distribution does not change over time; however, other moments of the probability distribution, such as the variance, might still be time dependent. As classical statistical procedures for testing Granger causality (i.e. autoregressive models, statistical tests) were developed for stationary time series, those methods are in fact not applicable to non-stationary climate data. In a second step, after subtracting the trend from the raw time series, the seasonal cycle $y^{S}_t$ is calculated. Assuming that the seasonal cycle is annual and constant over time, one can simply estimate it as the monthly expectation, i.e. the multi-year average for each of the 12 months of the year. Finally, the anomalies $y^{R}_t$ are computed by subtracting the corresponding monthly expectation from the detrended time series: $y^{R}_t = y^{D}_t - y^{S}_t$. This procedure is schematically represented in Fig. 2.
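The two-step decomposition above (linear detrending followed by subtraction of a constant monthly climatology) can be sketched for a single pixel as follows. The sketch uses NumPy's `polyfit` for the linear trend; the function name and the `start_month` argument are hypothetical, and the series is assumed to be complete (no gaps) and to cover every calendar month at least once.

```python
import numpy as np


def monthly_anomalies(series, start_month=0):
    """Linear detrending plus a constant monthly climatology.

    series: 1-D array of monthly values; start_month: 0 = January.
    Returns (trend, seasonal, anomalies), each with the length of series.
    """
    y = np.asarray(series, dtype=float)
    t = np.arange(len(y))
    alpha1, alpha0 = np.polyfit(t, y, deg=1)      # slope first, then intercept
    trend = alpha0 + alpha1 * t                   # y^Tr
    detrended = y - trend                         # y^D
    month = (t + start_month) % 12
    # Multi-year average of the detrended series for each calendar month.
    climatology = np.array([detrended[month == m].mean() for m in range(12)])
    seasonal = climatology[month]                 # y^S
    anomalies = detrended - seasonal              # y^R
    return trend, seasonal, anomalies


t = np.arange(360)
rng = np.random.default_rng(1)
ndvi = 0.5 + 0.001 * t + 0.2 * np.sin(2 * np.pi * t / 12) + 0.02 * rng.normal(size=360)
trend, seasonal, anomalies = monthly_anomalies(ndvi)
```

By construction, the three components add up to the original series and the anomalies have zero mean for each calendar month.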

Predictor variable construction
We do not limit our approach to using the raw and anomaly time series of the data sets in Table 1 as predictors, but also take into consideration different lag times, past cumulative values, and extreme indices (see below). These additional predictors, here referred to as "higher-level variables", are calculated from the raw and anomaly time series. Our application of Granger causality can be interpreted as a way to identify patterns in climate during past moving windows (see Fig. 1) that are predictive of the anomalies of vegetation time series. Therefore, by feeding predictor variables from previous timestamps to a linear (or non-linear) predictive model, one can identify subsequences of interest in the moving window specified for timestamp $t$, a technique that is similar to so-called shapelets (Ye and Keogh, 2009). In addition, vegetation dynamics may not necessarily reflect the climatic conditions from, e.g., 3 months ago, but rather the average over, e.g., the previous 3 months; such an accumulation of climatic conditions is referred to here as a "cumulative" response. More formally, we construct a cumulative variable of $k$ months as the sum of the time series observations in the last $k$ months:

$$x^{C_k}_t = \sum_{i=0}^{k-1} x_{t-i}. \tag{11}$$

Note that, unlike lagged variables, cumulative variables always include the period up to time $t$. Figure 3 illustrates an example of a 4-month cumulative variable. In our analysis, we experimented with a wide range of time-lag values and concluded that including lags of more than 6 months did not yield substantial predictive power.
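Lagged and cumulative predictors can be built from any monthly series as sketched below (plain Python; the function names are hypothetical). `None` marks the first months for which a value is undefined; how such months are handled in practice is an assumption left open here. The cumulative variable follows the definition given above and includes month $t$ itself.

```python
def lagged_variable(series, lag):
    """Value observed `lag` months before t; None where undefined."""
    return [None] * lag + list(series[:len(series) - lag])


def cumulative_variable(series, k):
    """Sum of the last k months up to and including t; None where undefined."""
    out = [None] * (k - 1)
    for t in range(k - 1, len(series)):
        out.append(sum(series[t - k + 1:t + 1]))
    return out


precip = [10, 0, 5, 20, 15, 0]          # hypothetical monthly precipitation
print(cumulative_variable(precip, 4))   # [None, None, None, 35, 40, 40]
print(lagged_variable(precip, 2))       # [None, None, 10, 0, 5, 20]
```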
Extreme indices are another type of higher-level predictor variable that can be constructed from the data sets in Table 1. Over the last few years, several studies have focused on defining and indexing climate extremes (Nicholls and Alexander, 2007; Zwiers et al., 2013). For example, the Expert Team on Climate Change Detection and Indices (ETCCDI) recommends the use of a range of extreme indices related to temperature and precipitation (Zhang et al., 2011; Donat et al., 2013). Here, we calculate a variety of analogous indices for the whole set of collected climatic variables, based on both the raw data sets and the seasonal anomalies (see Table 2). In addition, we derived lagged and cumulative predictor variables from these extreme indices to incorporate the potential impact of climatic extremes occurring, e.g., 3 months ago or during, e.g., the previous 3 months, respectively. All resulting time series are used as additional predictor variables in our non-linear Granger-causality framework (see Sect. 2.3).
Combining the different climate and environmental predictor variables described above, we obtain a database of 4571 predictor variables per 1° pixel, covering 30 years at a monthly temporal resolution.

Detecting linear Granger-causal relationships
In a first experiment, we evaluate the extent to which climate variability Granger causes the anomalies in vegetation using a standard Granger-causality approach, in which only linear relationships between climate (predictors) and vegetation (target variable) are considered. To this end, ridge regression is used as a linear VAR model in the Granger-causality approach (this ridge regression will be substituted by the non-linear random forest approach in Sect. 4.2). In the application of ridge regression, we use all climatic and environmental predictor variables (Sect. 3.3) and adopt a nested 5-fold cross-validation to properly tune the hyperparameter $\lambda$ (see Eq. 9). Figure 4a shows the predictive performance of the full ridge regression model. While the model explains more than 40 % of the variability in NDVI anomalies in some regions ($R^2 > 0.4$), this by itself is not necessarily indicative of climate Granger causing the vegetation anomalies, as it may reflect simple correlations. In order to test the latter, we compare the results of the full model to a baseline model, i.e. an autoregressive ridge regression model that only uses previous values of NDVI to predict NDVI at time $t$ (see Sect. 2.1). If climate Granger caused the variability of NDVI at a given pixel, the full ridge regression model (Fig. 4a) would show an increase in predictive power over the baseline ridge regression model. However, the results clearly show that, when only linear relationships between vegetation and climate are considered, the areas in which vegetation anomalies are Granger caused by climate are very limited, involving mainly semiarid regions and central Europe (Fig. 4b).
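The nested 5-fold cross-validation used to tune $\lambda$ could be organized as sketched below, assuming scikit-learn: `RidgeCV` selects the penalty (called `alpha` in scikit-learn) by an inner 5-fold cross-validation on each outer training set, and `cross_val_predict` produces the outer out-of-sample predictions. Unlike Eq. (9), scikit-learn does not penalize the intercept. The candidate penalty grid, the function name, and the synthetic design matrices are assumptions.

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.metrics import r2_score
from sklearn.model_selection import KFold, cross_val_predict

ALPHAS = (1e-3, 1e-2, 1e-1, 1.0, 10.0, 100.0, 1000.0)   # hypothetical grid for lambda


def nested_cv_ridge_r2(X, y, alphas=ALPHAS):
    """Outer 5-fold CV for evaluation; inner 5-fold CV (RidgeCV) tunes lambda."""
    model = RidgeCV(alphas=alphas, cv=KFold(n_splits=5))
    pred = cross_val_predict(model, X, y, cv=KFold(n_splits=5))
    return r2_score(y, pred)


# Synthetic design: 12 NDVI lags (uninformative here) and 30 climate predictors,
# of which only the first drives the target.
rng = np.random.default_rng(3)
n = 348
X_base = rng.normal(size=(n, 12))
X_clim = rng.normal(size=(n, 30))
y = 0.8 * X_clim[:, 0] + 0.5 * rng.normal(size=n)
X_full = np.hstack([X_base, X_clim])

gain = nested_cv_ridge_r2(X_full, y) - nested_cv_ridge_r2(X_base, y)
print(f"R2 gain of the full over the baseline ridge model: {gain:.2f}")
```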
For further comparison, we analyse the predictive performance obtained when (linear) Pearson correlation coefficients are calculated on the training data sets, selecting at each pixel the predictor with the highest correlation to the target variable among all 4571 predictor variables. Figure 4c shows that the explained variance is again rather low and, for most regions, substantially lower than the $R^2$ of the baseline ridge regression model, which is considered here as the minimum required to interpret predictive power as Granger causal. These results indicate that, despite being routinely used as a standard tool in climate–biosphere studies (see, e.g. Nemani et al., 2003), univariate correlation analyses are unable to capture the nuances of the relationships between climate and vegetation dynamics.
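The univariate correlation benchmark described above can be sketched as follows: in each contiguous fold, the predictor with the highest absolute Pearson correlation on the training data is selected, and a simple linear regression on that predictor is evaluated on the test block. This NumPy sketch is an assumption about how such a benchmark could be implemented (in particular, the use of the absolute correlation and of a univariate least-squares fit); the synthetic data illustrate that a target driven by two predictors is only partly captured.

```python
import numpy as np


def best_correlation_r2(X, y, n_folds=5):
    """Out-of-sample R^2 using, per fold, only the single best-correlated predictor."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    idx = np.arange(len(y))
    pred = np.empty_like(y)
    for test in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, test)
        Xc = X[train] - X[train].mean(axis=0)
        yc = y[train] - y[train].mean()
        with np.errstate(invalid="ignore", divide="ignore"):
            r = (Xc.T @ yc) / np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
        j = int(np.argmax(np.abs(np.nan_to_num(r))))   # constant columns get r = 0
        slope, intercept = np.polyfit(X[train, j], y[train], deg=1)
        pred[test] = intercept + slope * X[test, j]
    return 1.0 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)


rng = np.random.default_rng(5)
X = rng.normal(size=(360, 20))
y_single = X[:, 3] + 0.3 * rng.normal(size=360)            # one driver
y_double = X[:, 3] + X[:, 7] + 0.3 * rng.normal(size=360)  # two equal drivers
print(f"{best_correlation_r2(X, y_single):.2f}, {best_correlation_r2(X, y_double):.2f}")
```

With two equally important drivers, the single best-correlated predictor explains only about half of the variance, whereas a multivariate model could capture both.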

Linear versus non-linear Granger causality
To analyse the effect of climate on vegetation more thoroughly, we replace the linear ridge regression model (VAR) by the non-linear random forest model. The results in Fig. 5 highlight the differences. Compared to the results in Sect. 4.1, the predictive power increases substantially when non-linear relationships between vegetation and climate are considered (Fig. 5a). This is the case for most land regions but is especially remarkable in semiarid regions of Australia, Africa, and Central and North America, which are frequently exposed to water limitations. In those regions, more than 40 % of the variance of NDVI anomalies can be explained by antecedent climate variability. These results are investigated further by Papagiannopoulou et al. (2017), who highlight the crucial role of water supply for the anomalies in vegetation greenness in these and other regions. On the other hand, the explained variance of NDVI in other areas, such as the Eurasian taiga, tropical rainforests, or China, is again below 10 %. We hypothesize two potential reasons: (a) the uncertainty in the observations used as target and predictors is typically larger in these regions (especially in tropical forests and at higher latitudes), and (b) in these regions, vegetation anomalies are not necessarily controlled primarily by climate but may be predominantly driven by phenological and biotic factors (Hutyra et al., 2007), the occurrence of wildfires (Van der Werf et al., 2010), limitations imposed by the availability of soil nutrients (Fisher et al., 2012), or agricultural practices (Liu et al., 2015). Nonetheless, the explained variance shown in Fig. 5a is again not necessarily indicative of Granger causality.
As in Fig. 4b, in order to test whether the climatic and environmental controls do, in fact, Granger cause the vegetation anomalies, we compare the results of our full random forest model to a baseline random forest model that only uses previous values of NDVI to predict the NDVI at time t. As seen in Fig. 5b, the improvement over the baseline is unambiguous in this case. One can conclude that, although our analysis does not consider all potential control variables, climate dynamics indeed (Granger) cause vegetation anomalies over most of the continental land surface, with a larger impact on subtropical regions and midlatitudes. Moreover, a comparison between Figs. 4b and 5b reveals that these causal relationships are highly non-linear, as expected given the distinct resistance and resilience of different ecosystems, which are reflected in a progressive response and recovery of vegetation to these perturbations (Foley et al., 1998; Zeng et al., 2002; Verbesselt et al., 2016).
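To make the linear versus non-linear contrast concrete, the sketch below implements a small random forest (bootstrap samples, a random feature subset at each split, averaged regression trees) in plain Python and applies the full-versus-baseline comparison to synthetic data. Everything here is illustrative and assumed: the U-shaped response `3 x² - 1` of a hypothetical NDVI anomaly to the previous month's climate anomaly, uniformly distributed climate anomalies, a single lag, a chronological 270/90 train/test split, and depth-limited trees (random forests usually grow deeper trees; the limit only keeps the sketch fast). Because the climate anomalies are symmetric around zero, their covariance with the response is zero, so a linear model would find essentially no signal, whereas the forest can exploit it.

```python
import random

def r2(y, yhat):
    """Coefficient of determination of predictions yhat for observations y."""
    ybar = sum(y) / len(y)
    return 1.0 - sum((a - b) ** 2 for a, b in zip(y, yhat)) / sum((a - ybar) ** 2 for a in y)

def best_split(X, y, feats, min_leaf):
    """(sse, feature, threshold) of the lowest-SSE split on the candidate features, or None."""
    n, best = len(y), None
    for f in feats:
        order = sorted(range(n), key=lambda i: X[i][f])
        ys = [y[i] for i in order]
        tot, tot_sq = sum(ys), sum(v * v for v in ys)
        left = left_sq = 0.0
        for k in range(1, n):  # the first k sorted rows go left
            left += ys[k - 1]
            left_sq += ys[k - 1] ** 2
            lo, hi = X[order[k - 1]][f], X[order[k]][f]
            if k < min_leaf or n - k < min_leaf or lo == hi:
                continue
            right, right_sq = tot - left, tot_sq - left_sq
            sse = left_sq - left ** 2 / k + right_sq - right ** 2 / (n - k)
            if best is None or sse < best[0]:
                best = (sse, f, lo)  # rule: x[f] <= lo goes left
    return best

def grow(X, y, depth, min_leaf, m, rng):
    """Regression tree as nested (feature, threshold, left, right) tuples; leaves are means."""
    if depth == 0 or len(y) < 2 * min_leaf:
        return sum(y) / len(y)
    split = best_split(X, y, rng.sample(range(len(X[0])), m), min_leaf)
    if split is None:
        return sum(y) / len(y)
    _, f, thr = split
    L = [i for i in range(len(y)) if X[i][f] <= thr]
    R = [i for i in range(len(y)) if X[i][f] > thr]
    return (f, thr,
            grow([X[i] for i in L], [y[i] for i in L], depth - 1, min_leaf, m, rng),
            grow([X[i] for i in R], [y[i] for i in R], depth - 1, min_leaf, m, rng))

def predict_tree(node, row):
    while isinstance(node, tuple):
        f, thr, left, right = node
        node = left if row[f] <= thr else right
    return node

def fit_forest(X, y, n_trees=50, depth=6, min_leaf=5, seed=0):
    rng = random.Random(seed)
    m = max(1, len(X[0]) // 3)  # features tried per split (p/3 rule of thumb)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(y)) for _ in range(len(y))]  # bootstrap sample
        forest.append(grow([X[i] for i in idx], [y[i] for i in idx], depth, min_leaf, m, rng))
    return forest

def holdout_r2(X, y, n_train):
    """Fit on the first n_train months, score R² on the remaining months."""
    forest = fit_forest(X[:n_train], y[:n_train])
    preds = [sum(predict_tree(t, row) for t in forest) / len(forest) for row in X[n_train:]]
    return r2(y[n_train:], preds)

rng = random.Random(7)
T = 361
x = [rng.uniform(-1, 1) for _ in range(T)]  # synthetic climate anomaly
y = [0.0]                                   # synthetic NDVI anomaly
for t in range(1, T):
    # U-shaped response: E[x^2] = 1/3, so cov(x[t-1], y[t]) = 0 in the population
    y.append(0.3 * y[t - 1] + 3 * x[t - 1] ** 2 - 1 + 0.1 * rng.gauss(0, 1))

target = y[1:]
X_base = [[y[t - 1]] for t in range(1, T)]            # past NDVI only
X_full = [[y[t - 1], x[t - 1]] for t in range(1, T)]  # past NDVI and past climate
r2_base = holdout_r2(X_base, target, 270)
r2_full = holdout_r2(X_full, target, 270)
print(f"baseline R2 = {r2_base:.2f}, full R2 = {r2_full:.2f}, gain = {r2_full - r2_base:.2f}")
```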
For a better understanding of the results obtained by the two models, we average the performance of each model regionally. More specifically, we use the land cover classification of the International Geosphere-Biosphere Program (IGBP) (Loveland and Belward, 1997) to stratify the mean and variance of R² for both the baseline and the full model in Fig. 5 per IGBP land cover class. The bar plot in Fig. 6 shows that the full model outperforms the baseline model in all IGBP land cover classes, i.e. that Granger causality exists for all these biomes. The number of pixels per class is given in parentheses. The error bars indicate that the variances of the two models are comparable, i.e. they are low or high for both models in the same land cover class. The largest difference between the two models is observed for the Closed Shrublands class, yet only 19 pixels belong to this biome type. In savannah regions, the performance of the full model is high compared with other regions (see Fig. 5). On the other hand, the smallest improvement of the full model with respect to the baseline is observed for Deciduous Needleleaf Forests and Evergreen Broadleaf Forests. This shows that, for these two classes, climate is not identified as a major control of vegetation dynamics (see the discussion of tropical and boreal regions in the previous paragraph).
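The per-class aggregation behind Fig. 6 amounts to grouping pixel-wise scores by land cover label. A minimal sketch, assuming that each pixel contributes one (class, baseline R², full R²) triple and that the reported variance is the population variance across pixels (an assumption); the two classes and their values are toy inputs, not results of this study.

```python
from collections import defaultdict
from statistics import mean, pvariance

def summarise_by_class(pixels):
    """pixels: iterable of (land_cover_class, r2_baseline, r2_full), one entry per pixel.

    Returns {class: (n_pixels, mean_base, var_base, mean_full, var_full, mean_gain)},
    where mean_gain = mean_full - mean_base quantifies Granger causality per class.
    """
    groups = defaultdict(lambda: ([], []))
    for cls, r2_base, r2_full in pixels:
        groups[cls][0].append(r2_base)
        groups[cls][1].append(r2_full)
    return {cls: (len(base), mean(base), pvariance(base), mean(full), pvariance(full),
                  mean(full) - mean(base))
            for cls, (base, full) in groups.items()}

# Toy values for two hypothetical classes "A" and "B" (not results of this study)
toy = [("A", 0.10, 0.30), ("A", 0.20, 0.50), ("B", 0.05, 0.10)]
stats = summarise_by_class(toy)
for cls in sorted(stats):
    print(cls, [round(v, 4) for v in stats[cls]])
```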

Spatial and temporal aspects
Environmental dynamics affect vegetation at different timescales. Since the adaptation of vegetation to environmental changes requires some time, and because soil and atmosphere have a memory, a necessary aspect to investigate is the potential lag-time response of vegetation to climate dynamics, which relates to the resistance and resilience properties of ecosystems. The idea of exploring lag times was introduced in several earlier studies (see, e.g. Davis, 1984; Braswell et al., 1997), and it has been adopted in various more recent studies (Anderson et al., 2010; Kuzyakov and Gavrichkova, 2010; Chen et al., 2014; Rammig et al., …

www.geosci-model-dev.net/10/1945/2017/ Geosci. Model Dev., 10, 1945–1960, 2017

Table 2. Extreme indices considered as predictive variables. These indices are derived from the raw (daily) data and the (daily) anomalies of the data sets in Table 1. We also calculate the lagged and cumulative variables from these extreme indices.

… especially in regions such as the Sahel, the Horn of Africa, or North America. In those regions, 10–20 % of the variability in NDVI is explained by the occurrence of prolonged anomalies and/or extremes in climate, illustrating again the non-linear responses of vegetation. For more detailed results about lagged vegetation responses to specific climate drivers and the effect of climate extremes on vegetation, the reader is referred to Papagiannopoulou et al. (2017).
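The lagged and cumulative predictors of Fig. 3 can be sketched as below. This is a hypothetical construction: daily values are first averaged per month, a lag-k variable is the monthly value k months before the target month t, and a k-month cumulative variable is taken here as the mean over months t-k to t-1 (whether a mean or a sum is used, and whether month t itself is included, are assumptions). The same helpers would apply to the extreme indices of Table 2, which are also lagged and accumulated.

```python
def monthly_means(daily):
    """daily: (month_index, value) pairs -> monthly means for months 0..max index.
    Assumes every month has at least one daily value."""
    sums, counts = {}, {}
    for m, v in daily:
        sums[m] = sums.get(m, 0.0) + v
        counts[m] = counts.get(m, 0) + 1
    return [sums[m] / counts[m] for m in range(max(sums) + 1)]

def lagged_value(monthly, t, k):
    """Monthly value k months before target month t."""
    return monthly[t - k]

def cumulative_value(monthly, t, k):
    """Mean over the k months preceding t, i.e. months t-k .. t-1 (assumed definition)."""
    return sum(monthly[t - k:t]) / k

def features_for(monthly, t, max_lag):
    """Hypothetical predictor row for month t (requires t >= max_lag):
    lags 1..max_lag followed by cumulative windows 1..max_lag."""
    lags = [lagged_value(monthly, t, k) for k in range(1, max_lag + 1)]
    cums = [cumulative_value(monthly, t, k) for k in range(1, max_lag + 1)]
    return lags + cums

monthly = monthly_means([(0, 1.0), (0, 3.0), (1, 4.0), (2, 6.0), (3, 8.0), (4, 10.0)])
print(monthly)                      # [2.0, 4.0, 6.0, 8.0, 10.0]
print(features_for(monthly, 4, 4))  # [8.0, 6.0, 4.0, 2.0, 8.0, 7.0, 6.0, 5.0]
```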

Because of uncertainties in the observational records used in our study to represent climate and predict vegetation dynamics, and given that ecosystems and regional climate conditions usually extend over areas that exceed the spatial resolution of these records, one may expect the predictive performance of our models to become more robust when climate information from neighbouring pixels is included. In addition, neighbouring areas are likely to have similar climatic conditions which, in turn, affect vegetation dynamics in a similar manner. We therefore also consider an extension of our framework that exploits spatial autocorrelations, inspired by Lozano et al. (2009), who achieved spatial smoothness via an additional penalty term that penalizes dissimilarity between the coefficients of spatial neighbours. In our analysis, we incorporate spatial autocorrelations at a given pixel by extending the predictor variables of our models with the predictor variables of the eight neighbouring pixels. We apply this extension to both the full and the baseline random forest models. As such, for the full random forest model, a vector of 41 139 (4571 × 9) predictor variables is formed for each pixel.
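The neighbourhood extension can be sketched as a simple concatenation over the 3 × 3 window centred on a pixel. The function `neighbourhood_features` and its edge handling are hypothetical: at the grid border, missing neighbours are replaced here by the centre pixel's own predictors, whereas a global grid would more naturally wrap around in longitude. With 4571 predictors per pixel, the concatenation yields 4571 × 9 = 41 139 predictors, as stated above.

```python
def neighbourhood_features(grid, i, j):
    """Concatenate the predictor vector of pixel (i, j) with those of its eight neighbours.

    grid[i][j] is the predictor vector of one pixel. Order: row-major over the 3 x 3
    window, centre included. Hypothetical edge handling: neighbours outside the grid
    are replaced by the centre pixel's own vector.
    """
    rows, cols = len(grid), len(grid[0])
    out = []
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            ii, jj = i + di, j + dj
            inside = 0 <= ii < rows and 0 <= jj < cols
            out.extend(grid[ii][jj] if inside else grid[i][j])
    return out

# Toy 3 x 3 grid with two predictors per pixel, equal to (row, col)
grid = [[[r, c] for c in range(3)] for r in range(3)]
centre = neighbourhood_features(grid, 1, 1)
corner = neighbourhood_features(grid, 0, 0)
print(len(centre), 4571 * 9)
```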
Figure 7c illustrates the performance of the full random forest model that includes the spatial information. As can be observed in Fig. 7d, the explained variance of NDVI anomalies remains similar to that of the original model without spatial autocorrelation (Fig. 5a). While the performance slightly increases in most areas, the explained variance never improves by more than 10 %. Incorporating spatial autocorrelations therefore does not seem to further improve the quantification of Granger causality in our framework, and this extension is not considered in further applications of the framework (see Papagiannopoulou et al., 2017). A possible explanation is that the spatially extended model cannot outperform the model without spatial information (Fig. 5a) because of the large dimensionality of its feature space, which may include redundant information, in combination with the low number of observations per pixel. Note that the number of observations per pixel remains the same as in the original model (360 observations), while the number of predictor variables is nine times larger.

The importance of focusing on vegetation anomalies
In Sect. 3.2, we argued that Granger-causality analysis should target NDVI anomalies rather than raw NDVI values. There are several fundamental reasons for this. First, by applying a decomposition, one can subtract long-term trends from the NDVI time series, making the resulting time series more stationary. This is essential, as existing Granger-causality tests cannot be applied to non-stationary time series. Second, by subtracting the seasonal cycle from the time series, one removes not only a confounding factor that may contribute predictive power without bearing causality, but also a clear autoregressive component that can be well explained by the NDVI time series themselves. As vegetation has a strong seasonal cycle, it is not difficult to predict subsequent vegetation conditions using past observations of the seasonal cycle alone. To corroborate this, we repeat our analysis of Sect. 4.2, but this time the raw NDVI time series instead of the NDVI anomalies are considered as the target variable. We again compare the full and the baseline random forest models.
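The decomposition into trend, seasonal cycle, and anomalies (cf. Fig. 2) can be sketched as follows. This is an assumed, simplified version (a least-squares linear trend on the time index, followed by a seasonal cycle computed as the mean detrended value of each calendar month) and need not match the exact procedure of Sect. 3.2. The toy series uses a seasonal shape that is symmetric within the year, so in this check the trend is recovered exactly and the anomalies vanish up to rounding.

```python
import math

def decompose(series, period=12):
    """Split a monthly series into (slope, intercept, seasonal cycle, anomalies).

    Assumed procedure: ordinary least-squares linear trend on the time index, then the
    seasonal cycle as the mean detrended value of each calendar month.
    """
    n = len(series)
    t_mean, y_mean = (n - 1) / 2, sum(series) / n
    slope = (sum((t - t_mean) * (v - y_mean) for t, v in enumerate(series))
             / sum((t - t_mean) ** 2 for t in range(n)))
    intercept = y_mean - slope * t_mean
    detrended = [v - (intercept + slope * t) for t, v in enumerate(series)]
    seasonal = [sum(detrended[m::period]) / len(detrended[m::period]) for m in range(period)]
    anomalies = [d - seasonal[t % period] for t, d in enumerate(detrended)]
    return slope, intercept, seasonal, anomalies

# Toy series: 30 years of monthly values = trend + seasonal cycle, no anomalies.
# The seasonal shape is symmetric within the year, so OLS recovers the trend exactly here.
ndvi = [0.3 + 0.001 * t + 0.2 * math.cos(2 * math.pi * (t % 12 - 5.5) / 12) for t in range(360)]
slope, intercept, seasonal, anomalies = decompose(ndvi)
print(round(slope, 6), round(intercept, 6), max(abs(a) for a in anomalies) < 1e-9)
```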
The results are visualized in Fig. 8a. As can be observed, the R² is close to the optimum of 1 worldwide. However, due to the overwhelming dominance of the seasonal cycle, it becomes very difficult, or even impossible, to unravel any potential Granger-causal relationships with climate time series in the Northern Hemisphere; see Fig. 8b. The predictability of NDVI based on its own seasonal cycle is already so high that nothing can be gained by adding further climatic predictor variables (see also the large amplitude of the seasonal cycle of NDVI at those latitudes compared to the NDVI anomalies, as illustrated in Fig. 2). Therefore, a non-linear autoregressive baseline model is able to explain most of the variance in the time series. Moreover, as observed in Fig. 1, temperature and radiation also exhibit strong seasonal cycles that often coincide with the NDVI cycle, whereas for most regions on Earth such a stationary seasonal cycle is less pronounced for variables such as precipitation. Analysing raw time series can therefore yield wrong conclusions, for example that temperature drives most NDVI variability in the Northern Hemisphere, simply because the two seasonal cycles follow the same pattern. Results of that kind should be treated with caution: for climate data, a Granger-causality analysis should be applied after decomposing the time series into seasonal anomalies.
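The confounding effect of a shared seasonal cycle can be illustrated with a contemporaneous correlation on synthetic data: two hypothetical series, standing in for NDVI and temperature, share the same seasonal cycle but have independent noise. Their raw correlation is close to 1, whereas the correlation of their anomalies (here obtained by subtracting the mean of each calendar month, without a trend term) is close to 0. The sketch only illustrates the argument and is not part of the framework.

```python
import math
import random

def pearson(a, b):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    return cov / math.sqrt(sum((u - ma) ** 2 for u in a) * sum((v - mb) ** 2 for v in b))

def deseasonalise(series, period=12):
    """Subtract the mean of each calendar month (no trend term in this toy example)."""
    clim = [sum(series[m::period]) / len(series[m::period]) for m in range(period)]
    return [v - clim[t % period] for t, v in enumerate(series)]

rng = random.Random(3)
n = 360
cycle = [math.sin(2 * math.pi * t / 12) for t in range(n)]  # shared seasonal cycle
ndvi = [c + 0.1 * rng.gauss(0, 1) for c in cycle]           # hypothetical NDVI
temp = [c + 0.1 * rng.gauss(0, 1) for c in cycle]           # hypothetical temperature, independent noise
r_raw = pearson(ndvi, temp)
r_anom = pearson(deseasonalise(ndvi), deseasonalise(temp))
print(f"raw r = {r_raw:.2f}, anomaly r = {r_anom:.2f}")
```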

Conclusions
In this paper, we introduced a novel framework for studying Granger causality in climate–vegetation dynamics. We compiled a global database of observational records spanning a 30-year time frame, containing satellite, in situ, and reanalysis-based data sets. Our approach combines data fusion, feature construction, and non-linear predictive modelling. The choice of random forest as a non-linear algorithm was motivated by its excellent computational scalability with regard to extremely large data sets, but it could easily be replaced by any other non-linear machine learning technique, such as neural networks or kernel methods.
Our results highlight the non-linear nature of climate–vegetation interactions and the need to move beyond the traditional application of Granger causality within a linear framework. Comparisons to linear Granger-causality-based approaches indicate that the random forest framework can predict 14 % more variability of vegetation anomalies on average globally. The predictive power of the model is especially high in water-limited regions, where a large part of the vegetation dynamics responds to the occurrence of antecedent rainfall. Moreover, our results indicate the need to consider multi-month antecedent periods to capture the effect of climate on vegetation, in particular to account for the effects of climate extremes on vegetation resilience. The reader is referred to Papagiannopoulou et al. (2017) for a detailed analysis of the effect of different climate predictors on the variability of global vegetation using the mathematical approach described here.

Figure 1. An illustrative example of the moving window approach considered in the analysis of vegetation drivers at a given timestamp t₁. Here, NDVI takes the role of the time series y in Eq. (3). In addition, three climate predictor time series are shown. The baseline random forest model only considers the green moving window, whereas the full random forest model includes the red moving windows as well. The pixel corresponds to a location in North America (lat: 37.5°, long: −87.5°).

Figure 2. The three components of the NDVI time series decomposition for a specific pixel in the Northern Hemisphere (lat: 53.5°, long: 26.5°). The top panel shows the linear trend (black continuous line) and the seasonal cycle (dashed black line) fitted on the raw data (red). The bottom panel shows the remaining anomalies; see text for details.

Figure 3. Example of lagged and cumulative variables extracted from a temperature time series. Top: part of a raw daily time series with its monthly aggregation. Middle: the 4-month lag-time monthly time series. Bottom: the corresponding 4-month cumulative variable. The pixel corresponds to a location in Kentucky, USA (lat: 37.5°, long: −87.5°).

Figure 4. Linear Granger causality of climate on vegetation. (a) Explained variance (R²) of NDVI anomalies based on a full ridge regression model in which all climatic variables are included as predictors. (b) Improvement in terms of R² of the full ridge regression model with respect to the baseline ridge regression model that uses only past values of NDVI anomalies as predictors; positive values indicate (linear) Granger causality. (c) A filter approach in which the variable with the highest squared Pearson correlation with the NDVI anomalies is selected. (d) Improvement in terms of R² of the filter approach with respect to the same baseline ridge regression model that uses only past values of NDVI anomalies.

Figure 6. Mean R² and variance per IGBP land cover class for both the baseline and the full random forest model. The green part indicates the improvement in performance of the full model with respect to the baseline, i.e. the quantification of Granger causality (as in Fig. 5b). The number of pixels per IGBP class is given in parentheses.

Figure 7. Analysis of spatiotemporal aspects of our framework. (a) Explained variance (R²) of NDVI anomalies based on a full random forest model in which all climatic variables are included as predictors as in Fig. 5a, except for the cumulative variables and the extreme indices (see Sect. 3.3). (b) Difference in terms of R² between the model without cumulative and extreme predictors and the full random forest model in Fig. 5a. (c) Explained variance (R²) of NDVI anomalies based on a full random forest model in which all climatic variables are included as predictors as in Fig. 5a, as well as the predictors from the eight nearest neighbours. (d) Difference in terms of R² between this full random forest model, which includes spatial information from neighbouring pixels, and the full random forest model in Fig. 5a.

Table 1. Data sets used in our experiments. Basic data set characteristics are provided, including the native spatial and temporal resolutions.