Randomly correcting model errors in the ARPEGE-Climate v6.1 component of CNRM-CM: applications for seasonal forecasts

Stochastic methods are increasingly used in global coupled model climate forecasting systems to account for model uncertainties. In this paper, we describe in more detail the stochastic dynamics technique introduced by Batté and Déqué (2012) in the ARPEGE-Climate atmospheric model. We present new results with an updated version of CNRM-CM using ARPEGE-Climate v6.1, and show that the technique can be used both as a means of analysing model error statistics and as a way of accounting for model inadequacies in a seasonal forecasting framework. The perturbations are designed as corrections of model drift errors estimated from a preliminary weakly nudged re-forecast run over an extended reference period of 34 boreal winter seasons. A detailed statistical analysis of these corrections is provided, and shows that they are mainly made up of intra-month variance, therefore justifying their use as in-run perturbations of the model in seasonal forecasts. However, the inter-annual and systematic error correction terms cannot be neglected. Time correlation of the errors is limited, but some consistency is found between the errors of up to three consecutive days. These findings encourage us to test several settings of the random draws of perturbations in seasonal forecast mode. Perturbations are drawn randomly but consistently for all three prognostic variables perturbed. We explore the impact of using monthly mean perturbations throughout a given forecast month in a first ensemble re-forecast (SMM), and test the use of five-day sequences of perturbations in a second ensemble re-forecast (S5D). Both experiments are compared with a reference ensemble (REF) with initial perturbations only. A comprehensive forecast quality analysis is then provided. Results in terms of forecast quality are contrasted depending on the region and variable of interest, but very few areas exhibit a clear degradation of forecasting skill with the introduction of stochastic dynamics. We highlight [...]


Introduction
Handling uncertainties in seasonal predictions with numerical models is an issue of the utmost importance. These uncertainties arise from two main sources: the initial conditions of the different variables describing the evolution of the atmosphere, ocean, and land surface, and the approximations made in the modelling process. The first source is addressed by using ensemble predictions, which sample the error in the initial state by running several integrations of a given season. The second source is now increasingly tackled in coupled global circulation models (GCMs) with several approaches developed over the last decades. Multi-model forecasts are now issued routinely by the EUROSIP consortium (Vitart et al., 2007), the United States National Multi-Model Ensemble (Kirtman et al., 2013) and the APEC Climate Center (Wang et al., 2009). Pooling several models together provides a first rough estimate of the uncertainties related to choices in the parameterization of sub-grid processes or to numerical approximations in the individual models (e.g. discretization in time and space). Numerous studies in the framework of international research projects based on retrospective seasonal forecasts (or "re-forecasts") have illustrated the gain in forecast skill when using a multi-model ensemble instead of a single model (see Hagedorn et al., 2005; Doblas-Reyes et al., 2009; Alessandri et al., 2011; Batté and Déqué, 2011). Further calibration of these forecasts (by weighting each individual model contribution using a separate training period) can enhance this effect (Rodrigues et al., 2013; Doblas-Reyes et al., 2005). In parallel with these multi-model studies, other techniques to account for model inaccuracies were developed in the climate modelling framework. Multi-parameter (Collins et al., 2006) and multi-physics techniques (Watanabe et al., 2012) generate ensemble simulations with different physics parameter settings and different sub-grid physics schemes, respectively.
Over the last twenty years, stochastic perturbations have also been tested as a means of introducing noise in numerical weather prediction (NWP) models and in components of GCMs. Most studies have focused on the atmospheric component, building on methods that perturb parameterization tendencies (Buizza et al., 1999) or that backscatter kinetic energy dissipated by the model at the sub-grid scale to larger scales (Shutts, 2005).
Stochastic perturbations in the atmosphere have been shown to improve the skill, reliability and mean state of seasonal forecasting systems (see e.g. Weisheimer et al., 2011; Berner et al., 2008; Batté and Doblas-Reyes, 2015). An increasing number of studies report results from introducing stochastic perturbations in the other components of the climate system, such as the ocean (Brankart, 2013; Brankart et al., 2015), land-surface (MacLeod et al., 2015) or sea ice models (Juricke et al., 2013). Berner et al. (2015) provide a review of some of the latest advances in stochastic parameterization for NWP and climate models.
At CNRM-GAME, an alternative method to the stochastic physics techniques was designed to perturb the atmospheric component of the coupled climate model in a seasonal forecasting framework (Batté and Déqué, 2012). Past studies (Yang and Anderson, 2000; Barreiro and Chang, 2004; Guldberg et al., 2005) had suggested that systematically correcting model tendency errors in GCMs [...] In the method presented here, dubbed "stochastic dynamics", we apply additive perturbations to the prognostic variables of the model, drawn from a sample of model error corrections estimated in a preliminary run, instead of a systematic correction. In Batté and Déqué (2012), we showed a reduction of systematic error in the extra-tropical geopotential height fields for boreal winter re-forecasts over an extended period with CNRM-CM5. Since then, the method has been assessed more thoroughly in subsequent versions of the coupled model in a seasonal re-forecasting framework, and different choices in the frequency and strength of perturbations have been extensively tested. Building on the conclusions from these assessments and on operational constraints, a version of stochastic dynamics was introduced in the operational seasonal forecasting system 5 at Météo-France in 2015.
Section 2 describes the CNRM-CM model and the setup for seasonal re-forecasts, and provides more details on the stochastic dynamics technique. A statistical analysis of the model errors estimated from the nudged re-forecast run is carried out in Section 3. Section 4 examines the impact of using corrections of these model errors in two stochastic dynamics winter re-forecasts, using a reference unperturbed run as a benchmark. Common skill and forecast quality metrics are used, as well as an analysis of the representation of North Atlantic weather regimes. Section 5 summarizes the conclusions and discusses limitations and future plans for stochastic perturbations in CNRM-CM.
2 Model and methods

Stochastic dynamics
The stochastic dynamics method was first described in Batté and Déqué (2012). The idea behind this method is to combine an ad hoc correction technique with the introduction of in-run random perturbations in the atmospheric model. It is impossible to know ahead of time the errors the model will make at each time step; however, the statistical properties of model errors can be inferred, provided a sufficient sample of past forecasts is available. Model error corrections can then be drawn at random in forecast mode. In this method, the estimation of model tendency error corrections relies on Newtonian relaxation (or nudging), as in Guldberg et al. (2005). Random model perturbations are then drawn from a population of model error corrections and applied in-run to ARPEGE-Climate. The perturbed variables are the ARPEGE prognostic variables temperature, specific humidity and streamfunction.
We chose not to perturb the divergent component of the winds, to let the model adjust to perturbations, as suggested by Guldberg et al. (2005). Another prognostic variable we did not nudge was sea-level pressure, since our philosophy was to leave the surface free of perturbations so that it could adjust to the higher levels of the atmosphere. Nudging of these two additional variables was tested with another version of the model, and very little difference in model skill was found in seasonal re-forecast runs using perturbations for all prognostic fields.
Nudging is applied during a preliminary one-member seasonal run for November to February (NDJF), starting each year from 1979 to 2012. This run serves one primary purpose: providing the model tendency error estimates that make up the population of random corrections from which perturbations are drawn. Correction estimates are defined each day following equation 2. The in-run perturbations in the actual seasonal re-forecasts are applied by drawing a random date t and adding the corresponding tendency error corrections to the standard model formulation.
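The two stages described above (estimating corrections in a nudged run, then adding randomly drawn corrections in free runs) can be illustrated on a toy system. This is a minimal sketch, assuming a relaxation term of the form (X_ref - X)/τ and a forward Euler time step; the functions `nudged_run` and `perturbed_run`, the relaxation time `tau` and the toy tendency `f` are hypothetical and do not reproduce equation 2 or the ARPEGE-Climate code. For simplicity, a new date is drawn at every step, whereas the re-forecasts draw corrections at daily or longer intervals.

```python
import numpy as np


def nudged_run(x0, x_ref, f, dt, tau):
    """Integrate dx/dt = f(x) + (x_ref - x) / tau and store the nudging term.

    Assumed relaxation form and forward Euler step (illustration only).
    Returns one correction per time step: the population of model error
    corrections.
    """
    x = np.asarray(x0, dtype=float)
    corrections = []
    for xr in x_ref:                       # one reference state per time step
        dx = (np.asarray(xr) - x) / tau    # nudging term = model error correction
        corrections.append(dx)
        x = x + dt * (f(x) + dx)
    return np.array(corrections)


def perturbed_run(x0, f, dt, n_steps, corrections, rng):
    """Free run adding, at each step, the correction of a randomly drawn date."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        t_rand = rng.integers(len(corrections))   # random date in the population
        x = x + dt * (f(x) + corrections[t_rand])
    return x
```

With a toy model that drifts warm at a constant rate (`f = lambda x: np.ones_like(x)`) and a reference state of zero, the stored corrections converge to -1, i.e. a negative correction for a model that is too warm; adding corrections drawn from the converged part of the population then cancels the drift in the free run.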

Note that in a retrospective forecast framework, one could theoretically draw the correction corresponding to the time at which the model is integrated. Although one would need to draw all the consecutive corrections for the model to closely follow the reference data, corrections for a given month and year have an inter-annual component, and Batté and Déqué (2012) showed that drawing corrections from within the year one is trying to forecast gave significantly higher skill scores. To avoid overestimating model skill, since the re-forecast and nudged run periods are the same, the technique is applied in cross-validation mode in the re-forecasts discussed in Section 4, by systematically discarding the corrections for the year being forecast from the perturbation population. Ideally, the corrections should be computed over a period completely separate from the re-forecasts. However, when evaluating seasonal forecasting systems, only a limited number of data points is available for the verification scores, and we chose to use an extended re-forecast period to make our skill assessments as robust as possible.
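The cross-validated draw can be sketched as follows. This assumes, hypothetically, that the correction dates of the nudged run are stored in a dictionary keyed by season start year; the function name `draw_correction_date` and this data layout are illustrative only.

```python
import numpy as np


def draw_correction_date(dates_by_year, forecast_year, rng):
    """Draw one correction date, never from the season being forecast.

    dates_by_year: hypothetical dict {season start year: list of dates}
    built from the nudged run.
    """
    # cross-validation: the year being forecast is removed from the population
    pool = [d for year, dates in dates_by_year.items()
            if year != forecast_year for d in dates]
    return pool[rng.integers(len(pool))]
```

Rebuilding the pool for each forecast year keeps the draw uniform over the remaining 33 seasons of the 1979-2012 period.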

Analysis of ARPEGE-Climate model errors
The technique described in this study can be used both as a diagnostic of model errors and as a perturbation method. The first application is explored here by deriving standard statistics of ARPEGE-Climate model errors in a coupled initialized prediction framework.

Spectral analysis
The δX population is originally in spectral space (truncated at total wavenumber 127) and was first analysed in terms of squared amplitude for each total wavenumber n. For each prognostic variable, model level z and re-forecast month mo, we compute the mean squared amplitude A_n(z, mo) over the population, where N is the size of the perturbation population {δX_i} for month mo, and m is the zonal wavenumber.
To present the information concisely, these statistics are integrated over 200 hPa deep layers of the model. We take into account the influence of lead time on the results, since the weak nudging may allow the model to drift slowly from its initial state. Figure 1 shows results for all three nudged prognostic variables. Amplitude is plotted against wavenumber on a logarithmic scale for both axes.
The first row shows the amplitude spectra of δX for January corrections integrated over 200 hPa layers. For humidity (Fig. 1(a)), corrections have (as expected) an amplitude that is several orders of magnitude smaller in the upper layers of the atmosphere than in the lower layers. This difference in amplitude is much less pronounced for temperature and streamfunction. For temperature (Fig. 1(b)), it is worth mentioning that the decrease in amplitude with wavenumber in log-log space is steeper for the upper layers of the atmosphere than for the lower layers. In the lower layers, the land-sea contrast in temperature corrections generates small structures in the perturbation patterns, increasing the amplitude of the corrections at higher wavenumbers. Figures 1(d-f) show the month-by-month results for the mid-troposphere layer (600-800 hPa). For all three variables, the amplitude of corrections seems to increase with lead time for the smaller wavenumbers, but a clear difference is found mainly between November and the following months of the nudged re-forecasts used to derive the correction terms.
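The squared-amplitude statistic A_n(z, mo) shown in Fig. 1 can be sketched for one variable and one level. Since the formula is not reproduced here, this sketch assumes the usual spherical-harmonic definition, A_n = (1/N) Σ_i Σ_{m=-n..n} |δX_i(n, m)|², and a hypothetical storage layout in which only coefficients with m ≥ 0 are kept (for a real field the coefficient of -m is the complex conjugate of that of +m). Normalization conventions differ between spectral codes, so absolute values may not match ARPEGE output.

```python
import numpy as np


def amplitude_spectrum(coeffs):
    """Mean squared amplitude per total wavenumber n over a correction population.

    coeffs: complex array (N, n_max + 1, n_max + 1) indexed [i, n, m] with
    0 <= m <= n (entries with m > n are ignored). For a real field,
    sum_{m=-n..n} |c_{n,m}|^2 = |c_{n,0}|^2 + 2 * sum_{m>=1} |c_{n,m}|^2.
    """
    n_max = coeffs.shape[1] - 1
    power = np.abs(coeffs) ** 2
    n_idx, m_idx = np.meshgrid(np.arange(n_max + 1), np.arange(n_max + 1),
                               indexing="ij")
    # weight 1 for m = 0, 2 for m > 0 (conjugate pairs), 0 outside the triangle
    weights = np.where(m_idx == 0, 1.0, 2.0) * (m_idx <= n_idx)
    # sum over m, then average over the N members of the population
    return (power * weights).sum(axis=2).mean(axis=0)
```

Averaging the returned spectra over the model levels of a 200 hPa layer gives one curve per layer, as plotted in log-log space in Fig. 1.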

Gridpoint analysis
The spectral δX fields were then converted to gridpoint space for a spatial analysis of the correction terms. Again, results are integrated over 200 hPa layers for the sake of clarity. Figure 2 plots the December mean (in color) and standard deviation (isolines) of the δX specific humidity, temperature and streamfunction corrections for these layers.
As shown before, humidity corrections are several orders of magnitude larger in the lower levels of the atmosphere than in the stratosphere, whereas temperature and streamfunction corrections are of similar amplitude across layers. Results are consistent with the spectral analysis in Fig. 1, in the sense that streamfunction corrections are somewhat larger in the upper layer of the atmosphere, but with fewer small-scale patterns, and are therefore concentrated on the smaller wavenumbers.
In terms of standard deviation, patterns for temperature and streamfunction are mainly zonal (with some exceptions due to land-sea contrast in the lower layers for temperature). Standard deviation increases with latitude in both hemispheres for both variables, and values are quite similar between layers. For specific humidity, standard deviation is higher in the tropics and around the Equator, and less zonal symmetry is found than for temperature corrections. For temperature, standard deviation values are of the same order of magnitude as the mean corrections in the tropics, whereas streamfunction and humidity correction standard deviations are larger than the mean correction in most areas of the globe. The temperature mean correction is mostly negative, implying that the model is warmer than ERA-Interim over most of the atmospheric column.

Temporal analysis
A question we wish to address when studying the perturbation population used in our forecasts is the consistency in time of the δX terms. Indeed, one possible use of the perturbations is to apply corrections estimated for consecutive days in the nudged run. This makes sense only if some coherence in time is found between the δX terms. We estimate this by computing the autocorrelation of correction terms as a function of the lag between their corresponding dates in the nudged re-forecast run. Figure 3 shows autocorrelation at lags of 1, 2 and 3 days for February specific humidity and temperature corrections (at approximately 850 hPa) as well as streamfunction corrections (circa 500 hPa), computed over all years of the re-forecast period.
Autocorrelation for humidity corrections is generally stronger over land than over ocean, and strongly [...]
For streamfunction, autocorrelation from one day to the next is higher than for humidity and temperature (over 0.6 in most parts of the globe), and remains above 0.4 in some areas for a two-day lag. Streamfunction values are thus typically of the same order as humidity values at a lag one day shorter.
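The lagged autocorrelation at one grid point can be sketched as below. This assumes daily corrections stored as a hypothetical array of shape (years, days) for one calendar month, and pools all within-year pairs separated by the given lag into a single Pearson correlation; pairs never straddle two seasons. Whether the original analysis pools years in exactly this way is an assumption.

```python
import numpy as np


def lagged_autocorrelation(dx, lag):
    """Pearson correlation between corrections `lag` days apart (lag >= 1).

    dx: array (n_years, n_days) of daily corrections at one grid point,
    e.g. February values from each year of the nudged run. Pairs are
    formed within each year only, then pooled over all years.
    """
    a = dx[:, :-lag].ravel()   # day d
    b = dx[:, lag:].ravel()    # day d + lag, same year
    return np.corrcoef(a, b)[0, 1]
```

Calling this for lags of 1, 2 and 3 days at every grid point yields maps comparable to Fig. 3.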

Variance decomposition
When using pseudo-random correction terms as perturbations in an ensemble forecasting framework, we wish to combine two effects: correction of the systematic errors the model makes in coupled seasonal forecasting mode, and introduction of perturbations to account for the model uncertainties that cannot be dealt with by deterministic methods. Both effects could in some sense cancel each other out: introducing purely random terms that are too large can move the model too far from its own equilibrium and induce adverse effects, which could translate into increased systematic errors in climate forecasts. On the other hand, if the systematic error correction is too strong with respect to the purely random part of the perturbations added to the model, ensemble members will follow overly similar trajectories drawn towards the reference climate. In the following paragraph, we take a deeper look at the perturbations in terms of variance and mean, so as to estimate the relative importance of the systematic error term and of the inter-annual and intra-month (more random) variance terms in the corrections used.
Equations 5-7 show how the mean square correction for a given month (lead time) of the nudged re-forecast can be split into three components: one is the squared mean correction, and the other two result from the straightforward variance decomposition into inter-annual and intra-month variance. In these equations, N is the total number of perturbations for a given forecast time (month), y a given year of the re-forecast period used in the nudged run, and n_y the number of perturbations for the month of interest in year y (which varies from year to year in the case of February). The squared mean term (the square of the mean correction δX) can be interpreted as the systematic error correction for the variable studied. The variance decomposition separates the inter-annual signal (which is, to some extent, what one wants to predict with seasonal forecasts) from intra-month variability, which can be approximated as noise on a seasonal time scale. [...] stratospheric streamfunction. In most areas, for all three variables, the intra-month term accounts for more than 50% of the total squared correction. Red lines show the proportion of inter-annual variance in the decomposition, which stays below 40% for all latitudes and layers. Although this term is smaller than the intra-month "noise", it contains valuable information for seasonal forecasts: this was shown in Batté and Déqué (2012) with a so-called "OPT" experiment, where corrections were drawn from the current season of the re-forecast. The black line shows the proportion of the systematic correction in the total squared correction term. This term ranges on average between 10 and 30% depending on the variable and vertical layer. More zonal variability is found than for the inter-annual term, and the symmetry with the intra-month term is quite striking.
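Since equations 5-7 are not reproduced here, the following sketch assumes they correspond to the standard exact decomposition of the mean square correction: (1/N) Σ_y Σ_i δX_{y,i}² equals the squared grand mean, plus the inter-annual term (1/N) Σ_y n_y (mean_y - grand mean)², plus the intra-month term (1/N) Σ_y Σ_i (δX_{y,i} - mean_y)². The function name `decompose` and the list-of-arrays input are illustrative.

```python
import numpy as np


def decompose(dx_by_year):
    """Split the mean square correction into three additive terms.

    dx_by_year: list of 1-D arrays, one per year, holding the corrections
    for the month of interest (lengths n_y may differ, e.g. February).
    Returns (systematic, inter_annual, intra_month); their sum equals the
    mean of the squared corrections over all N values.
    """
    all_dx = np.concatenate(dx_by_year)
    n_total = all_dx.size
    grand_mean = all_dx.mean()
    systematic = grand_mean ** 2
    # year means weighted by n_y, so unequal month lengths are handled exactly
    inter_annual = sum(d.size * (d.mean() - grand_mean) ** 2
                       for d in dx_by_year) / n_total
    intra_month = sum(((d - d.mean()) ** 2).sum() for d in dx_by_year) / n_total
    return systematic, inter_annual, intra_month
```

Dividing each term by their sum gives the proportions discussed for Fig. 4.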
This analysis shows that the corrections used are mostly made up of noise (at least at a seasonal time scale), although mean corrections and inter-annual variability cannot be neglected. These conclusions justify the use of these corrections as possible "pseudo-stochastic" perturbations to the ARPEGE-Climate atmospheric model in seasonal integrations.

Experimental setting
To evaluate the impact of this perturbation method, several sets of seasonal re-forecasts were run, starting on 1 November of each year from 1979 to 2012 and running for four months (until the end of February). The re-forecast ensemble size is set to 30 members. Table 1 summarizes the characteristics of each ensemble.
Unlike in Batté and Déqué (2012), where perturbations were drawn at daily intervals, we chose to run an ensemble using perturbations from five consecutive days, drawn separately for each member from within the other years of the re-forecast period. This experiment is called S5D. Every five days, a new five-day set of δX terms is drawn for each member from the same calendar month as the re-forecast. Note that the δX terms are drawn according to the date of the nudged re-forecast run, meaning that perturbations for the three prognostic fields are consistent with the model error at a given date and time.
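The S5D drawing scheme for one ensemble member can be sketched as a schedule mapping each forecast day to a date of the nudged run. This is an illustration under stated assumptions: the function name `s5d_schedule` is hypothetical, leap years are ignored, the calendar month of the first day of each block determines the month drawn from, and each five-day block is kept within that month of the nudged run. Because all three variables are taken from the same (year, month, day), the perturbations stay mutually consistent.

```python
import numpy as np


def s5d_schedule(month_of_day, days_in_month, years, forecast_year, rng, block=5):
    """Return, for each forecast day, the (year, month, day) of the correction used.

    month_of_day: calendar month of each forecast day (list of ints).
    days_in_month: dict {month: number of days in the nudged run}.
    years: years of the nudged run; forecast_year is excluded (cross-validation).
    """
    other_years = [y for y in years if y != forecast_year]
    schedule = []
    for start in range(0, len(month_of_day), block):
        month = month_of_day[start]                        # month of the block start
        year = other_years[rng.integers(len(other_years))]
        first = rng.integers(days_in_month[month] - block + 1)  # block fits in month
        for k in range(min(block, len(month_of_day) - start)):
            schedule.append((year, month, first + k))      # consecutive days
    return schedule
```

Calling the function independently for each of the 30 members, with its own random draws, gives the member-by-member perturbation sequences.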
Given the relative importance of systematic error and inter-annual variance with respect to the total mean square perturbation (Fig. 4), we also chose to test the impact of perturbing without intra-month variance in the corrections used. To do this, we ran experiment SMM, where monthly means of δX terms from the same calendar month but from other years of the re-forecast period are used for each ensemble member. The year from which perturbations are drawn changes each month of the re-forecast.
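The SMM scheme for one member can be sketched in the same spirit. The dictionary layout keyed by (year, month) and the function name `smm_perturbations` are hypothetical; the sketch only shows that a single monthly-mean correction is held constant throughout each forecast month and that the source year is redrawn every month, excluding the forecast year.

```python
import numpy as np


def smm_perturbations(monthly_dx, months, forecast_year, rng):
    """One constant (monthly-mean) perturbation per forecast month.

    monthly_dx: hypothetical dict {(year, month): array (n_days, ...)} of
    daily corrections from the nudged run.
    """
    years = sorted({y for y, _ in monthly_dx if y != forecast_year})
    out = {}
    for month in months:
        year = years[rng.integers(len(years))]           # new year drawn each month
        out[month] = monthly_dx[(year, month)].mean(axis=0)  # removes intra-month variance
    return out
```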

Spread and deterministic skill
Ensemble seasonal forecasts with GCMs are often overconfident, in the sense that the spread around the ensemble mean is smaller than the root mean square error of the ensemble mean with respect to verification data (Shi et al., 2015). This lack of dispersion in ensemble forecasts can lead to misleading, unreliable forecasts. Including stochastic perturbations in the components of the GCM can help partly correct these flaws, as they tend to increase the ensemble spread. In this paragraph, we assess how the stochastic dynamics technique impacts ensemble spread, since this technique is not a purely random perturbation technique but also includes model corrections. An increase in spread with this technique is therefore not guaranteed, although we have shown previously that the variance of the perturbations is mainly composed of intra-month variance, which we assume has a similar effect to adding noise to the system. [...] the spread with stochastic dynamics is not significantly larger than without (significance at the 95% level is tested with bootstrap intervals).
In the case of precipitation, the impact is less systematic. Regions in the Northern Hemisphere high latitudes and the Eastern Tropical Pacific exhibit a significantly higher spread with stochastic dynamics, but extended regions of North and West Africa show a lower spread in precipitation (although for these regions precipitation amounts as well as model spread are much more limited).

The highest impact on 500 hPa geopotential height (Z500) spread is found for the Northern Hemisphere extra-tropics and subpolar regions. Z500 spread is significantly higher east of Greenland with SMM perturbations. The S5D experiment exhibits similar patterns of spread increase but very few gridpoints have a significantly higher spread than REF.
These impacts on ensemble spread are limited both in terms of amplitude and geographical re-

Re-forecast skill
In the previous paragraphs, we have shown that stochastic dynamics applied in a seasonal re-forecasting framework has non-negligible impacts on the forecast mean state and ensemble spread. The next step in assessing the impact of this method on forecast quality is to compare the three experiments REF, SMM and S5D in terms of skill over the re-forecast period.
One common justification for the introduction of stochastic perturbations is the lack of spread of ensemble re-forecasts with respect to skill, measured as the root mean square error of the ensemble mean. Since we have found some (although limited) impact of the method on ensemble spread, it is worth checking how the spread-skill ratio evolves with the introduction of stochastic dynamics.
The root mean square error (RMSE) of the model ensemble mean measures the distance between predicted and observed anomalies, therefore removing the mean bias of the model. [...] and S5D. For near-surface temperature, RMSE is lower than spread over most oceans, but higher over many continental areas. Precipitation re-forecasts are underdispersive over most subpolar and polar regions and the Tropical Pacific, but in the tropics and mid-latitudes many areas exhibit a higher RMSE than model spread. In the case of Z500, RMSE is lower than model spread over most areas of the globe; exceptions include North America and parts of the North Pacific and Northwest Atlantic oceans.
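The spread and RMSE compared above can be sketched for one grid point. This is a simplified illustration: anomalies are taken with respect to the full-period climatology of each dataset (no cross-validation), spread is the square root of the mean intra-ensemble variance, and the function name `spread_and_rmse` is hypothetical. A spread-to-RMSE ratio below 1 indicates an underdispersive ensemble.

```python
import numpy as np


def spread_and_rmse(ens, obs):
    """Ensemble spread and ensemble-mean RMSE at one grid point.

    ens: array (n_years, n_members) of seasonal means; obs: array (n_years,).
    Anomalies relative to each dataset's own climatology remove the mean bias.
    """
    ens_anom = ens - ens.mean()
    obs_anom = obs - obs.mean()
    rmse = np.sqrt(np.mean((ens_anom.mean(axis=1) - obs_anom) ** 2))
    # unbiased intra-ensemble variance for each year, averaged over years
    spread = np.sqrt(np.mean(ens_anom.var(axis=1, ddof=1)))
    return spread, rmse
```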
Note that the scores presented here were computed from model anomalies in cross-validation mode, but without further calibration of the ensemble forecasts (such as a quantile-quantile calibration technique), which can improve results with respect to climatology. The results in terms of CRPSS are consistent with the minor changes in the model spread-skill ratio and low impact of [...]
The global evaluation of the stochastic dynamics technique in terms of impact on re-forecast skill is quite contrasted, with results depending on the region of study. Furthermore, we face a recurrent issue in the seasonal-to-decadal prediction field, namely the limited statistical significance of differences in skill between two versions of a system. We stress, however, that the results presented here are computed for relatively large ensembles (30 members) and a 34-year re-forecast period, which lends them some robustness. It is also worth mentioning that most significant impacts of the stochastic dynamics technique are found for both versions of the method discussed in this paper. This could imply that the skill improvements are mostly due to improvements in the model mean state, brought about by the non-zero mean term in the perturbations applied in the stochastic dynamics technique.
Earlier in this paper, we found evidence that the stochastic dynamics technique improved the Z500 bias over the North Atlantic mid-latitudes and the Arctic. The technique also improves the model spread-skill ratio over Europe (see supplementary Fig. S4 for Z500). Figure 11 corroborates this: we computed the model spread and RMSE for Z500 averaged over Europe, as a function of lead time, for the three ensembles. The RMSE is reduced with the stochastic dynamics technique in the first month of the re-forecast, and spread is larger than for REF in both the S5D and SMM ensembles at each re-forecast lead time.

In this study, we compute the NAO index as the projection of the DJF Z500 anomaly for a given year onto the leading EOF of 500 hPa geopotential height in ERA-Interim over the North Atlantic-Europe region defined by Hurrell et al. (2003) over the reference period (in cross-validation mode, i.e. by removing the year of interest from the 1979-2012 period). This is done both for the ERA-Interim reference index and for each member of the three re-forecast ensembles. Figure 12 shows boxplots of [...]
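The cross-validated NAO index described above can be sketched with a singular value decomposition. Several choices here are assumptions: grid-point weights (e.g. the square root of the cosine of latitude) are optional and not specified above, the EOF sign is fixed by a hypothetical reference point (e.g. near Iceland) where the positive phase is required to have a negative loading, and the function names are illustrative. For ensemble members, model anomalies relative to the model climatology would be projected onto the same ERA-Interim EOF.

```python
import numpy as np


def cv_leading_eof(z500, target, sign_point, weights=None):
    """Leading EOF and climatology of DJF-mean Z500, leaving out year `target`.

    z500: array (n_years, n_points) over the North Atlantic-Europe region.
    sign_point: index of a grid point where NAO+ is assumed to have a
    negative loading (hypothetical sign convention).
    """
    w = np.ones(z500.shape[1]) if weights is None else np.asarray(weights)
    train = np.delete(z500, target, axis=0)          # cross-validation
    clim = train.mean(axis=0)
    # rows of vt are the EOFs of the weighted anomaly matrix
    _, _, vt = np.linalg.svd((train - clim) * w, full_matrices=False)
    eof1 = vt[0]
    if eof1[sign_point] > 0:                         # EOF sign is arbitrary in SVD
        eof1 = -eof1
    return eof1, clim, w


def nao_index(anomaly, eof1, w):
    """Project one Z500 anomaly map onto the (weighted) leading EOF."""
    return float((anomaly * w) @ eof1)
```

Fixing the sign explicitly matters because each cross-validation fold computes its own EOF, and SVD may return it with either sign.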

As in other standard-resolution climate GCMs (see for instance Dawson et al., 2012), the seasonal forecasting system discussed here fails to represent the North Atlantic weather regimes properly.
Moreover, the REF re-forecast exhibits quite strong Z500 biases over the region. We therefore project model daily 500 hPa geopotential height anomalies for each ensemble member onto the EOFs of the ERA-Interim anomalies instead of using the model EOFs. Weather regimes are attributed following a Euclidean distance criterion. In the following, we chose a minimum weather regime duration of 3 days: all days in regimes lasting less than this limit were classified as regime transition days. This explains the minor differences in climatological frequencies of the ERA-Interim regimes between Table 2 and Fig. S6. Table 2 shows the frequency and mean duration of each weather regime in ERA-Interim and experiments REF, SMM and S5D. Compared to reanalysis data, the REF ensemble underestimates the frequency of the NAO+ regime by more than 5.5% and overestimates the NAO− regime frequency by over 4%. The introduction of stochastic dynamics in the atmospheric model tends to correct at least part of these errors, as SMM or S5D statistics are generally closer to ERA-Interim than REF. This is also the case for regime duration: the mean duration of each regime is systematically improved with stochastic perturbations. In most cases the length of the regimes is not considerably changed, apart from the Blocking regime, which stochastic dynamics in the S5D experiment makes last 0.4 days longer on average. One could expect stochastic perturbations to make the model shift from one regime to another more frequently, therefore shortening the mean length of each regime. Results in Table 2 show that this is not the case, as both SMM and S5D perturbations tend to increase regime duration when the model underestimates it.
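The regime attribution and the statistics of Table 2 can be sketched as follows, assuming daily anomalies already projected onto the ERA-Interim EOFs and regime centroids given in the same space. The function names are hypothetical; days in episodes shorter than three days are labelled -1 (transition days), as described above, and mean duration is computed per episode.

```python
import numpy as np


def _episodes(labels):
    """Yield (start, stop) index pairs of runs of identical labels."""
    start = 0
    for t in range(1, len(labels) + 1):
        if t == len(labels) or labels[t] != labels[start]:
            yield start, t
            start = t


def attribute_regimes(pcs, centroids, min_len=3):
    """Nearest-centroid (Euclidean) regime labels; short episodes become -1.

    pcs: array (n_days, n_eofs); centroids: array (n_regimes, n_eofs).
    """
    dist = np.linalg.norm(pcs[:, None, :] - centroids[None, :, :], axis=2)
    labels = dist.argmin(axis=1)
    out = labels.copy()
    for start, stop in _episodes(labels):
        if stop - start < min_len:
            out[start:stop] = -1          # episode too short: transition days
    return out


def regime_stats(labels, n_regimes):
    """Frequency (fraction of days) and mean episode duration of each regime."""
    freq = np.array([(labels == k).mean() for k in range(n_regimes)])
    durations = {k: [] for k in range(n_regimes)}
    for start, stop in _episodes(labels):
        if labels[start] >= 0:
            durations[labels[start]].append(stop - start)
    mean_dur = np.array([np.mean(d) if d else np.nan for d in durations.values()])
    return freq, mean_dur
```

Applying `regime_stats` season by season for each member and averaging would give ensemble frequencies and durations comparable to the ERA-Interim values.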
Another aspect we wish to assess is how the stochastic dynamics technique changes the frequency of weather regime transitions. Figure 13 shows the frequency of these transitions for ERA-Interim, [...] versus 35% in ERA-Interim). For these two examples, the experiments including stochastic dynamics slightly improve results. However, this is not always the case, and it is not possible to conclude that one experiment exhibits better weather regime transition frequencies than another.
These results for North Atlantic weather regimes show that when perturbations are added to the model dynamics, the intraseasonal variability of the model remains quite consistent with reference data, and improves in some aspects such as regime frequencies.
As another way of assessing weather regime forecast quality over the re-forecast period, we computed a score based on the Brier score over the four weather regimes, comparing the observed weather regime frequency with the weather regime probability given by the ensemble forecast. This score is a distance in probability space and should be as small as possible. A corresponding (positively oriented) skill score is obtained by computing the same distance for a reference forecast. We chose the ERA-Interim frequency of each regime over all other years of the re-forecast period as the reference forecast. Our REF ensemble has a skill score of -0.011, meaning that ERA-Interim climatology over the other years of the re-forecast gives a better probability forecast of weather regime frequencies than CNRM-CM. When introducing 5-day stochastic dynamics, the skill score becomes positive and reaches 0.081. Again, the significance of these results is quite limited, but they all seem consistent and lead us to conclude that this technique improves the representation of North Atlantic variability at a seasonal time scale.
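A Brier-type regime score and its skill score relative to cross-validated ERA-Interim climatology can be sketched as below. The exact normalization used in the paper is not given here, so this sketch assumes a squared distance summed over the four regimes and averaged over years; any constant normalization cancels in the skill score. The function names are hypothetical.

```python
import numpy as np


def regime_score(p_fcst, o_obs):
    """Mean squared distance in probability space (lower is better).

    p_fcst, o_obs: arrays (n_years, n_regimes) of forecast regime
    probabilities and observed regime frequencies (fractions of days).
    """
    return np.mean(np.sum((p_fcst - o_obs) ** 2, axis=1))


def regime_skill_score(p_fcst, o_obs):
    """Skill score against cross-validated observed climatology (higher is better)."""
    n = o_obs.shape[0]
    # reference forecast for each year: mean observed frequencies of all other years
    p_ref = (o_obs.sum(axis=0, keepdims=True) - o_obs) / (n - 1)
    return 1.0 - regime_score(p_fcst, o_obs) / regime_score(p_ref, o_obs)
```

A negative value, as for REF, means the forecast is farther from the observed frequencies than the climatological reference.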

Conclusions
This study has provided details on the stochastic dynamics technique, first developed and described in Batté and Déqué (2012) and further amended in more recent versions of the CNRM-CM coupled GCM for seasonal forecasts. A version of this method (similar to the S5D experiment discussed in this paper) has been implemented in the next operational seasonal forecasting system 5 at Météo-France [...] of intra-month variance, but that the inter-annual variance and systematic part of the perturbations were non-negligible.
Beyond the analysis presented in Batté and Déqué (2012), the impact of stochastic dynamics was studied in two boreal winter seasonal re-forecast runs compared to a reference re-forecast with initial perturbations only. The SMM experiment used monthly mean correction terms drawn separately each month for each ensemble member, whereas the S5D experiment explored the use of five-day sequences of perturbations drawn independently every five days for each ensemble member. Results showed a reduction of precipitation bias over most areas of the globe, as well as improvements in the model mean Z500 field over the Northern Hemisphere. The reduction of Z500 bias is consistent with results from Batté and Déqué (2012) [...] NAO− regime frequency with ERA-Interim was also found with the SMM and S5D experiments, although no significant change was found in DJF NAO index correlation skill. Overall, the introduction of stochastic dynamics perturbations in CNRM-CM seems to benefit the representation of North Atlantic weather regimes.

On more theoretical grounds, the philosophy behind the stochastic dynamics technique is very ad hoc, in the sense that it uses model error statistics to correct these errors in forecast mode, instead of introducing stochasticity in the physical parameterizations of the model. The additive perturbations to the model dynamics can cause imbalances in the energy and water budgets, although the impact most likely remains quite limited, as suggested by the skill assessments in this study. In terms of interactions with the surface and ocean components of the coupled model, the perturbations are dialed down to zero in the lowest levels of the atmosphere, but results in terms of SST biases show that they do have a systematic impact on the surface. This aspect will be further evaluated in specific case studies. However, based on comprehensive skill evaluations, we believe that the overall influence of the technique is positive at a seasonal time scale.

One motivation for introducing stochastic dynamics in the CNRM-CM climate forecasting systems was to generate ensembles in burst mode instead of lagged-average initialization. This change in the initialization technique enables us to use the same configuration for weekly and sub-seasonal forecasts, without significantly degrading the skill of several ensemble members by starting them from older initial conditions. This study showed, however, that the impact of the method on ensemble spread (with respect to perturbing only at forecast time 0) depended on the area and variable of interest, and was somewhat limited. The technique could be complemented by other stochastic methods perturbing the atmospheric physical tendencies, although interactions between this type of perturbation and dynamical nudging in the model should be carefully documented. Developments are currently underway to include SPPT (Palmer et al., 2009) in the ARPEGE-Climate model.

An extension of the method considered at CNRM is to introduce flow dependency in the corrections, based on a classification of the correction population according to the state of the atmosphere, following the idea explored by D'Andrea and Vautard (2000). Preliminary studies using a classification of streamfunction fields or based on the state of ENSO gave disappointing results in re-forecast skill assessments. An interesting perspective for exploring this aspect is to take advantage of long reanalysis datasets such as ERA-20C (Poli et al., 2013) and 20CR (Compo et al., 2011); however, applications in real-time coupled forecasts would necessarily be limited, since these reanalyses span periods for which ocean data are unavailable.
[...] to registered users for research purposes only. Outputs from the seasonal re-forecasts discussed in this paper are available upon request to the authors, and some will be included in the SPECS project repository at the British Atmospheric Data Centre (BADC, http://browse.ceda.ac.uk/browse/badc/specs/data/).
Table 2 fragment (regime and experiment labels lost in extraction), regime frequency (mean duration in days): 28.0% (8.36), 23.8% (6.78), 21.8% (9.35), 17.1% (6.38).