Sensitivity of the WRF model to PBL parametrisations and nesting techniques: evaluation of wind storms over complex terrain

. Simulating surface wind over complex terrain is a challenge in regional climate modelling. Therefore, this study aims at identifying a set-up of the Weather Research and Forecasting Model (WRF) model that minimises systematic errors of surface winds in hindcast simulations. Major factors of the model conﬁguration are tested to ﬁnd a suitable set-up: the horizontal resolution, the planetary boundary layer (PBL) parameterisation scheme and the way the WRF is nested to the driving data set. Hence, a number of sensitivity simulations at a spatial resolution of 2 km are carried out and compared to observations. Given the importance of wind storms, the analysis is based on case studies of 24 historical wind storms that caused great economic damage in Switzerland. Each of these events is downscaled using eight different model set-ups, but sharing the same driving data set. The results show that the lack of representation of the unresolved topography leads to a general overestimation of wind speed in WRF. However, this bias can be substantially reduced by using a PBL scheme that explicitly considers the effects of non-resolved topography, which also improves the spatial structure of wind speed over Switzerland. The wind direction, although generally well reproduced, is not very sensitive to the PBL scheme. Further sensitivity tests include four types of nesting methods: nesting only at the boundaries of the outermost domain, analysis nudging, spectral nudging, and the so-called re-forecast method, where the simulation is frequently restarted. These simulations show that restricting the freedom of the model to develop large-scale disturbances slightly increases the temporal agreement with the observations, at the same time that it further reduces the overestimation of wind speed, especially for maximum wind peaks. The model performance is also evaluated in the outermost domains, where the resolution is coarser. The results demon-strate the important role of horizontal resolution, where the step from 6 to 2 km signiﬁcantly improves model performance. In summary, the combination of a grid size of 2 km, the non-local PBL scheme modiﬁed to explicitly account for non-resolved orography, as well as analysis or spectral nudging, is a superior combination when dynamical downscaling is aimed at reproducing real wind ﬁelds.


Introduction
Prominent features of the North Atlantic and European climate are cyclonic disturbances, which may be intensified and lead to severe storms (von Storch and Weisse, 2008). Several severe wind storms have hit central Europe during the last decades (Schiesser et al., 1997;Etienne et al., 2013). These situations, although rare, produce considerable economical cost and are listed as an important natural hazard in Europe (Beniston et al., 2007). Ongoing economic and demographic growth, as well as climate change, may imply even stronger impacts in the future, which has raised concerns among insurance companies, since isolated events such as storm Lothar in December 1999 caused damages of up to USD 12 billion (MunichRe, 2001). A better understanding of the mechanisms leading to severe wind storms, and a reliable projection of their characteristics under climate change conditions, are important to minimise the impact of such events in contemporary and future societies (Muskulus and Jacob, 2005;Goyette, 2010). How-ever, wind is still not as widely studied as temperature or precipitation (e.g. Schär et al., 2004;Kjellström et al., 2007;Rajczak et al., 2013). For example, in areas of complex terrain like Switzerland, the main focus of high-resolution simulations with respect to wind is on case studies (Goyette, 2008;Etienne et al., 2013). Recently, simulations of about 90 storms over Switzerland were combined into a storm climatology (Stucki et al., 2015).
The fundamental problem regarding surface wind is its intrinsically complex nature, particularly over areas of complex terrain like the Alps (Whiteman, 2000). This complexity precludes its realistic simulation with coarse-resolution models, but also hampers the extrapolation of local observations onto regular grids, which could be used for impact studies. Dynamical downscaling is a common tool that allows one to bridge the gap between the coarse resolution of global circulation models (GCMs) or reanalysis products and the local terrain characteristics that influence temperature, precipitation or wind (e.g. Kotlarski et al., 2014). Thereby, this method employs regional climate models (RCMs), which, driven at the boundaries by a global data set, simulate the climate in a limited area domain. This reduces computational costs, which in turn allows implementation of simulations with higher spatial resolutions. RCMs driven by GCMs have been used for a variety of applications: from climate change projections (Kjellström et al., 2007;van der Linden and Mitchell, 2009;Gómez-Navarro et al., 2010;Rajczak et al., 2013;Jacob et al., 2013) to paleoclimatology (Gómez-Navarro et al., 2013, and references therein). Besides, they are used in the so-called hindcast simulations, which combine the reliability of reanalysis products with the high resolution provided by RCMs. Studies focusing specifically on wind have been one of the applications of such types of simulations (Jiménez et al., 2010;Jerez and Trigo, 2013;Etienne et al., 2013;García-Díez et al., 2013Menendez et al., 2014;Lorente-Plazas et al., 2015;Draxl et al., 2014).
RCMs, however, contain various sources of uncertainties, like deviations in the driving data set, numerical approximations, as well as parametrisations of the sub-grid processes. A number of studies in different locations assessed the sensitivity of the model performance due to different model configurations. Dierer et al. (2005) studied the dependency of wind speed on the planetary boundary layer (PBL) parametrisations implemented in model MM5 as well as the atmospheric stability in different European countries. More recently, García-Díez et al. (2013) focused on the role of different PBL schemes. Other studies have investigated the role of the PBL schemes in the ability to simulate the surface wind of typhoons (Kwun et al., 2009) along the coast of the Mediterranean Sea (Menendez et al., 2014) or in southern Spain (Santos-Alamillos et al., 2013). Recently, García-Díez et al. (2015) investigated the added value of dynamical downscaling compared to the driving data set, and found that over the Iberian Peninsula, 9 km resolved simulations are barely able to add value compared to the driving data set. However, this result might critically depend on the area under study, as in relatively flat areas the added value of dynamical downscaling becomes less apparent.
Thus, this study focuses on the performance of very highresolution RCM simulations over an area of extremely complex topography such as the Alpine area. As suggested by Jiménez et al. (2008), in such areas the spatial resolution becomes a major challenge, and the conclusions drawn from coarser-resolution simulations cannot be generalised without caution. Thus, the present study aims at finding a model set-up that minimises systematic errors in hindcast simulations in storms with the purpose of reproducing mean and maximum surface winds in complex terrains. Thereby, potentially important sensitivities of the model set-up are explored, which encompass spatial resolution, PBL parametrisations and the use of nudging techniques. The sensitivity with respect to the driving data set is not investigated in order to concentrate on the sensitivity within the model. Since the simulations at 2 km grid size require significant computational power, the study is based on a case study approach, rather than on continuous simulations. Hence, a total of 24 historical wind storms is simulated for each model set-up.
The study is structured as follows: Sect. 2 presents the reanalysis product used to drive the RCM and the observational network. Section 3 describes the model set-up including the different nesting options tested in this study. It further presents the set of sensitivity experiments carried out. The results are discussed in Sect. 4, focusing first on the role of the PBL scheme and the nesting method applied. Then, the role of the horizontal resolution is discussed, including how errors are spatially distributed over different areas of the Alps. Finally, Sect. 5 draws the main conclusions.

Reanalysis data set
The data set providing the initial and boundary conditions for the RCM is the ERA-Interim reanalysis (Dee et al., 2011). It spans the period from 1979 to today, and is used in its highest resolution of 0.75 • × 0.75 • . The ERA-Interim data set is generated by running the Integrated Forecast System model (version 2006) of the European Centre for Medium-Range Weather Forecasts (ECMWF). The horizontal resolution of the model is T255 (approximately 80 km). The model has 60 vertical levels up to a pressure level of 0.1 hPa. Observational data are assimilated with a 4-D variational analysis (4D-Var) in a 12 h analysis window. A number of observational data sets are used, ranging from satellite data to surface pressure observations and radiosonde profiles (Dee et al., 2011). However, the assimilation system does not take into account observations of surface wind, which is important to avoid circularity given that this study uses wind observations in the val-idation part, or observations of pressure and humidity over high terrain (typically elevations higher than 1500 m).

Observational data
To evaluate the model's ability in dynamically downscaling wind storms, a reliable set of observations is required. In particular, this is the case in areas of complex terrain, where wind speed and direction can vary within distances of tens of metres. The Swiss Federal Office of Meteorology and Climatology (MeteoSwiss) provides such observations from a dense network of weather stations. Hourly mean values of wind speed and direction are used in this analysis. Those are compared to the hourly model output of wind speed and direction at the nearest grid point to each station. Although this introduces a mismatch between hourly averaged and instantaneous values, its influence on the results was investigated, and the results (not shown) show that it is negligible compared to uncertainties attributable to the use of different model configurations.
Some basic data checks are carried out before using the data in the evaluation. Following an approach similar to Lorente-Plazas et al. (2014), all series are visually inspected. Simple plausibility checks are performed, such as calculating and plotting the running mean and standard deviation to search for anomalies. Stations showing spurious jumps or gaps are discarded from the analysis hereafter. The measurement heights above ground differ in some stations, ranging from 10 to 60 m. Therefore, the simulated wind is linearly interpolated to the measurement height for the comparison with observations. Additionally, and as a measure to check for the sensitivity of this approximation, the observed wind has been converted to an equivalent height of 10 m, the level provided by the model output. The differences in the performance metrics obtained with both methods in all the analyses below are negligible, and thus are not discussed here for the sake of brevity. After the quality checks, the remaining weather station network still sufficiently covers Switzerland (Fig. 1). We consider in the analysis all the stations that recorded each individual storm. Note that this number increases with time as the observational network has been growing. Thereby, the first storm selected took place in February 1990, and was recorded by a total of 68 stations, and the last storm in February 2010 was recorded by 112 stations. While 65 stations captured all 24 storms, 36 missed just one storm, whereas only four sites captured fewer than 20 storms. The weather stations cover a wide variety of geographical conditions: plains, valleys and mountainous areas with a minimum (maximum) height of 197 (3580) metres above sea level (m a.s.l). Thus, this data set allows one to evaluate surface wind simulations under different geographical and climatic conditions. Still, as suggested by Gómez-Navarro et al. (2012), the errors and uncertainties contained in the observations shall not be neglected, but need to be kept  in mind when using it to draw conclusions about simulation performance.

Model set-up
The study is based on the Weather Research and Forecasting Model (WRF, version 3.5) released in September 2013 (Skamarock et al., 2008). WRF is a limited-area meteorological model used for weather forecasting and climatic purposes. It employs an Eulerian mass-coordinate solver with a nonhydrostatic approach, and a terrain-following eta-coordinate system in the vertical. It is a state-of-the-art mesoscale model used in a variety of studies also for hindcast simulations (Kwun et al., 2009;Jiménez et al., 2010Awan et al., 2011;García-Díez et al., 2013;Jiménez and Dudhia, 2013;Santos-Alamillos et al., 2013;Menendez et al., 2014, among others). A first decision in regional climate modelling concerns the selection of the domain to be simulated. Although this selection is susceptible to introducing uncertainties, this study employs just one domain set-up, and hence the sensitivity of the performance to the model domain is not investigated here. There are a number of reasons for this. First, there is not much freedom, in the sense that the domain is primarily selected according to the area of interest, in this case the Alpine area. The number of domains is conditioned by the resolution of the driving data set and the final resolution of 2 km aimed at in our study. So only one is used in all simulations (Fig. 2). It consists of four two-way nested domains with grid sizes of 54, 18, 6 and 2 km for domains D1 to D4, respectively. All domains use a Lambert conformal projec- tion, which conserves the spatial distances in both directions. The analysis hereafter evaluates the model performance with the focus set on the innermost domain, although the model performance in the coarser domains is also investigated to assess the role of the horizontal resolution. Vertically, WRF does not allow one to use a varying number of levels in nested domains. Hence, the number of vertical levels has been set to 40 in every domain. This number is similar to the number in the recent literature, which ranges between 30 and 46 (Miguez-Macho et al., 2004;Lo et al., 2008;Kwun et al., 2009;Santos-Alamillos et al., 2013;Etienne et al., 2013). The vertical resolution ensures that several eta levels lie below the PBL height at any time. Naturally, the number of levels within the PBL varies at each grid point according to the PBL height due to different meteorological situations. In the simulations carried out, a minimum (maximum) of three (seven) levels vertical layers lie within the PBL at any time.
Another source of uncertainty is related to the choice of the physical parametrisations, such as microphysics, convection, radiation and the formation of the PBL, among others (Stensrud, 2007). Since the latter is the parametrisation that is most relevant for the surface winds, a number of sensitivity tests are conducted and analysed in order to find the most appropriate PBL scheme (see next section). The other parametrisations remain unchanged in all simulations, i.e. the microphysics WRF single-moment six-class scheme (Hong and Lim, 2006), the Kain-Fritsch scheme of cumulus (Kain, 2004), which is implemented only in the two outermost domains, the Rapid and accurate Radiative Transfer Model (RRTM) (Mlawer et al., 1997), the short-wave radiation scheme by Dudhia (1989), and the Noah land soil model (Chen and Dudhia, 2001).

PBL schemes
The PBL plays a major role in simulating surface winds (Stensrud, 2007;Kwun et al., 2009;Santos-Alamillos et al., 2013;Menendez et al., 2014). Nowadays, there are many different approximations to account for the relevant subgrid processes that lead to different PBL formations. In this study four different schemes are implemented, which capture a considerable range of different approaches possible. Similarly to García-Díez et al. (2013), we use the fully non-local scheme developed at Yonsei University (hereafter YSU) (Hong and Lim, 2006), the local closure scheme by Mellor-Yamada-Janjic (MYJ) (Mellor and Yamada, 1982;Janjić, 2001), and the Asymmetric Convective Model 2 (ACM2), which combines local and non-local transport depending on the atmosphere conditions (Pleim, 2007a, b). García-Díez et al. (2013) described the different approaches in detail, which are therefore not repeated here. The fourth scheme consists of a subtle modification of the YSU scheme that accounts for the unresolved orography by introducing a correction term into the momentum equation . This scheme aims at correcting a general problem of WRF with simulating wind, namely its tendency to overestimate wind speed (Cheng and Steenburgh, 2005;Mass and Ovens, 2011;. This is in particular a problem in areas of complex terrain, where topographic features not explicitly considered by the coarse resolution of the model introduce further friction. This scheme also effectively removes the roughness when the Laplacian of the resolved terrain falls below −20 m −1 . Although the latter aims at removing the negative bias reported by Jiménez and Dudhia (2012) over mountain tops, it can be a source of bias in other resolutions, and can indeed explain part of the biases found in this study. This scheme is referred to hereafter as YSU * .

Nesting approach
RCMs are nested in a global data set, which drives the simulation by providing the initial and boundary conditions. Dynamical downscaling is hence mostly an initial value problem in the first days of the simulation, which evolves into a boundary value problem when the initial state has been "forgotten" by the atmosphere. However, how to specify the lateral boundary conditions is a mathematically ill-posed problem, since they become over-specified (Staniforth, 1997). A solution to this problem, widely adopted in state-of-the-art RCMs, consists of Newtonially relaxing the driving fields in a buffer zone around the borders of the grid (Davies, 1976). This relaxation damps small-scale discrepancies, but does not handle large scales properly and generates disturbances in the large-scale circulation (Miguez-Macho et al., 2005). Several methods have been proposed to deal with this problem.
The first approach basically consists of using Newtonian relaxation at the boundaries without any correction inside the domains. This is referred to hereafter as "free simulations". In favour of this approach, it is argued that simulations benefit from a better representation and the undisturbed development of regional processes. Another argument is that RCMs are often used to downscale climate change projections or paleo-simulations. Such simulations are performed with relatively coarse GCMs, so that modifications of the large-scale circulation may be beneficial, as potential biases from the GCMs may be partly corrected by the RCMs.
In the case of reanalysis data used at the boundaries, it may be desirable that the RCM simulation stays close to the largescale situation of the driving data. A first method to achieve this is the so-called reforecast simulation. The method consists of splitting a long simulation into shorter simulation periods of 1 to a few days, running each period separately and finally merging them. This method effectively minimises the impact of the boundaries, transforming the problem into a mostly initial-value problem. The reforecast method is regularly applied García-Díez et al., 2013;Menendez et al., 2014, among others), and the increased skill of this method compared to continuous runs has been reported (Lo et al., 2008). A major advantage of this nesting method is its simplicity. Furthermore, it introduces an important computational advantage, since it effectively allows one to split long simulations into a number of independent tranches, providing a natural and very efficient parallelisation of the problem. However, it has the undesirable side effect of requiring a spin-up period for each tranche that has to be run but discarded. Thus, depending on the length of each tranche and the spin-up period, the convenience of this method has to be carefully addressed. In this study we test this approach by simulating every single day independently with a spin-up period of 12 h for each run, which results in one-third of computational time not being exploited.
A more sophisticated method is to force the RCM to follow the driving large-scale conditions. This is implemented by additional terms in the dynamic equations that restrict the degrees of freedom of the simulation. This is the so-called nudging nesting, of which two versions are available. The 3-D analysis nudging introduces a Newtonian relaxation term into the prognostic equations of the model, and was first introduced by Charney et al. (1969). This addition corrects some variables by an artificial tendency term based on the difference between the original state produced by the model and the driving data set (Lo et al., 2008). The WRF provides a number of options that allow one to select which variables and vertical levels should be affected by the correction term. In the current study, horizontal wind, temperature and humidity are nudged in every level but in the boundary layer. The intensity of the correction depends on a nudging factor, which is set here to the default value of 3 × 10 −4 s −1 , which is also used in similar studies (Lo et al., 2008).
A variation of this method is spectral nudging, introduced by von Storch et al. (2000). In this approach the variables are Fourier-transformed prior to the nudging. Then, only selected parts of the spectrum are nudged in a similar fashion as the 3-D analysis nudging approach, i.e. by introducing a Newtonian relaxation term into the equations. In doing so, the model is forced to mimic the long waves of the driving input data, which contain the large-scale pattern of the atmospheric circulation, whereas it is free to add value in the smaller scales (Miguez-Macho et al., 2005). As in the sensitivity experiments using 3-D analysis nudging, the simulations carried out in this study nudge horizontal wind, temperature and humidity only in the levels above the PBL, and with the same nudging factors, 3 × 10 −4 s −1 . Unlike 3-D analysis nudging, this configuration requires one to set the number of waves to be considered in the Fourier analysis, which controls the spatial variability from the input data set that is being preserved. This number is set to 4 and 2 for domains D1 and D2, respectively, which correspond to a wavelength of about 1000 km. Due to their small size, no nudging is applied in the two innermost domains.

Overview of the experiments
This section summarises the set of simulations carried out to investigate the sensitivities of the different settings. Following the approach by Etienne et al. (2013), a total of 24 historical wind storms is considered (Table 1). The selected storms appear between November and February and are embedded in different synoptic-scale flow conditions. Each storm is simulated using eight different model configurations encompassing the sensitivity due to PBL parametrisation, nesting method and horizontal resolution (Table 2). Thus, a total of 192 simulations are performed. Each simulation spans 6 days, with the corresponding storm in the middle of the simulation, and discarding a spin-up period of 12 h.
The comparison of observations and simulation results is performed for hourly values at each observational site. For this, the simulation result at the closest grid point to the observational site is selected. Although this can lead to representativeness errors (Jiménez et al., 2010), such errors are systematic in all simulations, and do not play a significant role in the assessment of the relative model performance across model configurations.

The role of the PBL scheme
To evaluate the sensitivity of the model result due to different PBL schemes, set-ups C1 to C4 are compared with each other ( Table 2). The analysis concentrates on results of the innermost domain. All storms are analysed in an identical way, but for the sake of brevity most of the discussion is based on the results for storm Lothar (storm number 13 in Table 1). Still, similarities and deviating characteristics found in other storms are discussed subsequent to the analysis of the Lothar storm. Figure 3 shows the situation during the 24 h around the most mature phase of storm Lothar. This situation was characterised by an intense upper-level zonal jet and strong baroclinicity. The storm formed over the western Atlantic and moved through the Atlantic with moderate amplitude until it reached the French Atlantic coast. There, it experienced an explosive growth as it travelled poleward across the upperlevel jet axis (Rivière et al., 2010). The synoptic scale was dominated by a strong north-south gradient in geopotential height that produced strong large-scale winds with a western component.
The surface winds over Switzerland during storm Lothar are presented in Fig. 4 showing the wind speed averaged over 109 weather stations during a 6-day period. The temporal agreement of the sensitivity simulations with the observations is remarkable. The most severe winds peaked on 26 December at 12:00 UTC, but the secondary peaks in the time series are also generally well captured by all sensitivity simulations. Despite the good timing, an overestimation of wind speed becomes apparent. This overestimation of wind speed is in agreement with the results reported by other stud- Figure 3. Synoptic situation of storm Lothar: colour shading depicts the sea level pressure, whereas blue contours indicate the geopotential height at 500 hPa. The arrows represent 10 m s −1 wind. The fields represent consecutive snapshots for the period of most severe wind speeds between 26 December, 06:00 UTC and 27 December, 00:00 UTC in steps of 6 h (see Fig. 4). The data fields are obtained from the ERA-Interim reanalysis, the same used to drive the RCM simulations.
ies in different locations and synoptic circulations (Cheng and Steenburgh, 2005;Mass and Ovens, 2011;. The comparison of the sensitivity simulations with different PBL schemes (configurations C1 to C4 in Fig. 4) shows that the YSU * scheme (C2) substantially reduces such overestimation of wind speed. The set-up showing the strongest overestimation of surface winds is the fully local scheme, MYJ (C3), followed by the hybrid approach, ACM2 (C4). This indicates that the non-local approach used in YSU schemes is slightly more suited to reproducing wind speed over complex terrain. The extra drag factor introduced in the YSU * scheme leads to significantly better simulation of wind speed, and suggests that the inclusion of similar approaches in ACM2 and MYJ may lead to similar improvements also in these schemes for the representation of surface wind speed. Figure 4 provides a first glance at the model-observation comparison. Still, many details are lost by averaging over all stations, in particular the evaluation of the model performance to reproduce the spatial distribution of the most severe winds. Therefore, additional statistics are performed which are presented for storm Lothar in Fig. 5. The boxplots and diamonds show the temporal and spatial performances, respectively (Fig. 5). Thereby, the boxplots represent the distribution of 109 stations for four statistical metrics that evaluate the temporal performance of the simulation: correlation, root   (Table 2). YSU * Spectral mean square error (RMSE), bias of the mean wind speed, and bias of the maximum wind speed. These four metrics are included since they allow one to evaluate whether the model generally tends to over-or under-estimate wind speed, but also whether the simulation is able to mimic the temporal evolution. Note that considering the maximum wind speed is important for scientific questions on extremes of wind speed.
To assess the spatial performance of the sensitivity simulations (illustrated by diamonds in Fig. 5), the wind speed is averaged over the 6 days of each storm simulation at each location. This is done separately for the model and the observations, resulting in two spatial patterns of mean wind. Finally, the spatial correlation, spatial RMSE, and spatial biases are calculated. Note that this calculation is not meaningful for maximum wind speed (and is therefore omitted).
The temporal metrics (shown by boxplots) resemble the findings of the time series in Fig. 4, showing that all model configurations tend to overestimate wind speed. Compared to other PBL schemes, the YSU * scheme (C2) is able to reduce the bias of the mean wind, although it still shows slight positive biases for more than 75 % of the stations. Addition-  ally, the RMSE is lower in the sensitivity simulation with the YSU * (C2). The temporal correlation ranges between 0.8 and 0.2, depending on the weather station. The median value is about 0.6 in all configurations. There is a rather low variation between the sensitivity simulations in the temporal correlation, since this metric is dominated by the accuracy of the driving data set, which is common to all simulations. There is however a lower correlation in the C2 configuration compared to the other configurations. This is a consistent feature across different storms (see discussion below). The performance of the maximum wind speed behaves very similarly to the mean bias for all sensitivity simulations, although errors become more pronounced: the MYJ scheme exhibits a strong overestimation of the maximum wind speed that is above 10 m s −1 for 50 % of the locations, whereas the YSU * scheme simulates values closer to the observations, although with deviations above 3 m s −1 in 50 % of the locations. The spatial metrics show that the biases behave similarly to the ones of the temporal scale (Fig. 5). This is expected as the spatial bias is identical to the median of the temporal bias if the wind distribution is symmetric. The spatial bias again dominates the spatial RMSE, although in this case the RMSE is significantly lower across all simulations. The sensitivity simulation with the YSU * scheme (C2) shows the lowest spatial RMSE, highlighting the scheme's ability to reduce the overestimation of wind speed. The overall higher spatial correlations than temporal correlations indicate that the model generally is able to simulate the spatial structure of wind independent of the scheme applied. Again, the sensitivity simu-lation using the YSU * scheme is superior in this metric. Interestingly, the spatial correlation when using this scheme contrasts with the sensitivity simulation with the original YSU scheme (C1), as the latter ranks worst among all sensitivity simulations with respect to the PBL scheme. Thus, the spatial metrics show that the YSU * scheme of  improves the surface wind simulation by taking into account unresolved orography.
Next, the sensitivity of the model to the PBL scheme is assessed with respect to wind direction. Thereby, the wind rose of storm Lothar is shown (S13 in Fig. 6). The synoptic situation of storm Lothar shows a very intense westerly flow (Fig. 3). As expected, this situation dominates the wind rose, showing primarily south-westerly directions at the surface. There are additional peaks in the observations in other directions, although due to the pooling process, it could be due to systematic biases in certain stations (typically valleys, where the 2 km resolution could not be sufficient and lead to representativity errors), rather than a general change in wind direction during the lifetime of the storm. Regardless of the PBL scheme, the simulations of storm Lothar are able to capture the main wind direction, with a slight systematic bias towards southern directions. The major difference among configurations is a slightly less pronounced preferred direction in the YSU * scheme, although it does not contribute to reducing the bias of an overly southerly direction. Thus, the simulation of wind direction seems to be mostly insensitive to the PBL scheme selected.
Most of the conclusions drawn from the analysis of storm Lothar about the PBL schemes are consistent through the various storms simulated. This is illustrated in a comprehensive although summarised way in Fig. 7. Hereby, "temporal" series show the median temporal correlation (i.e. the centre of the boxplots in Fig. 5), whereas "spatial" series indicate the mean spatial correlation (i.e. the diamonds in Fig. 5). The temporal correlation seems to be mostly insensitive to the choice of PBL scheme (C1 to C4 in Fig. 7a), with the exception of the YSU * scheme (configuration C2), which shows a slightly lower temporal correlation for almost all storms. These slightly lower correlations are in agreement with similar findings by García-Díez et al. (2015). Although the authors could not find reasons for the reduced temporal correlation, this phenomenon becomes ameliorated when nudging is used (see next section), rendering this caveat less relevant for the sake of the identification of a suitable model set-up. Still, the reduction of the overestimation of wind speed by this configuration leads to lower temporal RMSE across the storms (Fig. 7b). This improvement becomes especially noticeable in the maximum wind speed bias (Fig. 7c) where the sensitivity simulations with the YSU * scheme show that the bias fluctuates around zero, whereas it is significantly larger for the other set-ups (C1, C3, and C4). In space, the comparison of the PBL schemes demonstrates that the YSU * scheme exhibits systematically higher spatial correlations (Fig. 7d). Thus, the simulations of all storms using the YSU * scheme  Table 2. The number of the case is indicated in each panel, and corresponds to storm Lothar (S13), as well as three other storms with different synoptic conditions. The colours correspond to observations and simulations as in Fig. 4. For the calculation of the histograms, the hourly wind direction during the entire period registered in each location is pooled, and the number of times that a wind direction lies within each 15 • bin is counted and finally normalised.
are able to better allocate the wind speeds at the right locations than the simulations using other PBL schemes. The simulations using the YSU * scheme further exhibit lower spatial RMSEs, due to higher correlations and a reduction of the overestimation of wind speed (Fig. 7e).
Wind direction performance across all storms is analysed in a similar fashion. However, this variable has to be treated differently, taking into account the problems associated with its circularity. Thus, similarly to Jiménez and Dudhia (2013), the d parameter is calculated: This definition produces positive (negative) biases when simulated wind direction is orientated clockwise (anticlockwise) with respect to observations. Once this parameter is calculated for each site in each time step, its distribution is obtained. For this, all values are pooled, so the temporal and spatial details are lost in the discussion hereafter. A RMSE that accounts for the deviations between the simulation and the observations in every location and time step for each storm is derived from the distribution of this bias using The sensitivity simulations show a RMSE of about 70 • regardless of the PBL scheme (Fig. 8). Similarly, the median of d exhibits a negative bias, again independent of the PBL scheme. For both metrics, the inter-case variation is larger than the variation between the different sensitivity simulations. Thus, this confirms the finding from storm Lothar that the PBL scheme plays a minor role in reproducing wind direction. Still, it is noteworthy that the C3 and C4 configurations perform better than C1 and C2, as they exhibit lower d and lower RMSE. This result is expected, and in good agreement with the findings by Jiménez and Dudhia (2013), who pointed out that the model's ability to reproduce wind directions is inversely related to wind speed.
To assess whether the results of wind direction and the minor role of the PBL scheme may depend on the storm selected, wind roses of three additional storms are shown in Fig. 6. Storm S07 corresponds to a typical foehn storm. Unlike for the Lothar storm, the wind rose does not show preferred directions, as expected since foehn storms affect only part of Switzerland. S16 corresponds to a bise storm. In this configuration there is again a clear preferred direction, but  it is exactly opposite to Lothar. In this storm there is a clear second maximum towards −30 • . Finally, S24 corresponds to the Xynthia storm in February 2010. This is a west-wind storm, although its particular trajectory when travelling towards Switzerland induces a foehn-like situation, and thus has been catalogued as such by Etienne et al. (2013).
In these examples, all sensitivity simulations show remarkable performance in identifying the most dominant wind directions. WRF is clearly able to capture the different nature of these storms and to simulate the surfaces' wind regime accordingly. However, none of the four PBL schemes stands out in reproducing the wind direction, resembling the minor role of the PBL scheme and showing that this result is independent of the specific type of the storm.

The role of the nesting technique
The analysis carried out in the previous section indicates that the YSU * scheme is superior compared to the other PBL parametrisations, so this scheme is used in the sensitivity experiments hereafter (see Table 2). The next choice pertains to how the RCM is nested to the driving data set. To assess the sensitivity of the nesting approach, the focus is set on sensitivity simulations C2, C6, C7 and C8, where the free simulations (C2) are compared with analysis nudging (C6), reforecast (C7) and spectral nudging (C8). Figure 4 illustrates that the nesting techniques further reduce the systematic overestimation of the wind in the case of storm Lothar. For the wind speed maxima, configurations C6 and C7 better reproduce the intensity and precise timing compared to the C2 configuration. Figure 5 presents spatial and temporal performance metrics for storm Lothar. Spatially, the sensitivity simulations, which include nudging techniques, exhibit slightly higher correlations than the free simulation (C2). In contrast, spatial bias shows that the analysis nudging reduces bias compared to C2, whereas the reforecast method increases the bias. The latter is due to the last day of the simulation, where this sensitivity simulation exhibits a strong bias (Fig. 4). For the temporal performance metrics, all nudged simulations (C6-C8) tend to increase the correlation compared to C2, although the improvements are small and raise concerns regarding the robustness of this finding. Note that the choice of cases could be masking the importance of the nesting technique. The reason is that under wind storms the strength of the flow crossing the domain leaves little freedom for the model to develop deviations from the driving data set. Thus, the role of the nesting technique could be larger in regular situations not considered by this study. However, a clearer improvement introduced by nudging is found for the maximum wind. The original YSU * scheme without nudging shows systematic positive biases in this variable, which are reduced when reforecast, but especially analysis or spectral nudging, is used (Fig. 5).
The analysis of wind direction delivers similar results as in the sensitivity to different PBL schemes (Fig. 6). The role of the nudging approach in correctly simulating the wind direction is minor regarding storm Lothar. So, it is not possible to identify any nesting configuration that outperforms the others. Instead, all simulations behave similarly, and the main wind direction seems to be equally reproduced across sensitivity simulations according to the synoptic characteristic of the storm.
As before, the analysis is extended to all 24 storms. The mean temporal correlation obtained for different storms is shown in Fig. 7a. This figure illustrates that the temporal agreement is slightly but consistently improved when some nudging is applied, rendering the temporal agreement with the observations comparable to the other schemes, as argued for storm Lothar. The analysis and spectral nudging (C6 and C8) systematically improve the simulations compared to the free simulation (C2). The reforecast (C7) shows the improvement of temporal correlation to be highly dependent on the storm considered. This becomes even more obvious in Fig. 7b, where the C6 and C8 schemes exhibit lower RMSE than C2, and also generally lower RMSE than C7. Moreover, nudging reduces the bias in maximum wind speed consistently, and makes analysis and spectral nudging equally suitable for improving the maximum wind speed compared to free simulations (C2). Regarding the improvement in wind direction, the model performance varies erratically, depending on the storm (Fig. 8). The role of the nesting scheme with respect to the median error d is even smaller than the PBL schemes. A very similar result can be drawn from the analysis of the RMSE, although in this case the nesting setups generally exhibit lower RMSE than the free simulation. However, these differences between the schemes are small, so that an identification of a nesting set-up that significantly outperforms the others is not possible when considering wind direction.

The role of horizontal resolution
As argued above, the horizontal resolution has a profound impact on the ability of the model to simulate wind speed. In particular, this is the case if the closest grid point of the model to the weather station is used in the analysis. Note that this simple approach neglects the fact that the model averages subgrid terrain properties and leads to so-called representativity errors. It is beyond the scope of this study to assess these errors and to address a method to minimise them, since they introduce systematic biases that only depend on the domain configuration, which is fixed across simulations and thus plays a secondary role in the evaluation of the relative skill of different model configurations. Still, such errors, and the model performance in general, critically depend on the model resolution, so the importance of model resolution and the type of station is discussed in more detail. Note that, for the analysis shown in this section, a subset of the simulations had to be repeated to set the nesting configuration to one-way. This allows one to evaluate the actual performance of the model in coarser domains, since otherwise the two-way approach artificially increases the performance in coarser domains based on the simulation in the inner ones. The representativity error is quantified by calculating the horizontal distance (s) and difference of height ( h) between the stations and the closest grid point (identical to the model performance assessment above). The mean representativity error over all weather stations as well as the standard deviation are given in Table 3 for domains D2 to D4, with resolutions of 18, 6 and 2 km, respectively. Obviously the 2 km resolution is closest to the real locations of the observations, with an average distance of 747 m. The horizontal errors become more severe when a coarser resolution is implemented, and reach a mean of about 6.6 km in the 18 km resolution setup. As expected, the height bias is close to zero, but there is a large standard deviation from station to station, indicating that the error is pronounced in areas of complex topography. The model topography is too smooth even at 2 km grid size and is unable to reproduce the real topography, which explains the high standard deviations.
The influence of the horizontal resolution on the model performance is investigated using the C6 configuration as an example (Fig. 9). Considering all stations, spatial correlations are 0.74, 0.39 and 0.22 for the resolutions from 2, 6 and 18 km, respectively. Similarly, the bias increases from 0.46, 2.19 and 3.24, respectively. This increase in bias is explained by the fact that a coarser resolution implies smoother orography, which eventually leads to an excess of wind speed due  Table 2). The figure depicts temporal and spatial correlation, RMSE and bias in a similar way to Fig. 5, but different spatial resolutions (grid sizes 18, 6 and 2 km in domains D2, D3 and D4) and locations (ALL for all stations; PL for plains; MO for mountains, defined as those locations whose height exceeds 1200 m; and VA for valleys) are shown separately. The locations of each type of station are indicated in Fig. 1. to the underestimated terrain roughness. The smoothness is also a reason why the RMSE monotonously increases.
For temporal metrics, a somewhat unexpected behaviour is found. Although the temporal correlation drops to a median value of zero in the coarsest domain analysed, the model exhibits a remarkable high correlation in the 6 km resolution domain. To better understand this high correlation, the site-averaged wind speed in different domains for the Lothar case is compared to observations (Fig. 10). Although the series corresponding to D4 is more realistic and reproduces the timing and intensity of the most severe wind speed, the simulation in 6 km (D3) captures the phasing of secondary peaks in the time series better than in the 2 km resolution. Indeed, the RMSE reproduces the expected result of a reduction in performance when successive coarser domains are used. This behaviour is an instance of the more general problem of the intrinsically lower predictability of features at smaller scales, as described by Mass et al. (2002). They showed how, although high-resolution simulations have the potential to improve the simulation of physical processes with respect to coarser ones, they are more severely affected by timing and spatial errors, as well as by deficiencies of the observational network used for verification. Thus, our example of winds over complex terrain illustrates how the validation of highresolution models is a cumbersome task, and that the use of several statistics allows more robust assessments of model performance.
The role of the representativity error can be explored through the separation of the observational sites in subcategories such as stations in plains, mountain or valleys, as shown in Fig. 9 and labelled PL, MO and VA, respectively. Although the temporal correlation is not dramatically dependent on the geographical category, it is slightly higher over plains, as expected by the fact that the model resolution is  Table 2 (same series as in Fig. 4). The red and blue lines correspond to the result with the same model configuration, but in domains D2 and D3, respectively. more suitable for simple terrain, and it is worse over valleys, where important terrain features remain unresolved. The performance measurements behave similarly in D3, when separating into PL, MO and VA. In D2 the correlation is so low that it precludes one drawing any conclusion. Although biases are generally close to zero in the innermost domain, there is a larger variation between the stations in the mountains, because the differences between the station height and model topography can be large, and indeed RMSE is significantly larger in this location. The spatial correlation in the innermost domain shows a low value of 0.31 over the plains, which contrasts with the value of 0.78 obtained for mountains. This can be explained as a signal-to-noise artefact. The problem is that in plains the mean wind is not as strongly modulated by height as it is in mountains, where there is a larger difference among stations. Thus, small variations in mean wind lead to large variations in the spatial correlation, since the mean wind speed is not a good predictor of the location of a station within plains. Additionally, the correlation is calculated according to only the 46 stations that correspond to plains in the Lothar storm. Such a low number leads to large variance of the estimator of correlation, which further contributes to the signal-tonoise problem. Thus, spatial correlation of mean wind patterns over homogeneous terrain is not a meaningful measure of model performance, and should be treated with care.

Summary and conclusions
This paper analyses a number of sensitivity experiments aimed at identifying a model set-up for WRF that minimises systematic errors in hindcast simulations of wind storms over areas of complex topography. The simulations use the ERA-Interim reanalysis for initial and boundary conditions. These data are downscaled to a resolution of 2 km over the Alps in a series of consecutive nested domains. Due to the high demand for computational resources, the analysis is based on case studies, rather than on continuous simulations over several years. Therefore, 24 different simulations lasting 6 days and containing prominent historical storms in Switzerland (Etienne et al., 2013) are simulated and analysed. This selection is motivated by their relevance in risk assessments and impact studies, which are two typical applications of dynamically downscaled data sets. To identify a suitable set-up to realistically simulate wind over complex terrain, eight different sensitivity experiments are performed for each case study taking into account different PBL parametrisations, nudging techniques and horizontal resolutions.
The sensitivity tests designed to evaluate the role of the PBL parametrisation show that WRF systematically overestimates wind speed compared to observations. The overestimation occurs in all types of locations (plains, valleys or mountains), and is exacerbated in coarser domains. This result confirms previous studies pointing out the overestimation of wind speed in simulations with WRF and its relation to the lack of representation of the unresolved topography (e.g Cheng and Steenburgh, 2005;Mass and Ovens, 2011;. For the MYJ scheme, wind speeds that are up to 100 % larger than in the observations are found. The overestimation becomes even stronger when focusing on maximum wind speed, a variable especially relevant in impact studies. However, this drawback can be significantly reduced by choosing the YSU * scheme which, being based on the non-local YSU scheme, explicitly accounts for unresolved orography. These results resemble findings by  and Gonçalves-Ageitos et al. (2015), who tested this PBL parametrisation in a small area of relatively complex topography in the north of the Iberian Peninsula and in the Pyrenees, respectively. It is noteworthy that this improvement is not produced by a trivial reduction of wind speed in every location, but this reduction is applied where complexity of topography is more severely underestimated, yielding a remarkable increase in the model's ability to reproduce the spatial structure of wind speed. As a minor caveat, this scheme tends to slightly reduce the temporal correlation of the simulated wind compared to observations, being an unexpected side effect that has also been reported by García-Díez et al. (2015) in simulations over the Iberian Peninsula. The authors do not however have a satisfactory explanation for this behaviour, whose detailed analysis shall be addressed in future studies.
The model is qualitatively able to reproduce the leading wind directions generated by very different synoptic conditions. However, the simulations still exhibit systematic biases in wind direction that cannot be improved through a suitable model configuration, since the model performance in reproducing wind direction exhibits little sensitivity to any of the evaluated model configurations. This fact points toward the prominence of representativity errors produced by the channelling of wind in valleys that cannot be properly reproduced by the 2 km resolution of the simulations. Thus, the model performance regarding wind direction is dominated by other factors such as the driving conditions, insufficient resolution, or representativity errors.
Additionally, the sensitivity with respect to the nesting technique is explored by comparing free simulations to analysis and spectral nudging, as well as the so-called reforecast approach. The use of nudging techniques slightly improves several aspects of the simulation, like reducing the mean wind overestimation discussed above and improving the spatial pattern of mean wind (in particular, 3-D analysis nudging). Furthermore, the free simulations generally show a lower temporal agreement with observations than nudged simulations, a feature that is consistent across storms. Analysis nudging yields a significant improvement for maximum wind speed, for which the overestimation is reduced and leads to values closer to zero on average than when no nudging is applied. These results indicate that preserving the large-scale circulation via nudging slightly improves the simulation of wind at regional scales, at least for hindcast simulations where the driving data set is generally reliable, and whose aim is to be as close to the observations as possible. Still, the particular choice of cases considered in this study could be underestimating the actual effect of nudging, since the strong flow crossing the domain in strong wind storms leaves little freedom for the model to develop disturbances when no nudging is applied. However, for other scientific questions, a free simulation set-up could be more appropriate, as atmospheric processes and their interactions with regional-scale features are able to develop desirable disturbances that add value to RCM simulations. Typical examples are climate change projections (van der Linden and Mitchell, 2009;Gómez-Navarro et al., 2010;Jacob et al., 2013), paleosimulations (Gómez-Navarro et al., 2013), but also classical sensitivity and process studies (Kilic and Raible, 2013;Cipagauta et al., 2014).
Using the set-up with analysis nudging and the YSU * scheme, the role of the spatial resolution and the representativity error is assessed. As expected, horizontal resolution is critical for a realistic wind simulation in very complex terrain. A reduction from 6 to 2 km shows a clear improvement in simulating the mean wind pattern as well as maximum winds. This contrasts with the results reported by García-Díez et al. (2015) that found little added value at 9 km resolution simulations driven by WRF over the Iberian Peninsula, and indicates that the ability of RCMs to add value to the driving data sets depends critically on the complexity of the area of interest. In particular, this study demonstrates the ability of WRF to add value in simulations up to 2 km over the Alpine region. The results for the 18 km configuration show barely any performance, with negligible spatial and temporal correlations. Thus, the overestimation of wind speed becomes exacerbated in coarser-resolution do-mains, further indicating that the main source of wind overestimation is the unresolved orography. Separating into plain, mountain and valley areas, the temporal agreement is slightly higher over flat terrain and reduced in valleys. The mean biases are similar, although show more spatial variability in the mountains, driven by the larger variability of height biases. More remarkable differences are seen in the RMSE values, which show relatively high values of about 6 m s −1 in the mountains compared to 3 m s −1 in the flat regions and valleys.
In summary, this study suggests two set-ups for the simulation of wind storms over complex topography. They depend on the scientific question of (i) configuration C6 with the YSU * scheme that reduces wind overestimation and increases spatial correlations. It further uses 3-D analysis nudging that improves the temporal agreement with respect to observations, and at the same time further reduces the overestimation of maximum wind speed and improves the spatial distribution of wind speed. Thus, this combination is the most suitable for running hindcast simulations aimed at achieving a reliable surface wind simulation over areas of complex orography and in synoptic situations leading to severe storms. (ii) When the timing is not so relevant but an undisturbed development of regional processes is needed, the configuration using the YSU * scheme and free simulations delivers a realistic simulation of surface winds over complex terrain.