During the second Wind Forecast Improvement Project (WFIP2; October 2015–March 2017, held in the Columbia River Gorge and Basin area of eastern Washington and Oregon states), several improvements to the parameterizations used in the High Resolution Rapid Refresh (HRRR – 3 km horizontal grid spacing) and the High Resolution Rapid Refresh Nest (HRRRNEST – 750 m horizontal grid spacing) numerical weather prediction (NWP) models were tested during four 6-week reforecast periods (one for each season). For these tests the models were run in control (CNT) and experimental (EXP) configurations, with the EXP configuration including all the improved parameterizations. The impacts of the experimental parameterizations on the forecast of 80 m wind speeds (wind turbine hub height) from the HRRR and HRRRNEST models are assessed, using observations collected by 19 sodars and three profiling lidars for comparison. Improvements due to the experimental physics (EXP vs. CNT runs) and those due to finer horizontal grid spacing (HRRRNEST vs. HRRR) and the combination of the two are compared, using standard bulk statistics such as mean absolute error (MAE) and mean bias error (bias). On average, the HRRR 80 m wind speed MAE is reduced by 3 %–4 % due to the experimental physics. The impact of the finer horizontal grid spacing in the CNT runs also shows a positive improvement of 5 % on MAE, which is particularly large at nighttime and during the morning transition. Lastly, the combined impact of the experimental physics and finer horizontal grid spacing produces larger improvements in the 80 m wind speed MAE, up to 7 %–8 %. The improvements are evaluated as a function of the model's initialization time, forecast horizon, time of the day, season of the year, site elevation, and meteorological phenomena. Causes of model weaknesses are identified. 
Finally, bias correction methods are applied to the 80 m wind speed model outputs to measure how removing the systematic component of the errors affects these improvements.
The second Wind Forecast Improvement Project (WFIP2) took place in Oregon and Washington states from October 2015 through March 2017. This Department of Energy (DOE) and National Oceanic and Atmospheric Administration (NOAA) funded project was aimed at improving the parameterizations within the High Resolution Rapid Refresh (HRRR – 3 km horizontal grid spacing) model and its nested version (HRRRNEST – 750 m horizontal grid spacing), with the goal of increasing the forecast skill of wind turbine hub-height (80 m) wind speeds. The study area is a region of complex terrain that includes a large amount of wind power generation, with more than 4.6 GW of installed capacity associated with the Bonneville Power Administration (BPA) balancing authority.
WFIP2 (Shaw et al., 2019; Wilczak et al., 2019a; Olson et al., 2019a) as well as the first WFIP (held in the US Great Plains, in 2011–2012; Wilczak et al., 2015) represent efforts to improve forecasts for the renewable energy sector. While the first WFIP was in an area with relatively flat terrain, WFIP2 took place in an area characterized by pronounced topographic features. These include the Cascade Mountains and the Columbia River Basin to the east, with the Columbia River Gorge forming a gap in the mountain range resulting in complex flow patterns in the region. Important background information regarding the project can be found in several publications: Shaw et al. (2019) presents a general overview of the project; Wilczak et al. (2019a) describes the instruments deployed for the 18-month-long campaign and the meteorological forecast challenges of the region; and Olson et al. (2019a) discusses the parameterization improvements applied to the HRRR and HRRRNEST models resulting from a better understanding of local atmospheric processes achieved by the use of the observations.
Toward the end of the campaign, a model freeze was imposed and several case studies with interesting meteorological conditions were selected as a focus for model improvements. Changes to the model physical parameterizations, based on known model deficiencies and findings from this campaign, were then tested over these case studies, and those that showed improvements were selected to become a new experimental physics suite. Finally, four 6-week periods (one for each season: “spring 2016” – 25 March–7 May 2016; “summer 2016” – 24 June–7 August 2016; “fall 2016” – 24 September–7 November 2016; and “winter 2017” – 25 December 2016–7 February 2017) were chosen to rerun the models in control (CNT) and experimental (EXP) configurations. The EXP configuration included all the modifications/improvements added to the models, while the CNT runs used the HRRR parameterizations present in the NCEP operational version of the HRRR at the start of WFIP2. The four 6-week periods will be called “reforecast periods” throughout the rest of the paper, while the model reruns (HRRR CNT, HRRR EXP, HRRRNEST CNT, and HRRRNEST EXP) will be called “reforecast runs”.
Since the primary goal of WFIP2 is to advance the state of the art of wind energy forecasting in areas with complex terrain in general, and in the BPA region in particular, in this paper we use hub-height wind speed observations from sodars and profiling lidars to assess the impacts of the experimental parameterizations and finer horizontal grid spacing on the performance of the models. These instruments were chosen because they accurately measure wind speed and direction from 20 m up to a few hundred meters above ground level, which is the layer of the atmosphere most relevant for wind energy production. While in this paper improvements in bulk statistics (mean absolute error, MAE, and bias) are evaluated, a companion research article (Djalalova et al., 2019) evaluates the improvements in forecasting wind power ramp events, using the same set of measurements and the same model runs.
The paper is organized as follows: in Sect. 2 the observational and numerical weather prediction (NWP) model datasets are described; in Sect. 3 details of the bulk statistical results are presented for 80 m wind speed MAE and bias for individual models, in terms of time of the day, model initialization time, forecast horizon, season of the year, and site elevation; in Sect. 4 improvements in the statistical results are quantified due to the experimental physics, model finer horizontal grid spacing, and a combination of the two, again as a function of the time of the day, the season of the year, and the different meteorological phenomena predominant in the area, both with and without bias correcting the model output. Section 5 presents a summary and conclusions.
Various in situ, scanning, and profiling instruments were deployed and
maintained by WFIP2 team partners who later provided quality controlled
versions of the data. All data are available to the public from the DOE Data
Archive and Portal (DAP;
Diurnally averaged 80 m wind speed MAEs for HRRR CNT (red curves), HRRR EXP (blue curves), HRRRNEST CNT (yellow curves), and HRRRNEST EXP (black curves).
In the current study, data collected at 22 remote-sensing sites (19 sodars and three lidars) spanning the WFIP2 region are used, since their measurements cover the part of the atmosphere of most interest for wind energy. As measurements through the entire turbine rotor layer were not always available, we decided to focus on the 80 m level when available to avoid averaging the data over a variable depth layer of the atmosphere that could result, in some cases, in biasing the average toward values more representative of the lower part of the layer.
Some sites had a co-located sodar and lidar. In this situation the instrument with the highest data availability during the campaign was chosen. This choice led to the selection of the 19 sodars and three lidars listed in Table 1, where the latitude, longitude, elevation of the site, terrain complexity, percentage of data availability over the four reforecast periods, and the institution in charge of the instrument are also presented. The terrain complexity was computed as the standard deviation (in meters) relative to the average slope in a 6 km by 6 km area (81 points) around the site using the HRRRNEST model topography.
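One plausible reading of this terrain complexity definition can be sketched in Python: fit a mean plane (the average slope) to the 9 × 9 block of model terrain heights around the site and take the standard deviation of the residuals. The function name, the plane-fit interpretation, and the 750 m default spacing are assumptions made for illustration; the exact procedure used to build Table 1 is not spelled out in the text.

```python
import math

def terrain_complexity(heights, dx=750.0):
    """Std dev (m) of terrain heights about the best-fit plane.

    heights: square list of lists (e.g. 9 x 9) of terrain heights in m
             on a regular grid with spacing dx (m), centred on the site.
    NOTE: the plane-fit interpretation is an assumption, not the
    documented WFIP2 procedure.
    """
    n = len(heights)
    half = (n - 1) / 2.0
    coords = [(k - half) * dx for k in range(n)]  # centred coordinates
    npts = n * n
    mean_z = sum(sum(row) for row in heights) / npts
    # Sum of x^2 (equivalently y^2) over all grid points.
    sxx = n * sum(c * c for c in coords)
    # On a centred regular grid the least-squares plane decouples:
    # slope_x = sum(x z) / sum(x^2), slope_y = sum(y z) / sum(y^2).
    sxz = sum(coords[j] * heights[i][j] for i in range(n) for j in range(n))
    syz = sum(coords[i] * heights[i][j] for i in range(n) for j in range(n))
    slope_x, slope_y = sxz / sxx, syz / sxx
    resid_sq = 0.0
    for i in range(n):
        for j in range(n):
            plane = mean_z + slope_x * coords[j] + slope_y * coords[i]
            resid_sq += (heights[i][j] - plane) ** 2
    return math.sqrt(resid_sq / npts)
```

Under this reading, a uniformly sloping hillside has near-zero complexity, while ridges and valleys superimposed on the slope increase it.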
List of the instruments used in this study with site identification name, latitude, longitude, elevation, terrain complexity, percentage of data availability, and institution in charge (ANL: Argonne National Laboratory; ARL: Air Resources Laboratory; CU: University of Colorado; LLNL: Lawrence Livermore National Laboratory; NREL: National Renewable Energy Laboratory; PNNL: Pacific Northwest National Laboratory; UND: University of Notre Dame).
Continued.
Although the focus of this study is on the 80 m wind speed statistics, we also examine the statistics of wind power generation, using a generic IEC (International Electrotechnical Commission) Class 2 power curve to convert wind speed into power. Details for the conversion from wind speed into power are given in Wilczak et al. (2019b), while Wilczak et al. (2019a) and Djalalova et al. (2019) demonstrated that the equivalent wind power generation computed from these 22 remote sensors using the abovementioned curve is representative of the actual wind power generation over the entire BPA area. The geographical location of the 19 sodars and three lidars is provided in a map later in the paper, and a more comprehensive base map of all the instruments deployed for WFIP2 is presented in Wilczak et al. (2019a).
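The conversion from wind speed to normalized power can be illustrated with a schematic piecewise power curve. The sketch below is only a stand-in: the cut-in, rated, and cut-out speeds and the cubic ramp are assumed illustrative values, not the IEC Class 2 curve described in Wilczak et al. (2019b).

```python
def generic_power_curve(wind_speed, cut_in=3.0, rated=12.0, cut_out=25.0):
    """Return normalized power (0 to 1) for a hub-height wind speed in m/s.

    All thresholds are illustrative assumptions, not IEC Class 2 values.
    """
    if wind_speed < cut_in or wind_speed >= cut_out:
        return 0.0  # turbine not producing below cut-in or above cut-out
    if wind_speed >= rated:
        return 1.0  # rated power between rated and cut-out speeds
    # Cubic ramp between cut-in and rated speed (a common idealization).
    return (wind_speed**3 - cut_in**3) / (rated**3 - cut_in**3)
```

On the steep part of such a curve, a small wind speed error maps to a comparatively large power error, whereas above the rated speed the same wind speed error has no effect on power.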
WFIP2 model development and improvement focused on improving forecasts in complex terrain for wind energy applications. Improvements in operational NWP models usually target extreme weather events and near-surface weather in general, with little focus on the improvement of the forecast of wind speed at hub height. Wind energy generation is especially abundant in regions of complex terrain where there are many forecasting challenges due to the complexity of the terrain-modulated flows and the feedback processes associated with them. Thus, forecast errors in hub-height wind speeds can originate from various model components. For this reason, WFIP2 model development and improvement included a number of model components: the boundary-layer and surface-layer schemes, the representation of drag associated with sub-grid-scale topography and wind farms, and the cloud–radiation interaction. Moreover, because of the complex terrain, special care had to be devoted to scaling adaptive physical parameterizations.
While the reader is referred to Olson et al. (2019a, b) for complete
details on the improved model configurations, we provide a list with brief
summaries of the set of model physical parameterizations and relevant
numerical methods targeted for development in WFIP2.
The mixing length is the distance parcels are allowed to be displaced by turbulence processes, and therefore depends on the size of the turbulent eddies. In the new formulation, the mixing length is independent of the height above ground and turbulent eddies are forced to be smaller than the depth of the model layer in strong stratification, thus improving maintenance of cold pools and stable boundary layers in general.
A mass-flux scheme was added to the original MYNN (Mellor–Yamada–Nakanishi–Niino) planetary boundary layer (PBL) scheme, making it an eddy-diffusivity mass-flux (EDMF) scheme and allowing for direct coupling of the sub-cloud convective cores and the cloud layer above. This resulted in improved coverage of shallow cumulus and improved profiles of temperature and humidity, while a smaller impact was found on low-level winds during the day.
The representation of sub-grid-scale (SGS) clouds and their coupling to radiation improves the downward shortwave forcing in shallow cumulus and stratocumulus conditions. The primary impact is to improve the surface energy balance, which can then more accurately drive the turbulent mixing, while a small direct impact was found on low-level winds.
The representation of drag due to SGS orography was added to the HRRR physics suite, including surface drag due to gravity waves and form drag. While the SGS gravity wave drag acts in stable PBLs and the form drag acts for all stabilities, form drag has a smaller impact than the gravity wave drag at the high resolutions of the HRRR, and neither is active in the HRRRNEST. This addition improves the maintenance of cold pools by reducing the near-surface wind speeds (and wind speed bias), while also reducing the near-surface vertical wind shear in stable conditions.
In the Monin–Obukhov theory the flat-terrain approximation implies that all fluxes (momentum, heat, and moisture) happen in the vertical, but this approximation becomes unrealistic in complex terrain. For this reason, the new surface-layer scalar flux algorithm now includes horizontal fluxes.
While typically horizontal turbulent mixing is calculated with no direct communication with the parameterized vertical mixing, the impact of horizontal fluxes can now be of similar magnitude to that of the vertical fluxes, improving the representation of fine-scale turbulence. The expected benefits are mostly found at sub-kilometer scales.
Horizontal diffusion is now performed in Cartesian space instead of terrain-following sigma coordinates. This option replaces mixing along sigma coordinates, which can produce artificial vertical mixing in steep terrain. This change improves the maintenance of cold pools by no longer mixing vertically when model vertical coordinates follow steep terrain.
A representation of wind farm drag was introduced by adopting the Weather Research and Forecasting (WRF) wind farm parameterization (Fitch et al., 2012, 2013a, b). The inclusion of this parameterization reduces a high wind speed bias within wind farms but can contribute to a slight negative wind speed bias near wind farms.
The biggest improvements in the reforecasts were found from 1, 3, and 4,
which improved the representation of turbulent mixing in stable boundary
layers (Olson et al., 2019a, b).
Details of the simulations used in this analysis are as follows. For the four reforecast periods (spring, summer, and fall 2016, and winter 2017), 24 h forecasts were made with the HRRR and HRRRNEST, initialized twice per day at 00:00 and 12:00 UTC, using initial conditions from the operational RAPid refresh model (RAP; Benjamin et al., 2016), with no additional data assimilation and with output available every 15 min. For simplicity, we refer to the runs initialized at 00:00 UTC as the Z00 runs and the runs initialized at 12:00 UTC as the Z12 runs. The reforecasts were run in both CNT and EXP configurations, with the EXP configuration including all the improved parameterizations. The 3 km HRRR is directly initialized from the 13 km RAP grid, so there is a spin-up period associated with the model atmosphere adjusting to the higher-resolution terrain, which typically has much higher mountain peaks and lower valleys in the HRRR relative to the RAP. This spin-up problem would be even more exaggerated if the HRRRNEST was directly initialized from the RAP model atmosphere, so to minimize this problem, we chose to allow the HRRR model atmosphere to spin-up for 3 h before we initialized the HRRRNEST from the HRRR 3 h forecast. Therefore, the HRRRNEST output runs were delayed by 3 h to ameliorate these spin-up problems so that a gap in the HRRRNEST model output exists from forecast horizon 00 to forecast horizon 02 (from 00:00 to 02:45 UTC for the Z00 initialized runs, and from 12:00 to 14:45 for the Z12 initialized runs). For this reason, in order to show meaningful comparisons between the models, we utilize only the forecast horizons 03–24 for the HRRR runs also.
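The restriction to forecast horizons 03–24 can be made concrete with a short Python sketch that lists the 15 min valid times shared by the HRRR and HRRRNEST output for one initialization. The function names are hypothetical, and the sketch assumes that horizon 24 contributes only the output at exactly 24:00 after initialization.

```python
from datetime import datetime, timedelta

def common_lead_times(first_hour=3, last_hour=24, step_minutes=15):
    """Lead times (as timedelta) available from both HRRR and HRRRNEST.

    Assumes output every 15 min and that the last retained output is at
    exactly last_hour hours after initialization.
    """
    start = first_hour * 60
    stop = last_hour * 60
    return [timedelta(minutes=m) for m in range(start, stop + 1, step_minutes)]

def valid_times(init_time):
    """Valid times of the retained output for a run started at init_time."""
    return [init_time + lead for lead in common_lead_times()]

# Example: a Z00 run on the first day of the summer 2016 reforecast period.
times = valid_times(datetime(2016, 6, 24, 0, 0))
```

With these assumptions each run contributes 85 output times, the first at 03:00 after initialization.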
For our analysis, in order to compare to the observations, the 80 m wind field is obtained from the model output by bilinearly interpolating in the horizontal to the 22 site locations, using the four closest grid points, and by linearly interpolating in the vertical between the two closest model heights (approximately 36 and 83 m). The HRRR has relatively coarse vertical resolution, with only five full model layers below 200 m, but the middle of the third layer is very close to 80 m a.g.l., so a linear interpolation does not have a significant impact on the accuracy of the estimated 80 m wind speeds.
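The two interpolation steps can be sketched as follows. This is a minimal illustration under stated assumptions: wind speed is interpolated directly (rather than the wind components), the horizontal step is applied before the vertical one, and the two model heights are fixed at 36 and 83 m. The function names are hypothetical.

```python
def bilinear(f00, f10, f01, f11, tx, ty):
    """Bilinear interpolation inside one grid cell.

    f00, f10, f01, f11: values at (x0, y0), (x1, y0), (x0, y1), (x1, y1).
    tx, ty: fractional position of the site within the cell (0 to 1).
    """
    bottom = f00 * (1 - tx) + f10 * tx
    top = f01 * (1 - tx) + f11 * tx
    return bottom * (1 - ty) + top * ty

def linear_in_height(z_lo, v_lo, z_hi, v_hi, z_target=80.0):
    """Linear interpolation between two heights bracketing z_target."""
    w = (z_target - z_lo) / (z_hi - z_lo)
    return v_lo * (1 - w) + v_hi * w

def wind_speed_at_site(lower_corners, upper_corners, tx, ty,
                       z_lo=36.0, z_hi=83.0):
    """80 m wind speed at a site from the four surrounding grid points.

    lower_corners, upper_corners: 4-tuples of wind speed (m/s) at the
    lower and upper model heights, ordered as in bilinear().
    The fixed heights are an assumed simplification.
    """
    v_lo = bilinear(*lower_corners, tx, ty)
    v_hi = bilinear(*upper_corners, tx, ty)
    return linear_in_height(z_lo, v_lo, z_hi, v_hi)
```

Because 80 m lies close to the upper height, the upper level carries most of the weight (about 0.94 with the heights assumed here).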
The observations were also averaged and interpolated in time over the 15 min model output times (most of the observations were already at a 15 min interval, but some were at a 10 min interval or less) and linearly interpolated to the 80 m level.
In this section we examine the diurnal variation in 80 m wind speed MAE and bias (model–observations) at all sites and the seasonal variation in MAE and biases from the four reforecast periods to identify the dependence of the statistics on the time of the day, model initialization time, forecast horizon, and season. The dependence on the elevation of the site is also investigated.
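For reference, the two bulk statistics and their diurnal binning can be written as a short Python sketch. The function name and the handling of missing data (samples with a missing value are skipped) are assumptions; bias follows the sign convention above, model minus observations.

```python
from collections import defaultdict

def diurnal_mae_and_bias(hours, model, obs):
    """Return {hour: (mae, bias)}, with bias = model minus observation.

    hours: hour of day (0-23) of each sample.
    model, obs: 80 m wind speeds (m/s); None marks a missing value.
    """
    errors = defaultdict(list)
    for h, m, o in zip(hours, model, obs):
        if m is None or o is None:
            continue  # skip samples without a valid model/observation pair
        errors[h].append(m - o)
    stats = {}
    for h, errs in sorted(errors.items()):
        mae = sum(abs(e) for e in errs) / len(errs)
        bias = sum(errs) / len(errs)
        stats[h] = (mae, bias)
    return stats
```

Errors of opposite sign cancel in the bias but not in the MAE, which is why both statistics are examined.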
The 80 m wind speed MAEs, averaged over the 19 sodars and three lidars, show a
clear diurnal pattern (Fig. 1). Each of the four reforecast runs (HRRR CNT
is in red, HRRR EXP in blue, HRRRNEST CNT in yellow, and HRRRNEST EXP in
black) is averaged over the four reforecast periods in panel a,
while panels b–e show the four reforecast periods separately.
Initialization times are represented by circles (Z00 runs) and by X's (Z12 runs), while the averages between these values are in solid, bold
lines. This diurnal pattern is consistent
among all model runs, with larger average MAEs during stable atmospheric
conditions at nighttime (LST
While on average the experimental physics and finer grid spacing lower the MAEs over the four reforecast periods (Fig. 1a: blue, yellow, and black lines all show smaller MAEs compared to the red lines), the improvements are less consistent when looking at the four reforecast periods separately (panels b–e). In winter, the improvements are more robust, as explained in Olson et al. (2019a), due to better maintenance of cold pools, which frequently occur in this area during winter (Whiteman et al., 2001; McCaffrey et al., 2019) and which are investigated in detail in Sect. 4.4.
The biases of the 80 m wind speed also exhibit a diurnal cycle (Fig. 2).
Again, Fig. 2a shows averages of the four reforecast periods and panels b–e display the four reforecast periods separately. The diurnal
trend of the bias in the HRRR CNT is evident in the red curves, with
positive biases at nighttime (stable atmospheric conditions), averaging 0.7 m s
As in Fig. 1 but for the 80 m wind speed biases.
While the HRRRNEST reforecast runs (CNT in yellow and EXP in black) reduce the bias compared to their respective HRRR simulations, it is not clear yet if the HRRRNEST EXP is better than the HRRRNEST CNT or vice versa. Similarly to the MAEs, differences between the four reforecast runs are larger at nighttime and smaller during the daytime (when the biases are consistently mostly negative).
MAEs of the 80 m wind speed, presented in Fig. 3a, show that the HRRR EXP (in blue) does better than the HRRR CNT (in red) in fall and in winter but not in spring or summer. MAEs of the HRRRNEST CNT (in yellow) are better than those of the HRRR CNT (in red), and the HRRRNEST EXP (in black) is now almost always better than the other models. Biases, presented in Fig. 3b, show values in the HRRR EXP (in blue) becoming much too negative (caused by the additional orographic drag employed in the HRRR EXP) compared to the HRRR CNT (in red) in the spring, summer, and fall. Future revisions of the orographic drag in the HRRR will address this issue. The HRRRNEST EXP (black) is better than the HRRRNEST CNT (in yellow) only in the fall and winter, and again it is not clear that one of these two models has a demonstrably smaller overall bias.
Eighty-meter wind speed MAEs
The results of this section indicate that the time of the day is of primary importance in terms of MAEs and biases, while the model initialization time and the forecast horizon are of secondary importance. Consequently, the remaining statistical analysis is carried out averaging the Z00 and Z12 runs.
As evident from Table 1, the 22 sites used for this analysis have very different elevations (ranging from 63 m a.s.l. at Rufus, RFS, to 991 m a.s.l. at Prineville, PVE), as well as different surrounding topographic variability. In this section, we investigate the dependence of the model error statistics on the site elevation. In Fig. 4a, b, c, and d, the results for the 80 m wind speed normalized bias, averaged over the two model initialization times, and over all forecast horizons from 03 to 24, are presented for the four reforecast periods. Sites are sorted from low to high elevation (from Rufus on the left to Prineville on the right) and biases are normalized by the averaged (observed) 80 m wind speed at each site. On the right axes of Fig. 4a, b, c, and d, we show (as dotted black lines) the averaged 80 m wind speed at each site for each reforecast period. These averages show some dependence on site elevation in fall and winter, most likely caused by cold pool events with lower wind speeds confined to the sites at lower elevation. We also note that sites at higher elevation do not have higher 80 m wind speeds than sites at lower elevation in summer and in spring. The topography of the area with the location of the sites is shown in Fig. 4e. The biases presented in Fig. 4 show that the diurnally and seasonally averaged biases are smaller (and often negative) at lower elevations, with a positive trend with increasing elevation. In particular, the HRRR CNT (red) has the largest positive bias at high elevations in winter, which is likely due to the premature mix-out of cold pools occurring preferentially at higher elevations first, which can lead to longer periods of time with a positive wind speed bias. As in Fig. 2, HRRR EXP runs (in blue) always show the lowest bias, almost always negative, particularly at the lowest elevation sites. When not normalized by the averaged wind speed at the site (not shown) the trend was consistent with that shown in Fig. 4 but even more accentuated. In contrast, a similar analysis but for MAE normalized by the averaged 80 m wind speed at each site (not shown) showed a mostly neutral dependence on site elevation (with a slight decrease with site elevation).
Eighty-meter wind speed bias (model–observations) normalized by the
averaged (observed, in dotted black lines) 80 m wind speed at each site for
the four reforecast runs as a function of site elevation for the four
reforecast periods separately.
Although it is not clear at this point what the physical reason is for the models having a normalized bias dependent on site elevation (it may be due to the characteristics of the atmospheric phenomena predominant in this area and challenging to forecast), it is important to know that in an area of complex terrain like that of WFIP2, this dependence exists. The dependence of the bias on the elevation indicates that a post-processing bias correction of the model should be done at each site independently.
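A site-by-site correction of this kind can be sketched in its simplest form: estimate each site's mean bias over a training sample and subtract it from that site's forecasts. This is an illustrative mean-bias removal only, not necessarily one of the bias correction methods applied in this paper; the function name, the dictionary layout, and the clipping of negative speeds to zero are assumptions.

```python
def remove_site_bias(train_model, train_obs, forecasts):
    """Subtract each site's mean training-period bias from its forecasts.

    All arguments are dicts keyed by site ID (e.g. "RFS", "PVE") holding
    lists of 80 m wind speeds (m/s). Bias is model minus observation.
    Corrected speeds are clipped at zero (an assumed safeguard).
    """
    corrected = {}
    for site, values in forecasts.items():
        errors = [m - o for m, o in zip(train_model[site], train_obs[site])]
        site_bias = sum(errors) / len(errors)
        corrected[site] = [max(v - site_bias, 0.0) for v in values]
    return corrected
```

Because each site uses its own bias estimate, a low-elevation site with a negative bias and a high-elevation site with a positive bias are adjusted in opposite directions.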
Terrain complexity is not as powerful a predictor of model bias as site elevation. A similar analysis to that presented in Fig. 4 was performed, but sorting the sites by the complexity of the surrounding terrain (see Table 1). In this analysis (not shown) the trend of 80 m wind speed MAE and bias was not clearly defined.
In this section we examine the statistical significance and percentage improvement in the model forecast of 80 m wind speed and power. The improvements are analyzed in terms of the new physics (EXP vs. CNT runs) as well as horizontal grid spacing of the models (HRRRNEST vs. HRRR runs), first separately and then combining the impact of the two (HRRRNEST EXP vs. HRRR CNT). Finally, we evaluate the dependence of the improvements on the dominant meteorological phenomena of the area (Shaw et al., 2019), including cold pools (Whiteman et al., 2001; Zhong et al., 2001; McCaffrey et al., 2019), gap flows (Sharp and Mass, 2002, 2004), easterly flows (Neiman et al., 2018), mountain waves (Durran, 1990, 2003), topographic wakes, and convective outflows (Mueller and Carbone, 1987).
The impact of the experimental physics in the HRRR runs (HRRR EXP vs. HRRR
CNT) is almost always positive for wind speed and power. Percent improvement
and statistical significance are shown in Fig. 5 for 80 m wind speed (a, c, e) and 80 m wind power (b, d, f). These results are obtained by
averaging all sites together, over the two model initialization times
(forecast horizon from 03 to 24) and over the four reforecast periods.
Diurnal variations in MAE (HRRR CNT in red and HRRR EXP in blue) are
presented in Fig. 5a and b, while panels c and d show
differences between MAEs of the HRRR CNT run and MAEs of the HRRR EXP run
(error bars represent the
Improvements due to finer horizontal grid spacing are larger than those due to the experimental physics. The impact of the finer horizontal grid spacing in the control runs (HRRRNEST CNT vs. HRRR CNT) is shown in Fig. 6 for 80 m wind speed (a, c, e) and 80 m wind power (b, d, f). MAE values in panels a and b are in red for the HRRR CNT runs and in yellow for the HRRRNEST CNT runs. In Fig. 6e and f, we see a large percentage improvement in MAE due to finer horizontal grid spacing, particularly at nighttime and during the morning transition (approximately between 01:00 and 15:00 UTC). These improvements exceed those due to the experimental physics in Fig. 5, with values now up to 10 % in 80 m wind speed MAE and up to 15 % in 80 m wind power MAE. The percentage improvements are smaller during daytime, when the MAE of the HRRR model with larger horizontal grid spacing was already lower than at nighttime.
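The percentage improvement quoted here can be read as the relative reduction in MAE of one run with respect to a reference run. The sketch below assumes that definition; the function name and the numbers in the example are hypothetical.

```python
def percent_improvement(mae_reference, mae_test):
    """Percentage MAE reduction of a test run relative to a reference run.

    Positive values mean the test run (e.g. HRRRNEST CNT) has a lower
    MAE than the reference run (e.g. HRRR CNT). The definition is an
    assumption made for illustration.
    """
    return 100.0 * (mae_reference - mae_test) / mae_reference

# Hypothetical values: a reference MAE of 2.0 m/s reduced to 1.8 m/s
# corresponds to an improvement of about 10 %.
example = percent_improvement(2.0, 1.8)
```

With this definition, the same absolute MAE reduction yields a smaller percentage when the reference MAE is larger.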
As in Fig. 5 but for HRRRNEST CNT (in yellow) vs. HRRR CNT (in red) runs, showing the impact on 80 m wind speed MAE of finer model horizontal grid spacing.
In Fig. 7 we compare the improvements in 80 m wind speed MAE due to the experimental physics (left panels) in the HRRR (shown previously in Fig. 5) with those found in the HRRRNEST, and the improvements due to finer horizontal grid spacing (right panels) in the CNT simulations (shown previously in Fig. 6) with those found in the EXP simulations. The dark blue curve shows the impact of the experimental physics on the models with larger horizontal grid spacing (HRRR EXP vs. HRRR CNT), while light blue shows the impact of the experimental physics on the models with finer horizontal grid spacing (HRRRNEST EXP vs. HRRRNEST CNT). The red curve shows the impact of finer horizontal grid spacing on the CNT runs (HRRRNEST CNT vs. HRRR CNT), while the impact of finer horizontal grid spacing on the EXP runs (HRRRNEST EXP vs. HRRR EXP) is shown in orange. When averaged over the four reforecast periods, the impact of the experimental physics (left upper panel) is quite similar between the models with coarser and finer horizontal grid spacing; however, when considering the four reforecast periods separately (lower left smaller panels), the impact varies considerably. For example, in summer the impact of the experimental physics on the HRRRNEST is mostly neutral (light blue curve), while in the HRRR it is actually producing a negative impact (dark blue curve). In contrast, while the impact of the experimental physics is positive for both horizontal grid spacings in winter, it is particularly large for the HRRR (dark blue curve). This variation could be due to changes in the physics that are grid-spacing dependent, making the impact different for HRRR and HRRRNEST. Similar considerations can be made for the improvement due to finer horizontal grid spacing (right panels). When averaged over the four reforecast periods (right upper panel) the impact of the finer horizontal grid spacing is similar between the models with different physics.
However, for the winter reforecast period (lower right panel) the impact of the finer horizontal grid spacing on the EXP runs is mostly neutral (orange curve), while for the CNT runs it is clearly positive (red curve).
Improvements in 80 m wind speed MAE due to the experimental
physics (left panels) and finer horizontal grid spacing (right panels) for
the four reforecast periods averaged together (upper panels) and for the
four reforecast period separately (lower smaller panels) for all reforecast
runs. Dark blue is HRRR EXP vs. HRRR CNT, light blue is HRRRNEST EXP vs. HRRRNEST CNT, red is HRRRNEST CNT vs. HRRR CNT, and orange is HRRRNEST EXP
vs. HRRR EXP. Red and blue arrows on the
As a final step of the analysis, the combined impact on 80 m wind speed MAE of the experimental physics and finer horizontal grid spacing, comparing the HRRRNEST EXP to the HRRR CNT, is shown in Fig. 8. Consistent with the results presented in the previous sections, we find that the combination of the experimental physics and finer horizontal grid spacing produces even larger improvements, always positive and up to a maximum of 14 % in the 80 m wind speed MAE (panel e) and up to a maximum of 18 % in 80 m wind power MAE (panel f). Again, larger improvements are found during the nighttime and during the morning transition, with smaller improvements found during daytime when the models had lower MAEs.
As in Fig. 6 but for HRRRNEST EXP (in black) vs. HRRR CNT (in red) runs, showing the combined impact on 80 m wind speed MAE of the experimental physics and finer model horizontal grid spacing.
To condense the results presented in this section, a summary plot with the
percentage improvements in MAE due to the experimental physics, finer
horizontal grid spacing, and the combination of the two, for the four
reforecast periods separately and averaged together is presented in Fig. 9
(panel a is for 80 m wind speed MAE and panel b is for 80 m wind
power MAE results). For this plot the results are averaged over all sites,
between the two initialization times, and over all forecast horizons
between 03 and 24. Averaged over the four reforecast periods (bars on the
right side of each panel) we see improvements due to the experimental
physics in the HRRR (in dark blue) and HRRRNEST (in light blue) reforecast
runs, up to
The improvements due to the experimental physics and finer horizontal grid
spacing (and to the combination of the two) as a function of the different
meteorological phenomena common to this area are presented in Fig. 10. For
this analysis we take advantage of the WFIP2 Event Log, which was created
and updated regularly during WFIP2 by several meteorologists documenting the
meteorological conditions of relevance in the area and is available on the
DAP (Shaw et al., 2019). The WFIP2 meteorologists based their classification
of events on WFIP2 observations and other surface observations, real-time
and global model forecasts, satellite images, and local radio soundings. In
the Event Log document, days and characteristics of the different
meteorological phenomena were recorded, with the possibility that on some
days multiple phenomena could occur at the same time. Although the
categorization of the days into different meteorological phenomena involves
a certain level of subjectivity, the final classification process involved
weekly meetings during the field study with meteorologists on the project
team, many with operational forecasting experience in this geographic area,
during which a consensus was reached by the team, making us confident that
other meteorologists would agree with the classifications we used. The Event
Log is accessible to the public (available on the DAP,
Improvements due to the experimental physics (blue and light blue), finer horizontal grid spacing (red and orange), and the combination of the two (gray) as a function of the different meteorological phenomena common to the WFIP2 area.
As shown in Fig. 10, the number of days with gap flow events is very high (145), and if we plot the same figure separately for each of the four reforecast periods (Fig. 11), we see that the gap flow events are almost equally distributed over the four reforecast periods (34 in spring 2016, panel a; 41 in summer 2016, panel b; 38 in fall 2016, panel c; and 32 in winter 2017, panel d). For gap flow events, model performance can differ from season to season because the nature of these events differs from season to season (being thermally forced in summer and synoptically forced in fall and winter). Mountain wave (54 d in total) and topographic wake events (30 d in total) are also distributed over all reforecast periods. From Fig. 11 we can say that the impact of the experimental physics and finer horizontal grid spacing on 80 m wind speed MAE during gap flow, mountain wave, and topographic wake situations differs from season to season (negative in spring and summer and positive for fall and winter).
Same as in Fig. 10, but for the four reforecast periods
individually (spring,
Consequently, the blue bar in spring and summer extending toward negative values, visible in Fig. 9, is not only due to the negative impact of mountain wave and topographic wake days, but also to gap flow days in spring and summer (Fig. 11a, b). From Fig. 11 we also note that easterly flow is a category with a more consistent impact, always being improved by the experimental HRRR physics. Cold pool events are also consistently improved by the experimental HRRR physics; this type of event happens mostly in fall and winter (only one event is found in spring, therefore its impact cannot be considered statistically significant).
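The per-phenomenon comparison summarized in Figs. 10 and 11 can be illustrated with a minimal sketch. It assumes that the percentage improvement is defined as 100 × (MAE_CNT − MAE_EXP) / MAE_CNT and that daily MAE values are simply averaged over the days in each category; both are assumptions made for illustration, and the function and variable names (`pct_improvement`, `improvement_by_event`, `daily_mae`, `event_log`) are hypothetical rather than taken from the WFIP2 analysis code. A day on which several phenomena occur is counted in each of its categories, as allowed by the Event Log.

```python
from collections import defaultdict
from statistics import fmean


def pct_improvement(mae_ref, mae_new):
    """Percentage MAE reduction of the new run relative to the reference run.

    Assumed definition: positive when the new run has the lower MAE.
    """
    return 100.0 * (mae_ref - mae_new) / mae_ref


def improvement_by_event(daily_mae, event_log):
    """Group daily MAE pairs by phenomenon and compute the improvement.

    daily_mae: {day: (mae_cnt, mae_exp)}, hypothetical daily MAE values.
    event_log: {day: set of phenomenon labels}; a day may carry several labels.
    Returns {label: (number_of_days, percentage_improvement)}.
    """
    grouped = defaultdict(list)
    for day, labels in event_log.items():
        if day not in daily_mae:
            continue  # no forecast/observation pair available for this day
        for label in labels:
            grouped[label].append(daily_mae[day])

    result = {}
    for label, pairs in grouped.items():
        mae_cnt = fmean(p[0] for p in pairs)
        mae_exp = fmean(p[1] for p in pairs)
        result[label] = (len(pairs), pct_improvement(mae_cnt, mae_exp))
    return result


# Made-up numbers, for illustration only (not WFIP2 results).
daily_mae = {"d1": (2.0, 1.0), "d2": (2.0, 3.0), "d3": (4.0, 3.0)}
event_log = {
    "d1": {"cold pool"},
    "d2": {"gap flow", "mountain wave"},
    "d3": {"gap flow"},
}
summary = improvement_by_event(daily_mae, event_log)
```

Averaging daily MAE values weights each day equally; if the number of valid samples differs strongly from day to day, pooling the individual forecast errors before averaging would be the more careful choice.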
To better understand the reasons for the lack of MAE improvement in the HRRR EXP vs. HRRR CNT runs during diurnal gap flow days in summer, in Fig. 12 we present the aggregated time series of 80 m wind speed MAE (panel a) and wind speed (panel b) for the 22 sites for part of the summer reforecast period (all of the summer reforecast period shows a similar behavior). In panel b, days identified in the Event Log as experiencing gap flows are highlighted with the red shaded areas. From the time series in Fig. 12a, we see that the 80 m wind speed MAE of the HRRR EXP (blue line) is often larger than that of the HRRR CNT (red line). For almost all of the gap flow days the HRRR EXP forecasts the down-ramp too early at the end of each daily gap flow event, compared to the observations and to the HRRR CNT. Similar results were found for the spring reforecast period (not shown).
Time series of 80 m wind speed MAE
Although Fig. 11 shows that the experimental physics generally improves the
HRRR during cold pool events, we next examine in detail when and how
this improvement occurs. Figure 13 is similar to Fig. 12, but for part of the
winter reforecast period. In panel b, days identified in the Event
Log as experiencing cold pools are highlighted with the blue shaded areas.
In the time series shown in Fig. 13a, a period when the
80 m wind speed MAE of the HRRR EXP (blue line) is larger than the HRRR CNT
(red line) is highlighted with the red oval, while at a later time (inside
the blue oval) the opposite is true. Differences between these cold pool
events were examined using the WFIP2 real-time model observation evaluation
website (
As in Fig. 12, but for part of the winter 2017 reforecast period.
Time–height cross sections (not shown, but available from the WFIP2 real-time model observation evaluation website) of microwave radiometer temperature and winds from the radar wind profiler superimposed on radio acoustic sounding system virtual temperature at Wasco, OR, for 4 and 19 January 2017 revealed that the cold pool at the beginning of January is brought in by sustained easterly winds and has weaker stable stratification compared to the cold pool event in the second half of January, which is characterized by very low wind speeds close to the surface and more strongly stable stratification. Thus, although these periods are both listed as cold pool events, they have different atmospheric characteristics. In the first case the experimental physics in the HRRR EXP run does not help the model to outperform the HRRR CNT, while in the second case it does. A large wind speed deficit in the HRRR EXP forecast on 4 January 2017 (visible in the red oval in Fig. 13b) might occur because the HRRR EXP model has too much drag due to the SGS and/or because of the wind farm parameterization, with wind farms just upwind, east of Wasco. In contrast, on 18 January 2017, a large wind speed excess in the HRRR CNT forecast (visible in the blue oval in Fig. 13b) occurs because of (1) not enough drag in the HRRR CNT to reduce the strong winds immediately above the cold pool, (2) too much mixing at the top of the cold pool, which may be due to mixing lengths that are too large, and (3) “horizontal” mixing along sloped sigma coordinates, which contributes to vertical mixing. Given the very different wind and stability profiles of the two cold pool events, having routinely available observations of these profiles and assimilating them into the models would likely improve their short-term forecast skill.
The need for a network of ground-based profiling instruments to improve numerical weather prediction and operational forecasting is also strongly advocated by the National Research Council (2009).
Next, we evaluate whether the improvements measured in the previous sections are mainly due to reducing the biases of the models (the systematic component of the error) or whether the model improvements also address the random component of the error. To this aim, the model 80 m wind speed output needs to be bias corrected before the bulk statistics and the relative improvements can be computed. Several methods have been investigated in the literature to remove the systematic component of the error from model outputs. For this study, due to the nature of the 80 m wind speed biases presented in Fig. 2, two possible bias correction methods have been considered. The first one removes the mean bias from each model, at each site, and for each reforecast period separately (“mean bias”). The second method removes the mean bias from each model, at each site, for each of the reforecast periods, and for each hour of the day separately (“diurnal bias”). Since, as is clear from Fig. 2, the nature of the bias differs among the models, we examined the impacts of both of these simple bias correction methods. In Fig. 14 we present results similar to those presented in Fig. 9a, but after applying the mean bias correction (Fig. 14a) and the diurnal bias correction (Fig. 14b). In both cases, the bias correction was applied by splitting the dataset into two halves, determining the correction on the first half, and evaluating it independently on the second half.
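The two bias corrections and the split-half evaluation described above can be sketched as follows. This is only an illustration under stated assumptions, not the code used in the study: the record layout `(hour, forecast, observed)` and the function names `fit_bias`, `apply_bias`, and `corrected_mae` are hypothetical, the split is taken chronologically at the midpoint of the series, and hours absent from the training half are left uncorrected. The functions would be applied to one model, one site, and one reforecast period at a time.

```python
from collections import defaultdict
from statistics import fmean


def mae(forecast, observed):
    """Mean absolute error of paired forecast and observed values."""
    return fmean(abs(f - o) for f, o in zip(forecast, observed))


def fit_bias(records, diurnal=False):
    """Estimate the mean bias (forecast minus observation).

    records: list of (hour_of_day, forecast, observed) tuples.
    With diurnal=False a single offset is returned under the key None
    ("mean bias"); with diurnal=True one offset is returned per hour of
    the day ("diurnal bias").
    """
    errors = defaultdict(list)
    for hour, fcst, obs in records:
        key = hour if diurnal else None
        errors[key].append(fcst - obs)
    return {key: fmean(vals) for key, vals in errors.items()}


def apply_bias(records, bias, diurnal=False):
    """Subtract the fitted bias from each forecast value."""
    corrected = []
    for hour, fcst, obs in records:
        key = hour if diurnal else None
        # Hours not seen in the training half are left uncorrected (assumption).
        corrected.append((hour, fcst - bias.get(key, 0.0), obs))
    return corrected


def corrected_mae(records, diurnal=False):
    """Fit the bias on the first half and return the MAE on the second half."""
    half = len(records) // 2
    train, test = records[:half], records[half:]
    bias = fit_bias(train, diurnal)
    fixed = apply_bias(test, bias, diurnal)
    return mae([f for _, f, _ in fixed], [o for _, _, o in fixed])


# Synthetic series: +1 m/s bias at hour 0 and -1 m/s bias at hour 12.
records = [
    (h, 5.0 + (1.0 if h == 0 else -1.0), 5.0)
    for _ in range(4)
    for h in (0, 12)
]
mae_mean_corrected = corrected_mae(records, diurnal=False)
mae_diurnal_corrected = corrected_mae(records, diurnal=True)
```

In this synthetic series the hourly biases cancel in the overall mean, so the mean bias correction leaves the MAE unchanged while the diurnal correction removes the error entirely. This is the kind of difference between the two methods that motivates examining both.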
Percentage improvements on 80 m wind speed MAE (after bias
correcting the model output) due to the experimental physics, finer
horizontal grid spacing, and the combination of the two for the four
reforecast periods separately and averaged together. Panel
The mean bias correction enhances the improvement due to the experimental physics in the HRRR and HRRRNEST models (blue and light blue bars, comparing Figs. 14a to 9a). This improvement indicates that the experimental physics improves the random component of the model error, even if the experimental physics might degrade the systematic component: Fig. 3b shows that the bias of the HRRR EXP model is larger than the bias of the HRRR CNT model. In comparison, applying the diurnal bias correction (Fig. 14b) also increases the improvement due to the experimental physics (dark blue and light blue bars) over all reforecast periods and for their average, while the improvements due to finer horizontal grid spacing in the models (red and orange bars) actually decrease.
Although the scope of the study presented in this paper is to measure
the impact of the improved model parameterizations on the forecast of 80 m wind speeds, it is important to assess what improvements, if any, were
brought to other key variables in the boundary layer. Olson et al. (2019a)
considered this matter when comparing HRRR (CNT and EXP) model outputs to
eight 915 MHz radar wind profilers in the WFIP2 region. The 915 MHz radar
wind profilers observe through the planetary boundary layer, where the wind
speed MAE was found to be reduced over all four reforecast periods,
especially at night and in winter (stable atmospheric conditions), with MAE
reduced by up to 0.5 m s
Another important variable considered by Olson et al. (2019a) was
temperature, comparing the model runs to radio acoustic sounding system
virtual temperature measurements. For this variable the largest improvements
were found in winter, with MAE of temperatures reduced by more than 0.5
Other key meteorological variables for which Olson et al. (2019a) measured model improvements were 2 m temperature and 10 m wind speed, comparing the upgraded models to the previous version over the entire CONUS (CONtiguous United States) domain. For these variables, RMSE and biases were improved over both the eastern and western CONUS domains, indicating that the model improvements were not limited to a single variable.
Measurements collected by 19 sodars and three lidars during the second Wind Forecast Improvement Project (WFIP2), an 18-month field campaign in the Columbia River Gorge and Basin area, were used to validate model runs by the High Resolution Rapid Refresh (HRRR) model (3 km horizontal grid spacing) and its nested version (HRRRNEST, 750 m horizontal grid spacing).
The models were run for four 6-week reforecast periods (one for each season) in control (CNT) and experimental (EXP) configurations, where the EXP runs included new parameterizations to the HRRR and HRRRNEST physics suites (i.e., representation of wind farms and of drag associated with sub-grid-scale (SGS) topography in the HRRR), improvements to existing parameterizations (i.e., boundary-layer and surface-layer schemes, cloud–radiation interaction), and improvements to numerical methods (i.e., finite differencing of the horizontal diffusion). Results showed that:
The 80 m wind speed MAE and bias vary significantly through the diurnal cycle, with time of day being more important in determining the 80 m wind speed MAE and bias values than either the initialization time or the forecast horizon.
The HRRR EXP reforecast run reduces the diurnal trend in the bias, but results in a near constant negative bias, possibly by exaggerating the drag due to sub-grid-scale orography added to the HRRR physics suite (but not added to the HRRRNEST).
The 80 m wind speed biases have lower values (often negative) at lower elevations but increase with the site elevation. Differences in the sub-grid-scale terrain inhomogeneity did not help explain any of the bias or MAE in the results.
The experimental physics in the HRRR reduces 80 m wind speed MAE by 3 %–4 % and 80 m wind power MAE by 4 %–5 %.
Finer model horizontal grid spacing improves 80 m wind speed MAE in the control runs, particularly at nighttime and during the morning transition. Smaller improvements occur during daytime, when the model with larger horizontal grid spacing had lower MAE than at nighttime. The finer horizontal grid spacing of the HRRRNEST improves 80 m wind speed MAE by up to 5 % and 80 m wind power MAE by up to 7 %–8 %.
The combined impact on 80 m wind speed MAE of the experimental physics and finer horizontal grid spacing produces an even larger reduction in MAE, averaging 7 %–8 % for 80 m wind speed and 11 %–12 % for 80 m wind power.
Improvements in MAE and bias due to the experimental physics and finer horizontal grid spacing depend on season but are almost always positive. However, in spring and summer, the experimental physics in the HRRR runs increases the 80 m wind speed MAE.
The negative impact of the experimental physics on the HRRR MAE found in spring and summer results from the degradation of the HRRR EXP on days experiencing gap flows, mountain waves, and topographic wakes and is probably due to the representation of drag in the HRRR EXP. In particular, for almost all of the summer gap flow days, the HRRR EXP predicts the down-ramps occurring at the end of the events too early.
Although cold pool forecast skill improves due to the experimental physics in the models, different types of cold pools are predicted with varying skill. If routinely available observations of wind and stability profiles were assimilated into the models, short-term forecast skill would likely improve.
Mean bias and diurnal bias corrections of the 80 m wind speed model outputs demonstrated that the experimental physics improves both the systematic and the random component of the model errors. The impacts of the different bias corrections on the improvements due to finer horizontal grid spacing in the models are mixed.
The strength of WFIP2 came from many observational scientists and model
developers working closely together, using observation-based process
understanding to guide model improvements, which were later
transitioned into operations. The current analysis quantifies the skill
added by improvements made to the models within 4 months towards the end
of WFIP2. A model freeze was then imposed so that the models could be run in
EXP and CNT configurations over the four chosen reforecast periods. Since
the model code freeze, three research tasks related to better simulating the
low-level wind speeds have been prioritized: first the inclusion of momentum
transport in the new mass-flux component of the MYNN-EDMF, second modifying
the small-scale gravity wave drag to only parameterize small-amplitude
gravity waves associated with sub-grid-scale terrain undulations
Further improvements to the models, based on WFIP2 observations, will become part of the operational HRRR in the near future.
The operational HRRR model is not entirely open source (e.g., the data
assimilation and cycling scripts are not), but updates to the model
parameterizations used in the HRRR are deposited periodically to the
official repository for the Advanced Research version of the Weather
Research and Forecasting (WRF-ARW) model, maintained by the National Center
for Atmospheric Research (NCAR), which is open source (
Details on the improvements applied to the HRRR and HRRRNEST parameterizations can also be found in Olson et al. (2019a).
All datasets used in this study are freely available to the public from the
DOE Data Archive and Portal (DAP;
Please contact the corresponding author for additional details, if needed.
LB, IVD, and JMW contributed with the data preparation, main analysis, and organization of the results in the paper. JBO and JSK worked on the improvements of the HRRR and HRRRNEST parameterizations, ran the models in CNT and EXP configurations, and contributed with useful discussions to improve the paper. AC contributed with the categorization of the atmospheric phenomena in the Event Log, with observational data, and with useful discussions to improve the paper. LKB, HJSF, EPG, RK, JKL, PM, MP, YP, MTS, and DDT contributed with observational data and with useful discussions to improve the paper.
The authors declare that they have no conflict of interest.
We thank all the people involved in WFIP2 for site selection, leases, instrument deployment and maintenance, data collection, and data quality control.
This research has been supported by the U.S. Department of Energy, Office of Energy Efficiency and Renewable Energy (grant no. DE-EE0007605) and by the NOAA/ESRL Atmospheric Science for Renewable Energy (ASRE) program. This work was authored (in part) by NREL, operated by the Alliance for Sustainable Energy, LLC, for the U.S. DOE, under contract no. DEAC36-08GO28308, with funding provided by the U.S. DOE Office of Energy Efficiency and Renewable Energy Wind Energy Technologies. Pacific Northwest National Laboratory is operated by Battelle Memorial Institute for the U.S. DOE under contract no. DEAC05-76RL01830.
This paper was edited by Klaus Gierens and reviewed by Jeffrey Freedman and one anonymous referee.