Introduction
Quantitative precipitation forecasting is recognized as one of the most
challenging aspects of numerical weather prediction
(NWP). While progress is continually
being made in improving the accuracy of single forecasts – through
improvements in the model formulation as well as increases in grid resolution
– a complementary approach is the use of ensembles in order to obtain an
estimate of the uncertainty in the forecast.
Of course, ensemble forecasting systems themselves remain imperfect, and one
of the most important problems is insufficient spread in ensemble forecasts,
where the forecast tends to cluster too strongly around rainfall values that
turn out to be incorrect.
One reason for lack of spread in an ensemble is that model variability is
constrained by the number of degrees of freedom in the model, which is
typically much less than that of the real atmosphere. The members of an
ensemble forecast may start with a good representation of the range of
possible initial conditions, but running exactly the same model for each
ensemble member means that the range of possible ways of modelling the
atmosphere – of which the model in question is one – is not fully
considered. Common ways of accounting for model error are running
different models for each ensemble member, adding random perturbations to the
tendencies produced by the parameterizations, and randomly perturbing
parameters in physics schemes.
Focusing on convective rainfall, and for model grid lengths where convective
rainfall is parameterized, another way of accounting for model error is to
introduce random variability in the convection parameterization itself.
Ideally this
should be done in a physically consistent way, so that the random variability
causes the parameterization to sample from the range of possible convective
responses on the grid scale. A recent overview is given by
.
Such “stochastic” convection parameterization schemes have been developed
over the last 10 years and are just beginning to be implemented and verified
in operational forecasting set-ups, with some promise for the improvement of
probabilistic ensemble forecasts. The purpose of the
present study is to continue this pioneering work of verifying probabilistic
forecasts using stochastic convection parameterizations, by investigating the
performance of the Plant–Craig (PC) scheme in the Met Office Global and Regional
Ensemble Prediction System (MOGREPS).
The PC scheme has been shown to produce rainfall variability in much better
agreement with cloud-resolving model results than is the case for non-stochastic schemes
and has been shown to add variability in a physically
consistent way when the model grid spacing is varied. It has
also been demonstrated that the convective variability it produces, on scales
of tens of kilometres, can be a major source of model spread
and further that its performance at large scales in a model
intercomparison is similar to that of more traditional methods.
These are encouraging results, albeit from idealized modelling set-ups, and it
is important to establish whether or not they might translate into better
ensemble forecasts in a fully operational NWP set-up.
examined seven cases using the Consortium for Small-scale Modeling (COSMO) ensemble system with 7 km grid
spacing and compared the spread in an ensemble using only different
realizations of the PC scheme (i.e. where the random seed in the PC scheme
was varied but the members were otherwise identical) with that in an ensemble
where additionally the initial and boundary conditions were varied. They
found the spread in hourly accumulated rainfall produced by the PC scheme to
be 25–50 % of the total spread when the fields were upscaled to
35 km. The present study investigates the behaviour of the scheme in a
trial of 34 forecasts with the MOGREPS-R ensemble, using a grid length of
24 km. The mass-flux variance produced by the PC scheme is inversely
proportional to the grid box area being used, and so it is not obvious from
the results of whether the stochastic variations of
PC will contribute significantly to variability within an ensemble system
operating at the scales of MOGREPS-R. Nonetheless, MOGREPS-R has been shown,
in common with most ensemble forecasting systems, to produce insufficient
spread relative to its forecast error in precipitation,
suggesting that there is scope for the introduction of a stochastic
convection parameterization to be able to improve its performance.
Although the version of MOGREPS used here has now been superseded, the present
study represents the first time that the scheme has been verified in an
operationally used ensemble forecasting system for an extended verification
period, and it provides the necessary motivation for more extensive tuning and
verification studies in a more current system. As well as this, the present
study aims to reveal more about the behaviour of the scheme itself, building on work referenced above, as well as on recent work by
, which focused on individual case studies.
The paper compares the performance of the PC scheme with the default MOGREPS
convection parameterization, based on the Gregory–Rowntree scheme, in order to seek
evidence that accounting for model error by using a stochastic convection
parameterization can lead to improvements in ensemble forecasts. Of course,
the two parameterizations are different in other ways than the stochasticity
of the PC scheme: it is therefore possible that any differences in
performance are due to other factors. Nonetheless, the default MOGREPS scheme
has benefitted from much experience in being developed alongside the Met Office
Unified Model (UM), whereas relatively modest efforts were
made here to adapt the PC scheme to the host ensemble system: thus, any
improvements that the PC scheme shows over the default scheme are of clear
interest.
Methods
The Plant–Craig stochastic convection parameterization
The Plant–Craig scheme operates, at each model grid point, by reading
in the vertical profile from the dynamical core and calculating what
convective response is required to stabilize that profile. It is based on the
Kain–Fritsch convection parameterization,
adapting the plume model used there and also using a similar formulation for
the closure, based on dilute convective available potential energy (CAPE). It generalizes the
Kain–Fritsch scheme by allowing for more than one cloud in a grid box and by
allowing the size and number of clouds to vary randomly. Details of its
implementation in an idealized configuration of the UM are given by
; this would be regarded as version 1.1. The important
differences in the implementation for the present study, to produce version
2.0, are presented here.
The scheme allows for the vertical profile from the dynamical core to be
averaged in horizontal space and/or in time before it is input. This means
that the input profile is more representative of the large-scale (assumed
quasi-equilibrium) environment and is less affected by the stochastic
perturbations locally induced by the scheme at previous time steps. It was
decided in the present study to use different spatial averaging extents over
ocean and over land, in order that orographic effects were not too heavily
smoothed. The spatial averaging strategy implemented was to use a square of
7×7 grid points over the ocean and 3×3 grid points over land;
the temporal averaging strategy was to average over the previous seven time steps
(each of 7.5 min) and the current time step. The cloud lifetime was set to
15 min. As well as using the averaged profile for the closure calculation,
the plume profiles were also calculated for ascent within the averaged
environment.
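The averaging strategy described above can be sketched as follows. This is a minimal illustration only, not the UM implementation: the array layout, the `is_land` mask, the function name, and the truncation of the averaging window at the domain edge are all assumptions made for the example.

```python
import numpy as np

def average_profile(history, is_land, ocean_half_width=3, land_half_width=1):
    """Average an input profile in time and then in horizontal space.

    history : array (n_times, n_levels, ny, nx); hypothetical layout holding
              the previous seven time steps plus the current one (n_times = 8).
    is_land : boolean array (ny, nx); hypothetical land/ocean mask.
    A half-width of 3 gives a 7x7 square (ocean), 1 gives 3x3 (land).
    """
    # Temporal average over the stored time steps (8 x 7.5 min = 60 min).
    time_mean = history.mean(axis=0)
    n_levels, ny, nx = time_mean.shape
    averaged = np.empty_like(time_mean)
    for j in range(ny):
        for i in range(nx):
            h = land_half_width if is_land[j, i] else ocean_half_width
            # Assumption: the window is simply truncated at the domain edge.
            j0, j1 = max(j - h, 0), min(j + h + 1, ny)
            i0, i1 = max(i - h, 0), min(i + h + 1, nx)
            averaged[:, j, i] = time_mean[:, j0:j1, i0:i1].mean(axis=(1, 2))
    return averaged
```

In this sketch the window size is chosen from the surface type of the central grid point only; how points near a coastline are treated in the actual scheme is not specified here.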
Initial tests showed that the scheme was yielding too small a proportion of
convective precipitation over the domain. Two further parameters were
adjusted from the study by , in order to increase this
fraction: the mean mass flux per cloud $\langle m \rangle$ and the root mean
square cloud radius $\sqrt{\langle r^2 \rangle}$. Similar changes were made
for the same reason by in their mid-latitude tests
over land and reflect the fact that the original settings in
and were chosen to match well with
cloud-resolving model simulations of tropical oceanic convection.
Specifically, the mean mass flux per cloud was reduced here from
$2\times10^{7}$ kg s$^{-1}$ to $0.8\times10^{7}$ kg s$^{-1}$ in order to
increase the number of plumes produced by the scheme. The entrainment rates
used in the scheme are inversely proportional to cloud radius, and a
probability density function (pdf) of cloud radius is used, characterized by
its root mean square value $\sqrt{\langle r^2 \rangle}$. This was increased
from 450 to 600 m, in order to produce less strongly entraining plumes. This had some impact on the
convective precipitation fraction, but the scheme still yielded a relatively
low proportion of convective rain: 12 % in these tests, as compared with
50 % for the standard scheme. The overall amount of rainfall was similar
for the two schemes, with the dynamics compensating for the reduction in
convective rain produced and ensuring that the instability was suitably
removed by the dynamics and convection scheme combined in both cases.
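The size of the two parameter changes can be illustrated with a short calculation. It assumes, as in the theory underlying the scheme but not stated in this section, that the mean number of plumes equals the total grid-box mass flux divided by the mean mass flux per cloud. The total mass flux used below is a purely hypothetical value, and the function name is illustrative.

```python
def mean_plume_count(total_mass_flux, mean_flux_per_cloud):
    """Mean number of plumes, assuming <N> = <M> / <m> (assumption, see text)."""
    return total_mass_flux / mean_flux_per_cloud

old_m = 2.0e7   # kg s^-1, earlier setting of the mean mass flux per cloud
new_m = 0.8e7   # kg s^-1, setting used in the present study
total = 1.0e9   # kg s^-1, hypothetical total mass flux in a grid box

# Ratio of mean plume numbers for the same total mass flux.
plume_ratio = mean_plume_count(total, new_m) / mean_plume_count(total, old_m)

# Entrainment is inversely proportional to radius, so at the root mean square
# radius the entrainment rate scales by old_radius / new_radius.
entrainment_ratio = 450.0 / 600.0

print(plume_ratio, entrainment_ratio)
```

Under this assumption the change in mean mass flux per cloud multiplies the mean plume number by 2/0.8 = 2.5 for a given total mass flux, while the larger radius reduces the entrainment rate at the root mean square radius by a quarter.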
There is no correct answer for the convective fraction, which is both model-
and resolution-dependent in current operational practice. For example, the
current ECMWF model has a global average of about 60 %.
Doubtless the convective precipitation fraction
produced by the Plant–Craig scheme in MOGREPS-R could be increased further
with stronger changes to parameters, and we remark that
set $\sqrt{\langle r^2 \rangle}$ to 1250 m for
their tests, which would likely have such an effect. The convective rainfall
fraction will also depend on the details of the host model, its large-scale
cloud parameterization and the grid spacing, and the settings of the
convective parameterization itself. For example, the Plant–Craig scheme in
COSMO has been found to yield a convective fraction of 36 % at 28 km
grid spacing in the extra-tropics, and in ICON it was found
to yield a convective fraction of 59 % at 25 km grid spacing, also in
the extra-tropics (Tobias Selz, personal communication, 2016). We attempted
only minimal tuning here and were deliberately rather conservative about the
parameter choices made, with the intention that the results can reasonably be
considered to represent a lower limit of the possible impact of a more
thoroughly adapted scheme.
Description of MOGREPS
The Met Office Global and Regional Ensemble Prediction System has
been developed to produce short-range probabilistic weather forecasts.
It is based on the UM, with 24
ensemble members, and comprises global and regional ensembles. In the
present study, the regional ensemble MOGREPS-R was used, with a resolution of
24 km and 38 vertical levels. This covers a North Atlantic and European
(NAE) domain, which is shown in Fig. . The model was run on a
rotated latitude–longitude grid, with real latitude and longitude locations
of the North Pole and the corners of the domain given in
Table . The regional ensemble was driven by initial and
boundary conditions from the global ensemble, as described by
. The operational system has been upgraded since these
tests, and so the present study represents a “proof of concept” for a
stochastic convection scheme in a full-complexity regional or global
ensemble prediction system, rather than a detailed technical recommendation
for the latest version of MOGREPS.
An outline of the MOGREPS NAE domain, with its rotated
latitude–longitude grid. The contours are for reference and are derived
from the data set used in the present study to separate the domain into land
and ocean areas. The grey shading shows the region for which radar-derived precipitation data
were available.
Locations of the North Pole and
the corners of the domain of the NAE rotated grid, in terms of real
latitude and longitude.

Location        Latitude (°N)   Longitude (°E)
North Pole      37.5            177.5
Bottom left     16.3            -19.8
Top left        72.7            -80.0
Bottom right    16.5            14.2
Top right       73.2            74.1
Stochastic physics is already included in the regional MOGREPS, in the form of
a random parameters scheme, where a number of selected parameters are
stochastically perturbed during the forecast run. This
scheme was retained for the present study, given that the Plant–Craig scheme is intended to account only
for the variability in the convective response for a given large-scale state,
and as such its design does not conflict with the inclusion of a method to
treat parameter uncertainty within other parameterization schemes. The
MOGREPS random parameter scheme does introduce variability in parameters that
appear within the standard UM convection scheme, which is based on the
Gregory–Rowntree scheme with subsequent developments as described by
. No stochastic parameter variation is applied for any of the
parameters appearing in the Plant–Craig scheme. Thus, there is no “double
counting” of parameterization uncertainty in these tests, but rather we are
comparing different methods of accounting for convective uncertainties in a
framework which also includes a simple stochastic treatment of uncertainties
in other aspects of the model physics.
The forecasts using the Plant–Craig scheme were obtained by rerunning the
regional version of MOGREPS, with the standard convection scheme replaced by
the Plant–Craig scheme, and driven by initial and boundary conditions taken
from the same archived data that were used for the operational forecasts.
These are compared with the forecasts produced operationally during the
corresponding period, so that the only difference between the two sets of
forecasts is in the convection parameterization scheme. The study used the UM
at version 7.3. The model time step was 7.5 min, within which the convection
scheme was called twice, and the forecast length was 54 h.
Time period investigated
The time period investigated was from 10 until 30 July 2009. This
length of time was chosen as being sufficient to obtain statistically
meaningful results, but without requiring a more lengthy experiment that
would only be justified by a more mature system. The particular month was
chosen partly for convenience and partly as a period that subjectively had
experienced plentiful convective rain over the UK, therefore providing a good
test of a convective parameterization scheme.
Experimental forecasts with the Plant–Craig scheme were generated twice
daily (at 06:00 and 18:00 UTC) for comparison with the operational forecast
which was taken from the archive. On some days the archive forecast was
missing and so no experimental forecast was generated. In total 34 forecasts
were generated, with start times shown in Table .
Start times of forecasts investigated in this study
(all dates in July 2009).

10, 18:00 UTC   16, 18:00 UTC   21, 06:00 UTC   27, 18:00 UTC
11, 06:00 UTC   17, 06:00 UTC   21, 18:00 UTC   28, 06:00 UTC
11, 18:00 UTC   17, 18:00 UTC   22, 06:00 UTC   28, 18:00 UTC
12, 06:00 UTC   18, 06:00 UTC   23, 06:00 UTC   29, 06:00 UTC
12, 18:00 UTC   18, 18:00 UTC   23, 18:00 UTC   29, 18:00 UTC
13, 06:00 UTC   19, 06:00 UTC   24, 18:00 UTC   30, 06:00 UTC
14, 06:00 UTC   19, 18:00 UTC   25, 06:00 UTC   30, 18:00 UTC
15, 18:00 UTC   20, 06:00 UTC   25, 18:00 UTC
16, 06:00 UTC   20, 18:00 UTC   26, 06:00 UTC
Validation
A detailed validation was carried out against Nimrod radar rainfall data.
This observational data set is only
available over the UK (as shown in Fig. ), and so most of the
validation in the following focuses on this region. The forecasts were
assessed on the basis of 6-hourly rainfall accumulations, every 6 h, for
lead times from 0 to 54 h.
Fractions skill score
This score (denoted FSS) was developed by , and was used
by to assess the quality of deterministic forecasts
produced using the Plant–Craig scheme for two case studies. Note that we use
the term “deterministic”, in this manuscript, to refer to forecasts
providing a single quantity (for example, a single-member forecast, or the
ensemble mean), and “probabilistic” to refer to forecasts providing a
probability distribution (or, at the very least, a deterministic forecast
accompanied by an assessment of its uncertainty). The FSS is determined,
at a given grid point X, by comparing the fractions of observed, O, and
forecast, F, grid points exceeding a specific rainfall threshold, within a
specific spatial window centred at X. Here we define
$$\mathrm{FSS} = 1 - \frac{\langle (F-O)^2 \rangle}{\langle F^2 \rangle + \langle O^2 \rangle},$$
where the angled brackets $\langle \dots \rangle$ indicate averages over the
grid point centres X for which observations are available, over the
different forecast initialization times, and here over the different ensemble
members (so that effectively a separate score is calculated for each ensemble
member, and these are averaged to produce the overall score denoted here by
FSS). The spatial window (over which the fractions are evaluated) gives the
scale at which the score is applied, so that the FSS can be used to assess
the performance of forecasts both at the grid scale and at larger scales. The
division by $\langle F^2 \rangle + \langle O^2 \rangle$ normalizes against the
smoothing applied at the given scale, so that the score always ranges between
0 and 1. The FSS is positively oriented.
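The definition above can be sketched in a few lines. This is an illustrative implementation only: the function names are hypothetical, points outside the domain are assumed to contribute zero to the neighbourhood fraction, and the restriction to grid points with available observations is omitted for brevity.

```python
import numpy as np

def neighbourhood_fraction(binary, half_width):
    """Fraction of grid points exceeding the threshold in a square window.

    The window spans half_width grid boxes in each direction around the
    central point. Assumption: points outside the domain count as zero.
    """
    ny, nx = binary.shape
    size = 2 * half_width + 1
    padded = np.pad(binary.astype(float), half_width, mode="constant")
    total = np.zeros((ny, nx))
    for dj in range(size):
        for di in range(size):
            total += padded[dj:dj + ny, di:di + nx]
    return total / size**2

def fss(pairs, threshold, half_width):
    """FSS accumulated over (forecast, observed) field pairs of equal shape.

    The pairs may run over initialization times and ensemble members, so that
    the averages in the formula are taken over all of them.
    """
    num = f2 = o2 = 0.0
    for forecast, observed in pairs:
        F = neighbourhood_fraction(forecast > threshold, half_width)
        O = neighbourhood_fraction(observed > threshold, half_width)
        num += np.sum((F - O) ** 2)
        f2 += np.sum(F ** 2)
        o2 += np.sum(O ** 2)
    denom = f2 + o2
    return float("nan") if denom == 0 else 1.0 - num / denom
```

With `half_width=0` the score is evaluated at the grid scale, and `half_width=2` or `half_width=4` correspond to windows of two or four grid boxes in each direction. A forecast that places rain one grid box away from the observed location scores zero at the grid scale but gains credit once a neighbourhood is used.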
Brier scores
In order to determine whether or not the variability introduced by the
Plant–Craig scheme is added where it is most needed, the Brier skill score
(BSS) was applied to both forecast sets, using the same
observational data, to assess the respective quality of the probabilistic
forecasts. The Brier score is a threshold-based probabilistic verification
score and is given by the mean squared difference between the forecast
probability of exceeding a given threshold (this probability is here simply
taken to be the fraction of ensemble members which forecast precipitation
greater than the threshold) and the observed outcome (i.e. 1 if the observed
precipitation is above the threshold and 0 if it is below). To obtain the
BSS, this is compared with a reference score; the
reference score is here taken to be that calculated from always forecasting a
probability taken from the observation data set (i.e. the proportion of times
the observed precipitation is above the threshold). Thus,
$$\mathrm{BSS} = 1 - \frac{\langle (f-o)^2 \rangle}{\langle (\langle o \rangle - o)^2 \rangle},$$
where $f$ is the forecast probability; $o$ is the observation (0 or 1); and
$\langle o \rangle$ is the “climatological” probability based on the
observation set. The angle brackets denote an average over the entire
forecast set. Although $\langle o \rangle$ is only available a posteriori
to the event, it does provide a useful “base” for comparison: if the
forecast issued is no better than one given by simply always issuing a
climatological average (i.e. if BSS $\le 0$), then the forecast can be said to
have no skill.
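A minimal sketch of this calculation is given below. The array layout (members along the first axis, forecast cases along the second) and the function name are assumptions made for the example.

```python
import numpy as np

def brier_skill_score(ensemble, observed, threshold):
    """BSS relative to the sample climatology of the observations.

    ensemble : array (n_members, n_cases) of forecast precipitation
    observed : array (n_cases,) of observed precipitation
    """
    # Forecast probability: fraction of members exceeding the threshold.
    f = np.mean(ensemble > threshold, axis=0)
    # Binary observation: 1 above the threshold, 0 otherwise.
    o = (observed > threshold).astype(float)
    brier = np.mean((f - o) ** 2)
    # Reference: always forecast the observed frequency of exceedance.
    brier_ref = np.mean((o.mean() - o) ** 2)
    return float("nan") if brier_ref == 0 else 1.0 - brier / brier_ref
```

A forecast that always issues the climatological frequency scores zero, a perfect ensemble scores one, and a confidently wrong ensemble scores below zero.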
Ensemble added value
This measure aims to assess the benefit of using an ensemble, as opposed to a
single forecast randomly selected from the ensemble. It was recently
developed and described in detail by , and a brief outline is
given here. The score is of particular interest to the present study, as this
measure should highlight the advantages and disadvantages of using the
stochastic Plant–Craig methodology and provides an assessment that is less
affected by structural differences between the Plant–Craig scheme and the
Gregory–Rowntree (GR) scheme.
The ensemble added value (EAV) is based on the quantile score (QS),
which is used to assess probabilistic
forecasts at a given probability level (by analogy, the Brier score assesses
probabilistic forecasts at a given value threshold). If a quantile forecast
$\phi_\tau$ of the $\tau$th quantile of a meteorological variable is given,
then the quantile score for that quantile is defined as
$$q_\tau = \langle (\omega - \phi_\tau)(\tau - I\{\omega < \phi_\tau\}) \rangle,$$
where $\omega$ is the observed value, the function $I(x)$ is defined as 1 if
$x$ is true and 0 if $x$ is false, and the angle brackets denote an average
over all forecasts, as for the Brier skill score. In this way, a forecast for
a low quantile is penalized more heavily if it is above the observed value
than if it is below the observed value, and vice versa for a forecast for a
high quantile (note that the score is negatively oriented). The score for the
50 % quantile is simply half the mean absolute error.
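The asymmetric penalty can be made concrete with a short sketch of the quantile score as defined above. The function name and argument order are illustrative choices.

```python
import numpy as np

def quantile_score(phi, omega, tau):
    """Mean quantile score for forecasts phi of the tau-quantile.

    phi   : quantile forecasts, one per case
    omega : observed values, one per case
    tau   : probability level between 0 and 1
    """
    phi = np.asarray(phi, dtype=float)
    omega = np.asarray(omega, dtype=float)
    indicator = (omega < phi).astype(float)  # 1 where the forecast is too high
    return np.mean((omega - phi) * (tau - indicator))

# A forecast one unit too high and one unit too low, scored at tau = 0.1.
too_high = quantile_score([3.0], [2.0], 0.1)
too_low = quantile_score([1.0], [2.0], 0.1)
print(too_high, too_low)
```

For a low probability level the forecast that lies above the observation receives the larger penalty (weight 1 - tau rather than tau). With tau = 0.5 both cases carry weight 0.5, so the score is proportional to the mean absolute error (half of it under this definition).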
The QS can, like the Brier score, be decomposed into a reliability and a
resolution component. In order to calculate the EAV, a
potential QS, $Q_\tau$, is defined as the total QS minus its reliability
component. The QS is here evaluated by first sorting the ensemble members and
interpreting the $m$th sorted ensemble member as the $(m-0.5)/24$ quantile
forecast. The EAV is then given by summing the potential QSs, $Q_m$, over the 24
members and comparing with an equivalent sum over reference potential QSs:
$$\mathrm{EAV} = 1 - \frac{\sum_m Q_m}{\sum_m Q_m^{\mathrm{ref}}}.$$
The reference forecast is created by defining the quantile as simply a
randomly selected member of the ensemble, so that the reference forecast
represents the score which could have been obtained with only one forecast (a
single member is randomly selected, with replacement, once for the entire
period but separately for each quantile). The EAV thus measures the quality
of the ensemble forecast, relative to the quality of the individual members of
the ensemble.
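The structure of this calculation can be sketched as follows. Note the important simplification: the sketch uses the total quantile score in both sums and does not subtract the reliability component, so it yields an EAV-like ratio rather than the EAV as defined above. The function names, the array layout, and the random seed are assumptions made for the example.

```python
import numpy as np

def quantile_score(phi, omega, tau):
    """Mean quantile score for forecasts phi of the tau-quantile."""
    indicator = (omega < phi).astype(float)
    return np.mean((omega - phi) * (tau - indicator))

def simplified_eav(ensemble, observed, seed=0):
    """EAV-like ratio built from total (not potential) quantile scores.

    ensemble : array (n_members, n_cases)
    observed : array (n_cases,)
    Simplification: the reliability component is NOT removed here.
    """
    n_members = ensemble.shape[0]
    rng = np.random.default_rng(seed)
    # Sort members case by case; row m-1 is the (m - 0.5)/n_members quantile.
    sorted_members = np.sort(ensemble, axis=0)
    qs_sum = 0.0
    ref_sum = 0.0
    for m in range(1, n_members + 1):
        tau = (m - 0.5) / n_members
        qs_sum += quantile_score(sorted_members[m - 1], observed, tau)
        # Reference: one randomly chosen (unsorted) member, drawn with
        # replacement once per quantile and kept for the whole period.
        k = rng.integers(n_members)
        ref_sum += quantile_score(ensemble[k], observed, tau)
    return 1.0 - qs_sum / ref_sum
```

If all members are identical, the sorted and reference forecasts coincide and the ratio is zero, which reflects the intended meaning: the ensemble adds nothing beyond a single member. An ensemble whose members sample the same distribution as the observations gives a positive value.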
Separation into weakly and strongly forced cases
applied the Plant–Craig scheme in an ensemble
forecasting system for seven case studies, with various synoptic
conditions, and showed that the proportion of ensemble variability arising from
the use of the stochastic scheme (as opposed to that arising from variations in the initial
and boundary conditions) depends on the strength of the large-scale forcing,
as measured by the large-scale vorticity maximum. In particular, the stronger
the large-scale forcing, the lower the proportion of the variability that
comes from the stochastic scheme.
investigated two of the case studies further, by verifying
forecasts using the Plant–Craig scheme and using a non-stochastic convection
scheme. They found that the improvement in forecast quality from using the
Plant–Craig scheme was significantly higher for the more weakly forced of the
two cases, since the additional grid-scale variability introduced by the
stochastic scheme is more important.
As part of the present study, we extend the work of
by separating our validation period into dates for which the synoptic forcing
is relatively weak or strong. We then compare any improvement in
the forecasts using the Plant–Craig scheme, over those using the
Gregory–Rowntree scheme, for the two sets of forecasts, to assess over an
extended period whether the benefit of using a stochastic scheme is indeed
greater when the synoptic forcing is weaker.
The separation into weakly and strongly forced cases was carried out a
posteriori to the event based on surface analysis charts. The aim here is
not to develop an adaptive forecasting system, but rather to develop
understanding of the behaviour of the Plant–Craig scheme. Nonetheless, the
results may also be interpreted as providing evidence that such a system may
be feasible if the strength of the synoptic forcing could be predicted in
advance (using, for example, the convective adjustment timescale as
discussed by ). The period was divided into 12 h sections,
centred on 00:00 or 12:00 UTC, and a surface analysis chart valid at the
respective centre time was used to determine whether to categorize the
section as weakly or strongly forced. The 00:00 UTC analyses were taken
from , and the 12:00 UTC analyses from .
The separation was conducted by assigning periods with discernible cyclonic
and/or frontal activity over or close to the UK as strongly forced and the
rest as weakly forced, with some additional adjustment of the preliminary
categorization based on the written reports by . The
periods were categorized as in Table .
Categorization of 12 h periods
(centred at the time given) investigated in this study into weak and
strong synoptic forcing (all dates in July 2009).

Period         Forcing   Period         Forcing   Period         Forcing
10, 00:00 UTC  Weak      17, 12:00 UTC  Strong    25, 00:00 UTC  Weak
10, 12:00 UTC  Strong    18, 00:00 UTC  Strong    25, 12:00 UTC  Weak
11, 00:00 UTC  Strong    18, 12:00 UTC  Weak      26, 00:00 UTC  Strong
11, 12:00 UTC  Strong    19, 00:00 UTC  Strong    26, 12:00 UTC  Strong
12, 00:00 UTC  Strong    19, 12:00 UTC  Weak      27, 00:00 UTC  Strong
12, 12:00 UTC  Strong    20, 00:00 UTC  Weak      27, 12:00 UTC  Weak
13, 00:00 UTC  Weak      20, 12:00 UTC  Weak      28, 00:00 UTC  Strong
13, 12:00 UTC  Weak      21, 00:00 UTC  Strong    28, 12:00 UTC  Strong
14, 00:00 UTC  Strong    21, 12:00 UTC  Strong    29, 00:00 UTC  Strong
14, 12:00 UTC  Strong    22, 00:00 UTC  Strong    29, 12:00 UTC  Strong
15, 00:00 UTC  Weak      22, 12:00 UTC  Strong    30, 00:00 UTC  Weak
15, 12:00 UTC  Weak      23, 00:00 UTC  Weak      30, 12:00 UTC  Weak
16, 00:00 UTC  Weak      23, 12:00 UTC  Weak      31, 00:00 UTC  Weak
16, 12:00 UTC  Weak      24, 00:00 UTC  Weak      31, 12:00 UTC  Strong
17, 00:00 UTC  Strong    24, 12:00 UTC  Weak
Results
Fractions skill score
The quality of the respective deterministic forecasts (i.e. those produced by
individual ensemble members, with no supplementary indication of the forecast
uncertainty) using GR and PC is assessed
using Figs. , , and . The performance of the
schemes is overall similar, with PC being superior for low thresholds (in
contrast to the findings of ) and short lead times and GR
for moderate thresholds. With upscaling (Figs. and ),
the performance of both schemes improves for all thresholds and lead times.
The PC scheme benefits particularly from the upscaling at higher thresholds
and longer lead times, sometimes performing significantly better than the GR
scheme, where at the grid scale the performance was equal. In general, the
difference in the scores between the two schemes does not reach such high
values as those seen in , although this could be due to the
fact that they investigated individual case studies which were specifically
selected to test the impact of the stochastic scheme, whereas our results are
scores averaged over an extended period.
In general, then, the schemes perform similarly overall, and the impact of
using a stochastic scheme on the FSS is modest. Indeed, the fact that there
is no skill for the highest threshold, for either scheme, is more important.
This lack of skill could be simply due to the fact that the case study period
was too short to obtain a statistically significant sample of extreme rain
events. However, it is also true that MOGREPS significantly overforecasts
heavy rain over the UK for this period (see Fig. ).
Fractions skill score computed for grid-scale data for
the Gregory–Rowntree scheme
(top), the Plant–Craig scheme (centre), and the difference between the two
schemes (Plant–Craig minus Gregory–Rowntree, bottom).
Fractions skill score for the Gregory–Rowntree scheme
(top), the Plant–Craig scheme (centre), and the difference between the two
schemes (Plant–Craig minus Gregory–Rowntree, bottom). The neighbourhood area is $(120~\mathrm{km})^2$,
corresponding to the central grid box and two grid boxes in each
direction.
Fractions skill score for the Gregory–Rowntree scheme
(top), the Plant–Craig scheme (centre), and the difference between the two
schemes (Plant–Craig minus Gregory–Rowntree, bottom). The neighbourhood area is $(216~\mathrm{km})^2$, corresponding to the central grid box and four grid boxes in each
direction.
Separation into weakly and strongly forced cases
Figure shows the difference in FSS between PC and GR, for
forecasts separated into weakly and strongly forced cases, as described in
Section . It can be seen that, with no averaging, PC is better
for the smallest thresholds but worse for the moderate thresholds, while with
upscaling the relative performance for moderate and higher thresholds is
improved, especially for the weakly forced cases.
PC generally performs better than GR for weakly forced cases and worse for
strongly forced cases. While both schemes benefit from upscaling the score,
this benefit is greater for PC. The results agree well with those of
for two example cases, where the Plant–Craig scheme benefits
more from the upscaling than the non-stochastic scheme and performs
relatively better for the weakly forced than for the strongly forced
case.
Moreover, it is clear that the upscaling is more beneficial to
the PC scheme (relative to the GR scheme) for the weakly forced cases than for
the strongly forced cases. The interpretation is that the PC scheme provides
a better statistical description of small-scale, weakly forced convection than
a non-stochastic scheme. This will not provide any improvement to the FSS
evaluated at the grid scale,
since the convection is placed randomly, but it does improve the FSS when it is
evaluated over a neighbourhood of grid points, so that it becomes a more
statistical evaluation of the quality of the scheme.
Fractions skill score for the Plant–Craig scheme,
minus
that for the Gregory–Rowntree scheme, for strongly forced cases (full
lines) and weakly forced cases (dashed lines), with no averaging (top), with a
neighbourhood area of two grid boxes in each direction (centre), and with a
neighbourhood area of four grid boxes in each direction (bottom). The score
shown is the average over all lead times.
Brier score
The quality of the probabilistic forecasts, with respect to forecasts using
the observed climatology, is assessed using Brier skill scores, plotted in
Fig. . While neither scheme has skill for high thresholds, PC
performs substantially better for medium and low thresholds, for all lead
times. In particular, PC has skill in predicting whether or not rain will
occur (zero threshold), while GR does not. Further analysis shows that this
is also the case for thresholds between 0 and 0.05 (not shown).
Brier skill score for the Gregory–Rowntree scheme
(top), the Plant–Craig scheme (centre), and the difference between the two
schemes (Plant–Craig minus Gregory–Rowntree, bottom). For the difference plot, instances where both skill scores
are lower than zero are not plotted.
The decomposition of the Brier score into reliability (Fig. ) and
resolution (Fig. ) is also shown (note that, because reliability is negatively
oriented, the difference is taken in the opposite direction for reliability, so
that the colour scale need not be reversed). The Plant–Craig scheme improves both components of this score;
the improvement for reliability is rather higher than that for resolution.
The scores for both reliability and resolution are low for the higher
thresholds, which is probably a consequence of the fact that there are
insufficient data to assess such extreme values.
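For reference, the standard decomposition of the Brier score into reliability, resolution, and uncertainty terms can be sketched as below. It is assumed here, not confirmed by the text, that this standard form is the one underlying the plotted components, and any normalization applied to the plotted values is not reproduced. The function name is illustrative.

```python
import numpy as np

def brier_decomposition(f, o):
    """Reliability, resolution and uncertainty terms of the Brier score.

    f : forecast probabilities (discrete values, e.g. k/24 for 24 members)
    o : binary observations (0 or 1)
    Satisfies: Brier score = reliability - resolution + uncertainty.
    """
    f = np.asarray(f, dtype=float)
    o = np.asarray(o, dtype=float)
    o_bar = o.mean()
    reliability = 0.0
    resolution = 0.0
    # Group cases by the distinct forecast probabilities that were issued.
    for p in np.unique(f):
        selected = f == p
        weight = selected.mean()          # fraction of cases with this forecast
        o_given_p = o[selected].mean()    # observed frequency for this forecast
        reliability += weight * (p - o_given_p) ** 2
        resolution += weight * (o_given_p - o_bar) ** 2
    uncertainty = o_bar * (1.0 - o_bar)
    return reliability, resolution, uncertainty
```

Reliability is negatively oriented (smaller is better) and resolution is positively oriented, which is consistent with the opposite direction of the difference used in the two figures. Because the forecast probabilities from a 24-member ensemble take only 25 distinct values, grouping by exact value makes the identity hold without binning error.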
Brier score reliability for the Gregory–Rowntree scheme
(top), the Plant–Craig scheme (centre), and the difference between the two
schemes (Gregory–Rowntree minus Plant–Craig, bottom).
Brier score resolution
for the Gregory–Rowntree scheme
(top), the Plant–Craig scheme (centre), and the difference between the two
schemes (Plant–Craig minus Gregory–Rowntree, bottom).
Separation into weakly and strongly forced cases
Figure shows the Brier skill scores as a function of threshold,
separated into strongly and weakly forced cases. The forecasts are improved
using PC for both sets of cases, and the difference is considerably greater
for weakly forced cases, where GR has almost no skill. This can be interpreted
in terms of the fact that small-scale variability is relatively more important
for the weakly forced cases, and ensemble members using the Plant–Craig scheme
differ from each other more than for the strongly forced cases, where initial
and boundary condition variability is relatively more important.
Our result is similar to what was found by
, where the Plant–Craig scheme was found to perform better
than a non-stochastic scheme for a weakly forced case, and at low thresholds,
but worse than the non-stochastic scheme for a strongly forced case.
Brier skill score for the Gregory–Rowntree scheme
(green lines) and the Plant–Craig scheme (red lines), averaged over all
lead times, for cases with strong forcing (full lines) and weak forcing
(dashed lines), as a function of threshold. The reference for the skill score is the observed climatology. The axes have been chosen to
focus on where the skill score is above zero.
Ensemble added value
The EAV is plotted in Fig. . The PC scheme performs substantially
better for this score across lead times, and the improvement is of a similar
magnitude to that of the Brier score. This suggests that the improvement in
the probabilistic forecast from using PC comes from the stochasticity of the
scheme, since the EAV is measured against individual forecasts from the same
ensemble: it should, therefore, be “normalized” against differences in the
underlying convection scheme which are not related to the stochasticity. The
interpretation here is that, while structural differences between two
convection schemes will lead to differences in the quality of the ensemble
forecasts, this will mainly be due to differences in the quality of
individual members of the ensemble. The stochastic character of the PC scheme
may or may not improve the quality of the individual members, but it is
primarily designed to improve the quality of the ensemble as a whole.
Note that the ensemble forecasts using the GR scheme also have a
positive EAV, representing the value added by the multiple initial and
boundary conditions provided by the global model, and by the stochasticity
coming from the random parameters scheme. Since these factors are also present in the
ensemble forecasts using the PC scheme, the fractional difference between the
two EAVs can be interpreted as the value added by the stochastic character of
the PC scheme, expressed as a fraction of the value added by all the ensemble
generation techniques in MOGREPS.
Ensemble added value (EAV) for the Gregory–Rowntree scheme
(green line) and the Plant–Craig scheme (red line) as a
function of forecast lead time.
General climatology
Although Nimrod radar observations were only available over a restricted part
of the forecast domain, it is also of interest to compare the forecasts over
the whole domain. Figure shows the convective fraction: that is, the
amount of rainfall which came from the convection scheme divided by the total
amount of rain from the convection scheme and grid-scale precipitation. Both
schemes produce more convective rain over land, and the difference between
the fractions over land and sea is in proportion to the fraction over the
whole domain; the fractions are fairly constant with forecast lead time.
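The convective fraction defined above can be written as a short function. This is a minimal sketch: the array names and the optional land or sea mask argument are illustrative, not taken from the model diagnostics.

```python
import numpy as np

def convective_fraction(conv_rain, gridscale_rain, mask=None):
    """Convective rain divided by total (convective + grid-scale) rain.

    conv_rain, gridscale_rain: accumulated rainfall on the same grid.
    mask: optional boolean array selecting grid points (e.g. land only).
    """
    conv = np.asarray(conv_rain, dtype=float)
    grid = np.asarray(gridscale_rain, dtype=float)
    if mask is not None:
        mask = np.asarray(mask, dtype=bool)
        conv, grid = conv[mask], grid[mask]
    total = conv.sum() + grid.sum()
    # Undefined if no rain fell in the selected region.
    return conv.sum() / total if total > 0 else float("nan")
```

Passing a land mask and its complement gives the separate land and ocean fractions.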
As discussed in Sect. , the convective fraction is much lower
for PC than for GR, suggesting that adjusting parameters to increase this
fraction would further increase the PC influence on the forecast (for
example, used a reduced closure timescale to
increase the activity of the PC scheme). The reduced convective rainfall in
the case of PC was compensated for by a corresponding increase in the
grid-scale rainfall (so that the total amount of rainfall in the two cases
was roughly the same). Whether this increase in grid-scale rainfall improves
or degrades the forecast is not clear, so there is some uncertainty as to how
much of the improvement observed over the UK is due to the stochasticity of
the scheme and how much may be related to the convective fraction. The
ensemble added value is intended to isolate the effects of the stochasticity
and provides strong evidence that a significant amount of the forecast
improvement does indeed come from this. However, it is possible that further
improvements in the forecast due to increasing the convective fraction from
the PC scheme (and thus increasing the beneficial effects of the
stochasticity) would be offset by a reduction in quality due to the lower
activity of the grid-scale precipitation.
The ensemble spread is shown as a function of lead time in Fig. ,
over the whole domain and separately over land and over ocean. Both schemes
produce more spread over land, but the difference between PC and GR is also
much greater over land. This is presumably because PC has a
higher convective fraction over land than over ocean and is therefore more able to influence
the spread. The spread increases with forecast lead time and does so more
quickly with PC than with GR.
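For concreteness, one common way to compute an ensemble spread of this kind is sketched below. It assumes that spread means the standard deviation of the members about the ensemble mean at each grid point, averaged over the region of interest. This definition is an assumption for the example and may differ in detail from the one used for the figure.

```python
import numpy as np

def ensemble_spread(ensemble, mask=None):
    """Region-mean standard deviation across members.

    ensemble: array of shape (members, points) for one lead time.
    mask: optional boolean array of shape (points,), e.g. land points.
    """
    ens = np.asarray(ensemble, dtype=float)
    # Sample standard deviation over the member axis (assumed definition).
    std = ens.std(axis=0, ddof=1)
    if mask is not None:
        std = std[np.asarray(mask, dtype=bool)]
    return std.mean()
```

Evaluating this for each lead time, with and without land and ocean masks, gives curves of the kind described above.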
Figure shows density plots of rainfall from the two schemes, and
from the observations, over the UK part of the domain, for a lead time of 30
to 36 h. It is clear that the model produces too many instances of heavy
rainfall for this period and that this is exacerbated by the extra
variability introduced by the PC scheme. However, as shown earlier in this
section, neither scheme has any skill for large thresholds. It is clear from
Fig. that this is partly due to overproduction of heavy rain,
although the case study was also of insufficient length
to fully assess such extreme values.
Figure shows that the PC scheme also produces more heavy rainfall
than the GR scheme over ocean (here for a lead time of 30 to 36 h). This
suggests that one possible approach to tuning the PC scheme could be to apply
less input averaging over the ocean, since have shown that
applying more input averaging increases the variability and, therefore, the
tails of the distribution.
Although a lead time of 30 to 36 h was chosen for Figs. and
, similar conclusions could be drawn from the plots for other lead
times (not shown). The exception is the first 6 h, during which the
forecasts had not developed sufficiently for the curves to lie significantly
apart from each other.
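Density curves like those discussed above can be estimated from the gridded accumulations with a normalized histogram. The sketch below is illustrative only: the method used to produce the figures is not specified here, and the bin edges are left to the caller.

```python
import numpy as np

def rainfall_density(accumulations, bin_edges):
    """Normalized histogram of rainfall accumulations.

    Returns bin centres and the density in each bin, normalized so that
    the density integrates to 1 over the bins (values outside the bin
    range are ignored by numpy.histogram).
    """
    values = np.asarray(accumulations, dtype=float).ravel()
    density, edges = np.histogram(values, bins=bin_edges, density=True)
    centres = 0.5 * (edges[:-1] + edges[1:])
    return centres, density
```

Applying the same bin edges to both schemes and to the observations makes the tails of the distributions directly comparable.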
Convective fraction as a function of forecast lead time,
for the Gregory–Rowntree scheme (green lines) and the Plant–Craig scheme
(red lines), over land (dashed lines), over ocean (dotted lines), and in
total (full lines), for the full NAE domain.
Ensemble spread as a function of forecast lead time,
for the Gregory–Rowntree scheme (green lines) and the Plant–Craig scheme
(red lines), over land (dashed lines), over ocean (dotted lines), and in
total (full lines), for the full NAE domain.
Density plots for accumulated rainfall for the period of
30 to 36 h lead time, over the UK part of the domain, for forecasts with
the Gregory–Rowntree scheme (green line), the Plant–Craig scheme (red
line), and observations (black line).
Validation over the whole NAE domain
A validation using the routine verification system was also performed for the
two set-ups, covering land areas over the whole forecast domain. This
calculates various forecast skill scores, by comparing against SYNOP
observations at the surface and at a height of 850 hPa, and yielded a mixed
assessment of the performance of the PC scheme against the GR scheme. For
example, the continuous ranked probability score, which assesses both the
forecast error and how well the ensemble spread predicts the error
, was improved by roughly 10 % on using the PC scheme
for rainfall but degraded by about 10 % for temperature and pressure. The
impact on the wind forecast was broadly neutral.
This shows that, while the improvements demonstrated in this section hold for
areas outside the UK, they have come at a cost to the quality of the forecast
for some of the other variables. An important advantage of using a stochastic
convection scheme, over a statistical downscaling procedure, is its feedback
on the rest of the model, and it is important that this feedback is of
benefit. The recent analysis by is very encouraging in this
regard, demonstrating that the processes of upscale error growth from convective
uncertainties can be well reproduced by the PC scheme, in good agreement with
the behaviour of large-domain simulations in which the convection is
simulated explicitly.
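The continuous ranked probability score mentioned above rewards both a small forecast error and an ensemble spread that matches that error. A minimal sketch for a single observation is given below, using the standard estimator based on the empirical distribution of the members. The estimator used by the routine verification system is not specified here, so this form is an assumption.

```python
import numpy as np

def ensemble_crps(members, observation):
    """CRPS of an ensemble for one scalar observation (smaller is better).

    Uses CRPS = E|X - y| - 0.5 * E|X - X'|, with expectations taken over
    the ensemble members (empirical distribution).
    """
    x = np.asarray(members, dtype=float)
    error_term = np.mean(np.abs(x - observation))
    # Mean absolute difference over all member pairs (including i == j).
    spread_term = np.mean(np.abs(x[:, None] - x[None, :]))
    return error_term - 0.5 * spread_term
```

For a single-member ensemble the spread term vanishes and the score reduces to the absolute error, which is why the CRPS is often used to compare ensemble and deterministic forecasts on the same footing.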
Conclusions
Density plots for accumulated rainfall for the period of
30 to 36 h lead time, over the entire NAE domain, for forecasts with
the Gregory–Rowntree scheme (green line) and the Plant–Craig scheme (red
line) over ocean.
A physically based stochastic scheme for the parameterization of deep convection has
been evaluated by comparing probabilistic rainfall forecasts produced using
the scheme in an operational ensemble system with those from the same ensemble
system with its standard deep convection parameterization. The impact of using
a stochastic scheme on deterministic forecasts is broadly neutral, although there is
some improvement when larger areas are assessed. This is relevant to
applications such as hydrology, where rainfall over an area larger than a grid
box can be more relevant than rainfall on the grid box scale.
The Plant–Craig scheme has been shown to have a positive impact on
probabilistic forecasts for light and medium rainfall, while neither scheme
is able to skillfully forecast heavy rainfall. The impact of the scheme is
greater for weakly forced cases, where subgrid-scale variability is more
important. studied a convection-permitting ensemble without
stochastic physics and found that deterministic forecast skill was poorer
during weak than during strong forcing conditions. They developed a
convective adjustment timescale to measure the strength of the forcing
conditions. This quantity can be calculated from model variables and could
therefore be used in advance to determine how predictable the convective
response will be for a given forecast. This could potentially be useful in an
adaptive ensemble system using two convection parameterizations (see, for
example, ), one of which is stochastic and is better
suited to providing an estimate of the uncertainty in weaker forcing cases.
Although the Plant–Craig scheme clearly produces improved probabilistic
forecasts, it is not certain whether this is due to its stochasticity, due to
different underlying assumptions between it and the standard convection
scheme, or simply due to the decrease in convective fraction seen in this
implementation. In order to make a clean distinction, further studies could
be performed in which the performance of the Plant–Craig scheme is compared
against its own non-stochastic counterpart, which can be constructed by using
the full cloud distribution and appropriately normalizing, instead of
sampling randomly from it cf.. Nonetheless, the results
from applying the recently developed ensemble added value metric do provide
some relevant information for this question. This metric aims to assess the
quality of the ensemble in relation to the underlying member forecasts, and
the Plant–Craig scheme has been shown to increase it. This indicates that
the stochastic aspect of the scheme can increase the value added to a
forecast by using an ensemble, since other aspects of the scheme (including
the convective fraction) would be expected (broadly) to affect the
performance of the ensemble as a whole and of the individual members
equally.
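The distinction between sampling from the cloud distribution and using the full distribution with normalization can be illustrated with a toy model. The sketch below assumes a Poisson-distributed number of clouds and an exponential distribution of mass flux per cloud. The parameter values, function names, and discretization are hypothetical and chosen only for illustration; this is not the scheme's actual code.

```python
import numpy as np

def stochastic_mass_flux(mean_total, mean_per_cloud, rng):
    """One random realization of the grid-box mass flux (toy model).

    Assumes a Poisson-distributed cloud number and an exponentially
    distributed mass flux per cloud.
    """
    n_clouds = rng.poisson(mean_total / mean_per_cloud)
    return rng.exponential(mean_per_cloud, size=n_clouds).sum()

def deterministic_mass_flux(mean_total, mean_per_cloud, n_bins=200):
    """Non-stochastic counterpart: use the whole distribution.

    Returns bin centres and the mass flux contributed by each bin,
    rescaled so that the contributions sum to mean_total.
    """
    edges = np.linspace(0.0, 10.0 * mean_per_cloud, n_bins + 1)
    centres = 0.5 * (edges[:-1] + edges[1:])
    # Mass flux per bin is proportional to m * exp(-m / mean_per_cloud).
    contribution = centres * np.exp(-centres / mean_per_cloud)
    contribution *= mean_total / contribution.sum()
    return centres, contribution
```

In this toy setting the deterministic version returns the same total on every call, whereas repeated calls to `stochastic_mass_flux` scatter around that total. Comparing ensembles built from the two would isolate the effect of the random sampling from the other structural features of the scheme.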
The results of this study justify further work to investigate the impact of
the Plant–Craig scheme on ensemble forecasts. Since the version of MOGREPS
used in this study has been superseded, it is not feasible to carry out a
more detailed investigation beyond the proof of concept presented here.
Interestingly, the resolution used in this study is now becoming more
widely used in global ensemble forecasting, and so future work could involve
implementing the scheme in a global NWP system, for example the global version
of MOGREPS. This would enable assessments to be made as to whether the scheme
provides benefits for the representation of tropical convection, in addition
to those aspects of mid-latitude convection that were demonstrated here.
Code and/or data availability
The source code for the Plant–Craig parameterization, as it was used in this
study, can be made available on request, by contacting r.s.plant@reading.ac.uk.