This study was motivated by the use in air pollution epidemiology and health
burden assessment of data simulated at 5 km
The adverse associations between ambient air pollution – especially
particulate matter (PM), ozone (O
Whilst policies and legislation have been put in place to limit and mitigate the impacts of air pollution (Heal et al., 2012), there is increasing recognition that more effective protection of human health may be achieved by not focusing on individual pollutants, but by taking a multi-pollutant approach (Dominici et al., 2010). Compared with the traditional single pollutant focus (WHO, 2006), an approach based on pollution mixtures has the advantage of enabling the complexity of exposures and health effects to be characterized more fully: it can help identify harmful emission sources, and it has the potential to provide a more effective framework for air-quality regulation, for example by focusing on sources and pathways that influence several pollutants at once. There are analytical complexities in assessing the potential interactions between combinations of pollutants (Kim et al., 2007; Mauderly and Samet, 2009), including the paucity of measured exposure data, which are typically derived from relatively sparse monitoring sites that may measure different combinations of pollutants at different locations. Furthermore, monitor networks are usually established for compliance with legislation (e.g. deliberately sited close to, or away from, pollution sources), and so may lack representativeness for characterizing population exposure (Duyzer et al., 2015), leading to bias in air pollution epidemiology (Sheppard et al., 2012).
Modelling can increase the availability of air pollution data (Jerrett et al., 2005). The current gold standard for air-quality modelling is the process-based, deterministic atmospheric chemistry model (Colette et al., 2014). Such models seek to simulate the multitude of complex factors that govern the spatial and temporal variability in air pollutant concentrations, including the distributions of different emissions sources, local and long-range dispersion processes, in situ photochemistry, and dry and wet deposition processes.
As part of a multi-institution project on the health impacts of exposure to
multiple pollutants, we have derived UK-wide distributions of surface air
pollution at hourly temporal resolution over multiple years (2001–2010), at
5 km
The high temporal and spatial resolution output from the EMEP4UK-WRF model has many advantages for air pollution studies, including (i) provision of data at times and locations where monitoring data are not available; this has the dual benefit of increasing effective sample size in multi-pollutant health epidemiology and of reducing reliance on the assumption that a single monitor is representative of species concentrations over a large area; (ii) provision of data on individual particle chemical components in addition to the aggregated mass concentration of PM that is measured; (iii) the facility to explore many related aspects such as geographical or demographic differences in exposures to air pollutant mixtures (and related issues of environmental justice); and (iv) the impacts of potential future emissions scenarios.
It is important to have an understanding of the performance capabilities of
any model, relevant to the use to which the model output is to be put. Much
has been written on air quality model evaluation (see, for example,
Vautard et al., 2007; Dennis et al., 2010; Derwent et al., 2010; Rao et al.,
2011; Thunis et al., 2012, 2013; Pernigotti et al., 2013),
including publications arising out of international collaborative programmes
such as AQMEII (Air quality modelling evaluation international initiative,
The objective of this paper is to record a detailed assessment of the modelled
surface concentrations of O
The EMEP MSC-W regional Eulerian ACTM is described in Simpson et al. (2012)
and at
Anthropogenic emissions of NO
The default EMEP MSC-W photochemical scheme was used, which contains 72
gas-phase species and 137 reactions; the gas/aerosol partitioning formulation
was the Model for an Aerosol Reacting System (MARS) (Binkowski and Shankar,
1995). Simulation of secondary organic aerosol (SOA) formation, ageing, and
partitioning was via the 1-D volatility basis set (Donahue et al., 2006) with
its implementation in the model as described by Bergström et al. (2012).
The EMEP4UK model output for PM
Hourly measurements of the concentrations of NO
A data capture threshold of 75 % was applied throughout the process of
calculating statistics from the hourly measurements, as is standard protocol
for EU data reporting
(
Comparison with model output was only undertaken for AURN sites with a
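As a minimal sketch of how a 75 % data capture threshold might be applied when aggregating hourly measurements into daily metrics, the following Python fragment may be helpful. The function names (`capture_mean`, `daily_max_8h_mean`) are hypothetical and are not taken from the processing code used in this study. The treatment of the 8 h running mean (windows confined to a single calendar day, with the same 75 % rule applied both within each window and to the number of valid windows) is an assumption made for illustration only.

```python
from typing import Optional, Sequence

CAPTURE_THRESHOLD = 0.75  # 75 % data capture, as stated in the text


def capture_mean(values: Sequence[Optional[float]],
                 threshold: float = CAPTURE_THRESHOLD) -> Optional[float]:
    """Mean of the non-missing values, or None if data capture is too low."""
    valid = [v for v in values if v is not None]
    if not values or len(valid) / len(values) < threshold:
        return None
    return sum(valid) / len(valid)


def daily_max_8h_mean(hourly: Sequence[Optional[float]],
                      threshold: float = CAPTURE_THRESHOLD) -> Optional[float]:
    """Daily maximum of 8 h running means from one day's hourly values.

    Assumption (not from the paper): windows do not cross midnight, and the
    threshold is applied to each window and to the set of windows.
    """
    if len(hourly) < 8:
        return None
    windows = [capture_mean(hourly[i:i + 8], threshold)
               for i in range(len(hourly) - 7)]
    valid = [w for w in windows if w is not None]
    if len(valid) / len(windows) < threshold:
        return None
    return max(valid)
```

Under these assumptions a day with 18 valid hours out of 24 yields a daily mean, whereas a day with only 17 valid hours is rejected.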
AURN monitoring sites are classified according to their general location and
proximity to particular sources of air pollution
(
The coordinates of each AURN station with valid measurements during the
period 2001–2010 were used to locate the 5 km
Measurements from the UK AURN adhere to EU Directives on reference
instrumentation and QA/QC procedures. Concentrations of NO
The objective of all these external scaling processes for these PM
measurements has been to provide the best practical measure of “reference
equivalent” PM
Irrespective of these changes to PM
Numbers of UK AURN (Automatic Urban and Rural Network) sites
satisfying the data capture criteria described in Sect. 2.2, together with
model–measurement statistics (as defined in Sect. 2.4) for the 10-year means
of NO
The coherence between long-term spatial patterns of modelled and measured
concentrations was investigated through the correlation across sites of the
10-year (2-year for PM
The daily pollutant metrics were grouped by day-of-week, month-of-year, and year of the 10-year period. Statistics were then calculated on the grouped pairs of daily model simulations and measurements for each pollutant at each site, and summarized by site type. Of the various statistics proposed for quantifying the performance of air-quality models, correlation, bias, and RMSE are consistently cited for evaluation against policy-relevant metrics of pollutant concentration (USEPA, 2007; Derwent et al., 2010; Thunis et al., 2012). The first two statistics in particular are important for application to health studies (see the Discussion).
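The grouping step described above can be sketched as follows. This is an illustrative Python fragment only: the record layout `(date, modelled, measured)` and the helper names are assumptions rather than the data structures used in this study, and mean bias (taken here as model minus measurement) stands in for whichever statistic is of interest.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical grouping keys for the three groupings named in the text.
GROUPINGS = {
    "day_of_week": lambda d: d.weekday(),  # 0 = Monday ... 6 = Sunday
    "month_of_year": lambda d: d.month,
    "year": lambda d: d.year,
}


def group_pairs(records, key):
    """Collect (modelled, measured) pairs of daily values under each group."""
    groups = defaultdict(list)
    for day, modelled, measured in records:
        groups[key(day)].append((modelled, measured))
    return dict(groups)


def mean_bias_by_group(records, grouping):
    """Mean bias (model minus measurement) within each group."""
    pairs_by_group = group_pairs(records, GROUPINGS[grouping])
    return {g: mean(m - o for m, o in pairs)
            for g, pairs in pairs_by_group.items()}


# Made-up daily values for one site, for illustration only.
records = [
    (date(2001, 1, 1), 12.0, 10.0),  # a Monday
    (date(2001, 1, 8), 14.0, 10.0),  # a Monday
    (date(2001, 1, 6), 8.0, 10.0),   # a Saturday
    (date(2002, 2, 2), 9.0, 10.0),
]
bias_by_weekday = mean_bias_by_group(records, "day_of_week")
```

The per-site dictionaries produced in this way could then be summarized by site type, for example by taking medians and quartiles across sites.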
Scatter plots of the 10-year means of the modelled and measured
pollutant daily metrics at each site, grouped by site type, and with data
markers shaded according to the latitude of the measurement site:
In each of the following, the index
Pearson's correlation coefficient:
Mean bias:
Root mean square error:
The FAC2 statistic, the proportion of all pairs of modelled and observed concentrations that are within a factor of 2 of each other, was also calculated. This statistic provides an additional general indication of overall model skill.
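For readers wishing to reproduce statistics of this kind, the following Python sketch uses the standard textbook definitions of Pearson's correlation coefficient, mean bias, root mean square error, and FAC2 for paired modelled and measured values. The function names are hypothetical. The sign convention for bias (model minus measurement, so that model underestimation gives a negative value) and the inclusive factor-of-2 bounds are assumptions chosen for illustration.

```python
import math
from typing import Sequence


def pearson_r(model: Sequence[float], obs: Sequence[float]) -> float:
    """Pearson's correlation coefficient between paired values."""
    n = len(model)
    mean_m = sum(model) / n
    mean_o = sum(obs) / n
    cov = sum((m - mean_m) * (o - mean_o) for m, o in zip(model, obs))
    ss_m = sum((m - mean_m) ** 2 for m in model)
    ss_o = sum((o - mean_o) ** 2 for o in obs)
    return cov / math.sqrt(ss_m * ss_o)


def mean_bias(model: Sequence[float], obs: Sequence[float]) -> float:
    """Mean of (model - measurement); negative if the model underestimates."""
    return sum(m - o for m, o in zip(model, obs)) / len(model)


def rmse(model: Sequence[float], obs: Sequence[float]) -> float:
    """Root mean square error of the paired differences."""
    return math.sqrt(sum((m - o) ** 2 for m, o in zip(model, obs)) / len(model))


def fac2(model: Sequence[float], obs: Sequence[float]) -> float:
    """Fraction of pairs with 0.5 * obs <= model <= 2 * obs (inclusive)."""
    within = sum(1 for m, o in zip(model, obs) if 0.5 * o <= m <= 2.0 * o)
    return within / len(model)
```

A model that is perfectly correlated with the measurements can still carry a substantial bias: for modelled values that are exactly half the measured values, `pearson_r` is 1 while `mean_bias` is negative and `fac2` is 1 only because the bounds are treated as inclusive.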
Scatter plots of the individual-site model versus measurement 10-year means
of NO
Figure 1a shows excellent model–measurement agreement in 10-year mean
NO
For urban sites, model–measurement agreement was generally better at lower
latitude sites, i.e. for sites in the south of the UK compared with sites in
the north (Fig. 1a). The slight increase in model negative bias for NO
Correlation of the normalized bias between model and measurement
10-year means of pollutant daily metrics (2-year mean for PM
Figure 1b shows that the modelled 10-year mean of daily max 8 h mean O
As for NO
The lack of model–measurement spatial correlation in 10-year mean O
The 10-year means of daily-mean simulations of PM
In general there were no strong associations between model–measurement bias
for 10-year mean PM
Median (25th percentile, 75th percentile) values of the
Figure 1d shows that all 2-year mean modelled PM
Model–measurement spatial correlation of PM
Table 3 summarizes the individual-site model versus measurement
The temporal variability in daily NO
Table 3 shows that the agreement between modelled and measured temporal
variability in daily PM
Figure 2 shows box–whisker plots summarizing the individual site
model–measurement
Model–measurement statistics per site for NO
These seasonal variations may have a variety of causes. In terms of chemical
and meteorological effects, the NO
As with daily mean NO
Model–measurement statistics per site for O
Model–measurement statistics per site for PM
Again, the general consistency in temporal correlation across site types and time periods, compared with the variation in bias, suggests that the main driver of model shortcoming is inaccuracy in emissions (totals and temporal disaggregation) rather than in the simulation of atmospheric chemistry and transport processes.
The focus in this work was model–measurement comparisons at daily and annual
averaging resolutions, but concentration data were available at hourly
resolution and the Supplement presents figures and discussion of the
comparison statistics for NO
The work presented here was motivated by the use of the EMEP4UK-WRF model
output for air pollution epidemiology and health burden assessment; therefore
the model–measurement comparison focused on health-relevant metrics for the
most important ambient air pollutants: specifically the annual and daily
means for PM
Even for a well-specified Eulerian model (in terms of input data, transport,
chemistry, etc.), model–measurement agreement may not be perfect for (at
least) the following two reasons: (i) the model simulates a volume-averaged
concentration, whereas the monitor records the composition of the air in one
part of that volume, which may or may not reflect the average concentration
for the whole volume over the relevant time-averaging period; and (ii) the
measurement may be in error. A rural background monitor in homogeneous terrain
and well away from local sources may be anticipated to be sampling air that
is more homogeneous over the 5 km
Model–measurement statistics per site for PM
The presence of measurement uncertainty constrains the extent to which
model–measurement statistics can be used to evaluate the performance of a
model. The FAIRMODE project (fairmode.jrc.ec.europa.eu) has developed a
series of relationships, published in Thunis et al. (2012, 2013), Pernigotti
et al. (2013), and in documents on the FAIRMODE website, that define minimum
values for model–measurement statistics, given values for the measurement
uncertainty,
The intention here is to provide an overview of how the EMEP4UK-WRF
model–measurement statistics presented here compare with threshold criteria
for evaluation of an air-quality model in the European air-quality context.
It is recognized that satisfying the MPC is a necessary but not sufficient
part of model validation. Nevertheless, Table 3 shows that in all instances
the site-mean model–measurement
Although MPC values cannot be calculated here for daily mean NO
FAIRMODE also outlines an approach to defining a model quality objective for
bias relative to long-term average pollutant concentration measurement. The
absolute values for this MQO bias, calculated using the measurements relevant
to this study, are presented in Table 2 for each pollutant and site type, and
are also demarcated by the green lines in the scatter plots of modelled
versus measurement long-term means in Fig. 1. Minimum model performance is
satisfied if
The UK AURN operates as a single network subject to standardized QA/QC
procedures (as described in Sect. 2), so measurement uncertainty might be
lower than the values derived by the FAIRMODE project for measurement across
multiple networks. On the other hand, the MPC values in Table 3 show that
allowing for increasing measurement uncertainty at lower concentrations very
considerably relaxes the threshold of an MPC. Also, as described in
Sect. 2.2, instrumentation for “real-time” measurement of PM
Although the EMEP4UK-WRF model–measurement statistics reported in Tables 2
and 3 are for the most part in line with or better than anticipated model
performance criteria, there were also instances of trends in statistics with
site type, month-of-year, and day-of-week. (In general there were no obvious
inter-annual trends across the decade of comparisons.) Bias was least overall
for rural sites (e.g. median normalized mean bias values for O
The positive model bias for O
Instances of trends in model–measurement bias with month or day-of-week are described in the Results section. The generally good daily temporal correlations discussed already indicate that the model captured the day-to-day changes in air mass movements which are the strongest influences on surface concentrations of pollutants at this temporal resolution. The observed seasonal and weekday–weekend variations in bias (and of diurnal variations in bias – see the Supplement) are therefore strongly suggestive of shortcomings in the monthly and weekday–weekend (and hour-of-day) emissions factors applied in the model to disaggregate the annual total emissions supplied by the emissions inventories.
As stated at the outset, the motivation here was use of the EMEP4UK-WRF model output for health studies. In the context of use of concentration data for epidemiology, in the broadest terms correlation is more important than bias, and for the model output reported here, model–measurement correlations (both temporal and spatial) were generally considerably better, particularly for the gaseous pollutants, than bias statistics. Epidemiological studies of association of ambient air pollution with health require an estimate of exposure for each subject, most usually from measurements from monitors, but increasingly from models. The difference between the estimates and a hypothetical gold standard, for example concentration outside the residence of each subject, is called exposure measurement error. (It is assumed here that it is the association of ambient pollution with health outcome at the small-area level that is important, because of the link to regulation (Dominici et al., 2000), rather than exposure at the level of the individual, and therefore issues of disparity between the concentration at a location and true personal exposure are not considered.) The consequences of measurement error are to reduce the power of the study to detect an association and to bias the magnitude of the association (Sheppard et al., 2005, 2012; Armstrong and Basagaña, 2015).
Which agreement statistics determine the magnitude of this “blunting” depends on the specific context. Study power is the simplest case, depending only on the correlation between the true and estimated exposure. For the two main types of epidemiological study of air pollution, power is diminished in “spatial studies” according to the correlation of long-term true and estimated means over space, and in “time series studies” according to the correlation of daily values over time. Thus the model–measurement correlations reported in Sect. 3.1 and 3.2 have a fairly direct implication for study power in those two study types, except that errors in the measured values as estimates of the mean over the population in the grid square (or wider area) are not allowed for. Because of this, the power of studies using modelled concentrations would be somewhat better than implied by the correlations reported (Butland et al., 2013).
Low correlation of “true” and estimated exposures also often reduces the estimated size of association (e.g. relative risk per unit exposure), but other aspects of the error distribution also matter, notably the extent to which the error is of Berkson or classical type (Butland et al., 2013; Armstrong and Basagaña, 2015). It is difficult, and beyond the scope of this paper, to separate Berkson and classical error, but in the absence of such a separation it would be reasonable to consider the model–measurement correlations as broad guides to bias in association as well as power. Perhaps surprisingly, additive bias (e.g. estimating concentration 10 units too high on average) has little effect in epidemiological studies, at least if the exposure–health association is assumed linear, as it usually is (although bias in association is also dependent on relative magnitudes of variance in “true” and estimated exposures).
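These points can be illustrated with a small simulation. The sketch below is a toy example under stated assumptions, not an analysis from this study: exposures and errors are drawn from normal distributions with arbitrary parameters, the exposure–health association is linear with slope `beta`, and all names are hypothetical. It compares the ordinary least squares slope recovered from the true exposure, from an exposure with a constant additive bias, from an exposure with classical error, and from an assigned exposure with Berkson error.

```python
import random


def ols_slope(x, y):
    """Ordinary least squares slope of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate(n=20000, beta=0.5, sd_true=10.0, sd_error=10.0, seed=1):
    """Estimated slopes under different (assumed) exposure error structures."""
    rng = random.Random(seed)
    true = [rng.gauss(30.0, sd_true) for _ in range(n)]
    health = [beta * t + rng.gauss(0.0, 5.0) for t in true]

    # Additive bias: every estimate is 10 units too high.
    shifted = [t + 10.0 for t in true]
    # Classical error: estimate = truth + independent noise.
    classical = [t + rng.gauss(0.0, sd_error) for t in true]

    # Berkson error: truth = assigned value + independent noise.
    assigned = [rng.gauss(30.0, sd_true) for _ in range(n)]
    true_b = [a + rng.gauss(0.0, sd_error) for a in assigned]
    health_b = [beta * t + rng.gauss(0.0, 5.0) for t in true_b]

    return {
        "true": ols_slope(true, health),
        "additive_bias": ols_slope(shifted, health),
        "classical": ols_slope(classical, health),
        "berkson": ols_slope(assigned, health_b),
    }


slopes = simulate()
```

With these settings the additive bias leaves the slope unchanged, classical error attenuates it by approximately the ratio of true to total exposure variance (here about one half), and Berkson error leaves it approximately unbiased, although with reduced precision.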
Together with the good temporal correlations for daily pollutant metrics, the good spatial correlations between long-term averaged modelled and measured concentrations across urban sites for all four pollutants selected are encouraging: they suggest that EMEP4UK-WRF modelled pollutant concentrations may broadly reduce the exposure measurement error that arises when measurements are taken from air pollution monitors far from the population under consideration. On the other hand, a bias error in the simulations contributes to uncertainty in the investigation of any threshold in concentration–health effect, and in health impact assessments that apply concentration–response functions to estimated concentrations of exposure.
This study has worked with the EMEP4UK-WRF v4.3 model. Model–measurement
statistics will be different for other models. However, other ACTMs are
similarly constructed, and so the broad discussion points relating to
intrinsic limitations to monitor versus grid-volume comparison statistics,
unresolved sub-grid variabilities, and shortcomings in magnitudes and
temporal trends in emissions are generalizable. Local dispersion models can
better represent the sources and dispersion at high spatial resolution, but
these can only be configured for specific urban areas at a time, are
similarly constrained by the accuracy of the spatiotemporal emissions data
and require provision of boundary conditions of meteorology and atmospheric
composition (often supplied by an ACTM). Dispersion models have also been
combined with land-use regression models (Wilton et al., 2010; Michanowicz et
al., 2016) but again for individual areas only. Some progress is being made
in combining measurement (both ground-based and satellite) and model data
through data assimilation (e.g. MACC-II: Monitoring Atmospheric Composition
and Climate – Interim Implementation
(
This study was motivated by the use in air pollution epidemiology and health
burden assessment of data simulated at 5 km
In general for epidemiology, capturing correlation is more important than bias and RMSE, and in this study model–measurement temporal correlation of daily concentrations generally exceeded minimum performance values calculated from methods reported in the literature that take into account potential measurement uncertainties. Model–measurement bias varied according to monitor site classification, with generally less bias at rural background compared with urban background sites, but bias was again better (i.e. smaller) than values that take account of uncertainties in the measurements. The greater consistency in temporal correlation with site type and across months and day-of-week, compared with variations in bias, is strongly indicative that the main driver of model shortcoming is inaccuracy in emissions (totals and the monthly and day-of-week temporal factors applied in the model to the totals) rather than in the simulation of atmospheric chemistry and transport processes.
Despite the limitations discussed, these detailed analyses support the use of model data such as these in air pollution epidemiology. Air pollution modelling at the spatial coverage and spatial resolution described here has the benefit of increasing study power, of providing data for air pollutant components that are either not, or only sparsely, measured, and of enabling investigation of the potential effects of alternative future scenarios.
This study used output from the EMEP4UK-WRF model,
which is a regional application of the European Monitoring and Evaluation
Programme (EMEP) MSC-W model (available at
The authors declare that they have no conflict of interest.
This work was supported by funding from the Natural Environment Research
Council and Medical Research Council Environmental Exposure and Human Health
Initiative (EEHI) grants NE/I007865/1, NE/I007938/1, and NE/I008063/1. The
EMEP4UK model is also supported by the UK Department for the Environment,
Food and Rural Affairs (Defra) and the NERC Centre for Ecology & Hydrology
(CEH). We acknowledge access to the AURN measurement data, which were
obtained from