Ozone air quality simulations with WRF-Chem (v3.5.1) over Europe: model evaluation and chemical mechanism comparison

We present an evaluation of the online regional model WRF-Chem over Europe with a focus on ground-level ozone (O3) and nitrogen oxides (NOx). The model performance is evaluated for two chemical mechanisms, MOZART-4 and RADM2, for year-long simulations. Model-predicted surface meteorological variables (e.g., temperature, wind speed and direction) compared well overall with surface-based observations, consistent with other WRF studies. WRF-Chem simulations employing MOZART-4 as well as RADM2 chemistry were found to reproduce the observed spatial variability in surface ozone over Europe. However, the absolute O3 concentrations predicted by the two chemical mechanisms were found to be quite different, with MOZART-4 predicting O3 concentrations up to 20 µg m−3 greater than RADM2 in summer. Compared to observations, MOZART-4 chemistry overpredicted O3 concentrations for most of Europe in the summer and fall, with a summertime domain-wide mean bias of +10 µg m−3 against observations from the AirBase network. In contrast, RADM2 chemistry generally led to an underestimation of O3 over the European domain in all seasons. We found that the use of the MOZART-4 mechanism, evaluated here for the first time for a European domain, led to lower absolute biases than RADM2 when compared to ground-based observations. The two mechanisms show relatively similar behavior for NOx, with both MOZART-4 and RADM2 resulting in a slight underestimation of NOx compared to surface observations. Further investigation of the differences between the two mechanisms revealed that the net midday photochemical production rate of O3 in summer is higher for MOZART-4 than for RADM2 for most of the domain. The largest differences in O3 production can be seen over Germany, where net O3 production in MOZART-4 is seen to be higher than in RADM2 by 1.8 ppb h−1 (3.6 µg m−3 h−1) or more. We also show that while the two mechanisms exhibit similar NOx sensitivity, RADM2 is approximately twice as sensitive to increases in anthropogenic VOC emissions as MOZART-4. Additionally, we found that differences in reaction rate coefficients for inorganic gas-phase chemistry in MOZART-4 vs. RADM2 accounted for a difference of 8 µg m−3, or 40 % of the summertime difference in O3 predicted by the two mechanisms. Differences in deposition and photolysis schemes explained smaller differences in O3. Our results highlight the strong dependence of modeled surface O3 over Europe on the choice of gas-phase chemical mechanism, which we discuss in the context of overall uncertainties in prediction of ground-level O3 and its associated health impacts (via the health-related metrics MDA8 and SOMO35).


Published by Copernicus Publications on behalf of the European Geosciences Union.

Introduction
Tropospheric ozone (O3) is an air pollutant with adverse effects on human and ecosystem health, as well as a short-lived climate forcer with a significant warming effect (e.g., Monks et al., 2015; Stevenson et al., 2013; WHO, 2003). In Europe, ozone pollution remains a problem: the European Environment Agency reports that between 2010 and 2012, 98 % of Europe's urban population was exposed to O3 levels in exceedance of the WHO air quality guideline (EEA, 2014), leading to more than 6000 premature deaths annually (Lelieveld et al., 2015). This is despite the fact that European emissions of ozone precursors, in particular nitrogen oxides (NOx) and volatile organic compounds (VOCs), have decreased significantly since 1990. The persistence of unhealthy levels of ozone in Europe can be attributed to increases in hemispheric background ozone (Wilson et al., 2012) as well as the nonlinear relationship between O3 and levels of the precursor species NOx and VOCs (EEA, 2014).
Air quality models are employed to understand the drivers of air pollution at a regional scale and to evaluate the roles of and interactions between emissions, meteorology and chemistry. These models fall into two broad categories: offline chemistry transport models (CTMs), in which meteorology is calculated separately from model chemistry, and online models, the category to which WRF-Chem belongs, in which the meteorology and chemistry are coupled, meaning they are solved together in a physically consistent manner (e.g., Zhang, 2008). The meteorology and chemistry components in WRF-Chem use the same horizontal and vertical grids and the same time step, eliminating the need for temporal interpolation (e.g., Grell et al., 2004, 2005).
Air quality modeling studies over the European region have predominantly utilized CTMs, examples of which include EMEP (Simpson et al., 2012), CHIMERE (Terrenoire et al., 2015), and LOTOS-EUROS (Schaap et al., 2008). The application of online coupled regional meteorology-chemistry models in Europe, among them WRF-Chem, has been recently reviewed by Baklanov et al. (2014). The use of WRF-Chem over Europe has increased in recent years (e.g., Forkel et al., 2012; Žabkar et al., 2015; Solazzo et al., 2012a, b; Tuccella et al., 2012; Zhang et al., 2013a, b). However, only a limited number of these studies are dedicated to the evaluation of WRF-Chem-simulated meteorology and chemistry over the whole European domain. The study of Tuccella et al. (2012) evaluated the performance of WRF-Chem using the RADM2 chemical mechanism by comparing domain-wide average values against observations of meteorology and chemistry. However, an evaluation of the spatial distribution of model-simulated meteorology and trace gases is missing. This type of spatial information is extremely pertinent for air quality management applications, where model performance at a national scale can become more relevant than performance metrics applied to all of Europe; this information gets lost when only comparing quantities that have been averaged over the entire domain. Additionally, Tuccella et al. (2012) utilized time-invariant chemical boundary conditions, which the authors suggested misrepresented the seasonal changes in intercontinental transport. The importance of temporally varying chemical boundary conditions in air quality modeling has also been stressed in other studies (including Akritidis et al., 2013; Andersson et al., 2015). In addition to the study of Tuccella et al. (2012), Zhang et al. (2013b) evaluated the performance of WRF-Chem-MADRID (Zhang et al., 2010), an unofficial version of WRF-Chem coupled to the Model of Aerosol Dynamics, Reaction, Ionization, and Dissolution (MADRID), over Europe for the month of July 2001, employing the gas-phase mechanism CB05 (Yarwood et al., 2005). This detailed study provides a valuable reference for comparison to the present work, but their simulations cover only 1 month rather than the complete seasonal cycle.
Several groups contributed WRF-Chem simulations to the AQMEII project (phase 1 and phase 2) for the European domain (Solazzo et al., 2012b; Im et al., 2015). In AQMEII phase 1, two different WRF-Chem simulations were part of the model ensemble for Europe, but evaluation of model performance for ozone focused on evaluation of the ensemble (Solazzo et al., 2012b), rather than on individual members. In fact, in the analysis of Solazzo et al. (2012b), individual models were anonymized, meaning the performance statistics for the WRF-Chem ensemble members are not explicitly presented. The evaluation of model performance with respect to ozone in AQMEII phase 2 (Im et al., 2015) provides more information on the model performance of the contributing WRF-Chem ensemble members for the European domain. In AQMEII phase 2, seven different WRF-Chem runs were part of the ensemble. Of these seven simulations, four used the gas-phase chemical mechanism RADM2 (Stockwell et al., 1990), two used the mechanism CBMZ (Zaveri and Peters, 1999), and one used the mechanism RACM (Stockwell et al., 1997; Geiger et al., 2003). All WRF-Chem simulations for Europe in AQMEII phase 2 tended to underestimate ozone concentrations, with annual average normalized mean bias ranging from −1.6 to −15.8 %, depending on the ensemble member.
The purpose of the present study is to perform a detailed evaluation of meteorology and gas-phase chemistry simulated by WRF-Chem, including the spatial and seasonal variations over a full-year seasonal cycle using time-varying chemical boundary conditions. This evaluation is performed for two different gas-phase chemical mechanisms within WRF-Chem: MOZART-4 (Emmons et al., 2010) and RADM2 (Stockwell et al., 1990). As discussed above, the RADM2 mechanism has been widely used in WRF-Chem for simulations over Europe (Tuccella et al., 2012; Im et al., 2015). The MOZART-4 chemical mechanism has been widely used with WRF-Chem for regional air quality applications outside of Europe (e.g., Pfister et al., 2013; Im et al., 2015). To the authors' knowledge, however, WRF-Chem with MOZART-4 has not yet been applied and evaluated over a European domain.
The simultaneous evaluation of WRF-Chem with two different chemical mechanisms further allows us to evaluate the sensitivity of O3 and NOx to the choice of chemical mechanism in a setup where the differences in model physics and other parameters are minimized. This is in contrast to the study of Im et al. (2015), where the various WRF-Chem ensemble members also used different schemes for model physics. Coates and Butler (2015) recently investigated the sensitivity of the production of odd oxygen (Ox, a proxy for production of O3) to the choice of chemical mechanism using a box model, and found that the choice of chemical mechanism led to differences in O3 concentrations on the order of 10 ppb under idealized conditions, although differences between the MOZART-4 and RADM2 chemical mechanisms tended to be closer to 5 ppb. In another box model study, Knote et al. (2015) investigated the sensitivity of O3, NOx, and other radicals to the different gas-phase chemical mechanisms used in the models that contributed to the AQMEII phase 2 intercomparison project. Knote et al. (2015) found that the choice of chemical mechanism is responsible for a 5 % uncertainty in predicted O3 concentrations and a 25 % uncertainty in predicted NOx concentrations.
The present study builds on the work of Coates and Butler (2015) and Knote et al. (2015) by comparing two chemical mechanisms within an online coupled regional air quality model. The use of WRF-Chem provides an advantage in that it is compatible with multiple chemical mechanisms, allowing us to test the effect of different chemistry with minimal confounding factors due to differences in model physics, etc. Furthermore, the use of an online regional model rather than a box model allows us to examine the sensitivity of model-predicted concentrations to the choice of chemical mechanism under more realistic conditions, in which variations in meteorology and dynamics are fully included. Parameters such as radiation are allowed to vary realistically, and different chemical regimes (NOx vs. VOC limited) are present (e.g., in different seasons and in different parts of the model domain).
Chemical mechanism comparisons have also been undertaken previously using 3-D regional air quality models, though the majority have focused on comparing the SAPRC-99 mechanism (Carter, 1990) with versions of the Carbon Bond mechanism (Gery et al., 1989) over a US domain (Luecken et al., 2008; Faraji et al., 2008; Yarwood et al., 2003; Zhang et al., 2012). Two additional studies have compared versions of the RACM mechanism with RADM2 (Mallet and Sportisse, 2006) and CB05 (Kim et al., 2010) using the model Polyphemus (Mallet et al., 2007) for a European domain. Typically, these studies found that simulations using two different chemical mechanisms led to differences in O3 on the order of 5-10 ppb (Luecken et al., 2008; Zhang et al., 2012; Mallet and Sportisse, 2006; Kim et al., 2010), although extreme differences of 30-40 ppb were observed between the SAPRC-99 and CB-IV mechanisms when simulating high ozone episodes (Faraji et al., 2008; Yarwood et al., 2003).
In this paper, the model configuration, including emissions and initial and boundary conditions, is described in Sect. 2. A description of observational datasets for meteorology and chemistry and the evaluation methodology is provided in Sect. 3. Results for the model evaluation and intercomparison of the two chemical mechanisms are presented in Sect. 4, followed by a summary and concluding remarks in Sect. 5.
We defined our simulation domain on the Lambert projection. The model domain is centered at 15° E, 52° N, and covers nearly the entire European region. The horizontal resolution is chosen to be 45 km × 45 km. The model domain has 115 and 100 grid points in the west-east and south-north directions, respectively.
We have used 35 vertical levels in the model, extending from the surface to 10 hPa. The lowest model level corresponds to an approximate altitude of 50 m above the surface. Tests have shown that surface layer concentrations in this configuration are effectively the same as when the lowest model level is at a height of 14 m, but with no urban surface physics scheme (the urban physics scheme is incompatible with a 14 m model level). Geographical data including terrain height, soil properties, albedo, etc. are interpolated primarily from USGS (United States Geological Survey) data (Wang et al., 2014) at 30 arcsec resolution. The land use classification has been interpolated from the CORINE data (EEA, 2012) at 250 m resolution, which was then mapped to the USGS land use classes used by WRF (see Kuik et al., 2016).
Model simulations are conducted for the period of 23 December 2006 to 31 December 2007. The first week of output was treated as model spin-up and has been discarded. The instantaneous model output, stored every hour, has been used for the analysis. The different options used in this study to parameterize the atmospheric processes are listed in Table 1. A namelist is available in the Supplement.
Table 1 (excerpt).
[...] (Kusaka and Kimura, 2004)
Planetary boundary layer: Yonsei University scheme (Hong et al., 2006)
Cumulus parameterization: Grell 3-D scheme (Grell and Dévényi, 2002)

The initial and lateral boundary conditions for the meteorological fields were provided from the ERA-Interim reanalysis dataset available from ECMWF (http://www.ecmwf.int/en/research/climate-reanalysis/era-interim). These data are available every 6 h with a spatial resolution of approximately 80 km (T255 spectral). In order to limit the errors in the WRF-simulated meteorology, four-dimensional data assimilation (FDDA) has been applied. In the FDDA, temperature is nudged at all vertical levels with a nudging coefficient of 0.0003 s−1. The horizontal winds are nudged at all vertical levels, except within the planetary boundary layer (PBL), with the same nudging coefficient of 0.0003 s−1. Sensitivity studies showed that nudging of water vapor strongly suppressed precipitation over Europe in a manner inconsistent with observations. As such, water vapor is not nudged in our simulations. This also follows the approach of, e.g., Miguez-Macho et al. (2004) and Stegehuis et al. (2015). The nudging coefficients for temperature and winds have been chosen following previous studies (Stauffer et al., 1991; Liu et al., 2012). The time step for the simulations has been set at 180 s.
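To illustrate what a nudging coefficient of 0.0003 s−1 implies at a 180 s time step, the following minimal sketch applies Newtonian relaxation to a single scalar value. It is not WRF's FDDA code: the helper `nudge`, the explicit Euler update, and the example temperatures are assumptions made purely for illustration.

```python
G = 0.0003   # nudging coefficient from the text (s^-1)
DT = 180.0   # model time step from the text (s)


def nudge(value, target, g=G, dt=DT):
    """One explicit Euler step of dX/dt = g * (target - X) (hypothetical helper)."""
    return value + dt * g * (target - value)


# e-folding time of the relaxation, 1 / G, expressed in minutes
relaxation_time_min = 1.0 / G / 60.0

# Hypothetical example: model temperature starts 2 K warmer than the analysis
t_model, t_analysis = 287.0, 285.0
steps_per_hour = int(3600 / DT)  # 20 steps of 180 s
for _ in range(steps_per_hour):
    t_model = nudge(t_model, t_analysis)

# Difference that remains after one hour of nudging alone
remaining_error = t_model - t_analysis
```

With these values the relaxation timescale 1/G is a little under one hour, so the nudging term draws the model toward the reanalysis gradually rather than overwriting the simulated state at every step.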

Emissions
Anthropogenic emissions of CO, NOx, SO2, NMVOCs, PM10, PM2.5, and NH3 are used from the TNO-MACC II emission inventory for Europe (Kuenen et al., 2014) for the year 2007. These emissions are provided as yearly totals by source sector on a high-resolution (7 km × 7 km) grid. The TNO-MACC II emission inventory is based on emissions reported by member countries to the European Monitoring and Evaluation Program (EMEP), which are then further refined to fill gaps and correct errors and obvious inconsistencies. Emissions are temporally disaggregated based on seasonal, weekly and diurnal cycles provided by Denier van der Gon et al. (2011) and Schaap et al. (2005). These temporal profiles vary by source sector according to the SNAP (selected nomenclature for sources of air pollution) convention. NMVOC emissions are split into modeled NMVOC species (e.g., ethane, aldehydes) based on von Schneidemesser et al. (2016). NOx is emitted as 90 % NO and 10 % NO2 by mole. Emissions are distributed into the first seven model vertical layers (the surface and the first six model layers above the surface) based on sectoral averages from Bieser et al. (2011), although model runs showed little sensitivity to the distribution of emissions above the surface layer.
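The temporal disaggregation and NOx speciation described above can be sketched as follows. This is a simplified illustration, not the preprocessing code used for the simulations: the function names, the multiplicative form of the temporal profiles, and all factor values are hypothetical; only the 90 %/10 % molar NO/NO2 split is taken from the text.

```python
HOURS_PER_YEAR = 8760.0  # non-leap year such as 2007


def hourly_emission(annual_total, month_factor, weekday_factor, hour_factor):
    """Scale the mean hourly rate by dimensionless temporal factors.

    Each factor is assumed to average 1.0 over its own cycle, so that the
    annual total is conserved. The factor values are hypothetical.
    """
    return annual_total / HOURS_PER_YEAR * month_factor * weekday_factor * hour_factor


def split_nox(nox_moles):
    """Split NOx into NO and NO2 using the 90 % / 10 % molar split from the text."""
    return {"NO": 0.9 * nox_moles, "NO2": 0.1 * nox_moles}


# Hypothetical example: 8760 mol/yr of NOx from one grid cell and sector,
# evaluated for a winter weekday rush hour with made-up factors.
rate = hourly_emission(8760.0, month_factor=1.2, weekday_factor=1.1, hour_factor=1.5)
speciated = split_nox(rate)
```

In practice each SNAP sector would carry its own set of monthly, weekly and diurnal factors, so the same annual total can yield quite different hourly rates depending on the sector.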
The model domain used in this study is larger than the European domain used in the TNO-MACC II inventory (Kuenen et al., 2014). Emissions at our domain edges were filled using the Hemispheric Transport of Air Pollution (HTAP v2.2) emission inventory for the year 2008 (http://edgar.jrc.ec.europa.eu/htap_v2/index.php). The HTAP v2 data, described in detail by Janssens-Maenhout et al. (2015), are harmonized at a spatial resolution of 0.1° × 0.1° and available with monthly time resolution. In our model simulations, no additional weekly or diurnal profiles were applied to the HTAP v2 emissions. Furthermore, all emissions from HTAP were emitted into the surface model layer. Because HTAP emissions were only used at the grid edge, the differences in temporal and vertical resolution of emissions used for HTAP are not expected to have a significant impact on model results. An example of emissions processed for model input is shown in Fig. S1 in the Supplement. Biomass burning emissions are from the Fire INventory from NCAR (FINN), version 1 (Wiedinmyer et al., 2011). To avoid the double counting of emissions from agricultural burning (i.e., assuming that the FINN product captures large-scale agricultural burning), emissions of the combustion species CO, NOx, and SO2 from SNAP category 10 (agriculture) in the TNO-MACC II inventory were not included in model simulations, at the suggestion of H. A. C. van der Gon (personal communication, 2015). Biogenic emissions are calculated online based on weather and land use data using the Model of Emissions of Gases and Aerosols from Nature (MEGAN) (Guenther et al., 2006).

Model chemistry
The two year-long WRF-Chem simulations performed for this study are summarized in Table 2. In the MOZART simulation, gas-phase chemistry is represented by the Model for Ozone and Related chemical Tracers, version 4 (MOZART-4) mechanism (Emmons et al., 2010). Tropospheric chemistry is represented by 81 chemical species, which participate in 38 photolysis and 159 gas-phase reactions. The MOZART-4 mechanism includes explicit representation of the NMVOCs ethane, propane, ethene, propene, methanol, isoprene, and α-pinene. Other NMVOC species are represented by lumped species based on the reactive functional groups. In the WRFV3.5.1 code, two bug fixes have been included for the MOZART-4 mechanism: the NH3 + OH rate coefficient has been corrected following Knote et al. (2015), and a correction has been made to the treatment of the vertical mixing of MOZART-4 species (A. K. Peterson, personal communication, 2014). In the WRF-Chem simulations, we use the version of MOZART-4 coupled to the simple GOCART aerosol mechanism (Ackermann et al., 1998), known as the MOZCART mechanism. In this paper, we limit our analysis to gas-phase species. Because of this focus, and to simplify the interpretation of the mechanism intercomparison (see below), all aerosol radiative feedbacks (i.e., both direct and indirect effects) are turned off in all model simulations in this study.
In the RADM2 simulation, gas-phase chemistry is represented by the second-generation Regional Acid Deposition Model (RADM2) (Stockwell et al., 1990). This mechanism has 63 chemical species which participate in 21 photolysis and 136 gas-phase reactions. NMVOC oxidation is treated less explicitly in RADM2 than in MOZART-4: ethane, ethene, and isoprene are the only species treated explicitly in RADM2, and all other NMVOCs are assigned to lumped species based on OH reactivity and molecular weight. In WRF-Chem, RADM2 is coupled to the MADE/SORGAM aerosol module, which is based on the Modal Aerosol Dynamics Model for Europe (MADE) (Binkowski and Shankar, 1995; Ackermann et al., 1998) and the Secondary Organic Aerosol Model (SORGAM) (Schell et al., 2001). However, as noted above, in this study we focus our analysis on gas-phase chemistry.
In both the RADM2 and MOZART simulations, the chemical mechanism code was generated with the kinetic preprocessor (KPP) (Damian et al., 2002; Sandu and Sander, 2006), and equations are solved using a Rosenbrock-type solver. Note that when using RADM2 chemistry, there are two different solvers available within WRF-Chem. We chose to use the KPP chemistry and Rosenbrock solver to be consistent with the MOZART runs, and also because the alternative QSSA chemistry solver has been shown to have problems representing NOx titration (Forkel et al., 2015). In particular, the QSSA treatment of RADM2 chemistry was found to result in an underrepresentation of nocturnal ozone titration for areas with high NO emissions.

Observational datasets
A summary of the observational datasets used for model evaluation can be found in Table 3.

Meteorology
Since WRF-Chem couples the meteorology simulations online with the chemistry, we begin by evaluating the modeled meteorological fields, which drive the simulations of the chemical fields, against observations. In this study, the WRF-Chem-simulated meteorological fields are evaluated against in situ measurements of mean sea-level pressure (MSLP), 2 m temperature (T2) and 10 m wind speed and direction (WS10 and WD10, respectively) from the Global Weather Observation dataset provided by the British Atmospheric Data Centre (BADC). We chose these meteorological variables for the evaluation as they are expected to have the most significant influence on the gas-phase chemistry, which is the main focus of this study.

EMEP network
The EMEP observational dataset provides surface measurements of pollutant concentrations, including tropospheric ozone and its precursors, at stations chosen to be representative of regional background pollution (see, e.g., Tørseth et al., 2012). The regional focus is in keeping with the goals of the Convention on Long-range Transboundary Air Pollution (CLRTAP), under which this network is administered.

AirBase network
AirBase is the public air quality database of the European Environment Agency (EEA), and represents a much denser monitoring network than the EMEP network (http://www.eea.europa.eu/data-and-maps/data/airbase-the-european-air-quality-database-7). Because of the relatively coarse horizontal resolution in this model study, model output is only compared against AirBase stations that are classified as "rural background". The station classification was taken from the metadata provided by the EEA for AirBase. Some AirBase stations are also part of the EMEP network; the subset of AirBase stations used in this study excludes any stations that are also part of the EMEP network (since they are already included in the evaluation against EMEP observations).

Evaluation methodology
Stations were excluded from our season-by-season analysis if the temporal coverage was less than 75 %, i.e., if missing or flagged hourly (or 3-hourly) data represented more than 25 % of the hourly (or 3-hourly) time series over the entire season. For sensitivity studies that consider the month of July only, stations were considered that had at least 75 % temporal coverage for the month. This criterion was applied to all meteorological and chemistry observations. For comparison of model output to in situ observations, the model grid cell closest to the latitude/longitude location of the measurement station was chosen. Statistics calculated include the mean, mean bias (MB), normalized mean bias (NMB), mean fractional bias (MFB), and the temporal correlation coefficient (r). The domain-wide statistics presented in Tables 4-9 were calculated by first calculating the statistical quantity hour by hour at each station, and then averaging these values over all times (in the season) and all stations. Definitions of the calculated statistical quantities can be found in Appendix B. Wind direction was treated as a scalar quantity when applying these statistics, although it is in fact a vector. This simple approach was favored over applying a correction (as done by, e.g., Zhang et al., 2013a, in cases where the difference in modeled vs. observed wind direction was greater than 180°). This is not expected to make an important impact on our analysis, especially since northerly winds (i.e., centered around 0° [...]

From hourly concentrations of O3, both observed and modeled, additional ozone metrics for health impacts are calculated. MDA8 is defined as the maximum daily 8 h mean ozone, in accordance with the European Union's Air Quality Directive. Note that for calculation of MDA8, a missing value was assigned if 1 or more hours of data in the 8 h average were missing. SOMO35 is an indicator of cumulative annual ozone exposure used in health impact assessments. The accumulated health impact is assumed to be proportional to the sum of concentrations above a cutoff of 35 ppb, chosen because the relationship between O3 and adverse effects is very uncertain below this threshold (WHO, 2013). Mathematically, SOMO35 is defined as the sum of MDA8 levels over 35 ppb (70 µg m−3) over a year, in units of concentration multiplied by days, following Amann et al. (2008).
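A form consistent with this verbal definition can be written as below. The scaling factor 365/N_valid, which would compensate for days with missing data, is an assumption made here for illustration and should be checked against Amann et al. (2008); MDA8_i denotes the MDA8 value on valid day i.

```latex
\mathrm{SOMO35} = \frac{365}{N_\mathrm{valid}}
  \sum_{i=1}^{N_\mathrm{valid}}
  \max\left(0,\; \mathrm{MDA8}_i - 70\ \mu\mathrm{g\,m^{-3}}\right)
```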
where N_valid is the number of valid (i.e., not missing) daily values.

Results and discussion

Evaluation of meteorology
Table 4 shows a summary of domain-wide statistics evaluating the MOZART model simulation against observations of the meteorological variables MSLP, T2, WS10, and WD10; the spatial distribution of these statistics is shown in Figs. 1-3 for temperature and wind variables. Differences in predicted meteorology between the MOZART and RADM2 simulations are small, with differences in MSLP less than one hundredth of 1 %, and differences in T2, WS10, and WD10 generally far below 1 %. Since the simulations were run without aerosol-radiative feedbacks, it was expected that the two simulations would show minimal differences in meteorology, and we conclude that differences in O3 and NOx predicted in the MOZART and RADM2 simulations (Sect. 4.2) are a direct result of differences in the chemistry, rather than chemistry-radiative feedbacks. Statistics for meteorology for the RADM2 simulation can be found in the Supplement, Table S1, and Figs. S4-S7. MSLP has been reproduced over the entire European domain with a high degree of skill in every season for both simulations [...]

The spatial distribution of seasonal average T2 in the model and observations is shown in Fig. 1, along with the spatial variation in mean bias and temporal (3-hourly) correlation. Overall, the spatial variability in measured T2 is found to be well reproduced by WRF-Chem during all the seasons. The absolute values of mean biases in T2 were generally found to be lower than 1 °C. Larger biases in T2 can be found in the Alps, in particular during winter, where T2 is often overpredicted by more than 1 °C (Fig. 1). This larger bias over mountainous regions, also found in a previous study (Zhang et al., 2013a), is likely due to the complex mountain terrain and related unresolved local dynamics. The r values are generally found to be more than 0.9 in all the seasons and show no significant geographical variation, indicating that the model is able to reproduce the temporal variations in near-surface temperature. Averaged over the entire domain, the mean bias in T2 varies from −0.4 to +0.3 °C depending on the season (Table 4).
The spatial variability in wind speeds, including the seasonality, with strongest winds during the winter, has been reproduced by the model (Fig. 2). However, the model tends to overestimate wind speeds, with larger biases (2 m s−1 or more) during the winter and fall. The regions showing greater bias in wind speed include the Alps, coastal regions, and the low-lying areas of northern Germany and Denmark (Fig. 2). The temporal correlation of wind speed is generally above 0.7 in the northern half of the domain, but is lower (0.4-0.6) in the southern part of the domain, in areas in the Alps and close to the Mediterranean (Fig. 2). Similar behavior for modeled wind speed is reported by Zhang et al. (2013a), who attributed the overestimation in wind speeds primarily to poor representation of surface drag exerted by unresolved topographical features, which results in model limitations in simulating circulation systems, such as sea breeze and bay breeze. An overview of the statistics for wind direction is presented in Table 4, with the spatial distribution shown in Fig. 3. Wind direction over the continent is predominantly from the west and south, and the mean bias in wind direction is between 20 and 30° depending on the season. Similar to the patterns seen for wind speed, areas with complex topography (the Alps, the Balkans, the Mediterranean coast) show the largest biases and the lowest correlations for wind direction.
Overall, we find that WRF-Chem is capable of reproducing the spatial and temporal variations in the European meteorological conditions reasonably well, in a manner consistent with previous studies (e.g., Zhang et al., 2013a).
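The bias and correlation statistics used in this evaluation can be sketched for a single station time series as follows. The paper's own definitions are given in its Appendix B, which is not reproduced here, so the formulas below are the commonly used forms and should be read as assumptions; the function names and sample values are hypothetical. The last helper illustrates the wrap-around correction for wind direction that, as noted in the methodology, was not applied in this study.

```python
import math


def mean_bias(model, obs):
    """MB: mean of (model - observation)."""
    return sum(m - o for m, o in zip(model, obs)) / len(obs)


def normalized_mean_bias(model, obs):
    """NMB: summed bias divided by summed observations (as a fraction)."""
    return sum(m - o for m, o in zip(model, obs)) / sum(obs)


def mean_fractional_bias(model, obs):
    """MFB: mean of 2 * (model - obs) / (model + obs) (as a fraction)."""
    return sum(2.0 * (m - o) / (m + o) for m, o in zip(model, obs)) / len(obs)


def correlation(model, obs):
    """Pearson correlation coefficient r."""
    mm, om = sum(model) / len(model), sum(obs) / len(obs)
    cov = sum((m - mm) * (o - om) for m, o in zip(model, obs))
    var_m = sum((m - mm) ** 2 for m in model)
    var_o = sum((o - om) ** 2 for o in obs)
    return cov / math.sqrt(var_m * var_o)


def wind_direction_difference(model_deg, obs_deg):
    """Signed difference wrapped to [-180, 180) degrees."""
    return (model_deg - obs_deg + 180.0) % 360.0 - 180.0


# Hypothetical 2 m temperatures (deg C) at one station
obs = [10.0, 12.0, 14.0, 12.0]
mod = [11.0, 12.0, 15.0, 14.0]
mb = mean_bias(mod, obs)
nmb = normalized_mean_bias(mod, obs)
r = correlation(mod, obs)

# Modeled 350 deg vs. observed 10 deg: scalar vs. wrapped difference
scalar = 350.0 - 10.0
wrapped = wind_direction_difference(350.0, 10.0)
```

The wind direction example shows why the scalar treatment can inflate biases for near-northerly winds: two directions only 20° apart appear to differ by 340° when subtracted directly.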

Ozone
We begin the evaluation of chemistry by examining the seasonal average surface O3 distribution over Europe from the MOZART simulation, as shown in Fig. 4. Predicted surface O3 distributions show a clear seasonality, with maximum concentrations during summer. In all seasons, surface O3 concentrations are highest over the Mediterranean region, with values during the spring and summer greater than 110 µg m−3. Simulated concentrations reproduce the north-south gradient in O3 seen in the ground-based observations. Figure 5 provides another comparison of seasonal average O3 distributions in the model vs. the observations (from both the AirBase and EMEP networks) and additionally shows the spatial distribution of MB and r, the temporal (hourly) correlation coefficient; performance statistics are shown in Table 5 (against observations from the AirBase network) and Table 6 (against observations from the EMEP network). MOZART overpredicts O3 concentrations for most of Europe in the summer and fall. In winter and spring, MOZART tends to underestimate O3 in north-central Europe, but overestimate O3 in southern Europe. Hourly correlation coefficients for O3 are highest (greater than 0.6) in northern Europe (especially France, Germany, and the Benelux region) and in Spain, but are lower (with values of approximately 0.4) throughout Italy and the mountainous regions of the Alps. Notably, Italy and the Alps are the regions within our domain that exhibit the highest biases and lowest correlations with respect to wind direction and speed (Sect. 4.1), which could explain the poorer temporal correlation for O3 in these areas.
Looking at Tables 5 and 6, we see some differences in the statistical performance of the MOZART simulation when compared to the EMEP vs. the AirBase observational datasets. Considering the EMEP observations over the whole domain (Table 6), MOZART slightly overpredicts O3 in summer, with a summertime mean bias of 4 µg m−3, whereas the summertime mean bias when compared to the AirBase network is 10 µg m−3 (Table 5). In winter and spring, the bias (MB, NMB, and MFB) in MOZART-predicted O3 is more negative when compared to EMEP observations than to AirBase observations. In fall, the sign of the domain-average bias changes depending on whether model performance is considered against EMEP or AirBase observations. These differences likely reflect differences in the character of the two observational networks. First, we expect that the AirBase rural background sites considered here may be, on average, more influenced by local pollution sources than the EMEP sites, which are selected to be representative of more remote regional background. Second, the geographical coverage of AirBase vs. EMEP sites for O3 is slightly different (Fig. S8). In particular, coverage of the UK and the Nordic countries is almost exclusively via the EMEP network, potentially giving the EMEP observations a northern bias in comparison to the AirBase-only sites. Both features of the measurement networks could explain the lower values of the domain-wide average O3 observed at the EMEP vs. the AirBase stations.

www.geosci-model-dev.net/9/3699/2016/ Geosci. Model Dev., 9, 3699-3728, 2016
In addition to evaluating the model's ability to simulate hourly O3 concentrations, we also consider MDA8 and SOMO35, two metrics designed to evaluate the impact of ozone on health. The distribution of seasonal average values of MDA8 is shown in Fig. 6 for the MOZART simulation. The European Union's Air Quality Directive states that, as a long-term objective, MDA8 should not exceed the threshold value of 120 µg m−3; as a target value, this threshold should not be exceeded on more than 25 days per year, averaged over 3 years. Figure 6 shows that, at some stations in the Alps and in southern Italy during summer, the average value of MDA8 exceeds 120 µg m−3. As seen in Fig. 7, the number of days on which MDA8 exceeds 120 µg m−3 is greater than 25 in spring alone for much of southern Europe, which is also captured well by the MOZART simulation. MOZART tends to overpredict MDA8 and the number of days in exceedance of the target value in summer and fall, consistent with the overestimation of hourly average O3 during these seasons. Since the metric MDA8 is, in effect, a measure of daytime ozone, it is always higher than the straight average of hourly concentrations. As a consequence, MOZART shows greater bias in MDA8 than in average O3 in seasons where average O3 is already overpredicted (Tables 5 and 6). In general, regional and seasonal patterns for MDA8 simulated by MOZART are similar to those for simulated average O3. SOMO35, an indicator for cumulative annual exposure, is shown in Fig. 8 for the year 2007. MOZART is able to reproduce the north-south gradient of SOMO35 seen in the observations quite well, while overpredicting the magnitude of SOMO35 by 2 mg m−3 multiplied by days (Table 7).
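The health metrics discussed above can be sketched as follows. This is a simplified illustration, not the processing used in this study: it assumes that MDA8 is the maximum of the 8 h running means lying entirely within one calendar day (regulatory definitions also include windows starting on the previous evening), and that SOMO35 is the annual sum of the daily MDA8 excess over 35 ppb, taken here as 70 µg m−3. The function names are hypothetical.

```python
def mda8(hourly):
    """Maximum daily 8 h running mean from 24 hourly values of one day."""
    if len(hourly) != 24:
        raise ValueError("expected 24 hourly values")
    # Simplification: only windows fully inside the day are considered.
    running_means = [sum(hourly[i:i + 8]) / 8.0 for i in range(24 - 8 + 1)]
    return max(running_means)


def exceedance_days(daily_mda8, threshold=120.0):
    """Number of days on which MDA8 exceeds the threshold (ug m-3)."""
    return sum(1 for value in daily_mda8 if value > threshold)


def somo35(daily_mda8, cutoff=70.0):
    """Sum of the daily MDA8 excess over the cutoff, in ug m-3 days.

    The cutoff of 70 ug m-3 is an assumed equivalent of 35 ppb.
    Divide the result by 1000 to obtain mg m-3 days.
    """
    return sum(max(0.0, value - cutoff) for value in daily_mda8)


# Hypothetical day with an 8 h afternoon plateau of 130 ug m-3.
day = [40.0] * 10 + [130.0] * 8 + [40.0] * 6
print(mda8(day))
```

Because SOMO35 accumulates only the excess over the cutoff, a persistent positive bias in MDA8 translates into a disproportionately large bias in SOMO35, which is one reason the two metrics are evaluated separately.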
WRF-Chem simulations using the RADM2 chemical mechanism show a spatial and seasonal distribution of surface O3 over Europe (Figs. 9 and 10) that is qualitatively similar to that for MOZART. The correlation coefficients for the MOZART and RADM2 simulations are also similar in both magnitude and distribution (Figs. 5 and 10). Absolute O3 concentrations are most similar (i.e., less than 5 % different) between the mechanisms near the northwest edges of the domain (see Figs. 4 and 9), where the prevailing westerly winds (Supplement, Fig. S2) mean that O3 imported from the boundary conditions plays a dominant role. However, it is striking that the surface O3 concentrations predicted by the two chemical mechanisms are generally quite different, with RADM2 predicting average surface O3 values that are approximately 20 µg m−3 lower than those predicted by MOZART in spring and summer (cf. Figs. 4 and 9, Tables 5 and 8, and 6 and 9). In contrast to MOZART, RADM2 underpredicts O3 throughout most of Europe in all seasons. An exception is southern Europe in winter, where RADM2, like MOZART, shows some overprediction of O3 concentrations, particularly near the Mediterranean. RADM2 also overpredicts O3 near the Mediterranean in fall (a season in which MOZART overpredicts O3 Europe-wide). The general underprediction of O3 concentrations in RADM2 means that the health metrics MDA8 and SOMO35 are also underpredicted (Tables 7-8 and Fig. 8). Overall, absolute biases (i.e., the absolute value of MB, NMB, and MFB) are smaller for MOZART than for RADM2, indicating that MOZART is more successful overall in reproducing European ground-level O3.
Model biases for O3 in both the MOZART and RADM2 simulations are in line with biases found in other regional modeling studies for Europe. For instance, values for the NMB in European summertime O3 ranged from less than −20 to greater than +20 % depending on the ensemble member in AQMEII (Solazzo et al., 2012b; Im et al., 2015), compared to values of −18 and +14 % for the RADM2 and MOZART simulations, respectively, in the present study. Zhang et al. (2013b) [...] served differences in predicted O3, including the use of time-invariant chemical boundary conditions, the use of the QSSA rather than the Rosenbrock chemical solver (which has been shown to make a difference; see Forkel et al., 2015), and the use of an alternate emissions inventory (from EMEP). The temporal correlations with hourly measurements for O3 in this study are also in line with other regional modeling studies of O3 for Europe. Simulations with both chemical mechanisms led to reasonable correlations between the model-predicted and observed O3 concentrations over the entire domain, with r values generally in the range of 0.6-0.8 (Figs. 5 and 10, Tables 5 and 8). This is consistent with the hourly correlation coefficient for O3 of 0.62 reported by Tuccella et al. (2012), where their r value represents an average over the entire year of 2007. Zhang et al. (2013b) also report correlation coefficients of 0.6-0.7 for hourly O3 over the European domain (horizontal resolution 0.5°) using the CB05 gas-phase chemical mechanism in WRF-Chem.
In addition to evaluating the performance of the MOZART and RADM2 simulations on their ability to reproduce ground-level ozone concentrations, we compare the observed sensitivity of modeled O3 to the choice of chemical mechanism to other studies that have investigated the uncertainty in 3-D model predictions associated with the choice of chemical mechanism. Knote et al. (2015) used box model simulations based on AQMEII phase 2, and concluded that the uncertainty in predicted O3 in a 3-D model solely due to the choice of gas-phase chemical mechanism should be of the order of 5 %, or 4 ppbv (8 µg m−3). This is quite a bit smaller than the sensitivity to chemical mechanism found in this study, where we see differences in summertime average O3 of 20 µg m−3, corresponding to a relative difference of approximately 40 %. Coates et al. (2016) have shown that adding representation of stagnant conditions (which were not represented in Knote et al., 2015) to a box model increased the sensitivity of predicted O3 to the chemical mechanism, and also improved model agreement with observations. This result suggests that day-to-day variability in meteorological conditions and transport can enhance the sensitivity of O3 to chemical mechanism compared to what is seen in box models.
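Since this discussion switches between mixing ratios (ppbv) and mass concentrations (µg m−3), a short sketch of the conversion may be helpful. It uses the ideal gas law and assumes reference conditions of 293.15 K and 101 325 Pa; these conditions, the function name, and the O3 molar mass of 47.998 g mol−1 are assumptions made for illustration and are not stated in this section.

```python
R_GAS = 8.314462618  # universal gas constant, J mol-1 K-1


def ppb_to_ugm3(ppb, molar_mass_g_mol, temperature_k=293.15, pressure_pa=101325.0):
    """Convert a mixing ratio in ppb to a mass concentration in ug m-3.

    Assumes ideal gas behavior at the given (assumed) reference conditions.
    """
    molar_density = pressure_pa / (R_GAS * temperature_k)  # mol of air per m3
    # ppb -> mol fraction (1e-9), mol -> g (molar mass), g -> ug (1e6)
    return ppb * 1e-9 * molar_density * molar_mass_g_mol * 1e6


M_O3 = 47.998  # g mol-1
print(round(ppb_to_ugm3(4.0, M_O3), 2))
```

Under these assumptions 1 ppb of O3 corresponds to very nearly 2 µg m−3, consistent with the equivalence of 4 ppbv and 8 µg m−3 used in the text.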
Another interesting basis for comparison is the study of Mallet and Sportisse (2006), who investigated uncertainty in the CTM Polyphemus due to various physical parameterizations, including chemical mechanism (comparing RACM and RADM2), using an ensemble approach. They estimated an overall uncertainty in O3 concentrations of 17 % based on choices for physical parameterizations in general, but identified the choice of chemical mechanism along with the turbulent closure parameterization as the two most important drivers of this uncertainty. Simulations using the RACM vs. RADM2 mechanisms yielded differences in average O3 concentrations of 7-13 µg m−3, depending on the other parameterizations used. It is clear that the sensitivity of O3 to the use of the MOZART vs. RADM2 chemical mechanism in this study is large compared to other studies of mechanism comparisons in 3-D models (see also Luecken et al., 2008; Kim et al., 2010), though even larger absolute differences in hourly O3 concentrations (up to 40 ppb, or 80 µg m−3) have been found in studies of episodic ozone (Faraji et al., 2008; Yarwood et al., 2003). It is possible that MOZART and RADM2 as implemented in this study lie at opposite ends of the spectrum of commonly used mechanisms; the differences between the two mechanisms are explored further in Sect. 4.3.

Nitrogen oxides
Seasonal average surface-level NOx concentrations for the MOZART simulation are shown in Fig. 11. Several hotspots in the spatial distribution of NOx are apparent, as expected based on the intensity of emissions in these areas. NOx hotspots with concentrations of more than 30 µg m−3 are visible over parts of France, Belgium, Germany, and Russia. Similarly high concentrations are also seen over the marine regions close to Barcelona, Monaco, and southern France. As shown in Table 5, the MOZART simulation slightly underpredicts domain-average NOx concentrations for all seasons when compared to AirBase observations. In Figs. 12 and 13 we examine the spatial distribution of NOx broken down into its components, NO2 and NO, together with the spatial distribution of MB and r. The MOZART simulation overestimates NO2 in the UK, northern France, Belgium, and central Germany, all of which are regions known for having high NOx emissions and concentrations. However, this does not hold true for the Netherlands, a neighboring region with high emissions where MOZART tends to underpredict rather than overpredict NO2 concentrations. NO, on the other hand, is significantly underpredicted compared to surface measurements throughout the domain. This may be partially due to the relatively coarse horizontal resolution of the model, in which fresh NO emissions are immediately diluted over a large area, and could also be a consequence of model deficiencies in representing NOx chemical cycles. Artifacts related to reporting of low NO concentrations approaching measurement detection limits could also play a role (observed time series for NO typically show a baseline of 1-2 µg m−3, whereas modeled concentrations reach a baseline of zero).
Domain-average temporal correlation coefficients (r) against hourly measurements of NOx, NO2, and NO (Tables 5 and 6) range from approximately 0.2 to 0.5, which is lower than correlations for O3 but consistent with other studies, discussed further below. In all seasons, the domain-averaged temporal correlation coefficient is higher when compared to EMEP than to AirBase observations. We attribute this to weaker local influences and therefore better regional representativeness of the EMEP stations. No exceptional patterns are seen in the spatial distribution of r for NO2 or NO, although correlation appears slightly better in the northern part of the domain. The MOZART simulation shows the highest domain-average correlation coefficients (r) for NOx, NO2, and NO in winter and fall, and the lowest domain-average r values in summer.
NOx predicted by the RADM2 simulation shows fairly similar behavior to NOx predicted by the MOZART simulation (cf. Figs. 12 and 14 and additional Figs. S10-S11 in the Supplement). In general, simulated NOx concentrations are slightly higher for MOZART than for RADM2. Domain-wide average NOx concentrations predicted by MOZART are approximately 2 µg m−3 higher than for RADM2 in all seasons except winter, where the difference is approximately 3 µg m−3 (cf. Tables 5 and 8). The spatial distribution of MB for NO2 for the RADM2 simulation generally shows the same patterns as observed for the MOZART simulation, namely a slight overestimation in the UK, northern France, Belgium, and central Germany. As for MOZART, NO for RADM2 is underpredicted throughout the domain, with NO concentrations slightly more negatively biased than in MOZART in all seasons except fall, when NO concentrations are higher for RADM2 than for MOZART and show better agreement with the observations. Temporal correlation for NO2 and NO in RADM2 is also found to show similar behavior to the MOZART simulation. An exception to the similarity observed between the mechanisms for NOx can be seen over central Germany in winter, where MB values for NO2 are 6-10 µg m−3 for MOZART (Fig. 12), but in the range of 0-6 µg m−3 for RADM2 (Fig. 14). Differences in NOx concentrations predicted by the MOZART vs. RADM2 simulations are generally less than 20 %, consistent with Knote et al. (2015), who conclude that the choice of chemical mechanism leads to an uncertainty of up to 25 % in 3-D model simulations.
Performance of the present simulations with respect to NO2 can also be compared to previously published studies (note that none of the above-cited studies perform a validation for NO or NOx). Zhang et al. (2013b) report NMB values of approximately −15 % for NO2 for WRF-Chem simulations against hourly AirBase measurements for July 2001, in line with values of −12 and −19 % for the MOZART and RADM2 simulations in this study, respectively. Tuccella et al. (2012) report a MB for NO2 of −0.9 µg m−3 averaged over the whole year; for comparison, the RADM2 simulation in this study shows a MB in the range of −2.5 to −1 µg m−3 for fall, spring, and winter, but a MB of +0.67 µg m−3 in summer compared to AirBase observations. Evaluation of NO2 was not treated in detail in the AQMEII studies, but Im et al. (2015) report that the models for the European domain underestimate NO2 by 9 to 45 %.
To gain insight into model behavior for O3, we added terms to the model output representing hourly accumulated tendencies, i.e., the change in concentration of a species due to photochemistry only, for July simulations using MOZART and RADM2. The hourly net photochemical production rate was calculated as the difference in the accumulated tendency from one time step to another. Figure 15 shows the average of the midday (11:00-14:00 CEST, or 09:00-12:00 UTC) photochemical production rate of O3 and NOx components for both the MOZART and RADM2 simulations. (Note that the net photochemical production rate is shown here in ppb h−1 for more intuitive comparison of production and loss of the different species on a mole basis; µg m−3 was used in Sect. 4.2 because this is the unit in which limit and target values in the EU Air Quality Directive are expressed.) Overall, the spatial variability as well as the magnitudes of net O3 production rates are found to be similar for MOZART-4 and RADM2 chemistry (Fig. 15). For both mechanisms, the greatest midday net O3 production rates are found in southern Europe, particularly over the Mediterranean and Atlantic coasts. The difference in net O3 production rate between the two mechanisms is also shown in Fig. 15. MOZART exhibits greater net O3 photochemical production rates than RADM2 for most of Europe, with the exception of the southeast corner of the domain (Greece, Turkey, and the nearby Mediterranean), where net O3 production rates are greater for RADM2. The difference in net O3 production rate (MOZART − RADM2) shows a large maximum over central Europe, centering over Germany and extending west and east into France and Poland. Over Germany, net O3 production in MOZART is seen to be higher than in RADM2 by 1.8 ppb h−1 or more.
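The differencing step described above can be illustrated with a minimal sketch. It assumes a hypothetical list of accumulated photochemical tendencies (in ppb) for one grid cell, sampled on the hour from 00:00 to 24:00 UTC; the actual WRF-Chem output variables and their names are not specified here, so the inputs and function names are illustrative only.

```python
def hourly_rates(accumulated):
    """Net production rate (ppb h-1) for each hour, from accumulated tendencies.

    accumulated[h] is the accumulated tendency at h:00 UTC, so the returned
    rates[h] applies to the interval from h:00 to (h + 1):00 UTC.
    """
    return [later - earlier for earlier, later in zip(accumulated, accumulated[1:])]


def midday_mean_rate(accumulated, start_utc=9, end_utc=12):
    """Mean net production rate over the midday window (09:00-12:00 UTC)."""
    rates = hourly_rates(accumulated)
    window = rates[start_utc:end_utc]
    return sum(window) / len(window)


# Hypothetical accumulated O3 tendency: the hourly rate equals the hour index.
accumulated_o3 = [sum(range(h)) for h in range(25)]
print(midday_mean_rate(accumulated_o3))
```

Averaging the hourly differences over the window is equivalent to dividing the change in accumulated tendency between 09:00 and 12:00 UTC by three hours, so only the two window endpoints are strictly needed.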
As expected, regions of high NO2 production in both MOZART and RADM2 simulations are seen over the high NOx-emission regions including Benelux, southern England, western Germany, the Po Valley, and major cities including Paris and Moscow. The difference in net NO2 production rate between the two mechanisms is also highest where the absolute NO2 production rates are highest; in these areas the net NO2 production rate is lower for MOZART than for RADM2 by more than 0.25 ppb h−1. Furthermore, areas where the two mechanisms show the greatest differences in net NO2 production rate tend to be the areas where the net O3 production rate is most different between the two mechanisms, including the large maximum over the Netherlands and northwest Germany.
To further investigate the differences between ozone chemistry in MOZART vs. RADM2, we performed two additional sensitivity studies with each mechanism: one in which all anthropogenic NOx emissions were increased by 30 %, and one in which all anthropogenic VOC emissions were increased by 30 %. We then examined the change in O3 concentrations due to these emission perturbations to diagnose whether the chemical mechanisms were operating in a NOx-sensitive or a VOC-sensitive regime. Results are shown in Fig. 16. For the simulations where NOx emissions were increased by 30 %, MOZART and RADM2 show very simi- [...]
An alternate approach to identify areas of NOx-sensitive vs. NOx-saturated regimes is to use indicator ratios (in the base simulation) following Sillman (1995). We have applied this approach with the indicator ratio CH2O / NOy (Fig. S12) and find that areas identified as NOx sensitive using the indicator ratio are the same as those identified using the simulation with +30 % NOx emissions. These results are also consistent with the areas of Europe found to be NOx saturated in the model study of Beekmann and Vautard (2010). Magnitudes of the observed change in O3 in response to increased NOx emissions are quite similar for both mechanisms, although RADM2 shows slightly stronger NOx saturation (i.e., a stronger decrease in O3 given a 30 % increase in NOx emissions) in the area centered around Benelux, and stronger NOx sensitivity over Scandinavia and northwest Russia.
In contrast to the similar behavior seen for NOx sensitivity, the VOC sensitivity exhibited by the two mechanisms is quite different (Fig. 16, lower panel). For both MOZART and RADM2, the effect of increased anthropogenic VOC emissions on O3 is smaller than the effect of increased NOx emissions. The MOZART simulation shows very little impact of increased VOC emissions on O3, with differences in average O3 concentration generally confined to ±2 % of the base simulation. In contrast, increasing VOC emissions in the RADM2 simulations leads to increased O3 concentrations throughout nearly the entire domain. Areas where MOZART and RADM2 are in agreement in predicting VOC sensitivity (increased O3 concentrations in response to increased VOC emissions) are generally those with high NOx emissions, where one would expect the highest VOC sensitivity based on theory; these areas include Benelux, northern France, northwest Germany, and shipping tracks in the Mediterranean. However, the increase in O3 concentration is modest for both mechanisms; for RADM2 it is generally limited to increases of 2-4 % over the base simulation. The results of the +30 % VOC sensitivity studies for July indicate that d[O3]/d[VOC] is higher (more positive) for RADM2 than for MOZART for the chemical regime represented by the models in July 2007. This shows that the two mechanisms are simulating different O3 chemical regimes: in the case of RADM2, there is greater VOC sensitivity, meaning that addition of VOC emissions moves the chemistry in the direction of maximum O3 production efficiency, whereas this is not the case for MOZART over much of the domain. A more extensive study would be needed to evaluate whether the conclusion that d[O3]/d[VOC] is higher for RADM2 than for MOZART can be applied more generally.
Taken as a whole, Fig. 16 shows that MOZART behaves in a classically NOx-sensitive manner for most of the domain, with O3 responding to changes in NOx but showing little response to changes in anthropogenic VOC. NOx-saturated behavior is also observed, particularly around the UK, Benelux, and northern France and Germany. RADM2, on the other hand, exhibits more of a mixed NOx and VOC sensitivity for much of the domain. The NOx sensitivity seen in RADM2 is very similar to that seen in MOZART, but the response of RADM2 to changes in VOC is much stronger (by about a factor of 2) than that observed in MOZART. With the exception of some small areas in the North and Baltic seas south of Norway and Sweden, RADM2 predicts O3 increases with VOC increases throughout the entire domain. This difference in VOC sensitivity between the mechanisms has implications for policy decisions, as it indicates uncertainty in the European response of O3 to policies designed to reduce anthropogenic VOC emissions.
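The regime diagnosis based on the perturbation runs can be sketched for a single grid cell as follows. The percent change follows the expression given in the caption of Fig. 16; the classification labels, the function names, and the tolerance of 2 % (chosen here to echo the ±2 % band noted for MOZART) are illustrative assumptions rather than criteria applied in this study.

```python
def percent_change(perturbed, base):
    """Percent change in O3: 100 * ([O3]_perturbed - [O3]_base) / [O3]_base."""
    return 100.0 * (perturbed - base) / base


def classify_regime(o3_base, o3_nox_up, o3_voc_up, tol=2.0):
    """Label the NOx and VOC response of one grid cell.

    o3_nox_up and o3_voc_up are average O3 from the +30 % NOx and +30 % VOC
    runs; tol (percent) is a hypothetical threshold for a notable response.
    """
    d_nox = percent_change(o3_nox_up, o3_base)
    d_voc = percent_change(o3_voc_up, o3_base)
    if d_nox > tol:
        nox_label = "NOx-sensitive"      # O3 rises with added NOx
    elif d_nox < -tol:
        nox_label = "NOx-saturated"      # O3 falls with added NOx
    else:
        nox_label = "weak NOx response"
    voc_label = "VOC-sensitive" if d_voc > tol else "weak VOC response"
    return nox_label, voc_label


# Hypothetical grid-cell averages in ug m-3 (not values from this study).
print(classify_regime(100.0, 105.0, 100.5))
print(classify_regime(100.0, 95.0, 103.0))
```

Applied to every grid cell of both mechanisms, such a classification would reproduce in tabular form the qualitative picture in Fig. 16, although the choice of tolerance strongly affects how much of the domain is labeled as responsive.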
In addition to characterizing mechanism behavior with respect to net photochemical O3 production and NOx and VOC sensitivity, we evaluate the contribution of other sources that could explain the large differences in predicted O3 between the MOZART and RADM2 simulations. First, MOZART and RADM2 use different rate coefficients for several inorganic gas-phase chemical reactions. To test the effect of these differences, all RADM2 inorganic reaction rate coefficients were changed so that they matched those used in MOZART simulations in the cases where the reactions are the same in both mechanisms (Sect. S3 in the Supplement). The differences in inorganic rate coefficients between the two mechanisms explain a significant difference in predicted O3 concentrations: when RADM2 is run with inorganic rate coefficients from MOZART, the resulting domain-mean O3 is higher by more than 8 µg m−3 for the month of July, approximately 40 % of the difference in predicted O3.
Besides the gas-phase chemistry itself, there are some differences in the implementation of MOZART-4 vs. RADM2 in WRF-Chem that could also contribute to the observed differences in modeled O3: in particular, in the treatment of dry deposition and photolysis (described in the Supplement, Sect. S2). To test the effect of differences in treatment of dry deposition, we conducted an additional sensitivity analysis (not shown) in which we modified the RADM2 simulation to treat dry deposition in the same way as it is treated in MOZART. However, this led to only a small difference in average ozone (an increase of 1 µg m−3), indicating that modeled surface O3 concentrations are relatively insensitive to these differences in the treatment of dry deposition, at least in the summer. In a sensitivity test where we modified the model code so that the MOZART simulation ran with the same photolysis scheme as used in our RADM2 simulation (i.e., with the Madronich TUV scheme and without reading in climatological O3 and O2 columns), we found that average O3 for July decreases by 3 µg m−3. This indicates that modeled O3 is also somewhat sensitive to differences in the treatment of photolysis in MOZART and RADM2. However, taken together, our sensitivity simulations suggest that the differences in the inorganic reaction rate coefficients are more important than the differing treatments of dry deposition and photolysis in explaining the differences in predicted O3 between the RADM2 and MOZART simulations.
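The relative importance of the three sensitivity tests can be summarized with simple bookkeeping, sketched below. This is an illustration only: it assumes a July domain-mean difference between the mechanisms of 20 µg m−3 (inferred from the statement that 8 µg m−3 is approximately 40 % of the difference), and it treats the individual effects as additive, which the sensitivity simulations do not establish. The function name and dictionary keys are hypothetical.

```python
def gap_shares(contributions, total_gap):
    """Percent of the inter-mechanism O3 difference narrowed by each test."""
    return {name: 100.0 * delta / total_gap for name, delta in contributions.items()}


# Changes in July average O3 (ug m-3) reported for each sensitivity test;
# each one brings the two simulations closer together.
contributions = {
    "inorganic rate coefficients": 8.0,  # RADM2 run with MOZART coefficients
    "photolysis scheme": 3.0,            # MOZART run with RADM2 photolysis
    "dry deposition": 1.0,               # RADM2 run with MOZART deposition
}
total_gap = 20.0  # assumed July difference, ug m-3

shares = gap_shares(contributions, total_gap)
# Remainder not accounted for by the three tests (additivity assumed).
unexplained = total_gap - sum(contributions.values())
print(shares, unexplained)
```

Under these assumptions the three tests together account for somewhat more than half of the difference, leaving a remainder that would be attributable to other differences between the mechanisms, such as the representation of VOC oxidation chemistry.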

Summary and conclusions
In this paper, we present a detailed description of a WRF-Chem setup over the European domain and provide an evaluation of the simulated meteorological and chemical fields with an emphasis on the model's ability to reproduce the spatial and temporal distribution of ground-level O3 and NOx. Within WRF-Chem we compare the performance of two different chemical mechanisms: MOZART-4, for which we present the first model evaluation for a European domain, and RADM2. Overall, we found that our WRF-Chem setup reproduced the spatial and seasonal variations in the meteorological parameters over Europe, with biases and correlations consistent with previous studies. Simulations using the MOZART-4 as well as RADM2 chemical mechanisms were found to reproduce the spatial and temporal distributions in ground-level O3 over Europe, based on observations from the EMEP and AirBase networks. However, we find significant differences in O3 concentrations predicted by the two chemical mechanisms, with RADM2 predicting as much as 20 µg m−3 less O3 than MOZART during the spring and summer seasons. In general, MOZART-4 chemistry overpredicts O3 concentrations for most of Europe in the summer and fall, whereas RADM2 leads to an underestimation of O3 over the European domain in all seasons. Taken as a whole, MOZART-4 chemistry performs better, leading to lower absolute model biases in O3. This is the case when considering hourly O3 concentrations as well as metrics relevant for human health, such as MDA8 and SOMO35. Despite the large differences in predicted O3, the two mechanisms show relatively similar behavior for NOx, with both MOZART and RADM2 simulations resulting in a slight underestimation of NOx compared to surface observations.
The net midday photochemical production rate of O3 in summer is found to be higher for MOZART than for RADM2 for most of the domain, with the largest differences between the mechanisms seen over Germany, where the net O3 photochemical production for MOZART is higher than for RADM2 by more than 1.8 ppb h−1 (3.6 µg m−3 h−1). However, we have shown that RADM2 is approximately twice as sensitive to increases in anthropogenic VOC emissions as MOZART, suggesting that, under local VOC-limited conditions not seen at the regional scale of our simulations, RADM2 is likely to produce O3 at a greater rate than MOZART. Despite the differences in sensitivity to changes in VOC emissions exhibited by the two mechanisms, the sensitivity to changes in NOx emissions is found to be similar in MOZART and RADM2.
Our results indicate that modeled surface O3 over Europe is sensitive to the choice of gas-phase chemical mechanism, with observed differences in O3 between mechanisms that are larger than those seen in many past studies. Although the most fundamental difference between MOZART-4 and RADM2 (and other chemical mechanisms used in regional modeling) is the representation of VOC oxidation chemistry, we find that approximately 40 % of the difference seen in predicted O3 in this study can be explained by differences in inorganic reaction rate coefficients employed by MOZART-4 and RADM2. This result suggests that harmonization of inorganic rate coefficients among chemical mechanisms used for regional air quality modeling might be valuable, and could potentially lead to a smaller spread in model-predicted O3 compared to that seen in, e.g., the multimodel studies of AQMEII (Solazzo et al., 2012b; Im et al., 2015). Further investigation of chemical mechanism behavior within 3-D models in general would be helpful to constrain uncertainties in regional air quality modeling.

Code availability
The WRF-Chem model is open-source, publicly available software. The code is being continually improved, with new releases approximately twice per year. WRF-Chem code can be downloaded at http://www2.mmm.ucar.edu/wrf/users/download/get_source.html. The corresponding author will provide the bug fixes to version 3.5.1 used in this study, described in Sect. 2.3, upon request.

Data availability
The WRF-Chem source code is publicly available (see Sect. 6, code availability). The input data used for simulations in this study are either publicly available or available upon request from the data owners. Initial and boundary conditions for meteorological fields were obtained from ECMWF (2016), http://www.ecmwf.int/en/research/climate-reanalysis/era-interim. Initial and boundary conditions for chemical fields were from MOZART-4/GEOS-5, provided by NCAR (2016) at http://www.acom.ucar.edu/wrf-chem/mozart.shtml. Corine land cover data were obtained from EEA (2012), http://www.eea.europa.eu/data-and-maps/data/corine-land-cover-2006-raster-2. TNO-MACC II anthropogenic emissions data were obtained from TNO; others interested in using these data should contact TNO directly. The HTAP v2.2 anthropogenic emissions were obtained from http://edgar.jrc.ec.europa.eu/htap_v2/index.php.
The Global Weather Observation dataset was provided by the UK Met Office via the British Atmospheric Data Centre; others interested in using these data should contact the data center directly. EMEP and the Norwegian Institute for Air Research (NILU) provided the EMEP chemical observation data via the public EBAS database (NILU, 2015, http://ebas.nilu.no). AirBase is the public air quality database of the EEA; data were obtained at http://www.eea.europa.eu/data-and-maps/data/airbase-the-european-air-quality-database-7 (EEA, 2013). WRF-Chem tools for preprocessing boundary conditions as well as fire and anthropogenic emissions were provided by NCAR (http://www.acom.ucar.edu/wrf-chem/download.shtml). Model output produced in this study can be provided upon request to the corresponding author.
The Supplement related to this article is available online at doi:10.5194/gmd-9-3699-2016-supplement.

Figure 1. Seasonal average values of 2 m temperature (T2) in °C. Model results and statistics are shown for the MOZART simulation at the locations of the observations.

Figure 2. Seasonal average values of 10 m wind speed (WS10) in m s−1. Model results and statistics are shown for the MOZART simulation at the locations of the observations.

Figure 3. Seasonal average values of 10 m wind direction (WD10) in degrees. Model results and statistics are shown for the MOZART simulation at the locations of the observations.

Figure 4. Seasonal average values of surface O3 in µg m−3. Contours are model output from the MOZART simulation. Filled dots represent hourly measurements at AirBase rural background stations; filled squares represent measurements at EMEP stations.

Figure 5. Seasonal average values of surface O3 in µg m−3 from hourly measurements at AirBase (circles) and EMEP (squares) stations, and modeled values from MOZART for corresponding locations. The mean bias (MB, in µg m−3) and temporal correlation coefficient (r) for hourly values are also shown at the location of station observations.

Figure 6. Seasonal average values of MDA8 in µg m−3 calculated from hourly measurements at AirBase (circles) and EMEP (squares) stations, and modeled values from MOZART for corresponding locations. The MB (in µg m−3) and temporal correlation coefficient (r) for daily values are also shown at the location of station observations.

Figure 7. Number of days of exceedances of the EU long-term objective value for MDA8 (120 µg m−3) at AirBase (circles) and EMEP (squares) station locations. Shown are totals by season for observations and the MOZART and RADM2 simulations. For simplicity of viewing the data, stations with no exceedances are not plotted.

Figure 8. Yearly values of SOMO35 in mg m−3 multiplied by days calculated from hourly measurements at AirBase (circles) and EMEP (squares) stations, and modeled values for corresponding locations.

Figure 9. Seasonal average values of surface O3 in µg m−3. Contours are model output from the RADM2 simulation. Filled dots represent hourly measurements at AirBase rural background stations; filled squares represent measurements at EMEP stations.

Figure 10. Seasonal average values of surface O3 in µg m−3 from hourly measurements at AirBase (circles) and EMEP (squares) stations, and modeled values from RADM2 for corresponding locations. The MB (in µg m−3) and temporal correlation coefficient (r) for hourly values are also shown at the location of station observations.

Figure 11. Seasonal average values of surface NOx in µg m−3. Contours are model output from the MOZART simulation. Filled dots represent hourly measurements at AirBase rural background stations; filled squares represent measurements at EMEP stations.
reports NMB values of approximately −15 % for NO 2 for WRF-Chem simulations against hourly AirBase measurements for July 2001, in line with values of −12 and −19 % for the MOZART and RADM2 simulations in this study, respectively.Tuccella et al. (

Figure 12. Seasonal average values of surface NO2 in µg m−3 from hourly measurements at AirBase (circles) and EMEP (squares) stations, and modeled values from MOZART for corresponding locations. The MB and temporal correlation coefficient (r) for hourly values are also shown at the location of station observations.

Figure 13. Seasonal average values of surface NO in µg m−3 from hourly measurements at AirBase (circles) and EMEP (squares) stations, and modeled values from MOZART for corresponding locations. The MB and temporal correlation coefficient (r) for hourly values are also shown at the location of station observations.

Figure 14. Seasonal average values of surface NO2 in µg m−3 from hourly measurements at AirBase (circles) and EMEP (squares) stations, and modeled values from RADM2 for corresponding locations. The MB and temporal correlation coefficient (r) for hourly values are also shown at the location of station observations.

Figure 15. Net midday (11:00-14:00 CEST) surface photochemical production rate in ppb h−1 for O3, NO2, and NO shown for MOZART and RADM2 for July 2007. The last row shows the difference in net production rate in ppb h−1 (RADM2 subtracted from MOZART).

Figure 16. Sensitivity of average surface O3 for July 2007 to a 30 % increase in emissions of NOx (upper row) or VOC (lower row), shown for the MOZART and RADM2 chemical mechanisms. Shown here is the percent change in O3 concentration, i.e., 100 × ([O3]+30 % emissions − [O3]base)/[O3]base.

Table 1. WRF-Chem options used in model simulations.

Table 2. Description of WRF-Chem simulations performed for this study.

Table 3. Observational datasets used for model evaluation.

Table 4. Domain-wide statistical performance of WRF-Chem against 3-hourly meteorological observations from BADC. Modeled quantities are from the MOZART simulation.

Table 5. Statistics for MOZART simulation against hourly observations from the AirBase network. Means and MB are expressed in µg m−3; NMB, MFB, and r are unitless. r is the hourly temporal correlation coefficient for all quantities except MDA8, for which it represents the daily temporal correlation coefficient.

Table 6. Statistics for MOZART simulation against hourly observations from the EMEP network. Means and MB are expressed in µg m−3; NMB, MFB, and r are unitless. r is the hourly temporal correlation coefficient for all quantities except MDA8, for which it represents the daily temporal correlation coefficient.

Table 7. Statistics for yearly SOMO35 in mg m−3 multiplied by days.

Table 8. Statistics for RADM2 simulation against hourly observations from the AirBase network. Means and MB are expressed in µg m−3; NMB, MFB, and r are unitless. r is the hourly temporal correlation coefficient for all quantities except MDA8, for which it represents the daily temporal correlation coefficient.

Table 9. Statistics for RADM2 simulation against hourly observations from the EMEP network. Means and MB are expressed in µg m−3; NMB, MFB, and r are unitless. r is the hourly temporal correlation coefficient for all quantities except MDA8, for which it represents the daily temporal correlation coefficient.