Air quality modelling in the Berlin-Brandenburg region using WRF-Chem v3.7.1: sensitivity to resolution of model grid and input data

Air pollution is the number one environmental cause of premature deaths in Europe. Despite extensive regulations, air pollution remains a challenge, especially in urban areas. For studying summertime air quality in the Berlin-Brandenburg region of Germany, the Weather Research and Forecasting Model with Chemistry (WRF-Chem) is set up and evaluated against meteorological and air quality observations from monitoring stations as well as from a field campaign conducted in 2014. The objective is to assess which resolution and level of detail in the input data is needed for simulating urban background air pollutant concentrations and their spatial distribution in the Berlin-Brandenburg area. The model setup includes three nested domains with horizontal resolutions of 15 km, 3 km and 1 km and anthropogenic emissions from the TNO-MACC III inventory. We use RADM2 chemistry and the MADE/SORGAM aerosol scheme. Three sensitivity simulations are conducted: updating input parameters to the single-layer urban canopy model based on structural data for Berlin, specifying land use classes on a sub-grid scale (mosaic option), and downscaling the original emissions to a resolution of ca. 1 km × 1 km for Berlin based on proxy data including traffic density and population density. The results show that the model simulates meteorology well, though urban 2 m temperature and urban wind speeds are biased high and nighttime mixing layer height is biased low in the base run with the settings described above. We show that the simulation of urban meteorology can be improved when specifying the input parameters to the urban model, and to a lesser extent when using the mosaic option. On average, ozone is simulated reasonably well, but maximum daily eight hour mean concentrations are underestimated, which is consistent with the results from previous modelling studies using the RADM2 chemical mechanism. Particulate matter is underestimated, which is partly due to an underestimation of secondary organic aerosols. NOx (= NO + NO2) concentrations are simulated reasonably well on average, but nighttime concentrations are overestimated due to the model's underestimation of the mixing layer height, and urban daytime concentrations are underestimated. The daytime underestimation is improved when using downscaled, and thus locally higher emissions, suggesting that part of this bias is due to deficiencies in the emission input data and their resolution. The results further demonstrate that a horizontal resolution of 3 km improves the results and spatial representativeness of the model compared to a horizontal resolution of 15 km. With the input data (land use classes, emissions) at the level of detail of the base run of this study, we find that a horizontal resolution of 1 km does not improve the results compared to a resolution of 3 km. However, our results suggest that a 1 km horizontal model resolution could enable a detailed simulation of [...]

1 Introduction
Despite extensive regulations, air pollution in Europe remains a challenging issue: causing up to 400 000 premature deaths per year in Europe (EEA, 2015), air pollution is the number one environmental cause of premature deaths (OECD, 2012). Air pollution is a problem especially in urban areas, with 97-98 % of the urban European population (EU-28) exposed to ozone levels higher than the 8 h average concentration of 100 µg m−3, which the World Health Organisation (WHO) recommends not to be exceeded for the protection of human health, and ca. 90 % of the urban European population (EU-28) exposed to PM2.5 (particulate matter with a diameter smaller than 2.5 µm) levels higher than the WHO-recommended annual mean of 10 µg m−3 in 2011-2013 (EEA, 2016). Similarly, annual and hourly NO2 limit values are still exceeded, mainly at measurement sites close to traffic. In 2013, the European limit value of 40 µg m−3 was exceeded at 13 % of all stations, all of them situated at traffic or urban sites (EEA, 2016). In Berlin, measured NO2 annual means exceeded the European limit value of the annual mean at all but three measurement sites close to traffic in 2014 (Berlin Senate Department for Urban Development and the Environment, 2015a). In addition, current controversies on NO2 emissions from cars have triggered additional discussions on NO2 in urban areas.
Numerical modelling is an important tool for assessing air quality from global to local scales. Over the last decades, air quality models have been used to understand the processes leading to air pollution as well as to build a basis for policies defining measures to improve air quality. With increasing computing capacities, model resolution has been increasing, and different types of 3-D regional chemistry transport models are able to resolve relevant processes down to a horizontal resolution of ca. 1 km × 1 km (Schaap et al., 2015). At these resolutions, the models can be used to study the atmospheric composition in the urban background.
As a basis for modelling work assessing air quality in the Berlin-Brandenburg area, this study evaluates a setup with the online-coupled numerical atmosphere-chemistry model WRF-Chem (chemistry version of the Weather Research and Forecasting model, Skamarock et al., 2008; Fast et al., 2006; Grell et al., 2005). In the setup presented here, WRF-Chem is coupled with a single-layer urban canopy model (Chen et al., 2011; Loridan et al., 2010). We evaluate the model setup with respect to its skill in simulating meteorological conditions and air pollutant concentrations, with a focus on NOx (NO + NO2), but also evaluating for particulate matter (PM10, PM2.5) and O3. The skill in simulating air quality in an online-coupled model is, besides the choice of the chemical mechanism, influenced by the prescribed emissions, the model resolution and the skill in reproducing the observed meteorology. The latter depends on the model resolution, on input data, such as land use data, and on parameterisations of the sub-grid-scale processes, such as effects of urban areas on meteorology. The objective of this study is to address which resolution and level of detail in the input data, including land use, emissions and parameters characterising the urban area, is needed for simulating urban background air pollutant concentrations and their spatial distribution in the Berlin-Brandenburg area. This is done by evaluating the model results of three nested model domains at 15, 3 and 1 km horizontal resolutions as well as three sensitivity simulations, including updating the representations of the urban area within the urban canopy model, taking into account a sub-grid-scale parameterisation of the land use classes, and downscaling the original emission input data from a horizontal resolution of ca. 7 km to ca. 1 km. In light of the high computational costs of running the model at a 1 km horizontal resolution, it is particularly helpful to find out under which conditions using this model resolution can lead to improved results compared to coarser resolutions. This can directly help the design of future air quality modelling studies over the Berlin-Brandenburg region and other European urban agglomerations of similar extent.
The WRF-Chem model has been applied and evaluated in different modelling studies over Europe. For example, Tuccella et al. (2012) evaluate a European setup at a horizontal resolution of 30 km × 30 km. Brunner et al. (2015) and Im et al. (2015a, b) analyse the performance of several online-coupled models set up for the Air Quality Model Evaluation International Initiative (AQMEII) phase 2. Among the simulations for a European domain, there are seven with different setups of WRF-Chem, performed with a horizontal resolution of 23 km × 23 km. Commonly reported biases of WRF-Chem in comparison to observations from synoptic surface stations include an underestimation of daily maximum temperatures and an overestimation of wind speed (Tuccella et al., 2012; Brunner et al., 2015). Furthermore, Brunner et al. (2015) conclude that the representation of other meteorological parameters relevant to air quality simulations, such as solar radiation at the surface, precipitation and planetary boundary layer height, is still challenging. WRF-Chem tends to underestimate ozone daily maxima over Europe (Tuccella et al., 2012) with especially pronounced underpredictions of observed ozone values exceeding policy guidelines (Im et al., 2015b). They attribute the deficiencies to the simulated meteorology, the chemical mechanism and the chemical boundary conditions. Mar et al. (2016) evaluated the performance of WRF-Chem for a European domain with respect to ozone, comparing different chemical mechanisms. They concluded that the simulated ozone concentration strongly depends on the choice of chemical mechanism, and that RADM2 leads to an underestimation of observed ozone concentrations. PM10 is underestimated by WRF-Chem as compared to regional background observations (Im et al., 2015a). Tuccella et al. (2012) also report an underestimation of PM2.5. Both studies give various reasons for the mismatch in PM model results and observations, including an underestimation of secondary organic species by the aerosol mechanisms applied. Im et al. (2015a) report an overestimation of nighttime NOx in some models, including WRF-Chem, which they attribute both to a general underestimation of NO2 during low-NOx conditions and to problems in simulating nighttime vertical mixing. They report that NO2 is underestimated by most models.
WRF-Chem has also been applied at high spatial resolutions over urban areas, for example, Mexico City (Tie et al., 2007, 2010), Los Angeles (Chen et al., 2013), Santiago (Mena-Carrasco et al., 2012), the Yangtze River Delta (Liao et al., 2014) and Stuttgart (Fallmann et al., 2016). Tie et al. (2007, 2010) have explicitly assessed how the model resolution impacts the simulated ozone and ozone precursors in Mexico City and concluded that a resolution of 24 km is not suitable for simulating concentrations of CO, NOx and O3 in the city centre. They suggest a ratio of city size to model resolution of 6 : 1 and conclude that a horizontal resolution of about 6 km is the best balance between model performance and computational time when simulating ozone and precursors in Mexico City. Furthermore, they conclude that the model results for ozone are more sensitive to the model resolution than to the resolution of the emission input data. Other studies have shown that increasing the model resolution does not necessarily lead to an improvement in model results, but that it can be beneficial for amplifying the urban signal (e.g. Schaap et al., 2015, and references therein). They emphasise that it is only useful to go to model resolutions finer than 20 km if model input data, such as land use data and emission data, are also available at similarly high resolutions. Fallmann et al. (2016) have combined WRF-Chem with RADM2 chemistry and MADE/SORGAM aerosols with a multi-layer urban canopy model for the area of Stuttgart, studying effects of urban heat island mitigation measures on air quality. One of their findings from the model evaluation is an underestimation of daytime NO2 by up to 60 %, while O3 is slightly overestimated during the day.
In the Berlin-Brandenburg region, there have been regional model simulations of particulate matter with an offline chemistry transport model (Beekmann et al., 2007), along with a measurement campaign focusing on particulate matter in 2001/02. Other modelling studies in this region focused on meteorology: Schubert and Grossman-Clarke (2013) assessed the impact of different measures on extreme heat events in Berlin. Trusilova et al. (2016) tested different urban parameterisations in the COSMO-CLM model and their impact on air temperature. Jänicke et al. (2016) used the WRF model to dynamically downscale global atmospheric reanalysis data over Berlin to a resolution of 2 km × 2 km, testing combinations of different planetary boundary layer schemes and urban canopy models. They conclude that simulated urban-rural as well as intra-urban differences in 2 m air temperature are underestimated and that the more complex urban canopy models did not outperform the simple slab/bulk approach.
To our knowledge, there are no published studies for the Berlin-Brandenburg region simulating chemistry and aerosols with an online-coupled regional chemistry transport model. Furthermore, only few of the above-mentioned studies included an assessment of urban NOx concentrations. In light of the recent exceedances of NO2 limit values in European urban areas, including Berlin, this study can contribute to filling this gap and serve as a basis for future modelling studies addressing NOx in European urban areas.
2 Model setup

2.1 Model description, chemistry and physics schemes
For this study, we use the Weather Research and Forecasting model (WRF) version 3.7.1 (Skamarock et al., 2008), with chemistry and aerosols (WRF-Chem, Grell et al., 2005; Fast et al., 2006). We use three one-way nested model domains centred around Berlin, at horizontal resolutions of 15 km × 15 km, 3 km × 3 km and 1 km × 1 km (Fig. 1). The model top is at 50 hPa, using 35 vertical levels. The first model layer is at approximately 30 m above the surface, with 12 levels in the first 3 km. The setup includes the RADM2 chemical mechanism with the Kinetic PreProcessor (KPP) and the MADE/SORGAM aerosol scheme. RADM2 has been used frequently in air quality applications over Europe (e.g. Mar et al., 2016; Im et al., 2015a; Tuccella et al., 2012); the effect of this choice of chemical mechanism on modelled concentrations is further discussed in Sect. 4.2. We prefer the KPP solver over the QSSA (quasi-steady-state approximation) solver, because Forkel et al. (2015) found that the latter underestimates nighttime ozone titration for areas with high NO emissions. However, this option does not allow us to include the full aqueous-phase chemistry, including aerosol-cloud interactions and wet scavenging, and might thus reduce the model skill in simulating aerosols formed through aqueous-phase reactions as reported in Tuccella et al. (2012). All settings, including the physics schemes used in this study, are listed in Table 1, and the namelist can be found in the Supplement. We use the European Centre for Medium-Range Weather Forecasts (ECMWF) Interim reanalysis (ERA-Interim, Dee et al., 2011) with a horizontal resolution of 0.75° × 0.75°, temporal resolution of 6 h, interpolated to 37 pressure levels (with 29 levels below 50 hPa) as meteorological initial and lateral boundary conditions. This also includes the sea surface temperature, which is updated every 6 h. The data are interpolated to the model grid using the standard WRF preprocessing system (WPS).

2.2 Land use specification
An analysis of the USGS land use data commonly used in WRF showed that the land cover of the region around Berlin is not represented well. [...] 2004) (see also Table S1). Additionally, we distinguish between inland water bodies (USGS class 28) and other water bodies (USGS class 16). We map the urban land use classes in CORINE to three urban classes used in WRF-Chem, namely "commercial/industrial/transportation" (USGS class 33), "high intensity residential" (USGS class 32) and "low intensity residential" (USGS class 31) (Tewari et al., 2008), which can be characterised as follows. "Low intensity residential" (31) includes areas with a mixture of constructed materials and vegetation. Constructed materials account for 30-80 % of the cover and vegetation may account for 20-70 % of the cover. These areas most commonly include single-family housing units, and population densities are lower than in high intensity residential areas. "High intensity residential" (32) includes highly developed areas with a high population density. Examples include apartment complexes and row houses. Vegetation accounts for less than 20 % of the area and constructed materials account for 80 to 100 %. "Commercial/industrial/transportation" (33) includes infrastructure (e.g. roads, railroads) and all highly developed areas not classified as high intensity residential.
We implement the new land use categories as described in Tewari et al. (2008) (Fig. 2). In addition, we adjust the initialisation of the dry deposition of gaseous species to account for these new land use categories, as described in Fallmann et al. (2016). For the base run, we use the bulk approach of the land surface scheme, assigning the most abundant land use class within a model grid cell to the whole grid cell. In a sensitivity simulation, we test the mosaic approach (Li et al., 2013), allowing us to account for a heterogeneous land use classification within one model grid cell. Up to eight different land use types within one model grid cell are considered in our setup.
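The difference between the bulk and mosaic approaches can be illustrated with a minimal Python sketch. The function names, the sample cell and the renormalisation of tile fractions are assumptions made for illustration; they do not reproduce the WRF implementation.

```python
from collections import Counter


def bulk_class(subgrid_classes):
    """Bulk approach: the most abundant class represents the whole grid cell."""
    return Counter(subgrid_classes).most_common(1)[0][0]


def mosaic_tiles(subgrid_classes, max_tiles=8):
    """Mosaic approach (sketch): keep up to max_tiles classes per grid cell.

    Area fractions are renormalised to sum to 1, which is an assumption of
    this sketch rather than a documented detail of the model.
    """
    counts = Counter(subgrid_classes).most_common(max_tiles)
    total = sum(n for _, n in counts)
    return {cls: n / total for cls, n in counts}


# Hypothetical grid cell made of 10 sub-grid pixels with USGS-style class numbers.
cell = [31, 31, 31, 31, 32, 32, 33, 28, 28, 28]
print(bulk_class(cell))
print(mosaic_tiles(cell))
```

In this hypothetical cell the bulk approach treats everything as class 31, although that class covers less than half of the area; the mosaic tiles retain the other classes.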

2.3 Urban parameters
We use the single-layer urban canopy model (Kusaka et al., 2001; Kusaka and Kimura, 2004) to account for the modification of the dynamics by cities, especially Berlin and Potsdam. The urban model takes into account energy and momentum exchange between urban areas (roofs, walls, streets) and the atmosphere and is coupled to the Noah land surface model. Surface fluxes (heat, moisture) and temperature are calculated as a combination of fluxes from urban and vegetated surfaces, coupled via the urban fraction assigned to the land use type of the grid cell (Chen et al., 2004). We choose not to use a more complex parameterisation of the urban canopy, such as the building effect parameterisation (BEP), because the computational cost is already very high at a horizontal resolution of 1 km × 1 km, and a more complex parameterisation of the urban canopy, along with the required increase of vertical model resolution, would increase the computational cost further and require a more detailed input dataset describing the urban structure. Moreover, the BEP is not applicable with the mosaic option in WRF so far and the only applicable planetary boundary layer (PBL) scheme in combination with the BEP and WRF-Chem is the Mellor-Yamada-Janjić scheme. This scheme often led to stronger biases in simulated 2 m air temperature than other parameterisations such as the YSU scheme (Hu et al., 2010; Loridan et al., 2013; Jänicke et al., 2016), the scheme selected for this study. In addition, Jänicke et al. (2016) showed that the BEP did not outperform simpler approaches such as the bulk scheme or the single-layer urban canopy model with respect to simulating 2 m temperature and that the PBL scheme had a stronger influence on simulated 2 m air temperature than the urban canopy parameterisation.
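The coupling via the urban fraction can be sketched as an area-weighted combination. The following Python snippet assumes a simple linear weighting; the function name and the flux values are hypothetical and only illustrate the idea.

```python
def grid_cell_flux(flux_urban, flux_vegetated, urban_fraction):
    """Combine urban and vegetated fluxes, weighted by the urban fraction.

    Assumes a linear area weighting (an assumption of this sketch).
    """
    if not 0.0 <= urban_fraction <= 1.0:
        raise ValueError("urban_fraction must lie between 0 and 1")
    return urban_fraction * flux_urban + (1.0 - urban_fraction) * flux_vegetated


# Hypothetical sensible heat fluxes in W m-2 for a half-urban grid cell.
print(grid_cell_flux(200.0, 100.0, 0.5))
```

With this weighting, an updated urban fraction directly changes how strongly the urban surface contributes to the grid-cell flux.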
In our base simulation, we use the default input parameters as specified in the look-up table included in the standard distribution of the WRF source code available from UCAR. For a sensitivity simulation (Sect. 2.5), we calculate some of the urban input parameters to the model for Berlin (Table 2), which in previous studies have been found to be important. Geometric parameters include roof-level building height, standard deviation of the roof height, roof width and road width. The calculations are based on detailed maps of Berlin provided by the Senate Department for Urban Development and the Environment of Berlin. From the original data containing information on the location and number of floors of each house, the mean building height and the standard deviation of the building height are calculated assuming an average height of 3 m per floor. The mean building length is calculated with the software QGIS, by calculating the surface area of each building geometry in the dataset and taking its square root as each building's mean length. We combine these data with the CORINE land use data for Berlin mapped to the USGS classes (Sect. 2.2), averaging these parameters over the parts of the city characterised by the same urban class. The maps further provide the location of individual road segments, which we use to calculate the total area covered by roads in Berlin. We combine this with the total length of all roads in Berlin (Berlin Senate Department for Urban Development and the Environment, 2011b) to obtain the average road width, which we assign to all three urban land use categories. We further update the urban fraction using a spatially more detailed classification of the land use types and the fraction of impervious surface of each area, provided by the Senate Department for Urban Development and the Environment of Berlin. Following Schubert and Grossman-Clarke (2013), we assume the urban fraction of a grid cell to be equal to the fraction of impervious surface. We then define the mean of the impervious surface fraction, weighted by the area of the respective surface within each land use class, as the updated urban fraction of the respective class. Following Fallmann et al. (2016), we use the values for thermal conductivity, heat capacity, emissivity and albedo of roofs, walls and streets specified in Salamanca et al. (2012).
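The geometric calculations described above can be sketched in Python as follows. All input values are hypothetical, the function names are illustrative, and the use of the population standard deviation is an assumption, since the text does not specify which estimator was used.

```python
import math
import statistics

FLOOR_HEIGHT_M = 3.0  # average height per floor, as stated in the text


def building_stats(buildings):
    """buildings: list of (number_of_floors, footprint_area_m2) tuples."""
    heights = [floors * FLOOR_HEIGHT_M for floors, _ in buildings]
    # Square root of the footprint area is taken as the building length.
    lengths = [math.sqrt(area) for _, area in buildings]
    return {
        "mean_height_m": statistics.mean(heights),
        "std_height_m": statistics.pstdev(heights),  # assumption: population std
        "mean_length_m": statistics.mean(lengths),
    }


def mean_road_width(total_road_area_m2, total_road_length_m):
    """Average road width from total road area and total road length."""
    return total_road_area_m2 / total_road_length_m


def urban_fraction(areas_m2, impervious_fractions):
    """Area-weighted mean impervious fraction of one land use class."""
    weighted = sum(a * f for a, f in zip(areas_m2, impervious_fractions))
    return weighted / sum(areas_m2)


# Hypothetical buildings of one urban class: (floors, footprint area in m2).
sample = [(2, 100.0), (4, 400.0), (6, 900.0)]
print(building_stats(sample))
print(mean_road_width(1000.0, 100.0))
print(urban_fraction([100.0, 300.0], [0.8, 0.4]))
```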

2.4 Emissions
For the base run, anthropogenic emissions of CO, NOx, SO2, non-methane volatile organic compounds (NMVOCs), PM10, PM2.5 and NH3 are taken from the TNO-MACC III inventory, with a horizontal resolution of 0.125° × 0.0625°. The inventory is based on nationally reported emissions for specific sectors, distributed spatially based on proxy data. In comparison to version II of the inventory (Kuenen et al., 2014), version III includes, amongst other updates, an improved distribution of emissions especially around cities. The distribution was improved by no longer using population density as a default for diffuse (non-point-source) industrial emissions but using industrial land use as a distribution proxy. Residential solid fuel use (wood, coal) was allocated more to rural areas than to large city centres on a per capita basis. Seasonal, weekly and diurnal emission profiles for Germany are applied to the aggregated emissions. This, as well as the speciation of the different NMVOCs, is described in Mar et al. (2016) and von Schneidemesser et al. (2016a). Mar et al. (2016) found that distributing emissions vertically did not strongly impact the model results near the surface. This, along with the low stack height of point sources within Berlin, is why in this study all emissions are released into the first model layer. As much of the NOx emitted within Berlin comes from diesel vehicles (off-road and on-road), whose NOx emissions studies have shown to contain high proportions of NO2 (e.g. Alvarez et al., 2008), NOx is emitted as 70 % NO and 30 % NO2 (by mole). The latest available emission dataset is for 2011, which is used in the 2014 simulations. Dust, sea salt and biogenic emissions are calculated online, the latter using the Model of Emissions of Gases and Aerosols from Nature (MEGAN v2, Guenther et al., 2006).
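The molar split of NOx into NO and NO2 stated above amounts to a simple partitioning. A minimal Python sketch follows; the function name and the input value are hypothetical.

```python
NO_MOLE_FRACTION = 0.70   # share of NOx emitted as NO (by mole), from the text
NO2_MOLE_FRACTION = 0.30  # share of NOx emitted as NO2 (by mole), from the text


def split_nox(nox_mol):
    """Partition a NOx emission given in moles into NO and NO2."""
    return {
        "NO": NO_MOLE_FRACTION * nox_mol,
        "NO2": NO2_MOLE_FRACTION * nox_mol,
    }


# Hypothetical emission of 100 mol NOx in one grid cell and time step.
print(split_nox(100.0))
```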
We perform a sensitivity simulation for testing the model sensitivity to the spatial resolution of the emission input data (Sect. 2.5). As input to this sensitivity simulation, we downscale the anthropogenic emissions within Berlin onto a grid whose spacing is one-seventh of the original, based on two proxy datasets, including traffic densities and population (Berlin Senate Department for Urban Development and the Environment, 2011a, b). Traffic densities are used to downscale all emissions from road transport, and population data are used to downscale emissions from industry, residential combustion and product use. Point sources are included in the grid cell within which the point source is located. In the TNO-MACC III inventory, all emissions from the energy industry within Berlin are point sources. For other industry sectors, ca. 55 % of the total emissions within Berlin for CO, 9-17 % for particulate matter and up to 1 % for other gases are included as point sources. Agricultural emissions within the city boundaries of Berlin are close to zero, which is why these are used at the original resolution.
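A proxy-based downscaling of this kind can be sketched as a mass-conserving redistribution. The Python snippet below is a simplified illustration with hypothetical names and values: it uses a 2 × 2 block of fine cells for brevity, and the uniform fallback for an all-zero proxy is an assumption of this sketch.

```python
def downscale(coarse_emission, proxy):
    """Distribute one coarse-cell emission total over fine cells by a proxy.

    proxy: 2-D list of non-negative proxy values (e.g. traffic density or
    population) for the fine cells inside the coarse cell.
    """
    total = sum(sum(row) for row in proxy)
    if total == 0:
        # Assumption of this sketch: spread uniformly if the proxy is zero.
        n_cells = sum(len(row) for row in proxy)
        return [[coarse_emission / n_cells for _ in row] for row in proxy]
    return [[coarse_emission * value / total for value in row] for row in proxy]


# Hypothetical coarse cell with 100 emission units and a 2 x 2 proxy field.
fine = downscale(100.0, [[1, 1], [2, 0]])
print(fine)
```

Because the weights sum to one, the total emission of the coarse cell is conserved while fine cells with high proxy values receive locally higher emissions.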

2.5 Model simulations
Simulations are done for summer 2014 (31 May-28 August). We chose to simulate the summer of 2014, as this corresponds to the time period of the BAERLIN measurement campaign (e.g. Bonn et al., 2016). While mean observed temperatures in June and August showed little deviation from the observed 30-year mean, with mean temperatures of 17.0 °C (June) and 17.2 °C (August), the July mean temperature of 21.3 °C was 3.4 °C higher than the 30-year mean. Precipitation was 12 and 13 % lower than the 30-year mean in June (62.5 mm) and July (60.2 mm), respectively, and it was 48 % lower than the 30-year mean in August, with 33.8 mm (Berlin Senate Department for Urban Development and the Environment, 2014a, b, c).
For the analysis, the first day of all simulations is discarded as spinup. A base run with the settings described above is done in order to evaluate the model performance in simulating observed meteorology and atmospheric composition. In addition, the following sensitivity simulations are done for this study, with the changes applied to all three model domains of horizontal resolutions of 15, 3 and 1 km:
- S1_urb: updated representation of the urban characteristics of Berlin (see Sect. 2.3 and Table 2);
- S2_mos: consideration of the heterogeneity of the land use categories within one model grid cell (mosaic approach; see Sect. 2.2); and
- S3_emi: using emissions downscaled to ca. 1 km × 1 km (see Sect. 2.4).
The purpose of the sensitivity simulations is to assess which resolution and level of detail in the input data, including land use (S2_mos), emissions (S3_emi) and parameters characterising the urban area (S1_urb), are needed for simulating urban background air pollutant concentrations and their spatial distribution in the Berlin-Brandenburg area, particularly focusing on NOx. We particularly ask whether a horizontal model resolution of 1 km, together with the above-listed specifications of the input data, leads to model results that differ from those obtained with a horizontal resolution of 3 km.
3 Observational data description and model evaluation procedure

3.1 Data description
In the following, we list the data and data sources that we use for evaluating the present WRF-Chem setup for Berlin and its surroundings. Table 3 gives an overview of all observational data and measurement stations in Berlin and its surroundings used in this study.

3.1.1 DWD stations
We use observations from the German Weather Service (DWD) for the variables of 2 m temperature, 10 m wind speed and direction and precipitation from stations within Berlin and Potsdam for 2014. A second-level quality control, as described in Kaspar et al. (2013), has been applied to the data. Additionally, we obtained mixing layer heights calculated from radiosonde observations directly from the DWD at the Lindenberg station south-east of Berlin, as described in Beyrich and Leps (2012). In addition, we use specific humidity data from the Global Weather Observation dataset provided by the British Atmospheric Data Centre (BADC) for the same stations.

3.1.2 TU stations
The Chair of Climatology of Technische Universität Berlin (TU) runs an urban climate observation network (Fenner et al., 2014), from which we use observations of 2 m air temperature to complement observations from DWD stations. We include this additional data source, as many of the TU stations are situated in urban built-up areas (see Table 3). We use quality-checked data aggregated to hourly mean values.

3.1.3 GRUAN network
The Global Climate Observing System Upper-Air Network (GRUAN) hosts radiosonde observations at high vertical resolution, of which we use observations of temperature in Lindenberg (Sommer et al., 2012) to compare them to the modelled profiles. The data used for this study are quality checked, processed and bias corrected as described in Sommer et al. (2012) and Dirksen et al. (2014).

3.1.4 UBA database and BLUME network
Legally required air quality observations in Germany are reported to the Federal Environment Agency (UBA). We use observations of PM10, PM2.5, NO2, NO and O3 for 2014 reported to UBA. The data are collected from measurement networks operated by the federal states. In Berlin, the official measurement network is the BLUME network (Berliner Luftgüte-Messnetz), operated by the Senate Department for Urban Development and the Environment of Berlin. In addition to the data reported to the UBA database, we use PM10 concentrations measured at three stations in Berlin and the 2 m temperature measured at the urban built-up station Nansenstraße from the BLUME network.

3.1.5 BAERLIN2014
The BAERLIN2014 ("Berlin Air quality and Ecosystem Research: Local and long-range Impact of anthropogenic and Natural hydrocarbons 2014") campaign took place in Berlin in summer 2014 and is described in detail in Bonn et al. (2016) and von Schneidemesser et al. (2016b). For the present study, we use observations of PM2.5 calculated from particle number concentrations collected near the Nansenstraße station of the BLUME network and observations of the mixing layer height collected at Nansenstraße with a ceilometer. In addition, filter samples taken at Nansenstraße were analysed for the composition of PM10 (von Schneidemesser et al., 2016), which we use to compare to simulated aerosols.

3.2 Model evaluation procedure
In order to assess the model's skill in simulating observed meteorology, we compare the modelled (coarse domain) weather types with weather types calculated from the ERA-Interim reanalysis data for Berlin (Sect. 4.1). The weather types are based on indices calculated to classify circulation patterns and are further described in Otero et al. (2016). We then focus on evaluating the modelled meteorology including the following diagnostic variables: 2 m temperature (T2), 10 m wind speed and direction (WS10 and WD10), the atmospheric structure via comparing temperature profiles and mixing layer height (MLH), as well as 2 m specific humidity (Q2) and precipitation. While T2, WS10, WD10 and atmospheric vertical structure are important parameters for simulating atmospheric chemistry and aerosols, Q2 and precipitation will not have an impact on our results, as our setup does not include aqueous-phase chemistry or wet scavenging. However, we include Q2 and precipitation to complete the picture of the evaluation of simulated meteorology as well as to give an indication for future studies based on this setup. Finally, we evaluate the model performance for the main air pollutants including surface O3, NOx and PM, with a main focus on NOx. We evaluate the model results from all three domains with horizontal resolutions of 15, 3 and 1 km, which we also refer to as d01, d02 and d03.

3.2.1 Comparison with surface station data
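The evaluation of surface parameters is based on statistical metrics including the Pearson correlation coefficient (r), the mean bias (MB) and the normalised mean bias (NMB). The metrics are defined as follows, with n the number of model-observation pairs, M the modelled values, O the observations and σ the standard deviation of modelled or observed values:

$$ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{M_i - \overline{M}}{\sigma_M} \right) \left( \frac{O_i - \overline{O}}{\sigma_O} \right), $$

$$ \mathrm{MB} = \frac{1}{n} \sum_{i=1}^{n} \left( M_i - O_i \right), $$

$$ \mathrm{NMB} = \frac{\sum_{i=1}^{n} \left( M_i - O_i \right)}{\sum_{i=1}^{n} O_i}. $$

Overbars denote means over the n pairs. For the meteorological parameters, the metrics are calculated from instantaneous hourly modelled values and hourly averages of the observations. Wind speed is considered as a scalar and no metrics are calculated for wind direction. The O3, NOx and PM values are calculated from daily averages.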
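The three metrics can be computed as in the following Python sketch, which assumes their standard textbook definitions. The function names and sample values are hypothetical.

```python
import math


def mean_bias(model, obs):
    """MB: mean of the model-minus-observation differences."""
    return sum(m - o for m, o in zip(model, obs)) / len(obs)


def normalised_mean_bias(model, obs):
    """NMB: summed differences divided by the summed observations."""
    return sum(m - o for m, o in zip(model, obs)) / sum(obs)


def pearson_r(model, obs):
    """Pearson correlation coefficient of model-observation pairs."""
    n = len(obs)
    m_bar = sum(model) / n
    o_bar = sum(obs) / n
    cov = sum((m - m_bar) * (o - o_bar) for m, o in zip(model, obs))
    var_m = sum((m - m_bar) ** 2 for m in model)
    var_o = sum((o - o_bar) ** 2 for o in obs)
    return cov / math.sqrt(var_m * var_o)


# Hypothetical daily mean concentrations (model biased high by 2 units).
model_values = [12.0, 22.0, 32.0, 42.0]
obs_values = [10.0, 20.0, 30.0, 40.0]
print(mean_bias(model_values, obs_values))
print(normalised_mean_bias(model_values, obs_values))
print(pearson_r(model_values, obs_values))
```

The example shows why the metrics complement each other: a constant offset leaves the correlation perfect while the bias metrics are non-zero.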
The NMB was only calculated for air pollutants and the mixing layer height. For ozone, we also consider the maximum daily 8 h mean (MDA8) concentrations, a metric used in the European Union's Air Quality Directive.
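A minimal sketch of an MDA8 calculation is given below. It assumes a gap-free hourly series starting at 00:00 of the first day and assigns each 8 h running mean to the day on which the window ends; the function name `mda8`, the handling of incomplete windows and the sample values are illustrative assumptions, not the procedure used in the study.

```python
def mda8(hourly, day):
    """Maximum daily 8 h running mean for day index `day` (0-based).

    `hourly` is a flat list of hourly mean concentrations starting at
    00:00 of day 0. Each 8 h window is assigned to the day on which it
    ends, so the first window of a day starts on the previous evening.
    """
    best = None
    for end in range(day * 24 + 1, day * 24 + 25):  # exclusive end index
        start = end - 8
        if start < 0 or end > len(hourly):
            continue  # skip windows that extend beyond the available data
        mean8 = sum(hourly[start:end]) / 8.0
        if best is None or mean8 > best:
            best = mean8
    return best

# Hypothetical ozone series in µg m−3: an afternoon peak on day 0, a flat day 1
day0 = [40.0] * 10 + [80.0] * 8 + [40.0] * 6
day1 = [50.0] * 24
series = day0 + day1
print(mda8(series, 0), mda8(series, 1))
```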
As an additional means of assessing the model performance, we look at conditional quantile plots (Carslaw and Ropkins, 2012) for some species. The conditional quantile plot displays the model results, split into evenly spaced bins, in comparison to observations temporally matching the values in the model result bins. Thus, it gives additional insight into how well the modelled values agree with the observations, e.g. on the range of modelled and observed values.
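The idea behind the conditional quantile plot can be sketched as follows: modelled values are split into evenly spaced bins, and for each bin a quantile of the temporally matching observations is computed. The sketch is deliberately simplified to the median only and is not the implementation of Carslaw and Ropkins (2012); the function name `conditional_median` and the sample data are hypothetical.

```python
from statistics import median

def conditional_median(model, obs, n_bins):
    """Median of the observations matching each evenly spaced model bin.

    Returns a list of (bin_centre, median_obs) tuples; median_obs is None
    for bins without any modelled value. Assumes max(model) > min(model).
    """
    lo, hi = min(model), max(model)
    width = (hi - lo) / n_bins
    members = [[] for _ in range(n_bins)]
    for m, o in zip(model, obs):
        # The largest modelled value is assigned to the last bin
        idx = min(int((m - lo) / width), n_bins - 1)
        members[idx].append(o)
    result = []
    for i, values in enumerate(members):
        centre = lo + (i + 0.5) * width
        result.append((centre, median(values) if values else None))
    return result

# Hypothetical example: observations are always 1 unit above the model
modelled = [float(v) for v in range(11)]
observed = [v + 1.0 for v in modelled]
quantiles = conditional_median(modelled, observed, n_bins=2)
print(quantiles)
```

For a perfect model, the median of the matched observations in each bin would coincide with the bin centre; here it lies above the centre in both bins.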
For the comparison between model and observations, we classify the stations in terms of their surroundings, distinguishing between urban built-up, urban green and rural areas for the meteorology observations, and between urban background, suburban background and rural areas for air quality observations, excluding those from traffic stations.

Evaluation of the atmospheric structure
The mean modelled temperature profiles are compared to observations from radiosondes as follows: as the observed temperatures have a much higher spatial resolution than the model, we select a subset of the observations for comparison with the model. For every modelled temperature profile at 00:00, 06:00, 12:00 and 18:00 UTC, we select the observations closest to the modelled geopotential height of each model level. The time averaging of modelled geopotential heights is done as follows: we divide the values into vertical bins corresponding to the 5th, 10th, 15th percentiles and so on, until the 95th percentile of the modelled geopotential height, and average the temperature as well as the geopotential height over each bin for both model and observations, and over each day of the modelled period. Even though observations of temperature profiles are only available outside of the urban area of Berlin, we include this comparison in order to get a general impression of how the model performs in simulating the vertical atmospheric structure in the lowest 2-3 km.
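The first step, selecting for each model level the radiosonde observation closest in height, might look like the following minimal sketch. The function name `nearest_observations` and the sample profile are hypothetical; the subsequent percentile binning and time averaging are not reproduced here.

```python
def nearest_observations(model_heights, obs_heights, obs_temps):
    """For each model level, pick the observation closest in height.

    Returns a list of (observed_height, observed_temperature) tuples,
    one per model level.
    """
    selected = []
    for z in model_heights:
        # Index of the observation with the smallest height difference
        i = min(range(len(obs_heights)), key=lambda k: abs(obs_heights[k] - z))
        selected.append((obs_heights[i], obs_temps[i]))
    return selected

# Hypothetical heights in metres and temperatures in degrees Celsius
model_levels = [50.0, 200.0]
sonde_heights = [10.0, 40.0, 90.0, 180.0, 260.0]
sonde_temps = [20.0, 19.8, 19.3, 18.5, 17.9]
subset = nearest_observations(model_levels, sonde_heights, sonde_temps)
print(subset)
```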
The modelled MLH is compared to observations in two different ways: firstly, using the planetary boundary layer height directly diagnosed by WRF-Chem, which in the YSU scheme is calculated based on comparing the Richardson number with a critical value of 0 (Hong et al., 2006); secondly, by calculating the MLH from the simulated profiles of temperature, wind speed and humidity, defining the mixing layer height as the height where the Richardson number is 0.2, following Beyrich and Leps (2012). The second approach corresponds to the method by which the MLH is derived from radiosonde observations at Lindenberg.
Table 4. Statistics of hourly 2 m temperature for JJA for stations where the land use class of the respective grid cell changes with resolution. "LU" refers to the WRF land use class of the grid cell in the respective domain, "Obs" refers to the JJA observed mean, "Mod" refers to the JJA modelled mean for the respective grid cell. MB is the mean bias for JJA and r is the correlation of hourly values. Obs, Mod and MB are in °C. The statistics are shown for the results from the model domains of 15 km (d01), 3 km (d02) and 1 km (d03) horizontal resolution.
Generally, the modelled weather types (see Sect. 3.2) are consistent with those derived from the reanalysis (Fig. 3). Periods in which WRF-Chem weather types disagree with ERA-Interim weather types never exceed two consecutive days, and the frequency of WRF-Chem weather types agrees similarly well with ERA-Interim weather types.
The temporal correlation of modelled hourly 2 m temperature with observations is between 0.88 and 0.91 at all stations in and around Berlin and all model domains (Tables 4 and S3 in the Supplement), which shows that the model represents the observed temperature variability well. This is supported by the analysis of the conditional quantiles (Fig. 4), which show that the modelled temperatures match the observations well for a wide range of values. The model is generally biased positively, by up to +1.6 °C, though the bias at most stations is smaller than +1 °C (Tables 4 and S3). In absolute terms, this is within the same range as, but never larger than, the biases that Trusilova et al. (2016) and Schubert and Grossman-Clarke (2013) found using COSMO-CLM in combination with different urban canopy models for Berlin. Besides, the absolute mean biases are comparable to those reported by Jänicke et al. (2016), who mainly found negative biases in near-surface air temperature applying WRF 3.6.1 for Berlin and its surroundings, testing two planetary boundary layer schemes and three urban canopy models.
The histogram in the conditional quantile plot and the extent of the blue line marking the "perfect model" show that WRF-Chem does not reproduce the highest observed temperatures. This suggests that the model might have difficulties in simulating pronounced heat wave periods. However, comparing the modelled daily maximum temperatures to the observed daily maximum temperatures (Tables 5 and S4) shows that the bias of the daily maximum temperatures is of a similar magnitude as the mean bias, with one difference: while the bias of maximum temperatures modelled with 3 and 1 km resolutions is mainly positive, the bias of the maximum temperatures modelled with a 15 km resolution is negative. In absolute terms, the bias of the daily maximum temperatures is smallest for results obtained with a 1 km resolution, though they differ only very little from the results obtained with a 3 km resolution.
We find two important relationships with respect to model resolution: firstly, the model simulates higher temperatures in those model domains in which the land use type of the station's grid cell is urban (stations Kaniswall, Dahlemer Feld, Marzahn, Schönefeld). Secondly, while the modelled 2 m temperatures generally differ between the 15 and 3 km resolution even if the land use type of both grid cells in which the station is located is the same, the June-July-August (JJA) mean modelled temperature only changes by more than 0.1 °C between the 3 and 1 km resolution if the land use type changes (stations Bamberger Straße, Nansenstraße, Schönefeld). This indicates that switching from a horizontal resolution of 15 to 3 km might improve the spatial distribution of modelled temperatures, while switching from a horizontal resolution of 3 to 1 km has very little effect on the model's skill in simulating the observed temperature, but might be more beneficial if the land use input data are specified with a higher level of accuracy.
Table 5. Statistics of daily maximum 2 m temperature for JJA for stations where the land use class of the respective grid cell changes with resolution. "LU" refers to the WRF land use class of the grid cell in the respective domain, "Obs" refers to the JJA observed mean, "Mod" refers to the JJA modelled mean for the respective grid cell. MB is the mean bias for JJA and r is the correlation of hourly values. Obs, Mod and MB are in °C. The statistics are shown for the results from the model domains of 15 km (d01), 3 km (d02) and 1 km (d03) horizontal resolution.
The comparison of simulated with observed temperature profiles (Fig. 5) shows that the model reproduces the observed temperature profile well at all times, but that the modelled temperature profile at 12:00 UTC is shifted to higher temperatures by ca. 1 °C.
The result is similar for all model resolutions (the profiles for the 15 km and 3 km resolutions can be found in the Supplement in Figs. S1 and S2).
Table 6. Statistics of daily minimum, mean and maximum mixing layer height for JJA. "Obs" refers to the JJA observed mean, "Mod" refers to the JJA modelled mean for the respective grid cell. MB is the mean bias for JJA, NMB refers to the normalised mean bias and r is the correlation of hourly values. The values given in the column "YSU" refer to the MLH diagnosed directly by WRF-Chem, while "Calc" refers to the MLH calculated from modelled profiles of temperature, wind speed and humidity. Obs, Mod and MB are given in metres and NMB is given in %. The statistics are shown for the results from the model domains of 15 km (d01), 3 km (d02) and 1 km (d03) horizontal resolution.
In order to further evaluate how the present WRF-Chem setup simulates the observed vertical structure, we compare the simulated mixing layer height derived from simulated profiles of temperature, wind speed and humidity (in the following also referred to as MLH-calc) to the mixing layer height derived from radiosonde observations at Lindenberg as described in Beyrich and Leps (2012) (Fig. 6). The results show that the model simulates the observed diurnal cycle of the MLH as well as the magnitude of the observed MLH at Lindenberg reasonably well: the bias of the daily mean MLH ranges between +87 m (13 %) and +113 m (16 %), depending on model resolution, and the biases of the daily maximum and daily minimum are between +268 m (19 %) and +347 m (25 %) and between +26 m (14 %) and +48 m (26 %), respectively (Table 6). There is no consistent trend with increasing model resolution. It is important to note that these results refer to the MLH that we calculated from simulated profiles of temperature, wind speed and humidity. However, the MLH diagnosed by the model, in the following also referred to as MLH-YSU, underestimates the observations especially during nighttime (Fig. 6), with a bias of the daily minimum MLH between −99 m (−53 %) and −113 m (−60 %), or a MLH lower than the calculated one between −128 and −214 %. Differences between the different ways of deriving the MLH for daily maximum values are less pronounced, ranging between 24 m (1 %) and 73 m (4 %). This leads to the conclusion that the model generally simulates the atmospheric structure well, but that the planetary boundary layer scheme underestimates observed MLH during nighttime. Similarly, this indicates that the mixing might also be underestimated by the boundary layer scheme during nighttime conditions.
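A Richardson-number diagnostic of the kind used for MLH-calc can be sketched as follows. The sketch assumes a bulk Richardson number relative to the lowest profile level, a precomputed virtual potential temperature profile (which carries the temperature and humidity information), linear interpolation to the critical value of 0.2, and a small lower bound on wind speed to avoid division by zero. These choices and the function name `mixing_layer_height` are illustrative assumptions; the exact formulation of Beyrich and Leps (2012) is not reproduced here.

```python
G = 9.81  # gravitational acceleration in m s-2

def mixing_layer_height(heights, theta_v, wind_speed, ri_crit=0.2):
    """Height (m) at which the bulk Richardson number first reaches ri_crit.

    heights    : level heights in m, increasing upwards
    theta_v    : virtual potential temperature in K at each level
    wind_speed : horizontal wind speed in m s-1 at each level
    Returns None if ri_crit is not reached within the profile.
    """
    z0, th0 = heights[0], theta_v[0]
    prev_z, prev_ri = z0, 0.0
    for z, th, ws in zip(heights[1:], theta_v[1:], wind_speed[1:]):
        ws = max(ws, 0.1)  # assumed floor to avoid division by zero
        ri = G * (th - th0) * (z - z0) / (th0 * ws ** 2)
        if ri >= ri_crit:
            # Linear interpolation between the two bracketing levels
            frac = (ri_crit - prev_ri) / (ri - prev_ri)
            return prev_z + frac * (z - prev_z)
        prev_z, prev_ri = z, ri
    return None

# Hypothetical profile: well-mixed up to 100 m, stable layer above
mlh = mixing_layer_height(
    heights=[10.0, 100.0, 200.0, 300.0],
    theta_v=[290.0, 290.0, 290.5, 293.0],
    wind_speed=[2.0, 5.0, 5.0, 5.0],
)
print(f"MLH = {mlh:.0f} m")
```

Applying the same routine to modelled profiles and to radiosonde profiles keeps the two MLH estimates methodologically comparable, which is the motivation for using MLH-calc alongside the value diagnosed by the boundary layer scheme.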

Comparing the model results to ceilometer observations from Berlin at the Nansenstraße station also indicates that the diurnal variation is reproduced correctly (Fig. S9 in the Supplement). The comparison of daily minimum MLH with ceilometer observations also shows an underestimation of MLH-YSU in the same range as at Lindenberg. However, we do not know whether the magnitude of the mixing layer height derived from the ceilometer backscatter profile is directly comparable with the mixing layer height calculated from profiles of temperature, wind speed and humidity or with the mixing layer height calculated by the model. This makes it more difficult to evaluate the modelled mixing layer height quantitatively at the urban site Nansenstraße. For this, further studies assessing the comparability of MLH derived from radiosonde and ceilometer observations would be necessary.
Simulated hourly wind speed correlates with observations with a correlation coefficient between 0.5 and 0.6 (Table S5 in the Supplement), which is comparable to simulations for the European domain (Mar et al., 2016). Wind speed is overestimated by between 0.4 m s−1 (15 %) and 1.4 m s−1 (50 %), depending on the station. The overestimation is especially strong at stations with mean observed wind speeds below 3 m s−1, as well as for a period of easterly winds in mid-July (Fig. 7). The most frequently observed wind direction at three stations in Berlin and in Potsdam in June, July and August 2014 is westerly. This is reproduced by the model, with better skill with increasing resolution (Fig. 8). Depending on the modelled wind direction, the bias in wind speed differs: while the bias (averaged over all four stations) is lower than 1 m s−1 for modelled wind from north to south-east, the bias is larger for wind simulated from east and north-east. In addition, the conditional quantile plot of wind speed, split by modelled wind direction, also shows that the model's skill in simulating wind speed from west and south-west is higher (see Fig. S3 in the Supplement).
Both the diurnal variability and the magnitude of specific humidity are simulated well by the model, with normalised mean biases between −7 and +7 % and correlation coefficients of 3-hourly values of around 0.8 (not shown). Precipitation is simulated well with the 3 and 1 km horizontal resolution: both the number of days with precipitation rates larger than 0.01 mm h−1 and the total amount of precipitation in the simulated period agree well with the observations (Fig. 9). Model results from the 15 km resolution overestimate the number of days with precipitation larger than 0.01 mm h−1 by ca. 30 % and the amount by ca. 50 %. This shows that the higher-resolved domains in the nested setup, using the Grell-Freitas cumulus scheme on all domains, improve the skill in simulating precipitation, which is an important conclusion for future studies with a similar setup aiming at including aqueous-phase chemistry and wet scavenging.

Nitrogen oxides and ozone
The mean bias of modelled NOx depends on the type of observations that it is compared with (Table 7): for rural sites close to Berlin and Potsdam, it is biased positively. Modelled NOx at urban background sites is mainly biased negatively, while the bias is positive or negative at suburban background sites. The maximum bias of all sites (Table 7) is improved with increasing spatial resolution from 15 to 3 km, from +11.9 to +5.3 µg m−3 (rural), +9.3 to +6.7 µg m−3 (suburban background) and −6.7 to −5.7 µg m−3 (urban background). This indicates that generally a horizontal resolution of 3 km is better suited to resolve the spatial NOx patterns within a city of the size of Berlin even with emission input data coarser than 3 km, which is in line with the results of Tie et al. (2010) for Mexico City. A 15 km resolution is not sufficient to resolve the differences between rural and urban concentrations (Fig. 10). Comparing the mean bias between the 3 and 1 km resolutions further shows that, with an emission inventory of 7 km horizontal resolution, the 1 km resolution does not generally improve the results.
As a first step for model-based assessments of urban NOx concentrations, it is important to be able to simulate daily maximum urban background NOx concentrations well. In order to assess the model's skill in reproducing these concentrations, we compare modelled diurnal cycles of NOx to observed diurnal cycles (Fig. 11). The comparison shows that the WRF-Chem setup presented here is not able to simulate the observed diurnal cycle at any of the three resolutions, overestimating NOx concentrations during nighttime and underestimating during daytime, not capturing the peak in observed concentrations due to increased traffic densities in the morning and evening hours. The main reason for the night- [...] the multi-mechanism mean was only of the order of a few per cent for summertime conditions simulated with RADM2, which is the mechanism used in this study. A further reason for the model bias might also be the principal challenge of comparing grid cell averages with point observations, particularly in regions with a high variability on small spatial scales, which is quite typical for cities. Regarding the relatively coarse vertical resolution of the model, extrapolation from the first model level to the surface (e.g. Simpson et al., 2012) might allow for a better comparability between model and observation. The spatial representativeness of a measurement site for a larger area such as the 1 km × 1 km grid cells, however, might be somewhat limited, particularly for urban background sites, which can be influenced by local sources and sub-grid-scale variations in emissions that cannot be captured with WRF-Chem.
Additionally, we compare the simulated NO and NO2 to observations as described in Sect. 3.1 (Figs. 11, S6, S7 and Tables S6, S7 in the Supplement). As for NOx, the bias of modelled NO depends on the station type. For suburban and urban background stations, NO is on average mainly biased negatively, by up to −2.5 µg m−3 (−60 %), while it shows a positive bias at some of the rural stations. Part of this negative bias is due to a lower detection limit in the observation data ranging between 0.1 and 2 µg m−3 depending on the station. While this is not the main contribution to the bias in NOx, it does play a larger role when only looking at NO, as for some of the stations a large share of the observed hourly values lies at or below this threshold both in the observed and modelled data (up to 94 %). The diurnal cycle of NO is modelled in good agreement with the observations, but the peak values are underestimated (Fig. 11). Especially for urban sites, the bias is larger when simulated with a 15 km resolution than with 3 and 1 km resolutions. Modelled NO2 is on average mostly biased high, with up to 11.1, 5.3 and 4.5 µg m−3 for rural sites and up to 10.2, 7.3 and 6.5 µg m−3 for suburban sites (15, 3 and 1 km resolution). Urban background sites are both biased high and low. It is important to note that the positive bias always results from overestimations during nighttime, while daytime NO2, like total NOx, is always biased low, though with a smaller daytime bias for suburban and rural sites than for the urban background. These results are in line with what has been discussed for NOx above and indicate that, in addition to the model resolution, the resolution of emissions might play an important role for simulating daytime NOx concentrations in cities, as more NOx is emitted near streets than at the edges of the city, which can hardly be captured with emission input data of a horizontal resolution of 7 km.
O3 daily means and especially MDA8 ozone are underestimated by the model (Fig. 11 and Table S8), with biases of up to ca. −10 µg m−3 (mean) and −13 µg m−3 (MDA8). This is consistent with what has been reported for a coarse European domain using RADM2 chemistry (Mar et al., 2016) and in line with previous studies showing a deficiency of many online-coupled models, including WRF-Chem with the RADM2 chemical mechanism, in simulating peak ozone concentrations (e.g. Im et al., 2015a). Mar et al. (2016) suggested that the low bias in modelled ozone could be partially explained by the inorganic rate coefficients used in the RADM2 mechanism. Furthermore, it is in line with studies identifying the choice of chemical mechanism as a reason for differences in simulated ozone concentrations (e.g. Coates and Butler, 2015; Knote et al., 2015). That the choice of chemical mechanism, rather than the modelled meteorology, is an important cause of this bias is further supported by the fact that maximum temperatures are generally simulated well by the model, and MDA8 ozone is underestimated even when daily maximum temperatures are simulated correctly. The mean O3 is still simulated reasonably well, though the model underestimates at night and overestimates during the morning hours. The bias is consistent with the bias in NOx diurnal cycles discussed above: in particular, the underestimation of O3 during nighttime is consistent with an overestimation of NOx; the overestimation of O3 in the morning hours might result from too much NO2 accumulating at the surface, which is photolysed when the sun rises.

Particulate matter
The mean bias of the simulated PM10 amounts to −50 % (Fig. 12 and Table S9 in the Supplement), which is relatively consistent at all eight stations within and around Berlin as well as at all three model resolutions. Modelled PM2.5 concentrations are biased between −20 and −35 % (Fig. 12 and Table S10 in the Supplement). From previous studies with the MADE/SORGAM aerosol scheme it is known that it underestimates the secondary organic aerosol contribution to PM (Ahmadov et al., 2012). Comparing the JJA-averaged model output to components of PM10 observed at Nansenstraße during the BAERLIN2014 campaign is in line with these results: while the observations show a mean concentration of organic carbon of 5.6 µg m−3, the modelled particulate organic matter, including organic carbon, is on average 0.8 µg m−3. In addition, the comparison shows that the contribution of black carbon (BC) to PM might be underestimated, with observed elemental carbon (EC) concentrations of 1.4 µg m−3 on average and mean modelled BC concentrations of 0.2 µg m−3, though the modelled value is still within the range of observed values in individual samples. The underestimation of organic carbon (OC) and, to a lesser extent, BC being causes of the underestimation of PM10 is supported by the fact that, on average, model results compare reasonably well with the observations of other components of PM10: modelled sulfate, nitrate and ammonium amount to 1.8, 0.5 and 0.7 µg m−3, while the mean observed concentrations are 1.9, 0.9 and 0.6 µg m−3. Modelled sea salt amounts to 1.0 µg m−3, and observed sodium and chloride are 0.5 and 0.6 µg m−3, respectively. An additional underestimation of mineral dust or re-suspended road dust emissions, such as brake and tyre wear, primarily contributing to PM10, might explain why PM10 is underestimated more than PM2.5. As for the simulated chemical species, part of the bias might be due to a somewhat limited comparability of grid-cell-averaged particulate matter with observations at a measurement site.
It should further be noted that the bias of PM2.5 daily means varies throughout the simulated period, with the concentrations being biased more negatively in periods where the wind speed is overestimated more strongly. This underlines that the correct simulation of meteorological parameters in the online-coupled model WRF-Chem plays an important role in simulating aerosols. The correlation of modelled daily mean PM10 concentrations with observations ranges from 0.26 to 0.46 for the 15 km resolution, from 0.31 to 0.51 for the 3 km resolution and from 0.34 to 0.56 for the 1 km resolution. Correlations of simulated PM2.5 daily means also fall into this range except at two urban background sites, Brückenstraße and Amrumer Straße, where the correlation coefficient is between 0.17 and 0.26 at all resolutions.

Sensitivity studies
In this section, we address whether the skill in simulating meteorology (T2, WS10, MLH) is improved when updating the urban parameters and specifying land use classes on a sub-grid scale, as well as whether this has an impact on the skill in simulating NOx concentrations. Furthermore, we analyse whether using a higher-resolved emission inventory leads to differences in simulated NOx concentrations with horizontal model resolutions of 3 and 1 km. We focus on NOx since, as mentioned before, the bias found in the base run mean ozone concentrations and maximum daily 8 h ozone is likely not due to the simulated meteorology or resolution of emissions. Similarly, the bias of model results for PM10 and PM2.5 is mainly due to an underestimation of secondary organic aerosols by the aerosol mechanism as well as missing emissions and potentially also the vertical resolution, as previously discussed.

Changes in meteorology in S1_urb and S2_mos
The positive bias in T2 found in the model results at many sites is decreased for urban areas if the input parameters to the urban scheme are specified based on data describing the city of Berlin (simulation S1_urb, Table 4), which is mainly due to the fact that T2 is overall simulated lower for urban areas in this sensitivity simulation. Specifically, there is only one site within the urban area (among all urban built-up and urban green stations) for which the model results with the 1 km horizontal resolution (d03) are biased more than ±1 °C (S1_urb, d02: 3 stations; base run, d03: 3 stations; base run, d02: 6 stations). Likewise, the simulation of daily maximum temperatures is improved. The results from this sensitivity simulation, similarly to the results from the base run, show that the differences between the results of the 3 and 1 km resolutions are largest if the urban class of the grid cell changes with changing resolution, though overall the results of the 1 km resolution match the observations slightly better than the results obtained with the 3 km resolution (Table 4). Even though on average the temperature bias is lower in S1_urb than for the base run, the conditional quantile plots show that the highest observed values are still not captured by the model (Fig. 4).
Using the mosaic option of the land surface scheme, and thereby taking into account the sub-grid-scale variability of the land use classes within one model grid cell (simulation S2_mos), has a similar effect on simulated T2 as in S1_urb: overall, simulated T2 is lower than in the base run, which leads to a decrease in T2 bias compared to observations. Furthermore, it leads to the results from the 1 and 3 km resolutions being more similar even at sites with different land use categories, which is referred to as grid convergence by Li et al. (2013) and might indicate that a resolution higher than 3 km is not needed in this case. The conditional quantile plots (Fig. 4) underline these results, showing almost identical median values and distributions for the 1 and 3 km resolutions, and furthermore reveal that the temperatures simulated with the 15 km resolution resemble the results with 3 and 1 km resolutions more than in any of the other simulations. At the 15 km model resolution and when applying the mosaic option, gradients at the edges of the city are resolved better than in the other simulations at the 15 km resolution, which is expressed through a lower mean bias at sites at the boundaries of Berlin. An important limitation using this option is the simulated daily maximum T2, which is underestimated at most stations (Table 5). This feature was also found by Jänicke et al. (2016) for Berlin and its surroundings when applying the single-layer urban canopy model in combination with the mosaic approach and indicates that T2 might be decreased too much when using this option.
There are no observational data from radiosondes available within the city, which is why we cannot draw conclusions on the importance of updating the urban parameters or using the mosaic option for urban areas from comparisons with observed profiles of temperatures or MLH. However, knowing that the MLH diagnosed from WRF-Chem (MLH-YSU) is biased low in the base run during nighttime, we compare JJA mean nighttime (20:00-02:00 UTC) MLH from the base run and S1_urb as well as S2_mos (Fig. 13). The results show that the nighttime MLH-YSU is simulated on average up to ca. 30 m lower in S1_urb than in the base run for most grid cells with the land use type low intensity residential. It is simulated higher than in the base run for grid cells with the land use type high intensity residential and commercial/industry/transport. This shows that the urban parameters can strongly influence the meteorology simulated in urban areas and suggests that they might have to be further refined for simulating the urban atmospheric structure correctly.
The nighttime MLH simulated with S2_mos is up to ca. 70 m lower than in the base run for urban areas, which is an even larger reduction than in S1_urb. As for S1_urb, grid cells with the dominant urban classes being high intensity residential and commercial/industry/transport have a higher MLH-YSU than other urban grid cells, though this effect is smoothed through the use of the mosaic option.
The bias in 10 m wind speed is reduced in S1_urb, ranging from +0.3 m s−1 (10 %) to +1 m s−1 (34 %) depending on the station (Figs. 7, 8 and Table S5). The bias is especially decreased for two periods in mid-June and mid-August, where observed daily mean wind speeds are between 5 and 6 m s−1, which is relatively high compared to the rest of the simulated period. In the base run, the model overestimates the observations during these periods, which is not the case in S1_urb. Similarly, the wind speeds during the periods in mid-July with easterly wind, where the base run strongly overestimates wind speeds, are biased by ca. 1-2 m s−1 less (Fig. 7). The histograms in the conditional quantile plots further show that the range of modelled wind speeds from S1_urb matches the range of observed wind speeds better than in the base run (Fig. S4 in the Supplement).
Similar to S1_urb, the bias in wind speed is decreased in S2_mos, ranging from below +0.1 m s−1 (2 %) to +1.2 m s−1 (40 %) (Figs. 7, 8, S4 and Table S5 in the Supplement). However, it should be noted that unlike for S1_urb, where the decrease in wind speed is distributed evenly throughout the day, wind speed in S2_mos is especially lower during nighttime, while maximum diurnal wind speeds are similar to those simulated in the base run (not shown).
Overall, the results show that when using a model setup with highly resolved nests, the simulated meteorology seems to be improved both by specifying land use input data and urban parameters for the simulated region and when using the mosaic option, though the biases in the diurnal cycles of T2 and wind speed are reduced more in S1_urb. Particularly the differences between S1_urb and the base run for grid cells with land use types high intensity residential and industry/commercial/transport reveal that the specification of urban parameters can contribute to improving the model bias also in MLH. The results from S2_mos show that the mosaic option might be a useful alternative if computational resources are too limited to include higher-resolved nested domains.

Impact of meteorology changes on simulated NO x concentrations
Mean NOx concentrations simulated with S1_urb are generally higher than those simulated with the base run, with differences between S1_urb and the base run at the grid cells of the measurement stations of up to 9 % (15 km resolution), up to 13 % (3 km resolution) and up to 18 % (1 km resolution). Thus, the positive bias which has been found in the base run is increased in S1_urb. For all three domains, the differences are larger for urban grid cells. An analysis of the diurnal cycles reveals that these differences are mainly due to higher nighttime NOx concentrations in S1_urb (Fig. 11). This is consistent with previous results: an underestimation of MLH by the model (MLH-YSU) during nighttime leads to an overestimation of NOx. An even lower MLH in this sensitivity simulation (Sect. 5.1) explains nighttime NOx concentrations being higher than in the base run. The overestimation of nighttime NOx might be further reinforced by lower simulated wind speeds in S1_urb. Daytime NOx, which we define as NOx concentrations between 07:00 and 17:00 UTC, changes only little in S1_urb compared with the base run at urban background stations in Berlin: results with a 3 km horizontal resolution show an increase in daytime NOx in S1_urb between 2 and 5 % and an increase between 5 and 7 % with a 1 km resolution compared to the base run. Results for simulated NOx from S2_mos are consistent with the results from S1_urb: simulated nighttime NOx is even higher than that simulated in the base run and in S1_urb, which is consistent with the larger difference between MLH-YSU simulated with the base run settings and within S2_mos. Daytime NOx changes even less in S2_mos compared to the base run, with changes between −1 and +2 % (3 km resolution) or +3 to +5 % (1 km resolution).
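The daytime comparison between a sensitivity simulation and the base run can be sketched as below. The sketch assumes hourly values labelled by UTC hour and treats both 07:00 and 17:00 as part of the daytime window, which is an assumption about the interval boundaries; the function names and all concentrations are hypothetical.

```python
def daytime_mean(hourly, start=7, end=17):
    """Mean of values whose UTC hour lies in [start, end] (both inclusive).

    `hourly` is a list of (utc_hour, value) tuples.
    """
    values = [v for h, v in hourly if start <= h <= end]
    return sum(values) / len(values)

def percent_change(sensitivity, base):
    """Relative change of a sensitivity run with respect to the base run, in %."""
    return 100.0 * (sensitivity - base) / base

# Hypothetical mean diurnal cycles of NOx in µg m−3
base_run = [(h, 20.0) for h in range(24)]
sens_run = [(h, 21.0 if 7 <= h <= 17 else 30.0) for h in range(24)]

change = percent_change(daytime_mean(sens_run), daytime_mean(base_run))
print(f"Daytime NOx change: {change:+.1f} %")
```

Restricting the average to daytime hours separates the effect of the changed meteorology on daytime concentrations from the much larger nighttime differences, which in this made-up example amount to +50 %.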
Overall, the results underline that the underestimation of mixing in the boundary layer is likely to have a strong influence on simulated nighttime NOx concentrations in urban areas, which is not corrected using the mosaic option or specifying the input parameters to the urban scheme. However, since the simulated MLH is sensitive to the change in urban parameters for high intensity residential and commercial/industry/transport urban areas, it shows that this could potentially have an impact on simulated NOx concentrations. The results from both S1_urb and S2_mos show that daytime NOx is influenced little by changes in the modelled meteorology, suggesting that the bias in daytime NOx is due to emissions that are too low or an incorrect distribution of emissions resulting from a resolution of the emission inventory that is too coarse, as mentioned in Sect. 4.2. As previously mentioned, a further reason for this bias might be limitations in comparability between grid-cell-averaged simulated concentrations and point observations near the surface.

Resolution of the emission inventory
Evaluating the base run (Sect. 4), we found that the improvement in simulating NOx concentrations with a 1 km horizontal resolution, as compared to a horizontal resolution of 3 km, is negligible when using emission input data at 7 km horizontal resolution. This result changes when providing emission input data with a horizontal resolution of ca. 1 km as described in Sect. 2.4 (Fig. 10): the model is then able to resolve small-scale air pollution patterns and hotspots, which cannot be resolved at a horizontal resolution of 3 km. A comparison of the results for the urban background stations within Berlin (Amrumer Straße, Belziger Straße, Nansenstraße, Johanna und Willi Brauer Platz, Brückenstraße) helps to illustrate this: in order to minimise the bias caused by too little nighttime mixing, we only compare daytime (07:00-17:00 UTC) NOx simulated with 3 and 1 km horizontal resolution and downscaled emissions. Going from a 3 to a 1 km resolution, daytime NOx changes by +40, +12, −25, +16 and +161 % in S3_emi for the above-mentioned urban background sites, respectively (Fig. 11). For comparison, the respective changes in the base run are +3, +1, −8, −3 and −3 %. This shows that a 1 km horizontal model resolution only leads to different results from a 3 km horizontal resolution when also using highly resolved emission input data.
Furthermore, the results from the above-mentioned urban background stations show that emissions that are too low within the city (either due to emissions that are too low overall or locally because of a coarse resolution of the emission inventory) can be a cause of the bias in daytime NOx concentrations. To illustrate this, we compare the daytime NOx concentrations from the base run and S3_emi. Using the original emissions, the emissions summed up over JJA in the grid cell where the respective station is located are 7.0, 5.4, 6.9, 3.1 and 7.0 t km⁻² for the above-mentioned urban background stations, respectively, and 22.4, 8.4, 6.2, 2.5 and 79.9 t km⁻² in the downscaled emission data. It should, however, be noted that, though downscaling of the original emissions can lead to a decrease in emission strength in some of the urban grid cells, it generally results in an increase in the city centre and a decrease in the suburban areas. This is because the population density and the traffic density, which are used as proxies for the emission downscaling, are higher in the city centre. Using the downscaled emission data leads to an increase in simulated daytime NOx of 23, 22, 52, 20 and 51 % (3 km resolution) or 68, 36, 24, 44 and 308 % (1 km resolution) at the above-mentioned urban background stations, as compared to the base run. This shows that, despite small decreases in emissions in some of the grid cells, the generally increased NOx emissions in the city centre led to increases in simulated NOx concentration at all five sites. This result indicates that the downscaled emissions might be more suitable to represent gradients in emissions in the urban area, contributing to correcting the bias in simulated daytime urban NOx in the base run.
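The redistribution described above, with more emissions in the city centre and fewer in the suburbs while the total is unchanged, is what a mass-conserving proxy weighting produces. The sketch below illustrates the general idea only; it is not the downscaling procedure of Sect. 2.4. The function name, the uniform fallback and the sample proxy values are assumptions, and which proxy is applied to which emission sector is not specified here.

```python
def downscale_cell(coarse_total, proxy):
    """Distribute one coarse-cell emission total over its fine cells.

    coarse_total: emission mass in the coarse cell (e.g. in t).
    proxy: 2-D list of non-negative proxy values on the fine grid
           (e.g. population density or traffic density).
    Returns a 2-D list of the same shape whose sum equals coarse_total.
    """
    proxy_sum = sum(sum(row) for row in proxy)
    n_cells = sum(len(row) for row in proxy)
    if proxy_sum <= 0:
        # No proxy information: fall back to a uniform distribution.
        return [[coarse_total / n_cells for _ in row] for row in proxy]
    return [[coarse_total * p / proxy_sum for p in row] for row in proxy]


# Hypothetical 2 x 2 example: the proxy is largest in the lower right cell.
proxy = [[1.0, 1.0],
         [2.0, 4.0]]
fine = downscale_cell(80.0, proxy)
fine_total = sum(sum(row) for row in fine)
```

Because the weights sum to one within each coarse cell, a fine cell with a below-average proxy value receives less than its uniform share. This is consistent with the decreases noted above for some urban grid cells.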
A comparison of results from S3_emi with observations at Brückenstraße (Table 7) shows that locally the bias in simulated NOx concentrations can increase strongly. While for most urban background stations in Berlin using the downscaled emissions improves both the bias of mean NOx and the bias of daytime NOx, the example of Brückenstraße shows that further modifications to the emission downscaling and processing might be necessary when simulating local NOx patterns: at the Brückenstraße site, the mean bias increases from −4 µg m⁻³ (1 km resolution, base run) to +26 µg m⁻³ (1 km resolution, S3_emi). The large overestimation is due to a point source close to the site and the way point sources have been treated: as mentioned in Sect. 2.4, point-source emissions are all released into the first model layer. Furthermore, the point-source emissions are distributed as area sources at the resolution of the emission inventory. This results in much higher emissions over a much smaller area in the downscaled emission inventory, locally increasing the concentrations in the vicinity of point sources. Likewise, the comparison of simulated and observed concentrations at rural and suburban sites just outside of Berlin shows that the model skill suffers from the lack of proxy data specifying the spatial distribution of emissions directly outside of Berlin.
Generally, comparing the results from the base run with the results from S3_emi leads to several conclusions. When simulating NOx concentrations in urban areas, a higher horizontal model resolution can be beneficial if an emission inventory of similarly high resolution is available. However, using a highly resolved emission inventory for a model domain with a similarly high resolution is only beneficial for improving the comparability with observations and the application to local studies if the emission inventory is of sufficient spatial precision. The downscaling approach presented here shows how locally highly resolved emissions can be calculated effectively and consistently by combining a readily available emission inventory with data available for many urban areas, such as population and traffic densities. Our results suggest that a further refinement of the proxy data could be useful, e.g. using proxy datasets covering more than the urban area itself. Further refinements could consist in using the housing type (or high population density as an indication for high-rise buildings) for better distributing residential heating emissions. Mar et al. (2016) state that the vertical distribution of emissions, as well as an increased vertical model resolution, has little impact on the model results. While this might hold for simulations of rural background air quality with domain resolutions of the order of 45 km, the present results suggest that distributing point-source emissions into several vertical model levels becomes more relevant when moving to a finer model grid and finer emission input data. Similarly, increasing the vertical model resolution at the same time might both help distribute emissions better and improve the modelled mixing.
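The effect of treating a point source as an area source at the inventory resolution can be seen from simple arithmetic. The sketch below assumes square cells with edges of exactly 7 km and 1 km (the text gives both only approximately) and a hypothetical emission mass; it illustrates the scaling and does not reproduce the actual emission processing.

```python
def area_flux(mass_t, cell_edge_km):
    """Emission per unit area (t km^-2) when a mass is spread over one square cell."""
    return mass_t / cell_edge_km ** 2


point_source_mass = 49.0                      # hypothetical seasonal total in t
coarse = area_flux(point_source_mass, 7.0)    # spread over ca. 7 km x 7 km
fine = area_flux(point_source_mass, 1.0)      # spread over ca. 1 km x 1 km
ratio = fine / coarse                         # scales with the cell-area ratio
```

Under these assumptions the same mass yields an area flux that is larger by the ratio of the cell areas, so a monitoring site in or next to the fine cell sees a much stronger local source than it would in the coarse inventory.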

Summary and conclusions
In this study, we evaluate a WRF-Chem setup for the Berlin-Brandenburg area with three nested model domains of 15, 3 and 1 km horizontal resolutions for 3 months in summer 2014. The results show that the model generally simulates meteorology well, though urban 2 m temperature and urban wind speeds are biased high and nighttime mixing layer height is biased low in the base run. On average, ozone is simulated reasonably well, but maximum daily 8 h mean concentrations are underestimated, which is consistent with the results from previous modelling studies using the RADM2 chemical mechanism. Particulate matter is underestimated, which is at least partly explained by an underestimation of secondary organic aerosols and is consistent with previous studies. NOx concentrations are simulated reasonably well on average, but are overestimated during nighttime and underestimated during daytime, especially in the urban areas.
We specifically assess how the skill in simulated NOx is influenced by the model resolution, the prescribed emissions and the simulated meteorology, which in turn depends on the model resolution, the land use input data to the model and the parameterisation of the urban structure. This is done with three sensitivity simulations: updating the representation of the urban structure within the urban canopy model (S1_urb), taking into account a sub-grid-scale parameterisation of the land use classes (S2_mos) and downscaling the original emission input data from a horizontal resolution of ca. 7 to ca. 1 km (S3_emi).
For the base model run, a horizontal resolution of 1 km did not generally improve the results compared to a model resolution of 3 km. Furthermore, the mosaic option of the Noah land surface model, enabling a sub-grid-scale parameterisation of the land use classes, led to a convergence of the results at the different model resolutions rather than an improvement of the results at the 1 km model resolution. However, this study has shown that a 1 km horizontal model resolution can be very valuable for simulating urban background air quality in the Berlin-Brandenburg region with small modifications, including a better representation of the nighttime mixing layer height in the model, a more detailed specification of urban land use together with the respective input parameters to the urban canopy model and a better spatial representation of urban emissions.
The simulation of the urban boundary layer height is crucial for correctly simulating diurnal cycles of NOx. In the base run, daily minimum (nighttime) mixing layer height simulated by the model is lower than observations outside of the urban area by more than 50 % on all domains. This is consistent with a strong modelled overestimation of NOx during nighttime. However, when calculating the mixing layer height from modelled profiles of temperature, wind speed and humidity the nighttime bias decreases from ca. +8 to ca. 26 %. Daily maximum mixing layer height is biased less, and the difference is smaller between the two different approaches of calculating the mixing layer height. This indicates that the calculation of the urban boundary layer height and nighttime mixing in the model might need to be adapted to better represent observed conditions during nighttime.
A more detailed specification of urban land use classes together with the respective input parameters can help better represent the heterogeneity of urban areas in a model domain with 1 km horizontal resolution. This is shown by the modelled 2 m temperature only differing by more than 0.1 °C between the model resolutions of 3 and 1 km if the land use class of the respective grid cell changes. It is further shown by the simulation with updated urban parameters, which decreases the positive bias in simulated wind speed by up to 0.5 m s⁻¹: the mean bias in wind speed is up to 1.5 m s⁻¹ in the base run and at most 1 m s⁻¹ in the sensitivity simulation with updated urban parameters. In addition, the nighttime mixing layer height is simulated higher in this sensitivity simulation for grid cells of the urban types high intensity residential and commercial/industry/transport, suggesting that the negative bias in mixing layer height during nighttime can also be corrected by better specifying the input parameters to the urban scheme and the urban land use classes. Further studies could compare the urban parameterisation used in this study with the more complex (and computationally expensive) approach of representing the urban meteorology with the building effect parameterisation (BEP) urban canopy model combined with a higher vertical resolution of the boundary layer.
When downscaling the emissions from a horizontal resolution of 7 to 1 km based on proxy data for Berlin, including population density and traffic densities, local pollution patterns can be resolved better with a model domain with a horizontal resolution of 1 km, compared to 3 km. A particular strength of this approach is its effective and consistent combination of a readily available emission inventory and locally available data, which can be applied generically to urban areas. In order to further refine this approach, the downscaling of the coarse emission inventory could be extended especially at and beyond the boundaries of the urban area, or the proxy data for industrial and residential heating emissions could be further refined. Alternatively, a highly resolved local bottom-up emission inventory can help increase the model's skill when simulating with a horizontal resolution of 1 km. In addition, the results have shown that a more detailed treatment of point-source emissions including their vertical distribution, as well as the vertical model resolution itself, could become important when going to a horizontal model resolution of 1 km.
Overall, these results can build a basis for the design of future air quality modelling studies over the Berlin-Brandenburg region and other European urban agglomerations of similar extent. The above-mentioned suggested modifications to the setup are based on data which, to a large extent, are available or easily producible for the Berlin-Brandenburg region and other European urban areas. Considering these modifications, we find the presented WRF-Chem configuration at a 1 km horizontal resolution a suitable setup for simulating urban background NOx concentrations, when used together with the single-layer urban canopy model with input parameters specified for the city of interest and combined with emission input data of a similar resolution as the model domain.

Figure 3. Comparison of weather types for Berlin calculated from ERA-Interim reanalysis data (top panel) and from WRF-Chem output from the domain with 15 km horizontal resolution (bottom panel). Up to three weather types are calculated for each day.

Figure 4. Conditional quantile plot of simulated and observed temperature (°C). The model results are split into evenly spaced bins and compared to observations spatially and temporally matching the values in the model result bins. The red line denotes the median of each of these bins. Grey bars show the distribution of model results, blue outline bars the distribution of observations. The results are shown for the base run and sensitivity simulations S1_urb and S2_mos, each for all three model domains (d01: 15 km horizontal resolution, d02: 3 km, d03: 1 km).
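The binning behind such a conditional quantile plot can be sketched as follows. This is an illustration under stated assumptions and not the plotting code used for the figure: the function name, the bin width and the sample temperatures are hypothetical, and only the median of the matched observations is computed.

```python
from statistics import median


def conditional_median(model, obs, bin_width):
    """Median of the observations matched to each evenly spaced model bin.

    model, obs: paired values (same time and place), same length.
    Returns a dict mapping the lower edge of each occupied model bin
    to the median of the observations paired with model values in it.
    """
    bins = {}
    for m, o in zip(model, obs):
        index = int(m // bin_width)   # floor division also handles negative values
        bins.setdefault(index, []).append(o)
    return {index * bin_width: median(values)
            for index, values in sorted(bins.items())}


# Hypothetical paired 2 m temperatures in degrees Celsius.
model = [10.2, 11.5, 12.1, 13.9, 14.0, 15.7]
obs = [9.0, 10.0, 12.0, 12.5, 13.0, 15.0]
result = conditional_median(model, obs, 2.0)
```

For a model without conditional bias, the median in each bin would lie close to the model values of that bin. In this made-up example the medians sit below the bin centres, which is how a warm bias would appear.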

Figure 6. Daily minimum, mean and maximum mixing layer height as observed in Lindenberg, diagnosed by WRF-Chem and calculated from modelled profiles of temperature, wind speed and humidity (base run, 1 km × 1 km horizontal resolution).

Figure 7. Daily mean observed and modelled wind speed from the base run, S1_urb and S2_mos, for all three model domains (d01: 15 km horizontal resolution, d02: 3 km, d03: 1 km). The figures show means over the daily means of three stations in Berlin (Tegel, Schönefeld and Tempelhof). The grey shades show the variability between the daily means of these stations, corresponding to the 25th and 75th percentiles of the individual stations' daily means. For the model results, the grid cells corresponding to the location of the stations were extracted.

Figure 8. Wind roses of observed and modelled values for JJA, including observations and model results for three stations in Berlin (Tegel, Schönefeld and Tempelhof) and from all three model domains (d01: 15 km horizontal resolution, d02: 3 km, d03: 1 km). The bars show how frequently the wind came from the respective direction and the colours indicate how often the wind speed was observed or modelled in the indicated interval.

Figure 9. (a) Station average (mean) precipitation sum of observations and model results (base run), (b) median number of days with precipitation observed or modelled. A day is counted if observed or modelled precipitation was more than 1 mm h⁻¹. Ranges indicate the variability between the different stations. Both panels (a) and (b) show averages over nine stations and the corresponding model grid cells in Berlin and its surroundings. Model results are given for all three model domains (d01: 15 km horizontal resolution, d02: 3 km, d03: 1 km).

Figure 10. JJA mean modelled (coloured fields) and observed (coloured circles) NOx concentration in Berlin and its surroundings from (a) the base run, (b) S1_urb, (c) S2_mos and (d) S3_emi. The left column shows results obtained with the 15 km horizontal resolution, the middle column shows results from a 3 km horizontal resolution and the right column shows results from a 1 km horizontal resolution.

Figure 11. Mean diurnal cycles of NO, NO2, NOx and O3 for all Berlin and Potsdam urban background stations as observed and modelled by the base run, S1_urb, S2_mos and S3_emi. Model results are given for all three model domains (d01: 15 km horizontal resolution, d02: 3 km, d03: 1 km). The diurnal cycle is averaged over six stations for NO, NO2 and NOx and three stations for O3. The grey shaded areas represent the variability between the different stations' diurnal cycles, showing the 25th and 75th percentiles.

Figure 12. Daily mean PM10 and PM2.5 concentrations as observed and modelled (base run) at urban background stations in Berlin. Daily means are averaged over five stations for PM10 and four stations for PM2.5. The grey shaded areas represent the variability between the different stations, showing 25th and 75th percentiles. Model results are given for all three model domains (d01: 15 km horizontal resolution, d02: 3 km, d03: 1 km).

Table 1. Physics and chemistry parameterisation.
In addition, the MODIS land use dataset as implemented in the WRF model from v3.6 only includes one category classifying urban areas. Therefore, we implemented the CORINE dataset (EEA, 2014) to replace the USGS dataset. The original CORINE dataset includes 50 land use classes. The land use classes at the spatial resolution of 250 m are remapped to 33 USGS land use classes read by WRF, following suggestions of Pineda et al.

Table 2. Urban parameters for Berlin for the three urban classes low intensity residential (31), high intensity residential (32) and commercial/industry/transport (33).

Table 3. Observational data in Berlin and Potsdam. If one class is given, it refers to the meteorology class if the network is Deutscher Wetterdienst (DWD), Global Climate Observing System Upper-Air Network (GRUAN) or TU, and to the chemistry class otherwise. The abbreviated name (Abbr.) is referred to in tables summarising statistics for the different stations.

Table 7. Statistics of daily NOx for JJA. "Obs" refers to the JJA observed mean, "Mod" refers to the JJA modelled mean for the respective grid cell. MB is the mean bias for JJA, NMB refers to the normalised mean bias and r is the correlation of hourly values. Obs, Mod and MB are given in µg m⁻³ and NMB is given in %. The statistics are shown for the results from the model domains of 15 km (d01), 3 km (d02) and 1 km (d03) horizontal resolution.
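The statistics named in this caption can be computed as sketched below. The NMB form used here (sum of model minus observation, divided by the sum of observations) is a common convention and an assumption, since the caption does not spell out the formula. The function names and sample values are hypothetical, and r is taken to be the Pearson correlation coefficient.

```python
from math import sqrt
from statistics import mean


def mean_bias(mod, obs):
    """Mean of (model - observation), in the unit of the inputs."""
    return mean([m - o for m, o in zip(mod, obs)])


def normalised_mean_bias_percent(mod, obs):
    """Sum of (model - observation) relative to the sum of observations, in %."""
    return 100.0 * sum(m - o for m, o in zip(mod, obs)) / sum(obs)


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long series."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical paired concentrations in micrograms per cubic metre.
obs = [10.0, 20.0, 30.0, 40.0]
mod = [12.0, 18.0, 33.0, 45.0]

mb = mean_bias(mod, obs)
nmb = normalised_mean_bias_percent(mod, obs)
r = pearson_r(mod, obs)
```

A positive MB or NMB indicates that the model overestimates the observed concentrations, while r only measures how well the temporal variability is reproduced and is insensitive to a constant offset.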

Code availability
WRF-Chem is an open-source, publicly available community model. A new, improved version is released approximately twice a year. The WRF-Chem code is available at http://www2.mmm.ucar.edu/wrf/users/download/get_source.html. The corresponding author will provide the modifications introduced and described in Sect. 2 upon request.