Interactive comment on “Evaluation of high-resolution GRAMM/GRAL NOx simulations over the city of Zurich, Switzerland”

The quality of the scientific content of the paper is very good. The authors describe the methodologies and data sets they use. The structure is good and makes the paper easy to read. Analysis is conducted with different statistical tools and the data is analyzed with respect to temporal and spatial properties with comprehensive measurement data sets. The authors also acknowledge that the city of Zürich is using their model for air pollution control, which adds value to the scientific contents.


Introduction
The urban population has grown steadily in the past century and has already reached 50% globally and more than 75% in many developed countries. Urban areas with high population density are hot spots of air pollutant emissions, raising concerns regarding increased mortality and morbidity (Cohen et al., 2004; Jerrett et al., 2004; Beelen et al., 2013). Some of the most critical air pollutants in terms of health effects are particulate matter (PM) and NO2, whose levels exceed national and WHO standards in many urban areas (e.g., in Europe; Beelen et al., 2014). In Switzerland, and more particularly in urban centres such as the Zürich area, despite improving trends, the urban population is still exposed to harmful levels of PM smaller than 10 µm (PM10) and NO2 (Heldstab et al., 2011). Health effects of air pollution are well documented through numerous epidemiological studies (Brunekreef and Holgate, 2002; Beelen et al., 2008; Raaschou-Nielsen et al., 2013), but these studies rely on coarse estimates of the average population exposure as it is very challenging to account for the steep gradients and large temporal variability of air pollutant concentrations in cities (Jerrett et al., 2004; Beelen et al., 2008, 2013). Computing individualized pollution exposure in urban areas requires high-resolution simulations, with at least hourly resolution and spanning long periods of time (years to decades) since health impacts can be triggered either by short-term exceedances of pollution thresholds or by long-term continuous exposure to high pollution levels (Van Roosbroeck et al., 2006; Beelen et al., 2008; Lelieveld et al., 2013). Individualized exposure is not only useful for epidemiological studies, but also for air quality plans designed by cities to reduce the direct and indirect social and economic costs of air pollution (e.g., Lelieveld et al., 2013).
Current air quality plans are generally lacking a systematic cost-benefit assessment of different mitigation measures due to the lack of affordable model solutions that satisfy the demanding requirements in terms of resolution, temporal coverage, and source-specific information (Miranda et al., 2015).
In the present study, we focus on NOx, an air pollutant with particularly large spatial and temporal gradients due to its short lifetime (e.g., Vardoulakis et al., 2002). Representing the gradients in NOx concentrations in cities is not yet achievable by standard chemistry-transport models, as they are limited to horizontal resolutions of typically a few kilometres (e.g., Terrenoire et al., 2015). Recent progress in computational fluid dynamics (CFD) models makes it possible to run high-resolution dispersion simulations at the city scale (Li et al., 2006; Kumar et al., 2009, 2011; Di Sabatino et al., 2013). However, the prohibitive computational cost of these simulations prevents their application in the context of long-term urban exposure assessment (Parra et al., 2010). Currently, the most widely used models for urban exposure assessment and regulatory applications are models with a simplified parametrization of pollutant dispersion (e.g. Gaussian plume) such as ADMS (Stocker et al., 2012), AERMOD (Rood, 2014), SIRANE (Soulhac et al., 2011), IFDM (Lefebvre et al., 2011), or OSPM (Kakosimos et al., 2010). When correctly parametrized and calibrated, these models offer a reliable representation of the average concentration distribution in cities (Soulhac et al., 2011; Briant et al., 2013; Brandt et al., 2013). However, they have difficulties in representing the dispersion in complex building and street canyon configurations and in properly reproducing the temporal (hourly) variability due to varying meteorology (e.g., Soulhac et al., 2012; Ottosen et al., 2015). With the growing availability of urban air pollution observations due to recent advances in (low-cost) sensor technology (Jiao et al., 2016; Gao et al., 2016), land-use regression models (LUR) are increasingly being used for air pollution assessment (Kumar et al., 2015b; Heimann et al., 2015), offering a performance comparable to CFD full physics models (Beelen et al., 2010).
Yet, LUR models need a large amount of in-situ observations at strategic locations to represent the full spatial and temporal variability (Duvall et al., 2016; Mueller et al., 2015, 2016; Hasenfratz et al., 2015), and cannot be extended backward in time to satisfy the needs of long-term epidemiological studies.
Considering the respective strengths and limitations of the standard urban air pollution modelling systems, Berchet et al. (2017) proposed a novel method taking advantage of high-resolution, accurate CFD modelling while keeping computational costs affordable, by using a catalogue-based approach merged with routinely available meteorological observations. They showed that it was computationally feasible to simulate hourly concentration maps over multiple years at building-resolving resolution which successfully capture most of the variability in NOx concentrations caused by variations in air flow and atmospheric stability. The main purpose of the present study is to provide a comprehensive evaluation of the above-mentioned method for NOx concentrations in Zürich, Switzerland, for the years 2013-2014. The modelling domain covers the entire urban area of Zurich and includes 8 continuous NOx monitoring sites as well as 65 NO2 passive samplers. We demonstrate the high quality and robustness of the catalogue-based modelling system for hourly and daily concentrations.
Furthermore, we identify sources of errors and uncertainties in the modelling system and propose additional steps to improve the methodology.
In Sect. 2, the modelling chain applied to generate time series of pollution maps is described. In Sect. 3, the set-up for the city of Zurich is presented including the available in-situ observations, the emission inventory, and auxiliary data sets. In Sect. 4, the performance of the model in terms of spatial distribution and temporal variability is evaluated with in-situ NOx and NO2 measurements.
2 Approach and modelling system

2.1 Catalogue-based approach

Our approach relies on explicit physical simulations of air flow and pollutant dispersion. Such simulations on a city-wide domain must account for the cascade of scales influencing flow patterns, from the synoptic to the street and building scale. The synoptic scale defines the general meteorological conditions and the mean direction and strength of the large-scale flow in the city region. Land-use and topography restructure the synoptic weather at the regional scale by generating mesoscale phenomena such as thermally driven land-lake breezes and up- and down-slope circulations, urban heat islands, and channelling and blocking of the flow by the topography. Inside the city, these regional [...]

Simulating the full transient evolution of the atmosphere over a multi-year period is not yet feasible at building-resolving resolution (i.e. better than 10 m) for a whole city with current computing resources (e.g., Parra et al., 2010). Therefore, we approximate the full temporal dynamics by a sequence of steady-state solutions selected from a pre-computed catalogue as described in Berchet et al. (2017). This catalogue is a discrete representation of all possible weather situations in terms of atmospheric stability and of large-scale wind speed and direction at the boundaries of the domain.
Binning large-scale wind directions and speeds into 36 (10° each) and 7 (from 0.25 to 7 m·s⁻¹) categories respectively, with seven possible Pasquill-Gifford classes for atmospheric stability as defined by the U.S. Environmental Protection Agency (2000), leads to a catalogue of 1008 physically meaningful reference weather situations. As illustrated in Fig. 1, this catalogue is computed in a three-step procedure which successively generates the mesoscale winds computed with GRAMM and the corresponding urban-scale winds and air pollutant concentrations computed with GRAL.
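The size of the catalogue can be checked with a short sketch. The sector convention in `direction_bin` (sectors starting at 0°) is an assumption, and the speed bin edges are not given in the text, so they are not modelled here. The full Cartesian product of 36 directions, 7 speeds and 7 stability classes would hold 1764 entries; the reported 1008 therefore corresponds to 28 speed-stability pairs per direction, if one assumes that every direction retains the same pairs.

```python
from itertools import product

N_DIRECTIONS = 36   # 10-degree sectors, from the text
N_SPEEDS = 7        # speed categories between 0.25 and 7 m/s, from the text
N_STABILITY = 7     # Pasquill-Gifford classes A to G, from the text


def direction_bin(wind_dir_deg: float) -> int:
    """Return the index (0-35) of the 10-degree sector containing a direction.

    Assumption: sectors are [0, 10), [10, 20), ...; the actual catalogue
    may centre its sectors differently.
    """
    return int((wind_dir_deg % 360.0) // 10.0)


# Every conceivable (direction, speed, stability) combination.
full_product = len(list(product(range(N_DIRECTIONS),
                                range(N_SPEEDS),
                                range(N_STABILITY))))

retained = 1008  # catalogue size reported in the text
pairs_per_direction = retained // N_DIRECTIONS
```

Which speed-stability pairs are discarded as not physically meaningful is not stated in this section.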
Once the catalogue is available, a sequence of hourly weather situations is built based on in-situ observations of wind speeds and directions in and around the city. For every hour of the simulated period, the weather situation in the catalogue is selected whose associated wind field best matches the in-situ observations. As demonstrated in Berchet et al. (2017) [...]

[...] where τ_i is the unitless temporal profile of emissions for each sector i, c_background(h) the background [...] is that the full transient dynamics is replaced by a sequence of steady-state solutions, but as will be shown in this evaluation, this has only limited impact on the results.
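The hourly selection and recombination can be sketched as follows. Both the matching score (sum of squared wind-vector differences over stations) and the form of `hourly_concentration` (background plus profile-weighted sector contributions) are assumptions made for illustration; the actual score and Eq. 1 are given in Berchet et al. (2017) and are not reproduced in the text above. All function and variable names are hypothetical.

```python
def wind_mismatch(observed, simulated):
    """Sum of squared vector differences between observed and simulated winds.

    Both arguments map a station name to a (u, v) wind vector in m/s.
    Assumption: this simple score stands in for the paper's matching criterion.
    """
    return sum(
        (observed[s][0] - simulated[s][0]) ** 2
        + (observed[s][1] - simulated[s][1]) ** 2
        for s in observed
    )


def select_situation(observed, catalogue):
    """Return the key of the catalogue entry whose station winds match best."""
    return min(catalogue, key=lambda k: wind_mismatch(observed, catalogue[k]))


def hourly_concentration(background, profiles, sector_fields):
    """Assumed form: c(h) = c_background(h) + sum_i tau_i(h) * c_i.

    profiles maps sector -> unitless temporal factor tau_i for this hour;
    sector_fields maps sector -> catalogue concentration for that sector.
    """
    return background + sum(profiles[i] * sector_fields[i] for i in profiles)
```

For example, `select_situation({"A": (1.0, 0.0)}, {"calm": {"A": (0.1, 0.0)}, "west": {"A": (1.2, 0.1)}})` picks the `"west"` entry because its wind vector is closer to the observation.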

GRAMM/GRAL modelling system
The catalogue-based approach relies on meteorological and on microscale flow and air pollutant [...]

GRAL is nested into GRAMM and is run here in diagnostic mode at 10 m resolution, which is different from Berchet et al. (2017) where GRAL was run in prognostic mode at 5 m resolution.
In diagnostic mode, the flow field around buildings is computed by interpolating GRAMM wind fields on a fine Cartesian grid, and assuming a logarithmic wind profile close to walls. Finally, mass conservation is achieved by solving a Poisson equation to establish a pressure field that corrects the velocities. In the prognostic mode, the flow is explicitly computed by forward integration of a set of prognostic equations. We chose here to use the diagnostic mode as the computation costs are lower, which allowed us to simulate a much larger domain covering the complete urban area of Zurich (see Sect. 3.1). We found only minor differences between the simulations with the two modes and resolutions and thus discuss only the results for the diagnostic mode in the following. Lagrangian dispersion simulations are computed with virtual particles released from prescribed emission sources (Oettl and Hausberger, 2006; Oettl, 2014) and transported according to the pre-computed GRAL wind fields. Turbulent diffusion is represented by specific Langevin equations applicable for the full range of wind speeds, in particular for low wind speeds (Anfossi et al., 2006).
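To illustrate what a Langevin-type particle model does, the following minimal sketch advances one turbulent velocity component with a generic Ornstein-Uhlenbeck step for homogeneous turbulence. This is not the low-wind-speed formulation of Anfossi et al. (2006) used in GRAL; the Lagrangian time scale `t_lagr`, the velocity standard deviation `sigma_u` and the function names are illustrative assumptions.

```python
import math
import random


def langevin_step(u, dt, t_lagr, sigma_u, rng):
    """Advance one turbulent velocity component by one Euler-Maruyama step.

    Generic homogeneous-turbulence form (an assumption, not GRAL's scheme):
        du = -(u / T_L) dt + sqrt(2 sigma_u^2 / T_L) dW,  dW ~ N(0, dt)
    """
    drift = -u / t_lagr * dt
    noise = math.sqrt(2.0 * sigma_u ** 2 / t_lagr * dt) * rng.gauss(0.0, 1.0)
    return u + drift + noise


def advect_particle(x, u_mean, u_turb, dt):
    """Move a particle with the mean wind plus its turbulent fluctuation."""
    return x + (u_mean + u_turb) * dt
```

With `sigma_u = 0` the random term vanishes and the fluctuation simply relaxes towards zero over the time scale `t_lagr`, which is a convenient sanity check.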

Evaluation approach
The European Commission expert panel FAIRMODE (Forum for AIR quality MODelling in Europe) has been tasked to define quality objectives and performance criteria for air quality models, following the Directive 2008/50/EC of the European Parliament (EC, 2001). These criteria have been described in Thunis et al. (2012) and Pernigotti et al. (2013). They are based on the following metrics: [...]

[...] hilly topography, highways are built through numerous tunnels, creating NOx emission hotspots at ventilation shafts and tunnel portals, which can optionally be treated in GRAL with a specific algorithm described in Oettl et al. (2002), or simply as point sources at the tunnel gates.

General model inputs
As mentioned in Sect. 2, air flow computations require information on the topography, land-use types and buildings. Topographical information was taken from the ASTER GDEM2 data set (Advanced Spaceborne Thermal Emission and Reflection Radiometer - Global Digital Elevation Model Version 2) at a resolution of 30 m and projected to the 100 m GRAMM grid and linearly interpolated to the 10 m GRAL grid. Information on land use (water bodies, forests, etc.) at a resolution of 100 m was taken from the CORINE Land Cover data set (version CLC2006) distributed by the European Environment Agency. The 44 CORINE land-use classes are translated into typical values for roughness length, heat capacity and thermal conductivity, albedo and soil moisture for GRAMM computations. GRAL uses land-use classes in terms of roughness length to account for surface drag caused by different types of vegetation, whereas the drag imposed by buildings is represented explicitly. The CORINE data set is projected similarly to ASTER to the GRAMM and GRAL grids.
Three-dimensional building information inside the city of Zurich was deduced from a vectorial building inventory provided by the municipality of Zurich. Buildings outside of the city are taken from

Emission data
Emission data are deduced from two very detailed inventories produced by the municipal (Umwelt- [...]

AWEL emissions are disaggregated to the GRAL grid using the building mask to attribute heating and industrial emissions to building roofs and other emissions to the space between buildings. As GRAL can account for the rise of hot plumes in ambient air by applying a slightly modified [...]

To limit the computational demand, we merged the original categories into a total of 25 group categories by adding up emissions with a similar temporal profile. For instance, we expect motorbike emissions to vary similarly to car emissions. Emission variability in Eq. 1 is determined based on both pre-defined profiles and measured proxies. For all computed emission categories, we apply typical diurnal, weekly and seasonal cycles as used in the TNO-MACC emission inventory for Europe (Kuenen et al., 2011), with the exception of light duty traffic and heating emissions. 85 traffic counts are operated by the municipality in the city of Zurich. We use the hourly ratios of the total number of vehicles (summed over all sites) to the annual average hourly total. Heating emissions follow a diurnal cycle as prescribed in the TNO-MACC emission inventory, but the seasonal cycle of such emissions is determined using so-called "heating-degree days" accounting for the outdoor temperatures as measured at different locations in the city. Heating degree days are computed at the daily scale using Eq. 2: [...] with T_ref = 20 °C and T_min = 16 °C and T(t) the daily average outdoor temperature in the city at time t.
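Eq. 2 itself is not legible in this copy of the text. The sketch below uses one common definition of heating degree days that is consistent with the symbols T_ref, T_min and T(t) defined above; it is an assumption, not necessarily the exact form used by the authors, and the strict inequality at T_min is likewise a guess.

```python
T_REF = 20.0  # deg C, from the text
T_MIN = 16.0  # deg C, from the text


def heating_degree_days(t_daily_mean: float) -> float:
    """Assumed form of Eq. 2: T_ref - T(t) on days colder than T_min, else 0.

    t_daily_mean is the daily average outdoor temperature in deg C.
    """
    if t_daily_mean < T_MIN:
        return T_REF - t_daily_mean
    return 0.0
```

Under this assumed form, a day with a mean temperature of 10 °C contributes 10 degree days, while any day at or above 16 °C contributes none.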
Heating emissions are scaled proportionally to the heating degree days parameter. As the total number of heating degree days varies from one year to another, depending on the meteorology, the scaling factor for heating emissions is chosen to keep consistent heating degree days and emissions for the year 2010 for which the inventory was designed. [...]

Temperature data used to scale emissions from heating systems are gathered at the same locations as the wind data. The average outdoor temperature for the entire city is calculated as the mean of available observations. Traffic counts are located all around the city. They count vehicles irrespective of their type at 15-minute intervals, per direction, for all lanes of selected streets. We use hourly totals in the city to scale traffic emissions uniformly.
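The two proxy-based scalings described above can be sketched as simple ratios. In this illustration the "annual average hourly total" is taken as the mean of the supplied series, and the heating reference is the mean daily heating degree days of the inventory year; function names and the exact normalisation are assumptions.

```python
def traffic_factors(hourly_totals):
    """Ratio of each hourly vehicle total (all counters summed) to the mean.

    Assumption: the mean of the supplied series stands in for the annual
    average hourly total mentioned in the text.
    """
    mean_total = sum(hourly_totals) / len(hourly_totals)
    return [n / mean_total for n in hourly_totals]


def heating_factors(hdd_series, hdd_mean_reference):
    """Scale daily heating emissions in proportion to heating degree days.

    hdd_mean_reference is the mean daily value over the inventory year
    (2010 in the text), so a factor of 1 reproduces the inventory average.
    """
    return [hdd / hdd_mean_reference for hdd in hdd_series]
```

By construction the traffic factors average to 1 over the series, so the scaling redistributes the inventory emissions in time without changing their total.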

After generating the catalogue of wind and concentration fields, hourly time series of concentration maps have been generated for the years 2013 and 2014. In the following, these model outputs are evaluated against observations and an analysis of uncertainties of the model system is presented. [...]

[...] demonstrates that meteorological variability is a key factor driving the variability in concentrations and that this variability is very well captured by the catalogue-based modelling approach. Exceptions are the sites ROS, SCM and BAL for which the correlations are only 0.05 to 0.2 better than the emission-observation correlations. At these traffic sites the variability appears to be dominated by traffic intensity rather than by meteorology.

In the following, we compare observations to the "minimum" simulations due to their significantly better performance, and will further discuss the implications of this choice in Sect. 5.

Evaluation of the spatial distribution
The average distribution of simulated NOx concentrations is shown in Fig. 2 [...]

To evaluate the quality of this average distribution, we use eight continuous monitoring sites and 65 NO2 passive samplers distributed rather uniformly over the city and covering the full range of pollution levels. Figure 3 and Tab. 2 compare average observations with simulations. As shown by the quantile-quantile plot and the biases, there is no specific dependency of the mismatches on the concentrations. The fractional bias remains roughly the same (well below 50%) at all sites over the whole range of observed concentrations. As for the biases, an air pollution model is considered to perform well when the NMB is below 50% (e.g., Kumar et al., 2006). The relative bias seems to increase only at the lower and upper ends of the concentration range at all sites, suggesting higher uncertainties for very high and very low concentrations.
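For reference, the two bias metrics named above can be computed as in the following sketch, which uses the standard textbook definitions of normalised mean bias (NMB) and fractional bias (FB). The exact definitions used for Tab. 2 are not restated in this section, so treat these as assumed forms.

```python
def normalised_mean_bias(sim, obs):
    """NMB = sum(sim - obs) / sum(obs); 0.5 corresponds to a 50% bias."""
    return sum(s - o for s, o in zip(sim, obs)) / sum(obs)


def fractional_bias(sim, obs):
    """FB = 2 (mean_sim - mean_obs) / (mean_sim + mean_obs), bounded by +/-2."""
    mean_sim = sum(sim) / len(sim)
    mean_obs = sum(obs) / len(obs)
    return 2.0 * (mean_sim - mean_obs) / (mean_sim + mean_obs)
```

For a model that overestimates every value by half, such as `sim = [30, 60]` against `obs = [20, 40]`, the NMB sits exactly at the 50% threshold while the fractional bias is 0.4, which shows that the two metrics are not interchangeable.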
At passive sampler sites the comparison is complicated by the fact that the modelling system simulates NOx whereas passive samplers measure NO2. The ratio between NO2 and NOx is often parametrized by a non-linear "Bachlin" function depending solely on the concentration of NOx (e.g., Düring et al., 2011). The function accounts for the fact that the ratio tends to increase with increasing distance from the source and hence with decreasing NOx concentration. The ratios between biweekly averaged NO2 and NOx concentrations as measured at the continuous sites (green dots in Fig. 3

Evaluation of the temporal variability
Good scores for the average spatial distribution of air pollutant concentrations have already been demonstrated for other modelling systems (e.g., Soulhac et al., 2011;Di Sabatino et al., 2007).
However, accurately reproducing not only average concentrations but also the temporal evolution from hourly to seasonal time scales is a much more challenging objective that has received little attention so far. This section therefore focusses on evaluating the simulated temporal variability.

Example period
In Fig. 4, observations are compared to simulations for a selected period of time, October 2013. The period has been selected as the only time when all sites were in operation, but also because the concentrations represented the average patterns of concentration variability rather than some spe[...]

[...] the different concentration patterns during weekends (shaded periods in Fig. 4) are well detected.

Whereas background concentrations are the same at all sites and never exceed 50 µg·m⁻³, the magnitude of local contributions varies significantly from one site to the other, very consistently with observations. FIL and HBR show almost no local contribution whereas street sites such as ROS and SCM are largely dominated by local traffic emissions. However, at the three most polluted sites, ROS, SCM and STA, the simulations deviate significantly from the observations for some periods.

As discussed earlier, this is likely related to the large local gradients at these sites of 30 [...]

At the site HBR, 200 m above the altitude of the city centre, the concentrations are dominated by the background and are in most cases well reproduced. However, in some very specific situations, the modelling system appears to miss some transport of pollution from the valley to higher altitudes.
For instance, a pollution event was observed at all sites on 4 October in Fig. 4. It was well reproduced by the model at all sites with the exception of HBR, where the model only marginally deviated from the background, suggesting underestimated vertical transport from the city centre, or a mismatch in simulated wind directions at higher altitudes.

Complete two-year period
The entire period of simulation, covering the years 2013 and 2014, is presented in Fig. 5 [...] average; see Tab. 2). At all sites, FAC2 and r scores are 10-20% higher at the daily scale than at the hourly scale, indicating that the synoptic variability is better captured than the diurnal cycle. Events of low concentrations are generally correlated with higher wind speeds whereas concentration peaks are associated with particularly stable situations (stability classes E to G).
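The comparison of hourly and daily scores can be reproduced along the following lines. The sketch uses the usual definitions of FAC2 (fraction of simulated values within a factor of two of the observations) and of the Pearson correlation r, and it averages hourly series into daily means before rescoring; handling of missing data is left out, and all names are illustrative.

```python
import math


def daily_means(hourly, hours_per_day=24):
    """Average a gap-free hourly series into consecutive daily means."""
    n_days = len(hourly) // hours_per_day
    return [
        sum(hourly[d * hours_per_day:(d + 1) * hours_per_day]) / hours_per_day
        for d in range(n_days)
    ]


def fac2(sim, obs):
    """Fraction of pairs with 0.5 <= sim/obs <= 2 (pairs with obs <= 0 skipped)."""
    pairs = [(s, o) for s, o in zip(sim, obs) if o > 0]
    return sum(1 for s, o in pairs if 0.5 <= s / o <= 2.0) / len(pairs)


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long series."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)
```

Scoring `daily_means(sim)` against `daily_means(obs)` with the same two functions gives the daily-scale values to set beside the hourly ones.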
Although heating systems dominantly emit during winter time, they contribute only marginally

Diurnal cycle of concentrations
Beyond the seasonal and daily variability, the diurnal cycle of concentrations plays a key role in assessing the exposure of the population to air pollution as people commute and spend their day at different locations within the city. The mean diurnal cycles of observations and simulations are presented in Fig. 7. Here, only weekdays are discussed as the diurnal cycle of emissions is more pronounced than during weekends. Consistent with the observations, the simulated concentrations are higher during daytime than at night, with a morning peak at all sites and a late afternoon peak at some sites. The morning rush hour leads to a stronger peak as the atmosphere is usually more stably stratified at this time of the day than during the evening rush hour. At most sites, both in the model and the observations, the peak occurs at 7 a.m. UTC, which corresponds to 8 a.m. local time in winter and 9 a.m. local time in summer. At HBR, the simulated and observed peaks happen later than at other sites. However, the observed peak is delayed by about one hour more than the simulated one.
As mentioned in Sect. 4.3.1, HBR is an elevated site with no significant emissions nearby. Thus, pollution emitted in the morning in the city appears to be transported to the site with a delay of 2-3 hours, while in the model, the steady-state assumption causes pollutants to be transported unrealistically fast.
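A mean weekday diurnal cycle of the kind discussed here can be computed with a small helper such as the one below. It is a sketch that assumes timestamps are `datetime` objects in UTC, as in the text, and that weekdays are Monday to Friday; public holidays are not treated separately, and the function name is illustrative.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def mean_weekday_cycle(timestamps, values):
    """Return {hour: mean value} over Monday-Friday samples only.

    timestamps are datetime objects (assumed UTC); values are concentrations.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for t, v in zip(timestamps, values):
        if t.weekday() < 5:  # 0 = Monday ... 4 = Friday
            sums[t.hour] += v
            counts[t.hour] += 1
    return {h: sums[h] / counts[h] for h in sorted(sums)}
```

Applying the same function to the observed and the simulated series gives two 24-value profiles whose peak hours can be compared directly, for example to quantify the delay seen at HBR.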
After the morning peak, the observed concentrations follow three possible paths: a steady decrease until the night time minimum (ROS, SCM, HBR), an afternoon plateau with a small peak around [...]

The uniform scaling of traffic emissions is a strong simplification that likely contributes to the discrepancies between simulated and observed diurnal profiles at some sites: a closer analysis of the data from the 89 traffic counters, for example, showed that the traffic intensity remains constantly high at daytime in the city center whereas there are clear morning and evening peaks in the outer districts. In addition, some streets are more intensively used by incoming traffic (strong morning peak), others by outgoing traffic (strong evening peak). The late evening peak at 8-9 p.m. UTC could be explained by late emissions such as a surge in domestic heating before the night not accounted for in our system. A second explanation for the absence of a late-evening peak in the simulations could be twofold: first, NOx is transported in the model as a passive tracer whereas in reality it is depleted by reaction with OH radicals. OH concentrations are highest when NOx is relatively low and solar radiation large (Ren et al., 2003), i.e., on sunny summer afternoons. The NOx lifetime is then reduced to about 2-4 hours (e.g., Liu et al., 2016), and not accounting for this depletion will contribute to a positive model bias in the afternoon as seen especially at sites in the lower range of the concentration levels. Second, the observed late evening peak occurs well after the evening rush hour, suggesting that it is an integrated response due to accumulation of NOx over several hours. Such effects are not represented by our steady-state approach where the concentrations are solely determined by the emissions of the actual hour.
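The size of the neglected chemical loss can be gauged with a first-order decay estimate. This is a back-of-the-envelope sketch assuming simple exponential depletion with a constant lifetime, which is a simplification of the real OH chemistry; the function name is illustrative.

```python
import math


def remaining_fraction(age_hours: float, lifetime_hours: float) -> float:
    """Fraction of NOx left after age_hours, assuming first-order loss."""
    return math.exp(-age_hours / lifetime_hours)


# Share of NOx surviving one hour of transport for lifetimes of 2 h and 4 h.
survival_after_1h = [remaining_fraction(1.0, tau) for tau in (2.0, 4.0)]
```

With lifetimes of 2 and 4 hours, roughly 61% and 78% of the NOx remain after one hour of transport, so a passive tracer would overestimate aged contributions by a noticeable margin on sunny summer afternoons.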
During the night, observed concentrations at the sites BAL, DUE, ZUE, FIL and STA converge to a similar level, which is well reproduced by the model at BAL, DUE, and STA, but underestimated at ZUE and especially at FIL. A systematic underestimation at night is also observed at ROS, especially in the early morning hours. The sites ROS and FIL are located next to important traffic corridors which might be used during the night more heavily than other roads including heavy duty traffic.
Heavy duty traffic is not allowed in Switzerland at night between 22.00 and 05.00 local time (20.00-03.00 UTC in summer) but uses the early morning before the main rush hour intensively. Therefore, heavy duty traffic emissions close to these sites might be underestimated in our system. A second explanation can also be the missing accumulation of pollutants at night similar to the late evening peak. Under stable nocturnal situations, pollutants are slowly dispersed and remain longer in the domain of simulation than during the day. Accounting for air pollution accumulation over more than one hour would help increase the nocturnal low concentrations.

Discussion and conclusions
A catalogue-based approach computed with the nested simulation system GRAMM/GRAL was applied [...]

Recent progress in parametrized approaches allows standard urban air pollution models to reach performances at the yearly or even monthly scale (e.g., ADMS-urban system; Dedele and Miskinyte, 2015) comparable to our approach. The model accuracy at diurnal to daily time scales, however, has hardly ever been analysed, which makes it very difficult to place our results in context but also demonstrates the uniqueness of the simulations presented in this work. At shorter temporal scales, our modelling system is still out-performed by very high resolution CFD models (Kumar et al., 2015a), but these systems are limited to small domains and short periods of time. Less complex systems such as SIRANE (Soulhac et al., 2012), solving the high-resolution flow only in street canyons and approximating the dispersion above the urban canopy as Gaussian plumes, perform similarly to our model, albeit at higher computational costs, limiting their application to periods of typically a few weeks only. Despite the usage of an extremely detailed emission inventory, our simulations are still significantly limited by the representation of emissions, since only standard temporal profiles were applied to most sources, which are unable to capture the large temporal dynamics of real emissions. Real-time emission models accounting for the influence of actual activities such as traffic density and energy consumption or environmental factors such as outdoor temperature affecting not only heating (whose intensity is already modulated by temperature in our model through heating-degree days) but also cold-start traffic emissions (as suggested by the Handbook of Emission Factors for Road Transport v3.3; Keller et al., 2017) could further advance the representation of emissions variability in the future. Such emission models and complementary influence on emission variability could be informed for instance by mobile phone data and sensor networks.
Such improved inputs proved to significantly increase the performance in other models (e.g., Soulhac et al., 2012; Borrego et al., 2016). The main gain of the catalogue-based method is, above all, the reduced computational cost allowing for high-resolution simulations of long time periods with a time resolution down to one hour. Further developments are required to improve this approach by replacing fixed emission patterns by transient ones, obtained for instance with suitable traffic models.

A general overestimation of concentrations was found at all sites in our model, mostly related to insufficient dispersion in the model as well as to unrealistic accumulation of pollutants near building façades, which have a strong impact on simulations with the chosen 10 m horizontal resolution. The apparently too low dispersion may be related to the fact that traffic-induced turbulence is only crudely represented. Some limitations of the catalogue-based method were revealed, which are attributable to the steady-state assumption and the limited model domain. Particles that have been transported in the city for more than an hour are assigned to the same hour they were released in the current version of the system. Future versions should account for the particle transport age, which can be made accessible in the GRAL model outputs. This would likely smooth out some of the unrealistic short peaks produced in our simulations. A long residence time of particles in the simulation domain can also have implications in terms of chemistry. NOx depletion due to oxidation by OH radicals or night time N2O5 chemistry was neglected in our system, as the typical lifetime of NOx in the atmosphere is never shorter than a few hours (during sunny summer afternoons; Liu et al., 2016).
Accounting for long residence times in the simulation domain may allow us to compute simplified chemical reactions within the frame of the catalogue-based approach.
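One way to picture the proposed use of particle transport age is to spread each hour's steady-state contribution over the following hours according to an age distribution. The sketch below is purely hypothetical: the function name and the `age_fractions` input (fraction of particles arriving 0, 1, 2, ... hours after release) are assumptions, and such a distribution would have to be derived from GRAL outputs.

```python
def redistribute_by_age(contributions, age_fractions):
    """Spread each release hour's contribution over later hours.

    contributions[h]  : steady-state contribution of emissions released in hour h
    age_fractions[k]  : assumed fraction of particles arriving k hours after
                        release (should sum to 1)
    Contributions that would arrive after the end of the series are dropped.
    """
    n = len(contributions)
    lagged = [0.0] * n
    for h, c in enumerate(contributions):
        for lag, frac in enumerate(age_fractions):
            if h + lag < n:
                lagged[h + lag] += c * frac
    return lagged
```

With `age_fractions = [1.0]` the current behaviour is recovered, whereas a broader distribution flattens short peaks and lets concentrations accumulate into the following hours.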

As demonstrated in this study, our model system produces a very realistic representation of the spatial distribution and temporal variability of NOx in the city, which makes it a highly suitable tool for policy makers. The city of Zurich is indeed implementing the system as a new tool for improved air pollution control and urban planning. So far, our simulations have been generated without any input from actual air pollution measurements. Incorporating such observations through data assimilation and machine learning methods could further enhance the quality of the model predictions and even better satisfy the requirements of epidemiological studies, which need to be based on accurate, unbiased data. The selection of weather situations in the catalogue could also benefit from assimilating concentration observations in the system, instead of wind and radiation observations only. Additional meteorological observations not directly related to the definition of the catalogue, such as turbulence fluxes, temperature gradients or boundary layer height, are also increasingly available and might improve the selection of weather situations in the catalogue as well.
A fully integrated high-resolution modelling system would enable short- and long-term pollutant and greenhouse gas monitoring in cities for subsequent use in the development of mitigation strategies.
Finally, NOx concentrations are relevant for regulation purposes only as a proxy of NO2 concentrations. Oettl and Uhrner (2011) introduced a chemical module to GRAL, but the transient-based approach of this module with explicit computation of three-dimensional O3 fields prevents benefiting from the reduced computation costs of the catalogue-based approach. An intermediate system, accounting for the average distribution of particle age and rough estimates of O3 concentrations, should be tested in future to reproduce NO2 concentrations at fine scales in cities.

Code and data availability. The system GRAMM/GRAL is made available by the Technische Universität Graz on the following webpage: lampx.tugraz.at/gral/index.php. The catalogue-based method is fully described in Berchet et al. (2017) and related Python scripts can be requested from the corresponding author.