Simulating atmospheric tracer concentrations for spatially distributed receptors: updates to the Stochastic Time-Inverted Lagrangian Transport model’s R interface (STILT-R version 2)

. The Stochastic Time-Inverted Lagrangian Transport (STILT) model is comprised of a compiled Fortran executable that carries out advection and dispersion calculations as well as a higher level code layer for simulation control and user interaction, written in the open source data analysis language R. We introduce modiﬁcations to the STILT-R codebase with the aim to improve the model’s applicability to ﬁne-scale ( < 1km ) trace gas measurement studies. The changes facilitate placement of spatially distributed receptors and provide high level methods for single and multi-node parallelism. We present a kernel 5 density estimator to calculate inﬂuence footprints and demonstrate improvements over prior methods. Vertical dilution in the hyper near-ﬁeld is calculated using the Lagrangian decorrelation timescale and vertical turbulence to approximate the effective mixing depth. This framework provides a central source repository to reduce code fragmentation between STILT user groups as well as a systematic, well documented workﬂow for users. We apply the modiﬁed STILT-R to light-rail measurements in Salt Lake City, Utah, United States and discuss how results from our analyses can inform future ﬁne-scale measurement approaches 10 and modeling efforts.


Anonymous Referee #1
Fasoli et al report on developments made for the Stochastic Time-Inverted Lagrangian Transport Model (STILT).They added high-level functions to make simulations using the R language.They added code to make parallelized simulations.They introduce a new method to deal with near-field emissions which have not yet been homogeneously mixed within the boundary layer.Fasoli et al. further introduce a smoothing technique to estimate plume surface response functions that is compared to the existing approach.Finally, they show how their developments perform in an experiment in which CO2 measurements have been taken aboard a light-rail in the Salt Lake City metropolitan area.
In general I find the manuscript well written, developments and results are presented in a concise and understandable manner.The manuscript fits the scope and contains enough scientific content to warrant publication in GMD.I have a number of comments that I would like to see addressed before publication, hence I end up with "minor revisions".

General comments:
1) There is no mentioning of how this work and code repository relates to the original code repository and work at http://stilt-model.org / BGC Jena.None of the co-authors are from Jena or other developers of STILT.I can only assume that this development has been made in accordance and in agreement with the rest of the STILT developers, and that there are no licensing issues.Should be checked and stated explicitly.
This work is intended to serve as the future replacement for the current "stiltR" wrapper, distributed from the BGC Jena SVN repository.We have been working with the BGC Jena team and the fortran source code remains hosted at BGC Jena.Migrating this wrapper code to GitHub has already enabled significant collaborative development between groups and has led to implementing features beyond those described in this paper.
2) Model performance is assessed in a very qualitative manner ("looks better") and sometimes overly positive.I suggest authors consider more quantitative assessments, and take a step back before claiming (see below) e.g., that the model represents enhancements in (individual) roadways and intersections.
Thank you for the suggestion.We have made an effort to improve the assessments of the results to quantify the differences between methods.Please see the specific comments regarding changes to the manuscript.
3) While developing a new method to deal with incompletely mixed sources close to the receptor, the fall back to limiting mixing to the crude 0.5*PBLH formulation.There is no physical basis for that, and I urge the authors to reconsider this artificial limitation.More on this below.
See comment relating to p4 l7ff below .

Specific comments:
p2 l31ff It is unclear to me why 1-10km and 0.1-1 hr spatial and time scales should qualify as "hyper" near-field.Unless you show that "near-field" is a common term that refers to larger spatial or temporal scales I suggest removing the "hyper", as is is hyperbole.
We thank the reviewer for their comment as because it is important to clarify the naming conventions of domain length scales for readers.We chose the term "hyper near-field" as an extension of the definition of "near-field" in the foundational work of Lin et al 2003, which was "a domain extending over 10 2 -10 3 km".To clarify this definition in the text, we have added the following statement: Previous work has defined the near-field domain as extending over 10 2 -10 3 km (Lin et al., 2003).

p3 l26ff Can this be used with other queue managers apart from SLURM?
As of writing, SLURM is the only cluster job scheduler that has been implemented.SLURM is open source and utilized heavily by the high performance computing (HPC) systems at the University of Utah.Due to limited availability of HPC clusters, SLURM is the only job scheduler that has been validated.However, modifications to the project scaffolding described in this manuscript that facilitate parallel computation within single-node and SLURM-scheduled environments opens the doors to other queue managers as well.We encourage future collaboration with users who have access to these job schedulers and would be willing assist with testing development code on their systems.To clarify this in the text, we have added the following statement: While SLURM is the only cluster job scheduler that has been implemented to date, the open source code can be modified to run on systems managed by other job schedulers including TORQUE/OpenPBS, Sun Grid Engine, OpenLava, Load Sharing Facility, or Docker Swarm using methods described by Lang et al. (2017).p4 l3ff Again, "hyper near-field" sounds very hyperbole and I suggest renaming it -you are talking about the region in which the well-mixed criterion does not hold.
See comment relating to p2 l31ff above.p4 l7ff Mixing to h = 0.5 * PBLH (p4 l7) is a crude assumption with no physical meaning -if you would wait long enough you would have mixing into 1.0 * PBLH (ignoring en-/detrainment) at the top of the BL.
While we agree that the h = 0.5 PBLH mixing height serves as a crude assumption when used for vertically diluting surface fluxes, it has been extensively validated for the traditional "near-field" domain.This assumption was first introduced by Gerbig et al., 2003 ( https://doi.org/10.1029/2003JD003770, Section 3.2.Depth of "Surface Layer") who performed a sensitivity study and found that "no significant change in the modeled vegetation signal was found" by varying the fraction of the PBL height considered between 0.1 and 1.0.
The goal of this manuscript is to simplify the model workflow and improve STILT's relevance to the HNF domain.Further, the more complex formulation based on turbulence theory is implemented as an optional feature which can be disabled by the user to replicate past simulations while taking advantage of the other improvements described in this manuscript.With this in mind, we retained the traditional h = 0.5 PBLH for the "near-field" domain for consistency with previous work.
The following text has been added: While the assumption that surface fluxes can instantaneously mix to h * has been validated within the traditional near-field domain (Gerbig et al., 2003), this method underrepresents the influence of HNF fluxes on the tracer mole fraction arriving at the receptor.
p4 l15ff You then derive a more complex formulation based on turbulence theory, which you propose to be better.However in the end you use h = min(h', h*), with h* the crude approximation (see above), which effectively stops dilution at 0.5 * PBLH.This seems wrongwhy not dilute up to whatever your new formula gives you, maybe cap at 1.0 * PBLH?There is no reason in reality why emissions from the ground should not be mixed further up than half the PBLH.
As you say, the formulation of h = min(h', h*) results in the use of the turbulence theory estimation until the traditional 0.5 PBLH is met.The model timestep at which h' is approximately equal to h* is the transition between the "hyper near-field" and "near-field" domains.Rather than a rigid definition of the "hyper near-field" length scale, setting the dilution depth to min(h', h*) allows the "hyper near-field" spatial domain to adapt to meteorological conditions while enabling a smooth transition between the two methods of vertically diluting surface fluxes.
To clarify this point in the manuscript, the following text has been added: The timestep at which h' ≈ h* signifies the transition between the HNF and traditional near-field domains.The formulation of h' grows the particle-specific dilution depth relative to local turbulence and enables the extent of the HNF domain to vary depending on receptor location, meteorological conditions, and local topography..For surface based applications, h' grows to h* over 0.1 -1 hr, affecting a spatial domain of 1 -10 km adjacent to the receptor.
p4 l21ff: this spatial domain should be variable and strongly dependent on receptor location, topography and meteorology -this should be emphasized.In general, sensitivity studies on how this new formulation performs are required.
We agree that it the spatial domain of the HNF is highly variable and have updated the text to clarify.See comment relating to p4 l15ff above .p4 l21ff: it should be mentioned how (whether?)this method will work with intermittent turbulence and nighttime (stable) conditions (see p7 l22 where you exclude nighttime values for such reasons).
We agree that we should emphasize that nighttime turbulence is an unsolved problem that is beyond the scope of this manuscript.
The following text was added to ~p9L10: We have defined the HNF using the effective vertical mixing depth of surface fluxes arriving at a receptor and shown this formulation to improve model agreement with observations.However, calculating the effective vertical mixing depth using turbulence variables σ w and T L does not extend well to stable nighttime conditions which remain a difficult problem (Holtslag et al., 2013) and a subject of future work.
Thanks for catching that mistake.
p5 l8ff: explain better: which particles go into the sigma calculations?
We have modified the equation notation and text to show that the sigma calculations are derived from the positions of all particles in the ensemble at each model timestep.
To clarify this in the text, we have modified the text following equation 3 to include: where σ x 2 and σ y 2 are Euclidean variances in the x and y positions of all particles in the ensemble at time t.
p5 l11: it would be helpful for the reader to see how b enters the two-dimensional Gaussian you are using for density estimation.
While we agree that is important to understand how the kernel bandwidth relates to the smoothing applied, visualizations and discussions regarding bandwidth selection and how it relates to bias-variance optimization can be found in many general descriptions of kernel density estimation.
To help the reader understand what effect the bandwidth has on the model output without requiring an additional visualization, we have added the following text to ~p5L17: As model time or total ensemble dispersion increase, the kernel bandwidths increase the amount of smoothing applied to each particle.p5 l22ff: "improved" is based purely on visual aesthetics ("looks more similar!")-a quantitative measure would be very beneficial here.
Agreed.We have performed additional analysis and added a quantitative measure for the difference between the calculation methods and the ideal case.
The following text has been added to ~p6L1: The effects of varying the smoothing parameter (f = 1,2) are shown and errors are quantified using the difference of calculated grid cells from the idealized brute force case.
For the typical case (N = 200 and f = 1), the kernel density estimator shows improved agreement with the brute force method (rmse = 5.60 * 10 -4 ppm (umol -1 m 2 s)) compared to the traditional dynamic grid coarsening (rmse = 5.79 * 10 -4 ppm (umol -1 m 2 s)), preserving a Gaussian plume adjacent to the receptor, a clustered area of high influence, and capturing split flow upstream.When the kernel bandwidths are doubled by increasing the smoothing parameter (f = 2), the footprint field becomes over-smoothed and becomes less similar with the brute force case (rmse = 5.66 * 10 -4 ppm (umol -1 m 2 s)).In the extreme case using atypically few particles (N = 10), the dynamic grid coarsening method produces a footprint field dominated by noise from individual particles (rmse = 6.12 * 10 -4 ppm (umol -1 m 2 s)).The kernel density estimator (f = 1) improves results but shows fragmentation further from the receptor (rmse = 5.75 * 10 -4 ppm (umol -1 m 2 s)).In this case, the kernel density estimator smoothing parameter enables users to manually widen the plume reproduced in the footprint.Doubling the smoothing parameter (f = 2) improves similarity with the smaller particle ensemble (rmse = 5.70 * 10 -4 ppm (umol -1 m 2 s)) and demonstrates how users can modify the kernel bandwidths to adapt the model to unique cases.While tracer mole fraction differences between the two footprint calculation methods vary depending upon the locations of footprint differences relative to sources, tracer mole fractions calculated using the kernel density estimator are more similar to the idealized brute force case.p5 l27ff: "compensating" it might be, but only in the case here -just concede what f is: a fudge factor without physical mean.
We have changed the language to highlight the reviewer's comment at ~p6L12: In this case, the kernel density estimator smoothing parameter enables users to manually widen the plume reproduced in the footprint.Doubling the smoothing parameter (f = 2) improves similarity with the smaller particle ensemble (rmse = 5.70 * 10 -4 ppm (umol -1 m 2 s)) and demonstrates how users can modify the kernel bandwidths to adapt the model to unique cases.
p7 l9: "dilution correction" refers to the fudge factor f being set to 2? Explain!
We agree that using the terms "dilution correction" and "vertical mixing depth correction" interchangeably was confusing for readers.We have changed the text to "HNF vertical mixing depth" to be consistent with terms used in the methods description (Section 2.3).
The following text has been added to ~p7L14: We compute footprint fields using the legacy dynamic grid coarsening (LEG) algorithm as well as gaussian kernel density estimation with the HNF vertical mixing depth correction (GWD) and without the HNF vertical mixing depth correction (GND) to illustrate the differences between methods.
p7 l20ff: You are doing the right thing by ignoring nighttime values, but I suggest you still include the nighttime data in the plots to elucidate the magnitude of this problem -this is something that all model approaches have in common and it helps to remind people that comparing nighttime values is difficult and care needs to be taken.
While we agree that it is useful to compare nighttime modeled values between manuscripts, the light-rail measurement platform only operates during specific hours of the day.We have clarified this in the text.
The following text has been added to ~p6L14: The light-rail train typically operates between the hours of 05:00-23:00 Local Daylight Time (LDT) and only these hours were used in analyses.
p8 l7: there is no appreciable "evening rush hour" peak in this figure.Neither does the model "capture" it, as it is too low throughout the day compared to observations.Remove.
We agree that discussing an "evening rush hour peak" may be inaccurate in this context.The late afternoon increase in modeled CO 2 is the result of increased emissions from both anthropogenic inventories as well as meteorological factors.We have removed the text as the reviewer suggested.
p8 l1 and Figure 7  We agree that it is important to not inflate the improvements and the language describing spatial resolution needs to be more clearly defined.
The following text was added to ~p7L18: This grid resolution was chosen to pair analyses with the 0.002 o Hestia inventory and because 0.002 o corresponds roughly with the size of a Salt Lake City block.
The following text was added to ~p8L21: The model generally produced mole fraction enhancements ( Δ CO 2 ) for grid cells containing or downwind from major roadways (Fig. 7).However, modeling intersection scale enhancements would require finer grid spacing capable of resolving sub-city-block spatial scales that is not yet feasible given current constraints on inventories, meteorological data, and computing resources.
The following text has been added to the Fig. 7 caption: The model captures the overall urban-suburban-rural CO 2 mole fraction gradient (top) as well as localized enhancements near grid cells containing large emitters such as busy roads (bottom).
p8 l18ff: Careful to make sure that you are not mistaking increasing resolution with the "hyper" near-field approach described earlier -this last section just shows that higher spatial resolution can be beneficial.Might want to rephrase "fine-scale approach".
We agree that readers may confuse the language with the hyper near-field definition.The text "fine-scale" has been changed to "0.002 o grid resolution".
Figure 6: axis labels missing, should appear at least once for x and y We have added the axis labels (Longitude and Latitude) to Figure 6 as recommended. Figure

Response to Reviewer 2
We thank the reviewers for their constructive feedback on our manuscript ( https://www.geosci-model-dev-discuss.net/gmd-2018-20/ ).The reviewers' comments are shown below in italics with our responses directly following.

Anonymous Referee #2
This work documents the workflow of STILT simulations and presents improved physical processes for fine-scale simulations.I appreciate the authors' efforts in addressing overdue problems for the community, in particular those who use STILT extensively.I hope that the authors continue updating their work through GitHub.
I can easily follow the method and think the paper is relatively well written given the conciseness in length.
Thanks for the positive comments.We hope that readers will agree.
I have some questions/concerns in the evaluation of the improved method.In current form, the authors do not characterize the errors, in particular in surface emissions.So it is hard to evaluate the results.The model evaluation is a key result in this study, and the authors need to describe how much they know (or pre-scribed) the errors in surface emissions (and others if prescribed) so that we can be sure that the better results from GWD are due to the improved schemes.
We have added a discussion regarding difficulties in estimating uncertainties in emissions inventories.Please see below for details.

Detailed Comments:
L13 -21: STITL-R should be applicable to other tracer gases, not only CO2.The authors describe CO2 only, which seem to be strange.This is probably because the authors show an evaluation study using CO2, but this CO2 focus is limited.
STILT's applications certainly exceed only simulating atmospheric CO2.We attempt to describe the use of LPDMs and the STILT model (~p2L9, ~p2L20) using generalized language such as "atmospheric mole fractions", "pollutant concentrations", and model applicability to "observed emissions" and "surface fluxes".We use urban CO 2 as the primary motivation for several reasons: urban CO 2 cycling is the focus of a large and growing body of scientific literature that this model update will play a prominent role in, it allows for the use of novel CO 2 surface flux inventories purpose-built for the study region (the Hestia model), and it applies well to the case study using the unique data available from the light-rail measurement system.
P2, L28 -29: Need to mention more recent work on city-scale or regional inversion work based on multiple receptors that uses STILT extensively.Literature review here does not represent a full range of the use of the traditional STILT, which I believe is import to for the reader to understand the context, and motivation for the new development.
We P3, L6: Need to include the reference for R properly.Not doing so is irresponsible because without R this work is not possible.
Thank you for the suggestion.We have added the citation for the R software at ~p3L7.
P3, L20: For large-scale simulations, the users have applied other types of parallelizations in running STILT, e.g., running multiple jobs (each job may represent one receptor for a give period) at the same time taking advantage of high performance computing.The authors need to briefly mention what the difference between the old method and the one introduced here would be although the method described here seems to be similar to what users have been using.Is there a new concept here?
We recognize that we did not adequately describe past efforts to run parallel simulations.While the concept of executing batches of receptors across multiple jobs is not new, users have previously had to write and run separate scripts defining the receptors and relevant data inputs for each job which can require significant manual labor or develop their own methods for batch processing receptors.The manuscript formalizes methods for automatically executing the parallel batches of receptors, with receptor batches distributed between the parallel jobs and managed by the code itself rather than the user.The workflow presented, controlled with run_stilt.rand with output saved to simulation ID directories, remains the same for serial and parallel execution with only changing the setting for the number of parallel processes.
To clarify this point, the following text has been added to ~p3L28 : However, past methods for parallelizing simulations require users to manually define batches of receptors and relevant meteorological inputs in unique initialization scripts and submitting each script as a separate job to the scheduler.While increasing the number of parallel threads decreases the size of each simulation batch, the requirements of the user become more complex.
We formalize methods for automatically distributing batches of receptors across many parallel threads managed by the model rather than the user.
P3, L27: Not all systems use SLURM although it is popular.Is there an option for a different job scheduling tool?
As of writing, SLURM is the only cluster job scheduler that has been implemented.SLURM is open source and utilized heavily by the high performance computing (HPC) systems at the University of Utah.Due to limited availability of HPC clusters, SLURM is the only job scheduler that has been validated.However, modifications to the project scaffolding described in this manuscript that facilitate parallel computation within single-node and SLURM-scheduled environments opens the doors to other queue managers as well.We encourage future collaboration with users who have access to these job schedulers and would be willing assist with testing development code on their systems.To clarify this in the text, we have added the following statement: While SLURM is the only cluster job scheduler that has been implemented to date, the open source code can be modified to run on systems managed by other job schedulers including TORQUE/OpenPBS, Sun Grid Engine, OpenLava, Load Sharing Facility, or Docker Swarm using methods described by Lang et al. (2017).

P4. L4 -22:
In many cases, PBL heights from meteorological models (e.g., WRF) are directly used to represent z_pbl.The authors need to clarify this and describe more on the use of WRF PBL related to equations ( 1) and ( 2).For HNF simulations, WRF needs to be run at a similarly fine sale, which is really expensive?If not, what would be the impact on h = min(h',hˆ*)?
The formulation for the HNF vertical mixing depth adjustment h' is intended to fix systematically low footprints without needing to explicitly resolve z pbl at HNF resolutions.It provides an estimate for the effective mixing depth based on homogeneous turbulence theory without requiring meteorological inputs (e.g.WRF) to be at a scale that explicitly defines the fine variations in PBL height within a city.However, the meteorological data are used outside of the HNF domain to calculate h* using a modified Richardson number method that has been extensively validated for the traditional "near-field" domain.
P5, L1-2: Reading this, my immediate thought was if this would require more simulation time to estimate the weighted influence.It would be nice to mention the cost.
Agreed.Calculating the footprint field using smoothing methods involves a cost tradeoff with a larger particle ensemble.While it is almost always less expensive to apply smoothing methods compared to calculating particle trajectories, quantifying the advantage is difficult.The cost to calculate particle trajectories varies depending on model configuration, meteorological data source, the size of the meteorological domain, and the size of the ensemble while the cost to apply smoothing depends on the method and the spatial and temporal domain of the output footprint.
To clarify this point, the following text has been added to ~p5L1: Computing trajectories of large particle ensembles (N > 10 4 ) is computationally expensive.To lessen the cost of each simulation, footprint fields are often calculated from smaller particle ensembles by applying smoothing methods to compensate for the smaller ensemble size.These smoothing methods are less computationally expensive than calculating trajectories for a larger ensemble but vary in their ability to reproduce the robust footprint field of the large particle ensemble.
P6, L32: Should not include a paper in preparation.
Agreed.We removed the citation since the manuscript is still in preparation.
P7, L5: 24-h backward in time seems to be too short.How was the upstream boundary condition treated?I see a short description from L17. Boundary conditions are complex due to wind directions.Is the wind consistent from one direction?I would like to see a more description on this.
We find that particles exist within the footprint domain for 11 hours on average.The meteorological domain encompasses a larger area than the footprint domain and fluxes from outside of the footprint domain are assumed to be resolved by the background atmospheric signal described at p8L6.
To clarify this point, the following text has been added to ~p7l24: Urban development and expansion in the area surrounding SLC is limited by the mountainous topography surrounding the city and the Great Salt Lake which restrict the expansion of the city and suburbs.This confines large anthropogenic and biologic sources into a relatively small area surrounding the SLV and simplifies boundary conditions for SLV-centric modeling efforts.From each receptor, 24 h backward trajectories of 200 particle ensembles were calculated using meteorological fields from the HRRR model, available at an hourly interval with a 3 km grid resolution.On average, particles travel within the model domain for 11 h.Computation of the 33,608 particle trajectories and a single set of footprints completed in 5.5 hours utilizing 80 parallel threads across 5 nodes, each equipped with 64 GB of memory with two 8-core Intel XEON E5-2670 2.6 GHz processors.6.7% of the simulations were not completed due to short-term outages in the HRRR data product.
P7, L30: Please use rˆ2 and state which method was used in calculating r.Pearson's method?How are these rˆ2 values statistically different?The simulations from GWD is distinguishably from a different distribution from the other two so that we have more confidence in GWD?Note that in this evaluation, we want to clearly see better results from GWD. Right?
As recommended, we have modified the text to use r 2 instead of r to explain model variance and have clarified that it is based on Pearson's method.
While there is likely no statistical significance in the differences between GND and LEG for this case study, we show that GND agrees better with the physical "ideal" case and may give improvements that depend on the locations of differences between GND and LEG relative to the locations of surface fluxes.With the vertical dilution correction (GWD), the results agree more closely with measurements in both time and space.
P8: L1: I think this is probably the most important single statement in this paper.I would like to know how the authors determined the uncertainty in the surface fluxes.Without precise uncertainty characterization, the results are not reliable.What if the inventory is systematically low and GWD overestimated the mole fraction, which could be shown to be closer to the observations than the other two methods?I believe that the authors have considered this point, but I don't see the details here to the level that I can clearly see the outperformance of GWD.Also we need to note that the rˆ2 values are all low and similar to each other.
We agree that it is important to investigate uncertainty in inventory estimates.While we can show improvements to footprint smoothing algorithms using physically constrained "ideal" cases, uncertainty estimates within emissions inventories remains an unresolved question within the emission inventory scientific community.Developers of the Hestia inventory have documented that "a devoted effort is needed to generate uncertainty and propagate those uncertainties through the Hestia approach to provide an improved understanding of where results are more or less certain in space and time.This remains a high priority for future research" (Patarasuk et al., 2016) and determination of GHG fluxes and uncertainty bounds is one of the primary goals in the ongoing Indianapolis Flux Experiment ( http://sites.psu.edu/influx/ ).Improvements to LPDMs can help future inverse modelling frameworks that would be better equipped to quantify uncertainties in flux inventories.
To further clarify this, we have added a discussion regarding the difficulties in assessing emission inventory uncertainties.Both of the inventories we discussed in the manuscript (Hestia and ODIAC) agree on the total emissions within the SLV domain which is evidence one inventory is not systematically lower than the other.However, mapping uncertainty to a moving receptor using two emissions inventories that encompass different spatial domains and allocate fluxes using different methods in time and space is a difficult question that requires more tools and analysis than are available in our present manuscript and should be the focus of future work.
The following text has been added to ~p7L20: Within the SLV domain where the inventories overlap, Hestia and ODIAC agree on the total anthropogenic emissions to within 1.5% during our study period.However, uncertainties of fluxes applied to our analyses are likely larger since the two inventories allocate fluxes differently in space and time.Further, only Hestia is used to represent the SLV whereas ODIAC is used outside of the SLV to account for regional-scale emissions.
Uncertainties in inventory estimates are difficult to quantify in time and space and require a devoted effort within the emission inventory scientific community to propagate uncertainties through underlying assumptions within each inventory (Patarasuk et al., 2016;Lauvaux et al., 2016).

P8, L6: Please be more quantitative. It is not clear what has been reproduced. C3
We have changed the text to generalize that the model sees enhancements downwind from major roadways and introduced a caveat that better details the limitations regarding model resolution.
The following text was added to ~p8L21: The model generally produced mole fraction enhancements ( Δ CO 2 ) for grid cells containing or downwind from major roadways (Fig. 7).However, modeling intersection scale enhancements would require finer grid spacing capable of resolving sub-city-block spatial scales that is not yet feasible given current constraints on inventories, meteorological data, and computing resources.Hoornweg et al., 2012;Gurney et al., 2015), the largest anthropogenic forcing on climate change (Canadell et al., 2007).As governing bodies examine ways to address climate change, urban areas are appropriately a focus for emissions regulation.
Atmospheric measurements (Duren and Miller, 2012;McKain et al., 2012) provide a top-down constraint for estimating urban carbon emissions, especially when combined with bottom-up information from fuel consumption statistics, traffic data, and building characteristics that result in highly resolved emission inventories (Gurney et al., 2009(Gurney et al., , 2012)).However, traditional evaluation strategies for estimating CO 2 emissions that focus on quantifying regional scale (10 2 to 10 3 km) averages at coarse resolutions are unable to resolve urban areas beyond bulk estimates.Implementing and evaluating effective policies for emissions mitigation requires understanding where, when, and how emissions occur at a within-city scale.
Novel measurement strategies are emerging to help resolve fine-scale within-city trace gas concentrations, such as measurements made from trains, buses, and cars (Apte et al., 2017;Bush et al., 2015;Lee et al., 2017) as well as dense networks of inexpensive sensors (Shusterman et al., 2016;?) :::::::::::::::::::::::::::::::::::: (Shusterman et al., 2016;Turner et al., 2016).However, traditional atmospheric modeling tools were not designed for densely located and spatially distributed measurements.Simulating atmospheric transport for multiple locations over time often increases the number of simulations by factors of 10 1 to 10 3 , necessitating the use of scalable parallel computing to best utilize available hardware and reduce total simulation time.To make use of recent measurement advances, modeling approaches must structure the model framework in ways that enable simulations to execute in parallel, adapt to finer spatial scales, and facilitate simulating atmospheric mixing ratios for locations distributed across space and time.
The link between measured atmospheric mole fractions and upstream surface fluxes is often established using Lagrangian particle dispersion models (LPDMs), popular tools for simulating atmospheric transport and dispersion in the Planetary Boundary Layer (PBL) (Lin, 2013).The LPDMs simulate transport of an ensemble of theoretical particles (representing air parcels) using a combination of mean winds interpolated from meteorological model fields with stochastic fluctuations representing turbulent motions introduced as a Markov process.This approach offers advantages over Eulerian methods by explicitly simulating transport trajectories and better representing atmospheric mixing, turbulent eddies, and convection (Lin, 2013).Particle motion can be simulated either forward in time from an emissions source or backward in time from a location of interest, referred to as the "receptor".The forward configuration is often used to simulate pollutant concentrations downstream from an emission source (Stohl et al., 2005) whereas backward simulations determine the source of observed emissions and quantify surface fluxes (McKain et al., 2012(McKain et al., , 2015;;Stein et al., 2015).As receptors are often greatly outnumbered by sources, significant computational savings are realized by applying LPDMs in the receptor-oriented configuration (Lin, 2013).
The Stochastic Time-Inverted Lagrangian Transport (STILT) model couples Lagrangian particle dispersion with the mean advection scheme from the Hybrid Single-Particle Lagrangian Integrated Trajectory (HYSPLIT) model (Stein et al., 2015) :::::::::::::: (Draxler and Hes STILT simulations are reversible in time (Lin et al., 2003), enable quantitative evaluation of transport error (Lin and Gerbig, 2005), and are closely coupled with the commonly used Weather Research and Forecasting mesoscale meteorological model (Nehrkorn et al., 2010), on which the High Resolution Rapid Refresh (HRRR) model is based (Sun et al., 2014).STILT is most commonly used to follow the backwards time evolution of a particle ensemble and calculate a receptor's footprint, a sensitivity matrix defining the upstream area that contributes to tracer mole fractions observed at the receptor.Footprints can be convolved with emissions inventories and an atmospheric background signal to calculate atmospheric mole fractions at the receptor, which is among the most common applications of the STILT model (Gerbig et al., 2003;Lin et al., 2004;Kort et al., 2008;Macatangay et al., 2008 This paper discusses limitations within the existing STILT codebase and introduces an updated framework intended to improve the model's applicability to fine-scale spatially distributed measurement approaches.::::::: Previous ::::: work ::: has ::::::: defined ::: the :::::::: near-field :::::: domain :: as ::::::::: extending :::: over ::: 10 2 :: -::: 10 3 : km :::::::::::::: (Lin et al., 2003).: We introduce the hyper near-field (HNF) area, typically covering length scales of 1 -10 km and time scales of 0.1 -1 hr, from which surface fluxes are diluted to a fraction of the PBL height and thus more strongly influence the receptor.Parameterizations within the STILT model were originally intended for regional scales and require refinements to improve source-receptor relationships in the HNF.We also describe a footprint calculation scheme using kernel density estimation, rescaling of the effective mixing depth for fluxes in the HNF, and methods for parallelizing simulations.The value of STILT as a tool for interpreting within-city CO 2 mole fractions is shown using an example of data collected on the roof of a train car on Salt Lake City :::::: (SLC), :::: Utah's light-rail system.We discuss how results from our analyses can inform future measurement approaches and modeling efforts.
2 Modifications to the STILT model

Software enhancements
The R ::::::::::::::::: (R Core Team, 2017) component of the STILT model exists as a group of core functions used to track particle locations, calculate footprints, and apply surface flux grids.User groups have built upon these functions, adding scripts for common modeling workflows and additional functionality.Key components of the higher level functions remain unpublished and undocumented prior to this paper, including a description of methods used to aggregate the particle ensemble to calculate footprints.Here, we adopt a widely-used collaborative software development platform (GitHub) as a common source code repository that meets the needs of STILT users.This repository is built upon existing advection and dispersion calculations but has restructured and modernized the core functions used to interact with the model (Fig. 1).
A single script (run_stilt.r)defines model inputs such as receptor locations and meteorological fields, controls and executes the parallelized model, and outputs footprints.Footprints are saved in a netCDF format consistent with conventions for Climate and Forecast metadata (cfconventions.org),the standard for gridded model datasets by the University Corporation for Atmospheric Research (UCAR).This format is compatible with most popular data analysis software platforms and facilitates analysis of model output.The script run_stilt.rserves as the primary STILT interface, interacting with R functions which in turn call Fortran subroutines for the bulk of calculations and providing a systematic, well documented workflow for users.
2.3 Hyper Near Field ::::::::: Near-Field : vertical mixing depth The influence of surface fluxes on air arriving at the receptor depends upon vertical dilution within the atmospheric column.

Kernel density estimation of footprint field
Prior to methods described in this section, STILT footprints have been calculated by accumulating the influence of particles over an averaging volume.To lessen grid noise from few particles spread throughout the grid, the spatial extent of the particle ensemble was used to dynamically coarsen the size of the averaging volume by a factor of 2 as the particle cloud spreads, first shown in Gerbig et al. (2003).However, at finer resolutions, this method results in excessive smoothing, removing information calculated by the advection and dispersion routines (Fig. 3 : 4).
The observations generally show higher CO 2 mole fractions in Salt Lake City :::: SLC's urban center and along the north-south oriented urbanized corridor centered in the SLV (Fig. 7), consistent with urban spatial CO 2 gradients observed in previous studies (Idso et al., 2001;Pataki et al., 2007).The lowest mole fractions were observed in the southwest corner of the SLV at the margin of recent suburban developments.At a finer scale, the high-frequency measurements show mole fraction enhancements near busy roads and intersections.Measured mole fractions are also consistently higher along a 3 km section of the light-rail track running along the center of a busy six-lane road.
Footprints are convolved with anthropogenic and biological CO 2 fluxes and added to background CO 2 mole fractions that are representative of CO 2 mole fractions that have not been influenced by urban emissions (Mitchell et al., in press).The background mole fractions are taken from a nearby high elevation measurement site at Hidden Peak at the top of the Snowbird ski resort in the Wasatch Mountains (Stephens et al., 2011).We use a similar approach to prior studies (e.g.McKain et al. (2012)) and focus this analyses on the afternoon and early evening hours (12:00-19:00 Local Daylight Time, LDT) to lessen the influence of boundary layer development, nocturnal stratification of the boundary layer, and shallow turbulence on measured mole fractions that would not be represented in the 3 km resolution of the HRRR meteorological fields.

Results
Observed and simulated mole fractions are averaged by hour of day to generate mean diel cycles, shown in Fig. 5. Observations show elevated mole fractions at night and early morning, decreasing into the afternoon as convective mixing increases (Mitchell et al., in press).All three of the simulated diel cycles derived from the different footprint algorithms systematically underestimate nighttime and early morning mole fractions, consistent with previous studies (Macatangay et al., 2008;McKain et al., 2015;Mallia et al., 2015;Lauvaux et al., 2016).However, during afternoon hours the simulated values track more closely with the observations, with GWD exhibiting closer correspondence than GND and LEG (Fig. 5).
Key differences between modeled and measured mole fractions exist near HNF sources at the sub-grid scale (Fig. 7).While the model does capture localized mole fraction enhancements near busy roads and intersections ('I' in Fig. 7), measured mole fractions are systematically higher than corresponding model estimates in these areas.These results indicate that the light-rail measurement platform is sampling emissions prior to mixing with the surrounding air.This is evident along the section of lightrail track that shares the six-lane road with other vehicles ('R' in Fig. 7) on which large discrepancies between measurements and model estimates are regularly observed.By diluting emissions throughout a larger grid cell, the model predicts elevated mole fractions localized within and downwind from cells containing significant sources but does not fully capture the magnitude of enhancement resulting from the close proximity to the emissions source.
To demonstrate the benefits of the fine-scale approach :::::: 0.002 :::: grid :::::::: resolution, we use the above 0.002 grid as well as spatially degraded 0.01 and 0.1 grid resolutions for : to :::::::: convolve footprints and fluxes (Fig. 8) to ::: and compare against light-rail measurements.Correlations over space between the time-averaged modeled and measured concentrations are highest for the 0.002 grid (r = 0.52).We find that degrading the resolution to 0.01 still captures the SLV-scale urban-suburban-rural mole fraction gradient but fails to resolve much of the roadway and intersection scale enhancements, resulting in a modest decrease in agreement with measurements (r = 0.48).However, degrading the resolution to 0.1 prevents the model from resolving much of the spatial mole fraction variation (r = 0.25).Further, evaluating the spatially-averaged concentration by hour of day shows improved agreement with measurements among the finer grid resolutions (0.002 , 0.01 ) over the more coarse 0.1 resolution (Fig. 9).While all three resolutions mimic the temporal pattern in the observed mole fraction enhancements due to the temporal variability assigned to emissions inventories, the finer resolutions better capture the mole fraction enhancements observed by the light-rail train in both time and space.

Summary and Conclusions
In this paper, we have introduced modifications to the STILT-R code that have improved the spatial averaging of the footprints and the model speed.These changes improve the functionality of the STILT model for applications investigating fine-scale patterns in urban emissions.Given the importance of footprints in the STILT workflow, a kernel density estimator was applied and shown to improve agreement with an idealized brute force method over prior methods.High level methods for single and multi-node parallelism were introduced in this distribution, significantly reducing total simulation time.We then applied STILT to simulate CO 2 mole fractions observed along a light-rail train in Salt Lake City :::: SLC, at high resolution and show that the model and observations track one another in terms of average spatial and temporal patterns during the afternoon period.However, key differences remain between modeled and measured mole fractions at night and in the proximity of HNF sources.

Figure 1 .
Figure 1.STILT workflow to model tracer mole fraction at a receptor.STILT advects particles and calculates the influence footprint for each receptor.Footprints are convolved with surface fluxes and an atmospheric background signal to model the tracer mole fraction.

Figure 2 .Figure 3 .
Figure 2. Receptor batches are distributed across parallel threads to enable multiple concurrent simulations.Provided memory limits are not exceeded, the total simulation time decreases linearly with the number of CPU cores available.

Figure 4 .Figure 5 .
Figure 4. Comparison of footprint calculation methods.Simulating a large number of particles (N = 10 5 ) and gridding by location (top left) gives a physically constrained expectation for the footprint.Using subsets of 200 particles (top) and 10 particles (bottom), the kernel density estimator demonstrates considerable improvements over the traditional dynamic grid coarsening.Modifying the kernel bandwidths (f = 2) can improve results in uncommon cases, such as the 10 particle ensemble.

Figure 6 .Figure 7 .Figure 8 .Figure 9 .
Figure 6.July 2015 afternoon Salt Lake Valley (SLV) Hestia-derived and non-SLV ODIAC-derived anthropogenic CO2 emissions, biological fluxes, and average STILT footprint.The anthropogenic and biological flux inventories convolved with the footprints give the contribution of near-field fluxes to measured mole fractions in ppm.The light-rail train is highly sensitive to HNF emissions sources and is strongly influenced by large roadways and agriculture adjacent to the line.
have added citations for McKain et al., 2012 and McKain et al., 2015 describing STILT modeling applications in Salt Lake City and Boston as well as Kort et al., 2013 describing STILT's use to assess measurement network design in Los Angeles.