Development of PM2.5 source impact spatial fields using a hybrid source apportionment air quality model

An integral part of air quality management is knowledge of the impact of pollutant sources on ambient concentrations of particulate matter (PM). There is also a growing desire to use source impact estimates directly in health studies; however, source impacts cannot be measured directly. Several limitations are inherent in most source apportionment methods, motivating the development of a novel hybrid approach that estimates source impacts by combining the capabilities of receptor models (RMs) and chemical transport models (CTMs). The hybrid CTM–RM method calculates adjustment factors that refine the CTM-estimated impacts of sources at monitoring sites using pollutant species observations and the results of CTM sensitivity analyses, though it does not directly generate spatial source impact fields. The CTM used here is the Community Multiscale Air Quality (CMAQ) model, and the RM approach is based on the chemical mass balance (CMB) model. This work presents a method that uses kriging to spatially interpolate source-specific impact adjustment factors, generating revised CTM source impact fields from the CTM–RM method results; the method is applied for January 2004 over the continental United States. The kriging step is evaluated using data withholding and by comparing results to data from alternative networks. Data withholding also provides an estimate of method uncertainty. Directly applied (hybrid, HYB) and spatially interpolated (spatial hybrid, SH) hybrid adjustment factors at withheld observation sites had a correlation coefficient of 0.89, a linear regression slope of 0.83 ± 0.02, and an intercept of 0.14 ± 0.02. Refined source contributions reflect current knowledge of PM emissions (e.g., significant differences in biomass burning impact fields). Concentrations of 19 species and total PM2.5 mass were reconstructed for withheld observation sites using HYB and SH adjustment factors.
The mean concentrations of total PM2.5 at withheld observation sites were 11.7 (± 8.3), 16.3 (± 11), 8.59 (± 4.7), and 9.2 (± 5.7) μg m−3 for the observations, CTM, HYB, and SH predictions, respectively. Correlations improved for concentrations of major ions, including nitrate (CMAQ–DDM (decoupled direct method): 0.404, SH: 0.449), ammonium (CMAQ–DDM: 0.454, SH: 0.492), and sulfate (CMAQ–DDM: 0.706, SH: 0.730). Errors in simulated concentrations of metals were reduced considerably: from 295 % (CMAQ–DDM) to 139 % (SH) for vanadium, and from 1340 % (CMAQ–DDM) to 326 % (SH) for manganese. Some error in simulated metal concentrations is expected to remain, given the uncertainties in source profiles. Species concentrations were reconstructed using SH results, and the error relative to observed concentrations was greatly reduced compared to CTM-simulated concentrations. Results demonstrate that the hybrid method, along with a spatial extension, can be used for large-scale, spatially resolved source apportionment studies where observational data are spatially and temporally limited.


Introduction
Variations in ambient pollutant species concentrations, including particulate matter (PM) and gases, are correlated with health outcomes such as lower birth weight (Darrow et al., 2011; Wang et al., 1997), higher occurrences of bradycardia and central apnea (Campen et al., 2001; Peel et al., 2011), decreased peak expiratory flows and increased respiratory symptoms in non-smoking asthmatics (Peters et al., 1997), and all-cause, lung cancer, and cardiopulmonary mortality (Pope et al., 2002). Additionally, nanotoxicological studies report that particle uptake by cells and entry into the blood and lymph lead to oxidative stress in sensitive areas of the body such as the lymph nodes, bone marrow, and spleen (Oberdorster et al., 2005). In a recent study on the global burden of disease, exposure to ambient PM pollution ranked ninth among the 67 risk factors studied as a contributor to disability-adjusted life years (Lim et al., 2012). Many past epidemiological studies associated PM mass (e.g., PM2.5 and PM10: PM with aerodynamic diameters less than 2.5 and 10 µm, respectively) with health outcomes, rather than individual species or the sources of the PM, owing to limited data availability or difficulties in quantifying source impacts. Epidemiological studies are now examining associations between individual species and health outcomes using data from ground observation networks such as the Chemical Speciation Network (CSN) and the Southeastern Aerosol Research and Characterization Network (SEARCH) (Dominici et al., 2010; Samet et al., 2000; Sarnat et al., 2008; Tolbert et al., 2007). It is of further interest to determine the degree to which individual sources influence health events and to link human exposure and subsequent adverse impacts to sources and multi-pollutant mixtures (Laden et al., 2000; Thurston et al., 2005). Attributing individual component concentrations and the overall mixture of observed air pollution to specific sources, as well as linking those sources with adverse health impacts, is challenging. Typically, receptor models (RMs) are used to generate source apportionment (SA) results for epidemiological studies because such studies require long time series (e.g., more than 2 years) (Sarnat et al., 2008).
Published by Copernicus Publications on behalf of the European Geosciences Union.
Several receptor-oriented SA models have been developed to quantify emission source impacts on pollutant concentrations. Each model has its own unique characteristics and associated uncertainties (Balachandran et al., 2012; Seigneur et al., 2000). Schauer and Cass (2000) used organic tracers for source apportionment with the chemical mass balance (CMB) method (Watson et al., 1984) at two urban sites and one background site in central California. Their implementation addressed the improper accounting of volatile organic compounds (VOCs) from motor vehicle exhaust and wood combustion. Watson et al. (2001) reviewed several studies that used CMB for source apportionment and reported that uncertainties in the source contributions of VOCs led to uncertainties in impacts from important sources such as off-road vehicles, solvent use, diesel and gasoline exhaust, meat cooking, and biomass burning. The authors also described several limitations of CMB, including its reliance on existing observations and its neglect of changes in source profiles between source and receptor due to factors such as dilution, aerosol aging, and deposition. Maykut et al. (2003) used positive matrix factorization (PMF) (Paatero and Tapper, 1994) for source apportionment at an urban site in Seattle, Washington (USA), using selected trace elements to distinguish combustion sources. Temperature-resolved organic and elemental carbon fractions were also used in Unmix to distinguish diesel from other mobile sources, but this did not lead to significantly different results (Henry, 2005). There was also difficulty in distinguishing small sodium-rich industrial sources because of their similarity to the aged marine aerosol source profile.
In an effort to improve the spatial and temporal resolution of SA data and improve source distinction, chemical transport models (CTMs) have been adapted to estimate emission impacts on pollutant concentrations. Marmur et al. (2006) compared source-oriented and receptor-oriented modeling results for a winter month and a summer month in the southeastern USA. The brute force method was used in the Community Multiscale Air Quality (CMAQ) model to calculate impacts from mobile sources, biomass burning, coal-fired power plants, and dust. The authors found that meteorology strongly influenced the temporal variation of CMAQ source impacts, whereas receptor model results exhibited more day-to-day variability. Koo et al. (2009) used the decoupled direct method (DDM) in the Comprehensive Air Quality Model with Extensions (CAMx) to determine the sensitivity of particulate sulfate concentrations to changes in emissions of SO2 and NOx from point sources; NOx, VOCs, and NH3 from area sources; and all emissions from on-road mobile sources (Byun and Schere, 2006; Dunker, 1981, 1984; Napelenok et al., 2006). Because of nonlinearities, DDM first-order sensitivities underestimated the sulfate response to removing all emissions, compared with brute force results. Zhang et al. (2012) addressed this issue by calculating second-order sensitivities of inorganic aerosols using DDM, which better captured nonlinear responses to emission changes of up to 50 %.
This work uses a hybrid CTM-RM method to provide spatial fields of source impacts for use in detailed health-related spatiotemporal analyses (e.g., Sarnat et al., 2008). Spatially resolved source impacts and concentrations are key inputs for residential- or county-level exposure studies that investigate the impact of air pollution on regional health outcomes (Bell, 2006). The CTM-RM method combines the strengths of both source apportionment techniques in an effort to reduce uncertainty in source impact estimates. The goal of this study is to create spatial fields of source impacts by spatially interpolating source impact adjustment factors (ratios, or R's) and then applying those adjustments to CTM source impact fields. The R's are generated by a hybrid CTM-RM SA approach that integrates observational data and CTM results to calculate an emission-based adjustment of source impacts at receptor locations (Hu et al., 2014). Kriging is used to generate spatial fields of R's for 33 emissions sources, and these adjustment factor fields are applied to the original source impact fields to produce hybrid-adjusted source impact and species concentration fields for the continental USA. The adjustments can also be interpolated in time to adjust source impact fields on days when speciated observations are not available. The performance of the spatial extension is evaluated through data withholding and by comparison with observations from other monitoring networks. The hybrid CTM-RM method, together with the spatial extension, provides air quality data fields for health studies that require spatially resolved exposure metrics. This approach can also help air quality planners develop state implementation plans (SIPs) and assess exceptional events such as wildland fires.

Data
Observational data from 189 CSN monitors were used for model development and evaluation (Fig. 1). Data were collected every third or sixth day in January 2004, for a total of 9 observation days (i.e., 4, 7, 10, ... 28 January), so the number of samples varied across observation days. The number of monitors with speciated PM2.5 data on observation days ranged from approximately 40 to 150, and each site had 5 to 9 observations over the study period. CSN measurements include total PM2.5, organic and elemental carbon, ions, and 35 metals. CSN monitors tend to be located in more densely populated urban and suburban areas, so their data are more strongly associated with emissions sources tied to high population density, such as mobile and cooking sources. Speciated PM2.5 data are also available from the SEARCH (Hansen et al., 2003, 2006) and IMPROVE (Chow et al., 1993) networks, and those data were used for further model evaluation. The SEARCH network includes eight monitors in the southeastern USA, configured as urban/rural pairs. IMPROVE monitors are mainly located in pristine areas such as national parks and wilderness areas. Thirty-eight IMPROVE monitors in the eastern USA were used for model evaluation because of their closer proximity to urban monitoring sites (e.g., less than 50 km), as opposed to western IMPROVE sites, which are much more spatially sparse. Additionally, modeled processes have higher uncertainty in the western USA due to complex terrain and meteorology, which would add bias to the comparison of observations and model results (Baker et al., 2011).
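The sampling arithmetic above can be checked with a short sketch. It assumes (our assumption, not stated in the text) that sites on the 1-in-6 day schedule sample on every other 1-in-3 day, starting on 4 January; under that assumption the two schedules yield 9 and 5 samples, the end points of the reported range of observations per site.

```python
from datetime import date, timedelta


def sampling_days(start: date, end: date, step_days: int) -> list[date]:
    """Return sampling dates from start to end (inclusive), every `step_days` days."""
    days = []
    d = start
    while d <= end:
        days.append(d)
        d += timedelta(days=step_days)
    return days


# Observation window taken from the text: 4 to 28 January 2004.
start, end = date(2004, 1, 4), date(2004, 1, 28)
every_3rd = sampling_days(start, end, 3)
# Assumption: 1-in-6 day sites sample on every other 1-in-3 day.
every_6th = sampling_days(start, end, 6)

print([d.day for d in every_3rd])  # [4, 7, 10, 13, 16, 19, 22, 25, 28]
print([d.day for d in every_6th])  # [4, 10, 16, 22, 28]
```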

CTM-RM hybrid method
This study uses a hybrid SA method that combines techniques of both CTMs and RMs to generate adjustment factors (symbolized by R) that improve source impact estimates. Hu et al. (2014) described the hybrid approach in detail; it is briefly summarized here. First, gridded concentrations and emissions sensitivities of PM2.5 species are generated using CMAQ-DDM. CMAQ-DDM sensitivities to emissions are designated as the original (base case) source impacts, $SA_{i,j}^{\mathrm{base}}$, for species $i$ and source $j$. CMAQ-DDM was run with strict mass conservation (Hu et al., 2006), the SAPRC-99 chemical mechanism (Carter, 2000), and the aerosol module described in Binkowski and Roselle (2003). The modeling domain covers the continental USA, southern Canada, and northern Mexico, with 36 km grid resolution, a Lambert conformal conic projection, and 13 vertical layers of variable thickness extending from the surface to 70 hPa. Meteorological inputs were generated using the fifth-generation Pennsylvania State University/National Center for Atmospheric Research (PSU/NCAR) mesoscale model (MM5) with 35 vertical layers and the Pleim–Xiu land surface model (Grell et al., 1994; Pleim and Xiu, 1995; Xiu and Pleim, 2001). Emissions inputs were processed using the Sparse Matrix Operator Kernel Emissions (SMOKE) modeling system (CEP, 2003). Emissions data originated from a 2004 inventory projected from the 2002 National Emissions Inventory (NEI2002). Please refer to Hu et al. (2014) for additional details about the emissions inventory.
Next, the original source impacts, receptor observations, and uncertainties are used as inputs to the objective function of the hybrid SA model (Eq. 1):

$$\chi^2=\sum_{i}\frac{\left(c_i^{\mathrm{obs}}-c_i^{\mathrm{sim}}-\sum_{j}SA_{i,j}^{\mathrm{base}}\left(R_j-1\right)\right)^2}{\sigma_{i,\mathrm{obs}}^2+\sigma_{i,\mathrm{CTM}}^2}+\Gamma\sum_{j}\frac{\left(\ln R_j\right)^2}{\sigma_{\ln(R_j)}^2} \tag{1}$$

where the adjustment factors $R_j$ are optimized by minimizing the objective function, $\chi^2$. The initial $R_j$ values are specific to one site and one day, because the method is applied at monitors where speciated PM2.5 data are available on observation days; these values are then kriged and interpolated. The terms $c_i^{\mathrm{obs}}$ and $c_i^{\mathrm{sim}}$ represent the observed and CMAQ-simulated concentrations of species $i$, respectively, and $\Gamma$ weights the amount of change in source impacts. Uncertainties in the observations ($\sigma_{i,\mathrm{obs}}$), modeled concentrations ($\sigma_{i,\mathrm{CTM}}$), and source strength ($\sigma_{\ln(R_j)}$) are also included in the model. Specifically, $\sigma_{i,\mathrm{obs}}$ is reported with the CSN measurements for each day; $\sigma_{i,\mathrm{CTM}}$ is the error in modeled concentrations, which is proportional to the observed concentrations and is held constant for all sites and days; and $\sigma_{\ln(R_j)}$ is the uncertainty in the source contribution, expressed as the logarithm of the uncertainty factor, which is also held constant for each site and day. The uncertainties weight the adjustment of modeled source impacts, so that components with larger uncertainties are weighted less.
The objective function is minimized using a nonlinear optimization approach known as sequential quadratic programming (Fletcher, 1987; Gill et al., 1981). The function has a ridge regression structure, as shown by the second term, and uses an effective variance approach to weight each species residual by the combined observation and model uncertainty. The effective variance approach is also used by versions of CMB, and the optimization method used here is, in essence, an extended CMB approach (Watson et al., 1984). The uncertainties in the first term of the objective function serve as effective variances for the numerator and are specified for each species $i$. Finally, the $R_j$ values are applied to $SA_{i,j}^{\mathrm{base}}$ to adjust the original source impact estimates (Eq. 2) and to reconstruct simulated concentrations ($c_i^{\mathrm{adj}}$) at receptors that more closely reflect observations (Eq. 3).

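$$SA_{i,j}^{\mathrm{adj}}=SA_{i,j}^{\mathrm{base}}\,R_j \tag{2}$$

$$c_i^{\mathrm{adj}}=c_i^{\mathrm{sim}}+\sum_{j}SA_{i,j}^{\mathrm{base}}\left(R_j-1\right) \tag{3}$$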
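To make the optimization step concrete, the sketch below evaluates the objective function for a single site and day and minimizes it with SciPy's SLSQP solver, one readily available sequential quadratic programming implementation (the study itself used MATLAB). It assumes the objective has the form reported by Hu et al. (2014): a squared species misfit, weighted by the sum of the observation and model variances, plus a ridge penalty Γ Σ (ln R_j)² / σ²_ln(R_j). The bounds encode the 0.1 ≤ R_j ≤ 10 constraint described in the following paragraph. Array names, the value of Γ (`gamma`), and the toy inputs are hypothetical; `sa_base` holds base source impacts with shape (species, sources).

```python
import numpy as np
from scipy.optimize import minimize


def chi2(R, c_obs, c_sim, sa_base, sig_obs, sig_ctm, sig_lnR, gamma):
    """Objective for one site and day (assumed form of Eq. 1)."""
    c_adj = c_sim + sa_base @ (R - 1.0)                  # adjusted concentrations (Eq. 3)
    misfit = np.sum((c_obs - c_adj) ** 2 / (sig_obs**2 + sig_ctm**2))
    ridge = gamma * np.sum(np.log(R) ** 2 / sig_lnR**2)  # penalizes large adjustments
    return misfit + ridge


def fit_adjustment_factors(c_obs, c_sim, sa_base, sig_obs, sig_ctm, sig_lnR, gamma=1.0):
    """Return R_j minimizing chi2 subject to 0.1 <= R_j <= 10."""
    n_sources = sa_base.shape[1]
    result = minimize(
        chi2,
        x0=np.ones(n_sources),  # start from "no adjustment"
        args=(c_obs, c_sim, sa_base, sig_obs, sig_ctm, sig_lnR, gamma),
        method="SLSQP",
        bounds=[(0.1, 10.0)] * n_sources,
    )
    return result.x


# Toy case (hypothetical numbers): 3 species, 2 sources; source 0 is overestimated 2x.
sa_base = np.array([[2.0, 0.1], [0.2, 1.0], [1.0, 1.0]])
c_sim = sa_base.sum(axis=1)
c_obs = sa_base @ np.array([0.5, 1.0])
R = fit_adjustment_factors(c_obs, c_sim, sa_base,
                           sig_obs=np.full(3, 0.1), sig_ctm=np.full(3, 0.1),
                           sig_lnR=np.full(2, 1.0), gamma=1e-3)
print(np.round(R, 2))  # close to [0.5, 1.0]
```

Because the ridge term is small here, the fitted factors recover the imposed 2x overestimate for source 0; a larger Γ or smaller σ_ln(R_j) would pull the factors toward 1.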
Because many of the source impact profiles are similar between categories, so that collinearities are present, $R_j$ is constrained to $0.1 \le R_j \le 10$. Source impact profiles are derived from the information provided by Reff et al. (2009). In this paper, "source impact profiles" differ from "source profiles" in that they describe the source fingerprint at the receptor; in other words, the source profile can be altered between source and receptor, for example by the formation of secondary species. However, many species undergo no secondary formation. It is assumed that within the accumulation mode, which contains most of the fine PM mass in CMAQ, the composition of the primary portion of PM2.5 from any source is unchanged between source and receptor, but secondary species can form, altering the source profile at the receptor. The specific steps taken in applying source profiles to CMAQ-generated data are as follows. Reff et al. (2009) presented source profiles for 84 source categories, aggregated from roughly 300 PM2.5 SPECIATE v4.0 profiles, that contain estimates of trace metal contributions. The 84 PM2.5 profiles were further aggregated into 33 categories, consistent with the sources of interest in this study. The contributions in the 33 profiles were then used to speciate the "other" (sometimes called unidentified) portion of PM2.5 output by CMAQ (species name: A25). Specifically, the contributions of the 35 trace species were used to split the "other" PM2.5 into individual species, and the results for these species are used together with the other primary and secondary species. At the receptor, both the primary and secondary PM2.5 contributions are used to determine the new, receptor-oriented source impact profiles. The same approach was used to generate receptor-oriented profiles in Hu et al. (2014).
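The speciation of the CMAQ "other" PM2.5 (A25) described above amounts to multiplying each source's A25 impact by that source's trace-species mass fractions. The sketch below assumes the fractions have already been expressed relative to the A25 portion (rather than total PM2.5) and that any mass not assigned to trace species stays in a residual "other" category; the profile values are hypothetical, not taken from Reff et al. (2009).

```python
import numpy as np


def split_other_pm25(a25_by_source, trace_fractions):
    """Split each source's 'other' PM2.5 (A25) into trace species.

    a25_by_source   : (n_sources,) A25 impact of each source, e.g. in ug m-3
    trace_fractions : (n_sources, n_trace) mass fraction of each trace species
                      within that source's A25 (hypothetical, pre-normalized)
    Returns the (n_sources, n_trace) trace-species impacts and the
    (n_sources,) A25 mass left unassigned.
    """
    a25 = np.asarray(a25_by_source, dtype=float)
    frac = np.asarray(trace_fractions, dtype=float)
    if np.any(frac.sum(axis=1) > 1.0 + 1e-12):
        raise ValueError("trace fractions for a source must not sum to more than 1")
    trace = a25[:, None] * frac          # broadcast each source's mass over species
    residual = a25 - trace.sum(axis=1)   # mass not attributed to any trace species
    return trace, residual


# Hypothetical example: two sources, two trace species (e.g., Fe and Mn).
trace, residual = split_other_pm25([2.0, 1.0], [[0.10, 0.01], [0.05, 0.00]])
# trace is approximately [[0.2, 0.02], [0.05, 0.0]]; residual approximately [1.78, 0.95]
```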
The hybrid method produces results that more closely reflect observations than the original CTM results, which are often biased (Hu et al., 2014).It accounts for more known source categories than traditional RM approaches (e.g., 33 vs. 6), and it links sources and observations both temporally and spatially.Additionally, the hybrid method generates estimates of the uncertainty in source impact predictions and identifies potential errors in source strength and composition.One limitation of the hybrid method is that results are only available at receptor locations when observations are available, limiting its spatial and temporal scope.In this paper, a spatial hybrid method is presented and evaluated, and it extends the benefits of the hybrid CTM-RM method through spatial interpolation.

Development of spatiotemporal fields
Spatial and temporal source impact fields can be developed by combining the hybrid CTM-RM method with geostatistical techniques. Hybrid-generated $R_j$ values were spatially interpolated for each observation day using kriging to generate spatial fields of source impact adjustment factors. MATLAB (v. 7.14.0.739) was used to perform all geostatistical and optimization calculations. Daily-averaged spatial fields of CMAQ-DDM source impacts are adjusted by cell-by-cell multiplication of the original fields by the corresponding adjustment factor field, resulting in spatial fields of hybrid-adjusted source impacts that are available every third day, as are the observations. Source impact fields for intervening days are developed by temporally interpolating the $R_j$ spatial fields. Temporally interpolating $R_j$ values and then applying those adjustments to the simulated source impact fields is preferred over simply interpolating the 1-in-3 day hybrid-adjusted source impact fields, because interpolating the adjusted source impacts themselves would smooth the fields and lose the day-specific spatial and temporal variability in emissions and meteorology captured by the CTM.
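The spatial and temporal steps described above can be sketched as follows. This is a minimal illustration, not the study's MATLAB implementation: it uses ordinary kriging with an isotropic exponential variogram whose nugget, sill, and range are hypothetical placeholders (in the study, variogram models were fit to the observed $R_j$ values), treats monitor and grid coordinates as projected distances in km, and linearly interpolates $R_j$ fields between observation days, holding the nearest field constant outside the observation window. Adjusted source impacts are then the cell-by-cell product of the CTM field and the $R_j$ field.

```python
import numpy as np


def exp_variogram(h, nugget, sill, rng_km):
    """Isotropic exponential semivariogram with gamma(0) = 0."""
    g = nugget + sill * (1.0 - np.exp(-h / rng_km))
    return np.where(h > 0.0, g, 0.0)


def ordinary_krige(xy_obs, r_obs, xy_grid, nugget=0.01, sill=0.1, rng_km=500.0):
    """Ordinary kriging of adjustment factors from monitors to grid-cell centres."""
    n = len(r_obs)
    d_oo = np.linalg.norm(xy_obs[:, None, :] - xy_obs[None, :, :], axis=-1)   # (n, n)
    d_og = np.linalg.norm(xy_obs[:, None, :] - xy_grid[None, :, :], axis=-1)  # (n, m)
    # Kriging system with a Lagrange multiplier forcing the weights to sum to 1.
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = exp_variogram(d_oo, nugget, sill, rng_km)
    A[n, n] = 0.0
    b = np.ones((n + 1, xy_grid.shape[0]))
    b[:n, :] = exp_variogram(d_og, nugget, sill, rng_km)
    weights = np.linalg.solve(A, b)[:n, :]  # drop the Lagrange multiplier row
    return weights.T @ r_obs


def interp_r_fields(obs_days, r_fields, day):
    """Linearly interpolate R fields (n_days, ny, nx) in time to `day`."""
    obs_days = np.asarray(obs_days, dtype=float)
    k = np.searchsorted(obs_days, day)
    if k == 0:
        return r_fields[0]           # before (or on) the first observation day
    if k == len(obs_days):
        return r_fields[-1]          # after the last observation day
    w = (day - obs_days[k - 1]) / (obs_days[k] - obs_days[k - 1])
    return (1.0 - w) * r_fields[k - 1] + w * r_fields[k]


def adjust_source_impact(sa_base_field, r_field):
    """Cell-by-cell adjustment: adjusted impact = base impact x R."""
    return sa_base_field * r_field


# Hypothetical example: three monitors, two grid cells (coordinates in km).
xy_obs = np.array([[0.0, 0.0], [300.0, 0.0], [0.0, 300.0]])
r_obs = np.array([0.6, 1.1, 0.9])
xy_grid = np.array([[100.0, 100.0], [0.0, 0.0]])
r_grid = ordinary_krige(xy_obs, r_obs, xy_grid)  # second cell reproduces 0.6

r_fields = np.stack([np.full((2, 2), 1.0), np.full((2, 2), 0.4)])  # days 4 and 7
r_day5 = interp_r_fields([4, 7], r_fields, 5)                        # 0.8 everywhere
sa_adj = adjust_source_impact(np.full((2, 2), 3.0), r_day5)          # 2.4 everywhere
```

Interpolating the factor fields rather than the adjusted impacts keeps the CTM's day-specific structure: on day 5 the adjusted field is the day-5 CTM field scaled by the interpolated factor.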

Method evaluation
Performance of the spatial extension was evaluated using a data withholding approach, as well as by comparison with data from the SEARCH and IMPROVE networks. For data withholding, we randomly removed 10 % of the available observations on each observation day (75 sets of observations at monitors with speciated PM2.5 data) and re-ran the spatial hybrid model; all references to "withheld CSN data" refer to these 75 withheld observation sets. The remaining 90 % of the observations were used to fit the variogram models, which were used in kriging to produce spatial fields of $R_j$ values. Concentrations are reconstructed using Eq. (3) with the spatially interpolated adjustment factors. Additionally, hybrid CTM-RM optimization is applied directly at the withheld observation sites to assess the performance of the kriging model. The original CMAQ-DDM, directly applied hybrid (CTM-RM), and spatial hybrid (SH) concentrations are then compared to measurements at the withheld observation locations to evaluate each method's performance in simulating concentrations. Linear regression was used to assess correlations between observed and modeled concentrations for each method.
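A minimal sketch of the per-day withholding split, assuming monitors are identified by integer IDs and at least one site is withheld per day; the rounding rule and random seed are our assumptions, not details from the study.

```python
import numpy as np


def withhold_per_day(sites_by_day, frac=0.10, seed=0):
    """Randomly withhold `frac` of each day's monitors for evaluation.

    Returns {day: (fitting_sites, withheld_sites)}.
    """
    rng = np.random.default_rng(seed)
    split = {}
    for day, sites in sites_by_day.items():
        sites = np.asarray(sites)
        n_out = max(1, round(frac * len(sites)))  # at least one site per day (assumption)
        withheld = rng.choice(sites, size=n_out, replace=False)
        fitting = np.setdiff1d(sites, withheld)   # used to fit variograms and krige
        split[day] = (fitting, withheld)
    return split


# Hypothetical monitor IDs for two observation days.
split = withhold_per_day({4: range(40), 7: range(150)})
for day, (fitting, withheld) in split.items():
    print(day, len(fitting), len(withheld))  # "4 36 4", then "7 135 15"
```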
In order to evaluate prediction performance in remote locations and in locations independent of CSN, CMAQ-DDM and SH concentrations were compared to observations at SEARCH and IMPROVE locations.Note that the application of the CTM-RM hybrid method, as conducted here, did not include SEARCH and IMPROVE data, and CTM-RM/SH results are independent of those observation data.The SEARCH and IMPROVE comparisons also address the issue of spatial representativeness of using only CSN data to produce spatial fields.This study uses available speciated CSN data over the entire USA, thereby providing a very spatially heterogeneous data set that is representative of key emissions and meteorology in each USA region.The lack of rural data available may present uncertainties in the spatial representativeness of R j values outside of urban regions.
Note also that 41 species, including total PM, were used for spatial field construction, but results are presented for only 20 species in the CSN comparison and 15 species in the SEARCH and IMPROVE comparisons, because measurements of some trace metals are seldom above the detection limit. The possibility of added uncertainty in the optimization step due to detection limit issues was considered: optimization was tested without the species with limited availability, and no significant differences in model performance were found. Including the measurement uncertainty in the objective function minimizes the influence of these measurements on days when they are below the detection limit, while still accounting for their low concentration levels. Using all available measurements in the optimization model is therefore the preferred approach.

Spatial extension evaluation
CTM-RM and SH adjustment factors at withheld observation locations were compared using regression to evaluate the spatial interpolation that was performed using kriging.For each observation day (9 days), 10 % of available observations were randomly withheld, resulting in a total of 2,475 R j data points (75 observations locations × 33 source categories).Five outlying data pairs (< 0.5 %) were removed from this regression.Outlying data pairs are determined by examining the distribution of the directly calculated R j values (mean = 0.84, SD = 0.48) and the kriged R j values (mean = 0.83, SD = 0.30) at the withheld observation locations.Data pairs were removed if either value was more than 6 standard deviations from the mean R j value.The removed data points (5 points out of 2475) were well outside of this range.The remaining CTM-RM and SH factors had a Pearson correlation coefficient of 0.89, a linear regression slope of 0.83 ± 0.02, and an intercept of 0.14 ± 0.02 (Fig. 2).
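The outlier screening and regression described above can be sketched with NumPy and SciPy. The sketch assumes that the directly applied (CTM-RM) factors are the regression's x variable and the kriged (SH) factors its y variable, and it uses the sample standard deviation; both choices are our assumptions. The synthetic data are hypothetical and constructed to lie on the reported regression line.

```python
import numpy as np
from scipy import stats


def compare_factors(r_direct, r_kriged, n_sd=6.0):
    """Drop pairs where either value lies > n_sd SDs from its mean, then regress."""
    x = np.asarray(r_direct, dtype=float)
    y = np.asarray(r_kriged, dtype=float)
    keep = (np.abs(x - x.mean()) <= n_sd * x.std(ddof=1)) & \
           (np.abs(y - y.mean()) <= n_sd * y.std(ddof=1))
    fit = stats.linregress(x[keep], y[keep])
    return {
        "n_removed": int((~keep).sum()),
        "r": fit.rvalue,
        "slope": fit.slope, "slope_se": fit.stderr,
        "intercept": fit.intercept, "intercept_se": fit.intercept_stderr,
    }


# Synthetic pairs on y = 0.14 + 0.83 x, plus one extreme directly calculated value.
x = np.append(np.linspace(0.1, 2.0, 200), 50.0)
y = np.append(0.14 + 0.83 * np.linspace(0.1, 2.0, 200), 0.5)
res = compare_factors(x, y)
print(res["n_removed"], round(res["slope"], 2), round(res["intercept"], 2))  # 1 0.83 0.14
```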
Root mean square errors (RMSEs) were calculated for the adjustment factors by source (Eq. 4):

$$\mathrm{RMSE}_j=\sqrt{\frac{1}{N}\sum_{k=1}^{N}\left(R_{j,k}^{\mathrm{SH}}-R_{j,k}^{\text{CTM-RM}}\right)^2} \tag{4}$$

where $k$ indexes the $N$ withheld observation sets. The RMSEs for all sources were less than 0.4, except for lawn waste burning, prescribed burning, and wood stoves (Table S1 in the Supplement). This is expected given the uncertainty in burning emissions (Table S2).
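The per-source RMSE vectorizes directly over sources: it is the root mean square of the difference between kriged and directly calculated $R_j$ over the withheld observation sets. The sketch assumes the factors are stored as arrays of shape (withheld observation sets, sources); the values are hypothetical.

```python
import numpy as np


def rmse_by_source(r_direct, r_kriged):
    """RMSE between kriged and directly calculated R_j, one value per source."""
    diff = np.asarray(r_kriged, dtype=float) - np.asarray(r_direct, dtype=float)
    return np.sqrt(np.mean(diff**2, axis=0))  # average over withheld sets (rows)


# Hypothetical values: 2 withheld observation sets x 2 sources.
rmse = rmse_by_source([[1.0, 1.0], [1.0, 1.0]], [[1.0, 1.3], [1.0, 0.7]])
print(rmse)  # approximately [0.0, 0.3]
```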
Sources such as diesel, liquid petroleum gas, non-road natural gas, and Mexican combustion all had very low RMSEs, mean R j values near 1, and median R j values near 1.This indicates that there is little to no adjustment to these source impacts and that kriging captures the R j values calculated by the CTM-RM application.Mean and median R j values are within 20 % for most sources (Table S1).The overall mean R j value at withheld observation locations for all sources for CTM-RM and SH adjustment factors was 0.84 and 0.83, respectively, indicating a high bias in CMAQ-DDM overall, as expected from the base model performance evaluation (PM 2.5 was biased approximately 40 % high).
Cumulative distributions of CTM-RM and SH adjustment factors were examined for each source, and the adjustment factors were highly correlated for each source (Fig. S1). Spatial interpolation captured CTM-RM trends for sources dominated by adjustment factors near 0.1, such as dust, lawn waste burning, prescribed burning, and wood stoves, though it did not capture all of the extremely low adjustments (e.g., meat cooking at some locations). Sources that required little adjustment ($R_j \approx 1$), including aircraft, diesel combustion (stationary sources), fuel oil burning, Mexican combustion, non-road liquid petroleum gas combustion, and sea salt, were well captured by the spatial extension, as demonstrated by nearly identical cumulative distributions. The cumulative distributions extend beyond 1.0 (x axis) for dust, lawn waste burning, prescribed burning, and wood stoves. These sources are highly variable from day to day, and CMAQ-DDM underestimates are possible where the original emissions missed an actual burn or dust event.
Spatial fields of hybrid adjustment factors are presented for dust, on-road diesel and gasoline combustion, and wood stove sources (Fig. 3). Average $R_j$ values over all observation days are also presented for reference (Fig. S2). $R_j$ values were typically less than 1 for dust and wood stove impacts, indicating a high bias in those source impacts in the base CMAQ-DDM simulations. Spatial field values of $R_j$ for on-road diesel and gasoline combustion are generally near 1 over most of the USA; however, $R_j$ values for those sources tend to be below 1 in the southeastern USA.
In general, for an R j value less than 1, the initial CMAQ-DDM estimate is reduced to be more consistent with observations.In turn, for an R j value greater than 1, the initial CMAQ-DDM estimate is increased to be more consistent with observations.An R j value of 1 indicates that no adjustment to the CMAQ-DDM is necessary to improve consistency with observations.As such, after application of the SH method, it was found that while many of the source impacts were adjusted relatively little (i.e., R j ≈ 1.0), dust-related and biomass burning-related impacts were often biased high in the original CMAQ-DDM simulation and therefore considerably reduced.
The distribution of all $R_j$ values was approximately lognormal, so an analysis was performed to determine whether log-transforming the $R_j$ values before kriging was necessary to reduce bias in source impact and concentration estimates (Fig. S3). In the first approach, the $R_j$ values at the monitors were log-transformed before kriging, and the kriged values were back-transformed (exponentiated) before use in reconstruction. In the second approach, the $R_j$ values were kriged without transformation. The analysis showed that log-transformation was not necessary, as it produced no significant difference in reconstructed concentrations or source impact fields.
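The difference between the two approaches can be seen with any set of linear interpolation weights. The sketch below takes hypothetical weights that sum to 1 (kriging weights would serve the same role) and shows that interpolating ln(R_j) and exponentiating yields a weighted geometric mean, which is never larger than the weighted arithmetic mean obtained by interpolating R_j directly. A plain back-transform of this kind estimates a median rather than a mean; formal lognormal kriging adds a variance correction, which this sketch omits.

```python
import numpy as np


def interpolate_factor(weights, r_obs, log_transform=False):
    """Interpolate R_j with linear weights, optionally in log space."""
    w = np.asarray(weights, dtype=float)
    r = np.asarray(r_obs, dtype=float)
    if log_transform:
        return float(np.exp(w @ np.log(r)))  # interpolate ln(R), then exponentiate
    return float(w @ r)


w = [0.5, 0.5]   # hypothetical weights summing to 1
r = [0.5, 2.0]   # hypothetical R_j values at two monitors
print(interpolate_factor(w, r))                      # 1.25 (arithmetic mean)
print(interpolate_factor(w, r, log_transform=True))  # about 1.0 (geometric mean)
```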
For further method evaluation, withheld CSN observations were compared with SH concentrations, which were calculated using kriged $R_j$ values and Eq. (3) (Table S3). The mean concentrations of total PM2.5 at withheld observation locations were 11.7 (± 8.3), 16.3 (± 11), 8.59 (± 4.7), and 9.2 (± 5.7) µg m−3 for the observations, CMAQ-DDM, CTM-RM, and SH estimates, respectively. Levels of crustal metals (Al, Si, Ca, and Fe), K, and Cl were biased very high in the base CMAQ-DDM simulation, often by an order of magnitude relative to observations. SH concentrations of metals were closer to the CSN observations. The error in simulated (sim) concentrations is calculated using Eq. (5):

$$\mathrm{Error}=\frac{1}{N}\sum_{i=1}^{N}\frac{\left|c_i^{\mathrm{sim}}-c_i^{\mathrm{obs}}\right|}{c_i^{\mathrm{obs}}}\times 100\,\% \tag{5}$$

In Eq. (5), $i$ indexes the observations and $N$ is the total number of observations withheld for evaluation. For vanadium, the error was 295 % for CMAQ-DDM and 139 % for SH relative to observations; for manganese, it was 1340 % for CMAQ-DDM and 326 % for SH. The large remaining errors stem from the source profiles, which cause some elements to be biased consistently high and others low. Further work to optimize source profiles could reduce these residual errors.
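For reference, the error statistic can be computed as below. The sketch assumes the error is a mean normalized error, that is, the absolute difference normalized by each observation, averaged over the N withheld observations and expressed in percent; observations must be positive. The input values are hypothetical.

```python
import numpy as np


def mean_normalized_error(c_sim, c_obs):
    """Mean of |sim - obs| / obs over paired observations, in percent."""
    c_sim = np.asarray(c_sim, dtype=float)
    c_obs = np.asarray(c_obs, dtype=float)
    if np.any(c_obs <= 0):
        raise ValueError("observations must be positive for a normalized error")
    return 100.0 * np.mean(np.abs(c_sim - c_obs) / c_obs)


# Hypothetical concentrations (ug m-3) at three withheld sites.
obs = [0.002, 0.004, 0.001]
err_base = mean_normalized_error([0.010, 0.020, 0.004], obs)  # large overestimate
err_sh = mean_normalized_error([0.003, 0.005, 0.001], obs)    # closer to observations
print(round(err_base, 1), round(err_sh, 1))  # 366.7 25.0
```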
Performance indicators for some species suggest poorer agreement, such as the β values for calcium in the CMAQ-DDM (β = 1.22) and SH (β = 0.16) regression comparisons (Table S4). However, all of the metrics presented must be taken into account and evaluated holistically. The α values for calcium indicate improved performance, as the SH value (α = 0.044) is closer to 0.0 than the CMAQ-DDM value (α = 0.13). Mean concentrations at withheld observation locations also indicate better performance of the SH model: mean calcium concentrations were 0.041 (observed), 0.18 (CMAQ-DDM), and 0.050 (SH) µg m−3 (Table S3), so by this measure the SH method performs best. Throughout the analysis, CMAQ-DDM estimates of trace metal concentrations were orders of magnitude too high, while SH results were closer to observations. Although some individual metrics indicate better CMAQ-DDM performance, the overall performance of the SH method is the most favorable. Importantly, the species with poorer performance are typically those that play a smaller role in determining source impacts; for example, they are present only in trace amounts and/or have high uncertainties in the measurements or source profiles relative to their observed concentrations.
The SH method was further evaluated by comparing simulated concentrations to independent data from the SEARCH and IMPROVE networks (Tables S5 and S6). Mean concentrations over observation days were compared, as were regression statistics for observed vs. modeled results. For the SEARCH network (N = 8 monitors), average concentrations of 15 species were compared to observations. Errors in mean concentrations for crustal and other trace elements decreased substantially from CMAQ-DDM to SH: Al, 2203 to 540 %; Si, 1228 to 271 %; K, 365 to 61 %; Ca, 402 to 61 %; Fe, 260 to 3 %; Cu, 231 to 38 %; and Se, 63 to 25 %. For the IMPROVE network (N = 38 monitors), errors in mean concentrations for these elements also decreased substantially: Al, 704 to 24 %; Si, 371 to 24 %; K, 599 to 48 %; Ca, 361 to 36 %; Fe, 334 to 18 %; Cu, 186 to 57 %; and Se, 22 to 11 %. Linear regression metrics are also presented for the SEARCH and IMPROVE monitors (Tables S7 and S8). Correlations did not improve for every SEARCH and IMPROVE species; however, estimation performance improved for most trace metals and ions.

Refined source impacts
Refined dust and biomass burning source impacts led to better agreement between simulated and observed concentrations of crustal elements (Al, Ca, Fe, Si) and biomass burning-derived elements (Cl, K). Original CMAQ-DDM estimates of these species were biased very high relative to observations. This is attributed to an apparent high bias in the source impact profiles for biomass burning sources, which do not account for long-range transport and deposition of biomass burning-related PM. Results suggest that, because of atmospheric transformation processes, the source impact profiles are in error for some species, similar to the findings of Balachandran et al. (2013). Observations of some elemental species (Mg, P, V, Se) were strongly affected by measurement limitations (i.e., concentrations at or below the detection limit) and showed the poorest correlation with modeled concentrations. Additionally, conversion of observed carbon species between analytical methods, from thermal optical transmittance (TOT) to thermal optical reflectance (TOR) equivalents, introduced potential bias into the concentration comparisons; other studies have shown that such conversions may overcorrect observations of carbon species (Balachandran et al., 2013).
Average source contributions to PM2.5 at withheld CSN observation locations were ranked from largest to smallest for the base CMAQ-DDM, CTM-RM, and SH results (Table 1). For the base CMAQ-DDM simulations, the top three sources were wood stoves, dust, and livestock emissions; the livestock category captures the influence of ammonia emissions on nitrate formation and includes impacts from agricultural and farming activities. For the CTM-RM and SH results, wood stoves (10th for both) and dust (13th for CTM-RM, 14th for SH) ranked much lower than for CMAQ-DDM. Livestock emissions ranked 1st in both the CTM-RM and SH hybrid applications. The ranking of open fires dropped from 10th (CMAQ-DDM) to 20th for both the CTM-RM and SH applications. The ranking of fuel oil rose from 12th for the base CMAQ-DDM simulation to 6th and 5th for the CTM-RM and SH results, respectively. The order of source contributions at withheld observation locations agreed well between the CTM-RM and SH applications, though it often differed greatly from the base CMAQ-DDM order; the rank of any given source in the CTM-RM and SH results differed by at most two positions.
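The ranking comparison can be made concrete with a short sketch that ranks sources by average contribution and reports the largest rank change between two methods. The source names, the dictionary layout, and all contribution values are hypothetical placeholders, not values from Table 1, and ties are broken by input order, which is an arbitrary choice.

```python
def rank_sources(contributions):
    """Return {source: rank}, with rank 1 for the largest average contribution."""
    ordered = sorted(contributions, key=contributions.get, reverse=True)
    return {source: i + 1 for i, source in enumerate(ordered)}


def max_rank_shift(ranks_a, ranks_b):
    """Largest absolute change in rank for any source present in both rankings."""
    shared = ranks_a.keys() & ranks_b.keys()
    return max(abs(ranks_a[s] - ranks_b[s]) for s in shared)


# Hypothetical average contributions (ug/m3) at withheld sites, for illustration only
ctm_rm = {"livestock": 1.9, "coal": 1.5, "fuel_oil": 0.6, "dust": 0.2, "wood_stoves": 0.3}
sh = {"livestock": 2.1, "coal": 1.4, "fuel_oil": 0.7, "dust": 0.5, "wood_stoves": 0.4}

print(rank_sources(sh))
print(max_rank_shift(rank_sources(ctm_rm), rank_sources(sh)))
```

A maximum rank shift of two, as reported for CTM-RM vs. SH, would correspond to this function returning 2 over the full set of source categories.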
The top three sources of primary PM2.5 for January 2004, based on source emissions, were dust, wood stoves, and coal combustion, estimated at 1275, 5301, and 3407 metric tons per day, respectively (Table S2). However, the uncertainties in dust and wood stove emissions are much higher than for most other sources, by factors of 10 and 5, respectively (Hanna et al., 1998, 2001; Hu et al., 2014). This uncertainty is driven in part by source variability. The large uncertainty and potential bias are reflected in the large shifts in the rankings of dust and wood stove contributions to PM2.5. Other biomass burning sources, such as lawn waste burning and wildfires, have similarly large emissions uncertainties and likely large temporal variability, and their rankings also decreased substantially.
Coal combustion, whose impact includes the secondary formation of sulfate, remains among the top three sources of average SH PM2.5 contributions, as its emissions uncertainties are low owing to the availability of continuous emission monitoring data. SO2 emissions are large (January 2004 domain total: 72924.7 metric tons per day), as are NOx emissions (74619.7 metric tons per day) (Table S9). During the study
Table 1. Source category abbreviations with average CMAQ-DDM, CTM-RM, and SH (spatial hybrid) source contributions to PM2.5 concentrations for withheld CSN observation locations (N = 75 observations) for January 2004. Note: all averages and standard deviations are expressed in µg m⁻³. The average total mass of the withheld observations and the corresponding CMAQ-DDM, CTM-RM, and SH estimates were 11.7 (± 8.3), 16.3 (± 11), 8.59 (± 4.7), and 9.2 (± 5.7) µg m⁻³, respectively. NR = non-road, CM = combustion.
Secondary formation processes increase the impacts of coal combustion, biogenic, and livestock emissions relative to their primary PM contributions. January 2004 primary PM emissions estimates for biogenic and livestock sources ranked 33rd and 31st, respectively. However, the CMAQ-DDM, CTM-RM, and SH hybrid contributions ranked both sources significantly higher (biogenic: 14th, 11th, and 9th, respectively; livestock: 3rd, 1st, and 1st, respectively). Although primary PM2.5 emissions from these sources are small, secondary processes involving gaseous precursor emissions led to large source contributions (Table S9). Biogenic sources emit large quantities of volatile organic compounds, which go on to form secondary organic aerosol. Livestock emissions of gaseous ammonia react with sulfuric, nitric, and other acids to form ammonium salts. The SH method therefore captures and refines the impacts of sources that contribute PM2.5 precursors.

Refined spatial fields
Base CMAQ-DDM spatial fields were refined by applying the R_j fields for each source on each observation day. An example of the adjustment is shown in Fig. 4, where the CMAQ-DDM spatial field of dust impacts is adjusted on 4 January 2004. Sources with a high occurrence (roughly > 50 %) of adjustment factors less than 1 include biomass burning, metals processing, and natural gas combustion; refined spatial fields for these sources are presented in the Supplement (Figs. S5–S7). Biomass burning includes impacts from agricultural burning, lawn waste burning, open fires, prescribed burning, wildfires, wood fuel burning, and wood stoves. The SH method substantially decreases biomass burning impacts on 4 and 22 January in the eastern USA and in portions of the west coast (Fig. S5), largely because observed potassium and organic carbon (OC) levels were lower than simulated levels. On average, CMAQ-DDM simulated levels were a factor of 3.1 (± 1.1) higher than SH values on 4 January and a factor of 5.2 (± 1.0) higher on 22 January. Metal processing impacts were reduced in areas strongly affected by smelting and metal works industries, including the Ohio River valley and mid-Atlantic regions (Fig. S6). On average, CMAQ-DDM metal processing impacts were 21 (± 21) % higher than SH values on 4 January and 25 (± 21) % higher on 22 January. Natural gas combustion impacts (area and point sources only) were reduced for the southeastern USA, the Ohio River valley region, the Gulf of Mexico states, and parts of California and Texas (Fig. S7). On average, CMAQ-DDM natural gas combustion impacts were 35 (± 14) % higher than SH values on 4 January and 72 (± 28) % higher on 22 January.
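The refinement step amounts to multiplying each grid cell of a source's CMAQ-DDM impact field by the kriged adjustment factor for that source and day. The sketch below shows that multiplication together with two summary statistics of the kind quoted above: the share of cells with R_j < 1 and the mean (± standard deviation) of the cell-wise ratio CMAQ-DDM/SH. Averaging the ratio over grid cells is an assumption, since the averaging used for the reported factors is not specified here; the grids, values, and function names are hypothetical.

```python
from statistics import mean, stdev


def apply_adjustment(impact_field, r_field):
    """Refined impact in each grid cell: R_j(x) times the CMAQ-DDM impact of source j."""
    return [[r * c for r, c in zip(r_row, c_row)]
            for r_row, c_row in zip(r_field, impact_field)]


def fraction_below_one(r_field):
    """Share of grid cells in which the adjustment factor reduces the impact."""
    cells = [r for row in r_field for r in row]
    return sum(r < 1.0 for r in cells) / len(cells)


def base_over_refined(impact_field, refined_field):
    """Mean and standard deviation of the cell-wise ratio CMAQ-DDM / SH (nonzero SH cells only)."""
    ratios = [c / s for c_row, s_row in zip(impact_field, refined_field)
              for c, s in zip(c_row, s_row) if s > 0]
    return mean(ratios), stdev(ratios)


# Hypothetical 2 x 3 grid of CMAQ-DDM dust impacts (ug/m3) and kriged R_j values
cmaq_dust = [[1.2, 0.8, 2.0],
             [0.5, 1.0, 0.4]]
r_dust = [[0.25, 0.5, 0.2],
          [0.5, 1.25, 0.4]]

refined = apply_adjustment(cmaq_dust, r_dust)
print(refined)
print(fraction_below_one(r_dust))
print(base_over_refined(cmaq_dust, refined))
```

Under this assumed cell-wise definition, a percentage such as "21 % higher" corresponds to (ratio − 1) × 100, and a "factor of 3.1" corresponds to the ratio itself.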
Refined spatial fields of January 2004 average source impacts are presented for eight sources (Fig. 5): (c, d) dust, (e, f) on-road mobile sources, (g, h) coal combustion, (i, j) sea salt, (k, l) metal-related sources, (m, n) fuel oil combustion, (o, p) biomass burning, and (q, r) agricultural activities. Total PM2.5 concentration fields are also shown, overlaid with observed concentrations from 28 January (a, b). The CMAQ-DDM field overestimates concentrations in the eastern USA, whereas the overlaid observations agree better with the SH results. Modeled concentrations at monitors in mountainous areas, such as Salt Lake City, Utah, are underestimated because of local meteorological conditions (Gillies et al., 2010; Kelly et al., 2013). Wintertime temperature inversions, which cause stagnant air circulation and consequently high air pollution episodes in industrial valleys, are difficult for models to capture.
The improved agreement is also reflected in the monthly averaged spatial fields (Fig. 5). SH dust impacts are greatly reduced domain-wide compared with CMAQ-DDM. The monthly averaged refinements of the biomass burning fields, where impacts were also greatly reduced, and of the metal-related source impact fields are consistent with the results described above for 4 and 22 January. Sea salt impacts are localized to coastal areas, as expected, and agricultural activity most strongly affects the midwestern USA, a region dominated by farmland. Coal and fuel oil combustion impacts are highest in the eastern USA and, for fuel oil only, western Mexico, and were adjusted very little relative to the original CMAQ-DDM fields.

Discussion
The SH method uses observed and modeled species concentrations to adjust impacts on a source-by-source basis, providing spatially and temporally detailed source impact fields. The SH method also captures the impacts of secondary aerosol formation from precursor emission sources. Hybrid adjustment factors can be used to estimate the change in emissions needed for modeled results to better reflect observations, because source impacts are roughly proportional to emissions for primary sources (Hu et al., 2015). Kriging is an effective method for spatially extending the CTM-RM results by interpolating the adjustment factors into spatial fields. Kriging does not introduce significant error: the adjusted fields retain the spatial and temporal variability of the original fields, and this application brought simulated PM2.5 mass concentrations closer to observations. The adjusted source impact fields reflect prior knowledge of emissions, meteorology, and chemistry. The SH method also improves simulated estimates of crustal and trace metal concentrations.
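To illustrate the interpolation step, the sketch below performs ordinary kriging of adjustment factors known at monitor locations onto an arbitrary target point. This section does not specify the variogram model or its parameters, so the exponential semivariogram, its nugget, sill, and range values, the planar coordinates in km, and all names and numbers are assumptions for illustration, not the authors' implementation.

```python
import math


def exponential_variogram(h, nugget=0.0, sill=1.0, range_km=500.0):
    """Assumed exponential semivariogram; in practice the parameters are fitted to the data."""
    if h == 0.0:
        return 0.0
    return nugget + (sill - nugget) * (1.0 - math.exp(-h / range_km))


def solve_linear(a, b):
    """Solve a @ x = b with Gaussian elimination and partial pivoting (a is n x n)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for i in range(col + 1, n):
            factor = m[i][col] / m[col][col]
            for k in range(col, n + 1):
                m[i][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x


def ordinary_krige(sites, values, target, variogram=exponential_variogram):
    """Ordinary kriging estimate and kriging variance at `target` from values at `sites`."""
    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])

    n = len(sites)
    # Semivariogram matrix bordered by the unbiasedness constraint (weights sum to 1)
    a = [[variogram(dist(sites[i], sites[j])) for j in range(n)] + [1.0] for i in range(n)]
    a.append([1.0] * n + [0.0])
    gamma0 = [variogram(dist(s, target)) for s in sites]
    solution = solve_linear(a, gamma0 + [1.0])
    weights, mu = solution[:n], solution[n]
    estimate = sum(w * v for w, v in zip(weights, values))
    variance = sum(w * g for w, g in zip(weights, gamma0)) + mu
    return estimate, variance


# Hypothetical monitor coordinates (km) and dust adjustment factors R_j on one day
sites = [(0.0, 0.0), (300.0, 0.0), (0.0, 400.0), (350.0, 380.0)]
r_dust = [0.30, 0.55, 0.25, 0.40]
estimate, variance = ordinary_krige(sites, r_dust, (150.0, 200.0))
print(f"kriged R_j = {estimate:.3f}, kriging variance = {variance:.3f}")
```

With a zero nugget, ordinary kriging reproduces the observed factor exactly at a monitor and returns a positive kriging variance between monitors. Kriging ln(R_j) and exponentiating the result would guarantee positive factors; that variant is likewise an assumption here.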
The SH method is being developed both to produce spatiotemporally accurate source impact fields that are consistent with observations and to improve our understanding of the spatiotemporal characteristics of source impacts in the United States. We find widespread downward adjustment of biomass burning and dust impacts (R_j less than 1). The adjusted source impacts are consistent with observations, emissions estimates, and atmospheric transport and transformation. The SH method is also novel in that, although a source may not emit a given pollutant, its emissions may still interact with those from other sources, so that the species becomes part of that source's impact. For example, although agricultural fertilizer does not directly emit NOx, its influence on nitrate concentrations is calculated. Although such inter-source interactions are traditionally not quantified in receptor-oriented source apportionment methods, accounting for them is important for determining the primary and secondary impacts of sources on air quality. This hybrid source- and receptor-oriented approach accounts for these interactions and can determine impacts arising from otherwise elusive source interactions. However, this also shows that the formation of secondary species often depends on multiple sources, so that the impact of one source depends on the others, leading to ambiguity in source attribution. The approach used here relies on sensitivities at current conditions, but it also performs a species-by-species mass balance, which minimizes any overall bias in the source impact attributions.
Spatial hybrid inputs, methods, and results have inherent uncertainties and implementation challenges. Input uncertainties include measurement error, and the temporal availability and spatial representativeness of observed concentrations pose further challenges. Emissions inputs for each source are available at different temporal and spatial scales. For instance, point source emissions are available at hourly intervals in some cases, whereas dust emissions are highly variable both spatially and temporally. Area source emissions are estimated at weekly or monthly intervals, and averaged source fingerprints are used for the primary components of the PM2.5 emissions, which ignores locally varying source composition. Physical processes in CMAQ-DDM are uncertain, as modeling atmospheric behavior is a complex undertaking. Also, first-order sensitivity approaches may not capture all nonlinearities in source–receptor relationships. SH results are also subject to potential systematic bias from the optimization and kriging steps, though our evaluation suggests that these biases are minimal.
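The limitation of first-order sensitivities can be shown with a toy calculation. Suppose a concentration responds quadratically to a scaling factor ε on one source's emissions (ε = 1 is the base case). A first-order estimate of the source impact uses the slope dC/dε at ε = 1, whereas removing the source entirely gives C(1) − C(0); the two differ whenever the response is nonlinear. The response function and its coefficients are hypothetical, and the central finite difference is only a stand-in for a first-order DDM sensitivity, not CMAQ-DDM itself.

```python
def response(eps, a=2.0, b=-0.5):
    """Hypothetical concentration (ug/m3) when a source's emissions are scaled by eps."""
    return 3.0 + a * eps + b * eps ** 2


def first_order_sensitivity(f, eps=1.0, h=1e-6):
    """Central-difference dC/d(eps) at base emissions (stand-in for a first-order DDM sensitivity)."""
    return (f(eps + h) - f(eps - h)) / (2.0 * h)


s1 = first_order_sensitivity(response)       # first-order impact estimate: a + 2b
brute_force = response(1.0) - response(0.0)  # zero-out impact: a + b
print(s1, brute_force)
```

Here the first-order estimate (1.0) understates the zero-out impact (1.5) because of the curvature term b; for a linear response (b = 0) the two would coincide.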

Conclusion
The spatial hybrid model is an effective approach for reducing the error in simulated source impact spatial fields through statistical optimization, rather than by re-running CMAQ-DDM, which is more computationally expensive. Despite these sources of uncertainty, SH source apportionment can provide daily, spatially complete source impacts across a large domain over a long time period. The SH technique does not necessarily isolate specific atmospheric processes, as it is not a chemistry or physics model. It is a statistical model, based on the assumption that observations (truth) and modeled atmospheric processes (prediction) can be statistically combined to yield a better approximation of source impacts. Efforts are ongoing to reduce uncertainties, extend the time span of available results, and evaluate estimates against other data sources, such as satellite imagery and independent field measurements. In future studies, the model will be extended temporally to generate daily adjusted spatial fields for the continental USA for multiple years and to develop improved source profiles for emissions characterization. Results from the SH implementation are useful to policy makers, public health analysts, and other air quality scientists who use spatially and temporally complete source impact data in studies whose outcomes affect human welfare.
The Supplement related to this article is available online at doi:10.5194/gmd-8-2153-2015-supplement.

Figure 1. Modeling domain (dotted red line) and CSN, SEARCH, and IMPROVE monitors used for model development, application, and evaluation.

Figure 3. Spatial fields of kriged adjustment factors (R_j^SH) for dust, on-road diesel combustion, on-road gasoline combustion, and wood stove sources for 4 January 2004. Adjustment factors at CSN monitors (circles) were generated using hybrid (CTM-RM) source apportionment. Note that each panel has a different scale.

Figure 4. Hybrid-kriging adjustment of the dust impacts on PM2.5 on 22 January 2004: (a) original CMAQ-DDM simulation of dust source impacts; (b) spatial field of hybrid adjustment factors for dust (R_j^SH); (c) adjusted spatial field of dust source impacts.

Figure 5. Average CMAQ-DDM and spatial hybrid source impacts on PM2.5 for observation days in January 2004 for eight source categories. (a, b) Total PM2.5, overlaid with PM2.5 observations for 28 January. Impacts of (c, d) soil/crustal material, (e, f) traffic-related sources, (g, h) coal combustion, (i, j) sea salt aerosol, (k, l) metal-related sources, (m, n) fuel oil combustion, (o, p) biomass burning, and (q, r) agricultural activities.