Decadal evaluation of regional climate, air quality, and their interactions over the continental US using WRF/Chem version 3.6.1

The Weather Research and Forecasting model with Chemistry (WRF/Chem) v3.6.1 with the Carbon Bond 2005 (CB05) gas-phase mechanism is evaluated for its first decadal application during 2001–2010 using the Representative Concentration Pathway 8.5 (RCP 8.5) emissions to assess its capability and appropriateness for long-term climatological simulations. The initial and boundary conditions are downscaled from the modified Community Earth System Model/Community Atmosphere Model (CESM/CAM5) v1.2.2. The meteorological initial and boundary conditions are bias-corrected using the National Centers for Environmental Prediction's Final (FNL) Operational Global Analysis data. Climatological evaluations are carried out for meteorological, chemical, and aerosol–cloud–radiation variables against data from surface networks and satellite retrievals. The model performs very well for the 2 m temperature (T2) for the 10-year period, with only a small cold bias of −0.3 °C. Biases in other meteorological variables including relative humidity at 2 m, wind speed at 10 m, and precipitation tend to be site- and season-specific; however, with the exception of T2, consistent annual biases exist for most of the years from 2001 to 2010. Ozone mixing ratios are slightly overpredicted at sites spanning both urban and rural locations with a normalized mean bias (NMB) of 9.7 % but underpredicted at rural locations with an NMB of −8.8 %. PM2.5 concentrations are moderately overpredicted with an NMB of 23.3 % at rural sites but slightly underpredicted with an NMB of −10.8 % at urban/suburban sites. In general, the model performs relatively well for chemical and meteorological variables, and not as well for aerosol–cloud–radiation variables. Cloud–aerosol variables including aerosol optical depth, cloud water path, cloud optical thickness, and cloud droplet number concentration are generally underpredicted on average across the continental US. Overpredictions of several cloud variables over the eastern US result in underpredictions of radiation variables (such as net shortwave radiation, GSW, with a mean bias, MB, of −5.7 W m⁻²) and overpredictions of shortwave and longwave cloud forcing (MBs of ∼7 to 8 W m⁻²), which are important climate variables. While the current performance is deemed to be acceptable, improvements to the bias-correction method for CESM downscaling and the model parameterizations of cloud dynamics and thermodynamics, as well as aerosol–cloud interactions, can potentially improve model performance for long-term climate simulations.


Published by Copernicus Publications on behalf of the European Geosciences Union.

Introduction
Regional atmospheric models have been developed and applied for high-resolution climate, meteorology, and air quality modeling in the past few decades. Compared to global models with a coarser resolution (Leung et al., 2003), regional models can more accurately represent mesoscale variability (Feser et al., 2011) and better predict the local variability of concentrations of specific species such as black carbon and sulfate (Pietikäinen et al., 2012). General circulation models (GCMs) and global chemical transport models (GCTMs) are usually downscaled to regional meteorological models such as the Weather Research and Forecasting (WRF) model (Caldwell et al., 2009; Gao et al., 2012), regional climate models such as REMO-HAM (Pietikäinen et al., 2012), the regional modeling system known as Providing Regional Climates for Impacts Studies (PRECIS) (Jones et al., 2004; Fan et al., 2014), and a number of European models described in Jacob et al. (2007), as well as regional CTMs such as the Community Multiscale Air Quality Model (CMAQ) (Penrod et al., 2014; Xing et al., 2015). These regional models are used for climate/meteorology or air quality simulations. Some have been applied for more than 10 years (Caldwell et al., 2009; Warrach-Sagi et al., 2013; Xing et al., 2015). However, these regional models either lack a detailed treatment of chemistry (e.g., in WRF), or use prescribed chemical concentrations (e.g., REMO-HAM uses monthly mean oxidant fields for several chemical species), or do not have online-coupled meteorology and chemistry (e.g., in CMAQ). In addition, past regional model simulations and analyses have mainly focused on meteorological parameters such as surface temperature and precipitation, cloud variables such as net radiative cloud forcing, and chemical constituents such as ozone. Regional climate model simulations tend to focus on significant climatic events such as extreme temperatures (very cold or very hot) (Dasari et al., 2014), heat waves, heavy precipitation, drought, and storms (Beniston et al., 2007), rather than on the important air quality and climate interactions. Moreover, the impacts of complex chemistry–aerosol–cloud–radiation–climate feedbacks on future climate change remain uncertain, and these feedbacks are most accurately represented using online-coupled meteorology and chemistry models (Zhang, 2010; IPCC, 2013). An online-coupled meteorology and chemistry model, however, is more computationally expensive than an offline-coupled model (Grell et al., 2004), and thus requires significant computing resources for long-term (a decade or longer) applications. With rapid increases in the availability of high-performance computing resources on the petaflop scale, however, long-term simulations using online-coupled models have become possible in recent years. For example, the WRF model has recently been coupled online to the CMAQ model with the inclusion of aerosol indirect effects to study chemistry and climate interactions (Yu et al., 2014).
The online-coupled WRF model with Chemistry (WRF/Chem) (Grell et al., 2005) has been updated with a suite of physical parameterizations from the Community Atmosphere Model version 5 (CAM5) (Neale et al., 2010) so that the physics in the global CAM5 model is consistent with the regional model for downscaling purposes (Ma et al., 2014). There are also limited applications of dynamical downscaling (Gao et al., 2013) under the new Intergovernmental Panel on Climate Change (IPCC) Fifth Assessment Report's Representative Concentration Pathway (RCP) scenarios (van Vuuren et al., 2011). Gao et al. (2013) applied dynamical downscaling to link the CAM-Chem global climate–chemistry model with WRF and CMAQ using RCP 8.5 and RCP 4.5 emissions to study the impacts of climate change and emissions on ozone (O3). Molders et al. (2014) downscaled the Community Earth System Model (CESM) (Hurrell et al., 2013) to drive the online-coupled WRF/Chem model over southeastern Alaska using RCP 4.5 emissions; however, their study did not address the feedback processes between chemistry and meteorology. This study evaluates the online-coupled regional WRF/Chem model, which takes into account gas- and aerosol-phase chemistry, as well as aerosol direct and indirect effects. WRF/Chem is used to simulate the "current" climate scenario for 10 years from 2001 to 2010 using the RCP 8.5 emissions and boundary conditions from an updated version of CESM with advanced chemistry and aerosol treatments over the continental US (CONUS) (He and Zhang, 2014; Glotfelty et al., 2016), with a focus on air quality and climate interactions. Both CESM and WRF/Chem include similar gas-phase chemistry and aerosol treatments. To the best of our knowledge, this study is the first to report WRF/Chem simulation, evaluation, and analyses over a period of 10 years (i.e., 2001–2010) to assess whether the model is able to accurately simulate decadal-long air quality and climatology by taking into account feedback processes between chemistry and meteorology. This study also assesses whether the RCP 8.5 emissions for the 10-year period are robust enough to produce satisfactory performance against observations with WRF/Chem.

Model configurations and simulation design
The model used is the modified WRF/Chem v3.6.1 with updates similar to those implemented in WRF/Chem v3.4.1 as documented in Wang et al. (2015a). The main updates include the implementation of an extended version of the Carbon Bond 2005 (CB05) (Yarwood et al., 2005) gas-phase mechanism with chlorine chemistry (Sarwar et al., 2007) and its coupling with the Modal Aerosol Dynamics for Europe/Volatility Basis Set (MADE/VBS) (Ahmadov et al., 2012). MADE/VBS incorporates a modal aerosol size distribution, and includes an advanced secondary organic aerosol (SOA) treatment based on gas-particle partitioning and gas-phase oxidation in volatility bins. The CB05-MADE/VBS option has also been coupled to existing model treatments of various feedback processes such as the aerosol semidirect effect on photolysis rates of major gases and the aerosol indirect effect on cloud droplet number concentration (CDNC) and resulting impacts on shortwave radiation. The main physics and chemistry options used in this study as well as their corresponding references can be found in Table 1. The simulations are performed at a horizontal resolution of 36 km with 148 × 112 horizontal grid cells over the CONUS domain and parts of Canada and Mexico, and a vertical resolution of 34 layers from the surface to 100 hPa. Because the decadal application of WRF/Chem in this work is much longer than many past WRF/Chem applications, the simulations are reinitialized monthly to constrain meteorological fields toward National Centers for Environmental Prediction (NCEP) reanalysis data while allowing chemistry–meteorology feedbacks within the system. By comparison, most past WRF/Chem applications to short-term episodes, on the order of months up to 1 year, were reinitialized every 1–4 days (e.g., Zhang et al., 2012a, b; Yahya et al., 2015a, b). As discussed in Sects. 3.1 and 3.3, the reinitialization interval of 1 month may be too long to constrain some of the meteorological fields such as moisture that in turn affect other parameters, and a more frequent reinitialization may be needed to improve the model performance. The impact of the frequency of the reinitialization on simulated meteorological and cloud parameters will be further discussed in Sects. 3.1 and 3.2. A list of acronyms used in this paper can be found in Table S1 in the Supplement.

Processing of emissions and initial conditions (ICs)/boundary conditions (BCs)
Global RCP emissions are available as monthly average emissions for 2000, 2005, and every 10 years between 2010 and 2100, at a grid resolution of 0.5° × 0.5° (Moss et al., 2010; van Vuuren et al., 2011). The RCP emissions in 2000, 2005, and 2010 are used to cover the 10-year emissions needed for WRF/Chem simulations, i.e., the periods of 2001–2003, 2004–2006, and 2007–2010, respectively. Processing global RCP emissions in 2000, 2005, and 2010 into the regional hourly emissions needed for the 10-year WRF/Chem simulations requires three main tasks: (1) mapping the RCP species to the CB05 speciation used in WRF/Chem; (2) re-gridding the RCP emissions from the 0.5° × 0.5° grid resolution to the 36 × 36 km grid resolution used for regional simulation over North America; and (3) applying species- and location-dependent temporal allocations (i.e., emissions variation over time) to the re-gridded RCP emissions. Table S2 shows the species mapping between RCP species and CB05 species. To map the RCP species to CB05 speciation, some assumptions are made due to the relatively detailed speciation required by CB05. Some of the CB05 species are directly available in RCP; however, others are lumped into RCP groups; for example, the "other alkanals" and "hexanes and higher alkanes" in the RCP groups can be considered to approximately represent the acetaldehyde and higher aldehyde emissions required by CB05, respectively (Table S2). For CB05 species such as ethanol, methanol, internal and terminal olefin carbon bonds in the gas phase, and elemental and organic carbon in the accumulation mode of the aerosol particles, other RCP groups are used to approximate these emissions (Table S2). For the remaining CB05 species that are not available in the RCP (i.e., chlorine, HCl, HONO, NH4⁺, NO3⁻, PAR, unspeciated PM2.5, H2SO4, and SO4²⁻), their 2000 emissions are based on the 2002 National Emission Inventory (NEI) (version 3, http://www.epa.gov/ttn/chief/emch/), while their 2005 and 2010 emissions are based on the 2008 NEI-derived emissions (version 2) from the Air Quality Modelling Evaluation International Initiative (AQMEII) project as described in Pouliot et al. (2015), which include year-specific updates for on-/off-road transport, wildfires and prescribed fires, and continuous emission monitoring-equipped point sources.
To re-grid the RCP emissions, the RCP rectilinear grid is first interpolated to a WRF/Chem curvilinear grid using a simple inverse distance weighting (NCAR Command Language function rgrid2rcm), and a subset of the RCP grid that covers the WRF/Chem CONUS domain is then extracted. To derive a temporal allocation for monthly averaged RCP emissions, hourly emission profiles are taken from those used in in-house WRF/Chem simulations over CONUS during 2001 (Yahya et al., 2015c), and 2006 and 2010 as part of the AQMEII project (Yahya et al., 2015a, b). The emissions for the existing in-house 2001 simulations were generated based on the 2002 NEI with the Sparse Matrix Operator Kernel Emissions (SMOKE) model version 2.3. The emissions for the existing in-house 2006 and 2010 simulations were generated based on the pre-merged emissions provided by the US EPA, which were derived from the 2008 NEI with year-specific sector emissions for 2006 and 2010 as part of the AQMEII. SMOKE version 3.4 was used to prepare the spatially, temporally, and chemically speciated "model-ready" emissions for the existing in-house 2006 and 2010 WRF/Chem simulations. Since the NEI is updated and released every 3 years, the temporal profiles of emissions used in SMOKE for 2002, 2006, and 2010 are assumed to be valid for 3–4 years around the NEI years, i.e., 2001–2003, 2004–2006, and 2007–2010, respectively. The temporal allocations applied to the RCP emissions are therefore based on the SMOKE model's profiles for each species and source location, and include non-steady-state emissions rates (i.e., seasonal, weekday or weekend, and diurnal variability) that are valid for the entire simulation period of 2001–2010. Specifically, the hourly re-gridded RCP emission rates for each species E, or $E^{\mathrm{RCP}}_{\mathrm{hr}}$, are calculated by
$$E^{\mathrm{RCP}}_{\mathrm{hr}}(t, z, \mathrm{lat}, \mathrm{lon}) = E^{\mathrm{RCP}}_{\mathrm{mon}}(t, z, \mathrm{lat}, \mathrm{lon}) \times \frac{E^{\mathrm{WRF}}_{\mathrm{hr}}(t, z, \mathrm{lat}, \mathrm{lon})}{E^{\mathrm{WRF}}_{\mathrm{mon}}(t, z, \mathrm{lat}, \mathrm{lon})}, \qquad (1)$$
where $E^{\mathrm{RCP}}_{\mathrm{mon}}$, $E^{\mathrm{WRF}}_{\mathrm{mon}}$, and $E^{\mathrm{WRF}}_{\mathrm{hr}}$ represent the original monthly averaged RCP emission rates, the monthly averaged WRF/Chem emission rates, and the hourly WRF/Chem emission rates, respectively, which are valid at each model time t, layer z, and latitude and longitude grid point. The RCP elevated source emissions for sulfur dioxide (SO2), sulfate (SO4²⁻), elemental carbon (EC), and organic carbon (OC) were also incorporated into the model-ready emissions for WRF/Chem using steps 1–3 and Eq. (1) above. Lastly, RCP aircraft source emissions for EC, nitric oxide (NO), and nitrogen dioxide (NO2) are directly injected into the closest model layers. No temporal allocations are applied to the RCP aircraft source emissions.
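The temporal allocation step can be illustrated with a minimal Python sketch for a single grid cell. It assumes that the hourly RCP rate is the monthly RCP rate scaled by the ratio of the hourly to the monthly mean WRF/Chem rate, and it uses the period-to-inventory-year mapping stated above. The function names and the flat-profile fallback for cells with zero WRF/Chem emissions are illustrative assumptions, not part of the original processing chain.

```python
def rcp_base_year(sim_year):
    """Return the RCP inventory year used for a given simulation year."""
    if 2001 <= sim_year <= 2003:
        return 2000
    if 2004 <= sim_year <= 2006:
        return 2005
    if 2007 <= sim_year <= 2010:
        return 2010
    raise ValueError("simulation year outside 2001-2010")


def allocate_hourly(e_rcp_mon, e_wrf_hr):
    """Scale one grid cell's monthly RCP rate by the WRF/Chem hourly profile.

    e_rcp_mon : monthly mean RCP emission rate for the cell
    e_wrf_hr  : hourly WRF/Chem emission rates for the same cell and month
    """
    e_wrf_mon = sum(e_wrf_hr) / len(e_wrf_hr)
    if e_wrf_mon == 0.0:
        # Assumption: no profile information is available, so use a flat profile.
        return [e_rcp_mon] * len(e_wrf_hr)
    return [e_rcp_mon * e_hr / e_wrf_mon for e_hr in e_wrf_hr]


# Hypothetical four-step profile; the allocated rates keep the RCP monthly mean.
hourly = allocate_hourly(10.0, [1.0, 3.0, 2.0, 2.0])
```

Because the scaling factor averages to one over the month, the mean of the allocated hourly rates equals the original monthly RCP rate, so the allocation redistributes emissions in time without changing the monthly total.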
Biogenic emissions are calculated online using the Model of Emissions of Gases and Aerosols from Nature version 2 (MEGAN2) (Guenther et al., 2006). Emissions from dust are based on the online Atmospheric and Environmental Research Inc. and Air Force Weather Agency (AER/AFWA) scheme (Jones and Creighton, 2011). Emissions from sea salt are generated based on the scheme of Gong et al. (1997).
The chemical and meteorological ICs/BCs come from the modified CESM/CAM5 version 1.2.2 with updates by He and Zhang (2014) and Glotfelty et al. (2016) developed at North Carolina State University (CESM_NCSU). WRF/Chem and CESM both use the CB05 gas-phase mechanism (Yarwood et al., 2005); however, WRF/Chem includes additional chlorine chemistry from Sarwar et al. (2007), whereas CESM_NCSU uses a modified version of CB05, the CB05 Global Extension (CB05GE) by Karamchandani et al. (2012). In addition to the original reactions in CB05 and the chlorine chemistry of Sarwar et al. (2007), CB05GE includes chemistry in the lower stratosphere, reactions involving mercury species, and additional heterogeneous reactions on aerosol particles, cloud droplets, and polar stratospheric clouds (PSCs). Both WRF/Chem and CESM_NCSU use a modal aerosol size representation rather than a sectional size representation. While WRF/Chem includes MADE/VBS with three prognostic modes (Ahmadov et al., 2012), CESM_NCSU includes the Modal Aerosol Model with seven prognostic modes (Liu et al., 2012). In addition to similar gas-phase chemistry and aerosol treatments, CESM_NCSU and WRF/Chem use the same shortwave and longwave radiation schemes (i.e., the Rapid and accurate Radiative Transfer Model for GCM (RRTMG)), though they use different cloud microphysics, planetary boundary layer (PBL), and convection schemes. As GCMs generally contain systematic biases that can influence the downscaled simulation, the meteorological ICs/BCs predicted by CESM_NCSU are bias-corrected before they are used by WRF/Chem using the simple bias-correction technique based on Xu and Yang (2012). Temperature, water vapor, geopotential height, wind, and soil moisture variables available every 6 h from the NCEP Final Reanalyses (NCEP FNL) data set are used to correct the ICs and BCs derived from CESM_NCSU results for the WRF/Chem simulations. In this bias-correction approach, monthly climatological averages for ICs and BCs are first derived from both the NCEP and CESM_NCSU cases. The differences between the NCEP and CESM_NCSU climatological averages are then added onto the CESM_NCSU ICs and BCs to generate bias-corrected CESM_NCSU ICs/BCs. Assuming that the causes of the biases remain the same in the future, this bias-correction technique can also be applied to future-year simulations for which NCEP FNL data are not available.
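The mean-difference correction described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: fields are flattened to plain lists of grid values, the climatology is a simple multi-year mean for one calendar month, and the function names and sample temperatures are hypothetical rather than taken from the actual preprocessing code.

```python
def monthly_climatology(fields_by_year):
    """Average one calendar month's field over all years, cell by cell."""
    n_years = len(fields_by_year)
    return [sum(cell_values) / n_years for cell_values in zip(*fields_by_year)]


def bias_correct(cesm_field, cesm_clim, ncep_clim):
    """Add the (NCEP minus CESM) climatological difference to a CESM field."""
    return [x + (n - c) for x, c, n in zip(cesm_field, cesm_clim, ncep_clim)]


# Hypothetical two-cell temperature fields (K) for the same month in two years.
cesm_years = [[280.0, 290.0], [282.0, 292.0]]
ncep_years = [[282.0, 289.0], [284.0, 291.0]]
cesm_clim = monthly_climatology(cesm_years)
ncep_clim = monthly_climatology(ncep_years)
corrected = bias_correct(cesm_years[0], cesm_clim, ncep_clim)
```

In this form the correction shifts the mean of the CESM fields to the NCEP climatology while leaving the CESM variability around that mean unchanged.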

Model evaluation protocol
The focus of the model evaluation is mainly to assess whether the model is able to adequately reproduce the spatial and temporal distributions of key meteorological and chemical variables as compared to observations on a climatological timescale. A scientific question to be addressed in this work is whether WRF/Chem performs sufficiently well for regional climate and air quality simulations on a decadal scale. A climatological month refers to the average of the month for all 10 years. For example, January refers to the average for all the months of January from 2001 to 2010. Statistical measures such as mean bias (MB), Pearson's correlation coefficient (R), normalized mean bias (NMB), normalized mean error (NME) (the definitions of these measures can be found in Yu et al., 2006, and Zhang et al., 2006), and index of agreement (IOA) (Willmott et al., 1981) for major chemical and meteorological variables are included. IOA can be calculated as
$$\mathrm{IOA} = 1 - \frac{\sum_{i=1}^{N} \left( O_i - S_i \right)^2}{\sum_{i=1}^{N} \left( \left| S_i - \bar{O} \right| + \left| O_i - \bar{O} \right| \right)^2},$$
where $O_i$ and $S_i$ denote time-dependent observations and predictions at time and location i, respectively, N is the number of samples (by time and/or location), $\bar{O}$ denotes the mean observation, and $\bar{S}$ denotes the mean prediction over all times and locations; they can be calculated as
$$\bar{O} = \frac{1}{N} \sum_{i=1}^{N} O_i, \qquad \bar{S} = \frac{1}{N} \sum_{i=1}^{N} S_i.$$
IOA values range from 0 to 1, with a value of 1 indicating perfect agreement.
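The statistical measures named above can be computed as in the following Python sketch. It assumes the commonly used definitions: MB as the mean of prediction minus observation, NMB and NME as the summed bias and summed absolute error normalized by the summed observations (in percent), Pearson's R, and the Willmott form of IOA. The function name and the sample values are illustrative.

```python
import math


def evaluation_stats(obs, sim):
    """Return MB, NMB (%), NME (%), R, and IOA for paired observations/predictions."""
    n = len(obs)
    o_bar = sum(obs) / n
    s_bar = sum(sim) / n
    diffs = [s - o for o, s in zip(obs, sim)]

    mb = sum(diffs) / n
    nmb = 100.0 * sum(diffs) / sum(obs)
    nme = 100.0 * sum(abs(d) for d in diffs) / sum(obs)

    # Pearson correlation from the covariance and the two variances.
    cov = sum((o - o_bar) * (s - s_bar) for o, s in zip(obs, sim))
    var_o = sum((o - o_bar) ** 2 for o in obs)
    var_s = sum((s - s_bar) ** 2 for s in sim)
    r = cov / math.sqrt(var_o * var_s)

    # Index of agreement: 1 minus squared error over "potential error".
    potential = sum((abs(s - o_bar) + abs(o - o_bar)) ** 2 for o, s in zip(obs, sim))
    ioa = 1.0 - sum(d * d for d in diffs) / potential

    return {"MB": mb, "NMB": nmb, "NME": nme, "R": r, "IOA": ioa}


# Hypothetical sample: predictions are uniformly 1 unit too high.
stats = evaluation_stats([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0])
```

The sample illustrates why several measures are reported together: a constant offset leaves R at 1 while MB, NMB, and IOA all register the bias.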
For surface networks with hourly data, e.g., the National Climatic Data Center (NCDC), the observational data are paired with the simulated data on an hourly basis for each site. The observational data and simulated data are then averaged for each site. The statistics are calculated based on the site-specific data pairs. The satellite-derived data are usually available on a monthly basis, and the simulated data are also averaged on a monthly basis. The satellite-derived data are regridded to the same domain and number of grid cells as the simulated data. The time dimension is removed for the climatological evaluation; the statistics are based on a site-specific average or a grid cell average. The statistics are then calculated based on the paired satellite-derived vs. simulated grid cell values. The spatial and temporal analyses include spatial plots of MB over CONUS, spatial overlay plots of averaged simulated and observational data, monthly climatologically averaged time series of major meteorological and chemical variables, annual average time series, probability distribution functions of major meteorological and chemical variables, and spatial plots of major aerosol and cloud variables compared with satellite data. A summary of the observational data from surface networks and satellite retrievals can be found in Table S3. The variables that are analyzed in this study include O3, particulate matter with diameters less than or equal to 2.5 and 10 µm (PM2.5 and PM10, respectively), and PM2.5 species including sulfate (SO4²⁻), ammonium (NH4⁺), nitrate (NO3⁻), EC, OC, total carbon (TC = EC + OC), temperature at 2 m (T2), relative humidity at 2 m (RH2), wind speed at 10 m (WS10), wind direction at 10 m (WD10), precipitation, aerosol optical depth (AOD), cloud fraction (CLDFRA), cloud water path (CWP), cloud optical thickness (COT), CDNC, cloud condensation nuclei (CCN), downward shortwave radiation (SWDOWN), net shortwave radiation (GSW), downward longwave radiation (GLW), outgoing longwave radiation at the top of atmosphere (OLR), and shortwave and longwave cloud forcing (SWCF and LWCF). While uncertainties exist in all the observational data used, systematic uncertainty analysis/quantification is beyond the scope of this work. In this work, all observational data are considered to be the true values in calculating the performance statistics. Information on the accuracy of most data used in the model evaluation is provided in Table 2 of Zhang et al. (2012a). Uncertainties associated with some of the observational data are discussed in Sect. 3.
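The pairing procedure for hourly surface data can be sketched as follows: hourly observation and prediction pairs are collected per site, averaged over time, and the resulting site means become the data pairs for the statistics. This is a minimal illustration; the function name, the site labels, and the choice to skip pairs with a missing value (encoded here as NaN) are assumptions, since the treatment of missing data is not described in the text.

```python
import math


def site_means(site_ids, obs, sim):
    """Average paired hourly values per site, skipping pairs with a missing value."""
    sums = {}
    for site, o, s in zip(site_ids, obs, sim):
        if math.isnan(o) or math.isnan(s):
            continue  # assumption: incomplete pairs are dropped
        total_o, total_s, count = sums.get(site, (0.0, 0.0, 0))
        sums[site] = (total_o + o, total_s + s, count + 1)
    return {site: (to / n, ts / n) for site, (to, ts, n) in sums.items()}


# Hypothetical hourly records at two sites; site A has one missing observation.
pairs = site_means(
    ["A", "A", "B", "B"],
    [1.0, float("nan"), 3.0, 5.0],
    [2.0, 9.0, 4.0, 4.0],
)
```

Averaging before computing statistics removes the time dimension, so each site contributes one observed and one simulated value to the climatological evaluation regardless of how many valid hours it reported.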

Meteorological predictions
Table 2 summarizes the statistics for T2, RH2, WS10, WD10, and precipitation. The model performs very well for the 10-year average T2 with a slight underprediction (an MB of −0.3 °C). […] (2012) also reported that the precipitation predicted by WRF is too high compared to the North American Regional Reanalyses (NARR) data throughout the whole CONUS domain over the period 1988–2007. Nudging and reinitialization have been the most commonly used methods to control such errors. Three sensitivity simulations are conducted for a summer month (July 2005) to pinpoint likely causes of the precipitation biases. The baseline simulation (Base) uses a monthly reinitialization frequency, CESM_NCSU ICs/BCs, and the Grell 3-D cumulus parameterization. The sensitivity simulations include (1) Sen1, which is similar to the Base case except with a 5-day reinitialization period; (2) Sen2, which is similar to Base except for using NCEP for the meteorological ICs/BCs; and (3) Sen3, which is similar to Base except for using WRF/Chem v3.7 with the multi-scale Kain-Fritsch (MSKF) cumulus parameterization instead of Grell 3-D. The differences in configuration setup in those sensitivity simulations are given in Table S4. The evaluation and comparison of the baseline and sensitivity results in July 2005 are summarized in Tables S5 and S6, and Fig. S1 in the Supplement. As shown in Tables S5–S6 and Fig. S1, the precipitation bias can be attributed to several factors including the use of the Grell 3-D cumulus parameterization scheme, the use of bias-corrected CESM_NCSU data (instead of NCEP reanalysis data), and the use of a reinitialization frequency of 1 month, among which the first factor dominates the biases in precipitation predictions. The simulated precipitation is very sensitive to the choice of cumulus parameterization. Compared to scale-aware parameterizations such as the MSKF cumulus scheme, the Grell 3-D parameterization has a tendency to overpredict precipitation, particularly over the ocean.
Figure 1 shows the spatial distributions of MB for 10-year average predictions of T2, RH2, WS10, and precipitation. Figure 2 shows the time series of 10-year average monthly and annual average T2, WS10, RH2, precipitation, O3, and PM2.5 against observational data and IOA statistics. T2 (Fig. 1a) tends to be underpredicted over the eastern and western US and overpredicted over the central US. The bias-correction method itself may also contribute to the slight biases in T2. A single temporally averaged (2001–2010) NCEP reanalysis file is applied to the 6-hourly BCs for each individual year, which would in some cases contribute to the biases in the climatological 10-year evaluation. T2 also tends to be overpredicted during the cooler months but underpredicted during the warmer months (Fig. 2a). While the bar charts in Fig. 2 show domain-average mean observed and mean simulated T2, IOA performance takes into account the proportion of differences between mean observed and mean simulated values at different sites.
The model performance in terms of IOA for T2 is slightly worse during the warmer months as compared to the cooler months; however, IOA values for all months are ≥ 0.9. The poorer IOA statistics for the warmer months are possibly influenced to a certain extent by the fact that the IOA tends to be more sensitive towards extreme values (when temperatures are at their maximum) due to the squared differences used in calculating IOA (Legates and McCabe, 1999). As shown in Figs. 1b and 2b, the spatial distributions of MBs for RH2 closely follow the spatial distributions of MBs for T2: where T2 is underpredicted, RH2 is overpredicted, and vice versa. Unlike T2, the IOA for RH2 is highest during the warmer months and lowest during the winter months, but IOA for RH2 is generally high (> 0.7) for all months. WS10 is also generally overpredicted along the coast, over the eastern US, and over some portions of the western US (Fig. 1c), consistent with overpredictions of T2 over the coast, and partially due to unresolved topographical features. In this case the topographic correction for surface winds used to represent extra drag from sub-grid topography (Jimenez and Dudhia, 2012) is used as an option in the 10-year WRF/Chem simulations; however, WS10 is still overpredicted, except for the areas of flat or undulating land in the central US. Jimenez and Dudhia (2012) also suggested that the grid points nearest to the observational data might not be the most appropriate or most representative, and that the selection of nearby grid points can help to reduce errors in surface wind speed estimations. In this study, as the evaluation is conducted over the whole CONUS, the nearest grid points are used for evaluation, which could also result in errors in the wind speed evaluation. The positive T2 and WS10 bias along the coast could be due to the fact that the model grid points for temperatures and wind speeds are located over the ocean, whereas the observation points are located slightly inland. As shown in Fig. 2, WS10 is well predicted on average for the months of April, May, and June, and is overpredicted for the other months. Nonetheless, the climatological NMB for WS10 overall is low at 7.7 % (Table 2). WS10 has higher IOA values during the spring months and the lowest IOA during the summer months and in November. The model performs relatively well in predicting WD10 variability with a correlation coefficient (Corr) of 0.6, indicating overall a more southerly direction domain-wide predicted by the model compared to observations. Precipitation is overpredicted for all months except for June, especially during the summer months of July to August. Even with the inclusion of radiative feedback effects from the subgrid-scale clouds in the radiation calculations, precipitation is still overpredicted with the Grell 3-D scheme, which is consistent with the results shown by Alapaty et al. (2012). Precipitation mainly has lower IOAs during the summer compared to other months, except in June, which exhibits the largest IOA of all months. Even though June is considered a summer month, it does not show the overprediction in precipitation seen in the other summer months. It is possible that in June the overall simulated atmospheric moisture content is low. This is consistent with simulated RH2, as June is the only month where RH2 is underpredicted compared to observations.
In general the model is able to reproduce the monthly trends in meteorological variables; for example, the predicted trend in T2 closely follows the trend observed by NCDC. The observed RH2 decreases from January to a minimum in April, and then increases from April to December. Although the model predicts a similar pattern in RH2, there is a lag, with the RH2 minimum occurring 2 months later in June (Fig. 2b). For WS10, the observations peak in April, as compared to the simulated peak in March. The model correctly predicts the observed WS10 minimum occurring in August. The model trend in precipitation is similar to observations, except during the summer months of July through September, where a large overprediction leads to a sharp increase in July, followed by a gradual decrease through December.
[…]ous years. WRF has worse performance especially at weaker wind speeds, as is the case from 2001 to 2007. Model performance for precipitation is more variable year-to-year, with IOAs ranging from 0.4 to 0.7; however, there is a systematic positive bias during the 10-year period.
[…] observed and simulated values of T2. For T2, the simulated and observed PDFs are very similar (Fig. 3a), consistent with the statistics for T2, which show only a small cold bias. The model overpredicts T2 at sites where temperatures are very low. The PDF for simulated RH2 is shifted to the right of that for observed RH2 (Fig. 3b), with observed and modeled peaks at 74 and 78 %, respectively. The PDF of the bulk of the simulated WS10 is narrower (between 2 and 6 m s⁻¹) compared to that of observed WS10 (between 1 and 7 m s⁻¹). The model thus overpredicts when near-surface wind speeds are low but underpredicts when wind speeds are very high. This suggests that the surface drag parameterization is still insufficient to help predict low wind speeds; however, it might have contributed to the reduction in the simulated moderately high wind speeds (Mass, 2012) (in this case, between 4 and 6 m s⁻¹). There are also instances where the model predicts extremely high wind speeds (> 8 m s⁻¹), which are not found in the observed data. The PDF for simulated precipitation against NADP also shows a shift to the right (which extends beyond 60 mm), consistent with the statistics for overpredicted precipitation and also with the PDF of RH2.
Figure 5 shows the PDFs of maximum 1 and 8 h O3 mixing ratios against CASTNET and AIRS-AQS. The PDFs of the observed and simulated O3 mixing ratios are very similar. The model is able to simulate the range and probabilities of O3 mixing ratios relatively well at both CASTNET and AIRS-AQS sites. At the CASTNET sites, as shown in Fig. 5a and b, the model accurately predicts the peak maximum 1 h O3 mixing ratio centered at ∼45 to 50 ppb and the peak maximum 8 h O3 mixing ratio at ∼42.5 ppb. At the AIRS-AQS sites, as shown in Fig. 5c and d, the predicted PDF is slightly shifted to the right of the observations for both maximum 1 and 8 h O3 mixing ratios. It is also interesting to note that the PDFs for CASTNET and AIRS-AQS are quite different. CASTNET has a more uniform and normal distribution compared to AIRS-AQS. The distribution for CASTNET data is also shifted towards lower O3 mixing ratios. The differences are attributed to the nature of the sites' locations: the AIRS-AQS network includes a mixture of urban, suburban, and rural sites, leading to a less uniform normal distribution of O3 mixing ratios centered at relatively higher O3 mixing ratios, while the CASTNET network includes mostly rural sites that exhibit low maximum 1 and 8 h O3 mixing ratios, thus leading to a more uniform normal distribution that is weighted towards the lower O3 mixing ratios.
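The PDF comparisons discussed above rest on binning observed and simulated values on a common set of bin edges and normalizing the counts. The following Python sketch shows one way to do this; the function name, the bin edges, and the sample values are hypothetical, and values falling outside the bin range are ignored as a simplifying assumption.

```python
def empirical_pdf(values, edges):
    """Return a normalized histogram (density per unit of the variable)."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for k in range(len(counts)):
            is_last = k == len(counts) - 1
            # Bins are half-open [lo, hi), except the last, which includes hi.
            if edges[k] <= v < edges[k + 1] or (is_last and v == edges[-1]):
                counts[k] += 1
                break
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [c / (total * (edges[k + 1] - edges[k])) for k, c in enumerate(counts)]


# Hypothetical daily maximum values binned on shared, unit-width edges.
edges = [0.5, 1.5, 2.5, 3.5]
pdf_obs = empirical_pdf([1.0, 1.0, 2.0, 3.0], edges)
pdf_sim = empirical_pdf([2.0, 2.0, 3.0, 3.0], edges)
```

Using the same edges for both data sets makes a shift of the simulated distribution relative to the observed one directly visible as a displacement of the peak bin.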
Figure 6 shows the diurnal variation of O3 concentrations and IOA statistics for the four climatological seasons against CASTNET (panels a to d) and AIRS-AQS (panels e to h): winter (January, February, and December; JFD), spring (March, April, and May; MAM), summer (June, July, and August; JJA), and fall (September, October, and November; SON). Figure 6a shows that at the more rural CASTNET sites, O3 in winter tends to be underpredicted during the morning (01:00-09:00 LST, local standard time) and evening hours (18:00-24:00 LST). However, Fig. 6e shows that in general for all AIRS-AQS sites, including urban sites, O3 is systematically overpredicted for all hours of the day. The diurnal trends for CASTNET and AIRS-AQS are thus opposite in winter. As CASTNET sites are located in areas where urban influences are minimal, most of these sites are likely to be NOx-limited (Campbell et al., 2014). Underpredicted NOx emissions in rural areas can lead to underpredictions in O3 concentrations in NOx-limited areas. As shown in Fig. 2a, T2 is generally overpredicted during the winter months, which explains the overpredictions in O3 for most sites against AIRS-AQS. As shown in Fig. 6a, b, and c, for CASTNET, the diurnal variations of O3 in MAM and JJA are similar to that in JFD. As shown in Fig. 6d, slight overpredictions during the daylight hours of 10:00 to 17:00 LST occur in SON at the CASTNET sites; however, the trends for morning and evening hours are similar to those in the other seasons. Similar to SON at the CASTNET sites, overpredictions during daylight hours occur at the AIRS-AQS sites in JJA and SON (Fig. 6g and h), and to a much lesser extent in MAM (Fig. 6f). This is probably due to the overpredictions of T2, which are smallest during MAM compared to other months, as shown in Fig. 2a.
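As an illustration of the statistics behind the diurnal comparison, the sketch below computes an hourly composite and an index of agreement. It assumes the commonly used Willmott form of the IOA, which this section does not spell out; the function names and sample values are hypothetical.

```python
from collections import defaultdict


def index_of_agreement(obs, sim):
    """Willmott-type index of agreement (assumed form): 1 is perfect agreement."""
    if len(obs) != len(sim) or not obs:
        raise ValueError("obs and sim must be non-empty and of equal length")
    obs_mean = sum(obs) / len(obs)
    num = sum((s - o) ** 2 for o, s in zip(obs, sim))
    den = sum((abs(s - obs_mean) + abs(o - obs_mean)) ** 2 for o, s in zip(obs, sim))
    # den == 0 only if every value equals the observed mean, i.e. a perfect match.
    return 1.0 if den == 0 else 1.0 - num / den


def diurnal_mean(hours, values):
    """Average values by hour of day (0-23), returning {hour: mean}."""
    groups = defaultdict(list)
    for h, v in zip(hours, values):
        groups[h].append(v)
    return {h: sum(vs) / len(vs) for h, vs in sorted(groups.items())}


# Hypothetical hourly O3 (ppb) at one site, for illustration only.
hours = [1, 1, 13, 13]
obs_o3 = [22.0, 26.0, 48.0, 52.0]
sim_o3 = [18.0, 22.0, 50.0, 56.0]

print(diurnal_mean(hours, obs_o3))
print(diurnal_mean(hours, sim_o3))
print(index_of_agreement(obs_o3, sim_o3))
```

In this toy case the simulated early-morning mean lies below the observed one while the midday mean lies above it, which is the kind of hour-dependent bias the diurnal plots are meant to expose.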
Figure 7 compares the spatial distributions of 10-year averages of the predicted and observed hourly O3 mixing ratios. The O3 mixing ratios tend to be underpredicted in the eastern and northeastern US, where most of the CASTNET sites are located (Fig. 7a). This is consistent with the diurnal trends in Fig. 6a to d, which also show underpredictions for CASTNET sites. From Fig. 1a, T2 is underpredicted on average over the northeastern US, which results in underpredictions in biogenic emissions in the rural areas from MEGAN2. This would in turn reduce O3 mixing ratios in VOC-limited areas. O3 photochemical reactivities would also be reduced due to reduced T2. O3 mixing ratios are, however, overpredicted over the northwestern US, and also near the coastline of the western US. The overprediction of O3 mixing ratios in the northwestern US can be attributed to an overprediction in the chemical BCs from CESM, as indicated by the high O3 mixing ratios near the northwestern region of the domain boundary.

Particulate matter
The 10-year average PM2.5 concentrations are overpredicted with an NMB of 23.3 % against the Interagency Monitoring of Protected Visual Environments (IMPROVE), and underpredicted with an NMB of −10.8 % against the Speciated Trends Network (STN) (Table 2). In addition, the IOA trend in Fig. 4c shows very good performance for PM2.5 against IMPROVE, with IOA values > 0.8. IOA values for PM2.5 against STN are high (∼ 0.6-0.8) during the spring and summer months, but lower (∼ 0.4) during the winter months (Fig. 4d). The IMPROVE surface network generally covers rural areas and national parks, while the STN surface network covers urban sites. The horizontal resolution of 36 × 36 km² used in this study may be too coarse to resolve the locally high PM2.5 concentrations at the urban STN sites. During the winter seasons, PM2.5 concentrations over the US in general tend to be higher due to the extensive use of wood stoves and to cold temperature inversions, which trap particulates near the ground (EPA, 2011). As shown in Table 2, the concentrations of PM2.5 species such as SO₄²⁻, OC, and TC are overpredicted at the IMPROVE sites, while the concentrations of the other main PM2.5 species NO₃⁻, NH₄⁺, and EC are underpredicted at both IMPROVE and STN sites. TC concentrations, which are the sum of OC and EC, are overpredicted because the overpredictions of OC are larger than the underpredictions of EC. The model also simulates both primary organic aerosol (POA) and secondary organic aerosol (SOA). OC is calculated as the sum of POA and SOA divided by the OA / OC ratio, which is assumed to be a constant of 1.4 (Aitken et al., 2008). This calculation is an approximation that introduces uncertainties when comparing simulated OC against observational data, as the OA / OC ratio can differ between environments (Aitken et al., 2008).
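The OC and TC relationships stated above can be written out as a short sketch. The constant 1.4 is the OA / OC ratio given in the text; the function names, the alternative ratio of 2.0, and the concentrations are hypothetical and only show how sensitive the derived OC is to the assumed ratio.

```python
OA_TO_OC = 1.4  # constant OA/OC ratio assumed in the text


def organic_carbon(poa, soa, oa_to_oc=OA_TO_OC):
    """OC = (POA + SOA) / (OA/OC ratio); inputs and output in ug m-3."""
    if oa_to_oc <= 0:
        raise ValueError("OA/OC ratio must be positive")
    return (poa + soa) / oa_to_oc


def total_carbon(oc, ec):
    """TC is the sum of organic and elemental carbon."""
    return oc + ec


# Hypothetical concentrations (ug m-3), for illustration only.
poa, soa, ec = 1.4, 1.4, 0.5

oc_default = organic_carbon(poa, soa)             # ratio of 1.4, as in the text
oc_aged = organic_carbon(poa, soa, oa_to_oc=2.0)  # hypothetical higher ratio
print(oc_default, total_carbon(oc_default, ec))
print(oc_aged, total_carbon(oc_aged, ec))
```

For the same simulated organic aerosol mass, a larger OA / OC ratio yields a lower OC, so part of an apparent OC bias can come from the choice of ratio rather than from the aerosol simulation itself.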
As shown in Table 2, at the STN sites, the model slightly overpredicts the concentrations of SO₄²⁻ while underpredicting those of NO₃⁻, NH₄⁺, and EC. The overpredictions of SO₄²⁻ are likely due to the uncertainties that arise from processing of the RCP SO2 emissions. The RCP SO2 emissions are only available as a total emission flux, and they are not vertically distributed to the important point sources such as furnaces and stacks. In this work, two steps are taken to resolve the RCP elevated SO2 emissions in each emission layer. First, a set of factors is derived as the fraction of the elevated emissions in each layer relative to the vertical sum of emissions, based on the NEI data as used by default in the SMOKE model. Second, these factors are applied to the total RCP emissions to obtain SO2 emissions in each emission layer. The total RCP SO2 emissions were higher than the total NEI emissions, resulting in higher surface and elevated SO2 emissions. Figure 4g and h compare the modeled annual average time series for PM2.5 against IMPROVE and STN observations, respectively. In general, the model performs well for PM2.5 at the IMPROVE (IOA > 0.8) and STN (IOA ∼ 0.5-0.7) sites. Both the observed and simulated PM2.5 concentrations also show a declining trend over the years.
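The two-step vertical allocation of the RCP SO2 emissions described above can be sketched as follows. This is a minimal illustration for a single grid cell, assuming the factors are simple per-layer fractions of the NEI vertical sum; the function names and all numbers are hypothetical, and the actual SMOKE-based processing is more involved.

```python
def layer_fractions(nei_layer_emissions):
    """Step 1: fraction of the NEI vertical sum found in each emission layer."""
    total = sum(nei_layer_emissions)
    if total <= 0:
        raise ValueError("NEI vertical sum must be positive")
    return [e / total for e in nei_layer_emissions]


def allocate_to_layers(rcp_total, fractions):
    """Step 2: spread the column-total RCP emission over the layers."""
    return [rcp_total * f for f in fractions]


# Hypothetical SO2 emissions for one grid cell (arbitrary units),
# listed from the surface layer upward.
nei_layers = [6.0, 3.0, 1.0]
rcp_total = 20.0  # hypothetical RCP total, larger than the NEI sum of 10

fractions = layer_fractions(nei_layers)
rcp_layers = allocate_to_layers(rcp_total, fractions)
print(fractions)
print(rcp_layers)
```

Because the fractions sum to one, the column total is conserved, and a larger RCP total raises the emissions in every layer in proportion, consistent with the higher surface and elevated SO2 emissions noted in the text.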
For the later years (2007 to 2010), the model performs significantly better against IMPROVE compared to STN. As 2010 NEI emissions are used for the years 2007 to 2010, there are not many variations in the simulated PM2.5 concentrations over these 4 years. Figures 7 and 8 show the spatial plots of 10-year averages of simulated 24 h average PM10, PM2.5, and PM2.5 species concentrations, overlaid with observations from both STN and IMPROVE. The underpredictions of PM10 are dominated by an underprediction in the wind-blown dust emissions, especially in the western US (Fig. 7b). This is confirmed in Table 2, which shows an MB of −11.5 µg m⁻³ and an NMB of −51.2 % against PM10 observations at AIRS-AQS sites. The observational data indicate elevated concentrations of dust over portions of Arizona and California (> 50 µg m⁻³), which are not reproduced by the simulations (the simulated concentrations are much lower: < 20 µg m⁻³). The AER/AFWA dust module (Table 1) does not produce sufficient dust in this case, even though WS10 is overpredicted and dust emissions scale with wind speed. The sea-salt emission module by Gong et al. (1997), however, seems to produce a reasonable amount of sea salt, as shown by the similar simulated and observed PM10 concentrations near the coastlines. In addition, the MADE/VBS module in WRF/Chem does not explicitly simulate the formation/volatilization of coarse inorganic species. The coarse inorganic species are available, however, in the emissions, and are transported and deposited in a manner that is similar to non-reactive tracers.
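The MB and NMB statistics quoted above can be sketched as follows, assuming the standard definitions (MB as the mean of simulated minus observed values, NMB as the summed bias divided by the summed observations); the function names and sample values are hypothetical.

```python
def mean_bias(obs, sim):
    """MB: mean of (simulated - observed), in the units of the data."""
    if len(obs) != len(sim) or not obs:
        raise ValueError("obs and sim must be non-empty and of equal length")
    return sum(s - o for o, s in zip(obs, sim)) / len(obs)


def normalized_mean_bias(obs, sim):
    """NMB in percent: sum of (simulated - observed) over sum of observed."""
    if len(obs) != len(sim) or not obs:
        raise ValueError("obs and sim must be non-empty and of equal length")
    return 100.0 * sum(s - o for o, s in zip(obs, sim)) / sum(obs)


# Hypothetical paired PM10 values (ug m-3), for illustration only.
obs_pm10 = [20.0, 30.0, 10.0]
sim_pm10 = [10.0, 15.0, 5.0]

mb = mean_bias(obs_pm10, sim_pm10)
nmb = normalized_mean_bias(obs_pm10, sim_pm10)
print(mb, nmb)
```

Under these definitions NMB equals MB divided by the mean observation, so an MB of −11.5 µg m⁻³ paired with an NMB of −51.2 % would correspond to a mean observed PM10 of roughly 22.5 µg m⁻³, assuming the study uses the same definitions.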
The model performs well for PM2.5 over the eastern US (Fig. 7c), where modeled concentrations are close to the observations; however, over the western US there are underpredictions in PM2.5, especially in central to southern California. Even though Table 2 shows in general an overprediction of SO₄²⁻ against STN sites, the model underpredicts SO₄²⁻ in regions of elevated SO₄²⁻ concentrations, in particular where concentrations are above 10 µg m⁻³ in the vicinity of significant point sources of SO2 and SO₄²⁻ over the eastern US (Fig. 7d). This is likely due to the coarse resolution (0.5° × 0.5°) of RCP emissions, which probably results in a general overprediction of SO2 emissions over a grid cell but cannot resolve point sources smaller than the grid resolution. A similar pattern is found for NH₄⁺ over the eastern US due to underpredictions of high concentrations of SO₄²⁻ (Fig. 8a). There are also large underpredictions in NH₄⁺ over the western US. The underpredictions in NH₄⁺ are likely due to underpredictions of NH3 emissions from the RCP. The NH3 emissions from the RCP are much lower than those from the NEI over the western US, by more than a factor of 5, especially over portions of California. Large underpredictions occur over both the eastern and western US for NO₃⁻, EC, and TC (Fig. 8b, c, and d). The underpredictions in NO₃⁻ are more likely influenced by the underpredictions of NH₄⁺ than by NOx emissions. NOx emissions for NEI are higher than those of the RCP for a number of point sources; however, in general, the RCP has higher NOx emissions. Other possible reasons for the underpredictions of NO₃⁻ concentrations include both prediction and measurement errors associated with SO₄²⁻ and TNH4 that can greatly affect the performance of NO₃⁻, inaccuracies in the assumptions used in the thermodynamic model (e.g., the assumption that inorganic ions are internally mixed and the equilibrium assumption might not be representative, especially for particles with larger diameters), as well as inaccuracies in T2 and RH predictions (Yu et al., 2005). The statistics for IMPROVE TC indicate overpredictions; however, the statistics for STN TC indicate larger underpredictions with an MB of −2.0 µg m⁻³, which would explain the large underpredictions in PM2.5 concentrations over the western US. The large underpredictions are caused in part by uncertainties in the emissions of these species as well as in the emissions of their precursor gases, especially for TC. The RCP emissions of EC and POA are lower than those of NEI. NEI emissions have a higher spatial resolution, and thus more adequately represent the emissions from point sources compared to RCP. The underpredictions of TC are also more likely due to underpredictions in EC than in OC, as shown by the underpredictions of EC in Fig. 8c. As T2 is slightly underpredicted, this could have resulted in underpredictions in isoprene and terpene, which are major gas precursors of biogenic SOA, resulting in lower SOA and OC concentrations. In addition, the emissions of anthropogenic VOC species from the RCP, which also have a lower spatial resolution than their NEI counterparts, tend to be lower than NEI levels, especially at point sources. The underpredictions for these particulate species, especially for water-soluble species including NH₄⁺ and NO₃⁻, are also likely impacted by overpredictions in precipitation (Fig. 2d), which lead to an overprediction in their wet deposition rates and thus a reduction in their ambient concentrations. The overpredictions in WS10 also contribute to the deposition of PM2.5 and PM2.5 species onto the ground (Sievering et al., 1987).

Aerosol, cloud, and radiation predictions
There are uncertainties in the satellite retrievals of various aerosol-cloud-radiation variables from the Clouds and the Earth's Radiant Energy System (CERES) and the Moderate Resolution Imaging Spectroradiometer (MODIS). Loeb et al. (2009) reported that the major uncertainties in the top of atmosphere radiative fluxes from CERES are derived from instrument calibration (with a net error of 4.2 W m⁻²) and the assumed value of 1 W m⁻² for total solar irradiance. However, there is good correlation (R > 0.8) between the model and CERES for the radiation variables SWDOWN, GSW, and GLW, which are all measured at the surface (Table 2). Modeled OLR at the top of the atmosphere also has relatively good correlation (R ∼ 0.6). SWDOWN and GLW are both slightly overpredicted due to influences from biases in PM concentrations and clouds, but GSW and OLR are slightly underpredicted.
The overpredictions of the surface radiation variables are also impacted by the underpredictions in AOD and COT. AOD is underpredicted with an NMB of −24.0 %, and COT is underpredicted with an NMB of −44.3 %. These underpredictions indicate that less radiation is attenuated (i.e., absorbed or scattered) or reflected while traversing through the atmospheric column and clouds, thus allowing more radiation to reach the ground. Using the CESM model, He et al. (2015) also showed underpredictions in AOD and COT over CONUS against MODIS satellite retrievals. Figure 9 compares the spatial distributions of the 10-year average predictions of AOD (a and b) against the satellite retrieval data from MODIS. The simulated AODs show relatively large values over the eastern US, due to the relatively higher PM concentrations in this region of the US. The MODIS AOD, however, shows slightly elevated values over the eastern US, but the magnitudes are not as high as the simulated AOD over the eastern US. MODIS-derived AOD is also higher over the western US compared to the eastern US, and this trend is not found in the simulated AOD. The differences between the MODIS AOD and the simulated AOD are likely due to the differences in the algorithms used to retrieve AOD based on MODIS measurements and calculate AOD in WRF/Chem. For MODIS, AOD is calculated by matching the spectral reflectance observations with a lookup table based on a set of aerosol parameters including the aerosol size distributions from a variety of aerosol models, which differ based on seasons and locations (Levy et al., 2007). There are also different algorithms for dark land, bright land, and over oceans (Levy et al., 2013). The MODIS data are aggregated into a global 1° gridded (Level-3) data set with monthly (MOD08_M3) temporal resolution (https://www.earthsystemcog.org/site_media/projects/obs4mips/TechNote_MODIS_L3_C5_Aerosols.pdf). The inaccuracies for the calculation of AOD in WRF/Chem include biases in aerosol size distribution, aerosol composition, aerosol water content, and reflectances. They can also arise from parameterizations in the calculations including the assumption of an internally mixed aerosol composition. Therefore, caution should also be taken when comparing simulated AOD with the satellite-derived AOD products. Toth et al. (2013) compared Aqua MODIS AOD products over the mid- to high-latitude Southern Ocean, where a band of enhanced AOD is observed, to cloud and aerosol products produced by the Cloud-Aerosol Lidar with Orthogonal Polarization (CALIOP) project, and AOD data from the Aerosol Robotic Network (AERONET) and the Maritime Aerosol Network (MAN). They concluded that the band of enhanced AOD is not detected in the CALIOP, AERONET, or MAN products. The enhanced AOD band is attributed to stratocumulus and low broken cumulus cloud contamination, as well as the misidentification of relatively warm cloud tops compared with surrounding open seas.
Figure 9 also shows spatial distributions of the 10-year average predictions of CDNC (c and d), CWP (e and f), and COT (g and h), compared against the satellite retrieval data from MODIS. The cloud variables CDNC, CWP, and COT tend to be underpredicted for most of the regions over the US. However, CWP is largely overpredicted over the Atlantic Ocean. This is also likely due to the build-up of moisture over the Atlantic Ocean, also influencing precipitation as mentioned previously. CDNC is overpredicted over some regions in the eastern US, but there are also relatively large areas of underpredictions over both the land and ocean. This leads to an average domain-wide underprediction for CDNC (Table 2). This is likely due to the differences in deriving CDNC in the model and in the satellite retrievals. CDNC in the model is calculated based on the activation parameterization by Abdul-Razzak and Ghan (2000), which depends on the aerosol size distribution, aerosol composition, and the updraft velocity. The MODIS-derived CDNC from Bennartz (2007) is calculated based on cloud effective radius and COT, which would explain the differences in spatial patterns between model and observed data. As indicated by Bennartz (2007), the errors in CDNC can be up to 260 %, especially for regions with low CF (< 0.1). The model and MODIS spatial patterns are similar for CWP and COT over land, although the model values are underpredicted. King et al. (2013) reported that the MODIS retrieval of cloud effective radius when compared to in situ observations is overestimated by 13 % on average. Combined with overestimations in COT, this leads to overestimation of liquid water path. In addition, there can also be differences in satellite-derived cloud products from different satellites. For example, Shan et al. (2011) showed that the derived CLDFRA from MODIS and another satellite, the Polarization and Directionality of Earth Reflectances (POLDER), can differ by a global average of 10 %.
Figure 10 shows similar spatial plots for modeled vs. CERES-derived SWDOWN, OLR, SWCF, and LWCF. We note that modeled SWCF is calculated based on the differences between the net cloudy-sky and net clear-sky shortwave radiation at the top of atmosphere, which in turn are dependent on cloud properties including the CLDFRA, COT, cloud asymmetry parameter, and cloud albedo. It is possible that due to the overprediction of CLDFRA, the magnitudes of the simulated SWCF are greater than those from CERES (Fig. 10c and g), even though the other cloud variables are underpredicted. LWCF is calculated based on the differences in clear-sky OLR and cloudy-sky OLR, which in turn are dependent on CLDFRA, COT, and absorbance and radiance due to atmospheric gases. The underprediction of total-sky OLR (Table 2 and Fig. 10b and f) leads to an overprediction in LWCF. SWCF is largely overpredicted over the eastern US and especially over the Atlantic Ocean (Fig. 10c and g). LWCF is also overpredicted by the model in similar locations to SWCF, such as in the southeastern US, and over the ocean in the eastern portion of the domain (Fig. 10d and h). This is further confirmed by the underpredictions in SWDOWN over the Atlantic Ocean and in general over the eastern portion of the domain, as increased clouds (as a consequence of overpredicted AOD, CWP, and COT) and SWCF lead to less SWDOWN reaching the ground (Fig. 10a and e), which also eventually leads to a reduction in the OLR over the eastern portion of the domain. The larger negative SWCF and positive LWCF in the model compared to CERES, however, lead to an overall good agreement with CERES for the net cloud forcing (SWCF + LWCF; not shown). The mean bias for SWCF against CERES of 7.8 W m⁻² and that for LWCF against CERES of 6.9 W m⁻² are comparable to the results from the CMIP5 models of −10 to 10 W m⁻² over the CONUS region (Fig. 9.5 in Flato et al., 2013). The evaluation of 10-year averaged predictions of aerosol-cloud-radiation variables is similar to the results from the WRF/Chem simulations in 2006 and 2010 by Yahya et al. (2015a, b). For example, WRF/Chem generally performs well for cloud fraction, but AOD, CDNC, CWP, and COT are underpredicted in both studies, which possibly indicates consistent biases for every year contributing to climatological biases.
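The cloud-forcing definitions used in the discussion of Fig. 10 can be sketched as below. The sign convention (SWCF as net all-sky minus net clear-sky shortwave, LWCF as clear-sky minus all-sky OLR) is an assumption consistent with the negative SWCF and positive LWCF described in the text; the function names and flux values are hypothetical.

```python
def shortwave_cloud_forcing(net_sw_all_sky, net_sw_clear_sky):
    """SWCF (W m-2): net all-sky minus net clear-sky shortwave at the TOA."""
    return net_sw_all_sky - net_sw_clear_sky


def longwave_cloud_forcing(olr_clear_sky, olr_all_sky):
    """LWCF (W m-2): clear-sky OLR minus all-sky OLR."""
    return olr_clear_sky - olr_all_sky


# Hypothetical TOA fluxes (W m-2), for illustration only.
swcf = shortwave_cloud_forcing(net_sw_all_sky=230.0, net_sw_clear_sky=280.0)
lwcf = longwave_cloud_forcing(olr_clear_sky=265.0, olr_all_sky=240.0)
net_cf = swcf + lwcf

# Lowering the all-sky OLR by 5 W m-2 raises LWCF by the same amount.
lwcf_low_olr = longwave_cloud_forcing(olr_clear_sky=265.0, olr_all_sky=235.0)
print(swcf, lwcf, net_cf, lwcf_low_olr)
```

The last line of the example mirrors the statement that an underpredicted total-sky OLR leads to an overpredicted LWCF, and the sum shows how a more negative SWCF and a more positive LWCF can partly cancel in the net cloud forcing.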

Summary and conclusions
Overall, the model slightly underpredicts T2 with a mean bias of ∼ −0.3 °C, which is consistent with or better than other studies based on chemical transport models and regional climate models. The underpredictions in T2 correlate with the overpredictions in RH2. WS10 biases are likely due to issues with unresolved topography or to inaccuracies in the selection of representative grid points. There are seasonal biases in precipitation, where overpredictions tend to occur largely over the summer months; however, precipitation is overpredicted every year between 2001 and 2010, likely due mainly to uncertainties in WRF cumulus and microphysics parameterizations. In particular, the use of a different cumulus parameterization scheme, e.g., based on the MSKF available in WRF/Chem version 3.7 or newer, has been shown in the sensitivity study to significantly reduce precipitation biases. Other factors contributing to the precipitation bias include the use of bias-corrected CESM_NCSU data (instead of NCEP reanalysis data) and the use of a reinitialization frequency of 1 month. A satisfactory model performance for meteorological variables is important and necessary when simulating future years, as evaluation against observational data is then not possible. Meteorological variables such as temperature, humidity, wind speed and direction, PBL height, and radiation have a strong impact on chemical predictions, and thus are critical to satisfactory model performance when predicting chemical variables such as O3 and PM2.5. Biases in O3 and PM2.5 concentrations can be attributed to biases in any of the meteorological and chemical variables. The model performs generally well for radiation variables, as well as for the main chemical species such as O3 and PM2.5, which indicates that the processed RCP 8.5 emissions are reasonably accurate to produce acceptable results for the concentrations of chemical species.
Modeled O3 mixing ratios at the CASTNET sites are slightly underpredicted, but are slightly overpredicted at AIRS-AQS sites, in part due to the fact that the CASTNET sites are classified as rural, while the AIRS-AQS sites are classified as both urban and rural. O3 mixing ratios at the AIRS-AQS sites tend to be overpredicted during the colder fall and winter seasons, and annually, O3 mixing ratios are overpredicted every year from 2001 to 2010. O3 mixing ratios at the CASTNET sites are underpredicted for all climatological months, while the largest underpredictions are observed from January to May. However, on a decadal timescale, WRF/Chem adequately represents the different O3 PDFs at the AIRS-AQS and CASTNET sites. This study also showed that peak O3 mixing ratios are observed over April and May rather than June to August, which is consistent with Cooper et al. (2014), who attributed this to emission reductions and opposite trends in O3 mixing ratios over the eastern and western US over the last 20 years. Modeled PM2.5 concentrations tend to be overpredicted at the IMPROVE sites but underpredicted at the STN sites. PM2.5 at the IMPROVE sites tends to be underpredicted in spring and summer but overpredicted in fall and winter, while PM2.5 concentrations against STN are persistently underpredicted for all climatological months. The IMPROVE and STN sites are classified as rural and urban, respectively. Due to the relatively coarse horizontal resolution of the model (36 × 36 km), the model is unable to capture the locally higher PM2.5 concentrations at the STN sites. In general, however, the model performs relatively well for total PM2.5 concentrations at the IMPROVE and STN sites, with NMBs within ±25 %, although larger biases exist for PM2.5 species. Model performance for PM10 should be improved, as PM10 also has important impacts on climate by influencing the radiative budget both directly and indirectly due to its larger size and higher concentrations. The choices of observational networks for model evaluation are therefore important as both networks can show positive and negative biases depending on the type and location of the sites (e.g., O3 against AIRS-AQS and CASTNET, and PM2.5 against STN and IMPROVE). The major uncertainties lie in the predictions of cloud-aerosol variables. As demonstrated in this study, large biases and errors in simulating cloud variables exist even in the most advanced models such as WRF/Chem, indicating a need for future improvement in relevant model treatments such as cloud dynamics and thermodynamics, as well as aerosol-cloud interactions. In addition, there are large uncertainties in satellite retrievals of cloud variables for evaluation. In this study, most of the cloud-aerosol variables including AOD, COT, CWP, and CDNC are on average underpredicted across the domain; however, the overpredictions of cloud variables including COT and CWP over the Atlantic Ocean and the eastern US lead to underpredictions in radiation and overpredictions in cloud forcing, which are important parameters when simulating future climate change.
In summary, the model is able to predict O3 mixing ratios and PM2.5 concentrations relatively well with regard to decadal-scale air quality and climate applications. The model is able to predict meteorological variables satisfactorily and with results comparable to RCM and GCM applications from the literature. Possible reasons behind the chemical and meteorological biases identified through this work should be taken into account when simulating longer climatological periods and/or future years. Aerosol-cloud-radiation variables are important for climate simulations; the performance for these variables is not as good as that for the chemical and meteorological variables. They show biases consistent with those found in single-year evaluations of WRF/Chem. However, magnitudes of biases for SWCF and LWCF are comparable to those from the literature, which suggests that model improvements should be made in terms of bias correction of downscaled ICs/BCs as well as aerosol-cloud-radiation parameterizations in the model. In addition, having consistent physical and chemical mechanisms between the GCM and RCMs could help to reduce uncertainties in the results (Ma et al., 2014). Although CESM and WRF/Chem use similar chemistry and aerosol treatments in this work, they use somewhat different physics schemes that may contribute to such uncertainties. The development of scale-aware parameterizations that can be applied at both global and regional scales would help reduce uncertainties associated with the use of different schemes for global simulations and downscaled regional simulations.
preprocessor tool mozbc provided by the Atmospheric Chemistry Observations and Modeling Lab (ACOM) of NCAR and the script to generate initial and boundary conditions for WRF based on CESM results provided by Ruby Leung, PNNL. For WRF/Chem simulations, we would like to acknowledge high-performance computing support from Yellowstone (ark:/85065/d7wd3xhc) provided by NCAR's Computational and Information Systems Laboratory, sponsored by the National Science Foundation.
Edited by: V. Grewe

Figure 1. Spatial distribution of MBs for (a) 2 m temperature (T2), (b) 2 m relative humidity (RH2), (c) 10 m wind speed (WS10) from NCDC, and (d) weekly precipitation from NADP. Each marker represents the MB of each variable at each observational site.

Figure 3 shows the probability distribution functions (PDFs) of T2, RH2, WS10, and precipitation against NCDC and NADP for 10 years. The observed and simulated variables are averaged at each site for the 10-year period, and the pairs are then distributed into a PDF over 30 bins of ob-
Nasrollahi et al. (2012) examined 20 combinations of microphysics and cumulus parameterization schemes available in WRF and found that most parameterization schemes overestimate the amount of rainfall and the extent of high rainfall values. In this study, while the Grell 3-D ensemble cumulus parameterization contributes in part to the overpredictions of precipitation, most overpredictions occur at high thresholds as shown in Fig. 3d, and they are attributed to possible errors in the Morrison two-moment scheme because the overpredictions of non-convective precipitation dominate the overpredictions of total precipitation.

Figure 4. Time series of 10-year averaged monthly mean observations (blue) vs. simulations (red) for (a) O3 against AQS data, (b) O3 against CASTNET data, (c) PM2.5 against IMPROVE, and (d) PM2.5 against STN, and annual averages for (e) O3 against AQS data, (f) O3 against CASTNET data, (g) PM2.5 against IMPROVE, and (h) PM2.5 against STN. IOA statistics (black diamonds) are also provided on the secondary y axes in panels (a)-(h).

Figure 5. Probability distribution functions (PDFs) of (a) maximum 1 h O3 against CASTNET, (b) maximum 8 h O3 against CASTNET, (c) maximum 1 h O3 against AIRS-AQS, and (d) maximum 8 h O3 against AIRS-AQS for 2001 to 2010 over 30 bins in the respective ranges for all variables.

Figure 6. Diurnal variation of observed vs. simulated hourly O3 concentrations against CASTNET (left column from a to d) and AIRS-AQS (right column from e to h) for all climatological seasons. The x axes refer to hours in local standard time.

Figure 7. Spatial distribution of 10-year averaged hourly observed vs. simulated (a) O3 for CASTNET and AIRS-AQS, (b) PM10 from AIRS-AQS, (c) PM2.5, and (d) PM2.5 sulfate from STN and IMPROVE. The background plots represent the simulated data, while observations are represented by the markers.

Figure 8. Spatial distribution of 10-year averaged hourly observed vs. simulated (a) ammonium, (b) nitrate, (c) EC, and (d) TC from STN and IMPROVE. The background plots represent the simulated data, while observations are represented by the markers.

Table 1. Model configurations and setup.

Table 2. The 10-year (2001-2010) average performance statistics for the simulated meteorological, aerosol, cloud, radiation variables, and chemical species against surface observational networks and satellite retrieval products.
Seasonal temperature biases of −1.8 to −2.3 °C were reported from an ensemble of regional climate models (RCMs) for a simulation period of 1971 to 2000 over the northeastern US (Rawlins et al., 2012). He et al. (2015) also showed biases of −3 to 0 °C over CONUS when compared against NCEP reanalysis data. Kim et al. (2013) com-
Caldwell et al. (2009) consistent with other studies that tend to report underpredictions in simulated T2. Brunner et al. (2014) reported a range of monthly MBs for T2 of −2 to 1 °C for simulations using a number of CTMs over individual years for 2006 and 2010 with reanalysis meteorological ICs/BCs.
For example, Caldwell et al. (2009) attributed the overprediction in precipitation to overprediction in precipitation intensity but underprediction in precipitation frequency. Otte et al. (
Table 2 summarizes the statistics for major chemical species.
IOAs of ∼ 0.6 at the AIRS-AQS sites, and underpredictions and IOAs of ∼ 0.6 to 0.8 at the CASTNET sites.