HEMCO v1.0: a versatile, ESMF-compliant component for calculating emissions in atmospheric models

We describe the Harvard–NASA Emission Component version 1.0 (HEMCO), a stand-alone software component for computing emissions in global atmospheric models. HEMCO determines emissions from different sources, regions, and species on a user-defined grid and can combine, overlay, and update a set of data inventories and scale factors, as specified by the user through the HEMCO configuration file. New emission inventories at any spatial and temporal resolution are readily added to HEMCO and can be accessed by the user without any preprocessing of the data files or modification of the source code. Emissions that depend on dynamic source types and local environmental variables such as wind speed or surface temperature are calculated in separate HEMCO extensions.


Introduction
Accurate representation of emissions is essential in global models of atmospheric composition. Models typically rely on gridded emission inventory data, covering global or regional domains, which are often multiplied with scale factors to adjust for different species and temporal variability. New and updated emission inventories are continuously being developed by research groups and agencies, reflecting both improving knowledge and actual changes in emissions. Timely incorporation of this new information into atmospheric models is crucial but can involve laborious programming. Here, we present the Harvard-NASA Emission Component version 1.0 (HEMCO), a software interface for atmospheric models that automates the implementation of new inventories and allows the construction of user-specified combinations of existing inventories and scale factors on a per region and/or per species basis. HEMCO is compliant with the Earth System Modeling Framework (ESMF; Hill et al., 2004) software environment and thus can serve as a stand-alone emission component in Earth system models (ESM).
The general approach to determine emission of a given species in global atmospheric models is through a combination of base emissions and multiplicative scale factors. Base emissions are gridded external data generally constructed using a bottom-up approach based on best estimates of activity rates (e.g., fuel consumption) and emission factors (e.g., emitted mass of species per unit mass of fuel) (Granier et al., 2011). They may also include top-down constraints from atmospheric observations (e.g., Mieville et al., 2010). Scale factors applied to these base values adjust emissions at specific times to account for diurnal, day of week, seasonal, or year-to-year variability (van Donkelaar et al., 2006;Wang et al., 2010), or for environmental parameters such as wind or temperature (e.g., Zender et al., 2003;Guenther et al., 2012).
HEMCO is highly customizable as it can use a wide range of base emissions and scale factors. A reference library of data sets is included in HEMCO, and the user can supplement these with his/her own alternatives. These inventories need not be of the same grid dimensions or domain. Using the customizable configuration file, the HEMCO core module selects and assembles the emission arrays for the atmospheric model through a combination of the selected base emissions and scale factors. More interactive emission modules that depend on gridded source types (e.g., land use, vegetation type) and/or environmentally dependent scale factors (e.g., wind speed, surface temperature) are appended to the HEMCO core module as HEMCO extensions, as explained in Sect. 2.6.
HEMCO is designed so that it is well suited for use in ESMs, where the atmospheric composition module is coupled to modules describing atmospheric dynamics and other components of the Earth system (e.g., oceans, land, cryosphere). ESMs are interdisciplinary endeavors where stewardship of the code is distributed among several research communities, placing an additional hurdle on timely code updates. We have designed HEMCO so it can serve as an emission component for ESMs through the ESMF interface, thus allowing for seamless updating of emission inventories and extension modules. HEMCO is currently being incorporated into the Goddard Earth Observing System (GEOS-5) ESM of the NASA Global Modeling and Assimilation Office (GMAO) (Molod et al., 2012;Ott et al., 2010;Randles et al., 2013). The HEMCO code, written in FORTRAN 90, along with its current extensions and library of open-source emission inventory databases is available at http://wiki.geos-chem.org/HEMCO.

Overview
Figure 1 illustrates the design of the HEMCO core module. HEMCO acts as a coupler between a set of emission data files organized in a data library and the external (atmospheric) model. Based on the specifications of the user configuration file, HEMCO selects the emission files to be used, schedules and invokes the corresponding data receiving commands, organizes the resulting data arrays, and calculates the emission fields for a given species and time upon request. Species definitions and geographical grid points to be used for the emission calculation are specified during initialization of HEMCO. The grid can be 3-D to allow for emissions at altitude (e.g., from tall stacks, aircraft), and all emission fields will be returned on this grid. Grid definitions are typically determined from the external model, even though any grid is supported. The species definitions are used to identify the species names provided in the configuration file (Sect. 2.3), and pass the respective emission fields to the atmospheric model. Each species is defined by its name, molecular weight, and corresponding species name/ID used in the atmospheric model. Additional conversion factors can be defined, e.g., to convert mass of emitted species into mass of carbon or to map individual species into compound classes (and vice versa), e.g., for organic compounds. HEMCO receives all data through the data interface and all data arrays entering HEMCO are already on the requested grid; i.e., data reading and remapping operations are performed outside of HEMCO core. This facilitates the coupling of HEMCO to different data reading and regridding algorithms, as discussed further in Sect. 2.5.
Figure 1. Overview of the HEMCO core module, which acts as a coupler between the atmospheric model and emission files organized in a data library. All emission calculations are based on the information provided in the HEMCO configuration file, which is read and stored in the internal file list ("FileList") during initialization (Initialize). By modifying the content of the data library and configuration file, users can readily add new emission data and change emission calculation settings. HEMCO calculates emissions for a given species, date, and grid in two steps: first, all emission data to be updated are identified and selected from FileList and received through the data interface ("Receive"). The returned data are stored in the "EmisData" list, from which the final emission array is assembled in the second step of RUN ("Calculate"). This array is then returned to the atmospheric model. Routine FINALIZE, called at the end of a model run, cleanly removes all internal data.
For each gridbox $x$ on the specified grid, HEMCO computes emissions $e_{x,j}(t)$ for requested species $j$ and at time $t$. It does so by incorporating the emission inventories and scale factor data files selected and prioritized by the user through the configuration file. The resulting emissions $e_{x,j}(t)$ are then passed to the external model (Fig. 1). Emission calculation may include a combination of different inventories $n \in [1, p]$ covering different geographic domains (e.g., North America, China) and/or emission sectors (e.g., fossil fuel, open fires). For each selected inventory $n$, the emission $e_{x,j,n}(t)$ is calculated as multiplication of the base value $b_{x,j,n}(t)$ and $m \in [1, q]$ scale factors $s_{x,m}(t)$, as defined in the configuration file:
$$e_{x,j,n}(t) = b_{x,j,n}(t) \prod_{m=1}^{q} s_{x,m}(t) \tag{1}$$
Emissions $e_{x,j,n}(t)$ and $b_{x,j,n}(t)$ are in units of mass per unit area per unit time, and scale factors $s_{x,m}(t)$ are unitless. Scale factors represent (1) temporal emission variations including diurnal, seasonal, or interannual variability; (2) regional masks that restrict the applicability of the base inventory to a given region; or (3) species-specific scale factors, e.g., to split lumped organic compound emissions into individual species. Additional scale factors can be applied to have emissions depend on local environmental variables such as temperature or wind speed. These require specifications or functional dependencies and thus special treatment, as will be discussed in Sect. 2.6. The final emissions $e_{x,j}(t)$ are composed through addition and/or overwriting of all $p$ employed inventories $e_{x,j,n}(t)$. To determine how inventories of the same species are added and prioritized, each inventory is assigned a category and hierarchy number in the configuration file. Within the same category, inventories of higher priority overwrite lower-priority data, while emissions of different categories are added.
This system enables the user to prioritize selected regional inventories for a given sector (e.g., anthropogenic, biofuel) over a default global inventory for the same sector, while still allowing the global inventory to provide information for other sectors. Note that this assumes comparable sector definitions throughout the selected inventories and sectors. Inventories with inconsistent sector definitions can be distributed across multiple sectors (or lumped into one particular sector) by assigning user-specified weights to each sector.
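The calculation of Eq. (1) and the category/hierarchy rules can be illustrated with a minimal Python sketch on a four-box, 1-D grid. This is not HEMCO code (HEMCO is written in FORTRAN 90): the `Inventory` class, its `region` mask field, and both function names are hypothetical, and the choice to let a higher-hierarchy inventory overwrite only inside its own region is our reading of the text rather than a documented algorithm.

```python
from dataclasses import dataclass, field
from math import prod


@dataclass
class Inventory:
    """Hypothetical container for one base inventory on a 1-D grid."""
    name: str
    base: list                                   # base emissions b per grid box
    cat: int                                     # emission category
    hier: int                                    # hierarchy within the category
    scales: list = field(default_factory=list)   # per-box scale factor arrays
    region: list = None                          # optional 0/1 mask; None = everywhere


def inventory_emissions(inv):
    """Eq. (1): base value times the product of all scale factors, per grid box."""
    return [b * prod(s[x] for s in inv.scales) for x, b in enumerate(inv.base)]


def total_emissions(inventories, nbox):
    """Overwrite by hierarchy within a category, then add the categories."""
    per_cat = {}
    # Process low hierarchy first so that higher hierarchy overwrites it.
    for inv in sorted(inventories, key=lambda i: i.hier):
        e = inventory_emissions(inv)
        merged = per_cat.setdefault(inv.cat, [0.0] * nbox)
        for x in range(nbox):
            if inv.region is None or inv.region[x]:
                merged[x] = e[x]
    return [sum(f[x] for f in per_cat.values()) for x in range(nbox)]


# Made-up numbers: a global inventory with one scale factor, a regional
# inventory covering boxes 1 and 2, and ship emissions in a second category.
glob = Inventory("GLOBAL", [1.0] * 4, cat=1, hier=1, scales=[[2.0] * 4])
regional = Inventory("REGIONAL", [5.0] * 4, cat=1, hier=2, region=[0, 1, 1, 0])
ship = Inventory("SHIP", [0.5, 0.0, 0.0, 0.5], cat=2, hier=1)
total = total_emissions([glob, regional, ship], 4)
print(total)  # [2.5, 5.0, 5.0, 2.5]
```

In this toy case the regional inventory replaces the scaled global values in boxes 1 and 2, and the ship emissions are added on top because they belong to a different category.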

Data library
The HEMCO data library contains the data files of all base emissions and scale factors available to users, who may also choose to extend it by adding their own. Depending on the specifications of the configuration file, only a subset of the library is effectively used for emission calculation. Table 1 lists the global and regional emission inventories currently included in the HEMCO library. They correspond to the standard emission settings currently used in the GEOS-Chem chemical transport model (CTM) (Bey et al., 2001). The choice of these emission inventories has been discussed and validated in many publications (e.g., Fairlie et al., 2010;Millet et al., 2010;van Donkelaar et al., 2006;Xiao et al., 2008;Yevich and Logan, 2003).
All files are in the Network Common Data Form (netCDF) format (http://www.unidata.ucar.edu/software/netcdf/), the most commonly used data format in the climate community, and adhere to the Cooperative Ocean-Atmosphere Research Data Service (COARDS) metadata conventions. New inventories following these conventions can be readily added to the data library. On demand, support for other data formats/conventions can be added with relatively little effort through extension of the HEMCO data interface.
All data listed in Table 1 are on rectilinear (lon-lat) grids and are interpolated onto the simulation grid during model execution. At this stage of development, nonregular (e.g., curvilinear) grids are only supported when running HEMCO within the ESMF environment (see Sect. 2.5).

HEMCO configuration file
Users select base inventories and scale factors for their simulation through the HEMCO configuration file. Thus, HEMCO enables the user to incorporate new emissions and alter the composition of model emissions without the need to change any source code. A sample configuration file and the corresponding model emission field are shown in Fig. 2. The same diurnal scale factors are applied to the two regional inventories as for the EDGAR emissions, and the interannual variability of the EDGAR inventory, provided in YearScal.nc, is adapted to the Asian inventory. Finally, NO$_x$ ship emissions from the International Comprehensive Ocean-Atmosphere Data Set (ICOADS; Wang et al., 2008), available in file Ship.nc, are used in addition to the above-mentioned inventories.
The first section of the configuration file (denoted base emissions) lists the base inventories (see Fig. 2). The first column ("Name") is a descriptive field identification name, followed by data reading information consisting of the (netCDF) data filename ("srcFile"), the data variable name ("srcVar"), as well as available time range and temporal resolution ("srcTime"), as described in more detail in Sect. 2.5. Column "Species" denotes the emissions species name used by the external model, which is adopted by HEMCO during initialization. It is used to ensure that the requested model species are correctly identified by HEMCO (emissions will be ignored otherwise). Column "ScalIDs" lists the identification numbers of all scale factors applied to this base inventory, with multiple scale factors separated by the forward-slash sign. The numbers refer to the scale factor numbers specified in column "ScalID" of the second part of the configuration file, where all scale factors and masks are listed. For example, in the configuration file shown in Fig. 2, the EDGAR NO$_x$ inventory (line 5) is linked with scale factors 1 (DAY_NOX) and 2 (MONTH_NOX), defined on lines 12 and 13, respectively. The last two columns of the base data section give the emissions category ("Cat") and hierarchy ("Hier"). In the example of Fig. 2, the EDGAR_NOX field (Cat = 1; Hier = 1) is overwritten by the regional ASIA_NOX and EMEP_NOX data (Cat = 1; Hier = 2). The regional inventories are only applied to the region where they are defined, and EDGAR is used everywhere else. The ship emissions SHIP_NOX are given a different category (Cat = 2) and hence are added to the NO$_x$ field assembled for emission category 1.
The effect of adding the SHIP_NOX emission field to the configuration file is illustrated in the two maps shown in Fig. 2. Panel a depicts the NO$_x$ emission map obtained from running HEMCO without the ship inventory. Panel b shows the resulting emission map when line 8 is included.
Sections 2 and 3 of the configuration file list scale factors and region masks. All scale factors and masks are listed with the scale factor identification number (ScalID), descriptive field name (Name) and file attributes (srcFile, srcVar, srcTime). The unitless scale factors are either gridded data obtained from a data file (e.g., geographical variations in diurnal emissions) or a spatially uniform scalar directly defined in the configuration file in lieu of a data file. The latter makes it easy to uniformly scale emissions and/or to fractionate lumped emission inventories into different sectors or individual species (e.g., for organic compounds). Masks are binary scale factors (1 inside the region, 0 outside).
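To make the column structure of the base emissions section concrete, the following Python sketch parses two entries into typed records. The exact file syntax is an assumption: the whitespace-separated layout, the `-` placeholder for "no scale factors", the file name `EDGAR.nc`, the variable name `NOX`, and the species name `NO` are all hypothetical. Only the column names, the field names EDGAR_NOX and SHIP_NOX, the file Ship.nc, the two srcTime strings, and the Cat/Hier values are taken from the text.

```python
from typing import NamedTuple


class BaseEntry(NamedTuple):
    """One line of the base emissions section (column names from the text)."""
    name: str
    src_file: str
    src_var: str
    src_time: str
    species: str
    scal_ids: tuple
    cat: int
    hier: int


def parse_base_line(line):
    """Split one whitespace-separated base emission line into typed fields."""
    name, src_file, src_var, src_time, species, scal_ids, cat, hier = line.split()
    # "-" marks "no scale factors" in this sketch (an assumption).
    ids = () if scal_ids == "-" else tuple(int(i) for i in scal_ids.split("/"))
    return BaseEntry(name, src_file, src_var, src_time, species, ids, int(cat), int(hier))


# Hypothetical sample in the spirit of the configuration file described above.
SAMPLE = """\
# Name     srcFile   srcVar srcTime          Species ScalIDs Cat Hier
EDGAR_NOX  EDGAR.nc  NOX    1980-2010/1/1/0  NO      1/2     1   1
SHIP_NOX   Ship.nc   NOX    2000/1/1/0       NO      -       2   1
"""

entries = [
    parse_base_line(line)
    for line in SAMPLE.splitlines()
    if line and not line.startswith("#")
]
for entry in entries:
    print(entry.name, entry.scal_ids, entry.cat, entry.hier)
```

Keeping ScalIDs as a tuple of integers mirrors the way the text links each base inventory to the ScalID column of the scale factor section.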

Core module and emissions calculations
The operation of HEMCO can be divided into three stages, all invoked by the external model (Fig. 1): Initialize, Run, and Finalize. Initialize and Finalize are only executed once, at the beginning and end of the model simulation, respectively. The Run command is repeated at every emission time step.
The core of HEMCO consists of the internal data structures "FileList" and "EmisData". FileList contains the file information of all used base emissions and scale factors, such as data filename, variable, update frequency, etc. It is created in the first stage of HEMCO (Initialize) based on the content of the HEMCO configuration file. In addition to setting FileList, the initialization routine also receives emission species definitions (e.g., species name, molecular weight) and specifies the (emission) grid points to be covered by this central processing unit (CPU). In the case of a distributed computing environment, the emissions grid will be broken up across all available CPUs on the system. The grid defined during initialization is preserved over the whole course of the simulation and all emissions are returned on it. EmisData organizes the 3-D arrays of all base emissions and scale factors, which are stored in individual data structures ("containers") along with information on how these arrays are connected to each other. Each data array covers the specified emission grid and contains the values for the current simulation date. HEMCO automatically updates all arrays as the simulation date advances, based on the update frequency defined in the configuration file.
Figure 2. Sample HEMCO configuration file and resulting emission fields calculated by HEMCO. Emission inventories (base emissions) are listed in lines 5-8 with the identification name ("Name"), (netCDF) filename ("srcFile"), file variable of interest ("srcVar"), and temporal resolution ("srcTime": year/month/day/hour). The atmospheric model species name is given in the 5th column ("Species"), and all scale factors to be linked to the base emissions are listed in the 6th column ("ScalIDs"), separated by a forward slash. The emission category and hierarchy are defined in columns "Cat" and "Hier", respectively. Scale factors and region masks are listed in lines 12-19, starting with the scale factor ID ("ScalID", corresponding to the IDs given in the base emission section), the identification name ("Name"), and the data file information ("srcFile", "srcVar", "srcTime", as for base emissions).
The second stage (Run) of HEMCO consists of two steps, namely receiving/updating content of the emissions list, and calculating the emissions $e_{x,j}(t)$. The "receive" command generates a data request, based on the information in FileList, which is sent to the data interface. The returned base emissions and scale factors ($b_{x,j,n}(t)$ and $s_{x,m}(t)$) are on the specified emission grid and in the specified units, and are stored in a corresponding data container in EmisData.
In the second step of Run ("calculate"), emissions are directly calculated from the 3-D arrays stored in EmisData according to Eq. (1). For each base inventory $n$ of compound $j$, emissions $e_{x,j,n}(t)$ are calculated first for every grid point $x$ before all these values are merged into the final emissions $e_{x,j}(t)$, based upon the emission categories and hierarchies given to each inventory.
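The three stages and the two steps of Run can be sketched as a small Python class. This is a toy stand-in under stated assumptions, not the HEMCO implementation: the class and method names, the `reader` callback standing in for the data interface, the date tuples, and the update frequency keywords are all hypothetical, and the "calculate" step is reduced to one base field times one scale factor.

```python
class EmissionComponentSketch:
    """Toy stand-in for the Initialize / Run / Finalize stages (not HEMCO code)."""

    def initialize(self, file_list, reader):
        # file_list maps a field name to its update frequency:
        # "never", "year", "month", "day", or "hour" (hypothetical keywords).
        self.file_list = dict(file_list)
        self.reader = reader      # data interface: reader(name, date) -> array
        self.emis_data = {}       # containers holding the current arrays
        self.last_date = None

    def _needs_update(self, freq, date):
        if self.last_date is None:
            return True           # first call: everything must be read
        if freq == "never":
            return False
        idx = {"year": 0, "month": 1, "day": 2, "hour": 3}[freq]
        return date[: idx + 1] != self.last_date[: idx + 1]

    def run(self, date):
        # Step 1 ("receive"): refresh only the fields that are out of date.
        for name, freq in self.file_list.items():
            if self._needs_update(freq, date):
                self.emis_data[name] = self.reader(name, date)
        self.last_date = date
        # Step 2 ("calculate"): here simply base times scale, box by box.
        return [b * s for b, s in zip(self.emis_data["BASE"], self.emis_data["SCALE"])]

    def finalize(self):
        # Remove all internal data at the end of the simulation.
        self.emis_data.clear()
        self.file_list.clear()


reads = []


def reader(name, date):
    """Fake data interface that records every read request."""
    reads.append((name, date))
    return [1.0, 2.0] if name == "BASE" else [float(date[1])] * 2


hemco = EmissionComponentSketch()
hemco.initialize({"BASE": "never", "SCALE": "month"}, reader)
e_jan = hemco.run((2005, 1, 1, 0))    # both fields are read
e_jan2 = hemco.run((2005, 1, 2, 0))   # same month: nothing is re-read
e_feb = hemco.run((2005, 2, 1, 0))    # new month: only SCALE is re-read
hemco.finalize()
print(len(reads), e_jan, e_feb)  # 3 [1.0, 2.0] [2.0, 4.0]
```

Dates are (year, month, day, hour) tuples, so comparing tuple prefixes is enough to detect whether the year, month, day, or hour has changed since the previous time step.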

Data interface
The data interface provides the link between HEMCO and the input data files. Depending on the employed model environment, this step includes different operations and levels of complexity.
When run within ESMF, file reading and data interpolation are performed using the MAPL (Modeling Analysis and Prediction Program Layer) software toolkit built on top of ESMF (https://modelingguru.nasa.gov/docs/DOC-1118). In this case, the role of the data interface is to ensure that all files required by HEMCO are correctly identified and registered through MAPL, as well as to connect the final processed data to the HEMCO core module on every emission time step. The ESMF regridding utility supports a wide variety of grids, including rectilinear, curvilinear, and unstructured grids. More details on the HEMCO implementation within a MAPL/ESMF environment are given in Sect. 3.
If running HEMCO outside of ESMF, e.g., as part of a stand-alone CTM like GEOS-Chem, the data interface needs to perform data reading and remapping operations explicitly. In this case, a package of generic subroutines is called through the data interface module. All data reading parameters used by these routines (such as filename, data variable name, update frequency, etc.) are specified by the user in the configuration file and are stored in FileList. On every emission time step, HEMCO determines the files to be updated, based on the date at the current and previous time step and the specified update frequencies, and invokes the reading and remapping routines accordingly. The (netCDF) filename, data variable, and time stamp to be read are extracted by HEMCO from columns srcFile, srcVar, and srcTime, respectively, of the configuration file. The time stamp provided in srcTime has the format year/month/day/hour and indicates the available time range as well as the temporal resolution of the configuration file. Both discrete dates for time-independent data (e.g., ship emissions in Fig. 2: 2000/1/1/0) and time ranges for temporally changing inventories (e.g., EDGAR: 1980-2010/1/1/0) are accepted. Time-uniform data are only read once and the same array is then used for all simulation dates. For time-varying data, the time slice most representative for the current simulation date is used.
For example, HEMCO automatically updates EDGAR NO$_x$ data whenever the simulation year changes within simulation years 1980 to 2010. Outside of this range, the closest available time slice is used.
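The year selection described above can be illustrated with a short Python sketch that parses a srcTime string and clamps the simulation year to the available range. The function names are hypothetical, only the year element is handled, and accepting a `lo-hi` range in the month, day, and hour positions is an assumption; the two srcTime strings are the ones quoted in the text.

```python
def parse_src_time(src_time):
    """Parse 'year/month/day/hour'; each part is a single value or a 'lo-hi' range."""
    parts = []
    for token in src_time.split("/"):
        lo, _, hi = token.partition("-")
        parts.append((int(lo), int(hi) if hi else int(lo)))
    return parts  # [(y0, y1), (m0, m1), (d0, d1), (h0, h1)]


def select_year(src_time, sim_year):
    """Pick the available year closest to the simulation year (clamp to the range)."""
    (y0, y1), *_ = parse_src_time(src_time)
    return min(max(sim_year, y0), y1)


print(select_year("1980-2010/1/1/0", 1995))  # 1995: inside the range
print(select_year("1980-2010/1/1/0", 2015))  # 2010: closest available year
print(select_year("2000/1/1/0", 2015))       # 2000: time-independent data
```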
At this stage of development, the HEMCO generic reading and remapping routines focus on rectilinear (lon-lat) grids, which is the most commonly used grid type in (global) emission inventories, and all input data are expected to be on such a grid. The fact that all data reading and remapping routines are kept separate from the rest of the HEMCO code (see Fig. 1) simplifies connecting HEMCO to other data reading and remapping routines and/or extending existing functionalities, e.g., to use input data in formats different than netCDF or support additional regridding methods.

Extensions for environmentally dependent emissions
Emission inventories sometimes include dynamic source types and nonlinear scale factors that have functional dependencies on local environmental variables, which are best calculated online during execution of the model. Examples are wind dependence of dust emissions (Zender et al., 2003), or the temperature and light dependence of biogenic volatile organic compound (VOC) emissions (Guenther et al., 2012). In such cases, HEMCO can host environmentally independent data sets (source functions) in its data library, but all other scale factors cannot be determined within the HEMCO core module. Instead, users can select a suite of additional modules (extensions) that perform online emission calculations based on environmental variables imported from other parts of the ESM. These extensions take advantage of many of the functionalities of HEMCO. Like the emission data used by the core module, gridded source function data (such as base emissions and environmentally independent scale factors) are specified in the configuration file and are subsequently read, organized, and stored through the HEMCO FileList and EmisData objects (Fig. 1), except that they are not used for the HEMCO core emission calculation. Instead, these data arrays are requested directly by the respective extension modules and used therein to calculate the emissions for the given process, together with the extension-specific parameterizations. All environmental variables used by the extensions, such as wind speed and surface temperature, are passed to HEMCO from the atmospheric model through the argument interface and then automatically made available to the extension modules.
The full set of extensions currently available in HEMCO is given in Table 1. Some of these extensions make model-specific assumptions, e.g., on the dust size bins or VOC speciation, and thus may need some modifications when used in other model environments. The user can also write his/her own extensions and/or import existing model code as additional extensions. A template to facilitate the implementation of new extensions is provided with HEMCO. Figure 3 illustrates the functioning of the HEMCO extensions for an application that combines three standard emission inventories (Emis001.nc-Emis003.nc) and two extensions (dust emissions and biogenic VOC emissions). The extensions use externally specified, gridded data provided in the configuration file (Erosion.nc, MassFrc.nc, SrcFnc.nc), and compute additional scale factors dependent on environmental variables (wind, temperature, etc.) imported from the atmospheric model.
When calling HEMCO from the external atmospheric model, the HEMCO core code is executed first following the procedure depicted in Fig. 1. All six netCDF files listed in the configuration file are received and stored within HEMCO, but only files Emis001-003.nc are used in HEMCO core for emission calculation. The emissions array calculated in HEMCO core is then passed to the dust emission extension, where dust emissions are calculated as described in Fairlie et al. (2010). The time-invariant (gridded) data required by this extension, namely erodibility and mass fractions per surface type, are obtained from files Erosion.nc and MassFrc.nc, respectively, through the HEMCO core routines. Current meteorological information like wind speed ($u$, $v$) and surface temperature ($T$) is obtained from the atmospheric model through the argument interface (Fig. 3). The calculated dust emissions are added to emissions $e_{x,j}(t)$ previously calculated in HEMCO core, and the combined emission array is then passed to the biogenic VOC emission extension module. This module requires the gridded source functions (base emissions) for each species (provided by file SrcFnc.nc) along with a suite of meteorological variables (temperature, radiation, etc.) (Guenther et al., 2012) provided by the atmospheric model. The final emission array comprising the sum of all three emissions arrays is then returned to the atmospheric model.
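The sequence described above, in which the core array is handed from one extension to the next and each extension adds its own contribution, can be sketched in Python as follows. Everything here is illustrative: the function names, the `met` dictionary, and in particular the wind threshold formula are made up for the example and do not represent the Fairlie et al. (2010) dust scheme or any HEMCO extension.

```python
def run_with_extensions(core_emissions, extensions, met):
    """Add each extension's contribution to the array computed by the core."""
    total = list(core_emissions)
    for ext in extensions:
        contribution = ext(met)
        total = [t + c for t, c in zip(total, contribution)]
    return total


def toy_dust_extension(erodibility):
    """Build an extension with a made-up wind dependence (NOT a real dust scheme)."""
    def ext(met):
        # Emit only above a hypothetical threshold wind speed of 5 m/s.
        return [k * max(u - 5.0, 0.0) for k, u in zip(erodibility, met["wind_speed"])]
    return ext


core = [1.0, 1.0]                    # emissions from the core calculation
met = {"wind_speed": [4.0, 7.0]}     # fields imported from the atmospheric model
dust = toy_dust_extension([0.5, 0.5])  # time-invariant gridded input
combined = run_with_extensions(core, [dust], met)
print(combined)  # [1.0, 2.0]
```

The time-invariant gridded input (here `erodibility`) is bound once when the extension is created, while the meteorological fields are passed in on every call, which mirrors the split between data library files and the argument interface described in the text.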

Implementations
HEMCO is a stand-alone emissions component that can be readily included into a new model environment. All required adjustments can be done at the interface level between HEMCO and the external model. So far, we have implemented HEMCO in the GEOS-Chem CTM driven by assimilated meteorological data (Bey et al., 2001), and the NASA GEOS-5 Earth system model. The GEOS-Chem implementation uses the ensemble of emission inventories listed in Table 1. New emission inventories are now added to GEOS-Chem through HEMCO, which greatly facilitates model updates.
Implementation of HEMCO into the GEOS-5 ESM is done through the ESMF interface. ESMF is a widely used modular software framework for ESMs (Hill et al., 2004). It enables the construction of ESMs by assembly of a number of stand-alone components connected to each other through the ESMF superstructure layer. Components are classified as gridded components, which are executed on a discrete grid, and coupler components, which connect gridded components and perform input/output operations. Components receive data through a special superstructure object class (import state), and make data available to other components by returning data as object class export state.
HEMCO contains all wrapper routines needed to embed it as a gridded component into an ESMF model application. Specifically, all files listed in the HEMCO configuration file are registered for data reading at the beginning of a model run. These files then become automatically read and interpolated in space and time through ESMF-generic routines, and HEMCO subsequently imports these arrays through the import state object during step "Receive" of the Run stage (Fig. 1). Likewise, the import state object is used to obtain data from other ESM components needed by some of the HEMCO extensions, e.g., meteorological fields such as wind speed and temperature, or source type classifications, e.g., vegetation type. All emission arrays calculated within HEMCO are returned as an export state object so that they are available to other model components (i.e., the transport or chemistry component).

Conclusions
HEMCO provides a flexible tool for atmospheric models to compute emissions for different sources, regions, and species through automatic combination, overlaying, and updating of user-selected inventories and scale factors. New data sets on any spatial grid and temporal resolution can be readily added to HEMCO without modification of the source code. Emissions and scale factors that depend on local environmental parameters such as wind speed and temperature are included through HEMCO extensions.
A particular advantage of HEMCO is that emissions become regridded and converted to desired units during model execution, which allows a straightforward implementation of new emission inventories with no need to preprocess these data. Thus, HEMCO is well suited for model intercomparison and emission sensitivity studies. These tasks require running the model with emission data that differ from the default emission settings, which is easily achieved in HEMCO by simply modifying the configuration file.
The strictly modular structure of HEMCO also makes it attractive for inverse modeling applications in which emissions are adjusted iteratively to provide an optimal fit to geospatial observations (see Enting, 2005). The adjustment factors can be easily implemented into HEMCO as additional scale factors, which are then applied to the base emissions.
HEMCO is ESMF-compliant and can therefore be readily used to compute emissions in Earth system models that rely on the ESMF structure. In such applications, HEMCO makes use of the MAPL/ESMF software toolkits to read and interpolate data fields from files as well as to connect HEMCO with other ESM components. HEMCO presently serves as the emission component for the GEOS-Chem CTM and for the NASA GEOS-5 ESM (via ESMF). It can be used easily in any Earth system model.

Code availability
The HEMCO code (in FORTRAN 90), current extensions and emission databases (listed in Table 1), and sample configuration files are available at http://wiki.geos-chem.org/HEMCO.