Assimilation of MODIS Dark Target and Deep Blue observations in the dust aerosol component of NMMB-MONARCH version 1.0

A data assimilation capability has been built for the NMMB-MONARCH chemical weather prediction system, with a focus on mineral dust, a prominent type of aerosol. An ensemble-based Kalman filter technique (namely the local ensemble transform Kalman filter – LETKF) has been utilized to optimally combine model background and satellite retrievals. Our implementation of the ensemble is based on known uncertainties in the physical parametrizations of the dust emission scheme. Experiments showed that MODIS AOD retrievals using the Dark Target algorithm can help NMMB-MONARCH to better characterize atmospheric dust. This is particularly true for the analysis of the dust outflow in the Sahel region and over the African Atlantic coast. The assimilation of MODIS AOD retrievals based on the Deep Blue algorithm has a further positive impact in the analysis downwind from the strongest dust sources of the Sahara and in the Arabian Peninsula. An analysis-initialized forecast performs better (lower forecast error and higher correlation with observations) than a standard forecast, with the exception of underestimating dust in the long-range Atlantic transport and degradation of the temporal evolution of dust in some regions after day 1. Particularly relevant is the improved forecast over the Sahara throughout the forecast range thanks to the assimilation of Deep Blue retrievals over areas not easily covered by other observational datasets. The present study on mineral dust is a first step towards data assimilation with a complete aerosol prediction system that includes multiple aerosol species.


Introduction
Among the different aerosol species, mineral dust is one of the main components of the atmospheric aerosol loading and is of great interest for a variety of reasons. Mineral dust plays an important role in the earth's energy balance and has a relevant impact on economical activities, on the ecosystem, on health, as well as on weather and climate (Knippertz and Stuut, 2014). The strong dust storms occurring near emission sources constitute a big hazard to air traffic and road transport as they can reduce the visibility down to few metres. However, dust does not affect only local economies: because of its transport over thousands of kilometres, it has an impact from local to global scales. Dust deposition provides nutrients to continental and marine ecosystems. African dust for example has a role in fertilization of the Amazon rainforest (Yu et al., 2015), while dust deposition over oceans has implications for sea biogeochemistry as the iron contained in the dust particles is a nutrient for phytoplankton, whose photosynthetic activity in turn affects the carbon cycle (Jickels et al., 2005). Dust has health implications both close to and far from sources. For example, studies have shown the usefulness of dust aerosol climatologies in predicting part of the year-to-year variability of the seasonal incidence of meningitis in Niger (Pérez García-Pando et al., 2014), while particulate matter measurements taken in areas far from sources show that Saharan dust outbreaks have adverse effects of cardiovascular and respiratory conditions (Mallone et al., 2011;Morman and Plumlee, 2013;Pandolfi at al., 2014). Mineral particles perturb the earth-atmosphere's radiation budget through their interaction with the short-wave radiation, through scattering and absorption, and long-wave radiation, through absorption and re-emission. Due to this redistribution of the energy, dust aerosols can have a strong impact on atmospheric processes at short (weather) and long (climate) term periods, while they can affect atmospheric circulations at large spatial scales (e.g. Asian monsoons; Lau et al., 2006). Furthermore, this can generate feedback processes since changes in weather and climate can in turn lead to changes in the dust cycle.
Different types of ground-based (e.g., Kim et al., 2011;Pey at al., 2013) and space-borne (e.g., Kaufman et al., 2005;Luo et al., 2015) observations have been utilized to describe the variability of atmospheric dust. However, due to either insufficient spatial representativeness or accuracy, the spatiotemporal features of dust aerosols are not fully captured by the current observing system. Neither do models accurately describe atmospheric and surface dust concentrations . High uncertainties are also in our knowledge of the optical and micro-physical properties of dust, and in our representation of its vertical structure. The latter has implications for the radiation's budget and transport. On the other hand, an accurate quantification of dust's spatial and temporal distribution is key in correctly characterizing the effect that it has on the earth's energy balance, as well as in improving the skill in forecasting its concentrations in the atmosphere as well as in forecasting the weather (Pérez García-Pando at al., 2006;Grini et al., 2006;Chaboureau et al., 2011).
Regional and global centres, predicting the most important aerosol species or dust only, participate in different model inter-comparison initiatives like the Aerosol Comparisons between Observations and Models (AeroCom; Tsigaridis et al., 2007) project, the International Cooperative for Aerosol Prediction (ICAP; Sessions et al., 2015) initiative, and the WMO Sand and Dust Storm Warning Advisory and Assessment System (SDS-WAS; Terradellas et al., 2015). Multimodel ensemble spreads give an indication of large uncertainties in the modelling schemes and confirm the need for a better characterization of aerosols. Relatively recently because of these large uncertainties, the atmospheric composition community has begun to make use of data assimilation (DA) for better characterization and prediction of atmospheric constituents such as aerosols and trace gases (Bocquet et al., 2015). Though their dynamic is mainly driven by forcings such as emissions, recent studies showed that the usage of observations through data assimilation has improved significantly the accuracy of short-term forecast and the global monitoring of both aerosols and trace gases (Benedetti et al., 2009;Elbern and Schmidt, 2001). Since the first experiments in the early 2000s, the assimilation of aerosol observations is now operational in some of the main aerosol forecasting centres (Sessions et al., 2015). Zhang et al. (2014) have highlighted in particular the importance of a combined assimilation of satellite products for aerosol forecast.
The Earth Sciences Department of the Barcelona Supercomputing Center (ES-BSC) is implementing a gas-aerosol module able to predict atmospheric composition at different spatial and temporal scales within the NMMB (Nonhydrostatic Multi-scale Model on the B grid; Janjic and Gall, 2012) state-of-art meteorological model. This modelling system is known as the Multiscale Online Nonhydrostatic At-mospheRe CHemistry mode (NMMB-MONARCH). We report here on the extension of NMMB-MONARCH with a data assimilation functionality using satellite aerosol optical depth. NMMB-MONARCH version 1.0, as in Pérez et al. (2011, where the model was previously named NMMB/BSC-CTM), considers dust only, but other aerosols are being implemented (Spada et al., 2017). The focus of this work on mineral dust is justified by the operational services provided by NMMB-MONARCH. This model provides an operational dust forecast for the Barcelona Dust Forecast Centre under an initiative of the World Meteorological Organization. It participates in the multi-model dust ensemble of the aforementioned ICAP initiative, providing daily global dust forecasts of up to 120 h. It also provides daily regional forecasts of up to 60 h to the WMO SDS-WAS system. Before this work, the system did not have an aerosol data assimilation capability and dust was produced uniquely from model estimated surface emission fluxes. The present study on mineral dust is a first step towards data assimilation with a complete aerosol prediction system that includes multiple aerosol species (not only dust but also sea salt, sulfate, and organics).
Previous studies of assimilation of dust aerosol only have been conducted for the Chinese Unified Atmospheric Chemistry Environment -Dust (CUACE/Dust) forecast system (Niu et al., 2008;Wang and Niu, 2013). These studies have used variational data assimilation techniques (3D-Var) which require, in their most practical implementation, precalculated and constant in time model error structures. Alternatively, ensemble-based techniques use flow-dependent model error amplitudes and structures which evolve during forecast and, in theory, should be able to capture better instabilities in the background flow (Evensen, 1994;Kalnay et al., 2007). Dust AOD is currently assimilated at the UK Met Office with a hybrid variational data assimilation technique (hybrid 4D-Var).
In this work we present the coupling of NMMB-MONARCH with an ensemble-based technique known as local ensemble transform Kalman filter (LETKF; Hunt et al., 2007;Miyoshi and Yamane, 2007). The LETKF scheme has been shown to be particularly suitable for the assimilation of aerosol information since it has been observed by Anderson et al. (2003), Shinozuka andRedemann (2011), andSchutgens et al. (2013) that aerosol fields have limited spatial correlations. Long-range transport of dust could be an exception to this. Since detailed studies of spatial correlation length scales for dust long-range transport are still missing in the literature, in this work we assume that what has been derived (limited spatial correlations) in general for aerosols is valid for dust. The utility of ensemble-based techniques for global aerosol simulations has been shown in previous studies (Schutgens et al., 2010a;Sekiyama et al., 2010;Rubin et al., 2016;and more recently Yumimoto et al., 2016). The main novelty in our study is the creation of the ensemble, our implementation is based on known uncertainties in the physical parametrizations of the sophisticated dust emission scheme used by the NMMB-MONARCH model, as well as in the use of observations particularly relevant for dust applications, like MODIS Deep Blue.
The NMMB-MONARCH chemical weather prediction system is described in more detail in Sect. 2, with emphasis on its dust module. A description of the data assimilation scheme and of the assimilated observations follows respectively in Sects. 3 and 4. We report then in Sect. 5 about the characteristics of the simulations that we have run, in Sect. 6 about the evaluation methodology that we have followed, and in Sect. 7 about the evaluation results of our simulation experiments. The final section concludes the paper with a summary of this development, the main results achieved, and future perspectives.

The NMMB-MONARCH model and its mineral dust component
The ES-BSC is implementing a new gas-aerosol module within the NMMB meteorological model from the Unites States National Centers for Environmental Prediction (NCEP). The new modelling system is known as NMMB-MONARCH Jorba et al., 2012;Spada et al., 2013;Badia et al., 2016, where it was previously named NMMB/BSC-CTM), and is developed in collaboration with NCEP and other research institutions. The chemistry (aerosols included) and meteorology are fully online integrated. NMMB-MONARCH is able to work with a wide range of spatial scales thanks to its unified non-hydrostatic dynamical core, keeping consistent parametrizations at different spatial and temporal scales. Furthermore, the dynamical core and the coupled modules are computationally highly efficient, satisfying current and projected operational requirements. The rest of this section will briefly describe some characteristics of the dust component of NMMB-MONARCH, with particular focus on the emission scheme. The dust emission scheme implemented in NMMB-MONARCH follows the empirical relationship of Marticorena and Bergametti (1995) and Marticorena et al. (1995) according to which the vertical dust flux is proportional to the horizontal sand flux. The horizontal to vertical flux ratio reflects the availability of dust in four soil populations (clay, silt, fine/medium sand, and coarse sand) (Tegen et al., 2002). The horizontal sand flux is modelled as the flux of the saltating particles H simulated according to White (1979) and is proportional to the third power of the wind friction velocity. A soil moisture-dependent threshold on the friction velocity determines the velocity above which the soil particles begin to move in horizontal saltation flux. This threshold is dynamically estimated according to soil characteristics. Soil moisture effects are included following Fécan et al. (1999) through the soil moisture correction parameter included in the expression for the threshold on the friction velocity. A sectional approach is used for the transport of dust particles, i.e. the dust size distribution is decomposed into small size bins. More exactly, dust is modelled using eight dust size bins varying from 0.1 to 10 microns, and, within each transport bin, dust is assumed to have a time-invariant lognormal distribution (Zender et al., 2003). The total vertical flux mass is distributed among the dust transport bins according to a specific dust distribution at sources. NMMB-MONARCH uses a distribution over sources derived from D' Almeida (1987) which assumes that the vertical dust flux is size distributed according to three lognormal background source modes. More explicitly, the dust vertical mass flux F b (kg s −1 m −2 ) in a given transport bin b at each grid cell is given by where S is a source erodibility factor defined on bare ground surfaces, representing the probability of having accumulated sediments in the given grid cell that are potential dust sources; (1 − V ) is the grid's fraction of bare soil; α (m −1 ) is the horizontal to vertical flux ratio calculated for four soil population classes (clay, silt, fine/medium sand, and coarse sand); H (kg s −1 m −1 ) is the horizontal sand flux; M i,k is the mass fraction of background source mode i carried in a transport bin k calculated following Zender et al. (2003), and weighted by a specific background source mode coefficient m i ; and C is a global tuning factor empirically set to 0.768, which is meant to compensate for the uncertainty associated with the different components of F k . More details about the above formulation of dust emission can be found in . The mineral dust module has been extensively evaluated in studies at global and regional scales Haustein et al., 2012;Huneeus et al., , 2016, showing that its evaluation scores lie in the upper range of the AERO-COM model evaluation performance scores. However, these evaluation efforts confirmed, similarly to other modelling systems, different sources of uncertainty in the NMMB-MONARCH dust modelling.

The data assimilation scheme
We have coupled NMMB-MONARCH with the LETKF scheme (Hunt et al., 2007;Miyoshi and Yamane, 2007;Schutgens et al., 2010aSchutgens et al., , 2013 with four-dimensional extension as described in Hunt et al. (2007), in order to estimate optimal initial conditions for our dust model. The overall scheme implements an iterative approach consisting in a forecast step and state estimation step. The state estimation step combines information from mineral dust observations and a prior first guess, or background from our model. A short-term forecast is used as background information. The background incorporates information from past observations; therefore, the analysis is estimated using both current and past observations. LETKF is a development of the ensemble-based transform Kalman filter (ETKF; Bishop et al., 2001) and of the local ensemble Kalman filter (LEKF; Ott et al., 2004), and is particularly suited to highperformance computing applications. A very attractive feature of an ensemble-based technique is the use of a flowdependent background error covariance, which is derived from the ensemble of model states at the assimilation time, and evolves during forecast. At any given time in fact the state estimate is represented by an ensemble of system states and its uncertainty is derived from the ensemble. LETKF has the advantageous feature that it applies localization, i.e. it performs the analysis locally (at each grid point only observations within a certain distance are assimilated), allowing the global analysis to explore a much higher-dimensional space than the subspace spanned by the ensemble (whose dimensionality is limited by the number of ensemble members). Localization also reduces the effect of spurious longrange covariances, effectively eliminating them after a given distance. This is particularly suitable for the assimilation of aerosol information since, as mentioned in the introduction, it has been observed that aerosol fields have limited spatial correlations (∼ 100 km). Schutgens et al. (2010a, b) have already shown the positive impact of assimilating aerosol ground station observations using a LETKF assimilation system for the SPRINTARS model, while Sekiyama et al. (2010) used it to assimilate CALIOP vertical profiles in the MASIN-GAR model and Dai et al. (2013) used it to ingest MODIS observations in the NICAM-SPRINTARS model.
Here we discuss the basic concepts behind the LETKF algorithm; a more detailed description of the scheme can be found in Hunt et al. (2007). Consider a state vector x of the dynamic variables of a system (for our application these are dust mass mixing ratios). The mean analysis increment at a grid point is estimated as a linear combination of the background ensemble perturbations X b : where we use the superscripts "a" and "b" to denote respectively the analysis and background state vector, and where the ith column of the matrix . ., k} with k ensemble members, i.e. the difference between the ith ensemble forecast x b(i) and the ensemble forecast mean x b . w is termed the "weight" matrix specifying what linear combination of the background ensemble perturbations is added to the background mean to obtain the analysis ensemble. The "weight" matrix is given by where Y b is the background ensemble perturbation matrix in observation space (or background observation ensemble perturbation matrix), R is the observation error covariance matrix which we assume diagonal, I is the identity matrix, y o is the vector of observations, and y b is the mean background observation ensemble. The background observation ensemble is obtained by applying the observation operator h(·) to the ensemble forecast members . LETKF uses R-localization, i.e. the localization is performed in the observation error covariance matrix, making the influence of an observation on the analysis decay gradually toward zero as the distance from the analysis location increases. To achieve this, the observation error is divided by a distance-dependent function that decays to zero with increasing distance: e − dist 2 l 2 , where "dist" is the distance in the grid space between an observation and the model grid in which the analysis is calculated, and l is the horizontal localization factor.

Ensemble perturbations
We run the data assimilation scheme under an imperfect model scenario assumption: each ensemble member is run with a different perturbation of uncertain model parameters in the dust emission scheme. In dust modelling, the emission source term is a particularly large contributor to model error . In the case of NMMB-MONARCH one of the components of the uncertainty in the emission term has been identified for example in the vertical flux distribution at sources (Gama et al., 2016).
The model ensemble is created by perturbing the vertical flux of dust in each of the eight dust bins. As described in Sect. 2, NMMB-MONARCH follows a sectional approach for dust, i.e. the size distribution is decomposed into small size bins that from bin 1 to bin 8 go from 0.1 to 10 µm with division intervals at 0.18, 0.3, 0.6, 1, 1.8, 3, and 6 µm. This is equivalent to perturbing the total vertical flux as well as its size distribution at sources. The perturbations are extracted imposing some physical constraint: correlated noise is used across the bins so that noise correlation decreases with increased difference of the normalized cubic radius among the bins; the noise is applied multiplicatively and has mean 1 and standard deviation of 30 % of the unperturbed value in each bin; and the final distribution is unimodal. Figure 1 shows how the vertical flux is perturbed in our ensemble simulations. Additionally, we have perturbed the threshold friction velocity for dust emission by extracting a multiplicative random factor from a normal distribution with mean 1 and spread 0.4. This considers the uncertainty of the model with respect to both surface winds and soil humidity. At low Figure 1. Distribution of the mass vertical flux at sources across the eight dust transport bins for the different ensemble members in different colours, where the bin sizes from bin 1 to bin 8 go from 0.1 to 10 µm with division intervals at 0.18, 0.3, 0.6, 1, 1.8, 3, and 6 µm. The distribution derived from D' Almeida (1987), and used in the standard forecast, is the dashed red line, with horizontal bars indicating the standard deviation of the noise used to create the perturbations. The mean of the ensemble perturbations is the dash-dotted line.
resolution, model surface winds are typically underestimated over dust sources. Also, the model uses the scheme of Fécan et al. (1999) to calculate the increase in the threshold friction velocity with soil humidity, which is typically overestimated in arid regions (Haustein et al., 2015). The spin-up period for the ensemble ensures that perturbations applied at the sources propagate everywhere in the globe. For this reason at this first stage of development of our ensemble system we did not consider a combined meteorology and source perturbation necessary. The structure of our source perturbations, for both types of perturbations, is temporally and spatially constant.

Observation operator
Our state vector is the dust mass mixing ratios. Therefore an observation operator is needed to map the ensemble mean state vector into the observation space. The simulated AOD at wavelength λ is calculated at a given observation location according to the following linear operator: where ρ b (kg m −3 ) is the particle mass density, r b (m) is the effective radius, M b (kg m −2 ) is the dust column mass loading calculated from each dust bin mixing ratio, and Q λb is the extinction efficiency factor which is calculated for using the Mie scattering theory assuming dust spherical, non-soluble particles, and, within a bin, a lognormal distribution for dust with a geometric radius of 0.2986 µm and a standard deviation of 2.0.
When using in the state vector the total mass mixing ratio, as we will explain in Sect. 5, an ensemble averaged extinction efficiency is calculated as in Schutgens et al. (2010b) as an average of the extinction efficiency of the individual bins weighted by the bin mixing ratios.
Hereafter, when we will use the term AOD without specifying the wavelength, we imply that we refer to aerosol optical depth at a wavelength of 550 nm, which is the most commonly reported value in the literature.
4 Observational data

MODIS Dark Target and OMI
We consider for assimilation the MODIS Level 3 AOD product produced by the US NRL and the University of North Dakota (hereafter called NRL MODIS). The NRL MODIS product is produced for the purpose of assimilation into aerosol transport models (Zhang and Reid, 2006;Hyer et al., 2010;Shi et al., 2011), post-processing the Level 2 MODIS Dark Target product from the so-called Collection 5 (Remer et al., 2008;Levy et al., 2007a, b), and is available over both land and ocean. The MODIS Level 2 product is an average of the 1 km by 1 km retrievals (at nominal resolution) generated by the Dark Target algorithm applied to top-of-atmosphere reflectances observed by the MODIS sensor onboard NASA polar-orbiting satellites Terra and Aqua. The NRL MODIS Level 3 product is filtered and corrected in order to eliminate outliers and gross systematic biases, spatially aggregated into a 1 • by 1 • mesh in order to avoid the assimilation of subgrid features, and an error is estimated for each observation. The product is generated every 6 h at 00:00, 06:00, 12:00, and 18:00 UTC and is based on MODIS Level 2 observations in a 6 h interval around those times. The retrieval errors estimated by NRL/University of North Dakota were used for the observation errors. They include the instrumental error variance and the spatial representation error variance. Following the approach in Zhang et al. (2008), we assume uncorrelated observation errors. These observations pertain to the total atmospheric aerosol column, not just to dust AOD. The selection of observations in dust-dominated conditions is performed using retrievals of the Ångström exponent (AE) from the original MODIS Level 3 product (Hubanks et al., 2008), for information about the size of the particles, and using retrievals of the Aerosol Absorbing Index (AAI) from the Ozone Monitoring Instrument (OMI) sensor (Torres et al., 2007), for information about the absorption characteristics of the particles. Ångström exponent (AE) values are based on quality-assurance-weighted 470 and 660 nm optical depths over land, and 550 and 865 nm optical depths over sea. Observations are selected when daily MODIS Aqua or Terra products contain a value for AE < 0.75 and daily OMI products contain a value for AAI > 1.5. Figure 2 shows an example of the NRL MODIS Level 3 product for a day of

MODIS Deep Blue
The MODIS Dark Target product does not provide information over very bright reflective surfaces, including deserts, as the retrieval algorithm assumes low surface albedo. We consider the assimilation of the MODIS Deep Blue Level 3 daily AOD product from Collection 6 whose algorithm retrieves AOD also over bright arid land surfaces, such as deserts. The Collection 6 product covers all cloud-free and snow-free surfaces, and can be potentially very useful for mineral dust applications as it is able to provide an observational constraint close to dust sources. The Deep Blue algorithm uses top-ofatmosphere reflectances at 412 and 470 nm. In the presence of heavy dust load the reflectance at 650 nm is also used. The algorithm exploits the fact that, over most surfaces, a darker surface and stronger aerosol signal is seen in the blue wavelength range than at longer wavelengths. The quality of the MODIS Deep Blue AOD product is improved in Collection 6 compared to Collection 5, as the work of Sayer et al. (2014), based on Level 2 retrievals, showed. Similar findings, for the northern African and Middle East deserts, were reported by Gkikas et al. (2015), who used Level 3 retrievals over the period 2002-2014.
We have applied to this product the same filter for dustdominated conditions described in Sect. 4.1. In addition we have masked out Level 3 retrievals obtained with less than 30 Level 2 retrievals, since Gkikas et al. (2016) showed that the agreement between MODIS-AERONET is improved when the sub-pixel spatial representativeness is increased. The MODIS Deep Blue observations are not corrected for possible systematic biases; however, we are aware that for future applications we should address any possible bias in the product. It is important to notice that the Level 3 product is an aggregation of Level 2 retrievals that is produced using the highest-quality retrievals (i.e. retrievals with qualityassurance flag value 3). Furthermore, we have applied a quality control to all the assimilated observations based on normalized first-guess departures. As a proxy for the normalization factor, we have used the standard deviation of first-guess departures.
A study by Sayer et al. (2014) shows that the highestquality data have an absolute uncertainty of approximately (0.086 + 0.56 AOD 550 ) / AMF, where AMF is the geometric air mass factor with a typical AMF value of 2.8. We have used this quantification of the uncertainty for the Level 3 product. Furthermore, we have defined the representation component of the error as the standard deviation of the values used in the aggregated product. Although a more accurate treatment for the representation error could be envisaged following for example the approach of van Leeuwen (2014), we deem small the impact that our approximation has on the analysis. Figure 3 shows an example of the MODIS Deep Blue Collection 6 Level 3 product for a day of May 2007 after the filter for dust-dominated conditions is applied.  Fig. 4.

AERONET
For validation purposes we have used observations from the ground-based stations of the global Aerosol Robotic Network (AERONET; Holben et al., 1998) of direct-sun photometers. These observations have not been assimilated in our test simulations. In particular, we have used their retrievals of column-integrated aerosol optical depth from direct-sun photometric measurements. The retrievals are obtained observing the extinction of direct solar radiation due to the presence of aerosols in the atmosphere. For this reason AERONET retrievals are not available under cloudy sky conditions and during night-time. These observations suffer from a relatively sparse spatial coverage but are very valuable for validation purposes as their uncertainty in these retrievals is estimated to be between 0.01 and 0.02 (Eck et al., 1999). Several studies have in fact used the AERONET data for validation purposes, or for the correction of biases in satellite measurements (Zhang and Reid, 2006;Hyer et al., 2010). We considered cloud-screened and qualityassured (Level 2.0) direct-sun AOD retrievals between 440 and 870 nm. AERONET AOD at 550 nm was obtained using the Ångström law.

Numerical simulation set-up
We have run a set of different experiments (listed in Table 1): a control experiment to produce a 5-day forecast (hereafter called the Control experiment) with the same operational configuration (but at a coarser resolution) and version that provides daily global forecast to the aforementioned ICAP multi-model ensemble, and which is initialized for dust from the previous day 24 h forecast (FC + 24). Assimilation experiments were run with NRL MODIS AOD (hereafter called the DA-NRL experiment) and with NRL MODIS AOD and MODIS Deep Blue AOD (hereafter called the DA-NRL-DB experiment) with a pre-processing to the observations as described in Sect. 4. Additionally, we have also run free ensemble simulations without assimilating any observation (hereafter called ENS-free-run). We have also run a 5-day forecast experiment initialized from the analysis produced by the DA-NRL-DB experiment (hereafter called the AN-initialized experiment) in order to evaluate the impact of the analysis on the forecast. The Control experiment was run for 5 months in the spring/summer period of 2007 (from 1 April to 31 August 2007) starting from a cold start for dust and with a spin-up period of 1 month (April 2007) which is excluded from the analysis of the results. Also, the ensemble is spun up before data assimilation is applied.
We use a 24 h assimilation window and observations are considered for assimilation at four time slots within the window, at 00:00, 06:00, 12:00, and 18:00 UTC. The system uses as first guess a 1-day forecast with output every 6 h. Simulated observation and background departures are calculated at each time slot. The time slots are exactly the ones corresponding to the times in which NRL MODIS AOD observations are available. We are using the LETKF implementation with a four-dimensional extension as described in Hunt et al. (2007). The state vector comprises the mixing ratio at all the time slots considered and so does the observation AOD vector. Background observation means y j and perturbation matrices Y j are formed at the various time slots j when the observations are available. They are then vertically concatenated to form a combined background observation mean y and perturbation matrix Y. y and Y are used for the calculation of a weight matrix w using the standard LETKF, i.e. we calculate a single w based on all innovations throughout the day. This same w is then applied to the state vector at different times throughout the assimilation window.
We have tuned different aspects of the data assimilation system, including testing the number of ensemble members, different perturbations of the ensemble, and a different state vector for the control variables. Using 24 ensemble members did not produce a significant impact on the dust analysis compared to the use of 12 ensemble members. This could be explained by our setting of a localization factor which makes the influence of an observation on the analysis decay gradually toward zero as the distance from the analysis location increases. We have set the horizontal localization factor to the value 1 in all the data assimilation experiments. This means that after two grid points the localization function is very close to zero. The value chosen is in the range of the ones used in previous studies such as Rubin et al. (2016) and Yumimoto and Takemura (2011). Covariance localization in fact effectively eliminates background spatial correlations after a certain distance, and might have solved possible sampling errors introduced by the low dimensionality of the 12-member ensemble compared to the 24-member ensemble. We also apply vertical localization following the Miyoshi and Yamane (2007) approach of localizing the error covariance vertically for radiance assimilation. The observation error is divided by the square of the model AOD normalized sensitivity function.
We have tested the usage of different perturbations of the dust emission scheme: a perturbation of the mass vertical flux per dust bin, or a perturbation of both the mass vertical flux and the threshold on the wind friction velocity. As we show in the next section, the latter configuration was deemed better as it spans a larger space of possible system states.
We have tested two different options for the state vector: a control variable consisting of the mixing ratio of eight individual dust bins or the total dust mixing ratio defined as the sum of the eight dust bins at each grid point and for all the vertical levels. In the latter case the mixing ratios for the in-dividual dust bin after data assimilation are determined from the background, i.e. from their relative fractions before assimilation. The observation operator is calculated using the original mixing ratio following the approach for the observation operator in Schutgens et al. (2010b). The tests that we have performed show that representing individually the bins in the state vector does not have any significant impact on the analysis, while it increases the computational cost of the assimilation compared to using the total mixing ratio. Moreover, the use of a bulk approach is common in systems assimilating total AOD values as the observations are not able to fully constrain the individual bin profiles. We should note that this choice of state vector makes still meaningful the creation of the ensemble perturbing the vertical flux for the individual bins, as this allows us to express in the background the uncertainty in the size distribution at sources, and to span a larger space of possible system states.
In the next section we show the results of assimilating NRL MODIS NRL and MODIS Deep Blue observations using 12 ensemble members obtained by perturbing the mass vertical flux per bin at sources together with the threshold on the wind friction velocity, as described in Sect. 3.1, and using the total dust mixing ratio as the analysis variable in the state vector. All simulations were run on a global domain with 40 hybrid pressure-σ layers, 5 hPa top pressure, and a horizontal resolution of 2.8 • by 2 • . The NCEP final analysis at 1 • by 1 • at 00:00 UTC was used to initialize the meteorology at every forecast run.

Methodology for the evaluation of the simulations
The evaluation of the simulations is done in three stages: (a) an internal check of the data assimilation system; (b) evaluation of the analysis using as reference independent observations; (c) evaluation of a 5-day forecast with and without analysis initialization using as reference independent observations.
The consistency of the data assimilation system is checked through considerations of statistics of the ensemble, of simulation departures from assimilated observations, and of the temporal mean of assimilation increments. The ensemble mean and the coefficient of variation for the ensemble are calculated with and without data assimilation. The coefficient of variation is defined as the ratio of the standard deviation of the ensemble to the ensemble mean. Additionally, statistics for first-guess (FG) and analysis (AN) departures are calculated, where departures are defined as the difference between assimilated observations and simulations (first-guess or analysis), while mean increments are defined as the temporal mean of differences between analysis and first guess at the different time slots of the assimilation window.
The evaluation of analysis and forecast with respect to independent observations are performed in terms of statistics of model field errors e i from observations, where e i =  o i +m i is added to the most widely used set of statistics for the error as it behaves symmetrically with respect to underestimation and overestimation without emphasizing the outliers, and is normalized to the sum of observation and simulation values. The SD of the error, though it can be derived from the other statistics, is also reported so as to make more explicit the changes in the bias-free mean square error and aid the interpretation of the evaluation results. The above set of evaluation statistics are calculated for measurements from individual ground-based stations, groups of stations, regional domains observed by satellite sensors, and globally.
For AERONET AOD measurements dust-dominated conditions are identified using the approach of Basart et al. (2009) as follows: AOD is classified as "Dust" AOD when the associated AE < 0.75; we set "Dust" AOD to 0 when the associated AE > 1.3; we identify a mixed aerosol type when the associated 0.75 < AE < 1.3. The latter values are excluded from the validation. We use the AERONET AOD value closest to the model time step and within a ±30 min interval. For satellite AOD retrievals we use the set of satellite observations quality-controlled and filtered for dustdominated conditions used in the assimilation step. We use these satellite observations to validate uniquely the forecast range following the assimilation window. We show the forecast evaluation statistics corresponding to measurements and simulations at 12:00 UTC only, so that they refer to an approximately equal number of pairs of observations and model simulated values at each forecast lead time that we are considering. Hence a smaller number of AERONET observa- tions (at 12:00 UTC only) are used to verify the forecast compared to the ones used in the evaluation of the analysis.
We have identified eight regions of interest for the validation purposes in our study period, namely Long Atlantic transport (LongAtl), Short Atlantic transport (ShortAtl), Sub-Sahel (SubSahel), Sahara (Sahara), Extended Mediterranean (ExtMediter), Middle East (MiddleEast), Central Asia (CenAsia), and East Asia (EastAsia). These names do not necessary correspond to the conventional names of exact geographical locations but are meant to identify regional domains in a convenient way according to dust intrusions and to group observational stations. Most of the regional domains contain ground-based stations with a minimum number of observations during the study period (stations with fewer than 30 "Dust" observations are discarded), with the exception of Central and East Asia. The ground-based stations are listed in Table 2 and shown in the map in Fig. 5 together with regional domains used for the validation of the experiments against either ground-based or satellite observations.

Ensemble, departure, and increment statistics
We compare here the dust fields in the Control, ENS-freerun, DA-NRL, and DA-NRL-DB experiments in terms of mean values and, when applicable, ensemble spread. Figure 6 shows the dust AOD values averaged over a month of the study period for the four above experiments, and the difference in AOD between the data assimilation experiments and the ENS-free-run. By visual inspection it can be noticed that the ensemble mean of the ENS-free-run experiment compares well with the Control experiment, which suggests that the ensemble perturbations are altering only to a small extent the model mean state as defined by a standard run. The analysis clearly shows conspicuous changes in the dust field compared to the Control experiment or the ENS-freerun. Figure 7 shows the coefficient of variation for AOD in the ENS-free-run and the data assimilation experiments. Data assimilation clearly lowers the values of the coefficient of variation in the regions where observations are present, with values lower for the DA-NRL-DB than for the DA-NRL experiment, which indicates a reduction of the ensemble spread due to the assimilated observations. The high values of the coefficient of variation in the Southern Hemisphere, with or without data assimilation, are due to the perturbation of the dust sources present in the southern part of the globe. These values are not negligible due to differences among the ensemble members normalized to small dust AOD values. The ensemble of Fig. 7 (and Fig. 6) is created by perturbing the emitted mass vertical flux for each dust bin and the threshold on the friction velocity generating dust horizontal flux. Creating the ensemble without perturbing the threshold on the friction velocity produces a reduced spread. See Fig. 8 for this second configuration of the ensemble with a coefficient  of variation for the ENS-free-run in the left panel and for the experiment with data assimilation in the right panel. Perturbing the threshold on the friction velocity has an impact on the spread also outside source regions because, as explained earlier, the spin-up period for the ensemble ensures that perturbations applied at the sources propagate everywhere. Furthermore, this ensemble configuration better represents model uncertainty since the ratio of the prior total spread (the square root of the sum of the ensemble background variance and the observation error variance) to the prior RMSE (of the ensemble background against NRL MODIS and MODIS Deep Blue observations) is closer to 1 compared to when no perturbation is applied to the threshold on the friction velocity. It should be noted, however, that this chosen ensemble configuration under-represents uncertainty since it has a prior total spread smaller than the RMSE (ratio equal to 0.82). Other better perturbations should be tested for a future implementation since an underrepresentation of the background uncertainty might translate into giving a lower weight to the observations with respect to the background. We evaluate in the rest of this section the assimilation experiments in terms of statistics of the departures of the analysis and first guess from the assimilated satellite observations. Figure 9 shows for May to August 2007 first-guess dust AOD (in the left panels) and analysis dust AOD (in the right panels) versus observations for the DA-NRL and DA-NRL-DB experiments. The departure statistics with respect to the two sets of observations that we have assimilated are in Table 3.
In both experiments a smaller scatter and a higher correlation coefficient for the analysis indicate that the assimilation improves the agreement with observations and hence a positive sanity check of the data assimilation system. The asymmetric behaviour of all the analysis scatter plots suggests that the system is more successful in correcting too high AOD values than correcting too low AOD values, which could be due to the fact that usually we have larger observation errors and a smaller ensemble spread for low AOD values. The BIAS is significantly smaller than the RMSE and the RMSE improves in the analysis over the forecast. The issue of a higher BIAS in the analysis departures compared to the first-guess departures has been identified in other assimilation systems (see Benedetti et al., 2009, Sect. 4) and might be attributed to the fact that AOD is a positive definite variable, as this provides a deviation from the Gaussianity condition in the prior which is assumed in the analysis step. A solution to this problem worth investigating in the future would consist in applying a transformation of the state variables into new variables which present Gaussian features, a procedure known as Gaussian anamorphosis (Amezcua and Van Leeuwen, 2014). Figure 10 shows global maps of mean dust AOD analysis increments, i.e. the monthly averaged difference between analysis and short-term forecast respectively in the case in which only NRL MODIS AOD observations are assimilated and in the case in which also MODIS Deep Blue AOD observations are assimilated. Both experiments show non-zero systematic increments which are to be interpreted as sys-    tematic corrections that these sets of observations are making, in particular removing mass close to sources and, to a lesser extent, adding mass in the outflow. The spatial distribution of the increments highlights the role that MODIS Deep Blue observations play in particular over the Sahara dust sources. There are some regions where the two data assimilation experiments show opposite increments. This could be due to unresolved conflicting biases between the two types of MODIS retrievals.

Validation of the analysis
We perform in this section a validation of the dust fields simulated either with or without data assimilation through a comparison with observations from ground-based stations that have not been assimilated for May to August 2007. We calculate the statistics for individual stations and for groups of stations. Figure 11 shows the time series of dust AOD values for May to August 2007 for the Control experiment (blue), for the analysis of the DA-NRL (green) and of the DA-NRL-DB (red) experiment, and for AERONET obser-vations in dust-dominated conditions (black) at six locations within the different regional domains of Fig. 5, which are in the proximity of dust sources (Tamanrasset in Algeria), affected by short-range dust transport (Dakar in Senegal, Ilorin in Nigeria, and Hamim in the United Arab Emirates), or affected by long-range dust transport in Europe (Lecce in Italy), and across the Atlantic (La Parguera in Puerto Rico). For reference, the MODIS AOD observations from the assimilated dataset (NRL and Deep Blue), which are at the closest distance and within a 2 • radius from the location of the AERONET station, are also included in the time series (magenta circles). Note, however, that these latter observations are not an independent reference for validation of the analyses, nor are they entirely representative of the observational constraint used to calculate the analysis in the given station location. The time series show an overestimation in the Control experiment of the optical depth near the sources, and to a smaller extent in the transport, which clearly suggests that the model tends to overestimate dust emissions. The current calibration for model version 1.0 has the shortcoming of accurately capturing long-range transport at the expense of an overestimation over the sources. This overestimation is reduced with data assimilation. By a first eyeball inspection, the AOD simulation variance is reduced by data assimilation and is more in accordance with the AOD observation variance.
Maps in Fig. 12 show results of validation statistics calculated for the full study period at each AERONET station for the three experiments performed. These maps allow us to appreciate the strongest features of the three simulations at individual AERONET stations and how those stations are representative of the regional domains that we have identified. The Control experiment shows that the strongest BIAS and highest RMSE are in the sub-Sahel region. The BIAS indicates that the model systematically overpredicts AOD in that region. The highest FRGEs are in the long transport over the Atlantic or Europe, as expected in areas of low AOD values. The correlation between model and observation values is in general lower near source areas than in outflow regions. This could be due to the too coarse model resolution not able to follow as well as the observations the dynamic of the dust field near source areas. The assimilation of MODIS NRL observations decreases some of the strongest biases, in particular in the dust outflow regions in the Sahel and over the African Atlantic coast, which is reflected in a reduced FRGE and RMSE, and is associated with improved correlation. The assimilation of the MODIS Deep Blue observations additionally to the NRL MODIS observations is of further benefit: it reduces the BIAS and RMSE downwind from the strongest dust sources of the Sahara. It is also relevant to notice that the additional assimilation of MODIS Deep Blue observations improves the correlation over the above areas and in the Arabian Peninsula.
The chart plots for the validation statistics calculated for all the AERONET stations considered (hereafter called global statistics) and for stations grouped according to regional domains of interest are respectively in Figs. 13 and 14. Global statistics show that assimilation produces in general a better representation of dust concentrations in the atmosphere, and that the assimilation of Deep Blue retrievals has a positive impact over the assimilation of Dark Target retrievals only.
When considering the regional domains, the assimilation of NRL MODIS AOD has a positive impact on the quality of the analysis everywhere, with the only exception of a slight increase in RMSE in the Middle East region. This positive impact is more pronounced in the short Atlantic transport and in the sub-Sahel region. The additional assimilation of MODIS Deep Blue AOD has a considerable positive impact in the Sahara, sub-Sahel, and Middle East regions, and is neutral or slightly detrimental in the rest of the transport, in particular in the long-range Atlantic transport. The correlations for the global domain and for all the regional domains are highly statistically significant with the exception of the Sahara region (in the Control and DA-NRL experiments only), where the number of observations is smaller than other regional domains.
It should be noted, however, when interpreting the above statistics that the validation against AERONET observations may introduce significant errors when comparing a global model grid box against a point observation (Schutgens et al., 2016).

Validation of the forecast
We have validated the forecast up to 5 days ahead initialized at 00:00 UTC from either the Control experiment or an analysis (from DA-NRL-DB). We have calculated for May to August 2007 the errors for the forecast at 12, 36, 60, 84, and 108 h (hereafter indicated as FC + 12, FC + 36, FC + 60, FC + 84, and FC + 108) with respect to either AERONET observations or satellite observations. As mentioned when describing our evaluation methodology, we use as a reference the set of satellite observations from the Dark Target and Deep Blue algorithm ingested in the assimilation step, i.e quality-controlled and filtered for dust-dominated conditions. They are used only to validate the forecast range following the assimilation window. As expected, all the validation statistics worsen with increased forecast step in both experiments (see Fig. 15 for global statistics). The impact of initializing the model with a dust analysis is positive on the first day. The analysis produces a better forecast in terms of BIAS and RMSE (and also SD of the error) up to FC + 108, and a better correlation on the first day. The correlation is slightly lower from FC + 36 onwards. The conclusions drawn by validating against AERONET or satellite observations are equivalent. Results calculated for regional domains (Fig. 16) show that the Control experiment tends Figure 14. BIAS, RMSE, CORR, FRGE, and SD for the Control experiment, the DA-NRL experiments, and the DA-NRL-DB experiment calculated against AERONET observations for groups of stations within the regional domains in Fig. 5. The dust mean AOD for the observations used for validation during the experiment period is also reported.
to overestimate AOD everywhere with the exception of Central and East Asia. This suggests an overestimation in particular of the Sahara emissions which is consistent with the bias found in the analysis and which is maintained during the forecast. The correlations for the global domain and for all the regional domains, at all forecast lead times, are highly statistically significant. Initializing the 00:00 UTC forecast with the DA-NRL-DB dust analysis reduces the overestima-tion compared to satellite retrievals on the first day of the forecast consistently with the improvement observed in the analysis in the previous section. However, this produces an underestimation of AOD in the long-range Atlantic transport during all the forecast lead times, which, because of the relatively small AOD values in that area, is reflected in particular in the FGRE. Although there is an overestimation of AOD, there is a better agreement of the temporal evolution in that Figure 15. BIAS, RMSE, CORR, and FRGE for the forecast at 12, 36, 60, 84, and 108 h of the Control (blue) and AN-initialized (red) experiments, i.e. the experiment initialized with the DA-NRL-DB analysis, calculated against AERONET observations (left) and against global satellite retrievals, both NRL MODIS and MODIS Deep Blue, (right) filtered for dust-dominated conditions. The AERONET stations are the ones in Fig. 5. The dust mean AOD for the observations used to validate the 12 h forecast during the experiment period is also reported.
region. The underestimation of AOD in the Atlantic transport might be due to too strong a deposition which affects in particular the long-range transport, and in the standard run is compensated for by an overestimation over the sources. As said earlier, a shortcoming of the current model calibration is to capture well the long-range transport at the expense of an overestimation over the sources, which data assimilation reduces. To identify the exact cause of it will require, however, further investigation together with a better adjustment of the current model parameters. With the exception of this underestimation of AOD across the Atlantic, all the error statistics and correlation coefficients are improved in the first day of the forecast in all the regional domains. The error of the analysis-initialized forecast is lower also in the rest of the forecast range (up to 5 days), though, after day 1, the correlation with satellite observations in some regions (Sub-Sahel and ShortAtl) is lower for the analysis-initialized forecast than for a standard forecast. It is particularly relevant to notice that the dust forecast over the Sahara is improved for all the statistics and throughout the forecast range.

Conclusions
We have developed a data assimilation system for the NMMB-MONARCH model version 1.0, which considers dust only, while other aerosols are being implemented. We have coupled NMMB-MONARCH with an ensemble-based data assimilation technique known as LETKF. For this purpose we have created a forecast ensemble based on known uncertainties in the physical parametrizations of the mineral dust emission scheme. We have processed satellite aerosol optical depth retrievals for assimilation with a dust filter. Due to the presence of other aerosols in the selection of dustdominated conditions, uncertainties might have been intro-duced into our assimilation process. It should be noted however that the identification of dust-dominated conditions is performed in this study as a proof of concept to demonstrate the potential of using data assimilation in NMMB-MONARCH, and will not be strictly necessary in a future model upgrade including all the major aerosol species. Still, efforts towards aerosol speciation could continue to be pursued when assimilating information about total aerosol optical properties. In this respect, operational centres currently rely merely on model background to distribute assimilation increments among the different aerosol species.
Assimilation experiments showed that aerosol optical depth retrieved with the Dark Target algorithm can help NMMB-MONARCH to better characterize atmospheric dust. This is particularly true for the analysis of the dust outflow in the Sahel region and over the African Atlantic coast. The additional assimilation of Deep Blue retrievals has a further positive impact in the analysis downwind from the strongest dust sources of the Sahara and in the Arabian Peninsula.
An analysis-initialized forecast performs better (lower forecast error and higher correlation) than a standard forecast everywhere on the first day of the forecast. The only exception to this is an underestimation of the forecast of AOD in the long-range Atlantic transport. The error of the analysisinitialized forecast is lower also in the rest of the forecast range (up to 5 days), though, after day 1, in sub-Sahel and short Atlantic transport the temporal evolution of dust is less in agreement with independent observations, compared to a standard forecast. Particularly relevant is the improved forecast over the Sahara throughout the forecast range thanks to the assimilation of Deep Blue retrievals over areas not easily covered by other observational datasets. To the best of our knowledge, this is the first study quantifying the benefit of assimilating MODIS Deep Blue from Collection 6 specifi- Figure 16. As the right panel of Fig. 15 but for the different regional domains of Fig. 5. cally for mineral dust simulations. This product is currently operationally assimilated by the UK Met Office, who consider only Deep Blue observations over desert, and by the European Centre for Medium-Range Weather Forecasts.
In our future implementation of the forecast ensemble, we plan to exploit spatial patterns of variation in model parameter uncertainty, for example source-dependent uncertainties, as well as uncertainties in the deposition term. A better representation of uncertainties in dust emission flux inherently will help the representation of uncertainties in other parts of the dust cycle. A recent study by Rubin et al. (2016) shows that, for their system, combined meteorology and aerosol source ensembles are necessary to produce sufficient spread in outflow regions. Notwithstanding that their conclusion might be system-dependent, we will take into account their results in our future studies.
Code and data availability. Copies of the code and data are readily available upon request from the corresponding authors.
Competing interests. The authors declare that they have no conflict of interest.