Accounting for model error in air quality forecasts: an application of 4DEnVar to the assimilation of atmospheric composition using QG-Chem 1.0

Abstract. Model errors play a significant role in air quality forecasts. Accounting for them in the data assimilation (DA) procedures is decisive to obtain improved forecasts. We address this issue using a reduced-order coupled chemistry–meteorology model based on quasi-geostrophic dynamics and a detailed tropospheric chemistry mechanism, which we name QG-Chem. This model has been coupled to the software library for the data assimilation Object Oriented Prediction System (OOPS) and used to assess the potential of the 4DEnVar algorithm for air quality analyses and forecasts. The assets of 4DEnVar include the possibility to deal with multivariate aspects of atmospheric chemistry and to account for model errors of a generic type. A simple diagnostic procedure for detecting model errors is proposed, based on the 4DEnVar analysis and one additional model forecast. A large number of idealized data assimilation experiments are shown for several chemical species of relevance for air quality forecasts (O3, NOx, CO and CO2) with very different atmospheric lifetimes and chemical couplings. Experiments are done both under a perfect model hypothesis and including model error through perturbation of surface chemical emissions. Some key elements of the 4DEnVar algorithm such as the ensemble size and localization are also discussed. A comparison with results of 3D-Var, widely used in operational centers, shows that, for some species, analysis and next-day forecast errors can be halved when model error is taken into account. This result was obtained using a small ensemble size, which remains affordable for most operational centers. We conclude that 4DEnVar has a promising potential for operational air quality models. We finally highlight areas that deserve further research for applying 4DEnVar to large-scale chemistry models, i.e., localization techniques, propagation of analysis covariance between DA cycles and treatment for chemical nonlinearities. QG-Chem can provide a useful tool in this regard.


Introduction
In recent years, data assimilation (DA) of atmospheric constituents has become a key tool for providing more accurate forecasts and reanalyses of the atmospheric composition. The increasing availability of chemical observations from both satellites and ground-based instruments allowed to reduce the uncertainty of atmospheric chemistry models in a large number of applications. Utilization of DA can be found in the modeling of volcanic ash (Lu et al., 2016), in operational air quality forecasts at continental scale  or in the reanalysis of the global atmospheric composition at decennial scale (van der A et al., 2010;Inness et al., 2013). Data assimilation can also be used to infer surface fluxes of long-lived chemical compounds (Thompson and Stohl, 2014;Chevallier et al., 2005). A review of the utilization of data assimilation for the atmospheric composition can be found in Zhang et al. (2012) and Bocquet et al. (2015).
The main goal of DA is to reduce the uncertainties of a model through a timely combination of model results and observations. This is generally done by means of correcting the so-called "control variables" of the given model. with the model initial state (Elbern et al., 1997) or chemical emissions (Chevallier et al., 2005). However, inaccurate identification of the model uncertainty can lead to a wrong adjustment of the model through DA, even though the spread between model predictions and assimilated observations is reduced (Tang et al., 2016). The choice of the control variable, and the approximate knowledge of its uncertainty (also named background error covariance), is therefore critical for the design of an appropriate assimilation algorithm and to ensure correct results of DA.
Principal sources of uncertainty of atmospheric chemistry models include the model initial condition, model ancillary data or parameters, model physical parameterization, chemical mechanism, etc. (Beekmann and Derognat, 2003;Mallet and Sportisse, 2006). Since different chemical species can be sensitive to different physical and chemical processes, the main sources of uncertainties can also differ from species to species. For example, long-lived species like CO 2 or CO are mostly sensitive to uncertainties in surface fluxes, which can be corrected using variational algorithms in combination with long assimilation windows of several weeks (Chevallier et al., 2005;Koohkan and Bocquet, 2012). This is possible since the chemical reactivity and the sensitivity to the initial condition are negligible for long integration of the model. Uncertainty in the transport processes is also generally neglected (Babenhauserheide et al., 2015). Data assimilation of short-lived gases like tropospheric O 3 or NO 2 , which are encountered in air quality applications, is instead trickier. O 3 and NO 2 are involved in rapid chemical reactions and sensitive to several model parameters ranging from reaction rates, emissions of primary species, clouds and radiation, boundary layer mixing, etc. It is much more difficult in this case to identify a single and predominant source of uncertainty in the model predictions.
In most air quality operational models a pragmatic choice is currently made by setting the control variable to the initial state of the measured species and using short forecast/assimilation cycles (e.g., 1 hour). Since ground-based measurements are generally available at hourly frequency, most used sequential DA algorithms, such as optimal interpolation (OI), 3D-Var or ensemble Kalman filters (EnKFs) , all provide a strong constraint on model trajectories (Wu et al., 2008). This strategy gives robust results for operational analyses because chemical fields are corrected every hour. Since the control variable correspond to the assimilated observations, this also permits to estimate the background error covariance from previous validation of the model against observations (Hollingsworth and Loennberg, 1986) and keeps from the difficult diagnosis of the true model uncertainties.
However, the model dynamics are neglected in OI or 3D-Var DA schemes. Attempts of using ensembles of model analyses to specify the background error covariance within 3D-Var dynamically did not show clear improvements over the static case (Jaumouillé et al., 2012). Specification of cross correlations between interacting chemical species in the 3D-Var background error covariance matrix is also particularly difficult, because chemical interactions depend on the local concentrations and on meteorological conditions. As a consequence, multivariate chemical DA with 3D-Var schemes has not been yet documented in the literature. In EnKF systems, the forecast model is used to propagate and estimate the background error covariance, but ad-hoc adjustments are necessary to avoid the collapse of the ensemble variance and obtain realistic covariance matrices for 1 h forecasts (Gaubert et al., 2014;Constantinescu et al., 2007a). As a result, costly algorithms such as EnKF or 3D-Var hardly give better results than more simple OI for chemical reanalyses (Rouil and the MACC team, 2014). More importantly, very little improvement is obtained, regardless of the employed DA algorithm, for the next-day model forecast (Wu et al., 2008). Forecasts of reactive gases such as O 3 or NO 2 , but also other pollutants such as aerosols mixtures (PM 10 and PM 2.5 ), depend weakly on the initial condition and are more sensitive to model settings such as surface emissions or physical parameterizations. Current operational systems can achieve accurate reanalyses of observed chemical species through DA but parameter estimation and, more generally, model errors must be taken into account in DA to improve chemical forecasts of reactive gases and particles.
Some studies evaluated more advanced DA algorithms to jointly correct surface emissions of precursor species and initial condition of observed species. For example, Elbern et al. (2007) employed a 4D-Var scheme in combination with assimilation windows of 24 h to assimilate O 3 , NO x and SO 2 measurements. A similar study has been done also in the context of a toy model experiment by Hamer et al. (2015), where only emissions of precursor species are adjusted to improve O 3 forecasts. Results seem promising but still rely on the assumption that the model is almost perfect, i.e., that there are no additional sources of uncertainties in the model forecast other than the controlled variables (the initial state and the selected emissions). This can lead to the overcorrection of control variables when other non-negligible model errors exist, for example, due to the meteorological forcing, photochemistry coefficients, dry or wet deposition. Concerning EnKF implementations, some authors also tested joint optimization of the chemical state and precursor emissions (Miyazaki et al., 2012;Tang et al., 2011;Constantinescu et al., 2007b). EnKF naturally includes model uncertainties in its formulation, which can be added through stochastic perturbation of model parameters during the ensemble forecast (Evensen, 2003). However, EnKF corrects the model trajectories sequentially. In a typical air quality context, the emissions of O 3 precursor species (e.g., NO x and VOCs) in the early morning or night can affect the concentration of observed species (e.g., O 3 ) in the early afternoon, when the photochemistry takes place. When using EnKF, the information made available by afternoon measurements cannot be used to correct the model at previous hours. Based on above-mentioned facts, we determine that the following capabilities are needed to further improve DA in air quality applications: permit simultaneous assimilation and optimization of multiple chemical species (multivariate DA), with possible chemical interactions and very different lifetimes; include model error and account for disparate sources of model uncertainty; allow long assimilation windows to make best use of the information content of frequent air quality observations. This can be accomplished by using the so-called "weak constraint" 4D-Var (Trémolet, 2006), which is an extension of the 4D-Var algorithm that accounts for the model error. On top of the operators already needed to perform the strong constraint 4D-Var (e.g., the tangent linear and adjoint codes of the forecast model), the formulation of the weak constraint 4D-Var requires the definition of the model error covariances Q, which can be difficult to estimate in real applications (Trémolet, 2007). Recent studies have shown that the linear operators needed in the weak constraint 4D-Var formulation can be approximated through an ensemble of model forecasts.
For example, the 4D-Var-EnKF (Mandel et al., 2016) uses an ensemble approach to mimic the tangent linear and adjoint model for the minimization of the weak constraint 4D-Var cost function. The iterative ensemble Kalman smoother (IEnKS, Bocquet and Sakov, 2014) is a nonlinear 4DEn-Var formulated under perfect model assumptions, which can also be used to estimate erroneous model parameters through an augmented state formalism (Bocquet and Sakov, 2013). A major asset of IEnKS for chemistry applications is that it can also account for strong nonlinearities of the forecast model. The 4DEnVar method (Desroziers et al., 2014) uses an ensemble of nonlinear model trajectories to estimate both the error covariances for the initial condition and the model error, as well as to approximate the tangent linear and adjoint model. These approaches are generally referred in the literature as ensemble variational EnVar (Lorenc, 2013), as opposed to hybrid methods, which make use of ensembles only to specify error covariances matrices in variational algorithms (Belo Pereira and Berre, 2006). EnVar methods have a major advantage for atmospheric chemistry applications: they avoid the construction of tangent linear and adjoint codes of the forecast model, which are still lacking for most of the operational CTMs, or are becoming very difficult to be maintained due to the rapid evolution of models and computer architectures. The main advantage of the 4DEnVar method is that it permits to account for a generic model error through the addition of stochastic perturbations during the model integration step (like in EnKF). Moreover, it focuses exclusively on the estimation of the model state, which is the only variable that is directly constrained by observations. This avoids the difficult specification of Q still needed in the 4D-Var-EnKS. As all ensemble-based methods, 4DEnVar also naturally supports multivariate chemical DA, with the cross-covariance terms between chemical species being automatically obtained from the ensemble of the nonlinear model forecasts.
Variants of the 4DEnVar have been already tested in real numerical weather prediction (NWP) applications (Lorenc et al., 2015). The method has proven to be affordable for large-scale operational NWP models, even though the skills of the operational hybrid 4D-Var are not yet matched. To the knowledge of the authors, EnVar-type methods have not yet been implemented in air quality or atmospheric chemistry models and only one study has already examined the potential of EnVar methods for chemical DA (Haussaire and Bocquet, 2016). Note that, operational NWPs are already based on well-matured 4D-Var DA systems, whereas very few atmospheric chemistry models are based on such systems. Therefore, there is more room for improvement from the EnVar type of algorithms in air quality models than in NWP. Hence, the main objectives of this study are to present a new atmospheric chemistry toy model built for assessing and comparing performances of several DA algorithms; examine the potential and limits of the 4DEnVar algorithm for air quality analyses, compared to the generally used 3D-Var; present a new procedure based on 4DEnVar to improve chemical forecasts on the next day.
The purpose is to examine state-of-the-art DA algorithms in the reactive gases/air quality context and, therefore, to guide future developments for the operational DA systems. Four gaseous species with very different lifetimes and chemical mechanisms, currently well observed either from satellites or from ground-based instruments, are considered for this study (CO,O 3 , NO 2 and CO 2 ). Using a simplified model in this context permits faster implementation of complex DA algorithms, cheaper numerical experiments and more straightforward interpretation of the DA results (Fairbairn et al., 2013). The latter is particularly true compared to DA experiments done using real observations, with generally unknown error statistics. Compared to already-mentioned simplified models (Hamer et al., 2015;Haussaire and Bocquet, 2016), which are, respectively, 0-D and 1-D, the newly proposed model is 3-D and uses the same tropospheric chemistry scheme of operational air quality models. This allows us to reproduce more features of real models, for example, the complex interactions of reactive chemistry and large-scale advection or the effect of boundary conditions. This also permits to better examine typical issues of DA within large systems, like the emergence of sampling errors due to the finite size of the ensemble and the consequences of localization techniques. Additionally, the use of 3-D fields and operators eases the estimation of numerical costs and possible bottlenecks of the DA algorithm in terms of operational implementation. We remind readers that the objective of this study is to demonstrate the applicability of a DA algorithm that could outperform currently implemented methods in operational centers, but with an acceptable compromise between computational costs and precision. Finally, the toy system has been implemented using the library for data assimilation OOPS (Yannick Trémolet, personal communication, 2015) to ease the exchange of assimilation algorithms or toy models between scientists. The paper is outlined as follows. The developed atmospheric chemistry model will be presented in Sect. 2. A summary of the data assimilation algorithms employed in this study is given in Sect. 3. The first section of the numerical results (Sect. 4.1) presents a number of DA experiments done under the hypothesis of the perfect model. A detailed comparison between 4DEnVar and 3D-Var is presented, as well as a sensitivity study on the principal parameters of the 4DEnVar algorithm, i.e., the ensemble size and the localization choices. In Sect. 4.2, the effects of a model error are investigated using the 4DEnVar algorithm. A statistical comparison of the 4DEnVar and 3D-Var performances on multiple cycles of analyses and forecasts is presented in Sect. 4.3. Finally, conclusions are given in Sect. 5.

Model description
A new atmospheric chemistry low-order model has been developed for this study and named QG-Chem. The objective was to reproduce typical features of chemical fields from large-scale chemical transport models (CTMs), but maintaining the computational cost low enough to allow a comfortable usage on a personal computer. The meteorological forcing is computed using a two-layer quasi-geostrophic (QG) model, representative of midlatitude mesoscale dynamics (Pedlosky, 1992). The QG wind field is used to advect the chemical species, which makes QG-Chem a coupled meteorological-chemistry model. This choice permits to examine the behavior of DA in the presence of complex gradients of wind fields and vorticity. Since all DA algorithms make strong assumptions on the model dynamics (Sect. 3), it is important to test them in the presence of advection patterns that can be found in real applications. Nevertheless, the focus of this study remains on atmospheric chemistry. Therefore, a detailed tropospheric chemical mechanism has been considered. Aspects of DA concerning the coupling between meteorology and chemistry are also left for future work, with the present using QG-Chem in a CTM-like mode. The details of the meteorological and chemical models are given next.

Quasi-geostrophic meteorology
The two-layer QG model is a geophysical fluid model composed of two atmospheric layers of fixed depth and potential temperature. It is a simple model of the atmosphere at midlatitudes, whose main forcings are represented by the Coriolis force and the orography or surface heating. The governing equation is the conservation of the potential vorticity q = (q 1 , q 2 ) expressed in nondimensional variables (Fandry and Leslie, 1984): where the subscripts 1 and 2 stand for the top and bottom layer, respectively. ∇ 2 is the two-dimensional Laplacian, R s represents orography or heating, β is the (nondimensionalized) northward variation of the Coriolis parameter at a fixed latitude, F 1 and F 2 couple the layers together being a function of Coriolis force, layer depths, gravity and typical length scale. The stream function ψ = (ψ 1 , ψ 2 ), whose horizontal derivatives give the horizontal wind field (u i , v i ), can be considered as the model state vector. The code of the QG model that is distributed with the OOPS DA library have been used for this study. The depth of the two layers, the resolution of the horizontal grid and the integration time step t are the main model parameters that can be set at runtime. The dimensional scaling and model orography are fixed, as well as the extension of the domain, which is 12 000 km in the zonal direction and 6300 km in the meridional direction. The domain is cyclic in the east-west direction, i.e., the model fields are periodic in this direction. The stream function is set to climatological values at meridional walls (Dirichlet boundary conditions). For all the experiments presented in this study, a coarse resolution of approximately 750 km (16 × 8 grid points, respectively, for the east-west and north-south directions) has been used. We remind readers that the focus of this study is to test chemical DA algorithms in a toy model framework. Therefore, there is no stringent requirement on the realism of the meteorological fields and no need to reproduce a real atmospheric situation. The only desired property is to obtain wind fields that exhibit typical patterns of the complex atmospheric circulation. A summary of the QG model parameters used in this study is detailed in Table 1.

Tropospheric chemistry
The state vector of the QG model has been extended to include chemical species. The regional atmospheric chemical mechanism (RACM) (Stockwell et al., 1997), which describes 96 chemical species with about 300 reactions, has been implemented. This chemical scheme has been developed for air quality modeling and is currently used by a number of operational models in Europe . Photochemistry and its diurnal cycle are included via look-up tables, assuming global clear-sky conditions. Surface fluxes of chemical species (emissions and dry deposition velocities) are assigned at runtime and are kept constant during the temporal integration of the model. Chemical species are advected by the QG wind field using the semi-Lagrangian scheme used for solving the QG governing equation (Eq. 1).
After the advection of the species, their concentrations are updated by addition of the chemical tendencies. They are computed by solving the stiff ODE system that describes the adopted chemical mechanism. The ODE system is of the nonlinear form: where C represents the local species concentrations, P (C) and L(C) the production and loss terms. The stiffness of the systems comes from the wide range of values that can take the loss terms leading to a large range of chemical lifetimes.
The above system is integrated using the adaptative semiimplicit scheme (ASIS). ASIS is a one-step semi-implicit scheme with prognostic time steps. To solve the system of linear equations associated with the semi-implicit scheme, ASIS uses the generalized minimal residual method (Saad and Schultz, 1986) which appears to be very competitive in terms of computation time with good convergence properties. The ASIS solver is mass conservative and adapts its sub-time step to the adopted tolerance errors as described by Verwer (1994). In our application, the ASIS solver uses a 7 s minimum sub-time step, an absolute error tolerance of 10 4 molecules cm −3 and a relative tolerance error of 0.01. Therefore, a common integration time step (dt) for both the dynamical and chemical solvers is used, which is set to dt = 10 min to ensure reliable chemistry solutions. The dry deposition is computed for each species before the application of the ASIS solver, i.e., using the main model time step dt. Concentrations are updated according to where i denotes the chemical species and λ i is the corresponding deposition timescale in s −1 , proportional to the deposition velocity. The chemical field is set equal to climatological values on the N-S boundaries, which correspond to Dirichlet-type conditions. The values of surface pressure and temperature used by the chemical mechanism are fixed globally and do not depend on the QG fields. These modeling choices let this study focus on the following main processes of air quality models: emissions, chemistry and transport. Therefore, only the bottom layer of the QG-Chem model will be analyzed throughout this study. A summary of the configuration of QG-Chem is given in Table 1.

Description of the case study
A model run of 20 days is performed prior to DA experiments, starting from the initial condition given in Table 2, which corresponds to a homogeneous, relatively clean atmosphere and zonal circulation. Emissions (Table 3) are taken from the study of Crassier et al. (2000), FLUX case, representative of the urban environment of Paris. Spatial heterogeneity of model concentrations is obtained by scaling the reference emissions in Table 3 by 0.01, 1 and 0.25, respectively, on the western, central and eastern parts of the domain (Fig. 1). Deposition velocities from the French air quality model MOCAGE , averaged over the Paris region during the month of July 2010, are used and set constant over the QG-Chem domain. Values are reported in Table 4. The chemical concentrations at the meridional boundaries are set to the same values as in Table 2. Hence, the presence of N-S boundaries can eventually counterbalance the growth of long-lived species by advection of clean air masses from outside the domain. This configuration relates to regional air pollution modeling mainly concerning the type and amplitude of chemical emissions, spatial heterogeneity of sources and presence of boundaries.
Results for four key species are considered through the study: nitrogen dioxide (NO 2 ), ozone (O 3 ), carbon monoxide (CO) and carbon dioxide (CO 2 ). The first three are of concern for air quality, since they have an elevated toxicity and their concentration is strongly related to anthropogenic emissions. The chemical reactivity and typical tropospheric lifetime of NO 2 , O 3 and CO is, however, very different. NO 2 typically arises from the oxidation of nitric oxide (NO) in combustion processes. It is highly reactive, lasting in the atmosphere from a few hours in summer to several days in winter, and it is Table 2. Initial conditions used to initialize the truth simulation. The initial fields are constant over the QG-Chem domain. Same values are also assigned to the meridional boundaries of QG-Chem during all simulations. Values are equal to zero for chemical species that are not listed below. Chemical concentrations are expressed in volume mixing ratio units (vmr).

Variable
Value 150 × 10 −9 Cl 1 × 10 −12 HOCl 1 × 10 −13 HCl 1 × 10 −12 Br 1 × 10 −13 BrO 1 × 10 −13 HBr 1 × 10 −13 an ozone precursor. O 3 is a secondary gas, mainly formed by the reaction of nitrogen oxides and hydrocarbons under sunlight. In the troposphere, it has a typical atmospheric lifetime of 2-3 weeks. CO is produced by partial oxidation of carbon compounds, which can occur in industrial or natural combustion processes. It also participates in O 3 chemistry and has an atmospheric lifetime of about 1-2 months. Finally, CO 2 is of major concern for its effect on climate and has the longest lifetime among all the considered species (> 30 years). However, CO 2 fluxes are not activated in the experiments, which makes this gas behave like a passive tracer in our study. Averaged model fields (24 h) on day 20 are shown in Fig. 2. We note the presence of zonal gradients of chemical concentrations for species that are strongly related to surface emissions (e.g., NO 2 , O 3 and CO), which are higher in the central part of the domain (Fig. 1). The developed cyclonic circulation also allows the accumulation of longerliving pollutants (O 3 , CO) in correspondence with the central low-pressure system, whereas patterns of short-living gases (NO 2 ) maintain a stronger similarity with the geographical distribution of the sources. The maps also display the influence of meridional boundary conditions, which produce local minima in O 3 and CO fields in correspondence with the advection of clean air masses from outside the domain. The average model trajectory during 24 h shows significant differences among all considered species. Since all species are advected and surface fluxes are constant in time, these features arise from the complex chemical interactions and photochemistry. Note, for example, the daylight increase of O 3 as a consequence of NO 2 production from NO emissions during nighttime and daytime photolysis. In contrast, CO shows an almost linear increase in time, due to a constant surface emission and longer lifetime. Most of the numerical experiments that are shown later in this study start on the abovediscussed day.

Data assimilation algorithm
We considered two data assimilation algorithms in this study: 3D-Var and 4DEnVar. The first is the simplest type in the family of variational DA algorithms and currently the most used in operational chemical assimilation systems . It is taken as a reference against which the benefits of more complex (and costly) algorithms can be assessed. 4DEnVar is an hybrid algorithm that combines benefits of variational and ensemble methods. It is already used in a number of NWP models (Buehner et al., 2010;Lorenc et al., 2015) and was tested in the framework of meteorological toy models (Desroziers et al., 2014;Fairbairn et al., 2013). A summary description of the two algorithms is given below, as well as some specific aspects relative to the atmospheric chemistry implementation presented in this study. In the third section, a method based on the postprocessing of 4DEnVar output is proposed to correct model biases.  The Object Oriented Prediction System (OOPS), a generic software framework to develop data assimilation systems (Y. Trémolet, personal communication, 2015), was used to implement and run all the DA experiments described in this study.

3D-Var
The 3D-Var analysis can be computed after a model forecast time step by means of minimizing the quadratic cost function J (Kalnay, 2003): for the increment δx used to correct the previous forecast x b (also named background), i.e., x = x b + δx, where x is the control variable (e.g., the 3-D chemical state). Here, B and R are the background and observation error covariance matrices, H the linearized observation operator that transforms an increment of the control variable into an increment in the observation space, δy o the difference between the observations vector y o and the previous forecast , using the nonlinear observation operator. The minimization of J can be achieved with standard techniques for the solution of large linear systems: the B-preconditioned conjugate gradient algorithm has been used in this study (Derber and Rosati, 1989). The result of the minimization is the analysis increment δx a (3-D). To advance in time, the analysis x b + δx a is used as the new initial condition for the following forecast step and so forth. The relative simplicity, efficiency and robustness of the 3D-Var algorithm make it very suitable for operational models. Its practical implementation requires mainly the development of a covariance model for B (Weaver and Courtier, 2001). Multivariate chemical assimilation can be performed with 3D-Var by extending the 3-D control variable x to contain multiple 3-D model variables (chemical species).
In this study the control variable x is set to represent the complete model state, i.e., the stream function ψ plus the 96 chemical species. The covariance matrix B is modeled through the sequential application of 1-D square root correlation operators and a diagonal matrix, representing the background error standard deviation: where C z , C x , C y and C v are, respectively, the vertical, zonal, meridional and multivariate correlation operators and is the variance (diagonal matrix). C x and C y are isotropic homogeneous correlation operators providing Gaussian spatial structures. The other correlation operators are represented by symmetric positive-definite matrices. The following parameters are used to set B: one horizontal-length scale that defines the decorrelation scale for the zonal and meridional coordinates, one value for the vertical correlation and one value for the chemical correlation between each couple of model variables. The variance is specified using one global value for each variable. The resulting B is uniform and homogeneous on the horizontal plane. More complex B models could be introduced to account, for example, for spatial variability and heterogeneity of the background error covariance. However, this is out of the scope of the present study, which is intended to reproduce typical operational chemical DA settings, where B is usually specified using a single variance and correlation length for each chemical species. Only surface observations are considered in this study. Therefore, the observation operator H is represented by the bi-linear interpolation of model values to the observation location. A diagonal observation error covariance R is used, as it is the case in most real DA systems.
One drawback of 3D-Var is that DA results rely strongly on the background error covariance B, which should depend on the previous assimilation cycles and forecast errors (flow dependence). In practical applications, B is usually set constant, estimated from verification of previous forecasts (climatological) and/or tuned to provide the best fit of the analyses against independent observations. In the case of multivariate chemical assimilation, the estimation and validity of climatological error covariances between chemical species has not yet been demonstrated. As a consequence, multivariate corrections are normally neglected (C v = I). This simplification is also used in this study. Finally, 3D-Var provides a correction to the control variable each time the system is observed (every hour in air quality applications). This does not allow to exploit the dynamical information contained in observation time series and prevent all estimation of model error terms.

4DEnVar
The 4DEnVar algorithm can solve the main drawbacks of 3D-Var by introducing the temporal dimension in the quadratic cost function (Eq. 6), similarly to what the classic 4D-Var algorithm does (Dimet and Talagrand, 1986). However, compared to the latter, it avoids the introduction of the tangent linear and adjoint codes of the forecast model. Following the notation in Desroziers et al. (2014), the 4DEnVar cost function can be written as where all underlined terms are now time dependent (4-D).
The control variable x becomes the temporal trajectory of the model state (ψ plus the 96 chemical species in this study). The cost function is computed for an assimilation window that can span several hours or days. The assimilation window is discretized with an arbitrary number of sub-windows, which defines the temporal dimension of the 4-D vectors and matrices in Eq. (9). The minimization of J returns a 4-D vector δx a , which provides the analysis trajectory for the entire assimilation window. The forecast error covariance B e is estimated from an ensemble of perturbed model trajectories: where x are the four-dimensional perturbation and L denotes the size of the ensemble. It results that B e describes spatial (3-D), multivariate and temporal covariances at once. An analogy with the 4D-Var weak constraint cost function shows that B e represents a numerical approximation of the linearized forecast model and model error covariance (Desroziers et al., 2014). As for the 3D-Var case, the minimization of J is achieved using a B-preconditioned conjugate gradient algorithm (algorithm no. 3 in Desroziers et al., 2014). H and R are block diagonal matrices whose blocks are the operators H and R defined in Sect. 3.1 for each subwindow.
A nice property of the 4DEnVar algorithm is that the effect of the model error covariance, generally denoted by Q (Trémolet, 2006), can be introduced easily by adding physical perturbations during the computation of the ensemble of forecasts. This lets the model dynamics develop the complex covariances that derive from physical perturbations, without the need to specify and implement a covariance model for Q. Whenever the sole initial condition of the ensemble is perturbed at the beginning of the assimilation window, the 4DEnVar approximates the 4D-Var algorithm in the strong constraint formulation. The flexibility of the model error specification in 4DEnVar is a strong asset for atmospheric chemistry DA, where the sources of uncertainty can be highly variable and a function of the chemical species.
The main drawback of 4DEnVar in practical applications to large-scale models is related to the finite size of the ensemble. As a consequence, the 4-D error covariance B e is not fully ranked and covariance terms may contain statistical noise. The effect of a noisy covariance is the appearance of spurious analysis increments far away from assimilated observations, or between physically uncorrelated model variables (in the multivariate case). Since with 4DEnVar temporal covariances are also estimated through the ensemble, this effect concerns also the time dimension. This is a typical issue with all ensemble-based methods and large systems, and demands the introduction of a localization operator, which attenuates non-local increments. The localization is applied to B e by where • denotes the Schur product (entry-wise product) and C is a 4-D correlation operator that damps non-local covariances. The numerical implementation of Eq. (11) is made under the following approximations: the same 3-D (and multivariate) correlation operator C is used for all 4DEnVar subwindows. It follows that C is a block matrix with all elements set equal to C. Hence, in order to specify C, we could use the covariance operator described in Eq. (7) by setting the variance terms to one. The choice of parameters of the correlation operators is discussed in the results section. This simplification, also called static localization, significantly reduces the numerical cost of the algorithm (Desroziers et al., 2014) at the price of degraded precision when increasing the length of the DA windows (Bocquet, 2016). The development of localization procedures that are more consistent with the dynamics of the forecast model is an ongoing research topic (Bocquet, 2016;Desroziers et al., 2016) and possible applications to QG-Chem will be considered in a future study.

Diagnosis of model error and forecast correction with 4DEnVar
One of the objectives of this study is to retrieve model error information from the 4DEnVar solution, which can be potentially used to improve the chemical forecasts for the next day. The 4DEnVar analysis increment δx a accounts for the correction of both the initial condition and the model forecast (model error). However, the control variable δx a only contains the model state. It follows that the correction of the initial condition is simply given by δx (t=0) a , i.e., the first element of the 4-D vector δx a . The correction of the model error contributes to the values of δx (t>0) a but a diagnostic procedure has to be applied to retrieve it.
We can compute a forecast trajectory x f using the nonlinear model starting from the updated initial condition x Subtracting it from the 4DEnVar analysis at t > 0, we obtain This difference contains information about the contribution of the model error in the 4-D state, as explained below. Let us first define the model error η and analysis error e a as where x (t) * is the truth at time t, M (t−1)→(t) is the integration of the model from time (t − 1) to time t, with the temporal discretization matching the length of 4DEnVar sub-windows (Sect. 3.2). From Eq. (13), we have . .
where we have used the Taylor expansion and assumed that the forecast model can be linearized inside each sub-window: This equality can be rewritten as Using the definition (Eq. 14) and the linear assumption on the model, Eq. (12) yields Substituting the equality (Eq. 15) into the latter, we get Therefore, if the error (e (t) a ) is small, the vector x (t) represents the contribution of the model error η on the 4-D state, accumulated through the preceding 4DEnVar sub-windows. Hereafter, we will name x (t) effective model error, to be distinguished from the model error, which is generally associated with η.
In a first instance, x can be used to diagnose the underlying presence of model error in the forecast. Furthermore, if the model error is stationary along multiple DA windows (bias), x could be used to correct the forecast for the next window: where the superscript denotes the assimilation window index and the x f is the corrected forecast. Validation of operational air quality models shows that biases contribute to a large part of the model uncertainties Zyryanov et al., 2012;Huijnen et al., 2010), which justifies the implementation of the proposed bias correction procedure. Note that this procedure is compatible with model biases that show an hourly variability but are stationary on successive days, as it is found in most of air quality models Gaubert et al., 2014). Note also that the computation of the effective model error is blind to the type of the underlying model error (e.g., chemical emissions or physical parameterizations). The two main requirements needed to make a useful estimation of x are (i) analysis errors smaller than model errors and (ii) an approximate knowledge of the sources of model error, necessary to generate informative ensembles. Alternatively from the proposed procedure, model error terms η (t) could be diagnosed for each sub-window and then applied to the next-day forecast on an hourly basis. However, this correction method is more intrusive, because it must be applied during the nonlinear model forecast and was not considered for this study.

Results and discussion
Numerical experiments are described and discussed in this section. The objective is to assess the performances of the 4DEnVar algorithm for the assimilation of the four key species: NO 2 , O 3 , CO and CO 2 . In all experiments, a model simulation with unperturbed parameters (same as the one in Sect. 2.3) represents the truth. The experiments are performed during the meteorological-chemical situation described in Sect. 2.3. Synthetic observations are generated from the truth by applying H (Sect. 3.1) and by adding a normally distributed error (Table 5). Four observation locations are considered for the experiments (Fig. 1), where chemical species are observed hourly. The relatively low density of the observation network allows us to assess DA skills at unobserved locations. Model forecasts are produced by perturbing only the initial condition (perfect model), the surface emissions (model error) or both. DA is performed with either 3D-Var or 4DEnVar, and the obtained analyses are compared to the truth. The meteorology is never observed nor perturbed, which corresponds to use QG-Chem in a CTM-like mode.
Operational air quality centers collect hourly observations and perform DA typically once per day . Therefore, a 24 h assimilation window is adopted when using the 4DEnVar algorithm, with 1 h sub-windows matching the observations' frequency. For the same reason, 24 sequential cycles of 1 h are adopted with 3D-Var. The main processes affecting air quality forecasts (daily emissions, evolution of the mixing layer, photochemistry) have a period of approximately 24 h. Therefore, a 24 h window permits to account for errors in main model processes. The utilization of longer windows is theoretically possible with 4DEnVar, assuming that the linearization of the model perturbations remains valid. However, the numerical cost of the minimization increases with the windows' length when keeping fixed the duration of the sub-windows. The cost-benefit Table 5. Background (σ B ) and observation (σ O ) error standard deviation used in DA experiments expressed in volume mixing ratio units and, within brackets, in percentage of the average field value in Fig. 2.

Perfect model experiments
Experiments considering a perfect model are presented in this section for one assimilation cycle of 24 h. Emphasis is placed on the reanalysis capabilities of DA. A 24 h long forecast is produced by perturbing only the initial condition. Initial perturbations are computed applying B 1/2 (Eq. 8) to a Gaussian uncorrelated noise field N (0, I) at t = 0. Since chemical concentrations can span different orders of magnitudes in the atmosphere, the standard deviation values set in B 1/2 depend on the chemical species (Table 5). The background error standard deviations have been chosen to be about 10-20 % of the average field values (Fig. 2). The same horizontal correlation length has been set for all species (750 km). Multivariate correlations in B 1/2 are switched off so that perturbations of chemical species are not correlated at t = 0. Chemical correlations can, however, arise at t > 0 due to chemical couplings.
DA is applied to correct the 24 h long perturbed forecast. The same covariance matrix B that was used to produce the initial perturbation is applied at each cycle of the 3D-Var analysis (1 h). Hence, the background error covariance used in DA is perfectly known at t = 0, which represents the best possible case in DA. At t > 0 the true B depends on the observations assimilated at previous steps and on the model dynamics, which makes the use of a fixed B within 3D-Var a raw approximation. However, this is the typical setting of operational air quality models.
With 4DEnVar, an ensemble of 16 forecasts is generated perturbing the initial condition with the same B as above. Therefore, the ensemble size is small compared to the dimensions of the system (16 × 8 × 97 = 12 416 variables). When using small size ensembles, the localization of the sample covariance is necessary. In this study, horizontal localization is applied to the 4-D ensemble covariance using Eq. (11), by setting a horizontal length scale equal to the double of values used for B (1500 km). Multivariate localization is performed using a correlation coefficient of 0.5, which was chosen empirically. A sensitivity analysis concerning the ensemble size and the localization choices is presented later in this section. Since we use QG-Chem in a CTM-like configuration and the two layers are chemically not coupled, the vertical terms of the covariance or localization matrix are always set to zero in this study, without any impact on the presented results.

Univariate assimilation
This section compares results of 3D-Var and 4DEnVar DA in univariate settings, i.e., one independent assimilation experiment is performed for each of the four chemical species. The controlled species corresponds to the species that is measured and that is perturbed at the initial time. With 3D-Var, this is obtained setting all terms of B to zero except for the 3-D covariance of the selected species. With 4DEnVar the same is obtained by setting all multivariate localization coefficients to zero. Figure 3 compares the temporal trajectories of the analysis of each species obtained from 3D-Var and 4DEnVar. Each figure provides the temporal trajectories at the two grid points shown in Fig. 1: one located in the polluted region and in correspondence with assimilated observations (grid point A) and one in the cleaner region and slightly displaced from the measurements location (grid point B).
In addition to the comparison of the two DA algorithms, these experiments also permit to assess the impact of the initial perturbation on chemical forecasts. First of all, we remark that O 3 forecasts are very close to the truth values after 24 h, which is a consequence of the fact that O 3 is strongly controlled by precursor emissions and photochemistry. The memory of the initial condition is rapidly lost for O 3 , as it was also demonstrated within regional air quality models (Jaumouillé et al., 2012;Wu et al., 2008). This is not the case for CO 2 and CO, which have a longer lifetime. Therefore, the initial perturbation is advected by the wind field and the spread between the forecast and the truth lasts longer in time. NO 2 , which lasts a few hours in a summertime atmosphere, is practically not sensitive to the perturbation of the initial condition (Fig. 3). Note also that the chemical concentrations are always lower at the clean location (grid point B) than at the polluted one (grid point A), except for CO 2 that is neither emitted nor chemically produced. Again, this is a consequence of the geographical variability of the emission factors (Fig. 1).
We remark that the 4DEnVar provides in general better analyses than 3D-Var for all species. First, it can be observed from Fig. 3 that the analysis time series obtained with the 4DEnVar are smoother than those resulting from 3D-Var be- Here,X (i,j ) is the average concentration of the truth values (Fig. 2) Plots on the left are obtained using 3D-Var, on the right using 4DEnVar.
where N = 24 is the total number of sub-windows, x * (i,j ) and x (i,j ) denotes the temporal trajectories of the truth and the analysis (or forecast) at the grid point (i, j ), respectively. In Fig. 4 the blue color means that RMSE values after DA have reduced (DA improved the forecast) and the red color means that RMSE values after DA have increased (DA degraded the forecast). Therefore, we can see from these figures that with 4DEnVar, improvements of the RMSE at unobserved locations are more pronounced than those of 3D-Var (more widespread blue regions and less red regions in RMSE gain). This is likely due to a better description of the background error covariance, which is flow dependent and closer to the true forecast error covariance within 4DEnVar. For example, the background error covariance used to assimilate NO 2 with 3D-Var is highly overestimated for t > 0. This happens because, when the model is perfect, the NO 2 field rapidly converges to the truth after a few hours (Fig. 3). In this specific case, the RMSE is degraded even at observed locations (Fig. 4, bottom left plot). This is an instructive example of the effects of incorrectly specified B within 3D-Var. Finally, note that for other species also 3D-Var is capable of decreasing RMSE at unobserved locations. This effect is a result of the advection of the analysis increments, and, as expected, is more pronounced with long-lived species (CO, CO 2 ) than with more reactive or emitted gases (O 3 , NO 2 ). Table 6 reports the minimum, maximum and average of the field values in Fig. 4, with the sign inversed to display positive values for positive gain of DA, and negative otherwise. It can be seen that with 4DEnVar, the maximum degradation of RMSE (i.e., the minimum gain in absolute values) is always smaller by a factor 2 to 5 than the maximum gain, and the average RMSE gain is always positive (i.e., DA improves the forecast). The appearance of local but relatively small RMSE degradation can be tolerated in atmospheric chemistry, because the chemical system has a dissipative behavior and errors in the model state cannot grow during the forecast step. Since the error covariance of the initial condition is perfectly known in the experiment setup, the degradation of the 4DEn-Var analysis is a consequence of the algorithm hypotheses (e.g., the linearization of the forecast model) or of the numerical implementation (e.g., the finite size of the ensemble and the localization approximations).

Multivariate assimilation
A second set of DA experiments has been performed, but perturbing and assimilating the four chemical species at the same time. Therefore, one 24 h long forecast and the corresponding analysis has been computed for all species, instead of four independent analyses as before. In the case of 3D-Var, elements of B related to the four assimilated species are set using the same parameters as in Sect. 4.1.1. Elements related to unobserved species are kept at zero as well as for crossvariable correlations. This leads to a multi-species assimilation, which is not yet multivariate. Effects on unobserved variables or between species are still permitted by chemical couplings in the forecast model.
The gain on RMSE obtained with 3D-Var for the four species (not shown) is very similar to those obtained in Sect. 4.1.1 with independent experiments.
With 4DEnVar, the corresponding multi-species assimilation has been tested by setting the cross-variable correlation coefficients to zero in the localization operator. In addition, a multivariate case has also been examined by setting the correlation coefficients to 0.5. In both cases results (not shown) were found to be very similar again to those in Fig. 4.
These results indicate that, when the initial condition is solely taken as a source of uncertainty, the chemical coupling between species does not influence DA much. This is confirmed by the ensemble standard deviation of the 4DEn-Var experiments in Sect. 4.1.1. For each of the four assimilation experiments the average ensemble standard deviation has been computed for all species (perturbed and not perturbed at t = 0). The ensemble standard deviation of unperturbed species stays below 1 % of the local concentration (not shown), compared to typical values of about 10-15 % for the species that are perturbed initially. Therefore, the perturbation of the sole initial state does not affect significantly the chemical balance. This also justifies neglecting the cross correlations between chemical species in operational systems  (Table 6) are shown in different color, compared to reference results obtained with 3D-Var. Positive values mean better RMSE gain with 4DEnVar than with the reference 3D-Var. Average RMSE gain has been multiplied by 10 to better highlight the differences among the experiments.
that assimilate hourly observations sequentially. This result was expected for weakly reacting species like CO 2 or CO but was not evident for reacting gases such as O 3 and NO 2 . A possible reason is that the amount of NO 2 produced hourly by the oxidation of emitted NO is much larger than the applied initial perturbation. Therefore, the O 3 photochemical production, which happens later during the day, is not much influenced by the perturbation of NO 2 at midnight. It would be interesting to verify if similar results also hold when larger perturbations are applied during daytime. A wider exploration of different chemical regimes is left for a future study. 4DEnVar was capable of providing similarly good results as in Sect. 4.1.1 when enabling the cross-variable covariances (and the respective localization). This means that the noise of chemical cross covariances due to the small ensemble size (16 members) did not degrade the results. This leaves hope for an effective multivariate chemical assimilation when the role of chemical couplings becomes larger (Sect. 4.2).

Ensemble size and localization
This section examines the impact of the principal parameters of the 4DEnVar algorithm, i.e., the ensemble size (16 members in previous experiments) and the localization length scale, on the analysis RMSE. These two aspects are closely linked with the dimension of the system and the eigenspectrum of the error covariance matrices (Furrer and Bengtsson, 2007). For instance, for a strongly decaying spectrum relatively few ensemble members are enough to provide a good approximation of the error covariance matrix. If this is not the case, a larger ensemble size allows in principle less severe localization. However, no theoretical formulation exists already for the 4DEnVar that links these parameters, e.g., the localization scale with the ensemble size. Moreover, the number of parameters of the localization operator (e.g., the horizontal length scale) might depend on the chemical species. A full exploration of the parameter space becomes rapidly unpractical even in a simplified model framework. Finally, the approximations made in the implementation of the localization operator (Sect. 3.2) can potentially have a larger effect than the choice of the parameters themselves. Therefore, an empirical approach has been used in this paper, and we postpone a detailed theoretical analysis to future work. In the empirical approach, first the ensemble size is examined using fixed but reasonable values for the localization operator (Sect. 4.1). The ensemble size can be one of the main limiting factors in operational forecast centers and the smallest ensemble providing better results than 3D-Var for all species was taken. Second, the impact of horizontal and chemical localization are examined keeping the selected ensemble size fixed.
Results for the case study are shown in Fig. 5 (varying ensemble size), Fig. 6 (varying horizontal localization) and Fig. 7 (varying cross-variable localization). The minimum, maximum and mean RMSE gain (Table 6) of the analysis are considered to compare 4DEnVar and 3D-Var experiments. Hence, the plots display the differences between the RMSE gain of 4DEnVar minus the 3D-Var one, with the sign opportunely adjusted to display positive values when 4DEnVar beats 3D-Var, negative values otherwise. In general, the desired case is represented by all bars (mean, maximum and minimum gain) being positive. Nevertheless, a positive average gain with the sporadic occurrence of negative maxi-mum/minimum gains can be tolerated, if the values for the minimum gain bar do not become too negative. Since the minimum RMSE gain is already negative (Table 6), negative values for the corresponding bars in Figs. 5, 6 and 7 mean that the degradation of the 4DEnVar analysis is larger than the 3D-Var analysis, which is not desired.
Results are very similar using 64 or 128 members, for all species, suggesting that some convergence of the assimilation scores is achieved with more than 64 members (Fig. 5). As expected, the accuracy starts to decrease using less than 32 members, with most of the gain of 4DEnVar over 3D-Var being lost using only 8 members. In a few cases, the RMSE gain occasionally decreases when increasing the size of the ensemble. This is due to statistical fluctuations when the ensemble sizes are small. To avoid misinterpretation of statistical noise, the comparison of 3D-Var and 4DEnVar is repeated in Sect. 4.3 for a larger number of DA windows. We remark finally, that results with 16 members satisfy the requirements expressed above: better average RMSE gain than 3D-Var for all species and a limited number of cases of RMSE degradation. Hence, an ensemble size of 16 members is retained. We remind readers that the objective of this study is to demonstrate the applicability of a DA algorithm that could outperform currently implemented methods in operational centers, but with an acceptable compromise between computational costs and precision. Therefore, even if an ensemble size of 32 or 64 would have represented a more accurate option for this study, we found it more valuable to assess the potential of 4DEnVar when computational resources might be limited. The choice of the horizontal localization scale is more delicate, because it is intimately linked to the model dynamics and depends on the ensemble size and on the assumptions made to construct and apply C (Sect. 3.2). Figure 6 shows that horizontal localization is necessary to obtain meaningful results with 4DEnVar. Second, increasing the localization scale to values as high as 3000 km, compared to the 750 km of the initial perturbation scale, has the effect of improving the maximum RMSE gain but also degrading significantly the minimum gain. This is not desired, as explained above. The best results, considering employing the same homogeneous and global localization scale for all chemical species, can be found for the value of 1500 km.
The configuration of a multivariate localization, or chemical localization in our study, presents the same issues as the spatial one. A similar approach as in the above paragraph has been taken. The numerical experiments described in Sect. 4.1.2, which considered multivariate cases, are used to compute error differences in Fig. 7. Again, we remark that without any localization the statistical noise of the ensemble covariance significantly degrades the results compared to 3D-Var. A localization coefficient of 0.5 reduces efficiently the effect of noisy correlations in the 4-D ensemble covariance, leaving the possibility of describing multivariate effects. We remind readers that multivariate effects were found to be very small in perfect model experiments presented until now, but can be much larger when a model error is introduced (Sect. 4.2).
We conclude that a simple localization scheme, based on global and empirically tuned parameters, provides already encouraging results for the application of 4DEnVar to large-scale chemistry models. The development of more sophisticated localization operators (Bocquet, 2016;Desroziers et al., 2016) and more rigorous methods to estimate their parameters (Ménétrier et al., 2015), represents a current subject of research, and will be the topic of a future study.

Model error experiments
In the previous section, we have shown how 4DEnVar is able to match or even outperform 3D-Var results when the model is perfect. However, the main interest of 4DEn-Var for atmospheric chemistry arises when the model is not perfect, i.e., a case that is more easily addressed using ensemble-based methods. In this section, 4DEnVar experiments are conducted in the presence of a model error term. A typical source of uncertainty in CTMs is represented by anthropogenic or biogenic emissions (Kok, 2011;Zhao et al., 2011;Ma and van Aardenne, 2004). In the case of reactive species like NO x , erroneous emissions can impact strongly the formation of secondary species such as O 3 (Lei and Wang, 2014;Sillman, 1999). Errors in surface emissions already produce complex and rich dynamics and will be used as a test bed to investigate the effects of model error in DA. Other sources of uncertainties, e.g., chemistry parameters or meteorology, will be addressed in a future study, considering that the same methodology used here can be applied.
Similar single-cycle DA experiments are conducted as in Sect. 4.1. The main difference with the previous section is that experiments are done here during the spin-up period of the model (days 2-4). This allows us to examine the model error estimation during the pollution build-up period, when the chemical system is in a transient phase and daily cycles of reactive gases are not yet stationary. This represents a more challenging and realistic situation for testing the bias correction procedure (Eq. 17), which is fully consistent only in stationary conditions. The true NO emissions (Table 3) are perturbed by a multiplicative factor, which is sampled from a log-normal distribution with mean and sigma equal to 0.5 and 0.8, respectively. Forecast emissions are increased by a multiplicative factor of 2.35, whereas the log-normal distribution has been used to generate emission perturbations for the ensemble of forecasts. The emissions perturbation is constant in time but not in space, due to the geographical variability of emission factors (Fig. 1).
The main objective of this section is to illustrate the application of the bias estimation procedure (Sect. 3.3) on chemical fields. Chemical interactions alone can already give rise to complex temporal dynamics, which can produce unattended behavior within typical hypotheses of most DA schemes (Tang et al., 2016). Therefore, a simplified model setup is used in this section by means of deactivating the advection of chemical species. This reduces QG-Chem to a collection of 0-D chemistry models and allows us to focus on the effects of model errors in chemistry. Aside from this, the same exact configuration as before is used for 4DEnVar (initial condition error, ensemble size, localization). DA results in a more general case, when both chemistry and advection are activated, will follow in Sect.

Univariate assimilation
Three DA experiments are shown in Fig. 8: (i) activating only the initial perturbation, (ii) activating only the model error and (iii) with both perturbations activated. Only O 3 observations have been assimilated and the chemical localization coefficients are set to zero to compute univariate analyses. With the NO emissions increased by a factor of 2.35, the forecast produces higher concentrations of O 3 (Fig. 8, middle plot). We remark also that the effects of model error in O 3 appear later in the day, due to the photochemistry, whereas the perturbation of the initial condition has a larger effect in the first part of the day. In this specific case, it is also interesting to note that, when both perturbation are applied, a compensation of the two errors cancels out the differences between the forecast and the truth later in the day. This is an example of compensating errors in atmospheric chemistry, which might be hard to detect when comparing model results to observations.
The 4DEnVar analyses agree well with the truth in all three cases, similarly to what was obtained in Sect. 4.1.1. This satisfies the main requirement for a meaningful computation of the effective model error (Sect. 3.3). The estimation of the effective model error with Eq. (12) is also displayed in Fig. 8. We remind readers that the effective model error is expressed in the same physical units of the model state, i.e., chemical concentration units (ppbv). The true effective model error is computed by subtracting the truth from a forecast initialized from the truth, and it is also added to the figures.
The effective model error is zero in the first experiment, which is correctly diagnosed by the procedure. In the second experiment, the estimated effective model error is approximately zero until 10:00 UTC and grows to positive values of about 8 ppbv later in the afternoon, which is coherent with the underlying perturbations and chemical mechanism.
The temporal average of the effective model error (Fig. 9) shows that the larger errors are diagnosed in the center of the domain, coherently with the characteristics of the emissions errors (global scaling of NO emissions). The geographical patterns and the differences from the true effective error (bottom right plot in Fig. 9) are a result of the observations location and localization scale. With the proposed procedure, the effective model error estimation relies on the localization scale that has been used for the 4DEnVar analysis. Therefore, the effective model error approaches zero moving far from assimilated observations, no matter what the spatial patterns of the true model error are. In the third experiment, the temporal behavior of the effective model error is well captured at the observed location. However, significant differences with the second experiment are visible in the spatial distribution of the error. Ideally, the estimation of the effective model error should provide exactly the same results in the second and third experiment. Differences arise because DA is not perfect due to small ensemble size, linearization hypotheses, observation number and observation errors. Also, when adding de-  Figure 10. Multivariate effective model error estimation. Temporal trajectories for day 2, as in Fig. 8 but for NO 2 , NO and CO species (not assimilated), obtained assimilating only O 3 . Uncertainties were both in the O 3 initial condition and in the forecast model (NO emissions), corresponding to the third experiment in Fig. 8. Only difference with the previous experiment in is that a non-zero multivariate localization coefficient is used here.
grees of freedom to the sources of uncertainty and keeping constant the observation network, DA becomes more challenging. For example, if observations of O 3 were available only in the late afternoon, no discrimination between the two sources of error could have been achieved in the third experiment. However, thanks to the hourly frequency of O 3 observations, the temporal and large-scale features of the model error have been retrieved also in the third case.

Multivariate assimilation
The effects of O 3 assimilation on other chemical species are shown in Fig. 10. These were obtained by repeating the third experiment of Sect. 4.2.1 but setting the chemical localization coefficient to 0.5 instead of zero. Compared to the results discussed in Sect. 4.1.2, multivariate corrections are now very significant. For example, analyzed NO and NO 2 concentrations are almost halved and the initial forecast errors greatly reduced. On the other hand, CO, which was not perturbed initially nor is it strongly coupled to NO / NO 2 / O 3 concentrations, is not modified at all by the DA. This shows that multivariate aspects of chemical DA can be well captured by the 4DEnVar algorithm, even with a small ensemble.
The effective model error can be computed for all the variables of the state vector, and is plotted in Figs. 10 and 11. The error fields of NO 2 and NO reproduce well the temporal and spatial features of the perturbation on NO emissions that was implemented in the forecast. Note that since NO is rapidly converted into NO 2 during the night, the initial linear increase of the effective model error is only observable on NO 2 . On the other hand, during daytime, a strong presence of the effective model error is found for both NO and NO 2 . However, the NO 2 and NO errors trajectories are not linear during daytime, due to the complex photochemistry.
Strong nonlinearities of the chemical system cannot be taken into account by the current implementation of the 4DEnVar algorithm. When the NO / O 3 relationship was strongly nonlinear, inaccurate analyses and, therefore, estimations of the effective model error have been found (not shown). This difficulty could be overcome by introducing external loops within 4DEnVar, similarly to how it was already done with the IEnKS algorithm (Haussaire and Bocquet, 2016). This represents the objective of a future study. Alternatively, the combined assimilation of O 3 and NO 2 or a more severe chemical localization can reduce the occurrence of analysis errors in nonlinear regimes.

Impact on chemical forecasts
The analysis computed in the previous section has been used to initialize chemical forecasts for the following 48 h. This was done to evaluate the potential of the forecast bias correction procedure (Eq. 17). The corrected forecasts (CF) of O 3 , NO 2 and NO are compared to the initial 3-day forecast (OF), the 48 h forecast initialized from the latest available analysis without any correction (AF) and the truth (Fig. 12).
First, the AF converges very rapidly (in about 12 h) to the OF for all species, confirming the limited effects of the state correction on the chemical forecasts for the next days (Wu et al., 2008). On the other hand, the CF is very close to the truth during the first 24 h, indicating that the hypothesis of a stationary effective model error in successive days (but hourly variable) seems appropriate in this case. During  the third day, a positive forecast correction is still achieved. However, chemical concentrations have evolved too much to be efficiently corrected using the bias estimated on the first day. Estimating and correcting the forecast tendencies, instead of the state, could provide a better result for the third day. Accounting for model errors, either bias or tendencies, for the next-day forecast requires the model uncertainties to be stationary. A stationary error has been used in this study since surface emissions were taken constant in time. Most of regional air quality models seem largely affected by stationary errors  and assuming the persistence of the bias during 24 h looks reasonable. However, the main sources of uncertainties of real CTMs need first to be identified to allow a meaningful effective model error estimation. Therefore, the implementation of 4DEnVar to a real CTM is necessary to further demonstrate its potential for air quality forecasts.  Fig. 10). Curves are displayed in dark blue when no bias correction is applied to the final forecast and are salmon colored when the effective model error estimation in Fig. 10 is used to correct the final forecast (Eq. 17). pared to Sect. 4.2, all emissions in Table 3 are now perturbed, using a log-normally distributed scaling factor. Synthetic observations are generated from a 7-day truth simulation. The four species are assimilated for a period of 7 days, starting on day 20. However, at the end of each cycle of 24 h (forecast, analysis and next-day forecast), the initial condition is reinitialized with the truth concentrations. Initial condition perturbations are recomputed, advancing the seed of the pseudo-random generator. Therefore, the seven daily cycles are statistically independent. This is done to increase the experiment statistics, without having to deal with the propagation of the analysis covariance through consecutive cycles. This aspect will be investigated in a future study.

Statistical comparison between 4DEnVar and 3D-Var
The relative RMSE between the experiments and the truth is computed at each grid point for the 7-day period as in Eq. (19). Results are summarized in Table 7: minimum, maximum and average values of the relative RMSEs computed over the QG-Chem domain are reported for 3D-Var and 4DEnVar experiments.
We confirm preceding findings concerning the reanalyses capabilities of 4DEnVar, which provides superior results to 3D-Var for all species. Results are particularly good for species strongly related to emissions (CO and NO 2 ), which show reduced average RMSE values compared to 3D-Var results. This is a consequence of precisely accounting for emission uncertainties within 4DEnVar. A similar result was obtained by Haussaire and Bocquet (2016). They showed that by using an ensemble forecast of the meteorology, thus partially accounting for model error, the root mean square error of IEnKS (a nonlinear 4DEnVar method) on the low-order online tracer model L95-T was improved by 25 to 50 %. Moreover, the 4DEnVar maximum RMSE is also similar or lower than in 3D-Var reanalyses, which suggests that the occurrence of degraded results compared to 3D-Var (negative bars in Fig. 5) is not systematic.
Similar conclusions can be drawn for the RMSE of the next-day forecast. We remind readers that the bias correction procedure has been used with 4DEnVar, whereas no correction is applied with 3D-Var. CO and NO 2 forecasts, show significantly lower RMSE with 4DEnVar, due to the forecast bias correction. O 3 forecast improvements are also observed, even if smaller than those for the two precursors. With 4DEn-Var, forecasts of CO 2 are slightly worse than with 3D-Var. Since CO 2 is not affected by the considered model error and it is not chemically coupled to other species, the relative bias correction term should be strictly equal to zero. However, the small ensemble size and use of localization introduce statistical noise in the effective model error estimation, which can impact the forecast correction. This issue can be mitigated by selectively setting to zero the chemical localization coefficient between species that are chemically uncoupled. However, results remained on par with 3D-Var in this study.
We can conclude that the forecast of species related to surface emissions, either directly or through chemical couplings, can be significantly improved when the model error is considered within DA. The forecasts of CO 2 , which is a passive tracer in our study, seem instead dominated by the number of assimilated observation, no matter which DA algorithm is employed. However, this is not the case in real applications, where CO 2 concentration is modulated by uncertain anthropogenic emissions and natural sinks.
We finally remark that EnKF, which also takes advantage of the information available from the ensembles, could have represented an alternative and possibly more accurate refer-E. Emili et al.: QG-Chem ence than 3D-Var for this study. However, it is significantly more costly than 3D-Var, it introduces some difficulties like the definition of an optimal inflation procedure (Constantinescu et al., 2007a;Gaubert et al., 2014) and was not yet available in the OOPS DA library at the time of the study. A more comprehensive comparison of DA schemes of increasing complexity and cost within QG-Chem is left for a future study, the present being focused especially on the bias correction procedure using 4DEnVar.

Conclusions
The objectives of this study were (i) to develop a new toy model framework to test advanced DA algorithms for atmospheric chemistry and (ii) demonstrate the potential of the 4DEnVar method for air quality or more general tropospheric chemistry applications. In particular, we addressed the questions of how to jointly assimilate chemical species with very different lifetimes and possible chemical couplings, and how to account for model error to improve forecasts for the next day.
An atmospheric chemistry reduced-order model (QG-Chem) has been developed, based on quasi-geostrophic meteorology and a detailed tropospheric chemistry scheme. It has been used to simulate the complex spatiotemporal patterns of reactive gases (NO x , O 3 ) and long-lived species (CO, CO 2 ) under the effects of emissions, chemistry and transport. QG-Chem has been coupled to a generic library for data assimilation (OOPS) and has been proven to be well suited to perform a large number of DA experiments. Concerning temporal aspects and assimilated observations, the experiments have been designed based on the implementation of chemical DA in operational air quality forecast centers.
A number of DA experiments have been conducted to compare 4DEnVar analyses with 3D-Var analyses in a perfect model hypothesis, in a univariate and a multivariate setting. The sensitivity of 4DEnVar results to the ensemble size and localization method was also assessed. Results with 4DEn-Var are generally better for all chemical species even when using a small ensemble size of 16 members, provided that ensemble localization, even if basic, is applied. This suggests that considering the linearized model dynamics to derive a flow-dependent background error covariance can be beneficial for chemical reanalyses. Thanks to 4DEnVar, this can be obtained without need of tangent linear and adjoint codes of the complex CTM. Multivariate effects were found to be not significant when a perfect model is used, suggesting that multivariate DA goes together with model error for atmospheric chemistry applications.
Clear advantages of using an ensemble method, which is significantly more costly than 3DVar, have been found when model errors were introduced. It has been shown that 4DEn-Var is able to take into account heterogeneous errors in chemical emissions and complex chemical couplings to cross cor-rect precursor species (e.g., NO and NO 2 ), based only on hourly observations of secondary species (O 3 ).
Using DA windows that are long enough (24 h in this study), 4DEnVar allowed to account for the different timescales of the chemical mechanism and the corresponding effects in DA. For example, the delayed impact of NO emission errors in afternoon O 3 concentrations was correctly accounted for by 4DEnVar. The contribution of model errors to the 4-D chemical state can be estimated at the cost of an additional forecast, providing quantitative information on possible forecast biases, and, in the case of stationary errors, a method to correct next-day forecasts. This has been tested with success in several independent DA windows, showing that 24 h forecasts of NO 2 and CO can be twice as accurate with 4DEnVar as with 3D-Var. Better results have been also found for O 3 forecasts.
We conclude that 4DEnVar is potentially of high interest for chemical DA. We determine the main benefits to be the implicit specification of a flow-dependent and multivariate error covariance matrix, the possibility of accounting for model errors through stochastic perturbation and the opportunity for using DA windows long enough to catch typical features of model errors in air quality models. The computational cost of 4DEnVar is higher than 3D-Var, but a small ensemble of at least 15 to 20 members remain affordable within most operational centers.
The application to a real CTM remains necessary to evaluate if the advantages of 4DEnVar shown in this study hold in real applications, where model uncertainties are not perfectly known, as well as observational ones. Aspects related to the vertical propagation of the information, which have been neglected with QG-Chem, could also represent an additional challenge in real systems. Finally, research on algorithmic aspects of 4DEnVar is needed to implement more accurate localization operators, to account for nonlinear chemical regimes and to correctly propagate the analysis covariance through successive DA windows. QG-Chem could represent a useful tool for these type of studies.

Code and data availability
The QG-Chem code is copyright of the CERFACS laboratory. The sources and the data used in this study are available upon request to E. Emili (emili@cerfacs.fr) or D. Cariolle (cariolle@cerfacs.fr).