A 4 D-Var inversion system based on the icosahedral grid model ( NICAM-TM 4 D-Var v 1 . 0 ) : 2 . Optimization scheme and identical twin experiment of atmospheric CO 2 inversion

A 4-dimensional variational method (4D-Var) is a popular technique for source/sink inversions of atmospheric constituents, but it is not without problems. Using an icosahedral grid transport model and the 4D-Var method, a new atmospheric greenhouse gas (GHG) inversion system has been developed. The system combines offline forward and adjoint models with a quasi-Newton optimization scheme. The new approach is then used to conduct identical twin experiments to investigate optimal system settings for an atmospheric CO2 inversion problem, and to demonstrate the validity of the new inversion sys5 tem. In this paper, the inversion problem is simplified by assuming the prior flux errors to be reasonably well known and by designing the prior error correlations with a simple function as a first step. It is found that a system of forward and adjoint models that has smaller model errors, but with nonlinearity has comparable optimization performance to that of another system that conserves linearity with an exact adjoint relationship. Furthermore, the effectiveness of the prior error correlations is demonstrated, as the global error is reduced by about 15 % by adding prior error correlations that are simply designed when 10 65 weekly flask sampling observations at ground-based stations are used. With the optimal setting, the new inversion system successfully reproduces the spatiotemporal variations of the surface fluxes, from regional (such as biomass burning) to global scales. The optimization algorithm introduced in the new system does not require decomposition of a matrix that establishes the correlation among the prior flux errors. This enables us to design the prior error covariance matrix more freely.


Introduction
Using the Bayesian algorithm, an inverse model estimates spatiotemporal variations of surface fluxes from observations of atmospheric concentrations with the help of a priori information.The power of such techniques has been demonstrated in previous studies, such as the synthesis inversion analysis by Peylin et al. (2013) that demonstrated significant variations in regional carbon budgets at seasonal to interannual timescales.
Our primary interests are in long-lived greenhouse gases (GHGs) such as CO 2 and CH 4 .Due to strong require-ments for high-precision measurements necessary to identify changes in surface fluxes, observations of such atmospheric constituents have been quite limited.At the start of CO 2 measurement programs, flask sampling was conducted mostly at background surface sites, with a typical sampling frequency of 1 week (e.g., Conway et al., 1994).These data have been used in a synthesis inversion method (Enting, 2002) to estimate subcontinental-scale CO 2 fluxes (e.g., Rayner et al., 1999;Gurney et al., 2002;Baker et al., 2006b).In recent decades, the global observation network of atmospheric GHGs has significantly expanded to include various measurement platforms.For example, in situ continuous observation measurements are now regularly taken at background stations, as well as at tall towers to infer regional continental fluxes (e.g., Sasakawa et al., 2010;Andrews et al., 2014).Moreover, worldwide aircraft observation programs have started to collect observations on a regular basis (e.g., Machida et al., 2008;Sawa et al., 2015;Matsueda et al., 2015), along with satellite observations dedicated to measurements of GHGs (Yoshida et al., 2013;Saitoh et al., 2016).These numerous GHG observational data can be exploited to estimate surface fluxes on a much finer scale than a subcontinental scale.The synthesis inversion method takes an approach based on Green's function matrix, in which subcontinental-scale regions are prescribed and the total flux for each region is set as a parameter.Therefore, the synthesis inversion method has limitations in resolution both of fluxes and observations.In contrast, the 4D-Var method has nearly no limitation in the number of observations it can accommodate.The method also has the ability to estimate fluxes at the resolution of the model grid, so that regional CO 2 flux anomalies such as biomass burnings become detectable.The 4D-Var method has been employed in numerical weather prediction (NWP), in which a weather model's initial state of the atmosphere is optimized, using observations, to improve weather prediction in many leading operational centers (e.g., Dee et al., 2011;Kobayashi et al., 2015).In NWP, the 4D-Var method is performed over successive time windows.The 4D-Var method has also been used in oceanography for a long time (e.g., Smedstad and O'Brien, 1991;Stammer et al., 2002;Usui et al., 2015).However, application of the 4D-Var method to GHG inversion raises different issues than those associated with NWP, since a much longer assimilation window is needed, and we mainly optimize boundary conditions (surface fluxes), not necessarily the model's initial conditions (concentration fields).
In developing our 4D-Var inversion system, we have introduced unique numerical techniques that have not been used in previous 4D-Var inversions of GHGs; these techniques use an icosahedral grid transport model based on the Non-hydrostatic Icosahedral Atmospheric Model (NICAM; Tomita and Satoh, 2004;Satoh et al., 2008Satoh et al., , 2014)), along with an efficient optimization scheme of Preconditioned Optimizing Utility for Large-dimensional analyses (POpULar; Fujii and Kamachi, 2003;Fujii, 2005).NICAM is one of the most advanced general circulation models (GCMs), with its dynamical frame structured with quasi-homogenous grids that are made by recursively dividing an icosahedron, which is completely different from the regular latitude-longitude grid models used so far.For the new inversion system, we have employed the NICAM-based Transport Model (NICAM-TM), naming our inversion system as NICAM-TM 4D-Var.The previous accompanying paper of Niwa et al. (2017) derived and evaluated the offline forward and adjoint models of NICAM-TM.This paper describes the entire NICAM-TM 4D-Var system, including the optimization scheme POpU-Lar.
One prominent feature of POpULar is that it does not require the inversion of a large prior error covariance matrix.Although the Bayesian inversion algorithm generally requires the inverse matrix of a prior error covariance, its direct calculation is infeasible due to its large matrix size.Therefore, previous studies have avoided the inverse matrix calculation by simply neglecting the off-diagonal elements of the prior error covariance or employing eigenvalue decomposition (Chevallier et al., 2007;Meirink et al., 2008).Actually, the off-diagonal elements represent the error correlations among prior fluxes in space and time.In fact, Chevallier et al. (2012) found that the biosphere model they used to construct the prior CO 2 flux has positive error correlations within certain spatiotemporal ranges, suggesting that the error correlations introduced in the prior error covariance matrix could propagate observational information to estimated flux values more effectively.Furthermore, introducing the flux error correlations reduces the degree of freedom and this may provide positive influences, e.g., reduction of noises, in a flux estimation, especially when observational networks are sparse.However, eigenvalue decomposition that was used in Chevallier et al. (2007) or Meirink et al. (2008) would become difficult when the specified prior error covariance is complicated or time-consuming when the spatiotemporal resolution of fluxes is increased.Since the POpULar optimization scheme does not require eigenvalue decomposition, we can easily introduce flux error correlations into the inverse calculation.POpULar was developed originally for oceanography assimilation (Usui et al., 2006(Usui et al., , 2015)); our study is its first application to an atmospheric trace gas inverse problem.
In order to validate and verify the new inversion system, we have conducted identical twin experiments of atmospheric CO 2 inversion, in which pseudo-observations, instead of real observations, produced by "true fluxes" are assimilated.It has been demonstrated that such an identical twin experiment is an effective way to test the ability of an optimization scheme in an inversion calculation (e.g., Baker et al., 2006a;Chevallier et al., 2007;Yumimoto and Takemura, 2013;Liu et al., 2014).By conducting sensitivity tests based on the identical experiment, we investigate optimal system settings in the context of adjoint models and optimization schemes.In addition, we demonstrate the utility of introducing error correlations in the prior fluxes.
As described in Niwa et al. (2017), NICAM-TM has two types of adjoint models: one is a discrete adjoint model and the other is a continuous one.The discrete adjoint model ensures model linearity and maintains an exact adjoint relationship with its corresponding forward model, while the continuous adjoint model is nonlinear and consequently loses the exact adjoint relationship but has less model errors.Gou and Sandu (2011) compared a continuous adjoint model with a corresponding discrete adjoint model using a regional chemical transport model and evaluated the effect of using these two models in the optimization of initial states of tropospheric ozone.In that study, they found the continuous adjoint to be superior in ideal assimilation cases.However, when real observations were used in assimilation, they found that both adjoint models showed similar performance, but the discrete adjoint performed slightly better than the continuous adjoint.Therefore, which adjoint is better may depend on the assimilation settings.Approaching the problem differently from Gou and Sandu (2011) when applied to CO 2 , we try to optimize surface fluxes of CO 2 , which has a much longer lifetime in the atmosphere and hence requires a longer assimilation window than ozone.In this study, we evaluate the effect of using discrete and continuous adjoint models as applied to CO 2 inversion problems.Practically, the inversion performance depends on various conditions.Therefore, as a first step, we assume the prior flux errors to be well known.This ideal assumption certainly overvalues the performance of the system, but it would clarify the effect of the adjoint models.Furthermore, using the optimal inversion settings determined from the sensitivity tests with the ideal prior flux errors, we also investigate how the current observation network could optimally constrain surface flux estimates.

System overview
An inversion problem employs Bayes' theorem, and an optimal solution of control variable x is obtained where the cost function defined below is minimized.
where B is the prior error covariance matrix of x, M is the forward transport model operator matrix which links surface fluxes to atmospheric concentration measurements taken at specified locations and time, and R is the error covariance matrix of the misfit between observations and modeled concentrations.It should be noted here that M is assumed linear according to the linear property of atmospheric transport, though we use both linear and nonlinear models as described in Sect.2.3 below.The control variable vector x represents the increment from the prior flux x pri , and the observation vector y dobs represents differences between the modeled concentrations from x pri and observed concentrations y obs , i.e., y dobs = y obs − M x pri .In fact, the control variable vector x consists of initial concentrations as well as surface fluxes; thus, atmospheric concentration fields are uniquely determined in the model.However, for simplicity, we only optimize surface fluxes in this study.In the 4D-Var method, the optimal x is determined after iterative calculations that use the gradient of the cost function with respect to x: (2) The last term on the right-hand side is derived by a forward simulation M x, followed by an adjoint simulation M T R −1 M x − y dobs , which are performed by the forward and adjoint models of NICAM-TM, respectively.Both the forward and adjoint models are driven by archived meteorological data (e.g., mass fluxes, temperatures, turbulent coefficients, cumulus base mass fluxes; details are found in Niwa et al., 2017).The meteorological data are prepared by a GCM run of NICAM, in which horizontal winds are nudged towards reanalysis data to simulate real atmospheric flow fields.In summary, Fig. 1 shows a schematic figure of the NICAM-TM 4D-Var system.In practice, the 4D-Var calculations are conducted as follows: i.The online NICAM is run with nudging to make meteorological data that are used in the following simulations of the forward and adjoint NICAM-TM, ii. the forward NICAM-TM is run to calculate atmospheric concentration fields of a target atmospheric constituent forced by prior flux data, iii. the differences between the modeled and observed concentrations are calculated, iv. the adjoint NICAM-TM is run to calculate the gradient of the cost function from the model-observation differences that are weighted by the error covariance of observation-model misfit, and v. the prior flux data are modified according to the cost function and its gradient using the POpULar optimization scheme.
The prior flux data are replaced with those given by (v), and then the (ii)-(v) calculation steps are repeated until the flux data are sufficiently optimized according to a convergence criterion.Finally, the optimized flux is called the posterior flux.

NICAM-TM
NICAM achieves consistency between tracer transport and air density change, which assures Lagrangian conservation and mass conservation simultaneously, due to its dynamical frame based on the finite volume method (Satoh et al., 2008;Niwa et al., 2011a).This property is absolutely necessary for transport simulations of long-lived tracers such as GHGs.Because of this, NICAM-TM has been developed and used for transport and inversion studies of CO 2 (Law et al., 2008;Patra et al., 2008;Niwa et al., 2011bNiwa et al., , 2012;;Peylin et al., 2013).Its fundamental transport performances have been evaluated satisfactorily using observations of radon ( 222 Rn) and sulfur hexafluoride (SF 6 ) (Niwa et al., 2011a(Niwa et al., , 2012)).
Detailed descriptions of the offline forward and adjoint models of NICAM-TM are found in the accompanying paper of Niwa et al. (2017).Reanalysis data used with nudging in the online calculation are from the 6-hourly Japan Meteorological Agency Climate Data Assimilation System (JC-DAS) reanalysis (Onogi et al., 2007).The archived meteorological data consist of air mass densities, three-dimensional air mass fluxes, vertical diffusion coefficients, mixing ratios of water substances, temperatures, and cumulus base mass fluxes.These data are consistent with the dynamical calculation of NICAM, though their temporal resolutions are decreased from the original model time step (20 min) to 1-to 3-hourly steps in the offline model calculations (Niwa et al., 2017).The horizontal resolution is set at glevel-5 ("5" denotes the number of division from the original icosahedron; see Fig. 1 of Niwa et al., 2017), whose grid interval is about 240 km, and the number of vertical model layers is 40.
As described in Niwa et al. (2017), NICAM-TM has the option of running either in the discrete or continuous adjoint mode.The linearity of the discrete adjoint is ensured and the perfect adjoint relationship is achieved with the forward model in which the nonlinear flux limiter (Thuburn, 1996) in the advection calculation is turned off.In contrast, the continuous adjoint has the same nonlinear flux limiter as that of the forward model calculation to maintain the monotonicity of the advection adjoint, but this fails to achieve the perfect adjoint relationship with the forward model.In fact, the flux limiter could improve the model accuracy due to its nonoscillatory property, but such improvement seems small in a CO 2 transport simulation (Niwa et al., 2017).Also, the error induced by switching off the flux limiter could be compensated by smaller time steps, though it increases the computational cost.Using the identical twin experiment described later, we conduct sensitivity tests to elucidate some of the impacts on the inversion results due to the differences in the adjoint models (discrete or continuous), details of which are described in Sect.2.3.

POpULar
For optimization, we use the scheme of POpULar (Fujii and Kamachi, 2003;Fujii, 2005).The POpULar scheme is based on the optimizing scheme developed by Derber and Rosati (1989) (hereafter DR89).Fujii and Kamachi (2003) extended the linear conjugate gradient method of DR89 to nonlinear cases using a quasi-Newton method.
In Eqs. ( 1) and ( 2), the matrix size of B is too large to be inverted if B is an off-diagonal matrix, i.e., when prior error correlations are considered.Therefore, a transformation of the control variables is often applied, with one prominent transformation being x = B −1/2 x.This transformation provides efficient preconditioning to accelerate the convergence of iterative calculations (Lorenc, 1988).In fact, several 4D-Var inversion systems for atmospheric trace gas constituents employ this transformation with eigenvalue decomposition (Chevallier et al., 2007;Meirink et al., 2008), though feasibility of eigenvalue decomposition still depends on the matrix size and the designing of the error correlations.In contrast, DR89 and POpULar also use the preconditioning of Lorenc (1988) but they do not require eigenvalue decomposition.The detailed algorithm of DR89 is described in Appendix A. Below, we describe the POpULar scheme but readers are encouraged to consult Fujii (2005) for further explanation.
The quasi-Newton scheme of POpULar employs the limited-memory version (Nocedal, 1980;Liu and Nocedal, 1989) of the Broyden-Fletcher-Goldfarb-Shanno formula (L-BFGS) that updates the search direction d of the control variable x using an approximated inverse Hessian of J (≡ H).Using the transformation of Lorenc (1988) ( x = B −1/2 x), the transformed control variable is updated in iterative calculations as where k is the iteration counter, d = B −1/2 d, g = B 1/2 g, and If an optimization problem is highly nonlinear, α k is iteratively sought with quadratic interpolation (Fujii, 2005).However, in a linear or weakly nonlinear problem, such as the one considered here in this study, the initial guess of α k = 1 is valid at most iterations.Using the L-BFGS formula, H k,0 is calculated with m pairs of that are derived from the previous iterations as where I is the identity matrix, ρ k = 1/ y T k p k , and γ k represents the scaling coefficient (Shanno and Phua, 1978) calculated as Using non-transformed variables, Eqs. ( 5)-( 7) above are simply rewritten by replacing I with B as where where Then, the search direction can be expressed as a linear combination of h, z, and p as where a k,l and b k,l are the scalar coefficients that are determined by h k and m pairs of p and z (details are described in Appendix B).It should be noted here that Eq. ( 11) does not require the calculation of B −1 .In order to avoid the calcu- The cost function and its gradient at iteration k are written as Along with the updating of x by K k can also be recursively updated as Furthermore, c k and q k can be updated using Eqs.( 14) and ( 11), respectively, as In practice, POpULar uses Eqs. ( 10)-( 17) above with the initial condition of Thus, we can see that the sequence of these equations does not require B −1 .Practical calculations of Eqs. ( 11) and ( 17) are described in Appendix B.

Identical twin experiment design
The identical twin experiment is designed for an atmospheric CO 2 inversion.We first run a forward simulation to construct a set of atmospheric pseudo-observations using a prescribed flux dataset considered as "true fluxes".In the experiment, a different flux dataset is used as the prior fluxes and the pseudo-observations are assimilated into the system to modify the prior fluxes, which are expected to get closer to the true fluxes.
The forward simulation to construct the pseudoobservations is performed with the online NICAM-TM with the flux limiter of the advection scheme turned on.After a 3-year spin-up, concentration values are extracted at 65 locations that emulate well-known CO 2 ground-based observation sites (locations are shown in Fig. 2 and the corresponding detailed information is listed in Appendix C).Although some sites actually operate in situ continuous observations, we assume weekly sampling at all the sites for simplicity.Therefore, the total number of observation data is 65 (sites) × 52 (weeks) = 3380.To sample well-mixed air masses, the timing of flask sampling is set at 13:00 LST at each site.Random values with a standard deviation of 0.2 ppm are added to the extracted model values to mimic actual measurement uncertainties.
The analysis period, i.e., the assimilation window, is chosen as the year 2010 and monthly mean CO 2 fluxes are optimized in the inversion.Therefore, the number of control variables to be optimized is 12 (months) × 10 242 (the number of horizontal grid points) = 122 904.Since we do not optimize the initial concentrations, we use the same initial concentrations for the true and assimilation runs.Optimizing only the surface fluxes is reasonable since these fluxes are the main driver of the atmospheric CO 2 concentration variability.In "real" inversion analyses, there are errors in initial concentrations, but we can reduce their impact by disregarding the optimized fluxes during the early part of an assimilation window.

True and prior flux datasets
Table 1 summarizes the prior and true flux datasets.All of the prior and true flux data are provided as monthly means.
Both for the prior and true fluxes, we use the same fossil fuel emissions of Carbon Dioxide Information Analysis Center (CDIAC) (Andres et al., 2013) and we assume that the fossil fuel emissions in the prior flux dataset are perfectly known.Therefore, fluxes other than the fossil fuel emissions are optimized in the identical twin experiment.
For the true flux dataset, we use the net ecosystem production (NEP) data of the Carnegie-Ames-Stanford Approach (CASA) model (Randerson et al., 1997) modified by the inversion of Niwa et al. (2012), and the climatological pCO 2 -based sea-air exchange data of Takahashi et al. (2009).The reason for the inversion modification of the CASA flux field is that the original CASA NEP is annu-ally balanced (that is, annually integrated flux is zero everywhere).Therefore, to be more realistic, we scale the terrestrial flux values at each latitude band and month such that the zonal average value coincides with that of the inversion flux.The true flux dataset also contains biomass burning emissions from the Global Fire Emissions Database (GFED) version 3.1 (van der Werf et al., 2010).
For the prior terrestrial fluxes, we use net biome production (NBP is NEP plus disturbance emissions such as biomass burnings) data from a process-based ecosystem model called Vegetation Integrative SImulator for Trace Gases (VISIT; Ito and Inatomi, 2012).For the prior ocean fluxes, we use sea-air exchange data based on shipboard pCO 2 measurements calculated by Japan Meteorological Agency (Iida et al., 2015), which are slightly modified by replacing the undefined fluxes in marginal seas with the fluxes from Takahashi et al. (2009).
General spatiotemporal variations of the integrated true and prior fluxes described above can be compared with the posterior fluxes shown in Figs.7-10.Both the true and prior ocean fluxes are based on the shipboard pCO 2 measurements and thus have similar spatial patterns.However, there are significant differences in their seasonal patterns, as shown in Fig. 9.While the true and prior terrestrial fluxes have much larger magnitudes than the ocean fluxes, they do show distinct differences.Those differences are attributable not only to the different biosphere models used but also to the biomass burnings (GFED and the part of VISIT NBP).To evaluate biomass burning emission, GFED utilizes satellite fire spot data, while VISIT uses only prognostic model-derived variables such as fuel load and soil moisture (Kato et al., 2013).Although VISIT cannot sufficiently represent the actual spatial patterns of biomass burning due to the stochasticity of wildfire regime, regional emission amounts are comparable to those of GFED.
For the global terrestrial and ocean areas, the annual net uptakes of the true flux are 2.4 and 1.4 Pg C, respectively.Meanwhile, those of the prior flux are 3.3 and 2.3 Pg C, respectively.

Prior error covariance
In this study, we simplify the inversion problem by using idealized prior errors.Specifically, the diagonal elements of B (variance) are set proportional to the differences between the prior and true fluxes ( x), with the assumption that the prior flux errors are well known.However, in reality, this is too optimistic because we do not know the true flux errors in an actual inversion analysis.However, this simplification enables us to easily verify the validity of the optimization scheme introduced in this study and highlight the effects of the adjoint models in the sensitivity tests.Partially, we add errors into the prior flux errors by not including the biomass burning emission of GFED in x so that we can also see how the system works with wrongly defined prior errors.Therefore, x represents the flux differences between VISIT NBP and the modified CASA NEP for land, and between Iida et al. ( 2015) and Takahashi et al. (2009) for ocean.To construct the diagonal elements of B, the x is scaled for specified global land and ocean total values as where i denotes the element index, and r is the scaling factor.
The scaling factor r is determined so that the annual global uncertainty, corresponds to 3.0 and 0.5 Pg C yr −1 for land and ocean fluxes, respectively.The vector a acts as the spatiotemporal averaging operator and ax produces the annual global average of x.
For the off-diagonal elements in matrix B (covariance), we introduce a simple spatial correlation with the Gaussian function as where l i,j is the horizontal length between i and j , and L is the correlation scale length.Here, we assume no temporal correlation since we are optimizing relatively low temporal resolution fluxes (monthly means).In Eq. ( 26), we use the globally unique scale lengths for land and ocean, L lnd and L ocn , set at 500 and 1000 km, respectively.There is no cross correlation between land and ocean fluxes.These correlation scale lengths are determined from the results of previous studies (such as Rödenbeck, 2005;Chevallier et al., 2007;Basu et al., 2013), although they did not use the Gaussian function as in Eq. ( 26) but used instead an exponential decay function.Figure 2 shows four examples of the error correlation distributions determined by exp(−l 2 /2L 2 ) with L lnd = 500 km and L ocn = 1000 km.
As described above, we use idealized variances and simply define covariances to construct the prior error covariance matrix.In fact, the configuration of the prior error covariance significantly affects the performance of the inversion and a careful evaluation needs to be conducted, which we leave for a future study.

Observation-model misfit error covariance
We set the error covariance for model-observation misfit R at 1 ppm 2 (ppm is used here as equivalent to the dry air mole fraction unit of µmol mol −1 ) for all the variances and 0 ppm 2 for all the covariances; therefore, R is a unit diagonal matrix.Actually, introducing off-diagonal elements in R is difficult compared to B because the inverse calculation of R is necessary at some stage in the calculation.Nevertheless, the "no covariance" assumption is relatively reasonable considering the sparse spatiotemporal distribution of the observations used here.The misfit uncertainty of 1 ppm is arbitrary but reasonable in representing model-observation misfits, based on the reported numbers published in previous studies (e.g., Patra et al., 2008).

Diagnostic measures
After the assimilation, we evaluate how close the posterior fluxes approach the true fluxes by using root mean square error (RMSE) measures described below.On the global scale, we use where i and m denote the i model grid and the mth month, respectively, and N denotes the number of model grids.x post represents the posterior flux value and x true denotes the true flux value.We also investigate error distributions by which is calculated for each model grid i.

Sensitivity tests
Using the above-described identical twin experiments, we conduct sensitivity tests around the adjoint model, the optimization scheme, and the prior error covariance as follows.
We test two forward and adjoint model sets: one set preserves linearity and complete adjoint relations using the forward model with the flux limiter turned off in the advection scheme and the discrete adjoint model (LINEAR), while the other one is a nonlinear and non-exact adjoint set using the forward model with the flux limiter on and the continuous adjoint model (NONLINEAR).
Originally, the POpULar optimization scheme was designed for nonlinear problems of ocean dynamics by modifying the linear optimization scheme of DR89, as described in Sect.2.1.3.In atmospheric CO 2 inversions, the atmospheric transport process is treated as linear, i.e., tracer is passive, and nonlinear chemical processes are not involved.However, if we use the NONLINEAR model set, the problem becomes slightly nonlinear due to the use of the flux limiter.Therefore, we also examine the behavior of the linear (DR89) and nonlinear (POpULar) optimization schemes with the two model sets (LINEAR and NONLINEAR).
Furthermore, we also compare results between the two cases involving diagonal B and off-diagonal B to confirm the effectivity of prior error correlations.In the diagonal B case, we set the off-diagonal elements to zero and rescale the diagonal elements by r (Eq.24) such that the global uncertainty σ (Eq.25) equals that of the off-diagonal B. In summary, we conduct eight sensitivity tests using the LIN-EAR/NONLINEAR model sets, the DR89 and POpULar optimization schemes and diagonal and off-diagonal B (Table 2).

Sensitivities to adjoint models, optimization schemes, and error correlations
In each of the sensitivity tests, we perform a total of 60 iterations to optimize the surface fluxes.Figure 3 shows the cost function variations against the number of iterations.Similarly, Fig. 4 shows the global root mean square error (GRMSE) variations against the number of iterations; Table 2 lists GRMSE values after 60 iterations.Figure 3 indicates that 60 iterations are sufficient to achieve convergence for the off-diagonal cases (Fig. 3c and d) and close to convergence for the diagonal cases (Fig. 3a and b), with the exception involving the NONLINEAR+DR89 case with the diagonal B. Similarly, GRMSE shows a smooth reduction as the iterative calculation proceeds in all the cases, except for the case of NONLINEAR+DR89 with the diagonal B (Fig. 4a), where the GRMSE reduction in value matches that of its corresponding LINEAR case.But after about 30 iterations, the two curves diverge rapidly as the NONLIN-EAR case increases in GRMSE to a value at the 60th iteration that is greater than the value at the beginning of the iteration.This is due to the fact that DR89 is strictly designed for a linear problem (Appendix A) and hence is incompatible with NONLINEAR.However, if the covariances are included in B, NONLINEAR+DR89 produces a GRMSE reduction curve similar to that of the LINEAR case (Fig. 4c).This may be due to the fact that the smoothing effect of the error correlation in B suppresses the incompatibility of DR89 with the NONLINEAR model set.POpULar does not show such incompatibility with NON-LINEAR (Fig. 4b and d).This is because the POpULar optimization algorithm allows model nonlinearity.For the LIN-EAR cases, the GRMSE values from the use of POpULar are the same as those in the DR89 cases, irrespective of the prior error covariance setting (Table 2).
In all the cases except DR89 with the diagonal B, the LINEAR and NONLINEAR model sets show similar convergence speed, suggesting that there is almost no difference in the optimization ability between these two sets (Fig. 3b-e).In the nonlinear case, the cost function defined by Eq. ( 1) is no longer quadratic and the minimum point is not uniquely determined.Therefore, use of the NONLINEAR model set has a risk of falling into a local minimum, causing deterioration of the optimization, but we find that it is not the case in our inversion experiment.We note that NONLINEAR generates smaller GRMSE values than those generated by LIN-EAR while using POpULar, the differences, although modest, are noticeable with the off-diagonal B (Fig. 4d).This is because the observations are constructed by the model with the flux limiter, which is more compatible with the NON-LINEAR model set; that is, NONLINEAR has smaller model errors than LINEAR.In fact, when the observations are constructed without the flux limiter, the GRMSE value from the LINEAR case becomes smaller and comparable to that of NONLINEAR (not shown).In this case, the GRMSE value from NONLINEAR does not change so much from the control case, suggesting that NONLINEAR is relatively insensitive to its model error in the optimization.
Comparing Fig. 4c and d with Fig. 4a and b, we see that the off-diagonal B produces significantly smaller GRMSE than  An implication of this result, especially for the global ocean, is that the diagonal B fails to reduce GRMSE from the prior condition (Table 2).Intrinsically, the optimization of the ocean fluxes is difficult compared to that of the land fluxes, because ocean flux variations are much smaller than those associated with the land, but also in our case we use a true ocean flux that is similar to the prior flux.The result indicates that introducing spatial error correlations in B is necessary in order to modify such small ocean fluxes.
Figure 5 shows RMSE distributions of the posterior fluxes calculated with the diagonal and off-diagonal B; the RMSE distribution of the prior flux is also shown as a reference.
Posterior fluxes are produced after 60 iterations using POp-ULar+NONLINEAR.As already indicated in Fig. 4c and d, the RMSEs of the two posterior fluxes are much smaller than that of the prior, demonstrating the optimization skill of our new system.The prior flux has RMSE exceeding 15 × 10 −7 mol m −2 s −1 in the terrestrial areas, where biosphere activity is large (e.g., eastern North America, South America, east and southeast Asia, and Africa) (Fig. 5d).Those errors are effectively reduced by the assimilation to values mostly less than 10 × 10 −7 mol m −2 s −1 .Moreover, in the off-diagonal B case, the errors are further reduced, especially in the areas where observations are sparse, e.g., Africa, south Asia, South America, and Australia (Fig. 5c).This indicates that introducing error correlations allows an effective propagation of atmospheric CO 2 information to surface fluxes located beyond the observation footprints that are determined by atmospheric flow fields.

Posterior flux features
In this section, we present general features of the posterior fluxes and compare them with the true fluxes.We assess when, where, and how much the posterior fluxes are reliable when real observations are used.The posterior fluxes analyzed here are derived after 60 iterations by NONLIN-EAR+POpULar with the off-diagonal B, which has been shown to display the smallest error among the above eight sensitivity tests.
First, we compare atmospheric CO 2 concentration fields that are generated by the forward NICAM-TM from the true, prior, and posterior fluxes.Figure 6a shows the time series of simulated atmospheric concentrations at the Minamitorishima (MNM) observational site located in the western North Pacific.In fact, the pseudo-observation network (Appendix C) includes MNM, and its location is shown in Fig. 2. In Fig. 6a, we see a good agreement between the posterior and the observational concentration values.The root mean square difference between them is 0.19 ppm, which is much less than the prescribed model-observation misfit er- ror of 1 ppm (Sect.2.2.3).To see more overall consistency of the inversion, χ 2 is calculated from the minimum of the cost function (J min ) as where m is the number of observations (3380).Although the theoretically expected value of χ 2 is 1 (Tarantola, 2005), our experiment results in χ 2 = 0.08 (J min = 131).This is because we have a perfect transport simulation (i.e., the model error is zero).That is, the same transport model is used for preparing the observations and the assimilation.Consequently, the prescribed value of 1 ppm for the observationmodel misfit error is rather conservative (but reasonable for a practical inversion with real observations).In fact, when we repeat the experiment with 0.2 ppm observation-model misfit errors with which we perturb the observations, we obtain a value of χ 2 = 0.89 (J min = 1496), which is much closer to 1.
Figure 6b-d show latitude-time cross sections of zonally averaged surface concentration simulated from the true flux field (Fig. 6b), its differences from the concentration field generated by the prior flux (Fig. 6c), and by the posterior flux (Fig. 6d).Compared to the zonally averaged concentration field generated from the true fluxes, a period of low concentration at the northern latitudes starts 1 month earlier in the field forced by the prior fluxes.Furthermore, latitudinal gradients between 30 • S and 30 • N are weaker during January-May.These features are modified by the assimilation so that the spatiotemporal variation of the posterior CO 2 concentration is nearly identical to the concentration variations obtained from the true flux field (Fig. 6d).
To see the degree of similarity in the flux distribution between the posterior and true fluxes, we show monthly mean flux distribution for March, July, and September of 2010 in Fig. 7, along with the corresponding prior fluxes.As shown in the figure, the general pattern of the true flux for each month is retrieved relatively well, as indicated in the posterior flux distribution, from the prior flux.
For March, the true flux has larger sources in the northern midlatitudes to high latitudes than those of the prior flux, and this feature is retrieved well by the inversion (Fig. 7a-c).Furthermore, it is noteworthy that the inversion successfully retrieves the distinct source/sink contrast patterns around the Equator in Africa and South America.For July, the true flux shows a larger biosphere sink in the northern latitudes than the prior flux; the posterior flux calculated by our inversion method shows a sink of similar magnitude as the true flux (Fig. 7d-f).However, there remains a mismatch in the spatial distribution of the sink.Upon closer inspection, one can see that the posterior flux has large sinks in west Siberia and eastern North America (Fig. 7e), while the true flux has a comparably large sink in western Canada (Fig. 7f).During the summer season, the vertical transport over a continental region is quite active, and the emission from the surface is well mixed in the vertical.Therefore, this causes the ground-based sta-  tions to observe only diluted flux signals, resulting in weak constraints on the flux estimation.For September, the prior flux is still characterized by sinks over most of the continental regions.However, the true flux has already changed to a source in the northern high latitudes, and the posterior flux is able to reproduce that feature (Fig. 7g-i).Moreover, the distinct source/sink contrasts in Africa and South America, signs of which are the opposite of those in March, are again well retrieved by the inversion.The ideal prior error variance (Sect.2.2.2) might have helped this achievement to some extent.Using the difference between the true and prior fluxes to determine the prior error variance allows the system to know where and by how much observational signals should propagate to surface fluxes.These results suggest that, under the assumption that the prior fluxes are well known, our new inversion system is capable of reproducing continental flux patterns by using only the surface observations.Furthermore, even when the prescribed prior flux errors are wrong, the inversion system is capable of obtaining reasonable flux estimates, given that "hotspot"-like CO 2 sources from biomass burnings have been successfully detected.In Fig. 8, we focus on the fluxes in Indochina for March, and in South America and Africa for September of 2010.In those regions, the true fluxes derived from GFED show regional distinct anomalies compared to VISIT, but they are not used to assign the prior error variance (Sect.2.2.2).In 2010, large biomass burnings occurred under a severe dry condition in the Amazon (Lewis et al., 2011).Compared to the prior fluxes that have moderate spatial variations, the posterior fluxes show large sources in those areas that are comparable to the true flux patterns, though the posterior source peaks are underestimated in Thailand and the Amazon, and are shifted from the coast to the inland area in southern Africa.The capability to detect such flux anomalies will be enhanced if more observations are available.
Figure 9 compares zonally integrated seasonal cycles of the posterior fluxes over terrestrial and ocean areas in three latitudinal bands with those of the prior and true fluxes.As expected from Fig. 7, the seasonal cycle of the posterior flux agrees generally well with that of the true flux, compared to that of the prior flux.However, the degree of agreement is latitudinally dependent.The region where the amplitude and the phase of the seasonal cycle of the posterior flux agree the most with those of the true flux is in the northern midlatitude to high latitude band (30-90 • N) (Fig. 9a and b), for both the land and the ocean.The result reflects the relatively dense surface observational network in this latitude zone.The seasonal cycles of the posterior fluxes in the other latitudinal Latitudinal profiles of annual zonal mean fluxes can also be retrieved well (Fig. 10).For example, compared to the prior flux, the posterior flux matches the true flux and shifts the sink towards lower latitudes (Fig. 10a).Moreover, the true and posterior fluxes consistently show two distinct source peaks across the Equator.Figure 10b shows that this flux pattern is due mostly to the terrestrial flux (Fig. 10b).In the southern extratropics, on the other hand, the posterior flux is not modified significantly from the prior flux, suggesting weak constraints due to a sparse number of observational stations in that area.Compared to the terrestrial flux, the ocean flux has much smaller latitudinal variations (Fig. 10c).In spite of such small variations, we see improvements in the prior flux by the inversion in the northern latitudinal band of 30-60 • N and in the Southern Ocean in the latitudinal band of 50-70 • S. At other latitudes, the posterior flux has similar profiles to those of the prior flux.

Conclusions
In this paper, we have introduced, described, and tested our new 4D-Var inversion system based on the icosahedral grid model, NICAM.Adding to the offline forward and adjoint models of NICAM-TM, this study has introduced the optimization method of POpULar, which constitutes an essen-tial part of the 4D-Var system.Moreover, we have conducted identical twin experiments to confirm the capability and utility of the new system for atmospheric CO 2 inversion.
Based on the results of the sensitivity tests using various combinations of the optimization method (DR89 and POpULar) and the discrete and continuous adjoint models (LINEAR and NONLINEAR), we have found that a combination of POpULar and the continuous adjoint (NONLIN-EAR+POpULar) leads to the smallest error.This is due to the smaller transport error of the continuous adjoint and the flexibility of the POpULar optimization method against the model nonlinearity.Even with LINEAR, POpULar shows high optimization capability, though its error is slightly larger than when used with NONLINEAR.The similar optimization performance of the continuous and discrete adjoint models achieved in this study is consistent with the assimilation result obtained by Gou and Sandu (2011) with real ozone observations.If model linearity and perfect adjoint relationship are strongly required, the combination of LIN-EAR+POpULar could be a sensible choice.For instance, the perfect adjoint relationship is desirable to approximate as accurately as possible the inverse Hessian that is defined as a symmetric matrix.In fact, an accurately approximated inverse Hessian can be considered as the posterior error covariance and can be applied to estimate error reductions and quantify observational impacts.A study into the application of the inverse Hessian matrix is in progress and will be reported elsewhere.
We have introduced spatial error correlations with the simple Gaussian function in the off-diagonal elements of the prior error covariance and have shown that it significantly improves the posterior fluxes, reducing the global flux error by about 15 %.This result is consistent with other 4D-Var inversions, but we have achieved it in an easier way that does not require matrix inverse calculation nor any matrix decomposition; the optimization algorithm of POpULar has an advantage in its simple treatment of the off-diagonal matrix.This would prove to be a more powerful tool when more complicated flux error structures are considered.For instance, Meirink et al. (2008) assumed that the spatial and temporal correlations are mutually independent, but POpULar would enable us to consider cross correlations in space and time.However, it will not be a trivial task to appropriately design the prior error covariance in terms of accuracy and computational cost.A complicated flux error structure may require a considerable computational resource, even though POpU-Lar requires only the matrix-vector multiplication with the prior error matrix.A further research effort in this direction is required.In this study, arbitrary numbers and ideal prior errors have been used, respectively, for the error correlation scales and the diagonal elements (variance) of the prior error covariance matrix.Therefore, our next stage of research will involve an investigation of optimal prior error design, taking the advantage of POpULar.
By using our new 4D-Var inverse system, we have successfully retrieved general features of the true flux variations from weekly flask observations obtained from 65 groundbased stations, though with the assumption that the prior flux errors are reasonably well known.Moreover, a remarkable performance of the new system is demonstrated by the result that the inversion is able to detect regionally limited flux anomalies caused by biomass burnings that are not represented in the prescribed prior errors.However, even with the perfect (non-biased) transport model and the ideal prior error variances, improvements in the prior flux estimate in some regions have been found to be limited due to the sparseness of the observations.Further improvements are expected by adding data from in situ continuous measurements and worldwide observations by aircraft (e.g., Machida et al., 2008) and satellites (e.g., Yoshida et al., 2013;Saitoh et al., 2016) in a future study.
Appendix C: Pseudo-observation sites

Figure 1 .
Figure 1.A schematic figure of NICAM-TM 4D-Var.A Roman numeral indicates each calculation described in the text.

Table 1 .Figure 2 .
Figure 2. Locations of surface flask observation site (magenta triangle) and four distributions of error correlation introduced in the off-diagonal elements of B, which are centered at two land grids (60 • N, 110 • E and 60 • N, 180 • ), and two ocean grids (0 • , 150 • E and 0 • , 60 • W) selected for example.

Figure 3 .
Figure 3. Cost function changes with iterations for the linear (red) and nonlinear (blue) models.Sensitivities to the optimization schemes (DR89: left; POpULar: right) and to the prior error covariance matrixes (diagonal: upper; off-diagonal: bottom) are shown.

Figure 4 .
Figure 4.The same as Fig. 3 but for GRMSE changes with iterations.

Figure 5 .
Figure 5. RMSE distributions of the posterior fluxes derived by POpULar+NONLINEAR with the diagonal B (a) and the off-diagonal B (b).The difference between (b) and (a) is shown in (c) as well as the RMSE distribution of the prior flux (d).

Figure 6 .
Figure 6.Atmospheric CO 2 concentrations simulated by the forward NICAM-TM driven by the true, prior, and posterior fluxes.The left panel shows the time series of concentrations at the Minamitorishima observational site located in the western North Pacific.Black open circles denote pseudo-observations that are assimilated in the experiment, and green and blue lines represent CO 2 concentrations calculated from the prior and posterior fluxes, respectively.The right panel shows latitude-time cross sections of surface zonal averages calculated from the true flux (b), its differences from the concentration field generated by the prior flux (c) and by the posterior flux (d).

Figure 7 .Figure 8 .
Figure 7. Monthly mean distributions of the prior (left), posterior (middle), and true (right) CO 2 fluxes.Fluxes are shown for March (a-c), July (d-f), and September (g-i) of 2010.Note that the fluxes do not include fossil fuel emissions, which are not optimized in the inversion.

Figure 10 .
Figure 10.Latitudinal profiles of annual zonal mean CO 2 fluxes (gray: true; green: prior; blue: posterior) for the total (a), terrestrial (b), and ocean (c) areas.Note that fossil fuel emission is not included.

Table 2 .
List of the eight sensitivity tests and the global root mean square errors (GRMSE) of each posterior flux for the global land, the global ocean, and the global total.The numbers in parenthesis are those of the prior flux.

Table C1 .
List of pseudo-observation sites used in the identical twin experiment.