FLEXINVERT: An atmospheric Bayesian inversion framework for 1 determining surface fluxes of trace species using an optimized grid 2

7 We present a new modular Bayesian inversion framework, called FLEXINVERT, for 8 estimating the surface fluxes of atmospheric trace species. FLEXINVERT can be applied to 9 determine the spatio-temporal flux distribution of any species for which the atmospheric loss 10 (if any) can be described as a linear process and can be used on continental to regional and 11 even local scales with little or no modification. The relationship between changes in 12 atmospheric mixing ratios and fluxes (the so-called source-receptor relationship) is described 13 by a Lagrangian Particle Dispersion Model (LPDM) run in a backwards in time mode. In this 14 study, we use FLEXPART but any LPDM could be used. The framework determines the 15 fluxes on a nested grid of variable resolution, which is optimized based on the source-receptor 16 relationships for the given observation network. Background mixing ratios are determined by 17 coupling FLEXPART to the output of a global Eulerian model (or alternatively, from the 18 observations themselves) and are also optionally optimized in the inversion. Spatial and 19 temporal error correlations in the fluxes are taken into account using a simple model of 20 exponential decay with space and time and, additionally, the aggregation error from the 21 variable grid is accounted for. To demonstrate the use of FLEXINVERT, we present one case 22 study in which methane fluxes are estimated in Europe in 2011 and compare the results to 23 those of an independent inversion ensemble.


Introduction 26
Observations of atmospheric mixing ratios (or concentrations) of trace species (gases or 27 aerosols) contain information about their fluxes between land/ocean and the atmosphere. 28 Atmospheric inversions use this information formally in a statistical optimization to find 29 spatio-temporal distributions of trace gas (or aerosol mass) fluxes (e.g. Tans et al. 1990). This 30 can be done provided that there is a model of atmospheric transport relating changes in fluxes 31 to changes in mixing ratios (or concentrations), that is, the so-called Source-Receptor 32 Relationships (SRRs). Basically, two types of models are used: Eulerian models, in which 33 atmospheric transport and chemistry are calculated relative to a fixed coordinate, or 34 an atmospheric gas forward in time from its sources/sinks to receptors (i.e. measurement 23 sites) or backwards in time from receptors to its sources/sinks using the identical model 24 formulation (Stohl et al. 2003;Seibert and Frank 2004;Flesch et al. 1995). The forward and 25 backward calculations are equivalent but one direction can be much more computationally 26 efficient than the other. For instance, if there are few receptors but many sources/sinks, the 27 backwards mode is more efficient. This is the case, for instance, when particles are tracked 28 backwards from a relatively small number of available atmospheric observation sites (i.e. 29 receptors), as in our demonstration case. This feature makes LPDMs very efficient for the 30 purpose of atmospheric inversion and they have been previously used in numerous studies 31 (e.g. Gerbig et al. 2003;Lauvaux et al. 2009;Stohl et al. 2010;Thompson et al. 2011;Keller 32 et al. 2012;Brunner et al. 2012). Lagrangian models may be used on a global scale (e.g. Stohl 33 et al. 2010), sub-continental scale (e.g. Gerbig et al. 2003) or on a regional scale in the order of a few hundred square kilometers (e.g. Lauvaux et al. 2009). Owing to their favorable 1 treatment of atmospheric turbulence in the boundary layer, LPDMs can even be used down to 2 scales of a few hundreds of meters (Flesch et al. 1995) and have been used for inferring 3 source strengths for local sources (e.g. farmsteads and oil spills). A further advantage of 4 LPDMs is that they can be run backward exactly from a measurement site, unlike Eulerian 5 models, where site measurements are represented by the averaged value of the corresponding 6 grid cell. By focusing on local or regional scales, fine resolution may be used without running 7 into problems of too large a number of unknown variables (in this case the fluxes). Fine 8 resolution is desirable as it reduces the model representation error, also known as aggregation 9 error (Kaminski et al. 2001;Trampert and Snieder 1996) but it must be traded-off with the 10 total number of flux variables to be determined, which is subject to computational constraints. 11 12 Using LPDMs to solve the inverse problem, however, also has disadvantages. In LPDMs, 13 virtual particles are typically followed backward in time only for the order of days to a few 14 weeks, thus the influence of the atmospheric chemistry and transport and surface fluxes 15 further back in time (the so-called background mixing ratio) must be taken into account 16 separately. Although forward 3D simulations in LPDMs are possible, in order to reproduce 17 background signals, such as seasonal variability, simulations of months to years would be 18 necessary and, therefore, computationally too expensive (Stohl et al. 2009). Alternatively, the 19 background mixing ratio can be accounted for using either observation-or model-based 20 approaches. Observation-based approaches use some filtering method (either statistical or 21 based on meteorological criteria) to identify observations representative of the background i.e. 22 air not (or only minimally) influenced by fluxes during the time of the backwards calculations 23 (e.g. Stohl et al. 2009;Manning et al. 2011). However, the background is strongly influenced 24 by meteorology e.g., air transported from higher latitudes or altitudes may have significantly 25 different mixing ratios compared to air transported from lower latitudes or altitudes even if in 26 both cases no emissions occur during the backward calculation. This makes the determination 27 of an observation-based background difficult. Model-based approaches involve coupling the 28 back-trajectories at their point of termination to the mixing ratios determined from a global 29 model. 30

31
One approach is to run the LPDM on a regional domain and couple this to a global model at 32 the domain boundary. This approach was adopted by Rödenbeck et al. (2009), who use a 2-33 step method to first solve for the fluxes on a coarse grid using a Eulerian model and to calculate the background mixing ratios at the receptors, and second to perform the inversion 1 at regional scale on a finer grid using an LPDM. A similar approach was developed by Rigby 2 et al. (2011) but using a 1-step method. A drawback of both these approaches, however, is 3 that only the coarse resolution Eulerian model is used to calculate the background mixing 4 ratios at the receptors and, thus, is more susceptible to transport errors. We use a different 5 approach and couple the LPDM, run on a global scale, to a Eulerian model at the time 6 boundary, such as done by Koyama et al. (2011). This approach utilizes the more accurate 7 transport of the LPDM to calculate the background at the receptors. 8 9 In this paper, we present a new framework, called FLEXINVERT, for optimizing fluxes by 10 employing an LPDM that can be coupled to mixing ratio fields from a global (Eulerian) 11 model. This method may be used from large continental scales down to local scales and can 12 be used for sparse as well as dense observation networks. In this method, the LPDM is used to 13 transport air masses and, thus, the influence of fluxes, to each receptor. The fluxes inside the 14 domain are optimized on a grid of variable resolution, where finer resolution is used in areas 15 with a strong observation constraint, i.e. close to receptors, and coarser resolution is used 16 elsewhere. FLEXINVERT, as it is presented here, requires that the LPDM is run on a global 17 domain, or at least that the domain is large enough so that trajectories do not exit the domain. 18 In summary, the features of this method are: 19 20 -atmospheric transport (SRR) is calculated using a single model, i.e. the LPDM 21 -the LPDM needs only to be run once for each species and receptor to find the SRRs, 22 as the output can be applied to optimize the fluxes for any domain and resolution (as 23 long as the resolution is no finer than that of the LPDM run) 24 -the variable resolution grid means that fine resolution close to receptors minimizes 25 model representation errors 26 -background mixing ratios can be provided either by coupling to mixing ratio output 27 from a global model or alternatively by using an observation-based method 28 -the background mixing ratios are optionally included in the optimization 29 -the influence of fluxes from outside the domain on the mixing ratios at the receptors is 30 accounted for without having to solve for them explicitly, thereby reducing the 31 dimensionality of the problem 32 33 Variable grid resolution has been used in atmospheric inversions previously such as in the 1 studies of Manning et al. (2003), Stohl et al. (2010 and Wu et al. (2011). Our method for 2 defining the variable grid is based on that of Stohl et al. (2010). However, we have also 3 implemented a re-optimization of the fluxes at variable resolution back to the finest model 4 resolution based on the method of Wu et al. (2011). 5 6 This manuscript is structured as follows: first we describe the inversion framework and the 7 variable grid formulation and, second, we present an example using real observations of 8 methane (CH 4 ) dry-air mole fractions to optimize CH 4 emissions in Europe. 9 10 2. Bayesian framework for linear inverse problems 11

Forward model 12
For cases where the atmospheric transport and chemistry are linear, the change in mixing ratio 13 of a given atmospheric species can be related to its fluxes by a matrix operator. Furthermore, 14 the absolute mixing ratio can be related to its fluxes plus the background mixing ratio, which 15 together form the so-called state vector. This is shown in Eq. 1 where y mod (M×1) is a vector of 16 the modeled mixing ratio at M points in time and space, x (N×1) is a vector of the N state 17 variables discretized in time and space, and H (M×N) is the transport operator. 18 For simplicity, we describe the case where the state variables are optimized for only one time 20 step, although the framework is able to optimize many time steps simultaneously (for an 21 overview of the variables and their dimensions see Tables 1 and 2). We construct the matrix 22 H from three components of the atmospheric transport to each receptor: 1) transport of fluxes 23 within a nested domain (i.e. within the global domain), H nest , 2) transport of fluxes outside the 24 nested domain, H out , and 3) contribution of mixing ratio at the time and location when the 25 back-trajectories terminate, i.e. the initial mixing ratios, H bg (see Fig. 1). Similarly, x is 26 constructed from the fluxes inside the domain, f nest , fluxes outside the domain, f out (there are 27 no common variables between f nest and f out so there is no double counting of fluxes), and 28 initial mixing ratios taken from the output of a global model, y bg . (Note that from hereon we 29 refer to the contribution to the observed mixing ratio from where the trajectories terminate as 30 the initial mixing ratio and the contribution from the initial mixing ratio plus from the fluxes 31 outside the domain as the background mixing ratio, this is explained in section 2.1.2). 32 Equation 1 can thus be expanded to: 33 that there is zero contribution from fluxes inside the domain in the background mixing ratio. 23 To avoid overestimating the contribution from inside the domain and, hence, underestimating 24 the background mixing ratio, we also select the lower quartile of the prior simulated mixing 25 ratios in a moving time window. Lastly, we calculate a running average of the background 26 mixing ratios using a time window of 90 days, which is then linearly interpolated to the 27 timestamp of the observations. This is done for each receptor. Similar methods for the 28 background calculation have been used previously for cases where no reliable global model 29 estimate of the mixing ratio was available (e.g. Stohl et al. 2010). 30 31

Variable resolution grid 32
To reduce the number of variables in the inversion problem we aggregate grid cells where 1 there is little constraint from the atmospheric observations. In this way, we define a new 2 vector of the fluxes to be optimized, f nest vg and transport matrix, H nest vg which are on a grid of 3 variable resolution (vg = variable grid). The aggregation algorithm is based on time-averaged 4 SRRs optionally convolved with the prior flux estimate. The variable grid is set-up starting 5 with a coarse grid, which is refined in a specified number of steps following the method of 6 Stohl et al. (2009). For example, starting with a coarse resolution of 4°×4° the grid may be 7 refined in two steps to resolutions of 2°×2° and 1°×1°. The refinement is made so that the flux 8 sensitivity (optionally multiplied by the prior flux) in each grid cell at its final resolution (e.g. 9 1°×1°, 2°×2° and 4°×4°) is above a given threshold. It is also optional whether or not to make 10 the grid refinement over water bodies and ice so that grid cells with a water/ice area of 99% or 11 more are not refined further reflecting cases where the water/ice surface fluxes are either 12 smaller, more homogeneous and/or more certain than the land surface fluxes. For determining 13 which grid cells are land/water/ice we use the International Geosphere Biosphere Programme 14 land-cover dataset (IGBP-DIS) (Belward et al. 1999). 15

16
To convert from the fine to the variable grid, we define a projection operator Γ (W×K) where K 17 is the number of grid cells in the nested domain at the original resolution and, W, the number 18 at variable resolution. Each row of Γ corresponds to a cell in the variable grid, and is a 19 summation vector on the fine grid. The row vectors, λ i , of Γ are orthogonal thus λ i λ j T = 0 for i 20 ≠ j (since each fine grid cell can only belong to one variable grid cell). The flux vector, f nest 21 and the matrix, H nest on the variable grid can be found according to: 22 f vg nest = Γf nest and H vg nest = H nest Γ T (5) 23 where "T" indicates the matrix transpose. The fluxes in f nest are weighted by the ratio of the 24 area of fine grid to the variable grid into which it is aggregated. The forward model on the 25 variable resolution grid can thus be written as: 26 where ε agg (M×1) is the model representation error from having reduced the resolution of the 28 model (for a schematic for the forward model see Fig. 1). It is also known as the aggregation 29 error and has been described by Trampert and Snieder (1996), Kaminski et al. (2001), and 30 Thompson et al. (2011). We describe the calculation of the aggregation error in section 2.5. 31 32

Aggregation of the background mixing ratios
The contribution of fluxes outside the domain to the change in mixing ratios at the receptor 1 points (i.e. H out f out ) can be added to the initial mixing ratio (H bg y bg ). The contribution to the 2 modelled mixing ratio (i.e. y mod ), which is not accounted for by the SRRs and fluxes inside 3 the domain (i.e. the background mixing ratio), is then defined by a new matrix, M cg(M×R) , on a 4 coarse grid (cg), which has rows corresponding to M observations and columns corresponding 5 to R grid cells or latitudinal bands. When the initial mixing ratio is calculated using the 6 sensitivity matrix, H bg and mixing ratio fields, y bg from a global Eulerian transport model, 7 then M cg is defined as: 8 where ∘ indicates the Hadamard matrix product, F out (M×P) has M rows of (f out ) T , and Y bg (M×P) 10 has M rows of (y bg ) T . The matrix, Γ cg(R×P) , is a projection operator from the Eulerian model 11 resolution to a coarse resolution of R grid cells (note Γ cg ≠ Γ). Noteworthy, is that the matrix 12 multiplication H out ∘F out is made using the original resolution of the LPDM and fluxes and 13 that the conversion to the coarse grid is performed only on the mixing ratios, thus avoiding an 14 aggregation error in this component. When the background is calculated using the 15 observations themselves, then M cg is defined as: 16 where Γ bg(R×M) is an operator to map the background mixing ratios to a matrix where the 18 background for each measurement is allocated to one of R latitudinal bands. Note that the 19 contribution from grid cells outside the domain is not explicitly included as it is assumed that 20 this contribution is incorporated into the definition of y bg when it is calculated from the 21 observations (see section 2.1.2). 22

23
For both methods, the columns of M cg correspond to the mixing ratios in each of the R coarse 24 grid cells (or latitudinal bands when using the observation-based method) such that the sum of 25 each row gives the total background mixing ratio for each measurement (note that for the 26 observation-based method there is only one non-zero entry in each row). The spatial 27 distribution of the contribution to the background mixing ratio (dimension R) is maintained as 28 it is these contributions that are optimized in the inversion. The uncertainty in the initial mixing ratios and in the contribution to the mixing ratio from 6 fluxes outside the domain can be considerable. Therefore, we include this component in the 7 optimization problem. The prior state vector for optimization, x b thus contains variables for 8 the surface fluxes (on the variable resolution grid) and variables for the optimization of the 9 mixing ratios (on the coarse resolution grid defined by Γ cg ). 10 11 Based on Bayes' theorem, the most probable solution for x is the one that minimises the 12 difference between the observed and modelled mixing ratios while also depending on the 13 prior state variables, x b and their uncertainties (for details on Bayes' theorem see e.g. 14 Tarantola (2005)). Assuming that the uncertainties have a Gaussian probability density 15 function (pdf) this can be described by the cost function: 16 where B (N×N) is the prior flux error covariance matrix (see section 2.5), R (M×M) is the 18 observation error covariance matrix (see section 2.6), and y obs is a vector of the observed 19 mixing ratios. There exist a number of methods to find the x for which Eq. 10 is at a 20 minimum; we use the approach of finding the first derivative of Eq. 10 and solving this for x. 21 By rearrangement, x can be found according to Eq. 11. Equation 11 has a number of 22 alternative formulations and the one we use is the most efficient when the number of 23 observations is smaller than the number of unknowns, since the size of the matrix to invert 24 (HBH T +R) has dimensions of M×M: 25 The inverse of (HBH T +R) is found by Cholesky factorization (using the DPOTRF and 27 DPOTRI routines from the LAPACK library). The corresponding posterior error covariance

Prior error covariance matrix 1
Errors in the prior flux estimates are correlated in space and time owing to correlations in the 2 biogeochemistry model, upscaling model, or anthropogenic emission inventory that was used 3 to produce these estimates. Most often, there is little known about the true temporal and 4 spatial error correlation patterns. Here we define the spatial error correlation for the fluxes as 5 an exponential decay over distance, such that each element in the spatial correlation matrix, 6 C S is: 7 where d ij is the distance between grid cells i and j in a given timestep and k S is the spatial 9 correlation scale length on land or ocean (we assume that fluxes on land and ocean are not 10 correlated with one another). The temporal error correlation matrix, C T is described similarly 11 using the time difference between grid cells in different timesteps. The full temporal and 12 spatial correlation matrix, C is given by the Kronecker product: C T ⊗C S . The error covariance 13 matrix for the fluxes, B flux vg(W×W) is the matrix product of correlation pattern, C and the error 14 covariance of the prior fluxes, σσ T where σ is a vector of the flux errors. We calculate the 15 error on the flux in each grid cell (on the fine grid) as a fraction of the maximum value out of 16 that grid cell and the 8 surrounding ones. Finally, the B flux vg matrix is scaled so that the square 17 root of its sum is consistent with a total error value assigned for the whole domain. This error 18 estimate may be from e.g. comparisons of independent biogeochemistry modelled fluxes or 19 flux inventories. The correlation matrix could be calculated for the fine grid and converted to 20 the variable grid using the prolongation operator as ΓB flux Γ T . However, we calculate B flux vg 21 directly for the variable grid (dimensions W×W) as the multiplication step B flux Γ T is very slow 22 if K is large and/or if there are many timesteps. In addition, B flux (K×K) is calculated for the fine 23 grid for a single time step only, as it is needed in the calculation of the aggregation error (see 24 section 2.6) and for the optimization of the posterior fluxes back to the fine grid (section 2.8). 25 We assume that the errors for the scalars of the background mixing ratios (i.e. a cg ) are 26 uncorrelated and have a fixed prior value (e.g. 1%). The error variance for these scalars is 27 appended to B flux vg to give B (N×N) . 28 29

Aggregation error 30
The aggregation incurred by reducing the spatial resolution of the model can be calculated by 1 projecting the loss of information in the state space into the observation space (Kaminski et al. 2

2001). The full aggregation error covariance matrix E agg
(M×M) is given by: 3 where Γis the projection of the loss of information in the variable grid compared to the fine 5 grid. The matrix Γcan be calculated simply from the rows vectors λ i of the projection 6 operator Γ, which are weighted by the square root of the row sum so as to have unit length: 7 where I is the identity matrix. As λ i λ i T is a matrix of size P × P, where P can be on the order 9 of 10,000 to 100,000, it is not calculated directly but rather HΓas follows: 10

Observation error covariance matrix 13
The errors in the observation space incorporate measurement as well as model transport and 14 representation errors. For the measurement errors, we use values of the measurement 15 repeatability as given by the data providers. The measurement errors can be given as a single 16 value or for each observation, in which case it is read from the observation files. Transport 17 errors are extremely difficult to quantify and depend not only on the model but also on the 18 input data, resolution and location. Therefore, we do not quantify the full transport error, but 19 only the part of it that can be estimated from the model FLEXPART, i.e. the stochastic 20 uncertainty, which arises by the representation of transport with a limited number of particles 21 (see Stohl et al. (2005)). The stochastic error, however, is likely to be much smaller than the 22 full transport error. It is possible, however, to include an additional estimate of the transport 23 error into Eq. 17, if this information were available. The error in the modelled mixing ratio is 24 calculated using the stochastic uncertainty in the same way that the mixing ratios themselves 25 are calculated. We consider two types of representation error: observation representation error 26 and model aggregation error (discussed above). The observation representation error is 27 calculated from the standard deviation of all measurements available in a user-specified 28 measurement averaging time interval, based on the idea that if the measurements are 29 fluctuating strongly within that interval then their mean value is associated with higher 30 uncertainty than if the measurements are steady (e.g. Bergamaschi et al. 2010). If only one 31 measurement is available during this interval, then a user-defined minimum error is used 1 instead. The measurement and transport errors are assumed to be uncorrelated. Although this 2 is a common assumption, correlations likely exist between e.g. hourly observations owing to 3 errors in the modelled boundary layer height and wind fields, which could lead to temporal 4 correlations. However, in the current version of FLEXINVERT, we do not account for these 5 correlations, hence, we define a diagonal matrix with elements equal to the quadratic sum of 6 the measurement, transport model and measurement representation errors: 7 Another assumption that is made is that the observed -modelled mixing ratio residuals have a 9 Gaussian distribution (Eq. 10 is based on this assumption). Therefore, in cases where the 10 distribution is highly skewed, observations corresponding to the tail of the distribution will

Optimization of the fluxes to fine resolution 20
The optimal solution of the fluxes, f nest* vg is found for the variable grid according to Eq. 11 21 and the corresponding posterior error covariance matrix, A is found according to Eq. 12. 22 However, it is not possible to directly apply the inverse of the projection operator to retrieve 23 the optimal emissions at fine resolution since the operation from the variable to the fine 24 resolution is ambiguous; there is insufficient information to redefine the fluxes at fine 25 resolution. To find the optimal emissions at fine resolution, f nest* (K×1) , we use an adaptation of 26 the method of Wu et al. (2011). This method involves a second Bayesian optimization step, 27 which uses the prior information about the distribution of the fluxes within each grid cell on 28 the variable resolution grid: 29 (see Appendix A for the derivation of Eq. 18 and 19). Since we only optimize the fluxes, i.e. 31 f nest* , the matrices B flux and A flux represent only the parts of the error covariance matrices 32 corresponding to flux errors. We have introduced a new error covariance matrix, B flux naw , which is the non area-weighted (naw) version of B flux , i.e. calculated using the flux errors not 1 weighted by the ratio of the grid cell areas on the fine and coarse grid. Also, we have 2 introduced Γ unit , which is equivalent to Γ but with each row vector normalized by the row 3 sum so that they have unit length. Our method departs from Wu et al. (2011) in that for the 4 error in posterior state vector on the variable grid we use the error covariance of the posterior 5 solution on the variable grid A, rather than a Dirac distribution. The inverse of 6 (ΓB flux Γ T +A flux ) is found by Singular Value Decomposition (SVD) using the DGESDD 7 routine from the LAPACK library. We find the posterior error covariance matrix A flux* (K×K) 8 also for the fine resolution fluxes according to: 9 The inverse of the matrices B flux naw , A flux , and (B flux naw -1 +Γ unit T A flux-1 Γ unit ) are also found by 11 SVD, which can also be used for matrices that are non-positive definite. This optimization to 12 the fine grid should be carefully evaluated if used. Alternatively, we also include a simple 13 mapping back to the fine grid by distributing the flux in a coarse grid to the corresponding 14 fine grid cells based on the prior relative flux distribution at fine resolution. Using an inequality constraint in the cost function Eq. 10 would mean that the first derivative 21 would be undefined. Therefore, we adopt a "truncated Gaussian" approach following Thacker 22 (2007), in which inequality constraints are applied by treating these as error-free observations. 23 The inequality constraints are applied to the posterior fluxes derived previously (i.e. with no 24 inequality constraint). This is described by the following equation, which is analogous to Eq. 25 11: 26 where P (Q×K) is a matrix operator to select the Q variables that violate the inequality constraint 28 and c is a vector of the inequality constraints of length Q. The inequality constraint does not 29 only affect the grid cells with negative values but there is also some adjustment to other cells 30 according to the correlations described by the posterior error covariance matrix, A flux . The 31 posterior error covariance matrix, however, is unchanged as the observation error covariance 32 matrix in this case is zero. To apply the inequality constraint requires running a second code, 1 which uses the output of FLEXINVERT. 2 3 A brief description of the software, its inputs, and outputs, is provided in Appendix B. 4 5

Case study: estimation of CH 4 fluxes in Europe 6
We provide a case study using FLEXINVERT for the estimation of methane (CH 4 ) fluxes in 7 Europe. Methane was chosen, as it is an important greenhouse gas with an atmospheric 8 lifetime of approximately 10 years (Denman et al. 2007) and since its loss in the troposphere 9 is principally by reaction with the OH radical, which can be approximated as a linear process. 10 The fluxes of CH 4 are mostly positive (i.e. from the surface to the atmosphere) although small 11 negative fluxes of CH 4 by oxidation in soils are also possible (Ridgwell et al. 1999). Europe 12 was chosen as it is reasonably well covered by observations, both discrete air sampling and 13 in-situ measurements. The important sources of CH 4 in Europe are mostly anthropogenic, 14 namely agriculture, landfills, and oil and gas exploitation (including fugitive emissions as 15 well as those from incomplete combustion). Natural sources of CH 4 are less important in 16 Europe and principally wetlands and mostly in the higher latitudes. In this case study, we 17 estimate the total fluxes of CH 4 in the nested domain from 30°N to 70°N and 15°W to 45°E at 18 monthly resolution for the year 2011. 19 20

FLEXPART runs 22
FLEXPART (version 8.1) (Stohl et al. 1998;Stohl et al. 2005) was used to generate the SRRs 23 by running 10-day backwards mode simulations from each of the receptors (i.e. the 24 observation sites). FLEXPART was run at 1.0°×1.0° resolution with meteorological analyses 25 from the European Centre for Medium-Range Weather Forecasts (ECMWF). Backwards 26 ("retro-plume") simulations were made by releasing 20,000 virtual particles in 3-hourly 27 intervals and the SRRs (or equivalently emission sensitivities) were saved as 24-hour 28 averages. Particles were released from the sampling inlet height at each observation site (see 29 Table 3). The loss of CH 4 by reaction with the OH radical was also included in the backwards 30 simulations even though the loss is very small over a 10-day period. Figure 2a

Observations 2
We used measurements of CH 4 from approximately weekly samples in the National Oceanic 3 and Atmospheric Administration Global Monitoring Division (NOAA GMD) Carbon Cycle 4 and Greenhouse Gases (CCGG) network. These measurements are made using Gas 5 Chromatographs fitted with Flame Ionization Detectors (GC-FID). In addition, we used data 6 from a number of in-situ measurement sites. These included in-situ GC-FID instruments 7 operated by the Umweltbundesamt (UBA), the Institute for Atmospheric Sciences and 8 Climate (ISAC) and the Advanced Global Gases Experiment (AGAGE) as well as in-situ 9 Cavity Ring Down Spectrometers (CRDS) operated by EMPA and the Finnish 10 Meteorological Institute (FMI). All measurements were reported as dry-air mole fractions in 11 parts-per-billion (abbreviated as ppb) on the NOAA2004 calibration scale, except AGAGE 12 data, which were reported on the Tohoku University scale but were converted to the 13 NOAA2004 scale using a conversion factor of 1.0003 (see Table 3). 14 15 In-situ measurements were assimilated as averages of the afternoon observations (12:00 to 16 18:00) at low altitude sites and as averages of night-time observations at mountain sites 17 (00:00 to 06:00) and the corresponding FLEXPART SRRs were selected and averaged in the 18 same way. Discrete measurements were assimilated as available and matched with the closest 19 available 3-hourly SRR to the sampling time. The measurement error was defined as 5 ppb 20 based on the repeatability of the measurements and, in the case of the in-situ data, the 21 representation error was defined as the standard deviation of the afternoon observations. 22 23

Prior fluxes and initial mixing ratios 24
The prior flux was composed from estimates of anthropogenic and natural emissions from a 25 number of different models and inventories (see Table 4 for details) and the total global 26 source amounted to 610 TgCH 4 y -1 . Methane fluxes were resolved monthly in the wetland, 27 ocean, termite, wild animal, soil, and biomass burning estimates, while the anthropogenic and 28 geological flux estimates were only resolved annually. For the anthropogenic and biomass 29 burning sources, the 2010 estimates were used, as no estimates were available for 2011. For 30 the remaining natural sources, climatological estimates were used. All fluxes were used at a 31 spatial resolution of 1.0°×1.0°. 32 33 Prior flux error covariance matrix, B flux , was calculated as described in section 2.5 using a 1 spatial correlation length of 500 km, k S =500, and a temporal correlation length of 90 days, k T 2 = 90. For the calculation of the flux errors we used a fraction of 0.5 of the maximum flux out 3 of the cell of interest and the 8 surrounding ones. 4

5
The background mixing ratios may be estimated either from the observations themselves or 6 by coupling FLEXPART to a global model (see sections 2. 1.2 and 2.3). For the latter method, 7 FLEXPART calculates the sensitivity to the mixing ratio at the termination point of the 8 virtual particles. These sensitivities were coupled to daily 3D fields of CH 4 mixing ratios from 9 the atmospheric chemistry transport model, TM5, in order to calculate the initial mixing ratios. 10 The TM5 model was run at 6.0°×4.0° horizontal resolution with 25 eta pressure levels using to approximately 4 ppb. 20 21

Sensitivity tests 22
We performed six inversions to test the sensitivity of the posterior fluxes and error reduction 23 to the spatial correlation scale length (S1 to S3), to the optimization of the background (S4), 24 to the filtering and averaging of the observations (S5), as well as to the background estimation 25 method (S6). The tests are summarized in Table 5. 26 27

Results 28
The inversions were run on a Linux Ubuntu machine with 62 GB memory. The maximum and 29 mean memory usage was 18 and 6.4 GB, respectively, and each inversion took approximately 30 1.8 days to complete. 31 32 Figure 3 shows the observed CH 4 mixing ratios at in-situ measurement sites compared with 33 those simulated with the TM5 model and FLEXPART using the prior and posterior fluxes from test S1. At high altitude sites, namely, CMN, JFJ, and SSL, the global model tends to 1 underestimate the synoptic variability largely due to the coarse resolution. This can be 2 quantified by the Normalized Standard Deviation (NSD) (i.e. the SD of the model normalized 3 by the SD of the observations), which for TM5 was 0.46, 0.81 and 0.71, compared with 0.81, 4 0.75, and 1.07 for FLEXPART, for the three sites respectively. On the other hand, TM5 5 overestimated the variability at MHD, a coastal site, with a NSD of 2.53 compared with 0.97 6 in FLEXPART. Again, this is likely to be due to the coarse resolution in TM5, which cannot 7 accurately resolve the location of MHD and overestimates the influences of land fluxes at this 8 site. 9 10 To examine the differences between the two methods of estimating the background mixing 11 ratios, we compare the background determined in test S1 (model-based method) and test S6 12 (observation-based method). The results are shown in Fig. 4 at the in-situ measurement sites. 13 The two methods compare reasonably well with one another with the mean difference 14 between the two backgrounds being between -7 and 4 ppb for the different sites. At MHD, 15 however, observation-based background is considerably lower than the model-based one 16 (difference of -11 ppb). This departure is caused by an overestimation of the prior 17 contribution to the mixing ratio from fluxes inside the nested domain, and since this is 18 subtracted from the observations that have been identified as being representative of the 19 background, this leads to an overall too low background estimate at this site (see section 2.1.2 20 for details). 21

22
The model performance at the measurement sites for test S1, a priori and a posteriori, is 23 summarized in Table 6, which compares the correlation coefficient (R), Root Mean Square 24 Error (RMSE) and NSD of the simulated mixing ratios versus the observations. As expected, 25 the mixing ratios a posteriori agree better with observations. To assess the assumptions made 26 about the uncertainties and error correlation scales used in B and R, we look at χ 2 , which has 27 the value of the cost function at the optimum (equivalently the weighted sum of squares 28 divided by the number of observations). Ideally, χ 2 would be equal to 1 indicating that the 29 posterior solution is within the limits of the prescribed uncertainties. In actual fact, the χ 2 30 values are larger than 1. χ 2 increased with increasing spatial correlation scale length with 31 values of 2.24, 2.56, and 2.97 with k S of 200, 300 and 500 km, respectively, which is as 32 expected since a longer correlation scale length corresponds to fewer degrees of freedom. 33 resulted in larger SDs over the averaging interval (1 day) and, hence, larger uncertainties in 1 the observation space. sensitivity tests are shown in Fig. 5. The posterior fluxes and flux increments from tests S1 to 5 S5 are quite similar, however, on close inspection there are a few notable differences. 6 Decreasing the spatial correlation scale length from 500 to 200 km (tests S1 to S3) resulted in 7 a more heterogeneous pattern of flux increments as the greater degrees of freedom allowed 8 smaller spatial scales to be modified in the inversion. Although, overall the patterns of flux 9 increments from all tests were consistent with lower emissions, relative to the prior, over 10 France, Italy and the UK, and higher emissions over Austria, Hungary, and eastern Europe. 11 When the background mixing ratios are also optimized (test S4) there is only a small change 12 with respect to test S1; namely, in S4 the emissions are slightly lower over the United  Figure 6 shows the error reduction for the six sensitivity tests. The largest error reductions are 26 found using k S = 500 km, i.e. in tests S1, S4, S5 and S6, for which the error reduction is 27 almost identical. The error reductions in tests S2 and S3 are smaller and more limited to 28 central Europe as compared to S1. Again, this is because increasing the correlation scale 29 length results in fewer degrees of freedom for the inversion and effectively spreads the 30 atmospheric information over a larger area. 31 32 Lastly, we compare the simulated mixing ratios using the a priori and a posteriori fluxes 33 (from test S1) with observations at an independent site, i.e. one that was not included in the inversion, Puy de Dôme, France (PUY). Figure 7 shows the observed, prior, posterior and 1 background mixing ratios at the time-stamp of the observations at PUY. Both the prior and 2 posterior mixing ratios overestimate the observed variability with NSD of 2.24 and 2.04, 3 respectively. This is most probably owing to both the topography (the station is located on a 4 volcanic cone, which represents a very abrupt change in topography) as well as the fact that 5 there are significant emissions in the prior around the station. A likely explanation is that 6 FLEXPART overestimates the BL height at PUY and thus overestimates the influence of 7 local emissions on this site. Despite the model transport errors at this site, using the posterior 8 fluxes improves the RMSE (23.1) and correlation coefficient (0.18) compared to the prior 9 (26.4 and 0.16, respectively). 10 11

Discussion 12
The results for the sensitivity tests S1 and S6, using the model and observation based 13 background mixing ratios, respectively, highlight the challenge of robustly identifying the 14 background and the influence that this has on the optimized fluxes (see Fig. 5). There are 15 different problems associated with each method and warrant further discussion here.  Second, using a filtering of the observations to derive the background will also lead 2 circularity, i.e. if the same observations are also used to optimize the background in the 3 inversion, and this case should be avoided. When the observations are used to derive the 4 background, biases only arise in the detection of the background signal. The background 5 mixing ratio may fluctuate depending on the altitude and latitude of the air masses' origin. In 6 addition, if the site is in an area of strong local fluxes, a background signal may not detectable. 7 Analysing the modelled back trajectories in such cases may help determine if candidate 8 observations for the background calculation are likely to have been influenced by fluxes in the 9 domain or not. Furthermore, the observation-based method for determining the background is 10 not appropriate for species such as CO 2 , which have a strong diurnal cycle and thus no 11 definable background. used. There is only one exception, i.e. BENELUX, where our estimate is 24% lower than the 24 lowest limit of the ensemble range. This may be due, at least in part, to real changes in 25 emissions. However, it may also be due to differing distributions of the posterior emissions 26 close to the boundaries of BENELUX with France and Germany, which considering the small 27 area of BENELUX, may become important in the calculation of the total emission. Another 28 contributing factor may also be that in the inversions in the Bergmaschi et al (2014) study, the 29 station, Cabauw (52.0°N, 4.9°E), in the Netherlands, was included (whereas it was not 30 included in our inversion), which likely also has a strong influence on the posterior fluxes in 31 BENELUX. 32 33

Summary and conclusions
We have presented a new Bayesian inversion framework, FLEXINVERT, for the estimation 1 of surface to atmosphere fluxes of atmospheric species. The framework is based on Source -2 Receptor Relationships (SRR), which describe the relationship between changes in mixing 3 ratio at a receptor "point" and changes in fluxes, calculated by the Lagrangian Particle 4 Dispersion Model, FLEXPART. Fluxes may be optimized at any given temporal resolution 5 and on a nested grid of variable spatial resolution. The variable grid is determined using the 6 information of the integrated SRRs and has finer resolution where there is a strong 7 observational constraint and coarser resolution where there is a weak constraint. In this 8 framework, the background mixing ratio, i.e. the contribution to the mixing ratio at the 9 receptors not accounted for by transport and fluxes inside the nested domain, is calculated by 10 coupling FLEXPART to the output of a global Eulerian model (or alternatively, in the case 11 that no such model output is available, it is calculated from the observations themselves). The 12 background mixing ratios are also included in the optimization problem. 13

14
We demonstrated the performance of FLEXINVERT in a case study estimating CH 4 fluxes 15 over Europe in 2011. The posterior fluxes were found to compare well to the results from an 16 independent inversion ensemble consisting of four different transport models and inversion 17 frameworks. Although, we have only presented an example for CH 4 , FLEXINVERT can be 18 applied to any species for which atmospheric loss (if any) can be described as a linear process 19 such as radioactive decay, dry and wet deposition, and oxidative chemistry. Furthermore, the 20 framework can be used on continental, regional and even local scales with little or no 21
(where A flux is the posterior error covariance matrix and Γ is the projection operator) and we 12 can express ρ(f * ) as: 13 (where B flux is the prior error covariance matrix on the fine grid) and by substituting Eq. 2 and 15 3 into Eq. 1 we derive the expression for ρ(f * |f): 16 the cost function can be thus be defined as: 18 and the first derivative: 20 thus we can derive the expression for x * at the minimum: 22 The code corresponding to the inversion framework described in this paper is called 27 FLEXINVERT and is available from the website: http://flexinvert.nilu.no under a GNU 28 General Public License. FLEXINVERT is coded in Fortran90 and has been tested with the 29 gfortran compiler and the Linux Ubuntu operating system and a makefile for gfortran is 1 included. To run FLEXINVERT, the LAPACK and NetCDF libraries for Fortran must be 2 installed. The current version of FLEXINVERT can be run directly with output from 3 FLEXPART 9.2. 4 5 B.2. Input data 6 FLEXINVERT uses two definition files, the first specifies the paths, filenames, and other file-7 related information (files.def), and the second specifies the settings for each inversion run, 8 such as the domain, dates, and uncertainties (control.def). 9 10 -FLEXPART files 11 FLEXINVERT looks for FLEXPART output files for each receptor in directories with the 12 following naming convention: /STN_NET/YYYYMM/ where STN_NET is the name of the 13 receptor and must be the same as that given in the station list file and in the prefix of the 14 observation files. The FLEXPART files required are: header, grid_time (and 15 grid_initial when computing the background using global model output). It is 16 important to note that if the full 3D SRR fields are saved in the grid_time files, the 17 reading of these files becomes considerably slower, therefore, it is recommended to save 18 only the surface layer of the SRR fields in the grid_time files. However, if the 19 grid_initial files are used, these need all layers. (An option for this grid_time and 20 grid_initial was added into FLEXPART 9.2). Also, note that FLEXINVERT expects 21 the stochastic errors to be written to the grid_time files, if these are not written then a 22 minor modification is required in readgrid.f90. 23

-Station list file 24
This file specifies the receptors (where there are observations) to include in the inversion 25 The default file has the following format: receptor name, latitude, longitude, altitude, 26 For testing purposes, the following files are also written but in most cases will not be 27 required: 28 29 • gain.nc 30 NetCDF file containing the Gain matrix, BH T (HBH T + R) -1 (y -Hx b ). 31 • covbfin.nc 32 NetCDF file of the prior error covariance matrix on the fine grid (covbfin) with units 1 (kgm -3 s -1 ) 2 . Note that the errors are scaled by the numerical scaling factor. 2 • covbfinaw.nc 3 As for covbfin.nc but containing the area-weighted errors (covbfinaw). 4 • covagg.nc 5 NetCDF file of the aggregation errors in units of mixing ratio squared (e.g. ppb 2 ). 6 • grid_operator.nc 7 NetCDF file of the projection operator, Γ, from the fine to the variable grid. 8 • grid_coarse.nc 9 NetCDF file of the projection operator, Γ cg , from the fine to the coarse grid. 10

C. Applying inequality constraints 8
After running FLEXINVERT, a separate code may be run to apply inequality constraints. The 9 inequality constraint code is similarly written in Fortran90 and has been tested with the 10 gfortran compiler and the Linux operating system. To run the code, the LAPACK and 11 NetCDF libraries for Fortran must be installed. This code is available from the website: 12 http://flexinvert.nilu.no. 13    500 km all c model-based, not optimized S6 500 km afternoon/night only observation-based, not optimized a The method of calculation and whether or not the background mixing ratios were optimized in the 2 inversion 3 b Low altitude sites averaged afternoon data, high altitude sites averaged night data 4 c Averaged all data over 24 h.    Figure 7. Comparison of the prior (blue), posterior (red) and background (green) simulated 1 CH 4 mixing ratios (ppb) with observations (black) at the independent site, PUY. Results are 2 shown for test S1. 3 4 5