LS-APC v 1 . 0 : A tuning-free method for the linear inverse problem and its application to source term determination

Estimation of pollutant releases into the atmosphere is an important problem in the environmental sciences. It is typically formalized as an inverse problem using a linear model that can explain observable quantities (e.g. concentrations or deposition values) as a product of the source-receptor sensitivity (SRS) matrix obtained from an atmospheric transport model multiplied by the unknown source term vector. Since this problem is typically ill-posed, current state-of-the-art methods are based on regularization of the problem and solution of a formulated optimization problem. This procedure depends on manual 5 settings of uncertainties that are often very poorly quantified, effectively making them tuning parameters. We formulate a probabilistic model, that has the same maximum likelihood solution as the conventional method using pre-specified uncertainties. Replacement of the maximum likelihood solution by full Bayesian estimation allows to estimate also all tuning parameters from the measurements. The estimation procedure is based on the Variational Bayes approximation which is evaluated by an iterative algorithm. The resulting method is thus very similar to the conventional approach, but with the possibility to estimate also 10 all tuning parameters from the observations. The proposed algorithm is tested and compared with the state-of-the-artstandard methods on data from the European Tracer Experiment (ETEX) where advantages of the new method are demonstrated. A MATLAB implementation of the proposed algorithm is available for download.


Introduction
Estimating the emissions of a substance into the atmosphere can be done in two alternative ways: The first method, a bottom-up approach, is based on a (statistical) model describing the emission process.For greenhouse gases or air pollutants, this is typically based on detailed country statistics (e.g., about energy use) and some proxy information (e.g., population distribution) to spatially disaggregate the emissions.The other method, often called top-down approach (Nisbet and Weiss, 2010), makes use of ambient measurements of the released substance (e.g., atmospheric concentrations) and a model for describing the behavior of the substance in the atmosphere.The emissions are then constrained to values that are compatible with the measured data.The two approaches are complementary, as the top-down approach can be used to verify bottom-up estimates, to identify problems in bottom-up emission inventories, or to improve these inventories (e.g., Lunt et al., 2015).For determining the emissions of greenhouse gases into the atmosphere, such an approach has become very common.In particular, total greenhouse gas emissions are the result of both anthropogenic and natural emissions.Bottom-up inventories for anthropogenic emissions should, at least in principle, be quite accurate but a verification using top-down methods is desirable (Stohl et al., 2009;Bergamaschi et al., 2015).Natural emissions are often poorly constrained with bottom-up methods and thus the role of top-down methods is even more important (Tans et al., 1990;Rayner et al., 1999).
For other emissions into the atmosphere, such as releases by nuclear accidents (Davoine and Bocquet, 2007;Stohl et al., 2012), nuclear explosions (Issartel and Baverel, 2003), Published by Copernicus Publications on behalf of the European Geosciences Union.
or for emissions of volcanic ash during volcanic eruptions (Kristiansen et al., 2010;Stohl et al., 2011), the problem is very different.While the emission location is often known and sometimes the emission period can be estimated, other bottom-up information on the magnitude of the emissions, their temporal variations and, occasionally, the emission altitude, can be very incomplete or, especially in real time, nonexistent.In these cases, emission estimates based on the top-down approach are often the only way to constrain the so-called source term, which quantifies the emissions of a point source as a function of time and, sometimes, altitude.Still, the source term is one of the largest source of errors in modeling and predicting the pollutant dispersion in the atmosphere (Stohl et al., 2012).Since it is key information for decision making in emergency response situations, any improvement of the reliability of the source-term estimation is important.
The common approach for source-term determination is to combine data measured in the environment (e.g., radionuclide concentrations downwind of the release site) with an atmospheric transport model in a top-down approach.Agreement between a model with calculated source term and measurements can be modeled and optimized using various parameter-estimating methods including the Bayesian approach (Bocquet, 2008), maximum entropy principle (Bocquet, 2005b), or cost function optimization (Eckhardt et al., 2008).For computational reasons, this problem is typically formulated as a variant of linear regression.The vector of measurements is assumed to be explained using a linear model with a known source-receptor sensitivity (SRS) matrix determined using an atmospheric dispersion model (Seibert and Frank, 2004) and an unknown source-term vector.Simple solution via the ordinary least-squares method typically yields a poor solution because the problem is often only partially determined and ill-constrained by the available measurement data.Many regularization schemes taking into account physically plausible ranges of parameter values such as non-negativity of the emissions, or other a priori information, for instance on the duration of release, have been proposed providing more realistic solutions.However, especially if the a priori information is incomplete, the regularization terms can also contain tuning parameters which are often selected manually and subjectively.The solution is subsequently highly sensitive to their choice.Therefore, many authors proposed inversion schemes to reduce the dependency on these parameters.Davoine and Bocquet (2007) formulated the inversion problem as minimization problem with Tikhonov regularization term.A similar model was used by Winiarek et al. (2012), where covariance matrices of both observation errors and source term are assumed to be diagonal with identical elements on each diagonal.The positivity of the source term is enforced using truncation of negative estimates.Three estimation methods were studied to infer model parameters: L-curve, Desroziers' scheme (Desroziers et al., 2005), and brute force using maximum likelihood screening.Diagonal matrices with different diagonal entries were considered in the work of Michalak et al. (2005) where a maximum likelihood method was used to infer the model parameters.The model was extended by Berchet et al. (2013) where full covariance matrices were considered.Desroziers' scheme and maximum likelihood were used; however, heuristics need to be used due to divergence of the algorithm after a few iterations.To cope better with full covariance matrix of measurements, Ganesan et al. (2014) follow the work of Michalak et al. (2005) and propose a model for non-diagonal entries using exponential decay with a common autocorrelation timescale parameter weighted by estimated diagonal entries.A similar model was then used by Henne et al. (2016) for both covariance matrices, measurement and source term, although with a fixed common autocorrelation timescale parameter for non-diagonal entries.In this paper, we propose a probabilistic model that estimates such tuning parameters from the data using a Bayesian approach with hierarchical prior.
Most of the existing regularization techniques are based on restricted structure of the prior covariance matrix.Various covariance structures for linear models have been studied extensively in the statistical literature; see for example reviews of Pourahmadi (2011) and Daniels (2005).For example, a model of only diagonal structure of the covariance matrix has been proposed to favor sparse solutions (Tipping, 2001).It is possible to use more complex models of the covariance structure using Cholesky decomposition (Pourahmadi, 2000), its modifications (Daniels and Pourahmadi, 2002), or more general decompositions (Khare and Rajaratnam, 2011).The inference mechanism is usually a variant of Monte Carlo simulations.In this work, we choose the prior covariance structure to have two main diagonals in modified Cholesky decomposition.The inference of the posterior is achieved using the variational Bayes method (Šmídl and Quinn, 2006) which is closely related to algorithms used in this application domain.
We will illustrate the proposed approach in comparison with the commonly used method of Seibert (2001) and Eckhardt et al. (2008), state-of-the-art algorithms (Martinez-Camara et al., 2014), least absolute shrinkage and selection operator (LASSO) (Tibshirani, 1996), and the conventional Tikhonov regularization (Golub et al., 1999).We will show how the formal Bayesian approach yields an iterative algorithm closely related to that of Eckhardt et al. (2008).Many heuristic steps in determining regularization parameters will be replaced by statistical estimates.The most significant advantage over the reference approach is estimation of the tuning parameters from the data.In effect, the proposed algorithm works without manual intervention.The only entries of the proposed algorithm are the vector of measurements and the SRS matrix calculated with a dispersion model.The MATLAB implementation of the derived algorithm is freely available for download.The resulting algorithm is tested and compared using real data from the European Tracer Experiment (ETEX).

Background
In this Section, we review the standard methodology, as described, for example, by Eckhardt et al. (2008), which is commonly used in source-term determination and the variational Bayes method (Šmídl and Quinn, 2006;Miskin, 2000) which is the key methodology of this work.

Standard methodology
We choose the work of Eckhardt et al. (2008) as a reference.It is only an example from a family of optimization methods based on linear regression such as Seibert (2000), Seibert (2001), Bocquet (2005b), Bocquet (2008), or Tarantola (2005).The regularization is achieved by formulating a prior knowledge on the solution and using an iterative algorithm for removing physically unrealistic values in the posterior solution.The basic inverse problem is formulated based on the following linear model, where y is the vector of measurements (typically observed concentrations or, sometimes, deposition values), M is a known SRS matrix (Seibert and Frank, 2004), and x is the unknown source-term vector.Solution of the problem via the ordinary least-squares method is not feasible since matrix M is typically ill-conditioned.
Regularization of the problem proposed in Eckhardt et al. (2008) is based on minimization of the cost function J = J 1 + J 2 + J 3 : where the term J 1 stands for the deviation of the model from the observation with scalar σ 0 to be a standard error of the observation; however, note that y and M are prone to errors and cumulate uncertainties from measurement and the atmospheric transport model used for SRS calculations (including errors in the meteorological data used to drive the transport model); J 1 therefore includes errors in the model M, mapped into observation space.The term J 2 penalizes high values of the solution where the penalty is inversely proportional to the assumed standard errors of each source-term element aggregated in vector σ x , where symbol diag() denotes a diagonal matrix with an argument vector on its diagonal and zeros otherwise; the term J 3 encourages smooth estimates of the source term x where D is a tridiagonal differential matrix numerically approximating a Laplacian operator and the scalar weights the strength of the smoothness of the solution relative to the other two terms.Note that we assume the smoothness in time as it is used in Stohl et al. (2011).This assumption may not be valid in cases such as explosions, which cause abrupt change in the source term.
Note that model (1) can be also used for problems with non-zero prior mean x 0 and known covariance matrix of the observations, R. Using Cholesky decomposition of the observation covariance matrix, R = T , the original model can be written in form y = Mx + e, where e is an isotropic noise and x is assumed to have prior mean x 0 .Then, transformation x = x − x 0 , M = −1 M, y = −1 (y − Mx 0 ) maps such a model into form (1) with zero mean prior and isotropic noise assumption.
This minimization problem leads to a system of linear equations that is solved for the source term x.Since the solution is assumed to be positive, the optimization problem is subject to x ≥ 0: This restriction is achieved in the iterative algorithm by replacing all negative values by an arbitrary small positive number together with a reduction of their standard errors to force these values closer to the non-negative prior solution.This can be formalized by the selection of a stop condition for the ratio between negative and positive parts of the solution as x neg x pos < δ, where δ is a selected threshold.For the estimation algorithm, proper values of parameters σ 0 , σ x , and need to be preselected manually and potentially changed repeatedly until an acceptable solution is obtained.The main ideas of the algorithm are summarized in Algorithm 1.
Algorithm 1 The main ideas of algorithm from Eckhardt et al. (2008).

b. Iterate until
x neg x pos < δ or maximum number of iteration is reached: i. Solve minimization problem given by Eqs. ( 2)-(4).ii.Change negative parts of x to arbitrary small positive random numbers, reduce related variances σ x for negative parts of solution and increase variance σ x for positive parts of solution.
c. Report estimated source term x .

Bayesian interpretation of the reference method
The method of Eckhardt et al. (2008) where N (µ, ) denotes a multivariate Gaussian distribution with mean µ and covariance matrix , I n is the n × n identity matrix, tN (µ, , 0, ∞ ) is a truncated Gaussian distribution with parameters µ, and support (domain) of physically realistic values restricted to positive values of all entries of the vector x = [x 1 , . .., x n ], see Appendix A. The choice of Gaussian distribution is motivated primarily by tractability of its inference.
The logarithm of the posterior probability of the unknown x has the form where γ (x) is the logarithm of the characteristic function enforcing positivity (see Appendix A), and c aggregates all terms independent of x.Maximization of the log-likelihood is then equivalent to minimization of the cost function of the reference method (5) where γ (x) is the barrier function of the constraint on x.While interpretation of positivity by truncated normal distribution is non-standard, it has the same effect as the "subject to" constraint.The maximum likelihood estimate is the value of µ if µ > 0 and it is zero otherwise (Fig. 1).
The maximum likelihood solution is the simplest case of Bayesian inference.Application of full Bayesian inference (i.e., evaluation of full posterior distribution and their marginals) can address two important problems: 1. selection of tuning parameters σ 0 , σ x , and which are considered to be hyper-parameters and estimated from the data, 2. selection of the appropriate model M via Bayesian model selection.
We are concerned only with the first problem in this paper while matrix M is considered fixed.Extension of the proposed methodology to Bayesian model selection is possible (Bishop, 2006), however it is rather long and its proper treatment is beyond the scope of this paper.Full Bayesian treatment of the unknowns σ 0 , σ x , and is not analytically tractable.Approximate inference of σ 0 and σ x is possible, however estimation of represents a challenge since the determinant of the covariance becomes too complex.Moreover, common variance of temporal derivative of the source term may not be realistic, since it is subject to abrupt Figure 1.Example of the normal distribution N (1, 1), blue line, and the truncated normal distribution tN (1, 1, [0, ∞]), red line.changes caused, for example, by explosions.Therefore, we will present results for a different and more complex structure of the prior variance x that allows stable and reliable estimation of the source-term vector x via the variational Bayes method.

Probabilistic model with unknown prior covariance
We formulate the probabilistic model to cope with the linear inverse problem, Eq. ( 1), and derive an iterative algorithm to estimate parameters of this model.

Observation model
The observation model is identical to Eq. ( 6), i.e., the isotropic Gaussian noise model1 .However, we will consider the precision (inverse variance) of the observations to be unknown, parameterized by Since ω is unknown and will be estimated, similarly to Tipping and Bishop (1999), we define its own prior distribution in the form of the Gamma distribution (which is conjugated prior for precision of the Gaussian distribution) as with chosen prior constants ϑ 0 , ρ 0 .These constants are needed for numerical stability; however, they are set as low as possible to provide a non-informative prior (see Algorithm 2).Note that the model with different elements on the diagonal of the covariance matrix of measurements has also been studied in the literature; see for example Michalak et al. (2005).Modification of the proposed algorithm to a diagonal precision matrix with unknown elements on the main diagonal is very simple.However, such a model was found to be more susceptible to local optima than the presented model.The presented model was found to be more reliable in practical tests.

Prior model
We use the same prior for the source-term vector as in Eq. ( 7), with the exception that the prior covariance of x, denoted as x , is unknown.Note from Eq. ( 8) that the covariance matrix is a band matrix with predefined structure; tridiagonal matrix in this case.Relaxing the assumption of the tridiagonal structure, we consider the following structure of the prior covariance: It is composed of diagonal matrix ϒ = diag(υ), with unknown positive diagonal entries forming vector υ = [υ 1 , . .., υ n ] and zeros otherwise.L is a lower bidiagonal matrix with unknown off-diagonal elements forming a vector l = l 1 , . ..l n−1 .This model preserves the tridiagonal structure of the covariance matrix x and allow us to model each diagonal separately.The task is to introduce prior models for vectors υ and l whose estimates fully determine the decomposition in Eq. ( 12).The prior model of the vector υ is selected as where α 0 , β 0 are selected non-informative prior constants (see Algorithm 2).The prior model of the vector l is selected in a problem specific way.Note that for l j = 0, the model in Eq. ( 12) corresponds to Eq. ( 8) with = 0.For l j = −1, model in Eq. ( 12) corresponds to Eq. ( 8) with → ∞.Since we expect the result to be within this interval, we define the prior on l j to be independently Gaussian-distributed with unknown precision ψ j : where ζ 0 , η 0 are selected prior constants.Since we expect the neighboring values x i and x i+1 to be either uncorrelated (l j = 0) or correlated (l j = −1) we choose parameters l 0 , ζ 0 , η 0 to cover these extremes with preference for a value l 0 = −1, and precision ψ j set around this value using selection ζ 0 = η 0 = 10 −2 .This allows parameter l j to vary in the range circa −1 ± 100 which we consider to be sufficiently non-informative.Lower values of ζ 0 and η 0 result in posterior estimates closer to l 0 .On the other hand, further relaxation of these parameters to a wider range results in higher sensitivity to local extremes and potentially numerical instability.
The joint model of the full distribution is then: Estimation of all unknown parameters can be obtained by the Bayes rule which is, however, computationally intractable.
The best possible approximation is defined as a minimizer of the Kullback-Leibler divergence (Kullback and Leibler, 1951) between the solution and the hypothetical true posterior.The choice of this form is motivated by simplicity of evaluation and experience indicates that it is a very good approximation for linear models (Bishop, 2006;Šmídl and Quinn, 2006).
The necessary conditions of the best approximation uniquely determine the form of the posterior distributions.These were identified to be as follows: p(υ j |y) = G υ j α j , β j , ∀j = 1, . .., n, p(l j |y) = N l j µ l j , l j , ∀j = 1, . .., n − 1, ( 21) The shaping parameters of posterior distributions, Eqs. ( 19)-( 23), µ x , x , α j , β j , µ l j , l j , ζ j , η j , ϑ, ρ, derived according to the standard variational Bayes procedure (see Šmídl and Quinn (2006)) are given in Appendix B. The shaping parameters are functions of standard moments of posterior distributions, e.g., x , xx T , and x T x for the distribution p(x|y).Symbol x denotes the expected value with respect to the distribution on the variable in the argument.The shaping parameters and the required moments form a set of implicit equations which is solved iteratively using Algorithm 2. Good initialization should be considered since convergence only to the local minima is guaranteed (Šmídl and Quinn, 2006).We propose to initialize the algorithm by solution of the ordinary least squares with Tikhonov regularization tuned such that the data and the regularization term have equal scale.It is achieved by choice of the initial value of the estimate of the precision parameter ω (0) = 1 max(M T M) .Here, superscript (i) is used to denote iteration number of the algorithm.The algorithm will be called Least Squares with Adaptive Prior Covariance (LS-APC) and is freely available for download from http://www.utia.cz/linear_inversion_methods.
Algorithm 2 Least Square with Adaptive Prior Covariance (LS-APC) algorithm.b.Set initial values (denoted by zero iteration number in superscript (0) ) used in computation of the covariance matrix of the source term, x : = γ I n , and 2. Iterate until convergence or maximum number of iterations is reached: a. Compute estimate of the source term x (i) using least squares: using moments of the truncated normal distribution, Eq. (A).
3. Report estimated source term x and its uncertainty x .
Note that the algorithm is closely related to Algorithm 1 of the reference method.It also iteratively solves the least squares problem but with adaptive tuning of its parameters.The proposed method has the following differences: 1.The algorithm is largely tuning-free, i.e., all hyperparameters ω, l i , υ i , ψ i are estimated from the data.The results may slightly differ for different choices of the initial conditions since the variational Bayes solution may suffer from local minima.The most sensitive initial value is ϒ (0) of tuning parameter γ .The sensitivity of the solution to this initial choice is very low, which will be discussed in Sect.5.2.
2. Since estimating the hyper-parameter values requires the calculation of the variance of the posterior distribution, the covariance matrix of the least squares problem needs to be evaluated; the cost of this operation is circa O(n 2.4 ) in each iteration.This implies a slightly higher computational cost compared to Algorithm 1 where this matrix is not needed.
3. The method of positivity enforcement is replaced by moments of the truncated normal distribution.

Verification using a synthetic dataset
To test the proposed LS-APC algorithm and to demonstrate its performance, we design a synthetic dataset before performing a real data experiment.We generate elements of the matrix M ∈ R 20×10 as random samples from an uniform distribution between 0 and 1 and elements less then 0.5 were cropped to 0 to reduce the condition number of the matrix M (which is 6.69 in l2-norm in this dataset).The source term is generated as x true = [0, 0, 0, 1, 1, 1, 0, 0, 0, 0] as shown in Fig. 2, top row, using dashed red line.The vector of measurement data is generated according to the assumed model in Eq. ( 1) as y = Mx + e where three sets are generated with the same matrix M but with different levels of the noise term e.Each element e j is generated randomly as N (0, c 2 k ) where the coefficients are set as c 1 = 0 for the set 1, c 2 = 0.4 for the set 2, and c 3 = 0.8 for the set 3. Then, negative elements of y are cropped to 0. Note that these data are supplied together with the LS-APC algorithm as a tutorial example.
The results from the LS-APC algorithm for this dataset are given in Fig. 2. The estimates of the source term are shown in the uppermost row (solid blue lines) together with the simulated true source term (dashed red line).The estimated values of the vector υ , i.e., the diagonal of the matrix ϒ , are displayed in the second row.This parameter models the sparsity of the solution where a higher value signifies higher confidence that the corresponding element of the solution is zero.The parameter l modeling the smoothness of the solution is shown in Fig. 2, bottom row.Note that at the constant parts of the solution this parameter is approaching −1, signifying highly correlated neighboring elements, while it is approaching zero at the time of the step change, indicating uncorrelated neighbors.The two parameters υ and l can also compensate one another, as is demonstrated on the falling edge of the source term.Instead of the expected zero in the smoothness parameter l 6 , the posterior value is close to the prior.The difference in the data is compensated by the sparsity parameter υ 6 which is very low, indicating very low confidence in this value.
The quality of the reconstruction depends on the noise level, as demonstrated in individual columns of Fig. 2. As , where c k = 0 for the set Synthetic 1, c k = 0.4 for the set Synthetic 2, and c k = 0.8 for the set Synthetic 3).In the top panel, the true source term is given by the red line while the estimated source term is given by the blue line.The estimated sparsity parameters, vectors υ , are given in the middle panel using full line while prior values are given using dashed black lines and the estimated smoothness parameters, vectors l , are given in the bottom panel while prior values are given using dashed black lines.expected, the source term is reconstructed precisely when the data are noise-free (Fig. 2, left column).With increasing noise, the reconstruction departs form the ground truth (Fig. 2, middle column), however, the start and end of the release is still estimated with sharp rising and falling edges.The estimate is also sparse, i.e., the estimated values of the source term outside of the true release window are zero.Note that this result was achieved with standard deviation of the noise equal to 40 % of the released quantity.Naturally, with even higher noise (standard deviation equal to 80 % of the released quantity, Fig. 2, right column), the estimates also depart from the ideal shape and yield undesired artifacts.

Experimental results for the ETEX data
The European Tracer Experiment (ETEX) took place at Monterfil in Brittany, France, on 23 October 1994 (Nodop et al., 1998).Its attractiveness is that it is one of a very few controlled large-scale tracer release experiments with a large amount of available information (see https://rem.jrc.ec.europa.eu/etex/).During ETEX, two release experiments were made.We use here only the data from the first experiment (ETEX-I), for which atmospheric dispersion models generally performed much better than for the second experi-ment, e.g., Stohl et al. (1998).During ETEX-I, a total amount of 340 kg of perfluoromethylcyclohexane (PMCH) was released at a constant rate for nearly 12 h.PMCH is nearly inert in the atmosphere and does not experience dry deposition or wet scavenging and is thus suitable for testing how well transport models can handle atmospheric dispersion.Atmospheric concentrations of the released PMCH were monitored across Europe by a network of 168 measurement stations with a sampling interval of 3 h over a period of 72 h.The release location and station locations are shown in Fig. 3.The ETEX dataset has been used previously for testing inverse models by, for example, Bocquet (2005aBocquet ( , 2007)), Krysta et al. (2008), andMartinez-Camara et al. (2014).
To construct the SRS matrix M, we used version 8.1 of the Lagrangian particle dispersion model FLEXPART (Stohl et al., 2005(Stohl et al., , 1998)).An earlier version of the model was evaluated against the first ETEX experiment and revealed relatively good performance compared to other models (Stohl et al., 1998).We assumed that the release location is known, and that the release occurred during a 5-day period including the true release time but that the source term (i.e., released amount as a function of time) is not known.Thus, we performed 120 forward calculations from the release site, each for a hypothetical unit release during a 1 h period.For each of these unit release simulations, the simulated tracer concentrations were sampled at all the measurement station locations and during the exact measurement times (in total, 3102 measurements were made), to construct the SRS matrix M of the size 3102 × 120.The SRS matrix was used together with the observation vector, y, of size 3102×1, to reconstruct the source term, vector x, of the size 120 × 1.The reconstructed source term can then be compared with the known true source term, to evaluate the skill of the reconstruction.The same set-up was used by Martinez-Camara et al. (2014) to test a method to blindly remove outliers.
For running FLEXPART, we have used meteorological input data from the European Center for Medium-Range Weather Forecasts (ECMWF).Different datasets are available from ECMWF and we have used two different datasets: The Lagrangian time scale depends on the turbulence conditions in the planetary boundary layer and is computed in FLEXPART for every particle at every individual time step.Lagrangian time scales can be very short (an order of seconds) and, thus option A requires very short numerical integration time steps.Close to a source, this is the only accurate way of ensuring the well-mixed condition and a correct simulation of near-field dispersion.Over longer transport distances, such an accurate description of small-scale turbulent transport is often not necessary as transport errors are dominated by other sources of error (such as errors in large-scale wind fields).Thus, compromises are often made in numerical simulations, especially for real-time model applications, where longer time steps are used.This is explored with configuration B. While the differences between these simulations are actually rather small in terms of simulated SRS values, they can serve as a lower estimate of the uncertainty associated with the SRS calculations and can still produce quite substantial differences in the retrieved source terms.

Source-term estimation using LS-APC algorithm
The task is to estimate the original source term x based only on the available measurement data.Algorithm 2 was applied to the selected example data ETEX ERA-Interim B and the results are presented in Fig. 4. In the top panel of Fig. 4, the red line denotes the true source term while the blue line denotes the estimated source term x accompanied by the 99 % highest posterior density region, which is denoted by the gray fill region.The estimated sparsity parameter υ , i.e., the diagonal of the matrix ϒ , is given in the middle panel of Fig. 4, and the estimated smoothness parameter l , i.e., second diagonal of the matrix L , is given in the bottom panel.Note that the sparsity parameter is approaching 10 10 (value determined by α 0 and β 0 ) at times when no release occurred; therefore, the posterior mean value is close to the prior value, which is zero.The posterior mean value of the smoothness parameter l j is −1 when the neighboring values of the solution are close to each other.During periods of rapid change of the release, the estimate of the smoothness parameter approaches zero.
While the reconstructed source term does not agree exactly with the known source profile, the true total of the source term, i.e., 340 kg, is on the edge of the 99 % highest posterior density region which is (120,340) kg.This result is achieved without any tuning of the internal parameters of the FLEXPART dispersion model.Also the timing of the release is well captured, although the reconstructed release shows some variation during the release period, while the true release rate was constant.Furthermore, the reconstruction suggests some small release to also occur after the true release has ended.The quality of the reconstruction is comparable to or better than previous reconstructions of the ETEX source term (e.g., Seibert and Stohl, 1999;Bocquet, 2005aBocquet, , 2007;;Martinez-Camara et al., 2014).Note that these results were obtained without setting any tuning parameters; all regular- In the top panel, the true source term is given by the red line while the estimated source term is given by the blue line associated with the 99 % highest posterior density region using gray filled regions.The estimated sparsity parameter, vector υ , is given in the middle panel and the estimated smoothness parameter, vector l , is given in the bottom panel.
ization parameters are estimated from the data within the iterative algorithm.The sensitivity of this approach to the initial values and assumed uncertainties will be studied in comparison with other algorithms.

Comparison and sensitivity study
We compare results from the proposed LS-APC algorithm, Algorithm 2, with (i) an algorithm of Eckhardt et al. (2008) (Tibshirani, 1996); and (iv) the Tikhonov regularization (Golub et al., 1999).Specifically, we study the ability of the proposed solution to regularize the problem for different choices of the selected tuning parameters.It was found that the results of Algorithms 1 and 2 are most sensitive to initial values of the sparsity parameters σ x and υ, respectively.Similarly, the RegClean, LASSO, and Tikhonov algorithms also have parameters influencing preference for penalization of large values of the solution (e.g., the α parameter of the Tikhonov and LASSO regularization).Thus, we run all five algorithms with this selected tuning parameter set in points of interval α ∈< e −15 , e +7 > for four ETEX datasets.That is, for the methods with diagonal choice, e.g., the reference method in Algorithm 1, we set σ −2 x = αI n .For the LS-APC algorithm, this choice influ-ences only the initial value of the regularization parameter via γ = α.
All remaining parameters of the other methods were kept at their default values (RegClean) or set to best performing values (Algorithm 1).Evaluation of the results was performed on the metric of mean absolute error (MAE) between the true and the estimated source term: |x j,true − x j,estim |.
The computed MAEs between the true source term and the estimated source term for all methods and for the same range of the tuning parameter α are displayed in Fig. 5 for ETEX ERA-40, and in Fig. 6 for ETEX ERA-Interim, top rows.The estimates of the total released mass for all methods and for the same range of the tuning parameter α are displayed in the bottom row of Figs. 5 and 6.Note that all methods achieve similar results although for different values of the tuning parameters.This is most obvious in the estimate of the total released mass, where each method has a range of tuning values yielding the same estimate.This looks like a plateau on the curve.The value of the total released mass at this plateau is very similar for all methods.The exception is the experiment ERA-Interim A, where the curves of the estimated total released mass contain two plateaus.Comparison with the true total released mass of 340 kg is misleading in this case since the plateau at 180 kg is due to an artifact, as discussed below.
www   The Supplement related to this article is available online at doi:10.5194/gmd-9-4297-2016-supplement.

Figure 2 .
Figure2.The results of the LS-APC algorithm on synthetically generated dataset with different levels of noise degradation (increasing from left to right; e j = N (0, c 2 k ), where c k = 0 for the set Synthetic 1, c k = 0.4 for the set Synthetic 2, and c k = 0.8 for the set Synthetic 3).In the top panel, the true source term is given by the red line while the estimated source term is given by the blue line.The estimated sparsity parameters, vectors υ , are given in the middle panel using full line while prior values are given using dashed black lines and the estimated smoothness parameters, vectors l , are given in the bottom panel while prior values are given using dashed black lines.

Figure 3 .
Figure 3. Domain of the ETEX experiment with source (red triangle) and receptors (blue crosses).
(1) Data from the 40-year re-analysis (ERA-40); (2) Data from the continuously updated ERA-Interim re-analysis.For both meteorological data sets we have run FLEXPART in two different configurations: A. with the model time step in the boundary layer limited to less than 20 % of the Lagrangian timescale and a maximum value of 300 s (ERA-40 A and ERA-Interim A), and B. with time step only limited by 300 s, which may be chosen for computationally demanding real-time simulations (ERA-40 B and ERA-Interim B).

Figure 4 .
Figure4.The results of the LS-APC algorithm for the ETEX experiment (ETEX ERA-Interim B).In the top panel, the true source term is given by the red line while the estimated source term is given by the blue line associated with the 99 % highest posterior density region using gray filled regions.The estimated sparsity parameter, vector υ , is given in the middle panel and the estimated smoothness parameter, vector l , is given in the bottom panel.

Figure 5 .
Figure5.Comparison of sensitivity of the tested algorithms to the setting of the selected tuning parameter α measured in terms of the mean absolute error metric (top row), Eq. (26), and total estimated mass of the source term (bottom row) on data ETEX ERA-40 A and B.

Figure 6 .
Figure6.Comparison of sensitivity of the tested algorithms to the setting of the selected tuning parameter α measured in terms of the mean absolute error metric (top row), Eq. (26), and total estimated mass of the source term (bottom row) on data ETEX ERA-Interim A and B.