Construction of the SILAM Eulerian atmospheric dispersion model based on the advection algorithm of Michael

The paper presents the transport module of the System for Integrated modeLling of Atmospheric coMposition SILAM v.5 based on the advection algorithm of Michael Galperin. This advection routine, so far weakly presented in the international literature, is positively defined, stable at any Courant number, and efficient computationally. We present the rigorous description of its original version, along with several updates that improve its monotonicity and shape preservation, allowing for applications to long-living species in conditions of complex atmospheric flows. The scheme is connected with other parts of the model in a way that preserves the sub-grid mass distribution information that is a cornerstone of the advection algorithm. The other parts include the previously developed vertical diffusion algorithm combined with dry deposition, a meteorological preprocessor, and chemical transformation modules. The quality of the advection routine is evaluated using a large set of tests. The original approach has been previously compared with several classic algorithms widely used in operational dispersion models. The basic tests were repeated for the updated scheme and extended with real-wind simulations and demanding global 2-D tests recently suggested in the literature, which allowed one to position the scheme with regard to sophisticated state-of-the-art approaches. The advection scheme performance was fully comparable with other algorithms, with a modest computational cost. This work was the last project of Dr. Sci. Michael Galperin, who passed away on 18 March 2008.


Introduction
One of the key problems in atmospheric composition modelling is the accuracy and reliability of numerical schemes.A less appreciated but important issue is the consistency of the approaches applied in different modules of the modelling system.Usually, model construction follows process-wise split (Yanenko, 1971;Marchuk, 1995;Seinfeld and Pandis, 2006), thus considering separately the advection and diffusion algorithms, physical and chemical transformations, and dry and wet deposition.In practical model developments, features of the transport algorithms, first of all, the advection scheme, largely shape the model and determine its area of application.

Advection schemes
There are numerous types of advection schemes currently employed in atmospheric dispersion models.Two major categories refer to Lagrangian or Eulerian treatment of tracers: as small-size masses (Lagrangian particles) or as the concentration fields discretised in a prescribed grid.The Eulerian schemes, the primary subject of this paper, can be divided into flux-form finite volume, semi-Lagrangian, or expansionfunction schemes.The expansion-function schemes approximate the solution with a given set of basis functions and, in turn, can be divided into spectral, pseudospectral and finiteelement approaches.Some classic schemes are also based on finite-difference approximations of the advective term of the transport equation.The basic principles of all these approaches were formulated several decades ago and, with cer-Published by Copernicus Publications on behalf of the European Geosciences Union.
tain modifications, are still in use.Many modern schemes combine several approaches.
The large diversity of the advection algorithms is explained by a long list of requirements for such schemes.The most important ones are positive definition, minimal numerical diffusion, limited non-monotonicity and non-linearity, stability with regard to high Courant numbers (the number of the model grid cells passed within one advection time step), small phase error, local and global mass conservation, and high numerical efficiency.Some of these requirements contradict each other.For example, numerical diffusion "blurs" the resulting patterns but also makes them smoother, thus improving the monotonicity.
The finite-difference schemes involve direct discretisation of the dispersion equation and various interpolation functions to evaluate derivatives of the concentration fields (see the reviews of Richtmyer, 1962;Leith, 1965;Roach, 1980, as well as Sect. 3.1 in Rood, 1987); specific examples are, for instance, Russell and Lerner (1981), and Van Leer (1974, 1977, 1979).These schemes, being once popular, usually suffer from large numerical diffusion and limited stability, which sets stringent limitations on the Courant number, usually requiring it to be substantially less than 1.Therefore, the interest has gradually shifted towards flux, finite-element, and semi-Lagrangian schemes.
The flux schemes represent the transport via mass fluxes at the grid cell borders, which are calculated from concentrations in the neighbouring cells and wind characteristics.They are inherently mass conservative and have become popular in atmospheric chemistry transport models (Kukkonen et al., 2012).Probably the most widely used flux-type scheme is the one made by A. Bott (Bott, 1989(Bott, , 1992(Bott, , 1993)), especially if one would count the numerous Bott-type schemes (see examples in Syrakov, 1996;Syrakov and Galperin, 1997;Syrakov and Galperin, 2000;Walcek and Aleksic, 1998;Walcek, 2000;Yamartino, 1993), which are based on the same principle but involve different approximation functions, monotonicity and normalisation procedures, etc.
The semi-Lagrangian schemes have been among the most widely used approaches for decades, with numerous algorithms using its basic concept (Crowley, 1968;Egan and Mahoney, 1972;Pedersen and Prahm, 1974;Pepper and Long, 1978;Prather, 1986;Smolarkiewicz, 1982;Staniforth and Cote, 1991, and references therein;Lowe et al., 2003;Sofiev, 2000, etc.).In the forward form, these schemes consider the transport of mass starting from the grid mesh points (departure points) and calculate the masses at the grid points closest to the arrival point.Backward algorithms start from arrival grid points and find the grid points near the departure point.The schemes can be based on tracking either grid points or grid cells along their trajectories.The grid-pointbased schemes are not inherently mass-conserving, whereas the volume-based schemes achieve mass conservation by integrating the mass over the departure volume.They can sometimes be described as a combination of finite-volume and semi-Lagrangian methods (Lin andRood 1996, 1997).Stability of these schemes can be ensured for a wide range of Courant numbers (Leonard, 2002).A general review can be found in Lauritzen et al. (2011), whereas a comparison of 19 modern schemes is discussed in Lauritzen et al. (2014), hereinafter referred to as L14.
Modelling in spectral space is another approach with a long history (Ritchie, 1988;Kreiss and Oliger, 1972;Zlatev and Berkowicz, 1988;Prahm and Christensen, 1977), but is not widely used today.
One of the main problems of the existing schemes is substantial numerical diffusion originating from the finite-step discretisation along space and time.Seemingly inevitable in an Eulerian context, this phenomenon, however, does not exist in Lagrangian advection schemes, which do not contain explicit discretisation of particle movement.The Lagrangian domain is a continuous space rather than a set of pre-defined grid meshes, and the position of the particles can be calculated precisely.As a result, numerical diffusion of purely Lagrangian schemes is always zero -at a cost of strongly nonmonotonous concentration fields due to limited spatial representativeness of a single Lagrangian particle and a limited number of particles.
One of the ways to reduce the diffusivity of an Eulerian scheme is to store additional prognostic variables to describe the state of each grid cell with higher spatial resolution than the formal cell size: a sub-grid mass distribution.This can take the form of extra conservation equation(s) for e.g.first-or higher-order moments (Egan and Mahoney, 1972;Prather, 1986).There are other approaches that use different kinds of extra information.For instance, the conservative semi-Lagrangian schemes (Yabe and Aoki, 1991;Yabe et al., 2001) use a semi-Lagrangian step to evaluate the mixing ratio at cell interfaces, and then use the interface values along with the cell integrals to derive an interpolant representing the sub-grid distribution.
In a series of works, Michael Galperin developed a semi-Lagrangian scheme that used the sub-grid information on mass centre location inside the cell.The scheme was made fully non-diffusive for isolated plumes, positively defined, and very efficient computationally (Galperin et al., 1994(Galperin et al., , 1995(Galperin et al., , 1997;;Galperin, 1999;Galperin and Sofiev, 1998;Galperin and Sofiev, 1995;Galperin, 2000).The early version of this scheme applied in the large-scale modelling by Sofiev (2000) resembled the "moving-centre" approach widely used in aerosol dynamics models (Kokkola et al. 2008) and shared its characteristic weakness -high nonmonotonicity.The later developments substantially reduced it without damaging other features (Galperin, 1999(Galperin, , 2000)).Further development of this scheme is the subject of the current paper.

Horizontal and vertical diffusion, dry deposition
Diffusion algorithms are less diverse than advection schemes.The physical reason for one of the common diffusion parameterisations is described in detail by Smagorinsky (1963), who suggested a formula for grid-scale scalar eddy diffusivity based on the model resolution and wind speed derivatives, thus connecting the numerical features of the simulations and hydrodynamics.It is widely used in chemical transport models (Kukkonen et al., 2012).
The dry deposition is usually accounted for as a boundary condition for the vertical advection-diffusion equation.Computation of dry deposition for gases practically always follows the electrical analogy, for which one of the first comprehensive parameterisations was suggested by Wesely (1989).Among the extensions of this approach, one was suggested by Sofiev (2002), who combined it with vertical diffusion and connected it with the Galperin advection scheme.The algorithm used an effective mean diffusion coefficient over thick layers calculated from high-resolution meteorological vertical profiles, the direction also recommended by Venkatram and Pleim (1999).These thick layers were determined using the subgrid information available from the advection scheme, which increased the accuracy of both algorithms (Sofiev, 2002).
For aerosol species, the electrical analogy is not correct (Venkatram and Pleim 1999).Compromising approaches suggested by Slinn (1982) and Zhang et al. (2001) and updated by Petroff and Zhang (2010) involve numerous empirical relations, sometimes on thin ground.A more rigorous approach unifying the gas and aerosol deposition parameterisations into a single solution was developed by Kouznetsov and Sofiev (2012).

Organisation of the paper
The current paper describes the Eulerian transport algorithm of the System for Integrated modeLling of Atmospheric coMposition SILAM v.5, which is based on the advection scheme of Michael Galperin with several updates.
The paper is organised as follows.Section 2 describes the original algorithm of Michael Galperin and positions the scheme among other approaches.Section 3 presents the improvements made during its implementation and testing in SILAM.Section 4 outlines the scheme interconnections with other model parts.Standard and advanced model tests are shown in Sect. 5. Finally, the discussion in Sect.6 includes an analysis of the scheme performance in the tests, as well as comparison with other algorithms.
Following the SILAM standards, all units throughout the paper are the basic SI: (mole) for chemicals, (kg) for aerosols without chemical speciation, (m) for distance and size, (s) for time, etc.The model operates with concentrations (mol m −3 ) or (kg m −3 ).Some of the tests below are formulated in mixing ratios (mol mol −1 ) or (kg kg −1 ).

Basic equations
We consider the forward dispersion equation with the firstorder K-theory-based closure for diffusion: Here ϕ is the concentration of the pollutant, t is time, E is the emission term, ξ i , i = 1..3 denote the three spatial axes, u i are components of the transport velocity vector along these axes, µ ij are components of the turbulent diffusivity tensor, ρ is air density, and ζ represents the transformation source and sink processes.
Equation ( 1) is considered on the time interval t ∈ (t 0 , t N ) in the domain {ξ i } ∈ = [h 1 , H ] × , where the heights h 1 and H are the lower and upper boundaries of the computational domain and is the horizontal computational area with border ∂ .The initial conditions are (2) Boundary conditions depend on the type of the simulations.In a general form, they constrain concentration and/or its first derivative: Here the values of α, β, and γ depend on the type of the boundary.In particular, dry deposition at the surface ξ 3 = h 1 is described via α = µ 33 , β = −v d (dry deposition velocity), and γ = 0; prescribed concentration ϕ l at the lateral boundaries ξ 1,2 ∈ ∂ implies α = 0, β = 1, γ = ϕ l , etc.A global domain would require periodic longitudinal conditions.

Advection scheme of Michael Galperin
The current section presents the advection algorithm suggested by Michael Galperin in the 2000s as a contribution to the Eulerian transport scheme of SILAM.The idea of the scheme can be found in a short methodological publication of Galperin (2000) (in Russian) and conference materials (Galperin, 1999;Sofiev et al., 2008).It is very briefly outlined by Petrova et al. (2008) (hereinafter referred to as P08), but no systematic description exists so far.
For the 1-D case, let us define the simulation grid, ξ 1 = x, as a set of I grid cells i = 1..I .Let the coordinate of the centre of the ith grid cell be x i , and the coordinates of the cell left-and right-hand borders be x i−0.5 and x i+0.5 , respectively.The 1-D cell size is then V i = x i+0.5 − x i−0.5 .The advected field ϕ, in each grid cell i, is defined via the total u Formation of slabs Eq.( 5) Move with wind Eq.( 6) -( 7) Reprojection Eq.( 8) mass in the cell M i and the position of the centre of mass X i , X i ∈ [x i−0.5 , x i+0.5 ]: x i +0.5 x i −0.5 xϕ(x)dx. (4) Let us represent the mass distribution in each grid cell via the rectangular slab: where n is the time step and is the distance from the centre of mass X n i to the nearest border of the cell i.Equation ( 5) defines the widest unit-volume slab that can be confined inside the cell (Fig. 1) for the given centre of mass.
The advection scheme consists of a transport step and a reprojection step.At the transport step, each slab π i is moved along the velocity field u(x).Advection of the slab does not change its shape within the time step δt = t n+1 − t n , but can move it anywhere over the domain or bring it outside.In essence, the slab transport is replaced with advection of its mass centre, which during this time step becomes analogous to a Lagrangian particle: where u(X n i , t) is the wind speed at the mass centre location.The original Galperin scheme employed wind at the cell mid-point x i and used explicit first-order time discretisation: u(x n i , t n ) = u n i .Then the transported slab is given by Following the transport step (7), the masses M k and centres of mass X k of the receiving set of cells k ∈ K are updated using the transported slabs π n i : where α i,k = x k +0.5 x k −0.5 π n i (x)dx and β i,k = x k +0.5 x k −0.5 x π n i (x)dx correspond to the mass and the first-moment fractions arriving from the cell i into cell k.The integrals are easy to evaluate due to the simple form of π n i (x) in Eq. ( 5).In essence, Eq. ( 8) describes a mass-conservative projection of the advected slab to the computation grid.
The coefficients α i,0 = 0.5 −∞ π n i (x)dx and α i,I +1 = ∞ I +0.5 π n i (x)dx determine the transport outside the domain through the left and right borders, respectively; that is, the scheme is fully accountable and mass-conservative i (x)dx = 1 for each i.Also, since the functions π n i (x) are nonnegative, the coefficients α i,k are nonnegative, and consequently M n+1 k ≥ 0 as long as M n i ≥ 0 for all i.It means that the scheme is positively defined for any simulation set-up: u, t, and discretisation grid.
In multiple dimensions, the slabs are described by the total mass in multidimensional cells and centres of mass along each dimension.In two dimensions, an analogue of Eq. ( 5) will be n i,j (x, y) = π n i,j (x)π n i,j (y), where the functions π i,j (x) and π i,j (y) depend on X i,j and Y i,j , respectively.The advection step in the form of Eq. ( 7) and the slab projection integrals Eq. ( 8) are then defined in 2-D space.However, a simpler procedure used in the original scheme is obtained with dimensional splitting, where the transport in each dimension is processed sequentially with the grid projection applied in between.For an x−y split, the intermediate distribution for each row j is obtained by setting n+1/2 i,j (x, y) = π n i,j (x)π n i,j (y), evaluating α i,k and β i,k from π n i,j (x) and updating M i,j , X i,j and Y i,j .Since y j +0.5 y j −0.5 π i,j (y)dy = 1 and y j +0.5 y j −0.5 yπ i,j (y)dy = Y n i,j , the 2-D slab projection simplifies to The y advection is then performed by taking the transport step for π n+1/2 i,j (y) starting from Y n+1/2 i , evaluating α i,k and (y), and applying the re-projection Eq. ( 11) with X and Y inverted.The generalisation to three dimensions is analogous.

Relation of the Galperin scheme to other approaches
The Galperin scheme shares some features with other approaches (see the reviews of Rood, 1987, andLauritzen et al., 2011).Arguably the closest existing scheme is the finitevolume approach of Egan and Mahoney (1972), hereinafter referred to as EM72, and Prather (1986), hereinafter P86.
The main similarity between these schemes is the representation of the mass distribution as a set of slabs (rectangular in EM72 and continuous polynomial distributions in P86), one per grid cell, with the mass centre identified via the slab first moment, plus additional constraints.During the EM72 and P86 advection steps, mass and the first moment are conserved, similarly to the reprojection step (8).However, the similarity ends here.
There are several principal differences between the EM72/P86 and Galperin algorithms.
Firstly, in EM72 the slab width is computed via the second moment (variance) of the mass distribution in the grid cell.P86 uses the second moments to constrain the shape of the polynomials.As a result, this moment has to be computed and stored for the whole grid, and the corresponding conservation equation has to be added, on top of those for the mass and the first moment.Galperin's approach does not require the second moment, instead positioning the slab against the cell wall.In fact, EM72 pointed out that the second moment can be omitted, but did not use the wall-based constraint in such a "degenerated" scheme, which severely affected its accuracy.
Secondly, EM72 uses the movements of the slabs in adjacent grid cells to calculate the mass flows across the border.Such local consideration requires the Courant number to be less than 1: the so-called "portioning parameter" (a close analogy to the Courant number in the scheme) is limited between 0 and 1.The same limitation is valid for the P86 approach.Galperin's scheme can be applied at any Courant number and its re-projection step can rather be related to Lin and Rood (1996).

Updates of the scheme in SILAM v.5
There are several features of the original scheme which make it difficult to use in large-scale chemical transport simulations.These are listed here and the corresponding improvements are introduced in the following sub-sections.
-The scheme is formulated with zero inflow boundary conditions.
-Real-wind tests have shown that the scheme has difficulties in complex-wind and complex-terrain conditions, similar to EM72 (Ghods et al. 2000).
-The scheme, being very good with individual sharp plumes over zero background, noticeably distorts the smoother fields with a non-zero background -see P08.
In addition, the accuracy of the dimension split was increased via symmetrisation: the order of dimensions in SILAM routines is inverted each time step: x−y−z−z−y−x (Marchuk, 1995).

Lateral and top boundary conditions
The open boundary for the outgoing masses is kept in SILAM regional simulations.The inflow into a limited-area domain is defined via prescribed concentration at the boundary (Eq.3), α = 0, β = 1, γ = ϕ l .The mass coming into the domain during a single time step is equal to Here ℵ(u) is a Heaviside function (= 1 if u > 0, = 0 if u ≤ 0).Assuming the locally constant wind, we find that M in is distributed uniformly inside the slab, similar to that of Eq. ( 5).For e.g. the left-hand border, the continuous form will read It is then projected to the calculation grid following Eq.( 8).
The top boundary follows the same rules as the lateral boundaries.At the surface, the vertical wind component is zero, which is equivalent to closure of the domain with regard to advection.
Global-domain calculations require certain care: SILAM operates in longitude-latitude grids; that is, it has singularity points at the poles and a cut along the 180 Vertical motion in the pole cylinders is calculated separately using the vertical wind component diagnosed from the global non-divergence requirement.

Extension of the scheme for complex wind patterns
The original Galperin scheme tends to accumulate mass at stagnation points where one of the wind components is small.Similar problems were reported by Ghods et al. (2000) for the EM72.Ghods et al. (2000) also suggested an explanation and a generic principle for solving the problem: increasing the number of points at which the wind is evaluated inside the grid cell.For application in the Galperin scheme, it can be achieved by separate advection of each slab edge instead of advecting the slab as a whole.This allows for shrinking and stretching of the slab following the gradient of the velocity field.Formally, this can be written as follows.
Let us again consider the 1-D slab that has been formed according to Eq. ( 5).Its edges are The advection step takes the wind velocity at the left and right slab edges and transports them in a way similar to Eq. ( 6) with the corresponding wind velocities.The new slab is formed as a uniform distribution between the new positions of the edges: where X k L,i , X k R,i are the new positions of the slab edges at the end of the time step.This new slab is then projected following Eq.( 8).

Changing wind along the mass-centre trajectory
The explicit advection step (Eq.7) is inaccurate and can be improved under the assumption of linear change of wind inside each grid cell, with values at the borders coming from the meteo input: Then, the trajectory equation ( 6) can be piece-wise integrated analytically for each slab edge.Let us denote u = u i+0.5 − u i−0.5 , t = t n+1 − t n , α = u/ t and consider the transport starting at e.g.x i−0.5 .Then, the time needed for passing through the entire cell, x = x i+0.5 − x i−0.5 , is If t < T cell , the point will not pass through the whole cell and stop at Applying sequentially Eqs. ( 18) and ( 19) until completing the model time step t, one obtains the analytical solution for the final position of the slab edges.
This approach neglects the change of wind with time.However, the integration method is robust, since the linearly interpolated wind field is Lipschitz-continuous everywhere, which in turn guarantees the uniqueness of the trajectories of X L and X R .Therefore, using the analytic solution Eqs. ( 18) and ( 19), the borders of the slabs will never cross.

Reducing the shape distortions
The original scheme tends to artificially sharpen the plume edges and to gradually redistribute the background mass in the vicinity of the plume towards it (Fig. 2, blue shapes).Similar "antidiffusive" distortions were also reported by P08 and by EM72 -for their scheme.
The reason for the feature can be seen from Eq. ( 8): if a large mass is concentrated at one of the grid cell sides, the centre of mass becomes insensitive to the low-mass part of the cell; that is, a small mass that appears there from the neighbouring cell is just added to the big slab with little effect on its position and width.
A cheap, albeit not rigorous, way to confront the effect is to compensate for it via an additional small pull of the mass centre towards the cell mid-point before forming the slab for advection: where ε is the smoothing factor.The adjusted mass centre Xn i is then used to form the slab Eq. ( 5).
The way this smoother works becomes clearer if one notices that the Galperin scheme becomes similar to the upwind algorithm if the locations of the centres of masses are always forced to the middle of the grid cells at the beginning of the time step.The upwind scheme is very diffusive, and relaxation towards it confronts the antidiffusive features of the Galperin approach.The actual value of ε cannot be easily obtained from some optimisation problems.Its increase from 0 up to 1 gradually makes the scheme more and more diffusive, with the above distortions becoming negligible by ε ∼ 0.08 (Fig. 2, red shapes).This behaviour and the value were similar for various Courant numbers and tests.It is also seen from the spectral features of the scheme in the next section -and further discussed in relation to scheme tests.

Analysis in frequency space
The non-linearity introduced by the coupling of cell mass and centre of mass in Eq. ( 8) makes formal stability and convergence analysis after Charney et al. (1950) difficult.However, the features of the scheme can be investigated numerically following the approach of Kaas and Nielsen (2010).
The scheme was run for 200 time steps in a 1-D domain with 100 grid points.For each wave number up to 25, the scheme was initialised with the corresponding sine function and run with Courant numbers ranging from 0.05 to 0.95 in steps of 0.05.This allowed evaluation of the spectrally resolved root mean squared error (RMSE) and, after a Fourier transform, the spectral amplification factor (cumulative for the 200 steps) for each wave number.The amplification factor quantifies the scheme's ability to resolve the corresponding harmonics, while the RMSE additionally includes the effect of phase errors and possible spurious modes.Since the scheme is formulated for nonnegative concentrations, a constant background B = 1 is added to each waveform.
Figure 3 presents the amplification factor and RMSE for the Galperin scheme without the smoother (panels a, d) and with it, ε = 0.08 (panels c, f).Furthermore, the impact of doubling the background component to twice the wave amplitude is shown (panels b, e).In the case of B = 1, the scheme without the smoother shows only minor damping of all considered wave numbers (k up to 25).The RMSE has a maximum for k of between 5 and 10 but stays almost constant from k = 10 to k = 25.This shows the scheme's ability to resolve sharp gradients when there is no significant background.The cumulative amplifying factors for some wavelengths exceed 1, but this does not imply instability, since the single-step amplifying factors fluctuate depending on the positions of the centres of mass.If the integration is continued over a large number of time steps (not shown), the solution converges to a combination of rectangular pulses (a similar feature was mentioned in EM72 for that scheme).
Introducing the smoothing ε = 0.08 resulted in strong attenuation of high-frequency components and increased the RMSE for wave numbers above ∼ 10.Since the smoothing factor effectively damps the fluctuations of the centres of mass, the amplification factors are below 1 for all wave numbers.Adding a background term also reduces the responsiveness of the mass centres to newly coming amounts (see Eq. 8), which leads to a similar damping of the higher frequencies as in Fig. 3c, f.To further investigate the spectral response of the scheme, it was evaluated with a broadband input: Figure 4, right panel, depicts the power spectral densities for the exact and numerical solutions after a single revolution with CF = 0.7 and 100 grid points.The corresponding solutions are shown in the leftpanel.For the comparison, results are also shown for the fourth-order Bott (1989) scheme without shape preservation, and for a generic non-conservative upstream semi-Lagrangian scheme with cubic spline interpolation.
With B = 1, all schemes capture the first spectral peaks around k = 10 and therefore resolve most of the spectral content.Without a smoother, the Galperin scheme that follows the spectrum of the true solution also resolves the spectral features around k = 30.Application of the smoother leads to a damping effect throughout the spectrum, including the spurious high-frequency components, such as the peak at k = 40.This illustrates the use of the smoother for reducing over-and under-shoots, as discussed in Sect.3.4.
Similarly to the single-harmonic tests, the situation changes in the presence of a significant background (B = 2).Regardless of smoothing, the Galperin scheme damps the spectral peaks starting around k = 10, which corresponds to the reduction of amplitude visible in the numerical solution.

Connection of the advection scheme with other SILAM modules
Construction of the dispersion model using the Galperin advection scheme as its transport core is not trivial because all other modules should support the use of the sub-grid information on positions of the mass centres.In some cases it is straightforward, but in others one can only make the module to return them undamaged back to advection.

Vertical axis: combined advection, diffusion, and dry deposition
For the vertical axis, SILAM combines the Galperin advection with the vertical diffusion algorithm following the extended resistance analogy (Sofiev, 2002), which considers the air column as a sequence of thick layers.Vertical slabs within these layers are controlled by the same 1-D advection, which is performed in absolute coordinates -either z-or p-, depending on the vertical (height above the surface or hybrid).Settling of particles is included in advection for all layers except for the first one, where the exchange with the surface is treated by the dry deposition scheme.
The centres of masses are used but not modified by diffusion: the effective diffusion coefficient between the neighbouring thick layers is taken as an inverse of aerodynamic resistance between the centres of mass of these layers (Sofiev, 2002): The effective thickness z i,i+1 is taken to be proportional to the pressure drop between the centres of masses, which ensures equilibration of mixing ratios due to diffusion.
In the lowest layer, the dry deposition velocity is calculated at the height of the centre of mass Z 1 following the approach of Kouznetsov and Sofiev (2012).
The advantages of using the mass centres as the vertical diffusion meshes are discussed in detail by Sofiev (2002), where it is shown that the effect can be substantial if an inversion layer appears inside the thick dispersion layer.Then the location of the mass centre above/below the inversion can change the up/down diffusion fluxes by a factor of several times.

Emission-to-dispersion interface
Emission data are the only source of sub-grid information apart from the advection itself: location of the sources is transformed into the mass centre positions of their emission.
For point sources, the mass is added to the corresponding grid cell and centres of masses are updated: where M ems is the mass emitted to the cell (i, j, k) during the time step, X ems , Y ems are the coordinates of the source in the grid and Z k ems is the effective injection height in the layer k, equal to the middle of the layer if no particular information is available.
For area sources, the approach depends on the source grid.If it is the same as the computational one, the mass centre is put to the middle of the cell (no extra information can be obtained).If the grids are different, the source is re-projected.For each computational grid cell (i, j ), the centre of mass of emission is Y em,ij = (x,y)∈(i,j ) yM(x, y) dx dy (x,y)∈(i,j ) M(x, y) dx dy , where M(x, y) denotes the original source distribution.After that, the procedure is the same as in the case of point source Eq. ( 23).

Meteo-to-dispersion interface
Modifications described in Sect. 3 require staggered wind fields, which have to be provided by the meteo pre-processor (unless they are directly available from the input data).Moreover, the pre-processor needs to ensure consistency between the flow and air density fields (Prather et al. 1987;Rotman et al., 2004;Robertson and Langner, 1999).This is particularly important with the present advection scheme, since mixing ratio perturbations caused by the mass-flow inconsistency are not suppressed by numerical diffusion.
The wind pre-processing follows the idea of a "pressure fixer", which means adding a correction δV to the original horizontal wind field V 0 such that for their sum, the vertical integral of mass flux divergence corresponds to the surface pressure tendency: where the surface pressure tendency ∂p s /∂t is evaluated from the meteorological input data.The correction δV is not uniquely determined, and SILAM adopts the algorithm of Heimann and Keeling (1989), where the correction term is given by the gradient of a 2-D potential function: Substituting Eq. ( 26) into (25) yields a Poisson equation for ψ(x, y), which is solved to subsequently recover δV .The obtained correction flux is then distributed within the column proportionally to the air mass in each layer, ending up with the corrections to the horizontal winds.The vertical wind is then evaluated in each column to enforce the proper air-mass change in each cell.

Chemical module interface
This interface is implemented in a very simple manner: the mass centres are not affected by the transformations.The chemical module deals exclusively with concentrations in the grid cells.The newly created mass is added to the existing one, thus accepting its centre position in the cell.If some species did not exist before the transformation, the new mass centre is put to the middle point of the cell.

Standard tests
A set of basic tests and comparison with some classical approaches has been presented by Galperin (1999) and P08 for the original scheme, along with Bott, Holmgren, and several other schemes.Their main conclusions were that the scheme is very good for sharp-edge patterns: in particular, it transports delta functions without any distortions.It had, however, issues with long slopes, smooth shapes, etc., where the tendency to gradually convert them to a collection of rectangles was noticeable.Addressing these concerns, tests used during the scheme improvements and implementation in SILAM included puffover-background, conical and sine-shaped peaks and dips, etc. (some examples are shown in Fig. 2), a divergent 1-D high-Courant wind test in the 1-D divergent wind field (Fig. 5), a constant-level background field in eight vortices with stagnation points (Fig. 6), and rotation tests for various shapes (Fig. 7).
The scheme stays stable at arbitrarily high Courant numbers and handles the convergence and divergence of the flows (Fig. 5).
Transport and rotation tests of the improved scheme maintain low distortions of the shapes: the L 2 norm of the error varies from 0.1 % up to 3.8 % of the initial-shape norm -for the most challenging task in Fig. 7.The effect of the improvements in comparison with the original scheme is demonstrated in Fig. 2, where the blue contours show the results of the original scheme.In particular, application of the smoothing Eq. ( 20) reduced the distortions of smooth shapes (red curves), largely resolving the concerns of P08: Fig. 2b presents the same test as one of the P08 exercises.However, the smoother also leads to a certain numerical viscosity of the scheme, so its use in problems requiring nondiffusive schemes (e.g.narrow plumes from accidental releases) should be avoided.

Global 2-D tests
Performance of Galperin's advection scheme in the global spherical domain was assessed with the collection of demanding tests of Lauritzen et al. (2012).The cases are designed to evaluate the accuracy of transport schemes at a wide range of resolutions and Courant numbers.The tests used a prescribed non-divergent 2-D velocity field defined on a sphere and consisting of deformation and rotation, so that the initial concentration pattern is reconstructed at the end of the test, t = T , providing the exact solution ϕ(t = 0) = ϕ(t = T ).
Four initial concentration distributions were used (Fig. 8): "Gaussian hills" with unity maximum value, "cosine bells" with a background of 0.1 and maxima of 1, "slotted cylinders" -a rough pattern with a 0.1 background and 1 maximum level, and "correlated cosine bells" -the distribution obtained from "cosine bells" with a function The tests were run with SILAM on a global regular nonrotated lon-lat grid, with R = 6400 km and T = 12 h.Spatial resolutions were 6, 3, 1.5, 0.75, 0.375, and 0.1875 • , each run with mean Courant numbers of ∼ 5.12, ∼ 2.56, and ∼ 0.85 (for a 6 • grid they correspond to the model time step of T /12 = 1 h, T /24 = 30 min, and T /72 = 5 min), and a total of 18 runs for each initial pattern.
Examples of the most challenging runs with slotted cylinders at t = T /2 and at t = T are shown in Figs. 9 and 10, respectively.The corresponding error fields are collected in Fig. 11 as decimal logarithms of the absolute difference between the corresponding field in Fig. 10 and the slottedcylinder initial shape of Fig. 8.The main complexity of the test was in reproducing the very tiny sharp-edge structures obtained from the cylinder cut at t = T /2 -and then returning them back by t = T .The pictures, together with the error field at t = T (Fig. 11), show that already 24 time steps allow the scheme to make the shape recognisable (3 • , C = 5.12 pattern), whereas 48 time steps allow for the main details to show up.Expectedly, certain deviations at the cylinder edge remain at any resolution -as is visible from the error fields.
Deviation of the resulting field ϕ T = ϕ(t = T ) from the initial shape ϕ 0 = ϕ(t = 0) was considered in three spaces: L 2 , L ∞ , and L 1 .The corresponding distance metrics are defined as follows: where S[•] is an area-weighted sum over latitude and longitude.The values of these three metrics for all model runs are presented in Fig. 12.The main interest of these curves is that they show the rate of the scheme convergence (straight grey lines correspond to the first-and second-order convergence rates).Expectedly, the rates depend on the transported shape (the smoother the shape, the faster the convergence) and on the norm used.Thus, the scheme converges in L 1 faster than in L 2 , whereas in L ∞ no convergence in the case of sharp edges is an expected result.The rate in the L 2 norm is in between the first and the second order, whereas in L 1 it is close to the latter one.Advection should also keep the local ratio of the tracer's concentrations.Such a ratio between "cosine bells" and "correlated cosine bells" was calculated at t = T /2 and t = T .Since these initial patterns are related by Eq. ( 27), the concentration fields during the tests should maintain the same relation.The scatter plots of the concentrations in these two tests give an indication of how the ratio is kept.Ideal advection would keep all points on a line given by Eq. ( 28).The results of the tests for t = T /2 are shown in Fig. 13, where the results with and without the smoother in Eq. ( 20) are presented.The smoother improves the scheme mixing preservation; that is, it can be recommended to chemical composition computations, which usually also tolerate some numerical viscosity.

Global 3-D test with real wind
Testing the scheme with real-wind conditions has one major difficulty: there is no accurate solution that can be used as a reference.An exception is simulations of the constantmixing-ratio 3-D field, which, once initialised, must stay constant throughout the run.Deviation from this constant is then the measure of the model quality.Such a test verifies both the scheme and the meteo-to-dispersion interface, which has to provide the consistent wind fields.
The constant-vmr test was set with winds taken from the ERA-Interim archive of ECMWF, for the arbitrarily selected month of January 1991 (Fig. 15).The model was initialised with vmr = 1 and run with 3 • of lon-lat resolution and a time step of 30 min (max Courant number exceeding 13 in the stratosphere and reaching up to 2-3 in the troposphere).The model top was closed at 10 Pa, which corresponds to the top level of the ERA-Interim fields.The procedure described in Sect.4.3 was used to diagnose the vertical wind component.
The results of the test are shown in Fig. 15, which depicts the model state after 240 h of the run, panel a) showing the boundary-layer vmr, and panel b) presenting it in the strato-sphere.The zonally averaged vertical cross section is shown in panel c).Green colours in the pictures correspond to less than 1 % of the instant-field error.
An important message is that the limited distortions about 1-2 % are visible in a few places, but they are not related to topography, rather being associated with the frontal zones and cyclones.The comparatively coarse spatial and temporal resolution of the test makes the associated changes of the wind quite sharp, so that the dimension-split errors start manifesting themselves.Smoother flows in the stratosphere posed minor challenges for the scheme.The L2 error (not shown) is approximately proportional to the model time step.

Discussion
The presented SILAM v.5 transport module is based on the semi-Lagrangian advection scheme of Michael Galperin with subgrid information available through the positions of centres of masses.It poses certain challenges in implementation.Firstly, one has to organise the sub-grid information use and transmission between the advection and other model units.Secondly, the scheme requires storage of four full fields for each transported species (mass and moments) and care should be taken to maintain an efficient exchange between the processors and the computer memory.Thirdly, the possibility to run with high Courant numbers and MPI parallelisation via horizontal domain split can be utilised only if the MPI split allows for sufficient buffer zones.Finally, the better performance of the advection at a Courant number greater than 1 challenges the implementation of other modules, first of all, chemistry and emission.Indeed, introduction of emitted mass once per long time step would result in a broken plume unless the mass is spread downwind over the corresponding distance.Similar problems show up in chemical    transformation calculations.At present, the actual SILAM applications are performed with Courant close to but mostly smaller than 1 to avoid such problems.
The above challenges are mostly technical and their solution allows the scheme to demonstrate strong performance with low computational costs.
In particular, by attributing the release from point source to its actual location, one can reduce the impact of the common problem of Eulerian models: point release is immediately diluted over the model grid cell.This substantially improves the transport but does not solve the problem completely: (i) the chemical module still receives the diluted plume concentration, and (ii) the slab size in the case of the source near the centre of the grid cell will still be as large as the grid cell itself.A more accurate solution would be the plume-in-grid or similar approaches, which is being built in SILAM.Another example of the sub-grid information usage is utilisation of full meteorological vertical resolution to calculate effective values of meteo variables for thick dispersion layers (Sofiev, 2002).
The model can operate at any Courant number (Fig. 5).Its time step is limited not by grid cell size, but by a spatial scale of the wind-shear field; that is, it has to satisfy a much less restrictive Lipschitz criterion, which relates spatial and temporal truncation errors (Pudykiewicz et al., 1985).It follows from the advection step Eq. ( 6) and the reprojection step Eq. ( 8), which do not restrict new positions of the slabs: they can find themselves anywhere in the grid or outside it after the time step is made.SILAM heavily relies on such features of Galperin scheme as mass conservation and accountability: the scheme provides complete mass budget including transport across the domain boundaries.In particular, nesting of the calculations is straightforward and does not need the relaxation buffer at the edges of the inner domain: the inflow through the boundaries is described by the same slabs as the main advection.The scheme is also shape-preserving -in the sense this term is used by L14 -that is, it does not result in unphysical solutions, such as a negative mixing ratio.Some distortions are still possible (Fig. 2), which can be reduced by the smoother described in Sect.3.4, Eq. ( 20).

Standard advection tests
Evaluating the Galperin scheme with the simple tests (Figs.2-7), one can point out the known issues of the classical schemes resolved in the Galperin approach: high-order algorithms suffer from numerical diffusion, oscillations at sharp gradients (require special efforts for limiting their amplitude), high computational costs and stringent limits to Courant number.None of these affects the Galperin scheme.
The main issue noticed during the implementation of the original scheme was the unrealistically high concentrations near the wind stagnation points.Thus, the concentration pattern at the test Fig. 6a resembles the situation of a divergent wind field.However, it is not the case: the 2-D wind pattern is strictly solenoidal.The actual reason is insufficient resolution of the advection grid: one centre of mass point is not enough if the spatial scale of the wind variation is comparable with the grid cell size.Tracking the edges of the slab rather than its centre resolves the problem (Fig. 6b).
The other challenging tasks for Galperin algorithm were those with smooth background and soft gradients, a frequent issue for semi-Lagrangian schemes, which is easily handled by more diffusive approaches.This feature was visible in the P08 tests where the scheme noticeably distorts the Gaussian and conical plumes.For the puff-over-background pattern, the scheme makes a single low-mass dip in the vicinity of the puff, which receives this mass (Fig. 2).From a formal point of view, the scheme does not conserve the higher mo- ments inside the grid cell, which becomes a problem when the pattern changes at a spatial scale shorter than the grid cell size.The smoothing step (20) may be advised despite it having no rigorous basis and, as in L14 evaluation of other schemes, may damage some formal quality scores (adding this step introduces numerical viscosity -Fig.2).

Global 2-D and real-wind advection tests
The application of the scheme to the highly challenging tests of Lauritzen et al. (2012) allowed its evaluation in a global 2-D case and comparison with the state-of-the-art schemes evaluated by L14 and Kaas et al. (2013).
Performing these tests with different spatial and temporal resolutions, as well as Courant numbers, suggested that the scheme has an "optimal" Courant number for each spatial resolution where the error metrics reach their minimum, so that the increase in temporal resolution is not beneficial.Indeed, in Fig. 12 the low Courant runs are by no means the most accurate.This is not surprising: for an ideal scheme, increasing the grid resolution and reducing the time step should both lead to gradual convergence of the algorithm; that is, the error metrics should reduce.For real schemes, higher temporal resolution competes with accumulation of the scheme errors with increasing number of steps.Convergence in L14 tests was still solid for all fixed Courant number series (Fig. 12), but excessive temporal resolution (specific to each particular grid cell size) was penalised by higher errors.Similarly, the most accurate representation of the correlated patterns is obtained from the runs with the intermediate Courant numbers (Fig. 13).This seems to be a common feature: the same behaviour was noticed by L14 for several schemes.
High optimal Courant numbers, however, should be taken with care.For L14, the smooth wind fields reduced the dimension-split error and made the long time steps particularly beneficial.
It is also seen (Fig. 11) that the best performance, in case of a near-optimal Courant, is demonstrated by the high-spatialresolution simulations, which have reproduced both the sharp edges of the slotted cylinders, the flat background and the cylinder's top planes.
The scheme demonstrated a convergence rate higher than 1 for all metrics and all tests with smooth initial patterns.Even for the most stringent test with the slotted cylinders, the scheme showed the first-order convergence rate in the L 1 norm (Fig. 12).
Among the other features of the solution, one can notice a certain inhomogeneity of the background field away from the transported bodies.The error is very small (< 10 −4 ) for highresolution cases (Fig. 11) and < 0.1 % for inexpensive setups, such as λ = 0.75, C = 2.56.For coarser resolutions, it grows.The inhomogeneity also grows with Courant number, which is opposite to the decreasing error of representation of the shapes themselves.The issue originates from the dimension-split error in polar areas, where the spatial scale of wind change becomes comparable with the distance passed by the slabs within one time step.
Similar non-monotonicity of background is visible for some schemes tested by L14.Unfortunately, no error fields are given there, but Figs.7-10 there are comparable with our Fig. 9 (results without a smoother).With few excep-tions (schemes TTS-I and LPM, notations of L14), all algorithms manifested such patterns unless filters are applied.For some schemes (SFF-CSLAM3, SFF-CSLAM4, UCISOM-CS, CLAW, and CAM), these inhomogeneities are visible also for the tests with shape-preserving filters.One should note however that the 0.1 level, which distinguishes between the two violet colours in Figs. 9 and 7-10 of L14, corresponds to the background level in the slotted-cylinder test.As a result, even a very small deviation leads to the appearance of such shapes in the plots (note the stripes in the background of Fig. 8).
Comparing the so-called "minimal resolution" threshold for L 2 , the norm of cosine bells to reach 0.033 (Fig. 3 of L14) for SILAM was about 0.75 • , which puts it in the middle of that multi-model chart (the specific place depends on whether the shape preservation is considered or not).
Another criterion can be the optimal convergence of L 2 and L ∞ norms for Gaussian hills: about 1.7-1.8 for SILAM -this is again in the middle of the L14 histograms, in the second half if the unlimited schemes (without shapepreservation filters) are considered and in the first half if the unphysical negative concentrations are suppressed (since the Galperin advection is strictly positively defined, no extra efforts are needed to satisfy this requirement).
Interestingly, the L14 tests were limited with 3 • as the coarsest resolution, and it was pointed out that the schemes start converging only when a certain limit, specific to each scheme, is reached.The SILAM results show similar behaviour only for the lowest Courant number (red lines in Fig. 12), which indeed required appropriate resolution to start working.Higher Courant set-ups were much less restrictive (the errors decrease with growing resolution also for coarse grids) and, as already pointed out, often worked better than the low Courant runs (similar to many L14 schemes).
The scheme demonstrated limited distortion of preexisting functional dependence -see the cosine bells and correlated cosine bells tests in Eq. ( 27) (Fig. 13).Formal scores suggested by Lauritzen et al. (2012) calculated for the Galperin scheme are shown in Fig. 14.Notations are the following.l o , "overshooting", describes the values that fell outside the rectangular [0.1 : 1] (Fig. 13), l u , "shape-preserving unmix", describes the values inside that rectangular but outside the "lens" formed by its diagonal (0.1, 1)-(1, 0.1) and the curve, and l r , "real mixing", describes the values inside the "lens".Comparison with L14 (Fig. 15, middle panel) shows that the Galperin scheme outperforms CLAW, SLFV-ML, SLFV-SL, and all set-ups of ICON schemes, being close to CAM-SE, MIPAS, and HOMME, and trailing behind the runs with CSLAM, HEL, SFF, and UCISCOM schemes.
A peculiarity of the mixing diagnostic scores is that they are significantly affected by the background areas far from the advected bells, which occupy only a small fraction of the domain (Fig. 8).As a result, small background fluctuations discussed above in application to slotted cylinders (see the error field in Fig. 11) contribute significantly to the mixing diagnostic scores too.In particular, the high Courant simulations, which accurately reproduce the bells themselves (the dots are close to the curve in the scatter plots in Fig. 13), still show poor formal scores due to non-zero width of the cloud near the location (0.1, 1), where all background dots should arrive.This issue contributes most significantly to the "overshooting" part of the error, but also to the other two components.
Expectedly, the smoother improves the mixing diagnostic scores, mainly affecting the representation of the bells themselves (Fig. 13).This is in contrast with the schemes tested in L14, where the shape-preservation filters mostly removed the penalty for overshooting the background but rarely improved the other two components, sometimes worsening them.
Following the conclusions of Sect.3.4 and the 1-D tests, we used the smoothing factor of 0.08, which is a compromise between the scheme diffusivity and distortion reduction.As a result, some non-linearity exists also in the smoothed solution.The test showed that a simple increase in temporal resolution leads to an increase in the number of steps and related re-projections, which then worsen the representation of the bells -but improve the background field by reducing the dimension-split errors.A synchronous rise of the resolution in time and space with the same Courant number (columns in Fig. 13) showed better results for higher-resolving set-ups.
Further investigating the flat-field behaviour in complex wind patterns, the simulations with the constant-vmr initial conditions (Fig. 15) were performed, showing that the model has no major problem in keeping the homogeneous distribution: deviations do not exceed a few %, with no relation to topography.The existing ups and downs of the vmr are related to cyclones and atmospheric fronts, which challenge the dimension-splitting algorithm rather than the core 1-D advection (it transports the homogeneous field perfectly -no distortion was found after 10 5 steps regardless of the Courant number).Increasing the resolution leads to a lower "unmix" of the pattern (not shown).This experiment refines the "optimal Courant" recommendation of the L14 test, which had smoother wind fields and, consequently, a higher optimal Courant number.For real-life applications, especially with coarse grids, it may be necessary to choose a time step short enough to ensure comparable levels of time-and spacewise truncation errors (Pudykiewicz et al., 1985).This case also argues for developing the 2-D implementation of the Galperin scheme, which would eliminate the horizontal dimension split.

Where to use the smoother
When deciding whether to apply the smoother Eq. ( 20), one has to keep in mind that the Galperin scheme is always positively defined and does not need a shape-preserving filter to provide a "physically meaningful" solution, i.e. without negative values.It is free from this caveat.The purpose of the smoother is only to reduce the non-linear distortions of fields.
The smoother has both a positive and negative impact on the scheme performance.Among the positive ones are that (i) it damps the distortions of smooth shapes and gradients (Sect.3.4), (ii) it reduces the amplification factor, precluding it from exceeding 1 even for a few time steps (Sect.3.5), and that (iii) it reduces the unmixing problem (Fig. 14).Its negative features are that (i) the obtained solution is diffusive (Sect.3.4), (ii) moderate and high frequencies in the solution spectrum are damped (Sect.3.5), and that (iii) formal scores and convergence rates are lower in some tests (Sects.5.2 and 6.2).The smoother has little impact on background inhomogeneity.
Most of the positive and negative features coincide with the impact of shape-preserving filters (e.g.L14), despite the different idea and formulations.
Since the smoother computational cost is negligible, one can decide whether to apply it depending only on the problem at hand.Strict interconnections between the species, smooth patterns and tolerance to diffusion form a case for the smoother.Conversely, sharp plumes over zero background (e.g. the accidental release case) argue against it.
The smoother impact grows monotonically with its parameter ε.Numerous tests showed that the distortions and above 1 amplification factor essentially disappear at ε ∼ 0.08, where the diffusivity also becomes significant.This value appeared stable with regard to Courant number and set-up of the tests.

Efficiency of the Galperin advection scheme
Evaluation of the scheme efficiency is always very difficult as it strongly depends on the algorithm implementation, but also on computers, parallelisation, compiler options, etc.Nevertheless, basic characteristics of the scheme can be deduced from comparison of its original version with several classical schemes made by Galperin (2000).It included, in particular, EM72 and Bott, which appeared > 5 and > 3 times slower, respectively.Comparison with another implementation of the Bott routine by Petrova et al. (2008) showed a 7-15 times difference, depending on tests.The updated scheme version, however, is bound to be heavier.It is also worth putting it in line with modern approaches.
In this section, the efficiency of the updated Galperin scheme is evaluated from several points of view: (i) the scalability with regard to the number of transported species, spatial and temporal resolution, specific to the problem at hand, (ii) comparison with "standard implementation" of the Bott algorithm and the semi-Lagrangian scheme, and (iii) comparison of the run time in the L14 tests with the HEL and CSLAM schemes.

SILAM run time vs. number of species, temporal and spatial resolution
The scalability of the scheme and the whole SILAM model was tested in real-wind global simulations for an arbitrarily taken 3 days (15-17 May 2012).The reference run was set with 0.5 • resolution, six vertical layers, a time step of 30 min, and one aerosol species.Two types of emission were considered: an artificial 1 h long source filling up the whole 3-D domain, and the SILAM own wind-blown dust emission model, which created dust plumes from sandy areas of the Sahara.Vertical diffusion, which is coupled with vertical 1-D advection, was turned off for artificial source tests but turned on for dust sources in order to allow the model to quickly populate the upper layers of the domain.Then, the number of aerosol species, spatial and temporal resolutions were repeatedly doubled (one change at a time).
The model was run in a single-processor mode but compiled with O3 optimisation and OMP code pre-processing.Runs were made in a notebook with an Intel Core i7 processor and repeated in a workstation with an Intel Xeon E5.The scaling differed by 10-20 %, which was considered to be negligible.
The results (Fig. 16) highlight the scalability of the scheme and its implementation in SILAM.The species-unrelated time of horizontal 2-D advection (Fig. 16a, offset in regression line) is ∼ 30 % of a single-species computation time (represented via the slope).This "overhead" is, in fact, the transport-step integrals Eqs. ( 17)-( 19), which are computed only once and used for all species.Higher overhead of the vertical advection is due to the necessity to handle the uneven vertical layers, which makes its scaling just 20 % better than the 2-D horizontal one.It also has larger species-independent overhead.
With the chemical module turned off, advection constitutes ∼ 85 % of the total model run time.
Since the scheme operates with the source grid cells, it can check that M n i > 0 before going into computations, which gives a very substantial speed-up in the case of limitedvolume plumes (Fig. 16b).In the Saharan dust run, the horizontal advection time is about twice lower, whereas the vertical advection, even together with diffusion, becomes all but negligible, owing to efficient filtering of zero columns in comparison with lon or lat stripes.
A faster-than-proportional growth of the horizontal advection time with increasing spatial resolution (Fig. 16c, normalised run time) is a result of a growing Courant number: for a 4-times smaller grid cell (0.25 • lon-lat resolution), the time step of 30 min means C 1 over a large part of the domain.As a result, transport integrals Eqs. ( 17)-( 19) have to be analysed over longer paths.Still, the growth is much smaller than the cost of 4-fold reduction of the time step, which makes the high-C computations attractive.Vertical advection is not affected and its time is proportional to the number of columns to analyse.
The time spent by advection is practically proportional to the temporal resolution (Fig. 16c); that is, it follows the number of times the advection is computed in the run.

Comparison with efficiency of other schemes
Comparison with other schemes is arguably the most uncertain part of the exercise: the scheme efficiency is strongly dependent on the quality of the implementation (note the different results for the Bott scheme obtained by Galperin, 2000, andPetrova et al., 2008).To obtain reproducible results, we made this comparison against the "standard implementation" of the Bott code available from the Internet (http://www2.meteo.uni-bonn.de/forschung/gruppen/tgwww/people/abott/fortran/fortran_english.html,visited 28 September 2015).Since our code is also available, this comparison is reproducible.
The test with 10 4 time steps, 2000 grid points in a 1-D periodic grid, Courant number = 0.1, and one species took 0.92 s for the Galperin scheme (∼ 0.3 s for cell border advection, ∼ 0.6 s for slab reprojection) and 0.85 s for the Bott scheme.This confirms the expectation that the updates of the Galperin scheme from its initial version about tripled its run time, which is now similar to that of the Bott scheme.However, the Galperin scheme still scales better with the number of species: as shown in the previous section, only reprojection is multiplied by the number of species, whereas the Bott scheme does not have such a saving possibility.
The above numbers should be considered as indicative only since the environment for the tests was completely artificial: the schemes were used as a stand-alone code applied in 1-D space.The Galperin scheme needed only one moment instead of three, which would be the case for 3-D advection.Despite very limited extra computations, this would still raise the memory exchange.The Bott scheme was taken without a shape-preservation filter, which would be needed for any real-life applications.
The tests were also made for our own implementation of the semi-Lagrangian scheme (took ∼ 50 % longer than the above timing), but its efficiency was not carefully verified.
The L14 tests allowed rough benchmarking of the SILAM implementation of the scheme in 2-D tasks.In particular, the run with 0.75 • resolution and 120 time steps can be related to the performance of the HEL and CSLAM schemes, which were tested against the same test collection by Kaas et al. (2013).Extrapolating the charts of Fig. 13 of Kaas et al. (2013) to one species (the range given there is 2-20 species), the test takes about 190 s for HEL and 300 s for CSLAM, but only 47 s for SILAM; i.e., the difference was about 4 and 6 times, respectively.
Formal benchmarks of the computers, the main uncertainty in this comparison, are essentially the same: Kaas et al. (2013)   with a mobile Intel Core i5-540M Duo (Intel Linpack 18.5 GFlops).These CPUs were also compared in http://www.cpubenchmark.net(visited 8 October 2015), which also put them within 20 % of each other, albeit that the i5-540M was put forward.The memory bandwidth of our notebook, as always for compact computers, was modest: 7.2 GB s −1 (STREAM test, http://www.cs.virginia.edu/stream/ref.htmlaccessed 5 October 2015).We used a GNU compiler with -O3 optimisation without parallelisation, similar to Kaas et al. (2013).

Further boosting the scheme efficiency: parallelisation
In SILAM applications, advection is parallelised using the shared-memory OMP technology, whereas the MPI-based domain split is being developed.The OMP parallelisation is readily applicable along each dimension, thus exploiting the dimensional split of the advection scheme.For MPI, care should be taken to allow for a sufficient width of the buffer areas to handle the Courant > 1 cases.The original scheme was formulated for the bulk mass of all transported tracers, thus performing the advection step for all species at once: the tracer's mass in the slab definition Eq. ( 5) was the sum of masses of all species.This is much faster than the species-wise advection and reduces the number of the moments per dimension down to 1 regard-less of the number of tracers.It is also useful in the case of strong chemical interconnections between the species because the bulk advection keeps all pre-existing relations between the species.However, transport accuracy diminishes if the species have substantially different lifetimes in the atmosphere, are emitted from substantially different sources, or otherwise decorrelated in space.

Summary
The current paper presents the transport module of the System for Integrated modeLling of Atmospheric coMposition SILAM v.5, which is based on the improved advection routine of Michael Galperin combined with separate developments for vertical diffusion and dry deposition.
The cornerstone of the advection scheme is the subgrid information on distribution of masses inside the grid cells, which is generated at the emission calculation stage and maintained in a consistent way throughout the whole model, including chemical transformation, deposition, and transport itself.This information, albeit requiring substantial storage for handling, allows for accurate representation of transport.
The scheme is shown to be particularly efficient for point sources and sharp gradients of the concentration fields, still showing solid performance for smooth patterns.The most challenging task was found to be the puff-over-plain test, where the scheme showed noticeable distortions of the conwww.geosci-model-dev.net/8/3497/2015/Geosci.Model Dev., 8, 3497-3522, 2015 centration pattern.Application of a simple smoother efficiently reduces the problem at a cost of non-zero viscosity of the resulting scheme.
Advanced tests and comparison with state-of-the-art algorithms confirmed the compromise between the efficiency and accuracy.SILAM performance was fully comparable with the other algorithms, outperforming some of them.
Among the future developments, implementation of the scheme in 2-D space and replacement of the smoother with extensions of the core advection algorithm are probably the most pressing ones.

Figure 2 .
Figure 2. Shape preservation tests: (a) step, (b) triangle peak, (c) sine-shaped dip, and (d) sine-shaped peak.Sequential positions are shown: "r" denotes the scheme without a smoother, "r_diff" with it.The legend includes the number of times steps made.Wind is from left to right; Courant = 0.4.

Figure 3 .
Figure 3. Spectral analysis for 1-D.Panels (a) and (d): amplification factor (AF) and RMSE, respectively, for the Galperin scheme without a smoother; (b), (e): AF and RMS for the Galperin scheme with a large background; (c), (f): AF and RMSE for the Galperin scheme with smoother ε = 0.08.

Figure 4 .
Figure 4. Example of input and output spectra for broadband input to the advection schemes with zero and nonzero background levels.Left panels: exact and numerical solutions.Right panels: power spectrum densities initially and after one revolution.Top: B = 0; bottom: B = 1.0.

Figure 5 .
Figure 5. Linear-motion tests with a constant-release point source at X s and varying wind speed along the x axis.Upper panel: Courant number; lower panel: concentration (arbitrary unit).Wind blows from left to right.Without smoother.

Figure 7 .
Figure 7. Double-vortex rotation tests for a rectangular split between the vortices (upper panels); three single-cell peaks and two connected rectangles (middle panels); and sin-and cone-shaped surfaces (lower panels).A series of time steps is shown in the left panels, except for the low panel (shown t = 361).Right panels: error field after one full revolution (obs 10-fold more sensitive scale and relative L 2 norm given above each plot).Max Courant ∼ 1.5.Grid dimensions 400 × 200.Without smoother.

Figure 8 .
Figure 8.Initial shapes of the puffs for the 2-D global test on the sphere.

Figure 9 .
Figure 9. Half-period (t = T /2) shapes for the 2-D global test with slotted cylinders for different spatial and temporal resolutions.Without smoother.

Figure 10 .Figure 11 .
Figure 10.Final shapes (t = T ) for the 2-D global tests with slotted cylinders for different spatial and temporal resolutions.Without smoother.

Figure 12 .
Figure 12.Dependence of the performance metrics l1, l2, and l ∞ for the spherical 2-D tests with initial shapes of Fig. 8. Dashed straight lines mark the slope for the first and second order of convergence.Without smoother.

Figure 13 .
Figure 13.Mixing preservation test for cosine bells and correlated cosine bells Eq. (27) at t = T /2.Every two lines show the tests without (upper line) and with (lower line) a smoother (20).

Figure 14 .
Figure 14.A histogram of the mixing diagnostic (stacked) for the same resolutions, Courant number and smoother factor as in Fig. 13.Metrics are the following (see text and Lauritzen et al. (2012) for more details): l r is "real mixing", l u is "range-preserving unmixing", and l o is "overshooting".Values are relative to the reference CSLAM performance in L14 tests.The picture is comparable with panel (b) of Fig. 15 in L14.

Figure 15 .
Figure 15.Constant-vmr test with real-wind conditions after 122 h.(a) vmr within the boundary layer, (b) vmr above the tropopause, and (c) zone-average vertical cross section of vmr.Without smoother.

Figure 16 .
Figure 16.Scalability of the Galperin advection scheme and the SILAM model.Panel (a) Full-grid run time for different numbers of species, (b) sparse-plume run time for different numbers of species, (c) full-grid run time for varying horizontal grid resolutions, and (d) full-grid run time for varying time steps.
, respectively.Resolving the pole singularity is done by reserving a cylindrical reservoir over each pole.The radius of the reservoirs depends on the calculation grid resolution but is kept close to 2 • .The calculation grid reaches the borders of the reservoirs, whose mean concentrations are used for the lateral boundary conditions: • E line.For longitude, if a position of a slab part appears to be west of −180 • E or east of 180 • E, it is increased or decreased by www.geosci-model-dev.net/8/3497/2015/Geosci.Model Dev., 8, 3497-3522, 2015 360 • used an Intel Core2 Duo E6550 processor (Intel Linpack 20 GFlops, http://www.techpowerup.com,visited 8 October 2015).Our tests were run on a simple notebook