libmpdata++ 0.1: a library of parallel MPDATA solvers for systems of generalised transport equations

This paper accompanies first release of libmpdata++, a C++ library implementing the Multidimensional Positive-Definite Advection Transport Algorithm (MPDATA). The library offers basic numerical solvers for systems of generalised transport equations. The solvers are forward-in-time, conservative and non-linearly stable. The libmpdata++ library covers the basic second-order-accurate formulation of MPDATA, its third-order variant, the infinite-gauge option for variable-sign fields and a flux-corrected transport extension to guarantee non-oscillatory solutions. The library is equipped with a non-symmetric variational elliptic solver for implicit evaluation of pressure gradient terms. All solvers offer parallelisation through domain decomposition using shared-memory parallelisation. The paper describes the library programming interface, and serves as a user guide. Supported options are illustrated with benchmarks discussed in the MPDATA literature. Benchmark descriptions include code snippets as well as quantitative representations of simulation results. Examples of applications include: homogeneous transport in one, two and three dimensions in Cartesian and spherical domains; shallow-water system compared with analytical solution (originally derived for a 2D case); and a buoyant convection problem in an incompressible Boussinesq fluid with interfacial instability. All the examples are implemented out of the library tree. Regardless of the differences in the problem dimensionality, right-hand-side terms, boundary conditions and parallelisation approach, all the examples use the same unmodified library, which is a key goal of libmpdata++ design. The libmpdata++ library is implemented in C++, making use of the Blitz++ multi-dimensioanl array containers, and is released as free/libre and open-source software.

MPDATA stands for Multidimensional Positive-Definite Advection Transport Algorithm 1 . It is a finite-difference/finite-volume algorithm for solving the generalised transport equation (1) Equation (1) describes the advection of a scalar field ψ in a flow with velocity u. The field R on the right-hand-side (rhs) is a total of source/sink terms. The scalar field G can represent the fluid density, the Jacobian of coordinate transformation or their product, and satisfies the equation In the homogeneous case (R ≡ 0), MPDATA is at least second-order-accurate in space and time, conservative and non-linearly stable. The history of MPDATA spans three decades: Smolarkiewicz (1984) - Kühnlein et al. (2012), Smolarkiewicz et al. (2014) and is widely documented in the literature -see Smolarkiewicz and Margolin (1998), Smolarkiewicz (2006) and Prusa et al. (2008) for reviews. Notwithstanding, from the authors' experience the software engineering aspects still overshadow the benefits of MPDATA. To facilitate the use of MPDATA schemes, hereby we present a new implementation of the MPDATA family of algorithms -libmpdata++.
In the development of libmpdata++ we strive to comply with the best practices sought-after among the scientific community (Wilson et al., 2014); in particular, with the paradigm of maximising code reuse. This paradigm is embodied in the "open source computational libraries -the main foundation upon which academic and also a significant part of industrial computational research rests" (Bangerth and Heister, 2013).
The libmpdata++ has been developed in C++, 2 making extensive use of object-oriented programming (OOP) and template programming. The primary goals when designing libmpdata++ were to 1 In fact, MPDATA is sign-preserving, rather than merely positive-definite, but for historical reasons the name remains unchanged 2 In the C++11 revision of the language maintain strict separation of concerns and to reproduce within the code the mathematical "blackboard abstractions" used for documenting numerical algorithms. The adopted design contributes to the readability, maintainability and conciseness of the code. The current development of libmpdata++ is an extension of the research on OOP implementation of the basic MPDATA scheme presented in Arabas et al. (2014). The goal of this article is twofold: first, to document the library interface by providing usage examples; and second, to validate the correctness of the implementation by verifying the results against published benchmarks.
The structure of the paper is as follows. Section 2 outlines the library design. The four sections that follow correspond to four types of equation systems solved by the implemented algorithms, namely: homogeneous advective transport; inhomogeneous transport; transport with prognosed velocity; systems featuring elliptic pressure equation. Each of these sections outlines the implemented algorithms, describes the library interface and provides usage examples. Each example is accompanied with definition of the solved problem, description of the program code and discussion of the results.
The paper structure reflects the solver inheritance hierarchy in libmpdata++. All features discussed in preceding sections apply to the one that follow. The set of discussed problems was selected to match the tutorial structure of the paper. The presentation begins with simple examples focusing on the basic library interface. Subsequent examples use increasingly more complicated cases with the most complex reflecting potential for advanced applications.
The current version of libmpdata++ source code, including all examples presented herein, can be found at http://libmpdataxx.igf.fuw.edu.pl/.

Library design 2.1 Dependencies
The libmpdata++ package is a header-only C++ library. It is built upon the Blitz++ 3 array containers. We refer the reader to the Blitz++ documentation (Veldhuizen, 2006) for description of the Blitz++ interface, to which the user is exposed while working with libmpdata++. The libmp-data++ library also depends on several components 3 see http://sf.net/projects/blitz/ of the Boost 4 library collection, however these are used internally only. The library code requires a C++11-compliant compiler and has been tested to work with GNU g++ 5 and LLVM clang++ 6 .

Components
Components of the library are grouped as follows: • solvers: -mpdata intended for solving homogeneous transport problems, (section 3), -mpdata rhs extending the above with rhs term handling, (section 4), -mpdata rhs vip adding prognosedvelocity support, (section 5), -mpdata rhs vip prs further extending the above with elliptic pressure equation solvers, (section 6); • output handlers: -gnuplot offering direct communication with the gnuplot 7 program with no intermediate output files, -hdf5 offering basic HDF5 8 output compatible with netCDF 9 readers, -hdf5 xdmf implementing the eXtensible Data Model and Format 10 standard supported for instance by the Paraview 11 visualisation tool; • boundary conditions: -cyclic implementing periodic boundaries, -open giving zero-divergence condition on domain edges, -polar applicable with spherical coordinates; • concurrency handlers: -serial for single-thread operation, -openmp for multi-thread operation using OpenMP, -boost thread for multi-thread operation using Boost.Thread, -threads that defaults to openmp if supported by the compiler and falls back to boost thread otherwise. Performing integration with libmpdata++ requires choosing one of the solvers, one output handler, one 4 see http://boost.org/ 5 see http://gcc.gnu.org/ 6 see http://llvm.org/ 7 see http://gnuplot.info/ 8 see http://hdfgroup.org/HDF5/ 9 see http://www.unidata.ucar.edu/software/netcdf/ 10 see http://xdmf.org/ 11 see http://paraview.org/ boundary condition per each domain edge and one concurrency handler.
The inheritance diagram in Fig. 1 shows relationships between libmpdata++ solvers and the classes defined in the examples discussed in the paper. The mpdata solver is displayed at the top, as it is the base class for all other classes. The leftmost branch of the tree (solvers prefixed with mpdata ) depicts the inheritance relationships among the solvers defined within libmpdata++. The user-defined classes inherit from libmpdata++ solvers but are defined out of the library tree.

Computational domain and grid
The arrangement of the computational domain used in libmpdata++ is shown in Fig. 2. The initial condition for the dependent variable ψ is assumed to be known in nx × ny data points. The outermost data points are located at the boundaries of the domain.
The dual, staggered Arakawa-C grid (Arakawa and Lamb, 1977) used in libmpdata++ is shown in Fig. 3. In this spatial discretisation approach, the cell-mean values of the scalar fields ψ, and G reside in the centres of computational cells, -corresponding to the data points of the primary grid in Fig. 2 -whereas the components of the velocity field u are specified at the cell edges of the dual grid in Fig. 2.

Error and progress reporting
There are several error-handling mechanisms used within libmpdata++.
First, there are sanity checks within the code implemented using static assert() calls. These are reported during compilation, for instance when invalid values of compile-time parameters are supplied.
Second, there are available numerous run-time sanity checks, implemented using assert() calls. These are often time-consuming and are not intended to be executed in production runs. To disable them, one needs to compile the program using libmpdata++ with the -DNDEBUG compiler flag. Examples of such checks include detection of NaN values within the model state variables, which may be useful to trace origins of numerical instability problems.
Third, the user may chose to activate the Blitz++ debug mode that enables run-time array range checks. Activating Blitz++ debug mode requires   Finally, libmpdata++ reports run-time errors by throwing std::runtime error exceptions.
Simulation progress is communicated to the user by continuously updating the process threads' name with the percentage of work completed (can be observed e.g. by invoking top -H). Figure 3: A schematic of a 2D Arakawa-C grid. Bullets denote the cell centres and dashed lines denote the cell walls corresponding to the dual grid in Fig. 2.

Advective transport
The focus of this section is on the advection algorithm used within libmpdata++. Section 3.1 provides a short introduction to the implemented MPDATA scheme. Section 3.2 describes the library interface needed for the homogeneous transport cases. The following sections 3.3 -3.8 show examples of usage of libmpdata++ along with the references to other MPDATA benchmarks.

Implemented algorithms
This subsection is intended to provide the reader with an outline of selected MPDATA features that correspond to the options presently available in libmpdata++. For the full derivation of the scheme and its options see the reviews in Smolarkiewicz and Margolin (1998) and Smolarkiewicz (2006); whereas for an extended discussion of stability, positivity and convexity see Smolarkiewicz and Szmelter (2005).
In the present implementation, it is assumed that G is constant in time. Consequently, the governing homogeneous transport eq. (1) can be written as This particular form is solved by the mpdata solver of libmpdata++.
The following paragraphs will focus on the algorithms used for handling (3). The rules for applying source and sink terms are presented in section 4.

Basic MPDATA
MPDATA is an, at least, second-order-accurate iterative scheme in which all iterations take the form of a first-order-accurate donor-cell pass (alias upwind, upstream;cf. Press et al., 2007, sec. 20.1.3). For the one-dimensional 12 case, after the discretisation in space (subscripts i) and time (superscripts n), the donor-cell pass applied to eq. (3) yields The flux function F is defined as where [u] + ≡ max(u, 0) and [u] − ≡ min(u, 0). In the case of a time-varying velocity field, the velocity components are evaluated at an intermediate time level denoted by the n + 1 /2 superscript in eq. (4). Association of the velocity components with dual-cell edges is denoted by fractional indices i + 1 /2 and i − 1 /2, see Fig. 3.
Hereafter, Gu ∆t ∆x is written compactly as GC where C denotes the Courant number. GC is referred to as the advector, while the scalar field ψ as the advectee -the nomenclature adopted after Randall (2013).
Evaluation of eq. (4) concludes the first pass of MPDATA. To compensate for the implicit diffusion of the donor-cell pass, the subsequent passes of MPDATA reuse eq. (4) and (5), but with ψ replaced with the result of the preceding pass and u replaced with the "anti-diffusive" pseudo-velocity. The pseudo-velocity is analytically derived by expanding eq. (4) in the second-order Taylor series about spatial point i and time level n, and representing the leading, dissipative truncation error as an advective flux; see Smolarkiewicz (1984) for a derivation. A single corrective pass ensures secondorder accuracy in time and space. Subsequent corrective passes decrease the amplitude of the leading error, within second-order accuracy. The onedimensional formula for the basic antidiffusive advector is written as where k numbers MPDATA passes. For k=1, C k is the flow-velocity-based Courant number, whereas for k>1, C k is the pseudo-velocity-based Courant number. The number of corrective passes can be chosen within libmpdata++. The library features two implementations of the donor-cell algorithm defined by (4) and (5). The default one employs the compensated summation algorithm of Kahan (1965) which reduces round-off error arising when summing numbers of different magnitudes. The alternative, slightly less resourceintensive one, is a "straightforward" summation available as an option in libmpdata++.

Third-order-accurate variant
Accounting for third-order terms in the Taylor series expansion while deriving the pseudo-velocity improves the accuracy of MPDATA. When G ≡ 1, u = const and three or more corrective passes are applied, the procedure ensures third-order accuracy in time and space. The discretised formulae for the third-order scheme, derived analytically in Margolin and , can be found in Smolarkiewicz and Margolin (1998, Eq. 36).

Divergent-flow variant
In case of a divergent flow, the pseudo-velocity formulae are augmented with an additional term proportional to the flow divergence. This additional term is implemented in libmpdata++ following Smolarkiewicz and Margolin (1998, sec. 3.2(3)).

Non-oscillatory option
Solutions obtained with the basic MPDATA are sign-preserving, and thus non-oscillatory near zero. Generally however, they feature dispersive ripples characteristic of higher-order numerical schemes. These can be suppressed by limiting the pseudovelocities, in the spirit of flux-corrected transport. Application of the limiters reduces somewhat the accuracy of the scheme (Smolarkiewicz and Grabowski, 1990), yet this loss is generally outweighed by ensuring non-oscillatory (or ripple-free) solutions. Noteworthy, because MPDATA is built upon the donor-cell scheme characterised by small phase error, the non-oscillatory corrections have to deal with errors in signal amplitude only. The nonoscillatory option is a default option within the libmpdata++. For the derivation and further discussion of the multi-dimensional non-oscillatory option see Smolarkiewicz and Grabowski (1990).

Variable-sign scalar fields
The basic MPDATA formulation assumes that the advected field ψ is exclusively either nonnegative or non-positive. In particular, this assumption is evident in the ψ-fraction factor ψ k i+1 −ψ k i ψ k i+1 +ψ k i of eq. (6), which can become unbounded in case of variable-sign field. The libmpdata++ library includes implementations of two MPDATA options intended for simulating advection of variable-sign field.
The first method replaces ψ with |ψ| in all ψfraction factors that enter the pseudo-velocity expressions. This approach is robust but it reduces the solution quality where ψ crosses through zero; see paragraph 3.2(4) in Smolarkiewicz and Margolin (1998).
The default method, is the "infinite-gauge" variant of the algorithm, a generalised one-step Lax-Wendroff (linear, oscillatory) limit of MPDATA at infinite constant background, discussed in Smolarkiewicz (2006, sec. 4.2). In practice, the infinitegauge option of MPDATA is used with the nonoscillatory enhancement. Listing 3.1: Example definition of compile-time parameters structure.

Library interface
All solvers expect a structure with compile-time parameters as their first template parameter, as exemplified in List. 3.2.

Choosing library components
The library components listed in section 2.2 are chosen through template parameters. First, the solver is equipped with an output mechanism by passing the solver type as a template parameter to the output type, as exemplified in List. 3.3. The output classes inherit from solvers. using slv_out_t = output::gnuplot<slv_t>; Listing 3.3: Example alias declaration of an output mechanism.
Second, the concurrency handlers expect solver class (equipped with output) as the first template parameter. Subsequent template parameters control boundary condition types on each of the domain edges (see List. 3.4). Listing 3.4: Example alias declaration of a concurrency handler.

Run-time parameters
Run-time parameters include the grid size, number of MPDATA passes and output file name. The list of applicable run-time parameters is defined by fields of the rt params t structure. This structure is defined within each solver and extended when equipping the solver with an output mechanism. The concurrency handlers expect an instance of the run-time parameters structure as their constructor argument. Example code depicting how to set the run-time parameters and then instantiate a concurrency handler is presented in List. 3.5. typename slv_out_t::rt_params_t p; p.grid_size = { nx }; run_t run(p); Listing 3.5: Example run-time parameter structure declaration followed by a concurrency handler instantiation.

Public methods
The concurrency handlers act as controlling logic for the other components, and hence the user is exposed to the public interface of these handlers only.
Listing 3.6 contains signatures of methods implemented by each of the concurrency handlers.
The advectee() is an accessor method for the advected scalar fields. It can be used for setting the initial condition as well as for examining the solver state. It expects an index of the requested advectee as the argument (advected scalar fields are numbered from zero). This provides choice between different advected variables. The returned blitz::Array is zero-base indexed and has the same size as the computational grid (set with the grid size field of the run-time parameters structure, see List. 3.5).
of coordinate transformation, a fluid density field or their product). The argument selects the vector field components numbered from zero. The size of the returned array depends on the component. It equals the grid size in all but the selected dimension in which it is reduced by one (i.e. nx × (ny − 1) for the "y" component and so forth, cf. Fig. 3).
The g factor() is an accessor method for the G field. The returned array has the same size as the one returned by advectee(). The default value is set to G ≡ 1, (for details, see Ex. 3.8).
The advance() method launches the timestepping logic of the solver advancing the solution by the number of time steps given as argument. This method can be called multiple times -the solvers maintain all information needed to resume the integration.
The panic ptr() method returns a pointer to a Boolean variable that if set to true will cause the solver to stop the computations after the currently computed time step. This method may be used, for instance, to implement signal handling within programs using libmpdata++.
All multi-dimensional arrays used in libmp-data++ use the default Blitz++ "row-major" memory layout with the last dimension varying fastest. Domain decomposition for parallel computations is done over the first dimension only.

Example: "hello world"
The source code presented in this subsection is intended to serve as a minimal complete example on how to use libmpdata++. In other examples presented throughout the paper, only the fragments of code that differ significantly from the minimal example will be presented.
The example consists of an elemental transport problem for a one-dimensional, variable-sign field advected with a constant velocity. The simulation results using code in List. 3.7 are shown in Fig. 4. Spatial and temporal directions are depicted on the abscissa and ordinate, respectively. Cellmean values of the transported field are shown on the applicate and are presented in compliance with the assumption of data points representing grid-cell means of transported field.
The code in List. 3.7 begins with three include statements that reflect the choice of the library components: solver, concurrency handler and output mechanism. All compile-time parameters are grouped into a structure passed as a template parameter to the solver. Here, this structure is named ct params t and inherits from ct params default t what results in assigning default values to parameters not defined within the inheriting class. The solvers expect the structure to contain a type real t which controls the floating point format used. The two constants that do not have default values and need to be explicitly defined are n dims and n eqns. They control the dimensionality of the problem and the number of equations to be solved, respectively.
Choice between different solver types, output mechanisms and concurrency handlers is done via type alias declaration. Here, the basic mpdata solver is chosen which is then equipped with the gnuplot output mechanism. All output classes expect a solver class as their first template parameter, which is used to define the parent class (i.e., output classes inherit from solvers).
Classes representing concurrency handlers expect the output class and the boundary conditions as their template parameters. In the example, a basic serial handler is used and open boundary conditions on both ends of the domain are chosen.
The choice of run-time parameters is done by assigning values to the member fields of the rt params t structure defined within the solver # include <libmpdata++ / solvers / mpdata.hpp> # include <libmpdata++ / concurr / serial.hpp> # include <libmpdata++ / output / gnuplot.hpp> class and augmented with additional fields by the output class. In this example, the instance of rt params t structure is named p, the grid size is set to 101 points and the output is set to be done every 20 time steps. An instance of the rt params t structure is expected as the constructor parameter for concurrency handlers.
The grid step dx is set to 0.1 and the number of time steps to 100. Initial values of the Courant number and the transported scalar fields are set by assigning to the arrays returned by the advector() and advectee() methods. In this example, the Courant number equals 0.5 and the advected shape is described by the Witch of Agnesi formula y(x) = 8a 3 /(x 2 + 4a 2 ) with the coefficient a = 0.5. Initial shape is centred in the middle of computational domain and is shifted downwards by 0.5.
Finally, the actual integration is performed by calling the advance() method with the number of time steps as argument.

Example: advection scheme options
The following example is intended to present MPDATA advection scheme options described in subsection 3.1. The way of choosing different options is discussed, and the calling sequence of the library interface is shown for the case of advecting multiple scalar fields.
The example consists of transporting two boxcar signals with different MPDATA options. In all tests, the first signal extends from 2 to 4 and the second signal extends from -1 to 1, to observe the solution for fixed-sign and variable-sign signals.
Listing 3.8 shows the compile-time parameters structure fields common to all cases presented within this example. The number of dimensions is set to one and the number of equations to solve is set to two. Consistent with List. 3.7 from the "hello world" example, p shown in List. 3.9 is an instance of rt params t structure with run-time parameters of the simulation. Setting the outfreq field to the number of time steps results in plotting the initial condition and the final state. The outvars field contains a map with a structure containing variable name, here left empty, and unit defined for each of the advected scalar fields. Listing 3.10 shows how to set initial values to multiple scalar fields using the advectee() method with an integer argument specifying the index of equation in the solved system.

Variable-sign scalar fields
The libmpdata++ library is equipped with two options for handling variable-sign fields; recall the discussion in paragraph 3.1.5. The option using absolute values is named abs, whereas the "infinitegauge" option is dubbed iga. The option flags are defined in the opts namespace. The option choice is made by defining the opts field of the compiletime parameters structure, in analogy to n dims or n eqns.
In the first test, the choice of handling variablesign signal is set to abs, List. 3.11. Figure 5 shows the result of simulation with parameters set in List. 3.8, 3.9, 3.10 and 3.11. The final signal shows dispersive ripples characteristic of higherorder schemes. It is also evident that the ripple magnitude depends on the constant background, a manifestation of the scheme non-linearity. Furthermore, the final variable-sign signal features a bogus saddle point at the zero crossings (cf. paragraph 3.1.5), and this can be eliminated by using the infinite-gauge (alias iga) option. Listing 3.12 shows how to choose the iga option. Figure 6 shows the result of simulation with parameters set in List. 3.8, 3.9, 3.10 and 3.12. Although iga evinces more pronounced oscillations, their magnitude does not de-pend on the constant background. This, together with the robust behaviour of iga when crossing zero, substantiates the discussion of paragraph 3.1.5 on iga amounting to a linear limit of MPDATA. enum { opts = opts::abs }; Listing 3.11: Advection scheme options for Fig. 5, variable-sign option is set to absolute value. Listing 3.12: Advection scheme options for Fig. 6, variable-sign option is set to "infinite-gauge". x/dx Figure 6: As in Fig. 5 but with variable-sign option set to "infinite-gauge", cf. List. 3.12.

Third-order-accurate variant
Choosing third-order variant enhances the accuracy of the scheme when used with more than two passes of MPDATA or with iga; recall paragraph 3.1.2. Option tot enables the third-order variant of MPDATA scheme. Figure 7 shows result of the same test as in Fig. 5 and 6 but with MPDATA options set as in List. 3.13. The resulting signal is evidently more accurate and symmetric, but the oscillations are still present. enum { opts = opts::iga | opts::tot }; Listing 3.13: Advection scheme options for Fig. 7, variable-sign option is set to "infinite-gauge" and third-order accuracy variant is chosen.

Non-oscillatory option
To eliminate oscillations apparent in the preceding tests, the non-oscillatory (fct) option (paragraph 3.1.4) needs to be chosen. This option can be used together with all other MPDATA options, such as basic scheme, variable-sign signals (abs or iga) and the third-order-accurate variant (tot).
Here, fct is selected together with iga, cf. List. 3.14. This is the default setting; i.e., when inheriting from the default parameters structure, and not overriding the opts setting, as illustrated in List. 3.7. Figure 8 shows the corresponding results. The solutions for both fixed-sign and variable-sign signals have indistinguishable profiles and all of the dispersive ripples have been suppressed. enum { opts = opts::iga | opts::fct }; Listing 3.14: Advection scheme options for Fig. 8, variable-sign option is set to "infinite-gauge" and non-oscillatory option is enabled. This is the default setting in libmpdata++. Listing 3.15: Advection scheme options for Fig. 9, variable-sign option is set to "infinite-gauge", nonoscillatory option is enabled and third-order accuracy variant is chosen. To further enhance the accuracy of the solution, fct and iga can be combined with the tot variant; cf. List. 3.15. The corresponding result is shown in Fig. 9. Enabling the third-order-accurate variant improves the symmetry of the solution, as compared to the results presented in Fig. 8.

Example: convergence tests in 1D
In this subsection the convergence test originated in Smolarkiewicz and Grabowski (1990) is used to quantify the accuracy of various MPDATA options.
The test consists of a series of one-dimensional simulations with Courant numbers where ∆x m = 1 is the maximal increment. The series amounts to 152 simulations for each option. In each simulation, the number of time steps N T and the number of grid cells N X is adjusted so that the total time T and total length of the domain X remain constant. The domain size X = 44∆x m and simulation time T = 1 are selected. The advective velocity is set to u = ∆x m /T = 1.
In each simulation, a Gaussian profile is advected, and the result of the simulation is compared with the exact solution ψ ex . The initial profiles and the exact solutions are calculated by analytically integrating function (7) over the grid-cell extents, to comply with the inherent MPDATA assumption of a data point representing the grid-cell mean of transported field. The dispersion parameter of the initial profile (7) is set to σ = 1.5∆x m , while the profile is centred in the middle of the domain x 0 = 0.5X. As a measure of accuracy, a truncation-error function is introduced The results of the convergence test for the generic first-order-accurate donor-cell scheme, the basic MPDATA and its third-order-accurate variant are shown in Fig. 10a-10c. Each figure displays, in polar coordinates, the base-two logarithm of the truncation-error function (8) for the entire series of 152 simulations. The radius and angle, respectively, indicate changes in grid increment and Courant number. Thus, closer to the origin are simulation results for finer grids, closer to the abscissa are points for small Courant numbers, and closer to the ordinate are points with Courant numbers approaching unity. The contour interval of dashed isolines and of the colour map is set to 1, corresponding to error reduction by the factor of 2. Lines of constant grid-cell size and constant Courant number are overlaid with white contours. The figures contain information on the convergence rate of MPDATA options. When moving along the lines of constant Courant number towards the origin, thus increasing the spatial and temporal resolution, the number of crossed dashed isolines determines the order of the scheme, cf. section 8.1 in Margolin and Smolarkiewicz (1998). Therefore, the results in Fig. 10a-10c attest to the first-, second-and third-order asymptotic convergence rates, respectively. Furthermore, the shape of dashed isolines conveys the dependency of the solution accuracy on the Courant number. In particular, they show that at fixed spatial resolution the solution accuracy increases with the Courant number. Moreover, as the order of the convergence increases the isolines become more circular indicating more isotropic solution accuracy in the Courant number. Figure 10b reproduces the solution in Fig. 1 of Smolarkiewicz and Grabowski (1990) and, thus, verifies the libmpdata++ implementation. For further verification Fig. 11a and 11b show results of the convergence test for: i) three-pass MPDATA, (run-time solver parameter n iters = 3); and ii) for two-pass MPDATA with fct option. These results reproduce Fig. 2 and 3 from Smolarkiewicz and Grabowski (1990). Noteworthy, an interesting feature of Fig. 11a is the groove of the third-order convergence rate formed around φ = 45 • , characteristic of MPDATA with three or more passes . Next, comparing Fig. 11b with 10b shows that the price to be paid for an oscillation-free result is a reduction in the convergence rate (from 2 to ∼1.8, section 4 in Smolarkiewicz and Grabowski, 1990). Figures 11c and 11d document original results for the convergence test applied to the "infinite-gauge" limit of MPDATA. In particular, Fig. 11c shows that iga is as accurate as three-pass MPDATA, (cf. section 4 in Smolarkiewicz and Clark, 1986); whereas, Fig. 11d reveals that the third-orderaccurate iga is more anisotropic in Courant number than the third-order-accurate standard MPDATA in Fig. 10c.
The convergence test results for the default setting of libmpdata++ (iga plus fct) are not shown, because they resemble results from Fig. 11b with somewhat enhanced accuracy for well-resolved fields (i.e., small grid-cells). a b c Figure 10: The result of the convergence test. 10a for the donor-cell scheme, 10b for the basic MPDATA and 10c for the third-order-accurate variant. a b c d Figure 11: As in Fig. 10. 11a for three passes of MPDATA, 11b for two passes with non-oscillatory option, 11c for infinite-gauge option, and 11d for infinite-gauge with third-order-accurate variant.

Example: rotating cone in 2D
This example introduces libmpdata++ programming interface for two-dimensional simulations with the velocity field varying in space. Test results are compared with published MPDATA benchmarks. The example is based on the classical solid-body rotation test (Molenkamp, 1968). The current setup follows Smolarkiewicz and Margolin (1998). The initial condition features a cone centred around the point (x 0 , y 0 ) = (50∆x, 75∆y). The grid interval is ∆x = ∆y = 1, and the domain size is 100∆x × 100∆y -thus containing 101×101 data points, cf. Fig. 2. The height of the cone is set to 4, the radius to 15∆x, and the background level to 1. The flow velocity is specified as (u, v) = ω (y − y c , −(x − x c )), where angular velocity ω = 10 −1 and (x c , y c ) denotes coordinates of the domain centre. With time interval ∆t = 0.1, one full rotation requires 628 time steps. The total integration time corresponds to six full rotations.
Implementation of the set-up using the libm-pdata++ interface begins with definition of the compile-time parameters structure. The test features a single scalar field in a two-dimensional space, what is reflected in the values of n dims and n eqns set in List. 3.16. In one of the test runs, the number of MPDATA passes (n iters) is set to 3, instead of the default value of 2. Corresponding field of run-time parameters structure is shown in List. 3.17. During instantiation of the concurrency handler, four boundary-condition settings (two per each dimension) are passed as template arguments. Listing 3.18: Concurrency handler instantiation for the rotating-cone test.
The way the initial condition and the velocity field are set is shown in List. 3.19. The Courant number components are specified using calls to the advector() method with the argument defining the component index.
The initial condition is displayed in Fig. 12a, and the results after total integration time are shown in Fig. 12b-12d cone are plotted with 0.25 interval. The results in Figs. 12b and 12c were obtained with the fct and the three-pass tot+fct MPDATA, respectively. They match those presented by Smolarkiewicz and Margolin (1998, Fig. 1 therein) and Smolarkiewicz and Szmelter (2005, Fig. 4 therein). Figure 12d shows test result for the default setting of libmp-data++.

Example: revolving sphere in 3D
This example extends Ex. 3.6 to three spatial dimensions. It exemplifies how to specify a threedimensional set-up using libmpdata++. Furthermore, the option is described for saving the simulation results to HDF5 files with XDMF annotations.
Specifying the 3D setup with the libmpdata++ programming interface calls starts by setting the n dims field to 3, List. 3.20. Listing 3.21 shows the choice of recommended three dimensional output handler hdf5 xdmf. This results in output consisting of HDF5 files with XDMF annotation that can be viewed, for example, with the Paraview visualisation software. This output is saved in a directory specified by the outdir field of the run-time parameters, see List. 3.22.
p.outdir = "rotating_sphere_3d"; Listing 3.22: Run-time parameters field specifying output directory for the revolving-sphere test. Figure 13a shows the initial condition, Fig. 13b  and 13c show the results after five revolutions for the four-pass MPDATA without and with tot. Only a portion of the computational domain centred at the sphere is shown. The black line crossing the XY plane is the axis of rotation. The grey volume is composed of dual-grid cells (section 2.3) encompassing data points with cell-mean values of density greater or equal 0.5. a b c Figure 13: The results of Ex. 3.7; only a part of the domain is shown. 13a shows initial condition, 13b results for the four-pass MPDATA, 13c results for third-order-accurate variant with four passes.
The solution in Fig. 13b is deformed, but this deformation is significantly reduced when the thirdorder-accurate variant is set, Fig. 13c. Obtained results can be compared with those presented by Smolarkiewicz (1984, Fig. 13-16).

Example: 2D advection on a sphere
This subsection concludes homogeneous transport examples with a 2D solid-body rotation test on a spherical surface (Williamson and Rasch, 1989). The purpose of this example is to present methods for setting up the simulations in spherical coordinates. 13 Following Smolarkiewicz and Rasch (1991) only the case when the initial field rotates over the poles is presented. The initial condition is a cone centred around the point (3π/2, 0) with height and radius equal to 1 and 7π/64, respectively. The wind field is given by where λ and φ denote respectively longitude and latitude, and U = π/128. The computational domain [0, 2π] × [−π/2, π/2] is resolved with 128 × 64 grid increments ∆λ = ∆φ and is shifted by 0.5∆φ so that there are no data points on the poles. The test is run for 5120 time-steps corresponding to one revolution around the globe. The advection equation in spherical coordinates has the form of the generalised transport eq. (1) with the Jacobian of coordinate transformation In order to solve the generalised transport equation with G ≡ 1 the nug option has to be set, see List. 3.23.
run.g_factor() = dlmb * dphi * blitz::cos(dphi * (j + 0.5) -pi / 2); The initial condition for the test is plotted in Fig. 14a, whereas the results are displayed in Fig. 14b and 14c. All figures use orthographic projection, with the perspective centred at the initial condition (the true solution), with the contour interval 0.1. Figure 14b shows the result for the default libmpdata++ options. There is a visible deformation in the direction of motion, consistent with earlier Cartesian rotational tests. The result in Fig. 14c, obtained using three passes of MPDATA with fct and tot, shows reduced deformation and reproduces Fig. 6 in Smolarkiewicz and Rasch (1991).

Inhomogeneous advective transport 4.1 Implemented algorithms
As of the current release, libmpdata++ provides three ways of handling source terms in the inhomogeneous extension of eq. (3) The available time integration schemes include: the two variants of the first-order-accurate Eulerforward scheme (hereafter referred to as euler a and euler b); and the second-order-accurate Crank-Nicolson scheme (trapez). The Euler schemes are implemented to account for parameterised forcings (e.g., due to cloud microphysics), whereas the Crank-Nicolson scheme is standard for basic dynamics (e.g., pressure gradient, Coriolis and buoyancy forces). In both Euler schemes, while calculating the solver state at the time level n+1, the right-hand-side at the time level n is only needed. In the euler a option (eq. 13), the source terms are computed and applied standardly after the advection ψ n+1 = ADV (ψ n ) + ∆tR n .
In the euler b option (eq. 14), the source terms are computed and applied (arguably in the Lagrangian spirit; section 3.2 in Smolarkiewicz and Szmelter, 2009) before the advection ψ n+1 = ADV (ψ n + ∆tR n ) .
In the trapez option (eq. 15), half of the sources terms are computed and applied as in the euler a and half as in the euler b (arguably in the spirit of the Lagrangian trapezoidal rule; section 2.2 in Smolarkiewicz and Szmelter, 2009) ψ n+1 = ADV (ψ n + 0.5∆tR n ) + 0.5∆tR n+1 . (15)

Library interface
The logic for handling source terms is implemented in the mpdata rhs solver that inherits from the mpdata class, Fig. 1. Consequently, all options discussed in the preceding section apply. The choice of the source-term integration scheme is controlled by the rhs scheme compile-time parameter with the valid values of euler a, euler b or trapez.
The user is expected to provide information on the source terms by defining a derived class of mpdata rhs with the update rhs() method overloaded. The update rhs() signature is given in List. 4.1, whereas the usage example is given in subsection 4.3. The method is called by the solver with the following arguments: • a vector of arrays rhs storing the source terms for each equation of the integrated system, • a floating-point value dt with the time-step value, • an integer number at indicating if the source terms are to be computed at time level n (if at=0) or n+1 (if at=1).
Calculation of forcings at the n+1 time level is needed if rhs scheme=trapez option is chosen. The case of at equal zero is used in the Euler schemes and in the very first time step when using the trapez option (i.e., once per simulation). When the trapez option is used, the dt passed to the update rhs method equals half of the original time-step.
The update rhs method is expected to first call parent t::update rhs() to zero out the source and sink terms stored in rhs. Later, it is expected to calculate the rhs terms in a given time-step by summing all sources and sinks and "augment assign" them to the rhs field (e.g., using the += operator).
All elements of the rhs vector corresponding to subsequent equations in the system are expected to be modified in a single update rhs() call.

Example: translating oscillator
The purpose of this example is to show how to include rhs terms in libmpdata++, by creating a user-defined class out of the library tree.
A system of two one-dimensional advection equa- represents a harmonic oscillator translating with u o = const.; see section 4.1 in Smolarkiewicz (2006) for a discussion. 14 Applying the trapezoidal rule to integrate the PDE system (16) leads to following system of coupled implicit algebraic equations where ψ * i and φ * i stand for Substituting in (17) ψ n+1 i with φ n+1 i and vice versa and then regrouping leads to: Implementation of forcing terms prescribed in eq. (20) is presented in List. 4.2. A new solver coupled harmosc is defined by inheriting from the mpdata rhs class. A member field omega is defined to store the frequency of oscillations.
The rhs terms are defined for both variables, ix::psi and ix::phi within the update rhs() method. The method implements both implicit and explicit formulae, the two cases are switched by the at argument. Defining forcings for both n and n+1 cases allows to use this class with both euler and trapez options. The current state of the model is obtained via a call to the state() method. Note how the formulae defined in update rhs() in case(1) loosely resemble the mathematical notation presented in eq. (20). The 0.5 is absent because the ∆t passed as argument in trapez option is already divided by 2.
In the present example, the initial condition for ψ is defined as ψ(x) = 0.5[1 + cos(2πx/100)] for x ∈ (50, 150) and zero elsewhere. The initial condition for φ is set to zero.
The result of 1400 s of simulated time are shown in Fig. 15. Note that the solutions for both ψ and φ remain in phase and feature no substantial amplitude error. This contrasts with calculations using Euler-forward schemes (not shown). 5 Transport with prognosed velocity

Implemented algorithms
Whenever the velocity field changes in time, the second-order accuracy of the solution at n+1 requires estimate of the advector at n+1/2. This is provided by linear extrapolations from n and n-1 values . Furthermore, when the velocity is a dependent variable of the model, eq. (12) embodies equations of motion. Then the velocity (or momentum) components are treated as advected scalars (i.e. advectees) and are predicted at the centres of the dual-grid cells, Fig. 3. The advector field is then interpolated linearly to the centres of the cell walls.

Library interface
The algorithms for interpolating in space and extrapolating in time the advector field from the model variables are defined in the mpdata rhs vip class and all user-created solvers with time-varying velocity must inherit from this class.
The transported fields may represent either velocity or momenta. In the latter case the prognosed velocity components are calculated as ratios of two advectee fields (e.g. momentum components and density). The index of the advectee that forms the common denominator for all velocity components should be assigned to vip den. The vip i, vip j and vip k store the index of the advected fields appearing in the numerators for each velocity component. These velocity components are then used to calculate the advector field. In case when the velocity components are model variables (as in the example of section 6.3), the common denominator is redundant and value -1 should be assigned to vip den.
For systems where numerators and denominators can uniformly approach zeros, the vip eps value is defined to prevent divisions by zero. Then, if the denominator at a given grid-point is less than the vip eps, the resulting advector is set to zero therein. The default value of vip eps depends on the precision chosen for the simulation. Namely, it is set to be the smallest number that added to 1 produces a result that is not equal 1.
The vip i, vip j, vip k and vip den are expected to be members of the compile-time parameters structure ct params t of the mpdata rhs vip class. The vip eps value is a run-time parameter.
As of the current release, the prognosed-velocity features of libmpdata++ are implemented for constant G ≡ 1 only.

Example: 1D shallow-water system
The aim of this example is to show how to define simulations with prognosed velocity field. The necessary compile-time and run-time parameters as well as the user-defined class with source and sink terms are discussed. The obtained results are compared with the analytical solution and a published MPDATA benchmark.
The idealised system of 1D inviscid shallowwater equations is considered, with both the surface friction and background rotation neglected. The simulated physical scenario is a slab-symmetric parabolic drop spreading under gravity; see Frei (1993) for a general context and Schär and Smolarkiewicz (1996) for bespoke analytical solutions. The corresponding governing equations take the dimensionless form where h is a normalised depth of the fluid layer and u is a normalised velocity. Following Schär and Smith (1993) The time-step is set to 0.01 and the grid spacing is set to 0.05. The crux of the test is a synchronous solution for the depth and momentum near the drop edge that accurately diagnoses the velocity. The definition of the rhs terms for Ex. 5.3 is presented in List. 5.1. Only the method for calculating the forcing terms is shown; for the full out-of-thelibrary-tree definition of source-terms see List. 4.2. As in the List. 4.2, the definition in List. 5.1 attempts to follow the mathematical notation. Because of the use of the grad function, the nabla namespace is included. Listing 5.2: Compile-time parameters for Ex. 5.3.
Listing 5.2 specifies the compile-time parameters structure. Because fluid flow in this example is divergent the opts::dfl correction is enabled, cf. sec. 3.1.3. The rhs scheme is set to trapez. 15 Within the ix structure, the equation indices are assigned. Furthermore, the recipe for calculating the velocity is defined by assigning the indices to vip i and vip den. Lack of the rhs terms is specified by toggling n-th bit of the hints norhs field, where n is the index of the homogeneous equation. This prevents the unnecessary summation of zeros.
Listing 5.3 shows the run-time parameters structure. The value of gravitational acceleration p.g is set to 1 to follow the dimensionless notation of (21), and the vip eps is set arbitrarily to 10 −8 . The results of the test are plotted in Fig. 16. Figure 16a shows the initial condition (black) and the analytical solution for t=3 (blue). Solid lines mark the fluid depth and the dashed line the velocity. The remaining two panels show numerical results 16 at t=3 for different MPDATA options (red) plotted over the top panel. Figure 16b shows the solution with options abs and fct, whereas Fig. 16c shows the solution obtained with options iga and fct.
All presented results are free of apparent artefacts near the drop edge. The abs+fct in the central panel compares well with Fig. 7b in Schär and Smolarkiewicz (1996); whereas, the iga+fct solution in the bottom panel closely reproduces the analytical result.
15 Because the equation for h is homogeneous, the momentum forcing at n+1 time level can be readily evaluated after advecting h.
16 Similar to advector field evaluation discussed in Sec. 5.2 the vip eps value was used as cutoff value to prevent divisions by zero when calculating velocity field

Example: 2D shallow-water system
The 2D shallow-water test discussed here is an original axis-symmetric extension of the 1D slabsymmetric test in Ex. 5.3. The corresponding dimensionless equations take the form As in 1D case, h stands for the fluid height and (u, v) are the velocity components in x and y directions, respectively. Again, the initial condition consists of a parabolic drop centred at the origin and confined to x 2 + y 2 ≤ 1, h(x, y, t = 0) = 1 − x 2 − y 2 , f or x 2 + y 2 ≤ 1 0, f or x 2 + y 2 > 1 .
(24) Following the method presented by Frei (1993) and Schär and Smolarkiewicz (1996) the analytical solution of the system (23) can be obtained as Here λ(t) is half-width of the drop, evolving according to λ(t) = 2t 2 + 1 and λ t = ∂λ/∂t is the velocity of the leading edge. The libmpdata++ library includes an implicit representation of pressure gradient terms for incompressible fluid equations. This necessitates the solution of an elliptic Poisson problem for pressure. The elliptic problem is solved after applying all explicitly known forcings to ensure a non-divergent velocity field at the end of each time step. As of the current release, the library is equipped with the minimal-and conjugate-residual variational iterative solvers. For the derivation of used schemes and further discussion of the elliptic problem see Smolarkiewicz and Margolin (1994), Smolarkiewicz and Szmelter (2011) and references therein.

Library interface
The methods for solving the elliptic problem are implemented in the mpdata rhs vip prs class, Fig. 1.
This class inherits from the mpdata rhs vip class. Therefore the way to specify other source terms as well as time-varying velocity field remains unchanged.
The choice of elliptic solver is controlled by setting the compile-time parameter prs scheme to mr and cr for the minimal-residual and conjugateresidual solver, respectively. The iterations within the elliptic solver stop when the divergence of the velocity field is lower than a threshold tolerance set by a run-time parameter prs tol, cf (Smolarkiewicz et al., 1997).

Example: Boussinesq convection
The goal of this example is to show the user interface for simulations featuring elliptic pressure equation. The governing PDE system consists of momentum, potential temperature, and masscontinuity equations for an ideal, 2D, incompressible Boussinesq fluid ∂ t θ + ∇ · ( vθ) = 0 , Here, v = (u, w) denotes the velocity field, π is the pressure perturbation about the hydrostatic reference state normalised by the reference density ρ o , constant in the Boussinesq model. Furthermore, θ represents the potential temperature perturbation about the reference state θ o = const, and ⊗ denotes the tensor product. Combining the velocity prediction from the momentum equation, according to eq. (15), with the mass continuity eq. (29) leads to the elliptic Poisson problem where v is the velocity field after the advection summed with all the explicitly known source terms at time level n+1, namely buoyancy in this example. 17 In eq. (30) the pressure perturbation field π is unknown, and it needs to be adjusted such that the final velocity field v − 0.5∆t∇π satisfies the mass continuity equation (29). Denoting 0.5∆tπ as φ allows to symbolise eq. (30) using standard notation for linear sparse problems, (Smolarkiewicz and Margolin, 1994) L(φ) − R = 0 .
The setup of the test follows Smolarkiewicz and Pudykiewicz (1992). It consists of a circular potential temperature anomaly of radius 250 m, embedded in a neutrally stratified quiescent environment, with θ o = 300K, in the domain resolved with 200 × 200 grid-cells of the size dx=dy=10m. The initial anomaly θ = 0.5K is centred in the horizontal, 260 metres above the lower boundary. The time-step is set to ∆t = 0.75 s and the simulation takes 800 time-steps.
Listing 6.1 shows the compile-time parameters structure. The time integration scheme for the buoyancy forcing is set to trapez, as the user has a choice of the algorithm. However, as of the current release, the elliptic problem formulation requires forcings to be independent of velocity if handled using the trapez scheme. The implicit pressure gradient terms are always integrated with the trapezoidal rule (15), regardless of the rhs scheme setting. In List. 6.1 the elliptic solver option is set to the conjugate-residual scheme cr. The vip den is set to -1, because here the velocity components are the model kinematic variables, cf. the discussion in second paragraph of section 5.2. Listing 6.2: Run-time parameter field setting the accuracy of the pressure solver.

Remarks
In this paper the first release of libmpdata++ was introduced. Versatility of the user interface as well as the correctness of the implementation were illustrated with a series of examples with increasing degree of physical, mathematical and programming complexity. Starting from elementary advection in Cartesian domain, through passive advection on the sphere, through slab-and axis-symmetric water drop spreading under gravity, to buoyant convection in an incompressible Boussinesq fluid, the accompanying discussions included code snippets, description of the user interface and comparison with previously published benchmarks.
Our priority in the development of libmpdata++ is the researcher productivity. In case of scientific software such as libmpdata++, the researchers are both users and developers of the library. The adherence to the principle of separation of concerns and employment of programming techniques that promote code conciseness -e.g. the current release consists of less than 10k lines of code -contribute to the developers productivity. The user productivity is amplified by ensuring that the release of the library is accompanied with example-rich documentation. Both the users and developers benefit from the free/libre open-source software release of the library.
Work is under way on several new features for the subsequent release of libmpdata++, including distributed-memory parallelisation.