libcloudph++ 0.2: single-moment bulk, double-moment bulk, and particle-based warm-rain microphysics library in C++

This paper introduces a library of algorithms for representing cloud microphysics in numerical models. The library is written in C++, hence the name libcloudph++. In the current release, the library covers three warmrain schemes: the singleand double-moment bulk schemes, and the particle-based scheme with Monte-Carlo coalescence. The three schemes are intended for modelling frameworks of different dimensionality and complexity ranging from parcel models to multidimensional cloud-resolving (e.g. large-eddy) simulations. A two-dimensional prescribedflow framework is used in example simulations presented in the paper with the aim of highlighting the library features. The libcloudph++ and all its mandatory dependencies are free and open-source software. The Boost.units library is used for zero-overhead dimensional analysis of the code at compile time. The particle-based scheme is implemented using the Thrust library that allows to leverage the power of graphics processing units (GPU), retaining the possibility to compile the unchanged code for execution on single or multiple standard processors (CPUs). The paper includes complete description of the programming interface (API) of the library and a performance analysis including comparison of GPU and CPU setups.


Introduction
Representation of cloud processes in numerical models is crucial for weather and climate prediction. Taking climate modelling as an example, one may learn that numerous distinct modelling systems are designed in similar ways, sharing not only the concepts but also the implementations of some of their components (Pennell and Reichler, 2010). This creates a perfect opportunity for code reuse which is one of the key "best practices" for scientific computing (Wilson et al., 2014, sec. 6). The reality, however, is that the code to be shared is often "transplanted" from one model to another (Easterbrook and Johns, 2009, sec. 4.6) rather than reused in a way enabling the users to benefit from ongoing development and updates of the shared code. From the authors' experience, this practise is not uncommon in development of limited-area models as well (yet, such software-engineering issues are rarely the subject of discussion in literature). As a consequence, there exist multiple implementations of the same algorithms but it is difficult to dissect and attribute the differences among them. Avoiding "transplants" in the code is not easy, as numerous software projects in atmospheric modelling feature monolithic design that hampers code reuse.
This brings us to the conclusion that there is a potential demand for a library-type cloudmicrophysics software package that could be readily reused and that would enable its users to easily benefit from developments of other researchers (by gaining access to enhancements, corrections, or entirely new schemes). Library approach would not only facilitate collaboration, but also reduce development time and maintenance effort by imposing separation of cloud microphysics logic from other source code components such as model dynamical core or parallelisation logic. Such strict separation is also a prerequisite for genuine software testing.
The motivation behind the development of the libclouph++ library introduced herein is twofold. First, we intend to exemplify the possibilities of library-based code reuse in the context of cloud modelling. Second, in the long run, we intend to offer the community a range of tools applicable for research on some of the key topics in atmospheric science such as the interactions between aerosol, clouds and precipitation -phenomena that still pose significant challenges for the existing tools and methodologies (Stevens and Feingold, 2009).
The library is being designed with the aim of creating a collection of algorithms to be used within models of different dimensionality, different dynamical cores, different parallelisation strategies, and in principle models written in different programming languages. Presented library is written in C++, a choice motivated by the availability of highperformance object-oriented libraries and the builtin "template" mechanism. C++ templates allow the implemented algorithms not to be bound to a single data type, single array dimensionality or single hardware type (e.g. CPU/GPU choice). The library code and documentation are released as free (meaning both gratis & libre) and open-source software -a prerequisite for use in auditable and reproducible research (Morin et al., 2012;Ince et al., 2012).
Openness, together with code brevity and documentation, are also crucial for enabling the users not to treat the library as a "black box". While the aim of creating a self-contained package with welldefined interface is black-box approach compatible, the authors encourage users to inspect and test the code.
Modelling of atmospheric clouds and precipitation implies employment of computational techniques for particle-laden flows. These are divided into Eulerian and Lagrangian approaches (see e.g. Crowe et al., 2012, Chapter 8). In the Eulerian approach, the cloud and precipitation properties are assumed to be continuous in space, like those of a fluid. In the Lagrangian approach, the socalled computational particles are tracked through the model domain. Information associated with those particles travels along their trajectories. The local properties of a given volume can be diagnosed taking into account the properties of particles contained within it. The Eulerian approach is well suited for modelling transport of gaseous species in the atmosphere and hence is the most common choice for modelling atmospheric flows in general. This is why most cloud microphysics models are build using the Eulerian concept (Straka, 2009, e.g chapter 9.1). However, it is the Lagrangian approach that is particularly well suited for dilute flows such as those of cloud droplets and rain drops in the atmosphere.
In the current release, libcloudph++ is equipped with implementations of three distinct models of cloud microphysics. All three belong to the socalled warm-rain class of schemes, meaning they cover representation of processes leading to formation of rain but they do not cover representation of the ice phase (snow, hail, graupel, etc.). The socalled single-moment bulk and double-moment bulk schemes described in sections 3 and 4 belong to the Eulerian class of methods. In section 5, a coupled Eulerian-Lagrangian particle-based scheme is presented. In the particle-based scheme, Lagrangian tracking is used to represent the dispersed phase (atmospheric aerosol, cloud droplets, rain drops), while the continuous phase (moisture, heat) is represented with the Eulerian approach. Description of each of the three schemes is aimed at providing a complete set of information needed to use it, and includes: • discussion of key assumptions, • formulation of the scheme, • definition of the Application Programming Interface (API) • overview of the implementation, • example results.
The particle-based scheme, being a novel approach to modelling clouds and precipitation, is discussed in more detail than the bulk schemes. Sections covering descriptions of the APIs include C++ code listings of all data structure definitions and function signatures needed to use the library. In those sections, C++ nomenclature is used without introduction (for reference, see Brokken, 2013, that includes C++11 used in the presented code).
Sections covering scheme formulation feature cloud-modelling nomenclature (see appendix A for a brief introduction and further reading). In general, it is our approach not to repeat in the text the referenced formulae readily available in recent papers, but only to include equations that are specific to the presented formulation and its implementation.
Before introducing the three implemented schemes in sections 3-5, formulation of an example modelling context is presented in section 2.1. Section 6 presents a performance evaluation of all three schemes. Section 7 provides a summary of the key features of libcloudph++ and outlines the development plans for the next releases.
Appendix A contains an outline of governing equations for moist atmospheric flow. Appendix B contains a list of symbols used throughout the text. Appendix C covers description of a program called icicle that depends on libcloudph++ and is used to perform the example 2D simulations presented throughout the text.

Modelling context example
Being a library, libcloudph++ does not constitute a complete modelling system. It is a set of reusable software components that need to be coupled at least with a dynamical core responsible for representing air motion. In this section we describe an example context in which the library may be used. The three following subsections cover description of a modelling framework, a set-up including initial conditions, and a conceptual numerical solver. Example results obtained with these simulation components are presented alongside the microphysics schemes in sections 3, 4 and 5.

2D kinematic framework
The formulation is inspired by the 2D kinematic framework described in Szumowski et al. (1998); Morrison and Grabowski (2007); Rasinski et al. (2011). A simple 2D kinematic framework mimicking air motion in a cloud allows (and limits) one to study cloud microphysical processes decoupled from cloud dynamics. In fact, the differences between simulations when feedback on the dynamics is taken out can lead to better understanding of the role of flow dynamics (e.g. Slawinska et al., 2009). Such approach results in a computationally cheap yet still insightful set-up of potential use in: (i) development and testing of cloud-processes parameterisations for larger scale models; (ii) studying such processes as cloud processing of aerosols; and (iii) developing remote-sensing retrieval procedures involving detailed treatment of cloud microphysics.
The primary constituting assumption is the stationarity of the dry-air density (here, a vertical profile ρ d (z) is used) which allows to prescribe the 2D velocity field using a streamfunction: where ψ = ψ(x, z; t) is the streamfunction and u and w denote horizontal and vertical components of the velocity field u. As a side note, one may notice that the stationarity of the dry-air density field together with phasechange-related variations in time of temperature and water vapour mixing ratio imply time variations of the pressure profile. The deviations from the initial (hydrostatic) profile are insignificant.

8 th ICMW VOCALS set-up
Sample simulations presented in the following sections are based on a modelling set-up designed for the 8 th International Cloud Modelling Workshop (ICMW Muhlbauer et al., 2013, case 1). It was designed as a simplest scenario applicable for benchmarking model capabilities for research on aerosol processing by clouds. The cloud depth and aerosol characteristics are chosen to allow precipitation to develop over time and to mimic a drizzling stratocumulus cloud. The set-up uses a kinematic framework of the type defined in the preceding subsection. The definition of ψ(x, z) is the same as in Rasinski et al. (2011, Eq. 2): with w max = 0.6 m s -1 , domain width X = 1.5 km and domain height Z = 1.5 km. The resulting velocity field (depicted in Figure 1) mimics an eddy spanning the whole domain, and thus covering an updraught and a downdraught region. The domain is periodic in horizontal. To maintain flow incompressibility up to round-off error, velocity components (cf. eq. 1) are derived from (2) using numerical differentiation formulae for a given grid type (Arakawa-C grid is used in the examples presented in the paper). The initial profiles of liquid-water potential temperature θ l and the total water mixing ratio r t are defined as constant with altitude (θ l = 289 K; r t = 7.5 g kg -1 ). The initial air-density profile corresponds to hydrostatic equilibrium with a pressure of 1015 hPa at the bottom of the domain. This results in supersaturation in the upper part of the domain, where a cloud deck is formed in the simulations.
The domain is assumed to contain aerosol particles. Their dry size spectrum is a bi-modal lognormal distribution: with the following parameters (values close to those measured in the VOCALS campaign Allen et al., 2011,  where σ 1,2 is the geometric standard deviation, d 1,2 = 2 · r 1,2 is the mode diameter and N 1,2 is the particle concentration at standard conditions (T=20 • C and p=1013.25 hPa). This corresponds to a vertical gradient of concentration in the actual conditions of the model set-up due to air density changing with height, and a gradual shift towards larger sizes of wet particle spectrum (due to relative humidity changing with height). Both modes of the distribution are assumed to be composed of ammonium sulphate. For models that include a description of the cloud droplet size spectrum, the initial data for the droplet concentration and size are to be obtained by initialising the simulation with a two-hour-long spin-up period. During the spin-up, precipitation formation and cloud drop sedimentation are switched off. The spin-up period is intended to adjust an initial cloud droplet size spectrum (not specified by the setup) to an equilibrium state matching the formulation of cloud microphysics with the prescribed flow.
One may chose to initialise the model with θ = θ l and r v = r t , and no condensed water (as it was done in the examples presented in this paper). This simplifies initialisation, but results in an unrealistic initial supersaturation that may be an issue for a given microphysics scheme. One may chose to impose a limit on the supersaturation, say 5% (RH=1.05), when activating cloud drops during the spin-up.
To maintain steady mean temperature and moisture profiles (i.e. to compensate for gradual water loss due to precipitation and warming of the boundary layer due to latent heating), mean temperature and moisture profiles are relaxed to the initial profile. The temperature and moisture equations include an additional source term in the form −(φ 0 − < φ >)/τ , where φ 0 , < φ > and τ are the initial profile, the horizontal mean of φ at a given height and the relaxation time scale, respectively. The relaxation time scale τ is height-dependant (mimicking effects of surface heat fluxes) and is prescribed as τ = τ rlx · exp(z/z rlx ) with τ rlx = 300 s and z rlx = 200 m. Note that such formulation does not dump small-scale perturbations of φ, but simply shifts the horizontal mean toward φ 0 .
The grid is composed of 75×75 cells of equal size (hence the grid steps are 20 m in both directions  Figure 2: A sequence diagram depicting control flow in a conceptual solver described in section 2.3. This solver design is extended with libcloudph++ API calls in diagrams presented in Figures 3, 5 and 7. The diagram structure is modelled after the Unified Modeling Language (UML) sequence diagrams. Arrows with solid lines depict calls, while the dashed arrows depict returns from the called code. Individual solver steps are annotated with labels expressed in semi-mathematical notation and depicting key data dependencies. Model state variables are named r i , their corresponding rhs terms are namedṙ i . If a symbol appears on both sides of the equation, a programming-like assignment notation is meant, in which the old value of the symbol is used prior to assignment, i.e. r i = ADV(r i , C). ADV, ADJ and RHS depict all operations the solver does during the advection, adjustment and righthand-side update steps, respectively.

A conceptual solver
Example calling sequences for libcloudph++'s API are described in the following sections using a conceptual solver depicted in the diagram in Figure 2. The conceptual solver is meant to perform numerical integration of a system of heterogeneous transport equations, each equation of the form: where r i is the mixing ratio of the advected constituent, ρ d is the dry-air "carrier flow" density, u is the velocity field, and the dotted right-hand-side termṙ i depicts sources (see also appendix A). The solver logic consists of five steps executed in a loop, with each loop repetition advancing the solution by one timestep. Each of the first four integration steps is annotated in Figure 2 and described in the following paragraph. The final step does data output and is performed conditionally every few timesteps. The proposed solver design features uncenteredin-time integration of the right-hand-side terms. Besides the right-hand-side terms, the integration procedure provides for representation of sources using what is hereinafter referred to as adjustments. Adjustments are basically all modifications of the model state that are not representable as righthand-side terms (e.g. due to not being formulated as time derivatives). Adjustments are done after advecting but before updating right-hand-side terms.
The library code itself is not bound to this particular solver logic -it is just a simple example intended to present the library API. We refer the reader to Grabowski and Smolarkiewicz (2002) for discussion of higher-order integration techniques for moist atmospheric flows.

Single-moment bulk scheme
A common approach to represent cloud water and precipitation in a numerical simulation is the socalled single-moment bulk approach. The concepts behind it date back to the seminal works of Kessler (1995, section 3, and earlier works cited therein). The constituting assumption of the scheme is the division of water condensate into two categories: cloud water and rain water. The term singlemoment refers to the fact that only the total mass (proportional to the third moment of the particle size distribution) of water per category (cloud or rain) is considered in the model formulation.
In an Eulerian framework, two transport equations for the cloud water mixing ratio r c and the Table 1: State variables for the three implemented schemes. Number of state variables times the number of Eulerian grid cells plus number of particle attributes times the number of Lagrangian computational particles gives an estimation of the memory requirement of a given scheme. See appendix B for symbol definitions.

Eulerian (PDE)
Lagrangian (ODE) state variables particle attributes 1-moment bulk θ, r v , r c , r r -2-moment bulk θ, r v , r c , r r , n c , n rparticle-based θ, r v r 3 d , r 2 w , N , κ rain water mixing ratio r r are solved (in addition to the state variables θ and r v representing heat and moisture content, respectively, see Table 1 for a list of model-state variables in all schemes discussed in the paper).
Single-moment bulk microphysics is a simplistic approach. Without information about the shape of droplet size distribution, the model is hardly capable of being coupled with a description of aerosolor radiative-transfer processes.

Key assumptions
The basic idea is to maintain saturation in the presence of cloud water.
Condensation/evaporation of cloud water triggered by supersaturation/subsaturation happens instantaneously. Rain water forms through autoconversion of cloud water into rain (the negligible condensation of rain water is not represented). Autoconversion happens only after a prescribed threshold of the cloud water density is reached. Subsequent increase in rain water is also possible through the accretion of cloud water by rain.
Cloud water is assumed to follow the airflow, whereas rain water falls relative to the air with sedimentation velocity. Rain water evaporates only after all available cloud water has been evaporated and saturation is still not reached. In contrast to cloud water, rain water evaporation does not happen instantly. Rain evaporation rate is a function of relative humidity, and is parameterised with an assumed shape of the raindrop size distribution.

Phase changes
Phase changes of water are represented with the socalled saturation adjustment procedure. Unlike in several other formulations of the saturation adjustment procedure (cf. Straka, 2009, chapt. 4.2), the one implemented in libcloudph++ covers not only cloud water condensation and evaporation, but also rain water evaporation.
Any excess of water vapour with respect to saturation is instantly converted into cloud water, bringing the relative humidity to 100%. Similarly, any deficit with respect to saturation causes instantaneous evaporation of liquid water. The formulation of the saturation adjustment procedure is given here making the latent heat release equation a starting point. The heat source depicted with ∆θ is defined through two integrals, the first representing condensation or evaporation of cloud water, and the second one representing rain evaporation: where dθ /drv = −θlv /c pd T (cf. eq. 42 in appendix A) and the integration limit r v for cloud water condensation/evaporation is: where r vs = r vs (ρ d , θ , r v ) is the saturation vapour density evaluated after the adjustment. The first case above corresponds to supersaturation. The second and third cases correspond to subsaturation with either enough or insufficient amount of cloud water to bring the air back to saturation.
When saturation is reached through condensation or evaporation of the cloud water, the second integral in (5) representing evaporation of rain vanishes. If not enough cloud water is available to reach saturation through evaporation, the integration continues with the limit ρ v defined as follows: where δr r depicts the limit of evaporation of rain within one timestep. Here, it is parameterised as δr r = min(r r , ∆t · E r ) with E r being the evaporation rate of rain estimated following Grabowski and Smolarkiewicz (1996, eq. 5c) using the formula of Ogura and Takahashi (1971, eq. 25). As with r vs , here r vs = r vs (ρ d , θ , r v ).
Noteworthy, the name adjustment corresponds well with the adjustments solver step introduced in section 2.3 as the procedure defined above is formulated through integration over vapour density rather than time (see also discussion of eq. 3a in Grabowski and Smolarkiewicz, 1990).

Coalescence
Collisions and coalescence between droplets are modelled with two separate processes: autoconversion and accretion. Autoconversion represents collisions between cloud droplets only, while accretion refers to collisions between rain drops and cloud droplets. Both are formulated (parameterised) in a phenomenological manner as right-hand-side (rhs) terms following Grabowski and Smolarkiewicz (1996, eq. 5c) using the Kessler's formulae. See Wood (2005) for a recent review of how these formulations compare with other bulk warm-rain schemes.
In the Kessler's formulation, autoconversion source term is proportional to max(r c − r c0 , 0), where the value of the mixing-ratio threshold r c0 effectively controls the onset of precipitation in the simulation. Values of r c0 found in literature vary from 10 −4 to 10 −3 kg/kg (Grabowski and Smolarkiewicz, 1996).

Sedimentation
Representation of sedimentation of rain water is formulated as a rhs term. Another commonly used approach is to alter the vertical component of the Courant number when calculating advection. Here, the rhs term is formulated employing the upstream advective scheme: where old and new superscripts are introduced to indicate thatṙ r is a sum of multiple terms, the one representing sedimentation being only one of them. The | above and | below symbols refer to grid cell sequence in a column, v t is the rain terminal velocity parameterised as a function of rain water mixing ratio (eq. 5d in Grabowski and Smolarkiewicz, 1996), and F in and F out symbolise fluxes of r r through the grid cell edges. Employment of the upstream scheme brings several consequences. First, unlike the cellwise phasechange and coalescence formulation, the sedimentation scheme is defined over a grid column. Second, the combination of terminal velocity, vertical grid cell spacing ∆z and the timestep ∆t must adhere to the Courant condition (cf. discussion in Grabowski and Smolarkiewicz, 2002). Last but not least, the upstream algorithm introduces numerical diffusion, that can be alleviated by application of a higher-order advection scheme Smolarkiewicz (e.g., MPDATA, cf. 2006, and references therein).

API elements
The single-moment bulk scheme's API consists of one structure (composite data type) and three functions, all defined within the libcloudph::blk 1m namespace. The separation of the scheme's logic into the three functions is done first according to the conceptual solver design (i.e. separation of rhs terms and adjustments), and second according to a data-dependency criterion (i.e. cellwise or columnwise calculations). In case of the single-moment bulk scheme, the three functions actually correspond to the three represented processes, namely phase changes (cellwise adjustments), coalescence (cellwise rhs terms) and sedimentation (columnwise rhs term). Sedimentation is the only process involving columnwise traversal of the domain (note the | above and | below symbols in eq. 9-11). template<typename real_t> struct opts_t { bool cond = true, // condensation cevp = true, // evaporation of cloud revp = true, // evaporation of rain conv = true, // autoconversion accr = true, // accretion sedi = true; // sedimentation real_t r_c0 = 5e-4, // autoconv. threshold r_eps = 2e-5; // absolute tolerance }; Listing 3.1: blk 1m::opts t definition The blk 1m::opts t structure (Listing 3.1) is intended for storing options of the scheme for a given simulation. The template parameter real t controls floating point format (e.g., float, double, long double, . . . ). The structure fields include flags for toggling individual processes, a value of autoconversion threshold r c0 , and an absolute tolerance used in numerically integrating the integrals in equation 5. By default all processes are enabled, r c0 = 5 × 10 −4 kg/kg and the tolerance is set to 2 × 10 −5 kg/kg. All three functions from the singlemoment bulk scheme's API expect an instance of opts t as their first parameter (see Listings 3.2-3.4).
The saturation adjustment of state variables (cf. section 3.1.2) is obtained through a call to the blk 1m::adj cellwise() function (signature in Listing 3.2). The additional template parameter cont t specifies the type of data container used for passing model state variables. The function expects cont t to be equipped with STL-style 1 iterator interface (e.g., the standard std::vector class or a Blitz++ array slice as it is used in the example code described in appendix C Listing 3.4: blk 1m::rhs columnwise() signature guments include references to containers storing ρ d 1 C++ Standard Template Library (read-only) and θ, r v , r c , r r (to be adjusted). The last argument dt is the timestep length needed to calculate the precipitation evaporation limit (see discussion of eq. 8).
Forcings due to autoconversion and accretion are obtained through a call to the blk 1m::rhs cellwise() function whose signature is given in Listing 3.3. The function modifiesṙ c andṙ r by adding the computed rhs terms to the values already present inṙ c andṙ r . The function needs read-access to values of ρ d , r c and r r passed as the last three arguments.
Representation of sedimentation is included in a separate function rhs columnwise() (signature in Listing 3.4) as it is applicable only to simulation frameworks for which a notion of a column is valid (e.g. not applicable to a parcel framework). The passed cont t references are assumed to point to containers storing vertical columns of data with the last element placed at the top of the domain. The last argument dz is the vertical grid spacing. The function returns the value of F out (see eq. 9) for the lowermost grid cell within a column.

Example calling sequence
With the prototype solver concept defined in section 2.3, all three functions described above are called once per each timestep. The diagram in Figure 3 depicts the sequence of calls. As suggested by its name, the adj cellwise() function (covering representation of phase changes) is called within the adjustments step. Functions rhs cellwise() and rhs columnwise() covering representation of coalescence and sedimentation, respectively, are both called during the rhs-update step.

Implementation overview
The single-moment bulk scheme is implemented as a header-only C++ library. It requires a C++11compliant compiler.
Variables, function arguments and return values of physical meaning are typed using the Boost.units classes (Schabel and Watanabe, 2008). Consequently, all expressions involving them are subject to dimensional analysis at compile time (incurring no runtime overhead). This reduces the risk of typolike bugs (e.g. divide instead of multiply by density) but also aids the verification of the model formulae.
The integrals in equation 5 defining the saturation adjustment procedure are computed using the Boost.Numeric.Odeint library (Ahnert and Mulansky, 2013

Example results
The simulation framework and setup described in section 2 and implemented using libcloudph++ as described in appendix C were used to perform an example simulation with the single-moment bulk scheme. Integration of the transport equations was done using the nonoscillatory variant of MPDATA (Smolarkiewicz, 2006). Figure 4 presents a snapshot of cloud and rain water fields after 30 minutes simulation time (excluding the spin-up period).
The cloud deck is located in the upper part of the domain with the cloud water content increasing from the cloud base up to the upper boundary of the domain. The model has reached a quasi-stationary state and features a drizzle shaft that forms in the updraught region in the left-hand side of the domain. The quasi-stationary state was preceded by a transient rainfall across the entire domain in the first minutes of the simulation caused by the initial cloud water content exceeding the autoconversion threshold in the upper part of the entire cloud deck.  Figure 4: Example results from a 2D kinematic simulation using the single-moment bulk scheme. All panels depict model state after 30 minutes simulation time (excluding the spin-up period). Note logarithmic colour scale for rain water plots. See section 3.4 for discussion.

Double-moment bulk scheme
A common extension of the single-moment bulk approach is a double-moment bulk scheme. Similarly to the single-moment approach, the double-moment warm-rain scheme assumes that condensed water is divided into two categories: cloud water and rain water. In addition to the total mass of water in both categories, concentrations of droplets and drops are also predicted. As a result, the scheme considers two moments of particle size distribution, hence the name. In the Eulerian framework, four transport equations for cloud droplet concentration n c , cloud water mixing ratio r c , rain drop concentration n r and rain water mixing ratio r r are solved (see table 1 for a list of model-state variables). With additional information on the shape of clouddroplet and rain size spectra, the double-moment bulk microphysics scheme is better suited than the single-moment scheme for coupling to aerosol and radiative-transfer models The double-moment scheme implemented in libcloudph++ was introduced by Morrison and Grabowski (2007). Their scheme includes, in particular, prediction of the supersaturation. This makes it well suited for depicting impacts of aerosol on clouds and precipitation. However, the scheme does not keep track of the changes of aerosol size distribution, and hence excludes impacts of clouds and precipitation on aerosol.

Key assumptions
The model assumes aerosol, cloud and rain spectra shapes (lognormal, gamma and exponential, respectively). Aerosol is assumed to be well mixed throughout the whole domain and throughout the whole simulation time (uniform concentration per unit mass of dry air). Cloud water forms only if some of the aerosol particles are activated due to supersaturation. Activation and subsequent growth by condensation are calculated applying the predicted supersaturation. As in the single-moment scheme, rain water forms through autoconversion and is further increased by accretion. Prediction of mean size of cloud droplets and rain drops allows to better link the parameterisation of autoconversion and accretion to actual solutions of the collision-coalescence equation. As in the singlemoment scheme, cloud water is assumed to follow the airflow, whereas rain water falls relative to the air. Evaporation of cloud and rain water is included in the formulation of phase changes and considers the negligible diffusional growth of rain water.

Phase changes
Cloud droplets form from activated aerosol. The number of activated droplets is derived by applying Köhler theory to assumed multi-modal lognormal size distribution of aerosols. Freshly activated cloud droplets are assumed to have the radius of 1 µm; for full derivation see Morrison and Grabowski (2007, eqs. 9-13) and Khvorostyanov and Curry (2006). The concentrations of activated droplets are computed separately for each mode of the aerosol size distribution and then summed.
The size distribution of aerosols is not resolved by the model. To take into account the decrease of aerosol concentration due to previous activation, in each timestep the number of available aerosols is approximated as the difference between initial aerosol concentration and the concentration of preexisting cloud droplets. Note that this approximation is valid for weakly precipitating clouds only. For a strongly raining cloud, the model should include an additional variable, the concentration of activated cloud droplets. It differs from the droplet concentration because of collision-coalescence (see Eqs. (7) and (8) in Morrison and Grabowski, 2008).
The changes in cloud and rain water due to condensation and evaporation follow eq. (8) in Morrison and Grabowski (2007) with the phase relaxation times computed following eq. (4) in Morrison et al. (2005) adapted to fall speed parameterisation used in Morrison and Grabowski (2007).
The decrease in number concentration due to evaporation of cloud droplets and rain drops is computed following the approach of Khairoutdinov and Kogan (2000). Cloud droplet concentration is kept constant during evaporation, until all cloud water has to be removed. Rain drop concentration decreases during evaporation preserving the mean size of rain (drizzle) drops.

Coalescence
Parameterisation of autoconversion and accretion follows the one of Khairoutdinov and Kogan (2000). In contrast to the single-moment scheme, the autoconversion rate is a continuous function, and the rain onset is not controlled by a single threshold. Drizzle drops formed due to autoconversion are assumed to have initial radius of 25 µm.

Sedimentation
Sedimentation is calculated in the same way as in the single-moment scheme (see section 3.1.4), employing upstream advection. Sedimentation velocities (mass-weighted for the rain density and number-weighted for the rain drop concentration) are calculated applying drop terminal velocity formulation given in Simmel et al. (2002, Table 2). Sedimentation velocity is multiplied by ρ d0 /ρ d to follow eq. A4 in Morrison et al. (2005), where ρ d0 is the density of dry air at standard conditions. The double-moment bulk scheme's API consists of one structure and two functions, all defined within the libcloudphxx::blk 2m namespace.
The structure blk 2m::opts t holds scheme's options, its definition is provided in Listing 4.1. Besides process-toggling Boolean fields, it stores the parameters of the aerosol used in parameterising activation, that is the parameters of the lognormal size distribution (see eq. 3) and the parameter β defining the solubility of aerosol (see Khvorostyanov and Curry, 2006, sec. 2.1).
All processes are represented as right-hand-side terms in the double-moment scheme.
Contributions to the rhs terms due to phase changes and coalescence are obtained through a call to blk 2m::rhs cellwise() (see Listing 4.2). As in the single-moment bulk scheme's API, contribution from sedimentation to the rhs terms can be computed by calling blk 2m::rhs columnwise() (Listing 4.3).
The meaning of template parameters and function arguments is analogous to the single-moment bulk scheme's API (see section 3.2). The computed values of rhs terms are added (and not assigned) to the arrays passed as arguments.
The cellwise-formulated processes are handled in the following order: activation, condensation/evaporation of cloud droplets, autoconversion, accretion, condensation/evaporation of rain. In principle, there is no guarantee that the summed contributions from all those processes, multiplied   Figure 6: Example results from a 2D kinematic simulation using the double-moment bulk scheme. All panels depict model state after 30 minutes simulation time (excluding the spin-up period). Note logarithmic colour scale for rain water plots. See section 4.4 for discussion.
by the timestep, are smaller than the available water contents or droplet concentrations. Application of such rhs terms could result in negative values of water contents or concentrations. To prevent that, each contribution to the rhs term evaluated within rhs cellwise() is added to the arrayṙ i passed as argument using the following rule: whereṙ old i is the value obtained in evaluation of previously-handled processes,ṙ i is the value according to model formulae, andṙ new i is the augmented value of rhs term that guarantees nonnegative values of r i after its application. The same rule is applied when evaluating values of outgoing fluxes F out from equation 9 when calculating rhs term within rhs columnwise(). The rhs columnwise() returns the value of the F out flux from the lowermost grid cell within a column.

Example calling sequence
A diagram with an example calling sequence for the double-moment scheme is presented in figure 5. The only difference from the single-moment bulk scheme's calling sequence presented in section 3.2.2 is the lack of an adjustments step. Condensation is represented using right-hand-side terms and is computed together with coalescence in the blk 2m::rhs cellwise().

Implementation overview
The implementation of the double-moment scheme follows closely the implementation of the singlemoment scheme (see section 3.3). It's a header-only C++ library, using Boost.units classes for dimensional analysis and Boost.Iterator for iterating over sets of array slices.

Example results
Simulations presented in Section 3.4 were repeated with the double-moment scheme. Figure 6 presents a snapshots of the cloud and rain water content as well as the cloud and rain drop concentration fields after 30 minutes simulated time (excluding the spin-up period). Because of large differences in the predicted rain, rain water content and drop concentration are plotted using logarithmic colour scale in order to keep the same colour range for all three presented schemes.
Similarly to the results from the single-moment scheme presented in Figure 4, cloud water content increases from the cloud base almost up to the upper boundary of the domain. However, unlike in the case of the single-moment scheme, the cloud deck in Figure 6 features a "cloud hole" above the downdraught region. The rain forms in the upper part of the updraught and is advected into the downdraught region in the right-hand side of the domain. The double-moment simulation at the thirtieth minute is still to reach the quasi-stationary state. This is because of the differences in the parameterisation of autoconversion that lead to different timing of the precipitation onset.
The cloud droplet concentration plot reveals that the model captures the impact of the cloud base vertical velocity (and hence the supersaturation) on the concentration of activated cloud droplets. The highest concentrations are found near the updraught axis, and the lowest near the updraught edges. The difference in shapes of the rain drop concentration n r and the rain water mixing ratio r r fields arguably comes from the different fall velocities of n r and r r .

Particle-based scheme
The third scheme available in libcloudph++ differs substantially from the other two bulk schemes. It does not treat water condensate as continuous medium. Instead, the scheme employs Lagrangian tracking of particles that represent atmospheric aerosol, cloud and drizzle droplets, and rain drops. However, volumes relevant to atmospheric flows contain far too many particles to be individually represented in a numerical model. Consequently, each "computational particle" tracked in the scheme represents multiple particles of identical properties (i.e. spatial coordinates and physicochemical properties). Such approach was recently applied for modelling precipitating clouds by Andrejczuk et al. (2010); Sölch and Kärcher (2010); Riechelmann et al. (2012); Arabas and Shima (2013). Formulation of the scheme presented here follows the Super Droplet Method of Shima et al. (2009) to represent collisions and coalescence of particles.

Formulation
The key assumption of the particle-based scheme is to assume no distinction between aerosol, cloud, drizzle and rain particles. All particles tracked by the Lagrangian component of the solver are subject to the same set of processes including advection by the flow, gravitational sedimentation, diffusional growth, evaporation, and collisional growth. The Eulerian component of the model is responsible for advecting θ and r v (see Appendix A). Representation of all the processes, as well the method of coupling the Lagrangian and Eulerian components of the model is given below.
The Lagrangian component is responsible for tracking the computational particles, each having the following attributes: • multiplicity N • location (i.e. spatial coordinates with 0,1,2 or 3 components) • wet radius squared r 2 w • dry radius cubed r 3 d • hygroscopicity parameter κ Multiplicity depicts the number of particles represented by the computational particle. All particles represented by one computational particle are assumed to be spherical water solution droplets of radius r w . Following Shima et al. (2009) the model is formulated in r 2 w for numerical reasons. The amount of solvent is represented with the dry radius r d (third power is used in the model code as most often r d serves as a proxy for volume of the solvent). The hygroscopicity of the solvent is parameterised using the single-parameter approach of Petters and Kreidenweis (2007).
The list of particle attributes can be extended. For example, parameters describing chemical composition of the solution or the electrical charge of a particle can be added. Adding new particle attributes does not increase the computational expense of the Eulerian component of the solver. However, extension of the phase space by a new dimension (the added attribute) potentially requires using more computational particles to achieve sufficient coverage of the phase space.

Key assumptions
Most of the assumptions of the bulk models described in sections 3 and 4 are lifted. All particles are subject to the same set of processes. It follows that the model represents even dry deposition and collisions between aerosol particles (both being effectively negligible). The supersaturation in the model domain is resolved taking into account phase change kinetics (i.e. condensation and evaporation are not treated as instantaneous). Aerosol may have any initial size distribution and may be internally or externally mixed (i.e. have the same or different solubility for particles of different sizes). There are no assumptions on the shape of the particle size spectrum.
There are, however, two inherent assumptions in the premise of all particles being spherical and composed of water solution. First, the humidity within the domain and the hygroscopicity of the substance of which aerosol is composed must both be high enough for the solution to be dilute. For tropospheric conditions and typical complex-composition internally-mixed aerosol this assumption is generally sound (Fernández-Díaz et al., 1999;Marcolli et al., 2004). Second, the nonsphericity of large precipitation particles has to be negligible. It is a valid assumption for drops smaller than 1 mm (Szakáll et al., 2010).
It is assumed in the present formulation that a particle never breaks up into multiple particles. It is a reasonable assumption for the evaporation of cloud particles into aerosol (Mitra et al., 1992). However, both collision-induced and spontaneous breakup become significant (the latter to a much smaller extent) for larger droplets (McFarquhar, 2010) and hence the scheme requires an extension in order to allow for diagnosing rain spectra for strongly precipitating clouds.
There is not yet any mechanism built into the model to represent aerosol sources (other than regeneration of aerosol by evaporation of cloud droplets).

Advection
Transport of particles by the flow is computed using the backward Euler scheme: where C is the Courant number field of the Eulerian component of the solver, and ∆x is the grid step (formulae are given for the x dimension, but are applicable to other dimensions as well). An Arakawa-C staggered grid is used and evaluation of C(x [n+1] ) is performed using linear approximation (interpolation / extrapolation of the particle velocities using fluid velocity values at the grid cell edges): where fractional indices denote Courant numbers on the edges of a grid cell in which a given particle is located at time level n. The Courant number components are defined as velocity components times the ratio of time step and grid step in each dimension. The weight ω is defined as: where x depicts the largest integer not greater than x. Substituting (14) and (15) into (13) results in an analytic solution for x [n+1] : where ∆C = C i+ 1 /2 − C i− 1 /2 . The same procedure is repeated in other spatial dimensions if applicable (i.e. depending on the dimensionality of the Eulerian component). Periodic horizontal boundary conditions are assumed.

Phase changes
Representation of condensation and evaporation in the particle-based approach encompasses several phenomena that are often treated individually, namely: aerosol humidification, cloud condensation nuclei (CCN) activation and deactivation, cloud droplet growth and evaporation, and finally rain evaporation. The timescale of some of these processes (notably CCN activation) is much shorter than the characteristic timescale of the large-scale air flow solved by the Eulerian component of the solver. Therefore, representation of condensation and evaporation in the Lagrangian component involves a substepping logic in which the Eulerian component timestep ∆t is divided into a number of equal substeps. For simplicity, this procedure is not depicted explicitly in the following formulae. It is only hinted by labelling subtimestep as ∆t and the subtimestep number as n .
Presently, the number of subtimesteps is kept constant throughout the domain and throughout the simulation time. However, the actual constraints for timestep length ∆t differ substantially, particularly with the distance from cloud base (see figure 2 in Arabas and Pawlowska, 2011). An adaptive timestep choice mechanism is planned for a future release.
Within each substep, the drop growth equation is solved for each computational particle with an implicit scheme with respect to wet radius but explicit with respect to r v and θ: Solution to the above equation is sought by employing a predictor-corrector type procedure. First, the value of the dr 2 w/dt derivative evaluated at r 2 [n ] w is used to construct an initial-guess range a < r 2 [n +1] w < b in which roots of equation 17 are to be sought, with: Second, r 2 n +1 w is iteratively searched using the bisection algorithm. If the initial-guess range choice makes bisection search ill-posed (minimisation function having the same sign at a and b), the algorithm stops after first iteration returning (a + b)/2, and reducing the whole procedure to the standard Euler scheme (due to the use of factor 2 in the definition of a and b). It is worth noting, that such treatment of drop growth (i.e. Lagrangian in radius space, the so-called moving sectional or method of lines approach) incurs no numerical diffusion.
The growth rate of particles is calculated using the single-equation (so-called Maxwell-Mason) approximation to the heat and vapour diffusion process (Straka, 2009, rearranged eq. 5.106): where the effective diffusion coefficient is: ρ vs (density of water vapour at saturation with respect to plane surface of pure water), T and ρ v are updated every subtimestep. The vapour density at drop surface ρ • is modelled as: where water activity a w and the so-called Kelvin term exp(A/r w ) are evaluated using the κ-Köhler parameterisation of Petters and Kreidenweis (2007). See Arabas and Pawlowska (2011) for the formulae for A, l v and ρ vs used. Vapour and heat diffusion coefficients D and K are evaluated as: The Fuchs-Sutugin transition-régime correction factors β M (r w , T ) and β T (r w , T, p) are used in the form recommended for cloud modelling by Laaksonen et al. (2005, i.e. employing mass and heat accommodation coefficients of unity). The Sherwood number Sh and the Nusselt number Nu (twice the mean ventilation coefficients) are modelled following Clift et al. (1978) as advocated by Smolík et al. (2001). As in the particle-based ice-microphysics model of Sölch and Kärcher (2010), no interpolation of the Eulerian state variables to particle positions is done (in contrast to the approach employed in warm-rain models of Andrejczuk et al., 2008;Shima et al., 2009;Riechelmann et al., 2012). It is therefore implicitly assumed, in compliance with the Eulerian solver component logic, that the heat and moisture are homogeneous within a grid cell. Consequently, the effects of subgrid-scale mixing on the particles follow the so-called homogeneous-mixing scenario (see Jarecka et al., 2013, and references therein). Furthermore, no effects of vapour field inhomogeneity around particles are taken into consideration (see Vaillancourt et al., 2001;Castellano andÁvila, 2011).
Particle terminal velocities used to estimate the Reynolds number to evaluate Sh and N u are calculated using the parameterisation of Khvorostyanov and Curry (2002, see also 5.1.5 herein).
After each substep, the thermodynamic fields r v and θ are adjusted to account for water vapour content change due to condensation or evaporation on particles within a given grid cell and within a given substep by evaluating: where ∆V is the grid cell volume, and ρ w is the density of liquid water. Noteworthy, such formulation maintains conservation of heat and moisture in the domain regardless of the accuracy of integration of the drop growth equation.
Phase change calculations are performed before any other processes, as it is the only process influencing values of r v and θ fields of the Eulerian component. Consequently, Eulerian component of the solver may continue integration as soon as phase-change calculations are completed. Such asynchronous logic is enabled if performing calculations using a GPU: particle advection, sedimentation and collisions are calculated by the Lagrangian component of the solver using GPU while the Eulerian component is advecting model state variables using CPU(s).

Coalescence
The coalescence scheme is an implementation of the Super Droplet Method (SDM) described in Shima et al. (2009). SDM is a Monte-Carlo type algorithm for representing particle collisions. As it is done for phase changes, coalescence of particles is solved using subtimesteps ∆t . In each subtimestep all computational particles within a given grid cell are randomly grouped into non-overlapping pairs (i.e. no computational particle may belong to more than one pair). Then the probability of collisions between computational particles i and j in each pair is evaluated as: where n is the total number of computational particles within the grid cell in a given timestep and K(r i , r j ) is the collection kernel. In analogy to a target-projectile configuration, scaling the probability of collisions with the larger of the two multiplicities max(N i , N j ) (target size) implies that if a collision happens, min(N i , N j ) of particles will collide (number of projectiles). The last term in equation (27) upscales the probability to account for the fact that not all (n(n − 1)/2) possible pairs of computational particles are examined but only n/2 of them. Evaluation of collision probability for nonoverlapping pairs only, instead of for all possible pairs of particles, makes the computational cost of the algorithm scale linearly, instead of quadratically, with the number of computational particles (at the cost of increasing the sampling error of the Monte-Carlo scheme). The coalescence kernel has the following form if only geometric collisions are considered: where E(r i , r j ) is the collection efficiency and v is the terminal velocity of particles (i.e. their flowrelative sedimentation velocity). The collection efficiency differs from unity if hydrodynamic effects (e.g. Vohl et al., 2007) or van der Waals forces (Rogers and Davis, 1990) are considered. The whole coalescence kernel may take different form (in particular may be nonzero for drops of equal terminal velocity) if turbulence effects are taken into account (Grabowski and Wang, 2013, and references therein).
In each subtimestep the evaluated probability P ij is compared to a random number from a uniform distribution over the (0,1) interval. If the probability is larger than the random number, a collision event is triggered. During a collision event, all min(N i , N j ) particles collide (Shima et al., 2009, see Figure 1 and Section 4.1.4 in). One of the colliding computational particles (the one with larger multiplicity) retains its multiplicity but changes its dry and wet radii to those of the newly formed particles. The second colliding computational particle (the one with smaller multiplicity) retains its dry and wet radii but changes its multiplicity to the difference between N i and N j .
Unlike in the formulation of Shima et al. (2009) particles with equal multiplicities collide using the same scheme, leaving one of the particles with zero multiplicity. Particles with zero multiplicity are "recycled" at the begining of each timestep. The recycling procedure first looks for computational particles with highest multiplicites and then assignes their properties to the recycled particles halving the multiplicity.
Noteworthy, the collisional growth is represented in a numerical-diffusion-free manner, that is, Lagrangian in particle radius space (both dry and wet radius). This is an advantage over the Euleriantype schemes based on the Smoluchowski equation which exhibit numerical diffusion (see e.g. Bott, 1997). Other particle parameters are either summed (i.e. extensive parameters such as r 3 d ) or averaged (i.e. intensive parameters such as κ).
Presently, the "multiple coalescence" feature of SDM introduced by Shima et al. (2009) to robustly cope with an undersampled condition of P ij > 1 is not implemented. It is planned, however, to use the values of P ij to control an adaptive timestep logic to be introduced in a future release.

Sedimentation
Particle sedimentation velocity is computed using the formula of Khvorostyanov and Curry (2002, eqs. 2.7, 2.12, 2.13, 3.1). The explicit Euler scheme is used for adjusting particle positions (the terminal velocity is effectively constant, taking into account the assumed homogeneity of heat and moisture within a grid cell).
Sedimentation may result in the particles leaving the domain (i.e. dry deposition or ground-reaching rainfall). Computational particles that left the domain are flagged with zero multiplicity and hence undergo the same recycling procedure as described above for equal-multiplicity collisions.

Initialisation
One of the key parameters of the particle-based simulation is the number of computational particles used. As in several recent cloud-studies employing particle-based techniques, the initial particle spatial coordinates are chosen randomly using a uniform distribution. Consequently, the initial condition has a uniform initial mean density of computational particles per cell (assuming all cells have the same volume). The value of this initial mean density defines the sampling error in particle parameter space, particularly in the context of phase changes and coalescence which are both formulated on cellwise basis. The dry radii of the computational particles are chosen randomly with a uniform distribution in the logarithm of radius. The minimal and maximal values of dry radius are chosen automatically by evaluating the initial dry-size distribution. The criterion is that the particle multiplicity (i.e. the number of particles represented by a computational particle) for both the minimal and the maximal radii be greater or equal one. The initial spectrum shape is arbitrary. Externally mixed aerosol may be represented using multiple spectra, each characterised by different value of κ. The initial particle multiplicities are evaluated treating the input spectra as corresponding to the standard atmospheric conditions (STP) and hence the concentrations are multiplied by the ratio of the dry-air density in a given grid cell to the air density at STP.
Equation 20 defines the relationships between the dry and the wet spectra in the model. These should, in principle, be fulfilled by the initial condition imposed on model state variables. For cloud-free air, it is possible by assuming an equilibrium defined by putting zero on the left-hand side of equation 20. This allows to either diagnose the wet spectrum from the dry one or vice versa.
If the dry size distribution is given as initial condition, bringing all particles to equilibrium at a given humidity is done without changing θ and r v to resemble bulk models' initial state. A small amount of water needed to obtain equilibrium is thus effectively added to the system.
For setups assuming initial presence of cloud water within the model domain, the equilibrium condition may be applied only to subsaturated regions within the model domain. The initial wet radius of particles within the supersaturated regions is set to its equilibrium value at RH=95% (following Lebo and Seinfeld, 2011). Subsequent growth is computed within the first few minutes of the simulation. In order to avoid activation of all available aerosol, the drop growth equation 20 is evaluated limiting the value of the supersaturation to 1% (see also discussion on particle-based simulation initialisation in Andrejczuk et al., 2010, sec. 2.2).

API elements
The particle-based scheme's API differs substantially from bulk schemes' APIs as it features objectoriented approach of equipping structures (referred to as classes) with functions (referred to as methods). Furthermore, unlike the bulk schemes' APIs, the particle-based scheme is not implemented as a header-only library but requires linking with libcloudphxx lgrngn shared library. The particlebased scheme's API consists of four structures (classes with all members public), one function and two enumerations, all defined in the libcloudphxx::lgrngn namespace. The often occurring template parameter real t controls the floating point format.
As in the case of bulk schemes, the options controlling the scheme's course of action in each solver step are stored in a separate structure lgrngn::opts t whose definition is given in Listing 5.1. The first Boolean fields provide control template<typename real_t> struct opts_t { // process toggling bool adve, sedi, cond, coal; // RH limit for drop growth real_t RH_max; // no. of substeps int sstp_cond, sstp_coal; Listing 5.1: lgrngn::opts t definition over process toggling. The following fields are the RH limit for evaluating drop growth equation (for the spin-up, see sec. 2.2), and the numbers of substeps to be taken within one Eulerian timestep when calculating condensation and coalescence. The default values are set in the structure's constructor whose definition is not presented in the Listing (the C++11 syntax for default parameter values used in blk 1m::opts t and blk 2m::opts t is not used to maintain compatibility with C++03 required to compile the code for use on a GPU, see discussion in sec. 5.3). Several other options of the particle-based scheme not meant to be altered during simulation are grouped into a structure named lgrngn::opts init t (Listing 5.2). The initial dry size spectrum of aerosol is represented with a map associating values of the solubility parameter κ with pointers to functors returning concentration of particles at STP as a function of logarithm of dry radius. Subsequent fields specify the geometry of the Eulerian grid and the timestep. It is assumed that the Eulerian component operates on a rectilinear grid with a constant grid cell spacing, although this assumption may easily be lifted in future releases if needed. The parameters x0, y0, z0, x1, y1, z1 are intended for defining a subregion of the Eulerian domain to be covered with computational particles. The last two fields provide control of the initial mean concentration of computational particles per grid cell and the type of the coalescence kernel (only the geometric one implemented so far, see Listing 5.3).
Computational particle spatial coordinates provide the principal link between the particle-based Listing 5.3: lgrngn::kernel t definition scheme's Lagrangian and Eulerian components. Consequently, unlike in the case of bulk schemes which use STL-type iterators to traverse array elements without any information on the array dimensionality or shape, here the actual geometry and memory layout of the passed arrays need to be known. The memory layout of array data is represented in the API using the lgrngn::arrinfo t structure (Listing 5.4 (Veldhuizen, 2005): ,,dataZero is a pointer to the element (0,0,...,0), even if such an element does not exist in the array. What's the point of having such a pointer? Say you want to access the element (i,j,k). If you add to the pointer the dot product of (i,j,k) with the stride vector stride, you get a pointer to the element (i,j,k)." Using arrinfo t as the type for API function arguments makes the library compatible with a wide range of array containers, Blitz++ being just an example. In addition, no assumptions are made with respect to array index ranges, what allows the library to operate on array slabs (e.g. array segments excluding the so-called halo regions).
The state of the Lagrangian component of the model (notably, the values of particle attributes) is stored in an instance of the lgrngn::particles t class (see Listing 5.5). Internally, the Lagrangian calculations are implemented using the Thrust library 2 which, among other, allows to run the particle-based simulations either on CPU[s] or on a GPU. The second template parameter of lgrngn::particles t is the type of the backend to be used by the Thrust library, and as of current release it has three possible values: serial, OpenMP or CUDA (cf. Listing 5.6 with definition of the backend t enumeration). The OpenMP 3 backend offers multi-threading using multiple CPU cores and/or multiple CPUs. The CUDA 4 backend enables the user to perform the computations on a GPU. The serial backend does single-thread computations on a CPU. The "backend-aware" particles t<real t, backend> inherits from "backendunaware" particles proto t<real t> (definition not shown) what allows to use a single pointer to particles proto t with different backends (as used in the return value of lgrngn::factory() discussed below).
Initialisation, time-stepping and data output is performed by calling particles t's methods whose signatures are given in Listing 5.5 and discussed in the following three paragraphs.
The particles t::init() method intended to be called once at the beginning of the simulation performs the initialisation steps described in section 5.1.6. The first three arguments are mandatory and should point to the θ, r v and ρ d fields of the Eulerian component of the solver. The next arguments should point to Courant number field multiplied by the dry-air density ρ d . The number of required arguments pointing to Courant field components depends on the dimensionality of the modelling framework, and ranges from zero (parcel framework) up to three (3D simulation). The Courant number components are expected to be laid out on the Arakwa-C grid, thus for the 2D case courant 1's shape is (nx+1)×nz and courant 2's shape is nx×(nz+1).
The latter covers other processes (transport of particles, sedimentation and coalescence) which may be computed asynchronously, fer example while the Eulerian model calculates advection of the Eulerian fields. Both methods take a reference to an instance of lgrngn::opts t as their first argument. Among subsequent arguments of step sync(), only the first are mandatory. They should point to θ and r v fields which will be overwritten by the method. The Courant field components need to be specified only if the Eulerian component of the model solves air dynamics (they are omitted in the case of the kinematic framework used in examples in this paper). The last argument pointing to a ρ d array is also optional and needs to be specified only if the Eulerian framework allows the density to vary in time. The step async() method returns accumulated rain flux through the bottom of the domain.
The particles t's methods prefixed with diag provide a mechanism for obtaining statistical information on the droplet parameters gridded on the Eulerian component mesh.
The particles t::diag sd conc method calculates the concentration of computational particles per cell. The particles t::diag dry mom() and particles t::diag wet mom() calculate statistical moments of the dry and wet size spectra respectively. The k-th moment M of the dry (d) or wet (w) spectrum is defined here as: where the index i traverses all computational particles and N is the particle multiplicity. The moment number k is chosen through the methods' argument k. The range of radii [r mi , r mx ] over which the moments are to be calculated is chosen by calling diag dry rng() or diag wet rng() before calls to diag dry mom() and diag wet mom(), respectively. Calling the particles t::outbuf() method causes the calculated fields to be stored in an output buffer, and a pointer to the first element of the buffer to be returned.
The last element of the particle-based scheme's API is the factory() function. It returns a pointer to a newly allocated instance of the particles t class. Its arguments are the backend type (see Listing 5.6) and the scheme's options as specified by the opts init t fields (see Listing 5.2). The purpose of introducing the lgrngn::factory() function is twofold. First, it makes the backend choice a runtime mechanism rather than a compile-time one (backend is one of the compile-time template parameters of particles t). Second, it does report an error if the library was compiled without CUDA (GPU) or OpenMP (multi-threading) backend support. Figure 7 depicts an example calling sequence for the particle-based scheme's API. The API calls are split among the adjustments and output steps of the solver. The rhs steps are presented in the diagram, but here they refer to forcings extrinsic with respect to the cloud microphysics scheme (e.g. the relaxation terms in the setup described in Section 2.2).

Example calling sequence
In the case of bulk schemes (Figures 3 and 5) both the solver and library flow control was handled by a single thread (or a group of threads performing the same operations in case of domain decomposition). Here, there are two separate threads (or a group of solver threads plus one library thread in case of domain decomposition). The synchronisation between the solver and the library threads is depicted in the diagram with "wait for . . . " labels.
In the presented calling sequence the diagnostic methods are only called within the output step. Depending on the modelling framework, such calls may also be needed in every timestep, for example to provide data on particle surface for a radiativetransfer component, or the data on particle mass for a dynamical component of the solver. Note that a single call to diag dry/wet rng() may be followed by multiple calls to diag dry/wet mom() as depicted by nesting the "for each moment" loop within the "for each size range" loop.

Implementation overview
The Lagrangian component of the model is implemented using the Thrust library (Hoberock and Bell, 2010). Consequently, all parallelisation logic is hidden behind the Thrust API calls. The parallelisation is obtained by splitting the computationalparticle population among several computational units using shared memory. However, arguably the true power of Thrust is in the possibility to compile the same code for execution on multiple parallel architectures including general-purpose GPUs (via CUDA) and multi-core CPUs (via OpenMP). The implemented particle-based scheme is particu- larly well suited for running in a set-up where the Eulerian computations are carried out on a CPU, and the Lagrangian computations are delegated to a GPU. That is due to: • the low data exchange rate between these two components -there is never a need to transfer the state of all computational particles to the Eulerian component residing in the main memory, only the aggregated size spectrum parameters defined per each grid box are needed; • the possibility to perform part of the microphysics computations asynchronously -simultaneously with other computations carried out on CPU(s) (cf. Sec. 5.1.3).
Since the current release of CUDA compiler does not support C++11, the particle-based scheme is implemented using C++03 constructs only. Furthermore, the CUDA compiler does not support all C++ constructs used by the Boost.units library. For this reason, a fake units drop-in replacement for Boost.units was written and is shipped with lib-cloudph++. It causes all quantities in the program to behave as dimensionless. It is included instead of Boost.units only if compiling the CUDA backend. Consequently, the particle-based scheme's code is checked for unit correctness while compiling other backends.
The asynchronous launch/wait logic is left to be handled by the caller. In the example program icicle (see appendix C), it is implemented using the C++11's std::async() call.
Both in the case of GPU and CPU configurations the Mersenne Twister (Matsumoto and Nishimura, 1998) random number generator is used. If using GPU, the CUDA cuRAND's MTGP32 is used that offers parallel execution with multiple random number streams. If not using GPU, the C++11 std::mt19937 is used and the random number generation is done by a single thread only, even if using OpenMP.

Example results
Figures 8 and 9 present results from an example simulation with the particle-based scheme. The simulations are analogous to those discussed in Sections 3.4 (single-moment) and 4.4 (doublemoment). As before, the plots are for the thirtieth minute of the simulation time (again excluding the two-hour-long spin-up period). The initial mean concentration of computational particles was set to 64 per cell. The number of substeps was set to 10 for both condensation and coalescence, with coalescence calculated using the geometric kernel only. Figure 8 depicts gridded aerosol, cloud and rain properties, with the gridded data obtained by calculating statistical moments of the particle size distribution in each grid cell. In addition to quantities corresponding to the bulk model variables r c , r r (cf . Figs 4 and 6) and n c and n r (cf. Fig. 6), Figure 8 features plots of the effective radius (ratio of the third to the second moment of the size spectrum) and the aerosol concentration. The distinction between aerosol particles, cloud droplets, and rain drops is made using radius thresholds of 0.5 µm and 25 µm for aerosol/cloud and cloud/rain boundaries, respectively. The noise in most panels comes from sampling errors of the particle-based scheme, these errors get smaller with increasing number of computational particles used (not shown). The cloud water content and cloud droplet concentration plots both show strong similarities to the results of simulation using the double-moment scheme (Fig. 6). The increase with height of cloud water content, drop concentration approximately constant with height, the maximum droplet concentration near the updraught axis, and the cloud hole are all present in both the particle-based and the double-moment simulations. The range of values of the rain water content and the rain drop concentration predicted by the particle-based model roughly matches those of the double-moment scheme, yet the level of agreement is much smaller than in the case of cloud water. For example, the maximum rain water content in the double-moment simulation is located in the centre of the downdraught, whereas this location features virtually no rain in the particle-based simulation. Arguably, this is because of the numerical diffusion of the Eulerian double-moment scheme. The two schemes agree with respect to the vertical extent of the drizzle shaft as it vanishes at about 300 metres above the bottom boundary of the domain in both cases.
The plot of the effective radius in Figure 8 shows the gradual increase of drop sizes from the cloud base up to the top of the cloud. The drizzle shaft is the location of the largest particles still classified as cloud droplets. The effective radius plot features the smoothest gradients among all presented plots, particularly across the cloud. This is likely due to the fact that unlike other plotted quantities, the effective radius is an intensive parameter and hence is not proportional to the drop concentration which inherits random fluctuations of the initial aerosol concentrations. The aerosol concentration demonstrates anticipated presence of the in- particle radius [μm] (b) wet radius dry radius Figure 9: Plots of dry and wet size spectra for ten location within the simulation domain. The locations and their labels (a-j) are overlaid on plots in Figure 8. The vertical bars at 0.5 µm and 25 µm indicate the range of particle wet radii which is associated with cloud droplets. See section 5.4 for discussion.
terstitial aerosol within the cloud. The regions of largest rain water content correspond to regions of lowered aerosol concentrations, both within and below the cloud. This likely demonstrates the effect of scavenging of aerosol particles by the drizzle drops, most likely overpredicted by the geometric collision kernel applied in the simulation. The ten black squares overlaid on each plot in Figure 8 show locations of the regions for which the wet and dry particle size spectra are plotted in Figure 9. The ten locations are composed of a 3×3 grid cells each, and the spectra plotted in the ten panels of Figure 9 are all averages over the 9 cells. The dry spectra are composed of 40 bins in an isologarithmic layout from 1 nm to 10 µm. The wet spectra are composed of 25 bins extending the above range up to 100 µm. Each square in the Figure 8 and its corresponding panel in Figure 9 are labelled with a letter (a to j). All panels in Figure 9 contain two vertical lines at 0.5 µm and 25 µm that depict the threshold values of particle wet radius used to differentiate between aerosol, cloud droplets and rain drops.
To match the pathway of cloud evolution, we shall discuss the panels in Figure 9 counterclockwise, starting from panel (i) which presents data on the aerosol size spectrum in the updraught below cloud base. There, the wet spectrum plotted with thick blue line is slightly shifted towards larger sizes than the dry spectrum plotted with thin red line. This shift corresponds to humidification of the hygroscopic aerosol. Panels (g) and (e) show how the wet spectrum evolves while the updraught lifts the particles across the cloud base causing the largest aerosol to be activated and to form cloud droplets. Panel (c) shows a distinctly bimodal wet spectrum with an unactivated aerosol mode to the left and the cloud droplet mode just below 10 µm. Panel (a) depicting the near-cloud-top conditions shows that some of the cloud droplets had already grown pass the 25 µm threshold, likely through collisional growth. Such drops have significant fall velocities what causes the air in the upper part of the domain to become void of largest aerosol what is evident from the shape of the dry spectrum in panel (b) depicting conditions above the downdraught. Panel (d) and panel (c) show size spectra at the same altitude of about 100 m above cloud base. Their comparison reveals that the spectrum of cloud droplets in the downdraught (panel d, edge of the cloud hole) is much wider than near the updraught axis (panel c). Finally, panels (f), (h) and (j) show gradual evaporation of drizzle and cloud droplets back to aerosol-sized particles.

Performance evaluation
Computational cost of a microphysics scheme is one of the key factors determining its practical applicability. Here, we present a basic analysis of the computational cost of the three schemes presented in this paper. The analysis is based on timing of simulations carried out with the kinematic framework and the simulation set-up described in section 2.1 using the icicle tool described in appendix C. In order to depict the contributions of individual elements of the schemes, all simulations were repeated with four sets of process-toggling options: • advection only, • advection and phase changes, • advection, phase changes and coalescence, • all above plus sedimentation.
For the particle-based scheme, the advection-only runs include transport of particles and the Eulerian fields (moisture and heat).
Simulations were performed with a 6-core AMD Phenom II CPU and a 96-core nVidia Quadro 600 GPU (an example 2010 prosumer desktop computer). The CPU code was compiled using GCC 4.8 with -Ofast, -march=native and -DNDEBUG options enabled. The GPU code was compiled with nvcc 5.5 with -arch=sm 20 and -DNDEBUG options enabled. No data output was performed.
In order to eliminate from the reported values the time spent on simulation startup, all simulations were repeated twice, performing a few timesteps in the first run and a dozen timesteps in the second run. The long and short run times are then subtracted and the result is normalised by the difference in number of timesteps.
In order to reduce the chance of an influence of other processes on the wall-clock timing, all simulations were additionally thrice-repeated and the shortest measured time is reported.
The particle-based simulations were performed with three different mean densities of computation particles, 8, 32 and 128 per grid cell, and with four "backend" settings: • serial backend, • OpenMP backend using 2 threads, • OpenMP backend using 4 threads, • CUDA backend using the GPU.  The test was completed for single-precision arithmetics. The GPU used offered about three times higher performance at single precision. Higherperformance GPU hardware typically applied in computing centres is expected to deliver similar performance for double precision. Execution times for CPU-only calculations hardly change when switching from double to single precision. Figure 10 presents measured wall-clock times for the four sets of processes (bottom x-axis labels) and for all three schemes (different colours and symbols). The figure reveals the significant spread of times needed to compute a single timestep -spanning over three orders of magnitude. For simulations with all processes turned on, it takes the double-moment scheme roughly twice longer than the single-moment scheme to advance the solution by one timestep. The particle-based scheme may be anything from about ten-to over hundred-times more costly than the double-moment bulk scheme depending on its settings. Figure 10 also shows how the execution time of the particle-based scheme depends on the backend choice and on the number of computational particles used. The execution time is also dependent on the number of subtimesteps used for phase changes and coalescence (not shown, the default of 10 subtimesteps per one advective step was used here). It is also evident in Figure 10 that computations of phase changes for particle-based simulations take most of the simulation time. The code responsible for the iterative implicit solution of the drop-growth equation is thus the first candidate for optimisation (e.g., through employment of a faster-converging root-finding algorithm).
Arguably, the most striking feature depicted in Figure 10 is the order-of-magnitude speedup between serial execution times for CPU and the GPU execution times. Even compared to the four-thread OpenMP runs, the GPU backend offers a threefold speedup. It is worth reiterating here the two reasons why the particle-based scheme is particularly well-suited for GPUs. First, the large body of data defining the state of all particles never leaves the GPU memory (and the GPU-CPU transfer bandwidth is often a major issue for the performance of GPU codes). Here, all data that are transferred from the GPU are first gridded onto the Eulerian mesh before being sent from GPU to the main memory. Second, a significant part of the computations (i.e. everything but phase changes) may be computed asynchronously, leaving all but one CPU available for other tasks of the solver (one thread is busy controlling the GPU).
Finally, Figure 10 also depicts the linear scaling of the computational cost of the particle-based method with the number of computational particles (cf. section 5). Regardless of the backend choice, in-creasing the mean number of particles per cell from 8 to 32 to 128 gives a linear increase of wall-time as seen in the logarithmic scale of the plot.
The library is still at its initial stage of development, and ongoing work on its code is expected to result in shorter execution times.

Summary and outlook
The main goal behind the ongoing development of libcloudph++, as stated in Section 1, has been to develop and to offer the community a set of reusable software components of applicability in modern cloud modelling. Incorporation of the doublemoment bulk and the particle-based schemes suitable for studies on the interactions between clouds and aerosol makes the library applicable for research on the widely discussed indirect effects of aerosol on climate.
The implementation of the library was carried out having maintainability and auditability as priorities. This is reflected in: (i) the choice of C++ with its concise and modularity-encouraging syntax 5 ; (ii) the separation of code elements related to the schemes' formulation (formulae) from other elements of the library (API, numerics); (iii) the adoption of compile-time dimensional analysis for all physically-meaningful expressions in the code; (iv) the delegation of substantial part of the library implementation to external libraries (including the dimensional analysis, algorithm parallelisation and GPU hardware handling); (v) the hosting of library development and handling of code dissemination through a public code repository.
All above, supported by the choice of the GNU General Public License, underpins our goal of offering reusable code.
Development plans for the upcoming releases of libcloudph++ include: (i) enriching the set of usage examples; 5 As of current release, libcloudph++ consists of ca. 70 files with a total of ca. 5500 lines of code (LOC) of which ca. 1000 LOC are common to all schemes, and ca. 500, 1000 and 4500 LOC are pertaining to the single-moment, double-moment and particle-based schemes, respectively.
(ii) widening the choice of available schemes, also by integration of third-party codes; (iii) extending the spectrum of processes in the particle-based scheme (notably, by covering basic aqueous chemistry, droplet breakup and implementing more realistic collision kernels).
Python bindings to libcloudph++ offering analogous functionality as the original C++ interface are already included in the library code. Their description along with reports on further developments will be reported in forthcoming communications.

A Common concepts and nomenclature
This section presents some key elements of a mostly standard approach to analytic description of motion of moist air, particularly in context of modelling warm-rain processes. It is given for the sake of completeness of the formulation and to ease referencing particular equations from within the text and the source code.

Governing equations
There are three key types of matter considered in the model formulation and their densities ρ i and mass mixing ratios r i are defined as follows: The governing equations are the continuity equation for dry air, a conservation law for water vapour, and the thermodynamic equation (see e.g. Vallis, 2006, Sec. 1.6): where s andq represent entropy and heat sources, respectively (both defined per unit mass of dry air). The dot notation is used to distinguish variations due to transport and due to thermodynamic processes. It is assumed already in (32) that the presence of moisture and its transformations through phase changes do not influence the density of dry air. Dryair flow is assumed to act as a carrier flow for trace constituents. This assumption is corroborated by the fact that in the Earth's atmosphere 1 r v > r l .

System of transport equations
Equations (33) and (34) may be conveniently expressed as a pair of transport equations of a similar form to (32). A continuity equation for water vapour density ρ v is obtained by summing (33) · ρ d + r v · (32): Combining equation (34) with the definition of potential temperature θ : gives: At this point no assumption is made on the exact form of θ or c p . Summing (32) · θ c p + (37) · ρ d and ρ d θ · D Dt c p = ρ d θ ċ p results in a continuity equation for ρ d c p θ (akin to energy density): ∂ t (ρ d c p θ ) + ∇ · ( uρ d c p θ ) = ρ d θ ċ p +q/T (38) Resultant equations (35) and (38) share the form of a generalised transport equation (see Smolarkiewicz, 2006, sec. 4.1): representing transport of a quantity φ (equal to r v or c p θ ) by a dry-air carrier flow.

Dry air potential temperature
The way the potential temperature was defined in the preceding section gives a degree of freedom in the choice of θ andq. For moist air containing suspended water aerosol, assuming thermodynamic equilibrium and neglecting the expansion work of liquid water, ds may be expressed as (eq. 6.10-6.11 in Curry and Webster, 1999): where p d = ρ d R d T is the partial pressure of dry air, and the potential temperature θ is defined here as: (p 1000 = 1000hP a, note that the definition features the dry air pressure as opposed to the total pressure, see e.g. Bryan, 2008;Duarte et al., 2014). Substituting c p = c pd = const and θ = θ into equation (38) and employing the form ofq hinted with −dq in equation (40) gives: Neglecting of all but the l vṙ terms on the right-hand side results in an approximation akin to the one employed in Grabowski and Smolarkiewicz (1996) and used herein as well.
Another common choice of θ andq is obtained by putting θ = θ · exp ( −rvlv /c pd T ), what results in the l v dr v term becoming a part of c pd d(lnθ ) instead of −dq in equation 40 (see e.g. Grabowski and Smolarkiewicz, 1990, sect. 3).

Diagnosing T and p from state variables
The principal role of any cloud-microphysics scheme is to close the equation system defined by (35) and (42) with a definition ofṙ v linked with a representation of liquid water within the model domain. This requires representation of various thermodynamic processes that depend on temperature and pressure which are diagnosed from the model state variables (i.e. the quantities for which the transport equations are solved). With the approach outlined above, the model state variables are: r v water vapour mixing ratio θ potential temperature Assuming ρ d is known (solved by a dynamical core of a model), temperature and pressure may be diagnosed from r v and θ with:

B List of symbols
A list of symbols is provided in Table 2. [kg m -3 s -1 ] evaporation rate of rain (single-moment scheme) E(ri, rj) [1] collection efficiency Fin, Fout [kg m -3 s -1 ] fluxes of ρr through the grid cell edges K, K0 [J m -1 s -1 K -1 ] thermal conductivities of air K(ri, rj) [m 3 s -1 ] collection kernel lv0 = 2.5 × 10 6 [J kg -1 ] latent heat of evaporation at the triple point lv(T ) = lv0 + (cpv − c l ) · (T − T0) [J kg -1 ] latent heat of evaporation at a given temperature M [k] [m -3+k ] k-th moment of size spectrum The example simulations discussed in the text were performed with icicle -an implementation of all elements of the example modelling context presented in Section 2, that is: transport equation solver, 2D kinematic framework and simulation set-up.

Dependencies
The code of icicle depend on several libraries: libcloudph++'s sister project libmpdata++ (Jaruga et al., 2014), libcloudph++ itself and several components of the Boost 6 collection. The libmpdata++ components solve the transport equations for the Eulerian fields using the MPDATA algorithm (Smolarkiewicz, 2006) and provide data output facility using the HDF5 library 7 . Figure 11 presents dependency tree of icicle. Source code of icicle, libm-pdata++ and libcloudph++ is available for download at http://foss.igf.fuw.edu.pl/. All other icicle dependencies are available, for instance, as Debian 8 packages. All icicle dependencies are free (gratis) software, and all but CUDA (which is an optional dependency) are additionally libre -open sourced, and released under freedom-ensuring licenses.

Usage
Control over simulation options of icicle is available via command-line parameters. Most of the options correspond to the fields of the opts t structures of the three microphysics schemes discussed in the paper. A list of general options may be obtained by calling $ icicle --help and includes, in particular, the --micro option that selects the microphysics scheme. Options specific to each of the three available schemes are listed as in the following example: $ icicle --micro=lgrngn --help For the particle-based scheme, the options include such settings as the backend type (serial, OpenMP or CUDA) and the size ranges for which to output the moments of the particle size distribution.
Simulations may be stopped at any time by sending the process a SIGTERM or SIGINT signal (e.g., using the kill utility or with Ctrl+C). It causes the solver to continue integration up to the end of the Thrust CUDA or OpenMP Figure 11: A tree of libcloudph++'s and icicle's major dependencies. In addition to these libraries, several components require C++11 compiler and CMake at build time. current timestep, close the output file and exit. After executing the simulation, its progress may be monitored for example with top -H as the process threads' names are continuously updated with the percentage of work completed.