Interactive comment on “ MicroHH 1 . 0 : a computational fluid dynamics code for direct numerical simulation and large-eddy simulation of atmospheric boundary layer flows ” by Chiel C

The manuscript describes an anelastic atmospheric finite difference model, called MicroHH 1.0, with extensions to include buoyancy-driven turbulence. The code is open source, built on a C++ library, and uses MPI and CUDA. Some validation results are provided for doubly-periodic domains, with and without the moist turbulence model. Numerical convergence demonstrates 2nd-4th-order accuracy for the dry model, and results match idealized convective turbulence tests. Overall the manuscript is clearly written, reasonably comprehensive, and establishes a capability that can be used for


Introduction
In this paper, we present a description of MicroHH 1.0, a new computational fluid dynamics code for the simulation of turbulent flows in doubly periodic domains, with a focus on those in the atmosphere.MicroHH is designed for the direct numerical simulation (DNS) technique but also supports the large-eddy simulation (LES) technique.Its applications range from neutral channel flows to cloudy atmospheric boundary layers in large domains.MicroHH is written in C++ and the graphical processing unit (GPU)-enabled parts of the code in NVIDIA's CUDA.The simulation algorithms have been designed and are written from scratch with the goal to create a fast and highly parallel code that is able to run on machines with more than 10 000 cores.It is a key requirement for the code to be able to perform DNS at very high Reynolds numbers or to conduct LES at very fine grids (grid spacing less than 1 m), or in domains that approach the synoptic scales (beyond 1000 km).We decided to start from scratch, in order to be able to use C++ and its extensive possibilities in object-oriented and metaprogramming.Furthermore, the implementation of a dynamical core that is fully fourth order in space, which is very beneficial for DNS, but to retain the option to switch to second-order accuracy for LES, required a new code design.
Even though we started from scratch, many of the ideas are the results of our experiences with other codes.Here, DALES Published by Copernicus Publications on behalf of the European Geosciences Union.(Heus et al., 2010), UCLA-LES (Stevens et al., 2005), and PALM (Maronga et al., 2015), deserve a reference as Mi-croHH could not have been possible without those.
This paper is built up as follows: in Sect.2, we provide a full description of the governing equations of the dynamical core, and their numerical implementation is discussed in Sect.3. Subsequently, in Sect.4, we present the parameterizations and their underlying assumptions.Section 5 discusses the technical details of the code, and Sects.6 and 7 explain how to run the model and which output is generated.This is followed by a series of model tests on the validity and accuracy of the dynamical core in Sect.8, and a series of more applied atmospheric flow cases based on previous studies (Sect.9).Hereafter, the parallel performance is evaluated (Sect.10).Then, an overview of published work with MicroHH is presented (Sect.11), followed by the future plans (Sect.12) and the concluding remarks (Sect.13).Finally, there is a short description of where to get MicroHH, and where to find its tutorials and a selection of visualizations (see code availability section).

Dynamical core: governing equations
The dynamical core of MicroHH solves the conservation equations of mass, momentum, and energy under the anelastic approximation (Bannon, 1996).Under this approximation, the state variables density, pressure, and temperature are described as small fluctuations (denoted with a prime in this paper) from corresponding vertical reference profiles (denoted with subscript zero) that are functions of height only.This form of the approximation directly simplifies to the Boussinesq approximation if the reference density ρ 0 (z) is taken to be constant with height z.Consequently, MicroHH does not need separate implementations of Boussinesq and anelastic approximations.To facilitate the subsequent discussion of the conservation equations, we define the scale height for density H ρ based on the reference density profile: . (1)

Conservation of mass
The conservation of mass is formulated using Einstein summation as where u i represents the components of the velocity vector (u, v, w) and x i represents the components of the position vector (x, y, z).This formulation conserves the reference mass, as density perturbations are ignored in the equation (Lilly, 1996).

Thermodynamic relations and conservation of momentum
The thermodynamic relation between the fluctuations of virtual potential temperature, pressure, and density under the anelastic approximation is (see Bannon, 1996 for its derivation) where θ v is the perturbation virtual potential temperature, θ v0 the reference virtual potential temperature, p is the perturbation pressure, g is the gravity acceleration, and ρ is the perturbation density.
The corresponding momentum equation is written in the flux form in order to assure momentum conservation.The hydrostatic balance dp 0 /dz = −ρ 0 g has been subtracted, and Eq. ( 4) has been used to introduce potential temperature as the buoyancy variable to formulate the conservation of momentum as where δ is the Kronecker delta, ν is the kinematic viscosity, and vector F i represents external forces resulting from parameterizations or large-scale forcings.As Bannon (1996) showed, this formulation is energy-conserving in the sense that there is a consistent transfer between kinetic and potential energy.Under the Boussinesq approximation, the two equations simplify to

Pressure equation
The equation to acquire the pressure is diagnostic because density fluctuations are neglected in the mass conservation equation under the anelastic approximation (Eq.2).To simplify the notation, we define a function f (u i ) that contains all right-hand-side terms of Eq. ( 5), except the pressure gradient.To arrive at the equation that allows us to solve for the pressure, we multiply the equation with the base density ρ 0 and take its divergence.Conservation of mass ensures that the tendency term vanishes, and an elliptic equation for pressure remains: Under the Boussinesq approximation, the equation simplifies to In Sect.3, we explain how these equations are solved numerically.

Conservation of an arbitrary scalar
The conservation equation of an arbitrary scalar φ is written in flux form: where κ φ is the diffusivity of the scalar and S φ represents sources and sinks of the variable.

Conservation of energy
MicroHH provides multiple options for the energy conservation equation.The conservation equation for potential temperature for dry dynamics θ can be written as where κ θ is the thermal diffusivity for heat, and Q represents external sources and sinks of heat.A second option for moist dynamics is available.This has an identical conservation equation, but with liquid water potential temperature θ l rather than θ as the conserved variable (see Sect. 3.9 for details).
A third, more simplified mode, is available for dry dynamics under the Boussinesq approximation.Here, the equation of state (Eq.6) can be eliminated and the conservation of momentum and energy can be written in terms of buoyancy b ≡ (g/θ v0 ) θ v as with κ b as the diffusivity for buoyancy and Q b as an external buoyancy source.By using buoyancy, length and time remain as the only two dimensions, which proves convenient for dimensional analysis.In this formulation, θ v is the fluctuation of the virtual potential temperature with respect to the surface value θ v0 .The consequence is that the buoyancy increases with height in a stratified atmosphere, analogously to the virtual potential temperature (see Garcia andMellado, 2014, their Fig. B1 andvan Heerwaarden andMellado, 2016, their Fig.7a) With a slight modification to the definition of θ v , it is possible to study slope flows in periodic domains.We define θ v as the fluctuation with respect to a linearly stratified background profile θ v0 + (dθ v /dz) 0 z.The background stratification in units of buoyancy is N 2 ≡ (g/θ v0 ) (dθ v /dz) 0 .If we work out the governing equations again and introduce a slope α (x axis pointing upslope; see Fedorovich and Shapiro, 2009, their Fig. 1) in the x direction, we find where the evolution equation of v is omitted, as it contains no changes.

Grid
MicroHH is discretized on a staggered Arakawa C-grid, where the scalars are located in the center of a grid cell and the three velocity components are at the faces.The code can work with stretched grids in the vertical dimension.The grid is initialized from a vertical profile that contains the heights of the cell centers.The locations of the faces are determined consistently with the spatial order of the interpolations that are described in Sect.3.4.All spatial operators in the model, such as the advection and diffusion, default to the same order as the grid and can be overridden according to the user's wishes (see Sect. 6).
There is the option to apply a uniform translation velocity to the grid and thus to let the grid move with the flow.This so-called Galilean transformation is allowed as the Navier-Stokes equations are invariant under translation.It has the potential to allow for larger time steps and to increase the accuracy of simulations.

Three-dimensional fields
In order to solve the governing equations, MicroHH generates at initialization three-dimensional fields of the prognostic variables.These are the three velocity components (Eqs.5 or 7) and the thermodynamic variables (Eqs. 11,13,or 16).Furthermore, the user has the option to define additional passive scalars (Eq.10).Each of the prognostic fields has an additional three-dimensional field assigned to store its tendency (see Sect. 3.3).Furthermore, a diagnostic field is assigned for the pressure, as well as three or four additional ones for intermediate computations.Newly implemented physical parameterizations have the option to request additional threedimensional fields at initialization of the specific parameterization.
The generation of turbulence requires perturbations to the initial fields.MicroHH has two options to superimpose perturbations on any of the prognostic variables.These perturbations can be random noise of which the amplitude and location can be controlled, as well as two-dimensional rotating vortices with an axis aligned with the x or y dimension.The former option is the most commonly used method to start convective turbulence, whereas the latter is the default for neutral or stably stratified flows, which develop turbulence more easily from larger perturbations.

Time integration
The prognostic equations are solved using low-storage Runge-Kutta time integration schemes.Such schemes require two fields per variable: one that contains the actual value, which we denote with φ in this section, and one that represents the tendencies, denoted with δφ.The code provides two options: a three-stage, third-order scheme (Williamson, 1980) and a five-stage, fourth-order scheme (Carpenter and Kennedy, 1994).Both can be written in the same generic form in semi-discrete formulation as where f is a function that represents the computation of all right-hand-side terms, a n and b n are the coefficients for the Runge-Kutta method at stage n, and t is the time step.Expression f (φ n ) thus represents the actual tendency calculated using, for instance, Eqs. ( 5) or (10), whereas (δφ) n is a composite of the actual tendency and those from the previous stages.In low-storage form, the tendencies of the previous stage (δφ) n−1 are retained and multiplied with a n at the beginning of a stage, except for the first stage, where a 1 = 0.For the third-order scheme, the vectors a n and b n are For the fourth-order scheme, the vectors a and b are The reduced truncation error of the fourth-order scheme makes the scheme preferable over the third-order scheme under many conditions (see Sect. 8.2).The code can be run with a fixed t, as well as an adaptive time step based on the local flow velocities.

Building blocks of the spatial discretization
The spatial operators are based on finite differences.The code supports second-order and fourth-order accurate discretizations following Morinishi et al. (1998) andVasilyev (2000).From Taylor series, spatial operators can be derived that constitute the building blocks of more advanced operators, such as the advection and diffusion operators.In the following subsections, we describe the elementary operators and the composite operators that can be derived from them.We have selected a set of examples that cover the relevant operators.
We define two second-order interpolation operators, one with a small stencil and one with a wide stencil, as Interpolations are marked with a bar.The superscript indicates the spatial order (2) and the direction (x), and has an extra qualifier L when it is taken using the wide stencil.The subscript indicates the position on the grid (i, j ).The gradient operators, denoted with letter δ, are defined in a similar way: The fourth-order operators, written down in the same notation, are defined as The biased version of this operator (suffix b in the superscript) can be applied in the vicinity of the boundaries at the bottom and top.Here, we show the biased stencil that can be applied for vertical interpolation near the bottom: Note that we only write down the bottom boundary for brevity.
The centered and biased fourth-order gradient operators are and

Boundary conditions
The lateral boundaries in MicroHH are periodic.The bottom and top boundary conditions can be formulated in their most general form as the Robin boundary condition: MicroHH makes use of ghost cells in order to avoid the need of biased schemes for single interpolation or gradient operators near the wall.The values at the ghost cells are derived making use of the boundary conditions following Morinishi et al. (1998).The ghost cells for the Dirichlet boundary conditions in the second-order accurate discretization are whereas those for the Neumann boundary condition are In the case of the fourth-order scheme, we have two ghost cells, and therefore a second boundary condition is required.
Here, we set the third derivative equal to zero following Morinishi et al. (1998).For the Dirichlet boundary condition we then acquire the following expressions for the ghost cells: whereas in the case of a Neumann boundary condition, we find φ

Advection
We use the previously introduced notation to describe the more complex operators and expand them for illustration.The advection term is discretized in the flux form, where φ is an arbitrary scalar located in the center of the grid cell.In the second-order case, this gives the following discretization: The discretization of the advection of the velocity components (see Eqs. 5 and 7) involves extra interpolations as the following example illustrates: In the standard fourth-order scheme, the scalar advection in flux form is represented by Hereafter, we assume that operator notation is clear and only expand it where necessary.
Velocity interpolations, such as those in Eq. ( 39), still need to be performed with fourth-order accuracy (Eq.27) in order to be fourth-order accurate (see Morinishi et al., 1998 for details).The expression includes, for instance, a combination of second-and fourthorder interpolations.
To increase the overall accuracy of the second-order advection operator, there is an option available to only increase the interpolation part to fourth order:

Diffusion
We apply a discretization for diffusion that can be written as the divergence of a gradient, using the building blocks defined earlier in this section.As this operator is identical in all directions, we present it in one direction only: On an equidistant grid, this provides the well-known secondorder accurate operator for the second derivative: where x is the uniform grid spacing.
For the fourth-order accurate operator, a seven-point stencil is used: Whereas diffusion can be computed with fourth-order accuracy using a five-point stencil, we use a seven-point stencil, as it extends naturally to non-uniform grids as explained in Castillo et al. (1995).The usage of a seven-point stencil requires special care near the walls.In Fig. 1, we show an example of how the second derivative in the vertical direction is computed for a scalar at the first model level (green node in Fig. 1).The calculation of the divergence (Fig. 1, red stencil) requires the gradient located at the first face below the wall (lowest red node in Fig. 1), which can only be acquired using the biased gradient operator (Eq.30 and yellow stencil connected to lowest red node in Fig. 1).The extent of the complete stencil near the wall (white nodes; Fig. 1) is thus six points, rather than seven.

Pressure
Equations ( 8) and ( 9) are solved following the method of Chorin (1968).This is a fractional step method that first computes intermediate values of the velocity components for the next time step, based on all right-hand-side terms of the momentum conservation equation (Eq.5): with the intermediate velocity components denoted with an asterisk.
The velocity values at the next time step can be computed as soon as the pressure is known, using In order to compute the pressure, we multiply the previous equation with the reference density and take its gradient, ar-riving at where n indicates the spatial order, and the subscript i in superscript x i indicates that δ nx i is a divergence operator.The left-hand side equals zero due to mass conservation at the next time step (Eq.2).The resulting equation is the Poisson equation that is the discrete equivalent of Eq. ( 8).Rewriting this equation leads to To simplify the notation, we denote the left-hand-side term as ψ and the p/ρ 0 term on the right-hand side as π .Solving a Poisson equation is a global operation.Because the computed fields are periodic in the horizontal directions on an equidistant grid, and a Poisson equation is linear, we can perform a Fourier transform in the two horizontal directions: where Fourier-transformed variables are denoted with a hat, the spatial order of the operation with n, and the wavenumbers in the two horizontal dimensions x and y are l and m, respectively.Variables k2 * and l 2 * are the squares of the modified wavenumbers: and where the former is the modified wavenumber for the secondorder accurate solver and the latter is the wavenumber for the fourth-order one.Note that the coefficients correspond to those in Eqs. ( 46) and ( 47).Both expressions satisfy the limit lim x→0 k 2 * n = k 2 , where n is the order of the scheme.Solving Eq. ( 52) for π requires solving a banded matrix for the vertical direction in which the walls are located.This matrix is tridiagonal for the second-order solver and heptadiagonal for the fourth-order solver.For this, a standard Thomas algorithm (Thomas, 1949) is used.After the pressure is acquired, inverse Fourier transforms are applied and subsequently the pressure gradient term (see Eqs. 5 and 7) is computed for all three components of the velocity tendency.Note that the computation of the corrected velocity components does not require a boundary condition for pressure (see Vreman, 2014 for details).

Thermodynamics
MicroHH supports the potential (θ ) and liquid water potential (θ l ) temperature as thermodynamic variables (Sect.2.5).The dry (θ ) and moist (θ l ) thermodynamics are related through the use of a total specific humidity q t , which is defined as the sum of the water vapor specific humidity (q v ) and the cloud liquid water specific humidity (q l ).In the absence of liquid water, θ l = θ ; in the presence of liquid water, the liquid water potential temperature is approximated as (Betts, 1973) where L v is the latent heat of vaporization, c p the specific heat of dry air at constant pressure, and is the Exner function: where p is the hydrostatic pressure, p 00 a constant reference pressure, and R d the gas constant for dry air.The cloud liquid water content is calculated as where q s is the saturation specific humidity: with the ratio between the gas constant for dry air and the gas constant for water vapor (R d /R v ), and e s the saturation vapor pressure.The latter is approximated using a 10th order Taylor expansion at T = 0 • C of the Arden Buck equation (Buck, 1981).q l is adjusted iteratively to arrive at a consistent state where q v = q s .Finally, the virtual potential temperature (Eq.5) is defined in MicroHH as The base state pressure and density are calculated assuming a hydrostatic equilibrium: dp 0 = −ρ 0 gdz, with the density defined as ρ 0 = p 0 /(R d θ v0 ).Integration with height results in where θ v0 is the average virtual potential temperature between z k and z k+1 .This equation is applied from a given surface pressure to the model top, alternating the calculations at the full and half model levels.That is, given the full thermodynamic state (pressure and density) at a full level k, the thermodynamic state can be advanced from the half level k − 1 to k + 1 2 .Using the newly calculated state at k + 1 2 , pressure and density at k + 1 can be calculated.
The base state density ρ 0 that is used in the dynamical core (Sect.2) is calculated using the initial virtual potential temperature profile and is not updated during the experiment.The density and hydrostatic pressure used in the moist thermodynamics can optionally be updated every time step, following the same procedure as explained in Boing (2014).

Rotation
The effects of a rotating reference frame on an f plane can be included through the Coriolis force.The acceleration due to the Coriolis force F i,cor is computed for the two horizontal velocity components (indices 1 and 2 in Eqs. 5 and 7) as with f 0 as Coriolis parameter specified by the user.

Subfilter-scale model for large-eddy simulation
With the governing equations described in Sect. 2 it is possible to resolve the flow down to the scales where molecular viscosity acts.In many applications, however, such simulations are too costly.In that case, one may opt for large-eddy simulation (LES), where filtered equations are used to describe the largest scales of the flow, and the subfilter-scale motions are modeled.The LES implementation in MicroHH assumes very high Reynolds numbers in which the molecular viscosity is neglected.Filtering of the anelastic conservation of momentum equation (Eq.5), with a tilde applied to denote filtered variables, leads to In this equation, a tensor τ ij is defined as This is the anisotropic subfilter-scale kinematic momentum flux tensor.The isotropic part of the full momentum flux tensor has been added to the pressure, providing the modified pressure: As τ ij contains the filtered product of unfiltered velocity components, this quantity needs to be parameterized.Mi-croHH uses the Smagorinsky-Lilly (Lilly, 1968) model, in which τ ij is modeled as with K m interpreted as the subfilter eddy diffusivity.This quantity is modeled as and is proportional to the magnitude S ≡ 2S ij S ij 1 2 of the strain tensor S ij , which is defined as The subfilter eddy diffusivity thus takes into account the local stratification and the turbulent Prandtl number Pr t .The latter is set to one-third by default but can be overridden in the settings.The length scale λ is the mixing length defined following Mason and Thomson (1992), as 1 which is an arbitrary matching function (n is a free parameter, set to 2 in MicroHH) between the mixing length following wall scaling to the subfilter length scale (filter size) ≡ ( x y z) 1/3 , related to the grid spacing.The grid scale is used as an implicit filter; thus, no explicit filtering is applied.In the case of a high Reynolds number atmospheric LES with an unresolved near-wall flow, the vertical gradients of the horizontal velocity components ∂ u i,j /∂z in the strain tensor are replaced with the theoretical gradients predicted from Monin-Obukhov similarity theory.Evaluation of these gradients is explained in detail in Sect.4.2.
The same approach is followed for all scalars, including the thermodynamic variables discussed in Sect.2.5: The term R φ,j refers to the subfilter flux of φ and is defined as The subfilter-scale flux is parameterized in terms of the gradient Geosci.Model Dev., 10, 3145-3165, 2017 www.geosci-model-dev.net/10/3145/2017/

Surface model
The LES implementation of MicroHH uses a surface model that is constrained to rough surfaces and high Reynolds numbers, which is a typical configuration for atmospheric flows.This model computes the surface fluxes of the horizontal momentum components and the scalars (including thermodynamic variables) using Monin-Obukhov similarity theory (MOST) (Wyngaard, 2010, his Sect. 10.2).MOST relates surface fluxes of variables to their near-surface gradients using empirical functions that depend on the height of the first model level z 1 divided by the Obukhov length L as an argument.Length L is defined as where u * is the friction velocity, κ is the Von Karman constant, and B 0 is the surface kinematic buoyancy flux.L represents the height at which the buoyancy production/destruction of turbulence kinetic energy equals the shear production.In MicroHH, we use a local implementation of MOST, i.e., each grid point has its own value of L. This choice can lead to a overestimation of near-surface wind due to violation of the MOST assumption of horizontal homogeneity (Bou-Zeid et al., 2005, their Fig.18), but it allows for a more straightforward extension to heterogeneous land surfaces.
Following MOST, the friction velocity u * and the momentum fluxes may be related to the near-surface wind gradient as where U is defined as √ u 2 + v 2 , and u w and v w as the surface momentum fluxes for the two wind components.These relationships can be integrated from the roughness length z 0m to z 1 resulting in with f m defined as with m described in Eqs. ( 83) and ( 85).
The same procedure for scalars is followed, with and in integrated form: with with h described in Eqs. ( 83) and ( 85).
The functions φ m , φ h , m , and h are empirical and depend on the static stability of the atmosphere.Under unstable conditions, we follow (Wilson, 2001;Wyngaard, 2010): where ζ is the ratio of a height and the Obukhov length L, γ m = 3.6, and γ h = 7.9.Under stable conditions, we use (Högström, 1988;Wyngaard, 2010) where λ m = 4.8 and λ h = 7.8.With the equations above, the surface fluxes, surface values, and near-surface gradients can be computed but only if the Obukhov length L is known.The surface model calculates the Obukhov length by relating the dimensionless parameter z 1 /L to a Richardson number.The employed formulation of the Richardson number depends on the chosen boundary condition in the model.Three possible options are available: -The first option is fixed momentum fluxes and a fixed surface buoyancy flux.Both the friction velocity u * and the surface buoyancy flux B 0 are specified.Under these conditions, we define the Richardson number Ri a equal to z 1 /L; L can be computed directly from the expression -The second option is a fixed horizontal velocity U 0 at the surface and a fixed surface buoyancy flux B 0 .The friction velocity u * is unknown.Now, L needs to be retrieved from the implicit relationship: the buoyancy are given, and both u * and the surface buoyancy flux B 0 are unknown.L is then retrieved from In the event of the latter two options, a solver is required to find the value of L that satisfies the equation, as f m (Eq.78) and f h (Eq.81) both depend on L as well.For performance reasons, we have created a lookup-table-based approach that relates L to the Richardson number.The lookup table has 10 4 entries, of which 90 % is spaced uniformly between z 1 /L = −5 to 5. The remaining 10 % are used to stretch the negative range up to z 1 /L = −10 4 to allow for the correct free convection limit.

Pressure force
MicroHH provides two options to introduce a large-scale pressure force into the model.The first is to enforce a constant mass flux through the domain in the streamwise direction.In this approach, the desired large-scale velocity U f is set, and the corresponding pressure gradient is computed.We follow here the approach of van Reeuwijk (2007).In this approach, the u component of the horizontal momentum equation (Eq.5) is volume averaged to acquire where brackets indicate a volume average, f 1 contains all the right-hand-side terms of the u component of the conservation of momentum, except for the dynamic pressure, which is contained in the second term.The large-scale pressure force F p;ls , which is to be computed, is the last term.We can now set u n+1 = U f to explicitly set the volume-averaged velocity in the next time step.Furthermore, the volume-averaged horizontal pressure gradient vanishes, because of the periodic boundary condition, which makes F p;ls the only unknown.The acquired pressure force F p;ls will be added as an external force in the equation of zonal velocity (F 1 in Eqs. 5 and 7).
The second option is to enforce a large-scale pressure force through the geostrophic wind components u g and v g , in combination with rotation, with the accelerations of the two horizontal velocity components F i,p;ls calculated as where u g;k and v g;k are user-specified vertical profiles of geostrophic wind components.

Large-scale sources and sinks
Large-scale sources and sinks, representing, for instance, large-scale advection or radiative cooling, can be prescribed for each variable separately.The user has to provide vertical profiles of large-scale sources and sinks S φ;ls that are added to the total tendencies.

Large-scale vertical velocity
A second method of introducing large-scale thermodynamic effects is through the inclusion of a large-scale vertical velocity.In this case, each scalar gets an additional source term S φ,w,ls of the form where w ls;k is a user-specified vertical profile of large-scale vertical velocity and φ k is the horizontally averaged vertical profile at height z k for scalar φ.The tendency term is not applied to the momentum variables.

Buffer layer
MicroHH has the option to damp gravity waves in the top of the simulation domain in a so-called buffer layer.The source term S φ,buf associated with the damping at grid cell i, j, k is calculated for an arbitrary variable φ as where φ 0 is taken from a user-specified vertical reference profile, and timescale τ d is a measure for the strength of the damping.It varies with height and is calculated at height z k following where σ is the damping frequency chosen by the user and β an exponent that describes the shape of the vertical profile of the damping frequency, which is always zero at the bottom (z b;bot ) and σ at the top (z b;top ) of the layer.
5 Technical details of the code

Code structure
MicroHH is written in C++ and uses elements of objectoriented programming and template metaprogramming.The code components are written in classes that define the interface.Inheritance is used to allow for specializations of classes.This way of organizing the code has two advantages: it minimizes switches and it allows code components and their extensions to reside in their own file, which increases code clarity and facilitates the merging of new code Geosci.Model Dev., 10, 3145-3165, 2017 www.geosci-model-dev.net/10/3145/2017/extensions.High performance of computational kernels is achieved by executing kernels in their own function, with explicit inclusions of the restrict keyword to notify the compiler that fields do not overlap in memory.Furthermore, compiler-specific pragmas are used to aid vectorization on Intel compilers.

GPU
MicroHH is enabled to run on fast and energy-efficient graphical processing units (GPUs).This promising technique has been pioneered in atmospheric flows by Schalkwijk et al. (2012) and has shown its potential in weather forecasting (Schalkwijk et al., 2015).The implementation is based on NVIDIA's CUDA.The CPU and GPU version reside in the same code base, where the GPU implementation is activated with the help of precompiler statements.The philosophy is that the entire model is initialized on the CPU and that the GPU implementation is only activated right before starting the main time loop.At that moment, the required fields are copied in double precision accuracy to the GPU, and the entire time integration is done there.Synchronization only happens when statistics have to be computed or when restart files or cross sections of flow fields are saved to disk, to ensure high performance.The design choice to do the entire initialization on the CPU minimizes the amount of GPU code and therefore allows for maintaining a single code base for the CPU and GPU code.

Parallelization
The code uses the Message Passing Interface (MPI) in order to run on a large number of cores.The three-dimensional simulation domain is split into vertically oriented columns standing on a two-dimensional grid.The code assigns one MPI task to each grid column using the MPI_Cart_create function and uses this grid to detect the IDs of neighboring processes.In order to avoid complex packing routines, we make use of MPI datatypes wherever possible.The MPI calls are hidden in an interface to avoid the need to manually write MPI calls.The input/output (IO) is entirely based on MPI-IO, the parallel IO framework that comes with MPI, to ensure that three-dimensional fields and two-dimensional cross sections are stored as single files.We have chosen MPI-IO in order to limit the number of files resulting from simulations on a large number of processes and to allow for restarts on a different number of processes.In order to keep complexity of the IO as low as possible, we make use of the MPI_Sub_array function in combination with MPI_File_write_all in order to write the fields.

External dependencies
MicroHH depends on several external software tools or libraries.It uses the CMake (https://cmake.org)build system for the generation of Makefiles.CMake allows for parallel builds, which minimizes the compilation time, and it is easy to add configurations for different machines.Furthermore, the FFTW3 library (Frigo and Johnson, 2005) is used for the computation of fast Fourier transforms.The statistical routines make use of netCDF software developed by UCAR/Unidata (http://doi.org/10.5065/D6H70CW6).In order to run the provided test cases and their output scripts, a Python (https://www.python.org)installation including the NumPy (van der Walt et al., 2011; http://www.numpy.org)and Matplotlib (Hunter, 2007; https://matplotlib.org)modules is required.Automatic documentation generation can be done using Doxygen (http://doxygen.org),but this is optional.

Running simulations
In order to run a simulation with MicroHH, a sequence of steps needs to be taken.Each simulation has an .inifile that contains the settings of the simulation, a .proffile that contains the (initial) vertical profiles of all required variables, and, if time-varying boundary conditions are desired, a file with the prescribed time evolution for all timevarying boundary conditions.MicroHH provides a document (doc/input.pdf) that contains an overview of all possible options that can be specified in the .inifile.
To prepare a simulation with the name test_simulation, MicroHH needs to be run in the following way: ./microhh init test_simulation where it is assumed that test_simulation.iniand test_simulation.profare available in the directory where the command is triggered.This procedure will create the initial fields of all prognostic variables and save the required fields for those model components that need to save their state to guarantee bitwise identical restarts.
After the previously described initialization phase, the execution of ./microhhrun test_simulation will start the actual simulation.Depending on the chosen output intervals, the simulation will create restart files, statistics, cross sections, and field dumps.MicroHH can restart from any time at which the restart files are saved.
The last mode in which the code can run is the postprocessing mode.By running ./microhhpost test_simulation the code will generate statistics from saved restart files at a specified time interval.This mode allows the user to create new statistics and calculate those from saved data without having to rerun the simulation.Figure 2. Convergence of the spatial discretization error in the twodimensional Taylor-Green vortex.Subscript 2 indicates the secondorder scheme, subscript 4 the most accurate fourth-order scheme, and subscript 4M the fully energy-conserving fourth-order scheme.The dashed black line is the reference for second-order convergence; the dotted black lines indicate fourth-order convergence.
7 Model output

Statistics
A large set of output statistics can be computed during runtime at user-specified time intervals.The statistics module provides vertical profiles of means, second-, third-and fourth-order moments of all prognostic variables, as well as advective and diffusive fluxes.Furthermore, there are multiple diagnostic variables, such as the pressure, the pressure variance, and its vertical flux.The thermodynamics generate their own statistics based on the chosen option.The moist thermodynamics provides a large set of cloud statistics.There is a separate module for budget statistics, which provides the budgets of all components of the Reynolds stress tensor, and those of the variance and vertical flux of the thermodynamic variables.
One of the key features of the MicroHH statistics routine is that an arbitrary mask can be passed into the routine over which the statistics are calculated.This allows, for instance, for a very simple way of computing conditional statistics in updrafts or clouds, which is demonstrated later in Sect.9.2.

Two-and three-dimensional output
It is possible to save two-dimensional cross sections and three-dimensional fields of any of the prognostic and diagnostic variables of the model, as well as of derived variables.This output can be made at user-specified time intervals.Cross sections can be made in any chosen xy, xz, and yz planes.Derived variables (any arbitrary function of existing model variables), can be easily added to the code by the user.

Validation of the dynamical core
In this section, we present a series of cases intended to validate MicroHH under a wide range of settings.Each of these test cases is available in the cases/ directory in the Mi-croHH repository, where all detailed settings can be found (see code availability section).Below, we present only the most relevant information per case.

Taylor-Green vortex
The two-dimensional Taylor-Green vortex (cases/taylorgreen) presents an ideal test case for a dynamical core, as it has an analytical solution even though it is nonlinear.This flow is composed of two rotating vortices whose evolution in a domain [0, 1; 0, 0.5] is described with where f (t) = 8π 2 νt.We use the analytical form at t = 0 as the initial condition and run this case for one vortex rotation (t = 1), with Geosci.Model Dev., 10, 2017 www.geosci-model-dev.net/10/3145/2017/−0.12 0.00 0.12 0.24 0.36 0.48 u [m s ν = (800π 2 ) −1 .We compare the result against the analytical solution for a set of grid spacings and with the second-order and fourth-order dynamical cores; for the latter we compare the most accurate advection scheme and the fully energyconserving one.
Figure 2 shows the error convergence of the simulations.The error for a variable φ is computed as x z φ i,k − φ ref,i,k , over the two-dimensional domain, where x and z are the uniform grid spacings used in this case and φ ref is the analytical solution.All variables converge according to the order of the numerical scheme.The fourthorder dynamical core loses accuracy at fine grid spacings.This is due to the boundary condition for the vertical velocity that sacrifices an order of accuracy to ensure global mass conservation (Morinishi et al., 1998).

Kinetic energy conservation and time accuracy
The second test of the dynamical core consists of combined evaluation of kinetic energy (KE ≡ 1 2 u 2 + v 2 + w 2 ) conservation and time accuracy (cases/conservation).In this experiment, we run the model with only the advection and pressure solver enabled and advect random noise through the domain for 1000 s.These tests have been conducted with the third-and fourth-order Runge-Kutta schemes.We have chosen the fourth-order spatial discretization in order to demonstrate its energy conservation.
The loss of kinetic energy as a function of time is shown in Fig. 3a.The fourth-order scheme results in a smaller energy loss for the same time step and a faster convergence.The error-convergence plot (Fig. 3b) shows that the error convergence is in accordance with the order of the respective scheme.Furthermore, it illustrates the fact that, if high accuracy in time is desired, the five-stage, fourth-order scheme is less expensive than the three-stage, third-order scheme.For instance, at a t of 10, the error of the fourth-order scheme is approximately equal to the error of the third-order scheme at a t of 2.5.To reach this accuracy, the fourth-order scheme needs only 5 steps per 10 time units, whereas the third-order scheme needs 12 steps.

Laminar anabatic flow
To test the buoyancy routine and the option to put the domain on a slope, a Prandtl-type anabatic slope flow (Prandtl, 1942) has been simulated (cases/prandtlslope).In this test case, the surface is inclined at an angle of 30 • and a linearly stratified atmosphere (N = 1 s −1 ) is heated from below with a fixed surface buoyancy flux of 0.005 m 2 s −3 .The fluid, which was initially at rest, goes through a series of decaying oscillations after the buoyancy flux is applied at the surface.Eventually, it reaches the steady state corresponding to the Prandtl model solution.Numerical integration was performed sufficiently long for the oscillation amplitude to become a small fraction of the amplitude of the first oscillation.Comparison of horizontal wind u and buoyancy b of analytical and numerical solutions is shown in Fig. 4. For both variables, the solutions closely agree with each other.
. Budgets of variances and turbulence kinetic energy (TKE, defined as u 2 + v 2 + w 2 /2) compared against Moser et al. (1999)'s reference data at a Reynolds τ of 590.Height z is normalized with u τ /ν; the variances and TKE budget with ν/u 4 τ .

Turbulent channel flow
For fully turbulent flows, the numerical solutions cannot be compared against any analytical test cases.Therefore, we validate our results against a channel flow at a Reynolds τ number of 590 (Moser et al., 1999) for means, variances, spectra, and second-order budget statistics (cases/moser590).The case is run at a resolution of 768 × 384 × 256 grid points.The original numerical simulation data of Moser et al. (1999) have been produced on a 384 × 384 × 256 grid with spectral schemes in the horizontal dimensions and Chebyshev polynomials in the vertical.
Figure 5a shows the normalized horizontally averaged streamwise velocity.The normalized rms values of all three velocity components are presented in Fig. 5b.All plotted variables show a perfect match with the data and are indistinguishable from Moser's data.In order to further assess the accuracy of the data, we show the second-order budgets of the variances in Fig. 6.Also here, the match with the reference data is excellent, which indicates that the whole range of spatial scales in the flow is represented well and that the fourth-order scheme is well able to pick up the small-scale details of the flow.
The findings in the previous paragraph are further corroborated by the spectra shown in Fig. 7.Over the whole range of scales, the match between our simulation and that of Moser et al. (1999) is excellent.Note that the spectra from Moser's simulation display an increase in pressure variance at the highest wavenumbers.This increase is the result of aliasing errors at high wavenumbers that are typical for the spectral schemes that Moser et al. (1999) used.

Turbulent katabatic flow
The final evaluation of the dynamical core without parameterizations enabled is based on the direct numerical simulation of a turbulent katabatic flow.Here, a buoyancy-driven slope flow is simulated following the setup of Fedorovich and Shapiro (2009) (cases/drycblslope).A flow over a slope inclined at an angle α of 60 • is simulated with a fixed uniform surface buoyancy flux of −0.5 m 2 s −3 .The simulation is performed in a domain of 0.64 m × 0.64 m × 1.6 m using a uniform grid of 256 × 256 × 640 points.The initial state is a fluid at rest with a linear buoyancy stratification N of 1 s −1 .No-slip boundary conditions are applied at the bottom; free-slip at the top.
Turbulent motion starts quickly after the buoyancy flux is applied at the surface.It is characterized by random, largeamplitude fluctuations of velocity and buoyancy fields in the near-slope core region and shows quasi-periodic oscillatory behavior at larger distances from the slope.Mean profiles of the along-slope velocity component and buoyancy, as well as profiles of second-order turbulence statistics, such as kinematic turbulent fluxes of momentum and buoyancy, and velocity component and buoyancy fluctuation variances, were evaluated by averaging the simulated flow fields spatially over the along-slope planes and temporally over five oscillation periods beyond the transition stage.For comparison, the same katabatic flow case was reproduced using the numerical code (hereafter referred to as FS09) that was employed to simulate turbulent slope flows in Shapiro andFedorovich (2008) andFedorovich andShapiro (2009).In that code, the time advancement was performed with an Asselin-filtered secondorder leapfrog scheme (Durran, 2013).The spatial discretizations are identical to the second-order accurate ones of Mi-croHH.
Numerical results obtained with both numerical codes testify that stable environmental stratification in combination with negative surface buoyancy forcing in the katabatic flow leads to an effective suppression of vertical turbulent exchange in the flow region adjacent to the slope.This suppression results in a shallow near-surface sublayer with strong buoyancy gradients (Fig. 8a) capped by a narrow jet with peak velocity located very close to the ground (Fig. 8b).Further comparison has been performed on the slope-normal fluxes of momentum and buoyancy (not shown), where a nearly perfect match has been reproduced as well.9 Validation of atmospheric large-eddy simulations

Dry convective boundary layer with strong inversion
As a first test case of MicroHH in LES mode, we present that of Sullivan andPatton (2011) (cases/sullivan2011).This is a dry clear convective boundary layer that grows into a linearly stratified atmosphere with a very strong capping inversion (see Fig. 9a).The system is heated from the bottom by applying a constant kinematic heat flux of 0.24 K m s −1 .The domain size is 5120 m × 5120 m × 2048 m.Gravity wave damping has been applied in the top 25 % of the domain.Simulations have been run for 3 h with three spatial resolutions.The time-averaged profiles have been calculated over the last hour.
The results show the formation of a well-mixed layer with an overlying capping inversion (see Fig. 9a) and the associated linear heat flux profile with negative flux values in the entrainment zone (see Fig. 9b).The change of both quantities with resolution highlights the intrinsic challenge in resolving a boundary layer with an inversion layer that is stronger than the numerical schemes can accurately resolve.At coarse resolution, the strong inversion leads to an unphysical overshoot of the potential temperature flux above the boundary layer top (see Fig. 9b).This overshoot leads to a numerical mixed layer on top of the entrainment zone that vanishes with increasing resolution.

BOMEX
The BOMEX shallow cumulus case (Siebesma et al., 2003) (cases/bomex), S03 hereafter, provides the opportunity to evaluate the moist thermodynamics (see Sect. 3.9) and large-scale forcings.We have repeated the case as described in S03 at the original resolution of the study (100 m × 100 m × 40 m) and at a higher resolution (10 m × 10 m × 9.375 m).
This case produces non-precipitating shallow cumulus.It has a large-scale cooling applied that represents radiation, as well as a large-scale drying to allow the atmosphere to relax to a steady state.In addition, a large-scale vertical velocity is applied over a certain height range to reproduce the appropriate synoptic conditions.
The simulation is run for 6 h.Statistics are recorded during the final hour, including conditional statistics for the cloudcovered fields (q l > 0) and for the cloud cores (q l > 0 and θ v > 0).The vertical profile of area coverage and profiles of horizontally averaged liquid water potential temperature θ l , total water q t , and vertical velocity w are shown in Fig. 10.All mean and conditionally sampled statistics at the original resolution are predominantly within 1 standard deviation of the ensemble mean of data from all models that participated www.geosci-model-dev.net/10/3145/2017/Geosci.Model Dev., 10, 3145-3165, 2017 in S03.The horizontally averaged vertical velocities in the cloud and cloud core decrease considerably with an increase in resolution.

GABLS1
To evaluate the LES mode for stable atmospheric conditions, the GABLS1 LES intercomparison case (Beare et al., 2006) (cases/gabls1) was reproduced.The boundary layer develops in this case from a shallow, well-mixed layer into a weakly stable boundary layer, driven by a prescribed negative tendency of the surface temperature over a total integration time of 9 h.The Boussinesq approximation is used, the advection scheme uses fourth-order accurate interpolations (Eq.27), and the Smagorinsky subgrid turbulence scheme is set up with a Smagorinsky constant equal to 0.12 and a subgrid turbulent Prandtl number of unity.The experiments are performed at two different resolutions with grid cells of 2 and 6.25 m, and compared to the models which participated in the study of Beare et al. (2006).
Figure 11 shows the domain and time-averaged (over a period from 28 800 to 32 400 s) vertical profiles of potential temperature ( θ ) and the velocity component ( u ), and also time series of the boundary layer depth (z ABL ) and friction velocity (u * ).At the largest grid spacing of 6.25 m, it takes approximately 2 h for the flow to become turbulent, as is evident from the delayed boundary layer growth and abrupt changes in u * .Nonetheless, typical features like the low-level jet (Fig. 11b) are well reproduced, and all statistics are predominantly within the range of results from Beare et al. (2006).With the grid spacing reduced to 2 m, the flow becomes turbulent nearly instantaneously, but the resulting boundary layer depth and surface friction velocity are on the low side compared to the five models from Beare et al. (2006) which were run at this resolution.
Figure 10.BOMEX LES intercomparison (S03).Shown the domain mean, and conditionally sampled cloud (q l > 0) and cloud core (q l > 0 and b − b > 0) vertical profiles of (a) area coverage of cloud and cloud core, (b) liquid water potential temperature, (c) total specific humidity, and (d) vertical velocity.The results are averaged over t = 18 000-21 600 s.The shaded area denotes the mean ±1 standard deviation of the participating models from S03; the solid and dashed lines the results from MicroHH, using the original (solid) and a higherresolution (dashed) setup.The strong-scaling experiment shows that increasing the number of processors leads to faster simulations.The speedup is initially close to linear, but each consecutive increase in the number of cores makes the model less efficient.Based on these results, we conclude that for the chosen test case and for the used supercomputers, a work load of approximately 2 × 10 6 grid points per core is the best balance between speed and computational efficiency.
The weak scaling shows almost 90 % efficiency going from 512 to 8192 cores; beyond that, the scaling falls off to 80 %.This can be explained by physical properties of the machine; beyond 8192 cores, a simulation no longer fits on one midplane (a physical unit consisting of 8192 cores), leading to slower communication.

Performance GPU (CUDA) implementation
The GPU implementation of MicroHH allows for fast simulations on a single device.The current state-of-the-art GPUs feature 12 GB of memory; thus, simulations of maximally 512×512×512 grid points of a flow with three velocity components, pressure, two scratch fields for temporary storage, and a single scalar fit in memory.Within this experiment, we compare thus GPU simulations that do not need communication against CPU simulations that require communication between cores and nodes.The reason for doing so is that nearly all of the simulations of the presented results in Sects.8 and 9 fit within the memory of a single GPU.
To test the performance of such simulations, the performance of MicroHH on an NVIDIA Quadro K6000 (using CUDA 6.5) has been compared against the Max Planck Institute for Meteorology's cluster Thunder (two Intel Xeon E5-2670 CPUs per node, 16 cores per node).Three benchmark cases have been chosen: the BOMEX moist convection case on grids of 64 3 , 128 3 , 256 3 , and 512 2 × 384, and the channel flow cases of Moser et al. (1999) at Reynolds τ numbers of 180 and 590.The results shown in Table 1 point to the great potential of GPU computing.For the considered cases, which all fit on a single GPU, it takes at least 32 cores to reach equal performance.Only at 64 cores, the CPU simulations are notably faster.Therefore, for simulations that fit into its memory, the GPU provides an excellent alternative for the CPU, especially because the GPU is very energy efficient.

Published studies
To date, several studies have been published that make use of MicroHH or data generated with MicroHH.The work of van Heerwaarden et al. (2014) focused on the scaling of flow over heterogeneously heated land surfaces using DNS and LES, Gentine et al. (2015) used LES to study the structure of the inversion of a convective boundary layer, van Heerwaarden and Mellado (2016) developed scaling laws for the convective boundary layer over a surface with a constant temperature from DNS data, McColl et al. (2017) improved surfacelayer similarity under mildly convective conditions with the help of DNS data, and Umphrey et al. (2017) used DNS data produced with MicroHH as a reference for their simulations of slope flow.
12 Future plans There are several ongoing projects to extend the model.Currently, a parameterization for microphysics has been developed, and an interactive land surface model is under devel- opment.In addition, the immersed boundary method following Tseng and Ferziger (2003) is being implemented to allow for simulations of flow over obstacles, urban canopy, and hills.Furthermore, preliminary experiments have been performed to include a domain-specific language (DSL) to enable the expression of complex finite difference operators in simple and compact syntax (https://github.com/microhh/stencilbuilder/).This development has shown great potential for two reasons.First, the DSL prevents implementation errors, as the explicit indexing in computational kernels with spatial operators can be omitted.Second, the DSL allows for simple implementation of system-specific tuning, such as loop tiling or OpenMP.

Conclusions
This paper has presented a full description of MicroHH, a new computational fluid dynamics code for simulations of turbulent flows in the atmospheric boundary layer.The governing equations and their implementation have been presented, and a broad validation under a wide range of settings has been shown.MicroHH delivers the expected error convergence of the spatial and temporal schemes, and has proven to be mass, momentum, and energy conserving.The code delivers good performance in weak-and strong-scaling experiments.Its current limitations are the absence of horizontal boundary conditions other than periodic, and the limited set of available physical parameterizations.Both limitations will be addressed in future versions of the code.
with a, b, and c as constants.This gives the Dirichlet boundary condition when a = 1, b = 0, and the Neumann boundary condition when a = 0, b = 1.

Figure 1 .
Figure1.Schematic of the diffusion discretization near the wall.The green node is the evaluation point at the center of the first cell above the wall, the red nodes are the stencil of the divergence operator, and yellow nodes show the stencils of the four gradient operators over which the divergence is evaluated.White nodes indicate the extent of the stencil.

)-
The third option is a fixed surface velocity U 0 and a fixed surface buoyancy b 0 .With this boundary condition, the surface values of the horizontal velocities and www.geosci-model-dev.net/10/3145/2017/Geosci.Model Dev., 10, 3145-3165, 2017

Figure 3 .
Figure 3.Time evolution of the kinetic energy change KE during 1000 time units of random noise advection for the RK3 and RK4 time integration schemes with three different time steps (a).Kinetic energy change convergence of the temporal discretization for the RK3 and RK4 schemes (b).

Figure 4 .
Figure 4. Normalized numerical Prandtl model solutions for velocity u (a) and buoyancy b (b) compared to their analytical counterparts.

Figure 5 .
Figure 5. Velocity means (a) and variances (b) for Moser et al. (1999) channel flow case at a Reynolds τ of 590.The dashed vertical lines mark the spectra locations.Height z is normalized with u τ /ν; velocities with u −1 τ .

Figure 7 .
Figure 7. Spectra of all velocity components and pressure compared against Moser et al. (1999)'s reference data at a Reynolds τ of 590.The velocity spectra are normalized with u −2 τ ; the pressure spectra with u −4 τ .

Figure 8 .
Figure 8. Profile of the mean along-slope velocity (a) and buoyancy (b) as predicted by MicroHH and FS09.

Figure 9 .
Figure 9. Vertical profiles of horizontally averaged potential temperature (a) and normalized kinematic heat flux (b).The boundary layer depth z i is the location of the maximum vertical gradient in the potential temperature profile shown in (a).

Figure 11 .
Figure11.GABLS1 LES intercomparison(Beare et al., 2006).Shown are the vertical profiles of (a) potential temperature and (b) u component of the velocity, and time series of the (c) boundary layer depth and (d) surface friction velocity.The shaded areas show the range in the results from the models that participated in theBeare et al. (2006) study.The dotted black lines show the initial conditions.

Figure 12 .
Figure 12.Speedup from a strong-scaling experiment (a).Efficiency from a weak-scaling experiments (b).Black lines indicate perfect speedup and efficiency.The dashed red lines show the efficiency change relative to the previous measurement.

Table 1 .
Speedup of GPU simulation compared to respective CPU simulation performed on n cores.