A flexible importance sampling method for integrating subgrid processes

Numerical models of weather and climate need to compute grid-box-averaged rates of physical processes such as microphysics. These averages are computed by integrating subgrid variability over a grid box. For this reason, an important aspect of atmospheric modeling is spatial integration over subgrid scales. The needed integrals can be estimated by Monte Carlo integration. Monte Carlo integration is simple and general but requires many evaluations of the physical process rate. To reduce the number of function evaluations, this paper describes a new, flexible method of importance sampling. It divides the domain of integration into eight categories, such as the portion that contains both precipitation and cloud, or the portion that contains precipitation but no cloud. It then allows the modeler to prescribe the density of sample points within each of the eight categories. The new method is incorporated into the Subgrid Importance Latin Hypercube Sampler (SILHS). The resulting method is tested on drizzling cumulus and stratocumulus cases. In the cumulus case, the sampling error can be considerably reduced by drawing more sample points from the region of rain evaporation.


Introduction
Coarse-resolution atmospheric models of weather and climate do not solve differential equations; they solve integro-differential equations, that is, equations containing both derivatives and integrals. Although a derivation of an atmospheric model starts with differential equations, such as the Navier-Stokes or advection-diffusion equations, those equations are coarse-grained or filtered before being discretized (e.g., Leonard, 1974; Pope, 2000). Typically, a spatial running-mean filter is used, producing equations similar to Reynolds-averaged equations (e.g., Germano, 1992). Each term in the filtered equations is spatially averaged over a grid box. For instance, in a prognostic equation for grid-averaged rain mixing ratio, the grid-averaged rain is updated by grid-averaged microphysical process rates. Schematically, we may give an example of such a filtered equation:

$$\frac{\partial \overline{r}}{\partial t} = \overline{h}, \qquad (1)$$

where the overbar denotes a grid average, r denotes rain mixing ratio, t denotes time, and h denotes the microphysical time tendency of rain mass mixing ratio. Because a grid-box average is an integral (divided by the grid-box volume), the resulting filtered equations are integro-differential equations. Therefore, a central problem in atmospheric modeling is (subgrid-scale, spatial) integration. Mathematically, the problem is to evaluate integrals of the form

$$\overline{h} \equiv \int h(\boldsymbol{x})\, P(\boldsymbol{x}) \, \mathrm{d}\boldsymbol{x}, \qquad (2)$$

where x is a vector containing the relevant model fields, h(x) is the time tendency of a physical process, such as autoconversion of cloud droplets to form rain drops, and P(x) is the model-predicted subgrid-scale probability density function (PDF) (i.e., "filtered density function", Colucci et al., 1998) of the variables. Here, h(x) could be a simple analytic function or a complex numerical subroutine. An integral such as Eq. (2) ought to be computed for each of the many nonlinear process rates in the model. (However, when the grid box is assumed to be uniform, the integral need not be performed.)
Published by Copernicus Publications on behalf of the European Geosciences Union.
The integrals also need to be computed for each grid column in the horizontal and each grid level in the vertical.
To carry out this integration (i.e., "quadrature"), researchers have proposed several methods. First, the integral (Eq. 2) may be evaluated analytically (e.g., Zhang et al., 2002; Larson and Griffin, 2006; Morrison and Gettelman, 2008; Cheng and Xu, 2009; Griffin and Larson, 2013; Larson and Griffin, 2013; Lebsock et al., 2013; Boutle et al., 2014). Analytic integration has the advantage of accuracy, but it can be carried out only if both the process rate h(x) and the subgrid-scale PDF P(x) are sufficiently simple. Furthermore, analytic integration is carried out grid level by grid level, and does not compute the vertical overlap of cloud properties. Vertical overlap is related to the correlation between quantities at two points in space, one located directly above the other. The degree of vertical overlap has a strong influence on, e.g., radiative transfer. Second, the integrals may be computed by deterministic quadrature (Xiu, 2009; Golaz et al., 2011; Chowdhary et al., 2015). Deterministic quadrature solves an integral by computing a weighted sum of integrand values evaluated at specially chosen quadrature points. Deterministic quadrature has a couple of advantages: unlike analytic integration, it is applicable to a broad range of processes, and like analytic integration, it is accurate. It also has a disadvantage: it does not compute vertical overlap. Third, the integrals can be evaluated by Monte Carlo integration (e.g., Gentle, 2003; Kalos and Whitlock, 2008). In Monte Carlo integration, random samples are drawn from the subgrid PDF P(x), the integrand is evaluated at each sample point, and the resulting values are suitably averaged. Monte Carlo integration is broadly applicable and can be configured to model vertical overlap (Barker et al., 2002, 2008; Pincus et al., 2003, 2006; Räisänen et al., 2004, 2005, 2007, 2008; Räisänen and Barker, 2004; Larson et al., 2005; Larson, 2007; Hill et al., 2011; Larson and Schanen, 2013; Tonttila et al., 2013, 2015). However, Monte Carlo integration converges slowly. Obtaining an accurate integration requires many costly evaluations of a microphysics parameterization.
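To make the baseline concrete, the following minimal sketch illustrates plain Monte Carlo estimation of a grid-box average of the form of Eq. (2). The process rate h and the PDF P used here are illustrative stand-ins (a quadratic rate and a standard normal), not CLUBB's quantities.

```python
import numpy as np

rng = np.random.default_rng(0)

def h(x):
    # Illustrative nonlinear "process rate": a stand-in for a
    # microphysics subroutine such as autoconversion.
    return np.maximum(x, 0.0) ** 2

def monte_carlo_mean(n_samples):
    # Draw samples from an assumed subgrid PDF P(x); a standard
    # normal is used here purely for illustration.
    x = rng.standard_normal(n_samples)
    # The grid-box average is estimated by the sample mean of h
    # evaluated at points drawn from P(x).
    return h(x).mean()

# The error shrinks only like 1/sqrt(N_s), so many costly
# evaluations of h are needed for an accurate estimate.
print(monte_carlo_mean(100), monte_carlo_mean(100_000))
```

For this toy choice the exact answer is 0.5; the small sample is visibly noisier than the large one, illustrating the slow convergence discussed above.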
To improve the convergence of Monte Carlo integration, many methods have been proposed. Two broad strategies are stratified sampling and importance sampling (Press et al., 2007; Lemieux, 2009). Stratified sampling spreads out the sample points in sample space in order to avoid clumping, which leads to poor sampling. One popular stratified sampling method is Latin hypercube sampling, which stratifies along each dimension of the integral (e.g., McKay et al., 1979; Owen, 2003). Another strategy, importance sampling, preferentially places sample points in important regions of the integration domain (Press et al., 2007; Lemieux, 2009). For instance, extra sample points may be placed within cloud because that is where important processes occur, such as the formation and growth of cloud droplets. Some sampling methods combine stratified and importance sampling. For instance, a prior version of the Subgrid Importance Latin Hypercube Sampler (SILHS) placed sample points preferentially in cloud, and also stratified the within-cloud sample points using Latin hypercube sampling (Larson et al., 2005; Larson and Schanen, 2013; Storer et al., 2015; Thayer-Calder et al., 2015). SILHS similarly stratified the points out of cloud. Although SILHS' importance sampling improved the integration of within-cloud microphysical processes, the importance sampling did not improve the integration of out-of-cloud processes, such as evaporation of rain. This is a drawback in cases where evaporation is an important process. What is needed is a more flexible importance sampling method, one that allows the modeler to sample important processes in a more targeted way.
This paper proposes a new importance sampling method that is highly flexible. It divides the domain of integration into N_cat non-overlapping "categories", such as the region that contains precipitation and cloud, or the region that contains precipitation but no cloud. This "nCat" method allows the modeler to prescribe the density of sample points within each category. This flexibility allows a modeler to allocate more sample points to a particular process, such as evaporation of rain drops, if evaporation is especially important to the problem of interest. Furthermore, two or more categories can be combined into a single "cluster" if none of the categories in the cluster should be treated preferentially over the others.
This paper will introduce nCat sampling and evaluate it in an idealized, single-column setting. Section 2 specifies the subgrid probability density function (PDF) that our method will sample. Section 3 describes how SILHS sampled the subgrid PDF before the nCat method was introduced. Section 4 details the new nCat method that has been introduced into SILHS. Section 5 explains the criteria and methodology used to evaluate the new SILHS sampling scheme, including the configuration of the model. Section 6 shows tests using a precipitating shallow cumulus case and a precipitating stratocumulus case. Section 7 concludes the paper.

The functional form of the PDF from which SILHS draws sample points
SILHS does not generate sample points according to a stochastic rule; rather, SILHS merely draws sample points from a pre-existing subgrid PDF. In this paper, the PDF is calculated by the Cloud Layers Unified By Binormals (CLUBB) parameterization (Larson et al., 2002; Golaz et al., 2002; Larson and Golaz, 2005; Larson et al., 2012). At each timestep in a simulation, CLUBB diagnoses the subgrid PDF by means of the assumed-PDF method. That is, CLUBB prognoses various subgrid higher-order moments, assumes a functional form for the PDF, and diagnoses a particular PDF for each timestep and grid level that is consistent with both the moments and the functional form. CLUBB's PDF is multivariate. It includes several variates (i.e., variables) that are useful inputs to thermodynamic and microphysical calculations. One of the PDF's variates is the extended cloud (liquid) water mass mixing ratio (χ) (Mellor, 1977; Larson et al., 2005). When χ > 0, χ equals the cloud water mass mixing ratio; when χ < 0, cloud water is assumed to be zero, and χ represents the deviation from saturation. The variate χ does not include ice. Another of the PDF's variates, related to χ, is the corresponding orthogonal variable (η) (Mellor, 1977; Larson et al., 2005). Together, χ and η are a rotation and rescaling of temperature and total-water variables. CLUBB's PDF also includes the vertical velocity (w), an extended cloud droplet number mixing ratio (N_cn), and the precipitating hydrometeor mass mixing ratios and number mixing ratios (hm). The cloud droplet number equals the extended cloud droplet number mixing ratio when cloud is present (that is, when χ > 0); when no cloud is present, the cloud droplet number is assumed to be zero.
The functional form of CLUBB's PDF is a compromise between realism and mathematical simplicity. CLUBB's PDF may be written as

$$P(\chi, \eta, w, N_{cn}, \boldsymbol{hm}) = \sum_{m=1}^{N_{comp}} \xi^{(m)} \left[ f_{p}^{(m)}\, P^{(m)}(\chi, \eta, w, N_{cn}, \boldsymbol{hm}) + \left(1 - f_{p}^{(m)}\right) P^{(m)}(\chi, \eta, w, N_{cn})\, \delta(\boldsymbol{hm}) \right]. \qquad (3)$$

The PDF is split into N_comp components; currently, in CLUBB, N_comp = 2. Each component m has a weight ξ^(m), where the ξ^(m) sum to 1 over the N_comp components, and ξ^(m) ≥ 0 for each m. Each component has different means and variances for the variates, and the PDF is a mixture (i.e., weighted sum) of the components. The vector hm contains precipitating hydrometeor species that are prognosed by the microphysics scheme. The exact type and number of hydrometeors depend on the microphysics scheme used. In this paper, the microphysics scheme used is that of Khairoutdinov and Kogan (2000), in which the prognosed hydrometeor species are rain mass mixing ratio and rain number mixing ratio. The fraction f_p^(m) represents the portion of mixture component m that contains at least one precipitating hydrometeor species, where 0 ≤ f_p^(m) ≤ 1. The fraction (1 − f_p^(m)) represents the portion of mixture component m that is precipitation-free (denoted by δ(hm)) but may or may not contain cloud water.¹ In the portions of the PDF that contain precipitation, P^(m)(χ, η, w, N_cn, hm) is a joint normal-lognormal distribution, in which χ, η, and w are normally distributed, and N_cn and all the variables in hm are lognormally distributed (Larson and Griffin, 2013; Griffin and Larson, 2016). In the parts of the PDF that do not contain precipitation, P^(m)(χ, η, w, N_cn) is a joint normal-lognormal distribution, as before, but all the precipitating hydrometeors are zero rather than lognormally distributed. A simplifying feature of the functional form is that we insist that

$$P^{(m)}(\chi, \eta, w, N_{cn}) = \int P^{(m)}(\chi, \eta, w, N_{cn}, \boldsymbol{hm})\, \mathrm{d}\boldsymbol{hm}. \qquad (4)$$

That is, the marginal distribution of cloud and turbulence within a mixture component, m, is the same both within and outside of the precipitating region. Therefore, integrating over precipitation in Eq. (3) collapses the two terms per component m into one term.

¹ In the notation used above, δ(hm) is a Dirac delta function and is short for the product δ(hm_1)δ(hm_2)⋯ over all hydrometeor variates.

Prior formulation of SILHS
In both the new and prior versions of SILHS, sample points are drawn from CLUBB's PDF and fed into subroutines that compute microphysical process rates. To reduce the noise associated with the random sampling of processes, both versions of SILHS incorporate stratified sampling (specifically, Latin hypercube sampling) and importance sampling.
The Latin hypercube algorithm is described in many sources (e.g., McKay et al., 1979; Owen, 2003; Larson et al., 2005). Intuitively, the algorithm stratifies the sample points across each variate such that for each variate, exactly one sample point falls into each of N_s intervals of equal size, where N_s is the number of sample points. For instance, if N_s = 3, Latin hypercube sampling chooses low, medium, and high values of, e.g., rain mass mixing ratio.
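As a concrete illustration, the following sketch generates Latin hypercube points on the unit hypercube; each column (variate) receives exactly one point per interval. In SILHS these uniforms would then be mapped through the subgrid PDF, a step omitted here.

```python
import numpy as np

rng = np.random.default_rng(1)

def latin_hypercube_uniforms(n_samples, n_variates):
    # For each variate, place exactly one point in each of n_samples
    # equal intervals of [0, 1), then shuffle the interval order
    # independently for each variate.
    u = np.empty((n_samples, n_variates))
    for k in range(n_variates):
        strata = rng.permutation(n_samples)   # one interval index per sample
        jitter = rng.uniform(size=n_samples)  # position within each interval
        u[:, k] = (strata + jitter) / n_samples
    return u

# With N_s = 3, each variate receives one low, one medium, and one
# high value (one point per third of [0, 1)).
u = latin_hypercube_uniforms(3, 2)
print(np.sort(u, axis=0))
```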
Importance sampling is useful when a process rate is particularly large and variable within a small portion of the sample space. For instance, autoconversion of cloud droplets occurs only within cloud, which in cumulus cases often occupies a small fraction of the domain. Without importance sampling, the density of sample points in the sample space is given by the PDF P(x). For example, if, according to the PDF, 10 % of the domain is occupied by cloud, then on average only 10 % of sample points will be placed within cloud. Importance sampling changes the sampling density so that areas of interest are sampled more frequently than less important regions, regardless of the densities given by the PDF.
The prior version of SILHS (Larson et al., 2005; Larson and Schanen, 2013) used a simple importance sampling scheme that placed half the sample points in cloud and the other half out of cloud. This importance sampling was performed only when, according to the PDF, the amount of the grid box occupied by cloud (the "cloud fraction") was between 0.5 and 50 %. When cloud fraction was in this range, SILHS preferentially sampled cloud, thereby improving the representation of cloud processes such as autoconversion. In doing so, of course, SILHS' importance sampling degraded the representation of processes that occur out of cloud, such as evaporation of rain.
In both the new and prior versions of SILHS, a sample is first drawn at a starting grid level. This grid level is the only grid level where SILHS explicitly performs importance sampling; it is called the "importance sampling level" in this paper. The importance sampling level is chosen at each timestep to be the height level with the maximum within-cloud cloud water mass mixing ratio. To represent vertical overlap, sample points at other height levels are drawn such that they are correlated with adjacent levels according to a correlation coefficient that decreases exponentially with increasing height (see Larson and Schanen, 2013). This process continues to the top and bottom of the domain. The resulting vertical profile of sample points is a "subcolumn" that statistically represents a fraction of the grid column and models vertical overlap. Thus, SILHS does sample at all grid levels, but it explicitly performs importance sampling only at the importance sampling level.
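The idea of vertically correlated draws can be sketched with a simple Gaussian-copula recurrence. The exact correlation structure used by SILHS is given in Larson and Schanen (2013); the AR(1)-style recurrence below, with correlation decaying exponentially in level separation, is only an assumed illustration.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(2)

def correlated_column_uniforms(z_levels, length_scale):
    # Assumed AR(1)-style recurrence: each level's Gaussian draw is
    # correlated with the previous level's draw, with correlation
    # decaying exponentially with the separation between levels.
    g = np.empty(len(z_levels))
    g[0] = rng.standard_normal()
    for k in range(1, len(z_levels)):
        rho = np.exp(-abs(z_levels[k] - z_levels[k - 1]) / length_scale)
        g[k] = rho * g[k - 1] + sqrt(1.0 - rho ** 2) * rng.standard_normal()
    # Map each Gaussian back to a uniform with the normal CDF; the
    # uniforms would then index the subgrid PDF at each level.
    return np.array([0.5 * (1.0 + erf(x / sqrt(2.0))) for x in g])

u = correlated_column_uniforms(np.linspace(0.0, 2000.0, 20), 500.0)
```

Each subcolumn produced this way tends to stay on the same side of the PDF over nearby levels, which is what models vertical overlap.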

The nCat importance sampling method
The nCat flexible importance sampling method is a generalization of the original SILHS importance sampling method described above. It is designed to give the modeler finer control over which parts of the subgrid PDF are preferentially sampled, that is, regarded as "important".
First, the domain of the PDF is split into a set of disjoint categories, C_j, that span the entire PDF domain. Here, j = 1, …, N_cat, where N_cat is the number of categories.
In this paper, eight categories are used. The definitions of the categories are based on the following three criteria: in or out of cloud, in mixture component 1 or 2, and in or out of precipitation. A sample point lies in cloud if and only if χ > 0; if χ ≤ 0, the sample point lies outside cloud. To determine whether a sample point lies within mixture component 1, we generate a random number, u_{d+1}, that is uniformly distributed on the interval (0, 1). If u_{d+1} < ξ^(1), then the sample is in mixture component 1; otherwise, the sample is in mixture component 2. To determine whether a sample point lies within precipitation, we generate another uniformly distributed random number, u_{d+2}, and check whether u_{d+2} < f_p^(m), where m is the mixture component number. The eight possible combinations of cloud, mixture component, and precipitation form the eight categories used for importance sampling. The categories are shown in Table 1.
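A sample point's category can be determined with a few comparisons, as in this sketch (the function and argument names are ours, not SILHS code):

```python
def categorize(chi, u_comp, u_precip, xi1, fp):
    # Return (in_cloud, component, in_precip) for one sample point.
    # chi: extended cloud water; u_comp, u_precip: uniform (0, 1) draws;
    # xi1: weight of mixture component 1; fp: precipitation fraction
    # of each component, (f_p(1), f_p(2)).
    in_cloud = chi > 0.0
    component = 1 if u_comp < xi1 else 2
    in_precip = u_precip < fp[component - 1]
    return (in_cloud, component, in_precip)

# A cloudy point that lands in component 1 and in precipitation:
print(categorize(1.0e-4, 0.3, 0.1, xi1=0.5, fp=(0.4, 0.2)))  # (True, 1, True)
```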
Each category C_j is associated with a certain amount of PDF mass, called the category's "original probability" and denoted p_j:

$$p_j \equiv \int_{C_j} P(\boldsymbol{x})\, \mathrm{d}\boldsymbol{x}. \qquad (5)$$

Since the categories C_j span the entire PDF, we have

$$\sum_{j=1}^{N_{cat}} p_j = 1, \qquad (6)$$

where N_cat is the number of categories (currently N_cat = 8). Each category has p_j ≥ 0; but naturally, categories with p_j = 0 need not be included in the corresponding integral (Eq. 2).
In general, the p_j values must be found by performing an integral over the PDF. For example, the amount of PDF mass in the first category (the category in cloud, in precipitation, and in component 1) may be found by integrating the PDF in Eq. (3) over this portion of the PDF:

$$p_1 = \xi^{(1)} f_{p}^{(1)} f_{c}^{(1)}, \qquad (7)$$

where f_c^(1) is the cloud fraction in mixture component 1. In CLUBB's PDF (see Eq. 3), because cloud and precipitation are independent within a component (that is, the marginal distribution of χ, which determines cloud, is the same both in and out of precipitation), the integrals that determine the category original probabilities p_j involve only simple quantities, such as f_c^(1), that are already computed elsewhere in CLUBB. In general, computing these constants may involve evaluating complicated numerical functions, such as the error function, which incurs computational expense. However, the constants need to be computed only at the importance sampling level (not at each height level) and only once per timestep (not for each sample point), and so the additional expense is tolerable.
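As a hedged numerical illustration of this independence property, each category probability is simply a product of a component weight, a precipitation (or precipitation-free) fraction, and a cloud (or clear) fraction; all numbers below are made up, and the symbol fc for per-component cloud fraction is our notation.

```python
# Made-up component weights and fractions; names mirror the text
# (xi^(m), f_p^(m)) plus an assumed fc for per-component cloud fraction.
xi = (0.6, 0.4)   # mixture component weights
fp = (0.3, 0.1)   # precipitation fraction of each component
fc = (0.5, 0.2)   # cloud fraction of each component

# Probability of the category in cloud, in precipitation, component 1:
p1 = xi[0] * fp[0] * fc[0]
print(p1)  # approximately 0.09

# The eight category probabilities sum to 1, because the categories
# partition the PDF:
total = sum(xi[m] * f_precip * f_cloud
            for m in (0, 1)
            for f_precip in (fp[m], 1.0 - fp[m])
            for f_cloud in (fc[m], 1.0 - fc[m]))
print(total)  # approximately 1.0
```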
The notation introduced so far in this section relates to the PDF itself, rather than importance sampling per se. In order to implement importance sampling, we sample what we regard as the "important" categories preferentially. To do so, we introduce for each category, C_j, a user-defined probability, S_j, called the category's "modified probability". The modified probability S_j of a given category is the desired probability that any sample will fall in that category. In other words, it is the expected fraction of sample points in the category when importance sampling is used. Therefore, intuitively, it is advantageous to set the modified probabilities such that the categories that are important for a process of interest are sampled more often than the unimportant categories. These modified probabilities must be set such that

$$\sum_{j=1}^{N_{cat}} S_j = 1. \qquad (8)$$

The sampling process is modified such that each category C_j is sampled with probability S_j rather than p_j. In order to give a mathematical form for the new PDF that points are drawn from, we introduce some notation. We define a new function, L(x), called the "likelihood ratio":

$$L(\boldsymbol{x}) = \sum_{j=1}^{N_{cat}} \omega_j\, 1_j(\boldsymbol{x}). \qquad (9)$$

Here, 1_j(x) is the indicator function of category C_j, defined as

$$1_j(\boldsymbol{x}) = \begin{cases} 1 & \text{if } \boldsymbol{x} \in C_j \\ 0 & \text{otherwise}, \end{cases} \qquad (10)$$

and

$$\omega_j \equiv \frac{p_j}{S_j} \qquad (11)$$

is the weight of each sample point in category C_j. Then, the new sampling PDF, denoted Q(x), is defined as

$$Q(\boldsymbol{x}) \equiv \frac{P(\boldsymbol{x})}{L(\boldsymbol{x})}. \qquad (12)$$

The new PDF, Q(x), is normalized because the S_j sum to 1. The integral in Eq. (2) is written as

$$\overline{h} = \int h(\boldsymbol{x})\, L(\boldsymbol{x})\, Q(\boldsymbol{x})\, \mathrm{d}\boldsymbol{x}. \qquad (13)$$

Then, the integral in Eq. (13) is approximated by drawing N_s sample points from the Q(x) distribution and evaluating

$$\overline{h} \approx \frac{1}{N_s} \sum_{i=1}^{N_s} h(\boldsymbol{x}_i)\, L(\boldsymbol{x}_i), \qquad (14)$$

where x_i is the i-th sample point drawn from the Q(x) distribution. For a sample point x_i in category C_j, L(x_i) = p_j/S_j = ω_j. To draw sample points from the Q(x) distribution, a uniform variate, 0 < u_c < 1, is picked for each sample point. The value of this uniform variate determines the sample point's category. For example, if 0 < u_c < S_1, then the sample point is associated with category C_1. If S_1 ≤ u_c < S_1 + S_2, the sample point is associated with category C_2, and so on. Once the category has been determined, the sample point is drawn from the portion of the distribution P(x) that lies within the category. For example, a sample point that is to be in cloud is drawn from the conditional distribution P(x | χ > 0).
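The whole procedure can be sketched on a one-dimensional toy problem, with a uniform PDF on [0, 1) split into an "important" category and its complement; the weighting by L(x_i) = ω_j follows Eq. (14). All names and numbers are illustrative, not SILHS code.

```python
import numpy as np

rng = np.random.default_rng(3)

def h(x):
    # Toy process rate that is large and variable only on the
    # "important" subdomain x < 0.1 (think: inside cloud).
    return np.where(x < 0.1, 100.0 * x, 0.0)

# Two categories of the uniform PDF on [0, 1):
# C1 = [0, 0.1) and C2 = [0.1, 1).
p = np.array([0.1, 0.9])  # original probabilities p_j
S = np.array([0.5, 0.5])  # modified probabilities S_j
w = p / S                 # weights omega_j = p_j / S_j (Eq. 11)

def importance_estimate(n_samples):
    total = 0.0
    for _ in range(n_samples):
        # Pick a category with probability S_j: the uniform u_c is
        # compared against the cumulative sums of the S_j.
        j = 0 if rng.uniform() < S[0] else 1
        # Draw the point from P restricted to the chosen category.
        x = rng.uniform(0.0, 0.1) if j == 0 else rng.uniform(0.1, 1.0)
        # Each evaluation is multiplied by L(x_i) = omega_j (Eq. 14).
        total += w[j] * h(x)
    return total / n_samples

# The exact answer is the integral of 100x over [0, 0.1), i.e., 0.5.
print(importance_estimate(50_000))
```

Because half the points land in the small important category rather than the 10 % that plain sampling would give, the estimator is markedly less noisy at the same cost.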

The weight limiter
Importance sampling allows the modeler to concentrate sample points in areas of the sample space that are considered important. But sample points given to important areas are taken from unimportant areas. Therefore, if importance sampling is applied overzealously, the less important processes can become excessively noisy.
In SILHS, we wish to employ an importance sampling scheme that improves results for important processes (e.g., certain microphysical processes) while still producing reasonably accurate estimates of other "less important" or perhaps less variable processes. One reason that we wish to avoid overdoing the importance sampling is that a favorable sampling distribution at one grid level (altitude) may be unfavorable at another.
The change in accuracy for a given category due to importance sampling can be assessed by noting the weight of sample points in that category. The inverse weight of a sample point in category C_j is given by

$$\frac{1}{\omega_j} = \frac{S_j}{p_j}, \qquad (15)$$

where p_j is the category's original probability and S_j is the category's modified probability due to importance sampling. The weight, ω_j, is closely related to L(x) (see Eq. 9). The inverse weight, 1/ω_j, may be interpreted as the density of sample points per unit probability mass. When 1/ω_j < 1, the category is sampled less often with importance sampling than without, and when 1/ω_j > 1, the category is sampled more often with importance sampling. We are particularly concerned about large values of ω_j, because large weights are associated with undersampling and hence degradations in accuracy.
In order to mitigate the negative impact of importance sampling, we now propose a simple method to impose a maximum weight, ω_max, in each category. Intuitively, the algorithm works as follows. For each category C_j, we compute the minimum modified sampling probability:

$$S_{j,\min} = \frac{p_j}{\omega_{\max}}. \qquad (16)$$

To ensure that the weight of a category, ω_j, does not exceed ω_max, the category must be sampled at least as often as S_j,min; that is, S_j ≥ S_j,min. If any categories are undersampled, then S_j must be increased in those categories, and probability mass must be taken from other categories (where S_j > S_j,min) in order to ensure that the S_j probabilities sum to 1. The algorithm takes probability mass from another category in proportion to how much "extra" probability mass that category has. Formally, the algorithm is constructed as follows. We compute the difference between the category's modified probability, S_j, and its minimum modified probability, S_j,min:

$$S_{j,\mathrm{diff}} = S_j - S_{j,\min}. \qquad (17)$$

Let N be the set of all categories where S_j,diff < 0 (the categories where S_j needs to be increased) and M be the set of all categories where S_j,diff ≥ 0 (the categories where S_j can potentially be reduced). If N is the empty set (that is, if no categories have S_j,diff < 0), then all categories already satisfy the weight limit, and nothing needs to be done. Otherwise, if S̃_j is the new distribution of modified probabilities, then for all C_j ∈ N we set

$$\tilde{S}_j = S_{j,\min}, \qquad (18)$$

and for all C_j ∈ M we set

$$\tilde{S}_j = S_j - S_{j,\mathrm{diff}}\, \frac{\sum_{k \in N} \left| S_{k,\mathrm{diff}} \right|}{\sum_{k \in M} S_{k,\mathrm{diff}}}. \qquad (19)$$

This method takes sampling probability away from the categories with extra probability in proportion to how much extra probability they have (i.e., how large S_j,diff is). It can readily be shown that the fractions S̃_j lie in the range [0, 1] and sum to 1.
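A direct transcription of this limiter might look like the following sketch (function and variable names are ours):

```python
import numpy as np

def limit_weights(p, S, w_max=2.0):
    # Enforce omega_j = p_j / S_j <= w_max by raising any S_j that is
    # below its minimum S_j,min = p_j / w_max, and removing the needed
    # probability mass from the remaining categories in proportion to
    # their surplus S_j,diff = S_j - S_j,min.
    p = np.asarray(p, dtype=float)
    S = np.asarray(S, dtype=float)
    S_min = p / w_max
    S_diff = S - S_min
    need = S_diff < 0.0            # set N: undersampled categories
    if not need.any():
        return S                   # weight limit already satisfied
    deficit = -S_diff[need].sum()
    surplus = S_diff[~need].sum()
    S_new = S.copy()
    S_new[need] = S_min[need]
    # Take mass from each donor category in proportion to S_j,diff.
    S_new[~need] -= S_diff[~need] * (deficit / surplus)
    return S_new

p = np.array([0.2, 0.3, 0.5])
S = np.array([0.01, 0.5, 0.49])    # category 1 is badly undersampled
print(limit_weights(p, S))         # sums to 1, and all p_j / S_j <= 2
```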
In SILHS, we currently set ω_max = 2, which means that, on average, a category will be sampled no less than half as often with importance sampling as without it. Consequently, the variance of the estimate of a quantity is increased (degraded) by no more than a factor of 2 due to importance sampling. (The standard deviation is increased by no more than a factor of √2 ≈ 1.4. For this estimate of variance, we assume the usual Monte Carlo convergence rate.)

Optimal allocation of sample points
The success of the nCat method depends on knowing how to allocate sample points to the categories. In some cases, it is easy to see how to allocate points. For example, if it is known that the process(es) of interest are active in only one of the N_cat categories, then we can simply put all sample points in that one category and use the weight limiter to ensure that other categories are still adequately sampled. However, when the processes of interest are active in two or more categories, one needs to know how to distribute (i.e., "allocate") sample points among these categories.
For a given process rate, h(x), and a given estimator of the variance, it is possible to derive the optimal allocation of sample points (Lemieux, 2009). The optimal allocation provides guidance on how to determine the S_j values. In Appendix A, it is shown that the optimal modified probabilities are

$$S_j = \frac{p_j \sqrt{v_j}}{\sum_{k=1}^{N_{cat}} p_k \sqrt{v_k}}. \qquad (20)$$

This expression shows that the optimal fraction of samples in category C_j, i.e., S_j, depends on both the original probability p_j of category C_j and the category-averaged standard deviation, √v_j.

A simple method of allocating sample points
One could prescribe the modified probabilities S_j directly. However, a key problem with directly prescribing the S_j is that prescribed values cannot scale with the original probabilities p_j. For instance, consider the case in which, for some category C_j, we have prescribed S_j > 0, but it turns out that for a particular cloud case, p_j = 0. Then some sample points will be placed in category C_j even though they contribute nothing to the overall sum. (These sample points have a weight of zero.) This is a needless computational expense. Instead, the sample points should be placed in other categories with non-zero p_j. More generally, prescribing each S_j directly is akin to assuming that the contribution of each category to the total sum is constant regardless of what fraction of the PDF is occupied by each category. Instead, a more realistic assumption is that each category contributes to the total sum a constant amount per unit original probability, p_j. These prescribed amounts are then scaled by the original probabilities p_j to obtain the modified probabilities S_j.
Specifically, we prescribe the following normalized standard deviation of the process rate for each category C_j:

$$\gamma_j = \frac{\sqrt{v_j}}{\sum_{k=1}^{N_{cat}} \sqrt{v_k}}. \qquad (21)$$

To make the γ_j easier to interpret and prescribe, we insist that 0 ≤ γ_j ≤ 1; the denominator is simply the sum of the numerator over all categories, so that each γ_j is a fraction, with the γ_j summing to 1. Specifically, γ_j is the fraction of √v_j in category C_j. Prescribing γ_j is accurate and general when each γ_j varies little in space or time, or from case to case. Note that the numerator of Eq. (21) is the same as the numerator of Eq. (20), but without the p_j term.
Given the γ_j fractions, it is easy to determine the S_j values by dividing the numerator and denominator of Eq. (20) by the sum of √v_k over all categories:

$$S_j = \frac{p_j \gamma_j}{\sum_{k=1}^{N_{cat}} p_k \gamma_k}. \qquad (22)$$

It is clear from this equation that the S_j values are still in the range [0, 1] and still sum to 1.
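Eq. (22) is a one-line computation; the sketch below (with made-up numbers) also shows that a category with p_j = 0 automatically receives no sample points.

```python
import numpy as np

def modified_probabilities(gamma, p):
    # Eq. (22): S_j = gamma_j p_j / sum_k gamma_k p_k.  Because the
    # prescribed gamma_j are scaled by the original probabilities p_j,
    # a category that vanishes (p_j = 0) receives no sample points.
    gp = np.asarray(gamma, dtype=float) * np.asarray(p, dtype=float)
    return gp / gp.sum()

gamma = np.array([0.6, 0.3, 0.1])      # prescribed fractions (made up)
p     = np.array([0.05, 0.15, 0.80])   # e.g., a small cloudy category
print(modified_probabilities(gamma, p))
```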
The prescription (Eq. 22) leads to more robust importance sampling than does prescribing S_j values as constants. With Eq. (22), the optimal γ_j in a particular category (say, in cloud, in precipitation, and in mixture component 1) is relatively insensitive to the area occupied by that category (e.g., to the cloud fraction or precipitation fraction). The reason is that in Eq. (22), each S_j value weights γ_j by the original probability p_j of category C_j. This means that, for instance, when C_j occupies a small fraction of the domain and γ_j is moderate, the total fraction of sample points S_j in category C_j scales naturally to small values. Prescribing the γ_j values is akin to prescribing the inverse weights 1/ω_j, rather than the sample fractions S_j. To see the resemblance of γ_j and 1/ω_j, note from Eqs. (15) and (22) that

$$\frac{1}{\omega_j} = \frac{S_j}{p_j} = \frac{\gamma_j}{\sum_{k=1}^{N_{cat}} p_k \gamma_k}. \qquad (23)$$

Both 1/ω_j and γ_j are related to the density of sample points per unit probability mass, p_j.

Advantages of the new method
As compared to the previous version of SILHS, the chief advantage of the new nCat method is its flexibility. In particular, the user can individually prescribe the sampling density per unit probability (γ_j) in each of eight categories, C_j. This flexibility is useful when important processes, such as evaporation of rain, occur in particular categories, such as the region of rain but no cloud. This flexibility is made possible in part by the fact that the nCat method imposes no restriction on the number of sample points used per timestep. The previous version of SILHS required an even number of sample points per timestep, because one point of each pair was placed in cloud and the other was placed outside cloud. Generalizing that method to eight categories would have required a multiple of at least eight sample points per timestep and would not have allowed much flexibility in prescribing the relative importance of categories. Instead, the nCat method uses a probabilistic approach to picking a category for each sample point. This allows any number of sample points to be used at each timestep, including fewer than N_cat samples, without causing a biased result.

Summary of steps to implement method
In summary, to implement the new importance sampling method, the following steps should be taken.
1. Pick a set of categories, C_j, that span the PDF domain. We have proposed eight categories for use, as given in Table 1.
2. Pick a set of sampling fractions, γ_j. A good set of values to use can be obtained by using a simulation to estimate the optimal values, given by the right-hand side of Eq. (21), as we do in Sect. 6.1 below.
E. K. Raut and V. E. Larson: A flexible importance sampling method for integrating subgrid processes

3. Compute, from the fractions γ_j, the modified probabilities S_j using Eq. (22). Pick sample points from the Q(x) distribution, defined in Eq. (12).
4. Compute the weight in each category, ω_j, using Eq. (11). Sample points are given the weight corresponding to the category the sample point is in. Limit the weights according to the algorithm in Sect. 4.1, if so desired.
5. Feed the (unweighted) sample points one by one into a physical parameterization (e.g., a microphysics scheme).
6. Compute a weighted average of the function of interest using Eq. ( 14).
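The six steps can be strung together on a one-dimensional toy problem; everything below is an illustrative sketch with made-up numbers, not SILHS code, and the optional weight limiter of Sect. 4.1 is skipped.

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy setting: a uniform PDF on [0, 1) split into two "categories",
# C1 = [0, 0.1) and C2 = [0.1, 1). All numbers are illustrative.
p = np.array([0.1, 0.9])                 # step 1: category probabilities
gamma = np.array([0.95, 0.05])           # step 2: prescribed fractions
S = gamma * p / (gamma * p).sum()        # step 3: Eq. (22)
w = p / S                                # step 4: weights, Eq. (11)

def process_rate(x):                     # step 5: the "parameterization"
    return np.where(x < 0.1, 100.0 * x, 0.0)

n = 20_000
j = (rng.uniform(size=n) >= S[0]).astype(int)  # pick a category per point
x = np.where(j == 0,
             rng.uniform(0.0, 0.1, size=n),
             rng.uniform(0.1, 1.0, size=n))
estimate = np.mean(w[j] * process_rate(x))     # step 6: Eq. (14)
print(estimate)  # the exact answer is 0.5
```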

Methodology of evaluation of the sampling methods
In order to evaluate how well the new importance sampling scheme simulates multiple cloud types, we have simulated two cloud cases. The first is a drizzling shallow cumulus case: Rain In Cumulus over the Ocean (RICO), configured as in the intercomparison of van Zanten et al. (2011). RICO was a drizzling trade-wind cumulus case observed off the Caribbean islands of Antigua and Barbuda (Rauber et al., 2007). The second is a drizzling stratocumulus case: Research Flight 2 (RF02) of the DYnamics and Chemistry Of Marine Stratocumulus (DYCOMS-II) field study, configured as in Wyant et al. (2007). DYCOMS-II RF02 was a nocturnal drizzling stratocumulus layer observed off the coast of California (Stevens et al., 2003). A key difference in the sampling of these two cases is that the stratocumulus case has a much larger cloud fraction (> 0.95) than the cumulus case (< 0.05). Therefore, without importance sampling, nearly all sample points fall in cloud in the stratocumulus case, while almost none fall in cloud in the cumulus case. Finding a single, effective sampling strategy for both the stratocumulus and cumulus cases is challenging.
The following four configurations of SILHS were used for comparison.
1. "LH-only". This configuration uses only Latin hypercube sampling. No importance sampling is performed, and the nCat method is not used.
2. "2Cat-Cld". This configuration is functionally equivalent to the old version of SILHS, which placed one point in cloud and one point out of cloud. It uses two categories: in cloud and out of cloud. The categories (c,p,1), (c,p,2), (c,np,1), and (c,np,2) are all lumped together into the "cloud" category, and the other four categories are analogously lumped into the "clear" category. That is, a point that is in cloud belongs to the cloud category regardless of whether it is in precipitation or which mixture component it is in, and similarly for points in clear air. When cloud fraction is between 0.5 % and 50 %, this configuration places 50 % of sample points in each of the two categories (that is, S_1 = S_2 = 0.5). Otherwise, no importance sampling is performed.
3. "2Cat-CldPcp". This configuration also uses two categories. The first consists of points that are either in cloud or in precipitation, and the second consists of the complement, namely, points that are neither in cloud nor in precipitation. That is, (nc,np,1) and (nc,np,2) are lumped into the no-cloud-or-precipitation category, and the others are lumped into the cloud-or-precipitation category. Since no microphysical processes act in the area of the domain outside of cloud and precipitation, the sample points are initially prescribed such that all points fall in the cloud-or-precipitation category (i.e., the first category); that is, γ_1 = S_1 = 1 and γ_2 = S_2 = 0. After the initial prescription, the weight limiter ensures that S_2 = p_2/ω_max = p_2/2.
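The weight limiter invoked here (and in step 4 of the recipe above) can be illustrated with a small sketch. The helper below is hypothetical: it assumes the limiter works by flooring each S_j at p_j/ω_max, so that no weight p_j/S_j exceeds ω_max, and then removing the displaced probability mass from categories that sit above their floor; the actual algorithm in Sect. 4.1 may differ in detail. With ω_max = 2, the 2Cat-CldPcp prescription (S_1, S_2) = (1, 0) is adjusted so that S_2 = p_2/2, as stated above.

```python
def limit_weights(p, s, omega_max=2.0):
    """Floor each modified probability S_j at p_j/omega_max, then
    rebalance so the S_j sum to 1 while keeping every floor intact.

    Assumes at least one category retains headroom above its floor,
    which holds whenever omega_max > 1.
    """
    floor = [p_j / omega_max for p_j in p]
    bumped = [max(s_j, f_j) for s_j, f_j in zip(s, floor)]
    excess = sum(bumped) - 1.0
    # Remove the excess mass proportionally from categories above their
    # floor, so floored categories keep exactly their weight limit.
    headroom = [s_j - f_j for s_j, f_j in zip(bumped, floor)]
    total_headroom = sum(headroom)
    return [s_j - excess * h_j / total_headroom
            for s_j, h_j in zip(bumped, headroom)]
```

For example, with p = (0.3, 0.7), the prescription (1, 0) becomes (0.65, 0.35) = (1 − p_2/2, p_2/2), so the clear-air category's weight is capped at ω_max = 2.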
4. "8Cat". This configuration uses all eight categories listed in Table 1. To determine the sampling fractions γ_j to use, a simulation was run in which SILHS was used to estimate the quantity in Eq. (21) at each timestep. One set of sampling fractions, γ_j, was used for both RICO and DYCOMS-II RF02. This is discussed in Sect. 6.1.
The simulations were non-interactive, so that errors in the SILHS simulations did not feed back into the simulated fields. This made it possible to evaluate multiple SILHS simulations against a common analytic solution. Some notable aspects of the simulation configurations are shown in Table 2.
The microphysics scheme used in the simulations is that of Khairoutdinov and Kogan (2000). As a reference solution, an analytically upscaled version of the Khairoutdinov-Kogan microphysics scheme was used, as described in Larson and Griffin (2013). Comparison with the analytic solution indicates whether SILHS draws sample points from the correct PDF at each grid level. However, the comparison does not test whether the PDFs at the various levels are overlapped accurately. Although the overlap assumptions do not affect these test cases, overlap does influence processes such as radiative transfer. Testing the PDF overlap assumptions is left for future research. Nevertheless, the ability to test convergence at each grid level is an advantage; for instance, early convergence tests revealed several bugs in SILHS. Many microphysics schemes in operational use do not permit analytic solution, and for those schemes a non-analytic integration method, such as SILHS, is necessary.
Each SILHS configuration was evaluated on its ability to estimate the following three microphysical processes.

1. Autoconversion: the conversion of cloud water to rain water. This process occurs within cloud, both inside and outside of precipitation (rain).

2. Accretion: the growth of rain drops by collection of cloud water. This process occurs when both cloud and precipitation are present.

3. Evaporation: the conversion of rain water to water vapor. This process occurs in areas outside cloud but within precipitation.

Simulations of drizzling cumulus and stratocumulus clouds
In this section, we present results obtained using the new importance sampling method.

Estimation of optimal sampling fractions
Prescribing the γ_j is a useful general approach only if the γ_j vary relatively little from case to case. We test this by estimating the optimal sampling fractions, γ_j, for both the RICO and DYCOMS-II RF02 cases. The optimal γ_j values are calculated by estimating the right-hand side of Eq. (21) at each timestep at the importance sampling level. The process used, h(x), is the sum of the autoconversion, accretion, and evaporation tendencies from the microphysics scheme. The importance sampling level is chosen at each timestep to be the level with the maximum within-cloud cloud water mass mixing ratio. The integral defining v_j, given in Eq. (A4), was estimated using 256 SILHS sample points. The γ_j values, averaged over all timesteps (864 total timesteps for RICO and 360 for DYCOMS-II RF02), are shown in Table 3. We see that in both cases, the optimal γ_j values are largest in category 1 (in cloud, in precipitation, and in mixture component 1) and in category 3 (out of cloud, in precipitation, and in mixture component 1). As expected, the optimal sampling fractions for the last two categories are zero, since microphysical processes do not act in the region where neither cloud nor rain exists. The other categories show differences, which may or may not be important. To test this, the optimal fractions for DYCOMS-II RF02 shown in Table 3 are used for both the RICO and DYCOMS-II RF02 simulations to be presented. Thereby, the RICO case is used to test the robustness of the DYCOMS-II RF02 sampling fractions.

Figure 1. RICO: the root-mean-square error (RMSE) at the importance sampling level of SILHS simulations as a function of the number of sample points, for the RICO cumulus case. The error is time-averaged over the entire simulation. The 2Cat-CldPcp and 8Cat methods show a large improvement over the 2Cat-Cld and LH-only methods in the estimate of evaporation, but not for autoconversion and accretion, which are in-cloud processes. Nevertheless, the 8Cat and 2Cat-CldPcp methods both improve the estimate of the sum of the three processes.
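As a concrete stand-in for the right-hand side of Eq. (21), which is not reproduced in this section, a standard minimum-variance (Neyman-style) allocation assigns each category a fraction proportional to p_j √v_j, where v_j is a second-moment-type integral of h over the category (cf. Eq. A4). The sketch below is hypothetical and assumes that form; the paper's exact expression may differ.

```python
import math

def optimal_fractions(p, second_moments):
    """Neyman-style allocation: sample each category in proportion to
    p_j * sqrt(v_j), where v_j estimates the second moment of the
    process rate h within category j.  Categories where h vanishes
    receive a fraction of exactly zero.
    """
    raw = [p_j * math.sqrt(v_j) for p_j, v_j in zip(p, second_moments)]
    norm = sum(raw)
    return [r / norm for r in raw]
```

Consistent with Table 3, categories in which h vanishes, such as the two no-cloud, no-precipitation categories, receive a sampling fraction of exactly zero under this allocation.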

Results for RICO case
Figure 1 shows a plot of the root-mean-square error (RMSE) of the SILHS RICO simulations as a function of the number of sample points used. The 8Cat method has the smallest RMSE of the four methods when estimating the sum of autoconversion, accretion, and evaporation. The largest improvement of the 8Cat and 2Cat-CldPcp methods over the (old) 2Cat-Cld method is in sampling evaporation. In fact, even the LH-only method (no importance sampling at all) yields a better estimate of evaporation than the 2Cat-Cld method. The reason that evaporation is so poorly sampled by the 2Cat-Cld method is that it performs importance sampling only within cloud. Indeed, for in-cloud processes, such as autoconversion and accretion, the 2Cat-Cld method equals or improves upon both the 2Cat-CldPcp method and the 8Cat method in the RICO cumulus case. However, the 2Cat-Cld method reduces the number of sample points outside of cloud, degrading the simulation of rain evaporation. In contrast, both the 2Cat-CldPcp and 8Cat methods preferentially sample the region of the sample space containing evaporation (out of cloud but within precipitation), leading to large improvements.

Table 3. Estimated optimal sampling fractions (γ_j) for each importance category, averaged over the entire simulation, for the RICO and DYCOMS-II RF02 cases. These estimates were obtained by using SILHS to estimate the right-hand side of Eq. (21) for each category. Here "c" denotes "in cloud", "nc" denotes "out of cloud", "p" denotes "in precipitation", "np" denotes "out of precipitation", "1" denotes "in mixture component 1", and "2" denotes "in mixture component 2".

Table 4 compares how each of the four sampling methods allocates sample points. The table shows the percentage of sample points allocated to each category, averaged over the simulation. Comparing the allocations can give insight into the strengths and weaknesses of each method. For example, evaporation is best sampled by the 8Cat and 2Cat-CldPcp methods because they are the only methods that place a sizable number of points in the two categories that are in precipitation and outside cloud. The 2Cat-Cld and 8Cat methods give the best estimate of accretion because they place the largest number of points in the categories that are within both cloud and precipitation.
Figure 2 shows, for the RICO case, time-series plots of the four tendencies at the importance sampling level. Again, the largest improvement can be seen in the sampling of evaporation. Looking at the 2Cat-Cld time series for evaporation, it can be seen that at many timesteps, no points fall in the evaporating region of the sample space (out of cloud and within precipitation), and the estimated evaporation tendency is zero. At the timesteps where one or more sample points do fall in the evaporating region, the tendency estimate is very large because the evaporation rate within the evaporating region is much larger than the overall mean evaporation rate. In the 2Cat-CldPcp and 8Cat simulations, the evaporating region of the sample space is well sampled, and sample points in this region have small weights, leading to an estimate that is much more comparable to the overall mean.
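This contrast, occasional huge weighted estimates versus a steady well-sampled signal, is easy to reproduce in a toy setting. The snippet below is purely illustrative and is not the SILHS estimator: a "process" equals 100 inside a rare region of probability 0.01 and zero elsewhere, and we compare 32-point estimates that sample the rare region at its natural rate against estimates that draw half the points from it.

```python
import random

def estimate_rare(n, rare_prob, rare_value, s_rare, rng):
    """Weighted n-point estimate of the mean of a process equal to
    rare_value inside a region of probability rare_prob and zero
    elsewhere.  Each point is drawn from the rare region with
    probability s_rare and weighted by rare_prob / s_rare."""
    total = 0.0
    for _ in range(n):
        if rng.random() < s_rare:
            total += (rare_prob / s_rare) * rare_value
        # Points outside the rare region contribute zero.
    return total / n

rng = random.Random(1)
true_mean = 0.01 * 100.0  # = 1.0
# Natural-rate sampling (s_rare = rare_prob) vs oversampling the rare region.
plain = [estimate_rare(32, 0.01, 100.0, 0.01, rng) for _ in range(200)]
boosted = [estimate_rare(32, 0.01, 100.0, 0.5, rng) for _ in range(200)]
var_plain = sum((e - true_mean) ** 2 for e in plain) / len(plain)
var_boosted = sum((e - true_mean) ** 2 for e in boosted) / len(boosted)
```

Both estimators are unbiased, but oversampling the rare region shrinks the variance by the ratio of per-sample variances (roughly a factor of 100 in this toy case), mirroring the improvement of 2Cat-CldPcp and 8Cat over 2Cat-Cld for evaporation.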
To assess the performance of the sampling methods at levels away from the importance sampling level, profile plots (over height levels) were generated for simulations with 32 sample points. To reduce the role of a "lucky" random seed in the comparison and thereby better distinguish the methods, an ensemble of 12 simulations was used. Figure 3 shows profile plots of the four tendencies over height levels. These plots are averaged over all 864 timesteps and over the 12 ensemble members, and they indicate that SILHS converges to the analytic solution at all height levels, not only the importance sampling level. Figure 4 shows the RMSE of the SILHS solutions at each height level compared to the analytic solution, computed over all timesteps and ensemble members. The 8Cat and 2Cat-CldPcp methods show improved results compared to the 2Cat-Cld method at height levels between 1000 and 2500 m; these are the levels where the improvement in the evaporation term is strongest. At levels below 1000 m (far below the importance sampling level of about 2000 m), all methods start to show considerable noise. Interestingly, this noise remains even after time and ensemble averaging, which highlights the large degree of variability in cumulus clouds and the need for careful parameterization of this variability. We note that, in this paper, only the profile plots display an ensemble average. The time-series plots display a single simulation so that individual sample values can be seen. The plots displaying RMSE vs. the number of sample points are not strongly influenced by the choice of random seed.
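The profile RMSE of Fig. 4 is a time- and ensemble-averaged error at each height level. A sketch of that computation, assuming a hypothetical array layout of ensemble member × timestep × height level:

```python
def rmse_profile(estimates, truth):
    """RMSE at each height level, averaged over timesteps and ensemble
    members.  estimates[k][t][z] is member k's estimate at timestep t
    and height level z; truth[t][z] is the analytic solution."""
    n_levels = len(truth[0])
    profile = []
    for z in range(n_levels):
        sq_sum, count = 0.0, 0
        for member in estimates:
            for t, row in enumerate(member):
                sq_sum += (row[z] - truth[t][z]) ** 2
                count += 1
        profile.append((sq_sum / count) ** 0.5)
    return profile
```

Averaging the squared errors before taking the square root (rather than averaging per-member RMSEs) keeps the profile an unbiased estimate of the per-level mean-square error.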

Results for DYCOMS-II RF02 case
The other simulated case is DYCOMS-II RF02, a drizzling stratocumulus case. Figure 5 shows a plot of the RMSE of the SILHS simulations as a function of the number of sample points. The LH-only, 2Cat-Cld, and 2Cat-CldPcp simulations all show approximately the same amount of noise. However, the 8Cat method reduces noise in autoconversion and accretion, thereby also decreasing noise in the sum of the three tendencies.
The similarity between the LH-only, 2Cat-Cld, and 2Cat-CldPcp methods is expected. The 2Cat-Cld method, like the previous version of SILHS, includes a condition that reverts to straight Latin hypercube sampling in the event that cloud fraction exceeds 50 %. The DYCOMS-II RF02 case is a stratocumulus case, and the cloud fraction is close to 100 % for much of the simulation. Therefore, the 2Cat-Cld method behaves identically to the LH-only method for much of the simulation. The 2Cat-CldPcp method reduces the number of sample points in the region with neither cloud nor precipitation, but that region is very small when cloud fraction is near 100 %, and so this method, too, differs little from LH-only sampling here.
The reason for the improvement using the 8Cat method can be inferred from Table 5. The table shows the percentage of sample points allocated to each category at the importance sampling level, averaged over the entire simulation. The 2Cat-CldPcp, 2Cat-Cld, and LH-only allocations are all similar, but the 8Cat allocation places more points in (c,p,1) than in (c,p,2). That is, unlike the other three methods, the 8Cat method is able to preferentially sample from mixture component 1. Component 1, in turn, contains larger cloud (liquid) water mixing ratios. The other three methods necessarily place more points in component 2 than in component 1, because component 2 occupies more of the (original) PDF. However, it was shown in Table 3 that, optimally, component 1 has a much higher per-probability sampling density than does component 2. This increased sampling of component 1 is the source of the improvement of the 8Cat method.
Figure 6 shows time-series plots of the four tendencies at the importance sampling level. The estimate of evaporation is noisy in the three plotted configurations, because evaporation occurs outside of cloud, and the region of the sample space outside cloud is poorly sampled by all three methods. However, evaporation contributes little to the overall sum because the original probability p_j outside of cloud is so small.

Table 6. Number of sample points needed by each configuration of SILHS to achieve a given RMSE in estimating the sum of the three processes, for the RICO and DYCOMS-II RF02 cases. These numbers are estimated visually from Figs. 1 and 5. In RICO, the 2Cat-CldPcp method requires approximately a factor of 4 fewer sample points than the 2Cat-Cld method to achieve an RMSE of 10^-8 kg kg^-1, and the 8Cat method requires approximately a factor of 8 fewer sample points. In DYCOMS-II RF02, the 8Cat method requires approximately a factor of 1.6 fewer sample points than the 2Cat-CldPcp, 2Cat-Cld, and LH-only methods to achieve an RMSE of 4 × 10^-9 kg kg^-1.

Figure 7 shows the mean profile plots of the four tendencies. Once again, an ensemble of 12 simulations was used, and each overplotted sampling method is averaged over all timesteps and all 12 ensemble members. All of the lines look similar, indicating that all three methods sample the three processes well with 32 sample points per timestep, perhaps because the DYCOMS-II RF02 stratocumulus case is not highly variable. Figure 8 shows profile RMSE plots averaged over all timesteps and ensemble members. All sampling methods show the largest RMSE at around 800 m. At this level, the 8Cat method shows smaller error in autoconversion and accretion, but all methods show about the same error in evaporation.

Table 6 shows a quantitative comparison of the four configurations for both the RICO and DYCOMS-II RF02 cases. For each sampling method, Table 6 lists the approximate number of sample points needed to obtain the given time-averaged RMSE at the importance sampling level. These values are estimated visually from Figs. 1 and 5. The table shows that in RICO, the 8Cat method requires approximately a factor of 8 fewer points to achieve a desired RMSE than the 2Cat-Cld method. The 2Cat-CldPcp method requires approximately a factor of 4 fewer sample points than the 2Cat-Cld method. In DYCOMS-II RF02, the reduction in necessary sample points for the given RMSE for the 8Cat method as compared to the others is a factor of approximately 1.6.
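Because Monte Carlo error decreases like 1/√N, a reduction of the RMSE by a factor r at fixed N translates into a factor r² fewer sample points at fixed RMSE, which is how the entries of Table 6 relate to the error curves of Figs. 1 and 5. A hypothetical helper making this scaling explicit:

```python
def points_needed(n_ref, rmse_ref, rmse_target):
    """Sample count needed to reach rmse_target, assuming the usual
    Monte Carlo scaling RMSE proportional to 1/sqrt(N), given one
    measured pair (n_ref, rmse_ref)."""
    return n_ref * (rmse_ref / rmse_target) ** 2
```

Halving the RMSE thus costs a factor of 4 in sample points; conversely, a method whose RMSE curve sits a factor of √8 ≈ 2.8 lower needs 8 times fewer points, as for 8Cat in RICO.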

Computational cost of the nCat method
An important consideration among Monte Carlo integration methods is their computational cost. The cost of the new nCat method was compared against both the prior SILHS importance sampling method and the cost of CLUBB. Eight SILHS sample points were used in each simulation. Five RICO simulations were performed, and Table 7 shows the means and standard deviations of the five simulations. Each time is a cumulative total of the respective component of the model over the entire simulation. The two nCat methods (2Cat-CldPcp and 8Cat) show no significant increase in computation time as compared to the original SILHS importance sampling method. All SILHS methods are about twice as expensive as CLUBB when eight sample points are used. These costs may be compared with other costs in global climate simulations. To this end, Thayer-Calder et al. (2015) tested the cost of SILHS in the Community Atmosphere Model (Neale et al., 2012). They show that an adequate cloud climatology can be obtained with as few as four sample points (see Figs. 12 and 13 of Thayer-Calder et al., 2015). The extra cost of computing four samples is (1.89 − 1.69)/1.69 ≈ 12 % (see Table 2 of Thayer-Calder et al., 2015).

Conclusions
We have developed a new ("nCat") method to sample subgrid variability in atmospheric models. The method divides the grid-box sample space into N_cat categories and allows the modeler to prescribe the sampling probability in each category. The most flexible variant of the nCat method that we consider here breaks the grid box into eight categories, depending on whether a parcel contains cloud droplets, whether it contains rain drops, and which mixture component of the PDF it lies in. This "8Cat" variant allows a fine degree of control over where the samples are placed.
Another variant has been created by lumping the eight separate categories into two: one that contains either cloud or precipitation, and one that contains neither cloud nor precipitation.This ("2Cat-CldPcp") variant is useful when the user does not have an estimate of the optimal sampling fraction for each of the eight categories.
We have tested the 8Cat and 2Cat-CldPcp methods on a drizzling cumulus case (RICO) and a drizzling stratocumulus case (DYCOMS-II RF02). The improvement we find relies on two aspects of the method. One aspect is an algorithm that limits the weight of samples and thereby increases the number of samples in "unimportant" but large-probability categories. This helps prevent a user from becoming overzealous with importance sampling and thereby leaving excessive noise in "unimportant" categories. Another aspect is the choice of sampling variable to prescribe. We prescribe γ_j (see Eq. 21), which is related to the density of sample points in a category. This prescription allows the sampling to behave well as the cloud fraction and precipitation fraction vary widely between stratocumulus and cumulus cases.
The finer degree of control over the sampling in the nCat method allows us to improve sampling in evaporating (i.e., precipitating but non-cloudy) regions.This turns out to be a key to the improvement in the results.Evaporation of precipitation is an important process in the RICO case, but precipitation evaporates within only a small portion of a grid box, a portion that the nCat method can preferentially sample.Such fine-scale control of the sampling is not possible in less flexible methods, such as the former method in SILHS, 2Cat-Cld, which does not allow importance sampling on precipitation.
Quantitative improvements are realized by the 2Cat-CldPcp and especially the 8Cat allocations.As compared to the 2Cat-Cld method, the 8Cat allocation allows a reduction in the number of sample points, given equal accuracy in the tendency of autoconversion plus accretion plus rain evaporation.The reduction is approximately a factor of 1.6 in DYCOMS-II RF02 and a factor of 8 in RICO (see Figs. 1 and  5).This permits a factor of 1.6 to 8 fewer calls to the microphysics code.If a computationally expensive microphysical parameterization were used, this would result in a considerable reduction in computational cost.
Figure 2. RICO: time-series plots of the four tendencies at the importance sampling level. The simulations in these plots use 32 sample points, and the plots show minutes 3321 to 4320 of the simulations. To improve readability, the LH-only method is not plotted. The evaporation tendencies are much noisier in the 2Cat-Cld method than in the 2Cat-CldPcp or 8Cat methods.

Figure 3. RICO: mean profile plots of the four tendencies. The simulations in these plots use 32 sample points. For each configuration, an ensemble of 12 simulations is used, each with a different seed. Profiles are averaged over all 864 timesteps of the simulation and all 12 ensemble members. All SILHS sampling methods clearly converge at all height levels.

Figure 4. RICO: profile RMSE plots of the four tendencies. The simulations in these plots use 32 sample points. For each configuration, an ensemble of 12 simulations is used, each with a different seed. RMSE values are averaged over all 864 timesteps of the simulation and all 12 ensemble members. The 8Cat and 2Cat-CldPcp methods show improvement between 1000 and 2500 m, where the improvement in sampling of the evaporation term is largest. All three methods suffer from extra noise below 1000 m, which is far away from the importance sampling level. The importance sampling level is just under 2000 m for most timesteps in the simulation.

Figure 5. DYCOMS-II RF02: the root-mean-square error (RMSE) of SILHS simulations as a function of the number of sample points for the DYCOMS-II RF02 stratocumulus case. The error is calculated at the importance sampling level and is averaged over all timesteps of the simulation. The LH-only, 2Cat-Cld, and 2Cat-CldPcp methods are expected to have roughly the same behavior in a case like DYCOMS-II RF02 that has cloud fraction near 100 %. The 8Cat method still improves the estimates of autoconversion and accretion because it is able to flexibly allocate points within the cloudy region of the sample space.

Figure 6. DYCOMS-II RF02: time-series plots of the four tendencies at the importance sampling level. The simulations in these plots use 32 sample points. The time range plotted includes minutes 161 to 360 of the simulation. The evaporation process is poorly sampled in all three sampling methods, but it is a relatively small term and makes a much smaller contribution to the sum of the three processes than autoconversion and accretion.

Figure 7. DYCOMS-II RF02: mean profile plots of the four tendencies. The simulations in these plots use 32 sample points. For each configuration, an ensemble of 12 simulations is used, each with a different seed. Profiles are averaged over all 360 timesteps of the simulation and all 12 ensemble members. All SILHS sampling methods clearly converge at all height levels: the lines overlap well, indicating that all three processes are sampled well by all three sampling methods.

Figure 8. DYCOMS-II RF02: profile RMSE plots of the four tendencies. The simulations in these plots use 32 sample points. For each configuration, an ensemble of 12 simulations is used, each with a different seed. RMSE values are averaged over all 360 timesteps of the simulation and all 12 ensemble members. All sampling methods show the largest RMSE at around 800 m. At this level, the 8Cat method shows smaller error in autoconversion and accretion, but all methods show about the same error in evaporation.

Table 1. For each of the eight categories, this table lists (1) the category number; (2) whether the category is cloudy or not, precipitating or not, and in mixture component 1 or 2; (3) what inequalities must be satisfied for a sample point to lie within the category; and (4) the original probability mass associated with the category, p_j.

Table 2. Notable configuration settings for the RICO and DYCOMS-II RF02 simulations performed in this paper.

Table 4. RICO: percentage of sample points allocated to each category by each sampling method at the importance sampling level, time-averaged over the entire simulation. The more sample points placed in a particular category, the better the estimate of processes active in that category.

Table 5. DYCOMS-II RF02: percentage of sample points allocated to each category by each sampling method at the importance sampling level, time-averaged over the entire simulation. The more sample points placed in a particular category, the better the estimate of processes active in that category.

Table 7. Cumulative run time of CLUBB and the different SILHS configurations over an 864-timestep RICO simulation. Each SILHS configuration uses eight sample points. The means and standard deviations of five simulations are shown in the table. All times are in seconds. The two nCat methods (2Cat-CldPcp and 8Cat) show no significant increase in computation time as compared to the original SILHS importance sampling method. All SILHS methods, with eight sample points, are more expensive than CLUBB.