We describe an emulator of a detailed cloud parcel model which has been
trained to assess droplet nucleation from a complex, multimodal aerosol size
distribution simulated by a global aerosol–climate model. The emulator is
constructed using a sensitivity analysis approach (polynomial chaos
expansion) which reproduces the behavior of the targeted parcel model across
the full range of aerosol properties and meteorology simulated by the parent
climate model. An iterative technique using aerosol fields sampled from a
global model is used to identify the critical aerosol size distribution
parameters necessary for accurately predicting activation. Across the large
parameter space used to train them, the emulators estimate cloud droplet
number concentration (CDNC) with a mean relative error of 9.2 % for aerosol
populations without giant cloud condensation nuclei (CCN) and 6.9 % when
including them. Versus a parcel model driven by those same aerosol fields,
the best-performing emulator has a mean relative error of 4.6 %, which is
comparable with two commonly used activation schemes also evaluated here
(which have mean relative errors of 2.9 and 6.7 %, respectively). We
identify the potential for regional biases in modeled CDNC, particularly in
oceanic regimes, where our best-performing emulator tends to overpredict by
7 %, whereas the reference activation schemes range in mean relative error
from

Aerosols play a critical role in the climate system by interacting with
radiation through several different mechanisms. Depending on their
composition, aerosol particles can directly scatter or absorb incoming solar
radiation, leading to a direct radiative effect and rapid changes in the
energy budgets of the surface and atmosphere. Additionally, aerosol particles
mediate the production of clouds by providing surface area on which water
vapor may condense to form droplets. Through this second pathway, changes in
the aerosol population perturb the radiative properties of clouds by altering
their microstructure and life cycle, thereby impacting the planetary radiative
budget. Despite decades of focused research by the scientific community, the
radiative forcing produced through this second pathway, known as
aerosol–cloud interactions, remains one of the largest uncertainties in
understanding contemporary and future climate change on both regional and
global scales

To include this second pathway, contemporary Earth system models predict
cloud droplet number concentration (CDNC) by evaluating the nucleation of
droplets (aerosol activation) from their simulated aerosol fields. As a
result, these models can resolve aerosol–climate indirect effects, which arise
when anthropogenic aerosol emissions influence cloud microphysical and
optical properties by altering the CDNC burden. The interactions
between aerosol particles, water vapor, and cloud droplets are often
described using the conceptual model of a possibly entraining, adiabatic
cloud parcel

A comprehensive review of the development of these parameterizations is
provided by

In this present work, we extend this approach to develop a set of metamodels
trained for the aerosol and meteorology parameter space simulated by a global
aerosol–climate model. The resulting metamodels can thus be directly used as
activation parameterizations inside a global climate model to predict CDNC online,
given information about the aerosol size distributions and subgrid-scale
meteorology in each model grid box. To refine the parameter space used for
emulation, we present an analysis of how the size distribution parameters of
each aerosol mode in our model contribute to activation dynamics and droplet
nucleation. We then evaluate the performance of our
emulator parameterizations versus two physically based schemes which are used
in the vast majority of contemporary global models

We seek to derive aerosol activation emulators for the multimode, two-moment,
mixing-state-resolving Model of Aerosols for Research of Climate (MARC;
version 1.0.1 here)

Table

The aerosol size distributions predicted by MARC interact with both radiation
and cloud microphysics. MARC uses a two-moment, stratiform cloud microphysics
scheme

MARC requires 24 parameters to fully describe its 15-mode aerosol size
distribution, although not all of these modes are important for aerosol
activation. Activation calculations additionally require three meteorological
parameters (temperature, pressure, and

As shown in Fig.

The distributions of key aerosol size distribution parameters are shown in
Fig.

Distributions of model-predicted instantaneous subgrid-scale vertical velocity for near-surface (below 700 mb) grid cells broken down by land (red) and ocean (black) regimes.

MARC aerosol-mode size distribution and composition parameters. The MOS mode (*) has a composition-dependent density and hygroscopicity which is computed using the internal mixing state of organic carbon and sulfate present at a given grid cell and time step.

Distributions of aerosol size distribution parameters for key modes
simulated by MARC. Vertical dashed lines indicate the 5th and 95th
percentiles of the sampling distribution for each parameter. Note that each
mode name corresponds to those in
Tables

The emulation method used by

Relative errors in

To further reduce the emulation parameter space, we assess the relative
importance of each individual aerosol mode and its influence on activation
dynamics using an ensemble of iterative, single-mode activation calculations
performed with a detailed reference parcel model

We apply this algorithm to a set of 50 000 aerosol size distribution and meteorology parameter samples taken from our reference MARC simulation. Overall, ACC is the dominant mode in 96.5 % of the sample cases; in the remaining 3.5 %, either MOS or the small dust mode (DST01) dominates. When ACC dominates the activation dynamics, the second most dominant mode is MBS, MOS, or the smallest sea salt mode (SSLT01), in 10.3, 36.2, and 52.8 % of those cases, respectively. In 85 % of all the sample cases, the top three dominant modes are drawn from ACC, MOS, MBS, and SSLT01.
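One way to organize an iterative ranking of this kind is a greedy search: starting from no modes, repeatedly add the mode whose inclusion brings the subset's CDNC closest to that of the full population. The sketch below assumes exactly that rule (an illustrative guess at the procedure, not a transcription of it). `cdnc_of` stands in for a detailed parcel model run on a subset of modes, and `toy_cdnc` with its per-mode "potential" droplet numbers is hypothetical, chosen only so that modes compete for a limited vapor supply.

```python
from typing import Callable, Dict, List, Sequence


def rank_dominant_modes(modes: Sequence[str],
                        cdnc_of: Callable[[List[str]], float]) -> List[str]:
    """Greedily order modes by how closely each growing subset reproduces full-population CDNC."""
    target = cdnc_of(list(modes))  # CDNC with every mode present
    chosen: List[str] = []
    remaining = list(modes)
    while remaining:
        # Add the mode whose inclusion brings the subset's CDNC closest to the target.
        best = min(remaining, key=lambda m: abs(cdnc_of(chosen + [m]) - target))
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Hypothetical stand-in for a parcel-model run: each mode contributes a "potential"
# droplet number, and competition for water vapor saturates the total.
POTENTIAL: Dict[str, float] = {"ACC": 300.0, "MOS": 80.0, "SSLT01": 20.0}


def toy_cdnc(subset: List[str]) -> float:
    total = sum(POTENTIAL[m] for m in subset)
    return total / (1.0 + total / 1000.0)


print(rank_dominant_modes(list(POTENTIAL), toy_cdnc))  # ['ACC', 'MOS', 'SSLT01']
```

Passing the parcel model in as a callable keeps the ranking logic independent of how each subset's CDNC is computed, so a real parcel model or an emulator could be swapped in.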

Figure

Input parameter space and bounds on associated uniform probability
density functions used to derive polynomial chaos expansions for MARC activation. For the lower
and upper bounds on the aerosol size distribution parameters, the parenthetical values denote
the percentile of the distribution for that parameter at which the bound occurs. All terms
are present for the main expansion; terms affixed with an (

Based on the sampling and iterative calculations presented in this section,
we define the activation emulation parameter space as in Table

The following sections briefly describe the chosen cloud parcel model and
emulation technique, polynomial chaos expansion. For more details on both
techniques and their application, we refer the reader to

Adiabatic cloud parcel models are a standard modeling tool for detailed
assessments of aerosol activation and other studies focused on the
composition of atmospheric particulates
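In a parcel model, and in schemes of the ARG and MBN type, droplet number is ultimately tied to the supersaturation maximum through Köhler theory: particles whose critical supersaturation lies below the maximum activate. As a hedged illustration, the sketch below assumes the single-parameter κ-Köhler hygroscopicity framework in its dilute limit, a constant pure-water surface tension, and a lognormal mode; none of these choices is taken from the parcel model used in this work, and the function names are illustrative.

```python
import math

# Physical constants (SI units); surface tension is assumed fixed at the pure-water value.
R_GAS = 8.314       # universal gas constant, J mol^-1 K^-1
M_W = 0.018015      # molar mass of water, kg mol^-1
RHO_W = 1000.0      # density of liquid water, kg m^-3
SIGMA_W = 0.072     # surface tension of water, J m^-2


def kelvin_a(T):
    """Kelvin parameter A = 4 sigma M_w / (R T rho_w), in metres."""
    return 4.0 * SIGMA_W * M_W / (R_GAS * T * RHO_W)


def critical_dry_diameter(s_max, kappa, T):
    """Smallest dry diameter (m) activated at fractional supersaturation s_max.

    Inverts the dilute-limit kappa-Koehler relation s_c = sqrt(4 A^3 / (27 kappa d^3)).
    """
    A = kelvin_a(T)
    return (4.0 * A**3 / (27.0 * kappa * s_max**2)) ** (1.0 / 3.0)


def n_activated(N, mu, sigma_g, s_max, kappa, T):
    """Particles in a lognormal mode (total N, median dry diameter mu,
    geometric standard deviation sigma_g) whose dry diameter exceeds the critical one."""
    d_c = critical_dry_diameter(s_max, kappa, T)
    u = math.log(d_c / mu) / (math.sqrt(2.0) * math.log(sigma_g))
    return 0.5 * N * math.erfc(u)


# Example: kappa = 0.6 (typical of sulfate salts) at 0.3 % supersaturation and 298 K.
print(f"{critical_dry_diameter(0.003, 0.6, 298.0) * 1e9:.0f} nm")  # 63 nm
```

Summing `n_activated` over modes gives a droplet number for a given supersaturation maximum. A parcel model's job is to find that maximum self-consistently, because the modes compete for the same water vapor.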

At some time

We emulate the behavior of the detailed parcel model by applying the
probabilistic collocation method

The PCM is a non-intrusive technique which does not require modifications to
an existing model in order to be applied. Instead, the PCM treats the
original, full-complexity model as a black box and the chosen set of

Such an expression has

In order to compute the polynomial chaos expansions, we use the Design
Analysis Kit for Optimization and Terascale Applications

We extend the idealized calculations in

For emulation, all the aerosol size distribution parameters are transformed
using a logarithm, since they can take on values that span several orders of
magnitude. We then construct uniform distributions with the associated ranges
of values for each transformed parameter (Table

This methodology represents a compromise between a high-fidelity
representation of the aerosol size distribution parameters for our emulation
and the desire to build an emulator that can later be used in a GCM. However,
we note that the distributions of aerosol parameters simulated by MARC are
neither normal nor independent from one another. For instance, over remote
maritime regions, total aerosol number concentration tends to be small but
dominated by sea salt and small sulfate particles. In contrast, continental
regions with anthropogenic emissions may feature much higher burdens of
carbonaceous aerosol. Both of these complications can be handled directly
using numerically generated orthogonal polynomials and statistical
transformations
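The log transform and uniform ranges described above pair naturally with Legendre polynomials, which are orthogonal under a uniform density on [-1, 1]. The sketch below is a minimal, hedged illustration rather than the procedure used to build the MARC emulators. It assumes bounds at the 5th and 95th percentiles of sampled values, a total-order truncation, and a least-squares (regression) fit on random points instead of the collocation design computed with the Design Analysis Kit for Optimization and Terascale Applications. It does not address the non-normal, correlated inputs noted above, and all function names are illustrative.

```python
import itertools

import numpy as np
from numpy.polynomial import legendre


def percentile_bounds(samples, lo_pct=5.0, hi_pct=95.0):
    """Uniform-density bounds taken at chosen percentiles of model-sampled values."""
    lo, hi = np.percentile(samples, [lo_pct, hi_pct])
    return lo, hi


def to_unit_interval(x, lo, hi):
    """Map log10(x) linearly from [log10(lo), log10(hi)] onto [-1, 1], clipping outliers."""
    z = (np.log10(x) - np.log10(lo)) / (np.log10(hi) - np.log10(lo))
    return np.clip(2.0 * z - 1.0, -1.0, 1.0)


def total_order_indices(n_params, order):
    """Multi-indices alpha with sum(alpha) <= order (total-order truncation)."""
    return [a for a in itertools.product(range(order + 1), repeat=n_params)
            if sum(a) <= order]


def design_matrix(xi, indices, order):
    """Evaluate each multivariate Legendre term at each point; xi is (samples, params)."""
    v = legendre.legvander(xi, order)  # P_0..P_order per entry: (samples, params, order + 1)
    cols = [np.prod([v[:, j, k] for j, k in enumerate(alpha)], axis=0)
            for alpha in indices]
    return np.column_stack(cols)


def fit_pce(xi, y, order):
    """Least-squares coefficients of a total-order Legendre expansion fitted to outputs y."""
    indices = total_order_indices(xi.shape[1], order)
    coeffs, *_ = np.linalg.lstsq(design_matrix(xi, indices, order), y, rcond=None)
    return indices, coeffs


def eval_pce(xi, indices, coeffs, order):
    """Evaluate the fitted surrogate at new points in [-1, 1]^n."""
    return design_matrix(np.atleast_2d(xi), indices, order) @ coeffs


# Hypothetical black box standing in for a parcel-model output (degree-2 polynomial).
def black_box(xi):
    return 1.0 + 2.0 * xi[:, 0] - 0.5 * xi[:, 0] * xi[:, 1] + 0.3 * xi[:, 1] ** 2


rng = np.random.default_rng(0)
train = rng.uniform(-1.0, 1.0, size=(200, 2))
indices, coeffs = fit_pce(train, black_box(train), order=2)
test = rng.uniform(-1.0, 1.0, size=(50, 2))
print(np.max(np.abs(eval_pce(test, indices, coeffs, 2) - black_box(test))))
```

Because the toy black box is itself a degree-2 polynomial, a second-order expansion reproduces it to round-off. A real parcel model would instead leave a truncation error that shrinks as the order grows.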

Predicted supersaturation maxima (%) from parcel model and
activation parameterizations: third-order emulator

These parameters are used to drive parcel model simulations where we record

The emulators constructed through this process are functions which map

From a prediction of the

We evaluate our emulators by applying them both to a synthetic sample of input parameters and to samples drawn from a MARC simulation. In all of our comparisons, we study third- and fourth-order chaos expansions both excluding (main) and including (gCCN) the coarse dust and sea salt modes.
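The comparisons that follow are summarized through error statistics against a reference parcel model. A minimal sketch of how such statistics might be computed, under assumed definitions: relative error is (predicted − reference)/reference per sample, the mean relative error is its signed average (positive means overprediction), and R² is the ordinary coefficient of determination. The exact definitions used in this work (for example, whether R² is computed on log-transformed values) may differ, and `error_summary` is an illustrative name.

```python
import numpy as np


def error_summary(pred, ref):
    """Per-sample relative error statistics and R^2 of predictions against a reference model."""
    pred = np.asarray(pred, dtype=float)
    ref = np.asarray(ref, dtype=float)
    rel = (pred - ref) / ref                      # signed relative error per sample
    ss_res = np.sum((ref - pred) ** 2)            # residual sum of squares
    ss_tot = np.sum((ref - ref.mean()) ** 2)      # total sum of squares about the mean
    return {
        "mean_rel_err": rel.mean(),               # signed: positive means overprediction
        "mean_abs_rel_err": np.abs(rel).mean(),   # magnitude only
        "r2": 1.0 - ss_res / ss_tot,
    }


# Hypothetical CDNC values (cm^-3) from an emulator and a reference parcel model.
stats = error_summary(pred=[110.0, 180.0, 300.0], ref=[100.0, 200.0, 300.0])
print(stats)
```

Reporting both the signed and the absolute mean matters: in this toy case, over- and underpredictions cancel to a signed mean of zero even though individual samples are off by 10 %.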

As a reference, we compute activation statistics for each sample using both a
detailed parcel model and two widely used activation schemes. The first
parameterization, by

The same as Fig.

Using the parameter space defined in Table

Figure

Figure

Both of these sets of plots are repeated in Figs.

The same as Fig.

The same as Fig.

Summary statistics for error in supersaturation maxima and droplet number nucleated, predicted
by the emulators and activation parameterizations relative to corresponding simulations with
a detailed parcel model. From left to right, each column represents the coefficient of
determination (

The same as Fig.

These differences in bias are most likely related to the choice of parcel
model used in testing and building the ARG and MBN schemes; because each
scheme relies on some empirical tuning to parcel model calculations, details
in the implementation of each parcel model which influence its sensitivity
should show up in ensemble evaluations of each activation scheme. The
gCCN case is more taxing to simulate with parcel models using a
Lagrangian description of the particle size distribution, because
condensational growth is computed for each particle bin simultaneously. The
stiffness ratio in this case will be extremely large, as the liquid water
uptake by small particles in the main aerosol modes is much slower than that by
particles in the giant CCN modes. Although modern ODE solvers can automatically handle
these scenarios, the subjective choice of which particular solver and how to
discretize the giant CCN population (how many bins per mode) could influence
the sensitivity of

To better summarize the results in Figs.

Although the sampling in the previous section spans the full input
parameter space over which aerosol activation may need to be assessed, it
undoubtedly includes aerosol and meteorological conditions that are unlikely
to occur in the real world. To better understand the performance
and potential bias of the emulators developed here and the existing
activation schemes, we also studied a sample of

Qualitatively, the activation schemes behave similarly when evaluated
against the MARC parameters and against the more generic sampling of the
previous section. Figure

The emulators derived here do not fare as well as the physically based
parameterizations when using the MARC samples. Both third-order schemes tend to
overpredict droplet number over oceans, and underpredict it over land, but
with an extremely large variance extending to

To contextualize these differences in

Distributions of relative error in scheme prediction of

Mean relative error in scheme prediction of

Figure

The ARG scheme is the original activation parameterization used within CAM5.3
to assess cloud droplet nucleation. We implemented the MBN scheme as an
alternative in MARC, as well as an interface for chaos expansion-based
schemes. To use the emulators derived in this work, one must provide a NetCDF
file which contains at least three pieces of information:

an

a

an

MARC caches these vectors and matrices in memory at startup, just as it caches several time-invariant terms used in both the ARG and the MBN schemes for each of the CCN-providing aerosol modes.

To estimate the impact of each scheme on MARC's performance, we performed a set of 3-month simulations initialized with fully spun-up aerosol and meteorology fields from a previous experiment. The simulations were conducted using 480 MPI tasks with two threads allocated to each task. Using the default configuration of MARC with the ARG scheme, the atmosphere component of the model averaged 6.1 s per model day. The MBN scheme took 7 % longer per model day, while the emulators performed comparably to the ARG scheme. Both main emulators were within 0.4 % of the ARG scheme's cost per model day, with the fourth-order emulator costing an additional 0.16 %. The gCCN emulators were likewise comparable to the ARG scheme: the third-order emulator was 0.15 % faster, whereas the fourth-order emulator was 3 % slower.

Adding parameters to the chaos expansion underpinning the emulators would add further overhead to each evaluation by increasing the number of terms in the expansion. However, a larger penalty is incurred by raising the expansion order for a given set of parameters, because doing so adds far more terms to the expansion than adding a single parameter at the same order. An assessment of the offline implementations of each scheme used in the analysis in the previous section yielded similar results.
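The term-count argument can be made concrete. Assuming a total-order truncation, an expansion in n parameters up to order p has C(n + p, p) terms. Raising the order adds C(n + p, p + 1) terms, while adding one parameter adds C(n + p, p − 1); the former is larger whenever n > p. A quick check with a hypothetical 10-parameter expansion (not the parameter count used here):

```python
from math import comb


def n_terms(n_params: int, order: int) -> int:
    """Number of terms in a total-order polynomial chaos expansion: C(n + p, p)."""
    return comb(n_params + order, order)


base = n_terms(10, 3)            # hypothetical 10-parameter, third-order expansion
extra_param = n_terms(11, 3)     # one more parameter, same order
extra_order = n_terms(10, 4)     # same parameters, one order higher
print(base, extra_param - base, extra_order - base)  # 286 78 715
```

Because evaluation cost scales with the number of terms, raising the order in this example costs roughly nine times as many extra terms as adding a parameter.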

In this work, we extend the metamodeling technique of

In ensembles of iterative calculations using a large sample of aerosol size
distributions from a coupled aerosol–climate model, we find that a single
mode typically dominates activation or otherwise strongly predicts the
total number of droplets nucleated. This approach to understanding the
sensitivity of activation dynamics to the underlying aerosol population is
distinct from previously published approaches in the literature. For
instance,

In terms of predicting CDNC, the accumulation-mode sulfate (ACC) alone serves
as a good proxy for the activity of a full aerosol population in many cases,
including in the presence of giant CCN and across a wide range of updraft
regimes. However, it is known that giant CCN exert a larger influence on
precipitation formation in cleaner regimes

The fact that a single mode can place such a strong constraint on aerosol activation is useful for efforts to extend look-up table methods for building parameterizations. If two modes – one of accumulation size and one of coarse size – accurately predict aerosol activation, then the look-up table can be constrained to just a few key aerosol size distribution parameters. Including variable aerosol composition would still likely make a look-up table unwieldy in a global model, however, necessitating more sophisticated approaches such as the metamodeling technique adopted here.

When evaluated against samples from the full training parameter space, our emulators
perform well. Neglecting the influence of the giant CCN modes, the mean
relative error in predicting

Assessing the relative performance of activation schemes that, for all practical purposes, reproduce their own reference parcel models extremely well is a critical step in establishing the parametric uncertainty in translating aerosol number into droplet number, which in turn underlies the uncertainty in global model estimates of the indirect effect.

For this reason, we supplemented the evaluation of our emulators with a second set of input parameter samples drawn from aerosol fields simulated by an aerosol–climate model. In contrast with previous studies, we use instantaneous fields in lieu of monthly or annual averages for our samples. Activation is inherently a fast process; because the microphysics schemes in aerosol–cloud models explicitly include a tendency term for new droplets formed via nucleation, the activation parameterization in any model is called at every time step in every grid cell where clouds occur. Assessing activation schemes using temporally averaged aerosol fields therefore risks missing some combinations of input parameters and narrowing the range of values over which the scheme must perform accurately.

Most of the emulators and schemes tested here perform differently in oceanic
and continental regimes, owing to the relative abundance of natural and
anthropogenic aerosols in each. When focusing on the narrower range of
aerosol parameters present in MARC (in comparison with the larger parameter
space on which the emulators were trained), the emulators which explicitly
account for giant CCN perform poorly, especially in maritime regimes
dominated by sea salt. However, their counterpart performs nearly identically
to the ARG scheme, showing a slight overprediction of

The results presented here have important implications for global modeling
studies seeking to quantify uncertainty in the aerosol indirect effect on
climate. While different activation schemes generally perform equally well
when faced with idealized sets of input parameters

Future work should seek to systematically assess the differences in cloud
microphysical processes and aerosol–cloud interactions arising from the choice of
activation schemes in aerosol–climate models. As this work illustrates,
employing emulators of detailed parcel model calculations which include
complex chemical and physical effects on activation will aid with this task,
since additional effects (e.g., changes in droplet surface tension due to
organic surfactants;

A Git repository archiving the scripts used to generate
the chaos expansions can be found at

The authors declare that they have no conflict of interest.

The work in this study was supported by the National Science Foundation Graduate Research Fellowship Program under both NSF grant 1122374 and NSF grant AGS-1339264, the National Research Foundation of Singapore through the Singapore – MIT Alliance for Research and Technology and the interdisciplinary research group of the Center for Environmental Sensing and Modeling, and the U.S. Department of Energy, Office of Science (DE-FG02-94ER61937). We thank Steve Ghan (PNNL) and Athanasios Nenes (Georgia Tech) for reference implementations of their activation parameterizations. We would also like to thank Graham Mann and three anonymous reviewers for comments that helped improve the manuscript. Edited by: G. Mann. Reviewed by: three anonymous referees.