GMDGeoscientific Model DevelopmentGMDGeosci. Model Dev.1991-9603Copernicus PublicationsGöttingen, Germany10.5194/gmd-11-3781-2018sympl (v. 0.4.0) and climt (v. 0.15.3) – towards a flexible framework for
building model hierarchies in Pythonsympl, climt – model hierarchies in PythonMonteiroJoy Merwinjoy.merwin@gmail.comhttps://orcid.org/0000-0002-3932-3603McGibbonJeremyhttps://orcid.org/0000-0003-4955-9772CaballeroRodrigohttps://orcid.org/0000-0002-5507-9209Department of Meteorology, Stockholm University, 106 91 Stockholm, SwedenAtmospheric Sciences–Geophysics (ATG) Building, University of Washington, Seattle, Washington 98195-1640, USAJoy Merwin Monteiro (joy.merwin@gmail.com)18September20181193781379413April201826April201820August201823August2018This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/This article is available from https://gmd.copernicus.org/articles/11/3781/2018/gmd-11-3781-2018.htmlThe full text article is available as a PDF file from https://gmd.copernicus.org/articles/11/3781/2018/gmd-11-3781-2018.pdf
sympl (System for Modelling Planets) and
climt (Climate Modelling and Diagnostics Toolkit) are an
attempt to rethink climate modelling frameworks from the ground up. The aim
is to use expressive data structures available in the scientific Python
ecosystem along with best practices in software design to allow scientists to
easily and reliably combine model components to represent the climate system
at a desired level of complexity and to enable users to fully understand
what the model is doing.
sympl is a framework which formulates the model in terms of a
state that gets evolved forward in time or modified within a specific
time by well-defined components. sympl's design facilitates building
models that are self-documenting, are highly interoperable, and provide
fine-grained control over model components and behaviour. sympl
components contain all relevant information about the input they expect and
output that they provide. Components are designed to be easily interchanged,
even when they rely on different units or array configurations.
sympl provides basic functions and objects which could be used in
any type of Earth system model.
climt is an Earth system modelling toolkit that contains scientific
components built using sympl base objects. These include both pure
Python components and wrapped Fortran libraries. climt provides
functionality requiring model-specific assumptions, such as state
initialization and grid configuration. climt's programming interface
designed to be easy to use and thus appealing to a wide audience.
Model building, configuration and execution are performed through a Python
script (or Jupyter Notebook), enabling researchers to build an end-to-end
Python-based pipeline along with popular Python data analysis and
visualization tools.
Introduction
The climate is a complex system composed of interacting subsystems
(atmosphere, ocean, cryosphere, biosphere, chemosphere), each encompassing a
broad range of physical and chemical processes with space and timescales
spanning many orders of magnitude. Modelling and understanding the climate
system in its entirety is a grand scientific and technological challenge. An
increasingly recognized strategy for tackling this challenge is to build a
hierarchy of models of varying complexity. Simpler models are more amenable
to in-depth analysis and understanding; the insight gained from these simpler
models can then be used to understand more complicated models, and so on
. Specifying which particular models should occupy each
rung in such a hierarchy is necessarily a matter of subjective choice, and
the questions of how to create a hierarchy and what models are desirable in a
canonical hierarchy has generated extensive discussions
. Our purpose here is to present a
modelling framework which enables climate scientists to easily and
transparently traverse the specific model hierarchy suiting their needs.
Designing and building a framework that facilitates traversing this hierarchy
remains a challenge. The lack of flexibility of existing climate models
forces scientists to spend a lot of time reading and modifying code to
construct alternative model versions that should in principle be
straightforward to build. Most modelling frameworks simply provide code to
exchange information between different physical domains such as the atmosphere
and the ocean (see , and , for a survey of modelling
frameworks), with each physical domain
being represented by a monolithic block of code. It was only with the advent
a new generation of frameworks like the Earth System Modelling Framework
(ESMF) that fine-grained
control over the components that constitute a climate model was made
possible. For instance, ESMF allows configuring components as trees, where
the leaf nodes could represent physical processes such as radiation and the
root of the tree could represent a physical domain such as the atmosphere.
Such a hierarchical ordering of physical processes is also present in Python-based modelling packages – previous versions of climt (Climate Modelling and Diagnostics Toolkit) and
climlab (https://github.com/brian-rose/climlab, last access: 20 August 2018) allow building models in a manner similar to
ESMF. However, the norm continues to be that climate model composition is
configured by name list variables and boolean flags in the code rather than
framework-based approaches (like the component trees that ESMF allows).
Another emerging concern in the scientific community is that of
the reproducibility of research . While publicly
available climate models do provide validated configurations that are in
principle completely reproducible, climate scientists routinely need to make
changes to the code that are hard to document (or understand). Short of
sharing a copy of the entire code base, such modifications makes it difficult
to reproduce simulations. While some level of code manipulation is
inescapable, we note from our own experience and from reading about such
modifications in the literature that most of them follow set patterns which
could easily be provided by modelling frameworks themselves.
In this paper we present a new modelling framework, sympl (System for Modelling Planets), and a
model toolkit, climt. While sympl focuses on providing a
model framework, a rich taxonomy of components, and model agnostic
configuration options, climt focuses on providing a broad array of
physical components to allow users to build scientifically useful models.
climt also provides model-dependent configuration options and helper
functions to create an initial model state as required by the components.
sympl and climt allow finer-grained control over the
composition of the model, with individual components representing physical
processes (such as radiation and convection) rather than physical
domains. Attempting to model the climate system at the physical process level
has its own challenges, which we attempt to solve in these packages.
Initiatives to build frameworks to traverse the hierarchy between highly
idealized models to full-scale GCMs (general circulation models) do exist
, but we believe the definition
of a clear set of classes to represent the physical process level of the
model to be unique.
sympl and climt allow writing models which are easy to use
and facilitate the reproducibility of simulations. Both these packages are
subject to ongoing development but have reached a level of maturity that
makes it worthwhile to document them here. In Sect. ,
we present a series of models written using sympl and climt
which illustrate the construction of a model hierarchy. In
Sect. we describe some challenges that modelling
frameworks have to solve (in the context of the above examples when possible)
and discuss how sympl and climt address these challenges.
We then discuss in more detail the interfaces of sympl
(Sect. ) and climt
(Sect. ). In Sect. we present some
benchmark calculations and conclude with a discussion of developments
planned in the future.
A Python script which calculates the heating tendencies and
associated diagnostics from a longwave radiative transfer component. See the
text for a detailed description.
sympl and climt in action
To illustrate the advantages of the fine-grained control that sympl
and climt offer, we consider a series of examples starting with a
diagnostic radiative calculation and ending with an idealized three-dimensional atmospheric general circulation model.
Figure shows the script required to calculate the
heating tendencies from a longwave radiative transfer component
. A detailed explanation of the script
follows:
Line 1: import the climt package.
Line 4: instantiate a radiative transfer component.
Line 7: create a state dictionary which contains all the
quantities required as inputs by the radiative transfer component.
Line 9: calculate the heating rate (available in
“tendencies” and any associated diagnostics such as the radiative fluxes (available in “diagnostics”).
In Fig. , we have used the default values for
quantities in model_state, which corresponds to an isothermal atmosphere.
This example, though seemingly simple, is remarkable because of the ease with
which such a diagnostic calculation can be performed. Traditionally, such a
calculation would involve writing a Fortran driver, compiling it with the
radiative transfer library, writing the output to a file, and then reading
the output file into a suitable environment for further analysis. The ease of
use illustrated in the above example is a direct result of the fine-grained
control that sympl and climt allow – individual components
can be configured and run (interactively, if desired) without having to
compile them with a driver file. To summarize, components, not
models, are first-class entities within the sympl framework.
It is worth examining this example in greater detail, since it highlights
some important features of sympl and climt that we will
look at closely in subsequent sections. The component called
“radiation” is an implementation of the sympl entity called
TendencyComponent, which is a template for components which
calculate tendencies of a quantity (air temperature in this case) based on
quantities in a state dictionary. We will encounter other kinds of components
in subsequent examples.
A Python script which calculates the radiative–convective
equilibrium temperature of an atmospheric column for a fixed surface
temperature. See the text for a detailed description.
Figure builds upon the previous example to
create a model that includes both radiation and convection, steps the state
quantities forward in time, and writes the output to a file. The changes from
the previous script are as follows:
Line 7: define the model time step using timedelta from the datetime library, which is part of any Python distribution.
Line 21: create a time integrator which steps the model state forward
in time, using tendencies generated by the radiation and convection components. The
integrator chosen is an instance of the sympl Stepper class which
implements a variety of Adams–Bashforth schemes.
Line 24: create a “monitor” component which writes the model state
to a netCDF file.
Lines 27–37: the boundary layer component provides new values of model
quantities, which are used to update the model state. The time integrator incorporates
the tendencies due to radiation and convection and provides new values for the model
state as well. The current model state is updated with diagnostics and written to a
netCDF file. The model state is then updated with the new model quantities to prepare for the next iteration.
This example illustrates how to piece together a radiative–convective
equilibrium (RCE) model from different kinds of sympl components.
This example also moves up the model hierarchy, away from static diagnostic
calculations to a model which evolves in time. It is worth noting that the
first half of the example which creates components and a model state remains
identical to the procedure followed in the previous example. This example
shows two notable features of the design of sympl and climt: (i) individual components can step the model state forward themselves; (ii) the model integrator is a separate entity in itself which can be
replaced easily (to use more stable integration schemes, for example). These
features are in keeping with our goal of capturing the diversity of model
components – some of which produce tendencies and others that produce new
state quantities – and allowing the user to have fine-grained control over model
configuration.
This example also illustrates how sympl's design provides a clear
understanding of the model to users. Configuring the model consists of
modifying a run script which is meant to be entirely legible to the user. By
reading the run script, one can see exactly which model components are being
used, how the state is initialized, what configuration options are being
passed to which components, what time integration scheme is used on which
components, the order in which components are being called, and the point
within the integration where the state is being output to a file.
A Python script which creates an idealized moist atmospheric general
circulation model. See text for a detailed description.
Figure presents the next step in the hierarchy from
Fig. , creating a moist atmospheric general
circulation model (AGCM) in an aqua-planet configuration with a prescribed
sea surface temperature. The default surface temperature is horizontally
uniform, but we omit prescribing a realistic temperature distribution for
purposes of comparison with Fig. . The main
differences from the previous example are as follows:
Line 14: the boundary layer component is wrapped to output tendencies
instead of a new state. This wrapper is required since the spectral dynamical core
must apply tendencies to most quantities in spectral space for numerical reasons.
Line 18: the spectral dynamical core is used as the time stepper
instead of the Adams–Bashforth scheme used previously.
Lines 22 and 25: since the model now has three dimensions, we require
a grid describing the latitudes and longitudes as well. Line 18 creates a model grid,
and the default model state created in Line 21 uses this grid to create an appropriate three-dimensional state.
A remarkable fact about this example is that it is only one line longer than
the previous example. Intuitively, this seems appropriate – an AGCM can be
thought of as a collection of radiative–convective columns which communicate
with each other via the dynamical core. However, in most modelling frameworks
the intuitive picture of the transition from an RCE to an AGCM does not
easily translate to code. This plug-and-play behaviour is a direct
consequence of the modularity of individual sympl components and the
fact that all components of a similar kind (AdamsBashforth and
GFSDynamicalCore in this case) implement the same interface, making
it possible to reuse almost the entire model script from the RCE case. In
this way, sympl and climt allow constructing models in such
a way that a change that intuitively seems small also translates to a code
change that is small.
In this following section, we look in detail at the design decisions that
allow for the construction of a model hierarchy as described in the three
examples in this section.
Design considerations and choices
In this paper we distinguish between modelling frameworks, model toolkits,
and models themselves. A framework (such as sympl) consists of
abstractions of “infrastructure” code that allow the creation of climate
models. A framework creates rules one must follow but by doing so ensures
models using the framework are easier to write, understand, and combine with
one another. A toolkit (such as climt) implements those abstractions
for concrete physical processes and may provide additional functionality
that is not covered by the framework. A model itself (such as in
Figs. and ) is written
using components that may come from a toolkit which follows the guidelines of
the framework.
Modelling frameworks should enable scientists to intuitively combine model
components and create an appropriately complex model for the scientific
question at hand. The user should also be able to specify details such as the
order in which components are called and the time stepping schemes used.
Model toolkits should provide a wide variety of components that enable users
to write a model appropriate to the question at hand. Toolkits should also
maintain a list of quantities and numerical grids that are required by the
components it provides to facilitate creating model arrays. It is also
desirable that the process of creating a model is fairly easy to understand
and that the model code be self-documenting to eliminate the need to write
additional documentation whenever possible.
Component diversity
One of the major aims of sympl and climt is to allow
fine-grained control over the processes that constitute an Earth system
model. In currently available modelling frameworks, the integration of
scientific code (or model components) and the modelling framework happens at
the physical domain level – atmosphere, ocean, land, and so on. The processes
that operate within each physical domain (fluid dynamics, radiation,
convection etc.) are not accessible in a systematic manner, and their code
becomes tightly coupled and difficult to modify. For example, changing the
radiation code in an atmospheric model is typically harder than changing the
kind of ocean the atmosphere is coupled to (slab, dynamical, etc.). Tight
coupling of code at the process level can make these models more efficient but at the cost of scientist efficiency, as the code is significantly less
flexible and have undocumented and complicated interfaces between components.
Furthermore, the fact that certain process-level components are written to
work only with certain other components demands an “architectural unity”
which might also encourage tight integration
of model components. Within sympl, it may still be the case that two
components are incompatible with one another (for example, using different
thermodynamic quantities), but because their interfaces are clearly defined,
it is easier to couple these components (for example, by converting
thermodynamic quantities using an additional component). Other
incompatibilities are handled automatically. For example, sympl
automatically ensures components which use different units or dimension
orderings can be used alongside one another.
If we allow for configuration at the process level, we are then faced with
model components which behave quite differently: some components (like
radiation) return tendencies, while others (like large-scale condensation)
return a new value of a physical quantity. sympl provides a set of
component classes that is comprehensive enough to capture the diversity of
component behaviours.
An example of a sympl component is depicted in
Fig. .
The general code layout for a sympl component.
input_properties and
tendency_properties are “property” attributes containing details
of the required inputs and returned outputs, tendencies, or diagnostics.
Here, the input is a quantity
air_temperature whose horizontal dimensions are latitude and longitude and whose values in the vertical are defined at model
mid-levels. The units are specified for each quantity. The
array_call method will be called by sympl with numpy
arrays that are automatically extracted from the model state to satisfy the
input property specifications. It is written to return numpy arrays as output
which satisfy its tendency and diagnostic property specifications. The
__call__ method which is called, for example, in Line 9 of
Fig. , is implemented by the base sympl
classes. This method encapsulates the boilerplate code of performing
consistency checks on the model state, extracting numpy arrays with the
correct units and shape, and creating a new state dictionary from the arrays
returned by array_call.
Dimension and unit requirements in sympl are not restrictions on the
inputs but rather describe the internal representation used by the
component. sympl will automatically convert the input state to
satisfy these requirements and raise an exception if that is not possible.
In this way, the property dictionaries act as self-documenting code, which
both documents the component interface and is used to convert input arrays to
the desired dimension ordering and units for the component.
As a TendencyComponent (discussed in Sect. ),
this component outputs tendencies of a quantity air_temperature.
The array_call() method accepts a dictionary containing just the
numpy arrays extracted from the model state with the correct units
and dimensions and returns the temperature tendency as specified in the
component properties.
Since components in sympl and climt are first-class
entities, they are not dependent on any other code for execution –
Fig. shows how to interact solely with a radiative
component without the need for an integrator or any other entity. This
behaviour serves an important educational purpose and facilitates diagnostic
calculations made very often during research.
Model configuration
Together, sympl and climt allow a natural way of
configuring various aspects of a climate model listed below:
physical configuration (toolkit agnostic) – the physical constants required by the model components;
algorithmic configuration (toolkit specific) – the “tunable” parameters which modify
the behaviour of the algorithm which represent physical processes, for example, the entrainment
coefficient in a convective parameterization;
interfacial configuration (toolkit agnostic) – modifications applied to inputs and
outputs at the interface of a component, further described below;
memory and computing resource configuration (toolkit specific) – the layout of arrays
used by the model and the distribution of the model components over the available number of
processors and coprocessors;
compositional configuration (toolkit specific) – the components that compose a model,
any dependency between components, and the order in which components are executed all need to be described.
Figure shows types of configuration information and
where such a configuration lies in climt/sympl and in
traditional climate models. In contrast to sympl, where all
configuration passes through a readable run script, the sheer variety of
locations where the configuration resides in traditional models makes it hard
to keep track of what configuration information has changed and how configuration changes affect model runs. This makes it daunting for beginners
to write models. Model configuration using sympl and climt
is highly centralized and easily accessible – all configuration passes
through the model run script, which is written to be readable and accessible
to model users. Such a centralized configuration reduces errors arising due
to a misconfiguration of the model.
While most of the configuration elements listed above are familiar,
interfacial configuration is mostly unheard of and usually applied in a
non-systematic manner within climate models. The
TimeDifferencingWrapper used in Line 14 of Fig.
is an example of the interfacial configuration of a climate model component.
There are a variety of interfacial modifications which are commonly applied
to components and can be applied in a consistent manner across components.
For instance, model components could
normally return a new value rather than a tendency of some physical quantity,
but in certain instances a tendency might be desirable (as in
Fig. );
provide output that is piecewise constant in time – the output is updated only
once every N iterations, and the same value it output for the next N-1
iterations; this behaviour is normally used in radiative transfer codes;
scale some of its inputs or outputs by some floating point number; this kind of
behaviour is desirable, for example, in mechanism denial studies.
We can interfacially modify certain behaviours of model components, such as
the ones above, by interacting with only the inputs and outputs of scientific
components. sympl formalizes such a configuration by providing wrapper
objects like TimeDifferencingWrapper.
The variety of configuration options in a climate model and in the
sympl - climt framework. The coloured boxes indicate the type of
configuration and the white boxes parts of the model code base. Arrows from
the coloured to the white boxes indicate the part of the code base that is
typically responsible for the configuration from which the arrow originates.
A particular type of configuration could exist in two different parts of the
code base: such a situation is indicated by multiple arrows terminating at
all the relevant white boxes. Note that not every climate model uses all
configuration options. The starred box on the sympl + climt side
indicates functionality that is not yet implemented.
Model arrays
Model arrays are contained within the DataArray abstraction provided
by xarray rather than the more low-level
numpy array. DataArrays are an abstraction over
numpy arrays with a more natural fit to climate data by providing
labelled dimensions and metadata storage capabilities.
sympl's DataArray object is a subclass of the
xarray DataArray that provides unit handling and conversion and
will be described subsequently. This exposes the powerful analysis
capabilities of xarray, allowing users to build an end-to-end
pipeline entirely within Python, from simulation to data analysis and
the generation of publication-ready figures.
Since low-level array operations using numpy and xarray are
fairly simple, especially changing coordinate ordering and C/Fortran memory
ordering, climt only provides guidelines for memory layout of
arrays. However, changing the ordering of dimensions in memory will incur the
performance penalty of copying the array if the model calls compiled code.
In the interest of the readability of component and model code, sympl
strongly encourages and climt makes use of descriptive names for model
quantities, adhering to the climate and forecast (CF) conventions when
possible.
http://cfconventions.org/Data/cf-standard-names/48/build/cf-standard-name-table.html
(last access: 20 August 2018).
Most pre-defined quantities in climt use
names derived from the CF conventions. One additional suffix that we found
necessary to use was on_interface_levels to distinguish between
quantities defined on the interfaces and mid-levels of the vertical grid. For
example air_temperature refers to the air temperature defined at the
vertical grid centre, whereas air_temperature_on_interface_levels
refers to the same quantity defined at the vertical grid edges. This
convention is only a requirement within the model state. Within an individual
component, shorter names can be used for variables representing quantities.
This shorter name is also contained in the property dictionary of the
component, serving as documentation for the meaning of that shorter variable
name.
Flow of data for each type of component in sympl. The four panels
are as follows. Panel (a) shows DiagnosticComponent, which creates
diagnostics (purple box) based on the current state. The current state is
updated with the resulting diagnostics, resulting in the updated current
state. Panel (b) shows Monitor, which can “store” the updated
current state into some format like NetCDF or plots. Panel (c) shows
Stepper, which determines a new state from the current state and a time
step and also diagnostics from the time of the current state which are used
to update the current state. The current state could then be passed on to
Monitor components (see panel b). Panel (d) shows
TendencyStepper, which is a special case of Stepper initialized with a
list of TendencyComponent objects (denoted by green boxes within a grey
box). It passes the current state on to those TendencyComponent objects to
compute tendencies and diagnostics, which are used to compute its outputs
(generally using a time stepping scheme). In all figures, dark green boxes
denote components, light green boxes denote optional wrappers, orange boxes
indicate the state dictionary at different times, and purple boxes indicate
tendencies and diagnostics generated by components. The converging arrows at
a summation symbol (plus sign inscribed within a circle) denote updating the
state dictionary (orange) to include output (purple) quantities. Examples of
wrapper placements are not exhaustive and are meant only as examples.
Modelling language
Python was used as the language to write the framework. Python as a language
and the Python ecosystem have a number of desirable features, all of which
were taken advantage of during the development of sympl and
climt:
Earlier versions of climt were written in Python, which gave the
authors an idea of the convenience and flexibility it afforded. In particular, the
object-oriented capabilities of Python provide a straightforward way to represent
the component-based architecture adopted by almost all climate modelling frameworks.
Scientific libraries within the Python ecosystem now offer acceptable
performance for computationally intensive operations typically used in climate models.
The Python ecosystem includes many libraries which can be useful in developing
climate models. Examples include machine learning, graphics, and web service libraries.
Python's ability to act as a glue language allows interfacing with the large
number of libraries for climate modelling already available in Fortran.
Tools available in the Python ecosystem like Jupyter, pytest, and sphinx enable writing reproducible workflows and code that is well
documented and tested.
sympl – design and programming interface
sympl conceives of a climate model as a state that is continuously
updated by various components. sympl's taxonomy consists of seven
kinds of components. Four of these component types are used to represent
physical processes and the remaining represent other functionality required
to build and run models.
TendencyComponent objects like RRTMGLongwave in
Fig. , which take the model state as input and return tendencies of quantities and values of
quantities defined at the time of the input state.
Stepper components like SimplePhysics in
Fig. , which take the model state and a timestep as input and return values of quantities defined at
a new time (after the timestep) and optionally diagnostic values of quantities defined at the
time of the input state. These are implicit as they define the target model state in
terms of the target model state (e.g. that the target state is not supersaturated).
DiagnosticComponent objects which take the model state as input and
return quantities defined at the time of the input state as output.
ImplicitTendencyComponent objects like EmanuelConvection in
Fig. , which return tendencies but require the
model timestep to produce these tendencies. This is required when the
tendencies are defined in terms of the target model state, as is often done
in convection schemes or flux limiters. These should generally be written as
Stepper components which can later be wrapped into
ImplicitTendencyComponent objects if needed (such as with SimplePhysics in Line 14 of
Fig. ).
TendencyStepper components like AdamsBashforth in
Fig. , which contain a set of TendencyComponent
objects and use the tendencies they output to integrate the model state forward in time.
Monitor components like NetCDFMonitor in
Fig. , which provide a store method which takes the model state as input and “stores”
it. The implementation of this method is left to the user and is currently used for
NetCDF output and plotting.
Wrapper components like TimeDifferencingWrapper in
Fig. , which contain other sympl components and modify the inputs passed to
or outputs generated by the “wrapped” component.
Schematics of how the above components interact with the model state are
presented in Fig. . A DiagnosticComponent
object (panel a) is very simple, producing diagnostic quantities that are
inserted or updated in the current model state. Monitor objects
(panel b) take in the model state and perform some action using it. The Stepper object (panel c), which steps the
model state forward in time, is slightly
more complicated. In addition to producing a new state, it can
produce diagnostic quantities which are inserted or updated in the current
model state. Panel (d) is the most complex, depicting how
TendencyComponent objects are used with a TendencyStepper
to update the model state. Once created, a TendencyStepper behaves
exactly like a Stepper object. Internally, it provides the input
state to the TendencyComponent objects to compute tendencies and
uses those tendencies according to its time stepping scheme to evolve the
model state forward in time. The TendencyStepper provides the same
model state to all TendencyComponent objects it contains and sums
the tendencies before stepping forward in time (see
Fig. d). Using other time marching algorithms such
as sequential tendency or sequential update splitting
will require users to implement their own
TendencyStepper object or to call several Stepper and
TendencyStepper components in sequence.
As mentioned previously, wrapper components contain a wrapped component
and modify the inputs or outputs, changing how the component appears to
behave. Currently, sympl has the following wrappers:
TimeDifferencingWrapper creates tendencies from the output of a Stepper component by first-order differencing. This creates an
ImplicitTendencyComponent from a Stepper component, which
is required when using spectral methods.
UpdateFrequencyWrapper calls a wrapped TendencyComponent
only after the user-specified time interval has elapsed and until then outputs
the previously returned value. In effect, this creates a piecewise constant output
tendency which can reduce the computational load during a simulation. This is
often used on radiation schemes.
ScalingWrapper scales the inputs passed into the wrapped component
and the outputs (new state, tendency, or diagnostic) returned by the component.
This taxonomy of components is larger than those typically used in modelling
frameworks. For example, ESMF only considers two kinds of components –
Gridded and Coupler components. However, as discussed previously, this
extended taxonomy is required to capture the diversity of components that
arise if models are written to be configurable and modular at the process
level.
Model state and the DataArray abstraction
The model state is a dictionary whose keys are the names of model quantities,
and values are sympl DataArray objects. The model state also contains a
required keyword time whose value is an object that implements the Python
datetime or timedelta interface. sympl provides an interface to use the
datetime objects from the cftime package to support several different
calendars,
https://github.com/Unidata/cftime (last access:
20 August 2018).
as well as dates not supported by the numpy datetime64 or
built-in datetime objects. A schematic of the model state is presented in
Fig. . sympl does not put hard restrictions on the
name of model quantities, though standardized names should be used to ensure
inter-package compatibility. All DataArray objects must define a string
attribute called “units”, which is used to convert the data contained
within to the appropriate units requested by a component. The units
conversion is performed internally using the Pint
library.
https://pint.readthedocs.io/en/latest/ (last access:
20 August 2018).
Since the actual contents of the state are dependent on
model details, sympl assumes that the initialization of the model state
will be done by a model package (such as climt) or by the user.
The model state as in the sympl framework. The state in the orange
box contains all the information that is stepped forward in time. Each
DataArray contains information such as the quantity name, units, and
dimensions/coordinates.
Physical constants
sympl maintains a unit-aware library of constants which can be accessed or
modified by model packages and by the user through get_constant() and
set_constant() functions. For example, planetary_rotation_rate can be
changed with a single function call. The unit handling is important to ensure
constants are given to components in the units they each require. For
example, the Rapid Radiative Transfer Model (RRTMG) radiative transfer code
requires physical constants in centimetre–gramme–second (CGS) units.
For the purposes of logging, physical constants are classified into various
categories:
planetary constants such as rotation rate and acceleration due to
gravity;
physical constants such as the speed of light;
atmospheric constants such as specific heat of dry air and reference air
pressure;
stellar constants such as stellar irradiance;
condensible constants which refer to the thermodynamic properties of the
condensible (in all three phases) in the atmosphere;
oceanographic constants such as the reference sea water density.
We chose to keep the constants related to the condensible component of the
atmosphere separate to ensure sympl is flexible enough to handle
general planetary atmospheres. sympl provides a function
set_condensible(), which allows switching all constants related to
the condensible. For example set_condensible(``methane'') will
replace all condensible constants (such as density of liquid, solid, or gaseous
phases and the latent heat of condensation) to those corresponding to methane,
provided such constants are already in the constants dictionary. The default
condensible is water, which is currently the only condensible compound for
which default values are given by sympl.
Modelling using sympl
A typical workflow when using a model written using sympl, as seen
in previous examples such as Fig. and
Fig. , might involve the following steps:
Initialize model components, providing configuration information.
Use Wrapper components to modify the behaviour of any components if necessary.
Initialize model state which contains all quantities required by the selected components.
Use TendencyStepper to collect all TendencyComponent components
into a component that can step the model state forward in time.
Begin the model main loop.
Call DiagnosticComponent to compute any derived quantities from prognostic
quantities or provide forcing quantities at a given time step.
Call Stepper components and get a new state dictionary with the updated model
quantities and any diagnostics. Update the initial model state with diagnostics.
Call TendencyStepper and get a new state dictionary with the updated model
quantities and any diagnostics. Update the initial model state with diagnostics.
Call any Monitor components to store the initial model state (e.g. store to
disk, display in real time, send over the network).
Increment model time and repeat the model main loop.
climt – design and programming interfaceModel state, quantity dimensions, and output dictionaries
For initialization, climt provides the functions
get_grid() and get_default_state(). get_grid()
creates quantities that define the grid such as latitude, longitude, and air
pressure. get_default_state() accepts a list of components and
optionally a state with grid quantities and creates a state dictionary which
satisfies the input requirements for those components. Default values of each
model quantity are defined centrally in climt. The default values
provided are scientifically meaningful and can be used without modification
for certain simulations.
Model composition
Currently, the creation of the model and running the simulation loop is done
by hand, which provides a better understanding of what the model is doing but
increases the verbosity of model code. In the near future, climt
will provide an additional class called Federation which automates
the process of creating a model from its components. Federation
would not require that the user know the difference between a
TendencyComponent and Stepper (for instance) and their
different call signatures or that TendencyComponents require a
TendencyStepper to step the model state forward in time. This makes
creating models, easy especially for those who are not familiar with climate
modelling. The tradeoff is that the run script will not explicitly describe
the sequence of the main loop because that information is hidden within the
Federation code, but this can be desirable for certain applications,
particularly in education. As mentioned before, this automation is possible
only because of the rich taxonomy of components sympl provides.
Features and software engineering
climt currently has the following components that can be used to
build models.
All components written in pure Python were written by
the authors for use in climt.
RRTMG longwave and shortwave radiative transfer : this is a Fortran component
accessed via a Cython wrapper. RRTMG is a state-of-the-art radiative transfer
code used in many climate models.
Grey gas radiation scheme: simulates radiative transfer in a grey gas. This component is
accompanied by another component that provides an optical depth distribution which mimics the effect of water vapour
. These components are written in pure
Python. This radiative scheme has been used in many idealized climate
dynamics simulations to isolate the thermodynamic effects of latent heat
release from the radiative effects of water vapour, which is a strong
greenhouse gas.
Insolation: this component is written in pure Python. It provides the
solar zenith angle based on the time available in the model state. This zenith
angle is used in radiative transfer codes. Currently, this component uses
approximations and orbital parameters which make it highly accurate for Earth
but inapplicable to other planets.
Emanuel convection scheme : this is a Fortran
component accessed via a Cython wrapper. It is a mass-flux-based convection scheme
which is based on the boundary layer quasi-equilibrium hypothesis .
Grid scale condensation: this is written in pure Python. It calculates the
water vapour and temperature fields in the atmosphere after condensing out excess
water vapour to keep the atmospheric column from becoming supersaturated.
Spectral dynamical core: this component is derived from the General Forecast
System (https://github.com/jswhit/gfs-dycore, last access: 20 August
2018). It is a Fortran module accessed via a Cython wrapper. It uses a
high-performance spherical harmonics library shtns
(https://bitbucket.org/nschaeff/shtns, last access: 20 August 2018). It
is parallelized using OpenMP and therefore is most
effective on shared memory systems. The dynamics are stepped using an
implicit–explicit total variation diminishing Runge–Kutta 3 time stepper.
The physics tendencies are stepped forward using a forward Euler scheme.
Simple Physics package for idealized simulations : this is a Fortran module accessed via a Cython wrapper. It provides initial conditions
which can be used for testing moist dynamical cores and also provides a simple
diffusive boundary layer suitable for idealized simulations.
Slab surface: this component is written in pure Python. It allows for a
prognostic surface temperature by calculating the surface energy budget. It is
flexible enough to represent land or ocean. It currently does not account for
localized heat fluxes.
Sea/land ice model: this allows for snow and ice layers and energy-balanced top and bottom surfaces. This component is written in pure Python.
It is flexible enough to represent ice/snow growth and melting. It is capable
of representing sea or land ice based on the surface type available in the model
state. It currently cannot handle fractional land surface types.
Held–Suarez forcing : this component is written in
pure Python. It provides an idealized set of model physics which can be used
for testing dry dynamical cores and idealized simulations.
Initial conditions from the dynamical core Model Intercomparison Project (MIP) (DCMIP)
(https://www.earthsystemcog.org/projects/dcmip/, last access: 20 August
2018): this is a Fortran module accessed via a Cython wrapper. It provides
initial conditions for a wide variety of tests which allow assessing the
conservation properties of dynamical cores.
This set of components allows building a hierarchy of models ranging from
single-column radiative–convective models to energy-balanced moist
atmospheric general circulation models. Because of the fine-grained
configurability of sympl/climt, the difference between the
number of lines of code required to build a single-column model and a moist
GCM is only around 10 lines of Python code (see scripts in the Supplement).
More importantly, most of the code is reusable when moving from a simpler to
a more complex model.
Both sympl and climt are open-source projects, licensed
under a permissive BSD license. Both packages are available on Mac, Linux, and
Windows platforms and can be directly installed from the Python Package
Index using one-line commands:
This eliminates the need to download source code from GitHub. The Python
Package Index projects are located at
https://pypi.python.org/pypi/sympl (last access: 20 August 2018) and
https://pypi.python.org/pypi/climt (last access: 20 August 2018)
respectively.
climt also provides binary releases on all supported platforms, eliminating
the need to have a compiler on the user's system. sympl is written in pure
Python and does not have any compiler requirements. Both packages are
regression tested using the online services TravisCI
(https://travis-ci.org/, last access: 20 August 2018) and AppVeyor
(https://www.appveyor.com/, last access: 20 August 2018). Both packages
also maintain regularly updated documentation at
http://sympl.readthedocs.io/en/latest/ (last access: 20 August 2018)
and http://climt.readthedocs.io/en/latest/ (last access: 20 August
2018).
The mean equilibrium profiles in the radiative–convective single-column model. The mean temperature is presented in (a), along with
the dry and moist adiabats which are orange and green respectively. The mean
heating profiles are presented in (b), which shows a stratosphere
in radiative equilibrium and a troposphere in radiative–convective
equilibrium.
The zonal mean equilibrium profiles in the idealized GCM runs with
no seasonal cycle. The plotted fields are, in clockwise order from the top
left, the zonal winds, air temperature, convective heating rate, and specific
humidity respectively. The y axis is in model levels, and the x axis is in
degrees.
Some benchmark simulations
The first simulation is that of an atmospheric column that is run to
equilibrium in the presence of radiation and convection. This model uses the
RRTMG longwave and shortwave components, the Emanuel convection scheme, the
Simple Physics component as its boundary layer scheme, and a slab ocean of
thickness 50 m. The model timestep is 5 min and the results presented in
Fig. are the mean between 15 000 and 20 000
timesteps. The air temperature transitions from a dry adiabat in the boundary
layer to a moist adiabat in the free atmosphere until the tropopause. The
diabatic heating balance changes from a balance between radiation and
convection in the troposphere of the model to a pure radiative equilibrium in
the stratosphere.
The second simulation is of an idealized aqua-planet GCM with fixed equinoctial
insolation. As mentioned before, the modular nature of our framework allows
the reuse of much of the runscript code from the above single-column model. It
consists of all the components used in the previous model along with a
dynamical core which is used as the time stepper. The model was run for 2 years and the results presented are the mean over the last 6 months. The
simulated climate of the model is as expected from such a configuration: the
zonal mean zonal winds show two strong westerly jets which penetrate to the
surface. The zonal mean temperature shows a distinct tropical cold point and
an increase in the temperature above the tropopause. The zonal mean
convective heating rate shows deep heating in the tropics and much shallower
heating in the subtropical areas dominated by the descent of air. This simulation
ran at a resolution of 128 longitudes, 62 latitudes (or T42 resolution), and
28 levels.
Conclusions and future avenues
sympl and climt represent a novel approach to climate
modelling which provides the user with fine-grained control over the
configuration of the model. sympl provides a rich set of entities
which describe all functionality typically expected of a climate model. This
set of entities (or classes) allows climt to be an easy to use
climate modelling toolkit by allowing decisions about model creation and
configuration to be made at a single location (the run script) and without
ambiguity. The modular nature of the packages allows for code reuse as one
traverses the hierarchy of models from single-column model to three-dimensional GCMs. We attempt to address concerns about plug-and-play type
architectures by ensuring the inputs and
outputs of each model are cleanly documented, which makes it clear whether
components are compatible or not. The use of Python allows for delegating
computationally intensive code to compiled languages while still providing an
intuitive and clean interface to the user. This choice also allows users
access to a large variety of libraries written in Python for purposes ranging
from machine learning to visualization .
The main focus in the near future would be to add more components, especially
a cloud microphysics scheme, to allow sympl/climt to simulate a more
realistic benchmark climate. Due to its flexibility, we believe our modelling
framework is well suited to the simulation of general planetary atmospheres
and for exoplanet modelling, and adding components relevant to these fields
will also be a priority. Another important component to add would be a
flexible grid interpolation component to allow interaction between components
based on different model grids. While care has been taken to ensure that
parallel computing is possible, we have yet to address the question of
distributed memory and computing. While building models in a simple
Message Passing Interface (MPI) scenario seems feasible in the near future, more sophisticated configurations
with components running in parallel will need some thought and design.
Nevertheless, sympl and climt represent an important step
towards creating flexible, usable, and readable models. We hope that they will
be a useful addition to the growing collection of Python-based tools
available to the climate science community.
sympl is available at
https://github.com/mcgibbon/sympl (last access: 20 August 2018). The
digital object identifier (DOI) for the version documented in this paper is
https://zenodo.org/record/1346405.
climt is available at https://github.com/CliMT/climt (last access:
20 August 2018). The digital object identifier (DOI) for the version
documented in this paper is https://zenodo.org/record/1400103.
The supplement related to this article is available online at: https://doi.org/10.5194/gmd-11-3781-2018-supplement.
All three authors were involved in the design of
sympl and climt. JMM and JM were involved in
writing code for the packages. RC was the main author of earlier versions of
climt. All three authors contributed to the writing of the paper.
The authors declare that they have no conflict of
interest.
Acknowledgements
Joy Merwin Monteiro and Rodrigo Caballero acknowledge a research grant from
the Swedish e-Science Research Center (SeRC). Jeremy McGibbon was funded by
DOE grant DE-SC0016433 as a contribution to the CMDV (CM)4 project and by
the Natural Sciences and Engineering Research Council of Canada (NSERC)
Postgraduate Scholarship Doctoral Program (PGS-D).The article processing charges for this
open-access publication were covered by Stockholm University.
Edited by: Carlos Sierra
Reviewed by: Spencer Hill and one anonymous referee
ReferencesAlpire, A.: Predicting Solar Radiation using a Deep Neural Network,
available at: http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-215715
(last access: 20 August 2018), 2017.Clough, S. A., Shephard, M. W., Mlawer, E. J., Delamere, J. S., Iacono,
M. J., Cady-Pereira, K., Boukabara, S., and Brown, P. D.: Atmospheric
radiative transfer modeling: a summary of the AER codes, J. Quant.
Spectrosc. Ra., 91, 233–244, 10.1016/j.jqsrt.2004.05.058, 2005.
Dagum, L. and Menon, R.: OpenMP: an industry standard API for
shared-memory
programming, IEEE Comput. Sci. Eng., 5, 46–55, 1998.DeLuca, C., Theurich, G., and Balaji, V.: The Earth System Modeling
Framework, in: Earth System Modelling, vol. 3, SpringerBriefs in
Earth System Sciences, Springer, Berlin, Heidelberg, 43–54,
10.1007/978-3-642-23360-9_6, 2012.Donahue, A. S. and Caldwell, P. M.: Impact of Physics Parameterization
Ordering in A Global Atmosphere Model, J. Adv. Model. Earth Sy.,
10, 481–499, 10.1002/2017MS001067, 2018.Emanuel, K. A. and Zivkovic-Rothman, M.: Development and Evaluation of a
Convection Scheme for Use in Climate Models, J. Atmos. Sci., 56,
1766–1782, 10.1175/1520-0469(1999)056<1766:DAEOAC>2.0.CO;2, 1999.Fraedrich, K., Jansen, H., Kirk, E., Luksch, U., and Lunkeit, F.: The
Planet Simulator: Towards a user friendly model, Meteorol. Z., 14,
299–304, 10.1127/0941-2948/2005/0043, 2005.Frierson, D. M. W., Held, I. M., and Zurita-Gotor, P.: A Gray-Radiation
Aquaplanet Moist GCM. Part I: Static Stability and Eddy
Scale, J. Atmos. Sci., 63, 2548–2566, 10.1175/JAS3753.1, 2006.Held, I. M.: The Gap between Simulation and Understanding in Climate
Modeling, B. Am. Meteorol. Soc., 86, 1609–1614,
10.1175/BAMS-86-11-1609, 2005.Held, I. M. and Suarez, M. J.: A Proposal for the Intercomparison of the
Dynamical Cores of Atmospheric General Circulation Models, B. Am.
Meteorol. Soc., 75, 1825–1830,
10.1175/1520-0477(1994)075<1825:APFTIO>2.0.CO;2, 1994.Hoyer, S. and Hamman, J. J.: xarray: N-D labeled Arrays and Datasets in
Python, J. Open Res. Softw., 5, 10, 10.5334/jors.148, 2017.Jeevanjee, N., Hassanzadeh, P., Hill, S., and Sheshadri, A.: A perspective on
climate model hierarchies, J. Adv. Model. Earth Sy., 9, 1760–1771,
10.1002/2017MS001038, 2017.McGibbon, J., Monteiro, J., Weber, N., erol512, and Chor, T.: mcgibbon/sympl:
v0.4.0 (Version v0.4.0), Zenodo, 10.5281/zenodo.1346405, 2018.
Monteiro, J., McGibbon, J., and erol512: CliMT/climt (Version v0.15.3),
Zenodo, 10.5281/zenodo.1400103, 2018.Peng, R. D.: Reproducible Research in Computational Science, Science,
334, 1226–1227, 10.1126/science.1213847, 2011.Randall, D. A.: A University Perspective on Global Climate Modeling, B. Am.
Meteorol. Soc., 77, 2685–2690,
10.1175/1520-0477(1996)077<2685:AUPOGC>2.0.CO;2, 1996.Raymond, D. J.: Regulation of Moist Convection over the West Pacific
Warm Pool, J. Atmos. Sci., 52, 3945–3959,
10.1175/1520-0469(1995)052<3945:ROMCOT>2.0.CO;2, 1995.Reed, K. A. and Jablonowski, C.: Idealized tropical cyclone simulations of
intermediate complexity: a test case for AGCMs, J. Adv. Model. Earth Sy.,
4, M04001, 10.1029/2011MS000099, 2012.Theurich, G., DeLuca, C., Campbell, T., Liu, F., Saint, K., Vertenstein, M.,
Chen, J., Oehmke, R., Doyle, J., Whitcomb, T., Wallcraft, A., Iredell, M.,
Black, T., Da Silva, A. M., Clune, T., Ferraro, R., Li, P., Kelley, M.,
Aleinov, I., Balaji, V., Zadeh, N., Jacob, R., Kirtman, B., Giraldo, F.,
McCarren, D., Sandgathe, S., Peckham, S., and Dunlap, R.: The Earth System
Prediction Suite: Toward a Coordinated U.S. Modeling Capability, B. Am.
Meteorol. Soc., 97, 1229–1247, 10.1175/BAMS-D-14-00164.1, 2015.Valcke, S., Redler, R., and Budich, R.: Earth System Modelling, vol. 3,
SpringerBriefs in Earth System Sciences, Springer Berlin Heidelberg,
10.1007/978-3-642-23360-9, 2012.Vallis, G. K., Colyer, G., Geen, R., Gerber, E., Jucker, M., Maher, P.,
Paterson, A., Pietschnig, M., Penn, J., and Thomson, S. I.: Isca, v1.0: a
framework for the global modelling of the atmospheres of Earth and other
planets at varying levels of complexity, Geosci. Model Dev., 11, 843–859,
10.5194/gmd-11-843-2018, 2018.