We introduce Devito, a new domain-specific language for implementing high-performance finite-difference partial differential equation solvers. The motivating application is exploration seismology, for which methods such as full-waveform inversion and reverse-time migration are used to invert terabytes of seismic data to create images of the Earth's subsurface. Even using modern supercomputers, it can take weeks to process a single seismic survey and create a useful subsurface image. The computational cost is dominated by the numerical solution of wave equations and their corresponding adjoints. Therefore, a great deal of effort is invested in aggressively optimizing the performance of these wave-equation propagators for different computer architectures. Additionally, the actual set of partial differential equations being solved and their numerical discretization is under constant innovation as increasingly realistic representations of the physics are developed, further ratcheting up the cost of practical solvers. By embedding a domain-specific language within Python and making heavy use of SymPy, a symbolic mathematics library, we make it possible to develop finite-difference simulators quickly using a syntax that strongly resembles the mathematics. The Devito compiler reads this code and applies a wide range of analyses to generate highly optimized and parallel code. This approach can reduce the development time of a verified and optimized solver from months to days.

Large-scale inversion problems in exploration seismology constitute some of the
most computationally demanding problems in industrial and academic research.
Developing computationally efficient solutions for applications such as seismic
inversion requires expertise ranging from theoretical and numerical methods
for optimization constrained by a partial differential equation (PDE) to the low-level
performance optimization of PDE solvers. Progress in this area is often
limited by the complexity and cost of developing bespoke wave propagators (and
their discrete adjoints) for each new inversion algorithm or formulation of wave
physics. Traditional software engineering approaches often lead developers to
make critical choices regarding the numerical discretization before manually
optimizing performance for a specific target architecture and making the code ready
for production. This workflow of bringing new algorithms into production, or
even to a stage at which they can be evaluated on realistic datasets, can take many
person-months or even person-years. Furthermore, it leads to mathematical
software that is not easily ported, maintained, or extended. In contrast, the
use of high-level abstractions and symbolic reasoning provided by
domain-specific languages (DSLs) can significantly reduce the time it takes to
implement and verify individual operators for use in inversion frameworks, as
has already been shown for the finite-element method

State-of-the-art seismic imaging is primarily based upon explicit finite-difference schemes due to their relative simplicity and ease of
implementation

creating a high-level mathematical abstraction for programming finite differences to enable composability and algorithmic optimization;

insofar as possible using existing compiler technologies to optimize the affine loop nests of the computation, which account for most of the computational cost; and

developing specific extensions for other parts of the computation that are non-affine (e.g., source and receiver terms).

The first of these aims is primarily accomplished by embedding the DSL in Python
and leveraging the symbolic mathematics package SymPy

The use of symbolic manipulation, code generation, and just-in-time compilation
allows for the definition of individual wave propagators for inversion in only a
few lines of Python code, while aspects such as varying the problem
discretization become as simple as changing a single parameter in the problem
specification, for example changing the order of the spatial discretization

The remainder of this paper is structured as follows: first, we provide a brief
history of optimizing compilers, DSLs, and existing wave-equation seismic
frameworks. Next, we highlight the core features of Devito and describe the
implementation of the featured wave-equation operators in
Sect.

Improving the runtime performance of a critical piece of code on a particular
computing platform is a nontrivial task that has received significant
attention throughout the history of computing. The desire to automate the
performance optimization process itself across a range of target architectures
is not new either, although it is often met with skepticism. Even the very
first compiler, A0

Dr. Hopper
believes

Community acceptance of these new “automatic coding systems” began when
concerns about the performance of the generated code were addressed by the
first “optimizing compiler”, Fortran, released in 1957 – which not only
translated code from one language to another but also ensured that the final
code performed at least as well as a handwritten low-level
code ^{®},
and R.

In addition to these relatively general mathematical languages, more
specialized frameworks targeting the automated solution of PDEs have long been
of interest

Moreover, several computational frameworks for seismic imaging exist, although
they provide varying degrees of abstraction and are typically limited to a
single representation of the wave equation. IWAVE

In general, the majority of the computational workload in wave-equation-based
seismic inversion algorithms arises from computing solutions to discrete wave
equations and their adjoints. There is a wide range of mathematical models
used in seismic imaging that approximate the physics to varying degrees of
fidelity. To improve development and innovation time, including code
verification, we describe the use of the symbolic finite-difference framework
Devito to create a set of explicit matrix-free operators of arbitrary spatial
discretization order. These operators are derived, for example, from the acoustic
wave equation

Overview of the Devito architecture and the associated example workflow. Devito's top-level API allows users to generate symbolic stencil expressions using data-carrying function objects that can be used for symbolic expressions via SymPy. From this high-level definition, an operator then generates, compiles, and executes high-performance C code.

Defining a Devito

Devito aims to combine the performance benefits of dedicated stencil
frameworks

A Devito

A complete description of the compilation pipeline is provided
in

Example code defining the two-dimensional wave equation
without damping using Devito symbols and symbolic processing
utilities from SymPy. Assuming

Example definition of a forward operator.

The primary user-facing API of Devito allows for the definition of complex stencil operators from a concise mathematical notation. For this purpose, Devito relies strongly on SymPy (Devito 3.1.0 depends upon SymPy 1.1, and all dependency versions are specified in Devito's requirements file). Devito provides two symbolic object types that mimic SymPy symbols, enabling the construction of stencil expressions in symbolic form.

In addition to

It is important to note here that

Example definition of an adjoint operator.

Example definition of a gradient operator.

The symbolic nature of the function objects allows for the automatic derivation of
discretized finite-difference expressions for derivatives. Devito

The discretization of the spatial derivatives can be defined for any order. In
the most general case, we can write the spatial discretization in the

Definition of FWI gradient update.

FWI algorithm with line search.
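The structure of a gradient-based inversion loop with a line search can be sketched on a toy problem. The example below is not the FWI listing from this paper and does not use Devito: a small dense matrix `A` stands in for the wave-equation modeling operator, and all function names (`misfit`, `backtracking_step`, `invert`) are hypothetical. It only illustrates the pattern of computing a residual, applying the adjoint to obtain a gradient, and shrinking the step length until the misfit decreases sufficiently.

```python
# Toy gradient descent with a backtracking (Armijo) line search.
# The linear operator A is an assumed stand-in for a modeling operator;
# none of these names are part of Devito.

def matvec(A, x):
    """Return A @ x for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def rmatvec(A, y):
    """Return A^T @ y, i.e. the adjoint applied to a data-space vector."""
    return [sum(A[i][j] * y[i] for i in range(len(A)))
            for j in range(len(A[0]))]


def misfit(A, m, d):
    """Least-squares misfit 0.5 * ||A m - d||^2 and the data residual."""
    r = [p - q for p, q in zip(matvec(A, m), d)]
    return 0.5 * sum(v * v for v in r), r


def backtracking_step(A, m, d, g, phi, alpha0=1.0, shrink=0.5, c1=0.25,
                      max_backtracks=30):
    """Shrink the step until the sufficient-decrease condition holds."""
    gnorm2 = sum(v * v for v in g)
    alpha = alpha0
    for _ in range(max_backtracks):
        trial = [mi - alpha * gi for mi, gi in zip(m, g)]
        phi_trial, _ = misfit(A, trial, d)
        if phi_trial <= phi - c1 * alpha * gnorm2:
            return trial
        alpha *= shrink
    return None  # no acceptable step length was found


def invert(A, d, m0, n_iter=200, tol=1e-20):
    """Steepest descent: residual -> adjoint -> gradient -> line search."""
    m = list(m0)
    for _ in range(n_iter):
        phi, r = misfit(A, m, d)
        g = rmatvec(A, r)  # gradient of the misfit: A^T (A m - d)
        if sum(v * v for v in g) < tol:
            break
        trial = backtracking_step(A, m, d, g, phi)
        if trial is None:
            break
        m = trial
    return m


A = [[2.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
m_true = [1.0, -2.0]
d_obs = matvec(A, m_true)          # noise-free synthetic "observed" data
m_est = invert(A, d_obs, [0.0, 0.0])
print(m_est)
```

In an actual FWI setting, the two matrix products would be replaced by a forward propagator and an adjoint-state gradient computation, while the outer loop and the line search keep the same shape.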

We consider here a second-order time discretization for the
acoustic wave equation, as higher-order time discretization requires us to rewrite the PDE

In this expression,

Following the convention used for spatial derivatives, the above expression can
be automatically generated using the shorthand expression

The iteration over time to obtain the full solution is then generated by the
Devito compiler from the time dimension information. Solving the wave equation
with the above explicit Euler scheme is equivalent to a linear system

Numerical wave field for a constant velocity

The field-recorded data are measured on a wave field that propagates in an
infinite domain. However, solving the wave equation in a discrete infinite
domain is not feasible with finite differences. In order to mimic an infinite
domain, absorbing boundary conditions (ABCs) or perfectly matched layers (PMLs)
are necessary

The least computationally expensive method is the absorbing boundary condition that adds a single damping mask in a finite layer around the physical domain. This absorbing condition can be included in the wave equation as

The

Seismic inversion relies on data-fitting algorithms, and hence we need to
support sparse operations such as source injection and wave field (

Time discretization convergence analysis for a fixed grid, fixed propagation time (150 ms), and varying time step values. The result is plotted in a logarithmic scale and the numerical convergence rate (1.94 slope) shows that the numerical solution is accurate.
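The way such a convergence rate is measured can be illustrated on a much simpler problem. The sketch below is an assumption-laden stand-in, not the experiment from this paper: it applies the same second-order centered time stencil to a scalar oscillator u'' = -ω²u, whose exact solution is cos(ωt), and fits the slope of the error against the time step on a log-log scale. The frequency (12 Hz) and the helper names are chosen for illustration only; the propagation time of 150 ms mirrors the caption above.

```python
import math


def leapfrog_error(omega, t_end, n_steps):
    """Integrate u'' = -omega^2 u with a second-order centered scheme.

    Returns the time step and the absolute error at t_end against the
    analytical solution cos(omega * t).
    """
    dt = t_end / n_steps
    u_prev = 1.0                  # u(0)
    u = math.cos(omega * dt)      # exact value at the first time step
    for _ in range(n_steps - 1):
        # (u_next - 2u + u_prev) / dt^2 = -omega^2 u
        u_next = 2.0 * u - u_prev - (omega * dt) ** 2 * u
        u_prev, u = u, u_next
    return dt, abs(u - math.cos(omega * t_end))


def loglog_slope(xs, ys):
    """Least-squares slope of log(ys) against log(xs)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den


omega = 2.0 * math.pi * 12.0      # assumed 12 Hz oscillation
t_end = 0.15                      # 150 ms, as in the caption
results = [leapfrog_error(omega, t_end, n) for n in (150, 300, 600, 1200)]
rate = loglog_slope([r[0] for r in results], [r[1] for r in results])
print("observed temporal convergence rate:", rate)
```

A fitted slope close to 2 indicates that the scheme behaves as a second-order method in time for this toy problem.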

Comparison of the numerical convergence rate of the spatial finite-difference scheme with the theoretical convergence rate from the Taylor theory. The theoretical rates are the dotted line with the corresponding colors. The result is plotted in a logarithmic scale to highlight the convergence orders as linear slopes and the numerical convergence rates show that the numerical solution is accurate.
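The link between stencil length and theoretical convergence rate can be reproduced independently of Devito. The following sketch, written under the assumption of a centered stencil on a uniform grid, derives second-derivative weights of order 2p by solving the Taylor moment conditions in exact rational arithmetic and then measures the observed rate on sin(x). The function names are hypothetical and this is not how Devito or SymPy compute their coefficients internally.

```python
import math
from fractions import Fraction


def solve_exact(A, b):
    """Gauss-Jordan elimination on Fractions; returns x with A x = b."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        pv = M[col][col]
        M[col] = [v / pv for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]


def central_weights(p):
    """Weights w_k, k = -p..p, of a centered second-derivative stencil.

    Taylor expansion requires sum_k w_k k^j = j! * delta_{j,2}
    for j = 0..2p, which gives formal accuracy of order 2p.
    """
    offsets = list(range(-p, p + 1))
    A = [[Fraction(k) ** j for k in offsets] for j in range(2 * p + 1)]
    b = [Fraction(2) if j == 2 else Fraction(0) for j in range(2 * p + 1)]
    return solve_exact(A, b)


def second_derivative(f, x, h, weights):
    """Apply the stencil to f at x with grid spacing h."""
    p = len(weights) // 2
    total = sum(float(w) * f(x + k * h)
                for w, k in zip(weights, range(-p, p + 1)))
    return total / (h * h)


def observed_rate(p, x0=1.0, h=0.1):
    """Estimate the convergence rate from the errors at h and h/2."""
    w = central_weights(p)
    exact = -math.sin(x0)
    e_coarse = abs(second_derivative(math.sin, x0, h, w) - exact)
    e_fine = abs(second_derivative(math.sin, x0, h / 2.0, w) - exact)
    return math.log(e_coarse / e_fine, 2)


print(central_weights(1), observed_rate(1))
print(central_weights(2), observed_rate(2))
```

Halving the grid spacing should reduce the error by roughly a factor of 2^(2p), which is the slope that the dotted theoretical lines in such a plot represent.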

Gradient test for the acoustic propagator. The first-order (blue) and second-order (red) errors are displayed in logarithmic scales to highlight the slopes. The numerical convergence orders (1.06 and 2.01) show that we have a correct implementation of the FWI operators.
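The logic of such a gradient test can be shown on a small analytic objective. In the sketch below, which is an illustrative assumption rather than the acoustic FWI objective, the zeroth-order Taylor error |Φ(m + h dm) − Φ(m)| should decay as O(h), and the first-order error |Φ(m + h dm) − Φ(m) − h⟨∇Φ(m), dm⟩| should decay as O(h²) if the gradient is correct. The objective, the model values, and the helper names are hypothetical.

```python
import math


def objective(m, d):
    """Toy nonlinear least-squares objective 0.5 * sum (m_i^2 - d_i)^2."""
    return 0.5 * sum((mi * mi - di) ** 2 for mi, di in zip(m, d))


def gradient(m, d):
    """Analytic gradient of the toy objective."""
    return [2.0 * mi * (mi * mi - di) for mi, di in zip(m, d)]


def taylor_errors(m, d, dm, steps):
    """Zeroth- and first-order Taylor remainders for each step size h."""
    phi0 = objective(m, d)
    directional = sum(g * v for g, v in zip(gradient(m, d), dm))
    e0, e1 = [], []
    for h in steps:
        phi_h = objective([mi + h * vi for mi, vi in zip(m, dm)], d)
        e0.append(abs(phi_h - phi0))
        e1.append(abs(phi_h - phi0 - h * directional))
    return e0, e1


def loglog_slope(xs, ys):
    """Least-squares slope of log(ys) against log(xs)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den


m = [1.0, 2.0, 3.0]
d = [0.5, 1.0, 2.0]
dm = [1.0, -1.0, 0.5]                      # arbitrary perturbation direction
steps = [2.0 ** -k for k in range(6, 13)]
e0, e1 = taylor_errors(m, d, dm, steps)
rate0 = loglog_slope(steps, e0)
rate1 = loglog_slope(steps, e1)
print("first-order slope:", rate0, "second-order slope:", rate1)
```

An incorrect gradient would leave the second curve with a slope near 1 instead of 2, which is what makes this test a sensitive check of an adjoint-state implementation.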

The necessary expressions to perform interpolation and injection are
automatically generated through a dedicated symbol type,

FWI on the acoustic Marmousi-ii model. Panel

Seismic inversion methods aim to reconstruct physical parameters or an image of the Earth's subsurface from multi-experiment field measurements. For this purpose, a wave is generated at the ocean surface that propagates through to the subsurface and creates reflections at the discontinuities of the medium. The reflected and transmitted waves are then captured by a set of hydrophones that can be classified as either moving receivers (cables dragged behind a source vessel) or static receivers (ocean bottom nodes or cables). From the acquired data, physical properties of the subsurface such as wave speed or density can be reconstructed by minimizing the misfit between the recorded measurements and the numerically modeled seismic data.

Recovering the wave speed of the subsurface from surface seismic measurements
is commonly cast into a nonlinear optimization problem called full-waveform
inversion (FWI). The method aims at recovering an accurate model of the
discrete wave velocity,

To solve this optimization problem with a gradient-based method, we use the
adjoint-state method to evaluate the gradient

The discretized adjoint system in Eq. (

We consider the acoustic isotropic wave equation parameterized in terms of
slowness

Convection equation in Devito. In this example, the initial Dirichlet
boundary conditions are set to

The main (PDE) stencil expression to update the state of the wave field is
derived from the high-level wave-equation expression

A more detailed explanation of the seismic setup and parameters such as the source and receiver terms in Fig.

To create the adjoint that pairs with the above forward modeling propagator we
can make use of the fact that the isotropic acoustic wave equation is
self-adjoint. This means that for the implementation of the forward wave
equation

Based on the definition of the adjoint operator, we can now define a similar
operator to update the gradient according to Eq. (

To compute the gradient, the forward wave field at each time step must be
available, which leads to significant memory requirements. Many methods exist to
tackle this memory issue, but all come with their advantages and disadvantages.
For instance, we implemented optimal
checkpointing with the library Revolve

Initial

At this point, we have a forward propagator to model synthetic data in
Fig.

This FWI function in Fig.

Given the operators defined in Sect.

Adjoint test for different discretization orders in 2-D, computed on a two-layer model in double precision. The highlighted digits indicate the decimal place at which the two values first differ.

Adjoint test for different discretization orders in 3-D, computed on a two-layer model in double precision. The highlighted digits indicate the decimal place at which the two values first differ.
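The adjoint (dot) test reported in these tables checks that ⟨F x, y⟩ and ⟨x, Fᵀ y⟩ agree to machine precision for random vectors x and y. A minimal sketch of the same check is given below for a one-dimensional linear interpolation operator and its adjoint, the injection operator. This pair is chosen only because it is small enough to read in full; the function names and the grid size are assumptions, and the code is independent of Devito.

```python
import math
import random


def interpolate(u, coords):
    """Sample the grid function u at off-grid positions (forward operator)."""
    out = []
    for c in coords:
        i = int(math.floor(c))
        w = c - i
        out.append((1.0 - w) * u[i] + w * u[i + 1])
    return out


def inject(d, coords, n):
    """Spread point values onto the grid (adjoint of interpolate)."""
    u = [0.0] * n
    for val, c in zip(d, coords):
        i = int(math.floor(c))
        w = c - i
        u[i] += (1.0 - w) * val
        u[i + 1] += w * val
    return u


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def dot_test(n=50, n_rec=7, seed=0):
    """Return <F x, y> and <x, F^T y> for random x and y."""
    rng = random.Random(seed)
    # Keep positions strictly inside the grid so that i + 1 is a valid index.
    coords = [rng.uniform(0.0, n - 1.001) for _ in range(n_rec)]
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [rng.gauss(0.0, 1.0) for _ in range(n_rec)]
    lhs = dot(interpolate(x, coords), y)
    rhs = dot(x, inject(y, coords, n))
    return lhs, rhs


lhs, rhs = dot_test()
print(lhs, rhs, abs(lhs - rhs))
```

The two inner products should match up to rounding error; a mismatch in the leading digits would indicate that the second operator is not the exact discrete adjoint of the first.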

The numerical accuracies of the forward modeling operator
(Fig.

Measuring the accuracy of a numerical solution relies on two hypotheses, both of which we satisfy for these two tests:

the domain is large enough and the propagation time small enough to ignore boundary-related effects, i.e., the wave field never reaches the limits of the domain; and

the source is located on the grid and is a discrete approximation of
the Dirac to avoid spatial interpolation errors. This hypothesis
guarantees the existence of the analytical and numerical solution for
any spatial discretization

Burgers' equations in Devito. In this example, we explicitly use
the FD function

We analyze the numerical solution against the analytical solution and verify that the
error between these two decreases at a second-order rate as a function
of the time step size

The analytical solution is defined as

The spatial discretization analysis follows the same method as the temporal discretization analysis.
We model a wave field for a fixed temporal setup with a small enough time step to ensure negligible
time discretization error (

The numerical slopes obtained and displayed in Fig.

Initial

We concentrate now on two tests, namely the adjoint test (or dot test) and
the gradient test. The adjoint-state gradient of the objective function
defined in Eq. (

The adjoint test is also individually performed on the source–receiver
injection–interpolation operators in the Devito test suite. The results,
summarized in Tables

With the forward and adjoint propagators tested, we finally verify that the
Devito operator that implements the gradient of the FWI objective function
(Eq.

In Fig.

We show a simple example of FWI in Eq. (

Poisson equation in Devito with field swap in Python.

Poisson equation in Devito with buffered dimension for automatic swap at each iteration.
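The field-swap pattern mentioned in these two captions can be illustrated without Devito. The sketch below is a plain-Python Jacobi iteration for the two-dimensional Poisson equation with Dirichlet boundaries, using two buffers that exchange roles after every sweep. It is an assumed minimal analogue, not the Devito listing: the grid size, the manufactured solution x² + y², and the function name `jacobi_poisson` are chosen for illustration.

```python
def jacobi_poisson(b, boundary, nx, ny, dx, dy, n_iter):
    """Jacobi iteration for p_xx + p_yy = b with Dirichlet boundaries.

    boundary(i, j) returns the prescribed value at boundary node (i, j).
    Two buffers are used and swapped after every sweep.
    """
    def is_edge(i, j):
        return i in (0, nx - 1) or j in (0, ny - 1)

    p = [[boundary(i, j) if is_edge(i, j) else 0.0 for j in range(ny)]
         for i in range(nx)]
    p_new = [row[:] for row in p]   # second buffer with the same boundary
    denom = 2.0 * (dx * dx + dy * dy)
    for _ in range(n_iter):
        for i in range(1, nx - 1):
            for j in range(1, ny - 1):
                p_new[i][j] = ((p[i + 1][j] + p[i - 1][j]) * dy * dy
                               + (p[i][j + 1] + p[i][j - 1]) * dx * dx
                               - b[i][j] * dx * dx * dy * dy) / denom
        p, p_new = p_new, p         # swap buffers instead of copying
    return p


nx = ny = 11
dx = dy = 0.1


def exact(i, j):
    """Manufactured solution p = x^2 + y^2, for which p_xx + p_yy = 4."""
    return (i * dx) ** 2 + (j * dy) ** 2


rhs = [[4.0] * ny for _ in range(nx)]
p = jacobi_poisson(rhs, exact, nx, ny, dx, dy, n_iter=1000)
max_err = max(abs(p[i][j] - exact(i, j))
              for i in range(nx) for j in range(ny))
print("max error:", max_err)
```

Because the five-point stencil is exact for quadratic functions, the converged iterate should match the manufactured solution to near rounding error, which makes this a convenient self-check for the swap logic.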

This result highlights two main contributions of Devito. First, we provide PDE
simulation tools that allow for the easy and efficient implementation of an inversion
operator for seismic problems and potentially any PDE-constrained optimization
problems. As described in Sects.

Right-hand side

Finally, we describe three classical computational fluid dynamics examples to
highlight the flexibility of Devito for another application domain.
Additional CFD examples can be found in the Devito code repository in the
form of a set of Jupyter notebooks. The three examples we describe here are
the convection equation, the Burgers' equation, and the Poisson equation. These
examples are adapted from

The convection governing equation for a field

The same way we previously described it for the wave equation,

The solution of the convection equation is displayed in
Fig.

In this second example, we show the solution of the Burgers' equation. This example demonstrates that Devito supports coupled systems of equations and nonlinear equations easily. The Burgers' equation in two dimensions is defined as the following coupled PDE system:

where

We show the initial state and the solution at the last time step of the
Burgers' equation in Fig.

Different spatial discretization order accuracies against runtime
for a fixed physical setup (model size in

We finally show the implementation of a solver for the Poisson equation in
Devito. While the Poisson equation is not time dependent, the solution is
obtained with an iterative solver and the simplest one can easily be implemented
with finite differences. The Poisson equation for a field

The solution of the Poisson equation is displayed in
Fig.

These examples demonstrate the flexibility of Devito and show that a broad range of PDEs can easily be implemented with Devito, including a nonlinear equation, a coupled PDE system, and steady-state problems.
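For readers who want a reference point that does not depend on Devito, the first of these examples can be approximated in a few lines of plain Python. The sketch below assumes a two-dimensional linear convection equation ∂u/∂t + c ∂u/∂x + c ∂u/∂y = 0 with a first-order upwind stencil and a square "hat" initial condition; the grid size, the Courant number of 0.2, and the function name `convect` are illustrative assumptions and are not taken from the Devito notebooks.

```python
def convect(u0, c, dt, dx, dy, n_steps):
    """Explicit first-order upwind scheme for u_t + c u_x + c u_y = 0.

    The first row and column act as inflow boundaries and are held fixed.
    """
    u = [row[:] for row in u0]
    nx, ny = len(u), len(u[0])
    sx, sy = c * dt / dx, c * dt / dy
    for _ in range(n_steps):
        un = [row[:] for row in u]   # previous time level
        for i in range(1, nx):
            for j in range(1, ny):
                u[i][j] = (un[i][j]
                           - sx * (un[i][j] - un[i - 1][j])
                           - sy * (un[i][j] - un[i][j - 1]))
    return u


nx = ny = 41
dx = dy = 2.0 / (nx - 1)
c = 1.0
dt = 0.2 * dx                        # Courant number 0.2 in each direction

# Square pulse of height 1 on a zero background.
u_init = [[0.0] * ny for _ in range(nx)]
for i in range(10, 21):
    for j in range(10, 21):
        u_init[i][j] = 1.0

u_final = convect(u_init, c, dt, dx, dy, n_steps=15)
mass_init = sum(map(sum, u_init))
mass_final = sum(map(sum, u_final))
print(mass_init, mass_final)
```

As long as the pulse stays away from the outflow edges, this scheme conserves the sum of the field and keeps all values between the initial minimum and maximum, which gives two inexpensive sanity checks for any reimplementation.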

In this section we demonstrate the performance of Devito from a
numerical and inversion point of view, as well as the absolute performance
from a hardware point of view. This section only provides a brief overview of
Devito's performance, and a more detailed description of the compiler and its
performance is covered in

Devito's automatic code generation lets users define the spatial and temporal
order of FD stencils symbolically and without having to re-implement long
stencils by hand. This allows users to experiment with trade-offs between
discretization errors and runtime, as higher-order FD stencils provide more
accurate solutions that come at increased runtime. For our error–cost
analysis, we compare absolute error in

The results in Fig.

Roofline plots for a

Roofline plots for a

Roofline plots for a

We present performance results of our solver using the roofline model, as
previously discussed in

We show three different roofline plots, one plot for each domain size
attempted, in Figs.

We observe that the time to solution increases nearly linearly with the size of
the domain. For example, for a 16th-order discretization, we have a

A key motivation for developing an embedded DSL such
as Devito is to enable quicker development, simpler maintenance, and better
portability and performance of solvers. The other benefit of this approach is
that HPC developer effort can be focused on developing the compiler
technology that is reapplied to a wide range of problems. This software reuse
is fundamental to keeping pace with technological evolution. For example, one
of the current projects in Devito regards the integration of YASK

We have introduced a DSL for time-domain simulation for inversion and its application to a seismic inverse problem based on the finite-difference method. Using the Devito DSL, a highly optimized and parallel finite-difference solver can be implemented within just a few lines of Python code. Although the current application focuses on features required for seismic imaging applications, Devito can already be used in problems based on other equations; a series of CFD examples is included in the code repository.

The code traditionally used to solve such problems is highly complex. The primary reason for this is that the complexity introduced by the mathematics is interleaved with the complexity introduced by the performance engineering of the code to make it useful for practical purposes. By introducing a separation of concerns, these aspects are decoupled and simplified. Devito successfully achieves this decoupling while delivering good computational performance and maintaining generality, both of which shall continue to be improved in future versions.

The asset

git clone -b v3.1.0 https://github.com/opesci/devito

cd devito

conda env create -f environment.yml

source activate devito

pip install -e .

MAL, MIL, NK, and FL designed and implemented the symbolic interface and SymPy extension in Devito.

MIL, FL, MAL, NK, and PV implemented the Devito compiler and the code generation framework.

NK and MAL implemented the checkpointing for Devito.

FJH and GJG were the PIs for the project and provided design and application input so that Devito would be usable and scalable.

PAW, MAL, and MIL developed and implemented the examples presented in this paper.

The authors declare that they have no conflict of interest.

The development of Devito was primarily supported through the Imperial College London Intel Parallel Computing Centre. We would also like to acknowledge support from the SINBAD II project and support from the member organizations of the SINBAD Consortium as well as EPSRC grants EP/M011054/1 and EP/L000407/1.

Edited by: Simon Unterstrasser

Reviewed by: Jørgen Dokken and one anonymous referee