The open-source modeling framework MAgPIE (Model of Agricultural Production and its Impact on the Environment) combines economic and biophysical approaches to simulate spatially explicit global scenarios of land use within the 21st century and the respective interactions with the environment. Besides various other projects, it was used to simulate marker scenarios of the Shared Socioeconomic Pathways (SSPs) and contributed substantially to multiple IPCC assessments. However, with growing scope and detail, the non-linear model has become increasingly complex, computationally intensive and non-transparent, requiring structured approaches to improve the development and evaluation of the model.
Here, we provide an overview of version 4 of MAgPIE and how it addresses these issues of increasing complexity using new technical features: modular structure with exchangeable module implementations, flexible spatial resolution, in-code documentation, automated code checking, model/output evaluation and open accessibility. Application examples provide insights into model evaluation, modular flexibility and region-specific analysis approaches. While this paper is focused on the general framework as such, the publication is accompanied by a detailed model documentation describing contents and equations, and by model evaluation documents giving insights into model performance for a broad range of variables.
With the open-source release of the MAgPIE 4 framework, we hope to contribute to more transparent, reproducible and collaborative research in the field. Due to its modularity and spatial flexibility, it should provide a basis for a broad range of land-related research with economic or biophysical, global or regional focus.
Global land use is expected to undergo major changes over the
coming decades caused by population growth, climate change, climate change
mitigation and various other socioeconomic changes. Climate change has
already had significant impacts on crop yields
In light of these challenges, methodological tools that quantify and analyze
such effects and inform decision makers are required. To this end, models
such as GCAM
This paper presents the MAgPIE 4 (Model of Agricultural Production and its Impact on the Environment 4) modeling framework, which has been built to cope with the aforementioned challenges of complexity, manageability and transparency. The framework addresses these challenges via two conceptual foundations: modularity and flexibility in the level of detail.
Modularity denotes the concept of building a model as a network of separate modules reflecting its different components, instead of handling the model as a whole. A module can have different realizations, each of which gives a different representation of the subsystem it models. Building the model as a network of modules eases the understanding of the model as well as the modification of components of it.
Flexibility in the level of detail means adjusting the temporal and spatial resolution. It also means that module realizations can be chosen based on the research question, thereby adjusting the model complexity appropriately.
The flexibility and the modular concept enable a tailor-made setup of simulations consistent with the spatial, temporal and contextual scope of the analysis. Together, they allow complexity to be reduced where it is not needed and simulation detail to be increased where it makes a difference. The resulting indefiniteness in model specification is reflected by a shift in terminology from model (MAgPIE before version 4) to framework (MAgPIE 4 and beyond), reflecting that very different models of the land-use sector can be built with the same framework.
In the subsequent sections, we present the concept of the modeling framework of MAgPIE 4, starting with a brief description of the model history, the new features in version 4 and a short overview of the modules in version 4. This is followed by a methodological section about the modeling framework explaining its technical properties such as modularity and spatial flexibility. The main text is completed by an output section – showing specific use cases of the modular structure and spatial flexibility provided by the framework – as well as a discussion and conclusion section. Supplementary material provides model code, model documentation and extended evaluation information to better embed the presented work.
MAgPIE was first introduced in
Version 2 of the model was the first step towards spatial flexibility. The
spatial
In terms of content, version 2 introduced endogenous yield increases through
investments into research and development
Structurally, the next evolution came with version 3, which introduced the
concept of modules. This made it possible to split the code into thematic
components and to have different realizations of the same component. Content-related
extensions in version 3 were the introduction of afforestation as a climate
mitigation measure that is endogenously calculated and incentivized by a tax
on GHG emissions
Linked to the global gridded crop model LPJmL
While the modularization concept was introduced with version 3, the code was
only partly modularized and a full modularization was only achieved with
version 4 of the model. In addition to the modularization, version 4
increases spatial flexibility by introducing the concept of flexible regions.
In addition to the flexible number of clusters within a world region, it
allows the user to freely choose the number and shape of world regions to be
simulated in the model. While all previous model versions were limited to the
regional aggregation introduced in version 1, it is now possible to choose a
regional aggregation, with the country level
Content-wise, MAgPIE 4 includes a new food-demand module, which couples MAgPIE 4 iteratively with a stand-alone food-demand model. The module estimates the distribution of body mass index, height and food intake by age group, sex and country. Moreover, it estimates food waste and a more detailed dietary composition. For a given level of income, changes in food prices affect food demand through their effects on purchasing power. Furthermore, version 4 includes a more detailed representation of food processing.
Finally, version 4 is the first open-source version of MAgPIE
The MAgPIE 4 framework consists of 38 modules which are listed and briefly
described below (by name and order as they appear in the code). A detailed
description of each module and their realizations is part of the model
documentation
Figure
MAgPIE 4 framework with
simplified modular structure and module interactions. See the model
documentation
The framework consists of two layers. An outer layer written in R
The outer layer makes sure that model simulations can run in parallel and are portable and easily reproducible. Collections of runs can be written as R scripts with consecutive run execution statements. In each run execution, a run composition process will apply the provided model configuration, create a run output folder and copy all relevant files to that folder.
The inner layer written in GAMS
Modularizing a model means separating the modeled system into multiple subsystems that exchange information only through clearly defined interfaces. Modularization helps to better comprehend the complex model and makes it easier to exchange or debug its components. Rather than having to think of the model as a single entity, it allows for separate conceptualizations of inter- and intra-module interactions.
The purpose and interface of each module is defined via a module contract. Model developers can expect that the module behaves according to the contract and design their implementations correspondingly. Developers of a module can design a realization with the contract solely as a guideline, ignoring the rest of the model. Modularization disentangles model development and offers a safe method for model modifications under limited knowledge of the complete model.
Modularization allows for different representations of the same module, which we call realizations. For each model run, the model configuration defines which realization is activated for each module. Different realizations can vary in their representation of processes, assumptions or level of detail but not in their interfaces and general purpose defined in the module contract. Modularization therefore has the benefit of allowing module comparisons. Different representations of a subsystem can be compared under ceteris paribus conditions for the rest of the model. This is a strong add-on to the current practice of model-comparison studies between different IAMs, where differences in subsystem dynamics cannot be isolated due to differences in the overarching frameworks.
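The relation between a module contract and its realizations can be illustrated with a minimal Python sketch. It is not MAgPIE code (the inner layer is written in GAMS). The contract fields follow the description in the text, whereas the interface names `vm_intensity` and `vm_yield`, both realizations and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contract:
    """Module contract: purpose plus the interface every realization must honor."""
    task: str
    inputs: frozenset
    outputs: frozenset


# Hypothetical contract of a yield module (all names are illustrative).
YIELD_CONTRACT = Contract(
    task="provide crop yields",
    inputs=frozenset({"vm_intensity"}),
    outputs=frozenset({"vm_yield"}),
)


def static_yields(inputs):
    """Simplest realization: yields ignore land-use intensity."""
    return {"vm_yield": 3.0}


def dynamic_yields(inputs):
    """More detailed realization: yields scale with land-use intensity."""
    return {"vm_yield": 3.0 * inputs["vm_intensity"]}


REALIZATIONS = {"static": static_yields, "dynamic": dynamic_yields}


def run_module(contract, realizations, choice, inputs):
    """Run the configured realization and enforce the contract on both sides."""
    missing = contract.inputs - set(inputs)
    if missing:
        raise KeyError(f"missing inputs: {sorted(missing)}")
    result = realizations[choice](inputs)
    if set(result) != set(contract.outputs):
        raise ValueError("realization violates the promised outputs")
    return result


# Ceteris paribus comparison: identical inputs, only the realization differs.
shared_inputs = {"vm_intensity": 2.0}
for choice in REALIZATIONS:
    print(choice, run_module(YIELD_CONTRACT, REALIZATIONS, choice, shared_inputs))
```

Because both realizations are called with the same inputs and checked against the same outputs, any difference in results can be attributed to the realization itself.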
A module in MAgPIE is represented as a folder with realizations of the same
module as subfolders. Each subfolder contains code and data required for
its execution. Important for a modular structure is the existence of local
environments. GAMS contains a single, global environment that allows each
variable or parameter to be accessed from anywhere in the code. To emulate
local environments, a dedicated naming convention distinguishing local from
global objects through a given prefix is employed. Code violations are
avoided via support functions
MAgPIE 4 is designed and modularized in a way that modules of the model can be excluded completely or single modules can run in stand-alone mode. This might be the case for testing a specific module under perfect control of the incoming variables and parameters, or it might be an application for which only certain components play a role. This reduced specification can then be used to develop a module in a toy model environment before it is used in the full model, saving time and resources during development.
Technically, a stand-alone reduced model form is created by writing a
separate main GAMS execution script which includes only a part of the
existing modules. Interfaces which are outputs from modules excluded from the
reduced model have to be provided by the reduced model main script. For
example, food demand could be estimated in a reduced form only considering
population and income growth but omitting the price feedback from the
production side and thereby most other modules (see the “reduced model
feature” section in the Appendix
To allow for parallel execution of model runs and to improve reproducibility, MAgPIE performs a model run composition. The purpose of the composition is to isolate the current model run before execution. Isolation is achieved by creating a separate output folder for each run in which all relevant data are copied. The main component of each output folder is a single GAMS file containing the full GAMS model and all inputs. This file is created by replacing all “include” statements in the original GAMS model code with corresponding input files or code segments. In the case of conditional inclusions (e.g., realization selection), only the active inclusion is considered (e.g., the chosen realization). This approach leads to a fully self-contained GAMS file which can be shared and runs as a stand-alone module. All other files in the output folder are supplementary and either used for run postprocessing or to provide additional information about the run setup (e.g., the run configuration file). For archiving, it is recommended to store the whole output folder as an image of the respective run.
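The composition step described above can be sketched in Python as a recursive replacement of include statements. This is an illustration only, not the actual implementation in the R layer. The file names and contents are hypothetical, and the conditional inclusion is simplified to a `%name%` placeholder in the include path that is resolved from the run configuration.

```python
import re

# Matches a line such as: $include "path/to/file.gms"
INCLUDE = re.compile(r'^\$include\s+"?([^"\s]+)"?\s*$', re.IGNORECASE)
# Matches a configuration placeholder such as: %yields%
PLACEHOLDER = re.compile(r"%(\w+)%")


def compose(name, files, config):
    """Return the text of `name` with all include lines replaced by file contents."""
    out = []
    for line in files[name].splitlines():
        match = INCLUDE.match(line.strip())
        if match is None:
            out.append(line)
            continue
        # Resolve the chosen realization so that only the active branch is kept.
        path = PLACEHOLDER.sub(lambda m: config[m.group(1)], match.group(1))
        out.append(compose(path, files, config))
    return "\n".join(out)


# Hypothetical model code held in memory instead of on disk.
FILES = {
    "main.gms": (
        '$include "sets.gms"\n'
        '$include "modules/yields/%yields%/equations.gms"\n'
        "* solve statement would follow here"
    ),
    "sets.gms": "sets i /a, b/;",
    "modules/yields/static/equations.gms": "* static yields",
    "modules/yields/dynamic/equations.gms": "* dynamic yields",
}

print(compose("main.gms", FILES, {"yields": "dynamic"}))
```

The composed text no longer contains any include statement and carries only the selected realization, which is what makes the resulting file self-contained.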
The framework currently has two built-in spatial levels: a coarse level of world regions and a finer one of spatial clusters characterized by similar local characteristics on the subregional level. Both levels are flexible in resolution.
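A flexible regional level can be thought of as a user-supplied mapping from countries to world regions. The following Python sketch assumes such a mapping and aggregates country-level values accordingly. The function name, the fallback to a rest-of-world region and all numbers are illustrative assumptions rather than part of the framework.

```python
from collections import defaultdict


def aggregate(values, mapping, default="ROW"):
    """Sum country-level values to the regions given by `mapping`.

    Countries missing from the mapping are assigned to `default`.
    """
    totals = defaultdict(float)
    for country, value in values.items():
        totals[mapping.get(country, default)] += value
    return dict(totals)


# Invented country-level values keyed by ISO code.
country_values = {"BRA": 10.0, "ARG": 4.0, "USA": 7.0, "KEN": 2.0, "NGA": 3.0}

# A mapping with Brazil as its own region; unlisted countries fall into ROW.
brazil_focus = {"BRA": "BRA", "ARG": "LAM", "USA": "USA"}

print(aggregate(country_values, brazil_focus))
```

Changing the regional setup then only requires a different mapping, while the country-level input data stay the same.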
The world regions in the model have the
Input data preprocessing at ISO country or 0.5
Model documentation is based on the in-house-developed toolkit goxygen
Model evaluation is performed with a validation database containing historical data and projections for most outputs returned by the model. After each model run, a validation report is generated automatically as a PDF file. This report includes evaluation plots showing model outputs, historical data and other projections jointly for each output variable.
The automatically generated model evaluation documents for single model runs
currently allow comparison of about 1000 output variables with reference
data. Comparison between model runs, i.e., between different scenarios, is
rather difficult and inconvenient if the model results are scattered across
different evaluation documents. To overcome this issue, we developed
(a) a routine for generating a single evaluation document with outputs for
multiple model runs and (b) the interactive scenario analysis and evaluation
tools appMAgPIE and appMAgPIElocal
Evaluation plots for MAgPIE 4 inputs and outputs for the default
settings, a run with soil organic matter explicitly modeled, a run with an
alternative factor requirement setup with costs proportional to the
production volume and a stand-alone run of the food demand module. Sources of
historical data:
Figure
The evaluation plots show different stages and major components of a MAgPIE
simulation. As Fig.
As all other aspects shown in the figure go beyond what is used or simulated in the food demand module, all remaining values could only be reported by the non-stand-alone runs. The combination of per capita food demand and total population provides the total food demand in the model, which triggers total feed demand through consumption of livestock products. Here, too, the identical scenario assumption leads to the same results in all three runs. Differences can be observed in the global land cover and the productivity measures (land-use intensity and average crop yields). Cropland shows higher expansion in the alternative scenarios compared to the default scenario, while both alternative scenarios show less intensification and lower yields. While the differences are rather small in the case of soil organic matter (SOM), they are quite pronounced in the alternative factor requirement case.
In the case of soil organic matter, this effect is triggered via the natural availability of nitrogen in the soil. With SOM switched off, the model assumes that all required nitrogen is provided as fertilizer, while simulating SOM explicitly uncovers the nitrogen already available in the soil. This reduces the overall fertilizer requirements and slightly incentivizes land expansion as it gives the model access to more nitrogen. As the food demand is rather independent of this decision, more land expansion leads to lower intensification requirements, lowering land-use intensity as well as average yields.
Having factor requirements primarily linked to the production rather than to the area on which it is produced strongly reduces the incentive in the model to intensify. Area-dependent factor requirements strongly favor high-yielding locations for production, giving the model a strong incentive to concentrate production on highly productive areas and to further boost productivity via intensification. Production-dependent factor requirements, on the other hand, do not favor locations based on productivity, which makes rather unproductive areas attractive for production as well and thereby reduces the incentive for intensification. In combination, this leads to significantly higher cropland expansion, higher forest reduction, less intensification and significantly lower crop yields. One can also observe that the difference in average yields is larger than the difference in land-use intensity, because average yields drop for two reasons: the lower land-use intensification and the expansion into less productive areas.
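The different incentives of the two factor requirement setups can be made explicit with a small numerical sketch in Python. It assumes the simplest possible cost forms, namely a fixed cost per hectare in the area-dependent case and a fixed cost per ton in the production-dependent case. These forms and all numbers are invented for illustration and are not taken from the model.

```python
def cost_per_ton_area_based(cost_per_ha, yield_t_per_ha):
    """Area-dependent costs: spreading a per-hectare cost over more output."""
    return cost_per_ha / yield_t_per_ha


def cost_per_ton_volume_based(cost_per_t, yield_t_per_ha):
    """Production-dependent costs: the same per-ton cost at any yield level."""
    return cost_per_t


# Two hypothetical locations with low and high yields (tons per hectare).
for label, crop_yield in [("low-yield site", 2.0), ("high-yield site", 8.0)]:
    area = cost_per_ton_area_based(400.0, crop_yield)
    volume = cost_per_ton_volume_based(100.0, crop_yield)
    print(f"{label}: area-based {area:.0f} per t, volume-based {volume:.0f} per t")
```

Under the area-dependent form, the cost per ton falls as yields rise, so productive sites and intensification are rewarded. Under the production-dependent form, the cost per ton is the same everywhere, so this advantage disappears.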
CO2 emissions show strong fluctuations in all scenarios due to missing constraints linking carbon stocks with the goal function of the model (e.g., carbon pricing). This makes it in many cases an arbitrary decision for the optimizer to expand cropland into carbon-rich or carbon-poor areas. Besides its fluctuations, the plot also shows higher overall emissions in the case of volume-based factor costs due to the overall higher expansion of cropland and reduction in forest areas.
Figures
Standard MAgPIE 4 world regions and cluster setup: 12 equally treated world regions with 200 clusters in total.
World region abbreviations: Canada, Australia and New Zealand: CAZ; China: CHA;
European Union: EUR; India: IND; Japan: JPN; Latin America: LAM; Middle East
and North Africa: MEA; non-EU member states: NEU; other Asia: OAS; reforming
countries: REF; Sub-Saharan Africa: SSA; United States: USA.
Study setup tailored to assessments with a focus on Brazil, with six world regions and 500 clusters: Brazil (BRA) in increased spatial resolution, its major trade partners Latin America (LAM), United States (USA), China (CHA) and Europe (EUR) in default resolution and the rest of the world (ROW) combined to one region with reduced resolution.
Figure
Comparison of global and Latin American forest cover with historical data sets and projections of other models.
Figure
Comparisons with historical data sets and with projections of forest cover
show that the differences between mappings are rather small compared to the
overall uncertainty in these numbers. Nevertheless, a deeper look into the
simulations uncovers that the global numbers of the Brazil-centric setup are
unreliable, as the reduced deforestation rate compared to the default setup is
a consequence of the applied mapping. As the ROW region basically acts as a
huge free-trade region, it can fulfill strong demand pressure coming from
Sub-Saharan Africa with production from elsewhere, while trade limitations in
the default setup limit this exchange and trigger deforestation within
Sub-Saharan Africa
In the case of LAM, both runs show a rather similar picture in the aggregated forest cover projections for the region and it is not possible to clearly reject one of them. This is particularly important as the regional aggregates in LAM are in the scope of both mappings and therefore should be sound. When choosing between them, one has to decide whether spatial details in Brazil or global trade patterns are the more decisive factor for accurate estimates of regional forest cover in LAM.
Comparison of changes in forest share from 2000 to 2050 in Brazil between the default setup and Brazil setup.
Looking at forest change patterns in Brazil and neighboring countries between
2000 and 2050, it becomes easier to introduce a ranking between the setups
(Fig.
The observed specialization is a consequence of the homogeneous biophysical characteristics within each cluster which lead to either/or decisions in the model. It will either fully take a cluster into production or ignore it completely. In the default setup, this effect is very pronounced due to the low number of clusters within Latin America. With more clusters, as in the Brazil setup, clusters better grasp the real spatial distributions of biophysical characteristics in the region and therefore lead to a more diverse picture. Whereas this effect is especially relevant for regional studies with a focus on spatial patterns, it is less critical for global dynamics as long as the spatial aggregation is not introducing any systematic biases to the model.
While the Brazil setup improves the spatial representation of Brazil, it is only a first step, as the deforestation patterns show. A second step towards a regional study, which is missing in this paper, is always required: regional characteristics, such as region-specific policies that are relevant at this level of detail, have to be incorporated into the model.
Since the first version of MAgPIE, the model has evolved from a crop-focused land-use allocation model to a modular open-source framework with a broad range of covered processes.
One main improvement introduced in MAgPIE 4 is the full code modularization. It makes the model more manageable by structuring the code into self-contained components that interact with each other via interfaces. It makes existing and missing interactions in the model more visible and allows components to be easily replaced by alternative implementations. While the modular structure is rather intuitive for a system with loosely linked components, one could argue that it might prevent a proper implementation of strongly integrated systems. Our experience is that, while the modular concept works best for clearly separable systems, it also works in all other cases. The difference with strongly integrated systems is that the number of interfaces and the effort required to develop new realizations are higher. Nevertheless, it still improves transparency in terms of model interactions and does not exclude any systems or dynamics from being represented in the model. Modules are also not static, and the modular structure itself can and will be changed if required. Modules might be created, deleted, merged or split over time. Module interfaces might be extended, reduced or modified. As both happen less frequently than changes within modules, the modular structure can best be described as semi-static.
Besides modularization, MAgPIE 4 introduces a series of other features such as automatic documentation of the GAMS code, the possibility to run parts of the model in a stand-alone manner, flexible spatial resolution and automated creation of evaluation reports. The evaluation of selected model outputs shows that MAgPIE 4 projections connect well to historical data and projections from other modeling teams. Therefore, we consider MAgPIE 4 an appropriate tool for simulating scenarios of future land use. The case study with higher spatial resolution for Brazil demonstrates how the flexible spatial resolution approach works and how it can be meaningfully applied for research questions with a regional focus. With the open-source publication of the MAgPIE 4 model code, we aim to increase the transparency and reproducibility of model experiments for reviewers, stakeholders and other interested groups. Furthermore, we expect that the future development of the MAgPIE modeling framework will benefit from cooperation with individuals and other research institutions, as enabled by the open-source availability of the code.
The MAgPIE code is available under the GNU Affero
General Public License, version 3 (AGPLv3) via GitHub
(
The aim of a modular GAMS code is to separate different parts of the model
code from each other and to define the rules by which they interact.
Usually, such a separation is achieved via local environments. If information
should be transferred from one module to another, this has to be done
explicitly via a global environment which is visible to all modules. The
global environment acts as an interface between modules. GAMS does not
distinguish between environments. All objects are accessible from everywhere
in the code. To emulate local environments, we introduced a naming convention
indicating whether the object should be treated as global or local. Each
object is required to have a prefix in its name indicating what type of
object it is (e.g., “v” for variable or “p” for parameter) and to which
environment it belongs (local or global). While elements in the global
environment are marked with an “m” (module interface), elements in local
environments carry a number in their prefix that is unique for every module. In
this naming convention, “vm_area” represents, for instance, a global (m)
variable (v) containing area information, while “p42_costs” is a local
parameter (p) of module 42 containing cost information. While local objects
are technically still accessible from everywhere in the code, they are
formally only allowed to be accessed from within the corresponding module. In
MAgPIE 4, the proper use of the naming convention is ensured by the R function
codeCheck in package lucode
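The naming convention lends itself to an automated check. The following Python sketch shows the idea and is not the codeCheck function mentioned above, whose interface is not described here. It covers only the two object types named in the text (“v” and “p”), and the helper names `parse` and `violations` are hypothetical.

```python
import re

# Type letter, then "m" (global) or a module number (local), then "_" and a name.
NAME = re.compile(r"^([vp])(m|\d+)_(\w+)$")
TYPES = {"v": "variable", "p": "parameter"}


def parse(name):
    """Split an object name into its type and owning module (None if global)."""
    match = NAME.match(name)
    if match is None:
        raise ValueError(f"name does not follow the convention: {name}")
    kind, environment, _ = match.groups()
    module = None if environment == "m" else int(environment)
    return {"type": TYPES[kind], "module": module}


def violations(accesses):
    """Return all (module, name) pairs where a module uses a foreign local object."""
    found = []
    for module, name in accesses:
        owner = parse(name)["module"]
        if owner is not None and owner != module:
            found.append((module, name))
    return found


print(parse("vm_area"))
print(parse("p42_costs"))
# Module 10 may use the interface vm_area but not the local object of module 42.
print(violations([(42, "p42_costs"), (10, "vm_area"), (10, "p42_costs")]))
```

Since GAMS itself does not restrict access, a check of this kind has to run on the code before the model is executed.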
Each module in MAgPIE comes with a module contract that can be found at the beginning of the documentation for each module. The contract consists of three components: task description, required inputs and promised outputs.
The task description defines the purpose of the module. The
list of inputs defines which inputs the module expects in order to
be able to perform its tasks. The output list defines the
information the module will provide to the rest of the model. The contract
contains all information that is necessary to be able to work with the module
or to develop it. It therefore reduces the need to understand the model as a
whole. The contract approach is similar to the function concept in other
languages. The difference in GAMS is that a module cannot be run at once but
is split up into topic-wise chunks and distributed over the whole model run.
Table
Module components.
In the first chunk, each module can introduce its own sets. Similarly, the declarations of parameters, variables and equations of all modules follow as a second chunk. All other chunks follow with the same principle. This split into chunks allows modules to interact at different stages of the run. They can, for instance, exchange information before the model is solved and exchange another set of information after the model has been solved. Technically, this is implemented via an include file, which is going through all modules for each chunk, checking whether a module provides a code piece to the given chunk and if so includes it.
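The chunk-wise inclusion can be sketched in a few lines of Python. The sketch only mirrors the ordering logic described above. The chunk names, module names and code pieces are hypothetical stand-ins, and the real mechanism is a GAMS include file rather than a Python function.

```python
# Illustrative chunk order; the actual list of chunks is not reproduced here.
CHUNKS = ["sets", "declarations", "presolve", "postsolve"]


def assemble(modules, chunks=CHUNKS):
    """Walk through all modules once per chunk and collect the pieces they provide."""
    order = []
    for chunk in chunks:
        for name, pieces in modules.items():
            if chunk in pieces:  # modules may skip chunks they do not need
                order.append((chunk, name))
    return order


# Hypothetical modules, each mapping a chunk name to its code piece.
MODULES = {
    "yields": {"sets": "...", "declarations": "...", "presolve": "..."},
    "costs": {"declarations": "...", "postsolve": "..."},
}

for chunk, module in assemble(MODULES):
    print(f"{chunk}: include code piece of module '{module}'")
```

The outer loop runs over chunks and the inner loop over modules, so the code of a single module ends up interleaved with that of all others across the model run.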
The modular concept also allows alternative versions of a module, called “realizations”, to be introduced. Similarly to the include file, each module comes with a GAMS file that includes a realization based on the choice in the configuration of the model. Different realizations are implemented as alternative folders in the corresponding module. The implementation of a realization is only bound by the module contract. This implies that it must be able to perform its calculations based on the promised inputs and must provide the promised outputs. This level of freedom allows for very different realizations of a module.
When developing a module realization, it might be handy not to have to run a full-feature model simulation but rather a reduced version of the model. To slightly reduce model complexity, all modules can be switched to their simplest realization and the spatial resolution of the model can be reduced. If the rest of the model should rather be reflected as a toy model with very limited complexity, the reduced model feature can be used. As each module defines which inputs it needs for the run via its module contract, it is also possible to write a dummy model that only provides these inputs to the module and handles the outputs it receives from the module that should be run in stand-alone mode. This can be handy if a module is to be tested under well-defined boundary conditions or if a study purely focuses on a subcomponent of the model.
In MAgPIE, such a reduced model version is created by adding a corresponding dummy model to the “stand-alone” folder of the model. The dummy model includes the module that should run in stand-alone mode and ensures that all interfaces of the module are properly addressed. The reduced model itself can be run again via the standard R interface. Only the name of the model (cfg$model) has to be changed in the configuration file from main.gms to the name of the new dummy model.
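Whether a dummy model addresses all interfaces of a module can be checked against the module contract. The Python sketch below illustrates this for a food demand module with a fixed price instead of the price feedback. The function `check_dummy_model` and all interface names are hypothetical.

```python
def check_dummy_model(required_inputs, promised_outputs, provided, consumed):
    """Report interfaces of a stand-alone module that the dummy model ignores."""
    return {
        "missing_inputs": sorted(set(required_inputs) - set(provided)),
        "unused_outputs": sorted(set(promised_outputs) - set(consumed)),
    }


# Hypothetical contract of a food demand module (names are illustrative).
REQUIRED = ["pm_population", "pm_income", "pm_food_price"]
PROMISED = ["pm_food_demand"]

# A dummy model that supplies population and income but forgets the price.
incomplete = check_dummy_model(
    REQUIRED, PROMISED,
    provided=["pm_population", "pm_income"],
    consumed=["pm_food_demand"],
)

# A complete dummy model sets the price exogenously, which cuts the feedback.
complete = check_dummy_model(
    REQUIRED, PROMISED,
    provided=["pm_population", "pm_income", "pm_food_price"],
    consumed=["pm_food_demand"],
)

print(incomplete)
print(complete)
```

An empty report means that every input normally delivered by an excluded module has a replacement in the dummy model.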
For the model evaluation, we set up an extensive database with historical and
projected data for the various outputs the model can produce. In
Fig.
Note that the first three evaluation plots in Fig.
Spatially explicit land cover in MAgPIE 4 is initialized with a modified
version of the LUH2v2 data set for the year 2000
The evaluation plots for cropland, pasture and forest also show projections
from other models for SSP 1–5 reference scenarios
More information about the runs can be found in the corresponding
evaluation documents
Evaluation plots for MAgPIE 4 inputs and outputs for SSP 1–5
reference scenarios at global level. Sources of historical data:
The supplement related to this article is available online at:
HLC wrote the original model. AP and HLC guided the model development. JPD developed and implemented the framework structure (modularity, spatial flexibility, code-based documentation). JPD, BLB, IW, FH, MS, KK, UK, XW, AM, DK, GA, AWY, EA, LB, SW and AG prepared input data. JPD, BLB, IW, FH, MS, KK, UK, XW, AM, DK, GA, AWY, EA and HLC developed the content of the model framework. JPD, AG, DK and LB provided technical support for the development. JPD, BLB, IW, FH, MS, KK, XW, AM, DK, GA, AWY, EA, FB, DC and AP wrote the model documentation. JPD managed the open-source release. JPD, IW, MS, DK and FH wrote the manuscript. KK, AM, BLB and JPD developed the model schematic. JPD, BLB, FH, GA and EA designed the output examples. All authors prepared the model framework for release, discussed the manuscript and supported the writing of the article.
The authors declare that they have no conflict of interest.
The authors thank FAOSTAT, the World Bank and the SSP scenario modelers for the data they provided.
We thank Christoph Müller, Elmar Kriegler, Susanne Rolinski, Nico Bauer, Gunnar Luderer and colleagues at PIK for valuable discussions during the development of the modeling framework. We thank Joshua Elliot and Todd Munson for their support in improving the model optimization process in the framework.
The research leading to these results has received funding from the European Union's Horizon 2020 research and innovation program under grant agreement no. 689150 (SIM4NEXUS), no. 776479 (COACCH) and no. 652615 (SUSTAg via the FACCE SURPLUS framework FKZ 031B0170A). This work was also supported by ENavi (FKZ 03SFK4B1), one of the four Kopernikus Projects for the Energy Transition funded by the German Federal Ministry of Education and Research (BMBF). We acknowledge the doctoral scholarship for Geanderson Ambrósio granted by Fundação de Amparo à Pesquisa do Estado de Minas Gerais (FAPEMIG), and the scholarship for Ewerton Araujo from CAPES/Programa de Doutorado Sanduíche no Exterior process no. 88881.135263/2016-01. We also acknowledge Leibniz Association's Economic Growth Impacts of Climate Change (ENGAGE) project under grant no. SAW-2016-PIK-1 which funded the research of Abhijeet Mishra. The work of Kristine Karstens was funded by the DFG Priority Program “Climate Engineering: Risks, Challenges, Opportunities?” (SPP 1689) and specifically the CEMICS2 project (grant no. ED78/3-2).
Lastly, we thank the three anonymous reviewers for their valuable remarks which led to significant improvements of the paper. The publication of this article was funded by the Open Access Fund of the Leibniz Association.
This paper was edited by David Topping and reviewed by three anonymous referees.