An open and extensible framework for spatially explicit land use change modelling: the lulcc R package

We present the lulcc software package, an object-oriented framework for land use change modelling written in the R programming language. The contribution of the work is to resolve the following limitations associated with the current land use change modelling paradigm: (1) the source code for model implementations is frequently unavailable, severely compromising the reproducibility of scientific results and making it impossible for members of the community to improve or adapt models for their own purposes; (2) ensemble experiments to capture model structural uncertainty are difficult because of fundamental differences between implementations of alternative models; and (3) additional software is required because existing applications frequently perform only the spatial allocation of change. The package includes a stochastic ordered allocation procedure as well as an implementation of the CLUE-S algorithm. We demonstrate its functionality by simulating land use change at the Plum Island Ecosystems site, using a data set included with the package. It is envisaged that lulcc will enable future model development and comparison within an open environment.


Introduction
Spatially explicit land use change models are used to understand and quantify key processes that affect land use and land cover change and simulate past and future change (Veldkamp and Lambin, 2001; Mas et al., 2014). These models are commonly implemented in compiled languages such as C/C++ and Fortran and distributed as software packages or extensions to proprietary geographic information systems such as ArcGIS or Idrisi. As Rosa et al. (2014) pointed out, it is uncommon for the source code of land use change modelling software to be made available (e.g. Verburg et al., 2002; Soares-Filho et al., 2002; Verburg and Overmars, 2009; Schaldach et al., 2011). While it is true that the concepts and algorithms implemented by the software are normally described in scientific journal articles, this fails to ensure the reproducibility of scientific results (Peng, 2011; Morin et al., 2012), even in the hypothetical case of a perfectly described model (Ince et al., 2012). In addition, running binary versions of software makes it difficult to detect silent faults (faults that change the model output without obvious signals), whereas these are more likely to be identified if the source code is open (Cai et al., 2012). Moreover, closed source code forces duplication of work and makes it difficult for members of the scientific community to improve the code or adapt it for their own purposes (Morin et al., 2012; Pebesma et al., 2012; Steiniger and Hunter, 2013). In this paper we describe the development of lulcc, a new R package designed to foster an open approach to land use change science.
Current software packages for land use change modelling usually exist as specialised applications that implement one algorithm. Indeed, it is common for applications to perform only one part of the modelling process. For example, the Conversion of Land Use and its Effects at Small regional extent (CLUE-S) software only performs spatial allocation, so the user must prepare the model input, and conduct the statistical analysis on which the allocation procedure depends, in other applications (Verburg et al., 2002). This is time-consuming and increases the likelihood of user errors because inputs to the various modelling stages must be transferred manually between applications. Furthermore, very few programs include methods to validate model output, which could be one reason for the lack of proper validation of models in the literature, as noted by Rosa et al. (2014). The lack of a common interface amongst land use change models is problematic for the community because there is widespread uncertainty about the appropriate model form and structure for modelling applications. Under these circumstances it is useful to experiment with various models to identify the model that performs best in terms of calibration and validation (Schmitz et al., 2009). Alternatively, ensemble modelling may be used to understand the impact of structural uncertainty on model outcomes (Knutti and Sedláček, 2012). However, while some land use change model comparison studies have been carried out (e.g. Pérez-Vega et al., 2012; Mas et al., 2014; Rosa et al., 2014), fundamental differences between models in terms of scale, resolution and model inputs prevent the widespread use of ensemble land use change predictions (Rosa et al., 2014). As a result, the uncertainty associated with model outcomes is rarely communicated in a formal way, raising questions about the utility of such models (Pontius and Spencer, 2005).
An alternative approach is to develop frameworks that allow several modelling approaches to be implemented within the same environment. One such application is PCRaster, a free and open-source geographic information system (GIS) that includes additional capabilities for spatially explicit dynamic modelling (Schmitz et al., 2009). The PCRcalc scripting language and development environment allows users to build models with native PCRaster operations such as map algebra and neighbourhood functions. Alternatively, the PCRaster application programming interface (API) allows users to extend its functionality in various programming languages using native and external data types (Schmitz et al., 2009). For example, the current version of FALLOW (van Noordwijk, 2002; Mulia et al., 2014), a deductive land use change model, is built using the PCRaster framework. TerraME (Carneiro et al., 2013) is a platform to develop models for simulating interactions between society and the environment. It provides more flexibility than PCRaster because models can be composed of coupled sub-models with various temporal and spatial resolutions (Moreira et al., 2009; Carneiro et al., 2013). The platform is built on the open-source TerraLib geospatial library (Câmara et al., 2008), which handles several spatio-temporal data types, includes an API for coupling the library with R (R Core Team, 2014) to perform spatial statistics, and supports dynamic modelling with cellular automata. The LuccME extension to TerraME includes implementations of CLUE-S and its predecessor, CLUE (Veldkamp and Fresco, 1996; Verburg et al., 1999), written in Lua.
The R environment is a free and open-source implementation of the S programming language, a language designed for programming with data (Chambers, 2008). Although the development of R is strongly rooted in statistical software and data analysis, it is increasingly used for dynamic simulation modelling in diverse fields (Petzoldt and Rinke, 2007). Additionally, in the last decade it has become widely used by the spatial analysis community, largely due to the sp package (Pebesma and Bivand, 2005; Bivand et al., 2013), which unified many alternative approaches for dealing with spatial data in R and allowed subsequent package developers to use a common framework for spatial analysis. The raster package (Hijmans, 2014) provides many functions for raster data manipulation commonly associated with GIS software. Building on these capabilities, several R packages have been created for dynamic, spatially explicit ecological modelling (e.g. Petzoldt and Rinke, 2007; Fiske and Chandler, 2011). In addition, two recent land use change models have been written for the R environment. StocModLCC (Rosa et al., 2013) is a stochastic inductive land use change model for tropical deforestation, while SIMLANDER (Hewitt et al., 2013) is a stochastic cellular automata model to simulate urbanisation. Thus, R is well-suited for spatially explicit land use change modelling. To date, however, R has not been used to develop a framework for land use change model development and comparison. The remainder of this paper is divided into four sections. First, we discuss the principal design goals of lulcc. We then describe the software and demonstrate its main functionality with an example application to the Plum Island Ecosystems site, using data included with the package. This is followed by a discussion of the strengths and main limitations of the software and approach, as well as areas for future development. Finally, we draw brief conclusions from the project.

Design goals
The first design goal of lulcc is to provide a framework that allows users to perform the various stages of the modelling process illustrated by Fig. 1 within the same environment. It therefore includes methods to process and explore model input, fit and evaluate predictive models, allocate land use change spatially, validate the model and visualise model outputs. This provides many advantages over specialised software applications. First, it improves efficiency and reduces the likelihood of user errors because intermediate inputs and outputs exist in the same environment (Fiske and Chandler, 2011; Pebesma et al., 2012). Second, it encourages interactive model building because separate aspects of the procedure can easily be revisited. Third, it is straightforward to experiment with different model set-ups. Finally, and perhaps most importantly, it improves the reproducibility of scientific results because the entire modelling process can be expressed programmatically and be communicated as such with reasonable effort (Pebesma et al., 2012).
The lulcc software package is intended to be an alternative to the current paradigm of closed-source, specialised software programs that, in our view, disrupt the scientific process. Thus, the second design goal is to create an open and extensible framework allowing users to examine the source code, modify it for their own purposes and freely distribute changes to the wider community. The package exploits the openness of the R system, particularly with respect to the package system, which allows developers to contribute code, documentation and data sets in a standardised format to repositories such as the Comprehensive R Archive Network (CRAN) (Pebesma et al., 2012; Claes et al., 2014). As a result of this philosophy, R users have access to a wide range of sophisticated tools for statistical modelling, data management, spatial analysis and visualisation.
One of the consequences of providing a modelling framework in R is that users of the software must become programmers (Chambers, 2000). We recognise that this represents a different approach to the current practice of providing land use change software packages with graphical user interfaces (GUIs), and acknowledge that for users unfamiliar with programming it could present a steep learning curve. Therefore, the third design goal is to provide well-documented software that is easy to use and accessible to users with varying levels of programming experience. The package includes complete working examples to allow beginners to start using the package immediately from the R command shell, while more advanced users should be able to develop modelling applications as scripts. Furthermore, the package is designed to be extensible so that users can contribute new or existing methods. Similarly, the source code of lulcc is accessible so that users can locate the methods in use and understand algorithm implementations. Acknowledging that many scientists lack any formal training in programming (Joppa et al., 2013; Wilson et al., 2014), we hope this final goal will ensure the software is useful for educational purposes as well as scientific research.

Software description
To achieve the design goals, we adopted an object-oriented approach. This provides a formal structure for the modelling framework that allows the various stages of land use change modelling applications to be handled efficiently. Furthermore, it encourages the reuse of code because objects can be used multiple times within the same application or across several different applications. The approach is also extensible because it is straightforward to extend existing classes through inheritance or to create new methods for existing classes. In lulcc we use the S4 class system (Chambers, 1998, 2008), which requires classes and methods to be formally defined. This system is more rigorous than the alternative S3 system because objects are validated against the class definition when they are created, ensuring that objects behave consistently when they are passed to functions and methods. Figure 2 shows the class structure of lulcc, while Table 1 shows the functions included with the package. Here we describe the main components of lulcc integrated with an example application for the Plum Island Ecosystems data set. The script used in this paper, including the code used to create the various figures, is supplied with the package as a "demo". Instructions to obtain the package and run the demo script are provided in the "Code availability" section.

Data
The failure to provide driving data for land use change modelling exercises alongside published literature is identified by Rosa et al. (2014) as a major weakness of the discipline. The lulcc package includes two data sets that have been widely used in the land use change community, allowing users to quickly start exploring the modelling framework. The first of these contains data from the Plum Island Ecosystems Long Term Ecological Research site in northeast Massachusetts (http://pie-lter.ecosystems.mbl.edu/), which in recent decades has undergone extensive land use change from forest to residential use (Aldwaik and Pontius, 2012). The data set included in lulcc was originally developed as part of the MassGIS program (MassGIS, 2015) and was subsequently processed by Pontius and Parmentier (2014). Land use maps depicting forest, residential and other uses are available for 1985, 1991 and 1999, together with maps of three predictor variables: elevation, slope and distance to built land in 1985.
The second data set includes information from Sibuyan Island in the Philippines, and is a modified version of the data set supplied with the CLUE-S model (Verburg et al., 2002).

Table 1 (partial). Functions included with lulcc.
…: Create a ROCR performance object for each prediction object contained in a PredictionList object
predict: Make predictions using a PredictiveModelList object
randomForestModels: Fit multiple random forest models
rpartModels: Fit multiple recursive partitioning and regression tree models
resample: Resample an ExpVarRasterList object to the parameters of an ObsLulcRasterStack object
ThreeMapComparison: Calculate three-dimensional contingency tables (Pontius et al., 2011)
total: Sum the total number of cells belonging to each class of a categorical raster map

Data processing
One of the most challenging aspects of land use change modelling is to obtain and process the correct input data. Currently, lulcc requires all spatially explicit input data to exist either in the file system, in any of the formats supported by raster, or in the R workspace as raster objects (RasterLayer, RasterStack or RasterBrick). The most fundamental input required by land use change models is an initial map of observed land use, which is usually obtained from classified remotely sensed data. This map represents the initial condition for model simulations and, for inductive modelling, is used to fit predictive models. Sometimes it is more useful to consider observed land use transitions: in this case an additional map for an earlier time point is required, as shown by Fig. 1. Ideally, two more observed land use maps for subsequent time points should be obtained for calibrating and validating the land use change model (Pontius et al., 2004a). The current version of the software only supports categorical land use data, which means that each pixel must belong to exactly one category.
In lulcc, observed land use data are represented by the ObsLulcRasterStack class. In the following code snippet we load the package into the current session, create an ObsLulcRasterStack object for the Plum Island Ecosystems data set and plot the result (Fig. 3):
> library(lulcc)
> data(pie)
> obs <- ObsLulcRasterStack(x=pie, pattern="lu", categories=c(1,2,3), labels=c("Forest","Built","Other"), t=c(0,6,14))
> plot(obs)
The ObsLulcRasterStack object is important to land use change studies in lulcc because it defines the spatial domain of subsequent operations. The t argument in the constructor function specifies the time points associated with the observed land use maps. The first time point must always be zero; if additional maps are present they should be associated with time points greater than zero, even in backcast models. In most land use change modelling applications the time step between two time points represents 1 year, but there is no requirement for this to be the case.
A useful starting point in land use change modelling is to obtain a transition matrix for observed land use maps from two time points to identify the main historical transitions in the study region (Pontius et al., 2004b), which can be used as the basis for further research into the processes driving change. In lulcc we use the crossTabulate function for this purpose.
Land use change models also require maps of biophysical and socioeconomic explanatory variables. These may be static, such as elevation or geology, or dynamic, such as maps of population density or road networks. In lulcc these two types of explanatory variable are distinguished by a simple naming convention, which is explained in detail in the package documentation (see Supplement). Collectively, they are represented by an object of class ExpVarRasterList, which can be created as follows:
> ef <- ExpVarRasterList(x=pie, pattern="ef")
Apart from observed land use and explanatory variables, other input maps may be required. The two allocation routines currently included with lulcc accept a mask file, which is used to prevent change within a certain geographic area such as a national park or other protected area, and a land use history file, which is used as the basis for certain decision rules. These are handled by lulcc as standard RasterLayer objects. All input maps should have the same spatial resolution as the corresponding ObsLulcRasterStack object. This can be achieved using the resample function from the raster package, which has been extended to receive lulcc objects. The ExpVarRasterList object created above can be resampled to the parameters of an ObsLulcRasterStack object with the following command:
> ef <- resample(ef, obs)
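To make concrete what a transition matrix between two observed maps contains, the base R sketch below cross-tabulates two small categorical maps flattened to vectors of cell values. The function name transition_matrix and the toy maps are hypothetical; this is an illustration only, not the lulcc crossTabulate implementation, which works on ObsLulcRasterStack objects.

```r
# Hypothetical helper: cross-tabulate two categorical maps (flattened to
# vectors of cell values) into a transition matrix. Base R only.
transition_matrix <- function(lu_t0, lu_t1, categories, labels) {
  # Factors with fixed levels keep absent categories as zero rows/columns
  from <- factor(lu_t0, levels = categories, labels = labels)
  to   <- factor(lu_t1, levels = categories, labels = labels)
  # Rows: land use at the first time point; columns: at the second
  table(from, to)
}

# Illustrative 3 x 3 maps (1 = Forest, 2 = Built, 3 = Other); not the pie data
lu85 <- c(1, 1, 1, 2, 3, 1, 3, 2, 1)
lu99 <- c(1, 2, 2, 2, 3, 1, 2, 2, 1)
tm <- transition_matrix(lu85, lu99, categories = 1:3,
                        labels = c("Forest", "Built", "Other"))
print(tm)
```

The diagonal holds persistence and the off-diagonal cells hold transitions, so the main historical transitions can be read directly from the largest off-diagonal entries.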

Predictive modelling
Inductive land use change models relate the pattern of observed land use to spatially explicit explanatory variables. Logistic regression is a common type of predictive model used for inductive land use change modelling (e.g. Pontius and Schneider, 2001; Verburg et al., 2002). However, there is growing interest in the application of local and nonparametric models (e.g. Tayyebi et al., 2014). One reason why R is attractive for land use change modelling is that it has become the de facto standard for statistical software development. As a result, lulcc can easily support various predictive modelling techniques by utilising code from existing R packages. Currently, lulcc supports binary logistic regression, available in base R; recursive partitioning and regression trees, provided by the rpart package (Therneau et al., 2014); and random forests, provided by the randomForest package (Liaw and Wiener, 2002). Parametric models such as logistic regression assume the data to be independent and identically distributed (Overmars et al., 2003). In spatial analysis this assumption is often violated because of spatial autocorrelation, which reduces the information content of an observation because its value can to some extent be predicted from the values of its neighbours (Beale et al., 2010). There is also some evidence that nonparametric models may be affected by spatial autocorrelation (Mascaro et al., 2014), even though they do not assume independence. A simple approach to reduce the impact of this phenomenon is to fit predictive models to a random subset of the data (e.g. Verburg et al., 2002; Wassenaar et al., 2007; Echeverria et al., 2008). Next, we create training and testing partitions for the Plum Island Ecosystems data set by drawing a stratified random sample. We do this using the map for 1985 to illustrate the procedure when only one observed map is available.
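The stratified random sample described above can be sketched in base R, assuming the 1985 map has been flattened to a vector of class codes. The function stratified_partition and its size argument (the fraction of each class drawn for training) are hypothetical illustrations, not lulcc functions.

```r
# Hypothetical helper: draw a fixed fraction of cells from each land use class
# for training and keep the remaining (non-NA) cells for testing.
stratified_partition <- function(lu, size = 0.1) {
  cells <- seq_along(lu)
  # Group cell indices by land use class; split() drops NA cells
  by_class <- split(cells, lu)
  train <- unlist(lapply(by_class, function(idx) {
    k <- round(size * length(idx))
    # Index via sample.int() so a class with a single cell is handled safely
    idx[sample.int(length(idx), k)]
  }), use.names = FALSE)
  train <- sort(train)
  list(train = train, test = setdiff(cells[!is.na(lu)], train))
}

set.seed(1)
lu85 <- sample(1:3, 1000, replace = TRUE, prob = c(0.5, 0.3, 0.2))
part <- stratified_partition(lu85, size = 0.1)
# Class proportions in the training set mirror those of the full map
round(prop.table(table(lu85[part$train])), 2)
```

Sampling within each class, rather than from the map as a whole, keeps rare land uses represented in the training data in proportion to their area.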
We then extract the data for the training partition with the getPredictiveModelInputData function and pass the resulting data.frame to the three model fitting functions (glmModels, rpartModels and randomForestModels). The model fitting functions each return an object of class PredictiveModelList containing a predictive model for each land use type. With these objects, it is straightforward to map the suitability of every pixel in the study region to the various land uses. To do this, we use the generic predict function with some additional functionality from the raster package and plot the resulting RasterStack object (Fig. 4). In some circumstances it may be appropriate to supply a model with no explanatory variables to an allocation routine. For example, Verburg and Overmars (2009) used such a model for natural and semi-natural vegetation because, in their particular case study, pixels were selected for conversion to these land uses on the basis of their suitability for agricultural and urban land rather than their suitability for natural and semi-natural vegetation. In lulcc, this can most easily be achieved by fitting a binary logistic regression model with no explanatory variables. To do this, a formula such as Forest ~ 1 should be supplied to the glmModels function.
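To show what an intercept-only model such as Forest ~ 1 implies, the sketch below fits it alongside a model with explanatory variables using base R glm() on synthetic data. The variable names elev and slope and the simulated relationship are assumptions for this example; the code does not call glmModels or build a PredictiveModelList.

```r
# Synthetic data: variable names and the true relationship are illustrative
set.seed(42)
n <- 500
dat <- data.frame(elev = runif(n, 0, 100), slope = runif(n, 0, 30))
# Simulate forest presence that becomes more likely with elevation
p_true <- plogis(-2 + 0.04 * dat$elev)
dat$Forest <- rbinom(n, size = 1, prob = p_true)

# Binary logistic regression with explanatory variables
m_full <- glm(Forest ~ elev + slope, family = binomial, data = dat)
# Intercept-only model: every cell receives the same suitability
m_null <- glm(Forest ~ 1, family = binomial, data = dat)

suit_full <- predict(m_full, newdata = dat, type = "response")
suit_null <- predict(m_null, newdata = dat, type = "response")
# The constant suitability equals the observed proportion of forest cells
c(constant = unique(round(suit_null, 6)), prevalence = round(mean(dat$Forest), 6))
```

Because the intercept-only model assigns the same suitability everywhere, it cannot rank cells, which is why selection for such land uses is then driven by the suitability of competing land uses.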
Methods to evaluate statistical models are provided by the ROCR package (Sing et al., 2005), allowing the user to assess model performance using various methods including the receiver operating characteristic (ROC), which is used to measure the performance of models predicting the presence or absence of a phenomenon (Pontius and Parmentier, 2014). The ROC is often summarised by the area under the curve (AUC), where a value of one indicates a perfect fit and 0.5 indicates a purely random fit.
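For readers unfamiliar with the AUC, the sketch below computes it directly from predicted suitabilities and observed presence/absence using the rank-sum (Mann-Whitney) identity: the AUC is the probability that a randomly chosen presence cell receives a higher suitability than a randomly chosen absence cell, with ties counting half. This is a base R illustration with a hypothetical function name; lulcc itself delegates ROC analysis to ROCR.

```r
# Hypothetical helper: AUC from the rank-sum (Mann-Whitney) identity
auc <- function(suitability, observed) {
  pos <- observed == 1
  n_pos <- sum(pos)
  n_neg <- sum(!pos)
  r <- rank(suitability)  # average ranks handle tied suitabilities
  (sum(r[pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

obs <- c(0, 0, 1, 1)
auc(c(0.1, 0.2, 0.8, 0.9), obs)  # perfect separation: 1
auc(c(0.5, 0.5, 0.5, 0.5), obs)  # no discrimination: 0.5
```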
In lulcc we extend the native ROCR classes to better suit our purposes. The prediction and performance classes of ROCR are extended by PredictionList and PerformanceList to handle objects of class PredictiveModelList. Using these classes, we first evaluate the logistic regression models with the testing partition from the 1985 observed land use map.
Since the Plum Island Ecosystems data set contains three observed land use maps, we could also test the predictive models using data from a subsequent time point. Several PredictiveModelList objects can be evaluated with these classes in the same way. Figure 5 shows the ROC curves for each land use type and for each type of predictive model supported by lulcc. The plots show that binary logistic regression and random forest models perform similarly for all land uses, while regression tree models perform least well. Another use of ROC analysis is to assess how well the models predict the cells in which gain occurs between two time points. This is only possible if a second observed land use map is available for a subsequent time point. Here we perform this type of analysis for the gain of built land between 1985 and 1991. First, we create a data partition from which cells that are not candidates for gain (cells already belonging to built in 1985) are eliminated. We then assess the ability of the various predictive models to predict the gain of built land in this partition. Figure 6 shows the resulting ROC curve.
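The gain analysis described above depends on a partition from which cells that already hold the target class at the first time point are removed. A minimal base R sketch of that partition, assuming both maps are flattened to vectors of class codes, is shown below; gain_partition is a hypothetical name and not part of lulcc.

```r
# Hypothetical helper: keep only cells that could gain the target class and
# label them 1 if they did gain it by the second time point, 0 otherwise.
gain_partition <- function(lu_t0, lu_t1, target) {
  candidate <- which(lu_t0 != target)  # cells eligible for gain
  gained <- as.integer(lu_t1[candidate] == target)
  data.frame(cell = candidate, gained = gained)
}

# Illustrative maps (1 = Forest, 2 = Built, 3 = Other)
lu85 <- c(1, 1, 2, 3, 1, 3, 2, 1)
lu91 <- c(1, 2, 2, 2, 1, 3, 2, 2)
gp <- gain_partition(lu85, lu91, target = 2)
gp
```

The gained column, paired with each candidate cell's predicted suitability for Built, provides the presence/absence labels for a ROC analysis of gain.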

Demand
Spatially explicit land use change models are normally driven by non-spatial estimates of either the total number of cells occupied by each category at each time point or the number of transitions among the various categories during each time interval. This means that regional drivers of land use change, such as population growth and technology, are considered implicitly (Fuchs et al., 2013). While some models calculate demand at each time point based on the spatial configuration of the landscape at the previous time point (e.g. Rosa et al., 2013), it is more common to specify the demand for every time point at the beginning of the simulation (e.g. Pontius and Schneider, 2001; Verburg et al., 2002; Sohl et al., 2007).
In lulcc the way in which demand is specified is specific to each allocation model. Currently, both allocation models included in the package require the total number of cells belonging to each category at every time point to be supplied as a matrix or data.frame before running the allocation routine.
Land use area may be estimated using non-spatial land use models or, in the case of a backcast model, national and subnational land use statistics may be used (e.g. Ray and Pijanowski, 2010; Fuchs et al., 2013). The lulcc software package includes a function to interpolate or extrapolate land use area based on two or more observed land use maps: this approach is often used to predict the quantity of land use change in the near-term (Mas et al., 2014). For the current example, we obtain land use demand for each year between 1985 and 1999 by linear interpolation as follows:
> dmd <- approxExtrapDemand(obs=obs, tout=0:14)
In reality we are not usually interested in simulating land use change between two time points for which observed land use data are available. However, doing so is useful for model pattern validation, allowing us to test the ability of models to predict the spatial allocation of change given the exact quantity of change.
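Linear interpolation of demand between observed time points can be sketched with base R approx(), as below. The cell counts are invented for illustration, interp_demand is a hypothetical name, and unlike approxExtrapDemand this sketch only interpolates: it returns NA outside the observed time points rather than extrapolating.

```r
# Hypothetical helper: interpolate the number of cells per class at each
# requested time point from counts observed at a few time points.
interp_demand <- function(t_obs, counts, tout) {
  # counts: one row per observed time point, one column per class
  out <- sapply(seq_len(ncol(counts)), function(j) {
    approx(x = t_obs, y = counts[, j], xout = tout, rule = 1)$y
  })
  out <- matrix(out, nrow = length(tout))  # keep matrix shape for one tout
  dimnames(out) <- list(tout, colnames(counts))
  out
}

# Illustrative counts for a 1000-cell map at t = 0, 6 and 14
counts <- rbind(c(Forest = 600, Built = 150, Other = 250),
                c(Forest = 540, Built = 210, Other = 250),
                c(Forest = 460, Built = 290, Other = 250))
dmd <- interp_demand(t_obs = c(0, 6, 14), counts = counts, tout = 0:14)
head(dmd)
```

Because the interpolation is linear in each column, the rows sum to the total number of cells whenever the observed rows do. With real data the interpolated values will generally be non-integer and may need rounding before they are used as cell counts.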

Allocation
The allocation algorithm in land use change models determines the pixels in which various land use transitions should take place (Verburg et al., 2002). Currently lulcc includes two allocation routines: an implementation of the CLUE-S algorithm and a stochastic ordered procedure based on the algorithm described by Fuchs et al. (2013). Both routines allow the user to optionally provide various decision rules. These are implemented before the main allocation algorithm at each time point and allow the user to incorporate additional knowledge about the study site.

Decision rules
The first decision rule included in lulcc is used to prohibit certain land use transitions. For example, in most situations it is unlikely that urban areas will be converted to agricultural land because the initial cost of urban development is high (Verburg et al., 2002). The second rule specifies a minimum number of time steps before a certain transition is allowed, while the third rule specifies a maximum number of time steps after which change is not allowed. These rules are used to control land use transitions that are time dependent, such as the transition from shrubland to closed forest (Verburg and Overmars, 2009). The fourth rule prohibits transitions to a certain land use in cells that are not within a user-defined neighbourhood of cells already belonging to that land use. This rule is particularly relevant to cases of deforestation or urbanisation. Within the allocate function the first three decision rules are applied by the allow function and the fourth rule is applied by the allowNeighb function. For time-dependent decision rules, the user should supply a land use history raster map specifying the length of time each pixel has belonged to its current land use. If this is not supplied, each pixel is assigned a value of one, representing one model time step. To apply neighbourhood rules, the corresponding neighbourhood maps must be supplied to the allocation routine. In lulcc these are represented by the NeighbRasterStack class. Essentially, the allow and allowNeighb functions identify disallowed transitions according to the decision rules and set the suitability of the corresponding cells to NA, so that these transitions are ignored by the allocation routine. Care should be taken to ensure that, once the decision rules are taken into account, there are sufficient cells eligible to change to meet the specified demand at each time point.
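The effect of the transition and neighbourhood rules, setting disallowed suitabilities to NA, can be sketched in base R on a small grid. This illustration assumes classes coded 1 to K, suitabilities held in a cells x classes matrix and a 3 x 3 neighbourhood window; the function names are hypothetical, time-dependent rules are omitted, and the code is not the allow or allowNeighb implementation.

```r
# Hypothetical rule 1: rules[i, j] == 1 if a transition from class i to
# class j is allowed; disallowed suitabilities are set to NA.
apply_transition_rules <- function(suit, lu, rules) {
  allowed <- rules[lu, , drop = FALSE] == 1  # one row of rules per cell
  suit[!allowed] <- NA
  suit
}

# Hypothetical neighbourhood rule: a cell may gain class k only if a cell in
# its 3 x 3 window (grid stored as a matrix) already belongs to k.
apply_neighb_rule <- function(suit, lu_grid, k) {
  nr <- nrow(lu_grid); nc <- ncol(lu_grid)
  has_neighb <- matrix(FALSE, nr, nc)
  for (i in seq_len(nr)) {
    for (j in seq_len(nc)) {
      rows <- max(1, i - 1):min(nr, i + 1)
      cols <- max(1, j - 1):min(nc, j + 1)
      has_neighb[i, j] <- any(lu_grid[rows, cols] == k)
    }
  }
  # as.vector() is column-major, matching the row order of suit; cells
  # already in class k are never blocked
  blocked <- !as.vector(has_neighb) & as.vector(lu_grid) != k
  suit[blocked, k] <- NA
  suit
}

lu_grid <- matrix(c(2, 1, 1,
                    1, 1, 1,
                    1, 1, 3), nrow = 3, byrow = TRUE)
lu <- as.vector(lu_grid)
suit <- matrix(0.5, nrow = length(lu), ncol = 3)
rules <- matrix(1, 3, 3)
rules[2, 1] <- 0  # Built (2) may not revert to Forest (1)
s1 <- apply_transition_rules(suit, lu, rules)
s2 <- apply_neighb_rule(s1, lu_grid, k = 2)
```

Cells whose suitability is NA for a class simply drop out of the allocation for that class, which is why enough eligible cells must remain to meet demand.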

CLUE-S allocation method
The CLUE-S model implements an iterative procedure to meet the specified demand at each time point and handle competition between land uses. The model is summarised briefly here: for a full description see Verburg et al. (2002) and Castella and Verburg (2007). The algorithm in lulcc is based on the description of the model provided by Verburg et al. (2002) only. As a result, for the reasons discussed by Ince et al. (2012), users should not expect to exactly reproduce the output from the original model implementation.
In the first instance each cell is allocated to the land use with the highest suitability as determined by the predictive models. Whereas the original CLUE-S model is based on binary logistic regression, lulcc allows any predictive model supported by PredictiveModelList to be used. For each land use the algorithm determines whether the allocated area is less than, equal to or greater than the specified demand. If it is less than or greater than demand, the suitability of each pixel in the study region to the land use in question will be increased or decreased, respectively, by an amount depending on the difference between the allocated area and specified demand. If the allocated area equals demand, the suitability is left unchanged. This procedure is repeated until the demand for all land uses, within a user-defined tolerance, is met. At each iteration the original model perturbs the suitability of each pixel to the various land uses in order to limit the influence of nominal differences in land use suitability on the final model solution. This is replicated in lulcc with the parameter jitter.f, which controls the upper and lower limits of the uniform random distribution from which the perturbation applied to each pixel is drawn. The default value of jitter.f is zero, resulting in a deterministic model. For a full description of the various other parameters supplied to the CLUE-S routine please consult the package documentation.
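The sketch below is a much-simplified version of the iterative procedure described above, written in base R for clarity. It keeps only the core loop (adjust a per-class term until allocated areas approach demand) plus an optional uniform perturbation analogous to jitter.f. Conversion elasticities, decision rules and the original stopping criteria are omitted, and the names clues_sketch, step, tol and max_iter are hypothetical. Convergence is not guaranteed for arbitrary inputs, so the function reports whether the tolerance was met.

```r
# Hypothetical, much-simplified CLUE-S-style iteration (not the lulcc code).
# suit: cells x classes suitability matrix; demand: target cells per class.
clues_sketch <- function(suit, demand, jitter_f = 0, step = 0.5,
                         tol = 0, max_iter = 1000) {
  n <- nrow(suit); k <- ncol(suit)
  iter_adj <- numeric(k)  # per-class adjustment added to every cell
  for (it in seq_len(max_iter)) {
    adj <- suit + matrix(iter_adj, n, k, byrow = TRUE)
    if (jitter_f > 0) {
      # Uniform perturbation in [-jitter_f, jitter_f] for each cell and class
      adj <- adj + matrix(runif(n * k, -jitter_f, jitter_f), n, k)
    }
    alloc <- max.col(adj, ties.method = "first")  # highest adjusted suitability
    area <- tabulate(alloc, nbins = k)
    gap <- demand - area
    if (all(abs(gap) <= tol)) break
    # Raise the term for under-allocated classes, lower it for over-allocated
    iter_adj <- iter_adj + step * gap / n
  }
  list(alloc = alloc, area = area, iterations = it,
       converged = all(abs(gap) <= tol))
}

set.seed(3)
suit <- matrix(runif(300 * 3), ncol = 3)
res <- clues_sketch(suit, demand = c(150, 100, 50), tol = 2)
res$area
res$converged
```

With jitter_f left at zero the procedure is deterministic, mirroring the default value of jitter.f described above.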
In lulcc, allocation models are represented by unique classes. To run the CLUE-S model, we first set the decision rules to allow all possible transitions and define some parameter values, then create an object of class CluesModel and pass it to the generic allocate function. Because the CLUE-S algorithm is iterative, it relies on for loops, which are slow in R. To overcome this limitation, we have written the CLUE-S procedure as a C extension called through the .Call interface.

Ordered method
The ordered allocation method is based on the algorithm described by Fuchs et al. (2013). The approach is less computationally expensive and more stable than the CLUE-S algorithm because it does not simulate competition between land uses. Instead, land allocation is performed in a hierarchical way according to the perceived socioeconomic value of each land use. For land uses with increasing demand only cells belonging to land uses with lower socioeconomic value are considered for conversion. In this case, n cells with the highest suitability to the current land use are selected for change, where n equals the number of transitions required to meet the demand, as specified by the demand matrix supplied as an input to the allocation routine. The converted cells, as well as the cells that remain under the current land use, are masked from subsequent operations. For land uses with decreasing demand only cells belonging to the current land use are allowed to change. Here, n cells with the lowest allocation suitability are converted to a temporary class which can be allocated to subsequent land uses. The land use with the lowest socioeconomic value is a special case because it is considered last and, therefore, the number of cells that have not been assigned to other land uses must equal the demand for this land use.
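A single time step of the ordered procedure can be sketched in base R as below. This deterministic illustration assumes classes coded 1 to K, suitabilities held in a cells x classes matrix and demand summing to the number of cells, and it ignores decision rules. The function name ordered_sketch and the use of 0 as the temporary class code are hypothetical choices, not the lulcc internals.

```r
# Hypothetical, deterministic sketch of one step of ordered allocation.
# lu: current class per cell; suit: cells x classes suitability matrix;
# demand: target cells per class; ord: classes from highest to lowest value.
ordered_sketch <- function(lu, suit, demand, ord) {
  new <- lu
  masked <- rep(FALSE, length(lu))
  for (k in head(ord, -1)) {
    change <- demand[k] - sum(new == k)
    if (change > 0) {
      # Increasing demand: unmasked cells of lower-value classes (or the
      # temporary class 0) compete; the most suitable ones convert to k
      cand <- which(!masked & new != k)
      sel <- cand[order(suit[cand, k], decreasing = TRUE)[seq_len(change)]]
      new[sel] <- k
    } else if (change < 0) {
      # Decreasing demand: the least suitable cells of k are released to the
      # temporary class 0, which later classes may take
      cand <- which(new == k)
      sel <- cand[order(suit[cand, k])[seq_len(-change)]]
      new[sel] <- 0
    }
    masked <- masked | new == k  # converted and persisting cells are fixed
  }
  # The lowest-value class receives every cell not yet assigned
  new[!masked] <- tail(ord, 1)
  new
}

set.seed(7)
lu <- sample(1:3, 100, replace = TRUE)
suit <- matrix(runif(300), ncol = 3)
out <- ordered_sketch(lu, suit, demand = c(40, 35, 25), ord = c(2, 1, 3))
table(out)
```

Because each class is fixed once processed, no iteration is needed, which is the source of the lower computational cost relative to CLUE-S.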
We modify the algorithm described by Fuchs et al. (2013) to allow stochastic transitions. If this option is selected, the allocation suitability of each cell allowed to change is compared to a random number drawn from a uniform distribution between zero and one. If demand for the land use is increasing, only cells where the allocation suitability is greater than the random number are allowed to change, whereas for decreasing demand only cells where it is less than the random number are allowed to change. To make the model deterministic, the user can set the stochastic argument to FALSE when the allocate function is called.
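The stochastic rule amounts to a filter applied to the candidate cells before they are ranked by suitability. The Python sketch below is an assumption-laden illustration rather than lulcc code. The function name `stochastic_filter` is hypothetical, and its `stochastic` flag only mirrors the role of the stochastic argument. Suitability is assumed to lie between zero and one.

```python
import random


def stochastic_filter(cells, suit_k, increasing, rng, stochastic=True):
    """Return the candidate cells that pass the stochastic test (sketch).

    cells      -- indices of cells eligible for change
    suit_k     -- suit_k[i]: allocation suitability of cell i, assumed in [0, 1]
    increasing -- True if demand for the land use is increasing
    rng        -- a random.Random instance
    stochastic -- if False, every candidate is kept (deterministic model)
    """
    if not stochastic:
        return list(cells)
    kept = []
    for i in cells:
        u = rng.random()  # uniform draw on [0, 1)
        # increasing demand: keep suitable cells; decreasing: keep unsuitable ones
        allowed = suit_k[i] > u if increasing else suit_k[i] < u
        if allowed:
            kept.append(i)
    return kept
```

Cells with suitability one always pass when demand is increasing and never pass when it is decreasing, so the filter mainly adds randomness for cells of intermediate suitability.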

Pattern validation
Spatially explicit land use change models are validated by comparing the initial observed map with an observed and a simulated map for a subsequent time point (Pontius et al., 2011). Previous studies have extracted useful information from the three possible two-map comparisons (e.g. Pontius et al., 2008); however, Pontius et al. (2011) recently devised the concept of a three-dimensional contingency table to compare the three maps simultaneously. Not only is this approach more parsimonious, but it also yields more information about quantity and allocation performance (Pontius et al., 2011). For example, from the table it is straightforward to identify sources of agreement and disagreement considering all land use transitions, all transitions from one land use or a specific transition from one land use to another. In addition, it is possible to separate agreement due to persistence from agreement due to correctly simulated change. This is important because in most applications the quantity of change is small compared to the overall study area (Pontius et al., 2004b; van Vliet et al., 2011), giving a high rate of total agreement that can misrepresent the actual model performance. It is useful to perform pattern validation at multiple resolutions because comparison at the native resolution of the three maps fails to distinguish minor allocation disagreement, which refers to allocation disagreement at the native resolution that is counted as agreement at a coarser resolution, from major allocation disagreement, which refers to allocation disagreement at both the native and the coarser resolution (Pontius et al., 2011). In lulcc, three-dimensional contingency tables at multiple resolutions are represented by the ThreeMapComparison class.
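The cell-level logic behind a three-map comparison can be sketched as follows. This Python sketch is not the ThreeMapComparison class. It works at a single resolution and builds the three-dimensional contingency table as counts of (reference, observed, simulated) triples. The category names (hits, misses, wrong hits, false alarms, correct rejections) follow terminology common in land change validation, and the optional `from_lu` argument is our illustrative way of restricting the summary to transitions from one land use.

```python
from collections import Counter


def three_map_table(ref, obs, sim):
    """Count cells for each (reference, observed, simulated) land use triple."""
    return Counter(zip(ref, obs, sim))


def components(table, from_lu=None):
    """Summarise a three-map table into agreement/disagreement categories.

    from_lu -- if given, only cells whose reference land use equals from_lu count
    """
    out = Counter()
    for (r, o, s), n in table.items():
        if from_lu is not None and r != from_lu:
            continue
        if o == r:
            # observed persistence: agreement due to persistence or a false alarm
            out["correct rejection" if s == r else "false alarm"] += n
        elif s == o:
            out["hit"] += n        # observed change simulated correctly
        elif s == r:
            out["miss"] += n       # observed change simulated as persistence
        else:
            out["wrong hit"] += n  # observed change simulated to the wrong land use
    return out
```

Keeping correct rejections separate from hits makes explicit how much of the total agreement comes from persistence rather than from correctly simulated change.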
Two subclasses of ThreeMapComparison represent two types of information that can be extracted from the tables: AgreementBudget represents sources of agreement and disagreement between the three maps at several resolutions, while FigureOfMerit represents figure of merit scores. The figure of merit, which is useful to summarise model performance, is defined as the intersection of observed and simulated change divided by their union (Pontius et al., 2011), such that a score of one indicates perfect agreement and a score of zero indicates no agreement. Plotting functions for … This procedure was repeated for the CLUE-S model output. The agreement budgets for the transition from forest to built for the two allocation procedures are shown in Fig. 7, while …
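The figure of merit definition translates directly into cell counts. In this hedged Python sketch (not the FigureOfMerit class), the union of observed and simulated change is every cell where either map departs from the reference map, and the intersection is taken as the cells where observed change is simulated to the correct land use. Under that reading, wrong hits enlarge the denominator but not the numerator. The function name is our own.

```python
def figure_of_merit(ref, obs, sim):
    """Figure of merit at the native resolution (illustrative sketch).

    ref, obs, sim -- reference, observed and simulated land use of each cell
    """
    hits = union = 0
    for r, o, s in zip(ref, obs, sim):
        obs_change, sim_change = o != r, s != r
        if obs_change or sim_change:
            union += 1             # miss, hit, wrong hit or false alarm
            if obs_change and s == o:
                hits += 1          # change simulated to the correct land use
    if union == 0:
        raise ValueError("no observed or simulated change")
    return hits / union
```

For reference `[1, 1, 1, 1, 2, 2]`, observed `[1, 1, 2, 2, 2, 3]` and simulated `[1, 2, 2, 1, 3, 1]`, five cells show observed or simulated change and one of them is a hit, giving a score of 0.2.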

Discussion
The example application for Plum Island Ecosystems demonstrates the key strengths of the lulcc package. First, it allows the entire modelling procedure to be carried out in the same environment, reducing the likelihood of mistakes that commonly arise when data and models are transferred between different software programs. Second, a framework in R allows users to take advantage of the wide range of statistical and machine learning techniques available for predictive modelling. Third, the framework allows users to experiment with various model structures interactively and provides methods to quickly compare model outputs. The example also highlights the advantages of an object-oriented approach: land use change modelling involves several stages, and without dedicated classes for the associated data it would be difficult to keep track of the intermediate model inputs and outputs. The lulcc software package is substantially different from alternative environmental modelling frameworks. Most importantly, lulcc is designed for land use change modelling only, whereas frameworks such as PCRaster and TerraME provide general tools that can be applied to various spatial analysis problems such as land use change, hydrology and ecology. As a result, these frameworks are targeted towards the model developer rather than the end user. In contrast, most software programs for land use change modelling are designed with the user in mind, with very few providing any way for users or developers to improve or even understand model implementations. With lulcc we have attempted to reduce the gap between user and developer. The R system is well suited to this task: as Pebesma et al. (2012) noted, "the step from being a user to becoming a developer is small with R". The package system ensures that lulcc will work across Windows, Mac OS and Unix platforms, whereas many existing applications are platform dependent.
Comprehensive documentation of the functions, classes and methods of lulcc, together with complete working examples, enables the user to start using the software immediately, while the object-oriented design ensures that developers can easily write extensions to the package.
Despite these advantages, some drawbacks to land use change modelling in R remain. Most notably, the lack of a spatio-temporal database back end to support larger data sets (Gebbert and Pebesma, 2014) restricts the amount of data that can be used in a given application, because R loads all data into memory. The raster package overcomes this limitation by storing raster files on disk and processing data in chunks (Hijmans, 2014). The lulcc software package has been designed to make use of this facility where possible; however, during allocation it is necessary to load the values of several maps into the R workspace at once because the allocation procedure must consider every cell eligible for change simultaneously. The generic predict function belonging to the raster package offers one possible solution to this problem, allowing predictive models to be applied in a memory-safe way. In effect, spatially explicit input data, including observed land use maps and explanatory variables, could be handled in chunks, and only the resulting probability surface would have to be loaded into the R workspace. However, this is not currently implemented in lulcc because it is excessively time-consuming compared to the current approach. Despite this limitation, memory should not normally cause lulcc applications to fail, since most applications involve a relatively small geographic extent or, in the case of regional studies (e.g. Verburg and Overmars, 2009; Fuchs et al., 2015), use a coarser map resolution. For example, the CluesModel and OrderedModel objects from the above example each had a size of approximately 40 MB, which is easily handled by modern personal computers. On a 64-bit machine with a 1.4 GHz Intel Core i3 processor and 4 GB of RAM, the allocation methods for the two Model objects took 50 and 8 s, respectively.
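The chunked alternative can be pictured as follows. This Python sketch does not use the raster package. It assumes a hypothetical `read_rows(start, n)` callback that returns a block of rows, with each cell holding its predictor values, and it applies a logistic model with given coefficients. Only one block of predictors is held at a time, while the probability surface is kept in full, as the paragraph describes.

```python
import math


def predict_in_chunks(read_rows, nrows, coef, intercept, chunk_rows=2):
    """Build a probability surface block by block (illustrative sketch).

    read_rows  -- hypothetical callback: read_rows(start, n) -> list of n rows,
                  each row a list of cells, each cell a list of predictor values
    coef       -- logistic regression coefficients, one per predictor
    intercept  -- logistic regression intercept
    chunk_rows -- number of rows read per block
    """
    surface = []
    for start in range(0, nrows, chunk_rows):
        block = read_rows(start, min(chunk_rows, nrows - start))
        for row in block:
            # logistic transform of the linear predictor for each cell in the row
            surface.append([
                1.0 / (1.0 + math.exp(-(intercept + sum(b * x for b, x in zip(coef, cell)))))
                for cell in row
            ])
    return surface
```

The block size changes how much predictor data is in memory at once, not the result: reading two rows at a time gives the same surface as reading all rows at once. The cost noted above comes from repeated disk access, which this sketch does not model.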
The software presented here is still in its infancy and there are several areas for improvement. The present allocation routines receive the quantity of land use change for each time point before the allocation procedure begins. However, some recent models do not impose the quantity of change but instead allow change to occur stochastically based on land use suitability. For example, StocModLCC (Rosa et al., 2013) deforests a cell if its probability of deforestation is greater than a random number drawn from a uniform distribution. The quantity of change is simply the number of cells deforested after each cell in the study region has been considered for deforestation twice, with the probability of change, which depends on the allocation of previous deforestation events, updated after the first round. One advantage of this approach is that it accounts for uncertainty in the quantity and allocation of change simultaneously, whereas the current routines in lulcc treat only the allocation of change as a stochastic process. Other models such as LandSHIFT (Schaldach et al., 2011) receive demand at the national or regional level from integrated assessment models such as IMAGE (Stehfest et al., 2014) or Nexus Land-Use (Souty et al., 2012). Coupling lulcc with this class of model would be a valuable addition to the software, because land use change is increasingly recognised as an issue with drivers and implications at local, regional, continental and global levels.
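To show how the quantity of change can emerge from the procedure rather than being imposed, here is a two-round sketch in Python. It is loosely modelled on the description above and is not StocModLCC code. It follows the usual convention that a cell changes when a uniform random number falls below its probability. The probability update between rounds is a hypothetical rule: cells with a deforested 4-neighbour have their probability multiplied by `boost` and capped at one.

```python
import random


def stochastic_deforestation(prob, boost=1.5, seed=None):
    """Two-round stochastic deforestation in which the quantity is an outcome.

    prob  -- 2-D list of initial deforestation probabilities (0 for non-forest)
    boost -- hypothetical multiplier applied after round one to cells that have
             a deforested 4-neighbour
    Returns the deforestation map and the number of deforested cells.
    """
    rng = random.Random(seed)
    nr, nc = len(prob), len(prob[0])
    defo = [[False] * nc for _ in range(nr)]
    p = [row[:] for row in prob]
    for round_ in range(2):
        for r in range(nr):
            for c in range(nc):
                if not defo[r][c] and rng.random() < p[r][c]:
                    defo[r][c] = True
        if round_ == 0:
            # update probabilities using the deforestation pattern of round one
            for r in range(nr):
                for c in range(nc):
                    near = any(defo[rr][cc]
                               for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                               if 0 <= rr < nr and 0 <= cc < nc)
                    if near:
                        p[r][c] = min(1.0, p[r][c] * boost)
    return defo, sum(map(sum, defo))
```

Repeating the call with different seeds yields different totals as well as different locations, which is the sense in which quantity and allocation uncertainty are captured together.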
An important contribution of lulcc is to provide modules to assist with model pattern validation, a crucial aspect of model development that is nevertheless frequently overlooked within the land use change modelling community (Rosa et al., 2014). A further improvement would be to incorporate more sophisticated ways of fitting and testing the predictive models that estimate land use suitability. For example, a routine to calculate the total operating characteristic (TOC) (Pontius and Parmentier, 2014) would improve upon the ROC analysis currently supported. While ROC shows two ratios, hits / (hits + misses) and false alarms / (false alarms + correct rejections), at multiple thresholds, TOC also reveals the quantities used to calculate these ratios, allowing a richer interpretation of the model's diagnostic ability.
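The difference between the two curves can be seen by computing both from the same counts. In this Python sketch (not an lulcc routine; the function name is our own), a cell is predicted to change when its suitability score is at least the threshold. Each threshold yields an ROC point (false alarm rate, hit rate) and a TOC point (hits + false alarms, hits). The sketch assumes that the observed map contains both changed and unchanged cells.

```python
def roc_toc(scores, changed, thresholds):
    """ROC and TOC points for a suitability map (illustrative sketch).

    scores     -- suitability score of each cell
    changed    -- 1 if the cell was observed to change, else 0
    thresholds -- cells with score >= threshold are predicted to change
    Returns a list of (threshold, (fa_rate, hit_rate), (hits + fa, hits)).
    """
    n_change = sum(changed)
    n_persist = len(changed) - n_change
    rows = []
    for t in thresholds:
        hits = sum(1 for s, y in zip(scores, changed) if s >= t and y)
        fa = sum(1 for s, y in zip(scores, changed) if s >= t and not y)
        misses, cr = n_change - hits, n_persist - fa
        roc = (fa / (fa + cr), hits / (hits + misses))  # ratios only
        toc = (hits + fa, hits)                          # underlying quantities
        rows.append((t, roc, toc))
    return rows
```

Two maps with very different sizes can share an ROC point, whereas the TOC point keeps the counts from which both ratios, and the total number of cells predicted to change, can be recovered.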
One of the main strengths of lulcc is that multiple model structures can be explored within the same environment. Thus, the more allocation routines available in the package, the more useful it becomes. Two existing land use change models, StocModLCC and SIMLANDER, are written in R and available as open-source software. Future work could integrate these routines with lulcc to broaden the available model structures and, therefore, improve the ability of lulcc to capture model structural uncertainty. The methods in the current version of lulcc only permit an inductive approach to land use change modelling. Deductive models are fundamentally different because they attempt to model explicitly the processes that drive land use change (Pérez-Vega et al., 2012). This means that, unlike inductive models, they can be used to establish causality between land use change and its driving factors (Overmars et al., 2007). Including this class of model in lulcc would allow inductive and deductive land use change models with different spatial resolutions to be dynamically coupled in order to better capture the complexity of the land use system (Moreira et al., 2009).
Free and open-source software improves the reproducibility of scientific results and allows users to adapt and extend code for their own purposes. Thus, we encourage the land use change community to participate in the future development of lulcc. Perhaps one of the simplest ways to improve the package is to experiment with the example data sets to identify bugs and areas for improvement. Those with more programming experience may wish to extend the functionality of the package themselves and contribute these changes upstream. In addition, existing land use change models can easily be included in the package by wrapping the original source code in R, a relatively straightforward task for commonly used compiled languages (C/C++, Fortran). Users may also develop their own R packages that depend on lulcc for some functionality: this is one of the strengths of the R package system. Finally, we invite land use change modellers to submit land use change data sets (observed and, if possible, modelled land use maps and spatially explicit explanatory variables) for inclusion in the package.

Conclusions
In this paper we have presented lulcc, a free and open-source software package providing an object-oriented framework for land use change modelling in R. The lulcc software package allows various aspects of the modelling process to be performed within the same environment, supports three types of predictive models and includes two allocation routines. The modelling process can be expressed programmatically, facilitating reproducible science. Releasing the software under an open-source licence (GPL) means that users have access to the source code of the algorithms that are executed when they run a particular model. As a result, they can identify improvements to the code and, under the terms of the licence, are free to redistribute changes to the wider community. We view lulcc as an initial step towards an open paradigm for land use change modelling and hope, therefore, that the community will participate in its development.

Code availability
The R project for statistical computing is available for Windows, Mac OS and several Unix platforms. To download R, visit the project home page: https://www.r-project.org/. Two popular and free integrated development environments (IDEs) are provided by RStudio (https://www.rstudio.com/) and ESS (http://ess.r-project.org/). We suggest that potential lulcc users familiarise themselves with the raster package by reading the "Introduction to the raster package" vignette, available on the package home page: https://cran.r-project.org/web/packages/raster/.
The lulcc source code currently resides on CRAN. This paper corresponds to version 1.0 of the package. It can be downloaded from the R command line as follows:

    install.packages("lulcc")

The script for the Plum Island Ecosystems application is available as a demo within the package. To load the package and run the demo, type the following commands:

    library(lulcc)
    demo(package = "lulcc")
    demo(topic = "gmd-paper")

The Supplement related to this article is available online at doi:10.5194/gmd-8-3215-2015-supplement.