An open and extensible framework for spatially explicit land use change modelling in R : the lulccR package ( 0 . 1 . 0 )

present the lulccR package, an object-oriented framework for land use change modelling written in the R programming language. The contribution of the work is to resolve the following limitations associated with the current land use change modelling paradigm: (1) the source code for model implementations is frequently unavailable, severely compromising the reproducibility of scientific results and making it impossible 10

present the lulccR package, an object-oriented framework for land use change modelling written in the R programming language. The contribution of the work is to resolve the following limitations associated with the current land use change modelling paradigm: (1) the source code for model implementations is frequently unavailable, severely compromising the reproducibility of scientific results and making it impossible 10 for members of the community to improve or adapt models for their own purposes; (2) ensemble experiments to capture model structural uncertainty are difficult because of fundamental differences between implementations of different models; (3) different aspects of the modelling procedure must be performed in different environments because existing applications usually only perform the spatial allocation of change. The package 15 includes a stochastic ordered allocation procedure as well as an implementation of the widely used CLUE-S algorithm. We demonstrate its functionality by simulating land use change at the Plum Island Ecosystems site, using a dataset included with the package. It is envisaged that lulccR will enable future model development and comparison within an open environment.

Introduction
Land use and land cover change is degrading biodiversity worldwide and threatening the sustainability of ecosystem services upon which individuals and communities depend (Turner et al., 2007). Cumulatively, it is a major driver of global and regional environmental change (Foley, 2005). For example, as a result of extensive deforesta- 25 tion in Central and South America and Southeast Asia land use and land cover change 3360 Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | is the second largest anthropogenic source of carbon dioxide (Le Quéré et al., 2009), while the conversion of rainfed agriculture and natural land cover to intensively managed agricultural systems in northwest India is now putting severe pressure on regional water resources (Rodell et al., 2009;Shankar et al., 2011;Wada et al., 2012). In addition, land use and land cover change may influence local and regional climate through 5 its impact on the surface energy and water balance (Pitman et al., 2009;Seneviratne et al., 2010;Boysen et al., 2014). Land use change models are widely used to understand and quantify key processes that affect land use and land cover change and simulate past and future change under different scenarios and at different spatial scales (Veldkamp and Lambin, 2001;Mas et al., 2014). The output of these models may be 10 used to support decisions about local and regional land use planning and environmental management (e.g. Couclelis, 2005;Verburg and Overmars, 2009) or investigate the impact of change on biodiversity (e.g. Nelson et al., 2010;Rosa et al., 2013), water resources (e.g. Li et al., 2007;Lin et al., 2008;Rodríguez Eraso et al., 2013) and climate variability (e.g. Sohl et al., 2007Sohl et al., , 2012. 15 Land use and land cover change is the result of complex interactions between different biophysical and socioeconomic conditions that vary across space and time (Verburg et al., 2002;Overmars et al., 2007). Several different model structures have been devised to capture this complexity and meet different objectives. Some models operate at the global or regional scale to estimate the quantity of land use change at national or 20 subnational levels based on economic considerations (e.g. Souty et al., 2012), whereas spatially explicit models, the focus of the present study, operate over a spatial grid to predict the location of land use change (Mas et al., 2014). Inductive spatially explicit models are based on predictive models that predict the suitability of each model grid cell as a function of spatially explicit predictor variables, while deductive models pre- 25 dict the location of change according to specific theories about the processes driving change (Overmars et al., 2007;Magliocca and Ellis, 2013). Inductive and deductive models operating at different spatial scales may be combined to better represent the complexity of a system (e.g. Castella and Verburg, 2007;Moreira et al., 2009). The 3361 Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | main output of land use change models is a set of land use maps depicting the location of change over time. Detailed reviews of different models and modelling approaches are available in Verburg et al. (2004), Brown et al. (2013) and Mas et al. (2014).
Spatially explicit land use change models are commonly written in compiled languages such as C/C++ and Fortran and distributed as software packages or exten-5 sions to proprietary geographic information systems such as ArcGIS or IDRISI. As Rosa et al. (2014) points out, it is uncommon for the source code of model implementations to be made available (e.g. Verburg et al., 2002;Soares-Filho et al., 2002;Verburg and Overmars, 2009;Schaldach et al., 2011). While it is true that the concepts and algorithms implemented by the software are normally described in scientific journal articles, this fails to ensure the reproducibility of scientific results (Peng, 2011;Morin et al., 2012), even in the hypothetical case of a perfectly described model (Ince et al., 2012). In addition, running binary versions of software makes it difficult to detect silent faults (faults that change the model output without obvious signals), whereas these are more likely to be identified if the source code is open (Cai et al., 2012). Moreover, it 15 forces duplication of work and makes it difficult for members of the scientific community to improve the code or adapt it for their own purposes (Morin et al., 2012;Pebesma et al., 2012;Steiniger and Hunter, 2013).
Current software packages usually exist as specialised applications that implement one algorithm. Indeed, it is common for applications to perform only one part of the 20 modelling process. For example, the Change in Land Use and it Effects at Small regional extent (CLUE-S) software only performs spatial allocation, requiring the user to prepare model input and conduct the statistical analysis upon which the allocation procedure depends elsewhere (Verburg et al., 2002). This is time consuming and increases the likelihood of user errors because inputs to the various modelling stages 25 must be transferred manually between applications. Furthermore, very few programs include methods to validate model output, which could be one reason for the lack of proper validation of models in the literature, as noted by Rosa et al. (2014). The lack of a common interface amongst land use change models is problematic for the com-3362 Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | munity because there is widespread uncertainty about the appropriate model form and structure for different applications . Under these circumstances it is useful to experiment with different models to identify the model that performs best in terms of calibration and validation (Schmitz et al., 2009). Alternatatively, ensemble modelling may be used to understand the impact of structural uncertainty on model 5 outcomes (Knutti and Sedláček, 2012). This approach has been used successfully in the CMIP5 experiments (Taylor et al., 2012;Knutti and Sedláček, 2012), global and regional drought prediction (Tallaksen and Stahl, 2014;Prudhomme et al., 2014) and species distribution modelling (Grenouillet et al., 2011), for example. However, while some land use change model comparison studies have been carried out (e.g Pérez- 10 Vega et al., 2012;Mas et al., 2014;Rosa et al., 2014), fundamental differences between models in terms of scale, resolution and model inputs prevent the widespread use of ensemble land use change predictions (Rosa et al., 2014). As a result, the uncertainty associated with model outcomes are rarely communicated in a formal way, raising questions about the utility of such models (Pontius and Spencer, 2005). 15 An alternative approach is to develop frameworks that allow several different modelling approaches to be implemented within the same environment. One such application is the PCRaster software, a free and open source GIS that includes additional capabilities for spatially explicit dynamic modelling (Schmitz et al., 2009). The PCRcalc scripting language and development environment allows users to build models with 20 native PCRaster operations such as map algebra and neighbourhood functions. Alternatively, the PCRaster application programming interface (API) allows users to extend the functionality of PCRaster in different programming languages using native and external data types (Schmitz et al., 2009). For example, the current version of FALLOW (van Noordwijk, 2002;Mulia et al., 2014), a deductive model that simulates farmer de- 25 cisions about agricultural land use in response to biophysical and socioeconomic driving factors, is built using the PCRaster framework. TerraME (Carneiro et al., 2013) is a platform to develop models for simulating interactions between society and the environment. It provides more flexibility than PCRaster because models can be composed 3363 Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | of coupled sub-models with different temporal and spatial resolutions (Moreira et al., 2009;Carneiro et al., 2013). The platform is built on the open source TerraLib geospatial library (Câmara et al., 2008) which handles different spatio-temporal data types, includes an API for coupling the library with R (R Core Team, 2014) to perform spatial statistics, and supports dynamic modelling with cellular automata. The LuccME exten-5 sion to TerraME includes current implementations of CLUE and CLUE-S (Veldkamp and Fresco, 1996;Verburg et al., 1999), an earlier version of CLUE-S that operates at larger spatial scales, written in Lua.
The R environment is a free and open source implementation of the S programming language, a language designed for programming with data (Chambers, 2008).
Although the development of R is strongly rooted in statistical software and data analysis, it is increasingly used for dynamic simulation modelling in diverse fields (Petzoldt and Rinke, 2007). Additionally, in the last decade it has become widely used by the spatial analysis community, largely due to the sp package (Pebesma and Bivand, 2005;Bivand et al., 2013) which unified many different approaches for dealing with spatial 15 data in R and allowed subsequent package developers to use a common framework for spatial analysis. The rgdal package (Bivand et al., 2014) allows R to read and write formats supported by the Geospatial Data Abstraction Library (GDAL) and OGR library. Through the raster package (Hijmans, 2014), R now includes most functions for raster data manipulation commonly associated with GIS software. Building on these capabil- 20 ities, several R packages have been created for dynamic, spatially explicit ecological modelling (e.g. Petzoldt and Rinke, 2007;Fiske and Chandler, 2011). In addition, two recent land use change models have been written for the R environment. StocModLCC (Rosa et al., 2013) is a stochastic inductive land use change model for tropical deforestation while SIMLANDER (Hewitt et al., 2013) is a stochastic cellular automata model 25 to simulate urbanisation. Thus, R is well suited for spatially explicit land use change modelling. To date, however, R has not been used to develop a framework for land use change model development and comparison. In this paper we describe the lulccR package, a free and open source software package for land use change modelling in the R environment.

Design goals
The first design goal of lulccR is to provide a framework that allows users to perform every stage of the modelling process, shown by Fig. 1, within the same environment. It 5 therefore includes methods to process and explore model input, fit and evaluate predictive models, estimate the fraction of the study area belonging to each land use type at different timesteps, allocate land use change spatially, validate the model and visualise model outputs. This provides many advantages over specialised software applications. Firstly, it improves efficiency and reduces the likelihood of user errors because inter-10 mediate inputs and outputs exist in the same environment (Fiske and Chandler, 2011;Pebesma et al., 2012). Secondly, it encourages interactive model building because different aspects of the procedure can easily be revisited. Thirdly, it means it is straightforward to investigate the effect of different inputs and model setups on model outcomes. Finally, and perhaps most importantly, it improves the reproducibility of scientific results 15 because the entire modelling process can be expressed programmatically (Pebesma et al., 2012).
lulccR is intended as an alternative to current paradigm of closed-source, specialised software packages which, in our view, disrupt the scientific process. Thus, the second design goal is to create an open and extensible framework allowing users to exam-20 ine the source code, modify it for their own purposes and freely distribute changes to the wider community. The package exploits the openness of the R system, particularly with respect to the package system, which allows developers to contribute code, documentation and data sets in a standardised format to repositories such as the Comprehensive R Archive Network (CRAN) (Pebesma et al., 2012;Claes et al., 2014). As 25 a result of this philosophy R users have access to a wide range of sophisticated tools for data management, spatial analysis and plotting and visualisation. Significantly, given 3365 Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | the importance of predictive models to land use change modelling, R has become the standard development platform for statistical software.
One of the consequences of providing a modelling framework in R is that users of the software must become programmers (Chambers, 2000). We recognise that this represents a different approach to the current practice of providing land use change 5 software packages with graphical user interfaces (GUIs), and acknowledge that for users unfamiliar with programming it could present a steep learning curve. Therefore, the third design goal is to provide software that is easy to use and accessible for a users with different levels of programming experience. The package includes complete working examples to allow beginners to start using the package immediately from the R 10 command shell, while more advanced users should be able to develop modelling applications as scripts. Furthermore, the package is designed to be extensible so that users can contribute new or existing methods. Similarly, the source code of lulccR is accessible so that users can locate the methods in use and understand algorithm implementations. Acknowledging that many scientists lack any formal training in pro- 15 gramming (Joppa et al., 2013;Wilson et al., 2014), we hope this final goal will ensure the software is useful for educational purposes as well as scientific research.

Software description
To achieve the design goals we adopted an object-oriented approach. This provides a formal structure for the modelling framework which allows the different stages of land 20 use change modelling applications to be handled efficiently. Furthermore, it encourages the reuse of code because objects can be used multiple times within the same application or across different applications. It is extensible because it is straightforward to extend existing classes using the concept of inheritance or create new methods for existing classes. In lulccR we use the S4 class system (Chambers, 1998(Chambers, , 2008, which 25 requires classes and methods to be formally defined. This system is more rigorous than the alternative S3 system because objects are validated against the class defini-tion when they are created, ensuring that objects behave consistently when they are passed to functions and methods. Figure 2 shows the class diagram for lulccR together with a list of the most important functions. Here we describe the main components of lulccR integrated with an example application for the Plum Island Ecosystems dataset to demonstrate its functionality.

Data
The failure to provide driving data for land use change modelling exercises alongside published literature is identified by Rosa et al. (2014) as a major weakness of the discipline. The lulccR package includes two datasets that have been widely used in the land use change community, allowing users to quickly start exploring the modelling 10 framework.

Plum Island Ecosystems
The Plum Island Ecosystems Long Term Ecological Research site is located in northeast Massachusetts and includes the watersheds of the Ipswich River, Parker River and Rowley River (http://pie-lter.ecosystems.mbl.edu/). Research at the site aims to 15 understand the response of coastal ecosystems to changes in land use, climate and sea level (Hobbie et al., 2003;Alber et al., 2013). In recent decades the area, which is located approximately 50 km from the Boston, has undergone extensive land use change from forest to residential use (Aldwaik and Pontius, 2012). This has altered the hydrological behaviour of the three watersheds with negative impacts on downstream 20 ecosystems ( Morse and Wollheim, 2014). The dataset included in lulccR was originally developed as part of the MassGIS program (MassGIS, 2015) but has been processed by Pontius and Parmentier (2014). Land use maps depicting forest, residential and other uses are available for 1985, 1991 and 1999. Although MassGIS provides a fourth land use map for 2005 this was produced using a different classification methodology 25 and cannot be used for change detection (Morse and Wollheim, 2014). Three predictor 3367 Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | variables are included: elevation, slope and distance to built land in 1985. Land use for the site in 1985 is shown by Fig. 3.

Sibuyan
Sibuyan is a small island with a total area of 456 km 2 belonging to Romblon province in the Phillipines. The central region is mountainous and heavily forested while the sur-5 rounding area is used for natural land cover, plantations, agriculture and other uses (Verburg et al., 2002). The island is relevant for land use change studies because its rich biodiversity is threatened by illegal logging and unsustainable farming practices (Villamor and Lasco, 2009). The dataset included in lulccR is an adapted version of the dataset distributed with the CLUE-S model, and includes includes an observed land 10 use map for 1997, a number of predictor variables, a map of the Mount Guiting-Guiting Natural Park, a protected area in the centre of the island, and four demand scenarios for the period 1997 to 2011. In addition, we include the simulated map for 2011 from the original CLUE-S software, corresponding to the first demand scenario, for benchmarking purposes. The naming convention of this map follows that of the observed 15 land use map for 1997. Further information about Sibuyan island in the context of land use change is provided elsewhere in Verburg et al. (2002Verburg et al. ( , 2004, for example.

Data processing
One of the most challenging aspects of land use change modelling is to obtain and process the correct input data. In lulccR all spatially explicit input data must be stored 20 in one of the file types supported by rgdal or exist in the R workspace as a raster object belonging to the raster package (RasterLayer, RasterStack or RasterBrick). The most fundamental input required by land use change models is an initial map of observed land use, which is typically obtained from classified remotely sensed data. This map represents the initial condition for model simulations and, for inductive modelling, it is Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | use transitions: in this case an additional map for an earlier timestep is required, as shown by Fig. 1. Ideally, two more observed land use maps for subsequent timesteps should be obtained for calibrating and validating the land use change model (Pontius et al., 2004a).
In lulccR observed land use data are represented by the ObsLulcMaps class. In 5 the following code snippet we install the lulccR package from github and create an ObsLulcMaps object for the Plum Island Ecosystem dataset: > library(devtools) > install_github("simonmoulds/r_lulccR", subdir="lulccR") > library("lulccR") 10 > obs <-ObsLulcMaps(x=pie, pattern="lu", categories=c(1,2,3), labels=c("Forest","Built","Other"), t=c(0,6,14)) 15 The ObsLulcMaps object is important to land use change studies because it defines the spatial and temporal domain of subsequent operations. Thus, additional spatial data to be included in the model input should have the same characteristics as the maps contained in the corresponding ObsLulcMaps object (however, some helper functions are available to resample maps to the correct spatial resolution). The t argu-20 ment in the constructor function specifies the timesteps associated with the observed land use maps. The first timestep must always be zero; if additional maps are present they should have timesteps greater than zero, even in backcast models. In most land use change modelling applications a timestep represents one year but there is no requirement for this to be the case.

25
A useful starting point in land use change modelling is to obtain a transition matrix for two observed land use maps from different times to identify the main historical transitions in the study region (Pontius et al., 2004b), which can be used as the basis for fur-3369 Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | ther research into the processes driving change. In lulccR we use the crossTabulate function for this purpose: > crossTabulate(x=obs, index=c(1,2)) The output of this command reveals that for the Plum Island Ecosystems site the dominant change between 1985 and 1999 was the conversion of forest to built areas. 5 Inductive and deductive land use change models predict the location of change based on spatially explicit biophysical and socioeconomic explanatory variables. These may be static, such as elevation or geology, or dynamic, such as maps of population density or road networks. In lulccR these two types of explanatory variable are separated by a simple naming convention, which is explained in detail in the package documentation (see Supplement). Collectively, they are represented by an object of class ExpVarMaps, which can be created as follows: Apart from observed land use maps and predictor variables other input maps may be required. The two allocation routines currently included with lulccR accept a mask file, 15 which is used to prevent change within a certain geographic area such as a national park or other protected area, and a land use history file, which is used as the basis for certain decision rules. These are handled by lulccR as standard RasterLayer objects.

Predictive modelling
Inductive land use change models are based on predictive models which relate the pat-20 tern of observed land use to spatially explicit explanatory variables. Logistic regression is the most widely used model type (e.g. Pontius and Schneider, 2001;Verburg et al., 2002), however, there is growing interest in the application of local and non-parametric models to inductive land use change modelling (e.g. Tayyebi et al., 2014). Currently lul-ccR supports binary logistic regression, available in base R, recursive partitioning and forests, provided by the randomForest package (Liaw and Wiener, 2002). In all cases a separate model must be obtained for each land use type in the study region. lulccR does not provide additional functionality to fit predictive models to the observed data since R is already optimised for this purpose.
Parametric models such as logistic regression models assume the input data to be 5 independent and identically distributed (Overmars et al., 2003;Wu et al., 2009). In spatial analysis this assumption is often violated because of spatial autocorrelation, which reduces the information content of an observation because its value can to some extent be predicted by the value of its neighbours (Beale et al., 2010). While nonparametric models such as regression trees and random forest make no assumption of independence, a recent study by Mascaro et al. (2014) showed that these models may nevertheless be affected by spatial autocorrelation. Dormann et al. (2007) discusses several ways to account for spatial autocorrelation, however, the simplest, and most widely used, approach is to fit the models to a random subset of the data (e.g. Verburg et al., 2002;Wassenaar et al., 2007;Echeverria et al., 2008). This method is provided 15 in lulccR using the createDataPartition function of the caret package (Kuhn et al., 2012) to perform a stratified random sample of the data. The data partition is obtained as follows: where the final command fits a binary logistic regression model to predict the occurrence of 5 built based on three explanatory variables (elevation, slope and distance to 1971 built area). This procedure is repeated for each land use in the study area, which, for the Plum Island Ecosystems dataset, includes forest and other land uses in addition to built. For forest we employ a null model (a model with no explanatory factors) because the transition from forest to built is determined by the location suitability of built rather than that of forest. Of the predictive 10 models supported by lulccR only binary logistic regression permits a null model to be fitted. Predictive models for each land use are represented by an object of class PredModels: > glm.models <-PredModels(list(forest.glm, built.glm, other.glm), obs=obs) The resulting object makes it straightforward to plot the suitability of each land use over the 15 study region using the calcProb function in combination with some additional functionality from the raster package (see Supplement). The resulting plot is shown by Fig. 4.
Methods to evaluate statistical models are provided by the ROCR package (Sing et al., 2005), allowing the user to assess model performance using several methods including the receiver operator characteristic (ROC), which is widely used to measure 20 the performance of models predicting the presence or abscence of a phenomenon. This method uses a threshold to transform an index variable, in our case the output of the predictive models which varies between zero and one, to a boolean variable where values above the threshold are true (1) and values below the threshold are false (0). The transformed variable is compared to reference information to generate 25 a contingency table with entries for true positives, false positives, true negatives and false negatives. The ROC considers multiple thresholds in order to plot a curve of true positive rate against false positive rate (Pontius and Parmentier, 2014). It is often summarised by the area under the curve (AUC), where one indicates a perfect fit and 0.5 indicates a purely random model. In lulccR we extend the native ROCR classes to better suit our purposes. The prediction method of ROCR is extended by Prediction to handle one or more PredModels objects to enable comparison of different types of predictive model. The resulting object is then used to create a Performance object, for which a plot method exists. The procedure to evaluate several PredModels objects is as follows: 5 > glm.pred <-Prediction(models=glm.models, obs=obs, ef=ef, partition=part$test) > glm.perf <-Performance(pred=glm.pred, measure="rch") # not shown: create rpart.perf and rf.perf in the same way > p <-Performance.plot(list(glm=glm.perf, rpart=rpart.perf, rf=rf.perf)) 15 > print(p) # ROC plots Figure 5 shows the ROC plots for each land use type and for each type of predictive model supported by lulccR. The plots show that binary logistic regression and random forest models perform similarly for built and other land uses, while regression tree models perform least well in both cases.

Demand
Spatially explicit land use change models are normally driven by non-spatial estimates of the total area occupied by each land use type at every timestep. This means regional drivers of land use change, such as population growth and technology, are considered implicitly (Fuchs et al., 2013). While some models calculate demand at each timestep 25 based on the spatial configuration of the landscape at the previous timestep (e.g. Rosa et al., 2013), it is more common to supply land use area for every timestep at the 3373 Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | beginning of the simulation (e.g. Pontius and Schneider, 2001;Verburg et al., 2002;Sohl et al., 2007). This is the approach currently supported in lulccR. Land use area may be estimated using non-spatial land use models or, if the study aims to reconstruct historic land use change, national and subnational land use statistics may be used (e.g. Ray and Pijanowski, 2010;Fuchs et al., 2013). lulccR includes a function to interpolate 5 or extrapolate land use area based on two or more observed land use maps: this approach is often used to predict the quantity of land use change in the near-term (Mas et al., 2014). For the current example we obtain land use demand for each year between 1985 and 1999 by linear interpolation, as follows: > dmd <-approxExtrapDemand(obs=obs, tout=0:14) 10 In reality we are not usually interested in simulating land use change between two known points. However, doing so is useful for model validation: we test the ability of the model to predict the location of change given the exact quantity of change.

Allocation
The allocation component of land use change models estimates the location of change 15 in the study region at each timestep (Verburg et al., 2002). Currently lulccR includes two inductive allocation routines: an implementation of the CLUE-S algorithm and a stochastic ordered procedure based on the algorithm described by Fuchs et al. (2013). Before running either allocation procedure lulccR implements a number of decision rules to identify the specific land use transitions that are allowed at each location. 20 While the set of rules included in lulccR could be used as the basis of a simple agentbased model of the type employed by Castella and Verburg (2007), their main purpose is to allow the modeller to include additional knowledge about the system while still relying on an inductive allocation procedure. These objects are supplied as the main input to objects inheriting from the virtual class Model, which represents standard information required by the two allocation routines currently implemented in lulccR and, indeed, most allocation routines described in the literature. Subclasses of Model are associated with a particular allocation method. These classes inherit general information held in Model and include specific informa-10 tion such as parameters and additional spatial input such as mask and land use history files. A generic allocate function receives objects inheriting from class Model and performs the relevant allocation routine. All methods belonging to the generic allocate function update the Model object with the allocation results. This design ensures that it is easy to add additional allocation routines to lulccR: developers simply 15 need to define a new subclass of Model and write a new allocate method. Here we describe the decision rules and allocation routines currently available in lulccR.

Decision rules
The first decision rule included in lulccR is used to prohibit certain land use transitions. For example, in most situations it is unlikely that urban areas will be converted to 20 agricultural land because the initial cost of urban development is high (Verburg et al., 2002). The second rule specifies a minimum number of timesteps before a certain transition is allowed, while the third rule specifies a maximum number of timesteps after which change is not allowed. These rules are used to control land use transitions that are time-dependent. For example, the transition from shrubland to closed forest is slow 25 and cannot occur after only one year (Verburg and Overmars, 2009), whereas for some types of agriculture a location is only suitable for a certain number of growing seasons 3375 Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | because of declining soil quality. The fourth rule prohibits transitions to a certain land use in cells that are not within a user-defined neighbourhood of cells already belonging to the same land use. This rule is particularly relevant to cases of deforestation or urbanisation because this sort of change usually occurs at the boundaries of existing forests or cities, respectively.

5
Within the allocate function the first four decision rules are implemented by the allow function while the fifth decision rule is performed by the allowNeighb function. To apply neighbourhood rules it is necessary to supply corresponding neighbourhood maps to the allocation routine. In lulccR these are represented by the NeighbMaps class. Objects of this class are created with the following command: Essentially, the allow and allowNeighb functions identify disallowed transitions according to the decision rules and set the suitability of these cells to NA. These transi- 15 tions are ignored by the allocation routine. Care should be taken to ensure that after any decision rules are taken into account there are sufficient cells eligible to change in order to meet the specified demand at each timestep.

CLUE-S allocation method
The CLUE-S model implements an iterative procedure to meet the specified demand 20 at each timestep. The model is summarised briefly here: for a full description see Verburg et al. (2002) and Castella and Verburg (2007). In the first instance each cell is allocated to the land use with the highest suitability as determined by the predictive models. Whereas the original CLUE-S model is based on binary logistic regression, lulccR allows any predictive model supported by PredModels to be used. After this step the suitability is increased for land uses where the allocated area is less than demand and decreased for land uses where it is greater than demand. The extent to 3376 which the suitability is increased or decreased is a function of the difference between allocated change and demand. This procedure is repeated until the demand is met. The original model perturbs the location suitability to limit the influence of nominal differences in land use suitability on the final model solution. This is replicated in lulccR except the user has greater control over the degree of perturbation. In effect, therefore, 5 this parameter can be used to make the procedure more or less stochastic. In the following code snippet we first set the decision rules to allow all possible transitions and then define some parameter values. Then, we create an object of class CluesModel and pass this to the allocate function: > clues.rules <-matrix(data=c (1,1,1,1,1,1,1,1,1

Ordered method
The ordered allocation method is based on the algorithm described by Fuchs et al. (2013). The approach is less computationally expensive and more stable than the 5 CLUE-S implementation because it does not simulate competition between different land use types. Instead, land allocation is performed in a hierarchical way according to the perceived socioeconomic value of each land use type. For land uses with increasing demand only cells belonging to land uses with lower socioeconomic value are considered for conversion. In this case, n cells with the highest location suitability for the current land use are selected for change, where n equals the number of transitions required to meet the demand. The converted cells, as well as the cells that remain under the current land use, are masked from subsequent operations. For land uses with decreasing demand only cells belonging to the current land use are allowed to change. Here, n cells with the lowest location suitability are converted to a temporary 15 class which can be allocated to subsequent land uses. The land use with lowest socioeconomic value is a special case because it is considered last and, therefore, the number of cells that have not been assigned to other land uses must equal the demand for this land use. In practice, this means that the location suitability for this class has no influence on the result. We modify the algorithm described by (Fuchs et al., 2013) 20 to allow stochastic transitions. If this option is selected, the location suitability of each cell allowed to change is compared to a random number between zero and one drawn from a uniform distribution. If demand for the land use is increasing only cells where the location suitability is greater than the random number are allowed to change, whereas for decreasing demand only cells where it is less than the random number are allowed In lulccR the ordered model is represented by the OrderedModel class. In the following code we create an OrderedModel object, supplying the order in which to allocate change (built, forest, other), and pass this to the generic allocate function: > ordered.model <-OrderedModel(x=input, order=c(3,1,2)) 5 > ordered.model <-allocate(ordered.model)

Validation
Spatially explicit land use change models are validated by comparing the initial observed map with an observed and simulated map for a subsequent timestep (Pontius et al., 2011). Previous studies have extracted useful information from the three 10 possible two-map comparisons (e.g. Pontius et al., 2007), however, recently Pontius et al. (2011) devised the concept of a three-dimensional contingency table to compare the three maps simulataneously. Not only is this approach more parsimonious, it also yields more information about allocation performance (Pontius et al., 2011). For example, from the table it is straightforward to identify different sources of agreement 15 and disagreement considering all land use transitions, all transitions from one land use or a specific transition from one land use to another. In addition, it is possible to separate agreement between maps due to persistence from agreement due to correctly simulated change. This is important because in most applications the quantity of change is small compared to the overall study area (Pontius et al., 2004b;van Vliet 20 et al., 2011), giving a high rate of total agreement which can misrepresent the actual model performance. It is common to perform the validation procedure at multiple resolutions because comparison at the native resolution of the three maps fails to separate minor allocation disagreement, which refers to allocation disagreement at the native resolution that is counted as agreement at a coarser resolution, and major allocation 25 disagreement, which refers to allocation disagreement at the native resolution and the coarse resolution (Pontius et al., 2011).

3379
Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | In lulccR, three-dimensional contingency tables at different resolutions are represented by the ThreeMapComparison class. Two subclasses of ThreeMapComparison represent different types of information that can be extracted from the tables: the AgreementBudget class represents sources of agreement and disagreement between the three maps at different resolutions while 5 the FigureOfMerit class represents figure of merit scores. This measure, which is useful to summarise model performance, is defined as the intersection of observed and simulated change divided by the union of these (Pontius et al., 2011), such that a score of one indicates perfect agreement and a score of zero indicates no agreement. Plotting functions for AgreementBudget and FigureOfMerit objects 10 allow the user to visualise model performance at different resolutions. The ordered model output for Plum Island Ecosystems is validated in the following way: This procedure was repeated for the CLUE-S model output. The agreement budgets for the 20 transition from forest to built for the two model outputs are shown by Fig. 7, while Fig. 8 shows the corresponding figure of merit scores.

Discussion
The example application for Plum Island Ecosystems demonstrates the key strengths of the lulccR package. Firstly, it allows the entire modelling procedure to be carried out in the same environment, reducing the likelihood of mistakes that commonly arise when data and models are transferred between different software packages. A framework in R specifically allows users to take advantage of a wide range of statistical and machine 3380 Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | learning techniques for predictive modelling, and, because R is widely regarded as the de facto standard for statistical model development, users of the package will have access to the most recent developments in these fields. The framework allows users to experiment with different model structures interactively and provides methods to quickly compare different model outputs. The example also highlights the advantages 5 of an object-oriented approach: land use change modelling involves several stages and without dedicated classes for the associated data it would be difficult to keep track of the intermediate model inputs and outputs. lulccR is substantially different from alternative environmental modelling frameworks. Most significantly, lulccR is designed for land use change modelling only, whereas frameworks such as PCRaster and TerraME provide general tools that can be applied to different spatial analysis problems such as land use change, hydrology and ecology. As a result, these tools are targeted towards the model developer rather than the end user. In contrast, existing land use change models are designed with the user in mind, with very few models providing any way for developers to improve or even un-15 derstand the model implementation. With lulccR we have attempted to reduce the gap between user and developer. The R system is well suited for this task, as Pebesma et al. (2012) notes "the step from being a user to becoming a developer is small with R". The package system ensures that lulccR will work across Windows, Mac OS X and Unix platforms, whereas many existing applications are platform dependent. Compre-20 hensive documentation of the functions, classes and methods of lulccR, together with complete working examples, enable the user to immediately start using the software, while the object-oriented design ensures that developers can easily write extensions to the package.
Despite its manifest advantages, there remain some drawbacks to land use change 25 modelling in R. Firstly, the lack of a spatio-temporal database backend to support larger datasets (Gebbert and Pebesma, 2014) restricts the amount of data that can be used in a given application because R loads all data into memory. The raster package overcomes this limitation by storing raster files on disk and processing data in chunks (Hij-3381 Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | mans, 2014). lulccR has been designed to make use of this facility where possible, however, during allocation it is necessary to load the values of several maps into the R workspace at once because the allocation procedure must consider every cell eligible for change simultaneously. The generic predict function belonging to the raster package provides one possible solution to this problem, allowing users to make predic-5 tions with predictive models in a memory-safe way. In effect, this would mean spatially explicit input data including observed land use maps and predictor variables could be handled in chunks and only the resulting probability surface would have to be loaded into the R workspace. However, this is not currently implemented in lulccR because it is excessively time consuming compared to the current approach. Despite this limitation, since most applications involve a relatively small geographic extent or, in the case of regional studies (e.g. Verburg and Overmars, 2009;Fuchs et al., 2015), use a coarser map resolution, memory should not normally cause lulccR applications to fail. The software presented here is still in its infancy and there are several areas for improvement. The present allocation routines receive the quantity of land use change for 15 each timestep before the allocation procedure begins. However, some recent models do not impose the quantity of change but instead allow change to occur stochastically based on land use suitability. For example, StocModLcc (Rosa et al., 2013) deforests a cell if the probability of deforestation is less than a random number from a uniform distribution. The quantity of change is simply the number of cells deforested after each 20 cell in the study region is considered for deforestation twice, with the probability of change, which depends on the location of previous deforestation events, updated after the first round. One advantage of this approach is that it accounts for uncertainty in the quantity and location of change simultaneously, whereas the current routines in lulccR only consider the location of change as a stochastic process. Other models such as LandSHIFT (Schaldach et al., 2011) receive demand at the national or regional level from integrated assessment models such as IMAGE (Stehfast et al., 2014) or Nexus Land-Use (Souty et al., 2012). Coupling lulccR with this class of model would be a valu-able addition to the software because land use change is increasingly recognised as a regional and global issue that occurs over multiple scales.
One of the main strengths of lulccR is that multiple model structures can be explored within the same environment. Thus, the more allocation routines available in the package the more useful it becomes. Two existing land use change models, StocModLCC 5 and SIMLANDER, are written in R and available as open source software. Future work could integrate these routines with lulccR to broaden the different model structures and, therefore, improve the ability of lulccR to capture model structural uncertainty. The methods in the current version of lulccR only permit an inductive approach to land use change modelling. Deductive models are fundamentally different because they at-10 tempt to model explicitly the processes that drive land use change (Pérez-Vega et al., 2012). The main advantage of these models is that they are able to establish causality because they allow modellers to test specific theories about the location of change and predictor variables whereas inductive models simply associate land use change with explanatory variables through predictive models (Overmars et al., 2007). For example, 15 the application for Plum Island Ecosystems shows that the presence of urban land is related to elevation, slope and distance to built land in 1971, however, the allocation models require no specific theory as to why this may be the case. Providing this class of model would permit multiscale studies whereby inductive and deductive land use change models operating at different spatial resolutions are dynamically coupled in 20 order to better capture the complexity of the land use system (Moreira et al., 2009).
Free and open source software encourages the reproducibility of scientific results and allows users to adapt and extend code for their own purposes. Thus, we encourage the land use change community to participate in the future development of lulccR. Perhaps one of the simplest ways to improve the package is to experiment with the 25 example datasets to identify bugs and areas for improvement. Those with more programming experience may wish to extend the functionality of the package themselves and contribute these changes upstream. In addition, existing land use change models can easily be included in the package by wrapping the original source code in R; 3383 Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | a straightforward task for commonly used compiled languages (C/C++, Fortran). Of course, users may also develop their own R packages that depend on lulccR for some functionality: this is one of the strengths of the R package system. Finally, we invite land use change modellers to submit land use change datasets (observed and, if possible, modelled land use maps and spatially explicit predictor variables) for inclusion in the 5 package.

Conclusions
Land use change models are useful for several tasks, from supporting local planning decisions to studies of regional and global environmental change. However, currently available software for land use change modelling is generally closed-source and usu-10 ally implements only one land use change model. In this paper we have presented lulccR, a free and open source software package providing an object-oriented framework for land use change modelling in R. lulccR allows the entire modelling process to be performed within the same environment, supports three different types of predictive model and includes two allocation routines. Releasing the software under an open 15 source licence (GPL) means that users have access to the algorithms they implement when they run a particular model. As a result, they are able to identify improvements to the code and, under the terms of the licence, are free to redistribute these changes to the wider community. We view lulccR as an initial step towards an open paradigm for land use change modelling and hope, therefore, that the community will participate in 20 its development.

Code availability
The lulccR source code currently resides on GitHub: https://github.com/simonmoulds/ r_lulccR.  R News,5,[9][10][11][12][13]available   . Land use suitability maps for Plum Islands Ecosystems study area. The "forest" land use class has uniform suitability because we employ a null model. Occurrence of "built" is related to elevation, slope and distance to 1971 built area, while "other" is related to elevation and slope only. persistence simulated correctly persistence simulated as change change simulated as change to wrong category change simulated correctly change simulated as persistence Figure 6. Overall agreement budget comparing the lulccR CLUE-S algorithm with the original model output for 2011. This shows a good level of agreement between the two maps: the proportion of persistence and change simulated correctly is high compared to incorrectly simulated persistence or change.