Downscaling land use and land cover from the Global Change Assessment Model for coupling with Earth system models.

structure flow shown in An overview of the GCAM model is now part of the main manuscript, complementing the introduction paragraph and giving more details on the representation of the terrestrial biosphere. There is a figure showing the structure of GCAM and the overall computational flow (Figure S1) in supplementary materials, which we suggest to leave there given the more detailed description now part of the manuscript. Abstract The Global Change Assessment Model (GCAM) is a global integrated assessment model used to project future societal and environmental scenarios, based on economic modeling and on a detailed representation of food and energy production systems. The terrestrial module in GCAM represents agricultural activities and ecosystems dynamics at the sub- 15 regional scale, and must be downscaled to be used for impact assessments in gridded models (e.g. climate models). In this study, we present the downscaling algorithm of the GCAM model, which generates gridded time series of global land use and land cover (LULC) from any GCAM scenario. The downscaling is based on a number of user-defined rules and drivers, including transition priorities (e.g. crop expansion preferentially into grasslands 20 rather than forests) and spatial constraints (e.g. nutrient availability). The default parameterization is evaluated using historical LULC change data, and a sensitivity experiment provides insights on the most critical parameters and how their influence changes regionally and in time. Finally, a reference scenario and a climate mitigation scenario are downscaled to illustrate the gridded land use outcomes of different policies on 25 agricultural expansion and forest management. Several features of the downscaling can be modified by providing new input data or changing the parameterization, without any edits to the code. Those features include spatial resolution as well as the number and type of land classes being downscaled, thereby providing flexibility to adapt GCAM LULC scenarios to the requirements of a wide range of models and applications. The downscaling system is 30 version controlled and freely available. This paper presents LULC downscaling capabilities, code, and parameters developed for the GCAM model. An evaluation analysis with historical land use data is performed to quantify the spatial accuracy of land use change allocation, and a sensitivity analysis is conducted to identify critical parameters. Finally, the downscaling method is applied to scenarios of future land use change projected by GCAM, generating gridded land use 10 projections that can be used for impact assessments in coupling experiments.


Introduction
LULC change is a key component of environmental change studies. More than 50% of the terrestrial biosphere has now been transformed to urban areas, croplands or rangelands by anthropogenic activities (Ellis, 2011). Estimates of the carbon budget from historical LULC change range from 12.5% to 33% of all anthropogenic carbon emissions depending 5 on the time period and method considered (Houghton et al., 2012). These emissions combined to LULC-induced albedo and moisture dynamic alterations are a significantalbeit poorly constrained -climate forcing (e.g. Brovkin et al., 2013;Mahmood et al., 2010;Pongratz et al., 2010). The array of LULC changes impacts extends to many other environmental aspects, including biodiversity, freshwater resources and air quality (Foley et 10 al., 2005), hence the importance of projecting future land use scenarios for impacts assessments.
The Global Change Assessment Model (GCAM) has been developed to better understand interactions between natural and human systems and anticipate their coevolution in the future. It combines representations of the global economy, energy systems, 15 agriculture and land use with a representation of terrestrial, ocean and atmospheric biogeochemical, ice-melt and climate processes (Clarke et al., 2007;Kim et al., 2006). GCAM is extensively used to explore the direct effects of changes in exogenous assumptions, such as population, technology, yield improvements, economic and environmental policies, as well as their system-wide repercussions. These analyses are performed using GCAM as a 20 standalone model or coupled to specialized models with greater capabilities for impact assessments. For example, GCAM provided greenhouse gas emissions and land use/land cover (LULC) projections for one of the 4 scenarios of the IPCC 5 th Assessment Report, the Representative Concentration Pathways 4.5 (RCP4.5, Thomson et al., 2011). Those data were used as inputs to climate models to generate climate projections and perform a wide 25 range of impact, adaptation and vulnerability assessments (Pachauri et al., 2014).
Coupling models involves adapting the data and models so that the flow of information is consistent with the source model and can be assimilated by the destination model. In the case of GCAM, spatial resolution is a technical challenge for coupling studies looking at the impacts of LULC change. GCAM represents these dynamics at sub-regional scale while 30 Earth system models and regional natural resource (i.e., agriculture and forestry) models typically operate at gridded scales. For example, the U.S. is divided into 8 agro-ecological zones (AEZs, Monfreda et al., 2009), of various sizes (up to 2.5 Million km 2 ) and shapes. GCAM tracks the share of land categories in each AEZ, but not where they are actually located within each zone. This issue was addressed for the IPCC 5 th Assessment Report with 35

Overview of the GCAM model
GCAM is global in scope and spatially disaggregated into geopolitical regions that interact through international trade. GCAM computes supply, demand and prices in 5-year intervals for a variety of primary energy carriers, secondary energy carriers, and agricultural 15 and forest products. Economic behavior in GCAM is determined as a result of markets in which the supplies and demands for economic goods and services are assumed to clear on the model's time step. As supplies and demands shift in GCAM's regions, market clearing prices change as do production, consumption and trade patterns.
The terrestrial system is represented in GCAM to account for its role in food, wood, and 20 energy production  and in the carbon and water cycles Le Page et al., 2013), providing capabilities to explore interactions and the implications of environmental policies (Figure S1 in Supp. Material, Kraucunas et al., 2014;Thomson et al., 2011). While we focus here on describing the spatial scale and land types in GCAM because they are essential aspects for the downscaling, a more technical description of the 25 terrestrial module is available in the literature Wise et al., 2014;Wise and Calvin, 2011).
GCAM represents the world terrestrial biosphere into 283 spatial units, the result of the intersection of two spatial scales. First, the world countries are aggregated into 32 geopolitical and socio-economic regions (Figure 1a), a scale at which most economic sectors are 30 represented (e.g. industrial production, energy use, trade, and natural resources). Second, global land area is split into 18 agro-ecological zones (AEZs, Figure 1b) to represent natural ecosystems and agricultural activities, providing a climate-based zoning to better account for vegetation and crop productivity. These 18 AEZs are intersected with the 32 regions to get the 283 unique combinations of region and AEZ.
Each of these 283 region/AEZ can have up to 22 types of land cover: five types of natural ecosystems and 17 types of managed lands (Table 1). Initial land areas for the first GCAM time step (2005) are inferred from the History Database of the Global Environment 5 (HYDE, Klein Goldewijk et al., 2011), the FAO RESOURCESTAT database (FAO, 2010) and potential vegetation data (Ramankutty and Foley, 1999a), as detailed in Kyle et al. (2011). For future time steps, GCAM integrates a range of drivers to determine LULC change, including food demand (population growth, diet changes), region/AEZ level crop productivity and costs (e.g. labour, fertilizer), energy demand (bioenergy crops) and 10 environmental policies, among others. A major application of GCAM consists in exploring LULC projections under alternative configurations of these drivers Thomson et al., 2010Thomson et al., , 2011. For example, while the reference scenario projects continuing deforestation and expansion of global agriculture in response to growing food demand, implementing a terrestrial carbon market shifts the economics of land use decisions towards 15 agriculture intensification and afforestation. These scenarios are depicted at the region/AEZ scale, as illustrated in Figure 2 for croplands, but can be gridded with the downscaling algorithm detailed below.

Downscaling algorithm
The GCAM LULC downscaling method is based on previous work by West et al. (2014) 20 at the national scale for the U.S. It consists of allocating tabular land areas from a relatively coarse set of spatial units to a higher resolution land cover grid (West et al., 2010). In the case of GCAM, these spatial units are defined by the intersection between regions and AEZs (Figure 1), and the resolution of the final grid for the global analysis presented here is 0.25 degrees (~28km at the equator). The algorithm relies on gridded observations of LULC and 25 on a set of user-defined rules to spatially allocate land types within each region/AEZ ( Figure  3).
The algorithm starts with a pre-processing phase of the GCAM and observation-derived gridded LULC data to harmonize their attributes. It is labeled the reconciliation phase, and consists of (1) aligning total land areas of each region/AEZ from both data sources, and (2) 30 converting their respective land schemes to a common set of land types that are ultimately downscaled.
Following data reconciliation, the actual downscaling is performed. The algorithm starts in the base-year (first time step, e.g. 2005), and modifies the observed gridded data until land shares in each region/AEZ are equal to the shares in the GCAM model. If a given region/AEZ has 100km 2 of crops in GCAM but only 80km 2 in the observation, 20km 2 of crops will be created by converting other land types that are in excess. The geospatial 5 allocation of the additional 20km 2 of cropland depends on a number of downscaling rules that are described in the next section, including land use transition priorities and spatial constraints (e.g., nutrient availability for agriculture). Once completed, the base-year downscaled data consists in gridded LULC that is consistent with GCAM at the region/AEZ scale with spatial patterns similar to observed LULC ( Figure 3). This gridded 10 GCAM LULC in the base-year then becomes the starting point to which LULC change of the following time step is attributed, following the same downscaling rules. This process is repeated for each time step until the entire scenario is downscaled (e.g. 2005-2100).
The next sections describe the code and its user-defined parameters in more details. For the sake of clarity, it is based on a relatively simple configuration of the downscaling, 15 referred to as "default configuration", but many aspects are flexible as specified in the text. The spatial LULC data used in the default configuration are from the MODIS MCD12Q1 version 5.1 product for the year 2005, PFT Type 5 classification (Friedl et al., 2010), aggregated from 500m to 0.25 degree resolution. Although the downscaling aims at maintaining consistency with the original GCAM land outputs, the total land area in each region/AEZ has to be adjusted to match the observation data. Otherwise, expansion on water (case of more land in GCAM) and removal of observed terrestrial land (case of less land) would be necessary, which was considered unrealistic. All 25 GCAM land types (GLTs) are adjusted by the same ratio within a given region/AEZ:

Reconciliation phase
Where A G and A O are the GCAM and observed areas, A GC the final, adjusted GCAM areas, R and Z the region/AEZ and L the GCAM land type considered. When compared to 30 MODIS, total area of most regions/AEZs differs by less than 3%. Globally, total land area is 126.9 Million km2 in GCAM and 128.1 Million km2 in MODIS (not including small islands and other territories that are not represented in GCAM and thus remain equal to observations throughout the downscaling).
2.1.2 Aggregation,to,common,land,type,categories, Once total land areas are the same, both the GCAM and spatial land types (GLTs and SLTs) are aggregated to a common scheme of final land types (FLTs). For example, the 5 MODIS data have only one cropland type, while GCAM has 13 different types. All GCAM crop types are thus aggregated to a single crop category to enable the downscaling: if GCAM corn was kept as an individual land type, there would be no indication of where those corn crops should be allocated from the MODIS data. In the default configuration, the common scheme includes 7 FLTs: forests, shrubs, grass, crops, urban land, snow, and sparse 10 vegetation.
Aggregation of the spatial data is user-defined with an input table (Table 2). For a given SLT, a number from 0 to 1 determines the share of that land type that goes to each of the FLTs. Typically, the share is either 0 or 1, meaning that an SLT is entirely attributed to a single FLT (Table 2). However, some land cover classification products include mixed land 15 types (e.g. mosaic of crops and forests). In such a case, the area under that type could be split into crops and forests by using shares of 0.5.
Aggregation of the GCAM data at the region/AEZ scale follows the same concept (Table 3), only with a different approach when a given GLT has to be split into 2 or more FLTs: the shares received by each FLT are determined by the spatial data in that 20 region/AEZ. For example, the GLT RockIceDesert can qualify to both the snow and the sparse FLTs. However, the fraction that should go to each depends on the region/AEZ considered: most would go to snow in Greenland, and to sparse in central Australia. The table is thus filled with 1s for both (Table 3) to indicate that this GLT needs to be split, and the code computes the actual split as the share of snow and sparse seen in the spatial data (e.g. 25 MODIS) for each region/AEZ.

Downscaling rules
Once the GCAM and spatial data are reconciled and aggregated to the same land type categories (i.e. FLTs), downscaling is performed based on a set of user-defined rules: a "treatment order" defining which FLTs are downscaled first, an "intensification versus 30 expansion ratio", "transition priorities" defining what type of land swaps are favored, and "spatial constraints" which attribute to each grid-cell a likelihood to receive an expanding FLT. All these rules influence the spatial patterns of the final downscaled GCAM data (see sensitivity analysis in Sect. 3).

Treatment,order,
The algorithm downscales all FLTs one after the other; an order thus has to be defined. The input files include a table where users specify that order (Table 4).

Intensification,versus,expansion,ratio,
When the GCAM projections indicate that the area of a given FLT is increasing, the 5 additional area can be downscaled on grid-cells where the FLT already exists -which is referred to as intensification, or on grid-cells where it does not yet exist -referred to as expansion. In the real world, the ratio of intensification versus expansion varies in space and time. In North America for example, land giveaways, infrastructure development and a number of other factors led to a large-scale westward expansion of agricultural activities from 10 1800 to 1950, then to their intensification until today, with most of the Corn Belt now featuring more than 80% crop cover (Ramankutty and Foley, 1999b). In the default configuration presented here, the intensification ratio (intens_ratio) is set to 0.8 globally, and is part of the sensitivity analysis (Sect. 3). The code can be modified to define specific ratios for different regions or time periods, however. Note that the ratio is a target, which sometimes 15 cannot be met. In the extreme case where croplands exist in all grid-cells of a region/AEZ for example, expansion is impossible. The code then applies the desired expansion target as intensification instead.

Transition,priorities,
At each time step, GCAM computes LULC change at the region/AEZ scale, but does 20 not give any indication on land use transitions. For example, if crops and forests increase while shrubs and grass decrease, the share of each possible conversion -or transition -is not known (shrubs to crops, grass to crops, shrubs to forests, grass to forests). It is however an aspect we have to represent when downscaling LULC change to a spatial grid, and is also relevant information for Earth System modelers (see Discussion). A preference order for 25 land use transitions is thus user-defined in the parameter files, for each FLT (Table 5). In the default configuration, crops are set to preferentially replace urban land, then grasslands, then shrublands, then forests, etc. This is a preference only: specific transitions can only happen if the FLT to be converted is projected to decrease in GCAM. In the example given above, crops could not be increased into forested land because forests are also projected to increase. 30 This is related to the concepts of "net" versus "gross" LULC change (see discussion).

Spatial,constraints,
Any kind of spatial constraints can be used to influence the downscaling. For a constraint to be implemented, users must provide the input data at the resolution considered and parameterize its influence. The input data must be bound from 0 (fully constraining) to 1 (no constraint). Then the parameterization defines the relative contribution of that specific constraint in influencing the downscaling, and is specific to each FLT. In the default 5 configuration illustrated in Table 6, three spatial constraints are listed: kernel density, soil workability and nutrient availability.
Kernel density represents the proximity and density of a given FLT around a given gridcell. It is computed by default in the code, adjusting to the new FLT distribution at each time step. It was implemented under the assumption that new areas of a given FLT tend to 10 appear close to where it already is (e.g. desertification and crop expansion around agricultural areas).
where KD is the kernel density, n the number of neighboring grid-cells included in the 15 computation, F FLT the fraction of the FLT in the grid-cell considered, and D the distance of that grid-cell to the grid-cell for which the kernel density is being computed. A user-defined parameter -radius -represents the size of the moving window used to compute KD for each grid-cell, thus controlling the number n. Note that the kernel density depends on land types in the surrounding grid-cells, thus has a different influence on the downscaling than 20 intensification, which is activated based on the considered grid-cell only.
Soil workability and nutrient availability are two indicators of agricultural suitability from the Harmonized World Soil Database (HWSD, Fischer et al., 2008). In Table 6, kernel density contributes 100% of the spatial constrain for all FLTs but for crops, for which it contributes 40% while nutrient availability contributes 40% and soil workability the 25 remaining 20%. Note that the parameterization can include negative numbers to indicate that a constraint has the opposite influence: one could for example favor desertification in low nutrient areas by having a negative value in the cell at the intersection of the "nutrient availability" row and "sparse" column in Table 6. Based on those spatial constraints contributions, each grid-cell is attributed an index S from 0 to 1 assessing how suitable they 30 are to receive the considered FLT. In the case of crops and the default configuration: where KD is the kernel density, NA the nutrient availability, and SW the soil workability, which relative contributions (40, 40 and 20, respectively) are specified in Table 6. Each spatial constrain being a dimensionless index bound from 0 to 1, the suitability index is dimensionless as well. To better illustrate how the downscaling is actually computed, we provide a specific 15 example: in a given region/AEZ, crops are projected by GCAM to increase by 100km 2 from 2015 to 2020, while grasslands and forests decrease by 70km 2 and 30km 2 , respectively. All other FLTs remain unchanged. Of the 100km 2 cropland increase, 80km 2 are set to occur as intensification and 20km 2 as expansion (user-defined intensification versus expansion ratio of 0.8, see Sect. 2.2.2Error! Reference source not found.). 20

Computational flow and implementation of the downscaling rules to
The code goes through the first intensification loop (Figure 4), following the treatment order (Table 4). Urban, snow and sparse land types are skipped as they remain unchanged, then the loop goes to crops, which have to be intensified by 80km 2 . The code thus goes through the land use transition priorities for crops (Table 5), starting with urban land, which does not have any area to spare and therefore remains unchanged. The second priority is 25 grass, which does have land to spare as it is projected to decrease by 70km 2 . All grid-cells in that region/AEZ containing both crops and grass (intensification is for grid-cells which already have crops) are thus selected, and their suitability index (S, sect. 2.2.4) is computed. Assuming that the more suitable a grid-cell the more intensification it will receive, the 70km 2 of potential conversion are tentatively distributed to the selected grid-cells according to their 30 suitability index: where TC stands for tentative conversion, gc the grid-cell and gcs the group of grid-cells selected because they contain both crops and grass. The actual conversion is the minimum value between TC and the area of grass on each grid-cell: where AC stands for actual conversion and GA for grass area. Given that the tentative 5 conversion might not be possible because the existing grass area is lower, the total conversion achieved after the first computation can be less than 70km 2 while some grid-cells might still have grass to spare. The computation is therefore repeated until 70km2 is reached, or until there are no more grid-cells with pre-existing crops and with grass to spare. For this example, we will assume that 60km 2 of grass was converted to crops. The total 10 intensification target being 80km 2 , 20km 2 has to be done on non-grass areas. The code thus goes to the next crop transition priority ( Table 5): shrublands is skipped as it remains unchanged, but forests are projected to decrease by 30km 2 . The intensification computation is repeated to convert forests to crops. We will assume that only 7km 2 of forest-to-crop conversion could be achieved. At the end of the intensification, 67km 2 of the projected 15 100km 2 cropland increase has been allocated, 60km 2 by converting grasslands, 7km 2 by converting forests, leaving 33km 2 to be done by expansion. Of those 33km 2 , 10km 2 have to replace grasslands as they are projected to decrease by 70km 2 total, 60 of which have been converted by crop intensification. The remaining 23km 2 have to replace forests as they are projected to decrease by 30km 2 , 7 of which have been converted by crop intensification. 20 The expansion function is similar to the intensification function, except that only a fraction of the pre-selected grid-cells are selected to receive the remaining crop area, to avoid patterns of ubiquitous expansion, especially for croplands which generally expand in clusters around newly cultivated lands. Continuing with the applied example, the code goes through the same treatment order and transition priorities, thus first expanding crops into grasslands. 25 Because crop expansion can only occur on grid-cells that do not yet have crops, all grid-cells in the considered region/AEZ containing grass but not crops are pre-selected as "candidate" grid-cells to apply the grass conversion to crops. The candidate grid-cells are then ranked based on their suitability index. The code selects the most suitable candidate grid-cells following a percentage defined by the user (selection threshold, 25% in the default 30 configuration) or -if stochastic selection is turned on -using the suitability index of each grid-cell as the probability it will be selected in a Bernoulli trial. Only the selected grid-cells will be used when computing the tentative and actual conversion, in the same way as for intensification. 10km 2 of crops are expanded following this scheme into grasslands, and 23km 2 into forests. 35 The final intensification ratio is thus 67/100 = 0.67, less than the user-defined target (0.8 in the default configuration) because of a land-driven limitation on the amount of intensification that can be achieved. In other cases, the reverse situation occurs and a larger proportion of LULC change has to be done through intensification, conducted in the second round of intensification (see Figure 4). 5 At the end of a full downscaling run, gridded LULC areas are obtained for each of the user-defined land types and for each time step, which can be interpolated to annual data. LULC transitions (the amount of land switched between each pair of land types in each gridcell) are also explicitly tracked by the algorithm given their importance for the carbon cycle (Brovkin et al., 2013). They are however not exported by default given their large size and 10 specific format requirements as input for models capable of using them.

Method
The downscaling code was applied to historical LULC change data with a range of alternative configurations to evaluate how realistic the spatial allocation is and to quantify the 15 sensitivity of the results to the user-defined parameters.
Gridded estimates of historical land use from the HYDE database (version 3.1) were combined to gridded estimates of potential vegetation from the SAGE database to create base-year gridded maps of LULC and Region/AEZ aggregated data of LULC change as inputs to the downscaling code. Although the HYDE data include estimates back to 10000 20 B.C., we contain the evaluation analysis to the 1700-2005 period. About 81% of today's cropland area in the database has been established within that period. Note that HYDE is a reconstruction product and shows substantial deviations from satellite data such as MODIS, but it is used as what we considered the best option to evaluate the downscaling and sensitivity to the parameters, especially due to the temporal span. Using a satellite product 25 directly (e.g. MODIS) would restrain the evaluation to a 10-20 years period with much less land use change. The ratio of misclassified over real land use change would also be an issue, with for example up to an order of magnitude between apparent versus real change in the MODIS annual land cover product (Friedl et al., 2010).
Six alternative configurations were defined for the 10 parameters shown in Table 7,  30  Table 8 and Table 9. The downscaling was thus run 60 times, each time maintaining the default configuration but changing the value of the considered parameter.

!! ! Equation!6!
where GCAMdc is the downscaled GCAM crop area, HYDEc the crop area in the HYDE data, and byear is the base-year of the run (first time step). The metric is computed 5 for single or aggregated regions/AEZs and can vary from -1 to +1. It represents the fraction of land use change that was allocated to the "right" grid-cells, i.e. in agreement with the land use change computed from HYDE between the base year and 2005. At the extremes, -1 means the downscaling did exactly the opposite to what was observed (decreasing crops where it was supposed to increase and vice-versa). A value of +1 is a perfect match. The 10 metric was computed for tropical, temperate and boreal biomes separately, as delineated in Figure 1b based on the AEZs.

Results
The historical downscaling of LULC change starting from the 1900 base-year is presented in Figure 5. Europe had already acquired most of today's cropland extent by 1900, 15 but all other regions experienced a substantial increase in cropland area, both in the form of intensification (e.g. India) or expansion (e.g. the corn belt in North America). The downscaling algorithm leads to a spatial 2005 cropland distribution that is in general agreement with the HYDE data, yet lacking their smooth patterns (e.g. North America, India in Figure 5b,c). However, this smooth aspect seems to be an artifact of the HYDE data 20 when compared to the MODIS data ( Figure 5c and Figure 7a).
The performance metric generally ranges from 0.3 to 0.7 according to the biome and configuration considered (Figure 6), indicating that the downscaling allocates fairly well the changes in cropland area (the metric is bound from -1 to 1). Performance and sensitivity to the downscaling parameters are quite different between tropical, temperate and boreal 25 biomes, indicating that LULC dynamics differ and cannot be captured by a single downscaling configuration. Overall, however, sensitivity to the intensification versus expansion ratio and to the relative contribution of kernel density are the strongest, suggesting the importance of proximity to pre-existing agricultural areas for the allocation of new crops. The performance of the downscaling is also clearly influenced by the base-year, 30 especially in the case of tropical regions, and, expectedly, by the aggregation of the output LULC to coarser resolutions.
Performance was also evaluated for the downscaling of forests ( Figure S2), which is a critical aspect for many environmental studies (e.g. carbon cycle, biodiversity). The results are mostly relevant for the tropical biome, where the evaluation shows similar patterns of sensitivity to those of croplands. Both temperate and boreal biomes experienced relatively little forest change from 1900 to 2005. 5

Future projections 4.1 Objectives and configuration
We applied the downscaling to two GCAM scenarios of 2005-2100 LULC projections to illustrate the capabilities of the algorithm and its potential applications for environmental impact studies. In the "reference" scenario, the economy and technological developments 10 are not targeted by any environmental policies, thus driving human-and natural-system dynamics in a business-as-usual fashion. This scenario features substantial crop expansion to meet the increasing food demand (growing population, diet changes, etc). In the Mitigation Policy 4.5 (MP4.5) scenario, an economic policy in the form of a global carbon market applied to industrial, fossil-fuel and terrestrial emissions is implemented to limit radiative 15 forcing in 2100 to +4.5W.m -2 . Incentives to carbon sequestration in the MP4.5 lead to afforestation in many regions, while agriculture tends to develop in high-yield areas. It is a replicate of the Representative Concentration Pathways 4.5 scenario (RCP4.5) produced by GCAM for the IPCC 5 th assessment report (Thomson et al., 2011, p.4), but instead uses the latest GCAM release (GCAM4). 20 Contrarily to the historical evaluation analysis that was using HYDE data for the baseyear gridded LULC, the projection analysis starts in 2005 with observation-derived MODIS LULC. The downscaling is run with the default configuration presented in Sect 2.

Results
The downscaled reference and MP4.5 LULC scenarios feature key differences in 2100 25 due to their specific economic and policy context (Figure 7). Population reaches about 9 billion in 2060 in GCAM (slowly decreasing thereafter), contributing to increasing food demand that cannot be met with the projected yield improvements on current agricultural areas. As a result, global crop area is projected to increase by 10% in the reference scenario , with substantial expansion in tropical and sub-tropical forests (Figure 7c,d),30 compensated by afforestation in other regions (0.3% deforestation globally). In the MP4.5 scenario, economic incentives for terrestrial carbon sequestration lead to a different solution. Afforestation becomes a profitable option for landowners and global forested area increases by 34%, replacing agriculture in many regions (Figure 7e,f). To meet global food demand, agricultural production is intensified in high-yield areas (e.g. India, China) and expands into marginal lands with the support of irrigation and other technological developments (e.g. Western US, Middle East). Globally, those changes of agricultural practices enable a reduction of crop area by 10.4%. Note that these general LULC trends are determined by 5 GCAM, including deforestation and afforestation: the downscaling does not make any region/AEZ-scale land use change decision, but instead spatially delineates those decisions to a gridded format. Note also that the downscaled 2005 GCAM crops show much more similar patterns to MODIS than those obtained in the historical evaluation analysis ( Figure  7a versus Figure 5b). This is because 2005 was the last year in the evaluation analysis while it 10 is the base-year for the projections.

Discussion
GCAM models human and natural systems at the scale of regions and AEZs, but the LULC downscaling system presented here enables a gridded representation of the land. The gridded outputs are consistent with the GCAM projections and can be influenced with a 15 number of user-defined parameters. The optional spatial constrains provide the capability to adjust the downscaling to capture regional land-use change dynamics. One example illustrated in this study consists in using soil characteristics (i.e., nutrients, workability) to drive the allocation of agricultural activities. The downscaling system was primarily developed as an integrating tool, enabling LULC change impact assessments by providing 20 gridded inputs to other models that cannot be run with the original GCAM data at region/AEZ scale. The gridded outputs may also be used to directly analyze spatial patterns within and between GCAM scenarios, albeit with the understanding that realistic results depend on the chosen configuration for the region, time period and aspects of LULC that are being considered. 25 Models that might be coupled to GCAM through LULC downscaling all have specific input data requirements -different land types or spatial resolution for example. The downscaling system can be easily adapted to meet a number of these requirements, without any edits to the code. Any number of final land types (FLTs) can be downscaled provided that base-year data is available and that parameters specific to each land type are provided 30 (e.g. aggregation rules, transition priorities). For example, in the case of a model requiring separate broadleaf/needle-leaf and deciduous/evergreen forest types -for which base-year distribution is readily available in MODIS - Tables 2-6 need to be modified into Tables S2-S6 (supp. Material) and all 4 types of forests will be downscaled. Another example consists in downscaling specific crop types for agricultural models (instead of a single crop category as 35 shown in this study), which requires crop-specific base-year data and modified Tables 2-6. The necessary data and configuration tables are provided as a beta-version with the downscaling system (see user manual). Other aspects that are flexible include the downscaling resolution, the spatial domain (e.g. continental, regional focus), and the baseyear LULC data (e.g. higher quality regional datasets). 5 The downscaling method can be used for a number of applications related to LULC change. It contributed to a study assessing global gridded carbon fluxes from agricultural production and consumption (Wolf et al., 2015), by downscaling FAO crop inventory data at the county/state/province level to a 0.05 degree grid. It was also applied to downscale specific crop types as well as irrigation practices from a detailed USA version of GCAM 10 ( Huang et al., in prep) to study the response of terrestrial hydrology to future scenarios of LULC change.
Despite the flexibility, some aspects are intrinsic to the GCAM model and the downscaling code and might be a limitation for certain applications. GCAM models net LULC change within each spatial unit (e.g. region/AEZ), as opposed to gross LULC, and 15 thus minimizes the amount of LULC change from one timestep to the other. For example, a crop increase by 100km 2 in a region/AEZ could be the result of several LULC change storylines, including one where crops increase by 150km 2 in some part of the region/AEZ and decrease by 50km 2 elsewhere. In that case 200km 2 of land are converted (gross change), with consequences for carbon emissions, landscape fragmentation or the water cycle. The 20 downscaling does not try to model those dynamics a posteriori from the GCAM net land use change. A new version of GCAM is currently being developed to represent land use dynamics annually, which the downscaling algorithm can process. The net versus gross LULC change issue will be mostly eliminated for these scenarios as multiple land conversions within the same year are rare. The lack of flexibility to account for specific 25 dynamics through the regional and temporal domain of the downscaling is another limitation, which can be addressed with code changes. As shown in the evaluation and sensitivity analysis, regional performances vary depending on the amount and type of observed LULC change and on the period considered. For example, patterns of agricultural expansion into the Amazon forest will be best downscaled under a specific configuration 30 that would be sub-optimal to represent intensification and encroachment into semi-arid areas in the U.S. Great Plains. Similarly, that same configuration would not be the best to reproduce the move towards intensification and away from deforestation observed in the Amazon basin since 2004. The parameterization is currently common to all regions and for the entire downscaling period, but can be made flexible with relatively simple edits to the 35 code.