Improving the dynamics of Northern Hemisphere high-latitude vegetation in the ORCHIDEE ecosystem model

Processes that describe the distribution of vegetation and ecosystem succession after disturbance are an important component of dynamic global vegetation models (DGVMs). The vegetation dynamics module (ORC-VD) within the process-based ecosystem model ORCHIDEE (Organizing Carbon and Hydrology in Dynamic Ecosystems) has not been updated and evaluated since many years and is known to produce unrealistic results. This study presents a new parameterization of ORC-VD for mid-to high-latitude regions in the Northern Hemisphere, including processes that influence the existence, mortality and competition between tree functional types. A new set of metrics is also proposed to quantify the performance of ORC-VD, using up to five different data sets of satellite land cover, forest biomass from remote sensing and inventories, a data-driven estimate of gross primary productivity (GPP) and two gridded data sets of soil organic carbon content. The scoring of ORC-VD derived from these metrics integrates uncertainties in the observational data sets. This multi-data set evaluation framework is a generic method that could be applied to the evaluation of other DGVM models. The results of the original ORC-VD published in 2005 for mid-to high-latitudes and of the new parameterization are evaluated against the above-described data sets. Significant improvements were found in the model-ing of the distribution of tree functional types north of 40 • N. Three additional sensitivity runs were carried out to separate the impact of different processes or drivers on simulated vegetation distribution, including soil freezing which limits net primary production through soil moisture availability in the root zone, elevated CO 2 concentration since 1850, and the effects of frequency and severity of extreme cold events during the spin-up phase of the model.

The terrestrial biosphere plays an important role in the carbon (Schimel, 1995;Ciais et al., 2013), water (Oki and Kanae, 2006) and energy balances of Earth (Trenberth et al., 2009). Interactions between vegetation and the atmosphere involve complex biophysical and biogeochemical processes and feedbacks (Heimann and Reichstein, 2008;Foley et al., 2003). To simulate past and future changes on long timescales, Earth system models must represent how the distribution and structure of ecosystems respond to changes in climate, CO 2 and land use. This need provides the motivation for the development of dynamic global vegetation models (DGVM). In DGVMs, vegetation distribution, carbon stocks and fluxes exchanged with the atmosphere are simulated through fast processes (canopy exchange, soil heat and moisture dynamics, photosynthesis), intermediate processes (vegetation phenology, carbon allocation and growth, soil carbon decomposition) and slow processes (vegetation dynamics, recovery from disturbances) (Sitch et al., 2003; Published by Copernicus Publications on behalf of the European Geosciences Union. Krinner et al., 2005). DGVMs have been used to study the response of ecosystems to recent climate change (e.g., Piao et al., 2006) and to project the evolution of the coupled carbonclimate system (e.g., Cox et al., 2000). The coupling of vegetation dynamics with a climate model allows for the inclusion of vegetation-atmosphere interactions related to ecosystem migration in global climate simulations (Quillet et al., 2010). The representation of vegetation structural dynamics in DGVMs builds on principles previously applied in biogeography models and "gap models" (Sitch et al., 2003). Biogeography models define the patterns of vegetation physiognomy based on plant functional types (PFTs) driven by temperature, precipitation, CO 2 , climate-related disturbances, and soil properties (Prentice et al., 1992;Haxeltine and Prentice, 1996). Gap models on the other hand simulate forest dynamics at patch scale, including demographic processes (recruitment, growth, death), competition, and disturbance (Prentice and Leemans, 1990;Bugmann, 2001).

D. Zhu et al.: Improving dynamic vegetation model in high latitudes
Vegetation distribution largely depends on bioclimatic limits and competition between species, which are regrouped into PFTs in most DGVMs (Woodward, 1987;Sitch et al., 2003;Krinner et al., 2005). Bioclimatic limits consist of direct limiting factors (e.g., minimum temperature for survival) and indirect limitations that control primary productivity and in turn the competitive ability of a PFT (e.g., optimal temperature for photosynthesis, various temperature and moisture phenological controls of leaf-out and senescence). PFTs with a better tolerance to extreme climate conditions and higher growth efficiency during the growing season are more competitive than others and their distribution will therefore expand. The competence of any PFT is dependent on the underlying plant traits that define this PFT. The traits for a given PFT are fixed in most DGVMs but can also be variable within PFTs based on trait-climate relationships derived from a trait database. For example, Verheijen et al. (2013) conducted a variable trait simulation with the JSBACH DGVM for three leaf traits (specific leaf area, and the constants defining the maximum rate of photosynthesis, v cmax ,j max ), showing significant difference in predicted dominant PFTs compared with fixed trait simulation. Higgins et al. (2014), however, pointed out the inherent limitations in Verheijen et al. (2013) using a statistical method to parameterize plant trait diversity and proposed that the focus should not be on trait values, but rather on the tradeoffs between traits . In this study, we will use a fixed trait approach to describe the characteristics of each PFT in ORCHIDEE (Organizing Carbon and Hydrology in Dynamic Ecosystems;the PFTs are listed in Table 1).
ORCHIDEE is the terrestrial surface component of the Institut Pierre Simon Laplace (IPSL) Earth system model. Since the first model description by Krinner et al. (2005), the representation of existing processes has been improved and new processes have been implemented, such as a physically based multi-layer soil hydrology scheme (de Rosnay et al., 2002) and a scheme describing soil freezing and its effects on root-zone soil moisture and soil thermodynamics (Gouttevin et al., 2012). These new parameterizations have been evaluated for static runs in which the geographical distribution of PFTs is specified based on observed satellite land-cover information. Yet, their influence on the simulated PFT distribution when the vegetation dynamics module is activated has not been addressed. The original vegetation dynamics module in ORCHIDEE (hereafter ORC-VD) described by Krinner et al. (2005) was adapted from the LPJ model (Lund-Potsdam-Jena;Sitch et al., 2003) with minor modifications. Unlike the rest of the model, ORC-VD has not been updated since the Krinner et al. (2005) description, and it produces unrealistic results in dynamic runs. For example, Woillez et al. (2011) have shown that the boreal forest area is largely modeled as broadleaf deciduous, whereas in reality it is mainly comprised of needleleaf trees.
The work described here improves ORC-VD, with a focus on Northern Hemisphere vegetation dynamics. Different sets of recent observations have been used to evaluate model performance using quantitative metrics, either related directly to the spatial distribution of vegetation (satellite-observed landcover and tree fraction) or resulting from it (data-driven spatial distribution of gross primary production (GPP), biomass and soil carbon stocks). The evaluation methodology developed here could be used for other DGVMs as well, and is thus of general interest for the DGVM modeling community.
We present a new parameterization of vegetation dynamics in the ORCHIDEE high-latitude version (ORC-HL) described by Gouttevin et al. (2012), with modifications to the equations and parameters describing tree mortality, thermal constraints and a calibration of photosynthesis parameters (v cmax /j max ) (Sect. 2.2). The results of the original module (ORC-HL-OVD) and of the new parameterization (ORC-HL-NVD) are evaluated (Sects. 4,5). Because the biogeochemical and physical processes that characterize high latitudes interact in a complex way with the processes that control vegetation structure, in Sect. 6 we performed and analyzed factorial model simulations changing one process or driver at a time to isolate their impacts on vegetation distribution. In addition, because the initial distribution of the vegetation in 1850 is sensitive to preindustrial climate conditions, we also tested the effect of the return frequency of cold extremes relating to tree mortality during the spin-up phase of the model and discussed its implications. T min,crit : minimum temperature limitation ( • C), below which the mortality rate will increase as Eq.
(3). k BG : maximum background mortality rate (yr −1 ) for tree PFTs. v cmax,opt : optimal maximum rubisco-limited potential photosynthetic capacity (µmol m −2 s −1 ). j max,opt : maximum rate of photosynthetic electron transport (µmol m −2 s −1 ). a crit : critical leaf age for leaf senescence (days); the dependence of v cmax and j max on leaf age for PFTs 4 and 7 was eliminated as described in Sect. 2.2.3. 1993;de Rosnay and Polcher, 1998), and STOMATE (Saclay Toulouse Orsay Model for the Analysis of Terrestrial Ecosystems; Viovy, 1997) which simulates carbon dynamics at a daily time step, including carbon allocation, biomass accumulation, litter and soil carbon decomposition, and phenology. STOMATE includes a dynamic vegetation module with equations adapted from the LPJ model (Sitch et al., 2003) as described by Krinner et al. (2005). ORC-HL is an evolution of ORCHIDEE including additional high-latitude processes, described by Gouttevin et al. (2012). In particular, the simple two layer soil hydrology (Ducoudré et al., 1993) was replaced by an 11-layer diffusion scheme (de Rosnay et al., 2002), which describes water infiltration and diffusion through soil in a physically based way. A soil-freezing scheme is implemented in the 11-layer model to calculate liquid and ice water fractions in each soil layer. This scheme has been shown to improve the representation of pan-Arctic river discharge and soil thermal regimes in permafrost regions (Gouttevin et al., 2012). The basic structure of ORC-HL used in this study is shown in Fig. S1 in the Supplement, in which processes different from Krinner et al. (2005) are marked red. Figure 1 is a schematic of ORC-VD, which simulates the dynamic area covered by each PFT as functions of bioclimatic limitation, competition, mortality and establishment. The basic equations to calculate fractional cover of each PFT are listed below:

Modifications to ORCHIDEE vegetation dynamics
where V is fractional vegetation cover (dimensionless); CA is crown area of individual plant (m 2 ); P is population density (m −2 ); E is establishment rate (m −2 d −1 ); M is mortality rate (d −1 ), including components described in Sect. 2.2.1. The modifications made in this study are described in the following, shown red in Fig. 1.

Tree mortality
Mortality is defined as the reduction in population density during each time step (daily). The overall tree mortality rate (maximum 1) is the summation of each component including background mortality (M BG ), extreme-coldness-(M EC ) and spring-frost-related (M SF ) mortalities, fire-induced mortality, and light competition-induced mortality.

Background mortality
In ORC-HL-OVD, the default calculation of mortality rate for tree PFTs was the inverse of a PFT-specific longevity parameter (30 years for tropical trees, 40 years for temperate trees, 80 years for boreal trees). An alternative calculation in ORC-HL-OVD was a dynamic mortality related to growth efficiency, inherited from LPJ (Sitch et al., 2003): where M BG is the dynamic background mortality for tree PFTs (d −1 ); k BG is the maximum background mortality rate (yr −1 ), set to 0.1 for all tree PFTs in ORC-HL-OVD; and V is vigor or growth efficiency, defined as the ratio of the net annual biomass increment to maximum LAI of the preceding year. V equals to 0 in case of net annual biomass loss. The default calculation defines a constant mortality for each PFT in all grid cells, without considering the variations in mortality of that PFT caused by adaptation to different climate conditions. The dynamic mortality formulation M BG takes into account the influence of growth efficiency on tree mortality and thus can simulate the competitiveness of tree PFTs under various climates, but it does not consider longevity differences between PFTs. In the new version, ORC-HL-NVD, the dynamic M BG formulation, Eq. (2), is again adopted, but k BG is set to different values for tropical (0.14), temperate (0.1) and boreal (0.05) tree PFTs, proportional to the inverse of their respective longevities in the original ORC-HL-OVD model code.

Tree mortality during extremely cold days
In ORC-HL-OVD, when instantaneous minimum temperature on each day (T min ) drops below a PFT-dependent threshold (T min,crit , Table 1), the corresponding tree PFT was completely eliminated. This assumption makes the vegetation distribution highly sensitive to the minimum temperature during a few extremely cold days, which varies from year to year. In reality, trees within a grid cell are unlikely to all die during a single extremely cold event and, moreover, at the resolution at which global models usually run (0.5 • or coarser), a single minimum temperature cannot depict the heterogeneity within each grid cell. Therefore, we replaced the original threshold-based LPJ equation by a linearly increasing mortality rate as a function of daily minimum temperature, such that when T min < T min,crit : where M EC is mortality caused by extreme coldness in winter (d −1 ); k EC = 0.04, estimated by trial and error based on the return frequency of below-threshold T min both within and between years according to the CRU-NCEP climate forcing. The PFT-specific T min,crit (Table 1) confines the distribution of each tree PFT to their adaptable temperature zones. Boreal needleleaf deciduous trees (PFT 9) have no T min,crit value, meaning that they are insensitive to extreme coldness and thus can prevail over other boreal tree PFTs in the model in regions with extreme winters such as eastern Siberia.

Broadleaf tree mortality caused by spring frost
Broadleaf species have the specific property of being vulnerable to freezing events that occur after the spring leafout. Spring frost can cause damage to leaf buds, developing shoots and flowers, leading to reproductive failure and reduced peak growing-season leaf area index. These effects may result in a natural selection of species with a higher frost resistance and affect species distribution in the long term (Augspurger, 2009). Kollas et al. (2013) found that minimum temperature during bud-break was a better predictor of the climate space of seven broadleaf tree species in Europe than winter temperature or mean growing-season temperature.
The change of temperature variability projected by climate models (Cohen et al., 2012;Screen, 2014) may increase or alleviate the risk of spring frost damage. Warmer winters and springs and earlier leaf presence may lead to a greater exposure of midlatitude broadleaf species to spring frost events (Bokhorst et al., 2009;Gu et al., 2007), while the severity of individual cold spells may also decrease because of a faster warming of the Arctic compared to midlatitudes (Screen, 2014). DGVMs must therefore represent We added a frost damage limitation to the distribution of the two broadleaf deciduous tree PFTs (PFTs 6 and 8). After leaf-out in the model, if daily minimum temperature drops below a threshold of −3 • C ( Kollas et al., 2013), tree mortality is assumed to increase with decreasing temperature. This frost-induced mortality is multiplied by the period elapsed since leaf-out because the more time that has elapsed, the larger the mass of vulnerable foliage. Thus, during the consecutive 40 days after leaf-out, when T min < T SF,crit and t − t leaf-out < 40 days, M SF (t, T min ), the spring-frost-induced mortality for broadleaf deciduous trees in PFT 6 and PFT 8 (d −1 ), is given by where T SF,crit = −3 • C; and t leaf-out is the day of the year when leaf-out was simulated in the model.

Growing-season temperature limits to tree extension
In the version of ORCHIDEE described by Krinner et al. (2005), a warm season air temperature (T ws ) limit was set to exclude all tree PFTs from cold Arctic regions, with T ws being required to exceed 7 • C for trees to become established or be able to stay at a grid point. T ws was calculated using a linear relaxation method (a substitute for the running mean method to reduce computer memory requirement) given by: where t is the time step, 1 day; τ is the relaxation time of 60 days; and T daily is the daily mean air temperature.
In ORC-HL-OVD, used as a starting point for this study, this T ws criterion was removed. In ORC-HL-NVD, we reintroduced a growing-season temperature criterion to constrain tree extension to Arctic regions but modified the original formulation using recent results. In their global study of temperature controls on high altitude treelines, Körner et al. (2004) found a growing-season mean soil temperature of 6.7 ± 0.8 • C to be the most consistent criterion to predict treelines across different climate zones. Other predictors tested (growing-season length, thermal sums and thermal extremes) were shown to have too large amplitudes and were therefore less suitable indicators of the altitudinal treeline position (Körner et al., 2004). We assumed that the cold limits of trees at both high altitude and high latitude are similar, which is supported by the recent study of Randin et al. (2013), and thus used the Körner et al. (2004) empirical results to redefine the thermal constraint on the existence of trees (treeline) in ORCHIDEE.
Combining the same definition of growing season as Körner et al. (2004), i.e., the period during which 10 cm depth soil temperature exceeds 3.2 • C, with their linear relationship between soil temperature in the root zone and canopy air temperature, we prescribe the large-scale thermal limitation of trees in ORC-HL-NVD as follows: mean weekly air temperature during the growing season (T GS ) must exceed 7 • C, corresponding to T GS,root larger than 6.7 • C; the growing season is calculated as the period when weekly air temperature is greater than 0 • C, which corresponds closely to T root above 3.2 • C. The new T GS criterion shows more consistency with the current treeline positions than the earlier T ws criterion described by Krinner et al. (2005) (Fig. S2).

Modifying v cmax and j max
The values of the maximum rate of rubisco carboxylase (v cmax,opt ) and maximum rate of photosynthetic electron transport (j max,opt ) for each PFT were revised using the results of the ORCHIDEE parameter optimization against flux tower measurements from Kuppel (2012). Corresponding values are given in Table 1. In ORC-HL-OVD, v cmax (or j max ) is the product of v cmax,opt (or j max,opt ) and a leaf efficiency factor (e rel ), itself determined by relative leaf age (a rel ). a rel is defined as the ratio of the calculated leaf age since leaf-out considering four leaf cohorts to a PFTdependent leaf longevity (a crit in Table 1) (Krinner et al., 2005). As the value of a rel increases with time since t leaf-out , e rel increases from 0 to 1 quickly at the beginning of the growing season and then gradually decreases if a rel > 0.5 when leaves become senescent near the end of the growing season. This rule was originally implemented to simulate the influence of seasonal variation in leaf age on photosynthetic activity for all tree PFTs. However, unlike deciduous trees, temperate and boreal evergreen needleleaf trees can keep their needles for 4-6 consecutive years, or even longer for some species (Richardson et al., 2000), resulting in a rather constant leaf age. Thus, we removed the dependence of v cmax and j max on leaf age for temperate and boreal evergreen needleleaf trees (PFTs 4 and 7) in ORC-HL-NVD.

Simulation protocol
Six different runs with ORC-HL (Table 2) were performed to test the impact of the new dynamic vegetation parameterizations and parameter calibrations. Since the modifications in the vegetation dynamics module were mainly for temperate and boreal PFTs, the simulation domain is the Northern Hemisphere from 20 to 90 • N. All runs were conducted at 2 • resolution. The climate forcing files were from the 6-hourly CRU-NCEP data set (http://dods.extra.cea.fr/store/p529viov/ cruncep/V4_1901_2012/readme.htm), resampled from their  Each simulation was preceded by a spin-up from bare ground (i.e., fractional cover of PFT 1 equals to 1 everywhere). For the standard run with the new vegetation dynamics parameterizations (NEW), in spin-up, ORC-HL-NVD was forced by repeatedly using CRU-NCEP 1901-1920 climate data and constant preindustrial CO 2 concentration (285 ppm) for 250 years. Then the soil carbon submodel was driven by the previous outputs for 1000 years for the soil carbon pools to reach equilibrium; this was followed by another 50 years of ORC-HL-NVD to complete the spin-up. Each transient simulation from 1850 to 2010 was started from the last year of the spin-up, forced by historical CRU-NCEP climate and rising CO 2 concentration. No climate data were available before 1901, so for that period randomly selected years between 1901 and 1920 were used. The OLD run used the original vegetation dynamics equations from Krinner et al. (2005) in the ORC-HL version so that comparing NEW and OLD allows us to evaluate the improvements listed. The other four runs (EXP1-3, STAT) were similar to NEW except for one different setting for each run (Table 2). In EXP1, we deactivated soil freezing to test its impact on vegetation distribution. In EXP2, we used fixed CO 2 concentration at 285 ppm to test the sensitivity of vegetation distribution to rising CO 2 . In EXP3, the model spin-up was forced by the CRU-NCEP 1901-1920 average climatology instead of the 20-year cycle, in order to examine the impact of interannual climate variability on the initial PFT distribution after spin-up. In STAT runs, dynamic vegetation was deactivated and a fixed land-cover map was prescribed, in order to separate the effect of simulated versus observed PFT fractions on GPP, biomass and soil carbon. In STAT1 and STAT2, the PFT map was prescribed from ESA CCI land cover v1.1 (European Space Agency Climate Change Initiative; Bontemps et al., 2013, http://maps.elie.ucl.ac.be/CCI/viewer/index.php) and a synergetic land-cover product (SYNMAP; Jung et al., 2006), respectively.
Fires play an important role in determining vegetation patterns by preventing trees from achieving their climate potentials of height, biomass and fractional cover (Bond et al., 2005). Fire occurrence in ORC-HL is formulated using the fire model of Thonicke et al. (2001), based on litter quantity and moisture (Krinner et al., 2005). In this study, the fire module was activated in all the runs. But in a separate test, ORC-HL-NVD was run without the fire module. Compared to NEW, this simulation showed a small increase (5 %) in the total temperate and boreal forest area in the Northern Hemisphere (20-90 • N) without fire. Since a more sophisticated fire model, SPITFIRE (Thonicke et al., 2010), was implemented in the ORCHIDEE standard version (Yue et al., 2014), we compared the results of burned area simulated by the Thonicke et al. (2001) fire module (implemented in ORC-HL) with that simulated by SPITFIRE (implemented in OR-CHIDEE standard version). The similar results of the average annual burned area during 1981-2010 (2.7 Mkm 2 for the former and 2.1 Mkm 2 for the latter, in Northern Hemisphere forests) justify the use of the Thonicke et al. (2001) fire module in this study.
In this study, agriculture is excluded from all the dynamic runs in order to simulate the potential vegetation distribu- where V k,c,orig is the model-simulated fractional vegetation cover for PFT k (except C3 and C4 crops) and for grid cell c; V k,c is the fraction of PFT k for grid cell c, after postprocessing; and V crop,c is the observed fraction of cropland for grid cell c, in this study we use croplands estimated from the ESA land-cover map.
For total GPP and soil carbon stocks, since ORCHIDEE outputs the values per unit PFT which are multiplied by PFT fractions and summed up to derive the total amount, the results from dynamic runs were post-processed using the following equation (taking GPP as an example) to compare with observational data: where GPP k,c is GPP for natural PFT k and for grid cell c (g C m −2 yr −1 PFT −1 ), simulated by dynamic runs; GPP crop,c is GPP for crops (including C3 and C4) for grid cell c (g C m −2 yr −1 PFT −1 ), simulated by STAT1 (prescribed from the ESA map); GPP c is total GPP for grid cell c (g C m −2 yr −1 ), after post-processing; and n = 11, the number of natural PFTs.

Evaluation data sets
We use satellite observations of land cover translated into the PFTs of ORCHIDEE to evaluate the simulated vegetation distribution. In order to account for uncertainties in observation-based estimates, we used three different landcover maps: the ESA CCI land cover v1.1 for year 2010, GLC2000 (JRC, 2003), and ISLSCP II vegetation continuous field for 1992-1993 (DeFries and Hansen, 2009). The first two land-cover products (hereafter ESA and GLC) were converted from their original classifications (22 categories based on LCCS system) into PFT maps, using the cross-walking method of Poulter et al. (2011). The third product (hereafter VCF) provides the fractional cover of bare ground, herbaceous vegetation and forest (further split into evergreen or deciduous, and broadleaf or needleleaf), and was merged with climate zones of the Köppen-Geiger classification system to resolve to PFT classes, based on Poulter et al. (2011). For Siberia, two additional regional land-cover maps were used, the PFT map of Siberia at 1 km scale from Ottlé et al. (2013) based on the GlobCover2005 product (Bicheron et al., 2006), hereafter OSIB, and the Russian land-cover data set produced by the International Institute for Applied Systems Analysis (Schepaschenko et al., 2011), hereafter IIASA, which was converted into a PFT map using the cross-walking method of Poulter et al. (2011). Along with ESA, GLC and VCF, the five land-cover products were used to evaluate the model skill at simulating the vegetation distribution across Siberia. The PFT maps were aggregated at 2 • × 2 • , matching the resolution run by ORCHIDEE in this study. Figure 2 displays an RGB composite-color map of the vegetation fractional cover partitioned between broadleaf (including evergreen and deciduous, red), needleleaf evergreen (green), and needleleaf deciduous (blue) trees, from the five PFT maps.
Simulated GPP was evaluated using the data-derived field obtained from FLUXNET data, satellite fAPAR and gridded climate and land-cover data using a model tree ensemble (Jung et al., 2011), hereafter MTE. A recent forest carbon density map (Thurner et al., 2014) for Northern Hemisphere boreal and temperate forests (30-80 • N), derived from radar remote sensing of growing-stock volume (GSV), was used to evaluate modeled forest biomass. For soil carbon stocks, the simulated soil carbon density was compared with the Harmonized World Soil Database (HWSD; 0-1 m depth, FAO/IIASA/ISRIC/ISSCAS/JRC, 2012) and the Northern Circumpolar Soil Carbon Database (NCSCD; Hugelius et al., 2013). Since the model results for soil carbon are not fully comparable to NCSCD due to lack of peatland carbon accumulation and cryoturbation processes in ORC-HL, metrics were not applied to soil carbon for establishing a model score. All gridded observation-derived data were aggregated at 2 • × 2 • .
Apart from gridded data products based on satellite observations, independent forest inventory data at country/region level as compiled by Pan et al. (2011), including forest area and biomass, were also compared with model results.

Metrics for model evaluation
Different metrics can be used to quantify the agreement between model results and observations, including Pearson correlation, model-to-data deviation, mean error, and root mean square error (see Kelley et al., 2013;Cadule et al., 2010). However, most of these metrics do not consider observational uncertainty. When there are multiple observations available and no particular data set can be proved to be more accurate than others, which is the case for land cover, the choice of an observational data set for model evaluation may have a large influence on the model performance score. In order to quantify the agreement between simulated and observed fields, as well as to integrate the uncertainty of observations, a metric normalized by observational uncertainty (skill, S) was defined to evaluate model performances in terms of PFT fractional cover, GPP and forest biomass. For the following equations, M refers to the model results and O to observational data.

Metrics for PFT fractional abundance evaluation
For PFT fractions, a beta diversity metric (β) was used to calculate the disagreement between two different PFT maps, defined as the Euclidian distance of PFT classes (Poulter et al., 2011;Ottlé et al., 2013). For every grid cell c, beta diversity between model and observational data set i (β c,M_Oi ) was calculated as where V k,c,M is fractional abundance for PFT k and for grid cell c, simulated by model; V k,c,Oi is fractional abundance for PFT k and for grid cell c, from observational data set i; and n = 11 is the number of natural PFTs. Similarly, the disagreement between two observations was quantified using β c,Oi_Oj , as where V k,c,Oi and V k,c,Oj are fractional abundances from different observations i and j separately. β is bound to the interval [0, √ 2], with higher values representing larger discrepancies between two PFT maps. To take into consideration uncertainties of the different satellite landcover products (Sect. 3.2), we use the mean of the model versus all data sets normalized by the mean of all combinations between different data sets. In order to derive a bounded score, with higher values representing better model performance, the metric for the model skill at simulating vegetation distribution in every grid cell (S V,c ) was defined as where P is the number of all combinations between different data sets; Q is the number of data sets. If S V,c > 1 for both models, indicating that the observation-based estimates have too large uncertainties to be qualified for model evaluation, this grid cell c is left out. The S V,c of each grid cell was averaged over the Northern Hemisphere (20-90 • N) to get an overall score (S V ). In the calculation of S V , grid cells where mean β c,O_O is higher than mean β c,M_O for both models (S V,c > 1) were excluded, because in these pixels the uncertainties in the observational data are too large to qualify them for model evaluation -the choice of data set might significantly alter the model evaluation result. Grid cells where both model and data sets have 100 % bare ground (Sahara and Greenland) and grid cells with a crop fraction higher than 0.5 were masked out (18 % of the total number of land points in that part of the Northern Hemisphere included in the study). The same rules were also applied to the calculation of regional average β c,M_O and β c,O_O .
To analyze the improvement of NEW over OLD for different PFTs, a dissimilarity index (D) was also calculated for groups of PFTs: broadleaf evergreen (PFTs 2 and 5), broadleaf deciduous (PFTs 3, 6 and 8), needleleaf evergreen (PFTs 4 and 7), needleleaf deciduous (PFT 9), and total tree and grass (PFTs 10 and 11). For each PFT group and grid cell c, D group,c was defined as the absolute bias in fractional cover between two maps: where V group,c,M is fractional abundance for PFT group and for grid cell c, simulated by the model; and V group,c,Oi and V group,c,Oj are fractional abundances from different observations i and j separately. The average D group,M_O and D group,O_O were calculated over the studied region, in which the grid cells where the corresponding group does not exist in any of the models or observations were excluded. In practice, we set a threshold of 0.01 to determine the existence of each group. We did not use the β equation here after regrouping PFTs (e.g., needleleaf deciduous versus non-needleleaf deciduous, so that there are only two PFTs in the β equation), because in that case the average β group,M_O (or β group,O_O ) for the Northern Hemisphere (20-90 • N) would be too optimistic, considering that many of the pixels will be equal to 0 due to the limited distribution range of the corresponding group. A detailed justification for the use of β and D can be found in the Supplement.
Geosci. Model Dev., 8, 2263-2283 Figure 3. Beta diversity (β) between the three observational data sets (ESA, GLC and VCF) (left panel), and mean dissimilarity index (D) among them for different PFT groups (right panel). β ranges from 0 to 1.4, and D ranges from 0 to 1, both with higher values representing larger disagreement. Figure 3 shows the spatial pattern of β between the three observational data sets (ESA, GLC and VCF) and mean D among them for different PFT groups. The β values between different data sets show a higher agreement for ESA versus GLC (an average β of 0.25) and lower agreement for VCF versus ESA or GLC (average β of 0.37 and 0.35 respectively). ESA and GLC legends are based on the FAO Land Cover Classification System (LCCS), while in VCF the original 1 km continuous field data (DeFries et al., 2000), in which the forest fractional area is given for each grid cell instead of a discrete classification scheme, was aggregated to 0.5 • resolution for ISLSCP II under the guidance of IGBP (International Geosphere-Biosphere Programme), by DeFries and Hansen (2009). LCCS uses a low threshold (15 %) of tree cover for forest definition, whereas IGBP uses a threshold of 60 % (Poulter et al., 2011), resulting in relatively lower tree cover in VCF than in either ESA or GLC land-cover maps.
For the PFT groups, higher D values were found for grassland, indicating significant uncertainty in observed grassland fractions. The difference may come from uncertainties in the remotely sensed land-cover products, as well as from uncertainty in the reclassification of land-cover classes into PFT categories. The overlap of broadly defined arid-land classifications (i.e., grassland, shrubland, barren) of land-cover products can introduce errors in partitioning between trees, grass and bare land, in deserts and tundra regions (Poulter et al., 2011).

Metrics for GPP and forest biomass evaluation
GPP and forest biomass were evaluated using gridded observational data containing uncertainty estimates. The metric for model performances was defined as where S G,c /S B,c is model skill at simulating GPP or forest biomass for grid cell c; X c,M is GPP or forest biomass for grid cell c, simulated by model; X c,O is GPP or forest biomass for grid cell c, from observation; and σ O is the standard deviation of the observation.
In grid cells where |X c,M − X c,O | < σ O , indicating a model-data difference within the uncertainty of the observational data, S G,c or S B,c is set to 1. The S G,c or S B,c of each grid cell were averaged over the Northern Hemisphere to get an overall score (S G or S B ).

Northern Hemisphere vegetation distribution
The present-day vegetation distributions simulated by OLD and NEW are shown in Fig. 4 as RGB composite color maps as for Fig. 2. Fractional covers for each PFT are shown in Fig. S3. Compared with OLD, NEW introduces two major improvements to the results. First, the tree distribution in cold subarctic regions has a northern boundary consistent with observations, mostly due to the introduction of a growing season temperature constraint (Sect. 2.2.2). Second, the observed dominance of needleleaf evergreen trees over broadleaf deciduous trees in northern Europe and North America is reproduced by NEW and not by OLD, an improvement mainly due to the introduction of the spring frost limitation for broadleaf deciduous trees (Eq. 4) and the removal of the v cmax (and j max ) leaf-age dependency for evergreen needleleaf trees (Sect. 2.2.3). Figure 5 displays the spatial pattern of β index for OLD, NEW and different satellite land-cover products. Compared with OLD, the NEW results significantly reduce β in the boreal forests of Canada, western Siberia and northern Europe, consistent with results shown in Fig. 4. The disagreement is also reduced in pan-arctic tundra regions, after correction of the unrealistically high fraction of trees in these regions originally present in OLD. The average β over the Northern Hemisphere land surface (20-90 • N, excluding bare ground and agricultural grid cells) for NEW versus ESA, GLC and VCF are 0.56, 0.48 and 0.47 respectively, equivalent to a 3.5, 13 and 28 % reduction (i.e., improvement) compared with OLD. The large variation of β for different observations shows the importance of accounting for uncertainty in observation-based estimates of land cover in DGVM evaluations, because the arbitrary choice of a specific land-cover product may result in quite different scores.
Accounting for uncertainty in observed PFT distributions, the model skill at simulating the vegetation distribution (S V ) is shown in Fig. 6 for OLD and NEW. The average S V for the major Northern Hemisphere forested countries or regions are listed in Table 3, showing improvement in all countries/regions. Larger improvements of NEW over OLD are found in European Russia (42 %), Asian Russia (29 %) and Canada (33 %). The overall S V for the Northern Hemisphere is 0.72 in NEW compared to 0.61 in OLD, equivalent to 18 % improvement. In OLD, 13 % of the land grid cells have a β c,M_O value of less than the uncertainty between different satellite products (β c,O_O ); in NEW, this fraction increases to 27 %. The resolution dependency of S V was tested by conducting two additional runs similar to OLD and NEW except for a 1 • × 1 • resolution (Fig. S4 in Supplement). These showed robust results for S V .
The forest areas simulated by the dynamic simulations and estimated from the land-cover products were aggregated to country level and compared with independent forest area from national forest inventories (Pan et al., 2011) (Table 4). In OLD, forest areas are systematically overestimated, especially for Asian Russia and Canada. The bias is decreased in NEW, for which most of the differences are less than 30 % except for an overestimation in Canada (50 %). This overestimation is, however, within the differences between the three land-cover products and the forest inventory data at country scale (Table 4). Forest areas estimated by VCF are systematically lower than inventory data, due to the difference in forest definition mentioned previously. The largest underestimation of VCF occurs in Asian Russia, where the vast taigatundra transition zones with relatively sparse trees make the definition-related biases more prominent.

Distribution of specific groups of plant functional types
For the different PFT groups described in Sect. 3.3.1, the Northern Hemisphere average dissimilarity index (D) is plotted in Fig. 7 for OLD and NEW versus observations, as well as between different observational data sets. For the broadleaf evergreen group, D is small for both OLD and NEW, and similar to the uncertainty in the data, because the broadleaf evergreen fraction is smaller than other tree PFT groups in temperate and cold zones. For the broadleaf deciduous, needleleaf evergreen and needleleaf deciduous groups, the average D for NEW versus the three data sets is reduced (i.e., improved) by 53, 13 and 67 %, respectively, compared with OLD. The OLD overestimation of broadleaf deciduous area in Canada, Scandinavia and European Russia is corrected in NEW (Fig. 8b). The large underestimation of needleleaf evergreen in OLD is partly corrected in NEW, but a significant underestimation of the needleleaf evergreen coverage still exists in southern Siberia and western Canada (Fig. 8c). For needleleaf deciduous, the unrealistically high fractions in subarctic regions in OLD are cor-  rected in NEW, but needleleaf deciduous fractions in southern Siberia and Canada are still higher than observations, at the cost of needleleaf evergreen (Fig. 8d). A strong disagreement between simulated and observed grassland fractions persists in NEW (average D of 0.35), but the data-data comparison also shows significant discrepancy (average D of 0.19) (Figs. 7, 3). Since there are no specific shrubland and tundra PFTs in ORCHIDEE, the NEW simulation has high fractions of C3 grass (PFT 10) in both arid and cold areas, including subarctic regions, the western USA and the middle of Eurasia (Fig. 8e). The average D for the grass fraction between OLD and observed land-cover maps is 0.27, lower than NEW, because the overestimations of tree cover in OLD decrease the distribution ranges of grassland, leading to a relatively higher agreement with observations for grassland cover than NEW. Table 4. Forest areas (Mkm 2 ) for different countries/regions simulated by models (OLD and NEW) and estimated from land-cover products (ESA, GLC, VCF), in comparison with those from Pan et al. (2011). The relative differences compared to Pan et al. (2011)   . Dissimilarity index (D, Eq. 11) for fractional cover of PFT groups including total tree, grass, and four tree subtypes between model (OLD, blue, and NEW, red) and observations, and between different observations (black), averaged over Northern Hemisphere (20-90 • N). D ranges from 0 to1, with higher values representing larger disagreement.

Case study for Siberia, using regional land-cover data sets
For Siberia, the same metrics were calculated based on five observational data sets (ESA, GLC, VCF, OSIB and IIASA). As shown in Fig. 9a, the average β for NEW versus all data sets is significantly reduced compared to OLD along all longitudes, with a larger reduction (improvement) in central Siberia and the most eastern part of Russia. The average values of β over Siberia for NEW versus ESA, GLC, VCF, OSIB and IIASA are 0.59, 0.46, 0.38, 0.35 and 0.41, respectively, equivalent to 0, 10, 51, 45 and 26 % reduction compared with OLD, respectively. The average β between different data sets is 0.37, with larger β between ESA and VCF (0.50) and between GLC and VCF (0.47), and smaller β for GLC and IIASA (0.23), VCF and OSIB (0.28), and ESA and GLC (0.29). OSIB and VCF both have lower fractions of tree PFTs than the other three maps. In particular, the needleleaf deciduous fractions in OSIB and VCF for the densest forest areas are less than 0.65, while other maps can reach 0.85. The model skill (S V ) that integrates observational uncertainty for Siberia is shown in Fig. 9b (OLD) and 9C (NEW). The average S V for Siberia is 0.87 in NEW compared to 0.65 in OLD, equivalent to 32 % improvement. In OLD, 11 % of the Siberian grid cells have a β c,M_O value of less than the uncertainty between different satellite products (β c,O_O ); in NEW, this fraction increases to 40 %.

Gross primary productivity
The latitudinal pattern of annual GPP averaged for 1999-2008 from OLD and NEW is shown in Fig. 10, compared with STAT1 and STAT2 (prescribing ESA and SYNMAP land cover) and with the data-driven MTE GPP (Jung et al., 2011). For total GPP in the Northern Hemisphere (20-90 • N), the 10-year average annual GPP simulated by NEW is 45.4 P g yr −1 , close to OLD (42.6 P g yr −1 ) and MTE (42.2 ± 2.4 P g yr −1 ). As for the static runs, total GPP in STAT1 is 35.9 P g yr −1 , smaller than MTE. Since MTE by Geosci. Model Dev., 8, 2263-2283, 2015 www.geosci-model-dev.net/8/2263/2015/  Jung et al. (2011) was based on SYNMAP land-cover data (Jung et al., 2006) to describe the vegetation at FLUXNET sites, STAT2 has a GPP (42.3 P g yr −1 ) closer to MTE. The difference between STAT1 and STAT2 shows that the choice of land-cover map makes a strong impact on modeled GPP. Compared with ESA, SYNMAP has a larger forest area (29 versus 22 Mkm 2 ) and similar grassland area (∼ 11 Mkm 2 ) for the Northern Hemisphere, explaining its larger GPP.
The spatial patterns of GPP simulated by OLD and NEW are similar (Figs. 10, 11a). Compared with MTE, both NEW and OLD overestimate GPP in eastern USA, western Europe and southern Asia and underestimate GPP in middle and eastern Siberia (Fig. 11a), indicating that the similarity in total Northern Hemisphere GPP between NEW and MTE masks compensating regional biases. The STAT1 and STAT2 runs produce very similar patterns of GPP to those from NEW (not shown), suggesting that the regional bias of GPP in ORCHIDEE is not related to the modeled PFT distribution, but to other non-modeled factors such as nitrogen interactions.
The model skill at simulating annual GPP (S G ) averaged over different countries is given in Table 3. The average S G for the Northern Hemisphere in OLD and NEW are similar (∼ 0.6). The improvement in vegetation distribution in NEW does not lead to a significant improvement of GPP, probably because simulated GPP in the same grid cells for high latitudes has only a weak dependence on the modeled PFT. For example, in Canada and northern Europe needleleaf ev-ergreen trees (PFT 7) are dominant in NEW but broadleaf deciduous trees (PFT 8) are dominant in OLD, the GPP differences between these two PFTs are less than 1.5 g C m −2 yr −1 per PFT (or 25 %), explaining why different modeled PFT fractions in this region do not result into large differences in GPP. This result means that GPP is not a discriminant variable for evaluating the performance of a vegetation dynamics module at high latitudes.

Forest biomass
The country-level forest biomass (above-and belowground) simulated by OLD, NEW and the two static runs with prescribed PFT maps were compared with forest inventory data from Pan et al. (2011) (Table 5). The satellite-based spatially explicit forest biomass estimates from Thurner et al. (2014) over temperate and boreal forests at 30-80 • N were also aggregated to country level, showing generally good agreement with the data from Pan et al. The results in NEW are lower than the inventory for all countries, with the largest underestimation by 61 % in Asian Russia. OLD gives a higher total forest biomass in Asian Russia, but the biomass density of OLD and NEW are similar (∼ 2.4 kg C m −2 forest) and both lower than Pan et al. (4.1 kg C m −2 forest). The large overestimation of biomass in Canada by OLD is reduced in NEW, due to both reductions in forest area (Table 4, Table 5. Forest biomass (Pg C) for different countries/regions simulated by models (OLD, NEW and two static runs) and estimated from Thurner et al. (2014), in comparison with that from Pan et al. (2011). STAT1 and STAT2 prescribe different PFT maps, ESA and SYNMAP. The relative differences compared to Pan et al. (2011) are given in parentheses.  (Table 4), the small underestimation (6 %) in total biomass results from a negative bias in biomass density simulation in the model. It is notable, however, that the biomass density in Canada estimated by Thurner et al. (3.7 kg C m −2 forest) is also significantly lower than that given by Pan et al. (2011) (6.1 kg C m −2 forest). In order to separate the bias of simulated biomass density from the bias of modeled tree cover, the spatial distributions  Figure 10. Latitudinal mean annual GPP (2 • bands) during 1999-2008 from OLD (blue) and NEW (red), compared with that from STAT (static run in which ORC-VD is deactivated, green dashed lines) and MTE (Jung et al., 2011;black). In STAT1 and STAT2, the PFT map is prescribed from ESA and SYNMAP, respectively. of forest biomass per unit forest area (kg C m −2 forest) simulated by OLD and NEW are shown in Fig. 11b and compared with the satellite-based estimates by Thurner et al. (2014). The original overestimation in eastern Canada, northern Europe and European Russia by OLD is improved in NEW, although underestimation in western Canada and Siberia still exists in NEW. Biomass at equilibrium is positively correlated with both NPP (net primary production) and turnover time of carbon in biomass pools. Natural disturbances and forest management can thus lower biomass by reducing the turnover time (Jandl et al., 2007;Litton et al., 2004). Since older forests store more biomass carbon than younger forests Luyssaert et al., 2008), managed and frequently burned forests may not be able to reach their climatedependent maximum biomass.

Asian Russia
In order to diagnose the possible causes of the biomass deviation from data, the ratio of forest biomass from NEW to that from Thurner et al., as well as the ratio of forest NPP (average during 2001-2010) from NEW to MODIS-NPP (NTSG), is plotted in Fig. S6. In eastern Canada, forest biomass is overestimated by NEW, while NPP is close to MODIS NPP, indicating an overestimation of biomass car- bon turnover time in ORCHIDEE compared to reality. In western Canada and southern Siberia, the underestimation of biomass is attributable to underestimation of NPP.
The model skill at simulating forest biomass (S B ) averaged over different countries is given in Table 3. S B is improved in NEW for all countries compared to OLD, with the largest improvement found in Canada (66 %). The overall S B for 30-80 • N is 0.59 in NEW, compared to 0.46 in OLD, equivalent to 28 % improvement.

Soil carbon
The spatial patterns of soil carbon density simulated by OLD and NEW (0-2 m depth) are shown in Fig. 11c, compared with those from HWSD (0-1 m depth; FAO/IIASA/ISRIC/ISSCAS/JRC, 2012) and NCSCD (0-1 m depth; Hugelius et al., 2013). Over the grid cells present in NCSCD, the total soil carbon is 285 Pg in HWSD, markedly lower than that in NCSCD (460 Pg C for the upper meter of soil), indicating large uncertainties in the empirical soil carbon data. Since the ORC-HL in this study does not include processes of peatland and wetland carbon accumulation, whereas in NCSCD the peat deposits contain about 30 % of the total soil organic carbon mass in the upper meter (Tarnocai et al., 2009) and wetland carbon stock is estimated to account for 20 % of the total 1 m deep soil organic carbon pool in Russia (Schepaschenko et al., 2013), the model results are not fully comparable to NCSCD. The spatial patterns of soil carbon from OLD and NEW are similar (Fig. 11c). Over the grid cells present in NCSCD, the total soil carbon simulated by OLD and NEW is 263 and 283 Pg C, respectively.
A comparison of soil carbon simulations from several land-surface models coupled with climate models in CMIP5 (Todd-Brown et al., 2013) suggested that most models cannot reproduce grid-scale variation in soil carbon and that the substantial disagreement between the HWSD and NCSCD data sets, and their lack of quantitative uncertainty estimates, limit their ability for benchmarking land carbon models.

Soil freezing
The area of seasonally frozen ground covers 50 % of the Northern Hemisphere land, or 48 Mkm 2 (Zhang et al., 2003). Soil freezing limits plant access to soil moisture and thus impacts the simulated PFT distribution through a set of complex interactions between productivity, tree-grass competition, and soil water limitations. In permafrost regions, the limitation of growing-season water availability due to soil freeze-thaw processes was shown to substantially contribute to the low vegetation carbon densities (Beer et al., 2007). In ORCHIDEE, a soil heat diffusion equation with latent heat (Gouttevin et al., 2012) is solved for each soil layer that impacts soil temperature and liquid water content. In this study, we tested the effects of soil freezing on the vegetation distribution by comparing NEW and EXP1 in which soil freezing processes were not activated (all other parameters being the same). In EXP1, soil temperature can drop below 0 • , but liquid water continues to be available in the root zone irrespective of soil temperature conditions. Figure 12 shows the difference in tree fraction and in water availability (WA) during the growing season (May-September) between NEW and EXP1. In the model, soil moisture available to plants is defined by WA, the relative soil moisture in the root zone, a b difference in tree fractional cover (NEW-EXP1) difference in water availability (NEW-EXP1) Figure 12. Difference of tree fractional cover (a) and water availability (WA) (b) between with and without soil freezing (NEW-EXP1). WA is averaged over the growing season (May-September) and over tree PFTs (PFTs 2-9) weighted by their fractions.
weighted by PFT-specific root profiles. A value of WA = 0 defines the wilting point, and WA = 1 the field capacity. A stress factor is applied to stomatal conductance and canopy photosynthesis if WA drops below a critical value of 0.4, and this stress factor increases linearly for 0 < WA ≤ 0.4 (Krinner et al., 2005). When soil freezes in autumn and winter, the amount of liquid water in the root zone is reduced as water is immobilized as ice in soil pores. In the growing season, WA in NEW is also lower than that in EXP1 (Fig. 12b). This is consistent with previous results of model validation at site scale (Gouttevin et al., 2012) in which the upper layer (0-20 cm) soil moisture in summer was found to be more depleted if the soil freezing module was activated. In regions underlain by permafrost, there is a spring peak in runoff originating from meltwater which does not infiltrate into frozen soils (Gouttevin et al., 2012). If soil freezing is not modeled as in EXP1, meltwater will infiltrate into soil, leading to overestimated soil water content in the growing season. The reduction of tree fraction in the presence of freezing occurs where there is significant reduction of WA (Fig. 12a). In areas with a small reduction (less than 0.1) in WA, however, there is a slight increase in tree fraction. The tree fraction in the model equals to population density multiplied by individual crown area. On the one hand, as WA decreases, GPP, LAI and crown area are smaller; yet, on the other hand, reduced LAI leads to increased available space for establishment, resulting in a subsequent increase in population density, compensating for the loss of crown area. Therefore, reductions in WA may lead to inconsistent changes in tree fraction, depending on their relative effects on crown area and population density.

Changing CO 2 since 1850
Terrestrial plants respond to elevated atmospheric CO 2 concentration by increasing assimilation rate and reducing diffu-b a c difference in tree fractional cover (NEW-EXP2) difference in tree NPP (gC m -2 d -1 ) (NEW-EXP2) difference in water availability (NEW-EXP2) Figure 13. Difference of tree fractional cover (a), tree NPP (g C m −2 d −1 ) (b) and WA (c) between with and without CO 2 rising (NEW-EXP2). NPP is averaged over tree PFTs (PFTs 2-9) weighted by their fractions. WA is averaged over the growing season (May-September) and over tree PFTs (PFTs 2-9) weighted by their fractions.
sive stomatal conductance (Lammertsma et al., 2011), both processes are included in ORCHIDEE (Krinner et al., 2005). Under elevated CO 2 concentration, the enhanced photosynthetic capacity and thus increased NPP of forest (Norby et al., 2005;Hickler et al., 2008) leads to higher growth efficiency of trees and thus higher tree fractional coverage. In the model, tree PFTs are superior to grass PFTs in terms of light competition, i.e., when trees expand, grass PFTs will give way to trees. Therefore, tree cover is expected to increase at the cost of grasslands under elevated CO 2 . Here we conducted a sensitivity test (EXP2) with fixed preindustrial CO 2 concentration (285 ppm). Compared with EXP2, the simulation NEW forced by historical CO 2 concentration produces higher tree fractions (Fig. 13a) by 2010, the spatial pattern of which mirrors the pattern of tree NPP increase (Fig. 13b) in the model. In NEW, total temperate and boreal forest area in the studied region (20-90 • N) are modeled to increase by 2.6 Mkm 2 (11.5 %) from 1850 to 2010. In EXP2 the increase is only 1.1 Mkm 2 (4.8 %) indicating that about 58 % of the increase in forest area is attributable to the historical increase of CO 2 , the rest being attributable to climate warming (longer growing seasons) and changes in rainfall.
Since the processes of CO 2 uptake by photosynthesis and water loss by transpiration are tightly coupled, increasing CO 2 concentration results in increased water use efficiency (Lammertsma et al., 2011;O'ishi et al., 2009). Figure 13c displays the difference of WA for trees between NEW and EXP2. Compared to the fixed-CO 2 simulation, NEW pro-Geosci. Model Dev., 8, 2263-2283, 2015 www.geosci-model-dev.net/8/2263/2015/ duces higher WA by ∼ 5 % in mesic regions such as Europe, western Siberia and the eastern part of North America, and similar WA in drier regions such as middle and eastern Siberia and the western part of North America.

Effects of the return frequency and severity of extreme cold events during the spin-up
As mentioned in Sect. 2.2.1, the distribution range of tree PFTs in ORCHIDEE is influenced by extremely cold days in winter, which vary from year to year. When the PFTdependent threshold T min,crit (Table 1) is applied (Eq. 3), this mechanism results in a considerable difference in modeled tree fraction between the results of a spin-up forced by cycling multi-year climatic data versus an average climatology.
In EXP3, the model spin-up used the average climatology of the period 1901-1920 from CRU-NCEP and was compared with NEW where interannually variable climate from years 1901-1920 was repeated in a loop. The minimum temperature in winter (T min ) derived from the climatology is significantly higher than T min considering the 20 individual years (Fig. 14a). Since the intra-annual variations among different years are not synchronous, a low temperature of a day in 1 year is offset by a higher temperature of the same day during another year; this leads to a milder climate in the climatology. The vegetation distributions after spin-up are very different between NEW and EXP3, as shown in Fig. 14. In EXP3, temperate trees (PFTs 4-6) can extend northward, taking up the boreal tree positions, while the distribution of boreal needleleaf evergreen (PFT 7) and broadleaf deciduous (PFT 8) trees is squeezed to the climatic range of needleleaf deciduous trees (PFT 9). Compared with the initial state after spin-up in NEW, total forest area in the studied region (20-90 • N) in EXP3 increases by 5.1 Mkm 2 (22 %), among which PFTs 4-6 increase by 2.7 Mkm 2 , PFTs 7 and 8 increase by 6.3 Mkm 2 , and PFT 9 decreases by 3.9 Mkm 2 . Apart from average climatology, recycled single-year climate is occasionally used in spin-up phase, which can also lead to large variance in initial vegetation distribution after spin-up due to interannual climate variability. Figure 14d shows the considerable difference in the fraction of PFTs 7 and 8 between two spin-ups forced by 2 different single years arbitrarily chosen (1914 and 1901). Similar results were obtained when three sets of forcings (1-year, climatological mean, and cycling of the whole period 1960-1999) were used in the spin-up process of CLM-DGVM (Li et al., 2011). Since climatology or recycled 1-year climatic data are sometimes used in the spinup of land-surface models, it is notable that this may bias DGVMs to produce unrealistic or unstable results if vegetation distribution is sensitive to extreme temperatures in the model. Thus, it is more appropriate to cycle multi-year climatic data to force DGVMs in a spin-up.  Figure 14. (a) minimum temperature (T min ) isotherms calculated from the 20-year average climatology (red lines) and the mean of the 20 T min for each year (green lines). The T min values are labeled on the lines, corresponding to the PFT-dependent T min,crit for temperate and boreal trees (see Table 1). (b, c) difference of the vegetation fractional cover for the last year of spin-up between EXP3 (using 20-year climatology as forcing file in spin-up) and NEW for temperate trees (PFTs 4-6) (b) and boreal broadleaf deciduous/needleleaf evergreen trees (PFTs 7-8) (c). (d) difference in fraction of PFTs 7-8 between spin-up results forced by climatic data of 2 different single years (1914 and 1901).

Conclusions
This study has presented an improved parameterization and a calibration of Northern Hemisphere vegetation dynamics in the ORCHIDEE process-based ecosystem model, based on a version that includes frozen soil moisture and its impacts on plant productivity. Keeping the original model's concept of plant functional types, we modified the processes that influence tree existence, mortality and competition. We are aware that most of these modifications are based on empirical bioclimatic constraints; if a more mechanistic understanding of mortality could be determined in the future, especially regarding physiological pathways in cold temperatures, these bioclimatic constraints might be replaced by sufficient physiological processes to enable the model to more realistically simulate vegetation distribution. A new performance metric applicable for DGVM evaluation in terms of vegetation fractional cover was used to evaluate ORCHIDEE, which integrates uncertainties in different land-cover maps. The new version of the ORCHIDEE vegetation dynamics module shows marked improvement in the simulated PFT distribution compared to the previous version. A more realistic simulation of the northern tree limit is obtained, as well as of the distribution of evergreen and deciduous conifers in the boreal zone. The model still overestimates grass fraction in dry regions of central Asia and western North America, possibly because of the lack of a specific shrubland PFT. Grass fraction was also overestimated in the Arctic tundra. Considering the large coverage of shrubland and tundra in northern middle and high latitudes, a proper representation of shrub and tundra plant functional types in DGVMs, as well as their biophysical and biogeochemical processes, should be a priority for future development. The better PFT distribution results in improvements in simulated forest biomass, while significant regional biases still remain for GPP, forest biomass and soil carbon distributions, indicating other structural biases in the carbon cycle parameterizations in the model. Incorporating the PFT trait variation into DGVMs, which allows for more variation in vegetation responses to climate in the model than fixed traits, might be an interesting future development to improve the modeled vegetation dynamics and carbon cycle.
The Supplement related to this article is available online at doi:10.5194/gmd-8-2263-2015-supplement.