Modelling the role of fires in the terrestrial carbon balance by incorporating SPITFIRE into the global vegetation model ORCHIDEE – Part 1: simulating historical global burned area and fire regimes

Abstract. Fire is an important global ecological process that influences the distribution of biomes, with consequences for carbon, water, and energy budgets. Therefore it is impossible to appropriately model the history and future of the terrestrial ecosystems and the climate system without including fire. This study incorporates the process-based prognostic fire module SPITFIRE into the global vegetation model ORCHIDEE, which was then used to simulate burned area over the 20th century. Special attention was paid to the evaluation of other fire regime indicators such as seasonality, fire size and fire length, next to burned area. For 2001–2006, the simulated global spatial extent of fire agrees well with that given by satellite-derived burned area data sets (L3JRC, GLOBCARBON, GFED3.1), and 76–92% of the global burned area is simulated as collocated between the model and observation, depending on which data set is used for comparison. The simulated global mean annual burned area is 346 Mha yr−1, which falls within the range of 287–384 Mha yr−1 as given by the three observation data sets; and is close to the 344 Mha yr−1 by the GFED3.1 data when crop fires are excluded. The simulated long-term trend and variation of burned area agree best with the observation data in regions where fire is mainly driven by climate variation, such as boreal Russia (1930–2009), along with Canada and US Alaska (1950–2009). At the global scale, the simulated decadal fire variation over the 20th century is only in moderate agreement with the historical reconstruction, possibly because of the uncertainties of past estimates, and because land-use change fires and fire suppression are not explicitly included in the model. Over the globe, the size of large fires (the 95th quantile fire size) is underestimated by the model for the regions of high fire frequency, compared with fire patch data as reconstructed from MODIS 500 m burned area data. Two case studies of fire size distribution in Canada and US Alaska, and southern Africa indicate that both number and size of large fires are underestimated, which could be related with short fire patch length and low daily fire size. Future efforts should be directed towards building consistent spatial observation data sets for key parameters of the model in order to constrain the model error at each key step of the fire modelling.


Introduction
Fire is an important process in the Earth system, that existed long before the large-scale appropriation of natural ecosystems by humans (Bowman et al., 2009;Daniau et al., 2013). Fires have multiple biophysical and ecological consequences, and they are also an important source of atmospheric trace gases and aerosol particles (Langmann et al., 2009;van der Werf et al., 2010). By damaging some plant types and concurrently promoting others, fires play an important role in shaping vegetation structure and function (Bond et al., 2005;Pausas and Keeley, 2009). Fire changes the surface albedo, aerodynamic roughness, and the sensible and latent heat fluxes (Liu et al., 2005;Liu and Randerson, 2008). These fire-induced ecosystem changes could further influence the surface energy budget and boundary-layer climate (Beck et al., 2011;Randerson et al., 2006;Rogers et al., 2013). In addition, gas and aerosol species emitted to the atmosphere from biomass burning modify atmospheric composition and the radiative forcing balance (Tosca et al., 2013;Ward et al., 2012). Fire aerosols also degrade air quality and cause increased health risk (Marlier et al., 2013). Thus fire process and biomass burning emissions need to be included in the Earth system models, which are often used to investigate the role of fire in past, present and future biophysical and biogeochemical processes.
The type of fire model embedded in global vegetation models has evolved from simple fire hazard models (Thonicke et al., 2001) to the current state-of-the-art process-based fire models (Andela et al., 2013;Kloster et al., 2010;Lasslop et al., 2014;Li et al., 2013;Pfeiffer et al., 2013;Prentice et al., 2011) and empirical models with optimisation by observation (Knorr et al., 2014;Le Page et al., 2014). The majority of fire models explicitly simulate ignitions from natural and human sources, fire propagation, fuel combustion and vegetation mortality, ideally at daily or even finer time steps. Recently, Pfeiffer et al. (2013) have incorporated multi-day burning, "coalescence" of fires and interannual lightning variability in the LPJ-LMfire model. Li et al. (2013) incorporated social and economic factors in the human ignition scheme. Lasslop et al. (2014) investigated the model sensitivity to the climate forcing and a number of spatial parameters. Evaluation of fire models in these studies at the global scale has mainly focused on models' capability in broadly reproducing the large-scale distribution of fire activity during the past decade as revealed by satellite observations. Less attention was paid to the simulation of long-term historical fire trend and variation, and fire regimes, which may include the number, size and intensity of fires -essential variables governing fire-climate-vegetation feedbacks (Archibald et al., 2013;Barrett et al., 2011;Hoffmann et al., 2012).
It is well known that across all fire-prone ecosystems, the magnitude and trend of burned area depend strongly on large fire events that represent only a low fraction in total number of fires (Kasischke and Turetsky, 2006;Keeley et al., 1999;Stocks et al., 2002). These large fires have profound impacts on landscape heterogeneity (Schoennagel et al., 2008;Turner et al., 1994), biological diversity (Burton et al., 2008) and may also induce a higher rate of carbon emissions compared with small fires (Kasischke and Hoy, 2012). Besides, the social and economic consequences of extreme large fires are more severe (Richardson et al., 2012). In some ecosystems, past climate warming was shown to have increased the occurrence of large fires Westerling et al., 2006), and fire regimes were projected to even further deviate from historical range given the future climate change (Pueyo, 2007;Westerling et al., 2011). Given the importance of these large fires, it is essential that we should evaluate the ability of fire models to simulate their occurrence.
In this study, we have incorporated the SPITFIRE fire model (Thonicke et al., 2010) into the global land surface model ORCHIDEE (Krinner et al., 2005). This allowed us to simulate global fire activity during the 20th century, and to perform an in-depth model evaluation. In present study, we focus on evaluating the ORCHIDEE-SPITFIRE model performance in simulating fire behaviours and regimes, including ignitions, fire spread rate, fire patch length, fire size distribution, fire season and burned area. Quantification of fire carbon emission as a component of the terrestrial carbon balance will be presented in a companion paper (Part 2). Specifically, the objectives of the present study are: (1) to evaluate simulated burned area using multiple data sets including that from satellite observation, government fire agency, and historical reconstruction over the 20th century (Sects. 3.1, 3.2, 3.3, 3.4. 3.5); (2) to compare simulated fire size distribution with observations, in order to investigate especially the model's ability to simulate large fire occurrence (Sect. 3.6); and (3) to examine potential sources of model error in order to identify future research need and potential model improvements (Sect. 4.2).

Model description
The processes and equations of the fire model SPITFIRE, as described by Thonicke et al. (2010), were implemented in the vegetation model ORCHIDEE (Krinner et al., 2005). The SPITFIRE model operates on a daily time step, consistent Geosci. Model Dev., 7, 2747-2767, 2014 www.geosci-model-dev.net/7/2747/2014/ with the STOMATE sub-module in ORCHIDEE, which simulates vegetation carbon cycle processes (photosynthates allocation, litterfall, litter and soil carbon decomposition). The major processes within SPITFIRE are briefly described below with applicable minor modifications (see Thonicke et al., 2010 for more detailed description).

Daily potential ignition
Daily potential ignition includes ignitions from lightning and human activity. Remotely sensed lightning flashes (Cecil et al., 2012) were obtained from the High Resolution Monthly Climatology of lightning flashes by the Lightning Imaging Sensor-Optical Transient Detector (LIS/OTD) (http://gcmd. nasa.gov/records/GCMD_lohrmc.html). The LIS/OTD data set provides annual mean flashes over 1995-2000 on a 0.5 • grid at monthly time step. This single annual data was repeated each year throughout the simulation. Following Prentice et al. (2011), the proportion of lightning flashes that reach the ground with sufficient energy to ignite a fire is set as 0.03. This value differs from that in the original SPIT-FIRE model as implemented in LPJ-DGVM (Thonicke et al., 2010); there a cloud-to-ground (CG) flashing ratio of 0.2 was used, followed by a further ignition efficiency of 0.04. To estimate potential ignitions by humans, the original Eqs. (3) and (4) in Thonicke et al. (2010) were modified for the purpose of unit adjustment, as below: × a(ND)/10 000 (1) where I H is the daily ignition number (1 day −1 km −2 ), PD is population density (individuals km −2 ). The parameter a(ND) (ignitions individual −1 day −1 ) represents the propensity of people to produce ignition events; and is a spatially explicit parameter as in Thonicke et al. (2010) (Supplement Fig. S1).

Daily fire numbers
Daily fire numbers are derived by scaling the potential ignitions (which include human and lightning ignitions) with the fire danger index (FDI), which is derived by comparing simulated daily fuel moisture to a plant functional type (PFT) dependent moisture of extinction. All fires with a fireline intensity less than 50 kW m −1 are assumed unable to propagate and are suppressed as stated by Thonicke et al. (2010).

Daily mean fire size
Daily mean fire size is calculated by assuming an elliptical shape of fire, with the major axis length being the product of fire spread rate and daily fire active burning time (i.e. the time that fires actively burn during that day). Fire spread rate is obtained using the Rothermel equation (Rothermel, 1972;Wilson, 1982). Fire active burning time is modelled to increase as a logistic function of daily fire danger index, with a maximum of 241 min (4 h). Fire frontline intensity is calculated following Byram (1959), as a product of fuel heat content, fuel consumption, and fire spread rate. Note that over a grid cell of 0.5 degree, the model simulates a set of homogeneous fires with all their characters (including fire size) being identical among each other, i.e. the model represents the temporal but not the spatial heterogeneity over a given grid cell.
Daily grid cell burned area is calculated as the product of fire number and mean fire size.

Fire-induced tree mortality
Fire-induced tree mortality is determined from the combined fire damage of tree crown and cambium. Crown damage depends on the fraction of tree crown that is affected by crown scorch, which further depends on tree crown length, tree height and fire flame height. Fire flame height is derived from surface fire intensity. Cambial damage depends on fire residence time and a prescribed PFT-dependent critical time.
Note that SPITFIRE simulates only crown scorch, but not active crown fires that could propagate through crown fire spread.

Fire carbon emissions
Fire carbon emissions include emissions from surface fuel and crown combustion. Surface fuels are divided into four classes (1, 10, 100 and 1000 h), whose designation in terms of hours describes the order of magnitude of time required to lose (or gain) 63 % of its difference with the equilibrium moisture content under defined atmospheric conditions (Thonicke et al., 2010). Fuel combustion completeness is simulated as a function of daily fuel moisture, with a smaller fraction of fuel being consumed at higher fuel moisture. Crown fuel consumption is related to the fraction of crown that is scorched by fire flame. The values of all PFT-dependent parameters follow Table 1 in Thonicke et al. (2010).

Further modifications made to the SPITFIRE equations
Fires in dry climate regions are limited by the availability of fuel on the ground (Krawchuk and Moritz, 2010;Prentice et al., 2011;Van der Werf et al., 2008b). This constraint is im- Figure 1. Surface-fuel combustion completeness as a function of fuel wetness in (a) original SPITFIRE model by Thonicke et al. (2010) and (b) as modified in the present study, taking the tropical forests as an example. The combustion of live grass biomass is assumed to follow that of 1 h dead fuel.
plicitly included in SPITFIRE equations, because fire occurrence is limited by a fireline intensity of 50 kW m −1 . However during model testing, we found that this threshold is not enough to limit fires in low-productivity regions (with modelled annual net primary productivity of 0-400 g C m −2 yr −1 , corresponding to an annual precipitation of 0-400 mm); and too much burned area was simulated for arid and semiarid regions (see Supplement Fig. S2). Following Arora and Boer (2005), we therefore introduced a new factor that limits the ignition efficiency, depending on the availability of ground fuel. Ignition efficiency varies linearly between zero when ground fuel is lower than 200 g C m −2 , to unity when ground fuel is above 1000 g C m −2 . Here, ground fuel includes aboveground litter and live biomass for grassland PFTs and aboveground litter only for tree PFTs. The equations for surface fuel combustion completeness given by Thonicke et al. (2010) follow Peterson and Ryan (1986), which allow combustion completeness to decrease with increasing fuel wetness and level out when fuel wetness drops below a threshold ( Fig. 1). During model testing, we found that because fuel wetness frequently approaches zero, simulated fuel combustion completeness is much higher than field experiment values as reported by van Leeuwen et al. (2014). We therefore modified the maximum combustion completeness for fuel classes of 100 and 1000 h to be the same as mean combustion completeness by van Leeuwen et al. (2014)

Input data set and the simulation protocol
The 6-hourly climate fields used to drive the model were from the CRU-NCEP data set (http://dods.extra.cea.fr/store/ p529viov/cruncep/V4_1901_2012/readme.htm). Population density for the 20th century was retrieved from the His- A three-step simulation protocol was used. For the first two steps, the atmospheric CO 2 was fixed at the preindustrial level (285 ppm) and climate forcing data of 1901-1930 were repeated in loop. The first step was a without-fire spin-up from bare ground lasting for 200 years (including a 3000-year run of soil-only processes to speed up the equilibrium of mineral soil carbon). The second step was a fire-disturbed spin-up lasting for 150 years, with fire being switched on to account for fire disturbances in the preindustrial era. Fire ignitions from human activity were included in the fire-disturbed spin-up, with the human population density being fixed at 1850 level. This procedure assumes that the model reached an equilibrium state under conditions of pre-industrial atmospheric CO 2 and climate and fire disturbance.
The third step was a transient simulation from 1850 to 2009 with increasing atmospheric CO 2 , climate change, and varying human population density. The climate data used for the transient simulation of 1850-1900 were a repeat of 1901-1910, for the sake of stability. Before entering the transient simulation, the mineral soil carbon stock was verified to vary within 0.1 % (with a slight global carbon sink of 0.13 Pg C yr −1 and a negligible annual trend of 0.003 Pg C yr −1 during the last 50 years of the fire-disturbed spin-up, excluding all crops, for which fires are not simulated). For the current simulation, the vegetation dynamics module of ORCHIDEE was turned off, i.e. the simulation used a static current-day vegetation distribution map (converted into the 13-PFT map in OR-CHIDEE based on the IGBP 1 km vegetation map, http: //webmap.ornl.gov/wcsdown/dataset.jsp?ds_id=930), and no land cover change was included. This static land cover could affect the model-observation agreement in terms of longterm trend and variation of burned area. Land cover change has double effects on burned area: fires used for land cover change contribute directly to burned area; and the indirect effect depends on fire frequencies of the land cover types before and after the land cover change.
Fires on croplands are not simulated in ORCHIDEE, even though the model has two PFTs that approximately represent C3 and C4 crops (but without realistic species-specific phenology). Magi et al. (2012) show that cropland fire seasons differ significantly from those of natural fires, warranting a special treatment of cropland fires in global fire modelling (Li et al., 2012). Cropland fires make up a rather small proportion of the total global amount in terms of both burned Geosci. Model Dev., 7, 2747-2767, 2014 www.geosci-model-dev.net/7/2747/2014/ area (4.3 % according to the GFED3.1 data set) and carbon emissions (less than 2 % according to GFED3.1), given that the "small" fires  are not formally included and recommended in the GFED data set (http://www.globalfiredata.org/data.html). Further, deforestation fires are not explicitly simulated. Evidence shows that deforestation fires occur during the "time window" when climate is dry enough to allow complete burning of deforested fuels (Van der Werf et al., 2008a). We expect the simulated daily fire danger index (i.e. an indicator of suitable climate conditions for burning) is able to partly capture this "fire climate window" that is necessary for deforestation fires to happen. Thus the model is able to implicitly account for some deforestation fire activity in tropical and subtropical forests, but not for all of them, because of the use of a static land cover map. Preliminary analysis shows that the model could capture ∼ 67 % of the deforested area by fires as given by the GFED3.1 data for closed-canopy forests in the region of 20 • S-20 • N for 2000-2005 (2.7 Mh yr −1 vs. 4.0 Mha yr −1 ) (Fig. S3), with the seasonal variation being moderately represented (Fig. S4).

Data sets used to evaluate model performance
Several data sets were prepared and used to compare simulated burned area and fire regimes with various observations.

Satellite-derived burned area data
The GFED3.1 data set provides monthly burned area data at 0.5 • resolution for 1997-2009 with global coverage . The GFED3.1 burned area was mainly generated using MODIS imagery with additional images from TRMM VIRS and ERS ATSR. The fire carbon emissions were also provided which are model simulation results by applying a modified version of the CASA model .
L3JRC data set provides daily global burned area data at 1 km resolution for April 2000 to March 2007; these data were generated from the 1 km SPOT VEGETATION satellite imagery (Tansey et al., 2008). This data set was assembled at 0.5 • resolution at monthly time step for use in the present study.
GLOBCARBON burned area data was produced from a combination of SPOT VEGETATION and ERS2-ATSR2/ENVISAT AATSR data as one of the four land products of the ESA GLOBCARBON initiative (Plummer et al., 2007). Global burned area data were provided at monthly resolution with four different spatial resolutions (1 km/10 km/0.25 • /0.5 • ) covering 1998-2007.

Historical burned area reconstruction for the 20th century
To evaluate the simulated burned area for the 20th century, historical burned area data were used. These data, which cover the period 1900-2000, were compiled by Mouillot and Field (2005) at 1 • resolution and monthly time step (hereafter referred to as the Mouillot data). The data were generated by first synthesising the burned area information from published data at national or regional scale for the periods of the 1980s or 1990s, further interpolated spatially at the global scale at 1 • resolution using the available satellitederived active fire distribution. Then national fire statistics, historical land-use practices and other fire-relevant quantitative information (such as tree ring reconstruction) were used to build the historical fire temporal trend to interpolate historical burned area.
When comparing the burned area given by GFED3.1 and the Mouillot data for their overlapping period of 1997-2000, the global total burned area given by Mouillot and Field (2005) is 52 % higher than GFED3.1 with significant regional discrepancies. As the satellite-derived data is considered to be more reliable than the national or regional statistical data at a large spatial scale, a bias correction was performed on the Mouillot data. We calculated the ratio of burned area by the Mouillot data to GFED3.1 for 1997-2009 for each region (Fig. 2b) and also the globe. This ratio was then applied to correct for each decade of the burned area data in the Mouillot data set.
Note that the regional breakdowns of the globe by GFED3.1, and Mouillot and Field (2005) are different (Fig. 2). For comparison of burned area for the 20th century, the regional breakdown by Mouillot and Field (2005) was adopted as it is based on maximum temporal stability (error consistency), a highly important factor when comparing long-term data.

Fire patch data
Alaskan and Canadian fire management agencies have maintained historical fire monitoring for a relatively long time (dating back to the 1950s). The historical fire information for the US Alaska was retrieved from the Alaska Interagency Coordination Center (AICC, http://afsmaps.blm.gov/ imf_firehistory/imf.jsp?site=firehistory). The fire information for Canada was from the Canadian Wildland Fire Information System (http://cwfis.cfs.nrcan.gc.ca/ha/nfdb). These data sets contain information on fire location, fire size (burned area), fire cause, and for some fires, the fire report and out date. Please note that in these two data sets fires with all sizes are included. Archibald et al. (2010) classified the MCD45A1 500 m MODIS fire burned area data into individual fire patches for southern Africa (Africa south to the Equator). This fire patch information includes location and patch size (with minimum fire size of 0.25 km 2 ). The fire patch data for Canada and US Alaska, and southern Africa are used to evaluate simulated fire size distribution.

Fire season length and the 95th quantile fire size
The global fire season length and the 95th quantile fire size data are provided by Archibald et al. (2013). The fire season length was quantified as the number of months required to reach 80 % of the annual burned area using GFED3.1 data. The MCD45A1 burned area product at 500 m resolution was used to derive the individual fires by applying a flood-fill algorithm, and the 95th quantile fire size in each grid cell was extracted to represent the size of large fires.

Metrics used to evaluate modelled burned area against GFED3.1 data
As the GFED3.1 data is most widely used by the fire modelling community, the model results are evaluated against GFED3.1 data for 1997-2009. Three aspects were examined: mean annual burned area, interannual variability (IAV) and seasonality in burned area. The evaluation was done for each GFED3.1 region (Fig. 2).
For the model error in terms of mean annual burned area (BA), we use the relative difference: where BA model is the simulated burned area averaged over 1997-2009, and BA GFED is GFED3.1 mean annual burned area for the same period. The similarity in IAV (S interannual ) is estimated by the correlation coefficient of the two linearly detrended annual burned area time series by model and GFED3.1 data. Finally, the seasonality similarity (S season ) is given by: where frac_model i and frac_GFED i are the fraction of burned area for the ith month relative to the annual burned area (i.e. monthly BA normalised by the annual BA). S season represents the overlapping area of the two normalised monthly BA series and indicates the fraction of burned area in temporal coincidence. The statistical significance of S season was examined by using a bootstrapping method. First, normalised monthly BA from all 14 regions by the model and GFED3.1 data were pooled together. Second, 100 000 pairs of monthly normalised BA were randomly sampled from the pooled data in order to derive a probability distribution function (PDF) of S season . Third, the single-sided probability (p value) that the calculated S season is from random distribution is obtained for each region, and a p value less than 0.05 indicates the model could moderately capture the seasonality of burning (i.e. significantly different from a random distribution). The peak fire month, which is defined as the month with maximum monthly burned area, is compared between the model and GFED3.1 data. The difference between simulated and observed peak month is quantified by the following index, after Prentice et al. (2011): where θ j is the angle between vectors representing the simulated and observed peak fire month (with January to December resembling 1 to 12 on a clock), n is the total number of grid cells and A j is the burned area by GFED3.1 data. According to Eq. (4), the value of D 2 is zero when simulated peak month is perfectly in phase with the observation, 0.5 if the timing is off by 3 months in either direction, and one (the maximum) if the timing is off by 6 months.

Concatenate simulated consecutive daily fire events into multi-day fire patches
The fire patch data for Canada and US Alaska, and southern Africa contain fires that span multiple days. In the model,  Pfeiffer et al. (2013) introduced a mechanism to allow fires to span multiple days under suitable weather conditions. Inspired by their study, we developed an approach to concatenate fires during consecutive days into "multi-day fire patches". The size of each "multi-day fire patch" is the cumulative daily fire size during its persistence time. This allows the modelled "multi-day fire patch size" to be compared to observations. For clarity, the period of multiple days that a fire spans is hereafter referred to as "fire patch length". The approach to concatenate independent daily fires into multi-day fire patches is illustrated in Fig. 3, which shows simulated daily fire number and fire size for a 0.5 • grid cell in northern Africa for the fire season of October 2001 to April 2002. Fires occur in different consecutive-day periods, i.e. when simulated FDI remains above zero, and when fuel amount and fire intensity exceed the given thresholds. In the example in Fig. 3, the model simulated five such periods, each spanning a different number of days. Within each period, with the increase of FDI (i.e. advancing into more suitable fire weather, shown in subplot a), the simulated daily number of fires (subplot b) and the mean fire size (subplot c) increased as well. Figure 3 lower panel illustrates in detail how separate daily fires were concatenated into fire patches for the period of Day 740-766 (since 1 January 2000). Rather than viewing fires on a given day as being independent from those of previous days, we now consider part of these fires as "persisting" from previous days and part of these fires as new patches. For example, on Day 741, four fires were originally simulated to start and extinguish within this day (with the exact same fire size). We now consider that two of these four fires persisted from the previous day, i.e. the two fire patches staring on Day 740. We then consider that the other two fires of Day 741 were new fire patches initiated on this day. Similarly, the five fires originally simulated on Day 742 are now considered as: -two extended from the two fire patches of Day 740 (which already persisted into Day 741); -two extended from the two fire patches of Day 741; -one new fire patch started at this day.
Following this approach, fires simulated in all following days were identified either as extending from fire patches of previous days, or as new fire patches. As such, the total number of fire patches is equal to the maximum daily fire number during this period. In the example of Fig. 3 lower panel, ten fire patches were extracted from Day 740 to 766 (as indicated by the numbers and arrows in red in the subplot b): The size for each fire patch was the cumulative daily fire size during its persistence time. The procedure illustrated in Fig. 3 was repeated for all the land grid cells (excluding those with less than 10 days of fire occurrence) over 1997-2009, to generate the "multi-day fire patches" over the globe. The subplot (e) shows for each grid cell the quality flag of ORCHIDEEsimulated burned fraction in comparison with observation data sets. ORC-err-burn, where ORCHIDEE shows burning but the other three observation data sets do not; ORC-err-noburn, where at least two of the three observation data sets do show burning, but ORCHIDEE does not; ORC-min, where ORCHIDEE simulates lower burned fraction than the other three data sets; ORC-max, where ORCHIDEE simulates higher burned fraction; ORC-good, where ORCHIDEE-simulated burned fraction falls within the range given by the three observation data sets. When calculating the minimum and maximum burned fraction of the observation data sets, an arbitrary tolerance margin of 25 % was applied around the min/max value to take into account the observation uncertainty. Fires in grassland-dominated systems are well captured by the model, including steppe fires in central and eastern Asia, savanna fires in northern Africa, northern Australia and central to east South America. Two regions could be identified where model simulation is different from all the three observation data sets. One is the woodland savanna (miombo) in southern Africa, where burned area is underestimated by the model (simulated annual burned fraction is ∼ 4 %, but 14-24 % is observed). The other is western and central continental US (dominated by C3 and C4 grass in the land-cover map used by ORCHIDEE) where fires are overestimated (simulated annual fraction is ∼ 6%, but 1-2 % is observed). For the fires in high-latitude (> 45 • N) boreal forest, sparsely forested area and tundra, the magnitude of burned fraction by ORCHIDEE falls between that found in the GFED3.1 and in the L3JRC/GLOBCARBON data sets (Fig. 4).

Comparison of model simulation to satellite observation for the spatial and temporal pattern of burning
Geosci. Model Dev., 7, 2747-2767, 2014 www.geosci-model-dev.net/7/2747/2014/ The simulation pixels are divided into five classes according to their simulation quality, as shown in Fig. 4. Table 1 shows the mean burned area for each category and data set. The grid cells with burning collocated between model and observation data (labelled as ORC-good, ORC-max, ORCmin in Fig. 4) cover the majority of global burned area (76-92 % depending on different data sets), indicating that the model can reproduce the major spatial extent of burning. However, discrepancy still remains, in that 50 % of the modelled global burned area is classified as ORC-max (i.e. overestimation of burned fraction by the model), whereas observation data sets have half of the global burned area labelled as ORC-min (i.e. underestimation of burned fraction by the model). Figure 4 also illustrates the lower burned area found in L3JRC and GLOBCARBON in comparison with GFED3.1 for the Southern Hemisphere and subtropical Northern Hemisphere, in contrast to the higher burned area in the middle-to-high latitude region in the Northern Hemisphere. The simulated latitudinal distribution of burned area generally falls within the minimum-maximum range of the three observation products. Exceptions are the regions of BON and the GFED3.1; and −0.59 between L3JRC and the GFED3.1.

Model evaluation against GFED3.1 burned area data
The simulated burned area for 1997-2009 is evaluated against the GFED3.1 data for each region in terms of mean annual burned area, and similarity in interannual variability and seasonality (see metrics in Sect. 2.5). The results are presented in Table 2  15.0 7.5 (10,11) * A bootstrapping method was used to derive a probability distribution function of Sseason by randomly sampling from the normalised monthly burned area of GFED3.1 and the model for 100 000 times. The bold number indicates that the Sseason is not significantly different from a randomised monthly distribution of burned area at a significant level of 0.05. the 1998 El Niño peak burning), because errors are compensated among regions (Fig. 5b). On regional scale, the model performs best at regions where the IAV of burned area is known to be mainly driven by climate, such as boreal North America (BONA), boreal Asia (BOAS), and equatorial Asia (EQAS). However, the model performs rather poorly for regions where burned area shows little IAV such as Africa (NHAF and SHAF, see also Fig. 6), or the burned area is unrealistically simulated by the model (such as TENA and MIDE). For most regions the model captures the fire seasonality rather well (Table 2), with S season being significantly different from that of a randomly distributed seasonality, except in NHAF, BOAS and SEAS.

Fire and precipitation relationship
The model captures well the empirical relationship between burned area and precipitation found in tropical and subtropical regions ( Fig. 7; see also Prentice et al., 2011;Van der Werf et al., 2008b). In low-precipitation regions (< 400 mm yr −1 ), the climate is favourable for fire but burning is limited by the available fuel. In contrast, regions with higher precipitation (> 2000 mm yr −1 ) always support sufficient fuel amount but fires are limited by the duration of dry season when fires can occur. Burned area is maximal for regions with intermediate precipitation and productivity (Krawchuk and Moritz, 2010).
Maximum burning occurs around an annual precipitation of 1000 mm according to model simulation, compared to 1200 mm by GFED3.1, and 1400 mm by the GLOBCARBON and L3JRC data sets. GLOBCARBON and L3JRC show the lowest burning in this tropical/subtropical belt, followed by the model simulation, with the burned area by GFED3.1 being the highest. The fire and precipitation relationship was further divided into four subregions of America, Africa, Asia and Australia following

Peak fire month and fire season length
The spatial distributions of fire peak months by ORCHIDEE and GFED3.1 are compared in Fig. 8. The spatial pattern of simulated peak fire month is in general agreement with GFED3.1 data. The model simulation gives D 2 as 0.3, indicating that simulated and observed peak fire month differ  by on average 2 months. At regional scale, simulated peak fire months for most regions are within 1 month of those by GFED3.1 data, except MIDE and SHAF (see the far righthand column of Table 2). Refer to Supplement Fig. S6 for more detailed information of modelled and observed seasonal pattern of burning for different GFED3.1 regions. Figure 9 compares simulated fire season length with the GFED3.1-derived fire season length from Archibald et al. (2013). The spatial pattern of fire season length by model simulation agrees well with that given by Archibald et al. (2013), with fire season length lasting 1-3 months in boreal regions, and 4-7 months in semi-arid grasslands and savannas. The fire season length in eastern Africa, South Africa, Botswana, Namibia, Argentina and Mexico is overestimated by the model by 2-4 months; and underestimated in southeast Brazil by 1-2 months.

Long-term trends of burned area during the 20th century
Over the 20th century, the historical trend of modelled global total burned area generally follows the Mouillot reconstruction data (Mouillot and Field, 2005) as corrected by GFED3.1 data (see Sect. 2.4.1), with increasing burned area after the 1930s until the 1990s-2000s, after which global burned area began to decrease (Fig. 10). However, the interdecadal variability of burned area is underestimated. Regionally, simulated decadal burned area agrees relatively well with the Mouillot data in boreal Russia (beginning from the 1930s), although the observation data is subject to great uncertainty, especially before the 1950s (Mouillot and Field, 2005). The simulated burned area also agrees well with fire agency data for Canada and US Alaska (Supplement Fig. S7). The correlation coefficient for annual BA between model and fire agency data is 0.44 after 1950, and 0.57 between model and Mouillot data, when the observation data are considered to be more reliable. This reflects the model capability to capture fire variability driven by climate variation relatively well. The model fails, however, to capture burned area variation for regions where fires from changed land cover likely played a bigger role in the earlier 20th century according to the Mouillot data (Mouillot and Field, 2005); for example, in Australia and New Zealand, USA and southern South America, mainly because the static land cover was used in the simulation. Strong model-observation disagreement also occurs for regions where the implementation of modern fire prevention has drastically reduced the burned area, as occurred in the 1960s in boreal North America, because the general implicit inclusion of human suppression on ignitions in Eq. (1) on the global scale does not accommodate regional uniqueness.

Compare simulated fire size with observation
In this section we compare the simulated fire size distribution against observation over two regions: Canada and US Alaska combined, and southern Africa. The number and size of reconstructed "multi-day fire patches" by the model were used (see Sect. 2.5.2). For both simulated and observed fires, all fires within the test region were pooled together. Fires were  Geosci. Model Dev., 7, 2747-2767, 2014 www.geosci-model-dev.net/7/2747/2014/ binned according to fire patch size in an equal logarithmic distance manner (with the minimum-maximum size range being divided into 100 bins). The mean annual number of fires on an area basis for each bin was calculated. For the fire patches in Canada for which fire start date and fire out date were reported, fire length was calculated and compared with model simulation as well. Figure 11 shows the fire size and the corresponding number of fires for each size bin over US Alaska and Canada, and southern Africa. The modelled fire size distribution reaches a maximum at intermediate fire sizes. This is because when the climate is less favourable for fire occurrence, both number of fires and fire size are limited (corresponding to the low fire size end in Fig. 11a and 11b). While the size and number of fires could be large during the period when climate is dry and large fires are possible (corresponding to the high fire size end in Fig. 11a and 11b), the frequency of high-fire period itself might be rare. The similar pattern is shown by the fire agency data of Canada and US Alaska, but not by the satellite-derived fire patch data in southern Africa, which might be due to that the minimum fire size (25 ha) is limited by the satellite resolution there. However, for both regions, the frequency and size of extreme large fires were underestimated by the model. Further comparison of fire lengths for Canada (Fig. 11c) reveals that the model underestimated fire length by as much as 60 days for the extreme large fires.
We further calculated the cumulative fraction of total burned area by fires below a given quantile of fire size (the minimum size, every tenth quantile from 10th to 90th quantile, and the maximum size) (Fig. 12). According to observation, in boreal Canada and US Alaska, the total burned area is mainly dominated by a few large fires, with the top 10 % of fires (90th quantile to the maximum size) accounting for 99.8 % of the total burned area. By contrast, the same group of fires (i.e. the highest 10 % large fires) account for ∼ 80 % of the simulated total burned area, with the remaining being accounted for by many small fires. For southern Africa, the model distribution follows rather relatively well of the observation. The top 30 % of fires (70th quantile to the maximum size) make up 90 % of the total burned area by satellite data and 85 % by the model simulation. Figure 13 compares the simulated 95th quantile fire size with the global observation by satellite. According to observation, fires with biggest fire size (500-10 000 km 2 ) are grassland-dominated fires in central and eastern Asia, African savanna and northern Australia, which are followed by fires in Russian and Alaskan boreal forest (and sparsely forested area) or tundra, and savanna-woodland fires in Africa and central South America (50-500 km 2 ). Fires in the rest of the world are relatively small (< 50 km 2 ). In terms of spatial fire size distribution, the model could reproduce the biggest fires in grassland-dominated systems in central and eastern Asia (200-1000 km 2 ), northern Australia (50-200 km 2 ), as well as fires in Russian boreal forest (and sparsely forested area) or tundra (50-200 km 2 ; note that tun- dra is treated as C3 grassland in the model), but fire size for these regions is generally smaller than observation by up to one magnitude. The fire size in central South America tends to be underestimated, and overestimated in western to central US. Fire size for the rest of the world is comparable between model and the observation, with general small fire size (< 50 km 2 ) and low fire frequency.

General model performance
The SPITFIRE module was first presented by Thonicke et al. (2010). It was later adapted in the Land surface Processes and exchanges (LPX) model by Prentice et al. (2011), notably with the removal of ignition sources created by humans, and further by Pfeiffer et al. (2013) and Lasslop et al. (2014). Pfeiffer et al. (2013) developed a scheme to allow fire span of multiple days and fire coalescence, explored the role of lightning interannual variability in the model, and included terrain effects on fire at a broad scale. Lasslop et al. (2014) investigated model sensitivity to climate forcing and to the spatial variability of a number of fire relevant parameters. In the current study, the SPITFIRE module was fully integrated into the global process-based vegetation model ORCHIDEE for the first time. Simulated burned area, fire season and fire patch size distribution were evaluated in a comprehensive way against observation for the recent period (1997)(1998)(1999)(2000)(2001)(2002)(2003)(2004)(2005)(2006)(2007)(2008)(2009), and against reconstructed burned area for the 20th century.
The simulated global mean annual burned area for 2001-2006 agrees with the satellite observation data and is most close to the GFED3.1 data. The model could moderately capture the interannual variability in burned area as revealed by GFED3.1 data with the exception of the peak burning of 1998, probably because the deforestation and tropical peat fires in the 1997-98 El Niño event ( Van der Werf et al., 2008a) were not represented. Model-observation discrepancy remains at a regional scale, with underestimation mainly in the savanna of Africa and Australia, and overestimation in central Asia, Middle East, the temperate North America, and Southern Hemisphere South America.
The model can reproduce the maximal burned area for the intermediate range of precipitation for tropical and subtropical regions (35 • S-35 • N) (also shown by Prentice et al., 2011;Van der Werf et al., 2008b). For boreal regions where climate plays a dominant role in driving the temporal variation of burned area, simulated burned area generally agrees well with the historical reconstruction data (boreal Russia for 1930-2009) and government fire agency data (US state of Alaska andCanada for 1950-2009), indicating that the model is capturing the climate as a driver of fire. However, on the global scale, because fire trend is determined by multiple factors including climate, land-use practice and fire suppression (Mouillot and Field, 2005), simulated burned area trend only agrees moderately well with the reconstruction data for the 20th century, given that the static land cover and the simple human ignition equation were used in the model.

Potential sources of systematic errors
Fire is a complex, regional process that involves diverse dimensions of vegetation, climate and human activity (Bowman et al., 2009). Uncertainties in all these factors will contribute to the overall uncertainty in simulated fire activities. Because SPITFIRE simulates fire occurrence through a complex chain that includes factors from potential ignition to fire climate, to fire spread rate and fire size and tree mortality, identifying contributions of each modelling step to the ultimate error in simulated fire regime is problematic. A complete error analysis involving all model parameters is beyond the scope of this study, but following sections are intended to serve as preliminary investigation of model errors.

Ignition sources
On the global scale, due to the limitation of fire by fuel load on the arid and semi-arid regions, modelled annual burned area is more closely related to the total fire numbers rather than the fire danger index (Fig. S8). This might lead to speculation that the potential ignition source is the first identified source of error for simulated burned area. One possible error in ignition is that potential lightning ignitions are not suppressed by human presence in densely populated areas, which cause lightning-ignited fires being overestimated. We have tested this possibility by applying a population densitydependent human suppression of lightning-ignited fires following Li et al. (2012), and the result showed that part of the overestimation of burned fraction in western US and central South America could be reduced (Figs. 4 and S9), but the burned area in Africa and over the whole globe were further underestimated. Published fire models Li et al., 2012;Pechony and Shindell, 2009;Venevsky et al., 2002) generally include ignitions from lightning and human sources, together with explicit or implicit human suppression of fires. However, one common challenge is to properly calibrate ignition parameters. One option is to use active fire counts (as in case of Li et al., 2012;Pechony and Shindell, 2009), but fire counts are not exactly real fire numbers, because a single widespread fire could be seen as many fire counts and the burned area per hotspot may vary by an order of magnitude depending on vegetation composition (Hantson et al., 2013). Simulated burned area on the global scale might be comparable with the satellite observation data. However, very little is known on how this agreement was achieved; nor on whether each step of the modelling process was correctly captured, or if the ultimate agreement in burned area is mainly thanks to error compensation among different steps. Two recent modelling studies (Lasslop et al., 2014;Yang et al., 2014) used some scaling factor to adjust either directly burned area or the ignitions to match simulated global burned area with observation. Pfeiffer et al. (2013) argued that interannual lightning variability was critical for their model to capture the trend and interannual variation of burned area, especially for regions where burned area is dominated by lightning fires such as boreal forests in Alaska and Canada. Currently, a single data set of monthly lightning flashes was repeated each year in our model. However, we have tested the possibility to include the interannual lightning variability by following their approach.
The historical lightning variability during the 20th century was reconstructed by using the convective potential avail-Geosci. Model Dev., 7, 2747-2767, 2014 www.geosci-model-dev.net/7/2747/2014/ able energy (CAPE) output variable from the 20th Century Reanalysis Project (http://portal.nersc.gov/project/20C_ Reanalysis/), following Eq. (1) on page 649 of Pfeiffer et al. (2013). A test simulation was run for 1901-2009 over the whole globe with the variable lightning input. We found that shifting from repeated lightning data to CAPE-derived data decreased the Pearson correlation coefficient between simulated decadal burned area and the Mouillot reconstruction for half of the 14 regions and for the globe, but increased the correlation for other regions. Over 1997-2009 with observation data by GFED3.1 more credible than the 20th century reconstruction, using the CAPE-derived data decreased the Pearson correlation coefficient between annual simulated burned area and GFED3.1 for the globe and for most of the regions. This surprising result could be due to the fact that model internal errors may compensate for possible benefits of using the CAPE-derived data, or because CAPE-derived lightning data does not reflect the real lightning variability everywhere. For more detailed results and discussions, refer to the Sect. S2 of the Supplement.

Fire number, size and fire patch length
The comparison of simulated "multi-day fire patches" against the observation in Canada and US Alaska, and southern Africa shows that the model underestimated the frequency and size of extreme large fires. Given that annual burned area in US Alaska and Canada is slightly overestimated by the model (2.8 Mha yr −1 by model during 1960-2009 against 2.2 Mha yr −1 by fire agency and 1.9 Mha yr −1 by the Mouillot data, also refer to Fig. S7 in the Supplement), the total number of (the small-and mediumsized) fires must be overestimated (i.e. making compensation for the too small fire size). However, the compensation by fire numbers does not occur for southern Africa, where, given the underestimation of large fire size, the total burned area remains underestimated (Fig. 4, Table 2). Over the globe, despite the fact that the model correctly identifies some regions where large fires occur (mainly with grassland fraction higher than ∼ 70 % by the land cover map used in the model), the large fire size remains underestimated -by approximately one magnitude (Fig. 13). The fire size of reconstructed "multi-day fire patches" is defined as the cumulative daily fire size over the corresponding fire patch length, and the underestimation of large fires could be due to underestimation in either fire patch length or daily fire size (i.e. fire patch size grows too slowly over its length). The comparison of simulated fire length with fire agency data in Canada suggests that model generally underestimated fire length. For fires larger than 10 000 ha, the underestimation is as large as 40-60 days. Given similar underestimation of large fire sizes in other ecosystems that are characterised by large-size fires (Fig. 13), the fire length underestimation in Canada is likely to happen elsewhere, though this could not be completely verified mainly because of the lack of fire length observation across the world (the satellite-derived fire size data by Archibald et al., 2010Archibald et al., , 2013 used in this study does not include the corresponding fire length yet).
In Pfeiffer et al. (2013), fires were simulated to span multiple days and extinguish when the cumulative precipitation exceeds some threshold, with the daily fire size remaining limited by 241 min. There is one significant difference between our approach and theirs. In Pfeiffer et al. (2013), fires starting on a given day are always considered as "new fires" to be added on the existing fire count on the previous day. The number of fires over a given grid cell thus accumulates each day as the time advances in a period suitable for fire occurrence. In our model, fires are originally simulated as independent events within individual days (because they are assumed to start and extinguish during the same day). To allow the comparison of simulated fire size against observations, these independent fires were concatenated (regrouped) into fire patches that span a different number of days. We made the concatenation by assuming fires at a given day either persist from the previous days or are actual new fire patches. Thus the number of reconstructed fire patches by our approach is the maximum daily fire number during the consecutive days of burning.
This difference in accounting for fire patches arose from the different approaches to handle ignitions in the two models, in particular ignitions by lightning activities, because anthropogenic ignitions were excluded in Pfeiffer et al. (2013). Pfeiffer et al. (2013) simulated lightning-ignited fires by finally dropping the physical meaning of the number of lightning flashes in the input data. This information was only used to obtain an all-or-nothing answer, allowing either a single fire over the whole 0.5-degree grid cell or no fire at all. However, the number of cloud-to-ground lightning flashes retained its physical meaning in our approach, and was scaled by the simulated FDI and fuel-limiting ignition efficiency to derive a daily number of fires. As no validation information was provided regarding fire number and fire size in Pfeiffer et al. (2013), it is unclear which approach yields results closer to observations. Overall, it remains challenging to develop a proper approach to represent the heterogeneous fire patches, and the growth of each individual patch in grid-based models (Jones et al., 2009).

Daily fire size, active burning time and fire spread rate
The underestimation of large fire sizes in Canada is partly due to underestimation of fire patch length. A closer comparison of Fig. 11a and 11c suggests that, while fire length was underestimated by a factor of 2-3 for fires between 10 4 and 10 5 ha, given the same fire number (shown as the vertical axis of Fig. 11a), the fire size was underestimated by roughly an order of magnitude (10 times). This implies the (mean) daily fire size is likely underestimated as well (roughly 3-5 times).
Within the model, daily fire size is determined by daily fire active burning time and fire spread rate. Evidence shows that wildfires display a characteristic diurnal cycle, with the most active time being around midday and early afternoon when the humidity is at a minimum and wind speeds are higher (Mu et al., 2011;Pyne et al., 1996;Zhang et al., 2012). Outside this active burning time, fires could persist but propagate at a rather low speed (especially at night) or even turn into smouldering and burn with flame later again when the weather is feasible. For extremely large fires, the active burning period could be longer because these fires often occur when the fuel is extremely dry as a consequence of extended drought weather. Currently, this active burning time is simulated to exponentially increase with the fire danger index (i.e. indicator of daily fire weather) with a maximum time of 241 min. This limit might not be feasible for all ecosystems and fire sizes; however, a full exploration of this issue is currently limited by the scarcity of observations.
A global map of simulated 95th quantile fire spread rate is provided in the Supplement (see Fig. S10). Li et al. (2012) compiled fire spread information from the literature and reported typical fire spread rates of 12 m min −1 for grasslands, 10.2 m min −1 for shrubs, 9 m min −1 for needle-leaved trees and 6.6 m min −1 for other trees. The simulated fire spread in grasslands in central and eastern Asia (20-40 m min −1 ) is much higher than the observed range. Considering that daily fire sizes are likely to be underestimated, the limit of 241 min maximum daily active burning time might be too short to correctly simulated large fire sizes in these grassland ecosystems. By contrast, simulated fire spread rate in savanna vegetation (1-5 m min −1 ) is lower than the reported value (10-12 m min −1 ), and this could help to explain the underestimation of fire size and the broad underestimation of burned area in this region.
Fire spread in the northern high-latitude boreal forest, sparsely forested area and tundra is modelled to be extremely high (> 10 m min −1 ). This is mainly because herbaceous plants in these regions are simulated as C3 grasslands in the model with relatively low bulk density. However, the likely high daily burned area due to the high fire spread rate was compensated by simulated short fire patch length in the model (Fig. 11c), so that the simulated burned area for the region of 50-75 • N agrees well with GFED3.1 data (Fig. 5a). Pfeiffer et al. (2013) proposed to relate the grass fuel bulk density with the annual sum of degree days over 5 • C. We have tested their approach and found that this new approach decreased the simulated burned area for the high-latitude region (50-70 • N) and for the globe, and thus was not included in the current version of model.
To gain more insights into the model's behaviour, the simulated 95th quantile fire patch size was related with other parameters (grassland fraction, fuel bulk density). As shown in Fig. S11, the size of large fires exponentially depends on the fire spread rate. The fire spread rate is very sensitive to the fuel bulk density and grass fraction beyond some thresh-old (e.g. fire spread rate surges when grass coverage exceeds ∼ 70 %), with the fuel bulk density being inversely dependent on the grass fraction. Thus the simulated fire size could be sensitive to the land cover map (especially grass fraction) used in the simulation. Besides, as a static land cover map is used in our simulation, the grassland fraction is not allowed to vary as a response to fire disturbance. Thus a full fire-climate-vegetation feedback is limited, and this could probably help to explain the underestimation of fire size.

Influence of fire-climate-vegetation feedback
In the current simulation, the dynamic vegetation module of ORCHIDEE was switched off and a static land cover map was used. Tree mortality was affected by fire-induced tree damage, but tree coverage within a given grid cell was static and not allowed to vary with fire occurrence. A test simulation has been done for Southern Hemisphere Africa following the same simulation protocol as in Sect. 2.3 but with the dynamic vegetation module being switched on, in order to investigate the simulated fire behaviour with dynamic vegetation. Figure S12 compares the simulated annual burned area, grass and tree coverage change and the fire danger index for 1901-2009 with the model in dynamic and static vegetation modes.
In dynamic vegetation mode, the simulated burned area suddenly begins to increase around 1965, in response to the increased fire danger index. The increase in fire activity further increases the grass coverage and reduces tree coverage, causing a positive feedback to finally induce a peak of burned area around 1975, after which the burned area decreases. In static vegetation mode, the simulated burned area shows a similar peak in response to the peak in the fire danger index, however, with a much smaller peak of burned area than that simulated in dynamic vegetation mode because of the lack of fire-vegetation-climate feedback. Simulated burned area by both simulations is still lower than that given by the GFED3.1 data for the period of 1997-2009, although the peak burned area in the dynamic vegetation mode is comparable with GFED3.1 data. This test indicates that including the fire-climate-vegetation feedback could improve the simulation when the climate is favourable for fire occurrence. At the same time, it also suggests that other factors like climate, and the model mechanisms determining the competitiveness of trees versus grass, might also play a role in the error of fire modelling.

Potential error sources for regional bias
Given that burned area is simulated in the model as a result of several sequential steps (lightning and human ignitions, fire number and fire size distribution, fire patch length, daily fire size, daily active burning time and fire spread rate), and the scarcity of global coverage observation data, we are not able Geosci. Model Dev., 7, 2747-2767, 2014 www.geosci-model-dev.net/7/2747/2014/ to give a quantitative estimation of the role of each of these factors in determining the final model error for each GFED region. Rather, here we select several typical regions and briefly discuss the possible reasons for model performance (either good simulation, or over/under-estimation). The model agrees with the GFED3.1 burned area for 50-75 • N relatively well (Fig. 5). Burned area for Canada and US Alaska for 1950-2009 was overestimated by ∼ 27 % compared with fire agency data. However for the GFED region of BONA (Fig. 2), the model overestimates BA by 60 % (Table 2). This is mainly because spatial extent of BONA includes part of the grassland systems in northern US, where the burned area is overestimated (Fig. 4), same as for TENA. The relatively good simulation of boreal burned area (note in Table 2, E BA for BOAS is −0.1) is mainly because underestimated large fire sizes are compensated by overestimated fire patch numbers. Though simulated fire spread rate for some regions is extremely big, the daily fire sizes are still likely underestimated, possibly due to too short daily active burning time.
The burned area for the Northern Hemisphere temperate regions is systematically overestimated by the model, including EURO, CEAS, TENA, CEAM, MIDE, and SEAS. Two reasons are suspected to contribute to this overestimation. First, extensive grassland coverage is found in some regions (CEAS, MIDE, TENA, CEAM, part of MIDE), where simulated fire spread rate is much higher than observation, likely creating high daily fire size. Please note that this is not in contradiction with the underestimated large fire patch sizes (Fig. 13) because fire patch length could be underestimated. Second, for regions where human population density is high and active fire suppression is implemented, such as India in SEAS, China Inner Mongolia in CEAS and Europe, ignitions seem to be excessive, leading to larger burned areas in spite of the small simulated fire sizes. This is partly because the lightning ignitions are not suppressed in the current model version, and because the global human-ignition relationship is not feasible everywhere and the spatial a(ND) data set used in the model is not able to efficiently handle the spatial heterogeneity.
Finally, burned area in the three biggest fire regions of NHAF, SHAF and AUST, which are dominated by savanna and woodland savanna, is underestimated by the model. This underestimation is primarily due to underestimated large fire size, which are not compensated by the ignitions. The simulated fire spread rate in Australia (7.5-15 m min −1 ) seems comparable with observation (10-12 m min −1 ); however it is underestimated in Africa. The underestimation of burned area in SHAF is likely also related with its low grassland coverage, given that the simulated fire size is rather sensitive to the grassland fraction (Figs. S11 and S13).
Further, Archibald et al. (2013) showed that two major fire types dominate the burned area of Africa (frequent intense large fires and frequent cool small fires) and their correlation with environmental factors seems to be clearly distinguished by the human impact index. This implies that the a(ND) values should ideally differ as well among these two fire types, which currently share the same value (Fig. S1).
The regional pattern of model-observation disagreement in our study is also shared by another SPITFIRE implementation in the JSBACH land surface model by Lasslop et al. (2014), who modified a scalar in the human-ignition equation to match the simulated global burned area with observation. It remains somewhat a common challenge for processed-based fire models to correctly represent the global burned area and its spatial distribution; and in some cases ignitions need to be adjusted or optimised according to the observation data (Lasslop et al., 2014;Yang et al., 2014). Li et al. (2013) included the social economic factors in simulating fires for some vegetation types, which could be incorporated in the future development of our model.

Uncertainty/error summary
The preliminary investigation of modelling error reveals that large fire size is underestimated over regions of high fire frequency; and the ignition error is playing an important role in determining the ultimate simulated burned area. On the regional scale, ignition numbers (fire numbers) are either overestimated to compensate fire size underestimation to cause a moderate or overestimated burned area, or are not enough that the simulated burned area is underestimated as well. The underestimation of large fire patch size is likely due to underestimation in both fire patch length and the daily fire size, which could further be limited by the daily active burning time. Overall, the moderate model agreement on global burned area could be achieved only when errors among different regions are compensated.

Future model improvement directions and needed data sets
Currently many efforts in global fire modelling are directed at reproducing the temporal and spatial pattern of burned areas Li et al., 2012;Pfeiffer et al., 2013;Prentice et al., 2011;Thonicke et al., 2010). Total burned area is determined by ignition frequency and fire size, which itself is controlled by fire spread rate (fire intensity) and fire duration. More work is needed to investigate if a model can reproduce the mechanisms that drive burned area: i.e. the rate of spread, fire patch length, daily active burning time, fire size, ignition frequency, and fireline intensity. Comparing observed and simulated fire regimes, which combine information on fire timing (fire season), size, numbers and intensity (Gill and Allan, 2008) will help to reveal gaps in this understanding. The present study is a step in this direction, bringing new in-depth model evaluation.
In summary, the fire processes in the SPITFIRE model are complex enough to include many aspects of wildland and human-caused fire processes in nature. However, little is known about the parameter sensitivities and their contribution to model error. The simulated intermediate model parameters (e.g. fire numbers, fire patch length, fire size, daily active burning time, fire spread rate, fire intensity) are poorly constrained by the observation data. As a result, error compensation could be prevalent in the model and a wider application of the model is impeded.
To advance model development, global measurement data sets of the key fire-relevant parameters, including fire size, fire patch length, fire diurnal variability, fire spread rate, fuel bulk density, wind speed, fire intensity etc., should be established and used to calibrate fire models. On the other hand, the complexity of fire model parameters and the regional nature of fire processes make it unlikely that these parameters could be calibrated in a parameter-by-parameter and site-bysite way, but some more advanced techniques such as data assimilation or model-data fusion could be helpful. Finally, some more mechanistic fire processes should be considered for inclusion into the model, such as crown fire spread and the mechanistic process of fire extinction.

Conclusions
We have integrated the SPITFIRE model into a global process-based vegetation model ORCHIDEE. The historical burned area for the 20th century was simulated and the modelled fire regimes were evaluated against observation data. The model was able to capture well the historical climatic drivers of burned area for the 20th century. However, parameter uncertainties such as number of fire ignitions, daily active burning time and fire spread rate resulted in considerable regional discrepancies. Large fire sizes are generally underestimated, with the error in simulated burned area being partly compensated by overestimated fire numbers. Future model development requires a complete parameter sensitivity analysis for the key processes represented in fire modelling. To constrain the model error, consistent spatial observational data sets should be established for validating the key variables in the model at different modelling steps.
The Supplement related to this article is available online at doi:10.5194/gmd-7-2747-2014-supplement.