Validation of 3D-CMCC Forest Ecosystem Model (v.5.1) against eddy covariance data for 10 European forest sites

. This study evaluates the performances of the new version (v.5.1) of 3D-CMCC Forest Ecosystem Model (FEM) in simulating gross primary productivity (GPP), against eddy covariance GPP data for 10 FLUXNET forest sites across Europe. A new carbon allocation module, coupled with new both phenological and autotrophic respiration schemes, was implemented in this new daily version. Model ability in reproducing timing and magnitude of daily and monthly GPP ﬂuctuations is validated at intra-annual and inter-annual scale, including extreme anomalous seasons. With the purpose to test the 3D-CMCC FEM appli-cability over Europe without a site-related calibration, the model has been deliberately parametrized with a single set of species-speciﬁc parametrizations for each forest ecosystem. The model consistently reproduces both in timing and in magnitude daily and monthly GPP variability across all sites, with the exception of the two Mediterranean sites. We ﬁnd that 3D-CMCC FEM tends to better simulate the timing of inter-annual anomalies than their magnitude within measurements’ uncertainty. In six of eight sites where data are available, the model well reproduces the 2003 summer drought event. Finally, for three sites we evaluate whether a


Introduction
Terrestrial ecosystems have a relevant role in the global carbon cycle, acting also as climate regulators (Peters et al., 2007;Bonan, 2008;Huntingford et al., 2009).In fact terrestrial ecosystems store large carbon stocks and cause most of the variance of carbon exchange between the atmosphere and land surface (Batlle Bayer et al., 2012).Among terrestrial ecosystems, forests are an essential component in the global carbon cycle because of their high capacity to store carbon in the vegetation and soil pools (Kramer et al., 2002).Through Gross Primary Production (GPP) plants fix atmospheric carbon dioxide (CO 2 ) as organic compounds, enabling terrestrial ecosystems to offset part of the anthropogenic CO 2 emissions (Janssens et al., 2003;Cox and Jones, 2008;Battin et al., 2009).Consequently, changes in GPP could have relevant impacts on atmospheric CO 2 concentration.Thus, accurately simulating terrestrial GPP is key to quantify the global carbon cycle and predict the future trajectories of the atmospheric CO 2 concentration (Wu et al., 2015), and taking into account the various spatial and temporal scales of the processes is a major challenge (Yuan et al., 2007).Terrestrial ecosystem models, used to simulate carbon, water and energy fluxes, are valuable tools for advancing the knowledge of the role of ecosystems in maintaining a multitude of their fundamental services, like the provision of products and the regulation of climate (Ibrom et al., 2006).Such numerical models are also useful to: (1) predict the impacts of climate variability on terrestrial biosphere and related carbon fluxes (Ciais et al., 2005;Brèda et al., 2006;Richardson et al., 2007), ranging from long-term anomalies (Santini et al., 2014) up to extreme events (Zscheischler et al., 2014); and (2) reproduce biophysical and biogeochemical feedbacks of vegetation cover and change on climate, especially when coupled to atmosphereocean climate models through land surface schemes (Bonan, 2008;Arneth et al., 2010;Taylor et al., 2012;Fisher et al., 2015).
At European level, terrestrial ecosystems have been reported to be a significant sink of CO 2 (Luyssaert et al., 2012), with forests playing a relevant role in absorbing anthropogenic emissions for about 10 % (Nabuurs et al., 2003;UNECE and FAO, 2011).
In the last decade some studies have identified systematic errors when modelling terrestrial ecosystem sensitivity to climate variability at multiple timescales (Friedlingstein et al., 2006;Piao et al., 2013;Dalmonech et al., 2015) while sometimes differences in model predictions are very large (Wang et al., 2014a).
To improve the model's capability in reproducing relevant processes related to the land carbon cycle, detailed representation of missing processes should be increasingly developed (Sykes et al., 2001;Campioli et al., 2013;Nolè et al., 2013;Ciais and Sabine, 2013;Prentice et al., 2015).For instance, spatial and temporal environmental heterogeneity is known to play an important role in the dynamics of populations and communities (Kobe, 1996;Chesson, 2000;Clark et al., 2010Clark et al., , 2011)).However, the implications of this heterogeneity for developing and testing regional-to global-scale forest dynamics models that are also able to take into account forest management have still largely to be explored (Zhang et al., 2014).As reported by Wramneby et al. (2008), incorporating increased mechanistic details is expected to improve the explanatory power of a model.Many models for example calculate leaf photosynthesis through the Farquhar model (Farquhar et al., 1980;Farquhar and Sharkey, 1982), while few models take into proper consideration the canopy vertical stratification.Increasing model complexity can sometimes mask a lack of understanding, although models including a larger subset of important processes should be more realistic than a simpler model.However, complex models are tuned to perform well at standard tests but produce widely divergent results when projected beyond the domain of calibration (Prentice et al., 2015).Since European forests are mostly managed and not homogeneous in terms of structure, composition and cohorts, only a few models are able to represent this particular ecosystem complexity and heterogeneity (Grote et al., 2011;Morales et al., 2005;Seidl et al., 2012;Yin et al., 2014).For simulating the impact of forest management on the carbon cycle, it is important to consider the vertical structure of forests and the age-related changes in structure and physiology.
In this study we investigate the performance of the new version of the 3D-CMCC Forest Ecosystem Model (3D-CMCC FEM, Collalti et al., 2014) in quantifying GPP across different forest types and climate conditions in Europe.In contrast to Dynamic Global Vegetation Models (DGVMs), 3D-CMCC FEM incorporates accurate process description focusing on the effects of hierarchy in vertical forest structure and ages on productivity and growth at species level.The model has been designed to maintain computational efficiency, as postulated for the Light Use Efficiency (LUE) Models (Monteith, 1977), coupled to the accuracy of the Process-Based Models (PBMs) (Makela et al., 2000).As described by Wang et al. (2014a, b), a model with both high accuracy and computation efficiency is highly desirable for the purpose of simulating long time series of GPP at high spatial resolution.
Thanks to FLUXNET, a global network of flux tower sites, half hourly net CO 2 , water and energy eddy covariance (EC) flux measurements (Baldocchi, 2003) are now available for a wide range of forest ecosystems.The network provides a continuously increasing set of annual series of half-hourly data (Balzarolo et al., 2014).These data provide valuable information to investigate seasonal phasing and amplitudes of carbon fluxes (Aubinet et al., 2000;Falge et al., 2002;Gielen et al., 2013;Slevin et al., 2015) and to test terrestrial models at the ecosystem scale (e.g.Richardson et al., 2010;Blyth et al., 2011;Chang et al., 2013;Wißkirchen et al., 2013;Bagnara et al., 2014;Balzarolo et al., 2014;Liu et al., 2014;Wang et al., 2014a;Wu et al., 2015).In the present paper daily meteorological and GPP data are provided by FLUXNET.GPP data are exploited as an independent data set to compare, over different timescales, 3D-CMCC FEM simulations for 10 European forest stands varying in species composition, forest structure, cohorts and climates.
The objective of this work is to answer the following questions: 1. Does the model reproduce the magnitude and the timing of GPP across different forest types and forest canopy structures?
2. Does the model reproduce the observed inter-annual GPP variability?
3. Is the model generic enough so that a single set of species-specific parametrizations (i.e.without a siterelated calibration) allows reproducing GPP behaviour across different biomes?
4. Do the model outputs improve when considering a complex heterogeneous three-dimensional canopy structure compared to a simple "big leaf" model canopy representation?
To investigate these issues, we introduced a 3D canopy representation into the 3D-CMCC FEM, while however maintaining its flexibility and the generic features to be applied to different forest ecosystems.The new model can now run on a daily time step and includes as main changes an improved allocation-phenology scheme (with new carbon pools including the non-structural carbon pool, NSC), an implemented water cycle (including snow processes) and the computation of autotrophic respiration.

Model description
The 3D-CMCC FEM (Collalti, 2011;Collalti et al., 2014) (source code and executables are available upon request to the corresponding authors and downloadable at http://dev.cmcc.it/git/3D-CMCC-FEM-git) is hybrid between an empirical and a process-based model relying on the concepts of the LUE approach at canopy level for carbon fixation.The 3D-CMCC FEM is designed to simulate forest ecosystems at flexible scale (from hectare to 1 km per 1 km) and on a daily time step.The model simulates tree growth as well as carbon and water fluxes, at species level, representing ecophysiological processes in heterogeneous forest ecosystems including complex canopy structures.The 3D-CMCC FEM v.5.1 uses daily meteorological data, site-specific data and ecophysiological data (e.g.maximum canopy conductance, specific leaf area; see Table S3 and Collalti et al., 2014) to simulate forest processes.The model code architecture allows aggregating trees into representative classes, each characterized with its variables (e.g.carbon pools, leaf area index, tree height) based on their ages, species-specific and structural traits.These variables are identified by the model through four indices: i.e. species (x index), diameter class (Diameter at Breast Height, DBH) (y index), height class (z index), and age cohort (k index); such indices represent the main state variables considered by the model in distinguishing ecosystems across sites.To deal with forest heterogeneity within and across different ecosystems, 3D-CMCC FEM v.x.x (all model versions follow the same architecture) uses a speciesspecific parametrization for each species simulated.Moreover, based on the assumption made by Magnani et al. (2007) that the above-ground net primary production decreases with the ageing of a forest, the model explicitly takes into account all ages within the stand, reproducing a year by year reduction due to senescence (Landsberg and Waring, 1997;Waring and McDowel, 2002).Height classes and the tree position within the forest vertical profile are explicitly treated by the model to estimate the light availability (version 5.1 includes also the albedo effects) using the Monsi-Saeki formulation of exponential attenuation coupled with the "Bigleaf" approach developed for a multi-layered model without considering canopy depth (Collalti et al., 2014;Medlyn et al., 2003).DBH together with stand density control grid-cell horizontal canopy coverage (and gaps) through the computation of the single tree crown coverage and then upscale to grid-cell level (Collalti et al., 2014).In this way, the model is able to reproduce different combinations of uneven-aged, multi-layered and multi-species forests, by optional simulation of e.g.light competition, age-related decline and different species-specific traits.This aspect makes the model flexible to be theoretically used for a wide range of applications in forests and allows quantifying the effects of a particular simulation of forest structure on model performance.

Model implementations
In this study, the 3D-CMCC FEM described in Collalti et al. (2014) has been advanced to version 5.1 to improve the representation of forest processes, like phenology, canopy photosynthesis -including autotrophic respiration -tree carbon allocation and water dynamics.The improved phenology routine is based on a new C allocation scheme, that includes new carbon pools among which the non-structural-carbon (NSC) pool, related to five phenological transitions for deciduous species and three phenological transitions for evergreen species, both updated once per day.Autotrophic respiration is explicitly simulated and separated into maintenance and growth respiration.Maintenance respiration is the func-tion of the nitrogen content (a new added pool) in the living pools, while growth respiration is computed proportionally to the carbon allocated to the different tree compartments.

Photosynthesis and net primary production
As in Collalti et al. (2014) the carbon flux is still estimated in 3D-CMCC FEM through the Light Use Efficiency approach multiplying, for a particular species x, the absorbed photosynthetic active radiation (APAR, i.e. the radiation intercepted by the canopy) with the leaf area index (LAI, m 2 m −2 ) with either the prognostic potential radiation use efficiency (ε x , grams of dry matter MJ −1 ) or the maximum canopy quantum use efficiency (α x , µmol CO 2 µmol −1 PAR) (for a full list of model parameters see Table S3).Parameters ε x or α x are controlled by the product of several environmental factors (modifiers) indicated as mod x,k (dimensionless values varying between 0 and 1 and differing for each species x and age class k) depending on: vapour pressure deficit, daily maximum and minimum air temperatures, soil water content and site nutrient status (for a full modifiers description see Landsberg and Waring, 1997).Gross Primary Production (GPP; gC m −2 day −1 ) is thus calculated using the following equation: where APAR z is the absorbed radiation by the trees at the zth layer (where z represents the layer of representative height for each height class).
Conversely from the previous version where Autotrophic Respiration (AR) was set as a constant fraction of GPP (Waring and Landsberg, 1998), in this version AR is explicitly simulated.AR is treated by distinguishing into Maintenance Respiration (MR), governed by a Q 10 type response function (Ryan, 1991;Bond-Lamberty et al., 2005) and Growth Respiration (GR) assumed to be a constant proportion (30 %) of all new tissues produced (Larcher, 2003).Net Primary Production (NPP), is then calculated as follows: (2) NPP is then partitioned into biomass compartments and litter production following dynamic allocation patterns that reflect environmental constraints (i.e.light and water competition) and age.

Daily meteorological forcing and snow dynamics
The model implements a daily time step (previous version was at monthly time step) thanks to the temporal frequency of meteorological forcing input data: daily maximum (T max ; • C) and minimum air temperature (T min ; • C), soil temperature (T soil ; • C), vapour pressure deficit (hPa), global solar radiation (MJ m −2 day −1 ) and precipitation amount (mm day −1 ).In addition, the model uses the day-time (T day ; • C) and night-time (T night ; • C) average temperature computed as follows (Running and Coughlan, 1988): where T avg is the daily average air temperature ( • C).When the soil temperature is missing among in situ observed data, the model estimates it for the upper 10 cm of the soil layer through an 11-day running weighted average of T avg and further corrected by the presence of a snowpack as in Thornton (2010), Kimball et al. (1997) and Zeng et al. (1993).The variable related to the snowpack thickness was included as a water cycle component by reproducing the daily amount (mm day −1 ) of snow melt driven by average air temperature (T avg ) and incident net global radiation (Rad soil ; W m −2 ), while snow sublimation is only driven by T avg .
In the case of snow presence, if T avg is higher than 0 • C, considering the melting point as in Running and Coughlan (1988) and Marks et al. (1992), the rate of daily snowmelt is estimated by where t coeff is the snowmelt coefficient (0.65 Kg m −2 • C −1 day −1 ), ε snow is the absorptivity of snow (0.6), H fus is the latent heat of fusion (335 kJ kg −1 ) and Rad soil is the incident net global radiation at the soil surface (kJ m −2 day −1 ).
Otherwise, if T avg is lower than 0 • C , snow sublimation is computed by where H sub is the latent heat of sublimation (2845 kJ kg −1 ).

Phenology and carbon allocation
Phenology plays a fundamental role in regulating photosynthesis and other ecosystem processes (e.g.carbon and nitrogen dynamics), as well as inter-individual and inter-species competitive relations and feedbacks to the climate system (Richardson et al., 2012a).In the updated model version phenology and carbon allocation depend on six different carbon and nitrogen pools (while three carbon pools where considered in the previous versions).Five pools represent the main tree organs: foliage, (fine and coarse) roots, stem, branch and bark fraction.One new pool corresponds to NSC (starch and sugar) stored in the whole tree.The inclusion of this new pool was necessary to represent NSC mobilization and consequently leaf phenology (e.g.leaf production during spring for deciduous trees) and carbon allocation.Woody pools are furthermore distinguished between live and dead wood.In the new version of 3D-CMCC FEM, LAI values are predicted for sun and shaded leaves (De Pury and Farquhar, 1997;Thornton and Zimmermann, 2007;Wu et al., 2015), minimizing the effects of the "Big-leaf" approach (Monteith, 1965;Sellers et al., 1997), as a function of the amount of carbon allocated to the leaf pool.It is noteworthy that each pool and each structural state variable is daily updated according to the meteorological data, forest structure and simulated fluxes.Following Arora and Boer (2005), for deciduous species the model considers five phenological transitions (being just four in the previous versions: bud burst, peak LAI, leaf fall period and dormancy) that drive the seasonal progression of vegetation through phases of dormancy/quiescence, budburst, maximum growth, active growth and senescence as in the following: 1. Leaf onset starts from quiescence when thermic sum (the sum of the T day air temperatures exceeding the threshold T base value of 5 • C) exceeds a species-and site-specific temperature threshold value (Rötzer et al., 2004;Dufrene et al., 2005) and when the LAI value reaches LAI = max(LAI) × 0.5.The costs of expanding buds during this period of high carbon demand are supported by NSC (Landhausser, 2010;Dickmann and Kozlowski, 1970).
2. During the budburst phase, carbon and NSC are allocated to the foliage pool, as long as the balance between GPP and AR is positive (Barbaroux and Bréda, 2002;Campioli et al., 2013;Scartazza et al., 2013).
3. During the succeeding maximum growth phase and lasting up to peak LAI, carbon is allocated into foliage and fine-root pools (Sabatè et al., 2002), based on the pipe model theory (Shinozaki et al., 1964a, b), to optimize photosynthesis; otherwise, no growth occurs and NSC is used.
4. Successively, the full growing phase lasts up to the day when day length (in hours) is shorter than a speciesspecific threshold value.In this phase carbon is allocated into stem, fine and coarse roots, branch and bark, and into NSC pools in order to refill the reserves for the next years.
5. Finally, during the leaf fall (i.e.yellowing or senescence) phase, lasting until the leaf fall (assumed linear) is complete, the total positive carbon balance is allocated to the NSC pool.
For evergreen species the model follows a similar but simplified approach simulating a first maximum growth phase, when the model allocates NSC to foliage and fine roots up to reach peak LAI, and a second full growing phase, when the model allocates to the other pools.As in Lawrence et al. (2011), for litterfall we assume and simplify that there are no distinct periods, but rather a continuous shedding of foliage and fine roots of the previous years.
All tree pools are updated at a daily time step depending on NPP balance.Nitrogen concentration for each pool is considered as a C / N ratio following Dufrene et al. (2005) and Thornton (2010).The C / N stoichiometry is constant and depends on species; unfortunately, the model still lacks an interactive C-N cycle.Forest stand structural attributes, e.g.DBH, tree height and crown competition are also updated at a daily time step based on species-specific biometric relationships.

Autotrophic respiration
Based on the approach of BIOME-BGC model (Thornton, 2010) 3D-CMCC FEM computes the daily AR of all living tissues.MR is a modified Van't Hoff function (Davidson et al., 2006;Mahecha et al., 2010) of temperature with the temperature sensitivity parameter Q 10 (see below) and a linear function of the nitrogen content (N content = 0.218 kgC kgN −1 day −1 ; Ryan, 1991) in the living compartments.The Q 10 function is an exponential function for which a 10 • C increase in temperature relates to a Q 10 factor change in the rate of respiration.MR is partitioned into day time and night time respiration using, in place of temp in Eq. ( 7): t day and t night for foliage, t soil for fine and live coarse roots, and t avg for live stem and branch: GR x,y,z,k is considered as a fixed ratio (30 %) of all newly grown (i.e.living) tissues as proposed by Larcher (2003).

Data description
Model validation has been performed for 10 different forest sites (Table 1) included in the European EC fluxes database cluster (URL: http://www.europe-fluxdata.eu).For each site, 3D-CMCC FEM v.5.1 simulations were performed averagely for 10 years, forced with gap-filled daily meteorological data, according to the available time series.The selected sites cover a wide range of European forest ecosystems across different latitudes, landscapes and three climatic zones: temperate, Mediterranean and subalpine.For all sites, daily time series of meteorological variables (maximum and minimum air temperature, precipitation, vapour pressure deficit and incoming solar global radiation) were used as drivers, while GPP was used for model output validation.The GPP derives from Net Ecosystem Exchange (NEE) measurements that have been previously quality checked and processed including storage correction, spike detection and low turbulence condition (u * ), filtering according to the method in Papale et al. (2006) and gap filled using the Marginal Distribution Sampling method (MDS; Reichstein et al., 2005).The GPP is not directly measured by the eddy covariance technique but it is estimated using a par-  In the rest of the paper, we will refer to these data as "measured" or "observed" GPP for simplicity but it is important to highlight that they are obtained using a modelling approach (although strongly based on direct measurements).

Model and experimental set-up
Site data needed for model initialization concerned information on forest structure (DBH, tree height, age and density), its species composition and soil characteristics (e.g.soil depth, texture and bulk density).These data were used for each site to initialize the model, i.e. to describe soil characteristics and the initial forest conditions at which the model starts to simulate forest processes.Initialization data were taken from the BADM (Biological, Ancillary, Disturbance, Metadata) files, available at http://www.europe-fluxdata.eu,for each of the selected sites, and complemented by a literature review and personal contacts with the sites' Principal Investigators.Length of model simulations, basic sites description and forest attributes used for model initialization are shown in Table 1.As a whole, for all sites, the speciesspecific ecophysiology has been parametrized generically (i.e.not related to the simulated site) using only data from the literature (e.g.Breuer et al., 2003;Mollicone et al., 2003;Pietsch et al., 2005;White et al., 2000) independently from site-related measurements (for a list of model ecophysiological and structural species-related parameters see Table S3).As in White et al. (2000), Reich et al. (2007) and in Naudts et al. (2015) in the case of multiple values available for a single parameter, the mean values were used.Using the mean parameter estimates avoided hidden model-tuning (i.e. the use of unrealistic value to obtain the best fit) and largely reduces the likelihood that simulation results are biased by hidden calibration.
In addition, several studies (Bolstad et al., 1999;Griffin et al., 2001;Ibrom et al., 2006;Misson et al., 2007;Cescatti et al., 2012;Guidolotti et al., 2013;Migliavacca et al., 2015) claim that beside environmental variables, spatial heterogeneity (horizontal and vertical) of the stand structure and composition (age, species) also plays an important role at the ecosystem level.To evaluate if a more detailed simulation of forest heterogeneity improves model performances, a number of replicated simulations were performed for three heterogeneous sites (BE-Bra, IT-Ren and DE-Tha), based on different model initializations in terms of forest layers, species composition and/or ages (Table 1).These replicates start from a forest representation very close to reality (e.g.cohorts, mixed species composition and different canopy layers) to a more generalized one.For reasons of comparability, in these test sites the model has been forced with the same meteorological input data, and eco-physiological species-related parametrizations, i.e.only model initializations data, related to stand attributes, differ.These data are based on different sources: site measurements and/or literature data and/or experimental settings.
In the case of BE-Bra we initialized the model with nearly all possible combinations of initialization data sets.The first simulation (BE-Bra P_Q-3L) has explicitly taken into account the site heterogeneity (vertical and horizontal) (following Gielen et al., 2013, and ancillary data sources) consisting in mixed species composition at a different canopy coverage rate of Quercus robur (Q) and Pinus sylvestris (P) (20 and 80 %, respectively), with two cohorts (oaks and pines, 65 and 72 years old, respectively) and three forest layers.In the second simulation (BE-Bra P), only a single-layer of Scots pines was considered (following Janssens et al., 2002 andVerbeeck et al., 2007).In the third, fourth and fifth simulations (BE-Bra Q_3L, BE-Bra Q_2L, BE-Bra Q_1L, respectively) only three, two and one layer of pedunculate oaks (following Curiel Yuste et al., 2005 and experimental set-up) were assumed.Additionally, two more experimental set-ups combined two layers of oak and one layer of pine (BE-Bra P_Q-2L) and one layer of oak and pine (BE-Bra P_Q-1L).
For IT-Ren, in the first simulation, two layers and two cohorts were considered (IT-Ren 2L_2C) following Montagnani et al. (2009).In the second case, stand heterogeneity has been grouped into one layer, i.e. minimizing forest structure, and one single averaged cohort (IT-Ren 1L_1C; experimental set up).
For DE-Tha, two species (DE-Tha 2S) (spruce 80 % and pine 20 %, respectively) were modelled in the first simulation (following Grünwald and Bernhofer, 2007), while in the second experiment only the dominant species (spruce; DE-Tha 1S) was considered (BADM source).

Validation approach
In order to analyse model performance, we used daily (X daily ), monthly (X monthly ) and annual (X annual ) time series for modelled and observed GPP values, which were compared at the different timescales.At first, we conducted a comparison via appropriate performance indices on longterm annual average (i.e. over the full series of all the available years), then we evaluated how the model performed in the different seasons aggregating values for months of the same season.
In addition, to avoid misleading results in the daily and monthly signal comparisons due to the strong seasonality for both daily and monthly signals, we followed the decomposition technique proposed by Zhao et al. (2012).To partially remove the seasonal cycle signal, we build a new daily (Y daily ) and a new monthly (Y monthly ) data set for both observed and modelled data, respectively.The Y daily is created by subtracting the daily time series from the daily mean of the month, and the Y monthly by substracting the monthly time series from the annual mean (see Table S1b in the Supplement).
For both X and Y data sets we firstly adopted the Pearson coefficient of correlation (r).Then, we calculated the where i represents the day (or month), and σ (GPP EC ) is the standard deviation of the full daily (or monthly) series of observed GPP consisting of N records.
In addition, model performances were measured for the same series through the "Model Efficiency" index (MEF) following Reichstein et al. (2002) and Migliavacca et al. (2015): In contrast to correlation coefficient r, the MEF index (Bowman and Azzalini, 1997) measures not only the correlation between modelled and observed data (in other words, how well they reproduce the phase of observations), but also their "coincidence", i.e. the deviation from the 1 : 1 line, and it is sensitive to systematic deviations between model and observations (Reichstein et al., 2002).
Another index used in model evaluation is the standardized Mean Absolute Bias (MABstd) (Li et al., 2010) instead of the classical bias index, to avoid compensations for errors of opposite signs, and standardized (as for NRMSE) to allow comparison across sites: To evaluate the model performances in terms of variability patterns, we adopted a procedure to compare each GPP MD value to both its correspondent GPP EC value and the GPP EC -GPP MD difference, at daily and monthly levels.Since the different sites have different ranges of GPP, we arranged in ascending order GPP EC time series, then divided the whole series in 18 classes, each one containing values of a 5 percentile class.For each group of GPP EC we calculated the median and reported the range.We calculated the same statistics for the values of GPP MD arranged so that dates of GPP MD and GPP EC matched.We chose the median rather than the average because it is less influenced by outliers.We decided to use the range rather than the variance as a measure of variability, because giving information on asymmetry.
In order to assess the Inter-Monthly and Inter-Annual Variability (IMV and IAV, respectively), individual GPP values for each month and year considered were normalized following Vetter et al. (2008) and Keenan et al. (2012).In brief, we subtracted the respective observed or modelled average from individual (monthly and yearly) observed and modelled value as follows: where avg(GPP) is the long-term (full series of all the available years) average of monthly (for IMV) or yearly (for IAV) GPP from observations (EC) and modelling (MD), respectively.A kernel density estimation (kde) was performed to qualitatively observe probability distribution functions (PDFs) respectively of the IMV and IAV values (Bowman and Azzalini, 1997).
To evaluate 3D-CMCC FEM ability in reproducing the observed IMV and IAV, we calculated the NRMSE based on monthly and annual time series of IMV and IAV values, respectively.The NRMSE, adopted as a normalized index of error allowing comparability among different sites, was thus calculated as in Eq. ( 8) but using IMV and IAV instead of GPP individual values, following the approach of Keenan et al. (2012).

GPP evaluation over long-term annual and seasonal scale
Both monthly and daily simulated (MD) GPP show high correlations with EC data and these results are consistent with MEF values as well as with NRMSE and MABstd (Table S1a, and Fig. 1a and b).On average, deciduous forests reveal better correlation between MD and EC data than evergreen forests, with a mean r of 0.86, while evergreen and mixed stands show average r of 0.81 and 0.77, respectively.For all stations p < 0.0001.These results are confirmed by Taylor diagrams (Taylor, 2001) (Fig. 2a) which show that the model performs satisfactorily for daily fluxes, in four (i.e.DE-Hai, DK-Sor, DE-Tha, FI-Hyy) of 10 sites falling within ±0.5 normalized standard deviations from the reference point (REF; representing observed data) and having correlation around 0.9.For six sites (all the evergreen needleleaf plus deciduous except FR-Hes), the normalized standard deviation of simulated data is close to that of observed data (represented by reference line with normalized standard deviation, i.e. radial distance from the axis origin equal to 1).Simulated data for IT-Cpz, FR-Hes and FR-Pue have, respectively, a normalized standard deviation of approximately +0.2, +0.3 and +0.4 (as difference from that of observations) consistently with the lower correlation values; BE-Bra shows the highest negative difference, in terms of standard deviation, of around −0.3.On average, the least performing result is for IT-Cpz that shows a correlation below 0.60 and falls outside ±1 normalized standard deviation from the reference point.The Taylor diagram in Fig. 2b shows the model's capability to better simulate GPP at monthly scale.
For seven sites (all deciduous and evergreen needleleaf), the normalized standard deviations of modelled data are close to that of observations (reference line), and the correlation is well above 0.90 and within ±0.5 normalized standard deviation from the reference point.IT-Cpz and BE-Bra show improved results with respect to daily data: respectively, their correlation increases by more than 0.1 units, they fall within the +0.2 and −0.2 units of normalized standard deviation differences with respect to that of observations, and they enter in the field of ±1 and ±0.5 normalized standard deviation from the reference point, although for IT-Cpz the values for all statistical indices are consistently the lowest.Although less strongly, FR-Pue monthly data also have better perfor-mances than daily data results in terms of higher correlation (0.89) and closer position in terms of normalized standard deviations units from the reference point even if the other indices are a little bit far from the average values of the other sites.
To reduce the effects of seasonality, we also examine model performance using the decomposition method (Sect.2.5).In the daily time step, the overall model performance is much lower in Y data set (Fig. 1c and Table S1b) than in X data set, that is, r = 0.51, MEF = −0.43,NRMSE = 1.18 and MABstd = 0.8 in Y data set vs. r = 0.82, MEF = 0.57, NRMSE = 0.63 and MABstd = 0.44 in X data set.The large model errors at synoptic scale have been well recognized by previous studies (Dietze et al., 2011;Zhao et al., 2012).The model shows to be a good predictor for DE-Tha and FR-Pue and to be less predictive for DK-Sor and FR-Hes with respect to the X data set.Accordingly, for FR-Pue comparisons between X and Y data sets show that this site is less affected by seasonality while DK-Sor is the most affected one.As expected, in the monthly time step, the decomposition technique returns more similar results between X and Y data sets.Worst results are for IT-Cpz while best results are for DE-Hai, DK-Sor, DE-Tha and IT-Ren (see Table S1b and Fig. 1d).Overall, after smoothing the seasonality the model shows to be slightly better predictive with average values among sites consistent with observed data (r = 0.94, MEF = 0.85, NRMSE = 0.36 and MABstd = 0.27).Comparison between X and Y data sets shows that DE-Hai is less affected by seasonality and IT-Cpz is the most affected one.In brief, comparison between X and Y data sets shows similar reconstruction ability in the monthly time step, but very different in the daily time step because X data set contains the feature of large seasonality.Given that one of the objects of this study focuses on seasonality fluctuation, we mainly show the results based on X data set hereafter without specification.
To summarize, although with similar inter-site variability, monthly correlations across different sites are higher than daily ones, with average correlations of 0.94 for deciduous, 0.90 for evergreen and 0.92 for mixed stand (Fig. 1 and Table S1a).
Daily and monthly NRMSE are usually less than 1.00.Monthly NRMSE is less than daily NRMSE, 0.41 vs. 0.63 on average, respectively (Table S1a).These results confirm that the model performs better at a monthly than at a daily timescale (Fig. 1), likely because of averaging effects of daily variability in GPP estimation.
The same consistency is shown for MEF index that is on average 0.79 (monthly) and 0.57 (daily), with largely lower values for the two Mediterranean forests (IT-Cpz and FR-Pue) at both the daily and monthly timescale (Table S1a and Fig. 1).
Considering the annual mean in deciduous forests (Table S1a), the model slightly underestimates the GPP by and IT-Col it shows an overestimation of 5.2 % on average.Concerning evergreen forests, we find an overall model underestimation of 2.1 %, with higher variability compared to deciduous forests, and more divergent in the case of the two Mediterranean ecosystems, ranging from underestimation of 18.4 % (318 gC m −2 year −1 ; IT-Cpz) to overestimation of 12.1 % (158 gC m −2 year −1 ; FR-Pue).Results for the mixed forest site of BE-Bra are reasonable, with an underestimation of about 4.4 %.
In terms of inter-annual variability of the yearly mean, GPP MD falls well within the range of GPP EC standard deviations for all sites except at IT-Cpz (Fig. 3).Deciduous broadleaved and the evergreen needleleaf are the best reproduced.
Performance indices from daily and monthly observed and modelled GPP series analysed at seasonal level are shown in Table S2 and Figs. 4 and 5. Winter (DJF) and summer (JJA) correlations were generally lower than those in autumn (SON) and spring (MAM).Specifically, DJF and JJA showed a correlation of 0.45 and 0.46, respectively, on a daily scale and a value of 0.59 and 0.50 on a monthly scale; MAM and SON showed on a daily scale an average correlation of 0.72 and 0.77, respectively, while on monthly scale a correlation of 0.82 and 0.86 with two low values of 0.05 and 0.06 for monthly DJF and MAM for IT-Cpz was shown.Winter and summer monthly average NRMSE of 1.13 and 1.00, respectively, were not significantly different to the 0.66 and 0.57 of spring and fall.MEF and MABstd indices values suggest similar findings than NRMSE.
Figure 6 shows overall modelled vs. observed fluxes over daily and monthly scales, and the absolute difference ( GPP MD, i.e.GPP MD minus GPP EC ) vs. observed fluxes (GPP EC ) as calculated by the difference matrix described in Sect.2.5.Overall, the aggregated data reveal high correlation also due to a progressively reduced range of data, and then variability, at higher GPP values (Fig. 6; top plots).Figure 6 (bottom plots) show patterns of GPP MD with increasing GPP EC .These differences result in strong reduction of discrepancies for GPP EC greater than 8.5 gC m −2 day −1 for daily, or 7.3 gC m −2 day −1 for monthly time series.
The average intra-annual GPP variations are analysed by calculating the long-term average and standard deviation values for each month of the year (Fig. 7).In spring, the modelling results from deciduous forests present a larger variability than the observed data, especially during budburst and in late spring.The model generally matches the observed phenology timing (budburst, peak LAI, leaf senescence and their fall, i.e. length of growing season, data not shown).Consistent biases were observed in late summer.

Inter-monthly and inter-annual variability
The distribution of the IMV for the analysed sites reveals in general lower variance for modelled than observed data (Fig. 8 and Table 2).Regarding deciduous  2).Concerning broadleaved evergreen vegetation, we observe very good agreement between observed and modelled IMV central tendency measures in FR-Pue with most of the frequencies between ±2 gC m −2 day −1 .In FR-Pue, however, we notice that the distributions are slightly shifted, especially around the median, with resulted variance from modelled data in disagreement with observed data.We detect high IMV distribu-Table 2. IMV and IAV NRMSE for the analysed sites.Each specific IMV distribution was tested for normality goodness of fit (N = normal distribution, P = non-normal distribution).A test for equivalence of central tendency was performed between IMV MD and IMV EC values.(na) refers to the case of sites with inconsistent distributions (one normally, one not normally distributed).The (*) marks refer to the acceptance of the null hypothesis that the two distributions are equivalent for the specific statistic (α = 0.05).ECT stands for "Equivalence for Central Tendency"; EV for "Equivalence for Variance".tion disagreement in IT-Cpz, where the PDF from observed IMV is normally distributed and the one from modelled IMV is not (as resulted by a χ 2 goodness of fit test).IMV MD series in BE-Bra (mixed forest) are in low agreement with those from EC. Modelled variance is low, and positive IMV values are especially scarcely represented.Table 2 also shows the NRMSE for IAV and IMV series.There is no apparent correlation either between sites species and average error, or between distributions uniformity and NRMSE.In fact, the lowest NRMSE for IMV was found in BE-Bra and IT-Col and the highest in DE-Hai and DK-Sor.On average the model has a NRMSE for IMVs of about 1.2. Figure 9 shows the modelled and measured individual IAV values for each studied site.The magnitude of IAV MD was on average of the same order as IAV EC , showing the model's ability to reproduce the inter-annual variability range, and capturing about 62 % of the anomalies' signs (i.e.timing) for the total set of years.The model generally better captured conifers' IAV sign (i.e.DE-Tha, FI-Hyy, and IT-Ren), with 66 % of the time against about 59 % for the deciduous forests (i.e.DE-Hai, DK-Sor, FR-Hes, IT-Col) and 55 % for the Mediterranean ones (i.e.FR-Pue and IT-Cpz).However, the IAV difference in magnitude was better represented for deciduous forests than for conifers, as inferred by the average NRMSE of, respectively, 1.45 and 1.67 (calculated by averaging values reported in Table 2).Although the model reproduced the timing of anomalies satisfactory in more than half of cases (a little bit more than in a random selection), the correlations had a wide spread across sites.Quantitatively, modelled anomalies suggest better results for FR-Pue (r = 0.76) and worse results for IT-Ren (r = −0.54).
In the case of the year 2003 with its summer heat and drought extreme (Ciais et al., 2005;Vetter et al., 2008), the anomaly sign has been well captured by the model for six of the eight sites analysed for that year (no observations were available for BE-Bra and IT-Ren) (Fig. 9).At IT-Cpz and DK-Sor, average IAV MD has the opposite sign to IAV EC , while 2003 was recognized as not remarkably anomalous at IT-Col.Similarly, the model results match with that found by Delpierre et al. (2009) about the anomalous carbon uptake during the warm spring of 2007 compared with the decadal mean for FR-Pue, FR-Hes and DE-Tha.

Comparison within different forest structure simulations
Considering the presence of only one species (either pines or oaks) strongly limits the model to simulate the daily and monthly GPP patterns in BE-Bra (Table 3).This site represents a mixed stand of deciduous and evergreen tree species that assimilates CO 2 all year round, although low temperatures in winter and spring reduce photosynthesis for pines also.The observed GPP fluxes are then caused by the "mixture", at a varying degree, of both oak and pine trees.Considering BE-Bra as a pure oak forest with a variable number of layers (simulation codes: BE-Bra Q_3L, BE-Bra Q_2L, BE-Bra Q_1L) the model results for annual GPP deviate from −1.2 up to −7.4 %; considering a pure pine forest (BE-Bra P) or a combination of pines and one layer of oak (BE-Bra  ever, for this site, the analysis of performance indices based on daily and monthly series shows no evidence of improved model results -slightly better "r" values for 1L_1C in respect to 2L_2C are counterbalanced by the other indices (i.e.NRMSE, MEF) where the 2L_2C representation (for both daily and monthly time step, respectively) shows to slightly improve model results; no difference is found for MABstd.

Discussion
In this paper, we have analysed the capability of the latest version of the 3D-CMCC FEM (v.5.1) to simulate intraannual to inter-annual GPP variability over 10 heterogeneous European forest sites, representative of different ecosystems and bioclimatic regions, by comparing model results with ob- servations based on the EC technique.Although the model provides a reasonable reproduction of the observed values, we may evince some critical issues.First, the observed GPP data are affected by high uncertainties (Kenan et al., 2002;Papale et al., 2006;Richardson et al., 2012a, b).According to Luyssaert et al. (2007) these uncertainties in the 10 case studies considered here, although at the biome level, have a very high spread, varying from ±557.9 (for FI-Hyy) to ±700 gC m −2 year −1 (for IT-Cpz).Besides uncertainty in the EC technique, model assumptions and parametrizations can increase discrepancies compared to observed GPP data.
A potential further source of error in the model runs that may need to be considered or accounted for is related to our choice of not making a site-specific parametrization.Since we used general parametrizations, large uncertainties could be detected especially in the variables that determine, www.geosci-model-dev.net/9/479/2016/Geosci.Model Dev., 9, 479-504, 2016 for example, the length of the growing season (Richardson et al., 2010), and the latitudinal differences (acclimation) of the maximum, minimum and optimum temperatures for photosynthesis.Improvement could be achieved with a sitespecific parametrization, but this falls beyond our goal to make the model generally applicable.In addition, to avoid a misleading model evaluation coming from strong seasonality (especially for deciduous sites) we followed the decomposition technique proposed by Zhao et al. (2012).On average, 10 years of simulations have been conducted for each site.In addition, in three sites different model initializations (i.e.considering different forest structure, composition and cohorts) were used to quantify improvements in model results when a more detailed heterogeneity forest structure representation and processes are simulated.Modelled GPP results were compared against those from EC observations collected for these sites encompassing three mono-specific (pure) stands of beech, holm oak and Scots pine, and three uneven-aged, multi-layered and mixed stands.
Based on results, we can now provide answers to the four initial questions: 1. Does the model reproduce the magnitude and timing of GPP across different forest types, structures and compositions?
Overall, as desirable, the model is skilful in reproducing the annual GPP and its intra-annual (seasonal) cycle, calculated as both daily and monthly value averages, with the monthly scale performing better across all statistical indices considered.These results can be however considered as a "false positive" due to the strong seasonality of GPP patterns that influences and causes higher values of correlation than the model's capabilities to reproduce GPP fluxes (Zhao et al., 2012).This is clearly related to the tendency to linearize the relationship among CO 2 flux and PAR and/or temperature, as also reported by Ruimy et al. (1995) and Wu et al. (2015).
Overall, statistical indices of daily and monthly modelled values for both X and Y data sets were highly consistent with EC data, except for the Mediterranean sites (where seasonality is less pronounced) (see Table S1a and b).Summer drought stress appeared to be the most limiting factor on photosynthesis at FR-Pue (Falge et al., 2002;Reichstein et al., 2002;Sabatè et al., 2002) while the presence of shallow groundwater table at IT-Cpz seemed to reduce the severity of summer drought.This reduction cause a smoothing of seasonality well highlighted in the Y data set (see Table S1b) where IT-Cpz showed to be unanimously one of the worst simulated site at both daily and monthly timescale while FR-Pue and DE-Tha, both evergreens, were less affected by seasonal patterns.This behaviour is confirmed by the daily values of DK-Sor and IT-Col for monthly data, both deciduous, that showed to be the most affected -in other words, if we smooth over the seasonal trends results get worse while the model indicated to be less sensitive for those evergreen sites where seasonality is not marked with high values of correla-tion for DE-Tha, FI-Hyy and Fr-Pue.These results confirm that seasonality has a remarkable effect on a model evaluation.
Concerning the seasonality, all statistical indices divided by seasons in Table S2 are consistent in showing a nonnegligible uncertainties in representing GPP patterns.The overall agreement despite temporal mismatches suggested that errors compensated over the year, but are cumulated in specific time windows (e.g.seasons).As reported for other models (Morales et al., 2005 andNaudts et al., 2014), the model's performances are generally worse in winter (DJF) and summer (JJA).Biases and differences in winter GPP variance may be related, among other things, to the model algorithms used to simulate LAI and to the algorithm used to calculate GPP from EC data (Reichstein et al., 2005), since GPP variability should be low during DJF or absent for deciduous forests.However, mismatches are also related to the way in which 3D-CMCC FEM represents winter and early spring ecosystem processes.The model in fact does not consider the influence of ground vegetation that appears to be not negligible in some cases (Kolari et al., 2006).High GPP variance for evergreen species could be strongly related to low temperatures during winter (Delpierre et al., 2009).Systematic overestimation in winter and spring GPP could then be associated with a lack in representing conifer acclimation, or to soil and atmosphere thermal constraints.At high latitudes and altitudes, another source of uncertainty may be related to freezing and thawing dynamics in soil water (Beer et al., 2007) which are not considered by the model, as with snow sublimation and melting, which are still simplistically represented.
GPP of deciduous forests in summer and autumn are also affected by uncertainties for surface, which is represented by LAI in the model.In addition, GPP is linear with respect to PAR (Monteith, 1977) over monthly or annual timescales, while the relation is strongly nonlinear at the daily scale (Leuning et al., 1995;Gu et al., 2002;Turner et al., 2003;Wu et al., 2015).The linear response of GPP to PAR led to the underestimation/overestimation of GPP under conditions of low/high incident PAR (Propastin et al., 2012;He et al., 2013).In the case of stress or photoinhibition, leaves reduce or stop photosynthesis at too high levels of radiation, while in normal conditions, photosynthesis is light-saturated at high PAR (Mäkelä et al., 2008) which lets canopy photosynthesis saturate at relatively low PAR even in dense tropical forests with high LAI (Ibrom et al., 2008).The model overestimation of summer GPP may thus be partially related to the lack of representation of the canopy photosynthesis saturation processes.
Although adopting a more complex phenology scheme in the comparison between decidous and evergreen forests, our model showed better performances for deciduous compared to evergreen forests.This behaviour is due to the strong seasonality patterns that the deciduous species show, which is consistent with the findings of Zhao et al. (2012) at the two French sites, but contrasts to the results of Morales et al. (2005) who showed that it is generally easier for models to simulate evergreen forests due to the simpler phenology.The present results for evergreen forests are, however, highly affected by the low model performances for the two evergreen Mediterranean forests.As previously stated, overestimation during summer at FR-Pue, and during winter and spring for IT-Cpz, are mostly related to neglecting speciesspecific drought stress response functions.As in Landsberg and Waring (1997), the water modifier is only based on soil physical characteristics and no consideration is given to the stress tolerance or strategy of the species (Larcher, 2003), suggesting that further model developments should focus on this aspect.
Other discrepancies affecting other sites could probably be reduced with a site-specific parametrization.
2. Does the model reproduce the observed inter-monthly and inter-annual GPP variability?
Overall, the distribution of the modelled inter-monthly variability was sufficiently consistent with the observed one.The model, however, showed reduced variability in the distribution for both conifers and deciduous species.The model's ability in better representing higher rather than lower anomalies suggests that it may still be less sensitive to some drivers of variability.In this context, the phenological cycle may have an important role, since it influences canopy cover and is controlled by environmental drivers (Richardson et al., 2010).According to Suni et al. (2003) and Jeong et al. (2013), spring phenology largely affects the summertime carbon budget.Hence, uncertainties in the growing season start date may affect 3D-CMCC FEM's ability to reproduce IMV.In summer and autumn, petioles loss of turgor, cavitation in xylem vessels and leaf yellowing may have an important role in the GPP variability of temperate forests (Reichstein et al., 2007).Even though evergreen forests do not experience complete dormancy in winter, changes in "greenness" can be attributed to seasonal variation in canopy biochemistry, the production of new foliage by canopy species and, particularly where the overstory is sparse, the phenology of understory vegetation (Richardson et al., 2010).Leaves of different ages have different efficiency, sensitivity to solar radiation, temperature and water-related stresses (Chabot and Hicks, 1982).All these elements may have an important role in affecting GPP dynamics, but are still scarcely or not represented by mechanistic ecosystem or forest models.As a confirmation of these suspects, slight modifications in representing phenology and leaf turnover resulted in general improvement of model consistency with EC data (Marconi, 2014).
Distribution of IMV values showed specific patterns attributable to the dominant species.Beech forests IMV PDFs were concentrated around the average value and strongly influenced by high biases.This pattern was probably due to the fact that half of the months in one year have no or little photosynthesis (i.e.early spring, fall and winter) and most of the photosynthetic activity occurs in late spring and summer, when carbon assimilation is influenced by temperatures and solar radiation (Mercado et al., 2009).Conifer PDFs were usually smoother, non-skewed, with reduced variability and fitted by a statistical normal curve.The model showed an average NRMSE for IMV of 1.22 but still captured about twothirds of the annual anomalies' sign (a little bit more than the 50 % that represents a simple causality).
The results for IAV (see Fig. 9) are quite contrasting and largely depend on the site and the number of annual-byannual comparisons.The recent modelling studies, as far as we are aware, show unanimously the difficulties of models to explain the large inter-annual variability in cases where no obvious triggers like management or climatic extreme are at work (e.g.Keenan et al., 2012;Wu et al., 2013).In 3D-CMCC FEM better results have been obtained for FI-Hyy and FR-Pue, so there is no apparent correlation with latitude and forest species.Interestingly, the performance of a DGVM for IAV in FR-Pue is also higher than other sites (Zhao et al., 2012), indicating that the main determinant factor for GPP simulation in this Mediterranean site may not come from the treatment of canopy representation.However, the advantage of a 3-D canopy representation needs to be revalued in the future.Similarly, worse results are reported for IT-Ren, IT-Cpz and BE-Bra where the number of annual correlations are lower than the other sites.The magnitude of differences in the standard deviation generally follows the same tendency, particularly for BE-Bra, IT-Ren and IT-Cpz.These results confirm the model's limited ability to represent the inter-annual variability in these specific sites rather than in these ecosystems.
The comparison between modelled and observed data at the inter-annual timescale shows the model to be sufficiently able to reproduce the sign of variability through the years including the extreme events (heat wave combined to drought) during summer 2003 (Ciais et al., 2005;Vetter et al., 2008) and, for some sites, the anomalous carbon uptake during the warm spring of 2007 described by Delpierre et al. (2009).Potentially negative effects from the anomalous 2003 were modelled into negative GPP anomaly at DK-Sor and IT-Cpz due to model simulation of summer drought stress, while such anomalies are not evident from measurements for DK-Sor (Pilegaard et al., 2011) and IT-Cpz.This could be due to the more maritime climate for DK-Sor and the presence of shallow groundwater for IT-Cpz that weakened the effects in the first part of the summer.In both sites, and including DE-Tha, the effects during July to September were captured by the model (data not shown).As reported by Ciais et al. (2005), Mediterranean sites showed a smaller degree in carbon fluxes, largely dominated by less respiration.It is noteworthy that IT-Col, differently from other European beech stands, does not seems to have suffered from this anomalous heat wave in 2003(G. Matteucci, personal communication, 2015)).Both simulated and observed data showed a positive GPP anomaly, demonstrating that this beech forest benefited from moderate higher temperature values and consequently had "extra" days for assimilation and growth (see also Churkina et al., 2002;Richardson et al., 2010).A similar behaviour was reported also by Jolly et al. (2005) for the Swiss Alps, especially between March and July.This pattern seems to be mostly related to an untimely beginning of the growing season (see Piao et al., 2006), to a reduction in plant transpiration that causes an increase in plant water use efficiency through the partial closure of stomata (Warren et al., 2011) and to high fluxes related to forest floor vegetation.
It is also noticeable that in FR-Hes during the summer of 2004 a negative anomaly occurred, larger than in 2003; and while its sign was captured by the model, its magnitude was not.This can be explained by the modelled postponed effects of a low NSC allocation during the year 2003 to the subsequent periods (Granier et al., 2007;Gough et al., 2009).These results highlight that the model has a sort of "memory" linked to short-term events (e.g.drought stress) and that these events affect the long-term processes.
Quantitatively, modelled inter-annual anomalies show a very large spread across the sites.Correlations vary widely, without any apparent relation with latitude and/or species.If modelled anomaly signs are potentially agreeing with the observed ones most of the time, their magnitude was not.This behaviour seems to be related to several aspects, mainly to an over/under-estimation of the causes that reproduce anomalies, e.g.processes simulated linked to the type of climate anomaly, mismatches in phenology or to a missed representation of other processes (e.g.mast years, disturbances, shallow water).Keenan et al. (2012) assert that a lack in phenological variability and in canopy and soil dynamics are the main culprits of these mismatches but also that flux measurements are affected by random errors especially when fluxes are higher.Poulter et al. (2009) found a similar magnitude of errors with models that were driven by remote-sensing data.Open questions remain as to the proportion of inter-annual variability in land-atmosphere carbon exchange that is directly explainable by variability in climate (Hui et al., 2003;Richardson et al., 2007).
3. Is the model generic enough that a single set of speciesspecific parametrization allows reproducing GPP behaviour across different ecosystems without further need of a site-related calibration?
Overall, the model showed good flexibility although the sites showed a pronounced spatial and temporal heterogeneity (i.e. a variable number of forest layers, different cohorts and species).The model was able to reliably represent the ecophysiology of beech and spruce species at different latitudes, without modifying or tuning the parametrization sets.However, annual and seasonal performance indices, calculated exploiting daily and monthly series, evidenced different performances between the two northern beech sites and the two southern ones.Tables S1 and S2 show a systematic difference in all the statistics used, suggesting the presence of a latitudinal gradient in 3D-CMCC FEM's ability to represent beech forest processes.This gradient could be explained by how the model represents the different limiting factors and their impacts on GPP.For example, we expect low temperatures to be the most important limiting factor at higher latitudes, compared to soil water availability at lower latitudes (Chapin et al., 2002).
We obtained similar results for the two spruce sites.The model showed better performance at higher latitudes.While phenotypic plasticity, and thus the parameter set, may influence the model results, it is noteworthy that the IT-Ren site has different topographic and climatic conditions.Lower average temperatures, higher slopes and non-negligible encroachment of different species in a more complex canopy may negatively influence the model performance in IT-Ren with respect to DE-Tha.Since the model showed unrealistic results for the two Mediterranean forests, it is not easy to determine if and how differences in performances are related to the generality of the model rather than to bad assumptions behind the simulated processes.From our findings, we conclude that for non-water-limited conditions it is possible to yield satisfying results with general parameter sets.
4. Do the model's results improve when considering a complex 3-D canopy structure?
We evaluated possible improvements that could be made if a more accurate model representation at a higher rate of heterogeneity of forest structure, differences in ages and species composition and their linked structural-ecophysiological processes, are assumed.These analyses helped us to understand the importance of each process within the represented combination (i.e.light competition, age-related decline and the specific differences in ecophysiology) on modelled GPP.Doubtless, a direct comparison between modelled and observed GPP data is not possible due to the lack of partitioned measurements of GPP across different layers, cohorts and species.However, in situations where the different ecophysiological behaviours express themselves in the species-specific canopy responses during certain periods of the seasonal cycle, the test of a mixed forest tree model with flux measurements is possible, as the results by Oltchev et al. (2002) showed using the model MixFor-SVAT.
This preliminary analysis can be considered as a sensitivity analysis in terms of processes explicitly simulated instead of lumped parametrization.As a whole, model results using different initialization data are within the observed GPP uncertainties but a quantitative assessment for two sites, BE-Bra and IT-Ren, showed the potential to increase the model's ability in simulating fluxes, while for DE-Tha there is no evidence that model performances could benefit of these efforts.For BE-Bra, taking into account two species (that differ especially for their phenological traits) was beneficial in terms of model performances; the same occurred for different lay-ers (with the exception of BE-Bra P_Q-3L vs. BE-Bra P_Q-2L whose results were similar) and different cohorts.Better performances, in terms of seasonal GPP representations, were obtained when each of the above-mentioned characteristics was accounted for by the model.For IT-Ren, similar results were obtained, although no differences were found in the simulation of phenological patterns in daily and monthly results.Differently, for DE-Tha a differentiation between the two evergreen coniferous species did not cause marked differences in model results, due to low differences in species ecophysiological traits, justifying in these cases the use of a Plant Functional Type (PFT) level of parametrization instead species level (Poulter et al., 2015).

Conclusions
This study aimed at evaluating the performances of the updated version of 3D-CMCC FEM compared to nearly 10 × 10 sites × years GPP data across eddy covariance European forest sites.Although the sites showed high spatial and temporal environmental heterogeneity the model appears able to reproduce trends in all of the 10 sites.Different performance indices showed that daily-and monthly-level model results match well, both for the annual and seasonal scale, against observed data, with some exceptions.Mediterranean sites (IT-Cpz and FR-Pue) showed to be the most problematic in reproducing carbon fluxes.This is likely due to their specific ecosystem peculiarity, e.g.shallow groundwater for IT-Cpz and, for both sites, a low pronounced seasonality.In these two sites, the model showed less generalization unless additional processes were included.Differently from other models, 3D-CMCC FEM, both for daily and monthly simulations and for both X and Y data sets, performs better for deciduous species rather than for evergreen, although deciduous species have a more complex phenology and a more pronounced seasonality.Some mismatches in the simulation over the seasons and over the sites still remain, especially during winter and summer.The first reason for these low agreements in winter can be also attributable to errors during the estimation of GPP from NEE and Ecosystem Respiration values from measurements data.The second can be related to the model's lack or simplicity in representation of snow pack dynamics as reported by Krishnan et al. (2008Krishnan et al. ( , 2009)), especially for evergreen sites (Keenan et al., 2012).Disagreements in summer could be related to model simplicity in simulating soil drought and, using the Monteith approach (Monteith, 1977), to the strong nonlinearity at the daily scale of GPP and PAR, and to the lack of representation of the light saturation processes.In addition, as reported by Keenan et al. (2012), the apparent high variability in the data during the summer season could therefore be due to random errors in the flux measurements, generating larger variability and then lower correlations against modelled data.No marked differences were found in simulations across different latitudes, so model parametrizations for the different tree species could be useful over Europe with quite a high rate of confidence, with the exception of specific cases in Mediterranean forests.
As for other models, 3D-CMCC FEM showed to have the potential to correctly reproduce the signs of inter-annual variability, like the 2003 heat wave and drought extreme and the anomalous carbon uptake during the warm spring of 2007 and their instantaneous biological response to these events.Significant disagreements were, however, found in reproducing the magnitude of these anomalies.
The consideration of stand heterogeneity, when possible or existing (i.e.layers, cohorts and mixed composition), led the model to improve its results in two of the three sites compared to generalized simulations of forest attributes.This plasticity makes the model able to be used in a wider range of forest ecosystems.
The Supplement related to this article is available online at doi:10.5194/gmd-9-479-2016-supplement.

Figure 1 .
Figure 1.3D-CMCC FEM performance indices at different timescales; daily (a) and daily aggregated to month (b) for X data set.(c) and (d) refer to Y daily and Y monthly data set following decomposition technique proposed in Zhao et al. (2012).DE-Tha refers to the 1S simulation, IT-Ren to the 2L_2C simulation, BE-Bra to the P_Q-3L simulation (see text).

Figure 2 .
Figure 2. Taylor diagrams for daily (a) and daily aggregated to month (b) GPP evaluated representing: the deviation of model results from observations in terms of normalized standard deviation of observations, represented by the distance from the site point to the point on the x axis identified as reference (REF); the difference of model normalized standard deviation from that of observations, represented by the distance of the site point with respect to the quarter arc crossing REF; and the correlation, given by the azimuthal position of the site point to the x axis.The sites are numbered in ascending order as follows: (1) DE-Hai, (2) DK-Sor, (3) FR-Hes, (4) IT-Col, (5) FR-Pue, (6) IT-Cpz, (7) DE-Tha, (8) FI-Hyy, (9) IT-Ren, (10) BE-Bra.Colours refer to different IGBP vegetation classes: DBF (yellow), EBF (orange), ENF (light-blue), MF (green).

Figure 3 .
Figure 3. Distributions of annual GPP (gC m −2 year −1 ).MD (red) are model results, EC (blue) measured by eddy covariance.The vertical bars represent ±1 standard deviation.DE-Tha refers to the 1S simulation, IT-Ren to the 2L_2C simulation, BE-Bra to the P_Q-3L simulation (see text).

Figure 4 .
Figure 4. 3D-CMCC FEM performance indices of daily GPP at different seasons.DE-Tha refers to the 1S simulation, IT-Ren to the 2L_ 2C simulation, BE-Bra to the P_Q-3L simulation (see text).

Figure 5 .
Figure 5. 3D-CMCC FEM performance indices of daily GPP aggregated to months at different seasons.DE-Tha refers to the 1S simulation, IT-Ren to the 2L_2C simulation, BE-Bra to the P_Q-3L simulation (see text).

Figure 6 .
Figure 6.Comparison between GPP MD and GPP EC data.The top plots show the average GPP EC : GPP MD correlation for (left) monthly (gC m −2 month −1 ) and (right) daily (gC m −2 day −1 ) data.The bottom plots show absolute difference range between GPP MD and GPP EC while increasing GPP EC values.Negative values are excluded because of model assumptions.DE-Tha refers to the 1S simulation, IT-Ren to the 2L_2C simulation, BE-Bra to the P_Q-3L simulation (see text).

Figure 7 .Figure 8 .
Figure 7. Seasonal (monthly) cycle of GPP across the 10 sites.The grey line and margins of the grey area represent long-term average of monthly GPP EC (gC m −2 month −1 ) and its ±1 standard deviation, respectively.The green and red dashed lines represent the long-term average of monthly GPP MD (gC m −2 month −1 ) and its ±1 standard deviation, respectively.DE-Tha refers to the 1S simulation, IT-Ren to the 2L_2C simulation, BE-Bra to the P_Q-3L simulation (see text).

Figure 9 .
Figure 9. Inter-Annual Variability (IAV) based on Keenan et al. (2012).Red and blue bars indicate the observed and modelled IAV values, respectively; r values refer to correlation between observed and modelled variations.DE-Tha refers to the 1S simulation, IT-Ren to the 2L_2C simulation, BE-Bra to the P_Q-3L simulation (see text).

Table 1 .
Main characteristics of the study sites.Sites were classified according to the IGBP (International Geosphere Biosphere Program) legend as in the FLUXNET database: MF = mixed forest; DBF = deciduous broadleaf forest; EBF = evergreen broadleaf forest; ENF = evergreen needle leaf forest.Year of simulation starting and ending depend on available time series of observed data.