Quantifying the prediction accuracy of a 1-D SVAT model at a range of ecosystems in the USA and Australia: evidence towards its use as a tool to study Earth's system interactions

Abstract. This paper describes the validation of the SimSphere SVAT (Soil–Vegetation–Atmosphere Transfer) model conducted at a range of US and Australian ecosystem types. Specific focus was given to examining the model's ability to predict shortwave incoming solar radiation (R g), net radiation (R net), latent heat (LE), sensible heat (H), air temperature at 1.3 m (T air 1.3 m) and air temperature at 50 m (T air 50 m). Model predictions were compared against corresponding in situ measurements acquired for a total of 72 selected days of the year 2011 obtained from eight sites belonging to the AmeriFlux (USA) and OzFlux (Australia) monitoring networks. Selected sites were representative of a variety of environmental, biome and climatic conditions, to allow for the inclusion of contrasting conditions in the model evaluation. Overall, results showed a good agreement between the model predictions and the in situ measurements, particularly so for the R g, R net, T air 1.3 m and T air 50 m parameters. The simulated R g parameter exhibited a root mean square deviation (RMSD) within 25 % of the observed fluxes for 58 of the 72 selected days, whereas an RMSD within ∼ 24 % of the observed fluxes was reported for the R net parameter for all days of study (RMSD = 58.69 W m−2). A systematic underestimation of R g and R net (mean bias error (MBE) = −19.48 and −16.46 W m−2) was also found. Simulations for the T air 1.3 m and T air 50 m showed good agreement with the in situ observations, exhibiting RMSDs of 3.23 and 3.77 °C (within ∼ 15 and ∼ 18 % of the observed) for all days of analysis, respectively. Comparable, yet slightly less satisfactory simulation accuracies were exhibited for the H and LE parameters (RMSDs = 38.47 and 55.06 W m−2, ∼ 34 and ∼ 28 % of the observed). Highest simulation accuracies were obtained for the open woodland savannah and mulga woodland sites for most of the compared parameters. The Nash–Sutcliffe efficiency index for all parameters ranges from 0.720 to 0.998, suggesting a very good model representation of the observations.


Introduction
The importance of studying land surface-atmosphere interactions to develop a better understanding of Earth's physical processes and feedbacks is evident from several investigations. Today, particularly in the face of climate change, it has been recognised by the global scientific community as a topic requiring further attention and investigation (Battrick and Herland, 2006; Petropoulos et al., 2014). This is documented by the fact that it is of crucial importance in helping to address directives such as the European Parliament Directive 2000/60/EC, which is aimed at establishing a framework for community action in the field of water policy, namely the EU Water Framework Directive. On this basis, the need to develop a holistic understanding of the land surface parameters characterising the planet's energy and water budget in different ecosystems has never been more important (ESA, 2012).
Land surface parameterisation schemes (LSPs, also known as land surface models (LSMs)) are one of the preferred scientific tools to quantify Earth system interactions at fine spatial and temporal resolutions. LSPs simulate a number of parameters characterising land surface interactions within the lower atmospheric boundary from a predefined set of surface characteristics (i.e. properties of soil, vegetation and water). Often LSPs are utilised, amongst other schemes, to assess water resources, to evaluate the hydrological impacts of changes in climate and land use, and to model land-atmosphere exchanges and emissions of aerosols (Prentice et al., 2015). Recent developments in mathematical modelling have been driven primarily by the progress in computer technology, the expansion of modelling into new fields and disciplines and the need for increased accuracy in model predictions (Bellocchi et al., 2010). As a result, LSPs have advanced considerably to include detailed parameterisations of momentum, energy, mass and biogeochemistry (Rosolem et al., 2013).
One group of LSPs comprises the Soil-Vegetation-Atmosphere Transfer (SVAT) models. These are mathematical representations of vertical "views" of the physical mechanisms controlling energy and mass transfers in the soil-vegetation-atmosphere continuum. These deterministic models are able to provide estimates of the time course of soil and vegetation state variables at time steps compatible with the dynamics of atmospheric processes. Over recent decades, SVAT models have evolved from simple energy balance parameterisations, e.g. from the bucket schemes adopted by Manabe (1969), through the schemes of Deardorff (1978), to the biosphere-atmosphere transfer scheme (BATS) of Dickinson and Henderson-Sellers (1986) and the simple biosphere (SiB) model of Sellers et al. (1986). At present, SVATs are able to describe the multifarious transfer processes through varying degrees of complexity, including the energy, water and carbon dioxide (CO2) fluxes between the ground surface covered by different vegetation types and the atmosphere over different temporal and spatial scales (Olchev et al., 2008). These models require an application context constrained by input variables (atmospheric forcing and vegetation) and input parameters (soil and vegetation properties, initialisation) to simulate the water and energy budget at the surface (Coudert et al., 2008; Ridler et al., 2012).
However, before applying a computer simulation model to perform any kind of analysis or operation, a variety of validation tests need to be executed. The process of validating a mathematical model's performance, coherence and representation of the natural environment is regarded as an essential step in its development. This allows for an evaluation of its ability to systematically reproduce the system being simulated (model reliability) and the level of accuracy with which the model reproduces the natural environment (model usefulness) (Huth and Holzworth, 2005; Wallach, 2006). Numerous model validation techniques exist; for a comprehensive overview see, for example, Bellocchi et al. (2010). The procedures to perform the task of validation appear in several forms, depending on data availability, system characteristics and researchers' opinions (Hsu et al., 1999). A common strategy is to examine the model's simulation versus actual observations acquired from the real world using common statistical metrics, and several validation studies of this type have been undertaken globally (Henderson-Sellers et al., 1995; Viterbo and Beljaars, 1995; Liang et al., 1998; Wang et al., 2007; Abramowitz et al., 2008; Slevin et al., 2015). In addition, Kramer et al. (2002), in an attempt to holistically assess the capability of a model for portraying a real world system, have proposed a set of model assessment criteria, namely accuracy, generality and realism. Accuracy is described by the authors as the "goodness of fit" to in situ measurements. Generality is described as the applicability of the model in numerous ecosystems. Realism is described as the ability of the model to address relationships between modelled phenomena.
The SimSphere land biosphere model is one example of a SVAT model. Formerly known as the Penn State University Biosphere-Atmosphere Modeling Scheme (PSUBAMS; Carlson and Boland, 1978; Carlson et al., 1981; Lynn and Carlson, 1990), this 1-D model was considerably modified to its current state by Gillies et al. (1997) and Petropoulos et al. (2013a). Since its early development, the model has been applied in a wide variety of ways (for a recent overview of the model use and its applications see Petropoulos et al., 2009a). Amongst other uses, it has been involved in studies of land surface interactions (Todhunter and Terjung, 1987; Ross and Oke, 1988) and in the examination of hypothetical scenarios concerning feedback processes (Wilson et al., 1999; Grantz et al., 1999). Furthermore, its synergistic use with Earth observation (EO) data is being considered at present for the development of operational products of energy fluxes and/or soil moisture on a global scale (Chauhan et al., 2003; ESA, 2012). These investigations have been based around the implementation of a technique commonly termed in the literature as the "triangle" (Carlson, 2007; Petropoulos and Carlson, 2011). A variant of this method, although it does not use SimSphere, is already deployed over Spain to operationally deliver surface soil moisture at 1 km spatial resolution from ESA's own SMOS satellite (Soil Moisture Ocean Salinity; Piles et al., 2011).
As SimSphere's use is rapidly expanding worldwide as both a research and educational tool, its validation and the establishment of its coherence and correspondence to what it has been built to simulate are of paramount importance. In this respect, a series of SA (sensitivity analysis) experiments have already been conducted on the model (Olioso et al., 1996; Petropoulos et al., 2009b, 2013a-c). Such studies have allowed quantifying the relative influence of each model input on the simulation of key parameters by the model, ranking them in order of importance to understand how different parts of the model interplay. Yet, to our knowledge, validation studies involving direct comparisons of SimSphere predictions against in situ observations have so far been scarce and not comprehensive. Such validation exercises have only been performed over a very small range of land use/cover types and on earlier versions of the model, when it was still under development (e.g. Todhunter and Terjung, 1987; Ross and Oke, 1988). Furthermore, to our knowledge, very few studies, if any, have validated SimSphere across numerous global ecosystems, for example, over Australian ecosystems. In this context, and given that SimSphere use is currently expanding globally, a fully inclusive and comprehensive validation of the model is now of fundamental importance.
In view of the above, the main objective of this study was to evaluate SimSphere's ability to model key parameters characterising land surface interactions. In this context, the main focus of this study has been to understand specifically the model's ability to predict shortwave incoming radiation (R g), net radiation (R net), latent heat (LE), sensible heat (H), and air temperature (T air) at a height of 1.3 and 50 m. Model validation is assessed through a comparison of the model results with corresponding in situ measurements acquired at local scale from eight experimental sites (72 days in total) belonging to the OzFlux (Australia) and AmeriFlux (USA) global monitoring networks. This allowed contrasting conditions to be included in the model evaluation.

SimSphere model description
This work deals with the SimSphere 1-D boundary layer model devoted to the study of energy and mass interactions of the Earth system. Formerly known as PSUBAMS (Carlson and Boland, 1978; Carlson et al., 1981), this model was considerably modified to its current state by Gillies et al. (1997) and Petropoulos et al. (2013a). It is currently maintained and freely distributed from Aberystwyth University, United Kingdom (http://www.aber.ac.uk/simsphere). Further details about the model architecture can be found in Gillies (1993). In brief, the physical components ultimately determine the microclimate conditions in the model and are grouped into three categories: radiative, atmospheric and hydrological. The primary forcing of this component is the available clear sky radiant energy reaching the surface or the plant canopy, calculated as a function of sun and Earth geometry, atmospheric transmission factors for scattering and absorption, the atmospheric and surface emissivities and surface (including soil and plant) albedos.
The vertical structure effectively corresponds to the components of the planetary boundary layer (PBL), which are divided into four layers: a surface mixing layer, a surface constant flux layer, a vegetation layer and a bare soil layer. The depths of all these layers are somewhat variable with time. The top of the mixing layer is identified by the presence of a temperature inversion that caps the air in convective contact with the surface layer. At night, the situation is reversed as the Earth cools down more rapidly than the atmosphere. The surface "constant flux" layer evolves in the model as a series of equilibrium states between the transition layer below and the mixing layer above. Heat and moisture are assumed to be instantaneously conveyed between the surface and the top of the surface layer, which is chosen to be at a height of 50 m. In reality this height varies between approximately 20 and 50 m. The transition layer applies to a layer in which the vertical exchanges are dominated by molecular and radiative effects as well as by vertical wind changes. In the case of vegetation, the transition layer is represented by the microclimate within and at the top of the vegetation canopy. The substrate layer refers to the depth of the soil over which heat and water are conducted. It consists of two layers, a surface layer and a root zone. Water flows from the surface and the root zone to the atmosphere, respectively, by direct evaporation or through the plants, as well as between the two layers. Soil water content is specified by assigning a fractional volume of field capacity, which essentially is the "soil moisture availability". Five layers are used to compute the flow of heat in the substrate. An initial soil temperature profile is assigned on the basis of the initial surface temperature (furnished from a meteorological sounding), as well as a climatological substrate temperature which one obtains from mean data. A governing parameter for heat conduction is the "thermal inertia", which contains both soil conductivity and soil diffusivity (or alternatively, the volumetric heat content). This parameter is the one that also governs the rate of H flux to or from the atmosphere through the soil surface.
The horizontal component of the model is composed of four parts: (i) PBL, (ii) surface layer, (iii) transition layer and (iv) substrate layer. Because SimSphere simulates parameters in a one-dimensional vertical column, the model is restricted horizontally only to areas representative of its initialised conditions; therefore, the model has an undefined spatial coverage. The vegetation component is dormant at night, that is, after radiation sunset. The night-time dynamics for the surface fluxes differ from those during the daytime. Heat and moisture fluxes are exchanged between the ground and foliage, between plant and interplant airspaces through stomatal and cuticular resistances in the leaf (for water vapour) and the air, between soil and the interplant air spaces and between the entire vegetation canopy and the air. A separate component exists for the bare soil fluxes between the surface and the air. Vegetation and soil fluxes meld at the top of the vegetation canopy, their relative weights depending on the fractional vegetation cover, which is specified as an input to the model. SimSphere is thus referred to as a form of two-stream or two-source model. The soil hydraulic parameters are prescribed from the Clapp and Hornberger (1978) classification. The soil surface turbulent fluxes are determined following the Monin and Obukhov (1954) similarity theory, which takes into account atmospheric stability.
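The two-source weighting described above can be illustrated with a minimal sketch. The text states only that the relative weights of the vegetation and bare soil fluxes depend on the fractional vegetation cover; the linear blend below, and the function name `blend_fluxes`, are assumptions made for illustration and are not taken from the SimSphere code.

```python
def blend_fluxes(flux_vegetation, flux_bare_soil, fractional_cover):
    """Combine canopy and bare-soil fluxes (W m-2) into one surface flux.

    ASSUMPTION: a simple linear weighting by fractional vegetation cover
    (0 = bare soil only, 1 = full canopy). The model's actual weighting
    may differ.
    """
    if not 0.0 <= fractional_cover <= 1.0:
        raise ValueError("fractional_cover must lie in [0, 1]")
    return (fractional_cover * flux_vegetation
            + (1.0 - fractional_cover) * flux_bare_soil)


# Hypothetical example: 25 % vegetation cover.
total_le = blend_fluxes(flux_vegetation=200.0, flux_bare_soil=100.0,
                        fractional_cover=0.25)
print(total_le)
```

With a fractional cover of 0 the result reduces to the bare soil flux, and with a cover of 1 to the canopy flux, which is the limiting behaviour any such weighting would be expected to show.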
SimSphere represents various physical processes taking place in a column that extends from the root zone below the soil surface up to a level well above the surface canopy, the top of the surface mixing layer. The processes and interactions simulated by the model are allowed to develop over a 24 h cycle at a chosen time step (typically 30 min), starting from a set of initial conditions given in the early morning. For its parameterisation, input parameters are categorised into seven defined groups: time and location, vegetation, surface, hydrological, meteorological, soil and atmospheric (Table 1). From initialisation, over a 24 h cycle SimSphere assesses the evolution of more than 30 variables associated with the radiative, hydrological and atmospheric physical domains.

Experimental set up
A total of five AmeriFlux and three OzFlux experimental sites were used, providing a comprehensive data set of measured micrometeorological parameters together with general meteorological observations. The potential use of several FLUXNET sites was evaluated before deciding on the final eight experimental sites used in the study. Sites were selected based on the requirement to fulfil specific criteria, namely (a) sites needed to incorporate different land cover types for the evaluation of the model's ability to simulate fluxes over different land cover/land use types; (b) sites were required to show homogeneous land cover, invariable topography and limited anthropogenic intervention; and (c) site data needed to include measurements of the six parameters validated in the study simultaneously for the same day. Any sites which did not meet these criteria were excluded. Experimental days were further excluded following the pre-processing steps outlined in Sect. 4.1. Table 2 provides an overview of the characteristics of the experimental sites used in this study. At each site, micrometeorological measurements of various parameters are acquired, including the turbulent fluxes of heat and moisture, R g, R net (at the surface) and T air (often at different heights). Flux measurement methods and calculations performed within the FLUXNET sites are designed with the same specifications at all sites. All collected data are quality-controlled and standard procedures for error corrections are prescribed. Details on the FLUXNET measurements and the raw data processing can be found in Aubinet et al. (2000).
The sites were representative of a range of ecosystem types with markedly different site characteristics to include contrasting conditions in the model evaluation. All in situ data acquired from each site covered the year 2011, allowing a sufficient database for model parameterisation and validation to be developed. All data were obtained from the FLUXNET database (http://fluxnet.ornl.gov/obtain-data) at Level 2 processing, to allow for consistency and interoperability. This processing level includes the originally acquired in situ data from which any erroneous data caused by obvious instrumentation error have been removed. Additionally, atmospheric in situ data were collected from the freely distributed University of Wyoming's weather balloon data archive (http://weather.uwyo.edu/upperair/sounding.html). Local profiles of temperature, dew point temperature, wind direction, wind speed and atmospheric pressure were taken from the locations nearest to the experimental sites and were also used in model parameterisation.

SimSphere parameterisation and validation
This section provides a synopsis of the methodology followed in parameterising and subsequently evaluating SimSphere's ability to simulate key parameters characterising land surface interactions. An overview of the main steps included is furnished in Fig. 1.

Data set pre-processing
Following data acquisition, further analysis was implemented aimed at identifying the specific days for which SimSphere would be parameterised and validated for each experimental site. Initially, for each site, cloudy days were identified and eliminated from any further analysis. Judgment on which days (or time periods) were cloud free was based on the diurnal observations of R g. In particular, cloud-free days were flagged as those having smoothly symmetrical R g curves, a property signifying clear-sky conditions (e.g. Carlson et al., 1991).
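The text does not say how "smoothly symmetrical" was judged, so the sketch below is only one hypothetical way to automate such a screening. It assumes a daytime R g series sampled at regular intervals and roughly centred on solar noon; the function name `is_clear_sky`, both diagnostics and both thresholds are illustrative assumptions, not the procedure used in the study.

```python
import numpy as np


def is_clear_sky(rg, max_asymmetry=0.1, max_roughness=0.05):
    """Flag a daytime R_g curve as clear-sky if it is smooth and symmetric.

    ASSUMPTIONS: `rg` runs from about sunrise to sunset at a fixed time
    step, and the thresholds are arbitrary illustrative values.
    """
    rg = np.asarray(rg, dtype=float)
    peak = rg.max()
    if peak <= 0.0:
        return False
    norm = rg / peak  # scale so the daily maximum equals 1
    # Symmetry: compare the curve with its mirror image about mid-day.
    asymmetry = np.mean(np.abs(norm - norm[::-1]))
    # Smoothness: cloud passages produce large second differences.
    roughness = np.mean(np.abs(np.diff(norm, n=2)))
    return bool(asymmetry <= max_asymmetry and roughness <= max_roughness)


# Synthetic example: an idealised clear day and the same day with a dip
# in the morning that mimics passing cloud.
clear_day = 900.0 * np.sin(np.linspace(0.0, np.pi, 25))
cloudy_day = clear_day.copy()
cloudy_day[8:11] *= 0.3
print(is_clear_sky(clear_day), is_clear_sky(cloudy_day))
```

In practice a visual check of the diurnal curves, as implied by the wording above, may be what was actually done; a numerical screen of this kind would need its thresholds tuned per site.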
Subsequently, for the subset of days which included only the cloud-free days, the energy balance closure (EBC) was computed. EBC evaluation has been accepted as a valid method for accuracy assessment of turbulent fluxes derived from eddy covariance measurements (Wilson et al., 2002; Barr et al., 2006). Energy imbalance provides important information on how these fluxes should be compared with model simulations (e.g. Twine et al., 2000; Culf et al., 2002). In this study, EBC was principally evaluated by performing a regression analysis (e.g. see Wilson and Baldocchi, 2000; Wilson et al., 2002; Castellvi et al., 2006). The linear regression coefficients (slope and intercept) as well as the coefficient of determination (R2) were calculated from the ordinary least squares (OLS) relationship between the 30 min estimates of the dependent flux variables (LE + H) and the independently derived available energy (R net − G − S). In addition to this, the energy balance ratio (EBR) parameter was computed by cumulatively summing R net − G − S and LE + H from the 30 min mean average surface energy flux components and then taking the ratio of the cumulative sums as follows (e.g. Wilson et al., 2002; Liu et al., 2006):

$$\mathrm{EBR} = \frac{\sum (\mathrm{LE} + H)}{\sum (R_{\mathrm{net}} - G - S)} \qquad (1)$$

In the above equation, G refers to the soil surface heat flux and S refers to the above-ground heat storage in the vegetation. This index ranges generally from 0 to 1, with values closer to 1 highlighting a satisfactory diurnal energy closure, indicating a good quality of in situ measurements. All days with poor EBC (EBR < 0.750, slope < 0.85, R2 < 0.930) were excluded from further analysis.
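As an illustration of the closure screening described above, the following sketch computes the OLS slope, intercept and R2 of LE + H against R net − G − S, together with the EBR taken as the ratio of the summed turbulent fluxes to the summed available energy, and then applies the three exclusion thresholds quoted in the text. The function names and the synthetic inputs are hypothetical; only the thresholds come from the text.

```python
import numpy as np


def energy_balance_closure(rnet, g, s, le, h):
    """Return (slope, intercept, r2, ebr) for one day of 30 min fluxes."""
    available = np.asarray(rnet, float) - np.asarray(g, float) - np.asarray(s, float)
    turbulent = np.asarray(le, float) + np.asarray(h, float)
    # OLS fit of turbulent fluxes (dependent) on available energy.
    slope, intercept = np.polyfit(available, turbulent, 1)
    r2 = np.corrcoef(available, turbulent)[0, 1] ** 2
    # Energy balance ratio from the cumulative sums over the day.
    ebr = turbulent.sum() / available.sum()
    return slope, intercept, r2, ebr


def passes_closure(slope, r2, ebr):
    """Apply the exclusion thresholds stated in the text."""
    return bool(ebr >= 0.750 and slope >= 0.85 and r2 >= 0.930)


# Synthetic example in which the turbulent fluxes recover 90 % of the
# available energy at every time step.
rnet = np.array([100.0, 200.0, 300.0, 400.0])
g = 0.1 * rnet
s = np.zeros_like(rnet)
available = rnet - g - s
slope, intercept, r2, ebr = energy_balance_closure(
    rnet, g, s, le=0.5 * available, h=0.4 * available)
print(round(slope, 3), round(ebr, 3), passes_closure(slope, r2, ebr))
```

A day is retained only when all three criteria are met, so a high EBR alone does not rescue a day whose half-hourly fluxes scatter widely around the regression line.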
Further conditions were subsequently employed to ensure that selected days were of the highest possible class in terms of in situ data quality. Firstly, all days selected were within the same year to eliminate effects arising from interannual variability in vegetation phenology or climatic conditions. Secondly, selected simulation days were assessed for atmospherically stable conditions, namely low wind speeds and low available energy (Maayar et al., 2001). Such conditions were identified by the evaluation of the in situ conditions, where direct measurements of wind speed and energy flux amplitude and diurnal trend were used as indicators of atmospherically stable conditions. As a result, a final set of 72 non-consecutive days from the selected experimental sites was identified as being suitable for inclusion in this study.

Model parameterisation
SimSphere was parameterised to the daily conditions existent at the flux tower for each of the selected days. In situ data sets provided measurements of soil water content, temperature, wind speed, wind direction and atmospheric pressure at the corresponding time of initialisation, 06:00 LT (local time). Ancillary parameters, critical for the model's initialisation, were largely acquired through either the site's respective principal investigator (for the case of OzFlux) or the FLUXNET database (for the case of AmeriFlux). Such measurements included detailed information on the vegetation (LAI, FVC, vegetation height, cuticle resistance), pedological (soil morphology and soil classification) and topographical (slope, aspect, surface roughness) characteristics of each site. If no further ancillary information was available, specific parameters were acquired through the analysis of standard literature sources (e.g. Mascart et al., 1991; Carlson et al., 1991). The soil type parameters were obtained using the soil texture data provided at each FLUXNET test site and information supplied in some instances by the experimental site managers themselves. This was also the case for the topographical information required in model initialisation. Wind and water vapour sounding profiles, obtained at 06:00 GMT from the University of Wyoming database to correspond to the model's initialisation, were also used in model parameterisation. Upon completion of its initialisation, the model was executed for each site/day, forced by the observations acquired from the site on which it was parameterised. The 30 min average value of each of the targeted model outputs per site for the period 05:30-23:30 LT was subsequently exported to the Statistical Package for the Social Sciences (SPSS) to validate the model predictions.

Model performance assessment
A series of statistical terms were used to evaluate the agreement between the in situ observations and the SimSphere predictions, including the mean bias error (MBE or bias; Eq. 2) and mean standard deviation (MSD or scatter; Eq. 3) of the observed and modelled values, the RMSD (Eq. 4), the mean absolute difference (MAD; Eq. 5), the linear regression fit model coefficient of determination (R2; Eq. 6) and the Nash and Sutcliffe (1970; denoted as Nash) index (Eq. 7). In these equations, P denotes the "predicted" values obtained from SimSphere and O denotes the "observed" values from the selected OzFlux and AmeriFlux site days.
8, 3257-3284, 2015 www.geosci-model-dev.net/8/3257/2015/
The utilisation of these statistics has been widely demonstrated in a number of previous studies comparing model outputs to observational networks (e.g. Alexandris and Kerkides, 2003; Marshall et al., 2013). All statistical metrics were computed from comparisons performed at identical 30 min intervals between the two data sets for each day of comparison. In addition, these statistical parameters, where appropriate, were also computed for each site, providing a summary of the model predictions per experimental site on which the model was validated.
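Because Eqs. (2)-(7) are not reproduced in this text, the sketch below relies on commonly used textbook definitions of these statistics rather than on the authors' exact formulations. In particular, treating the scatter as the sample standard deviation of the differences P − O is an assumption, as are the function name `agreement_metrics` and the example values.

```python
import numpy as np


def agreement_metrics(predicted, observed):
    """Standard agreement statistics between predictions P and observations O.

    ASSUMPTION: textbook definitions; the paper's Eqs. (2)-(7) may differ
    in detail (for example in how the scatter is normalised).
    """
    p = np.asarray(predicted, dtype=float)
    o = np.asarray(observed, dtype=float)
    diff = p - o
    return {
        "MBE": diff.mean(),                       # bias
        "scatter": diff.std(ddof=1),              # spread of the differences
        "RMSD": np.sqrt(np.mean(diff ** 2)),
        "MAD": np.mean(np.abs(diff)),
        "R2": np.corrcoef(p, o)[0, 1] ** 2,       # linear-fit determination
        "Nash": 1.0 - np.sum(diff ** 2) / np.sum((o - o.mean()) ** 2),
    }


# Hypothetical 30 min flux values (W m-2) for a single comparison.
observed = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]
for name, value in agreement_metrics(predicted, observed).items():
    print(f"{name}: {value:.3f}")
```

The MBE and the Nash index answer different questions: here the errors cancel to give zero bias, while the RMSD and MAD still report a typical mismatch of 10 W m-2.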

Results
The main results from the comparisons between the SimSphere predictions and the corresponding in situ data for the different parameters evaluated in this study are summarised in Tables 3-8. In addition, Fig. 2 provides a graphical illustration of the agreement between the simulated values and in situ measurements per parameter for all sites together, and Fig. 3 illustrates the diurnal agreement between the modelled outputs and in situ-observed fluxes for a selected site and day. The detailed findings from the comparisons performed are presented next.

Incoming shortwave radiation (R g) at the surface
Simulation of R g was largely accurate, as shown by low RMSD (within ∼ 19 % of the observed fluxes) and MAE values (RMSD = 67.83 W m−2, MAE = 46.43 W m−2; Table 3, Fig. 2). A moderate underestimation of the observed fluxes was also evident (MBE = −19.48 W m−2). Notably, R g yielded the highest correlation of all parameters assessed (R2 = 0.971, Nash = 0.963), as further illustrated in Fig. 2, where the points within the feature space were predominantly centred on the 1 : 1 line, showing a strong relationship between both variables. Within the majority of sites, model simulations consistently underestimated the in situ measurements (MBE = −4.85 to −56.40 W m−2), with the US_MOZ deciduous forest site being the only exception (MBE = 16.47 W m−2). That is, the true change (in situ observations) for six of the seven sites tends to be larger than the model-based estimates. Intersite variability was minimal for the simulation of this parameter, with a difference of only ∼ 9 % between the minimum and maximum RMSD as a percentage of the observed fluxes on a per site basis.
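The text does not state how the RMSD "as a percentage of the observed fluxes" was normalised. A minimal sketch, assuming the RMSD is divided by the mean of the observed values, is given below together with the spread between the best and worst site; the function names and site labels are hypothetical.

```python
import numpy as np


def relative_rmsd(predicted, observed):
    """RMSD expressed as a percentage of the mean observed value.

    ASSUMPTION: normalisation by the observed mean; the study may have
    used a different reference (for example the observed range).
    """
    p = np.asarray(predicted, dtype=float)
    o = np.asarray(observed, dtype=float)
    rmsd = np.sqrt(np.mean((p - o) ** 2))
    return 100.0 * rmsd / o.mean()


def intersite_spread(per_site):
    """Return per-site relative RMSD and the max-minus-min spread (in %)."""
    values = {site: relative_rmsd(p, o) for site, (p, o) in per_site.items()}
    return values, max(values.values()) - min(values.values())


# Hypothetical data for two sites (W m-2).
obs = [100.0, 200.0, 300.0, 400.0]
sites = {
    "site_a": ([110.0, 190.0, 310.0, 390.0], obs),
    "site_b": ([120.0, 180.0, 320.0, 380.0], obs),
}
values, spread = intersite_spread(sites)
print(values, spread)
```

Normalising by the observed mean makes sites with very different flux magnitudes comparable, which is the sense in which a spread of a few percent between sites indicates low intersite variability.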
Agreement over the Australian sites generally increased for the period between February and June, with a significant decrease in accuracy from August to early February. For example, over the Calperum grazing pasture site, RMSD ranged from 24.14 to 53.78 W m−2 (or within ∼ 6 and ∼ 21 % of the observed fluxes) for all the test days located within the period from 24 February to 24 April 2011. In contrast, for the same site, RMSD varied from 84.41 to 149.29 W m−2 (or within ∼ 41 and ∼ 53 % of the observed fluxes) for all the test days for the period between 22 July and 29 December 2011. Similar trends were observed for all other Australian sites, although some anomalies were present. In relation to the US sites, the reverse was found: the highest simulation accuracies were predominantly derived for the test days during the period between October and late April. Clearly, the periods of highest simulation accuracy for both the Australian and US sites correspond to their respective summer season, and are thus consistent between the two countries. Generally, the results for the US sites suggested that the conditions prevalent within the wet season (October-May) may have had an influence on model accuracy.

Net radiation (R net) at the surface
Table 4 and Fig A much larger intersite variability was reported for the model simulation accuracies of the R net parameter, where RMSD ranged between 33.90 and 78.03 W m−2 (also reflected in the RMSD as a percentage of observed fluxes ranging between ∼ 16 and ∼ 30 % on a per site average basis), showing to some extent a deficiency in the capability of the model to capture the land surface process over varying land cover types. The R net results exhibited largely similar statistical agreement with those observed for the R g parameter. Most noticeably, in correspondence with the R g parameter results, SimSphere showed superior simulation accuracy within the Alice Springs mulga woodland site in comparison to the other land cover types, with the reported accuracies significantly above the overall average (RMSD = 33.90 W m−2, within ∼ 16 % of the observed fluxes, MAE = 26.25 W m−2). Moreover, the woody savannah site of Howard Springs again exhibited high simulation accuracies (RMSD = 47.05 W m−2, within ∼ 21 % of the observed fluxes, MAE = 35.74 W m−2), with comparable accuracies to the simulation of the R g parameter. Conversely, the model showed an inferior performance when simulating R net within the US_TON wooded savannah site, where a systematic and more pronounced underestimation of R net was evident (MBE = −46.10 W m−2). This constant underestimation by the model led to a poorer agreement between the model predictions and in situ observations for the US_TON site, as reflected in the statistical analysis (RMSD = 78.03 W m−2, within ∼ 30 % of the observed fluxes, MAE = 65.22 W m−2). It should be noted that the accuracy of the model estimations on a per site basis did not correlate between the R g and R net parameter estimations, with only the US_WHS shrubland site exhibiting weaker simulation accuracies for both parameters and, as indicated above, a relatively high simulation accuracy for the Howard Springs woody savannah site.
As indicated in Table 4, trends in simulation accuracy dependent on test day were apparent. Although comparable, the trends were not as prominent as those exhibited for the R g parameter. Within the Australian sites, low RMSD was exhibited predominantly for the test days within the period from March to July, although some discrepancies were present during specific days. For example, the 27 May simulation date for the Howard Springs site reported an RMSD of 70.60 W m−2 (within ∼ 38 % of the observed fluxes), indicating a day of unusually high error for this period. However, such anomalies were limited. Generally, for the US sites, highest RMSD was exhibited for the period concurrent to the wet season (October-April), with the highest error rates exhibited during the dry period, for example, during the 27 February simulation date for the US_TON site (RMSD = 113.80 W m−2, within ∼ 73 % of the observed fluxes), although, again, the anomalies in such trends were notable yet uncommon.

Latent heat (LE)
As presented in Table 5, the highest RMSD in relation to the observed fluxes was reported for the LE parameter in comparison to all other parameters evaluated (RMSD = 39.47 W m−2), where SimSphere showed some deficiencies when reproducing LE fluxes in varying land cover, both in terms of its seasonal and diurnal evolution. An average R2 value of 0.700 is also indicative of a poorer correlation between the predictions and observations of LE (Fig. 2). When averaged over all days and sites, the model-based estimates tended towards a conservative overestimation of the observed fluxes, indicated by an average MBE of 2.84 W m−2.
On a site by site basis, the US_IB1 cropland site consistently yielded the highest statistical agreement between model-predicted and observed values, with low error and high correlation results (RMSD = 52.54 W m−2, within 20 % of the observed fluxes, MAE = 15.16 W m−2, R² = 0.827, Nash = 0.945). Notably, all other sites exhibited poorer agreement, with RMSD values in relation to the observed fluxes above 30 % for six of the eight sites (RMSD varying within ∼34 and 83 % of the observed fluxes). The sites exhibited a significant range of MBE, from −11.49 W m−2 (US_WHS) to 25.65 W m−2 (US_MOZ), suggesting high variability in the partitioning of LE between ecosystems. Peak LE flux values exhibited high intersite variability, with the US_IB1 (cropland) and US_MOZ (deciduous broadleaf forest) sites showing the highest LE flux peaks of 458.5 and 376 W m−2, respectively. In comparison, a maximum LE flux peak of just 143.7 W m−2 was reported for the US_WHS (shrubland) site, giving a substantial range of 314.8 W m−2 between the lowest and highest site peak LE. Noticeably, trends in simulation accuracy dependent on test day were comparable to both the R g and R net parameter results, although with significantly higher intersite variability in RMSD ranges.

Sensible heat (H)
SimSphere showed a satisfactory ability to simulate H fluxes across the ecosystems for the 72 days included in this study, with average RMSD and R² values of 55.06 W m−2 (within ∼28 % of the observed fluxes) and 0.829, respectively. Results were largely similar to those of the LE flux simulation accuracies, although the model underperformed for the LE parameter compared with the H flux for the majority of statistical metrics computed herein.
When analysed on a site by site basis, average RMSD values ranged from 38.07 W m−2 (US_VAR, within ∼17 % of the observed fluxes) to 69.94 W m−2 (US_IB1, within ∼68 % of the observed fluxes), underlining that the greatest intersite variability was reported for this parameter. In addition, R² values ranged from 0.73 (US_IB1) to 0.94 (US_VAR). The latter suggested that model predictions were generally in good agreement with the in situ measurements, showing a strong relationship between both variables. The grassland site (US_VAR) consistently showed superior model performance in comparison to all other sites, with values indicating an excellent agreement with the observed diurnal evolution (RMSD = 38.07 W m−2, within ∼17 % of the observed fluxes, MAE = 28.35 W m−2). MSD values reported for US_VAR were 19.41 W m−2 lower than the all-site average, suggesting a systematically accurate representation of H fluxes at this site. MSDs for H flux were directly comparable to the overall average MSD values reported for R g and R net, yet significantly higher than those for the LE fluxes. Simulation accuracy was comparably high for the simulated H fluxes for five of the eight sites, with RMSD values in relation to the observed fluxes below 30 % (varying between ∼17 and 30 %). Notably, results for the US_IB1 site exhibited significant error, with RMSD and MSD values of 69.94 W m−2 (within ∼68 % of the observed fluxes) and 67.73 W m−2, respectively.
For the Australian sites no significant trends dependent on simulation day were evident, with generally comparable accuracy ranges for the specific test days, alongside occasional anomalous days which exhibited significantly higher error ranges. For example, the Howard Springs woody savannah site showed an RMSD ranging between 28.29 and 50.31 W m−2 (within ∼15 and ∼21 % of the observed fluxes) for the majority of simulation days, with the 18 April and 13 May experimental days exhibiting RMSDs of 75.86 and 96.93 W m−2 (within ∼52 and ∼65 %), respectively. Similar intra-site variability was notable for the US sites.

Air temperature 1.3 m (T air 1.3 m)
SimSphere showed a high capability for simulating T air 1.3 m, with an average RMSD as low as 3.23 °C (within ∼15 % of the observed) and a relatively high R² value of 0.843 (see Table 7). Furthermore, T air 1.3 m exhibited neither a consistent over- nor underestimation, with an overall average MBE of 0.28 °C. Simulation accuracy for T air 1.3 m was relatively stable, with a low range of RMSD values reported over all sites. RMSD values ranged from 2.17 °C (within ∼9 % of the observed) in the woodland savannah site of Howard Springs to 4.74 °C (within ∼25 % of the observed) in the grazing pasture site of Calperum. Overall, agreement between the predictions and observations was greatest for the Howard Springs site, with results confirming a high overall correlation to the observed diurnal evolution of T air 1.3 m. The deciduous broadleaf site of US_MOZ also exhibited a comparably high simulation accuracy (RMSD = 2.38 °C, within ∼11 % of the observed, MAE = 1.84 °C, Nash = 0.853). The Calperum site exhibited the weakest agreement for T air 1.3 m, with an average RMSD 1.51 °C higher than the all-site average. The R² analysis further supported the model's ability to accurately simulate air temperature, with a range of values indicating high correlation between model-predicted and observed T air 1.3 m (0.74-0.93). MSD displayed a wide range of values (2.1-3.76 °C), showing to some extent the inability of the model to consistently predict T air 1.3 m with a high level of precision. The trends in simulation accuracy dependent on test day were again insignificant for the T air 1.3 m parameter, exhibiting similar patterns to those indicated for the H flux parameter.

Air temperature 50 m (T air 50 m)
The model showed a slightly inferior performance in predicting T air 50 m (RMSD = 3.77 °C, within ∼18 % of the observed) when compared to T air 1.3 m, with an average RMSD difference of 0.54 °C (∼3 % difference in relation to the observed) (Table 8, Fig. 2). A lower average R² value of 0.775 is reported compared to that of T air 1.3 m (R² = 0.843), indicating a weaker, yet close, agreement between both variables. However, the values reported still showed a highly acceptable correlation between the modelled estimates and the in situ measurements, as indicated by an average Nash value of 0.825. Once averaged, T air 50 m exhibited a minor underestimation of −0.38 °C; however, the range of MBEs reported between sites was significantly smaller (2.1 °C), suggesting a more consistent simulation of T air at 50 m than at 1.3 m by SimSphere. In contrast, the simulated T air 50 m resulted in a higher MSD than that reported for the T air 1.3 m parameter, with the exception of the Howard Springs site. When analysed on a per site basis, in correspondence with the T air 1.3 m parameter, the Howard Springs and US_MOZ sites exhibited the highest simulation accuracy (RMSD = 2.04 and 2.85 °C, within ∼8 and ∼13 % of the observed, respectively). Moreover, the weakest agreement was again reported over the Calperum site, in correspondence with the results of the T air 1.3 m parameter. No systematic trends dependent on test day were apparent in the simulation accuracy.
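Because both R² and the Nash index are quoted for each parameter, a short sketch of how the two can differ may be useful. The formulations below are assumptions (squared Pearson correlation, and the standard Nash-Sutcliffe efficiency); this section does not state which formulation was used, and the temperature values are hypothetical.

```python
def r_squared(pred, obs):
    """Squared Pearson correlation between predictions and observations."""
    n = len(obs)
    mean_p = sum(pred) / n
    mean_o = sum(obs) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(pred, obs))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_o = sum((o - mean_o) ** 2 for o in obs)
    return cov * cov / (var_p * var_o)

def nash_sutcliffe(pred, obs):
    """Nash-Sutcliffe efficiency: 1 minus error variance over observed variance."""
    mean_o = sum(obs) / len(obs)
    num = sum((o - p) ** 2 for p, o in zip(pred, obs))
    den = sum((o - mean_o) ** 2 for o in obs)
    return 1.0 - num / den

# Hypothetical diurnal air temperatures (deg C); the "model" is 2 deg C too warm.
observed = [10.0, 15.0, 20.0, 25.0, 20.0, 15.0]
predicted = [o + 2.0 for o in observed]

print(r_squared(predicted, observed))       # correlation ignores the constant bias
print(nash_sutcliffe(predicted, observed))  # efficiency is reduced by the bias
```

In this example the correlation is perfect because the diurnal shape is reproduced exactly, while the efficiency drops below 1 because of the constant offset. Under these assumed definitions the two statistics therefore respond differently to systematic bias.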

Discussion
The present study evaluated the ability of the SimSphere SVAT model to accurately represent key parameters characterising land surface interactions within eight ecosystems on two continents. A total of 72 days (9 days per site for each of the eight sites selected) from the year 2011 were selected from Australia and the USA to validate the model's ability to predict R g, R net, LE, H, and T air at heights of 1.3 and 50 m.
Variable model performance was clearly evident when simulating both the LE and H fluxes within contrasting land cover types. For example, as discussed, the highest simulation accuracy was attained within the grassland study sites. In contrast, simulation accuracy within forested ecosystems was less satisfactory. The deciduous forest stand (US_MOZ), with an average canopy height of 24.2 m, attained notably low simulation accuracy and was also outperformed by the mulga forested ecosystem (Alice Springs), characterised by a sparse canopy at a height of 6.5 m. Such results suggest that the increased complexity and heterogeneity of forested environments, particularly those with understory vegetation, can have profound effects on the overall exchange of mass and energy which cannot be represented within the model's parameterisation and hence can influence LE and H outputs. The partitioning of LE and H fluxes is also highly susceptible to a number of other factors. Small changes in the moisture availability, particularly from the deep layer soil water content (SWC), can have a strong influence (Carlson and Lynn, 1991), as can the representativeness of the radiosonde data for the existing local conditions (Taconet et al., 1986). As reported by Taconet et al. (1986), an error of just ∼2 °C in the sounding profile temperature can cause a variation of ∼45 W m−2 in the corresponding fluxes, particularly for H flux. SimSphere was forced with surface and root zone moisture availability taken directly from the in situ data sets. These highly influential parameters may have been misrepresented within the model's parameterisation, providing a possible reason, in part, for the lower simulation accuracies attained.
R g was estimated by the model to a high level of accuracy (error within ∼19 % of the observed fluxes), where an R² value of 0.971 and a Nash value of 0.960 reported for all days of analysis suggest that model predictions had excellent correlation with the observed data set. This indicates that SimSphere was able to simulate the trend of R g well. A possible reason for the underestimation of R g by the model is linked to the solar transmission model and/or the surface albedo calculation in the model, as has also been pointed out previously by Todhunter and Terjung (1987). Furthermore, previous sensitivity analysis studies undertaken on the model confirm that R g is significantly influenced by a site's aspect (Petropoulos et al., 2014). Therefore, simulation accuracy may partly be related to the model's representation of a site's topographical characteristics.
In the majority of the experimental sites a general underestimation of R net was obtained by the model, with mean RMSD and R² values of 58.69 W m−2 and 0.960, respectively. These results are comparable to those reported in other analogous validation studies (Carlson and Boland, 1978; Todhunter and Terjung, 1987; Ross and Oke, 1988). Todhunter and Terjung (1987) compared predicted R net from the model with corresponding R net values obtained from the literature for Los Angeles, USA, and showed both daytime and night-time simulations to be within the range reported in the literature. Ross and Oke (1988) also confirmed the capability of the model for simulating the day-to-day variation of R net using 18 cloud-free days over an urban area of Vancouver, B.C., Canada. Ross and Oke (1988) reported an overall average RMSD of 43 W m−2 for all cloud-free days, a minor improvement on the RMSD of 58.69 W m−2 presented herein. Disparity between the results of this study and those studies could be the result of running model simulations over dissimilar land cover types, as it is largely accepted that R net partitioning into LE and H fluxes is highly dependent on the vegetation and surface characteristics of the site (Olioso et al., 2000). Previous sensitivity analysis studies undertaken on SimSphere further confirm this observation (Petropoulos et al., 2014). Similarly to R g, simulation accuracy of R net was described by Ross and Oke (1988) to be a function of long-wave radiation, mainly the values of atmospheric and surface emissivities (which affect the surface temperature estimation). Improved representation of the surface optical properties and long-wave radiation estimation in the model could greatly enhance simulation accuracy.
Overall, simulation accuracies were lower for estimates of T air 50 m compared to estimates of T air 1.3 m in all but one site, Howard Springs. One possible explanation for this may be the fundamental problem that model estimates of T air 50 m could only be validated against ancillary air temperature data obtained directly from the site's flux tower; thus, direct comparison specifically at 50 m could not be achieved. Similarly to the LE and H fluxes, variable simulation accuracies dependent on land cover types were also evident. Three sites (Calperum, US_VAR and US_IB1) exhibited noticeably weaker simulation accuracies in comparison to the remaining sites. Upon further investigation, all three sites are ecosystems characterised by high interannual variability in vegetation phenology, for example in vegetation height, leaf width and FVC. Modelled T air peaked between 10:30 and 14:30 LT. Where a time lag between the predicted and observed T air is observed, such effects may be linked with the energy storage in the vegetation and the air, as it is not taken into account in the SimSphere simulations. This may partly explain some of the inaccuracies reported for T air estimation in Alice Springs and US_MOZ, as this effect is more important for forested sites. Carlson and Boland (1978) and Carlson et al. (1991) also described a similar hysteresis effect in comparisons which they performed for different vegetation canopies and environmental conditions (urban and rural environments). Carlson and Boland (1978) suggested thermal inertia to be related proportionally to an increase in the time lag between solar noon and the time of maximum H flux and T s, whereas Carlson et al. (1991) admitted that they were unable to practically explain this "hysteresis" trend. Through comprehensive sensitivity analysis studies (Petropoulos et al., 2009b, 2013a, 2014), parameters closely associated with vegetation phenology have previously been shown to have a highly influential control on air temperature magnitude and extent. Conversely, sites which show relatively stable vegetation phenology, such as US_TON (wooded savannah), exhibited more accurate temperature estimates. Furthermore, the air temperature of the site covered by the dead forest had greater daily fluctuation compared to the stands covered by mature forest, which generally had the smallest daily fluctuations. However, more studies are required in this direction to distinguish dead forest from mature forest, which is currently not possible in the given land cover database. Improved land cover information can provide more insights into the model performance during validation. As SimSphere assumes a homogeneous canopy layer, some discrepancies may occur in the air temperature simulation, which is also the case here. Furthermore, a very important point to consider in the overall interpretation of the results is that the model does not account for advective conditions, which might be important, for instance, when strong winds exist. Yet, generally, air temperatures at 1.3 and 50 m were well represented by the model, with the results obtained showing a significant improvement on values reported in previous validation attempts (Carlson and Boland, 1978; Carlson et al., 1991).
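The time lag between predicted and observed diurnal peaks discussed above can be quantified in a simple way. The following is a minimal sketch rather than the procedure used in this study: the function name, the 30 min time step and the temperature values are all hypothetical, and the peak is taken as the first occurrence of the daily maximum.

```python
def peak_lag_minutes(times_min, pred, obs):
    """Time of the predicted daily maximum minus that of the observed one.

    times_min holds local time in minutes after midnight for each sample.
    A negative result means the model peaks earlier than the observations.
    """
    i_pred = max(range(len(pred)), key=pred.__getitem__)  # first index of max
    i_obs = max(range(len(obs)), key=obs.__getitem__)
    return times_min[i_pred] - times_min[i_obs]

# Hypothetical air temperatures (deg C) from 10:00 to 15:00 LT at 30 min steps.
times = list(range(600, 901, 30))
observed = [20.0, 21.0, 22.0, 23.0, 24.0, 25.0, 26.0, 27.0, 26.0, 25.0, 24.0]
predicted = [21.0, 22.0, 23.0, 24.0, 25.0, 26.0, 25.5, 25.0, 24.5, 24.0, 23.0]

lag = peak_lag_minutes(times, predicted, observed)
print(lag)  # -60: the modelled peak (12:30) precedes the observed peak (13:30)
```

A lag of this kind, with the observations peaking after the model, is the direction that would be expected if heat storage in the canopy and air, which the model does not represent, delays the observed maximum.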
All in all, SimSphere demonstrated a high capability for simulating parameters associated with Earth's energy balance. It is also apparent that the model fulfils three of the Kramer et al. (2002) model assessment criteria, namely accuracy, generality and realism (see also Sect. 1). With regard to accuracy, no significant systematic prediction errors occurred in any of the fluxes analysed, with the exception of a consistent underestimation of R g and R net. Additionally, simulated peak heat and water flux values were in close accordance with the in situ data, typically at 12:30-13:30 LT, with a slight lag for LE and H fluxes (13:00-14:00 LT). In terms of generality, the model performed well, with acceptable simulation accuracies attained in the majority of sites validated. To assess the model's generality more comprehensively, more forested environments should be included, particularly heterogeneous forest stands, where simulation accuracy tends to be lower. Finally, realism in the model was most notable in the simulation of LE, H and T air, where a slight change in the vegetation phenology or SWC was accountable for characterising the diurnal evolution of fluxes in all sites validated.
This study can advance our understanding of SimSphere's capability to simulate the interactions between different components of our Earth system and related land surface processes. As no model is perfect, some discrepancies between predictions and measurements will always appear. Identification of these discrepancies is most interesting, because they can teach us more about the causes of model uncertainties in the prediction of hydro-meteorological variables and help us to improve the model structure and performance. Some large discrepancies between the simulated and observed data sets could be due to model parameterisation. Apart from environmental factors, some instrumentation errors in the tower flux measurements, indicated by the presence of many spikes (too large or too small values), can also affect the accuracy, even if model-simulated results are in agreement with actual conditions. Another possible reason is the presence of spikes in the fluxes, observed particularly on the days of low agreement, which could occur because of horizontal advection, footprint changes and non-stationarity of turbulent regimes (Papale et al., 2006). Unfortunately, such conditions cannot be captured and replicated by SimSphere.
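Spikes of the kind mentioned above are commonly screened with a robust outlier test before model-data comparison. The sketch below is a simplified illustration of a median absolute deviation (MAD) test on the double-differenced series, in the spirit of the despiking described by Papale et al. (2006). It is not the processing applied to the data in this study. The threshold z = 4, the omission of separate day and night windows, and the flux values are assumptions.

```python
from statistics import median

def flag_spikes(x, z=4.0):
    """Return indices of samples whose double difference is a MAD outlier.

    d_i = (x_i - x_{i-1}) - (x_{i+1} - x_i); a sample is flagged when d_i lies
    outside median(d) +/- z * MAD / 0.6745. Neighbours of a spike may also be
    flagged because they share differences with it.
    """
    if len(x) < 3:
        return []
    d = [(x[i] - x[i - 1]) - (x[i + 1] - x[i]) for i in range(1, len(x) - 1)]
    md = median(d)
    mad = median(abs(v - md) for v in d)
    half_width = z * mad / 0.6745
    return [i + 1 for i, v in enumerate(d)
            if v < md - half_width or v > md + half_width]

# Hypothetical half-hourly flux values (W m-2) with one spike at index 5.
flux = [0.0, 12.0, 30.0, 55.0, 80.0, 300.0, 110.0, 100.0, 85.0, 60.0, 30.0]
print(flag_spikes(flux))
```

Days on which many samples are flagged could then be treated with caution when interpreting low model-observation agreement.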
Overall, it is important to recognise that uncertainty is inevitable in any model and that a model will never be as complex as the reality it portrays. In this respect the model fulfils its objective as a tool that identifies the patterns of change expected, if not always the magnitudes, indicating its usefulness in practical applications either as a stand-alone tool or in combination with remote sensing data, as done for instance through the implementation of the "triangle" technique.

Conclusions
Geosci. Model Dev., 8, 3257-3284, 2015, www.geosci-model-dev.net/8/3257/2015/
This study evaluated the ability of the SimSphere land biosphere model to predict a number of parameters characterising land surface interactions for eight sites from the global terrestrial monitoring network, FLUXNET. A rigorous comparison was performed for 72 selected days from the year 2011. The main findings of this study are summarised as follows.
Overall, SimSphere estimates of instantaneous energy fluxes and air temperature showed good agreement in all ecosystems evaluated, apart from a minor underestimation of R g and R net (MBE = −19.48 and −16.49 W m−2, respectively). Some ecosystems exhibited poorer simulation accuracies than others, most noticeably cropland (US_IB1) and grazing pasture (Calperum), whilst the woodland savannah (Howard Springs) and mulga woodland (Alice Springs) ecosystems both attained the highest overall simulation accuracies. Comparisons showed a good agreement between modelled and measured fluxes, especially for the days with smoothed daily flux trends. Very high values of the Nash-Sutcliffe efficiency index were also reported for all parameters, ranging from 0.720 to 0.998, suggesting, overall, a very good model representation of the observations. The highest simulation accuracies were obtained for the open woodland savannah and mulga woodland sites for most of the compared parameters.
The process of validating any physical model is imperative to understanding its representation of real-world scenarios. It helps to identify any deficiencies in the model's predictive ability and any possible sources of error and uncertainty associated with the model. To our knowledge, very few studies, if any, have focused specifically on validating SimSphere across numerous ecosystems in the USA and Australia. On this basis, given the use of this model either as a stand-alone research or educational tool, or in synergy with EO data, its validation is not only timely but essential. SimSphere, despite its inherent architectural limitations, can be applied in future to various theoretical and applied tasks. There is certainly room to develop the model further in terms of its representation of the various physical processes characterising land surface interactions. This is a promising research direction on which model development efforts should be focused in future.

Figure 1. Flowchart of the overall methodology followed in evaluating SimSphere's outputs.
Table 4 and Fig. 2 indicate a high overall performance in the model's ability to accurately predict R net, confirmed by the high simulation accuracy (RMSD = 58.69 W m−2, within ∼24 % of the observed fluxes, MAE = 46.42 W m−2) reported for all sites. Furthermore, comparisons of R net for all days of simulation showed a low average MSD of 54.44 W m−2, indicating the model's capability to precisely represent the amplitude of the R net flux, with low dispersion from the in situ trends, as evidenced in Fig. 2. MBE results indicated a moderate underestimation of the in situ measurements by the model (−16.49 W m−2), with seven of the eight site averages showing an underestimation of the in situ trends (negative MBE values in a range of −0.09 to −46.10 W m−2).

Table 1. Summary of the main SimSphere inputs. The units of each of the model inputs are given in parentheses where applicable.

Table 2. Description of selected experimental sites used for validating SimSphere.

Table 3. Daily simulation accuracy and average site simulation accuracy for R g fluxes. Bias, scatter, RMSD and MAE are expressed in watts per square metre (W m−2). Nash index is unitless.

Table 4. Daily simulation accuracy and average site simulation accuracy for R net fluxes. Bias, scatter, RMSD and MAE are expressed in W m−2. Nash index is unitless.

Table 5. Daily simulation accuracy and average site simulation accuracy for LE fluxes. Bias, scatter, RMSD and MAE are expressed in W m−2. Nash index is unitless.

Table 6. Daily simulation accuracy and average site simulation accuracy for H fluxes. Bias, scatter, RMSD and MAE are expressed in W m−2. Nash index is unitless.

Table 7. Daily simulation accuracy and average site simulation accuracy for T air 1.3 m. Bias, scatter, RMSD and MAE are expressed in degrees Celsius. Nash index is unitless.

Table 8. Daily simulation accuracy and average site simulation accuracy for T air 50 m. Bias, scatter, RMSD and MAE are expressed in degrees Celsius. Nash index is unitless.