The SECHIBA module of the ORCHIDEE land surface model describes the exchanges of water and energy between the surface and the atmosphere. In the present paper, the adjoint semi-generator software called YAO was used as a framework to implement a 4D-VAR assimilation scheme of observations in SECHIBA. The objective was to deliver the adjoint model of SECHIBA (SECHIBA-YAO) obtained with YAO to provide an opportunity for scientists and end users to perform their own assimilation. SECHIBA-YAO allows the control of the 11 most influential internal parameters of the soil water content, by observing the land surface temperature or remote sensing data such as the brightness temperature. The paper presents the fundamental principles of the 4D-VAR assimilation, the semi-generator software YAO and a large number of experiments showing the accuracy of the adjoint code in different conditions (sites, PFTs, seasons). In addition, a distributed version is available in the case for which only the land surface temperature is observed.

Land surface models (LSMs) simulate the interactions between the atmosphere and the land surface, which directly influence the exchange of water, energy and carbon with the atmosphere. They are important tools for understanding the main interaction and feedback processes, simulating the present climate and predicting future climate evolution (Harrison et al., 2009). Such predictions are subject to considerable uncertainties, which stem from the difficulty of modeling highly complex physics with a limited set of equations that does not account for all the interacting processes (Pipunic et al., 2008; Ghent et al., 2011). Understanding these uncertainties is important in order to obtain more realistic simulations.

A key challenge with a dynamical model is to adjust its output using an appropriate source of information. One such source is given by measurements (or, more generally, observations) that contribute to the understanding of the system evolution (Lahoz et al., 2010). Data assimilation merges these observations with the dynamical model in order to obtain a more accurate estimate of the current and future states of the system, given the uncertainties of the model and of the observations. Two basic methodologies can be used for that purpose: the sequential approach (Evensen, 2003), based on the statistical estimation theory of the Kalman filter, and the variational approach, the so-called 4D-VAR (Le Dimet et al., 1986), built on optimal control theory (Robert et al., 2007). It can be proven that both approaches provide the same solution at the end of the assimilation period for Gaussian errors (not correlated in time) and linear models. This property does not hold if the processes under study are nonlinear. The main advantage of 4D-VAR is that it integrates the model in time while assimilating the observations, yielding a global model trajectory optimized over the whole assimilation window.

Variational data assimilation has been widely used in land surface applications. The assimilation of land surface temperature (LST) is suitable for an extensive range of environmental problems. As mentioned in Ridler et al. (2012), LST is an excellent candidate for model optimization since it is a solution of the coupled energy and water budgets, and permits one to constrain parameters related to evapotranspiration and indirectly to soil water content.

Castelli et al. (1999) presented a variational data assimilation approach based on adjoint techniques, which includes the surface energy balance in the estimation procedure as a physical constraint. The authors worked with satellite data and directly assimilated soil skin temperatures. They concluded that constraining the model with such observations improves model flux estimates with respect to available measurements. Huang et al. (2003) developed a one-dimensional land data assimilation scheme based on an ensemble Kalman filter, used to improve the estimation of the land surface temperature profile. They demonstrated that the assimilation of LST into land surface models is a practical and effective way to improve the estimation of land surface state variables and fluxes.

Reichle et al. (2010) performed the assimilation of satellite-derived skin temperature observations using an ensemble-based, offline land data assimilation system. Results suggest that the retrieved fluxes provide modest but statistically significant improvements. However, these authors noted strong biases between LST estimates from in situ observations, land modeling, and satellite retrievals that vary with season and time of the day. They highlighted the importance of taking these biases into account; otherwise, large errors in surface flux estimates can result.

Ghent et al. (2011) investigated the impacts of data assimilation on terrestrial feedbacks of the climate system. Assimilation of LST helped to constrain simulations of soil moisture and surface heat fluxes. Ridler et al. (2012) tested the effectiveness of using satellite estimates of radiometric surface temperatures and surface soil moisture to calibrate a soil–vegetation–atmosphere transfer (SVAT) model, based on error minimization of temperature and soil moisture model outputs. Flux simulations were improved when the model was calibrated against in situ surface temperature and surface soil moisture versus satellite estimates of the same fluxes.

Bateni et al. (2013) employed the full heat diffusion equation to perform a variational data assimilation. Deviation terms of the evaporation fraction and a scale coefficient were added as penalization terms in the cost function. A weak constraint was applied to the data assimilation, thereby accounting for model errors. The cost function associated with this experiment contains a term that penalizes the deviation from prior values. When assimilating LST into the model, the authors showed that the heat diffusion coefficients are strongly sensitive. Taken together, these studies indicate that the assimilation of LST can improve the fluxes simulated by the model.

In the present study, we focused on the SECHIBA module (Ducoudré et al., 1993), which is the part of the ORCHIDEE land surface model dedicated to the resolution of the surface energy and water budgets. Our objective was to test the ability of 4D-VAR to estimate a set of its inner parameters. Dedicated software (called SECHIBA-YAO) was developed using the adjoint semi-generator YAO, developed at LOCEAN-IPSL (Nardi et al., 2009). YAO serves as a framework to design and implement dynamical models and helps to generate the adjoint of the model, which permits one to compute the model gradients. SECHIBA-YAO provides an opportunity to control the most influential internal parameters of SECHIBA by assimilating LST (land surface temperature) observations. Twin assimilation experiments were run at given locations and for specific soil and climate conditions. These twin experiments, conducted on actual sites, were used to demonstrate the accuracy and usefulness of the code and the potential of 4D-VAR when dealing with LST assimilation.

This paper is structured as follows. In Sect. 2, the model and data used to illustrate the capabilities of SECHIBA-YAO are detailed. In Sect. 3, the fundamentals of variational data assimilation are presented, together with the principles of YAO, its associated modular graph formalism and the computation of the adjoint with YAO. The implementation of SECHIBA-YAO and the details of the experiments that demonstrate the efficiency of the 4D-VAR assimilation are also given in Sect. 3. Sensitivity experiments and simple twin experiments at two FLUXNET locations are presented in Sect. 4. These experiments illustrate the convenience of YAO for optimizing control parameters. Section 5 consists of a discussion and a conclusion. Finally, the specificities of the distributed software are given in Sect. 6.

ORCHIDEE is a land surface model developed at the Institut Pierre Simon Laplace (IPSL) in France. ORCHIDEE is a mechanistic dynamic global vegetation model (Krinner et al., 2005) representing the continental biosphere and its different biophysical processes. It is part of the IPSL earth system model (Dufresne et al., 2013) and is composed of three modules: SECHIBA, STOMATE and LPJ. The version used in this work corresponds to version 1.2.6, released on 22 April 2010. SECHIBA computes the water and energy budgets at the biosphere–atmosphere interface, as well as the gross primary production (GPP). STOMATE (Friedlingstein et al., 1999) is a biogeochemical model which represents the processes related to the carbon cycle, such as carbon dynamics, the allocation of photosynthesis respiration and growth maintenance, heterotrophic respiration and phenology. Finally, LPJ (Sitch et al., 2003) models the global dynamics of the vegetation, interspecific competition for sunlight as well as fire occurrence. ORCHIDEE has different timescales: 30 min for energy and matter, 1 day for carbon processes and 1 year for species competition processes. The full description of ORCHIDEE can be found in Ducoudré et al. (1993), Krinner et al. (2005), d'Orgeval et al. (2006), and Kuppel et al. (2012). In the present study, ORCHIDEE is used in a grid-point mode (at a given location), forced by the corresponding local half-hourly gap-filled meteorological measurements obtained at the flux towers. Only the SECHIBA module is considered.

In SECHIBA, the land surface is represented as a whole system composed of
various fractions of vegetation types called PFTs (plant functional types). A
single energy budget is performed at each grid point, but the water budget is
calculated for each PFT fraction. The resulting energy and water fluxes
between atmosphere, ground and the retrieved temperature represent the canopy
ensemble and the soil surface. The main fluxes modeled are the net radiation
(

In the full version of SECHIBA-YAO, observations of LST or brightness
temperature can be used to constrain model inner parameter or initial
conditions of the model variables. However, the simulated LST is hemispheric
and does not account for solar configuration and viewing angle effects. In
order to compute a thermal infrared brightness temperature from LST, and
neglecting the directional effects, the total energy emitted by the surface
(Rad) can be computed using the following expression:

Measurement towers sprang up around the world, grouped into regional
networks. The data from all networks are accessible to the scientific
community via the FLUXNET website (

Located in South Africa at

Located in the United States of America, on land owned by Harvard University,
the station is located at

Variational assimilation (4D-VAR) (Le Dimet et al., 1986) considers a
physical phenomenon described in space and its time evolution. It thus
requires the knowledge of a direct dynamical model

The basic idea is to determine the minimum of a cost function

Formalism and notations for variational data assimilation are taken from Ide
et al. (1997).

The objective of this work is to show the capacity of 4D-VAR to help
determine the value of the principal inner parameters

The minimization of the cost function (Eq. 4) is based on gradient-descent
approaches. The cost function gradient has the form

The expression above allows us to compute

The control parameters are adjusted iteratively using an L-BFGS method (Gilbert and LeMaréchal, 1989) until a stopping criterion is reached.
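As a minimal sketch of this gradient-based minimization, the toy example below estimates a single decay parameter of a scalar model from synthetic observations. It is not the SECHIBA-YAO code: the model, the names (`forward`, `cost_and_gradient`, `estimate_parameter`) and all numerical values are hypothetical, the background term of the cost function is omitted, and plain fixed-step gradient descent is used in place of the L-BFGS method. The gradient is obtained with a backward (adjoint) sweep over the stored trajectory.

```python
# Toy model (assumption): x_{k+1} = (1 - p * dt) * x_k, with p the control parameter.
DT = 0.1
N_STEPS = 50
P_TRUE = 0.5


def forward(p, x0, n_steps, dt):
    """Integrate the toy model and return the full trajectory x_0..x_n."""
    xs = [x0]
    for _ in range(n_steps):
        xs.append(xs[-1] * (1.0 - p * dt))
    return xs


def cost_and_gradient(p, x0, obs, dt):
    """Return J(p) = 0.5 * sum_k (x_k - y_k)^2 and dJ/dp from an adjoint sweep."""
    n = len(obs)
    xs = forward(p, x0, n, dt)
    j = 0.5 * sum((xs[k] - obs[k - 1]) ** 2 for k in range(1, n + 1))
    lam = 0.0   # adjoint variable of the state
    grad = 0.0
    for k in range(n, 0, -1):
        # Misfit forcing at step k plus the adjoint propagated back from step k + 1.
        lam = (xs[k] - obs[k - 1]) + (1.0 - p * dt) * lam
        # Explicit dependence of x_k on p is -dt * x_{k-1}.
        grad += lam * (-dt * xs[k - 1])
    return j, grad


def estimate_parameter(p_first_guess, x0, obs, dt, step=0.004, n_iter=2000):
    """Fixed-step gradient descent (a simple stand-in for L-BFGS)."""
    p = p_first_guess
    for _ in range(n_iter):
        _, grad = cost_and_gradient(p, x0, obs, dt)
        p -= step * grad
    return p


# Synthetic observations produced with the "true" parameter.
OBS = forward(P_TRUE, 1.0, N_STEPS, DT)[1:]
p_est = estimate_parameter(0.2, 1.0, OBS, DT)
```

One backward sweep yields the derivative of the cost with respect to the control parameter whatever the length of the assimilation window, which is the property exploited by 4D-VAR.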

Variational data assimilation requires the computation of the adjoint code
of the direct model, which is a heavy and complex task, especially for a
large model such as SECHIBA. Usually, the adjoint code is computed with the
help of dedicated software tools (automatic differentiators; e.g., Bischof et
al., 1997; Giering and Kaminski, 2003; Hascoët and Pascual, 2004). These
tools are appropriate for the differentiation of large codes, but their
use is optimal only under specific coding conventions and with a good level
of modularity of the codes (Talagrand, 1991). Moreover, manual optimization
of the produced code is often necessary. Therefore, in many practical cases
the automatic production of code will not be totally optimal in terms of
flexibility (e.g., when the direct model is updated frequently, one has to
re-differentiate the whole code). These considerations motivated the
development of a slightly different but complementary approach that focuses
on the high-level structure of the numerical models, embedding
implementation details inside simple entities that can be easily updated.
This has led to the development of the YAO assimilation software at
LOCEAN/IPSL (

YAO is based on the decomposition of a numerical model into elementary
modules interconnected by directional links. On the one hand, the structure
of the model (variables, dependencies…) is described as a graph
structure. On the other hand, the details of the physics are coded inside
C/C

(Left) Example of a modular graph associated with four basic functions and five basic connections, three input points and three output points; (right) simplified description showing the acyclicity of the graph. Source: Nardi et al. (2009).

In YAO, a numerical model must be described as an ensemble of modules related
by connections in order to form a graph. Let us define more precisely the
main components of the graph.

A module is a basic entity of computation, representing a deterministic (but possibly nonlinear) function transforming an input vector into an output vector. A module is viewed graphically as a node of the graph; the sizes of the vectors correspond to the number of input and output connections associated with the node.

A basic connection is an oriented link relating two nodes of the graph. Most basic connections usually represent the transmission of the output of one module taken as input by another one.

The modular graph is the ensemble of the modules and of their connections.
It must be acyclic so that a topological order may be defined on the nodes
of the graph (i.e., if there is connection

Typically, a modular graph describes the equations governing the system of
interest and each physical variable appearing in the governing equations is
associated with a specific module. However, supplementary modules can also be
defined to represent temporary variables required to simplify computations
for complex equations. The user has generally to specify modules at a single
point (

By passing the different modules in topological order, YAO is able to emulate the global model and to calculate the global model outputs given model initial conditions and parameters.
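The forward pass over a modular graph can be illustrated with a short sketch. The four modules below and their functions are purely hypothetical and are not taken from SECHIBA or from the YAO description files; the sketch only shows how a topological order lets each module be evaluated once all of its inputs are available. It relies on `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical modules: name -> (input names, function of those inputs).
# Inputs that are not module names are external input points (x1, x2, x3).
MODULES = {
    "m1": (["x1", "x2"], lambda x1, x2: x1 * x2),
    "m2": (["x2", "x3"], lambda x2, x3: x2 + x3),
    "m3": (["m1", "m2"], lambda a, b: a - b),
    "m4": (["m2"], lambda b: b ** 2),
}


def run_forward(modules, external_inputs):
    """Evaluate every module once, following a topological order of the graph."""
    # Keep only module-to-module dependencies; a cycle raises graphlib.CycleError.
    deps = {
        name: [i for i in inputs if i in modules]
        for name, (inputs, _) in modules.items()
    }
    order = list(TopologicalSorter(deps).static_order())
    values = dict(external_inputs)
    for name in order:
        inputs, func = modules[name]
        values[name] = func(*(values[i] for i in inputs))
    return order, values


order, values = run_forward(MODULES, {"x1": 2.0, "x2": 3.0, "x3": 4.0})
```

With these inputs, `m3` and `m4` play the role of output points; they can only be computed after `m1` and `m2`, which is what the acyclicity requirement guarantees.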

Now, we will see that the usefulness of the graph modular approach is
reinforced when the Jacobian matrix of each basic function is known. For a
basic function

The “lin-forward” algorithm is the following.

Initialize the external context data input points with a perturbation
d

Pass the modules in topological order and propagate the perturbation.

Estimate the perturbation output d

An implementation of a variational assimilation procedure with YAO follows the structure represented in Fig. 3. The YAO compiler builds an executable file that is independent of the assimilation instructions; the executable reads these instructions from an instruction file. Thanks to the graph structure, the model and its adjoint are easy to modify: after updating the relevant modules, one systematically obtains the updated global direct model and the updated global adjoint.

Structure of a project in YAO. The software generates an executable program from input modules, hat and description files. The generated program reads an instruction file to perform assimilation experiments.

As mentioned in the Introduction, this paper gives access to a compiled version of SECHIBA-YAO and allows one to perform some assimilation experiments related to the control of the 10 most influential internal parameters of SECHIBA by observing the land surface temperature. YAO is free software, which makes it possible to modify the SECHIBA code provided with this paper.

The implementation of SECHIBA in YAO starts with the definition of the
modular graph describing the dynamics of the model (see Appendix A).
Elementary processes and interconnections between modules are defined in
order to represent the computation flow in the model. The modular graph was
built as follows.

Every component of the original code was carefully studied line by line.

A list of inputs and outputs was made for every subroutine of SECHIBA. This makes the information flow in the model explicit.

A second, closer pass through the subroutines was made in order to understand the internal dynamics of the code. This is the last step in the modular graph definition. When studying the subroutines, their complexity was reduced by breaking the different steps into simpler elements. The idea is to obtain a scalable code: uncoupled modules give more independence when changing part of the model, and cohesive modules make the model easier to understand.

The original six subroutines in the SECHIBA-Fortran code are split into 130 modules by the SECHIBA-YAO modular graph, corresponding to every process modeled by SECHIBA and to a number of transitional modules serving as auxiliary computing.

It is important to mention that every variable and subroutine name was kept as in the original model, so that users or developers of SECHIBA-Fortran can easily find their way in the YAO implementation.

After defining the modular graph in YAO, the second step in the SECHIBA-YAO implementation is the coding of the direct model and the derivatives of the modules. Every module is represented as a source file and the different processes attributed to the module are implemented inside the source file, allowing a better control of the physics; i.e., any change in the physics could be made easily.

Once the direct model has been coded and validated, there are two options for coding the derivatives: they can be coded line by line based on the forward computation, in order to obtain the Jacobian matrix of the module, or they can be produced with an automatic differentiation tool (for example, Tapenade; Hascoët and Pascual, 2013). For SECHIBA-YAO, the derivatives were coded line by line. The outputs are differentiated with respect to every input. Based on these derivatives, YAO automatically generates the tangent linear and adjoint models.
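The way local derivatives combine into a tangent linear and an adjoint model can be sketched on a three-module toy graph. The modules, their hand-coded partial derivatives and all names below are hypothetical illustrations, not YAO-generated code: the tangent linear pass propagates a perturbation through the modules in topological order, and the adjoint pass traverses them in reverse order and accumulates the gradient of the output with respect to all parameters in a single sweep.

```python
import math

# Hypothetical modules, listed in topological order:
# (name, input names, forward function, hand-coded partial derivatives w.r.t. each input)
MODULES = [
    ("u", ["p1", "p2"], lambda p1, p2: p1 * p2, lambda p1, p2: [p2, p1]),
    ("v", ["u", "p2"], lambda u, p2: math.sin(u) + p2, lambda u, p2: [math.cos(u), 1.0]),
    ("y", ["u", "v"], lambda u, v: u * v, lambda u, v: [v, u]),
]


def forward(params):
    """Direct model: evaluate the modules in topological order."""
    values = dict(params)
    for name, inputs, func, _ in MODULES:
        values[name] = func(*(values[i] for i in inputs))
    return values


def tangent_linear(params, dparams):
    """Propagate an input perturbation forward and return the perturbation of y."""
    values = forward(params)
    d = dict(dparams)
    for name, inputs, _, jac in MODULES:
        partials = jac(*(values[i] for i in inputs))
        d[name] = sum(g * d[i] for g, i in zip(partials, inputs))
    return d["y"]


def adjoint(params):
    """Backward sweep: return dy/dp for every parameter in one pass."""
    values = forward(params)
    bar = {name: 0.0 for name in values}
    bar["y"] = 1.0
    for name, inputs, _, jac in reversed(MODULES):
        partials = jac(*(values[i] for i in inputs))
        for g, i in zip(partials, inputs):
            bar[i] += g * bar[name]
    return {name: bar[name] for name in params}


params = {"p1": 0.7, "p2": 1.3}
grad = adjoint(params)
dy_dp1 = tangent_linear(params, {"p1": 1.0, "p2": 0.0})
```

Here `tangent_linear` needs one pass per perturbed parameter, whereas `adjoint` returns both derivatives at once; each module only has to supply its own partial derivatives.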

Nevertheless, coding the derivatives by hand can introduce errors, either through coding mistakes or through inexact derivatives (e.g., expressions that are not differentiable). In order to keep the number of such bugs to a minimum, the adjoint of the model was validated, as was done for the direct model. This validation is what supports the accuracy of the gradients used during assimilation. The validation of the adjoint model is presented in Sect. 4.1.

In this section we present several experiments that have been realized using the SECHIBA-YAO system. They were designed to control the 11 most influential internal parameters of SECHIBA when we assimilate the land surface temperature (LST).

In order to deal with non-dimensional control parameters with the same order of magnitude, preprocessing has been applied. The control parameters were first divided into two groups. The first group includes physical parameters, which have a physical dimension. In the present work, these parameters were normalized by dividing them by their prior values in order to control non-dimensional parameters. In such a way, given that the prior value is the true value (in the case of twin experiments), a value of 1 for these parameters indicates that the control parameter has been correctly reconstructed. The second group corresponds to physical parameters that are multiplied by a “multiplicative factor”, which is dimensionless (Verbeeck et al., 2001). The multiplying factors are the control variables of the second group and are set to 1 at the beginning of the assimilation process. The normalization process on the one hand and the use of multiplicative factors on the other hand allow us to deal with numbers of the same order of magnitude, which facilitates the comparison of the sensitivity of the different control variables in the assimilation process.
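The two kinds of preprocessing can be summarized in a few lines. The parameter names and prior values below are hypothetical placeholders, not the SECHIBA values of Table 1; the sketch only illustrates that both groups lead to dimensionless control variables equal to 1 at the prior.

```python
# Hypothetical prior values of first-group (dimensional) parameters.
PRIORS = {"param_a": 33000.0, "param_b": 0.15}
# Hypothetical second-group parameters, controlled through multiplicative factors.
BASE_VALUES = {"param_c": 5.0}


def to_control(physical, priors):
    """Normalize dimensional parameters by their prior values."""
    return {name: physical[name] / priors[name] for name in priors}


def to_physical(control, priors):
    """Map dimensionless control variables back to physical values."""
    return {name: control[name] * priors[name] for name in priors}


def apply_factors(base_values, factors):
    """Scale second-group parameters by their dimensionless multiplicative factors."""
    return {name: base_values[name] * factors[name] for name in base_values}


control_at_prior = to_control(PRIORS, PRIORS)
factors_at_start = {name: 1.0 for name in BASE_VALUES}
```

In a twin experiment where the prior is the truth, a retrieved control value of 1.2 for `param_a` would therefore mean a 20 % overestimate, whatever the physical unit of the parameter.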

In the following, all control variables are assumed to be preprocessed, so they are dimensionless and centered around 1.

SECHIBA inner parameters used in this work. There are five inner parameters involved in the model estimations that are controlled, plus six multiplicative factors, all equal to 1.

The model inner parameters are the following (see Table 1):
rsol

Sensitivity analysis results. Parameter hierarchy according to each site and vegetation fraction. The parameters are ranked by decreasing sensitivity.

Scenario properties and description.

In order to show the benefit of data assimilation in SECHIBA, we conducted
several experiments using SECHIBA-YAO. Prior to the assimilation process,
different scenarios were defined for the tests (Table 3). A scenario makes
reference to the experimental conditions. It includes the definition of the
plant functional type (PFT), the type of observation to be assimilated,
the observation sampling, the time sampling, the atmospheric forcing file,
the subset of control parameters, the assimilation window size and the time
of the year to start the assimilation. The different scenarios were
calculated using the adjoint model for several typical conditions of the two
FLUXNET sites selected. The dates presented in this paper are representative
of sunny days in summer or winter, with no perturbation coming from clouds
and without rainfall events. In Eq. (4), we take

In order to show the accuracy of the distributed SECHIBA-YAO code, we present
an analysis that allows us to rank the 11 parameters according to their
sensitivity, estimated using the adjoint model, and to compare the results
to those obtained by using finite differences. We identify the most sensitive
parameters to the estimation of land surface temperature (LST) by computing
the gradients obtained with the adjoint model. This analysis corresponds to a
first-order sensitivity estimate of the influence of the control parameters
on the land surface temperature. In order to do so, local sensitivities were
determined by computing the parameter gradients both by finite difference and
by adjoint calculation (Saltelli, 2008). This
method is strictly local: the information provided relates to a specific
point in space. The values of the inner parameters (Table 1) and
multiplicative factors (all equal to 1) represent the initial values where
the experiments have been conducted. Because hum

The sensitivity analysis was performed for a subset of inner parameters
related to the energy and water physical processes on bare soil (PFT 1) and
agricultural C3 crop (PFT 12), in order to quantify the role of the
vegetation in the sensitivity of the land surface temperature to the parameters. The plant
functional types are useful for distinguishing the different vegetation types. In
the present case we used the agricultural C3 crop type whose parameters are

The work was done on a daily basis, in order to observe the diurnal variations of the sensitivities. Model outputs and the corresponding gradient are computed at each half-hour time step. As we make the assumption that the errors in prior values are very large in comparison with errors in observations, we discard the background term in the cost function (defined in Sect. 3). This simplification is valid as long as the system is overdetermined (i.e., the number of control parameters is smaller than the number of observations). The initial values of the parameters (before optimization) are those of Table 1. We recall that, for numerical purposes, the control parameters have been normalized in order to have the same order of magnitude (i.e., equal to 1). Calculations were performed for both FLUXNET sites considered in this work.

Figure 4 compares, for 28 August 1996 at Harvard Forest, the sensitivities computed for each control parameter with both finite differences and model gradients. Bare soil results are presented in Fig. 4a. The agricultural C3 crop scenario is illustrated in Fig. 4b. The efficiency of the adjoint calculation is first demonstrated in these plots, because the 11 desired parameter sensitivities are obtained in a single integration, whereas it takes 11 runs of the model to compute the same quantity using finite differences. By using the same methodology, sensitivity curves were computed at FLUXNET site Kruger Park (Fig. 5). The comparison between sensitivity analysis done using the adjoint and using finite differences shows a very good agreement between the two methods for both sites. The diurnal characteristics of the parameter sensitivities with a maximum around noon in phase with the diurnal variation of solar radiation are clearly visible.
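The difference in cost between the two approaches can be made concrete with a small sketch. The three-parameter function below is a hypothetical stand-in for the model, and its analytic gradient plays the role of the adjoint; neither is part of SECHIBA-YAO. With one-sided finite differences, one perturbed run per parameter is needed in addition to the reference run, which the call counter makes visible.

```python
import math


class CountingModel:
    """Hypothetical scalar model that counts how many times it is run."""

    def __init__(self):
        self.calls = 0

    def __call__(self, params):
        self.calls += 1
        a, b, c = params
        return a * math.exp(-b) + c ** 2


def analytic_gradient(params):
    """Exact gradient of the toy model (stand-in for an adjoint computation)."""
    a, b, c = params
    return [math.exp(-b), -a * math.exp(-b), 2.0 * c]


def finite_difference_gradient(model, params, rel_step=1e-6):
    """One-sided finite differences: one reference run plus one run per parameter."""
    base = model(params)
    grad = []
    for i, p in enumerate(params):
        h = rel_step * max(abs(p), 1.0)
        shifted = list(params)
        shifted[i] = p + h
        grad.append((model(shifted) - base) / h)
    return grad


model = CountingModel()
params = [1.0, 1.0, 1.0]
fd_grad = finite_difference_gradient(model, params)
exact_grad = analytic_gradient(params)
```

For this toy case `model.calls` ends at 4 for three parameters, while the exact gradient is obtained in a single evaluation; the agreement between `fd_grad` and `exact_grad` is the same kind of check as the one shown in Figs. 4 and 5.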

Comparisons for 28 August 1996 at Harvard Forest of the
sensitivities obtained for each control parameter with both the finite
differences and the model gradients computed with the adjoint model.
Sensitivity analysis results for PFT 1 are in

Comparisons for 11 February 2003 at the Kruger Park site of the
sensitivities obtained for each control parameter with both the finite
differences and the model gradients computed with the adjoint model.
Sensitivity analysis results for PFT 1 are in

Table 2 presents, for Harvard Forest and Kruger Park, the 11 parameters
ranked with respect to their influence. According to the four scenarios
defined (two sites and two PFTs), it can be seen that the hierarchy changes
with the vegetation but remains the same for both sites. Parameter hierarchy
revealed that the highest gradient values correspond to those that have the
largest influence on the land surface temperature estimate. Clearly

The parameters

Parameters with persistent positive sensitivity are rsol

Transpiration processes influence directly the land surface temperature in
the presence of vegetation and are the dominant processes at the studied
sites. Therefore

In general, sensitivities are higher in bare soil conditions for the control
parameters, except for min

The next section presents the different assimilation experiments that we have performed using the SECHIBA-YAO software.

Twin experiments make it possible to check the robustness of the variational assimilation method by assimilating synthetic data. First, the direct model is run with a set of parameters Ptrue (the initial conditions) in order to produce pseudo-observations of land surface temperature (LST). Then Ptrue is randomly perturbed to obtain Pnoise. LST is then assimilated over several days (most of the time, 1 week), with the model started from Pnoise as the first guess of the control parameters, leading to a new set of optimized parameters denoted Passim. Passim is then compared to Ptrue in order to assess the performance of the assimilation process. Five different assimilation experiments were performed. These experiments are available in the distributed version of SECHIBA-YAO.
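The twin-experiment workflow can be sketched end to end on a toy problem. Everything below is a hypothetical illustration: the "model" is a simple diurnal sinusoid with two normalized parameters rather than SECHIBA, the window is a single day of 48 half-hourly steps, the perturbation is a uniform draw within 50 % of the true value, and fixed-step gradient descent replaces the minimizer used in the paper.

```python
import math
import random

N_STEPS = 48  # one day of half-hourly time steps


def model(params):
    """Toy diurnal signal: offset + amplitude * sin(2*pi*t/N_STEPS)."""
    offset, amplitude = params
    return [offset + amplitude * math.sin(2.0 * math.pi * t / N_STEPS)
            for t in range(N_STEPS)]


def cost_and_gradient(params, obs):
    """Least-squares misfit (no background term) and its exact gradient."""
    residuals = [s - y for s, y in zip(model(params), obs)]
    cost = 0.5 * sum(r * r for r in residuals)
    g_offset = sum(residuals)
    g_amplitude = sum(r * math.sin(2.0 * math.pi * t / N_STEPS)
                      for t, r in enumerate(residuals))
    return cost, [g_offset, g_amplitude]


def assimilate(first_guess, obs, step=0.02, n_iter=200):
    """Fixed-step gradient descent on the cost function."""
    params = list(first_guess)
    for _ in range(n_iter):
        _, grad = cost_and_gradient(params, obs)
        params = [p - step * g for p, g in zip(params, grad)]
    return params


rng = random.Random(0)                 # fixed seed for reproducibility
p_true = [1.0, 1.0]                    # normalized "true" parameters
obs = model(p_true)                    # synthetic (pseudo) observations
p_noise = [p * (1.0 + rng.uniform(-0.5, 0.5)) for p in p_true]
p_assim = assimilate(p_noise, obs)     # optimized parameters

error_before = max(abs(a - b) for a, b in zip(p_noise, p_true))
error_after = max(abs(a - b) for a, b in zip(p_assim, p_true))
```

Because the truth is known by construction, `error_before` and `error_after` directly measure what the assimilation has recovered, which is the quantity reported for the control parameters in the twin experiments.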

The 10 most sensitive parameters are considered in the twin experiments (all
the above parameters except min

Characteristics of the scenarios for each of the twin experiments.

Sampling frequencies for Experiment 1.

Results of Experiment 1 using the Harvard Forest and Kruger Park
sites.

Experiment 2 (different amplitudes of random noise in the
observations) using the Harvard Forest and Kruger Park sites. We present the
mean values for 500 experiments:

Results for Experiments 3 (PFT 1) and 4 (PFT 12). RMSE of model
fluxes

Results for Experiment 5 (PFT 12). RMSE of model fluxes

A scenario for a single experiment is defined by several properties described
in Table 3. Scenarios for all the assimilation experiments are presented in
Table 4. All parameters are controlled at the same time. The duration of each
assimilation experiment is 1 week or 1 month, depending on the experiment.
The time steps

In Experiments 1 and 2, the six most sensitive parameters are controlled. In both cases the vegetation type is PFT 12. In Experiment 1 several observation samplings are tested, ranging from 30 min to 24 h. Over 1 month, five independent assimilation tests were run for each observation sampling. In Experiment 2, a weighted random noise was introduced in the observations, ranging from 10 to 50 % of the true value of the observation. Both Experiments 1 and 2 use constant perturbations of the control parameters (50 % of their prior values for Experiment 1 and 10 % for Experiment 2) in order to assess the impact of varying the observation sampling and the noise in the observations.

In Experiment 3 the five most sensitive parameters according to the sensitivity analysis (Table 2) were controlled in bare soil conditions (PFT 1) at the Harvard Forest and Kruger Park sites. In this experiment the noise added on the prior values is 50 %.

In Experiment 4 the five most sensitive parameters, according to the sensitivity analysis (Table 2), were controlled under agricultural C3 crop conditions (PFT 12) at the Harvard Forest and Kruger Park sites. Together with Experiment 3, this allowed us to assess the effect of the vegetation fraction on the assimilation system. In addition, keeping only the most sensitive parameters in the control set improved the assimilation performance: the more sensitive the observed variable is to a parameter, the more easily the minimization process finds its optimal value, which reduces the estimation error. In this experiment the noise added to the prior values is 50 %.

In Experiment 5, all parameters, except min

Experiment 1 investigates the impact of the observation sampling (30 min, 2 h, 6 h, 12 h, 24 h) on the assimilation, since varying the observation frequency changes the number of observations available. Each test is labeled with a number that serves as a reference when comparing the results. Table 5 presents the tests we conducted as well as their initial conditions. For example, in Test 4, only two observations per day are taken, at noon and at midnight. In Test 5, there is one observation per day, taken at noon.
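The effect of the sampling on the number of observations can be sketched as follows. The mapping of test numbers to sampling intervals for Tests 1 to 3 is an assumption based on the order in which the samplings are listed (only Tests 4 and 5 are described explicitly), as is the choice to align every sampling on noon; the function name is hypothetical.

```python
STEPS_PER_DAY = 48            # half-hourly model time steps
NOON = STEPS_PER_DAY // 2     # index of the noon time step within a day

# Assumed mapping: test number -> observation sampling in hours.
SAMPLING_HOURS = {1: 0.5, 2: 2, 3: 6, 4: 12, 5: 24}


def observation_steps(sampling_hours, n_days=1):
    """Return the time-step indices at which an observation is assimilated."""
    stride = int(round(sampling_hours * 2))  # sampling expressed in half-hour steps
    return [t for t in range(n_days * STEPS_PER_DAY)
            if (t - NOON) % stride == 0]


obs_per_day = {test: len(observation_steps(hours))
               for test, hours in SAMPLING_HOURS.items()}
```

Under these assumptions the number of observations per day drops from 48 in Test 1 to a single noon observation in Test 5, so a 1-week window contains between 336 and 7 observations.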

The errors before and after the assimilation process are presented in Table 6 for the Kruger Park and Harvard Forest sites. The columns correspond to the assimilations performed with the different observation sampling frequencies. Five independent assimilations were done for each test, and Table 6 reports the mean performance of the assimilation system. Even though the errors are small for all tests, the assimilation system is clearly sensitive to the observation sampling.

The contribution of the observations is demonstrated by an improvement in
the optimization when increasing the frequency of observations, both for the
controlled parameters and the computed fluxes

Experiment 2 aims at studying the impact of introducing a random noise in the
synthetic observations. The random noise follows a normal distribution with
zero mean and variance 1. The perturbed observations are computed using the
following equation:

We note in Table 7a and b that the parameter restitution is degraded when adding random noise to the observations. This shows that the sensitivity of the assimilation system to the noise affecting the LST observations is quite high. When increasing the amplitude of the error, the various errors obtained for the three tests not only suggest the need to take into account the quality of the observations in the model, but also the fact that the parameters are not affected in the same way by the data uncertainties. However, perturbations are still limited and a deeper exploration should be performed to assess the impact on the assimilation performance of noisy observations.

The RMSE errors of the assimilations for Experiments 3, 4 and 5 are presented
in Tables 8 and 9, corresponding to the Harvard Forest and Kruger Park sites.
For all the experiments the noise added on the parameters was 50 %. In
Experiment 3 for PFT 1, the mean errors in the retrieved values for all the
control parameters are on the order of 10

Comparing the results from Experiments 3 and 4 to those of Experiment 5, a degradation in the fluxes and in the parameter restitution can be observed: the errors in the fluxes and in the final control parameters are higher when the size of the control parameter set increases (Experiment 5). The best parameter restitution is obtained when controlling five parameters only. When we control the 10 most sensitive parameters, as in Experiment 5, the final parameter values are degraded. Indeed, the larger the control parameter set, the more easily the minimization may converge toward a local minimum of the cost function (which can be far from the global optimum). In addition, parameters to which LST is insensitive are difficult to retrieve accurately, so assimilating LST is not an efficient way to optimize them.

In this study, the adjoint of SECHIBA was implemented using the adjoint semi-generator software YAO. SECHIBA-YAO computes the gradients of the land surface temperature with respect to each control parameter, which allowed us, on the one hand, to carry out a sensitivity analysis of the influence of the parameters on the synthetic LST estimation and, on the other hand, to conduct several assimilation experiments.
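To illustrate what an adjoint-based gradient provides, the sketch below uses a deliberately simple toy model (a linear recurrence with two parameters, not SECHIBA) and checks a hand-written adjoint gradient of a least-squares cost against central finite differences. The model, the parameter names `a` and `b`, the forcing and all numerical values are assumptions chosen for the example; YAO generates the corresponding backward computations from the model description rather than requiring them to be written by hand.

```python
import math


def forward(a, b, forcing, t0):
    """Toy model: T[k+1] = a * T[k] + b * f[k], starting from T[0] = t0."""
    traj = [t0]
    for f in forcing:
        traj.append(a * traj[-1] + b * f)
    return traj


def cost(params, forcing, t0, obs):
    """Least-squares misfit between the simulated states T[1:] and the observations."""
    traj = forward(params[0], params[1], forcing, t0)
    return 0.5 * sum((t - y) ** 2 for t, y in zip(traj[1:], obs))


def adjoint_gradient(params, forcing, t0, obs):
    """Gradient of the cost with respect to (a, b) from one backward sweep."""
    a, b = params
    traj = forward(a, b, forcing, t0)
    grad_a = grad_b = 0.0
    lam = 0.0  # adjoint variable beyond the last time step
    for k in range(len(forcing) - 1, -1, -1):
        lam = a * lam + (traj[k + 1] - obs[k])  # adjoint of state T[k+1]
        grad_a += lam * traj[k]
        grad_b += lam * forcing[k]
    return [grad_a, grad_b]


def finite_difference_gradient(params, forcing, t0, obs, eps=1e-6):
    """Central finite differences, used only to validate the adjoint."""
    grad = []
    for i in range(len(params)):
        up, dn = list(params), list(params)
        up[i] += eps
        dn[i] -= eps
        grad.append((cost(up, forcing, t0, obs) - cost(dn, forcing, t0, obs)) / (2 * eps))
    return grad


# Twin setting: observations are generated by the model itself with "true" parameters.
forcing = [math.sin(2 * math.pi * k / 48) for k in range(48)]
truth = [0.9, 2.0]
obs = forward(truth[0], truth[1], forcing, 10.0)[1:]

first_guess = [0.8, 1.5]
g_adj = adjoint_gradient(first_guess, forcing, 10.0, obs)
g_fd = finite_difference_gradient(first_guess, forcing, 10.0, obs)
```

The adjoint returns the full gradient in a single backward sweep, whereas finite differences need two model runs per parameter; this is why the adjoint is attractive for both sensitivity analysis and minimization.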

The first contribution of this paper is the sensitivity analysis, which identified the model parameters to which LST is most sensitive and which therefore have to be controlled during the assimilation process. It is important to mention, however, that the sensitivity analysis depends on the region, the forcing, the PFT and the time period (hour and day), among other factors. Once the parameter hierarchy was set, twin experiments were performed for different scenarios to test the robustness of the assimilation scheme. The second contribution is the demonstration that variational assimilation of LST (land surface temperature) is useful for improving the SECHIBA parameter estimates. LST assimilation has the potential to improve the LSM parameter calibration by adjusting the parameters during the control process. This can be valuable in a forecasting approach, because simulations are more reliable when the model parameters are fitted on actual measurements. The fluxes computed by the model were improved after the assimilation of LST, and the twin experiments showed the ability of variational data assimilation to improve the model parameter estimation. The experiments conducted for different scenarios and forcing sites all led to a reduction in the flux errors through the information provided by the synthetic LST observations. In addition, the influence of the size of the control parameter set on the assimilation performance was demonstrated.

Estimating only the parameters to which LST is most sensitive increases the chances of finding acceptable values for them after assimilation. Optimizing a larger control parameter set, as in Experiment 5, makes it more difficult for the assimilation system to retrieve the prior value of the control parameters with high accuracy. Several aspects of data assimilation emerge from the analysis of the experiments. First, the nonlinearity of the SECHIBA model leads to the presence of several local minima. Second, the assimilation performance improves significantly when the sampling frequency of the observations is increased, as evaluated in Experiment 1. This suggests that the ability of the model to be constrained depends, among other things, on the observation frequency: with fewer observations, the control parameter adjustment is less accurate and the assimilation procedure estimates variables with a larger error. Conversely, with more LST observations the assimilation system fits the parameters better and the estimates improve.

Finally, we observe a strong dependence between the quality of observations
and the parameter restitution, as shown in Experiment 2. It seems crucial to
take into account the uncertainty in the observations, because it does not
affect the assimilation performance in the same way for each parameter
estimated in the minimization process. If we compare Experiments 1 and 2
(Tables 6 and 7), it is clear that the noise on the observations dramatically
increases the mean error on the computed fluxes

Adding extra parameters to the control set increases the complexity of the cost function. The results of the LST assimilation when controlling the 10 most sensitive parameters (Experiment 5) show, over several assimilation runs, that LST alone does not provide enough information to constrain the whole parameter set and thereby improve the estimation of the SECHIBA parameters. When controlling all parameters, we cannot hope to improve the estimation of every model parameter unless we assimilate additional observations or add a background term to the cost function.
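For reference, a generic variational cost function that includes a background term can be written as below. The notation is an assumption borrowed from the standard data assimilation literature and may differ from the symbols used elsewhere in this paper: $\mathbf{p}$ is the vector of control parameters, $\mathbf{p}_b$ its background (first-guess) value, $\mathbf{B}$ the background error covariance matrix, $\mathbf{y}_i$ the observations at time $t_i$, $\mathbf{R}_i$ their error covariance matrix, and $H_i \circ M_i$ the model integrated up to $t_i$ followed by the observation operator.

```latex
J(\mathbf{p}) =
  \frac{1}{2}\,(\mathbf{p}-\mathbf{p}_b)^{T}\,\mathbf{B}^{-1}\,(\mathbf{p}-\mathbf{p}_b)
  + \frac{1}{2}\sum_{i=1}^{N}
    \bigl(H_i(M_i(\mathbf{p}))-\mathbf{y}_i\bigr)^{T}\,\mathbf{R}_i^{-1}\,
    \bigl(H_i(M_i(\mathbf{p}))-\mathbf{y}_i\bigr)
```

The first term penalizes departures from the background values and therefore regularizes the parameters that the observations constrain only weakly; without it, the cost function reduces to the observation misfit alone.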

Assimilation with the YAO approach allows different assimilation scenarios to be implemented in a very flexible way when performing twin experiments: the control parameters and the observed variables (once the adjoint code has been generated), the assimilation windows, the observation sampling, the time sampling and other features can be changed easily.

A distributed version of the SECHIBA-YAO code and several examples with different
scenarios are available at a dedicated GitHub site. YAO can be downloaded
upon request at

The distributed version of SECHIBA-YAO provides an opportunity for scientists to perform their own assimilation. It allows the control of the five most influential internal parameters of SECHIBA, depending on the vegetation type. In addition, LST or satellite brightness temperature can be used as observations.

The distributed version of SECHIBA-YAO is available in a GitHub repository
(

The version of SECHIBA implemented in YAO includes the two-layer hydrology of Choisnel (1977), mentioned in Sect. 2. The original SECHIBA code follows a modular scheme: a set of well-defined routines, each independent in its processes, with a single entry point (a main routine that handles the rest of the functionalities).

A set of prognostic variables is defined for each module, and their assignment depends on the forcing conditions, physical phenomena, etc. SECHIBA can run coupled with the other components of ORCHIDEE (STOMATE and LPJ) or offline, as in this work. Once SECHIBA is coded in YAO, it can easily be coupled with the other modules of ORCHIDEE.

In SECHIBA, the different routines were originally coded in the Fortran language and can be run at any resolution and over any region of the globe. The version of SECHIBA implemented in YAO is denoted SECHIBA-YAO and follows the Fortran code. In its present form, it can only be run at one point at a time.

ORCHIDEE uses MODIPSL and IOIPSL in its internal processes (see

The main routines in SECHIBA-Fortran are presented in Fig. A1. These are also the routines considered in the YAO implementation of the model. First, DIFFUCO computes the diffusion and plant transpiration coefficients based on the atmospheric conditions, solar fluxes, dry soil height, soil moisture stress and fraction of vegetation. ENERBIL corresponds to the energy budget module. Surface energy fluxes related to the soil are computed, based on atmospheric conditions, radiative fluxes, resistances, surface-type fractions and surface drag. HYDROLC is the hydrological budget module, taking as inputs the rainfall, snowfall, evaporation components, soil temperature profile and vegetation distribution. CONDVEG helps in the computation of the vegetation conditions. The thermodynamics of the model is computed in THERMOSOIL, based on a seven-layer soil profile. Finally, SLOWPROC computes the soil slow processes. When SECHIBA is decoupled from STOMATE, this module also deals with the LAI evolution.
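The routine descriptions above can be summarized in a small lookup structure, sketched here in Python purely for illustration. The `Routine` class and the `SECHIBA_ROUTINES` name are hypothetical and do not exist in the SECHIBA or YAO code; the roles and inputs are copied from the text, an empty `inputs` tuple means that the text does not list the inputs, and the tuple order follows the order of presentation rather than a confirmed call order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Routine:
    """Hypothetical record describing one SECHIBA routine."""
    name: str
    role: str
    inputs: tuple = ()  # empty when the text does not list the inputs


SECHIBA_ROUTINES = (
    Routine("DIFFUCO", "diffusion and plant transpiration coefficients",
            ("atmospheric conditions", "solar fluxes", "dry soil height",
             "soil moisture stress", "fraction of vegetation")),
    Routine("ENERBIL", "energy budget (surface energy fluxes related to the soil)",
            ("atmospheric conditions", "radiative fluxes", "resistances",
             "surface-type fractions", "surface drag")),
    Routine("HYDROLC", "hydrological budget",
            ("rainfall", "snowfall", "evaporation components",
             "soil temperature profile", "vegetation distribution")),
    Routine("CONDVEG", "vegetation conditions"),
    Routine("THERMOSOIL", "soil thermodynamics on a seven-layer profile"),
    Routine("SLOWPROC", "soil slow processes (and LAI evolution when decoupled from STOMATE)"),
)

# Simple lookup by routine name.
ROUTINE_BY_NAME = {r.name: r for r in SECHIBA_ROUTINES}
routine_names = [r.name for r in SECHIBA_ROUTINES]
```

Such a table is only a reading aid: the actual dependencies between modules are those described by the model graph rather than by this flat list.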

The different SECHIBA components are interconnected as shown in Fig. A2. The outputs of each module serve as inputs to the next, resulting in an interdependency among modules that must be considered when modeling SECHIBA-YAO.

Figure A1. SECHIBA subroutines and their corresponding outputs. Source: Benavides Pinjosovsky (2014).

Figure A2. SECHIBA hyper-graph, showing general model dynamics. Source: Benavides Pinjosovsky (2014).

This work used eddy covariance data acquired by the FLUXNET community and in
particular by the following networks: AmeriFlux (U.S. Department of Energy,
Biological and Environmental Research, Terrestrial Carbon Program and
AfriFlux) and the global FLUXNET project
(