Consistent assimilation of multiple data streams in a carbon cycle data assimilation system

. Data assimilation methods provide a rigorous statistical framework for constraining parametric uncertainty in land surface models (LSMs), which in turn helps to improve their predictive capability and to identify areas in which the representation of physical processes is inadequate. The increase in the number of available datasets in recent years allows us to address different aspects of the model at a variety of spatial and temporal scales. However, combining data streams in a DA system is not a trivial task. In this study we highlight some of the challenges surrounding multiple data stream assimilation for the carbon cycle component of LSMs. We give particular consideration to the assumptions associated with the type of inversion algorithm that are typically used when optimising global LSMs – namely, Gaussian error distributions and linearity in the model dynamics. We explore the effect of biases and inconsistencies between the observations and the model (resulting in non-Gaussian error distributions), and we examine the difference between a simultaneous assimilation (in which all data streams are included in one optimisation) and a step-wise approach (in which each data stream is assimilated sequentially) in the presence of non-linear model dynamics. In addition, we perform a preliminary investigation into the impact of correlated errors between two data streams for two cases, both when the correlated observation errors are included in the prior observation error covariance matrix, and when the correlated errors are ignored. We demonstrate these challenges by as-similating synthetic observations into two simple models: the ﬁrst a simpliﬁed version of the carbon cycle processes represented in many LSMs and the second a non-linear toy model. Finally, we provide some perspectives and advice to other land surface modellers wishing to use multiple data streams to constrain their model parameters


Introduction
The carbon cycle is an important component of the Earth system, especially when considering the climatic impact of rising greenhouse gas concentrations from fossil fuel emissions and land use change.It is estimated that the oceans and land surface absorb approximately half of the CO 2 emissions due to anthropogenic activity, but uncertainties remain in the strength and location of sources and sinks, as well as in predictions of future trends (Ciais et al., 2013).Observations allow us to understand the system up until the present day and provide inference about how ecosystems may respond to future change.However, their use in estimating model state variables and boundary conditions is limited beyond diagnostic purposes, and they can be restricted in their spatial coverage.They also do not contain all the information we may need to distinguish between the complex interactions that may occur between many different processes.Incorporating our current knowledge of physical mechanisms of biogeochemical cycles, including carbon, C, dynamics, into land surface models (LSMs) represents a promising approach for analysing these interacting effects, upscaling observations to larger regions, and making future predictions.However, the models can be limited by the lack of process representation, either due to gaps in our knowledge or in our technical and Published by Copernicus Publications on behalf of the European Geosciences Union.computing capability.As a result, model evaluations reveal that not all variables are well-captured by the model under current conditions (Anav et al., 2013), and the spread between model projections is still very large (Sitch et al., 2015).
Aside from model structural and forcing errors, one source of uncertainty is related to the parameter (i.e.fixed) values of a model.Model-data fusion, or data assimilation (DA), allows the calibration, or optimisation, of these values by minimising a cost function that quantifies the model-data misfit while accounting for the uncertainties inherent in both the model and data in a statistically rigorous framework.The C cycle component of most LSMs is complex and contains a large number of parameters; luckily however, there are an increasing number of in situ and remote-sensing-based data streams that can be used for parameter optimisation.These data bring information on different spatial and temporal scales, such as atmospheric CO 2 concentration data: measured at surface stations at continental to global scales, providing information from synoptic timescales to inter-annual variability (IAV) and long-term trends; eddy covariance net CO 2 (net ecosystem exchange -NEE) and latent (LE) and sensible heat fluxes: measured at half-hourly intervals at many sites across different ecosystems/regions, providing information at seasonal to inter-annual timescales; satellite-derived measures of vegetation dynamics, including "greenness" indices (i.e. the Normalised Difference Vegetation Index -NDVI), fraction of absorbed photosynthetically active radiation (FAPAR) and leaf area index (LAI): provided at global scales, and up to daily time steps spanning more than a decade, thus capturing IAV and long-term trends (though usually with a trade-off between spatial and temporal resolution); satellite-derived measurements of soil moisture and land surface temperature: measured at the same temporal and spatial scales as the satellite-derived observations of vegetation dynamics; aboveground biomass measurements: currently taken at only one or a few points in time at plot scale up to regional scale from aircraft and satellite data, or are estimated from allometric relationships at each site; soil C stock estimates: usually only taken at one point in time at plot scale; ancillary data on vegetation characteristics such as tree height or budburst (such data are only measured at certain well-instrumented sites).
Researchers are increasingly attempting to bring these sources of data together to constrain different parts of a model at different spatio-temporal scales within a multiple data stream assimilation framework (e.g.Richardson et al., 2010;Keenan et al., 2012;Kaminski et al., 2012;Forkel et al., 2014;Bacour et al., 2015).However, whilst the potential benefit of adding in extra data streams to constrain the C cycle of LSMs is clear, multiple data stream assimilation is not as simple as it may seem.This is particularly true when considering a regional-to-global-scale, multiplesite optimisation of a complex LSM that contains many parameters, and which typically takes on the order of minutes to an hour to run a one-year simulation.When using more than one data stream there is the option to include all data streams together in the same optimisation (simultaneous approach) or to take a sequential (step-wise) approach.Mathematically, the optimal approach is the simultaneous, but complications may arise due computational constraints related to the inversion of large matrices or the requirement of numerous simulations, particularly for global datasets (e.g.Peylin et al., 2016) and/or due to the "weight" of different data streams in the optimisation (e.g.Wutzler and Carvalhais, 2014).On the other hand, in a step-wise assimilation the parameter error covariance matrix has to be propagated at each step, which implies that it can be computed.If the parameter error covariance matrix can be properly estimated and is propagated between each step, the step-wise approach should be mathematically equal to simultaneous.However, many inversion algorithms (e.g.derivative-based methods that use the gradient of the cost function to find its minimum) require assumptions of model (quasi-)linearity and Gaussian parameter and observation error distributions (Tarantola, 1987, p. 195).If these assumptions are violated, or the error distributions are poorly defined, it is likely that the step-wise approach will not be equal to the simultaneous approach, because information will be lost at each step due to an incorrect calculation of the posterior error covariance matrix at the end of each step.An incorrect description of the observation(-model) error distribution could result from (i) the wrong assumption about the distribution of the residuals between the observation and the model, (ii) a poor characterisation of the error correlations, (iii) an incompatibility between the model and the data (possibly due to a model structural issue or differences in how a variable is characterised), or (iv) a bias in the observations that is not unaccounted for (i.e. is treated as a random error).As mentioned, whilst a simultaneous optimisation is mathematically more rigorous in the sense that the error correlations are treated within the same inversion, if the prior distributions are not properly characterised any bias may be aliased to the wrong parameters (Wutzler and Carvalhais, 2014), and possibly more so than in a step-wise approach.This tutorial-style paper highlights some of the challenges of multiple data stream optimisation of carbon cycle models discussed above.Note that we do not aim to explore all possible issues related to a DA system, for example the choice of the cost function, minimisation algorithm, or the characterisation of the prior error distributions; indeed, pre-Geosci.Model Dev., 9,[3569][3570][3571][3572][3573][3574][3575][3576][3577][3578][3579][3580][3581][3582][3583][3584][3585][3586][3587][3588]2016 www.geosci-model-dev.net/9/3569/2016/vious studies have investigated such aspects at length (e.g.Fox et al., 2009;Trudinger et al., 2007), and therefore we refer the reader to these papers for more information.Section 2 reviews recent carbon cycle multiple data stream assimilation studies with reference to some of the aforementioned challenges.Section 3 demonstrates some these issues related to multiple data stream assimilation with synthetic experiments using two simple models: one a simplified version of the carbon dynamics included in many LSMs and the other a "toy" model designed to demonstrate the issues that arise with complex, non-linear models.Finally Sect. 4 provides some advice to land surface modellers wishing to carry out multiple data stream assimilation to constrain the parameters of their model.
2 Review of existing multiple data stream carbon cycle data assimilation studies

Extra constraint from multiple data streams
Most site-based carbon cycle data assimilation studies have used eddy covariance measurements of NEE and LE fluxes to constrain the relevant parameters of ecosystem models.However, a few studies have also made use of chamber flux soil respiration data and field measurements of vegetation characteristics (e.g.tree height, budburst, LAI) or estimates of litterfall and carbon stocks as ancillary information (e.g.Fox et al., 2009;Keenan et al., 2012;Thum et al., 2016;Van Oijen et al., 2005;Richardson et al., 2010;Williams et al., 2005).Two recent studies combined high-resolution satellite-derived FA-PAR data with in situ eddy covariance measurements to optimise parameters related to carbon, water and energy cycles of the ORCHIDEE and BETHY LSMs at a couple of sites (Bacour et al., 2015;Kato et al., 2013, respectively).At global scales the number of studies that use multiple data streams from satellites or large-scale networks to optimise LSMs has been increasing in recent years, although this remains a relatively new area of research.CCDAS-BETHY was the first global carbon cycle data assimilation system (CCDAS) to make use of the high-precision measurements of the atmospheric CO 2 concentration flask sampling network (Rayner et al., 2005;Scholze, 2003) to constrain parameters of the terrestrial carbon cycle model BETHY (Knorr, 2000).Since its first application using only atmospheric CO 2 concentration data, CCDAS-BETHY has been further developed to consistently assimilate multiple data streams both at local and global scales.In particular, Kaminski et al. (2012) optimised 70 process parameters, plus one initial condition, by simultaneously assimilating a satellite-derived FAPAR product derived from the Medium Resolution Imaging Spectrometer (MERIS; Gobron et al., 2008) and flask samples of atmospheric CO 2 at two sites from the GLOBALVIEW product (GLOBALVIEW-CO2, 2008) at coarse resolution.More recently, Scholze et al. (2016) demonstrated the added value of assimilating remotely sensed soil moisture data in addition to atmospheric CO 2 concentration data.They used the same coarse resolution set-up of CCDAS-BETHY as Kaminski et al. (2012) and CO 2 observations from 10 sites of the GLOBALVIEW product (GLOBALVIEW-CO2, 2012) together with the SMOS L3 daily soil moisture product (version 246;CATDS-L3, 2012).
Three other global CCDASs based on LSMs that are part of Earth system models (ESMs) have been developed in recent years (Peylin et al., 2016;Raoult et al., 2016;Schürmann et al., 2016), two of which used multiple data streams as constraints.Schürmann et al. (2016) optimized model parameters and initial conditions of the land component, JS-BACH (Raddatz et al., 2007), of the MPI ESM (Giorgetta et al., 2013) using atmospheric CO 2 concentration data from 28 sites and the TIP-FAPAR product (Pinty et al., 2007) as joint constraints over a 5-year period.As part of their study they evaluated the mutual benefit of each data stream in a fully factorial design.Peylin et al. (2016) used three different data streams as global constraints for the ORCHIDEE LSM (Krinner et al., 2005), which forms the land surface component of the IPSL ESM (Dufresne et al., 2013), in a multisite, step-wise assimilation approach.First, satellite-derived NDVI data from the MODIS instrument were used in a similar manner to FAPAR as a proxy for vegetation greenness, in order to constrain the phenology parameters at 60 sites for 4 temperate and boreal deciduous plant functional types (PFTs) (MacBean et al., 2015), followed by NEE and LE observations at 78 FLUXNET sites for 7 PFTs to optimise all the carbon-related parameters (Kuppel et al., 2014), and finally atmospheric CO 2 concentration measurements from 53 sites in the GLOBALVIEW network (GLOBALVIEW-CO2, 2013), which predominantly provided a constraint on the initial magnitude of the soil carbon reserves in the model.The three global multiple data stream CCDASs have allowed an improvement in both the mean seasonal cycle as well as the trend of net land surface CO 2 exchange, especially with the inclusion of the atmospheric CO 2 data (Kaminski et al., 2012;Peylin et al., 2016;Schürmann et al., 2016).Atmospheric CO 2 concentration observations are one of the most accurate, long-term datasets in environmental science, and they provide important information about the global CO 2 sink capacity by the land and ocean.
Many of the aforementioned studies reported that adding extra data streams helped to constrain unresolved sub-spaces of the total parameter space.Richardson et al. (2010) and Keenan et al. (2012) concluded that using ancillary information (e.g.woody biomass increment, field-based LAI and chamber measurements of soil respiration), in addition to NEE data, provided a valuable extra constraint on many model parameters, which improved both the bias in model predictions and reduced the associated uncertainties.The results of the REFLEX model-data fusion inter-comparison project also indicated that observations of the different carbon pools would help to constrain parameters such as root allocation and woody turnover that were not well resolved using NEE and LAI data alone (Fox et al., 2009).Similarly at global scale, Scholze et al. (2016) found that assimilating SMOS soil moisture data in addition CO 2 observations reduced the ambiguity in the solution space compared to assimilating CO 2 alone; about 30 parameters out of the 101 were resolved compared to 15 without SMOS data.Bacour et al. (2015) and Schürmann et al. (2016) both reported that the addition of FAPAR data bought extra information on the phenology-related processes in the model, which therefore resulted in different posterior C-flux-related parameter values than when assimilating NEE or atmospheric CO 2 data alone.An interesting aspect of the Kaminski et al. (2012) study was that the inclusion of FAPAR in addition to atmospheric CO 2 concentration samples resulted in a particular improvement for the hydrological fluxes in the model, thus demonstrating the importance of assessing the potential benefit for model variables that may not have been the main target of optimisation.
On the other hand, Williams et al. (2005) observed that one-off, or rarely taken, measurements of carbon stocks were unable to constrain components of the carbon cycle to which they were not directly related.This raises the issue of the relative influence of different data streams in a joint assimilation, particularly if the number of observations for each is vastly different, which will be the case when assimilating both half-hourly C flux data in addition to C stock observations that are typically available at an annual timescale or greater.The spatial distribution of each data stream is also important, especially for heterogeneous landscapes (Barrett et al., 2005;Alton, 2013).
Although a number of multiple data stream assimilation studies exist at various scales, very few studies have specifically investigated the added benefit of different combinations of data streams in a factorial study, with a few notable exceptions (Barrett et al., 2005;Richardson et al., 2010;Kato et al., 2013;Keenan et al., 2013;Bacour et al., 2015;Schürmann et al., 2016).Kato et al. (2013) and Bacour et al. (2015) both evaluated the complementarity of eddy covariance and FA-PAR data streams at site level, i.e. the impact of assimilating one individual data stream on the other model state variable, as well as when both data streams were included in the optimisation (see discussion in Sect.2.2).The study of Keenan et al. (2013) was particularly notable in its aim to quantify which data streams provide the most information (in terms of model-data mismatch) and how many data streams are actually needed to constrain the problem.They reported that of the 17 field-based data streams available, projections of future carbon dynamics were well-constrained with only 5 of the data sources and, crucially, not with eddy covariance NEE measurements alone.These results may be specific to this site or type of ecosystem, but their study highlights the need for further research in this area and, in particular, for synthetic data experiments that allow us to understand which data will be the most useful for a given scientific question.
This will also enable researchers to plan more efficient measurement campaigns with experimentalists, as also pointed out by Keenan et al. (2012).

Issue of bias and inconsistencies between the observations and the model
Despite the theoretical benefit of adding data streams into an assimilation system as additional constraints, several of the aforementioned studies at both site and global scale have reported a bias or inconsistency either between the different observation data streams, or between the observations and the model.This is easily detected when the optimisation of one data stream results in a worse fit than the prior in one or more of the other data streams.Thum et al. (2016) found that the addition of aboveground biomass stocks brought a longerterm constraint on allocation parameters, but they noted an incompatibility when assimilating both annual increment and total biomass data to optimise the longer timescale mortality/turnover parameter.This was due to the fact the total stocks take into account losses related to disturbance and management (e.g.canopy thinning) -processes that were not included in that version of the model.Kato et al. (2013) assimilated SeaWiFS FAPAR (Gobron et al., 2006) and eddy covariance LE measurements at the FLUXNET site in Maun, Botswana.They showed that the individual assimilation of each the two data streams resulted in a perfect (i.e.within the observational uncertainty) fit to the assimilated dataset, but a considerable degradation of the fit to the non-assimilated dataset compared to the prior.A comparison against eddy covariance measurements of gross carbon uptake (gross primary production -GPP) pointed to a bias in the FAPAR data because the fit to the independent GPP data was degraded after assimilating FAPAR data only, while the fit improved after assimilating the LE data only.Nevertheless, the simultaneous assimilation of both data streams achieved a compromise between the two suboptimal results achieved after assimilating only one data stream.The calibration further limited the number of parameters with correlated errors and yielded a higher theoretical reduction in parameter uncertainty and a decrease in the RMSD by 16 % for the GPP data compared to the prior.Bacour et al. (2015) also noted that when assimilating in situ and satellite-derived FAPAR data and in situ NEE and LE flux data from two French FLUXNET sites into the ORCHIDEE LSM, both separately and together, the posterior parameter values changed significantly for the photosynthesis-and phenology-related parameters, depending on the bias between the model and the observations and the correlation between the parameter errors.When NEE data were assimilated alone there was an even stronger positive bias (model-observations) in the start of leaf onset in the FAPAR data than in the prior simulations, and no improvement in the maximum value.This was likely due to the fact that there were enough degrees of freedom to fit Geosci.Model Dev., 9,[3569][3570][3571][3572][3573][3574][3575][3576][3577][3578][3579][3580][3581][3582][3583][3584][3585][3586][3587][3588]2016 www.geosci-model-dev.net/9/3569/2016/ the NEE without changing the phenology-related parameters.Similarly, the fit to the NEE was degraded when the model was only optimized with FAPAR data.The model was able to fit the maximum FAPAR, but this resulted in an adverse effect on the carbon assimilation capacity of the vegetation.The authors argued this was related to incompatibilities between the FAPAR and both the model and NEE measurements, possibly due to the larger spatial footprint of the satellite-derived FAPAR data and/or inaccuracies in the retrieval algorithm.However, given that assimilating in situ FAPAR also degraded the fit to the NEE, they also speculated that the culprit may be an inconsistency between the model and the data due to the different characterisation of FAPAR or LAI in the model compared to the satellite retrieval algorithm.For example, satellite-derived greenness measures (FAPAR/NDVI) also contain information on the non-green elements of vegetation, but the model only simulates green LAI.Furthermore parameters and processes in models have been developed at certain temporal and spatial scales; vegetation is often simply represented as a "big leaf" model in LSMs, taking no account of vertical canopy structure or the spatial heterogeneity in a scene, thus presenting an additional source of inconsistency compared to what is measured.The joint (simultaneous) assimilation of all three data streams in Bacour et al. (2015) reconciled the different sources of information, with an improvement in the model-data fit for NEE, LE and FAPAR.However, the compromise achieved in the joint assimilation was only possible when the FAPAR data were normalised to their maximum and minimum values, which partially accounted for any bias in the magnitude of the FAPAR or inconsistency with the model.The story of biases and apparent inconsistencies in FAPAR data does not end there.A bias correction was also necessary in the study by Kaminski et al. (2012) with CCDAS-BETHY using the MERIS FAPAR product in addition to atmospheric CO 2 data (see above).They found that optimisation procedure failed when using the original FAPAR product because the FAPAR data were biased towards higher values.Only after applying a bias correction on the FAPAR data prior to assimilation was the optimisation successful.Schürmann et al. (2016) also reported the need to reduce a prior model bias in FAPAR.Even though the assimilation successfully corrected for this FAPAR bias, the impact of the prior bias was evident in the spatial patterns of the modelled heterotrophic respiration.Assimilating FAPAR data alone therefore resulted in a slight degradation in the net C flux and consequently led to incorrect simulations of the atmospheric CO 2 growth rate.The addition of CO 2 as a constraint prevented this degradation and resulted in a compromise in which FA-PAR helped to disentangle these processes and find different parameter values compared to the CO 2 -only case, thus improving the fit to both data streams.Forkel et al. (2014) discovered an apparent inconsistency between satellite-derived FAPAR and GPP data in tundra regions when using these data (plus satellite-derived albedo) to optimise the LPJmL LSM.They too speculated that the data might be positively biased, in this case due to issues with satellite measurements taken at high sun zenith angles.However, they gave alternative suggestions, one being that an inadequate model structure may be at fault -for example, LPJmL does not include vegetation classes corresponding to shrub, moss and lichen species that are dominant in these ecosystems.They also noted that the GPP product they used, which is based on a model tree ensemble upscaling of FLUXNET data (Jung et al., 2011), might contain representation-related biases, given that there are very few FLUXNET stations in tundra regions.The issue of representation errors of sites has been touched upon before (e.g.Raupach et al., 2005).Alton ( 2013), who performed a global multi-site optimisation of the JULES LSM with a diverse range of data including satellite-derived LAI, FLUXNET, soil respiration and global river discharge, raised the point that FLUXNET sites are known to be large carbon sinks, which could potentially result in biased global NEE estimates, when using these data for optimising a model.
Resolving these apparent inconsistencies was beyond the scope of most of these studies, aside from applying a bias correction where one was evident.Aside from simple corrections, Quaife et al. (2008) and Zobitz et al. (2014) suggested that LSMs should be coupled to radiative transfer models to provide a more realistic and mechanistic observation operator between the quantities simulated by the model and the raw radiance measured by satellite instruments.This proposition follows experience gained in the case of atmospheric models for several decades (Morcrette, 1991).

Step-wise vs. simultaneous assimilation
The paper by Alton (2013) documents the only previous study to have used a step-wise assimilation approach with more than two data streams, and they found that the final parameter values were independent of the order of data streams assimilated.No studies in the LSM community to date have explicitly examined a step-wise vs. simultaneous assimilation framework with the same optimisation system and model.The step-wise assimilation with the ORCHIDEE-CCDAS detailed in Peylin et al. (2016) has been compared to a simultaneous optimisation using the same three data streams (as well as the same model and inversion algorithm) as part of an ongoing study.In the simultaneous optimisation, the addition of NEE or atmospheric CO 2 concentration measurements resulted in a smaller reduction in the fit to MODIS NDVI compared to the step-wise approach presented in Peylin et al. (2016).As the NDVI data were normalised to the 95th percentile range this was not a result of a simple bias in the magnitude of the data.Rather, it was likely due to inconsistencies between the model and data, as discussed by Bacour et al. (2015, and see above).It is important to reiterate that there should be no difference between the step-wise approach and the simultaneous www.geosci-model-dev.net/9/3569/2016/Geosci.Model Dev., 9, 3569-3588, 2016 approach given an adequate description of the error covariance matrices and compliance with the assumptions associated with the inversion algorithm used.However, in practice it is very difficult to define a probability distribution that properly characterises the model structural uncertainty and observation errors accounting for biases and non-Gaussian distributions.This can lead to issues within a simultaneous assimilation, as described above, and a greater risk of differences between a step-wise and simultaneous assimilation.Nevertheless a step-wise assimilation may be useful on a provisional basis for dealing with possible inconsistencies, as discussed in the introduction.For example in the step-wise approach of Peylin et al. (2016) the uncertainty (variance) of the phenology-related parameters was strongly constrained by the satellite data in the first step (and was propagated to the second step), and therefore the later optimisations using NEE and atmospheric CO 2 data in steps 2 and 3 found alternative solutions for the C-flux-related parameters that provided a better fit to all data streams.Wherever possible however, a simultaneous optimisation is favourable because the strong parameter linkages between different processes are maintained, and therefore biases and inconsistencies between the model and observations should be addressed prior to optimisation.
3 Demonstration with two simple models and synthetic data The three sub-sections in Sect. 2 highlight examples within a carbon cycle modelling context of the three main challenges faced when performing a multiple data stream assimilation, namely, (i) the possible negative influence of including additional data streams on other model variables; (ii) the impact of bias in the observations, missing model processes or inconsistency between the observations and model (as discussed in Sect.2.2); and (iii) the difference between a stepwise and simultaneous optimisation (and the order of data stream assimilation) if the assumptions of the inversion algorithm are violated, which is more likely to be the case with non-linear models when using derivative-based algorithms and least-squares formulation of the cost function (as discussed in Sect.2.3).The latter point is important because derivative methods (compared to global search) are the only viable option for large-scale, complex LSMs given the time taken to run a simulation.In addition to the above three challenges we have performed a preliminary investigation into the impact of correlated errors between the data streams, which is a topic that has not yet been studied in the context of carbon cycle models to our knowledge.This section aims to demonstrate these challenges using simple toy models and synthetic experiments where the true values of the parameters are known.Thus the following sections include a description of the toy models together with the derivation of synthetic observations, the inversion algorithm used to optimise the model parameters and the experiments performed, followed by the results for each test case.

Simple carbon model
To demonstrate the challenges of multiple data stream assimilation in a carbon cycle context, we have chosen a test model that represents a simplified version of the carbon cycle dynamics typically implemented in most LSMs.The model has been well-documented in Raupach (2007) and has been used previously in the OptIC DA inter-comparison project (Trudinger et al., 2007).It is based on two equations that describe the temporal evolution (on a daily time step) of two living biomass (carbon) stores, s 1 and s 2 , and the biomass fluxes between these two stores: In this model formulation, s 1 and s 2 are approximately equivalent to above-and belowground biomass stocks.The unknown parameters p 1 , p 2 , k 1 and k 2 will be optimised in the inversions.The first term on the right-hand side of Eq. ( 1) corresponds to the net primary production (NPP), i.e. the carbon input to the system as a function of time, represented by F (t), weighted by factors (the two fractions in parentheses) that account for the size of both pools, in order to introduce a limitation on NPP.The F (t) forcing term is a random function of time ("log-Markovian" random process) representing the effect of fluctuating light and water availability due to climate on the NPP (Raupach, 2007, Sect. 5.3).The litterfall is an output of s 1 (aboveground biomass store) and an input to s 2 (belowground biomass store) and is calculated as a constant fraction (k 1 ) of s 1 (defined by k 1 s 1 ).Heterotrophic respiration (Rh) is a constant fraction (k 2 ) of the belowground carbon reserve s 2 and is represented k 2 s 2 .The constant s 0 is a "seed production" term set to 0.01 (i.e.not optimised) to ensure the model does not verge towards zero.A more detailed description of the properties of the model is given in Trudinger et al. (2007, Sect. 2.1), and an in-depth analysis of the dynamical behaviour of the model is provided in Raupach (2007).Synthetic observations of both s 1 and s 2 variables were used to optimise all the unknown parameters in the model (see Sect. 3.1.5).

Non-linear toy model
Although the simple carbon model contains a non-linear term it is essentially still a quasi-linear model.In order to illustrate the challenges associated with multiple data stream data assimilation for more complex non-linear models, especially when using derivative methods, we defined a simple nonlinear toy model based on two equations with two unknown Geosci.Model Dev., 9, 3569-3588, 2016 www.geosci-model-dev.net/9/3569/2016/parameters: where s 1 and s 2 also correspond to two model state variables (as for the simple C model), a and b are the unknown parameters included in the optimisation, and t is the independent variable, which could represent time in a real-world scenario.
Note that this model is not based on any particular physical process associated with land surface biogeochemical cycles, but it does contain typical mathematical functions that are observed in reality and implemented in LSMs.For example, the sinusoidal function (Eq.4) could represent diurnal variations of various processes such as photosynthesis and respiration.Exponential response functions (such as in Eq. 3) are also observed for certain processes, including the temperature sensitivity of soil microbial decomposition.As for the simple carbon model, synthetic observations corresponding to the s 1 and s 2 variables were used to optimise both parameters (see Sect. 3.1.5).

Bayesian inversion algorithm
Most data assimilation approaches follow a Bayesian formalism which, simply put, allows prior knowledge of a system (in this case the model parameters) to be updated, or optimised, based on new information (from the observations).
In order to achieve this we define a "cost function" that describes the misfit between the data and the model, taking into account their respective uncertainties, as well as the uncertainty on the prior information.If we follow a Bayesian formalism and least-squares minimisation approach, and assume Gaussian probability distributions for the model parameter and observation error variance/covariance, we derive the following cost function (Tarantola, 1987): where y is the observation vector, H (x) the model outputs given parameter vector x, R the observation error covariance matrix (including measurement and model errors), x b the a priori parameter values, and B the prior parameter error covariance matrix.This framework leads to a Gaussian posterior parameter probability distribution function and requires that the model and its observation operator are linear.The aim of the inversion algorithm is to find the minimum of this cost function, thereby achieving the best possible fit between the model simulations and the measurements, conditioned on their respective uncertainties and prior information.For cases where there is a strong linear dependence of the model to the parameters (at least for variations in x of the size of those expected in the data assimilation system), and where the dimensions of the problem are not too large, the solution can be derived analytically.If not, as is usually the case with LSMs, there are different numerical methods to find the most optimal parameter values.These include global search methods that randomly search the parameter space and test the likelihood of the parameter set at each iteration, and derivative methods, which calculate the gradient of the cost function at each iteration in order to find its minimum.In this study we use the latter class of methods.More specifically we use a quasi-Newton algorithm that uses both the gradient of the cost function and its derivative (Hessian) to evaluate whether the minimum has been reached (i.e.where the gradient is zero).Thus we obtain the following algorithm for iteratively finding the minimum (Tarantola, 1987, p. 195): where i is the iteration number and H is the Jacobian, or firstorder derivatives, of H , which in this study is determined using a finite difference method.Note that as we are potentially dealing with non-linear models, the quasi-Newton method has been slightly adapted to include the constant scaling factor ε i (with a value < 1.0) to ensure that the algorithm will converge.
Of course no inversion algorithm is perfect, and therefore if the characterisation of the error distribution is inaccurate, or when optimising strictly non-linear models, it is possible that the true "global" minimum of the cost function has not been found.Derivative methods in particular can get stuck in so-called "local minima", preventing the algorithm from finding the true minimum.To address this issue we carry out a number of assimilations with different random first-guess points in the parameter space.If they all result in the same reduction in cost function value, we can have more confidence that the true minimum has been found.
Once the minimum of the cost function has been found, the posterior parameter error covariance can be approximated (using the linearity assumption) from the inverse Hessian of the cost function around its minimum, which is calculated using the Jacobian of the model at the minimum of J (x) (for the set of optimized parameters), H ∞ , following Tarantola (1987): Note that the posterior error covariance matrix can be propagated into the model space to determine the posterior uncertainty on the simulated state variables as a result of the parametric uncertainty (as shown in the coloured error bands in the time series plots -Figs. 1 and 5) using the following matrix product and the hypothesis of local linearity (Tarantola, 1987): www.geosci-model-dev.net/9/3569/2016/Geosci.Model Dev., 9, 3569-3588, 2016 3.1.4 Step-wise vs. simultaneous assimilation Step-wise approach In the step-wise approach each data stream (in our cases s 1 and s 2 , see above) is assimilated sequentially, and the posterior error covariance matrix of Eq. ( 7) is propagated to the next step as the prior in Eq. ( 6).Note that the error covariance matrix can only be propagated if it is calculated within the inversion algorithm, which is the case here but may not be possible in other studies.The following details an example for two data streams. - Step 1 is assimilation of the first data stream, s 1 .The prior parameters, including their values and error covariance (x b and B) are optimised to produce a first set of posterior optimised parameters x 1 with error covariance A 1 .
-Step 2 is assimilation the second data stream, s 2 .The parameters, x 1 , and their error covariance, A 1 , are used as a prior to the optimisation system and further optimised to produce the second (and final) set of posterior optimised parameters, x post , and the associated error covariance A.

Simultaneous approach
Both data streams s 1 and s 2 are included in the optimisation, and all parameters are optimised at the same time.The prior parameters, including their values and error covariance (x b and B) are optimised to produce the posterior parameter vector (x post ) and associated uncertainties A.

Optimisation set-up: parameter values and uncertainty, and generation of synthetic observations
In this study we used synthetic observations that were generated by running the model with known (or "true") parameter values and adding random Gaussian noise corresponding to the defined observation error for both s 1 and s 2 (see Table 1).
We optimised a 10-year time window for the simple carbon model, in order to capture the dynamics of the s 1 and s 2 pools over a time period compatible with typically available observations.For the non-linear toy model, which did not correspond to physical processes in the terrestrial biosphere, we ran a simulation over a window of 100 integrations (steps) of the equations.The observation frequency was daily, corresponding to the time step of the simple carbon model (a value of 1 for the non-linear toy model), and the observation error was set to 10 % of the mean value for each set of pseudo-observations derived from multiple first guesses of the model.The true values of all parameters for both models are given in Table 1, together with their upper and lower bounds (following Trudinger et al., 2007).We have not performed a prior sensitivity analysis to decide to which parameters are important to include in the optimisation, as the model variables are sensitive to all of the (small set of) parameters.However, in the case of a more complex, large-scale LSM it is advisable to carry out such an analysis, particularly given the computational burden of optimising many parameters.In this study the parameter uncertainty (1σ ) was set to 40 % of the parameter range following recent studies (e.g.Bacour et al., 2015).Prior values were chosen from a uniform random distribution bounded by the parameter bounds.

Experiments
The specific objective of the following experiments was to test the impact of a bias in the observations that is not accounted for in the R matrix, and the impact of using derivative methods with non-linear models (as may be necessary with large-scale LSMs), particularly in reference to the differences that may arise between step-wise and simultaneous optimisations.
Table 2 details the experiments that were carried out based on all possible combinations for assimilating the two data streams.Three approaches were compared: (i) separatewhere only one data stream was included in the optimisation; (ii) step-wise -where each data stream was assimilated sequentially (both orders: s 1 then s 2 and s 2 then s 1 ); and (iii) simultaneous -where both data streams were included in the optimisation.All parameters for both models were optimised in all experiments; therefore, in the step-wise cases the parameters were optimised twice.The step-wise assimilations were also carried out with and without the propagation of the full posterior parameter error covariance matrix, A 1 , in between steps 1 and 2 (test cases 2b and d -see Table 2); only the for the tests in which the full posterior covariance matrix was not propagated only the posterior variance was propagated.An additional test was included for the simultaneous assimilation in order to test the impact of having a substantial difference in the number of observations for the data stream included in the optimisation, as may be the case for belowground (e.g.soil) biomass observations in reality.Therefore in test case 3b, only one observation was included for data stream s 2 .
The differences in the parameter values and the theoretical reduction in their uncertainty (1 − (σ post /σ prior )) were examined for all eight test cases, as well as the fit (RMSE) to both data streams after the optimisation.For the step-wise approach we investigated whether the fit to the first data stream is degraded in the second step by comparing the RMSE after each step.Note that the reduction in uncertainty is a theoretical, or approximate, estimate of the real uncertainty reduction because of the assumptions made in the inversion scheme.
In a second stage the impact of an unknown, unaccounted for bias in the model was examined.This bias could be a systematic bias in the observations due to the algorithm used for their derivation, the result of missing or incomplete processes Geosci.Model Dev., 9, [3569][3570][3571][3572][3573][3574][3575][3576][3577][3578][3579][3580][3581][3582][3583][3584][3585][3586][3587][3588]2016 www.geosci-model-dev.net/9/3569/2016/s 1 and only --1 obs for s 2 in the model, or an incompatibility between the observations and the model, for example due to differences in spatial resolution or an inconsistent characterisation of a variable between the model and the observations.To test the impact of such an occurrence, we introduced a constant scalar bias into the modelled s 2 variable with a value of 10 (i.e.twice the magnitude of the defined observation uncertainty).All eight experiments were repeated, but a bias was introduced into the model calculation of s 2 that was not accounted for in the cost function (i.e. the error distributions retained a mean of zero).This was treated as an unknown bias and, therefore, was not corrected or accounted for in the inversion scheme; the defined observation uncertainty (Table 1) was not changed for this set of experiments.
In all experiments for both models we used 15 iterations of the inversion algorithm, and 20 assimilations were performed starting from different random "first-guess" points in the parameter space.As discussed in Sect.3.1.3,this was done to test the ability of the algorithm to converge to the global minimum of the cost function.Note that the global minimum and possible reduction in J (x) will be different for each experiment, as each is based on a different cost function.
For all the above tests we assumed independence (i.e.uncorrelated errors) for both the observation and parameter prior error covariance matrices; thus the R and B matrices were diagonal.However, in a final test we performed a simultaneous optimisation to examine the impact of having correlated errors between the s 1 and s 2 observations.Thus the random Gaussian noise added to s 1 for each time step was correlated to the noise added to s 2 .The correlated observation errors were generated following the method used by Trudinger et al. (2007, paragraph 22).We first defined the covariance matrix, R, using the prescribed observation error and correlation between s 1 and s 2 .The correlated error that is added to the synthetic observations is then a multiplication of a vector of Gaussian random noise (variance of 1) by a matrix, X, that corresponds to the Cholesky decomposition of R (so that R = X T X).The added noise was time invariant; that is, there was no correlation between one time step and the next, as we were specifically looking at correlations between the two data streams (see Pinnington et al., 2016, for an analysis of the impact of correlations in the B matrix and temporal error correlation in the observations).We tested both accounting for the correlated errors by populating the corresponding off-diagonal elements of the R (observation error covariance) matrix and ignoring the correlated errors by keeping R diagonal.The reason for performing both tests was to demonstrate the possible real-world scenario where correlated observation errors exist, but that information is not included in the optimisation, likely due to a lack of knowledge as to how to characterise the errors.For both tests we performed optimisations using a combination of different of observation error and correlation magnitudes (observations errors between 0.05 and 20 in 9 uneven intervals, and observation correlations between −0.9 and 0.9 with an interval of 0.4), in order to test the hypothesis that observations with lower uncertainty (therefore higher information content) were less affected by the presence of error correlations.As in the above experiments, 20 random first guesses in the pawww.geosci-model-dev.net/9/3569/2016/Geosci.Model Dev., 9, 3569-3588, 2016 rameter space were used and 15 iterations of the inversion algorithm were performed.

Results
The 20 random first-guess assimilations were examined for each set of experiments for both models (before the results for each test were examined in more detail), in order to check that the algorithm converged to a global minimum.As shown in the Supplement (Fig. S1), a high proportion of the 20 firstguess assimilations across all test cases for both models resulted in a similar reduction in J (x), even though the overall magnitude of the reduction was sometimes different between tests.This indicates that the algorithm does not easily get stuck in any local minima (if they exist).The examples shown in the results below were taken from one first-guess parameter set for each model that belonged to the cluster that had the highest cost function reduction.Any differences seen in the parameter values, their posterior uncertainty or the resultant RMSE reduction described below therefore are due to the specific details of each test and not the inability of the algorithm to find the minimum.

Typical performance with a quasi-linear model and no bias
Figures 1a and b show the simple carbon model simulations for test case 3a (in which both data streams are assimilated simultaneously) for the s 1 and s 2 variables.A large reduction in RMSE is achieved after optimisation (blue curve) with respect to the observations (black curve).Overall, there is a good reduction in RMSE for all test cases (including the individual assimilations 1a and 1b) with a reduction of ∼ 80 % for s 1 and s 2 .In addition, the optimisation of the s 1 and s 2 variables resulted in a good or moderate reduction in RMSE for variables not included in any assimilation: ∼ 60 % for the litterfall (Eq. 1) and ∼ 16 % for the heterotrophic respiration (Rh -Eq.2) across all test cases (not shown), although there was already a good prior fit to the data.As would be expected from these results, the parameter values and the theoretical reduction in parameter uncertainty do not vary between the tests (Fig. 2a and b blue symbols), except for a slight difference in the value of the k 2 parameter in test cases 1a and 3b, for which there is also a lower reduction in uncertainty (∼ 82 % compared to > 95 %).Note that Fig. 2a shows the normalised parameter values, in order to account for differences in the magnitude of the different parameters and their range (the zero line represents the "true" parameter valuesee caption).In this situation therefore, where we have a relatively simple linear model and two data streams to which the model parameters are highly sensitive, we see that the differences between the step-wise and simultaneous approaches are minimal.This is even the case when the error covariance is not propagated between the two steps (test cases 2b and d), suggesting that under this assimilation set-up with this model both s 1 and s 2 individually contain enough spatio-temporal information to retrieve the true values of all parameters, as we can see from the separate test cases 1a and b.However, we cannot definitively say whether this is due to the simplicity or relative linearity of the model -it is possible that observations of variables in more complex linear model would not be able to retrieve the true values of all parameters.
3.2.2Impact of unknown bias in one data streamexample with a simple carbon model In Sect.3.2.1 we saw that there is little difference between a step-wise and simultaneous optimisation if there is no bias in the model or observations, and if the model is quasi-linear and therefore the critical assumptions behind the inversion approach were not violated.However, it is not uncommon to have a bias between your observations and model that is not obvious and, therefore, not accounted for in the optimisation, as the cost function used in most inversion algorithms (and in this study) assumes Gaussian error distributions with a mean of zero.Note that this is also the case when defining a likelihood function for accepting or rejecting parameter values in a global search method.To test the impact of a bias, we added a constant value to the simulated s 2 variable in a second test (see Sect. 3.1.6)that was treated as an unknown bias and, therefore, not corrected or accounted for in the inversion scheme.The impact of this bias on s 1 and s 2 is shown in Fig. 1c-d, and the reduction in RMSE between the model and observations is seen in Fig. 3 for all variables (including Rh and litterfall).The red symbols in Fig. 2 show the resultant parameter values and theoretical reduction in uncertainty as a result of the bias.The inversion cannot accurately find the correct values for all parameters in any test case, and there are now considerable differences between the simultaneous and step-wise approach.Furthermore the order in which the data streams are assimilated in the step-wise cases also results in different posterior parameter values (test cases 2a and b vs. 2c and d in Figs.2a and 3).Nevertheless the optimisation results in a similar reduction in uncertainty on the parameters, except in test case 1b where only s 2 data are assimilated (Fig. 2b).
The main impact of the bias in the modelled s 2 variable is on the value of k 2 parameter (Fig. 2a), which is consistently offset from the true value (dashed line in Fig. 2a) in all test cases.This was expected given that it is the parameter most directly related to the calculation of s 2 .However, in test cases 2a and 3a, the values of p 1 and p 2 are also incorrect (and p 1 for test case 2b).Note that these parameters only indirectly influence the s 2 pool in the model, and therefore we might have expected that they would be less affected by the bias.This nicely demonstrates one issue that could arise in all DA studies, where the bias in a particular variable (in the observations or the model) is aliased onto another process in the model (e.g.Wutzler and Carvalhais, 2014).Such an aliasing of bias onto indirectly related parameters is even Geosci.Model Dev., 9,[3569][3570][3571][3572][3573][3574][3575][3576][3577][3578][3579][3580][3581][3582][3583][3584][3585][3586][3587][3588]2016 www.geosci-model-dev.net/9/3569/2016/more evident when only s 2 is included in the assimilation and s 1 does not provide any constraint (test case 1b) -in this case all parameters are incorrect but the p 2 parameter in particular shows a strong deviation from the true value (Fig. 2a).As a result we see a deterioration in the RMSE for the s 1 , litterfall and Rh variables in test case 1b and in the step-wise cases where s 2 is assimilated in the second step (Fig. 3a, c and d test case 1b, 2a and 2b).However, the RMSE reduction remains high for the s 2 variable for these test cases (Fig. 3b), as the inversion has found a solution that accounts for the bias even though all inferred parameter values are incorrect.The assimilation of s 1 in the second step lowers the reduction in RMSE for s 2 gained in the first step to ∼ 70 %, but it is not a considerable degradation.
Even though the posterior parameter values are incorrect, and despite the fact that the first step results in a degradation, the final reductions in RMSE are largely the same as the situation with no bias for all variables when s 1 is included in a simultaneous assimilation or optimised in the second step (test cases 2c, d and 3a in Fig. 3).This shows that the inclusion of s 1 observations can find a solution to counter the bias in s 2 and prevents a degradation in the fit to the data.If s 2 is assimilated in the second step, there is a negative impact on all other variables, as discussed above, demonstrating again that the order of data stream assimilation can matter when biases or inconsistencies between the data and the model are present.
The analysis of the impact of the bias presented here is specific to this model and the type and magnitude of the bias that was added, but the broader findings can be generalised to any situation in which there is a bias or inconsistency between a model and data that is not accounted for in the assigned error distributions.Exactly what might constitute a bias or inconsistency is discussed more in Sect.2.2.Note also that it is important to examine the impact on the other variables.For the separate test case 1b in which only s 2 data are used to optimise the model, the negative impact on the other variables (Fig. 3) would have been concealed if we had only examined the posterior reduction in RMSE for the s 2 variable.Again, this is a concern that is inherent to all DA experiments, whether single-or multiple-data stream, but we can see from these results (i.e. by comparing the separate test cases 1b with 2a and b) that adding another data stream in a multiple-constraint approach does not always reduce the problem.

Difference between the step-wise and simultaneous approaches in the presence of a non-linear model
As discussed in Sect.3.2.1,there is little difference between the step-wise and the simultaneous assimilation approaches for simple, relatively linear models, unless the observation error (including measurement and model errors) distribution deviates strongly from the Gaussian assumption.However in reality, large-scale, complex LSMs may contain highly nonlinear responses to certain model parameters.To demonstrate the impact of non-linearity in a multiple data stream assimi-  lation context we used a non-physically based toy model chosen for its non-linear characteristics (see Sect. 3.1.2). Figure 4a shows the posterior parameter values for both the a and b parameters of the non-linear toy model for all test cases.The values were not normalised as both parameters had the same range.The horizontal dashed line shows the "true" known values of the parameters (both equal to 1.0) that were used to generate the synthetic observations.Note that no bias has been introduced into the model in the results described here.The prior and posterior model s 1 and s 2 simulations for the non-linear toy model are compared to the synthetic observations in Fig. 5 for both step-wise cases in which the posterior error covariance matrix from step 1 (A 1 -see Sect.3.1.4)was propagated to step 2 (experiments 2a and c -Fig.5a-d) and both simultaneous cases 3a and b (Fig. 5e-h).Finally Fig. 6 summarises the reduction in RMSE between the simulated and observed s 1 and s 2 variables for the nonlinear toy model for all test cases and, in the step-wise cases, the reduction in RMSE after both the first and second steps (light vs. dark green bars).
Assimilating each data stream individually (test cases 1a and b) results neither in an accurate retrieval of the posterior parameters (Fig. 4a) nor in a strong constraint on either parameter, as shown by the lack of theoretical reduction in the parameter uncertainty after the optimisation (Fig. 4b).Despite this, there is a ∼ 90 % reduction in RMSE for the data stream that was included in the optimisation (i.e. for s 1 in test case 1a -Fig.6a, and s 2 in test case 1b -Fig.6b).However, the improvement on the other data stream is much less (28 % reduction in RMSE for s 1 when s 2 is assimilated) or even results in a degradation compared to the prior fit (e.g. in the case of s 2 when s 1 is assimilated -Fig.6b).Lack of improvement, or even degradation, in the RMSE of other variables in the model is a common issue for data assimilation in general, one that is not often evaluated in model-data fusion studies.It is also is not necessarily the result of a bias or incompatibility between the observations and the model.
Only the simultaneous case, in which all s 1 observations have been included in the cost function (test case 3a), manages to retrieve the correct parameter values after the optimisation.The posterior parameter values for all other test cases are incorrect and are considerably different between each case, unlike for the simple carbon model (without a model bias).Most step-wise test cases (particularly 2b-d) do not result in the same parameter values as the simultaneous test case 3a in which all the observations are included (Fig. 4a).This highlights that strong non-linearity in the model sensitivity to parameters, together with the use of an algorithm that is only adapted to weakly non-linear problems, can result in differences between a step-wise and simultaneous approach in multiple data stream assimilation (see Sect. 1).
In the simultaneous optimisation in which all observations are included (test case 3a), the posterior fit to the data dramatically improves for both the s 1 and s 2 data streams after the assimilation (blue dashed line in Fig. 5e and f).This was expected given that the correct values of the parameters were found.For the step-wise cases (test case 2a in Fig. 5a and  b, and test case 2c in Fig. 5c and d), the black dashed line shows the prior, and the posterior after step 1 is shown by a green dashed line.In the step-wise assimilation we see two different scenarios depending on which data stream was assimilated first.In the first step the results are the same as the case where each individual data stream is assimilated separately.In both cases the first step results in a good fit to the data that was included in the optimisation in that step.When the s 1 data were assimilated in the first step (Fig. 5 first row), the fit to s 2 deteriorated after the optimisation (Fig. 5b green dashed line and Fig. 6b -test case 2a_s1), but when the s 2 data were assimilated first (Fig. 5 second row) the optimisation step did manage to achieve an improvement in the s 1 data stream (Fig. 5c green dashed line and Fig. 6a -test case 2c_s1).
In the second step the optimisation of s 2 in test cases 2a and b does not degrade the fit to s 1 when the full parameter error covariance matrix (A 1 ) is propagated between step Geosci.Model Dev., 9,[3569][3570][3571][3572][3573][3574][3575][3576][3577][3578][3579][3580][3581][3582][3583][3584][3585][3586][3587][3588]2016 www.geosci-model-dev.net/9/3569/2016/  1 and 2 (Fig. 5a blue curve and Fig. 6a 2a_s2).Furthermore optimising s 2 in the second step reverses the deterioration in s 2 caused by assimilating s 1 in the first step (Fig. 5b blue curve and Fig. 6b 2a and b dark green bars).However, when s 1 data were assimilated in the second step (test cases 2c and d), we found that the good fit achieved with s 2 observations in the first step was effectively reversed (Fig. 5d blue curve).Therefore assimilating s 1 in the second step degraded the fit to the s 2 observations, even compared to the prior case (Fig. 6b, dark green bars for test cases 2c and d).This nicely highlights one of the main possible issues with a step-wise assimilation framework.
The fact that the final reduction in RMSE values after both steps was ∼ 90 % for most cases, even though the values were not correct for all but case 3a (Fig. 4), indicates that the error correlation between the two parameters (∼ −1.0 -calculated from the posterior error covariance matrix but not shown) led to alternative sets of values that resulted in a similar improvement to the data -a phenomenon known as model equifinality.

Order of assimilation of data streams and propagation of parameter error covariance matrices in a step-wise approach
Comparing the step-wise cases 2a and b with 2c and d for the non-linear toy model reveals that neither order in the assimilation, s 1 then s 2 , or s 2 then s 1 , results in the correct posterior parameter values that match the simultaneous test case (Fig. 4a).This is not a result that can be generalised to all step-wise assimilations as it will depend on the data stream involved and whether they contain enough spatiotemporal information to accurately constrain all the parameters included in the optimisation, as well as any biases in the model or observations (as discussed in Sect.3.2.2) or model non-linearity (Sect.3.2.3).In the case of the non-linear toy model, neither s 1 nor s 2 finds the right parameter values when assimilated individually.Therefore it is not surprising that neither order manages to achieve the right posterior parameter values.Nevertheless, the theoretical uncertainty of both parameters is reduced by > 95 % for the step-wise cases in which A 1 from step 1 is propagated between step 1 and 2 (test cases 2a and c -Fig.4b), even though the posterior values for the step-wise cases are incorrect.This demonstrates that a good theoretical reduction in uncertainty is not always indicative that the right parameters have been found by the optimisation.The lower theoretical reduction in parametric uncertainty for cases 2b and d (Fig. 4b) demonstrates that information is lost between the steps if the posterior error covariance terms of A 1 are not propagated to step 2. From a mathematical standpoint the most rigorous approach is to propagate the full parameter error covariance matrices between each step.Without that constraint not only is information lost in the second step, but the information contained in the second data stream may have a stronger influence compared to a simultaneous assimilation, or stepwise case with a propagated error covariance matrix.The inversion may therefore be more vulnerable to any strong biases or incompatibilities between the model and the observations of the second data stream, or indeed the particular sensitivity of its corresponding model state variable to the parameters.This is one possible explanation for the degradation seen in s 1 in the non-linear toy model when s 2 is optimised in the second step and A 1 is not propagated between the steps Geosci.Model Dev., 9,[3569][3570][3571][3572][3573][3574][3575][3576][3577][3578][3579][3580][3581][3582][3583][3584][3585][3586][3587][3588]2016 www.geosci-model-dev.net/9/3569/2016/ (Fig. 6a test case 2b_s2).The same was also true for the simple carbon model for test case 2b when a bias was introduced into the s 2 simulation (see Sect. 3.2.2 and Fig. 3a).However, the reverse is also true -if the first data stream contains strong biases, then the associated error correlations will be also propagated with A 1 .If autocorrelation in the observation errors, or indeed correlation between the errors of the data streams, is not accounted for, it is likely that the posterior simulations are over-tuned; that is, we will overestimate the reduction in parameter uncertainty.If this is the case and the first step results in incorrect parameter values, the propagation of A 1 could restrict the parameter values to the wrong location in the parameter space and thus inhibit the ability of the inversion to find the correct global minimum.These issues are likely to be more considerable for non-linear models, as seen by the lack of difference among test cases 2a-d in the simple carbon model example (Fig. 2).

Impact of accounting for correlated observation errors in the prior observation error covariance matrix
In a final test we introduced time-invariant correlated noise between the two data streams (see Sect. 3.1.6).We investigated the impact of ignoring cross-correlation between two data streams by comparing the results of (i) an optimisation in which the correlated errors were included in the offdiagonal elements of the prior observation error covariance matrix, R, to (ii) an optimisation in which the correlated observation errors were excluded (i.e.R was kept diagonal).Note that this experiment is only relevant to simultaneous multiple data stream assimilation, as it is not possible to account for cross-correlation between data streams when one is assimilated after the other in a step-wise approach.
The presence of correlated errors increases observation redundancy in the inversion, which would therefore reduce the expected theoretical error reduction compared to uncorrelated observations (experiments not shown).We would expect a further limitation on the expected error reduction with a sub-optimal system, as represented by optimisation (ii) in which there was cross-correlation between the data streams, but the correlated observation errors were ignored in the R matrix (as seen in Chevallier, 2007).
Figure 7 shows the difference between the two optimisations (i.e.including off-diagonal elements in the R matrix minus only diagonal elements in the R matrix), for the reduction in the cost function value (Fig. 7a and d) and posterior s 1 and s 2 observation errors (1σ -Fig.7b, c, e and f), for both the simple C model (top row) and the non-linear toy model (bottom row) and for a range of observation error and correlation.The plot shows the median difference across all 20 random first-guess parameters, and the reduction is calculated as 1 -(posterior/prior).
At low observation error there is no discernible difference between accounting for the correlated observation errors in the R matrix or not.This is likely because there is enough information in the observations to find the global minimum of the cost function.Trudinger et al. (2007) also found that similar posterior values were obtained when comparing observations with correlated and uncorrelated Gaussian errors.However, at a certain point as observation error increases along the x axis (i.e.decreasing information content) there is a difference in the cost function and observation error reduction between the two optimisations for both models (Fig. 7).As expected, the optimal optimisation that includes off-diagonal correlated errors in R results in a higher reduction (blue cells in Fig. 7) in the cost function and posterior observation error than the sub-optimal optimisation (in which the correlated errors are ignored) in all cases except for the s 1 data stream in the simple C model (see below).Furthermore we see a pattern emerging suggesting that the difference between the two optimisations increases with higher observation correlation for the same error magnitude.However, for some combinations of observation error and correlation, the pattern is opposite to what we expect (red cells in Fig. 7), particularly for the s 1 data stream in the simple C model (Fig. 7b).This is possibly because the accuracy of the solution becomes limited by observation uncertainty at higher observation errors, and also due to presence of model non-linearity, which prevents a fully accurate characterisation of the posterior error covariance matrix with the inversion algorithm we have used.
www.geosci-model-dev.net/9/3569/2016/Geosci.Model Dev., 9, 3569-3588, 2016 The key finding of this preliminary investigation into the impact of correlated observation errors is that it becomes increasingly important to properly characterise and account for correlations between data streams if the observations do not contain enough information (i.e.high observation uncertainty or a limited number of observations).However, this is a wide topic that has received little to no attention in the carbon cycle data assimilation literature to date, aside from 2 out of 21 experiments in the wider-ranging study of Trudinger et al. (2007).We therefore suggest that an investigation such this should be extended in order to fully understand the impact of cross-correlation between data streams; however, this is beyond the scope of this paper.

Perspectives and advice for land surface modellers
Although it is clear that in many cases the addition of different observations in a model optimisation provides additional constraints, challenges remain that need to be considered.Many of the issues that we have investigated are relevant to any data assimilation study, including those only using one data stream.However, most are more pertinent when considering more than one source of data.Based on the simple toy model results presented here, in addition to lessons learned from existing studies, we recommend the following points when carrying out multiple data stream carbon cycle data assimilation experiments: -If technical constraints require that a step-wise approach be used, it is preferable (from a mathematical standpoint) to propagate the full parameter error covariance matrix between each step.Furthermore, it is important to check that the order of assimilation of observations does not affect the final posterior parameter values, and that the fit to the observations included in the previous steps is not degraded after the final step (e.g.Peylin et al., 2016).
-Devote time to carefully characterising the parameter and observation error covariance matrices, including their correlations (Raupach et al., 2005), although we appreciate this is not an easy task (but see Kuppel et al., 2013, for practical solutions).In the context of multiple data stream assimilation, accounting for the error correlations between data streams is increasingly important with higher observation uncertainty (or a limited number of observations).Note that it is not possible to account for error correlations between data streams in a step-wise assimilation.
-The presence of a bias in a data stream, or an incompatibility between the observations and the model, will limit the utility of using multiple observation types in an assimilation framework.Therefore it is imperative to analyse and correct for biases in the observations and to determine whether there is an incompatibility or inconsis-Geosci.Model Dev., 9, 3569-3588, 2016 www.geosci-model-dev.net/9/3569/2016/tency between the model and data.Alternatively, it may be possible account for any possible bias/inconsistency in the observation error covariance matrix, R, using the off-diagonal terms or inflated errors (see Chevallier, 2007), or by using the prior model-data RMSE to define the observation uncertainty.
-Most optimisation studies with a large-scale LSM require the use of derivative-based algorithms based on a least-squares formulation of the cost function and, therefore, rely on assumptions of Gaussian error distributions and quasi-model linearity.However, if these assumptions are not met it may not be possible to find the true global minimum of the cost function and the resultant calculation of the posterior probability distribution will be incorrect.This is a particular problem if the posterior parameter error covariance matrix is propagated multiple times in a step-wise approach, although these issues are relevant to both step-wise and simultaneous assimilation.Therefore it is important to assess the non-linearity of your model, and if the model is strongly non-linear, use global search algorithms for the optimisation -although at the resolution of typical LSM simulations (≥ 0.5 × 0.5 • ) this will likely only be computationally feasible at site or multi-site scale.Note also that performing a number of tests starting from different random "first-guess" points in parameter space can help to diagnose whether the global minimum has been reached (as outlined in Sect.3.1.6and discussed at the beginning of Sect.3.2) and, therefore, whether the chosen inversion algorithm is appropriate for optimising your model.
In addition to the above points we note the following related to a situation in which there is a considerable difference in the number of observations for each data stream.We investigated such a situation in this study with test case 3b, in which only one observation was included for the s 2 data stream instead of the complete time series.For both models, test case 3b showed that a substantial difference in number of observations between the data streams could influence the resulting parameter values and posterior uncertainty (compare test cases 3a and b in Fig. 2 for the simple C model and Fig. 4 for the non-linear toy model) as each data stream will have a different overall "weight" in the cost function.Different arguments abound on this issue.Some authors have mentioned the possible need to weight different observation terms in the cost function to increase the influence of data streams with a limited number of observations (e.g.Xu et al., 2006), while others contend that the cost function should not be weighted by the number of observations because the error covariance matrices (B and R) should already define this weight in an objective way (e.g.Keenan et al., 2013); we would agree with this assertion.Indeed Wutzler and Carvalhais (2014) showed that this approach could lead to an overestimate of the posterior uncertainty.As an alternative they proposed a "parameter block" approach in which each data stream only optimises the parameters to which they are most sensitive.We therefore advise modellers not to weight the cost function by the number of observations; instead we suggest adopting an approach such as proposed in Wutzler and Carvalhais (2014) and/or ensuring that B and R matrices are adequately defined.It should not be necessary to weight by the number of observations in the cost function if there is sufficient information to properly build the prior error covariance matrices.
Several diagnostic tests exist to help infer the relative level of constraint brought about by different data streams, including the observation influence and degrees of freedom of signal metrics (Cardinali et al., 2004).Performing these tests was beyond the scope of this study, particularly given that the simple toy models contained so few parameters, but such tests may be instructive when optimising many hundreds of parameters in a large-scale LSM with a number of different data streams.Furthermore, we strongly suggest performing synthetic experiments with pseudo-observations, as in this study, as such tests can help determine the possible constraint brought by different data streams, and the impact of a possible bias and observation or observation-model inconsistency.
Aside from multiple data stream assimilation, other promising directions could also be considered to constrain the problem of lack of information in resolving the parameter space within a data assimilation framework, including the use of other ecological and dynamical "rules" that limit the optimisation (see for example Bloom and Williams, 2015), or the addition of different timescales of information extracted from the data such as annual sums (e.g.Keenan et al., 2012).Finally we should also seek to develop collaborations with researchers in other fields who may have advanced further in a particular direction.Members of the atmospheric and hydrological modelling communities, for example, have implemented techniques for inferring the properties of the prior error covariance matrices, including the mean and variance, as well as potential biases, autocorrelation and heteroscedasticity, by including these terms as "hyper-parameters" within the inversion (e.g.Michalak et al., 2005;Evin et al., 2014;Renard et al., 2010;Wu et al., 2013).Of course this extends the parameter space -making the problem harder to solve unless sufficient prior information is available (Renard et al., 2010), but such avenues are worth exploring.

Conclusions
In this study we have attempted to highlight and discuss some of the challenges associated with using multiple data streams to constrain the parameters of LSMs, with a particular focus on the carbon cycle.We demonstrated some of the issues using two simple models constrained with synthetic observations for which the "true" parameters are known.We performed a variety of tests in Sect. 3 to demonstrate the difwww.geosci-model-dev.net/9/3569/2016/Geosci.Model Dev., 9,[3569][3570][3571][3572][3573][3574][3575][3576][3577][3578][3579][3580][3581][3582][3583][3584][3585][3586][3587][3588]2016 ferences between assimilating each data stream separately, sequentially (in a step-wise approach) and together in the same assimilation (simultaneous approach).In particular we focused on difficulties that may arise in the presence of biases or inconsistencies between the data and the model, as well as non-linearity in the model equations and the importance of accounting of observation error correlations.Many of the issues faced are inherent to all optimisation experiments, including those in which only one data stream is used.It is of upmost importance to determine whether the observations contain biases and/or whether inconsistencies or incompatibilities exist between the model and the observations, and to correct for this or properly account for this in the error covariance matrices.Secondly it is crucial to understand the assumptions and limitations related to the inversion algorithm used.Without these two points being met, there is a greater risk of obtaining incorrect parameter values, which may not be obvious by examining the posterior uncertainty and model-data RMSE reduction.Furthermore it is more likely that the implementation of a step-wise vs. simultaneous approach will lead to different results.Finally, we note that the consequence of not accounting for crosscorrelation between data streams in the prior error covariance matrix becomes more critical with higher observation uncertainty.
This study was not able to examine an exhaustive list of all possible challenges that may be faced when assimilating multiple data streams, but we hope that this tutorial style paper will serve as a guide for those wishing to optimise the parameters of LSMs using the variety of C-cycle-related observations that are available today.We also hope that by increasing awareness about the possible difficulties of modeldata integration we can bring the modelling and experimental communities more closely together to focus on these issues.
The Supplement related to this article is available online at doi:10.5194/gmd-9-3569-2016-supplement.

Figure 1 .
Figure 1.Prior and posterior model simulations compared to the synthetic observations for the simple carbon model for test case 3a for (a) s 1 and (b) s 2 simulations without any model bias,and (c, d) with bias in the simulated s 2 variable.The coloured error band on the prior and posterior represents the propagated parameter uncertainty (1σ ) on the model state variables (in the equivalent colour as the mean curve).This is mostly visible for the prior model simulation (pink band) as there is a high reduction in model uncertainty reduction as a result of the assimilation.

Figure 2 .
Figure 2. (a) Normalised posterior parameter values and (b) poste-rior parameter error reduction for all parameters of the simple carbon model for each test case, and for both the simulations with no bias (blue) and simulations with a bias in the s 2 variable that was not accounted for in the inversion (red).In (a) parameter values were normalised to account for differences in the magnitude of the different parameters and their range; thus it is a measure of the distance from the true value as a fraction of the range and is calculated as (posterior value − true value/max parameter value − minimum parameter value).The closer the value is to the zero dashed line, the closer the posterior value is to the "true" parameter value.To give an indication of the optimisation performance, the following are the normalised first-guess parameter values for this particular example test (compare with posterior values in a): p 1 0.09, p 2 0.29, k 1 0.1, k 2 0.15.

Figure 3 .
Figure 3. Reduction in RMSE for all test cases for simulations with a bias in the s 2 variable: (a) s 1 , (b) s 2 , (c) litterfall and (d) heterotrophic respiration (Rh).For the step-wise cases (2a, b, c and d) the reductions after both step 1 and step 2 are shown in light and dark green, respectively, and are denoted in the x axis labels with "_s1" for step 1 and "_s2" for step 2. The reduction (in %) is calculated as 1 − (RMSE post /RMSE prior ).

Figure 4 .
Figure 4. Posterior parameter values of both the non-linear toy model a and b parameters for each test case for the simulations with no model bias.The y axis range corresponds to the parameter bounds, and the dashed horizontal line represents the "true" known value of both parameters.To give an indication of the optimisation performance, the following are the first-guess parameter values for this particular example test (compare with posterior values in a): a 0.87, b 1.98.(b) Posterior uncertainty reduction for both parameters for all test cases.

Figure 5 .
Figure5.Prior and posterior model simulations compared to the synthetic observations for the non-linear toy model (with no bias) for both the s 1 (left column) and s 2 (right column) variables for (a) and (b) test case 2a (first row) -step-wise approach with s 1 observations assimilated in the first step, followed by the s 2 observations in the second step; (c, d) test case 2c (second row) -step-wise approach with s 2 observations assimilated in the first step, followed by s 1 observations in the second step; and (e, f) test case 3a (third row) -the simultaneous case in which both data streams were included.For both step-wise examples A 1 was propagated between the first and second steps.The coloured error band on the prior and posterior represents the propagated parameter uncertainty (1 σ ) on the model state variables (in the equivalent colour as the mean curve).This is mostly visible for the prior model simulation (pink band) as there is a high reduction in model uncertainty reduction as a result of the assimilation.

Figure 6 .
Figure 6.Reduction in RMSE for all test cases for both (a) s 1 and (b) s 2 variables for the non-linear toy model simulations with no model bias.For the step-wise cases (2a, b, c and d) the reduction after both step 1 and step 2 are shown in light and dark green, respectively, and are denoted in the x axis labels with "_s1" for step 1 and "_s2" for step 2. The reduction (in %) is calculated as 1 − (RMSE prior /RMSE post ).

Figure 7 .
Figure 7. Median difference (across 20 first-guess parameters) between including correlated observation errors in the R matrix (off-diagonal elements) minus ignoring the correlated observation errors (keeping R diagonal) for the reduction cost function (a, d: left column) and the reduction in s 1 and s 2 observation errors (b, c, e, f: middle and right columns), for both the simple C model (a, b, c: top row) and the non-linear toy model (d, e, f: bottom row) for a range of observation errors (x axes) and correlation (y axes) -see Sect.3.1.6.The reduction is calculated as 1 − (posterior/prior).

Table 1 .
The optimisation set-up for both models, including the true parameter values, their range and the observation uncertainty (1σ ), which was set to 10 % of the mean value for each set of pseudo-observations derived from multiple first guesses of the model.The parameter uncertainty (1σ ) was set to 40 % of the range for each parameter.

Table 2 .
List of experiments performed for both models with synthetic data.All parameters are optimised in all cases (therefore in both steps for the step-wise approach).