Comparison of adjoint and nudging methods to initialise ice sheet model basal conditions

Ice flow models are now routinely used to forecast the ice sheets’ contribution to 21st century sea-level rise. For such short term simulations, the model response is greatly affected by the initial conditions. Data assimilation algorithms have been developed to invert for the friction of the ice on its bedrock using observed surface velocities. A drawback of these methods is that remaining uncertainties, especially in the bedrock elevation, lead to non-physical ice flux divergence anomalies resulting in undesirable transient effects. In this study, we compare two different assimilation algorithms based on adjoints and nudging to constrain both bedrock friction and elevation. Using synthetic twin experiments with realistic observation errors, we show that the two algorithms lead to similar performances in reconstructing both variables and allow the flux divergence anomalies to be significantly reduced.


Introduction
Robustly reproducing the responsible mechanisms and forecasting the ice sheets' contribution to 21st century sea-level rise is one of the major challenges in ice sheet and ice flow modelling as highlighted by community-organised efforts such as SeaRISE (Sea-level Response to Ice Sheet Evolution) (Bindschadler et al., 2013;Nowicki et al., 2013a, b) or ice2sea (e.g.Gillet-Chaulet et al., 2012;Shannon et al., 2013;Edwards et al., 2014).
Such projections on decadal timescales are sensitive to the model initial state which can account for an important source of uncertainty in the model response (Aðalgeirsdóttir et al., 2014).Improving the reliability of the model projec-tions requires the model initial state to be better constrained from observations.The problem is that observations are often uncertain, sparse in time and space, and indirect, so that the model state depends on many poorly determined physical parameters and boundary conditions.Gradient-based optimisation methods, such as the control method (MacAyeal, 1993) or the Robin inverse method (Arthern and Gudmundsson, 2010), are efficient means to constrain such model parameters and boundary conditions.These methods have been implemented and applied with success in ice flow models of different complexity in order to infer the basal drag, one of the most uncertain model parameters (e.g.Morlighem et al., 2010;Jay-Allemand et al., 2011;Schäfer et al., 2012;Gillet-Chaulet et al., 2012).
However, remaining uncertainties lead to non-physical ice flux divergence anomalies (Seroussi et al., 2011) resulting in undesirable transient effects in the free surface evolution.A solution to dissipate these transients is to conduct a surface relaxation step prior to the projections (Gillet-Chaulet et al., 2012).This allows admissible flux divergence rates to be reached but at the expense of the accuracy of the modelled surface elevation and surface velocities which can then depart significantly from observations after the relaxation step.
Among the remaining uncertainties, one of the most important is the uncertainty related to the bedrock elevation.The basal topography is derived from ice thickness measurements, mostly obtained from airborne ice-penetrating radars.These measurements can have large uncertainties and are usually at a lower resolution than required model grids (Durand et al., 2011).Standard bedrock elevation maps for Antarctica and Greenland are then produced by interpolation or Kriging, and report standard errors ranging from a few Published by Copernicus Publications on behalf of the European Geosciences Union.
tens of metres to several hundreds of metres depending on the distance to observations and local topographic variability (Fretwell et al., 2013;Bamber et al., 2013).For comparison, the uncertainty on the surface elevation is usually 1 order of magnitude lower (Fretwell et al., 2013).
Because of theses large uncertainties, several methods have been proposed to consider the bedrock elevation as an optimisation variable.For example, Morlighem et al. (2011) derived the adjoint of the continuity equation for the ice thickness.The depth-averaged velocities and surface mass balance are then optimised to minimise the mismatch between modelled and measured ice thicknesses.Surface velocity measurements are used as initial guess for depthaveraged velocities, and (by construction) the flux divergence produced by this approach is in equilibrium with the prescribed surface mass balance.However, there is no constraint that the optimised velocities are a solution of the stress equilibrium equations, so that, in general, the above method does not guarantee that the flow divergence anomalies resulting from an ice flow model initialised with the optimised fields will be reduced.
In their work, van Pelt et al. ( 2013) developed an iterative algorithm where the discrepancy between the surface elevation predicted by the model and the observations is used to correct the bedrock elevation.Thus, the method does not rely on the accurate computation of the derivative of a cost function, as in a control method, and is then more similar to nudging methods that have been widely studied in the past decades in meteorology (e.g.Hoke and Anthes, 1976) and later in oceanography (Verron, 1992;Blayo et al., 1994).However, the method proposed by van Pelt et al. (2013) does not use observed surface velocities to control the model parameters.
Several methods have been explored to construct model states where both the basal friction and the basal topography are treated as optimisation variables.In a pioneer work, Thorsteinsson et al. (2003) developed a least-squares inversion using analytical solutions for the transmission of smallscale basal perturbations to the ice surface.This method has been extended in a non-linear Bayesian framework by Raymond and Gudmundsson (2009) and applied to an Antarctic ice stream by Pralong and Gudmundsson (2011).Bonan et al. (2014) have tested the performances of an ensemble Kalman filter on twin experiments using a shallow-ice flowline model.The adjoint method has been tested by Goldberg and Heimbach (2013) and Perego et al. (2014) with models of different complexity.All these methods usually show good performance in reconstructing both basal friction and basal topography when using observations of both surface elevation and surface velocities, so that mixing between the two variables does not seem to be too problematic for realistic applications (Gudmundsson and Raymond, 2008).In addition, Pralong and Gudmundsson (2011) and Perego et al. (2014) show better performance when the rates of surface elevation change are also constrained from observations.In this paper, we explore two different algorithms to infer both the basal friction and the basal topography and initialise the model state using simultaneous observations at a given time.The first algorithm is in line with Goldberg and Heimbach (2013) and Perego et al. (2014) since it uses the adjoint solution of the force balance equation.We use the shallow shelf approximation to facilitate the derivation of the adjoint.Indeed, in this case the ice thickness appears as a state variable, while it changes the geometry of the domain for higher order approximations (Perego et al., 2014).In its simplest formulation, the algorithm minimises the misfit between model and observed surface velocities, but an additional constraint where the flux divergence is close to a given surface mass balance can be added.The second method is an algorithm combining inversion of basal friction using the adjoint method and nudging of the bedrock topography.The control from the surface velocity observations is imposed by the adjoint step while the nudging step allows to decrease the discrepancy between the flux divergence and the surface mass balance.The main motivation of this second algorithm is its ease of implementation as no inversion of the model with respect to the ice thickness is required.Our objective is then to illustrate its ability to reconstruct the bedrock topography by comparison with the results of the more mathematically founded first algorithm.Both algorithms are implemented in the finite element ice sheet/ice flow model Elmer/Ice (Gagliardini et al., 2013).The methods and algorithms are described in details in Sect. 2. To test their performances, we design a twin experiment in Sect.3. The results are discussed in Sect. 4.

Direct model
For the force balance, we use the standard vertically integrated shallow shelf approximation (SSA) equations (MacAyeal, 1989).This approximation neglects the effects of vertical shearing and is, hence, more adapted to model the flow in areas where the friction is low, resulting in an ice motion dominated by sliding.
where ρ w is the water density, H 0 the ice thickness below sea level, and n x and n y the two components of the horizontal unit vector normal to the calving front.Dirichlet boundary conditions are prescribed for other non-natural boundaries.
The continuity equation for the ice thickness is given by where a is the surface mass balance and accumulation/ablation at the bedrock interface is neglected.

Inverse methods
The objective of the methods is to produce a model state from Eq. (1) that best fits the observations of surface velocities and the rates of change of ice thickness.To minimise the discrepancy between the model and the observations, the optimisation parameter vector p contains both the basal friction coefficient β and the bedrock elevation z b (p = (β, z b )).

Cost functions
The misfit between the model and the corresponding observations is evaluated using cost functions.The first cost function measures the difference between modelled (u) and observed (u obs ) surface velocities: where is the model domain.
The second cost function measures the misfit between modelled and observed thickness rates of change: The modelled rate of change of ice thickness ∂H /∂t is evaluated from Eq. (3) as the difference between the flux divergence solution of Eq. ( 1) and the prescribed surface mass balance.Observed rate of change of ice thickness (∂H /∂t) obs can be estimated from surface elevation trends extracted from radar altimetry measurements (Flament and Rémy, 2012).
In general, both Eqs. ( 4) and ( 5) could be weighted with error covariance estimates such as the one of Flament and Rémy ( 2012).However, this information is not often available.In this paper, observed ice surface velocities (u obs ) and observed rate of change of ice thickness (∂H /∂t) obs are considered perfectly known or perturbed with a Gaussian noise which would make unnecessary the addition of a covariance term.
The objective is then to find the parameter vector p that minimises J v and J div .This can be achieved in different ways as illustrated in the following sections.

Adjoint method
The two cost functions have an implicit dependence on the parameter vector p through the model surface velocities u which are solutions of Eq. ( 1).The gradient of the cost functions with respect to p can be computed efficiently using the adjoint equations of Eq. ( 1).The derivation of the continuous adjoint equations and the gradient of J v with respect to the friction coefficient β can be found in MacAyeal (1993).This can be easily extended for the computation of the gradient with respect to H .
The implementation in Elmer/Ice is carried out in a way that stays as close as possible to the differentiation of the discrete implementation of the direct equations.This method should lead to a better accuracy on the gradient computation than the discretisation of the continuous equations.Elmer/Ice uses programming features that are not supported by automatic differentiation tools and the differentiation of the crucial parts of the discrete source code (e.g.cost function computation, matrix assembly) has been done manually.If the problem is non-linear, as here due to the dependence of the viscosity to the strain rate, and the non-linearity solved using a Picard iterative scheme, the iterations should be reversed (at least partially) in the adjoint code to achieve a good accuracy of the computed gradient (Martin and Monnier, 2013).However, as the present direct solver is equipped with a Newton linearisation of the ice viscosity so that it remains self-adjoint (Petra et al., 2012), the Newton iterations are not reversed in the adjoint code and we only keep the last iteration.The adjoint code has been validated on standard tests by comparing the gradients with those obtained from a finite difference evaluation.The agreement is usually better than 0.1 %.
Inverse problems are often ill-posed, leading to instabilities.It is then necessary to add regularisation terms to the cost function to avoid overfitting of data.This can be done in the form of a Tikhonov regularisation.Here, we define two different regularisations.The first one measures the norm of the first spatial derivative of the component p i of p, thus allowing to give preference to smooth solutions: C. Mosbeux et al.: Initialisation of ice sheet model basal conditions The second forces the optimisation variables to stay close to a certain prior or background information p b .This background can be based on observations or on empirical knowledge.This second regularisation term is written as where σ p is a spatial parameter allowing to give more or less weight to the prior information p i,b .
The computation of the gradients of these two functionals with respect to p is trivial.How these regularisation terms are weighted with respect to the model-data misfit functionals Eqs. ( 4) and ( 5) is described in more details with the description of the algorithms in Sec.2.3.This minimisation is achieved using the quasi-Newton routine M1QN3 (Gilbert and Lemaréchal, 1989) implemented in Elmer/Ice.This method uses an approximation of the second derivatives of the cost function and is therefore more efficient than a fixed-step gradient descent.

Nudging method
By definition, the steady-state solution of Eq. (3) where a is replaced by the apparent mass balance a − (∂H /∂t) obs is the minimum for J div .Running the model forward in time with a constant forcing is then a simple way to minimise J div , equivalent to a relaxation step.Here, we assume that the surface elevation is known so that computed changes in ice thickness are used to correct the bedrock elevation z b .During this process, the ice thickness can substantially deviate from observations.Nudging methods, also called Newtonian relaxation, can remedy to this problem by constraining the thickness to fit observations through an additional callback term in Eq. ( 3), which now writes where the coefficient k defines the amplitude of the callback at each node of the model.These methods imply a trade-off, adjustable through k, between model physics and observations.The callback term can depend on many different criteria such as observation accuracy or distance to observation (Hoke and Anthes, 1976).Here, we take k as a Gaussian function of the distance to the closest observation so that the callback is maximum where an observation is available and decreases to zero far from all observations.The choice of the variance for the Gaussian function is discussed in Sect.4.2.

Algorithms
From the methods presented in the previous section we design two algorithms to infer simultaneously the friction coefficient β and the bedrock elevation z b .To ensure that the friction coefficient remains positive during the inversion, we use the following change of variable β = 10 α . (9)

Adjoint method with two parameters (ATP)
This algorithm uses the gradients of the cost functions derived using the adjoint method to optimise both α and z b .For the regularisation, a constraint on the smoothness is imposed for α using Eq. ( 6) while a constraint on the background information is imposed for z b using Eq. ( 7).The total cost function then writes where γ is a constant fixed to give a similar weight to J v and J div while λ α and λ z b are two constants allowing to adjust the weight given to the regularisation terms.Following Fürst et al. (2015), several pairs (λ α , λ z b ) are tested using a L-curve approach, and optimal values are taken from the combinations that avoid two extremes: overfitting of the observations or excessive regularisation.

Adjoint-nudging coupling (ANC)
In this algorithm, the adjoint method is first used to optimise α only by minimising the following total cost function The bedrock elevation is then updated using the nudging method by solving Eq. ( 8) for a given time period T .T should be neither too short nor too long to allow to reduce J div without overfitting observations.The sensitivity of the method to T is discussed in the results section.These two steps are then repeated iteratively until changes in J v and J div between two iterations are less than 1 %.

Manufactured data sets
A twin experiment is designed to investigate the ability of the two methods to reproduce simultaneously good estimates of the basal friction coefficient and the bedrock elevation.A flowline geometry is preferred to reduce the computational cost and easily test the method, however all the algorithms can be applied to 2-D plane view simulations.A reference experiment for which all the model parameters are prescribed is produced to generate synthetic observations.These observations are then used to test the performances of the two algorithms.

Reference experiment
A flowline of Jakobshavn Glacier, Greenland, is used to test the two algorithms with realistic conditions.Jakobshavn Isbrae is one of Greenland's three largest outlet glaciers and has one of the largest drainage basin on the ice sheet's western margin (Bindschadler, 1984).It is also the fastest Greenland glacier with a terminus velocity greater than 13 km a −1 (Joughin et al., 2008(Joughin et al., , 2014)).The flowline is 550 km long and runs from the ice divide to the ice front.The surface and bedrock elevations are taken from available digital elevation models (Bamber et al., 2013).The basal friction coefficient field is first adjusted so that the model velocities fit observed velocities (Joughin et al., 2010).To have realistic thickness rates of change, the free surface is relaxed to steady state.The surface mass balance a in Eq. ( 3) has been calibrated so that the steady state is close to the initial geometry, and is meant to take into account the flow convergence or divergence along the flowline.The steady-state solution is used as the reference of the twin experiment.
The geometry is discretised through a mesh of 500 linear elements, increasingly refined to the front of the glacier.The element size decreases from ∼ 2 km in the upper part of the glacier to ∼ 400 m down to the front.
Results will only be presented on the first 100 km upstream of the glacier front where velocities are above 100 m a −1 and where the SSA is more appropriate but the inversion is done all along the flowline up to the ridge.

Synthetic observations
Synthetic observations are generated by sampling and/or adding noise to the reference simulation.Details for each required field are given below.These synthetic observations and initial fields for the inverse methods are compared to the reference in Fig. 1.

Surface velocities
Surface velocities are assumed to be observed at the same resolution as the reference simulation but with a white Gaussian noise with a mean µ = 0 and a standard deviation σ = 50 m a −1 .This corresponds to a root mean squared (rms) error of 47.8 m a −1 for the entire flowline.The reference and noisy observed surface velocities are shown in Fig. 1c together with their absolute difference.

Surface mass balance and thickness rate of change
The surface mass balance, a, and thickness rate of change, (∂H /∂t) obs , in Eq. ( 5) are assumed to be perfectly observed.
As the reference simulation corresponds to a steady state, (∂H /∂t) obs = 0.However, the methods are also tested in Sect.4.3 for cases where (∂H /∂t) obs = 0 to show their ability to initialise the model when the flux divergence is not in equilibrium with the surface mass balance.

Surface and bedrock elevations
The surface elevation is assumed to be perfectly observed.
For the bedrock elevation z b , we simulate observations representing airborne radar measurements crossing the flowline.Bedrock elevations are sampled every 10 km with a Gaussian noise centred on zero and with a standard deviation of σ = 50 m.This leads to a rms error of 62.4 m on the 55 observation points of the entire flowline.This error is similar to the errors given in practice on recent bedrock elevation maps (Fretwell et al., 2013;Bamber et al., 2013).For the mesh nodes between the observations, the bedrock is linearly interpolated as shown in Fig. 1a.This is used as the first guess for the inverse methods and as the background information for the regularisation in Eq. ( 7).

Model parameters
The ice viscosity is assumed to be perfectly known and corresponds to the viscosity used in the reference experiment.
Assuming that no observation of the friction coefficient is available, an initial solution has to be postulated.A good first guess for β is provided by using the driving stress to estimate the basal shear stress: where H (x), θ (x), and u(x) are, respectively, the ice thickness, the surface slope and the surface velocity at position x.
The reference and initial values are shown in Fig. 1b.The rms errors on the surface velocities and the rate of change of ice thickness between the initial state and the synthetic observations are, respectively, 761 and 357 m a −1 .
The average relative error on the basal shear stress is measured as where τ b,ref is the basal shear stress in the reference experiment and L the length of the flowline.The relative error on the basal shear stress with our initial estimate of the basal friction β ini is 394 %.The performances of the two algorithms in reducing these initial errors are presented in the following section and will be compared to these initial errors.

Adjoint method with two parameters (ATP)
A set of 255 pairs (λ α , λ z b ) is tested to adjust the weighting of the regularisation terms of Eq. ( 10).The misfits on the different cost functions of Eq. ( 10) for the different pairs (λ α ,  Figure 2b also shows that smaller misfits on J z b clearly involve higher rms misfits on the ice surface velocities (rms u ) and on the rate of change of the ice thickness (rms div ).On the contrary, Fig. 2a does not show a clear relation between the magnitude of J reg α and the magnitude of rms u and rms div .
Both graphs also show a high density of pairs for small rms u and rms div .However, the pair (λ α = 10 11 , λ z b = 10 7 ) seems to come off the others, giving a good trade off between data fitting and regularisation.Notice that the constant γ is fixed to 1 since J v and J div have the same order of magnitude.
The optimisation of both α and z b simultaneously allows a rms misfit of 49.7 m a −1 on velocities to be reached, very similar to the observation rms error, showing no overfitting of velocity data.The rate of change of ice thickness misfit is also largely decreased with a rms value of 19.2 m a −1 .The resulting basal traction τ b and z b as well as the misfit for the surface velocities are given in Fig. 3.The basal traction variability is accurately reproduced with a corresponding average relative misfit of only 25 % along the entire flowline with respect to the reference basal shear stress τ b,ref , i.e. more than a 10-fold decrease of the initial misfit.We only notice local overestimations of slipperiness in bedrock pits without significant impacts on the flow velocities.Indeed, under a defined value of β corresponding to a nearly perfectly sliding case, an additional reduction in friction has no impact on the flow.The same reasoning applies to a nearly perfectly sticky case, where an increased friction would not involve more decrease of the velocity.The bedrock elevation z b is well reconstructed in the first 50 km upstream of the glacier front.The discrepancy with respect to the reference bedrock is larger upstream where the cost function J v is less sensitive because of lower velocities.This could possibly be improved by using a cost function measuring the logarithm of the misfit as in Morlighem et al. (2010), but with a greater risk of fitting noise since the relative observation error is higher in these regions.
In order to assess the influence of accounting for J div on the method, the optimisation is repeated without the J div term in the total cost function Eq. ( 10).The pair (λ α , λ z b ) is kept equal to the previous case since the optimum is hardly affected by the absence of J div in the total cost function.The result is given and compared to the previous one in Fig. 3.The friction coefficient β is again pretty well reconstructed, with a corresponding relative average misfit of 31 % on basal shear stress τ b to be compared to the 25 % obtained with the optimisation of the J div term.However, z b shows nonconsistent high frequencies involving a higher discrepancy with respect to the reference bedrock elevation in the case of optimising J div .Therefore, the optimisation J div has a clear regularisation effect on the parameter z b , by reducing the non-consistent high-frequency oscillations of the solution.Introduction of a Gaussian noise on (∂H /∂t) obs has been investigated in order to assess its effect on the optimisation of J div .Different levels of standard deviation σ have been tested.Results show that the optimisation is little affected by this noise even for standard deviations σ going up to the same order of magnitude as the surface accumulation a. Introduction of systematic bias on (∂H /∂t) obs in a physically acceptable range, i.e. of the same order of magnitude as surface accumulation a, also have few consequences on the optimisation.

Adjoint-nudging coupling (ANC)
The steps for the optimisation of α only are conducted with a value λ α = 5 × 10 9 , which allows a good agreement between the different cost functions and a value γ = 1.
In addition to the regularisation parameters of Eq. ( 11), ANC algorithm depends on the time period for the nudging steps T and the variance of the Gaussian k in Eq. ( 8).The nudging period T impacts the convergence on J v and J div after each cycle.The convergence is substantially similar for T from 1 to 4 years.Longer periods mainly involve a worse minimisation of J v since there is no control on velocities during nudging.Shorter relaxation times do not involve sufficient change of z b inducing a lower minimisation of J div for a given number of cycles.Therefore, a relaxation time T = 1 year is adopted, which seems sufficient to allow significant changes of z b without too much adaptation to the previous intermediate value of the friction coefficient.The algorithm is stopped after 10 cycles, corresponding to the stopping criterion of Sect.2.3.2.For a given T period, tests show that variance values of the Gaussian k in Eq. ( 8) larger than 1 km are excessive and induce non-physical callback amplitudes when departing from observations.After a few cycles, the resulting bedrock induces an increase between modelled and observed velocities that cannot be overcome by the basal drag inversion.Variance smaller than 1 km has little impact on the final result in terms of cost functions.However, among the acceptable values, the 1 km variance gives the best agreement between misfit on the surface velocities and misfit on the rate of change of the ice thickness.
The model is in good agreement with observations with a rms misfit of 46.1 m a −1 in the range of observation noise for velocities, and 15.8 m a −1 for thickness rates of change.The basal shear stress τ b is close to the reference one despite exacerbated variations at some locations.The corresponding relative average misfit with respect to the reference τ b,ref is 30 % for the entire flowline.The reconstructed bedrock elevation z b is also close to the reference on almost 100 km upstream of the front of the glacier.This reflects, especially in fast flowing region, a real improvement of the basal knowledge with respect to the first guess.Moreover, the use of nudging, instead of the adjoint method, does not show the same problem of non-sensitivity in regions of slower flow velocities, as mentioned in Sect.4.1.Note, however, that z b significantly departs from the reference bedrock elevation from 80 to 100 km to the front, strongly linked to the poorer fit of β (see Fig. 4).
As for ATP, introduction of a Gaussian noise in the observed thickness rate of change (∂H /∂t) obs has also been tested.Results show no significant impacts on the optimisation.Nevertheless, introduction of systematic bias in (∂H /∂t) obs has direct consequences on the nudging steps inducing an offset of z b of the range of the systematic bias cumulated on the nudging period T .ANC is therefore more sensitive to systematic bias than ATP.

Further sensitivity experiments
In order to evaluate the efficiency of both algorithms in transient states, we construct new reference cases where (∂H /∂t) obs = 0.This is achieved by multiplying β ref by a factor of 2, 3, 4, 5, and 10.As a consequence, increasing the basal friction involves a disequilibrium of the glacier, an ice thickening, and a decrease of ice flow velocities.
The time period for the glacier to come back to equilibrium, after this change of friction parameter, depends on the amplitude of the perturbation.Here, the perturbation is only applied during 5 years in order to keep the five cases in disequilibrium.Resulting thickness rates of change (∂H /∂t) obs = 0 are in the same order of magnitude as the tuned surface mass balance a.The five new reference cases are presented in Fig. 5.
The results of the optimisations for the five cases of perturbation are shown in Fig. 6 for ATP and Fig. 7 for ANC.The velocity misfit for ATP increases with the amplitude of perturbation with rms values between 47.8 and 52.3 m a −1 while the rms misfit for thickness rate of change increases from 12.7 to 21.8 m a −1 .ANC reaches rms misfits from 45.9 to 47.2 m a −1 for velocities and 12.5 to 21.5 m a −1 for thickness rate of change.The friction coefficient β is well reconstructed for both methods.The corresponding average relative error (with respect to each reference) on basal shear stress varies from 22 to 30 % for ANC and 20 to 28 % for ATP, still according to the amplitude of the perturbation.Both algorithms also allow to improve the knowledge of the bedrock elevation z b with regard to the first guess.We notice a tendency to overestimate the amplitude of bumps and pits in some locations which generally corresponds to an underestimation in the amplitude of variations of β.This latter behaviour highlights the limits of the algorithms and the difficulty of distinguishing the effects of two basal parameters as closely linked as the friction and the bedrock topography.This behaviour had been already highlighted in Goldberg and Heimbach (2013) and Gudmundsson and Raymond (2008) where a higher ice thickness with respect to the reference is compensated by a higher basal friction, and conversely as well.

Flow divergence in transient model
In this section, we assess the impact of our initialisation algorithms on the prognostic response of the model forward in time assuming the same constant forcing used to build the reference state.By doing so, if the initialisation was perfect, one would expect no change of the geometry and ice flow during this prognostic simulation.The experiment is performed from ATP and ANC initial states.A third initialisation state is constructed for which only the friction coefficient has been optimised, keeping z b equals to the a priori z b b .This third initialisation, called "β only" involves a rms misfit on velocities of 43.3 m a −1 and an average relative error ε τ b,ref of 36 % on basal shear stress, similar to the ATP and ANC initial states.However, the rms misfit on the thickness rate of change is significantly higher, 147.8 m a −1 .The prognostic simulations are conducted during a 10-year period in order to see how the initial thickness rate evolves during this time and how it impacts the final ice thickness and ice surface.The thickness rates of change after 1 and 10 years of simulation are shown in Fig. 8a and b, respectively, while the mismatch on the surface elevation after 10 years is shown in Fig. 8c.
ANC and ATP initial states involve thickness rates of change much closer to zero than the optimisation of "β only".This also leads to a lower mismatch ε s on the surface elevation with respect to the reference after 10 years of simulation.Indeed, this mismatch is well below 20 m for both ANC and ATP, except on a few kilometres in the upstream region, whereas the optimisation of β only gives rise to a mismatch globally above 20 m with some regions exceeding 50 m.
In that way, the two algorithms implemented in this study show substantial improvements compared to the optimisation of "β only".We especially notice a better reproduction www.geosci-model-dev.net/9/2549/2016/Geosci.Model Dev., 9, 2549-2562, 2016 of low-scale variations of the surface elevation due to the transfer of similar variations from the bedrock elevation z b (Fig. 9).These variations tend to disappear with the optimisation of the friction only, giving rise to a lower resolution of the surface.However, we should point out that this direct transfer of bedrock variations to the surface is a consequence of the SSA ice flow model used and that a full Stokes model would produce a more diffusive transfer response.

Conclusions
The presented algorithms allow the reconstruction of two poorly known parameters: the bedrock topography z b and the friction coefficient β at the same time.
The optimisation of these two parameters mainly relies on the knowledge of some other data that are easier to measure: ice surface velocities and thickness rates of change.Some local measurements of bedrock elevation and associated errors are necessary in order to define a background z b .The two algorithms aim to infer the set of parameters which minimises the misfit between the model and the corresponding observations of ice surface velocities and thickness rates of change.If the optimisation of ice surface velocities is usually sufficient to infer β, the inference of a second parameter requires more information to distinguish the effects of each parameters on the flow.Observations of rates of change of ice thickness are necessary to allow optimising z b as well.
The two algorithms are based on the optimisation of the friction coefficient β with the adjoint method.The bedrock geometry z b is reconstructed in two different ways, again with the adjoint method for the first algorithm (ATP) and with a nudging method based on mass conservation equation for the second one (ANC).
We have shown that the ATP algorithm is capable to well reproduce β and the corresponding basal shear stresses, while the bedrock elevation z b is only well reproduced in high velocities regions.The lower the velocity, the harder for z b to depart from its initial background value.The iterative algorithm coupling adjoint method and nudging (ANC) gives results that are just as good.Moreover, ANC allows a better reconstruction of the bedrock geometry z b in most regions.This is a very good sign for an adaptation of the method to non-depth-integrated flow models such as full Stokes models where the bedrock topography is no more a state variable but affects the domain geometry making the derivation of the adjoint even more demanding (Perego et al., 2014).Indeed, there is no need to inverse a shape variable like bedrock topography which is a usual obstacle to adjoint-based methods.
Furthermore, the transient simulations over 10 years from initial states reconstructed with the two algorithms developed give very encouraging results.The model divergence is clearly decreased with respect to usual inversion methods of the friction coefficient only.The integration of observations like thickness rates variation through an optimisation of the divergence during inversion or nudging steps, allows to regularise the solution in a physical way and also clearly improves the results.
Finally, the sensitivity experiments shows that the different algorithms can take into account the disequilibrium of mass balance, which is particularly interesting considering that a large amount of outlet glaciers in both Greenland and Antarctica present this feature.

Data availability
The construction of the twin experiment presented in this article is partially based on real data.Surface velocities come from Joughin et al. (2010), while surface and bedrock geometries come from Bamber et al. (2013).Notice that surface topography slightly differs from Bamber et al. (2013) in order to reach steady state.The simulations were performed using the Elmer/Ice finite element model (https://github.com/ElmerCSC/elmerfem).Some modules were specially developed for this application.

Figure 1 .
Figure 1.Reference (solid lines) and initial (dashed lines) state for (a) the bedrock elevations z b , (b) the estimated basal traction τ b = βu, and (c) the surface velocities.In (a), synthetic observations every 10 km are the plain black circles.In (c) the observed velocities are depicted by the circles and the shaded green curve is the absolute difference between observed and reference surface velocities (right axis).

Figure 2 .
Figure 2. Mean error on the thickness rate of change (rms div ) as a function of the mean error on velocity (rms u ) for the 255 pairs of regularisation parameters (λ α , λ z b ).Colour scales show the normalised regularisation terms (a) J reg α and (b) J z b (0 corresponds with the lowest value and 1 with the highest value obtained with the 255 pairs).The chosen value (λ α = 10 11 , λ z b = 10 7 ) is shown with a black circle.

Figure 3 .
Figure3.Results of the ATP algorithm with (orange) and without (red) optimisation of J div , i.e. γ = 1 or γ = 0, respectively, in Eq. (10): (a) absolute difference between observed and model velocities, (b) estimated basal traction, and (c) estimated bedrock elevation.The green shaded area is the difference between the noisy reference velocities and the true velocities.The green solid lines are the reference values and the black dashed line is the initial guess for the bedrock elevation.
λ z b ) is given in Fig. 2. Both graphs show that most of the pairs fitting well the observed velocities can also adequately reproduce the observed rate of change of the ice thickness.

Figure 4 .
Figure 4. Results of the ANC algorithm (purple): (a) absolute difference between observed and model velocities, (b) estimated basal traction, and (c) estimated bedrock elevation.The green shaded area is the difference between the noisy reference velocities and the true velocities.The green solid lines are the reference values and the black dashed line is the initial guess for the bedrock elevation.

Figure 6 .
Figure 6.Range of values for ATP algorithm for the five perturbations of the friction coefficient β. (a, b) Minimum (dark orange shade) and maximum (light orange shade) of absolute difference between observed and model velocities and relative error for τ b , respectively.(c) Range of values for bedrock elevation z b (orange shade).The green solid line is the reference value and the black dashed line is the initial guess for the bedrock elevation.

Figure 9 .
Figure 9. Ice surface elevation z s after 10 years of prognostic simulation for three different initial states: initialisation with ATP algorithm (orange line), with ANC algorithm (purple line), and with the inversion of β only (red line).The green line is the reference surface elevation.The figure focuses on the first 50 km next to the front of the glacier.