Optimisation methods were successfully used to calibrate parameters in an atmospheric component of a climate model using two variants of the Gauss–Newton line-search algorithm: (1) a standard Gauss–Newton algorithm in which, in each iteration, all parameters were perturbed and (2) a randomised block-coordinate variant in which, in each iteration, a random sub-set of parameters was perturbed. The cost function to be minimised used multiple large-scale multi-annual average observations and was constrained to produce net radiative fluxes close to those observed. These algorithms were used to calibrate the HadAM3 (third Hadley Centre Atmospheric Model) model at N48 resolution and the HadAM3P model at N96 resolution.

For the HadAM3 model, cases with 7 and 14 parameters were tried. All
ten 7-parameter cases using HadAM3
converged to cost function values similar to that of the standard
configuration. For the 14-parameter cases several failed to converge, with
the random variant in which 6 parameters were perturbed being most
successful. Multiple sets of parameter values were found, each producing a model very similar to the standard configuration. HadAM3 cases that
converged were coupled to an ocean model and run for 20

For the HadAM3P model three algorithms were tested. Algorithms in which seven parameters were perturbed and three out of seven parameters randomly perturbed produced final configurations comparable to the standard hand-tuned configuration. An algorithm in which 6 out of 13 parameters were randomly perturbed failed to converge.

These results suggest that automatic parameter calibration using atmospheric models is feasible and that the resulting coupled models are stable. Thus, automatic calibration could replace human-driven trial and error. However, convergence and costs are likely sensitive to details of the algorithm.

Weather and climate models need to parametrise unresolved processes

Various approaches have been taken to optimising model parameter values.

Attempts have been made using data assimilation techniques to calibrate
parameters. Such systems simultaneously estimate the atmospheric state
and the parameter values.

Another approach is to use forecast error.

The approach we consider is optimisation via direct evaluation of the
model, something attempted by

Here we update T13 to include a larger number of observations and parameters. The observations we use, as in T13, are multi-annual, large-scale spatial averages. As before we continue to use a Gauss–Newton algorithm but include a randomised block-coordinate variant in which, on each iteration, a random sub-set of the parameters is perturbed.

Our objectives are as follows:

test how well a Gauss–Newton algorithm does in minimising error
in the HadAM3 N48 model

test for equifinality, in which models with different parameter values produce similar simulated values of the observations

see how coupled model variants of HadCM3

test these algorithms with the N96 HadAM3P model

The remainder of this paper first describes the models, the optimisation method and the observational metrics used. We next describe results of optimisation, the properties of the atmospheric models and how the coupled models behave. We discuss our results before concluding.

In this section we outline our methods. We first describe two related atmospheric models we use. Next we outline the Gauss–Newton algorithm and a randomised block-coordinate variant of it, deal with the need to regularise matrices and describe how the algorithm terminates. We then describe the choices we made in parameter selection and parameter perturbation as well as the observations and covariance matrices we used. Finally we describe how we evaluate the optimised configurations and estimate uncertainties in the parameter values.

We use the N48 (

We build on the approach used by T13 which minimised an objective function
which was the root mean square of the global average outgoing longwave
radiation and reflected shortwave radiation. We extend this to a larger
number of observations, taking account of both observational error and
simulated internal variability. As we focus on large-scale, multi-annual
averages we assume that both terms can be represented by multivariate
Gaussian distributions characterised by covariance matrices
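Under these Gaussian assumptions the cost reduces to a covariance-weighted squared misfit. The sketch below is illustrative only: the function and variable names are our own, and combining the two covariance terms by simple addition is an assumption, not necessarily the exact weighting used.

```python
import numpy as np

def cost(sim_obs, target_obs, cov_obs, cov_int):
    """Covariance-weighted squared misfit between simulated and
    observed values, combining observational error and simulated
    internal variability (both assumed multivariate Gaussian)."""
    # Total covariance: simple sum of the two terms (an assumption).
    cov_total = cov_obs + cov_int
    resid = sim_obs - target_obs
    # Solve the linear system rather than forming the inverse explicitly.
    return float(resid @ np.linalg.solve(cov_total, resid))
```

Solving the linear system, rather than inverting the covariance matrix, is the usual numerically stable choice.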

This way of defining

We estimate

The Gauss–Newton algorithm is an iterative two-step algorithm. The
first step is to compute the Jacobian

Having computed the line-search vector,
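The two steps of one iteration can be sketched as follows. This is an illustrative reconstruction rather than our production code: the model-running interface, the fixed set of line-search scalings and the one-sided finite differences are all assumptions. In the randomised block-coordinate variant, only a random sub-set of the Jacobian columns would be computed on each iteration.

```python
import numpy as np

def gauss_newton_step(run_model, params, steps, target, cov,
                      scalings=(0.25, 0.5, 1.0)):
    """One Gauss-Newton iteration: estimate the Jacobian by one-sided
    finite differences (one model run per perturbed parameter), solve
    the normal equations for a search direction, then evaluate a small
    set of scalings along that direction and keep the best candidate.
    `run_model` maps a parameter vector to simulated observations."""
    base = run_model(params)
    n_obs, n_par = len(base), len(params)
    J = np.empty((n_obs, n_par))
    for i in range(n_par):                    # one perturbed run per parameter
        p = params.copy()
        p[i] += steps[i]
        J[:, i] = (run_model(p) - base) / steps[i]
    cov_inv = np.linalg.inv(cov)
    resid = base - target
    # Normal equations: (J^T C^-1 J) d = -J^T C^-1 r
    hess = J.T @ cov_inv @ J
    grad = J.T @ cov_inv @ resid
    direction = np.linalg.solve(hess, -grad)

    def cost(p):                              # re-runs the model; fine for a sketch
        r = run_model(p) - target
        return float(r @ cov_inv @ r)

    # Line search: try each scaling of the search vector, keep the lowest cost.
    best = min((params + s * direction for s in scalings), key=cost)
    return best if cost(best) < cost(params) else params
```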

The Gauss–Newton algorithm can be modified to include an additional constraint by
modifying the cost function to the following:

Parameters, default values, and allowed ranges and
perturbations. Shown for each parameter name are the component of
HadAM3 they are from, the default value, allowed range,
perturbations used in HadAM3–7 cases (

Our algorithm could suffer from using ill-conditioned matrices in two places.

First, if the Hessian matrix is singular or ill-conditioned, defined as
having a condition number greater than
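One simple way to handle such a matrix, shown here as an assumption rather than our exact scheme, is to add increasing multiples of the identity until the condition number falls below a chosen threshold:

```python
import numpy as np

def regularise(hess, max_cond=1e10, factor=10.0):
    """Add increasing multiples of the identity to an ill-conditioned
    Hessian until its condition number drops below `max_cond`. The
    threshold, starting shift and growth factor are illustrative."""
    h = hess.copy()
    shift = 1e-8 * np.trace(h) / len(h)   # tiny shift relative to matrix scale
    while np.linalg.cond(h) > max_cond:
        h = hess + shift * np.eye(len(h))
        shift *= factor
    return h
```

A well-conditioned matrix passes through unchanged; a singular one receives the smallest shift that brings its condition number below the threshold.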

Secondly, we also regularise

We need criteria to terminate the algorithm. Classical Gauss–Newton
terminates when sufficiently close to the stationary point of the cost function (

The algorithm terminates on iteration

In our implementation

For the random variant of the algorithm, if the cost function did not reduce
by

We used up to 14 parameters from the analysis of

We carried out three cases:

We adjusted seven parameters using HadAM3. Step sizes for the
Jacobian calculation were taken from T13 for ENTCOEFF,
VF1, CT and RHCRIT. For the remaining three parameters we used 10

We adjusted 14 parameters, again, using HadAM3. To compute
the step size for the additional parameters we set the value to the upper or lower range value that was most different from the standard value. Then for all 14 parameters, we computed

We adjusted 7 and 13 parameters using HadAM3P using the same step sizes as in the 14-parameter HadAM3 cases.

Parameters, ranges, default values and step sizes for the Jacobian
computations are shown in Table

Here we describe the choices we made in our optimisation study.

We focus on large-scale properties of the climate system and so consider the
northern hemispheric extra-tropical (

Land temperature has an impact on simulated biology, evaporation, snow and other important parts of the Earth system, and changes in it are a significant impact of climate change. We use the observed
CRU TS Vn 3.21 dataset

This is a key measure of the
hydrological cycle. We also use the CRU TS Vn 3.21
dataset, the HadAM3 N48 land–sea mask, and restrict
to data north of

We use this as a measure of the planetary-scale circulation. To correct for model mass loss we used sea-level pressure differences between the global-average value and the extra-tropical Northern Hemisphere and tropics. We did not include the southern extra-tropics as it provided no new information and consequently made the covariance matrix uninvertible. We used values from ERA-Interim as observations and, as a second estimate, used the NCEP reanalysis

This measures the reflectivity of
the Earth and is driven by clouds, snow, sea-ice and other surface
properties. We compute values, and uncertainties, from the vn2.8 EBAF
dataset (updated from

This is a measure of the outgoing thermal radiation from the Earth and is driven by atmospheric temperatures and clouds. We also use the vn2.8 EBAF dataset.

This gives an estimate of the temperature lapse rate. We use ERA-Interim data as observations and for a second estimate use the NCEP reanalysis.

This provides a measure of mid-troposphere water vapour, which is an important greenhouse gas. We also estimate values from ERA-Interim and use the NCEP reanalysis as a second estimate.

See Table

We need to estimate a total covariance matrix (

We also applied a constraint (see Sect.

When producing the datasets for the 7-parameter cases we made two errors in
the computation of

Target values for optimisation cases. Each row corresponds to a region. The target value for net flux into the Earth is 0.5

Normalised initial parameters for 7- and 14-parameter, HadAM3P, and trial cases. All parameters are normalised by their expert-based ranges with 0 (1) being the minimum (maximum) values. Values not shown use the default HadAM3 (or HadAM3P) values. Parameter names are shortened to their first three characters. Initial parameters from the two 14-parameter random Gauss–Newton algorithms are not shown as they match the equivalent values from the standard Gauss–Newton algorithms. Similarly only the HadAM3P13r6 and trial7#diag cases are shown, as other HadAM3P and trial7 cases use the same values.

We evaluate the inverse approach in several different ways. For the algorithm
we consider the expected number of iterations, evaluations and final
error, following the approach of
T13 of using a strategy of repeatedly running the Gauss–Newton algorithm
after it failed until convergence. This gives the expected number of model
evaluations (
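The expectation under this restart strategy follows from the geometric distribution of the number of failed attempts. As an illustration (our notation; the paper's exact expression is not reproduced here), if a single attempt succeeds with probability p, and successful (failed) attempts cost n_s (n_f) evaluations on average:

```python
def expected_evaluations(p_success, n_success, n_fail):
    """Expected model evaluations when the algorithm is rerun after
    each failure until it converges: one eventual success plus a
    geometrically distributed number of failures, (1-p)/p on average."""
    return n_success + (1.0 - p_success) / p_success * n_fail

# e.g. if 4 of 5 attempts succeed, successes average 60 evaluations and
# failures average 40, the expected total is 60 + (0.2/0.8)*40 = 70.
```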

The line-search component of the algorithm has a selection effect as it takes the parameter combination that produced the smallest cost function. Because chaos in the model produces pseudo-random noise, the smallest cost-function values may have arisen by chance. To avoid this effect and to examine the properties of the resulting models, we take the final optimised parameter sets and for each one run an ensemble of two simulations from December 1998 to April 2010. Each simulation was started from the same initial state but with a different small perturbation. We compare results of these independent simulations for 2000–2005 with the standard configuration and with each other, and look for evidence of equifinality

For the HadAM3 cases we also carry out 20-

Assuming that the parameter perturbations are small, we can compute the covariance matrix for the parameter error (

From these parameter error covariance matrices we can compute a distance between two parameter sets (
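Under the small-perturbation (linear) assumption both quantities follow directly from the Jacobian. A minimal sketch, with illustrative names:

```python
import numpy as np

def param_error_cov(J, cov_obs):
    """Linearised parameter-error covariance, (J^T C^-1 J)^-1,
    valid when the parameter perturbations are small."""
    hess = J.T @ np.linalg.solve(cov_obs, J)
    return np.linalg.inv(hess)

def param_distance(p1, p2, cov_param):
    """Mahalanobis-style distance between two parameter sets,
    weighted by the parameter-error covariance."""
    d = p1 - p2
    return float(np.sqrt(d @ np.linalg.solve(cov_param, d)))
```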

Minimum cost function for line-search component of algorithm (

In this section we present our results. We tried several different algorithms using the HadAM3 and HadAM3P atmospheric models. We first present numerical results on the convergence behaviour of those algorithms, then compare some aspects of the climatologies of the modified models with the standard model. Finally, we report on results of variants of the coupled atmosphere–ocean HadCM3 model that use the optimal parameter sets from the HadAM3 test cases.

We carried out several case studies. The first was one in which we perturbed seven parameters using the Gauss–Newton algorithm. Using 14 parameters, we tested the Gauss–Newton algorithm and two random parameter variants. Finally we tested three algorithms using the HadAM3P model configuration. In no case did the algorithm terminate because the cost function was small. Given the crudeness of the observational covariance used in our cost function, we do not draw any inference from this. That would require a much better estimate of observational error than we made. Instead we take the pragmatic view that a perturbed model is comparable to (substantially better than) the standard configuration, in the simulation of the observations we used, if the cost function is less than 120 (80)

For the 7-parameter (HadAM3–7) trials, we generated 12 random initial parameter choices by selecting values from their extreme limits (Table

We carried out five line searches partially to test if any of the scalings on the search vector were preferred. We found no strongly preferred scaling value (Table

Count of unique

We trialled three related algorithms to perturb 14 parameters. The algorithms we tested were the standard Gauss–Newton algorithm (HadAM3-14) and two variants with random perturbations. In one we perturbed six random parameters (HadAM3-14r6) and the other eight (HadAM3-14r8). For each algorithm we did five studies with each one being started from the same random extreme parameter choices (Table

Unlike the HadAM3–7 cases the HadAM3-14 cases did not all produce cost function values comparable to the default model (Fig.

Next we turn to the HadAM3-14r6 cases. This algorithm performed well, with four out of the five cases succeeding, taking between 6 (60) and 9 (87) iterations (evaluations). Three of the cases had cost functions less than that of the standard configuration but not substantially so (Fig.

The HadAM3P cases differ from the standard configuration not only in increased resolution but in the addition of a cloud anvil parametrisation and the indirect effects of aerosols on cloud optical properties

Unless stated otherwise all studies used the same choices of covariance matrices, observations, parameter perturbations and other choices as the HadAM3 14-parameter studies (Table

The diffusion parameter was kept at its default HadAM3P value but all remaining 13 parameters were changed, with 6 being chosen, at random, in each iteration.

Here the same parameters as used in the HadAM3 seven-parameter cases were perturbed and termination occurred immediately if the cost function did not decrease by 0.2.

As HadAM3P13r6 but with, at each iteration, three parameters, of the seven used in the seven-parameter HadAM3 case, perturbed at random.

The standard configuration of HadAM3P (Fig.

For each algorithm we tested using HadAM3 we characterised its performance using Eq. (

As discussed earlier there is a potential selection effect in that from the line-search evaluations we chose the one case with minimum error. To examine the effect of this we compared the average cost from the optimised cases with the independent runs and with the cost values for the standard cases. Note that the independent and optimised cases have identical parameter sets but the 14- and 7-parameter algorithms use slightly different cost functions. The mean cost from the independent simulations is, except for the HadAM3P-7r3 algorithm, larger than the mean cost for the optimised simulations (Table

The expected number of iterations increases from the HadAM3–7 to HadAM3-14 algorithms but does not double. Our earlier work (T13) found that optimisation using two observations and four parameters required between three and five iterations. This suggests that the cost of increasing the number of parameters is not excessive, with the iteration count increasing less than

The six-random-parameter (HadAM3-14r6) algorithm worked well with an average cost function slightly better than the standard configuration (Table

To summarise this subsection we find that a relatively simple Gauss–Newton algorithm works well to automatically calibrate parameters in an atmospheric model. The algorithm did not reduce the error to zero and so terminated when it stopped improving. We found that the expected number of iterations increases, though less than linearly, as we increased the number of parameters. Random selection of 6 out of 14 parameters worked well though random selection of 8 from 14 worked poorly. We were also able to reduce the cost function of the HadAM3P model relative to the standard configuration of that model.

Normalised parameter values (

Normalised simulated minus observed distributions (

Algorithm summary. For each algorithm is shown the expected number of iterations, evaluations, mean cost from final optimisation simulations (

As in Fig.

As in Fig.

Normalised Taylor diagrams for land air temperature (asterisks), mean sea
level pressure (triangles) and land precipitation (diamonds)

We now investigate the behaviour of the optimised HadAM3 and HadAM3P models by first focusing on the optimal parameters, then examining the simulation of the target observations in the independent simulations before comparing the model fields of key variables with observations. We aim to test for equifinality

We normalise the parameter values by their expert-based plausible ranges, with 0 being the minimum and 1 the maximum. We find for both the 7- and 14-parameter HadAM3 case studies that many of the parameters have a broad range of optimal values (Fig.

Using Eqs. (

We now consider how the independent simulations behave for the successfully optimised HadAM3-7, HadAM3-14 and HadAM3P parameter sets. These, to remind the reader, are pairs of simulations run with the same parameter sets as the successful optimised cases. All model–observation differences are normalised by the diagonal elements of the covariance matrix, which is dominated by our crude estimate of observational error.

For the HadAM3 7- and 14-parameter cases the optimised simulations are, for
many target observations, similar to the standard configuration
(Fig.

We now turn to the two optimised HadAM3P cases. These configurations have, like the standard HadAM3P, smaller biases in land air temperature across the three large regions we consider. This is particularly so in the northern hemispheric extra-tropics, suggesting that enhanced resolution improves this particular observation. However, this model has a much worse simulation of precipitation in the tropics, even with tuning, than does the HadAM3 case. Optimising the parameters does reduce biases in the HadAM3P model but not enough to support the claim that it is better than its lower-resolution and computationally cheaper HadAM3 cousin.

Comparison of the optimised cases with the initial extreme random parameter choices gives a sense of how important variation in the parameters is for those observational biases. One thing that stands out is that large-scale biases in the tropics (Fig.

We now examine how the bias changes when we consider a period outside the
period we used to calibrate the model. Here we compare changes in bias
between March 2005–February 2010 and March 2000–February 2005. We normalise
by the expected internal variability. For most observations and optimised
configurations the bias does not significantly change between the two periods
(Fig.

So far we have focused on large-scale biases. We use Taylor
diagrams
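A Taylor diagram places each field at a point whose radius is the simulated standard deviation normalised by the observed one, and whose angle gives the pattern correlation with observations. A minimal sketch of those two coordinates (area weighting, which real climate fields would need, is omitted here for brevity):

```python
import numpy as np

def taylor_coords(sim, obs):
    """Coordinates for a normalised Taylor diagram: pattern correlation
    with observations, and the ratio of simulated to observed standard
    deviation, both computed from anomalies about each field's mean."""
    s = sim - sim.mean()
    o = obs - obs.mean()
    corr = float(s @ o / (np.linalg.norm(s) * np.linalg.norm(o)))
    std_ratio = float(s.std() / o.std())
    return corr, std_ratio
```

A perfect simulation sits at correlation 1 and ratio 1; a field with the right pattern but doubled amplitude sits at correlation 1 and ratio 2.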

We find that for land air temperature, 500

To test if calibrating atmospheric parameters results in reasonable coupled
models, we took the calibrated parameters from all successful 7- and
14-parameter cases in a set of control simulations
of HadCM3. The surface temperature
adjusts in the first decade (Fig.

For the HadAM3–7 cases we find that eight of the parameter combinations produce
temperatures within the target range (Table

Time series, from control ocean–atmosphere simulations, of annual-average
global-average 1.5

We now examine if there is any relationship between properties in the
atmospheric model simulation and the coupled model simulation. Above we
showed that RSR changes were somewhat larger than OLR changes and, across the
optimised parameter sets, RSR variability was larger, relative to its
uncertainty, than OLR variability (Fig.

For surface air temperature and volume average ocean temperature, there is
a relationship between atmospheric model RSR and coupled model values, with an
increase in atmospheric RSR leading to cooling in the coupled model
(Fig.

Scatter plot (symbols as in Fig.

Mean of absolute Jacobian for 7-

Our results suggest that calibrating the atmospheric component of a coupled model to multiple observations is computationally feasible, with the resulting coupled models behaving well much, but not all, of the time. However, we found that calibration of 14 parameters was less successful than that of 7 parameters. We now investigate potential reasons for this by looking at the Jacobian matrices from all 7- and non-random 14-parameter studies. We also examine the Jacobian of the HadAM3P 7-parameter cases to see if changing resolution affects the Jacobian, which might explain the failure of the HadAM3P-13r6 case.

We computed Jacobians for each iteration with the parameters normalised by
their range so that 0 (1) is the minimum (maximum) value and normalised each
bias by its simulated internal variability. To see which parameters have the
strongest effect on simulated observations, we compute the mean, over all
iterations, of the
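The normalisations described above can be sketched as follows. Names are illustrative, and the Jacobian is assumed to have observations as rows and parameters as columns:

```python
import numpy as np

def mean_abs_jacobian(jacobians, param_min, param_max, bias_sd):
    """Mean, over iterations, of the absolute Jacobian with parameters
    normalised to [0, 1] by their expert ranges (so columns are scaled
    by the range width) and each simulated observation normalised by
    its internal variability (so rows are divided by its s.d.)."""
    scaled = [np.abs(J) * (param_max - param_min)[None, :] / bias_sd[:, None]
              for J in jacobians]
    return np.mean(scaled, axis=0)
```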

We see that in the 7-parameter cases (Fig.

Examining the 14-parameter Jacobians (Fig.

The mean of the absolute Jacobians between the 14- and 7-parameter cases shows some differences in detail (compare Fig.

Looking at the absolute Jacobians from the HadAM3–7 computations (Fig.

Regarding the poor performance of the HadAM3-14r8 algorithm, it is unclear at this stage precisely what caused it, given that HadAM3-14r6 behaves very well. We speculate that it may be caused by noise contamination, and that the fewer parameters we perturb in the algorithm, the smaller the chance of seeing the effect of noise. Alternatively there could be instability in the randomised algorithmic variant, again due to noise. We note that if the cost function is smooth and accurate derivatives are available, one can easily observe improving rates of convergence for randomised block Gauss–Newton variants as more parameters are chosen in the block

As part of the development of our approach we carried out four trial cases where we started from parameter sets (Table

No differences except for starting parameter values.

Reduce

Use

Run model for 15 months, compare model and observations from March 2000 to February 2001 and scale internal covariance matrix by 5.

Start optimisation with standard HadAM3 parameters and use 14-parameter cost function.

All trials (Fig.

Various other studies have attempted to produce stable coupled models.

Using multi-annual, large-spatial-scale observations, we have automatically calibrated HadAM3 and HadAM3P. Much of the time we ended up with models that have cost functions similar to the standard configuration or, for HadAM3P, better than the standard configuration. We used two variants of the Gauss–Newton algorithm: one in which all parameters were varied and a second, random block-coordinate variant in which a sub-set of the parameters, chosen at random on each iteration, was varied. For the studies in which we perturbed 7 parameters in HadAM3 we found that all cases converged, taking an average of 68 evaluations for a total of 425 simulated years.

For the 14-parameter cases we used both the standard Gauss–Newton algorithm and a variant in which a random sub-set of parameters was selected. We tried two random cases: one in which 6 parameters were perturbed and another in which 8 were perturbed. For each algorithm five studies starting from the same initial parameter choices were carried out. We find large differences in the performance of these algorithms, with the 6-random-perturbation algorithm performing best, the 8-random-perturbation cases worst and the standard Gauss–Newton algorithm performing intermediately. The 6-random case needs an expected number of 82 evaluations (or 512 simulated years) and, on average, produces models that are slightly better than the standard configuration. We found that the total number of iterations needed to produce acceptable models was considerably sensitive to the number of random parameters. This suggests that further work is needed to determine how many parameters should be perturbed.

As discussed above, the poor performance on the 14-parameter case seems to be due to some of the parameter perturbations having only a small impact on the cost function, leading to noise contamination of the line-search vector and causing the algorithm to head in random directions. The poor performance of the random variant that perturbed 8 out of the 14 parameters at random may also be due to noise contamination arising again from unimportant parameters being included, similarly to the full 14-parameter case, or causing some kind of algorithm instability. We recall that

We also found that several different parameter combinations led to models that were broadly comparable with the standard configurations. This suggests that HadAM3 exhibits equifinality

If these techniques could be successfully applied to state-of-the-art models it would be practical to do the following:

generate perturbed models to test if an observationally constrained ensemble has a narrow range of climate feedbacks;

add new parametrisations of processes to a model then recalibrate the model;

explore the effect of changing resolution without large changes in the simulation of large-scale climate.

Though our algorithm works reasonably well for a modest number of parameters,
it would benefit from a better understanding of the effect of noise on it.
Both the line-search through a selection effect and the computation of the
Jacobian/Hermitian matrices are affected by noise. A better algorithm would
identify parameters that did not appear to impact the cost function and
remove them from the analysis, as done by

Our work focused on optimisation rather than the cost function. We used a cost function based on crude estimates of observational uncertainty and a subjective choice of large-scale observations. Future work would benefit from much better estimates of observational uncertainty and an objective means of selecting observations. One approach might be to choose observations of which we have good evidence matter for climate feedbacks or other properties of the model we are concerned about.

Nevertheless, our results suggest that it is possible and computationally feasible to automatically calibrate the atmospheric component of a climate model and generate a plausible coupled model.

All data and software are available from CEDA at

We implemented and developed the algorithms described above using bash shell scripts and
ipython

SFBT, CC and MJM conceived the study. MJM implemented the software framework. KY implemented the GN algorithm, with guidance from CC, within the framework; carried out the 7-parameter studies; and did preliminary analysis. NE, supervised by CC, implemented and tested the random variant of Gauss–Newton algorithm. SFBT re-engineered the framework and carried out 14-parameter and high-resolution cases. SFBT wrote the paper and all commented upon it.

The authors declare that they have no conflict of interest.

This work was funded by NERC (NE/L012146/1) with simulations and post-processing done on the Edinburgh Compute and Data Facility. We thank Dan Williamson (Exeter) for providing R code to compute parameters from meta-parameters and Sam Pepler (BADC) for assistance in archiving data and software. We thank an anonymous referee and Peter Rayner for their review comments. Edited by: James Annan Reviewed by: Peter Rayner and one anonymous referee