An automatic and effective parameter optimization method for model tuning

Physical parameterizations in general circulation models (GCMs), having various uncertain parameters, greatly impact model performance and model climate sensitivity. Traditional manual and empirical tuning of these parameters is time-consuming and ineffective. In this study, a “three-step” methodology is proposed to automatically and effectively obtain the optimum combination of some key parameters in cloud and convective parameterizations according to a comprehensive objective evaluation metric. Unlike traditional optimization methods, two extra steps, one determining the model’s sensitivity to the parameters and the other choosing optimum initial values for the sensitive parameters, are introduced before the downhill simplex method. This new method reduces the number of parameters to be tuned and accelerates the convergence of the downhill simplex method. Atmospheric GCM simulation results show that the optimum combination of these parameters determined using this method is able to improve the model’s overall performance by 9 %. The proposed methodology and software framework can be easily applied to other GCMs to speed up the model development process, especially regarding the unavoidable comprehensive parameter tuning during the model development stage.


Introduction
Due to their current relatively low model resolutions, general circulation models (GCMs) need to parameterize various sub-grid-scale processes. Physical parameterizations aim to approximate the overall statistical outcomes of various sub-grid-scale physics (Williams, 2005). However, due to the complexities involved in these processes, parameterizations representing sub-grid-scale physical processes unavoidably involve some empirical or statistical parameters (Hack et al., 1994), especially within cloud and convective parameterizations. Consequently, these parameterizations introduce uncertainties into climate simulations using GCMs (Warren and Schneider, 1979). In general, these uncertain parameters need to be calibrated or constrained when new parameterization schemes are developed and integrated into models (L. Li et al., 2013).
Traditionally, the uncertain parameters are manually tuned through comprehensive comparisons of model simulations with available observations. Such an approach is subjective, labor-intensive, and hard to extend (Hakkarainen et al., 2012; Allen et al., 2000). By contrast, automatic parameter calibration techniques have progressed quickly because of their efficiency, effectiveness, and broader applicability (Bardenet et al., 2013; Elkinton et al., 2008; Jakumeit et al., 2005; Chen et al., 1999). In previous studies applied to GCMs, the methods can be categorized into three major types: probability distribution function (PDF) methods, optimization algorithms, and data assimilation techniques.
Published by Copernicus Publications on behalf of the European Geosciences Union.
For the PDF method, the confidence ranges of the optimization parameters are evaluated based on likelihood and Bayesian estimation. Cameron et al. (1999) approximate the forecast by generalized likelihood uncertainty estimation (Beven and Binley, 1992), a method obtaining parameter uncertainty ranges at a specific confidence level. The Bayesian Markov chain Monte Carlo (MCMC) method (Gilks, 1995) is widely used to obtain posterior probability distributions from prior knowledge. A couple of specific algorithms based on MCMC theory have been used to calibrate models in the literature, such as Metropolis-Hastings (Sun et al., 2013), the adaptive Metropolis algorithm (Hararuk et al., 2014), and multiple very fast simulated annealing (MVFSA) (Jackson et al., 2008). The MVFSA method is 1 to 2 orders of magnitude faster than the Metropolis-Hastings algorithm (Jackson et al., 2004). However, these methods only attempt to determine the most likely region of the uncertain parameters and cannot directly give the best combination of uncertain parameters with an optimal metric value. Moreover, the PDF heavily depends on the assumed likelihood function, which is usually difficult to determine for climate system model tuning problems.
Optimization algorithms can be used to search for the maximum or minimum metric value in a given parametric space. Severijns and Hazeleger (2005) calibrate parameters of radiation, clouds, and convection in the Speedy model with the downhill simplex method (Press et al., 1992; Nelder and Mead, 1965) to improve the radiation budget at the top of the atmosphere and at the surface, as well as the large-scale circulation. The downhill simplex method converges quickly when the parametric space is not high-dimensional. However, it is a local optimization algorithm and does not aim to find the global optimal solution. Moreover, the algorithm has convergence issues when the simplex becomes ill-conditioned. Besides the downhill simplex, a few global optimization algorithms have been introduced to tune uncertain parameters of climate system models, such as simulated stochastic approximation annealing (SSRR) (Yang et al., 2013), MVFSA (Yang et al., 2014), and multi-objective particle swarm optimization (MOPSO) (Gill et al., 2006). SSRR requires at least 10 000 steps to reach a stable solution (Liang et al., 2013), and MVFSA also requires thousands of steps (Jackson et al., 2004). MOPSO needs dozens of individual cases in each iteration. All these global optimization algorithms require a large number of model runs and a very high computational cost during the model tuning process.
Data assimilation methods are well established for state estimation and can be a potential solution for parameter estimation. Aksoy et al. (2006) estimate the parameter uncertainty in a mesoscale model (Grell et al., 1994) using the ensemble Kalman filter (EnKF). Santitissadeekorn and Jones (2015) present a two-step filtering approach for joint state-parameter estimation that combines particle filtering (PF) and the EnKF. The EnKF and PF use an ensemble of model simulations to estimate the background error covariance, approximating the traditional Kalman filter with a recurrence process (Evensen, 2003; Arulampalam et al., 2002). The accuracy of the error covariance relies on the samples: in general, the larger the ensemble size, the more accurate the estimates are. The practical limitation on ensemble size and imperfect models make it difficult to select representative samples (Poterjoy et al., 2014). Moreover, as with the MOPSO method, these methods require a large number of model runs in each iteration, greatly increasing the computational cost.
A climate system model is a strongly nonlinear system, having a large number of uncertain parameters. As a result, the parametric space of a climate system model is high-dimensional, multi-modal, strongly nonlinear, and inseparable. More seriously, one model run of a climate system model might require tens or even hundreds of years of simulation to get scientifically meaningful results.
To overcome these challenges, we propose a "three-step" strategy to calibrate the uncertain parameters in climate system models effectively and efficiently. First, the Morris method (Morris, 1991; Campolongo et al., 2007), a global sensitivity analysis method, is chosen to eliminate the insensitive parameters by analyzing the main and interactive effects among parameters. Another global method by Sobol (Sobol, 2001) is used to validate the results of the Morris method. Second, a pre-processing of the initial values of the selected parameters is presented to accelerate the convergence of the optimization algorithm and to resolve ill-conditioning issues. Finally, the downhill simplex algorithm is used to solve the optimization problem because of its low computational cost and fast convergence in low-dimensional spaces. Taking into account the complex configuration and manipulation involved in model tuning, an automatic workflow is designed and implemented to make the calibration process more efficient. The method and workflow can be easily applied to other GCMs to speed up model development processes.
The paper is organized as follows. Section 2 introduces the proposed automatic workflow. Section 3 describes the details of the example model, reference data, and calibration metrics. The three-step calibration strategy is presented in Sect. 4. Section 5 evaluates the calibration results, followed by a summary in Sect. 6.

The end-to-end automatic calibration workflow
We design a software framework for the overall control of the tuning practice. This framework can automatically execute any part of our proposed three-step calibration strategy, determine the optimal parameters, and produce the corresponding diagnostic results. It incorporates various tuning methods and facilitates model tuning with minimal manual management. It effectively manages the dependencies and calling sequences of various procedures, including parameter sampling, sensitivity analysis and initial value selection, model configuration and running, and evaluation of model outputs using user-provided metrics. Users only need to specify the model to tune, the parameters to be tuned with their valid ranges, and the calibration method to use. There are four main modules within the framework, as shown in Fig. 1. The scheduler module manages model simulations with the capability for simultaneous runs. It also coordinates different tasks to reduce contention and improve throughput. Simulation diagnosis and evaluation are included in a post-processing module. The preparation module contains various sensitivity analysis and sampling methods, such as the Morris (Morris, 1991; Campolongo et al., 2007), Sobol (Sobol, 2001), full factorial (FF) (Raktoe et al., 1981), Latin hypercube (LH) (McKay et al., 1979), Morris one-at-a-time (MOAT) (Morris, 1991), and central composite design (CCD) (Hader and Park, 1978) methods. The sensitivity analysis is able to eliminate duplicated samples to reduce unnecessary model runs. An MCMC method based on the adaptive Metropolis-Hastings algorithm is also provided to obtain the posterior distribution of uncertain parameters. The tuning algorithm module offers various local and global optimization algorithms, including the downhill simplex, genetic algorithm, particle swarm optimization, differential evolution, and simulated annealing. In addition, all the intermediate metrics and their corresponding parameters within the framework are stored in a MySQL database and can be used for posterior knowledge analysis. More importantly, the workflow is flexible and expandable for easy integration of other advanced algorithms as well as tools like the Problem Solving Environment for Uncertainty Analysis and Design Exploration (PSUADE) (Tong, 2005) and the Design Analysis Kit for Optimization and Terascale Applications (DAKOTA) (Eldred et al., 2007). Although uncertainty quantification toolkits such as PSUADE and DAKOTA support various calibration and uncertainty analysis methods and pre-defined function interfaces, they cannot organize the above model tuning process as effectively as the proposed model tuning framework.
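The control flow connecting the four modules can be sketched in a few lines. The function names below (`sample_params`, `run_model`, `evaluate`, `select_next`) are illustrative placeholders, not the framework's actual API:

```python
from concurrent.futures import ThreadPoolExecutor

def calibrate(sample_params, run_model, evaluate, select_next, max_rounds=10):
    """Sketch of the workflow loop: the preparation module proposes samples,
    the scheduler runs them concurrently, the post-processing module scores
    each run, and the tuning-algorithm module picks the next samples."""
    history = []
    samples = sample_params()
    for _ in range(max_rounds):
        with ThreadPoolExecutor() as pool:           # scheduler: simultaneous runs
            outputs = list(pool.map(run_model, samples))
        scores = [evaluate(out) for out in outputs]  # post-processing: metrics
        history.extend(zip(samples, scores))
        samples = select_next(history)               # tuning algorithm: next batch
        if not samples:                              # converged or budget exhausted
            break
    return min(history, key=lambda h: h[1])          # best (params, index) pair
```

In the real framework each `run_model` call would launch a multi-process GCM job, and the intermediate (parameters, metric) pairs would be persisted to the MySQL database rather than kept in memory.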

Model description and reference metrics
We use the Grid-point Atmospheric Model of IAP LASG version 2 (GAMIL2) as an example to demonstrate the tuning workflow and our calibration strategy. GAMIL2 is the atmospheric component of the Flexible Global-Ocean-Atmosphere-Land System Model grid version 2 (FGOALS-g2), which participated in the CMIP5 (the fifth phase of the Coupled Model Intercomparison Project) program. The horizontal resolution is 2.8° × 2.8°, with 26 vertical levels. GAMIL2 uses a finite-difference scheme that conserves mass and energy (Wang et al., 2004). A two-step shape-preserving advection scheme (Yu, 1994) is used for tracer advection. Compared to the previous version, GAMIL2 has modifications in cloud-related processes (L. Li et al., 2013), such as the deep convection parameterization (Zhang and Mu, 2005), the convective cloud fraction (Xu and Krueger, 1991), the cloud microphysics (Morrison and Gettelman, 2008), and the stratiform fractional cloud condensation scheme (Zhang et al., 2003). More details are given in L. Li et al. (2013). Empirical tunable parameters are selected from the deep convection, shallow convection, and cloud fraction schemes (Table 1). Default parameter values are from the model configuration for the CMIP5 experiments.
To save computational cost, atmosphere-only simulations are conducted for 5 years using a prescribed seasonal climatology (no interannual variation) of SST and sea ice. Previous studies have shown that a 5-year simulation of this type is enough to capture the basic characteristics of simulated mean climate states (Golaz et al., 2011; Lin et al., 2013). The goal of these simulations is not to determine their resemblance to observations, but to compare the results between the control simulation and various tuned simulations.
Model tuning results depend on the reference metrics used. For a simple justification, we use some conventional climate variables for the evaluation. Wind, humidity, and geopotential height are from the European Centre for Medium-Range Weather Forecasts (ECMWF) ERA-Interim reanalysis from 1989 to 2004 (Simmons et al., 2007). We use GPCP (the Global Precipitation Climatology Project, Adler et al., 2003) for precipitation and ERBE (the Earth Radiation Budget Experiment, Barkstrom, 1984) for radiative fields. All observational and reanalysis data are regridded to the GAMIL2 grid before the comparison. Note that the reference metrics can be extended, depending on the model performance requirements.
The reference metric, covering the variables in Table 2, is used to quantitatively evaluate overall simulation skill (Murphy et al., 2004; Gleckler et al., 2008; Reichler and Kim, 2008). The calibration RMSE is defined as the spatial standard deviation (SD) of the model simulation against observations/reanalysis, as in Eq. (1) (Taylor, 2001; Yang et al., 2013). For an easy comparison, we normalize the RMSE of each simulation output by that of the control simulation using the default parameter values. We introduce an improvement index to evaluate the tuning results, which weights each variable equally and computes the average normalized RMSE. The index indicates the overall improvement in the performance of the tuned simulation relative to the control simulation based on a number of model outputs (Table 2). If the index is less than 1, the tuned simulation performs better than the control run; the smaller the value, the larger the improvement.
Here x_m^F(i) is the model output, x_o^F(i) is the corresponding observation or reanalysis datum, and x_r^F(i) is the model output from the control simulation using the default values of the parameters in Table 1. w is the weight accounting for the different grid-cell areas on a regular latitude-longitude grid on the sphere, I is the total number of grid points in the model, and N_F is the number of chosen variables.
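As a concrete illustration, the area-weighted RMSE and the improvement index can be computed as follows. This is a toy 1-D latitude example with made-up fields; the actual model fields are 2-D latitude-longitude grids:

```python
import numpy as np

def weighted_rmse(model, obs, weights):
    """Area-weighted RMSE of a model field against a reference field."""
    w = weights / weights.sum()
    return np.sqrt(np.sum(w * (model - obs) ** 2))

def improvement_index(tuned_fields, control_fields, obs_fields, weights):
    """Mean over variables of tuned RMSE normalized by control RMSE.
    Values below 1 mean the tuned run beats the control run."""
    ratios = [weighted_rmse(t, o, weights) / weighted_rmse(c, o, weights)
              for t, c, o in zip(tuned_fields, control_fields, obs_fields)]
    return float(np.mean(ratios))

# Toy 1-D "fields": the tuned run halves the control run's constant bias.
lat = np.linspace(-90, 90, 73)
weights = np.cos(np.deg2rad(lat))   # grid-area weight on a regular lat grid
obs = np.sin(np.deg2rad(lat))
control = obs + 0.2
tuned = obs + 0.1
idx = improvement_index([tuned], [control], [obs], weights)
```

Here `idx` comes out at 0.5, because the tuned run's error is half the control run's for the single variable used.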

Parameter tuning with global and local optimization methods
Parameter tuning for a climate system model is, in theory, a global optimization problem. As well-known global optimization algorithms, traditional evolutionary algorithms, such as the genetic algorithm (Goldberg et al., 1989), differential evolution (DE) (Storn and Price, 1995), and particle swarm optimization (PSO) (Kennedy, 2010), can approach the global optimal solution, but generally require a high computational cost (Hegerty et al., 2009; Shi and Eberhart, 1999). This is because these algorithms are designed following the biological principle of survival of the fittest. In contrast, local algorithms utilize a greedy strategy, and thus may be trapped in a locally optimal solution after convergence. The advantage of local algorithms is their low computational cost due to the relatively few samples required. In this sense, local optimization algorithms are viable options considering their significantly reduced computational cost. We choose the downhill simplex method for climate model tuning because of its relatively low computational cost. The downhill simplex method searches for the optimal solution by changing the shape of a simplex, which encodes the search direction and step length. A simplex is a geometric object consisting of N + 1 vertices and their interconnecting edges, where N is the number of calibration parameters. Each vertex pairs a set of parameter values with its improvement index as defined in Eq. (3). A new vertex is determined by reflecting, expanding, or contracting the vertex with the highest (worst) metric value, leading to a new simplex (Press et al., 1992; Nelder and Mead, 1965).
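For reference, a minimal pure-Python version of the downhill simplex update (reflection, expansion, contraction, and shrink, with the standard Nelder-Mead coefficients) applied to a toy quadratic. This is a sketch of the algorithm, not the implementation used in the tuning framework:

```python
import numpy as np

def nelder_mead(f, x0, step=0.5, max_iter=200, tol=1e-8):
    """Minimal downhill simplex (Nelder and Mead, 1965) sketch.
    x0 is the initial vertex; the other N vertices perturb one coordinate each."""
    n = len(x0)
    simplex = [np.asarray(x0, float)]
    for i in range(n):
        v = simplex[0].copy()
        v[i] += step
        simplex.append(v)
    fvals = [f(v) for v in simplex]
    for _ in range(max_iter):
        order = np.argsort(fvals)                  # sort vertices best -> worst
        simplex = [simplex[i] for i in order]
        fvals = [fvals[i] for i in order]
        if abs(fvals[-1] - fvals[0]) < tol:
            break
        centroid = np.mean(simplex[:-1], axis=0)   # centroid excluding worst
        xr = centroid + (centroid - simplex[-1])   # reflection
        fr = f(xr)
        if fr < fvals[0]:                          # very good: try expansion
            xe = centroid + 2.0 * (centroid - simplex[-1])
            fe = f(xe)
            simplex[-1], fvals[-1] = (xe, fe) if fe < fr else (xr, fr)
        elif fr < fvals[-2]:                       # acceptable: keep reflection
            simplex[-1], fvals[-1] = xr, fr
        else:                                      # poor: contract toward worst
            xc = centroid + 0.5 * (simplex[-1] - centroid)
            fc = f(xc)
            if fc < fvals[-1]:
                simplex[-1], fvals[-1] = xc, fc
            else:                                  # still poor: shrink to best
                simplex = [simplex[0]] + [simplex[0] + 0.5 * (v - simplex[0])
                                          for v in simplex[1:]]
                fvals = [fvals[0]] + [f(v) for v in simplex[1:]]
    return simplex[0], fvals[0]

best, fbest = nelder_mead(lambda x: (x[0] - 1) ** 2 + (x[1] + 2) ** 2, [0.0, 0.0])
```

In model tuning, `f` would be the improvement index of Eq. (3), so each function evaluation costs one full GCM run, which is why reducing the dimension N and starting from a good simplex matter so much.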
Two performance criteria are used to evaluate the optimization algorithms in this study: the model improvement achieved (effectiveness) and the computational cost (efficiency). Model improvement is measured by the index defined in Eq. (3); the lower this value, the better the tuning. Computational cost is measured in "core hours", computed as (N_step) × (N_size) × (the number of processes of a single model run) × (hours used for a single 5-year model run). N_step is the total number of iterations of the optimization algorithm until convergence. N_size is the number of model runs during each iteration; it is 1 for the downhill simplex method. In the GAMIL2 case, each model run takes 6 h using 30 processes.
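The core-hour accounting reduces to a single product. The 50-iteration figure below is purely hypothetical for illustration; the 30 processes and 6 h per run are the GAMIL2 numbers quoted above:

```python
def core_hours(n_step, n_size, processes=30, hours_per_run=6.0):
    """Core hours = N_step x N_size x processes x wall-clock hours per run.
    Defaults follow the GAMIL2 setup described in the text."""
    return n_step * n_size * processes * hours_per_run

# Hypothetical example: 50 downhill-simplex iterations, one run per iteration.
cost = core_hours(50, 1)
```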
When tuning GAMIL2, the two global methods, PSO and DE, achieve better tuning effectiveness than the downhill simplex method, but their computational costs are approximately 4 and 5 times, respectively, that of the downhill simplex method (Table 3).
We therefore propose two extra steps that significantly improve the effectiveness of the downhill simplex method. In the first step, the number of tuning parameters is reduced by eliminating the insensitive parameters. In the second step, fast convergence is achieved by pre-selecting proper initial values for the parameters before the downhill simplex method is applied.

Parameter sensitivity analysis
The number of uncertain parameters in the physical parameterizations of a climate system model is quite large. Most optimization algorithms, such as PSO, the downhill simplex method, and the simulated annealing algorithm (Van Laarhoven and Aarts, 1987), are ineffective for high-dimensional problems: the number of iterations required for convergence increases exponentially with the number of tuning parameters. In addition, climate models generally need long simulations to produce meaningful results. Therefore, the high-dimensional parameter tuning problem suffers from an extremely high computational cost, and it is necessary to reduce the parameter dimension before the optimization.
Parameter sensitivity analysis methods can be divided into local and global methods (Gan et al., 2014). A local method determines the sensitivity of a single parameter by perturbing that parameter with all other parameters fixed. Consequently, it does not consider the combined sensitivity of multiple parameters. A global method, on the other hand, perturbs all the parameters to explore the sensitivity over the whole parametric space. In this study, the Morris method (Morris, 1991; Campolongo et al., 2007), a global method, is used to screen out the sensitive parameters, and another global method (Sobol, 2001) is used to validate the screening results.

The Morris method, based on the MOAT sampling strategy, requires fewer samples than other global sensitivity methods (J. Li et al., 2013). Note that a sample is a set of all parameters, not just one parameter. The method is described briefly here; more details can be found in Morris (1991). Assuming we have k parameters, relative to a random sample S_1 = {x_1, x_2, ..., x_k}, another sample S_2 = {x_1, x_2, ..., x_i + Δ_i, ..., x_k} can be constructed by perturbing the ith parameter by Δ_i. The elementary effect of the ith parameter x_i is defined as

d_i = [f(S_2) − f(S_1)] / Δ_i,  (4)

where f stands for the improvement index defined in Eq. (3). A third sample S_3 = {x_1, x_2, ..., x_i + Δ_i, ..., x_j + Δ_j, ..., x_k} can be generated by perturbing another parameter x_j (j ≠ i). Doing this k times yields k + 1 samples {S_1, S_2, ..., S_{k+1}} and k elementary effects {d_1, d_2, ..., d_k}, after all the parameters have been perturbed. The sequence {S_1, S_2, ..., S_{k+1}} is called a trajectory. This procedure is repeated r times, giving r trajectories. The starting point of each trajectory is selected randomly, as are the ordering of the parameters to perturb and the Δ_i of each perturbation within a trajectory. In practice, 10 to 50 trajectories are enough to determine the sensitivity of the parameters (Gan et al., 2014; Morris, 1991). In this study, we have a total of seven parameters, and 80 simulations are conducted.

We define D = {d_i(t)}, where d_i(t) is the elementary effect of the ith parameter x_i in the tth trajectory. μ_i, the mean of |d_i|, and σ_i, the standard deviation of d_i, are used to measure the parameter sensitivity:

μ_i = (1/r) Σ_{t=1..r} |d_i(t)|,  (5)

σ_i = sqrt( (1/r) Σ_{t=1..r} [d_i(t) − (1/r) Σ_{t=1..r} d_i(t)]^2 ).  (6)

μ_i estimates the effect of x_i on the model improvement index defined in Eq. (3), while σ_i assesses the interactive effect of x_i with other parameters. Parameters with large μ_i and σ_i are the sensitive ones. The Morris method results are shown in Fig. 2.

The parameter elimination step is critical for the final result of model tuning. To validate the results obtained by the Morris method, we compare them with a benchmark method (Sobol, 2001). Based on variance decomposition, the Sobol method requires more samples than the Morris method, leading to a higher computational cost. The variance of the model output can be decomposed as

V = Σ_i V_i + Σ_{i<j} V_ij + ... + V_{12...n},  (7)

where n is the number of parameters, V_i is the variance contributed by the ith parameter, V_ij is the variance of the interactive effect between the ith and jth parameters, and so on. The total sensitivity effect of the ith parameter can be expressed as

S_Ti = 1 − V_{−i} / V,  (8)

where V_{−i} is the total variance of all terms not involving x_i. The Sobol results are shown in Fig. 3; the screened-out parameters are the same as those of the Morris method, and the total sensitivities of ke, c0_shc, and capelmt are less than 0.5 for every variable.
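A compact sketch of the Morris screening on a toy metric (parameters scaled to [0, 1]). The fixed-Δ variant below is a simplification for illustration; the sampler used in the framework also randomizes Δ_i:

```python
import numpy as np

rng = np.random.default_rng(0)

def morris_trajectory(k, delta, rng):
    """One MOAT trajectory: k+1 samples in [0,1]^k, each step perturbing
    a single (randomly ordered) parameter by a fixed delta."""
    x = rng.uniform(0, 1 - delta, size=k)   # random start, kept inside [0,1]
    order = rng.permutation(k)              # random perturbation order
    samples = [x.copy()]
    for i in order:
        x = x.copy()
        x[i] += delta
        samples.append(x)
    return samples, order

def elementary_effects(f, k, r=20, delta=0.25, rng=rng):
    """Return mu* (mean |EE|, Eq. 5) and sigma (std of EE, Eq. 6) per parameter."""
    ee = np.empty((r, k))
    for t in range(r):
        samples, order = morris_trajectory(k, delta, rng)
        fx = [f(s) for s in samples]
        for step, i in enumerate(order):    # adjacent samples differ in one coord
            ee[t, i] = (fx[step + 1] - fx[step]) / delta
    return np.abs(ee).mean(axis=0), ee.std(axis=0)

# Toy metric: linear in x0, nonlinear in x1, completely insensitive to x2.
mu_star, sigma = elementary_effects(lambda x: 3 * x[0] + x[1] ** 2 + 0.0 * x[2], k=3)
```

With this toy metric, mu* is largest for x0, intermediate for x1, and zero for x2, so x2 would be screened out exactly as the insensitive GAMIL2 parameters are.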

Proper initial value selection for the downhill simplex method
The downhill simplex method is a local optimization algorithm, and its convergence performance strongly depends on the quality of the initial values. We need to find parameter values with small metric values near the final solution, and we have to finish this search as fast as possible with minimal overhead. For these two objectives, a hierarchical sampling strategy based on single parameter perturbation (SPP) sampling is used. SPP is similar to local sensitivity methods, in which only one parameter is perturbed at a time with the other parameters fixed. The perturbation samples are uniformly distributed across the parametric space. First, the improvement index defined in Eq. (3) is computed for each parameter sample. The distance is defined as the difference between the improvement indices of two adjacent samples, i.e., the model response to a certain percentage change of one parameter. We call this step the first-level sampling. The perturbation size for a parameter can be set based on user experience; in our implementation, the user needs to set the number of samples.
For the first-level sampling, we can use a larger perturbation size to reduce the computational cost. If the distance between two adjacent samples is greater than a predefined threshold, additional SPP samples are taken between those two samples; this is called the second-level sampling. Finally, the k + 1 samples with the best improvement index values are chosen as the candidate initial values for the optimization method. With this hierarchical sampling strategy, we can locate the local parametric space containing the final solution and accelerate the convergence of the subsequent downhill simplex method. This procedure is described in Algorithm 1. It is easy to implement and has lower overhead compared to more complex adaptive sampling methods.
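The two-level SPP search can be sketched as follows. The metric is a toy quadratic, and the threshold and sample counts are illustrative; the real Algorithm 1 evaluates the Eq. (3) index with actual model runs:

```python
import numpy as np

def hierarchical_spp(f, defaults, bounds, n_coarse=5, threshold=0.05, refine=3):
    """Two-level single-parameter-perturbation (SPP) sampling sketch.
    Level 1: coarse sweep of each parameter, others held at their defaults.
    Level 2: wherever the improvement index jumps by more than `threshold`
    between adjacent level-1 samples, insert `refine` finer samples."""
    candidates = []                       # (parameter vector, improvement index)
    for i, (lo, hi) in enumerate(bounds):
        def sample(v, i=i):
            p = list(defaults)
            p[i] = float(v)               # perturb one parameter, fix the rest
            return p, f(p)
        xs = np.linspace(lo, hi, n_coarse)
        level1 = [sample(v) for v in xs]
        candidates += level1
        fx = [y for _, y in level1]
        for j in range(n_coarse - 1):
            if abs(fx[j + 1] - fx[j]) > threshold:   # strong response: refine
                candidates += [sample(v)
                               for v in np.linspace(xs[j], xs[j + 1], refine + 2)[1:-1]]
    candidates.sort(key=lambda c: c[1])
    return candidates[: len(bounds) + 1]  # k+1 best samples as simplex seeds

seeds = hierarchical_spp(lambda p: (p[0] - 0.3) ** 2 + (p[1] - 0.7) ** 2,
                         defaults=[0.5, 0.5], bounds=[(0.0, 1.0), (0.0, 1.0)])
```

The returned k + 1 candidates seed the initial simplex, which is what confines the subsequent downhill simplex search to the promising region of parameter space.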

At the same time, inappropriate initial values may lead to ill-conditioned simplex geometry during the model tuning process. One issue we encountered is that some vertices in the downhill simplex optimization may share the same value for one or more parameters. As a result, these parameters remain invariant during the optimization, which may degrade the quality of the final solution as well as the convergence speed. A simplex check is therefore conducted to keep as many distinct parameter values as possible while selecting initial values, since a well-conditioned simplex geometry increases the parameter freedom available to the optimization. In our implementation (Algorithm 1), a vertex leading to an ill-conditioned simplex is replaced by the parameter sample with the next-smallest improvement index value.
Together, these methods constitute the initial value pre-processing of the downhill simplex algorithm. In some cases, the samples used during the initial value selection are the same as those in the parameter sensitivity analysis step; a single model run can then serve both steps, further reducing the computational cost.
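The simplex check can be sketched as follows (an illustrative helper, not the framework's code): any vertex that duplicates a coordinate value of an already-accepted vertex is swapped for the next-best candidate sample.

```python
import numpy as np

def fix_degenerate_simplex(vertices, scores, candidates):
    """Replace any vertex that shares a coordinate value with an
    already-accepted vertex, drawing replacements from `candidates`
    (a list of (params, improvement index) pairs, best first)."""
    pool = list(candidates)
    out, out_scores = [np.asarray(vertices[0], float)], [scores[0]]
    for v, s in zip(vertices[1:], scores[1:]):
        v = np.asarray(v, float)
        # Swap in next-best candidates until no coordinate is duplicated.
        while any(np.any(v == u) for u in out) and pool:
            p, s = pool.pop(0)
            v = np.asarray(p, float)
        out.append(v)
        out_scores.append(s)
    return out, out_scores

# The second vertex repeats the first vertex's value 1.0 in coordinate 0,
# so it gets replaced by the candidate (6.0, 7.0).
verts, idx = fix_degenerate_simplex([[1.0, 2.0], [1.0, 3.0], [4.0, 5.0]],
                                    [0.10, 0.20, 0.30],
                                    [([6.0, 7.0], 0.40)])
```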

Evaluation of the proposed strategy
The effectiveness and efficiency of the three traditional algorithms are compared in Table 3. "Downhill_1_step" represents the original downhill simplex method, one of the most widely used local optimization algorithms, which has been successfully used in the Speedy model (Severijns and Hazeleger, 2005). PSO and DE are widely used global optimization algorithms and are easy to apply. Although "Downhill_1_step" achieves slightly less improvement than the two global optimization methods (Table 3), its computational cost is much lower (only 20 and 28 % of DE and PSO, respectively). Two extra steps are added before the original downhill simplex method to overcome its limited effectiveness in improving model performance. The "Downhill_2_steps" method includes an initial value pre-processing step before the downhill simplex method, and the "Downhill_3_steps" method further introduces a step that eliminates insensitive parameters through sensitivity analysis. The two steps bring in additional overhead: 80 samples for the parameter sensitivity analysis with the Morris method and 25 samples for the initial value pre-processing. Tables 3 and 4 show that the proposed "Downhill_3_steps" achieves the best effectiveness, improving the model's overall performance by 9 %. It overcomes the inherent ineffectiveness of the original downhill simplex method at a much lower computational cost than the global methods.

Analysis of model optimal results
This section compares the default simulation and the simulation tuned by the three-step method, with a focus on cloud and TOA radiation changes. Table 1 shows the values of the four sensitive parameters in the control (labeled CNTL) and optimized (labeled EXP) simulations. Significant changes are found for c0, the auto-conversion coefficient in the deep convection scheme, and rhminh, the threshold relative humidity for high cloud formation. The other two parameters change negligibly between the two simulations, so their impacts on model performance are expected to be correspondingly small.
The overall improvement of the tuned simulation over the control simulation can be seen in the Taylor diagram (Fig. 4), with improvement for almost all the variables, especially the meridional winds and mid-tropospheric (400 hPa) humidity; improvements for the other variables are relatively small. The changes in the RMSE factor over the globe and three regions (the tropics, SH middle and high latitudes, and NH middle and high latitudes) are shown in Fig. 5. First, radiative fields and moisture are improved over all four areas. By contrast, the wind and temperature changes are more diverse among the regions. For example, temperatures over the tropics become worse compared to the control run, while there is an overall improvement at the SH middle and high latitudes for all variables except the 200 hPa temperature.
Winds and precipitation at the NH middle and high latitudes become slightly worse in the tuned simulation.Such changes are somewhat intriguing and we attempt to relate these changes to the two parameters significantly tuned.
With a reduced RH threshold for high cloud (from 0.78 in CNTL to 0.63 in EXP, Table 1), the stratiform condensation rate increases and the atmospheric humidity decreases (Zhang et al., 2003). In addition, with an increased auto-conversion coefficient in the deep convection scheme, less condensate is detrained to the environment. As a result, the middle and upper troposphere is drier overall, especially over the tropics, where deep convection dominates the vertical moisture transport (Fig. 6c). Although the middle and upper troposphere becomes drier over the tropics, the reduced RH threshold for high cloud allows clouds to form more easily. Consequently, middle and high clouds increase over the globe, especially over the middle and high latitudes, with the largest increase up to 4-5 % (Fig. 6f). In the tropics, due to the drying induced by the reduced detrainment, the high cloud increase is relatively small (2-3 %) compared to the middle and high latitudes. In contrast, low cloud below 800 hPa decreases by 1-2 % over the middle and high latitudes, with slightly decreased RH (Fig. 6), because the RH threshold for low cloud changes negligibly (Table 1). Overall, the combined effects of all relevant parameterizations lead to the changes in atmospheric humidity and cloud fraction.
Changes in the moisture and cloud fields impact the radiative fields. With reference to ERBE, TOA outgoing long-wave radiation (OLR) is improved at the mid-latitudes in EXP but degraded over the tropics (Fig. 7a). Compared with CNTL, middle and high cloud increase significantly in EXP (Fig. 6). This enhances the blocking effect on the long-wave upward flux at TOA (FLUT), reducing the FLUT at the mid-latitudes of both the Southern Hemisphere and the Northern Hemisphere (Fig. 7a). Clear-sky OLR increases in EXP because of its drier upper troposphere (Fig. 6). The decrease in atmospheric water vapor weakens the greenhouse effect, so the atmosphere emits more outgoing long-wave radiation, which reduces the negative bias of the clear-sky long-wave upward flux at TOA (FLUTC, Fig. 7b). Long-wave cloud forcing (LWCF) at the middle and high latitudes is improved because of the improvement in the FLUT in these areas (Fig. 7c), but the improvement in the tropics is negligible owing to the cancellation between the FLUT and FLUTC. Overall, the tuned simulation has a TOA radiation imbalance of 0.08 W m⁻², which is better than that of the control run (0.8 W m⁻²).
TOA clear-sky short-wave fluxes are the same in the control and the tuned simulation, since both simulations use the same surface albedo. With increased clouds, the tuned simulation absorbs less short-wave radiation at TOA than the control. Compared with ERBE, the absorbed TOA short-wave radiation of the tuned simulation is better at the middle and high latitudes but slightly degraded over the tropics.

Conclusions
An effective and efficient three-step method for GCM physical parameter tuning is proposed. Compared with conventional approaches, a parameter sensitivity analysis step and a proper initial value selection step are introduced before the low-cost downhill simplex method. This effectively reduces the computational cost while maintaining good overall performance. In addition, an automatic parameter calibration workflow is designed and implemented to enhance operational efficiency and to support different uncertainty quantification analyses and calibration strategies. Evaluation of the method and workflow by calibrating the GAMIL2 model indicates that the three-step method outperforms the two global optimization methods (PSO and DE) in both effectiveness and efficiency. A better trade-off between accuracy and computational cost is achieved compared with the two-step method and the original downhill simplex method. The optimal results of the three-step method show that most of the variables are improved compared with the control simulation, especially the radiation-related ones. A mechanism analysis is conducted to explain why these radiation-related variables show an overall improvement. In future work, more analyses are needed to better understand model behavior as the physical parameters change. The choice of appropriate reference metrics and related observations is very important for the final tuned model performance. In future studies, we will use more reliable and accurate observations and add constraint conditions for parameter tuning, so as to construct more comprehensive and reliable metrics.
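The three steps summarized above can be sketched in miniature. The sketch below uses a hypothetical four-parameter quadratic metric in place of a real GCM evaluation metric, a crude one-at-a-time screening in place of the Morris/Sobol analysis, and random sampling for the initial value selection; `scipy.optimize.minimize` with `method="Nelder-Mead"` provides the downhill simplex step:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Hypothetical objective: an evaluation metric of four parameters, of which
# only the first two actually matter (stand-ins for, e.g., rhminh and cmftau).
def metric(p):
    return (p[0] - 0.63) ** 2 + 10.0 * (p[1] - 0.7) ** 2 + 1e-6 * (p[2] + p[3])

bounds = np.array([[0.0, 1.0]] * 4)
base = bounds.mean(axis=1)            # default parameter values

# Step 1: sensitivity screening -- keep only parameters whose one-at-a-time
# perturbation changes the metric noticeably.
sens = []
for i in range(len(base)):
    lo, hi = base.copy(), base.copy()
    lo[i], hi[i] = bounds[i]
    sens.append(abs(metric(hi) - metric(lo)))
keep = [i for i, s in enumerate(sens) if s > 1e-3]

def reduced(q):
    """Metric over the sensitive parameters only; the rest stay at defaults."""
    p = base.copy()
    p[keep] = q
    return metric(p)

# Step 2: choose the best initial point from a small random sample.
samples = rng.uniform(bounds[keep, 0], bounds[keep, 1], size=(50, len(keep)))
x0 = samples[np.argmin([reduced(s) for s in samples])]

# Step 3: downhill simplex (Nelder-Mead) from the screened starting point.
res = minimize(reduced, x0, method="Nelder-Mead")
print(keep, res.x)   # the two sensitive parameters converge near (0.63, 0.7)
```

Because the insensitive parameters are frozen at their defaults, the simplex searches a two-dimensional rather than a four-dimensional space, which is the source of the cost reduction claimed above.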

Figure 1 .
Figure 1. The structure of the automatic calibration workflow. The input of the workflow is the parameter set of interest and the initial value ranges of the parameters. The output is the optimal parameters and their corresponding diagnostic results after calibration. The preparation module provides the parameter sensitivity analysis. The tuning algorithm module offers local and global optimization algorithms, including the downhill simplex, genetic algorithm, particle swarm optimization, differential evolution, and simulated annealing. The scheduler module schedules as many cases as possible to run simultaneously and coordinates the different tasks over a parallel system. The post-processing module is responsible for metrics diagnostics and for re-analysis and observational data management.

Figure 2 .
Figure 2. Scatter diagram showing the parameter sensitivity from the Morris sensitivity analysis. The x axis denotes the main effect sensitivity of a single parameter. The y axis denotes the interactive effect sensitivity among multiple parameters. In GAMIL2, c0, rhminl, rhminh, and cmftau have high sensitivity, while ke, c0_shc, and capelmt have low sensitivity.
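The two axes of Fig. 2 correspond to the mean and spread of the Morris elementary effects. A minimal sketch of the computation, using a radial one-at-a-time variant of the Morris design and a toy three-parameter model (not the GAMIL2 parameterizations), is:

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy model: p0 has a strong main effect, p0 and p1 interact, p2 is inert.
def model(p):
    return 4.0 * p[0] + 2.0 * p[0] * p[1] + 0.01 * p[2]

n_traj, n_par, delta = 20, 3, 0.25
effects = [[] for _ in range(n_par)]

for _ in range(n_traj):
    x = rng.uniform(0.0, 1.0 - delta, n_par)   # random base point in [0, 1)^3
    y0 = model(x)
    for i in range(n_par):
        x_i = x.copy()
        x_i[i] += delta
        effects[i].append((model(x_i) - y0) / delta)   # elementary effect

mu_star = np.array([np.mean(np.abs(e)) for e in effects])  # main effect (x axis)
sigma = np.array([np.std(e) for e in effects])             # interactions (y axis)
print(mu_star, sigma)
```

Parameters that land near the origin of the (mu_star, sigma) plane, as ke, c0_shc, and capelmt do in Fig. 2, can be safely excluded from the subsequent tuning.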

Figure 3 .
Figure 3. Sensitivity analysis results from the Sobol method. The total sensitivity in Eq. (8) is denoted by the size of the colored area. The total sensitivities of ke, c0_shc, and capelmt are less than 0.5 for each variable.
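Equation (8) is not reproduced in this excerpt; assuming it is the standard Sobol total-effect index, a minimal Monte Carlo sketch using the Jansen estimator on an additive toy model (with variance shares of roughly 0.8 : 0.2 : ~0) is:

```python
import numpy as np

rng = np.random.default_rng(1)

# Additive toy model standing in for the GCM evaluation metric.
def model(p):
    return 2.0 * p[..., 0] + 1.0 * p[..., 1] + 0.001 * p[..., 2]

n, d = 4096, 3
A = rng.uniform(size=(n, d))        # two independent sample matrices
B = rng.uniform(size=(n, d))
fA = model(A)
var = np.var(np.concatenate([fA, model(B)]))

S_total = np.empty(d)
for i in range(d):
    AB = A.copy()
    AB[:, i] = B[:, i]              # resample only column i from matrix B
    S_total[i] = np.mean((fA - model(AB)) ** 2) / (2.0 * var)  # Jansen estimator

print(S_total)   # roughly [0.8, 0.2, 0.0]
```

A total index near zero, as reported for ke, c0_shc, and capelmt, confirms that fixing those parameters at their defaults loses little of the metric's variability.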

Figure 5 .
Figure 5. Improvement indices over the globe, the tropics, and the middle and high latitudes of the Northern and Southern Hemisphere (MLN and MLS) for each variable of the EXP simulation.

Figure 7 .
Figure 7. Meridional distributions of the annual mean difference between EXP/CNTL and observations of TOA outgoing long-wave radiation (a), TOA clear-sky outgoing long-wave radiation (b), TOA long-wave cloud forcing (c), TOA net short-wave flux (d), TOA clear-sky net short-wave flux (e), and TOA short-wave cloud forcing (f).

Table 1 .
A summary of the parameters to be tuned in GAMIL2, including the default value, valid range, and final tuned optimum value of each parameter. Note that only the four sensitive parameters are tuned and have optimum values. (Partially recovered row values: a valid range of 1 × 10⁻⁴ to 5.4 × 10⁻³ and a tuned value of 5.427294 × 10⁻⁴.)

Table 2 .
Atmospheric fields included in the evaluation metrics and their sources.
The Sobol method is used to validate the results of the Morris method.

Table 3 .
Effectiveness and efficiency comparison between the original downhill simplex method and the two global methods. N_step is the total number of calibration iterations until convergence. N_size is the number of model runs in each iteration. Core hours are computed as N_step × N_size × {number of processes of a single model run} × {wall-clock hours of a single 5-year model run}. In the GAMIL2 case, each model run takes 6 h and uses 30 processes.
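The core-hour formula in the caption is straightforward to evaluate; in the sketch below the per-run cost (30 processes, 6 h) comes from the caption, while the iteration counts are hypothetical, for illustration only:

```python
def core_hours(n_step, n_size, n_proc, hours_per_run):
    """Core hours = iterations x runs per iteration x processes x hours per run."""
    return n_step * n_size * n_proc * hours_per_run

# GAMIL2 per-run cost from the caption: 30 processes, 6 h per 5-year run.
# The iteration counts (20 iterations of 6 runs) are hypothetical.
print(core_hours(n_step=20, n_size=6, n_proc=30, hours_per_run=6))   # 21600
```

Because the per-run cost is fixed, the comparison in Tables 3 and 4 reduces to comparing N_step × N_size, i.e. the total number of model runs each method needs to converge.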

Table 4 .
The same as Table 3 but showing the comparison among the three downhill simplex methods.

Figure 4 .
Figure 4. Taylor diagram of the climate mean state of each output variable from 2002 to 2004 for EXP and CNTL.