The Met Office Global Coupled model 2.0 (GC2) configuration

The latest coupled configuration of the Met Office Unified Model (Global Coupled configuration 2, GC2) is presented. This paper documents the model components which make up the configuration (although the scientific description of these components is detailed elsewhere) and provides a description of the coupling between the components. The performance of GC2 in terms of its systematic errors is assessed using a variety of diagnostic techniques. The configuration is intended to be used by the Met Office and collaborating institutes across a range of timescales, with the seasonal forecast system (GloSea5) and climate projection system (HadGEM) being the initial users. In this paper GC2 is compared against the model currently used operationally in those two systems. Overall GC2 is shown to be an improvement on the configurations used currently, particularly in terms of modes of variability (e.g. mid-latitude and tropical cyclone intensities, the Madden–Julian Oscillation and El Niño Southern Oscillation). A number of outstanding errors are identified with the most significant being a considerable warm bias over the Southern Ocean and a dry precipitation bias in the Indian and West African summer monsoons. Research to address these is ongoing.


Introduction
The Met Office produces forecasts across a range of timescales from numerical weather predictions (NWP) for days ahead or less, through monthly-seasonal-decadal forecasts, to climate change projections. For over 20 years, the framework of the Met Office Unified Model (MetUM, Cullen, 1993) has been used to produce models which between them span these timescales. Over the last few years, the development of the science within the MetUM has been made more seamless across timescales than ever before, with numerous benefits including greater scientific robustness of the model, improved ability to investigate model biases and a more efficient use of resources (Brown et al., 2012). Model development now progresses on an approximately annual timescale with a new configuration of the coupled atmosphere-land-ocean-sea-ice model (and components, e.g. atmosphere-land for short-range NWP) being released each year for use across timescales by the Met Office and its collaborators (Walters et al., 2011).
The latest configuration of the coupled model, released in March 2014, is known as Global Coupled model 2.0 (GC2). This comprises the component configurations Global Atmosphere 6.0 (GA6.0), Global Land 6.0 (GL6.0), Global Ocean 5.0 (GO5.0) and Global Sea Ice 6.0 (GSI6.0). GA6.0 and GL6.0 are fully documented by Walters et al. (2015), whilst GO5.0 is described by Megann et al. (2014) and GSI6.0 by Rae et al. (2015). In this paper we provide a technical description of the coupling between the components and then present the coupled model performance in terms of systematic errors through a range of diagnostic techniques. We do not discuss predictions/projections from GC2 as these will be presented elsewhere (e.g. Senior et al., 2015). Currently, coupled models are used in Met Office systems on monthly and longer timescales, hence most of the results presented here will be for the seasonal forecasting system (referred to as GloSea5-GC2) and the climate model (referred to as HadGEM3-GC2). In each case, comparisons will be made against the current "operational" configuration which is GloSea5-GA3 for seasonal (MacLachlan et al., 2015) and HadGEM2-AO for climate (The HadGEM2 Development Team, 2011). It should be noted that, unlike the new GC2 configuration which is identical in the different systems, these two control configurations from which we are upgrading differ significantly. It is envisaged that future coupled configurations will also start to be used on shorter NWP timescales, hence a few results are also included from these timescales.
The "physical" model presented here does not include Earth system components such as interactive vegetation or ocean bio-geochemistry (note, our definition of the physical model does include interactive aerosols). Due to the additional resource required to build an Earth system model (ESM), the intention is that Earth system components will be built on top of a subset of the annual physical model releases to form an ESM every 6 years or so. GC2 will not be a configuration to which Earth system components will be added, although it is envisaged that the next coupled model release (GC3) will be developed into an ESM.
In the next section we provide details of the coupling and experiments subsequently presented. In Sect. 3 the climatological biases of the model are discussed, whilst systematic errors in mid-latitude variability are presented in Sect. 4, and tropical variability in Sect. 5. We summarize in Sect. 6.

Coupled model details
The GC2 configuration is defined by the combination of the component model scientific configurations (GA6.0, GL6.0, GO5.0, GSI6.0) and associated choices about the way these model components are coupled together. The component models are fully documented in the model description sections of Walters et al. (2015), Megann et al. (2014) and Rae et al. (2015), whilst the technical details of the coupling are described below.
Relative to GloSea5-GA3, GC2 has a significant revision to the atmosphere dynamical core and a number of parametrization revisions. HadGEM2-AO predates GloSea5-GA3, so relative to HadGEM2-AO, GC2 has additional changes including a new ocean model, new sea-ice model, new cloud scheme, and considerable revisions to all of the existing parametrization schemes.
The vertical resolution is set by the component definitions, being 85 levels in the atmosphere (with a top at 85 km), four soil levels, 75 levels in the ocean (with a 1 m top level) and five sea-ice thickness categories. The ocean resolution is 0.25° on a tri-polar grid. The GA6 science can be run over a wide range of horizontal resolutions on a regular latitude-longitude grid with no explicit changes to model parametrizations; however, the results presented here all use a horizontal resolution of N216 (60 km in mid-latitudes). The atmosphere and ocean horizontal and vertical resolutions presented here are an increase on HadGEM2-AO (which uses an N96 (135 km) L38 atmosphere and 1° L40 ocean) but the same as GloSea5-GA3.

Description of coupling
The atmosphere (UM) and land surface (JULES, the Joint UK Land Environment Simulator; Best et al., 2011) models run on the same grid and as part of the same model executable so can be considered to be "tightly coupled", passing data where necessary by subroutine arguments or shared data arrays. Similarly the ocean (NEMO (Nucleus for European Modelling of the Ocean); Madec, 2008) and sea-ice (CICE, Hunke and Lipscomb, 2004) models are compiled into a single executable and are "tightly coupled" on the same grid (with the caveat that CICE uses an Arakawa "B grid" placement of velocities in contrast to the "C grid" in NEMO).
Any relevant details of the UM-JULES and NEMO-CICE coupling are largely covered by Walters et al. (2015) and Megann et al. (2014) respectively, so here the focus is on the coupling of GA6.0/GL6.0 with GO5.0/GSI6.0 using the OASIS3 coupler (Valcke, 2013). As already mentioned, although the atmosphere (plus land surface) science can be run over a wide range of horizontal resolutions, this is not true for the ocean (plus sea-ice) configuration which is fixed at 0.25° (using the ORCA025 tri-polar grid; Madec, 2008). This means that GC2 coupled configurations are limited to those using an ORCA025 ocean. At present no resolution-dependent choices have been made in the details of the atmosphere-ocean coupling although this will not necessarily be true in all future GC configurations.
The coupled model infrastructure remains essentially unchanged from that described by Hewitt et al. (2011). The atmosphere and ocean models run concurrently with OASIS3 (now at version 3.0) handling the exchange and interpolation of model fields between the two executables. OASIS restart dumps are not used and so all relevant fields to initialize the component models at start-up are stored in their restart dumps. Given that OASIS fulfils a technical and (relatively) simple interpolation task it might be envisaged that the same coupled scientific configuration could be reproduced using an alternative coupler. This may theoretically be true but currently details of the way models are sequenced, along with interpolation options available, mean that OASIS3 (although not necessarily the specific code version) is considered to be part of the definition of GC2.
The momentum, freshwater and heat fluxes passed from the atmosphere via OASIS to the ocean are largely as described for "HadGEM3-AO r1.1" in Hewitt et al. (2011). To ensure energy conservation, the coupling part of the NEMO namelist is set to ensure that in most cases there are separate coupling fields received in NEMO as relevant to ocean (solar and non-solar heat fluxes; evaporation) or sea-ice (top and bottom conductive heat fluxes as calculated in the JULES land surface model; sublimation). These fields are converted to mean values over atmosphere grid boxes before being conservatively interpolated by OASIS, and once received by NEMO are applied to the ocean or sea-ice component as appropriate. Where necessary, CICE can pass any excess heat or freshwater fluxes back to NEMO; this may be required if the interpolation of coupling fields produces sea-ice fluxes in ocean grid boxes without sea-ice, or if the sea-ice melts either between coupling exchanges or within a CICE time step. The wind stress components provided from the atmosphere model are currently mean values which are assumed to apply equivalently to ocean and sea-ice.
There are a number of minor changes since the configuration described by Hewitt et al. (2011). Firstly the coupling period is now 3 h for most GC2 simulations to allow the diurnal cycle to be better resolved in both atmosphere and ocean boundary layers (the NWP simulations use hourly coupling; this is something we intend to unify across timescales in future configurations and will aid in reducing the inherent lag in ocean forcing fields as a result of running atmosphere and ocean components concurrently). To ensure conservation, coupling fields passed from atmosphere to ocean are still time-averages but now over a 3 h (1 h for NWP) rather than a 24 h period. In addition, a constant field representing iceberg calving is now added to the run-off field within the atmosphere model before passing to OASIS. There has also been a change to the solar radiation field passed from the atmosphere to allow the use of the RGB (red-green-blue) penetrative radiation scheme in GO5.0.
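The time-averaging of atmosphere-to-ocean coupling fields can be illustrated with a minimal sketch. It is not the UM or OASIS3 code: the class and function names are hypothetical, and the 900 s atmosphere time step in the usage note is an assumed value chosen only for illustration. The point is that passing the time mean over each coupling period delivers the same time-integrated flux to the ocean as the atmosphere computed.

```python
class CouplingAccumulator:
    """Accumulate a flux over model time steps and return its time mean."""

    def __init__(self):
        self.total = 0.0      # time-integrated flux (flux * seconds)
        self.elapsed = 0.0    # seconds accumulated so far

    def add(self, flux, dt_seconds):
        # Weight each value by the length of the time step it applies to.
        self.total += flux * dt_seconds
        self.elapsed += dt_seconds

    def mean_and_reset(self):
        mean = self.total / self.elapsed
        self.total = 0.0
        self.elapsed = 0.0
        return mean


def coupled_means(fluxes, dt_seconds, coupling_period_seconds):
    """Return one time-mean flux per completed coupling period."""
    steps_per_period = round(coupling_period_seconds / dt_seconds)
    acc = CouplingAccumulator()
    means = []
    for i, flux in enumerate(fluxes, start=1):
        acc.add(flux, dt_seconds)
        if i % steps_per_period == 0:
            means.append(acc.mean_and_reset())
    return means
```

For example, `coupled_means(fluxes, 900.0, 10800.0)` would return one mean per 3 h coupling period from a sequence of 15 min values, and `coupled_means(fluxes, 900.0, 3600.0)` would correspond to the hourly coupling used for NWP.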
Coupling fields (sea surface temperature, surface velocities, ice fraction, ice and snow thickness) passed from the ocean to the atmosphere are instantaneous fields, but again at the new coupling frequency. Consistent with the treatment of momentum fluxes described above, the surface velocities passed to the atmosphere model are simply mean ocean and sea-ice values, weighted according to ice fraction. Hewitt et al. (2011) described some of the choices made for the interpolation schemes for atmosphere to ocean and vice versa. These were made based on detailed assessment of regridding between the N96 atmosphere grid and the ORCA1 ocean grid and have not been re-examined for the higher-resolution ORCA025 grid (although N216-ORCA025 is a comparable resolution combination to N96-ORCA1, so similar conclusions are expected to be valid). Hence, with the exception of vector fields which all use bi-linear interpolation, atmosphere-to-ocean fields are regridded using first-order conservative interpolation (to avoid undershoots and overshoots for fields which must be positive everywhere) whereas second-order conservative interpolation is used for ocean-to-atmosphere fields.
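The two properties that motivate first-order conservative interpolation, preservation of the integral and the absence of overshoots, can be seen in a one-dimensional analogue. The sketch below is an illustration only and is not the SCRIP or OASIS3 algorithm, which works with overlap areas on the sphere; the function name is hypothetical and both grids are assumed to span the same interval.

```python
def conservative_remap_1d(src_edges, src_values, dst_edges):
    """First-order conservative remapping between two 1-D grids.

    Each destination cell receives the overlap-weighted mean of the source
    cells it intersects, so the integral of the field is preserved and the
    result never leaves the range of the source values.
    """
    dst_values = []
    for j in range(len(dst_edges) - 1):
        d_lo, d_hi = dst_edges[j], dst_edges[j + 1]
        integral = 0.0
        for i, value in enumerate(src_values):
            s_lo, s_hi = src_edges[i], src_edges[i + 1]
            # Length of the intersection of the two cells (negative if none).
            overlap = min(d_hi, s_hi) - max(d_lo, s_lo)
            if overlap > 0.0:
                integral += value * overlap
        dst_values.append(integral / (d_hi - d_lo))
    return dst_values
```

Because every destination value is a weighted mean with non-negative weights, a field that is non-negative on the source grid stays non-negative on the destination grid. A second-order scheme adds a gradient term, which improves smoothness but can produce values outside the source range.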
For long climate integrations, energy and freshwater budgets are clearly critical and so conservation of both heat and freshwater across the coupler has been checked in the GC2 configurations and found to be accurate to within around 10⁻⁴ W m⁻² (equivalent top of the atmosphere flux) and 10⁻⁵ Sv respectively. These numbers are smaller than the internal conservation errors of some of the individual model components and are therefore not viewed as significant.
Although OASIS3 has the capability of generating interpolation weights at run-time, we continue to calculate these weights off-line using SCRIP (Jones, 1999). This is much more efficient, traceable and also allows some minor adjustments to be made where weights are otherwise calculated incorrectly due to complications caused by the north fold of the tri-polar ocean grid. The method for coupling the ocean component with the UM atmosphere is such that the ocean grid determines the coastline (with land fractions in all grid boxes as either 0 or 1) but the atmosphere model then uses "coastal tiling" allowing the grid box land fractions around the coast to take a value between 0 and 1 (calculated by interpolating the ocean land-sea mask onto the atmosphere grid). A consequence of the way the atmosphere deals with ocean information on these fractional land grid boxes is that when ocean fields are regridded to the atmosphere the OASIS3 "FRACAREA" option is used (rather than the standard "DESTAREA"). Equivalently when checking conservation for atmosphere to ocean fluxes, the atmosphere fields on coastal points must be multiplied by the land fraction.
The technical details of model set-up are dependent on the machine architecture being used, but typically when running with several hundred processors for both atmosphere and ocean components (e.g. on the IBM Power7 machine), the "pseudo-parallel" capability of OASIS3 is used such that the various coupling fields are typically distributed between eight OASIS3 processes in order to reduce elapsed time for coupling exchanges. This has been shown to provide satisfactory performance without the coupling being a significant overhead on model run time. Given that atmosphere and ocean in GC2 run concurrently it is necessary though to ensure that the model is well "load-balanced" to minimize time when processors are standing idle. On 36 nodes of the Met Office IBM Power7 machine, HadGEM3-GC2 at N216-ORCA025 achieves 1.87 simulated years per wall clock day. Of the 36 nodes, 17 (544 processors) are used by the atmosphere, 18.75 (600 processors) by the ocean and the remaining 8 processors by OASIS.
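The processor allocation quoted above can be checked with a few lines of arithmetic. In the sketch below, the value of 32 processors per node is inferred from the statement that 17 nodes hold 544 processors and is not stated directly in the text; the variable and function names are illustrative.

```python
CORES_PER_NODE = 32  # inferred: 544 processors / 17 nodes

# Processors assigned to each executable, as quoted in the text.
allocation = {"atmosphere": 544, "ocean": 600, "OASIS": 8}


def nodes_used(processors, cores_per_node=CORES_PER_NODE):
    """Convert a processor count into (possibly fractional) nodes."""
    return processors / cores_per_node


total_nodes = sum(nodes_used(p) for p in allocation.values())


def wall_clock_days(simulated_years, years_per_day=1.87):
    """Elapsed days needed at the quoted throughput."""
    return simulated_years / years_per_day
```

With these numbers the three components account for all 36 nodes, and a 100-year simulation would take a little over 53 wall clock days at the quoted throughput, ignoring queueing and restarts.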
CLIM is a 100-year free-running simulation with forcings set to use values from the year 2000 (this is the same as experiment 2 in the Coupled Model Intercomparison Project 3, CMIP3). Where appropriate (e.g. for aerosol emissions), these forcings vary through the annual cycle. The ocean is initialized from EN3 climatology (Ingleby and Huddleston, 2007). The top-of-atmosphere radiative imbalance in a parallel atmosphere-only simulation is 0.8 W m⁻², consistent with using present-day forcings, hence a small drift due to the net energy flux would be expected. Average results from the final 50 years of the simulation are shown unless otherwise stated. For those variability diagnostics using high temporal resolution (e.g. daily) data, the final 20 years of the simulation are used. The largest global-mean ocean temperature drift over the 100-year simulation occurs at a depth of 563 m with a rate of 0.11 K decade⁻¹. Over the final 50 years this reduces to 0.08 K decade⁻¹. Below 1000 m the average drift is less than 0.02 K decade⁻¹ at all depths.
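A drift rate in K per decade can be estimated from an annual-mean time series at a given depth. The text does not say how the quoted rates were computed; the sketch below assumes a least-squares linear fit, and the function name is hypothetical.

```python
import numpy as np


def drift_per_decade(years, temperature):
    """Least-squares linear trend of a time series, in units per decade.

    years:       1-D array of times in years
    temperature: 1-D array of global-mean temperature at one depth (K)
    """
    slope_per_year, _intercept = np.polyfit(years, temperature, deg=1)
    return 10.0 * slope_per_year
```

Applying the same function to the last 50 entries of the arrays would give the drift over the final 50 years of a 100-year run.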
Results presented from SEAS are a mean of seasonal hindcasts for the years 1996-2009, each of 140 days in length. Within each year, there are three DJF hindcasts (initialized on 25 October, 1 November and 9 November) and three JJA hindcasts (initialized on 25 April, 1 May and 9 May). Each start date has a three-member initial condition ensemble, resulting in 120 hindcasts being averaged for each of DJF and JJA. The ocean and sea-ice are initialized from Met Office Ocean Forecast analyses, the atmosphere from ECMWF analyses and soil moisture from a climatology of the land surface model used within GC2 forced with ECMWF analyses. More details on the initialization can be found in MacLachlan et al. (2015).
The NWP experiment comprises 15-day hindcasts, run daily at 12:00 UTC for the period 2-14 December 2011. The atmosphere and land surface are initialized from Met Office NWP analyses, and ocean from Met Office Ocean Forecast analyses. Both NWP and SEAS use prescribed aerosol concentrations from a HadGEM2-AO AMIP (Atmosphere Model Intercomparison Project) simulation, but with direct and indirect effects being calculated interactively as for CLIM (Walters et al., 2015).

GC2 mean biases
HadGEM2-AO is characterized by a cold SST bias over much of the world, especially in the North Atlantic, with a slight warm bias over the Southern Ocean and Southern Hemisphere stratocumulus regions (Fig. 1). The change to the new NEMO ocean model and higher ocean resolution has resulted in GC2 SSTs being generally warmer, which is beneficial over most regions, but detrimental over the Southern Ocean.
A considerable amount of work is ongoing to investigate the Southern Ocean warm bias in the Met Office model (e.g. Bodas-Salcedo et al., 2012). To first order, the surface flux biases are similar in AMIP simulations parallel to HadGEM2-AO and HadGEM3-GC2 (Fig. 2a), both having a large downwards surface flux bias over the Southern Ocean which is only slightly worse in HadGEM3-GC2. However, the coupled SST (and upper ocean heat content) biases are much larger in HadGEM3-GC2 than HadGEM2-AO. This appears to be related to changes to both the lateral and vertical ocean heat transports associated with the change in ocean model and ocean resolution. The HadGEM3-GC2 errors also include a contribution associated with too shallow Southern Ocean summer mixed layers. A detailed analysis of this problem is currently underway and will be documented separately, although it is believed that the primary problem is the atmospheric heat flux biases (e.g. Trenberth and Fasullo, 2010; Williams et al., 2013; Bodas-Salcedo et al., 2014). In GC2 both excess downward SW flux and too little upward heat transport from turbulent fluxes are thought to contribute to the net heat flux bias and these are the focus of our efforts to improve future configurations.
The warm bias over the Southern Ocean can be seen early in NWP simulations for the austral summer (Fig. 3). Hindcasts for the austral winter with an earlier configuration did not show such a rapid warming (not shown), providing further evidence that fast atmosphere processes are contributing, and that biases in the SW flux are likely to be significant.
The increase in SSTs in HadGEM3-GC2 is particularly notable over the North Atlantic to the south of Greenland where, in common with many climate models, HadGEM2-AO has a very large cold bias (Fig. 1). Here, the higher horizontal resolution of the ocean model leads to a significantly improved Gulf Stream extension, accurately reproducing the northward turn around Newfoundland. Scaife et al. (2011) have shown the importance of this SST improvement for European climate variability. On seasonal timescales, these relatively small SST biases in the North Atlantic have been further improved between GloSea5-GA3 and GloSea5-GC2 by introducing aerosol indirect effects from the aerosol climatological concentrations, consistent with climate model simulations, rather than using fixed droplet concentrations for land and sea. As a result, the JJA hindcasts in particular have improved to now match the observed seasonal cycle very well (Fig. 4).
The cold SST bias in HadGEM2-AO impacted the atmosphere with DJF biases of over 6 K in the boundary layer at high latitudes, but also biases of 2 K extending through much of the troposphere in the Northern Hemisphere (Fig. 5). The improved SSTs in HadGEM3-GC2 mean that these biases have reduced, although the troposphere remains slightly cool. An exception is over the Southern Ocean where the warm bias has increased. HadGEM2-AO has a warm stratosphere with a 5 K temperature bias in the tropics at 70 hPa. This warm bias is of concern when developing ESMs with interactive chemistry as these processes are particularly sensitive to the stratospheric temperature and humidity. The stratospheric specific humidity is largely determined by the cold point temperature (Brewer, 1949), hence minimizing this bias is particularly important. It can be seen that this tropical tropopause warm bias has been considerably improved in HadGEM3-GC2 through a combination of many parametrization changes, changes to the dynamics and increased vertical resolution.
In common with many climate models (e.g. Klein et al., 2006; Ma et al., 2014), HadGEM2-AO develops a warm bias over mid-latitude continents in summer (Fig. 6). This bias is over 6 K in atmosphere-only simulations, but is mitigated in coupled simulations through the cold Northern Hemisphere SST bias in HadGEM2-AO. The summer warm bias is reduced in HadGEM3-GC2 through developments to the land surface (Martin et al., 2010) and parametrization improvements (including increasing the frequency of radiation calls from 3 hourly to hourly). Changes elsewhere are generally small, although a cold bias is now starting to develop at high latitudes, consistent with the troposphere remaining a little cold. Overall, the area-weighted root mean square (RMS) error for the field is reduced from 2.02 to 1.55.
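An area-weighted RMS error on a regular latitude-longitude grid is commonly computed with weights proportional to the cosine of latitude, since grid boxes shrink towards the poles. The sketch below assumes that convention; the exact weighting and any land or ocean masking used for the quoted values are not given in the text, and the function name is hypothetical.

```python
import numpy as np


def area_weighted_rmse(model, obs, lat_deg):
    """RMS difference on a regular lat-lon grid, weighted by cos(latitude).

    model, obs: arrays of shape (nlat, nlon)
    lat_deg:    array of shape (nlat,) with grid-box centre latitudes
    """
    weights = np.cos(np.deg2rad(lat_deg))[:, np.newaxis]
    weights = np.broadcast_to(weights, model.shape)
    squared_error = (model - obs) ** 2
    return float(np.sqrt(np.sum(weights * squared_error) / np.sum(weights)))
```

With this weighting, a 3 K error confined to a high-latitude row contributes less to the global score than the same error along the equator.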
The accurate simulation of sea-ice extent and thickness in the present-day climate is of importance for the estimation of climate sensitivity (Hall and Qu, 2006), projections of when the Arctic will be ice-free in summer, and seasonal forecasts of sea-ice extent (e.g. to inform the use of Arctic routes by shipping). For GC2, the sea-ice parameters in the model were tuned within the range of observational uncertainty (Rae et al., 2015), and the sea-ice simulation generally does a reasonable job of simulating the annual cycles of Arctic sea-ice extent and volume (Fig. 7). The Arctic sea-ice volume simulation is most accurate at N216 resolution, where throughout the year the ice is around 20 % thicker than at N96, on a spatial average. The warm SST bias over the Southern Ocean results in there being far too little Antarctic sea-ice, which further exacerbates the bias.
The mean value of the Atlantic meridional overturning circulation (MOC) at 26° N in HadGEM3-GC2 (over years 11-50) is 16.4 Sv. This is close to the observed value from the RAPID array, particularly since a downward trend has been observed since 2004 (average observed value 2004-2012 is 17.5 Sv; Smeed et al., 2013), and is an improvement over HadGEM2-AO in which the MOC was around 15.1 Sv. However, the depth of the North Atlantic Deep Water return flow remains too shallow. This is a common bias in z-level models which tend not to simulate overflows well as there is excess entrainment. The maximum ocean heat transport remains similar to HadGEM2-AO, being below 1 PW and therefore low compared with observational estimates.
Turning to precipitation, the general structure of the mean biases remains similar to HadGEM2-AO, with a southward displaced Inter-Tropical Convergence Zone (ITCZ) over the Atlantic and Indian oceans (Fig. 8). This is consistent with the asymmetry in the SST bias (Kang et al., 2008), and results in a lack of precipitation in the summer monsoon systems over West Africa and India. However, there are other processes contributing to reduced precipitation over West Africa and India since a dry bias exists here in AMIP simulations whereas the southward displacement of the ITCZ over the oceans does not. Unlike a number of CMIP5 models (Taylor et al., 2012), there is no pronounced split ITCZ in either HadGEM2-AO or HadGEM3-GC2, although a significant wet bias exists on the north side of the warm pool, Pacific ITCZ and South Pacific Convergence Zone (SPCZ). Whilst the geographical pattern is similar, mid-latitude precipitation biases are slightly reduced in HadGEM3-GC2, particularly the dry bias over the northern North Atlantic. The RMS error is slightly increased (from 1.68 to 1.76), primarily due to an increased mean bias over the tropical West Pacific and East Indian Ocean.
The accurate simulation of clouds is a particular strength of HadGEM2-AO (e.g. Klein et al., 2013; Jiang et al., 2012; Nam et al., 2012). GC2 includes a new prognostic cloud scheme (PC2, Wilson et al., 2008) which gives similar or even slightly improved cloud properties (amount, height and albedo) for optically thicker clouds, including a good simulation of cloud in marine stratocumulus regions which are of particular importance for climate sensitivity (e.g. Bony et al., 2006) (not shown). The main difference in the cloud simulation between the two models is for optically thin cirrus. The Smith (1990) scheme used in HadGEM2-AO had an implicit coupling between cloud fraction and optical depth, preventing high fractional coverage of very thin cirrus. This coupling does not exist in PC2 and consequently HadGEM3-GC2 has almost double the amount of cirrus, much of which is sub-visual. CALIPSO (Cloud-Aerosol Lidar and Infrared Pathfinder Satellite Observations) is a space-borne cloud lidar and is particularly suited to detecting optically thin "sub-visual" cirrus, in addition to thicker cirrus which can be detected with passive instruments (Winker et al., 2010; Chepfer et al., 2010). A comparison of the models with CALIPSO using the CFMIP (Cloud Feedback Model Intercomparison Project) Observational Simulator Package (COSP) (Bodas-Salcedo et al., 2011; Chepfer et al., 2008) suggests that the true amount of cirrus should be between what is simulated by the two models (Fig. 9). HadGEM2-AO also has a bias for the tropical cirrus to be too low in altitude. The altitude of the peak amount of cirrus has been improved slightly in HadGEM3-GC2, but still remains lower than observed.

Mid-latitude variability
One of the most recent changes in the development of GC2 was the inclusion of a significant revision to the atmosphere dynamical core, ENDGame (Even Newer Dynamics for Global Atmospheric Modelling of the Environment, Wood et al., 2014). Walters et al. (2015) discuss how one of the main aims of this change was to improve the accuracy of the semi-Lagrangian dynamics, so less implicit smoothing is used, with the effect of increasing synoptic variability. A weakening of synoptic variability, measured through the intensity of tracked extra-tropical cyclones, has been shown to be a general problem in many NWP models (Froude, 2010) and climate models in CMIP5 (Zappa et al., 2013). Walters et al. (2015) illustrate the improvement in cyclone intensities in GA6 on NWP timescales, and this carries through to the GC2 CLIM experiment (Fig. 10). In HadGEM2-AO there is a general negative bias in cyclone intensity (as measured by 850 hPa relative vorticity) in the storm tracks in both hemispheres, especially on the equatorward side of the storm tracks. This bias is largely eliminated in HadGEM3-GC2, although there remains a slight deficit in intensity at the end of the Atlantic storm track and the eastern hemisphere of the Southern Ocean storm track.
The winter jet over the North Atlantic follows a tri-modal structure (Fig. 11; Woollings et al., 2010). These jet positions have been shown to have some correspondence to primary blocking locations with the southernmost jet position being associated with Greenland blocking, the central position with no blocking and the northernmost position being correlated with European blocking (although some studies suggest European blocking can exist as a separate regime) (Cassou et al., 2004; Woollings et al., 2010; Davini et al., 2014). GC2 reproduces the tri-modal structure well although there is a slight tendency for the jet to occupy the northernmost position too frequently (Fig. 11). At other times of the year, the jet is more uni-modal, which GC2 captures, although again there tends to be a slight northward displacement (not shown). These results are robust in that a very similar structure is seen in different atmosphere resolutions, including when the model is run at the lower atmospheric resolution of N96. The tendency to have a more favoured northward position of the jet is in contrast to most CMIP3 and CMIP5 models which either do not capture the tri-modal structure at all, or tend to have the southern jet location simulated too frequently (Hannachi et al., 2013; Anstey et al., 2012). Davini et al. (2014) indicate that a North Atlantic cold bias, resulting from a weak and displaced North Atlantic Drift, can result in the simulated jet favouring the southern position. The improved SSTs in HadGEM3-GC2 relative to other models may account for the more frequent simulation of the northern location. Despite the good distribution of jet latitudes, there is still around a 25 % deficit in European blocking in HadGEM3-GC2 (defined using a 2-D "wave-breaking" index based on Tibaldi and Molteni, 1990; not shown), which is again insensitive to horizontal resolution and is the subject of ongoing research.
The North Atlantic Oscillation (NAO) is a leading mode of variability affecting Europe. Seasonal forecasts of the winter NAO are now starting to show skill (Scaife et al., 2014), hence an accurate simulation of the NAO in a proposed replacement model is important. Table 1 illustrates a further improvement in the pattern correlation and slight improvement in variability of the winter NAO in GC2 relative to the model currently used for seasonal forecasts. Some studies have suggested a link between the Quasi-Biennial Oscillation (QBO) and the winter NAO with a more positive NAO during westerly QBO events (Holton and Tan, 1980; Anstey and Shepherd, 2014). Figure 12 shows a composite of westerly minus easterly QBO events for the Arctic stratospheric vortex and pressure at mean sea-level (PMSL) in the GC2 coupled control simulation. It can be seen that this relationship does exist in GC2, albeit somewhat weaker than observed and with some displacement of the PMSL pattern in the Euro-Atlantic sector. The simulation of this teleconnection is noteworthy given the potential implications for seasonal predictability, and is a relationship which is not present in all CMIP5 models.
Table 1. Spatial correlation of the NAO pattern (the leading empirical orthogonal function of the winter mean sea-level pressure fields) and interannual variability in the CLIM experiment compared with NCEP (National Centers for Environmental Prediction) re-analyses (Kalnay et al., 1996). NCEP re-analyses are used here rather than ERA-I (ECMWF Interim Re-analyses; Dee et al., 2011) which are used throughout the rest of the paper since a longer record is needed to estimate the observed variability.
(c, d). White contours show significance at the 90 % level using a two-sided t-test.
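The NAO diagnostic summarized in Table 1 rests on two standard operations: extracting the leading empirical orthogonal function (EOF) of winter-mean sea-level pressure and computing a spatial correlation between the model and re-analysis patterns. The sketch below shows one common way of doing both with a singular value decomposition. The function names, the sqrt(cos(latitude)) weighting and the centred correlation are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def leading_eof(field, lat_deg):
    """Leading EOF of a field with shape (ntime, nlat, nlon).

    Returns the spatial pattern (nlat, nlon) and the fraction of variance
    it explains. Anomalies are weighted by sqrt(cos(latitude)), so the
    latitudes must not include the poles.
    """
    ntime, nlat, nlon = field.shape
    anomalies = field - field.mean(axis=0)
    w = np.sqrt(np.cos(np.deg2rad(lat_deg)))[np.newaxis, :, np.newaxis]
    matrix = (anomalies * w).reshape(ntime, nlat * nlon)
    _u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    # Remove the weighting so the pattern is in the units of the field.
    pattern = vt[0].reshape(nlat, nlon) / w[0]
    explained = s[0] ** 2 / np.sum(s ** 2)
    return pattern, float(explained)


def pattern_correlation(a, b, lat_deg):
    """Centred, area-weighted spatial correlation of two (nlat, nlon) maps."""
    w = np.broadcast_to(np.cos(np.deg2rad(lat_deg))[:, np.newaxis], a.shape)
    a_anom = a - np.average(a, weights=w)
    b_anom = b - np.average(b, weights=w)
    cov = np.sum(w * a_anom * b_anom)
    return float(cov / np.sqrt(np.sum(w * a_anom**2) * np.sum(w * b_anom**2)))
```

The sign of an EOF is arbitrary, so in practice the model pattern is flipped if necessary to match the sign convention of the re-analysis pattern before the correlation is quoted.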

Tropical variability
A significant improvement in GA6 relative to earlier configurations is in the simulation of tropical cyclones (Walters et al., 2015). Both the change to the ENDGame dynamical core and convection parametrization changes have contributed, resulting in more intense tropical cyclones and improved tracks. At N216 resolution (around 100 km in the tropics), GC2 is now able to simulate tropical cyclone central pressures which would be expected from category 3 storms (Fig. 13). However, the 10 m wind speed associated with these systems remains below category 1, suggesting that the storms are too large. It might be expected that increased horizontal resolution would improve the pressure-wind-speed relationship, however even increasing the resolution to N1024 (around 20 km in the tropics) in an atmosphere-only simulation only partially improves the bias (Fig. 13) (Roberts et al., 2015). Seasonal forecasts of landfalling Atlantic hurricanes are an emerging product which relies on accurate tropical cyclone track distributions in the basin. Figure 14 shows the track densities from the SEAS experiment for the existing seasonal forecast system and GC2. In the Atlantic basin, GloSea5-GC2 has more storms than its predecessor (GloSea5-GA3) and has a better distribution of tracks with more early recurvature into the central North Atlantic, more reaching the Caribbean and a broad peak making landfall on the US coast. The West Pacific has slightly too many storms and there is a lack of a clear break in storms in the central Pacific.
There have been a large number of changes to the convection parametrization between HadGEM2-AO and HadGEM3-GC2. Of these, increases to the entrainment and detrainment rates have been primarily responsible for an improved simulation of the Madden-Julian Oscillation (MJO), although the amplitude of the systems remains significantly weaker than observed. Model performance is measured using simplified metrics proposed by the international MJO Task Force (which is under the WMO Working Group on Numerical Experimentation, WGNE) (Kim et al., 2009). One simple measure of the MJO is based on the space-time power spectrum of equatorial rainfall. The ratio of eastward to westward power (E/W ratio) at MJO time and space scales (zonal wavenumbers 1-3 and periods of 30-60 days) reveals the prominence of the eastward propagating intraseasonal variability relative to its westward counterpart and is a useful indicator of how prominent the MJO is relative to the background variability (Kim et al., 2014). On this metric, HadGEM3-GC2 has improved relative to HadGEM2-AO, although it remains below the observational range (Table 2). Another measure of MJO fidelity is Rmax, proposed by Sperber and Kim (2012), which is the maximum correlation between the two time series obtained by projecting model outgoing long-wave radiation (OLR) anomalies onto the leading pair of empirical orthogonal functions (EOFs) of observed OLR that capture the MJO. The MJO is deemed well simulated if the correlation between the two leading principal components (PCs) is strong at a lead time of about 10-15 days, thereby demonstrating coherent eastward propagation with appropriate spatiotemporal structure. Again, HadGEM3-GC2 performs better than HadGEM2-AO (Table 2).
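The E/W ratio described above can be sketched as follows. This is a simplified illustration, not the MJO Task Force diagnostic code: it assumes a (time, longitude) array of equatorially averaged rainfall on a complete, evenly spaced longitude circle, and it applies a single two-dimensional Fourier transform to the whole record, without the segmenting, tapering or seasonal selection that a full diagnostic would use. The function name and arguments are hypothetical.

```python
import numpy as np


def east_west_power_ratio(rain, dt_days=1.0, k_range=(1, 3),
                          period_range=(30.0, 60.0)):
    """Ratio of eastward to westward spectral power in the MJO band.

    rain : array (time, longitude), longitude increasing eastward and
           covering the full circle with even spacing (assumption).
    """
    rain = np.asarray(rain, dtype=float)
    nt, nx = rain.shape

    # Remove the time mean at each longitude before transforming.
    anomalies = rain - rain.mean(axis=0, keepdims=True)
    power = np.abs(np.fft.fft2(anomalies)) ** 2

    freq = np.fft.fftfreq(nt, d=dt_days)           # cycles per day
    wavenum = np.rint(np.fft.fftfreq(nx) * nx)     # integer zonal wavenumber
    f2d, k2d = np.meshgrid(freq, wavenum, indexing="ij")

    in_band = (
        (np.abs(k2d) >= k_range[0]) & (np.abs(k2d) <= k_range[1])
        & (np.abs(f2d) >= 1.0 / period_range[1])
        & (np.abs(f2d) <= 1.0 / period_range[0])
    )

    # With NumPy's FFT sign convention, a wave cos(k*x - w*t) moving
    # towards increasing longitude appears where frequency and
    # wavenumber have opposite signs; westward waves have equal signs.
    eastward = in_band & (f2d * k2d < 0)
    westward = in_band & (f2d * k2d > 0)

    return power[eastward].sum() / power[westward].sum()
```

A ratio well above 1 indicates that eastward-propagating variability dominates at these scales.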
Previous studies have suggested that atmosphere-ocean coupling is important for the propagation of the MJO (Klingaman and Woolnough, 2014). Relatively clear skies ahead of the MJO result in higher SSTs, which encourage propagation of the MJO, which in turn cools the SSTs through the high cloud amounts and precipitation as the system passes. This is seen in the NWP experiment using a configuration similar to GC2 (Shelly et al., 2014). The coupled simulation maintains the observed lag of about 5 days of the outgoing long-wave radiation (OLR) anomaly behind the maximum SSTs, whereas in parallel atmosphere-only simulations the convection moves over the maximum SSTs within the first few days and then remains static, since the SSTs are not updated with the cooling effect of the cloud. This is one example of why it is desirable to move to coupled weather forecast models for even relatively short-range predictions. Similar coupled model feedbacks might be expected to impact forecast tropical cyclone intensities (Schade and Emanuel, 1999).
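A lead-lag relationship of this kind can be quantified with a lagged correlation. The sketch below is illustrative only: it assumes two evenly sampled daily time series at a single location (for example the SST anomaly and the negative of the OLR anomaly, so that enhanced convection is positive), and the function name and search window are hypothetical rather than taken from Shelly et al. (2014).

```python
import numpy as np


def lag_of_max_correlation(leader, follower, max_lag):
    """Return (lag, correlation) at which `follower` best matches `leader`.

    A positive lag means `follower` trails `leader` by that many samples.
    Both series must have the same length and sampling interval.
    """
    leader = np.asarray(leader, dtype=float)
    follower = np.asarray(follower, dtype=float)
    n = leader.size
    if follower.size != n:
        raise ValueError("series must have the same length")

    best_lag, best_r = 0, -np.inf
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = leader[:n - lag], follower[lag:]
        else:
            a, b = leader[-lag:], follower[:n + lag]
        r = np.corrcoef(a, b)[0, 1]   # Pearson correlation of the overlap
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag, best_r
```

Applied to coupled and atmosphere-only output in turn, a lag near 5 days in the first case and near zero in the second would correspond to the behaviour described above.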
A reliable simulation of the El Niño Southern Oscillation (ENSO) is important for seasonal prediction and climate projections alike, since it forms a leading mode of global variability and the major source of seasonal predictability. For many years, Met Office climate models have suffered from excess equatorial easterly wind stress. Improvement in this was a focus of HadGEM2-AO development and it has been further improved in GC2, with contributions from a number of the science changes, most notably a change to the gravity wave drag scheme which reduces the coupling between the low-level flow-blocking drag and gravity wave drag following Vosper et al. (2009) (Fig. 15b). As a result of the improved wind stress, the improved MJO (which can be the source of westerly wind bursts; e.g. Lengaigne et al., 2004) and the higher horizontal resolution of the ocean, ENSO is well simulated in HadGEM3-GC2 with a good spatial pattern (Fig. 15c and d). When assessed against a range of metrics (Table 3), we see that variability in SST agrees well with observations in the central East Pacific, although it is somewhat weaker than observed near the dateline. A power spectrum analysis shows that the frequency lies within the observed range (3 to 7 years), with no dominant short (e.g. 2 year) or longer period peaks. The model seasonality is good, with maximum (minimum) variability in boreal winter (spring). The standard deviation of precipitation in the central Pacific gives a measure of model capability for regional climate impacts and, although slightly underestimated, is good in comparison with other climate models, which tend to underestimate this quantity. Overall, HadGEM3-GC2 compares favourably with a range of CMIP5 models (Bellenger et al., 2014). The main observed ENSO teleconnections to remote precipitation anomalies (South America, the Sahel, India, East Africa, etc.) are also present in the model (not shown).
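The seasonality of ENSO variability can be summarised by a single number, the ratio of the November to January standard deviation of the Niño3 SST anomaly to the March to May standard deviation (Bellenger et al., 2014). The following sketch shows one way to compute such a ratio. It assumes a monthly Niño3 SST series that starts in January and spans whole years, removes the mean seasonal cycle, and pools the three calendar months of each season; the function name and these processing choices are assumptions and may differ from the procedure used for Table 3.

```python
import numpy as np


def enso_seasonality_ratio(nino3_sst):
    """Ratio of Nov-Jan to Mar-May standard deviation of SST anomalies.

    nino3_sst : 1-D monthly series starting in January, with a length
                that is a multiple of 12 (assumption of this sketch).
    """
    sst = np.asarray(nino3_sst, dtype=float)
    if sst.size % 12 != 0:
        raise ValueError("series must cover whole calendar years")

    by_year = sst.reshape(-1, 12)               # (year, calendar month)
    anomalies = by_year - by_year.mean(axis=0)  # remove mean seasonal cycle

    # Pool the calendar months of each season across all years. Note that
    # this pools by calendar month and does not build contiguous Nov-Jan
    # seasons spanning two calendar years.
    ndj = anomalies[:, [10, 11, 0]]
    mam = anomalies[:, [2, 3, 4]]
    return ndj.std() / mam.std()
```

A value greater than 1 indicates stronger variability in boreal winter than in spring, as described above.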
Africa is a region where accurate predictions and projections of rainfall are particularly important for those living there, yet model simulations have generally been poor (Flato et al., 2013). Teleconnections from remote SST anomalies are primarily responsible for the large interannual variability of seasonal mean rainfall over many areas of the continent. Rowell (2013) investigated the ability of CMIP models to accurately represent teleconnections from remote SST anomalies to African rainfall. Figure 16 is a reproduction of Fig. 10 from Rowell (2013) but with HadGEM3-GC2 added. It shows that HadGEM3-GC2 has the joint highest proportion of the teleconnections accurately simulated compared with the CMIP3 and CMIP5 models previously analysed, supporting the use of GC2 for seasonal predictions and climate change projections over the region.

Summary
In this paper we have presented the performance of the GC2 configuration of the Met Office Unified Model in terms of its systematic errors. The focus has been on seasonal and climate timescales since these are the timescales on which the model is to be used operationally for predictions/projections, and GC2 has been compared with models currently used in those systems. The results presented here should be considered alongside the atmosphere-only results in Walters et al. (2015) and the ocean/sea-ice results in Megann et al. (2014) and Rae et al. (2015).
Overall, GC2 provides a significant improvement in mean bias and variability over the coupled configurations currently used, with improvements in temperature biases in most regions, the simulation of atmospheric regimes over the North Atlantic, and the simulation of tropical cyclones and ENSO being particularly notable. However, there are a number of systematic errors requiring further work, the highest priority being the Southern Ocean warm SST bias and low levels of rainfall over India and West Africa during the summer monsoons. Consequently, caution is required when considering predictions/projections from GC2 in these regions.
Climate change simulations using HadGEM3-GC2 are already in progress and will be reported by Senior et al. (2015), whilst GloSea5-GC2 has been used operationally for Met Office seasonal forecasts since 3 February 2015. Consistent with the annual development discussed in the Introduction, work is already underway to further develop GC2 to form GC3. It is envisaged that GC3 will subsequently have Earth system components built on top of it to form UKESM1 (United Kingdom Earth System Model version 1), which will be the UK's submission to CMIP6. Hence, the assessment presented here provides an initial picture of the model performance which will help inform the next stages of UKESM1 development.

[Table caption, beginning missing] … W, 5° N-5° S) (K), M3 is the ratio of power in the 3-7 year range relative to 0 to 10 years for monthly Niño3 SST anomaly (%), M4 is a seasonality metric defined as the ratio of November to January and March to May standard deviation of Niño3 SST anomaly (Bellenger et al., 2014), M5 is the standard deviation of precipitation anomaly for Niño4 (mm day⁻¹). CLIM is the final 50 years of experiment CLIM, CLIM2 is the final 100 years of a 150-year experiment equivalent to CLIM differing only in the initial conditions. SST observations are HadISST (1901-2000) and precipitation is GPCP (1979-2013).

Figure 2. (a) Zonal mean net downward surface flux bias in AMIP simulations. The observed surface flux is a developmental version of the University of Reading surface flux product which combines satellite-based radiative fluxes (Allan et al., 2014) with re-analysis estimates of atmospheric column energy storage and horizontal divergences (Liu et al., 2015). (b) Zonal mean SST bias against EN3 for years 5-15 of the coupled model CLIM experiment (before any biases from the deep ocean influence the SSTs).

Figure 3. Mean day 3 and day 15 SST bias (K) against analyses in the NWP experiment.

Figure 9. Tropical-mean (20° N-20° S) vertical profile of cloud frequency in atmosphere-only (AMIP) simulations of HadGEM2-AO and HadGEM3-GC2 using the CALIPSO simulator from COSP. The observed profile from CALIPSO is shown in black.

Figure 10. Bias in winter tracked cyclone intensity, as measured by 850 hPa relative vorticity, relative to ERA-I. Tracking is on the final 20 years of the respective model simulations and for the period 1988-2008 for ERA-I using TRACK (Hodges, 1995). Contours show ERA-I values of relative vorticity at 10⁻⁵ s⁻¹ and the colours indicate the bias on the same scale (contours at 10⁻⁵ s⁻¹ intervals) for the CLIM experiment.

Figure 11. Normalized frequency distribution of DJF jet latitude defined as the maximum 850 hPa wind over the North Atlantic 60° W-0° E (following Woollings et al., 2010) for ERA-I 1979-2012 and the final 20 years of HadGEM3-GC2 CLIM experiment.

Figure 16. Proportion of teleconnections to Africa in each of the five skill categories (details in Rowell, 2013). Green: at least reasonable model skill; yellow: marginal skill; pale brown: moderate and significant difference between model and observed teleconnection strength; dark brown and red: poor or very poor skill; and white: SST-rainfall associations of little practical interest. CMIP3 and CMIP5 models, together with HadGEM3-GC2 CLIM experiment, are ranked by the number of teleconnections that do not differ significantly from those observed at the 10 % level.