Development and evaluation of a high-resolution reanalysis of the East Australian Current region using the Regional Ocean Modelling System (ROMS 3.4) and Incremental Strong-Constraint 4-Dimensional Variational data assimilation (IS4D-Var)

As with other western boundary currents globally, the East Australian Current (EAC) is inherently dynamic making it a challenge to model and predict. For the EAC region, we combine a high-resolution state-of-the-art numerical ocean model with a variety of traditional and newly available observations using an advanced variational data assimilation scheme. The numerical model is configured using the Regional Ocean Modelling System (ROMS 3.4) and takes boundary forcing from the 5 BlueLink ReANalysis (BRAN3). For the data assimilation we use an Incremental Strong-Constraint 4-Dimensional Variational (IS4D-Var) scheme. This paper describes the data assimilative model configuration that achieves an optimised minimisation of the difference between the modelled solution and the observations to give a dynamically-consistent ‘best-estimate’ of the ocean state over a 2-year period. The reanalysis is shown to represent both assimilated and non-assimilated observations well. It achieves mean spatially-averaged RMS residuals with the observations of 7cm for SSH and 0.4◦ C for SST over the assimilation 10 period. The time-mean RMS residual for subsurface temperature measured by Argo floats is a maximum of 1◦ C between water depths of 100-300m and smaller throughout the rest of the water column. Velocities at several offshore and continental shelf moorings are well represented in the reanalysis with complex correlations between 0.8-1 for all observations in the upper 500m. Surface radial velocities from a high-frequency radar array are assimilated and the reanalysis provides surface velocity estimates with complex correlations with observed velocities of 0.8-1 across the radar footprint. Comparison with independent 15 (non-assimilated) shipboard CTD cast observations shows a marked improvement in the representation of the subsurface ocean in the reanalysis, with the RMS residual in potential density reduced to about half of the residual with the free-running model in the upper eddy-influenced part of the water column. This shows that information is successfully propagated from observed variables to unobserved regions as the assimilation system uses the model dynamics to determine covariance, such that the ocean state better fits and is in balance with the observations. This is the first study to generate a reanalysis of the region at 20 such a high resolution, making use of an unprecedented observational data set and using an assimilation method that uses the time-evolving model physics to adjust the model in a dynamically consistent way. As such, the reanalysis potentially represents 1 Geosci. Model Dev. Discuss., doi:10.5194/gmd-2016-44, 2016 Manuscript under review for journal Geosci. Model Dev. Published: 3 March 2016 c © Author(s) 2016. CC-BY 3.0 License.

(non-assimilated) shipboard CTD cast observations shows a marked improvement in the representation of the subsurface ocean in the reanalysis, with the RMS residual in potential density reduced to about half of the residual with the free-running model in the upper eddy-influenced part of the water column. This shows that information is successfully propagated from observed variables to unobserved regions as the assimilation system uses the model dynamics to determine covariance, such that the ocean state better fits and is in balance with the observations. This is the first study to generate a reanalysis of the region at 20 such a high resolution, making use of an unprecedented observational data set and using an assimilation method that uses the time-evolving model physics to adjust the model in a dynamically consistent way. As such, the reanalysis potentially represents

Introduction
The East Australian Current (EAC) is the Western Boundary Current (WBC) of the South Pacific subtropical gyre, flowing poleward along the east coast of Australia. The EAC has the weakest mean flow of the WBCs associated with the subtropical 5 gyres (Mata et al., 2000) but its flow is characterised by high eddy variability (Mata et al., 2006) comparable with stronger WBCs such as the Gulf Stream, the Kuroshio and the Agulhas Current (e.g. Gordon et al. (1983); Feron (1995)). The EAC forms in the South Coral Sea (15)(16)(17)(18)(19)(20)(21)(22)(23)(24) • S) and intensifies as it flows along the coast of southeast Queensland and northern New South Wales (NSW) (22-35 • S) -refer to Figure 1 for geographical location. The current strengthens dramatically as the continental shelf narrows to 15 km at its narrowest point (31 • S) and typically separates from the coast between 31-33 • S 10 (Cetina Heredia et al., 2014). Turning eastward to form the Tasman Front, the current sheds large warm-and cold-core eddies in the Tasman Sea every 90-110 days (Oke and Middleton, 2000;Cetina Heredia et al., 2014). The high eddy variability makes the EAC a challenging system to observe and predict.
In general, the kinetic energy of the ocean is dominated by mesoscale eddies that fluctuate on time scales of days to months and on spatial scales of tens to 100s of kilometers and exceeds the mean flow by an order of magnitude or more (Stammer, 1997; 15 Ferrari and Wunsch, 2009). Eddies are typically generated due to barotropic or baroclinic instabilities which are unpredictable so effective state-estimation and prediction of the mesoscale circulation requires data assimilation techniques that combine ocean observations with a dynamical model. Much of the effort towards data assimilative modelling of the EAC region has been as part of the development Australia's Bluelink ocean data assimilation system (BODAS) (Oke et al., 2005(Oke et al., , 2008(Oke et al., , 2013 which uses an Ensemble Optimal Interpolation (EnOI) based scheme. The BODAS has been a useful and relatively efficient 20 method to provide ocean state estimates of the Australian region. EnOI uses long-run statistics to generate the covariance of model points and assimilates observations at a single time to generate adjusted initial conditions for each assimilation window.
In this work we use 4-dimensional variational data assimilation (4D-Var), which adjusts the model initial conditions, boundary and surface forcings such that the difference between the model solution of the time-evolving flow and all available observations is minimised over an assimilation interval. A major advantage of the 4D-Var scheme is that it uses the linearised model 25 equations and their adjoint to compute covariances, so the model is adjusted in a dynamically consistent way to minimise the difference between the observations and the modelled time-evolving ocean state. Using the linearised equations allows dynamical connections between state variables to propagate information from observed variables to unobserved, dynamically-linked variables without requiring ensemble or long-run statistics. The state estimate is a solution of the model equations, and the minimistaion process can be used to understand the sensitivity of the ocean circulation (e.g. Moore et al. (2009); Powell et al. 30 (2012)). Zavala-Garay et al. (2012) used 4D-Var with ROMS to assimilate Sea Surface Height (SSH), Sea Surface Temperature (SST) and Expendable Bathythermograph (XBT) observations into a coarse resolution (18-30km) model of the EAC region, and use an empirical relationship between surface and subsurface properties to help propagate the dominant surface observations to the subsurface and improve their subsurface estimates.
Combining a state-of-the-art numerical ocean model with a variety of traditional and newly available observations, we generate a high resolution ocean state estimate of the EAC region over a 2-year period (Jan 2012-Dec 2013. This paper describes the development and evaluation of the data assimilative model configuration. We begin by configuring a numerical 5 model of the EAC region that is capable of representing the mean ocean circulation and its eddy variability. The model is configured to resolve the continental shelf which is 15km wide at its narrowest point and may be important in accelerating the EAC and driving the current's separation (Oke and Middleton, 2000). In order to correctly represent the spatial and temporal evolution of the eddy field, we need to constrain the model with observations. We configure a 4D-Var data assimilation system that reduces the difference between the model solution and observations, given prior assumptions of the uncertainties in the 10 observations and the model background state. In addition to the traditional data streams (satellite derived SSH and SST, Argo profiling floats and XBT lines) we exploit newly available observations that were collected as part of Australia's Integrated Marine Observing System (IMOS, www.imos.org.au). These include velocity and hydrographic observations from a deepwater mooring array and several moorings on the continental shelf, High-Frequency radar observations, and ocean gliders.
We show that the assimilation configuration developed in this work results in near-optimal minimisation of the differences 15 between the modelled solution and the observations. As such, the reanalysis provides us with a 'best estimate' of the ocean state that is dynamically consistent within each assimilation time window. The reanalysis is being used to study the variability and separation dynamics of the EAC. Furthermore, the 4D-Var method allows us to use the reanalysis to quantify the impact of particular data streams on circulation estimates, which has the potential to provide important information for assessing and improving the observing system design. The product is also being used as boundary forcing for a variety of downscaling studies 20 in coastal southeastern Australia. This data assimilative model represents a significant improvement on previous modelling work in the EAC for these purposes e.g. Roughan et al. (2003) which was based on climatology, Macdonald et al. (2013Macdonald et al. ( , 2014 which focused on process studies of warm core and cold core eddies, and Zavala-Garay et al. (2012) which used 4D-Var with a much coarser resolution.
The reanalysis development and evaluation is presented as follows. In Section 2, we describe the numerical model configu-25 ration, including validation of a 10-year free-running simulation to provide confidence that the model is correctly representing the region's circulation dynamics. In Section 3, we describe the development of the reanalysis, including the data assimilation scheme used, the assimilation configuration and the observations. The reanalysis performance is evaluated in Section 4 using a variety of metrics to illustrate the system's skill. A summary and conclusions are presented in Section 5.

Model Configuration
We use the Regional Ocean Modeling System (ROMS, version 3.4) to simulate the eddying general circulation in the southeastern Australia oceanic region. ROMS is a free-surface, hydrostatic, primitive equation ocean model solved on a curvilinear grid with a terrain-following vertical coordinate system (Shchepetkin and McWilliams, 2005). For computational efficiency, ROMS uses a split-explicit time-stepping scheme allowing the barotropic solution to be computed at a much smaller time step than is used for the (slow-mode) baroclinic equations using a temporal averaging filter to ensure preservation of tracers and momentum and minimise aliasing of unresolved barotropic signals into the baroclinic motions (Shchepetkin and McWilliams, 2005). The ROMS computational kernel is further described in McWilliams (1998, 2003). Sub-grid scale horizontal mixing of momentum and tracers uses a harmonic (3-point stencil) mixing operator (Haidvogel and Beckmann, 1999), and the viscosity is derived from the horizontal divergence of the deviatory stress tensor (Wajsowicz, 1993).
The diffusion and viscosity coefficients are scaled by grid-size such that less explicit diffusion occurs in the high-resolution region than in the lower resolution region. The Mellor and Yamada (1982) level-2.5, second-moment turbulence closure scheme (MY2.5) is used in parameterising vertical turbulent mixing of momentum and tracers. 10 The model domain (shown in Figure 1) extends from Fraser Island in the north (25.25 • S) to below the NSW/Victoria border in the south (41.55 • S) and nearly 1000 km offshore. The northern boundary is chosen at a latitude where the EAC remains fairly coherent and is upstream of the region of elevated eddy variability (refer to Figure 2a). The grid is rotated 20 • clockwise such that it is orientated predominantly along-shore in the y-dimension and cross-shore in the x-dimension.
The model has a variable horizontal resolution in the cross-shore direction, with 2.5 km (1/44 • ) over the continental shelf 15 and slope that gradually increases to 6 km (1/18 • ) in the open ocean. The horizontal resolution is 5 km (1/22 • ) in the alongshore direction. The model is configured with 30 vertical s-layers distributed with a higher resolution in the upper 500m to resolve the wind driven mesoscale circulation and near the bottom for improved resolution of the bottom boundary layer. The vertical stretching scheme of Souza et al. (2014) is used, which ensures a constant-depth surface layer to better represent satellite-derived SST, better resolve the ocean surface currents, and reduce the representation error of radio-measured surface 20 currents. The bathymetry for the model was obtained from the 50m Multibeam Dataset for Australia from Geoscience Australia (Whiteway, 2009).
In models using terrain-following coordinate systems, steep topographic gradients generate numerical errors associated with the computation of the pressure gradient term resulting in artificial along-slope flows (Haney, 1991;Mellor et al., 1994). These errors depend on the topographic steepness and the intensity of the stratification (Haidvogel et al., 2000). The variable cross-25 shore resolution improves the bathymetric resolution over the continental shelf and minimises pressure gradient errors over the steep topography of the continental slope, while reducing computational expense by allowing coarser resolution in the deep ocean. ROMS is effective at minimising these horizontal pressure gradient (HPG) errors (Shchepetkin and McWilliams, 2003); although, a certain degree of topographic smoothing is usually still desirable. For this study, a smoothing method has been applied in which a high priority is placed on maintaining the width of the continental shelf and preserving the seamounts that 30 potentially play a role in steering of the EAC, while minimising HPG errors to an acceptable level. Accurate representation of the continental shelf was considered paramount as the shelf is thought to have an important influence on the EAC (e.g. Oke and Middleton (2000)).
The model uses initial conditions and boundary forcing from the BlueLink ReANalysis version 3p5 (BRAN3) (Oke et al., 2013). BRAN is a multi-year integration of the Ocean Forecasting Australian Model (OFAM) and the Bluelink Ocean Data 35 clamped in the baroclinic. Baroclinic energy that does not match the BRAN3 condition is absorbed at the boundaries using a flow relaxation scheme involving a sponge layer over which viscosity and diffusivity are increased linearly by a factor of 10 from the values applied within the model domain for the northern and eastern boundaries, and a factor of 20 for the southern boundary. The size of the sponge layer is 12 grid cells (approximately 60km). Because the BRAN3 system is run with different atmospheric forcing than we use, a correction was applied to the surface heat flux forcing such that the SST from BRAN3 is in 10 balance with the atmospheric surface forcing for each month.
We begin by configuring a 10-year free-running simulation (hereafter referred to as the '10yr free run') to ensure that the model is capable of representing the mean ocean circulation and its variability. The 10yr free run is also used to provide estimates of background variability to compute background error covariances for the assimilation scheme, and the 10-year mean Sea Surface Height (SSH) field is used for addition of Sea Level Anomaly (SLA) observations for assimilation into the 15 model. For the 10yr free run we use atmospheric forcing from the National Center for Environmental Prediction's (NCEP) reanalysis atmospheric model (Kistler et al., 2001). The atmospheric forcing fields are specified every 6 hours and used to compute the surface wind stress and surface net heat and freshwater fluxes using the bulk flux parameterisation of Fairall et al. (1996). A higher resolution atmospheric product was available for the 2-year reanalysis period. Atmospheric forcing for the 2-year model used to develop the reanalysis is provided by the 12km resolution Bureau of Meteorology (BOM) Australian 20 Community Climate and Earth-System Simulation (ACCESS) analysis (Puri et al., 2013) and the forcing fields are specified every 6 hours.

Consistency of Free-running Model
The 10yr free run is performed from 2002-2011 as this is the most recent period over which BRAN3 data were available at the time of model development for use as initial and boundary forcing (BRAN3 more recently became available for the reanalysis 25 period, 2012-2013). Comparison of the 10yr free run with observations provides validation of the ability of the model to represent the ocean dynamics in the region. The model reproduces well the spatial patterns of time-mean and variability of the mesoscale SSH; however, it is not expected to be in phase with the observations (e.g., the time and location of mesoscale eddies do not match). Figure 2 shows that the mesoscale SSH variability is well represented in the model compared to satellitederived SSH data from AVISO over the 10-year period. This region of elevated SSH variability is consistent with the regions 30 of enhanced eddy amplitude and rotational speed shown in Everett et al. (2012).
Mean cross-shore sections of alongshore velocity and temperature for the 10-year modelled period reveal a southward flowing EAC and the associated downward tilt of the thermocline (Figure 3). The sections shown cross the coast at the EAC Transport array, where the EAC is found to be most coherent (27.5 • S), Coffs Harbour, just upstream of the typical EAC separation zone (30.3 • S) and Sydney, downstream of the EAC separation zone (33.9 • S). The mean temperature sections compare well with the corresponding mean temperature sections from the CSIRO Atlas of Regional Seas (CARS) climatology (Ridgway et al., 2002).
Alongshore transport through the same three cross-shore sections for the full water column is computed daily and the mean, standard deviation, minimum and maximum transports are shown in Table 1. Mean transport at Coffs Harbour is greater than 5 upstream at 27.5 • S due to recirculation, as described by (Ridgeway and Hill, 2009). The EAC typically separates from the coast south of Coffs Harbour and mean transport through the Sydney cross section is approximately one third of the transport at Coffs Harbour. This is consistent with Ridgway and Godfrey (1997), who estimate that about a third of the current's transport continues southward of the separation zone. Mata et al. (2000) compute transport from a mooring array located at 30 • S, from the coast out the 154.4 • E (a similar section to our Coffs Harbour section) between September 1991 and March 1994. They find 10 a mean total transport of 22.1 Sv southward with an Root Mean Squared (RMS) variability of 30 Sv. This compares well with the transport through the Coffs Harbour section from the 10yr free run, which has a mean of 21.9 Sv poleward and a standard deviation of 31.7 Sv.
The model configuration is capable of producing the mean dynamical features of the EAC and representing the SSH variability. We aim to constrain the model with two years of observational data to examine the evolution of the EAC during this 15 period.

Configuration
The reanalysis is configured for the 2-year period of 2012-2013 because of the availability of significant observational resources during this time; in particular, an ADCP and mooring array deployed to capture the transport of the EAC (as detailed 20 in the next section). The reanalysis model uses atmospheric forcing provided by the 12km resolution BOM ACCESS analysis, which was not available to over the 10yr free run testing period described above. To ensure that the higher-resolution atmospheric forcing did not significantly alter the previous model comparison, we integrated the model for two years without assimilation (hereafter referred to as the '2yr free run') and compared the model-derived sea surface temperatures (SST) with those from the advanced very-high resolution radiometer (AVHRR) satellite data. The model and SST observations exhibit no 25 net bias over the 2yr free run period (not shown). In addition, the vertical stratification of temperature and salinity of the 2yr free run well matches observations from Argo over that period (Figure 4).

Data Assimilation Scheme
The generate the full reanalysis, we must combine this model with the observations in a way that uses the model physics to constrain the increments in initial conditions, boundary and surface forcing to generate a state estimate that better fits the 30 observations. In this regard, we are looking for the model to represent the observations, not replicate the observations. If the model is capable of representing all of the observations in time and space using the physics of the model, then we should have the most complete description of the ocean-state available. To accomplish this, we use incremental strong-constraint four-dimensional variational assimilation (IS4D-Var).
IS4D-Var is an advanced state estimation technique used to reduce the residuals between model output and observations in a way that is constrained by the dynamics of the flow. IS4D-Var uses variational calculus techniques to solve for the increments 5 in model initial conditions, boundary conditions and forcing such that the difference between the modelled solution and all available observations is minimised -in a least-squares sense -over the assimilation window. The forward integration of the nonlinear model equations, given a prior estimate of the initial conditions, boundary and surface forcings, provides an estimate of the background state (the model prior). For each assimilation window we can formulate a cost function, J, that measures normalised deviations from the observations as well as from the model prior. The cost function is a function of the increment 10 adjustment to the initial conditions, boundary conditions and forcing and has two terms. The first term represents the difference between the model and the observations, and is given by the squared difference between the observations and the model given the integration of the increment adjustment through the tangent linear model, weighted by the inverse of the observation error covariance. The second term penalises deviations from the background state, and is given by the squared difference between the new estimate and the background state, weighted by the inverse of the background error covariance. 15 The model prior is generated in the outer loop of the assimilation methodology. In order to improve the state estimate, the model initial conditions, boundary conditions and forcing are adjusted such that the quadratic cost function is minimised. The minimum is found by integrations of the tangent linear and adjoint models in the inner loops to determine the shape of the cost function, and the minimum is found where the gradient of the cost function goes to zero. The inner loops can be continued until J is reduced by a certain ratio, or, as in this study, a set number of inner loops can be completed that are found to give an 20 acceptable reduction in J. We do not find the true minimum of J, but rather an acceptable reduction.
After the final inner loop the new increment is applied and the outer loop is completed by integrating the nonlinear model.
The final outer loop provides the 'best estimate' of the ocean state (the analysis). The analysis is constrained to satisfy the nonlinear model equations (strong-constraint) and better represent the observations over the assimilation window, the length of which is limited by the time over which the tangent linear assumption remains reasonable. The analysis then provides an 25 improved estimate of the initial state for the subsequent simulation window.
In any assimilation scheme, the prescribed prior uncertainties have a significant control on the assimilation procedure. The observation error covariance matrix specifies the uncertainty in the observations and the model's ability to represent those observations so as to prevent "over-fitting" the solution to noisy observations. Defining these observational uncertainties is an important part of configuring any specific data assimilative model. Similarly, the background error covariance represents the 30 uncertainty in our initial conditions, atmospheric forcing, and boundary conditions such that larger adjustments are permitted in regions where the uncertainties are high, and vice versa. The goal is to solve for the nonlinear ocean solution that is dynamically consistent with the observations and is free within the known uncertainties in the system. The resulting circulation has both a reduced uncertainty and better represents the observations. 4D-Var implementation is well described by (Moore et al., 2011d, b, c), and it has been used successfully in ROMS applications

5
The minimisation of the cost function, J, is performed in a sequence of linear least-squares minimisations in the inner loops, and the nonlinear model trajectory is updated in the outer loops. Because we are using 4D-Var, we aim for the longest timewindow available without nonlinearities growing too large. Linearity experiments (not shown) found that for this region, the linear assumption remains acceptable for typical perturbations over 5 days, so we chose that as our window-size. Through experimentation, we found that one outer-loop, each with 14 inner-loops, gives an acceptable reduction in J for a feasible 10 computational cost.
The ROMS 4D-Var allows for controlling both the initial conditions and the time-varying atmospheric and boundary forcing.
We adjust the atmospheric forcing every 3 hours and the open boundary conditions every 12 hours. The winds are the dominant adjustment in the atmospheric forcing.
In this study, we use 5-day assimilation windows, with a one-day overlap.We overlap the assimilation windows by one-day, 15 such that the initial conditions for the subsequent assimilation window are 4 days after the start of each window. The overlap allows us to produce a blended product by smoothing across the discontinuities between adjacent assimilation windows.

Observations and Observation Prior Uncertainties
The reanalysis time period was chosen because it contains the greatest number of available observations, including an EAC transport array that was deployed from 1 Apr 2012 to 26 Aug 2013. Other available subsurface observations and satellite-20 derived surface observations are also sourced for this time period. Figure 5a shows the Argo profiling float observations, coloured by time of occurrence, and Figure 5b shows all other observations, with the exception of the satellite-derived SSH, SST and Sea Surface Salinity (SSS). The number of processed observations assimilated for each 5-day assimilation window is shown in Figure 6a, with a break-down of the provenance of the temperature observations in Figure 6b. The observations and their respective processing for assimilation into the reanalysis are detailed in the subsections below.

25
The observations used for assimilation are limited by the spatial and temporal resolution of the model and the processes resolved. If multiple observations from the same instrument exist in the same horizontal and vertical grid cell taken within the same model time-step (5 minutes), the observations are averaged and this value is assimilated within that grid cell at the appropriate time. Representational errors are assigned to each observation and provide the uncertainty in the observation.
This uncertainty is a combination of the uncertainty in the observation itself and just as significantly, the uncertainty in the year mean. The AVISO data provides a daily statistical field giving a synoptic view of the SSH. The AVISO SLA data is added to the dynamic SSH mean from the 10yr free run described above to generate sea level data for assimilation that is consistent with the ROMS model bathymetry and configuration. 10 We prescribe an observation uncertainty of 6cm. The error in the AVISO delayed-time global SLA product due to noise for the region is estimated at 2cm (CNES, 2015). We include a further 4cm of uncertainty because, in this case, the model resolves far more structure in the submesoscale than is capable in the observations. The AVISO fields provide a statistical fit to alongtrack SSH data and the observation uncertainty allows for imbalances between this statistical field and a dynamically balanced SSH field required by the model. 15 We exclude SSH observations that were taken over water depths less than 1000m. This is because the observations are noisy on the continental shelf and the AVISO gridded product is not able to resolve the submesoscale processes that occur here.

Satellite-derived Sea Surface Temperature
We use SST from the US Naval Oceanographic Office's Global Area Coverage Advanced Very High Resolution Radiometer level-2 product (NAVOCEANO's GAC AVHRR L2P SST). The product does not provide observations through clouds 20 but contains useful observations close to the coast. Data is available 2-3 times per day. A product error is specified in the NAVOOCEANO SST product (Andreu-Burillo et al., 2010), with an error for each data point of 0.38-0.4 • C. As the resolution of the data is similar to the resolution of the model, the observation uncertainty for the assimilation is set to the square of this product error.

25
Sea Surface Salinity (SSS) has been observed from space for the first time by the National Aeronautics and Space Administrations's (NASA) Aquarius satellite (www.aquarius.umaine.edu/). We make use of the Level 3 gridded salinity product which provides daily fields at a 1 degree resolution. The observation uncertainty is set to 0.4. There is a product error of around 0.2 for the Aquarius SSS data and 0.4 is chosen to account for additional uncertainty due to processes not resolved by the model. The value is considerably higher than the uncertainties specified for other in-situ salinity observations, so as to prevent the surface salinity observations from dominating the salinity cost function. Similarly to SSH, any data taken over water depths less than 1000m depth were eliminated.

Argo floats
Argo is an international program consisting of nearly 4,000 free-drifting profiling floats that measure the temperature and  The HF radar broadcasts and receives along defined angles in a phased-array setup and the surface current speed (towards and away from the radar site) is measured. The overlapping coverage from the two radar sites allows the surface current (u and v) vectors to be computed. Using the same assimilation procedure as detailed in Souza et al. (2014), we assimilate radial currents, rather than the computed current velocities. The velocities have correlated errors and using the radials allows us to make use of data when only one station is available and over areas that do not overlap with the other station's measurements. 5 We can also make use of radial data in regions where the beam intersection angle between measurements from the two stations results in high error in the velocity calculation while the radials errors are adequately low.
Radial data is available from 1 Mar 2012 to the end of the reanalysis period. The areas of HF radar coverage are shown in Figure 5b, with inset panels showing the percentage of data coverage for assimilated radials for the RRK and NNB sites in Figure 5c and 5d, respectively. Radials for each of the two stations are processed separately. At the outer range of the HF 10 radar instrument coverage, radial values become noisy. We extract only radial values with Bragg Signal to Noise ratio >10 dB.
Manual inspection of the radial values for each of the two sites was then conducted and a "good data" region was chosen for each site every day, excluding the outer regions of coverage where noisy data is observed. Only radial data within these "good data" regions is used, and absolute radial speed values greater than 2 ms −1 are excluded. This manual inspection was performed daily as the radii of reliable radial data varies significantly, and this method allows us to retain the maximum amount of data 15 for assimilation. The radial speeds and angles are spatially averaged onto the model grid and a 24-hour boxcar averaging filter is used to remove tides and inertial oscillations that are not resolved by the model. Any observations where the square-root of the radial speed error variance exceeds the radial speed magnitude are removed.
Radial data within one grid cell of the coast is also removed as unrealistically high values are observed here.

NSW Shelf Moorings
Data collected from 3 moorings located along the NSW continental shelf is used in this assimilation study. The moorings collect temperature and velocity data at high sampling frequencies and are located off of Coffs Harbour, 30 • S (CH100) and Sydney, 33.9 • S (SYD100 and SYD140). In each case the number in the mooring name represents the approximate water depth of the mooring location. Table 2 contains details of the mooring locations and the properties observed. Temperature sensors 30 are spaced every 8 meters. The data collection and quality control is described in detail in Roughan and Morris (2011).
All temperature observations taken from moorings at high sampling frequencies are low-pass filtered to remove variability at periods shorter than the inertial period (23.8 hours for Coffs Harbour and 21.5 hours for Sydney), and the observations are applied 6-hourly. For latitudes south of 30 • S, the inertial period is less than 24 hours so the filtering does not remove the diurnal signal which may arise from internal tides and/or diurnal surface heating for near surface temperature observations. The RMS residual between the mooring temperature observations low-pass filtered at 30 hours (removing variability due to baroclinic tides and inertial oscillations) and the observations filtered at the inertial frequency is very small compared to the nominal minimum uncertainties applied, confirming that these unresolved processes are accounted for in the observation uncertainty specification. Velocity observations are low-pass filtered at 30 hours to remove variability due to tides and inertial oscillations  The observation error is specified as the maximum of this nominal minimum error and the variance from averaging observations within the same model grid cell. For velocities, the high density of the ADCP depth bins means several velocity measurements are often available for a single vertical grid layer, which can result in variances that exceed the specified nominal minimum uncertainty.

20
The EAC transport array was deployed as part of the IMOS to understand the variability of the EAC, and it is comprised of five deep water moorings (EAC 1-5) which measure temperature, salinity and velocities. The array was positioned where the EAC is predicted to be most coherent and was designed to measure the mean and time-varying EAC transport (Sloyan et al., 2016).
The array is continued onto the shelf slope and shelf with two moorings (SEQ400 and SEQ200) in approximate water depths of 400m and 200m, respectively. Each mooring has a suite of instruments measuring temperature, salinity, and velocities at 25 high sampling frequencies throughout the water column. Table 2 contains details of the moorings.
All temperature and salinity observations are low-pass filtered to remove variability at periods shorter than the inertial period (26.0 hours), and the observations are applied 6-hourly. The vertical uncertainty profile used for the other off-shelf temperature and salinity observations (Figure 7) is used for the nominal minimum profile for the EAC array mooring observations. For the SEQ moorings the nominal minimum vertical uncertainty profile generated for the shelf observations was used. The 30 velocity observations are filtered and processed in the same manner as the NSW moorings, described above. For the EAC array moorings, the nominal minimum error for u and v velocity components was specified as 0.12 ms −1 in the upper 10m of the water column and 0.10 ms −1 for all depths below 10m. For the SEQ moorings, the uncertainty profile generated for the shelf observations was used. Similarly to the NSW moorings, the observation uncertainty is specified as the maximum of this nominal minimum uncertainty and the variance from averaging observations within the same model grid cell.

Ocean Gliders
Autonomous ocean gliders (both SeaGliders and Slocum) were deployed as part of the IMOS by the Australian National Glider temperature data in the upper 20m and salinity data in the upper 50m were removed. The uncertainty profiles computed for the shelf observations were used and the uncertainties for the gliders are the maximum between the variance computed from the averaging and the shelf uncertainty profile.

15
The goal of the assimilation is combine an uncertain model with uncertain observations to reduce the analysis uncertainty to be lower than the model prior. As such, specification of the prior model and observation uncertainties is important. The model uncertainty should represent the uncertainty in the state of the model for the given time. This was estimated from the average of 5-day variances from the 10yr free run described above. The variances provide an estimate of the uncertainty associated with each state variable and surface forcing field. We choose 5-day variances as the model is nested in BRAN3 which assimilates 20 large-scale data so we expect our model prior boundary and initial conditions to be accurate to within the typical changes to the ocean state that occur over 5-days. Because we have only estimates of the variances, the background error covariance of each field is estimated via a diffusion operator with specified length-scales that are assumed to be homogeneous and isotropic.
In the horizontal, the characteristic length scales chosen for the background covariance are 100km for SSH, temperature and salinity and 70km for velocities. These values were chosen based on analysis of cross-correlation of SSH and complex 25 correlation of surface velocities between points in the eddy rich Tasman Sea region from the 2yr free run. The length scale of 100km for SSH is consistent with the decorrelation scales estimated from along-track satellite data for the area by Wilkin et al. (2002) and used by Zavala-Garay et al. (2012). It is noted that shorter cross-shore length scales are likely along the coast of south-eastern Australia, as the continental shelf is narrow (15-30km) and the EAC displays a narrow jet like structure, while SSH decorrelation length scales were found to be about 100-200km in the alongshore direction by Oke and Sakov 30 (2012). Bridging the scales between mesoscale eddy variability in the Tasman Sea and smaller-scale shelf processes, and the anisotropic flow regime, is a challenge and the impact of the horizontal length scale specification requires further research.

13
Geosci. Model Dev. Discuss., doi:10.5194/gmd-2016-44, 2016 Manuscript under review for journal Geosci. Model Dev. For the vertical, semivariogram analysis of glider data on the NSW shelf by Schaeffer et al. (2015) found vertical decorrelation length scales of about 50m for both temperature and salinity on the NSW shelf. Analysis of correlations between temperature data measured by the moorings used in this study found vertical decorrelation length scales of 15-30m for the shelf moorings (NSW moorings,SEQ 200), 70m for SEQ 400 and 100-200m for the EAC deepwater array moorings (EAC 1-5). Salinity measurements were taken at SEQ200, SEQ400 and the EAC deep water array moorings and decorrelation length 5 scales were similar to the length scales for temperature at these moorings. In the vertical, we apply characteristic length scales of 50m for salinity and 10m for temperature. The shorter length scale for temperature was adopted due to the short length scale of variability for temperature near the sea surface, as SST observations dominate. The salinity length scale is set to 50m (longer than the temperature length scale) in order to limit vertical structure in the salinity analysis increments.
Analysis of correlations between velocities measured by the moorings found vertical decorrelation length scales of 20-50m 10 for the shelf moorings (NSW moorings, SEQ 200), 70m for SEQ 400 and 100-200m for the the EAC deep water array moorings (EAC 1-5). Because the deep water moorings span the core of the EAC, we reduced the decorrelation length scale value to 50m in the vertical for velocity to be ensure consistency when assimilating velocities outside of the EAC and/or on the shelf.
The covariances (between differing model variables) are computed by the tangent linear and adjoint models in the inner loops. The uncertainty in the observations are detailed below. 15

Reanalysis Evaluation
In this section, we evaluate the performance of the assimilation procedure in terms of the consistency between the posterior and prior uncertainties, comparison with the observations, and comparison to unassimilated observations. Overall, the assimilation performs well in minimising the cost function over each assimilation interval and the corresponding reanalysis provides a good match to observations. 20

Consistency of Observation and Model Uncertainties
The level of agreement between the prior observation and model uncertainty specified and the posterior uncertainty from the assimilation provides a measure of the consistency of the assimilation system given the prior uncertainty assumptions (Desroziers et al., 2005). The posterior errors are calculated for the reanalysis using equations (5) and (6)  Overall, we are confident that the prior uncertainties are well specified and, as such, the assimilation is configured to provide a near-optimum minimisation of the cost function for each assimilation interval.

SSH
The Root Mean Squared (RMS) observation anomaly for a particular observation location describes the variability in the observation with respect to its time-mean. This is compared to the RMS differences between the observations and the freerunning model (the 2yr free run), and the observations and the analysis, to provide an assessment of how well the free run and the analysis match the observations relative to their typical variability. A skillful state estimate will have residuals with the 10 observations that are much lower that the observation's typical variability.   Figure 8b and shows that the SSH fields are well represented in the analyses. In Figure 8c, the domain-averaged RM S ObsAnom , RMSD F reerun−Obs and RMSD Analysis−Obs are plotted for each 5-day assimilation window over the 2-year period, showing significant improvement in the fit to observations in the analyses. The RMSD F reerun−Obs is of similar magni-20 tude to the SSH observation anomaly indicating that, as expected, the free run has no skill in predicting the chaotic mesoscale eddies. The time-mean of the spatially-averaged RMSD Analysis−Obs over all assimilation windows is 7.4cm. This is close to the observation uncertainty for SSH of 6cm and small compared to the typical SSH variability (the time-mean of the spatiallyaveraged SSH observation anomalies is 23.4cm)

25
The free-running model shows some skill in prediction of the SST due to the accuracy of the surface forcing; however, significant improvement is achieved in the analyses. The RMS SST observation anomalies describe the variability in SST over the 2-year assimilation period, including the seasonal cycle, and are shown in Figure 9a. The RMSD F reerun−Obs is smaller than the observation anomalies and the RMSD Analysis−Obs (Figure 9b)  We show the improvement in subsurface temperature as measured by the Argo floats, XBTs and ocean gliders by computing the RMSD F reerun−Obs and RMSD Analysis−Obs in nominal depth bins for all observations over the model domain ( Figure   10 10). For Argo, time-mean RMSD F reerun−Obs for all observations in the upper 500m of the water column is 1.6 • C, reduced to 0.8 • C in the analysis. For the XBT above 500m, the time-mean RMSD is reduced to 1.2 • C in the analysis from 2.0 • C for the free-running model. Below 500m, the number of observations in each depth bin for Argo and XBT is too low for a meaningful comparison. The greatest improvement in the fit to observations is achieved in the glider data which mostly samples the shelf and shelf slope circulation. A great majority of the glider observations are in the upper 100m of the water column; here the 15 time-mean RMSD F reerun−Obs of 1.9 • C is reduced to 0.9 • C in the analysis.
As the Argo profiling floats measure both temperature and salinity at each observation time we are able to assess the residual reduction in terms of potential density throughout the water column (Figure 11), describing the improvement in the representation of the density structure in the analysis. The free-running model has some skill in predicting potential density as sampled by the Argo floats in the upper 500m, as the RMSD F reerun−Obs is less than the RM S ObsAnom for the nominal depth bins. 20 The RMSD Analysis−Obs in potential density is reduced to about half of the RMSD F reerun−Obs in the upper 500m; the upper layer that is most effected by mesoscale eddies. The RMSD Analysis−Obs in potential density peaks in the upper 100m at 0.24 kgm −3 and decreases gradually to below 0.1 kgm −3 at 500m depth, remaining below that for the Argo-observed ocean deeper than 500m.

Mooring observations 25
Profiles of the complex correlation between the velocities from the free-running model and the analyses at the mooring velocity measurement locations are shown in Figure 12. The correlations are generally considerably improved in the analyses. The complex correlations for the analysis velocities approach the value of one for the South East Queensland shelf moorings (SEQ200 and SEQ400), which are on the shelf and shelf slope at the latitude where the EAC is found to be most coherent. At this same latitude, the deep water array moorings 1 to 4 (EAC1-4) have high complex correlations between the analysis and 30 the observations in the upper 400m of the water column. This is where the mean EAC jet is the strongest (refer to Figure 3).
The EAC5 mooring is outside of the main jet and influenced by a more variable eddy-dominated circulation and the analysis has slightly lower correlations in the upper water column at this location.
Further south on the shelf, velocity estimates are improved with depth-averaged free run and analysis complex correlations of 0.69 and 0.91, respectively, for the Coffs Harbour mooring (CH100), 0.50 and 0.83 for the Sydney mooring (SYD100) and 0.48 and 0.87 for the other Sydney mooring (SYD140).

Radials
The observed surface velocities are computed from the assimilated radials and the corresponding values computed from the 5 radial values extracted from the free-running model and the analyses. The complex correlations between these observed surface velocities and the surface velocities computed from the free-running model and the analysis are shown in Figure 13. Note that the complex correlation for a particular grid cell requires velocities to be computed, which requires radial data from each of the two sites to be available in that cell and that the beams overlap with an angle greater than 30 • (velocity calculations where beam intersection angles are smaller than this are deemed inaccurate). Radial data is assimilated at other times but cannot be

Comparison to independent observations
Because 4D-Var uses the model dynamics to determine covariance, information from observed variables can propagate to unobserved regions such that the ocean state better fits and is in balance with the observations. Comparison of the reanalysis with independent, non-assimilated, observations allows us to assess the performance of the state estimate away from assimilated observations. As the principal aim of this work was to assimilate the maximum number of available observations in the region 20 in order to provide a 'best estimate' of the ocean state over the 2-year period, few independent observations remain available for this comparison.
The available independent observations are from shipboard CTD casts that were taken on three separate cruises within the Note that the profiles of Argo RMSD F reerun−Obs and RMSD Analysis−Obs in potential density ( Figure 11) are similar to the RMSD profiles for the independent shipboard CTD observations. The reanalysis showing similar residual reduction for the assimilated Argo observations and for the non-assimilated CTD cast observations suggests a well-specified assimilation system in which a dynamically-balanced ocean state estimate is achieved with improved state-estimation throughout the model domain, rather than over-fitting to assimilated observations. 5

Summary and Conclusions
We have presented the development of a data assimilating model of the EAC region and assessed the performance of the corresponding reanalysis over a 2-year period. We use an advanced variational data assimilation scheme to integrate a state-ofthe-art coastal ocean model with an unprecedented observational data for the southeast Australian region. We show that the freerunning numerical model reproduces the long-term mean surface and subsurface ocean properties and represents the eddying Not only does the reanalysis provide a good match with observations, it is the first reanalysis of the EAC region that resolves the continental shelf at its narrowest point (15km) by more than a single grid cell (BRAN3 has a resolution of 10km, Oke et al. (2013)) and the first high-resolution reanalysis of the region that uses the model physics to adjust the model in a dynamically consistent way (Zavala-Garay et al. (2012) has a resolution of 18-30km). Furthermore, it is the first attempt to assimilate such a 25 variety of observations in the region, including observations from moorings on and off the continental shelf, a coastal HF radar array and ocean gliders. The high-resolution and dynamic-consistency of the reanalysis mean it has the potential to provide a marked improvement in our ability to capture important circulation dynamics in the EAC.
The reanalysis is being used to study the 3-dimensional structure of the current and the processes that drive its separation from the coast and eddy formation. Several modelling studies of coastal regions in south-eastern Australia are making use of 30 the reanalysis for boundary forcing. Output from the adjoint model integrations performed in each assimilation interval is being used to directly assess the impact of specific observations on the estimates of circulation dynamics of interest. Through this we hope to understand which observations are most effective at improving our state-estimates and which locations are most effective to observe, providing valuable information on how we might improve the observing system to ultimately improve prediction.

Data availability
Model initial conditions and boundary forcing comes from the Bluelink ReANalysis version 3p5 (BRAN3, Oke et al. (2013)).
Surface forcing is provided by the Australia Bureau of Meteorology's Australian Community Climate and Earth-System Simu-5 lation (ACCESS) 12km product (Puri et al., 2013). The observations used for assimilation are available through the Australian Integrated Marine Observing System's (IMOS) data portal (www.imos.org.au).
The reanalysis output is saved as snapshots of 3-dimensional fields of ocean properties (sea-level, temperature, salinity, velocities) every 4 hours over the 2-year period (2012)(2013). The data is archived at the University of New South Wales, Australia, and can be made available for research purposes (contact the corresponding author of this paper).
Sydney) cross-shore sections as shown in Figure 3, computed daily for the 10-year free-running model period.

32
Geosci. Model Dev. Discuss., doi: 10.5194/gmd-2016-44, 2016 Manuscript under review for journal Geosci. Model Dev.  Depth (m) Figure 14. RMS potential density observation anomaly and RMS difference between the free run and observations, and the analysis and observations for independent CTD cast observations mapped to model vertical levels. Observations are grouped into nominal depth bins of 20m from the surface to 500m depth and 50m below 500m.