The Finite-volumE Sea ice–Ocean Model (FESOM2)

Version 2 of the unstructured-mesh Finite-Element Sea ice–Ocean circulation Model (FESOM) is presented. It builds upon FESOM1.4 (Wang et al., 2014) but differs by its dynamical core (finite volumes instead of finite elements), and is formulated using the arbitrary Lagrangian Eulerian (ALE) vertical coordinate, which increases model flexibility. The model inherits the framework and sea ice model from the previous version, which minimizes the efforts needed from a user to switch from one version to the other. The ocean states simulated with FESOM1.4 and FESOM2.0 driven by COREII forcing are compared on a mesh used for the CORE-II intercomparison project. Additionally, the performance on an eddy-permitting mesh with uniform resolution is discussed. The new version improves the numerical efficiency of FESOM in terms of CPU time by at least 3 times while retaining its fidelity in simulating sea ice and the ocean. From this it is argued that FESOM2.0 provides a major step forward in establishing unstructured-mesh models as valuable tools in climate research.


Introduction
Ocean circulation models formulated on unstructured meshes offer multi-resolution functionality in a seamless way. Although they are common in coastal ocean modeling, they are only beginning to be used for global ocean studies. The Finite-Element Sea ice-Ocean circulation Model (FE-SOM, Wang et al., 2014) is the first mature global multiresolution model designed to simulate the large-scale ocean. A number of FESOM-based studies related to the impact of local dynamics on the global ocean (see, e.g., Hellmer et al., 2012;Haid and Timmermann, 2013;Wekerle et al., 2013;Haid et al., 2015;Wang et al., 2016a;Sein et al., 2016;Wek-erle et al., 2016) indicate that the multi-resolution approach advocated by FESOM is successful and allows one to explore the impact of local processes on the global ocean with moderate computational effort (see Sein et al., 2016). Other new global multi-resolution models are appearing (see Ringler et al., 2013), and new knowledge on unstructured-mesh modeling has accumulated (for a review, see Danilov, 2013). Although FESOM1.4  already offers a very competitive throughput compared to structured-mesh models in massively parallel applications (Sein et al., 2016), we continue to explore the ways to further increase the numerical efficiency of unstructured-mesh models and extend their area of applicability. This paper describes the new numerical core of FESOM2, which is based on finite-volume discretization. Despite the change in the discretization type, we keep the old abbreviation, which now will take "E" from the last letter of "volume". The reason is that FESOM2 builds on the framework of FESOM1.4, including its ice component, FESIM , its input and output routines and its user interface. It works on the same general triangular meshes and is conceived so as to minimize new learning required from users with experience with FESOM1.4. We will use FESOM2 as a root name for the new version, and FE-SOM2.0 for the implementation available at present.
The main reason for switching to a new finite-volume numerical core in FESOM2 is its higher computational efficiency. It stems largely from a more efficient data structure. FESOM1.4 is based on tetrahedral elements, and tetrahedra below any surface triangle do not necessarily keep the same neighborhood connectivity pattern as the depth increases. Three-dimensional auxiliary and look-up arrays are therefore needed, and accessing them for each element slows down the performance. Another reason for switching to a finitevolume version is the availability of clearly defined fluxes Published by Copernicus Publications on behalf of the European Geosciences Union. 766 S. Danilov et al.: FESOM2: from finite elements to finite volumes and a possibility to choose from a selection of transport algorithms, which was very limited for the continuous Galerkin discretization of FESOM1.4. A very useful feature of FE-SOM1.4 is its ability to combine geopotential and terrainfollowing vertical mesh levels, namely, it was the reason for using tetrahedral elements and not triangular prisms. To ensure similar functionality in the new version, we introduce the arbitrary Lagrangian Eulerian (ALE) vertical coordinate (see, e.g., Donea and Huerta, 2003), which provides a general approach to incorporating different types of vertical coordinates within the same code.
Although many details of the finite-volume method used by FESOM2 have already been presented in Danilov (2012), we will repeat their description here for completeness. Besides, the ALE vertical coordinate redefines the implementation details. The paper begins with the description of basic model numerics, delegating some details and implementation variants to the Appendices. The performance of FESOM2 is compared to that of FESOM1.4 in simulations driven by the CORE-II forcing (Large and Yeager, 2009). We report on simulations carried out on a coarse (nominally 1 • ) mesh used by FESOM1.4 in the framework of CORE-II intercomparison, and on a global mesh with a resolution of about 15 km. The intention here is to illustrate that FESOM2 is a fully functional and highly competitive general ocean circulation model. A detailed model assessment paper will be presented separately.
2 Basic description 2.1 The placement of variables FESOM2 uses a cell-vertex placement of variables in the horizontal directions. The 3-D mesh structure is defined by the surface triangular mesh and a system of level surfaces which form a system of prisms. In a horizontal plane, the horizontal velocities are located at cell (triangle) centroids, and scalar variables are at mesh (triangle) vertices. The vector control volumes are the prisms based on mesh surface cells, and the prisms based on median-dual control volumes are used for scalars (temperature, salinity, pressure and elevation). The latter are obtained by connecting cell centroids with edge midpoints, as illustrated in Fig. 1. The same cellvertex placement of variables is also used in FVCOM (Chen et al., 2003); however, FESOM2 differs in almost every numerical aspect, including the implementation of time stepping, scalar and momentum advection and dissipation (see below).
In the vertical direction, the horizontal velocities and scalars are located at mid-levels. The velocities of inter-layer exchange (vertical velocities for flat layer surfaces) are located at full layers and at scalar points. Figure 2 illustrates this arrangement.  Figure 1. Schematic of cell-vertex discretization (left) and the edge-based structure (right). The horizontal velocities are located at cell (triangle) centers (red circles) and scalar quantities (the elevation, pressure, temperature and salinity) are at vertices (blue circles). The vertical velocity and the curl of horizontal velocity (the relative vorticity) are at the scalar locations too. Scalar control volumes (here the volume associated with vertex v 1 is shown) are obtained by connecting the cell centers with midpoints of edges. Each cell is characterized by the list of its vertices v(c), which is (v 1 , v 2 , v 3 ) for c = c 1 , and the list of its nearest neighbors n(c). For c = c 1 , n(c) includes c 2 , c 6 and a triangle (not shown) across the edge formed by v 2 and v 3 . One can also introduce c(v), which is (c 1 , c 2 , c 3 , c 4 , c 5 , c 6 ) for v = v 1 , and other possible lists. Edge e (right panel) is characterized by the list of its vertices v(e) = (v 1 , v 2 ) and the ordered list of cells c(e) = (c 1 , c 2 ) with c 1 on the left. The edge vector l e connects vertex v 1 to vertex v 2 . The edge cross-vectors d ec 1 and d ec 2 connect the edge midpoint to the respective cell centers.
The layer thicknesses are defined at scalar locations (to be consistent with the elevation). There are also auxiliary layer thicknesses at the horizontal velocity locations. They are interpolated from the vertex layer thicknesses.
The cell-vertex discretization selected for FESOM2 can be viewed as an analog of an Arakawa B-grid (see also below), while that of FESOM1.4 is an analog of an A-grid. The cell-vertex discretization is free of pressure modes, which would be excited in the A-grid FESOM1.4 without its stabilization. However, the cell-vertex discretization allows spurious inertial modes because of excessively many degrees of freedom used to represent the horizontal velocities. They can be filtered by the horizontal viscosity. In the quasi-hexagonal C-grid discretization used by the Model for Prediction Across Scales (MPAS) (Ringler et al., 2013) the location of scalar variables is the same (on vertices of a dual triangular mesh) as in FESOM2. The triangular C-grid of ICON (www.mpimet.mpg.de/en/science/models/icon/) is notably different for its scalar variables are located at cells and there are twice as many of them as in FESOM2. Our preference for the cell-vertex discretization is mostly due to its lack of pressure modes, the straightforward way of handling its spurious modes and the ability to work on general triangu-lar meshes (in contrast to orthogonal meshes required by Cgrids). Such meshes are more flexible than the Voronoi quasihexagonal meshes or orthogonal triangular meshes needed for C-grids.

Notation
For convenience of model description we introduce the following notation. Quantities defined at cell centroids will be denoted with the lower index c, and the quantities at vertices will be denoted with the lower index v. The vertical index k will appear as the first index, but it will be suppressed if this does not lead to ambiguities. The agreement is that the layer index increases downwards. The indices may appear in pairs or in triples. Thus the pair kc means the vertical layer (or level for some quantities) k and cell c, and the triple kcv means that the quantity relates to layer (level) k, cell c and vertex v of this cell. We use the notation c(v) for the list of cells that contain vertex v, v(c) for the list of vertices of cell c, e(v) for the list of edges emanating from vertex v, and so on. Each edge e is characterized by its vertices v(e), the neighboring cells c(e), the length vector l e directed from the first vertex in v(e) to the second one and two cross-edge vectors d ec directed from the edge center to the cell center of the left and right cells, respectively (see Fig. 1). The cells in the list c(e) are ordered so that the first one is on the left of the vector l e . The boundary edges have only one (left) cell in the list c(e).
We use a spherical coordinate system with the North Pole displaced to Greenland (commonly 75 • N, 50 • W). A local Cartesian reference frame is used on each cell with cellwiseconstant metric coefficients (cosine of latitude). Gradients of scalar quantities and cell areas are computed with respect to local coordinates. The vectors d ec are stored in a local physical measure of respective cells c for they always enter in combination with velocity (defined on cells) to give normal transports. Vectors l e are stored in radian measure. Whenever their physical length is required, it is computed based on the mean of cosines on c(e). We will skip other details of spherical geometry and ignore the difference in the representation of l e and d ec for brevity below. The x and y directions should be understood as local zonal and meridional directions.

Bottom representation
The bottom topography is commonly specified at scalar points because the elevation is defined there. However, for discretizations operating with full velocity vectors, this would imply that velocity points are also at topographic boundaries. In this case the only safe option is to use the noslip boundary conditions, similar to the traditional B-grids. To avoid this constraint, we use the cellwise representation of bottom topography. In this case both no-slip and free-slip boundary conditions are possible. Their implementation relies on the concept of ghost cells which are obtained from the The blue circles correspond to scalar quantities (temperature, salinity, pressure), the red circles to the horizontal velocities and the yellow ones to the vertical exchange velocities. The bottom can be represented with full cells (three left columns) or partial cells (the next two). The mesh levels can also be terrainfollowing, and the number of layers may vary (the right part of the schematic). The layer thickness in the ALE procedure may vary in prisms above the blue line. The height of prisms in contact with the bottom is fixed.
boundary elements by reflection with respect to the boundary face (edge in 2-D). The drawback of the elementwise bottom representation is that the total thickness is undefined at scalar points if the bottom is stepwise (geopotential vertical coordinate). The motion of level surfaces of the ALE vertical coordinate at each scalar location is then limited to the layers that do not touch the bottom topography (above the blue line in Fig. 2). This is related to the implementation of partial cells, which is much simpler if the thickness of the bottom layer stays fixed. The layer thickness h is dynamically updated at scalar points (vertices) in the layers that are affected by the ALE algorithm and interpolated to the cells (1) The cell thicknesses h c enter the discretized equations as the products with horizontal velocities. Because of cellwise bottom representation, algorithms aiming to closely follow the bottom topography may create triangular prisms going inland (two lateral faces touch the land) at certain levels on z-coordinate meshes even if such prisms were absent along the coast. Such prisms lead to instabilities in practice and have to be excluded. The opposite situation with land prisms pointing into the ocean is much less dangerous, yet it is better to avoid them too. We adjust the number of layers under each surface triangle at the stage of mesh design to exclude such potentially dangerous situations. This issue is absent in FESOM1.4 because of the difference in the placement of horizontal velocities and noslip boundary conditions. Since the number of cells is nearly twice as large as the number of vertices, the cellwise bottom representation may contain more detail than can be resolved 768 S. Danilov et al.: FESOM2: from finite elements to finite volumes by the field of vertical velocity. This may make quasi-vertical transport velocities look noisy in layers adjacent to the bottom.

Partial cells
Partial cells on z-coordinate meshes are naturally taken into account in the ALE formulation because it always deals with variable layer thicknesses (heights of prisms). If K c is the number of layers under cell c, we define If the layer thicknesses are varied in the ALE procedure, this is limited to K − v −1 layers. With this agreement, the thickness of the lowest layer on cells is kept as initially prescribed. In this case the implementation of partial cells reduces to taking the thicknesses of the lowest layers on cells as dictated by the bottom topography unless they are too thick (the real depth is deeper than the deepest standard level by more than the half thickness of the last standard layer), in which case we bound them. The heights of scalar control prisms in the layers below K − v are formally undefined, so they are considered to be the volume-mean ones. Scalar and vector quantities defined at mid-layers are kept at their standard locations. This avoids the creation of spurious pressure gradients. The partial cells then work through the modified transports crossing the faces of control volumes. Since the horizontal velocities are located at cells, the transports entering scalar control volumes are uniquely defined. For vector control volumes the areas of vertical faces may be different on two prisms meeting through the face. Taking the minimum area to compute fluxes is the safest option in this case.
3 Layer equations and time stepping

Layer thicknesses and layer equations
We introduce layer thicknesses h k = h k (x, y, t), where k = 1 : K is the layer index and K the total number of layers. They are functions of the horizontal coordinates and time in a general case. We basically follow the implementation of the ALE vertical coordinate as presented in Ringler et al. (2013) (there are other approaches; see, e.g., Adcroft and Hallberg, 2006;Hofmeister et al., 2010), namely, we introduce the transport velocities w through the top and bottom boundaries of the prisms. They are the differences between the physical velocities in the direction normal to the layer interfaces and the velocities due to the motion of the interfaces. These velocities are defined at the interfaces (the yellow points in Fig. 2). For layer k the top interface has index k and the bottom one is k + 1. All other quantities -horizontal velocities u, temperature T , salinity S and pressure p -are defined at mid-layers. Their depths will be denoted as Z k , and the notation z k is kept for the depths of mesh levels (the layer interfaces). They are functions of horizontal coordinates and time in a general case.
The equations of motion, continuity and tracer balance are integrated vertically over the layers. We will use T to denote an arbitrary tracer. The continuity equation becomes the equation on layer thicknesses and the tracer equation becomes Here, W is the water flux leaving the ocean at the surface. It contributes to the first layer only (hence the delta function); T W is the property transported with the surface water flux and the indices t and b imply the top and bottom of the layer. The operator ∇ is a 2-D one. The right-hand side of Eq. (4) contains the 3-by-3 diffusivity tensor K, and ∇ 3 denotes the 3-D divergence or gradient operators. In writing the 3-D divergence we assume the discrete form are the placeholders for the horizontal and vertical components of the 3-D vector it acts on. The components of the 3-D gradient do not share the same location, so the discretization of K∇ 3 T requires special care (see Lemarié et al., 2012a, for the discussion for quadrilateral meshes). Note that w coincides with the vertical velocity through the layer surface only if the layer surfaces are flat. If the surfaces are inclined, w is the quasi-vertical transport velocity defining the exchange between the layers.
Integrating Eq. (3) vertically and assuming w t = 0 at the free surface, we obtain the elevation equation The layer-integrated momentum equation in the flux form is with D u h u the horizontal viscosity operator (to be specified later), ν v the vertical viscosity coefficient, f the Coriolis parameter and k a unit vertical vector. We ignore the momentum source due to the added water W at the surface. The pressure field is expressed as with p a the atmospheric pressure, which will be omitted for brevity, η the elevation, ρ the deviation of density from its reference value ρ 0 , and p h the hydrostatic pressure due to ρ. The term with the pressure gradient, gρ∇Z, accounts for the fact that layers deviate from geopotential surfaces. The quantity Z appearing in this term is the z coordinate of the midplane of the layer with thickness h. The origin of this term should become clear if one recalls that the horizontal pressure gradient has to be computed at a constant vertical coordinate z.
If the flux form (6) is used, it is more natural to formulate the solution procedure in terms of the horizontal layer transport velocities U = hu.
We get another familiar option by subtracting u multiplied by the thickness Eqs. (3), rearranging the terms with vertical transports and dividing over the layer thickness h: Here, additionally, we used the identity which leads to the vector-invariant form of the momentum equation.
The second term on the lhs of Eq. (8) includes division and multiplication by the layer thickness, and in doing so, it introduces the layer potential vorticity (PV) q = (ω + f ) / h and its transport uh. The layer thickness drops out from the continuous Eq. (8). In the discrete case, the location of vorticity points (vertices) and velocity points is different, and by keeping h the equation will then deal with the same horizontal transports as the thickness equations. This is the prerequisite for developing discretizations that conserve potential vorticity. We will suppress h in Eq. (8) further for simplicity, but including it requires only small modifications.
To summarize, the velocity w of quasi-vertical transport through the interfaces replaces the vertical velocity in the formulas above. The layer surfaces can be any combination of the standard choices, including the moving surfaces.

Asynchronous time stepping
FESOM1.4 uses asynchronous time stepping, with the horizontal velocities and scalars shifted by a half time step. We adapt it to FESOM2. This requires that the elevation and layer thicknesses be introduced at, respectively, full (integer) and half-integer time levels. We write and h n+1/2 T n+1/2 − h n−1/2 T n−1/2 = −τ ∇ · u n h * T n to warrant tracer conservation. Here τ is the time step and D T stands for the terms related to diffusion. We omit the time index on w, for w is related to u and h. Since the horizontal velocity is centered in time, these equations will be of the second order for advective terms if h * = h n . When the vector-invariant form of the momentum equation is used, taking h * = h n−1/2 is more convenient. In this case one does not need thicknesses at full time levels, but only the elevation. Although this formally reduces the time order to the first, the consequences are minor as long as thickness variations are small, which are our options at present. In addition, the elevation is usually computed with the accuracy shifted to the first order in large-scale ocean models, including this one. We will proceed with this option here. Appendix A shows how to implement h * = h n for the flux form of the momentum equation, and its generalizations are straightforward.
The elevation at full time steps and the total thickness on half-steps, given by the vertical sum of h k , may become decoupled due to numerical errors. In order to suppress such decoupling, we seek an algorithm which maintains consistency between the physical layer thickness (h, used with tracers) and dynamical thickness (dependent on the elevation η). We introduce where H is the unperturbed ocean thickness. h would be identical to the elevation η in the continuous world, but not in the discrete formulation here. For h * = h n−1/2 we write for the elevation Here α is the implicitness parameter (0.5 ≤ α ≤ 1) in the continuity equation. Note that the velocities at different time steps are taken on their respective meshes. This approach is inspired by Campin et al. (2004). The equation for thicknesses can be vertically integrated, giving, under the condition that the surface value of w t vanishes, Expressing the rhs in the formula for η through the difference in surface displacements h from the last formula, we see that η and h can be made consistent if we require that Now, to eliminate the possibility of η and h diverging, we always compute η n from the last formula, then estimate η n+1 by solving dynamical equations, and use it only to compute u n+1 . On the new time step a "copy" of η n+1 will be created from the respective fields h. We commonly select α = 1/2; in this case, η n is just the interpolation between the two adjacent values of h. Note that Eq. (14) will be valid for h * = h n ; it is only the upper limits in the integrals above that will be adjusted. The advantage of this approach compared to the synchronous time stepping is that a single version of w centered at full steps is needed. The disadvantage is the additional machinery involving the thicknesses and elevation.
We will continue by providing more detail on the asynchronous time stepping. We write Here θ is the implicitness parameter for the elevation; R n+1/2 u includes all the terms except for vertical viscosity and the contribution from the elevation. We use the second-order Adams-Bashforth (AB) method for the terms related to the momentum advection and Coriolis acceleration; the contribution of pressure p h does not need the AB interpolation (because it is centered) and the horizontal viscosity is estimated on the level n. We write the predictor equation as Solving the three-diagonal operator on the lhs for each column, we find the predicted velocity update u = u * − u n . 1 The corrector step is written as Expressing the new velocity from this equation and substituting the result into the equation for the elevation, we find 1 The vertical viscosity contribution on the rhs can be conveniently added during the assembly of the operator on the lhs.
Here, the operator part depends on h n+1/2 , which is the current value of the thickness. The last term on the rhs is taken from the thickness computations on the previous time step.
The overall solution strategy is as follows.
-Determine layer thicknesses and w according to the options described below.
-Advance the tracers. The implementation of implicit vertical diffusion will be detailed below.
Options for the vertical coordinate: -Linear free surface: if we keep the layer thicknesses fixed, the time derivative drops out, and the rest gives us the standard equation to compute w, starting from the bottom and continuing to the top, If this option is also applied to the first layer, the freshwater flux cannot be taken into account in the thickness equation. Its contribution to the salinity equation is then through the virtual salinity flux. In this option, w at the (fixed) ocean surface differs from zero, and so do the tracer fluxes. They do not necessarily integrate to zero over the ocean surface, which is why tracer conservation is violated.
-Full (nonlinear) free surface: we adjust the thickness of the upper layer, while the thicknesses of all other layers are kept fixed, ∂ t h k = 0 for k > 1. The thickness equations are used to compute w on levels k = 2 : K v starting from the bottom. The change in the thickness of the first layer h n+3/2 1 − h n+1/2 1 is given by Eq. (13), written for the respective time interval. In this case there is no transport through the upper moving surface (the transport velocity w 1 is identically zero). This option requires minimum adjustment with respect to the standard z coordinate. However, the matrix of the operator in Eq. (18) needs to be re-assembled on each time step.
-We can distribute the total change in height ∂ t h between several or all eligible layers. Due to our implementation, at each scalar horizontal location they can only be the layers that do not touch the bottom topography. If all eligible layers are involved, we estimate where h 0 k are the unperturbed layer thicknesses and H is their sum for all eligible layers. The thickness of the layers adjacent to the topography is kept fixed. The equation on thickness, written for each layer, is used to compute transport velocities w starting from zero bottom value. This variant gives the so-called z * coordinate.
-This can be generalized even further. One can use arbitrary distribution of layer thicknesses provided that their tendencies sum to ∂ t h over the layers. In particular, requiring that transport velocities w are zero, isopycnal layers can be introduced. The levels can move with high-pass vertical velocities, leading to the socalled z coordinate -see Leclair and Madec (2011), Petersen et al. (2015), or follow density gradients as in Hofmeister et al. (2010). The unperturbed layer thicknesses need not follow the geopotential surfaces and can be terrain-following, for example. Additional measures may be required in each particular case. For example, for terrain-following meshes the algorithms of computing pressure gradient should be adjusted to minimize errors in the momentum equation. Updated transport algorithms are also needed (in the spirit of Lemarié et al., 2012b) to minimize spurious numerical mixing in terrain-following layers. These generalizations are among the topics of ongoing work.
Because of varying layer thicknesses, the implementation of implicit vertical diffusion needs slight adjustment compared to the case of fixed layers. We write, considering time levels n − 1/2 and n + 1/2, and split it into and Here R T contains all advection terms and the terms due to the diffusion tensor except for the diagonal term with K 33 . Note that the preliminary computation of T * here is a necessity to guarantee that a uniform distribution stays uniform (otherwise some significant digits will be lost).
The semi-implicit implementation of the part related to the surface elevation (external mode) implies that an iterative solver must be used to solve the equation on η n+1 . An alternative is the option with subcycling, as detailed in Appendix B.

Spatial discretization of equations
To obtain the finite-volume discretization, the governing equations are integrated over the control volumes. The flux divergence terms are then, by virtue of the Gauss theorem, transformed into the net fluxes leaving the control volumes. All other terms are estimated as a mean over the volumes. It is assumed that and, similarly for the temperature and other scalars, Here A c and A kv are the horizontal areas of cells and scalar prisms. Note that the scalar areas vary with depth, hence the index k in A kv in the formula above (the index k will be suppressed in some cases). For layer k, A kv is the area of the prism kv including its top face. The area of the bottom face is A (k+1)v and may differ from that of the top one if the bottom is encountered. To be consistent in spherical geometry, we use where c(v) is the list of wet prisms containing v in layer k.
Since the horizontal velocity is at centroids, its cell-mean value u c can be identified with the value of the field u at the centroid of cell c with the second order of spatial accuracy. For scalar quantities a similar rule is valid only on uniform meshes, but even in this case it is violated in the vicinity of boundaries or the topography. This has some implications for the accuracy of transport operators.

Horizontal operators
-Scalar gradient takes vertex values of a field and returns the gradient at the cell center: where n e is the outer normal to cell c. Clearly l e n e = −k × l e if c is the first (left) cell of c(e). This procedure introduces G cv = G x cv , G y cv with the x and y component matrices G x cv and G y cv . They have three non-zero entries for each cell (triangle), which are stored. In contrast to FESOM1.4, where similar arrays are stored for each tetrahedron (and for four vertices and three directions), here only surface cells are involved.
-Vector gradient takes the values of velocity components and returns their derivatives at cell locations. They are computed through the least squares fit based on the velocities on neighboring cells sharing edges with cell c. Their list is n(c). The derivatives (α x , α y ) of the velocity component u are found by minimizing Here r c n = (x c n , y c n ) is the vector connecting the center of c to that of its neighbor n. The solution of the minimization problem can be represented as two matrices g x c n and g y c n , acting on velocity differences u n − u c and returning the derivatives. Computations for the vcomponent result in the same matrices. The explicit expressions for the matrix entries are Here = n(c) y 2 c n and XY = n(c) x c n y c n . The matrices are computed once and stored.
On the cells touching the lateral walls or bottom topography, we use ghost cells (mirror reflections with respect to the boundary edge). Their velocities are computed either as u n = −u c or u n = u c − 2 (u c · n n c ) n n c for the no-slip or free-slip cases, respectively. Here n is the index of the ghost cell, and n n c is the vector of the unit normal to the edge between cells c and n. Note that filing ghost cells takes additional time, but allows us to use matrices g x c n and g y c n related to the surface cells only. Otherwise, separate matrices will be needed for each layer. Note also that ghost cells are insufficient to implement the free-slip condition. In addition, the tangent component of viscous stress should be eliminated directly.
We stress that matrices g x c n and g y c n return derivatives of velocity components, and not the components of the tensor of velocity derivatives. The latter includes additional metric terms.
-Flux divergence takes fluxes nominally defined on cells and returns their divergence on scalar control volumes: where n e c is the outer normal to control volume v. Clearly, if v is the first vertex in the list v(e), n e c d e c = −k × d e c if c is the first in the list c(e) (signs are changed accordingly in other cases). While these rules may sound difficult to memorize, in practice computations are done in a cycle over edges, in which case signs are obvious.
In contrast to the scalar gradient operator, the operator of divergence depends on the layer (because of bottom topography), which is one of the reasons why it is not stored in advance. Besides, the fluxes F involve estimates of the scalar quantity being transported. Computing these estimates requires a cycle over edges in any case, so there would be no economy even if the matrices of the divergence operator were introduced.
-Velocity curl takes velocities at cells and returns the relative vorticity at vertices: where t e c is the unit vector along d e c oriented so as to make a counterclockwise turn around vertex v. If v is the first in the list v(e) and c is the first in the list c(e), t e c d e c = d e c . This operator also depends on the layer and is not stored.
It can be verified that the operators introduced above are mimetic. For example, the scalar gradient and divergence are negative adjoints of each other in the energy norm and the curl operator applied to the scalar gradient operator gives identically zero. The latter property allows a PV conserving discretization, but we will not discuss it here.

Momentum advection
FESOM2.0 has three options for momentum advection. Two of them use the flux form and the third one uses the vectorinvariant form. In spherical geometry the flux form takes an additional term Mk × u, where M = u tan λ/r E is the metric frequency, with λ the latitude and r E the Earth's radius. All the options are based on the understanding that the cellvertex discretization has an excessive number of velocity degrees of freedom on triangular meshes. The implementation of momentum advection must contain certain averaging in order to suppress the appearance of grid-scale noise.
-Vertex velocity option. We compute vertex velocities by averaging and use them to compute the divergence of horizontal momentum flux: Here n e is the external normal and l e n e = −k × l e if c is the first one in the list c(e). Since the horizontal velocity appears as the product with the thickness, the expressions here can be rewritten in terms of transports U = uh.
The fluxes through the top and bottom faces are computed with w c = v(c) w v /3 using either centered or standard third-order upwind algorithms.
-Scalar control volumes. Instead of using vector control volumes, we assemble the flux divergence on the scalar control volumes and then average the result from the vertices to the cells. For the horizontal part, with the same rule for the normals as in the computations of the divergence operator. The contributions from the top and bottom faces of the scalar control volume are obtained by summing the contributions from the cells, for the top surface, and similarly for the bottom one. The estimate of u t can be either centered or third-order upwind as above. Other methods will follow.
This option is special in the sense that the continuity is treated here in the same way as for the scalar quantities.
-Vector-invariant form. The relative vorticity in the cellvertex discretization is defined on vertices, and so should the Coriolis parameter. We use the following representation: ( The representation with the thicknesses, is reserved for the future. The gradient of kinetic energy should be computed in the same way as the pressure gradient, which necessitates computations of u 2 at vertices. This is done as The vertical part follows Eq. (8), for the top surface, and similarly for the bottom. Note that the contributions from the curl of horizontal velocity, the gradient of kinetic energy and the vertical part involve the same stencil of horizontal velocities.
The three options above behave similarly in simple tests on triangular meshes, but their effect on flow-topography interactions or eddy dynamics remains to be studied. The vectorinvariant option is slightly less dissipative, but may leave some noise in w in areas where mesh resolution is varied (see , which is absent for the flux forms. Higher-order methods can be applied for momentum flux computations, exploring which is reserved for the future.

Viscosity operators
Formally, the derivatives of horizontal velocity can be estimated and the components of the viscous stress tensor, σ ij = ν h (∂ i u j + ∂ j u i ), can be found. Here the indices i, j imply the horizontal directions, and ν h is the horizontal viscosity. In computing their divergence a centered estimate of stresses has to be taken at the lateral faces of vector control volumes. If discretized in terms of cell velocities, this scheme downweights or fully eliminates the contributions from the nearest cells, and is thus incapable of eliminating grid-scale fluctuations in velocities.
The expression for stresses can be simplified as σ ij = ν h ∂ i u j . As discussed by Griffies (2004), its divergence still ensures energy dissipation, but is nonzero for solid-body rotations if ν h is variable. In spite of this drawback, using the simplified form is advantageous because the contributions from the neighbor velocities in flux divergence can be strengthened. Indeed, only contraction with a normal vector ν h n i ∂ i u j , i.e., the derivative in the direction of the normal n, appears in the contributions for each vertical face. For the face identified with edge e between cells c and n, we formally write n = r c n /|r c n | + (n − r c n /|r c n |), where r c n = d e n −d e c is the vector connecting the centroids of cells c and n, and split the stress contracted with n into two respective parts. The velocity derivative (up to metric terms) in the direction of r c n is just the difference between the neighboring velocities divided by the distance |r c n |. The remaining part of the viscous flux (contracted with (n − r c n /|r c n |)) is computed with the standard procedure based on the centered estimate of stresses. It is easy to see that only the nearest neighbors will be involved on equilateral meshes (for n and r c n are collinear). However, the computations of velocity derivatives and stresses are still needed if meshes deviate from equilateral. The discretization of the harmonic viscosity operator, amended as described above, works well. Its biharmonic version is obtained by applying the procedure twice.
This procedure, especially its biharmonic version, proves to be costly, for it involves computations of velocity derivatives and manipulations with two types of contributions. On the other hand, we see that the expensive part involving the general computation of velocity derivatives is only needed on deformed meshes; it will be small on quasi-equilateral meshes and, even if it is not small generally, it contributes little to penalizing differences between the nearest velocities. This leads to the idea to introduce simplified operators based (ν hn + ν hc ) (u n − u c ) l e /|r cn |, (32) where n is the cell sharing edge e with cell c, we take into account the contributions from the nearest neighbors. This expression is written for a uniform layer thickness, but can be adjusted for a variable one by adding h c on the lhs and h e on the rhs. The computation is implemented as a cycle over edges. One uses ghost velocities to impose boundary conditions, or can skip the contributions from the boundary edges to emulate free slip. It is easy to see that the operator integrates to zero in the domain interior (momentum conservation) and is negative definite in the energy norm. The procedure is applied twice to get a biharmonic version.
The procedure can be simplified even further as Here τ u is a factor with dimension of time to be specified further. This variant is a filter removing grid-scale fluctuations. Clearly, in a general case, it does not ensure momentum conservation, and we cannot strictly prove that it leads to kinetic energy dissipation. However, on equilateral triangular meshes it reduces to D u = (l 2 /3τ u )(∂ xx + ∂ yy ), where l is the triangle height. This allows one to identify τ u with l 2 /(3ν h ). The biharmonic form of the filter is taken as −τ u D u (A 0 /A c ) 1/2 D u , where A 0 is the reference cell area.
In this case τ u = l 3 l 0 /(9ν bh ), where l 0 is the side of the reference cell and ν bh the coefficient of biharmonic viscosity. The inclusion of area scaling is needed for cubic dependence on l.
Writing ν bh in the commonly used form ν bh = V l 3 , where V is the velocity scale, one finds V = l 0 /9τ u . The values about 0.02 m s −1 are generally sufficient even on highly variable meshes. The code contains these options but we are using the last one in the biharmonic version in most cases -it is efficient both computationally and in terms of providing stable code performance. We have not met any visible artifacts thus far despite its obvious physical shortcomings. In all other cases, the coefficient of horizontal viscosity is scaled with mesh size to provide ν h = V l in the harmonic case and ν bh = (V l 3 ) in the biharmonic case.
We note that the inefficiency of the standard Laplace operator in filtering grid scales for cell variable placement and measures needed to amend it are well known (see, e.g., Blazek, 2001). For the co-called ZM discretization, which is similar to the cell-vertex discretization up to the detail of scalar control volumes, Ringler and Randall (2002) proposed introducing a small-stencil vector Laplacian operator based on the identity u = ∇∇ · u − ∇ × ∇ × u. The stencil involves only the nearest neighbors. However, because these computations are not related to the full mesh cells, they neither ensure momentum conservation nor negative definiteness of kinetic energy dissipation in a general case. In this respect using them is not more logical than using the simplified forms (32) or (33).

Transport of scalar quantities
High-order transport schemes for vertex variable placement can be realized by using polynomial reconstruction of scalar fields or the reconstruction of gradients of scalar fields at mid-edges. We experimented with the quadratic reconstruction of scalars, which provides a compromise between accuracy and computational effort (see Skamarock and Menchaca, 2010). Its other advantage for vertex placement of variables is that it needs only the information from the nearest neighbors, which imposes no new demands on halo exchange in parallel implementation. It turned out that it is not more accurate than the gradient reconstruction algorithm, being twice as expensive and demanding much more storage for the reconstruction matrices. For this reason, at present we keep the gradient reconstruction algorithm as the basic one, which is also available in combination with the FCT (flux corrected transport) algorithm.
Consider edge e with v(e) = (v 1 , v 2 ) and c(e) = (c 1 , c 2 ). The advective flux of scalar quantity T through the face of the scalar volume associated with this edge is The quantity Q e is the volume flux associated with edge e which leaves the control volume v 1 . We need an estimate for T e at the mid-edge. In order to provide it, for each edge e we store the indices of the cells ahead or behind this edge in the direction l e . We compute two estimates takes the fourth-order part with the weight γ and the thirdorder part with 1 − γ . In practice, γ between 0.75 and 0.85 works well for many cases, reducing the upwind dissipation considerably (by a factor of 4 for γ = 0.75). These are the recommended values. We note that the high order of the scheme above is only achieved on uniform meshes. However, since T e is computed through linear reconstruction, the second order is warranted on general meshes.
The implementation requires preliminary computation of scalar gradients on cells. An extended halo exchange is needed to make these gradients available during flux assembly. Edges touching the topography may lack either u or d cells. In this case the simplest choice is either to use the central estimate or the estimate based on the mean vertex gradient A v (∇T ) v = c(v) A c (∇T ) c /3. This introduces some additional logistics, but it is common for all high-order schemes.
For the vertical direction, we provide a set of possibilities which include the third-/fourth-order option similar to the algorithm described above, spline interpolation, as well as the piece-wise parabolic method by Colella and Woodward (1984).
The FCT version uses the first-order upwind method as the low-order monotonic method and the method above as the high-order one. The low-order solution and the antidiffusive fluxes (the difference between the high-order and loworder fluxes) are assembled in the same cycle (over edges for the horizontal part and over vertices for the vertical part) and stored. We experimented with separate pre-limiting of horizontal and vertical antidiffusive fluxes and found that commonly this leads to an increased dissipation, for the horizontal admissible bounds are in many cases too tight. For this reason, the computation of admissible bounds and limiting is 3-D. As a result, it will not necessarily fully eliminate nonmonotonic behavior in the horizontal direction. The FCT algorithm of FESOM1.4 follows the same logic; however, in that case it is the only possibility. Using the FCT version roughly doubles the cost of the transport algorithm, but adds the stability needed in practice.

Vertical velocity splitting
As demonstrated in Lemarié et al. (2015), in practice, the strongest Courant number limitation comes from vertical advection in isolated patches adjacent to the coast. The code numerical efficiency can be augmented if some measures are taken to stabilize it with respect to vertical advection. Unstructured meshes of variable resolution might be even more vulnerable to such limitations because their irregularity can easily provoke a noisy pattern in w through rendering of topography. We implement the approach proposed by Shchepetkin (2015) according to which the vertical transport velocity is split into two contributions w = w ex + w im where the first one is determined by the maximum admissible Courant number, and the second one is the rest. The advection with w ex is done explicitly using the schemes mentioned above. The advection with w im is implicit. It uses the first-order upwind (backward Euler in time) so that the vertical operator that corresponds to it is diagonally dominant. The latter is solved together with the implicit vertical mixing by the standard sweep algorithm. As a result, if this option is used, the incurring additional costs of the model time step are negligible. The use of the first-order upwind scheme may seem to be too dissipative, but the point is that it is applied only to part of the velocity, and only in critical cases.
The bolus velocity v * = (u * , w * ) is expressed in terms of eddy-induced streamfunction , where γ is a 2-D vector. Ferrari et al. (2010) suggest computing it by solving with boundary conditions γ = 0 at the surface and ocean bottom. In this expression, c is the speed of the first baroclinic mode, σ the isoneutral density, κ the thickness diffusivity, N the Brunt-Väisälä frequency, and the index z means that the gradient is computed for fixed z (it differs from the gradient along layers, ∇ z σ = ∇σ − ∂ z σ ∇Z). In terms of the vector γ the components of eddy-induced velocity are computed as It is easy to see that solving Eq. (35) plays a role of tapering, for it allows one to smoothly satisfy boundary conditions. Because of the boundary conditions, adding the eddyinduced velocity to the mean velocity (u, w) does not change h as the vertically integrated divergence of u * is zero. In the ALE formulation the inclusion of eddy-induced velocity implies that the thickness and tracer equations are now written for the so-called residual velocity u r = u+u * , w r = w +w * . Although the natural placement for γ is at the cell centroids, we solve for it on the mesh vertices in order to reduce the number of computations. The vertical location is at full levels (layer interfaces). The horizontal bolus velocities are then computed at cell centroids as The vertical bolus velocity w * is then found together with w at the end of the ALE step and the full residual velocity is used to advect tracers. We compute the speed c in the WKB approximation as Among other factors, the magnitude of the thickness diffusivity κ depends on the resolution r and the local Rossby radius L R = c/f : where f κ is a cut-off function that tends to 0 if r/L R < 1 and to 1 otherwise. The resolution is defined as a square root of the area of the scalar control volume. On general meshes it may exhibit substantial local variations, so smoothing over the neighboring vertices is done. Note that scaling with mesh resolution for viscosity and diffusivity coefficients will also benefit from using a smoothed r.

Isoneutral diffusion
Assuming that the slope of the isopycnals is small, we can write the diffusivity tensor as Here K i and K d are the isoneutral and diapycnal diffusivities, and s is the isoneutral slope vector computed along layers, s = (s x , s y ) = −∇σ/∂ z σ.
If layer interfaces deviate substantially from geopotential surfaces, for example, if layers follow the bottom topography, the slope vector can be substantially larger than typically found on z-coordinate meshes. Mixed derivatives in the ∇ 3 hK∇ 3 operator in this case can cause time-step limitations (Lemarié et al., 2012a). To maintain stability, the term h∂ z s 2 K i + K d ∂ z has to be treated implicitly. Appendix D shows the details of the numerical implementation of isoneutral diffusion.

FESOM2.0 vs. FESOM1.4
In the following we evaluate the performance of FESOM2.0 by simulating the realistic ocean state under prescribed atmospheric forcing. The purpose is to illustrate that FESOM2.0 is ready to be run in global configurations, although it may still need some further parameter tuning. Model efficiency is then briefly assessed. Detailed model assessment is the subject of future work.

Meshes
The evaluation will be done in two steps. In the first step we compare the performance of FESOM2.0 to that of finiteelement FESOM1.4 . For this purpose, we run both models on the same coarse-resolution reference mesh and in similar configurations. The z-coordinate in the vertical is used in the simulations described below. Although the same mesh and level surfaces are used, the vertical mesh geometry is different: FESOM2.0 assumes the mesh to be composed of prisms, whereas these prisms are split into tetrahedra in FESOM1.4. The mesh contains about 120 000 surface nodes, its horizontal resolution varies from 25 km in high latitudes of the Northern Hemisphere to nominally 1 • elsewhere, and there are 46 unevenly spaced z levels in the vertical. This mesh was also used to carry out FESOM1.4 simulations for model intercomparison in the Coordinated Ocean-ice Reference Experiments -Phase II (CORE-II, Large and Yeager, 2009) project. It has been demonstrated that FESOM1.4 performs well in this configuration compared to other ocean models (see, e.g., Griffies et al., 2014;Danabasoglu et al., 2014 and other papers in the same virtual issue).
In the second step we simulate the ocean state under CORE-II forcing with FESOM2.0 but on an eddy-permitting global mesh with a quasi-uniform resolution of 15 km, referred to further as Glob15. The mesh contains about 2 000 000 surface nodes. It is worth mentioning that the size of Glob15 is already larger than all meshes we used with FE-SOM1.4 thus far. We did not carry simulations on Glob15 with FESOM1.4 to save computational resources.

Model settings
Although we try to configure both model versions as closely as possible for our intercomparison, there are a few differences due to the details of implementation. First, different transport schemes are used. The Taylor-Galerkin (TG) algorithm of FESOM1.4 with consistent mass matrices is expected to be less dissipative than the third-/fourth-order upwind algorithm used in FESOM2.0. The TG scheme works by default with a FCT limiter in FESOM1.4, so we apply the FCT limiting in FESOM2 too. Second, the difference between the two versions of FESOM comes from the implementation of the GM parameterization of eddy transport. FE-SOM1.4 uses the GM skew flux formulation as suggested by Griffies (1998). Because of the finite-element discretization and hence variational formulation, this strategy is optimal for FESOM1.4, but less convenient for FESOM2.
All simulations are run with the linear free-surface and virtual salinity forcing. The surface salinity is restored to the climatological data with the piston velocity of 50 m/300 days, which is a common practice for stand-alone ocean models. Although the default mixing scheme in FESOM1.4 is the k-profile parameterization (KPP, Large et al., 1994), it has not been tested yet with FESOM2.0. That is why the vertical mixing in all simulations presented further is provided by the Pacanowski and Philander (1981) scheme with the background vertical diffusion of 2 × 10 −3 m 2 s −1 for momentum and 10 −5 m 2 s −1 for the potential temperature and salinity, and the maximum is limited to 0.01 m 2 s −1 . The parameterization of mesoscale eddies was switched off in the simulation with Glob15 as suggested by Delworth et al. (2012). The time step was set to 30 and 15 min for the reference and Glob15 meshes, respectively, in order to meet the Courant-Friedrichs-Lewy (CFL) condition. All runs are initialized in winter from the Polar Science Center Hydrographic Climatology (Steele et al., 2001) and the integration covers the time frame 1948-2007 of CORE-II atmospheric forcing (Large and Yeager, 2009).

Intercomparison on the coarse-resolution reference mesh
We first compare the last 15 years of the simulated hydrography in the two model runs on the coarse-resolution reference mesh to the World Ocean Database 2005 (WOA2005, Conkright et al., 2002). One should keep in mind that the spin-up time of 60 years is too short to provide an equilibrated ocean state. Nevertheless, the departure of the modeled hydrography from climatology after 60 years of integration can already serve as a measure of the model drift and indicate the quality of the solution. The bias of temperature in different depth ranges is shown in Fig. 3 for FESOM1.4 and FESOM2.0, respectively. The patterns for the upper 200 m look generally similar in the models. Notably, for the cold bias in the Labrador Sea, its surroundings, and in the region of Malvinas Current, FESOM2.0 simulates much larger departures from WOA2005 than FESOM1.4. This cold bias is primarily associated with the missing northwestern corner in the path of the North Atlantic Current and the overly weak strength of the subpolar gyre. Both are often attributed to the lack of spatial resolution. Marzocchi et al. (2015) show that the resolution of 1/12 • (on ORCA-type meshes) is already sufficient to properly resolve the pathway of the North Atlantic Current. On other hand, at coarse resolutions, Stouffer et al. (2005) and Jochum et al. (2008) demonstrate that the reduction in viscosity in the extratropical ocean in climate models increases the strength of the subpolar gyre in the North Atlantic. Other experiments carried out with FE-SOM, which are not presented here, indicate that even small changes in model parameters like viscosity and GM thickness diffusivity can impact the strength of the cold bias. Improving the simulation quality in the Labrador Sea and its vicinity by both, increasing the local resolution and tuning the model parameters, will be the focus of future studies. It should be mentioned that the bias in the upper ocean hydrography shown here for FESOM1.4 is different from that presented in the intercomparison of CORE-II hindcasts Danabasoglu et al., 2014), where the KPP mixing scheme was used instead of PP. As mentioned, different mixing schemes besides PP still need to be more thoroughly tested with FESOM2.0. At deeper levels of the tropical Atlantic, FESOM2.0 performs better than FESOM1.4; at the same time, errors become larger in the Southern Ocean and the eastern North Atlantic. Our experience in running FESOM is that the drift in the Southern Ocean is substantially affected by the imposed spatial (horizontal and vertical) pattern of the GM coefficient κ, which needs to be tuned in FESOM2. The warm bias in the eastern North Atlantic is a persistent feature in all simulations with FESOM2.0 and is likely due to an overly strong Mediterranean outflow. A closer look at salinity (Fig. 4) reveals that FESOM2.0 simulates a much fresher Mediterranean Sea than FESOM1.4 and that more salt is released into the North Atlantic across the Strait of Gibraltar. Indeed, the meshes used here have an artificially widened strait, whereas the cell placement of velocity vectors and the free-slip boundary condition applied in FESOM2.0 (no-slip option used in FESOM1.4) have the potential to increase the Gibraltar outflow if the same geometrical boundary is used. Although the idea of resolving the Strait of Gibraltar may seem straightforward, too fine a resolution would lead to additional computational burden associated with sufficiently small time steps. Individual adjustment of mesh geometry is required for two model versions.
The streamfunction of meridional overturning circulation (MOC) shown in Fig. 6a and b for reference runs with FESOM1.4 and FESOM2.0, respectively, reveals that the Antarctic Bottom Water (AABW) production is larger in FE-SOM2.0 compared to FESOM1.4 and is at the upper boundary of the observation-based estimate available from the literature (see, e.g., Lumpkin and Speer, 2007). The maximum overturning of Upper Circumpolar Deep Water (UCDW) at about 35 • S exceeds 20 Sv compared to only 5 Sv in FE-SOM1.4. This behavior suppresses the mid-depth cell at about 30 • S. The maximum of the mid-depth cell in the North Atlantic is about 12 Sv in both versions and remains at the lower boundary of observational estimates published in the literature. Another distinction between both MOCs is at the northern boundary of the domain. As mentioned in Sidorenko et al. (2009), there is an ambiguity in transport definitions for discretizations exploiting the finite-element approach. This results in a bias that accumulates in the diagnosed MOC at the northern boundary when integrating from the south to the north. The inconsistency amounts to about 2 Sv at certain depths in FESOM1.4, while it is zero in FE-SOM2.0. We conclude that FESOM1.4 and FESOM2.0 show similar behavior on the reference mesh, but FESOM2.0 may benefit from further tuning. In particular, the impact of vertical transport schemes, bottom representation and boundary conditions needs to be explored in more detail.

Simulated ocean state
The difference in hydrography simulated on Glob15 compared to WOA2005 is shown for the mean over the last 15 years in Figs. 3 and 4 ( right columns) for temperature and salinity, respectively. Overall, the model drift in Glob15 is smaller than in the reference runs. The largest improvement is seen at the surface, where the cold bias north of 45 • N is now confined to the northwestern corner of the North Atlantic Current. It does not vanish completely, however, because the resolution of 15 km is far from being even eddypermitting in this region, where the Rossby radius of deformation goes well below 10 km. The area with freshwater bias north of Newfoundland has been significantly reduced compared to the reference simulation with FESOM2.0. This points to the improved linkage between the Arctic and North Atlantic oceans. Some other improvements are also seen at other locations and different depth ranges. For instance, the bias in the Southern Ocean is remarkably reduced in the deeper layers, as is visible from salinity patterns (Fig. 4). These improvements indicate that over some parts of the global ocean partially resolving mesoscale features can already impact dynamics. In order to illustrate the eddy activity, we show the snapshot of subsurface relative vorticity in the North Atlantic in Fig. 5. Although we show the North Atlantic only, the dynamics in Glob15 is eddy-rich around all key fronts and in subtropical gyres of the global ocean. As expected, the mesoscale features are prominent in Fig. 5 along the Florida Current, the Gulf Stream and the North Atlantic Current. The Azores Current branching off the Gulf Stream at ca. 35 • N is also reproduced well. At higher latitudes above about 50 • N, the resolution becomes insufficient for capturing eddy dynamics because the Rossby radius decreases. In the highresolution experiment with FESOM2.0, the Gulf Stream separates too far north of Cape Hatteras, a feature shared by most ocean models with resolution below 0.1 • . As one would expect, the wrong separation of the Gulf Stream is also reflected in the drift of hydrography, where an overly warm and salty bias develops close to the western coast.
The pattern of relative vorticity also reveals the existence of zonally elongated patches corresponding to zonal jets which are often simulated with the high-resolution ocean models, and are confirmed by the altimetric observations (see, e.g., Maximenko et al., 2005). The stripes in the vorticity are seen primarily in the North and South Pacific and in the South Atlantic Ocean (not shown). In the North Atlantic zonal jets are most visible at about 30 • N and in the eastern North Atlantic at about 50 • N. Note that for better visualization of zonal jets one shall inspect the vorticity pattern averaged over certain periods of time or do the same for zonal velocity components. The MOC for Glob15 is shown in Fig. 6c and depicts significant improvements compared to that simulated with FESOM2.0 on the reference mesh. While the bottom cell has been reduced, there is a significant increase in the middepth cell, reaching a maximum of above 15 Sv in the North Atlantic. The Antarctic Bottom Water export across about 65 • S shows a clear connection with the UCDW matching the estimates from inverse techniques by Lumpkin and Speer (2007). The reader is referred to Fig. 2 in their paper. The improvements seen for simulations on mesh Glob15 compared to the reference mesh may serve as an argument in favor of using high resolution.

Sea ice
The sea ice thickness simulated on Glob15 is shown in Fig. 7 for March and September. The maps of ice thicknesses compare well to those of the Pan Arctic Ice-Ocean Modeling and Assimilation System (PIOMAS; Schweiger et al., 2011) presented in Notz et al. (2013) (their Fig. 8) for the Northern Hemisphere. The thickest sea ice in the Arctic reaches above 5 m in March and September and is found north of Greenland and in the Canadian Archipelago, becoming thinner towards the Siberian coast. The simulated 15 % sea ice concentration contours, indicating the sea ice edge, are also shown in Fig. 7 (white contour line) together with NSIDC observations (Fetterer et al., 2002(Fetterer et al., , updated 2009) (black contour line). In September, the model overestimates the sea ice coverage along the Siberian Shelf and in the northern Barents Sea. Because of this, the summer Arctic sea ice extent in Glob15 is on average overestimated by 10 % compared to the satellite data, providing 7.54 × 10 6 km 2 compared to 6.74 × 10 6 km 2 from NSIDC. In the Southern Hemisphere, Glob15 underes-timates the summer ice extent. In this context further study of the performance of mixed-layer parameterization and the effect of still insufficiently strong eddies on the properties of the watermasses simulated around the Antarctic coast may be needed. The sea ice extent simulated by the new model version is very similar to that simulated by FESOM1.4, which lies within the spread of the CORE-II multi-model ensemble (Downes et al., 2015;Wang et al., 2016b). This similarity is probably not too surprising given that both versions of FE-SOM share the same sea ice component.
In order to quantify the seasonal variability of the sea ice, we plot the monthly time series of sea ice extents in Fig. 8. The result compares well to the observation in the Northern Hemisphere, while the amplitude of seasonal variability is overestimated in the Southern Hemisphere. The model simulates lower summer and higher winter sea ice extent in the Southern Ocean. For both hemispheres, the model captures realistic trends in sea ice extent from 1979 to 2007, which are negative and positive in the Northern Hemisphere and Southern Hemisphere, respectively.

Performance and implementation issues
FESOM is written in Fortran 90 with some C/C++ code inserts providing bindings to the third party libraries. The code employs the distributed memory parallelization based on MPI (Message Passing Interface). The model experiments have been carried out on a Cray XC40 system hardwared with Intel Xeon Haswell and 24 cores per node, which was made available through the North-German Supercomputing Alliance (HLRN). The experience shows that the parallel scalability of both versions of FESOM starts to saturate after assigning less than 300 vertices of surface mesh per computa- tional core. In view of this, the experiments on the reference were conducted using 384 cores (16 nodes).
Disregarding input/output, the throughput of FESOM1.4 is ca. 25 simulated years per day (SYPD), where 92.5 and 7.5 % of the resources are spent in the ocean and ice components, respectively. The resources spent in dynamical (solving for u, v, w, η) and tracer (solving for T , S) parts in the ocean are nearly equal. The performance of the dynamical part of the ocean component highly relies on the numerical solver used to solve for the external mode (elevation). We use the parallel Algebraic Recursive Multilevel Solver (pARMS, Li et al., 2003) augmented with the Schur Complement Preconditioner with local incomplete LU factorization (Fuchs, 2013). The cost of solving with pARMS is only about 10 % of the dynamical part and nearly 5 % of the total cost.
Using the same computer resources, the throughput of FE-SOM2.0 is 110 SYPD. In this version, the resources between the ocean and sea ice components are split as 67 and 33 %, respectively. The ocean component in FESOM2.0 demonstrates 7 times higher throughput than that of FESOM1.4, giving the largest speedup in the tracer part, where it is even 9 times faster than in FESOM1.4. The implementation of GM following Ferrari et al. (2010) costs nearly 10 % in the ocean component and 20 % is spent in pARMS to solve for the sea surface height. Interestingly, pARMS shows much faster convergence (up to a factor of 2.5) in FESOM2.0 than in FE-SOM1.4. In summary, disregarding input/output, the reference setup FESOM2.0 shows about 5 times higher throughput than FESOM1.4.
The Glob15 configuration was run on 1728 cores (72 nodes) giving a throughput of 17 SYPD, with relative costs between model components remaining comparable to those of the coarser-resolution reference setup. For this mesh the relative cost of using pARMS decreases compared to the reference mesh despite the much larger mesh and the number of cores. We guess that it is partly linked to a smaller time step which improves the diagonal dominance in the matrix of the sea surface height operator. Compared to the reference mesh, which was run in the limit of linear scalability (≈ 300 surface vertices per core), Glob15 was run with ≈ 1150 vertices per core, so there is still potential for further increase in throughput.
The numbers given above serve only to illustrate the computational performance. Details may depend on the frequency of output, the type of transport algorithm, the presence of isoneutral diffusion or GM parameterization and the number of subcycles used in the elastic-viscous-plastic sea ice solver of FESIM . A conservative estimate would be a 3-fold speedup compared to FESOM1.4.

From finite elements to finite volumes
There are several reasons for developing a new dynamical core based on finite-volume discretization. The first and main one is the need for enhanced numerical efficiency. Generally, the codes based on unstructured meshes are less efficient numerically than their structured-mesh counterparts, partly because of (i) indirect indexing and the need for numerous auxiliary (look-up) arrays (neighboring cells, vertices, matrices of horizontal derivatives) and partly because of (ii) an increased share of floating-point and memory-access operations needed in the absence of directional splitting and mesh structure. The overhead related to (i) can be minimized in codes using prismatic elements defined by unstructured surface meshes. In this case the same 2-D auxiliary arrays can be used over the entire water column, which makes the cost of assessing them rather moderate. The overhead of (3-D) auxiliary arrays is much larger in FESOM1.4 because of its tetrahedral elements needed to implement arbitrary level surfaces. Using bilinear prismatic elements (Wang et al., 2008) requires storing and accessing Jacobians on generalized meshes, which adds to the computational burden. Turning to the finite-volume method together with the ALE vertical coordinate provides a simple and efficient way to exploit the benefits of prismatic meshes.
The second reason for switching to a finite-volume discretization is that, as mentioned in Danilov (2013), continuous Galerkin finite elements are suboptimal for hydrostatic codes because they create horizontal connections even in the matrices of purely vertical operators. In order to be practical, FESOM1.4 used a potential φ for the vertical velocity w = ∂ z φ and a finite-difference method to compute pressure from the hydrostatic balance. This destroys energetic consistency between conversions of kinetic and available potential energy. The finite-volume discretization allows us to maintain energetic consistency (up to errors due to temporal discretization).
Finally, the finite-volume discretization operates with clear definition of fluxes, which is much more convenient for post-processing. For example, it makes computations of the meridional overturning streamfunction much more straightforward and free of interpretation inconsistencies intrinsic to the continuous finite-element discretization. In addition, it also allows numerous transport algorithms, whereas the choice available for finite elements of a selected type is much more restrictive.

Cell-vertex discretization
Among possible finite-volume discretizations, the cellvertex discretization used by FESOM2 presents a compromise allowing us to keep general triangular meshes and use staggering of velocities and pressure. A collocated vertexvertex finite-volume discretization, which is the closest analog to FESOM1.4, was explored by Danilov (2012). It presents a finite-volume analog of linear finite elements, and needs stabilization on an uneven bottom against pressure modes for the same reason as FESOM1.4. Although stabilization does not necessarily lead to deficiencies in the simulated ocean state, it introduces biases to energy exchanges and geostrophic balance, which should better be avoided. Additionally, it requires splitting of the horizontal velocities into contributions located on vertices and cells, so that the velocity used to transport scalar quantities and the velocity used to compute momentum balance are different entities.
The cell-vertex discretization is free of pressure modes; however, this comes at the price of an excessively large number of velocity degrees of freedom. This creates spurious inertial velocity modes and requires the presence of efficient gridscale viscosity operator coupling neighboring velocities. We have found that biharmonic filters are efficient in accomplishing this even on highly nonuniform meshes.
Because of staggering and keeping the velocity vector, the triangular cell-vertex discretization is an analog of an inverted B-grid (we call it a quasi-B-grid). The inversion (the domain boundary is defined by scalar points) allows us to implement both free-and no-slip boundary conditions. Spurious inertial modes are absent on quadrilateral B-grids. This prompts us to consider hybrid meshes composed of triangles and quads, where the triangles will be used to provide transitions between regions of different resolution. The generalization to hybrid meshes is straightforward in the finite-volume implementation because most of the operations are implemented as a cycle over edges. Furthermore, since the number of edges on quadrilateral meshes is smaller than on triangular meshes for a given number of vertices, this also implies a speedup in the code performance. This strategy is already implemented in the coastal branch of FESOM (to be described elsewhere) and will be made available in FESOM later.
Two other variants of finite-volume discretization are used at present in global ocean circulation models. MPAS (Ringler et al., 2013) uses a C-grid discretization on the Voronoi polygonal meshes (most of the polygons are hexagons), and the ICON implementation (at the Max Planck Institute for Meteorology, Hamburg) is based on a triangular C-grid (which needs an orthogonal triangular mesh). The spurious modes of hexagonal C-grids are well controlled, but hexagons are less flexible geometrically and were not selected for our development. The triangular C-grids have spurious divergence modes which seem to be more difficult to control than inertial modes of cell-vertex discretization. Practical experience gained in future by using models with different types of unstructured-mesh finite-volume discretization will reveal the most efficient choice. The community effort may lead to a certain convergence among future model versions, similarly to the convergence toward Cgrids observed presently for models formulated on structured quadrilateral meshes.

Conclusions
This paper describes version 2 of FESOM. The new numerical core uses a cell-vertex finite-volume discretization. FE-SOM2.0 compares well with FESOM1.4 in terms of simulated global ocean circulation. It inherits the model framework and the sea ice model of its predecessor, and is conceived so as to allow users familiar with FESOM1.4 to switch the versions easily. FESOM2.0 ensures higher numerical throughput than FESOM1.4, which makes it much closer to the structured-mesh models in terms of numerical efficiency. It offers new functionality through the ALE vertical coordinate. Future development will focus on the generalized vertical coordinates, high-order transport algorithms working on partly terrain-following meshes without excessive diapycnal mixing and on generalization to mixed meshes combining triangles and quads. FESOM2 will gradually replace FE-SOM1.4, yet the latter will be maintained and user support will be provided over several years to come.

Code and data availability
The version of FESOM2.0 used to carry out simulations reported here can be accessed from https://swrepo1.awi.de/svn/ awi-fvom/ after registration. The updated versions will be available through the same link in future. For convenience, the configuration used, together with the meshes, is archived at doi:10.5281/zenodo.161319. Mesh partitioning in FESOM is based on a METIS Version 5.1.0 package developed at the Department of Computer Science & Engineering at the University of Minnesota (http://glaros.dtc.umn.edu/gkhome/ views/metis). METIS and pARMS (Li et al., 2003) present separate libraries which are freely available subject to their licenses. FESOM1.4 is available at https://swrepo1.awi.de/ projects/fesom/ (requires registration). The Polar Science Center Hydrographic Climatology (Steele et al., 2001) used to initialize runs of CORE-II atmospheric forcing data (Large and Yeager, 2009) is freely available online. The simulation results can be obtained from the authors on request.

Appendix A: The flux form of momentum advection
When using the flux form of momentum, the natural choice is h * = h n , which makes the thickness and transport equations centered. The choice for the thickness appearing with pressure is h n+1/2 , which is centered. The advection and Coriolis terms will be computed through AB2 (or AB3) time stepping, or, if needed, the Coriolis term can be made semi-implicit. The transport U n = u n h n becomes a natural velocity variable.
The time stepping algorithm can be formulated as follows: The last expression combines the terms that need the AB method for stability and the second order. We use h n+1/2 to compute Z and follow the same rule as Eq. (14) to compute η n . The steps are the following.
-Update for implicit viscosity.
-Solve for new elevation. We write first U = k U and similarly for other quantities, getting Eliminating U n+1 between these two equations, one gets the equation on the elevation increment η = η n+1 − η n η − gτ 2 θ α∇ In reality, everything remains similar to the vectorinvariant case, and the matrix to be inverted is the same.
-The new velocities are estimated as Here h n+1 can be computed either in the agreement with the ALE procedure (η n+1 is already known) or by interpolating between the n + 1/2 and n + 3/2 time levels.
It should be clear now that the vector-invariant form can be treated with h * = h n , but this will require considering both u and U .

Appendix B: Subcycling instead of solver
We discuss modifications needed to solve for the external mode through subcycling. This option will be added in future when needed for massively parallel runs. We use the flux form of momentum advection as an example. We take since it provides the second-order accurate estimate. We follow a common technology and run subcycles between time levels n and n + 2, with subsequent averaging to level n + 1. We formally take θ = 1 in vertically averaged equations, for the accuracy of external time stepping will be defined by the procedure used for subcycling. Furthermore, η n+1 will not be used, but the barotropic part of the new velocity will be directly adjusted.
For the same reason, the contribution from the elevation η n can be omitted while predicting U . However, if this is done, the implicit solve for vertical viscosity has to be moved to the end and applied to trim the full velocity u n+1 . We will keep the contribution from η n in the predictor step. Then the compensation term with η n will be present (see Eq. B2 below).
Instead of Eqs. (A2) and (A3) we introduce subcycles indexed with j , j = 0 : 2J , with η n+j/J the shortcut to η j and the same for U in several formulas below. The simplest form of subcycling looks like Other forms of subcycling can be used to increase stability and reduce the number of subcycles 2J + 1. The contribution from the Coriolis acceleration can be put in the subcycling procedure (it is a zero-order term defining the properties of surface inertia gravity (Poincaré) waves). To do this we have to (i) move the implicit viscosity update to the end of velocity step, (ii) separate the Coriolis contribution in U = U + f k × U AB −f k ×U AB , and use the vertically integrated combination in the brackets in place of U above. If we take the Coriolis acceleration in the barotropic equations, we can also treat it implicitly for better stability.
On completing sybcycles one is at time level n+2. In order to eliminate possible high frequencies, averaging is done to time level n + 1: The common further action is to use U n+1 for the barotropic transport combined with the baroclinic transport diagnosed from U n+1 . We introduce first the new baroclinic transport by writing It is then updated to the full transport velocity by As an aside, we document another possibility which implements a pseudotime solver. We want to solve the same pair of equations as Eqs. (A2) and (A3). We rewrite these equations as an iterative procedure, with δ some large parameter: In this case j becomes a "pseudotime" index, while the lhs in each of the equations is the residual of an iterative process. The analysis of stability shows that one should select δ 2 > k 2 τ 2 c 2 θ α. Here c is the phase speed and −k 2 is the eigenvalue of the Laplacian operator. Its maximum value is (π/ x) 2 . Clearly, damping of fast waves in pseudotime follows e −j/δ , which means that the number of pseudotime iterations J should exceed δ. The hope is that J will not need to be too large if the procedure is kept stable through appropriate selection of δ. The high-frequency waves will be damped over several time steps. The condition on δ is that it is larger than the Courant number kτ c (which is much larger than the one for τ of the "internal" mode). While this option is not cheaper than the commonly used one, it is equivalent to the solution based on semi-implicit solvers, and warrants consistency. Indeed, in this case U appears as an auxiliary variable, and the issue of barotropicbaroclinic splitting does not emerge.

Appendix C: Terrain-following meshes
Meshes combining z and terrain-following layers are of interest for studies focused on exchanges between ice cavities or ocean shelves with the deep ocean, and may lead to an improved representation of overflows. The use of tetrahedral elements in previous versions of FESOM was dictated by the need to maintain this functionality. In the framework of ALE this possibility is realized by prescribing the initial thicknesses of layers as h k = h k (x, y) in such a way that some of them follow topography. The practical question is on time step limitations and suppression of dynamical biases on such meshes. We need (i) to adjust the algorithm of computing pressure gradient and (ii) to implement stable isoneutral biharmonic diffusion operators, as suggested by Lemarié et al. (2012a, b). The former means that ∇p/ρ 0 + gρ∇Z/ρ 0 in dynamical equations, which is ∇ z p/ρ 0 , may turn out to be insufficiently accurate if discretized as written. FESOM1.4 does not use this two-term representation, but applies vertical polynomial interpolation to the density field instead. This approach will be retained in FESOM2. The implementation of (ii) will allow us to avoid excessive mixing accompanying advection on terrain-following meshes. These measures are the subject of ongoing work.

Appendix D: Isoneutral diffusion on triangular prisms
For completeness, we write down the expressions for the horizontal and vertical components of fluxes: The terms including K i are referred to as the isoneutral flux; the remaining term with K d is the dianeutral flux. To complete the description, the slope has to be expressed in terms of thermal expansion and saline contraction coefficients α and β, s = − −α∇T + β∇S −α∂ z T + β∂ z S .
(Note that α here has another meaning to the rest of paper.) The discretized isoneutral part of the flux operator K∇ 3 should be zero when applied to the density. The implementation difficulty stems from the fact that the tracers together with α and β are located at mid-layers, the vertical derivatives are located at the level surfaces, and the horizontal derivatives are at mid-layers, but at cells instead of vertices. The estimate of a slope at a single point is impossible without extra interpolation, which will break full consistency. The solution involves triads (see, e.g., Griffies, 2004 andLemarié et al., 2012a) and variational formulation. Note, however, that the implicit time stepping of the contribution with s 2 K i in the vertical flux, needed for stability reasons (Lemarié et al., 2012a), will introduce some errors even in this case. First, we split each triangular prism of our mesh into subvolumes characterized by unique values of the expansion/contraction coefficients, vertical gradients and horizontal gradients, to form triplets. We obtain six subprisms per prism, formed by sections along the midplane and by vertical planes passing through centroids and mid-edges.
Next, one writes the dissipation functional. We will use a different but equivalent formulation. Consider the bilinear form Here the first summation is over mesh prisms (cells and layers), and the second one over the subprisms p. The volume of each subprism is 1/6 of the volume of the full prism (hence the factor 6 on the lhs). Clearly, 2F(T , T ) corresponds to total variance dissipation. If T is the isoneutral density and its gradients are expressed in terms of α and β as for the slope above, F vanishes. The last step is to compute the contribution to the rhs of the scalar equation from the diffusion term (R T ) kv = (1/A kv )∂F/∂ T kv . (D3) Here we took into account that we are dealing with layerintegrated equations, hence the division over the area of the scalar cell v instead of division by volume. Writing down the expression for R T is a rather tedious task. The result can be reformulated in terms of the discrete divergence of discrete flux. Indeed, (R T ) kv A kv is the volume-integrated rhs, i.e., the sum of fluxes through the faces. Note that since F is a bilinear form, the definition of the rhs is always globally consistent. Indeed, the total variance dissipation is 2 k,v T kv (R T ) kv A kv = 2 k,v T kv ∂F/∂ T kv = 2F(T , T ).
In summary, the variational formulation originally proposed for quadrilaterals can easily be extended to triangular meshes. All symmetry properties will be granted if computations are local on subprisms.
Substituting K in the form F, we get F = k,c p −K i ∇ T · ∇T − K i ∇ T · s∂ z T −K i ∂ z T s · ∇T − K d + s 2 K i ∂ z T ∂ z T kcp (A c h kc /6) .
The first term does not involve the slope and will not be considered. Let us start from the third term and compute its contribution to ∂F/∂ T kv . The vertical derivative at level k (the top surface of layer k) is and ∇T is defined on cell c: Hence it follows for the contribution from layer k and element c that ∂F ∂ T kv : In the expressions above, indices k and c identify the triangular prism, and the index of vertex v together with the upper index t or b identify the subprism (related to v and either the top or bottom of the full prism). The expression (K i s) t kcv means that K i is estimated on level k and vertex v, and the slope involves the triplet with α, β at kv, the vertical derivatives at kv and the horizontal derivatives at kc. For (K i s) b kcv , the pairs of indices are (k + 1)v, kv, (k + 1)v and kc, respectively. Now, we combine the contributions from the column associated with cell c that enters the rhs of the equation on T kv (they come from prisms (k − 1)c, kc and (k + 1)c): We easily recognize here the fluxes through the upper and lower surfaces of scalar prism kv coming from the part shared with prism kc. They are thickness-weighed over the cells on both sides. Indeed, 2(Z k−1 −Z k ) = h kc +h (k−1)c for the top surface and similarly for the bottom.