Comparison of dealiasing schemes in large-eddy simulation of neutrally stratified atmospheric flows

Aliasing errors arise in the multiplication of partial sums, such as those encountered when numerically solving the Navier–Stokes equations, and can be detrimental to the accuracy of a numerical solution. In this work, a performance and cost analysis is proposed for widely used dealiasing schemes in large-eddy simulation, focusing on a neutrally stratified, pressure-driven atmospheric boundary-layer flow. Specifically, the exact 3/2 rule, the Fourier truncation method, and a high-order Fourier smoothing method are intercompared. Tests are performed within a newly developed mixed pseudo-spectral finite differences large-eddy simulation code, parallelized using a two-dimensional pencil decomposition. A series of simulations are performed at varying resolution, and key flow statistics are intercompared among the considered runs and dealiasing schemes. The three dealiasing methods compare well in terms of firstand second-order statistics for the considered cases, with modest local departures that decrease as the grid stencil is reduced. Computed velocity spectra using the 3/2 rule and the FS method are in good agreement, whereas the FT method yields a spurious energy redistribution across wavenumbers that compromises both the energy-containing and inertial sublayer trends. The main advantage of the FS and FT methods when compared to the 3/2 rule is a notable reduction in computational cost, with larger savings as the resolution is increased (15 % for a resolution of 1283, up to a theoretical 30 % for a resolution of 20483).

on the FT and the FS techniques are provided in the following section. Limits and merits of the different dealiasing techniques have been extensively discussed in the past decades within the turbulent flow framework (Moser et al., 1983;Zang, 1991).
In this work, we provide a cost-benefit analysis and a comparison of turbulent flow statistics for the FT and FS dealiasing schemes in comparison to the exact 3/2-rule using a set of LES of fully developed ABL type flows. Simulations and benchmark analysis are performed using a recently-developed mixed pseudo-spectral finite difference code, parallelized via a pencil de-5 composition technique based on the 2DECOMP&FFT library (Li and Laizet, 2010). Results of this work are of prime interest to the environmental fluids community (e.g. ABL community) because they can help improve the numerical performance of some of the numerical approaches used. An overview on the different dealiasing methods is provided in section 2. Section 3 briefly presents the LES platform with important benchmark results. The computational cost analysis and flow statistics obtained with the different dealiasing schemes are later discussed in section 4. Finally, the conclusions are presented in section 10 6.

Dealiasing methods
Aliasing errors result from representing the product of two or more variables by N wave-numbers, when each one of the variables is itself represented by a finite sum of N terms (Canuto et al., 2006), here N is assumed even. Such is the case for example when treating the non-linear advection term in the Navier-Stokes (NS) equations. Let f and g be two smooth functions 15 with the corresponding discrete Fourier transforms expressed as, with 20ĥ k = m+n=kf nĝm and |m|, |n| ≤ N/2.
Note that the corresponding expression for the Fourier transform of the product h (result of the convolution of f with g) requires 2N modes. Therefore, the exact computation of the product represents a major numerical cost. Traditionally the convolution of the two functions f and g is made with only N Fourier modes, As a result, the energy contained within the remaining N + 1 to 2N modes folds back on the first N modes, and the amplitude of the first N modes (h k ) is aliased. This can be related to the exact amplitudeĥ k as, with 5h k = m+n=kf nĝm and − N/2 ≤ k ≤ N/2 − 1, such that the second term contains the aliasing errors on the k-th mode. Aliasing errors propagate in the solution of the differential equation and can induce large errors. For the pseudo-spectral methods, the truncation and aliasing errors affect both the accuracy and the stability of the numerical solution (see (Canuto et al., 2006) chap. 3 and (Canuto et al., 2007) chap.
3 for detailed discussion). Traditionally, the aliasing errors are treated using one of the two methods discussed below.

10
The 3/2 rule method is based on the so-called padding and truncation technique, where the Fourier partial sums are zeropadded in Fourier space by half the available modes (from N to 3/2N ), inverse-transformed to physical space before multiplication, multiplied, and then truncated back to the original variable size (N ). This method fully removes aliasing errors.
However, the high computational cost related to the inverse transform operation discourages its use in large scale simulations.
The fast Fourier transform (FFT) algorithm has an operational complexity of N log 2 (N ); counting the number of FFT and 15 multiplications, the operation count of the 3/2-rule applied to dealias the product of two vectors of N components becomes (45/4)N log 2 (3N/2) (Canuto et al., 2006). An alternative method is the so-called Phase Shift method which consists in shifting the grid of one of the variables in physical space. Given the appropriate shift, the aliasing errors are eliminated naturally in the evaluation of the convolution sum. This method has a cost equal to 15N log 2 (N ) (Canuto et al., 2006), which is even greater than the 3/2-rule (Patterson, 1971;Orszag, 1972). The discussion above concerns one dimensional problems but the 20 expansion to higher dimensional problems is straightforward (Iovieno et al., 2001;Canuto et al., 2006). Although, this method provides the full dealiasing of the non-linear term, the cost of expanding the number of Fourier modes by a factor of 3/2 is a computationally expensive endeavour, especially with the progressive increasing size of numerical grids. To reduce the numerical burden, two additional methods were proposed in the past for treating the aliasing errors: the FS method (Hou and Li, 2007), and the FT method (Orszag, 1971;Moeng, 1984;Moeng and Wyngaard, 1988). In both methods, a set of high-wave-25 number Fourier amplitudes are multiplied by a test functionû * k = f (k)u k , to avoid expansions to larger number of modes. As its name indicates, the FT method sets to zero the last third of the Fourier modes (f (k) = 0, for k > 2N/3), equivalent to a sharp spectral cut-off filter. On the other hand, the FS method consists on a progressive attenuation of the higher frequencies using the weighting function f (k) = e −36k 36 (Hou and Li, 2007). Figure 1 presents the spectral function f (k) for the two different methods. Note that both the FT and FS methods behave like a low-pass filter. Although the FT method (continuous line) 30 sets to zero any coefficient larger than k/k N > 2/3, the FS method (dashed line) keeps all the wavelengths unperturbed up until k/k N > 3/4, and then progressively damps the amplitude of the higher wave-number terms. The advantage of both of these methods is that they avoid the need for padding the Fourier partial sums, and hence reduce the numerical cost. Specifically, the computational cost of these methods is (15/2)N log 2 (N ) (Canuto et al., 2006), resulting in methods 33% less computationally 0 0.25 0.5 0.75 1 0 Truncation method Fourier smoothing method Figure 1. Weighting functions used in the FT method (dashed kine) and the FS method (continuous line). The FT method filters scales with k/kN > 2/3 and the FS method at k/kN > 3/4. expensive than the 3/2-rule. The drawback of such approximate approaches is however the fact that a filtering operation is applied to the advection term, resulting in a loss of information. A desirable property of the FS technique when compared to the FT method is that the former exhibits a more localized error and is dynamically very stable (Hou and Li, 2007), while the latter tends to generate oscillations on the whole domain. 3 Large-eddy simulation framework

Equations and boundary conditions
The LES approach consists of solving the filtered NS equations, where the time-and space-evolution of the turbulent eddies larger than the grid size are fully resolved, and the effect of the smaller ones is parametrized. Mathematically, this is described through the use of a numerical filter that separates the larger, energy containing eddies from the smaller ones. Often, the 10 numerical grid of size ∆ is implicitly used as a top-hat filter, hence reducing the computational cost (Sagaut (2006), see Moeng and Sullivan (2015) for an overview of the technique in ABL research). As a result, the velocity field can be separated in a resolved component (ũ i , where i = 1, 2, 3) and an unresolved contribution (u i ) (Smagorinsky, 1963). For this technique to be successful, the low-pass filter operation must be performed at a scale smaller than the energy production range, deep in the inertial sub-range according to Kolmogorov's hypothesis (Kolmogorov, 1968;Piomelli, 1999). In atmospheric boundary-15 layer flow simulations this requirement is known to hold in the bulk of the flow, where contributions from the Sub-Grid Scale (SGS) motions model (or sub-filter scale motions) to the overall dissipation rate are modest. In the near surface regions such requirement is not met, as the characteristic scale of the energy production range L scales with the distance from the wall (L ≈ κz, where κ ≈ 0.4 is the Von Kármán constant, and z is the wall-normal distance from the wall ), hence remaining an active research field (Sullivan et al., 1994;Meneveau et al., 1996;Porté-Agel et al., 2000;Hultmark et al., 2013;Lu and Porté-20 Agel, 2014). In this work, the rotational form of the filtered NS equations are integrated, ensuring conservation of mass of the inertial terms (Kravchenko and Moin, 1997). The corresponding dimensional form of the equation reads as In these equations,ũ i are the velocity components in the three coordinate directions x, y, z (streamwise, spanwise, and vertical respectively),p * denotes the perturbed modified pressure field defined asp * =p + 1 3 ρ 0 τ ∆ kk + 1 2ũ jũj , where the first term is 5 the kinematic pressure, the second term is the trace of the sub-grid stress tensor and the last term is an extra term coming from the rotational form of the momentum equation. Here,f i represents a generic volumetric force. The flow is driven by a constant pressure gradient in the streamwise direction imposed through the body forcef i . The sub-grid stress tensor is defined where the deviatoric components are written using an eddy-viscosity approach 10 with ν T = (C S ∆) 2 |S| being the so-called eddy-viscosity,S ij = 1 2 ∂ jũi + ∂ iũj the resolved strain rate tensor, and C S the Smagorinsky coefficient, a dimensionless proportionality constant (Smagorinsky, 1963;Lilly, 1967). Many studies have investigated the accuracy of this type of models, showing good behaviour for free-shear flows (Lesieur and Metais, 1996). For boundary layer flows, the Smagorinsky constant model is over-dissipative close to the wall, since the integral length-scale scales with the distance to the wall. Therefore, to properly capture the dynamics close to the surface, the Mason-Thompson damping 15 wall function is used (Mason, 1994). This function is given by f n , and is used to decrease the value of C S close to the wall, reducing the sub-grid dissipation.
Note that the molecular viscous term has been neglected in the governing equations, including the wall-layer parameterization, which is equivalent to assuming that the surface drag is mostly caused by pressure (i.e., there are negligible viscous contributions). This is a typical situation in flow over natural surfaces where the surface is often in fully rough aerodynamic 20 regime.
The drag from the underlying surface is entirely modeled in this application through the equilibrium logarithmic law for rough surfaces (Kármán, 1931;Prandtl, 1932), with In equation (10), ũ i is the planar averaged velocity sampled at ∆z/2, z 0 is the aerodynamic roughness length, representative 25 of the underlying surface, ∆z denotes the vertical grid stencil, and κ = 0.4 is the von Kármán constant. The wall shear-stress is computed considering the norm of the horizontal velocity and it is projected over the horizontal directions using the unit vector In addition, the corresponding vertical derivatives of the horizontal mean velocity field are imposed at the first grid point of the 30 vertically staggered grid (Albertson et al., 1995;Albertson and Parlange, 1999).

Numerical implementation and time integration scheme 5
The equations are solved using a pseudo-spectral approach, where the horizontal derivatives are computed using discrete Fourier transforms and the vertical derivatives are computed using second order-accurate centered finite differences on a staggered grid. A projection fractional-step method is used for time integration following Chorin's method (Chorin, 1967(Chorin, , 1968).
The governing equations become decoupled and the system of partial differential equations can be solved in two steps: at first, the non-linear advection-diffusion equation is explicitly advanced, and subsequently the Poisson equation is integrated (the 10 so-called pressure correction step). The latter equation is obtained by taking the divergence of the first equation and setting the divergence of velocity at the next time step equal to zero, to ensure a divergence free flow field. The algorithm is detailed in the rest of the section. Initially, the code computes the velocity tensorG t ij = ∂ jũi t , which contains all the derivatives of the flow field required to compute the SGS stress tensor τ ∆,d,t In the first step of the projection method, the NS equations are solved without the pressure. Hence, the intermediary right hand side is computed as Next, an intermediary step is computed using an Adams-Bashforth scheme, following where RHS t−∆t i is the right hand side of the previous step. At this point, the resulting flow field is not divergence free yet.
The modified pressure is used to impose this fundamental property of the flow filed. Therefore,p * t is computed solving the Poisson equation obtained by taking the divergence of the NS equations. The term Γ t k in the right hand side of the equation above is given by The new flow field for the complete time step is obtained byũ t i =ũ * i − 3 2 ∆t∂ ip * t . Finally, the new right hand side is updated with the pressure gradient as RHS Embedded within this approach, periodic boundary conditions are imposed on the horizontal (x, y) directions. To close the system, a stress free lid boundary condition is imposed at the top of the domain and an impermeability (w = 0) condition is imposed at the lower boundary, which sums to the parametrized stress described in section 3. Y-pencil, (c) Z-pencil (inspired form (Li and Laizet, 2010)).
The code is parallelized following a 2D pencil decomposition paradigm similar to the one presented in Sullivan and Patton The four modules have been individually timed to evaluate their corresponding computational cost at a resolution of N x × N y × N z = 128 3 with the 3/2-rule as a baseline. Results are shown in figure 4. As it can be observed, more than half of the integration time step (∼ 60%) is spent computing the convective term. The three other modules share the rest of the integration time as follows: the computation of the velocity gradients (∼ 20%), the Poisson solver (∼ 16%) and the SGS (∼ 4%). It is important to note that this test was conducted without any I/O as it is not relevant to assess the computational 5 cost of the momentum integration. As explained in section 2, the non-linear term requires the use of dealiasing techniques to control the aliasing error, which traditionally are associated with a padding operation (as mentioned in Section 2), and hence higher computational cost. It is important to note that although the overall integration time distribution between each individual module might vary depending on the numerical resolution employed, the overall cost of the convective term will remain important regardless of the changes in numerical resolution. The goal of this work is to explore the possibility of 10 using alternative dealiasing techniques to enhance the computational performance, while maintaining accurate turbulent flow statistics in simulations of ABL flows. It is important to note that the SGS model used here takes a relatively small fraction of the time integration. This fraction is likely to be larger if a more sophisticated model is used, for example the dynamic Velocity gradient Poisson solver   Smagorinsky model (Germano et al., 1991) or the Lagrangian scale dependant model (Bou-Zeid et al., 2005). In addition, it is important to note that the low computational cost of the Poisson solver is related to the the use of the pencil decomposition, 15 which takes full advantage the pseudo-spectral approach. Specifically, the Z-pencil combines with the horizontal treatment of the derivatives to make the implementation of the solver very efficient.

Study cases
The goal of this study is to develop a cost benefit analysis for the different, already established, dealiasing methods from a computational cost stand point as well as in terms of accuracy in reproducing turbulent flow characteristics. For this reason, 20 three different cases have been considered corresponding to the three dealiasing methods considered: (a) the 3/2-rule used as reference, (b) the Fourier truncation method (FT), and (c) the Fourier smoothing method (FS). All the simulations have been run with a numerical resolution of N x ×N y ×N z = 64 3 , 128 3 , 192 3 and 256 3 with a domain size of (L x , L y , L z ) = (2π, 2π, 1)·z i , where z i is the height of the boundary layer taken here with a value of z i = 1000 m. A uniform surface roughness of value z 0 /z i = 10 −4 is imposed, which is representative of sparse forest or farmland with many hedges (Brutsaert, 1982;Stull, 1988).
The simulations have been initialized with a vertical logarithmic profile with added random noise for theũ 1 component. The two other velocity componentsũ 2 andũ 3 have been initialized with an averaged zero velocity profile with added noise to 5 generate the initial turbulence. The integration time step is set to ∆t = 0.2 s for the 64 3 -, 128 3 -, and the 192 3 -simulations and to ∆t = 0.1 s for the 256 3 -simulation. These time steps ensure that stability requirements for the time-integration scheme are met. The Smagorinsky constant and the wall damping exponent are set to C S = 0.1 and n = 2 (Mason, 1994;Porté-Agel et al., 2000;Sagaut, 2006).

Computational cost
The computational cost of evaluating the convective term, dealiased via the the 3/2 rule, the FT, and the FS are inter-compared in Figure 5. The horizontal resolution has been increased from 128 × 128 to 2048 × 2048 collocation nodes to highlight how the different methods scale. Only the horizontal resolution is changed given that the vertical direction is treated in physical space with second-order accurate centered finite difference method. Note that such a method does not typically require any dealiasing treatment, because the truncation error tends to decrease the aliasing errors (Kravchenko et al., 1996;Canuto et al., 5 2006). In figure 5, the ordinate axis are divided by n x · n y to show the effect of the increase in resolution on the computational cost. The number of MPI processes and the domain decomposition have been kept identical to avoid introducing effects from the parallelization scaling into the results. Hence, only the effect of the resolution change on the computation time of the dealiasing methods is presented here. Results confirm that the computational cost of the convective term is notably smaller when using the FT and FS dealiasing methods, with gains of 30% at n x × n y = 128 × 128 and 23% at n x × n y = 2048 × 2048.

10
The results follow the computational cost calculated by the number of operations presented in section 2 which predicted a gain of up to 35% for runs with 4096 3 grid nodes. The deviation in the computational cost present in figure 5 is the result of the varying load of the computer cluster since all simulations were run using the same number of nodes to avoid having to add the code's scaling properties to the analysis. From the results it is also important to note that there is no significant difference in the computational cost between the FT and FS methods, given that both use the same grid size and hence the corresponding 15 numerical complexity of both methods is similar. It is also worth mentioning that these methods are simpler to implement and require less rapid-access memory when compared to the 3/2 rule, as there is no need to extend neither the numerical grid nor the wavenumber range.
In the following subsections, we compare the impact of the different dealiasing schemes on flow statistics. Profiles from runs using the 3/2-rule for dealiasing will be taken as reference and comments will focus on departures from such "exact" profiles when the FT or FS treatments are considered.

Flow statistics
Traditional flow metrics are investigated next and compared between the different dealiasing schemes. Results for the 128 3 ,192 3 , 5 and 256 3 -cases are presented in this section. The results are normalized using the traditional scaling variables: the friction velocity (u * ) and the boundary layer height (z i ). As a starting point, figure 6 shows an instantaneous snapshot (pseudo-color plots) of the streamwise velocity perturbation for the three dealiasing methods. An additional case without dealiasing in the convective term was run and resulted in a complete laminarization of the turbulent flow (not showed here), highlighting the importance of the dealiasing operation (Kravchenko and Moin, 1997). On the contrary, when dealiasing schemes are applied, 10 the instantaneous flow field appears qualitatively similar among the different cases. Irrespective of the dealiasing method that is used, streamwise elongated high-and low-momentum bulges flank each other, as apparent in figure 6. This is a common phenomena in pressure-driven boundary-layer flows (Munters et al., 2016). Qualitatively, small differences can be appreciated  The horizontally-and temporally-averaged velocity profiles are characterized by an approximately logarithmic behavior within the surface-layer (z ≈ 0.15z i , as apparent from Fig. 7, where results are illustrated for the three resolutions: 128 3 ,192 3 , and 256 3 ). For the 128 3 case, the observed departure from the logarithmic profile for the 3/2-rule case is in excellent agreement with results from previous literature for this particular SGS model (Porté-Agel et al., 2000;Bou-Zeid et al., 2005). When using the FT method the agreement of the averaged velocity profile with the corresponding 3/2-rule profiles improves with 10 increasing resolution. While in the 128 3 case a good estimation of the logarithmic flow is obtained at the surface layer, there is a large acceleration of the flow further above. This overshoot does not occur for the higher resolution runs. When using the FS method, the mean velocity magnitude is consistently over-predicted throughout the domain, and the situation does not improve with increasing resolution (the overshoot is up to 7.5% for the 128 3 , 8.5% for the 192 3 and 7% for the 256 3 run).
Further comparing the results obtained by the FS and FT method with those obtained with the 3/2-rule, it is clear that while 15 the FS method presents a generalized overestimation of the velocity with a an overall good logarithmic trend, the FT method presents a better adjustment in the surface layer with larger departures from the logarithmic regime on the upper domain region that get reduced with increasing numerical resolution. The mean kinetic energy of the system is overestimated by ≈ +2% and ≈ +12% by the FT and FS methods, when compared to that of runs using the 3/2-rule in the 256 3 case. Overall, the mean kinetic energy is larger for the FT and FS cases, when compared to the 3/2-rule case, even at the highest of the considered 20 resolutions (≈ +2% and ≈ +12% by the FT and FS methods for the 256 3 case). Such behavior can be related to the low-pass filtering operation that is performed in the near-wall regions, which tends to reduce resolved turbulent stresses in the near-wall region, resulting in a higher mass flux for the considered flow system. This is more apparent for the low resolution cases.
Mean velocity gradient profiles (Φ m = κ z u * ∂ z U xy (z)) are also featured in Figure 7 (d, e, f). Profiles at each of the considered resolutions present a large overshoot near the surface, which is a well known problem in LES of wall-bounded flows 25 and has been extensively discussed in the literature (Bou-Zeid et al., 2005;Brasseur and Wei, 2010;Lu and Porté-Agel, 2013).
In comparing the results between the FS and FT method with the 3/2-rule, it can be observed that there are stronger gradients in the mean velocity profile within the surface layer when using the FS method. This leads to the observed shift in the mean velocity profile. Conversely, when using the FT method, departures are of oscillatory nature, leading to less pronounced variations in the mean velocity profile when compared to the reference ones (the 3/2-rule cases). This behavior is consistently found    -Agel et al., 2000;Bou-Zeid et al., 2005). The fact that the shear stress profiles are similar among the different dealiasing 35 cases is also indicative that the SGS fraction is not strongly affected by the choice of dealiasing method, which is also partly due to the simplicity of the Static Smagorinsky model that is being used. The potential effect that the different dealiasing schemes could have in more advanced subgrid models is discussed later on. Specifically, the error in the Reynolds shear stress (e.g. not including the SGS contribution) in the surface-layer decreases with increasing resolution for the FS method (from 1.7% to 1.1% for the 128 3 and 256 3 respectively) as indicated in Table 2, and fluctuates also around very small values for FT 5 method. When considering the diagonal stress tensor components across simulations, it is noteworthy that all such quantities are overpredicted when using the FT and the FS methods in the near surface region (z 0.15). Further above, the FS method tends to consistently overpredict, whereas the FT method present an oscillatory nature. As can be observed in

Porté
To complement the analysis of the effect of the different dealiasing methods on the physical structure of the flow, the corresponding power spectra is investigated. According to Kolmogorov's energy cascade theory, the inertial sub-range of the 5 power spectrum should be characterized by a power law of −5/3 slope (Kolmogorov, 1968). In this range the effects of viscosity, boundary conditions, and large scale structures are not important. Also, in wall-bounded flows without buoyancy effects, a production range should also be present, following a power-law scaling of −1 (Gioia et al., 2010;Katul et al., 2012;Calaf et al., 2013). Figure 9 shows the energy spectra of the streamwise velocity obtained using the different dealiasing methods. The spectrum obtained using the 3/2-rule matches well the traditional turbulent spectra presented in the literature 10 (Cerutti, 2000;Bou-Zeid et al., 2005) and it is used to assess the effects introduced by the FT and FS dealiasing methods. From this spectral analysis, it can be observed that the high wave-number ranges are modified by both methods. The FT method sharply cuts the spectra at the scale of 3/2 · ∆ close to the LES filter scale ∆. On the other hand, the FS method smoothly attenuates the effects of the aliasing errors at the high-end of the spectra. The dealiasing methods have been designed for such behavior, since only the higher frequencies are filtered. From the FT method flow field spectra it is clearly visible the effect 10 −2 10 −2 10 −2 10 0 10 0 10 0 10 2 10 2 10 2 10 −4 such a large energy cut-off. Therefore, a larger range of the spectrum is resolved and less turbulent kinetic energy is dissipated 20 by aliasing errors.
Although the effect of the FT and FS methods on the small scale can be clearly observed in figure 9, their effect on the large scales also needs to be quantified. To compute a direct comparison scale by scale, the following ratio was used (equation 16) for the 128 3 , 192 3 , and 256 3 -simulations, where E u,k denotes the power spectral density of the u velocity component at wavenumber k, XX stands for the dealiasing 5 method FT or FS. If ρ(k) < 0 the energy density at that given wavenumber (k) is less than the corresponding one for the run using the 3/2 rule, viceversa if ρ(k) > 0. Figure 10 presents the ratio ρ(k) for both methods.
When using the FT method, energy at the low wavenumbers is underpredicted, whereas energy at the large wavenumbers is overpredicted. Departures are in general larger with decreasing resolution, with an excess of up to 100% for the 128 3simulations in the wavenumber range close to the cutoff wavenumber. On the contrary, when using the FS method, the energy

Discussion
In the development of this manuscript, focus has been directed to the study of the advantages and disadvantages of different dealiasing methods. In this regard, throughout the analysis we have tried to keep the structure of the LES configuration as 15 simple and canonical as possible, to remove the effect of other add-on complexities. Additional complications might arise when considering additional physics; here we discuss the potential effect that these different dealiasing methods could have on them.
One of such elements of added complexity is for example the use of more sophisticated subgrid scale models based on dynamic approaches to determine the values of the Smagorinsky constant (Germano et al., 1991;Bou-Zeid et al., 2005). In most of these advanced subgrid models, information from the small-scale turbulent eddies is used to determine the evolution of the subgrid constant. However, in both the FT and FS method, the small turbulent scales are severely affected and hence use of dynamic subgrid models could be severely hampered unless these are accordingly modified and adjusted, maybe via filtering at larger scales than the usual grid scale. Another element of added complexity consists in using more realistic atmospheric forcing, considering for example the effect of the Coriolis force with flow rotation as a function of height and velocity magnitude. In 5 this case, we hypothesize that the FT method could lead to stronger influences on the resultant flow field as this dealiasing technique not-only affects the distribution of energy in the small turbulent scales, but also in the large scales (as apparent from Fig. 10), being these later ones potentially more affected by the Coriolis force. This represents a strong non-linear effect, that is hard to quantify and hence further testing, including realistic forcing with a geostrophic wind and Coriolis force would be required to better quantify these effects. Also often in LES studies of atmospheric flows one is interested in including an 10 accurate representation of scalar transport (passive/active). In this case the differential equations don't include a pressure term and hence most of the computational cost is linked to the evaluation of the convective term. As a result, the benefit of using alternative, cheaper dealiasing techniques (FT or FS) will be even more advantageous, yet the total gain is not trivial to evaluate a priori, and the effect on the scalar fields should also be further evaluated.
In general, we believe that it is not fair to advocate for one or other dealiasing method based on the results of this analysis.
Note that the goal of this work is to provide an objective analysis of the advantages and limitations that the different methods provide, letting the readers the ultimate responsibility to choose the option that will adjust better to their application. For example, while having exact dealiasing (3/2-rule) might be better in studies focusing on turbulence and dispersion, one might be well-off using a simpler and faster dealiasing scheme to run the traditionally expensive warm-up runs, or to evaluate surface 5 drag in flow over urban and vegetation canopies, where most of the surface force is due to pressure differences (Patton et al., 2016).

Conclusions
The Fourier-based pseudo-spectral collocation method (Orszag, 1970;Orszag and Pao, 1975;Canuto et al., 2006) remains the preferred "work-horse" in simulations of wall-bounded flows over horizontally periodic regular domains, which is often used in conjunction with a centered finite-difference or Chebychev polynomial expansions in the vertical direction (Shah and Bou-Zeid, 2014;Moeng and Sullivan, 2015). This approach is often used because of the high-order accuracy and the intrinsic efficiency of the fast-Fourier-transform algorithm (Cooley and Tukey, 1965;Frigo and Johnson, 2005). In this technique, aliasing that arises when evaluating the quadratic non-linear term in the NS equations can severely deteriorate the quality of the solution and hence need to be treated adequately. In this work a performance/cost analysis has been developed for three well-accepted 15 dealiasing techniques (3/2-rule, FT and FS) to evaluate the corresponding advantages and limitations. The 3/2-rule requires a computationally expensive padding and truncation operation, while the FT and FS methods provide an approximate dealiasing by low-pass filtering the signal over the available wavenumbers, which comes at a reduced cost.
The presented results show compelling evidence of the benefits of these methods as well as some of their drawbacks. The advantage of using the FT or the FS approximate dealiasing methods is their reduced computational cost (∼15% for the 128 3 20 case, ∼25% for the 256 3 case), with an increased gain as the numerical resolution is increased. Regarding the flow statistics, results illustrate that both, the FT and the FS methods, yield less accurate results when compared to those obtained with the traditional 3/2-rule, as one could expect.
Specifically, results illustrate that both the FT and FS methods over-dissipate the turbulent motions in the near wall region, yielding an overall higher mass flux when compared to the reference one (3/2-rule). Regarding the variances, results illustrate modest errors in the surface-layer, with local departures in general below 10% of the reference value across the considered resolutions. The observed departures in terms of mass flux and velocity variances tend to reduce with increasing resolution.
Analysis of the streamwise velocity spectra has also shown that the FT method redistributes unevenly the energy across the available wavenumbers, leading to an over-estimation of the energy of some scales by up to 100%. Contrary, the FS methods 5 redistributes the energy evenly, yielding a modest +13% energy magnitude throughout the available wavenumbers. Compared to the 3/2-rule, these differences in flow statistics are the result of the sharp low-pass filter applied in the FT method and the smooth filter that characterizes the FS method.
Code availability. The sources of the LES code developed at the University of Utah are accessible in pre-release at https://doi.org/10.5281/ zenodo.1048338 (Margairaz et al., 2017).
Data availability. Due to the large amount of data generated during this study, no lasting structure can be permanently supported where to openly access the data. However, access can be provided using the Temporary Guest Transfer Service of the Center of High Performance Computing of the University of Utah. To get access to the data, Marc Calaf (marc.calaf@utah.edu) will provide temporary login information for the sftp server. Hou, T. Y. and Li, R.: Computing nearly singular solutions using pseudo-spectral methods, Journal of Computational Physics, 226, 379-397,