Stride Search: a general algorithm for storm detection in high-resolution climate data

. This article discusses the problem of identifying extreme climate events such as intense storms within large climate data sets. The basic storm detection algorithm is reviewed, which splits the problem into two parts: a spatial search followed by a temporal correlation problem. Two speciﬁc implementations of the spatial search algorithm are compared: the commonly used grid point search algorithm is reviewed, and a new algorithm called Stride Search is introduced. The Stride Search algorithm is deﬁned independently of the spatial discretization associated with a particular data set. Results from the two algorithms are compared for the application of tropical cyclone detection, and shown to produce similar results for the same set of storm identiﬁ-cation criteria. Differences between the two algorithms arise for some storms due to their different deﬁnition of search regions in physical space. The physical space associated with each Stride Search region is constant, regardless of data resolution or latitude, and Stride Search is therefore capable of searching all regions of the globe in the same manner. Stride Search’s ability to search high latitudes is demonstrated for the case of polar low detection. Wall clock time required for Stride Search is shown to be smaller than a grid point search of the same data, and the relative speed up associated with Stride Search increases as resolution increases


Introduction
The identification of extreme events in climate data sets is a fundamental objective of many climate scientists.Data sets may be a reanalysis product or a particular model's output, and an extreme event may be any event classified as an important deviation from a subjective normal state -loosely, a "storm".End users of climate data and model developers alike frequently investigate prevalent storm tracks, intensity, or formation areas within a given data set, e.g., Williamson (1981), Hodges (1994), and Vitart et al. (1997).Annual means and statistical averages regarding the frequency of a particular type of storm in a particular region are another frequent subject of study, e.g., Sinclair (1994), Blender et al. (1997), Raible and Blender (2004), Bracegirdle and Gray (2008), and Kleppek et al. (2008).Individual storms' structures are often investigated to evaluate how well a model captures realistic physical features, e.g., Nordeng and Rasmussen (1992), Walsh et al. (2007), Reed andJablonowski (2011), andFøre et al. (2012).Common to all such efforts is the need to search data sets for quantifiable, objective storm identification criteria.
Identification criteria are defined as a small set of variables that together give a basic characterization of storms' location, intensity, and size (Williamson, 1981;Hodges, 1994).Each variable is paired with a threshold value used to filter the data into a small number of categories.The particular variables and their appropriate threshold values vary greatly by application, and many studies have proposed and compared different sets of criteria; see Raible et al. (2008) and Neu et al. (2013) for a discussion of extratropical cyclone criteria.Walsh et al. (2007) and Horn et al. (2014) provide similar analyses for tropical cyclones.
The basic storm detection algorithm consists of two stages (Hodges, 1994).First, a spatial search loops over all time steps in a data set and collects detection points where the spa-Published by Copernicus Publications on behalf of the European Geosciences Union.
i+2.If several candidates are found at time step i+1, the algorithm chooses the closest candidate to the entry at time step i and disregards the others.Track building proceeds until either zero candidates are found at the next time step or the data are exhausted.
The tracking algorithm (Algorithm 2) is sensitive to its two input parameters U max and t min .
Choosing a value of t min too low may not eliminate enough false positives, but choosing a value of 105 t min too high may eliminate weak cyclones that did exist but did not intensify enough to last long.
The effects of choosing a too high or too low value for U max are more subtle.A value of U max that is too low can cause storm tracks to fragment into several disjoint pieces.By contrast, choosing an unrealistically high value for U max could cause the tracking algorithm to merge two storms that are in reality separate entities.

110
Storm tracks provide a natural mechanism to count storms and to dismiss false positives.Tracks that consist of only one point, indicating a storm whose duration was only 1 time step may be dis-4 tially defined identification criteria are met or exceeded.Second, a temporal correlation procedure correlates detection points across adjacent time steps to construct storm tracks and apply temporally defined identification criteria.
In comparison to the number of studies concerned with identification criteria, literature regarding the analysis of spatial search algorithms is relatively sparse in the climate community.Such a discussion is of heightened importance due the growth of climate data sets in both size and number.As models and reanalysis products increase spatial and temporal resolution, and as ensembles are more commonly used forecasting tools, the need to efficiently and accurately search climate data sets is also increasing.Of equal concern to an algorithm's performance is its ability to produce repeatable, objective analysis of data (Hodges, 1994), regardless of the data layout and resolution.Contemporary models can incorporate advanced features such as variable resolution using unstructured grids (e.g., Zarzycki and Jablonowski, 2014) and frequently employ different representations of the sphere than a traditional latitude-longitude grid (e.g., Putnam and Lin, 2007;Neale et al., 2012;Skamarock et al., 2012).An ideal search algorithm would be agnostic to such details.
In such an ideal world, the choice of search algorithm would not affect the statistics associated with a particular data set.In practice, however, we find that just as a change to the identification criteria of a particular storm type can change the statistics found in a particular data set (Raible et al., 2008;Horn et al., 2014), the way that data set is divided and searched -independently of the identification criteriacan also affect the statistics.
One contributing factor is that some of the variables used to identify storms, such as vorticity, have a dependence on the scale of the data (Sinclair, 1997;Walsh et al., 2007).This dependence may also vary with location depending on the layout of the data set.On a uniform latitude-longitude grid, for example, the spatial scale of adjacent grid points varies with latitude.A search algorithm that does not account for this variation may inadvertently allow nonphysical artifacts related to data resolution and grid type to influence its output.Researchers may interpolate a data set to a different type of grid (Sinclair, 1994;Bracegirdle and Gray, 2008), employ a spatial smoothing procedure (Sinclair, 1997), or adjust threshold values as data resolution changes (Walsh et al., 2007) to alleviate some of these problems.We propose an alternative approach that separates the definition of an extreme event from its discrete representation in a data set.
The goal of this work is to provide an algorithm that allows identification criteria to be defined independently of the spatial resolution and layout of the data.The Stride Search algorithm facilitates searching data given on general unstructured grids as well as uniform latitude-longitude grids without alteration, and provides improved performance over the commonly used grid point search algorithm.Additionally, Stride Search treats all regions of the globe in the same manner, which allows users to search all latitudes, including the poles, efficiently.By decoupling the choice of identification criteria from the resolution and layout of the data, we aim to provide a robust objective search algorithm.

Storm detection algorithms
Basic descriptions of the two-stage storm detection algorithm are given as Algorithm 1, the spatial search, and Algorithm 2, the temporal correlation procedure.The majority of this work focuses on the spatial search strategy used to define Algorithm 1, which requires the most computational effort (Prabhat et al., 2012).Its input is a search domain and a set of per time step storm identification criteria, as well as the data.A key step in the algorithm is the division of the search domain end for 21: end for missed as noise.Tracks that persist for many time steps but do not move may possibly be regarded as topographic effects, particularly if the identification criteria use data that are sensitive to topography, such as geopotential height surfaces.

115
Storm tracks also provide a straightforward method of applying additional identification criteria.
A study concerned with identifying regions of cyclogenesis may reject any storms that do not intensify along their track.Temporal criteria may also be used to perform more detailed classification of candidate storms.For example, a vorticity criterion or a vertical wind speed criterion may detect strong convection due to thunderstorms.To make a distinction between typical summertime after-120 noon thunderstorms and more persistent mesoscale convective complexes, a temporal criterion may be used to neglect storms that do not persist for longer than 12 hours.5 into a set of search sectors (line 4); we will discuss this process in more detail in the following sections.For each file and each time step, the algorithm compares the data within each sector to the storm identification criteria.If the criteria are met, a storm is recorded to the list L at the current time step.
Identification criteria, particularly those used with a spatial search, are highly application dependent.Ideally storm detection software should be flexible enough to allow users to easily define identification criteria relevant to their area of study and should not be limited to any specific geographic region.In other words, users should be able to easily modify the implementation of Algorithm 1, line 7, in code.
The output of the spatial search (Algorithm 1) is a list of candidate storms.This list may contain false positives due to noisy data, topographic effects, or ambiguity within the identification criteria.The second step of storm detection and tracking, the temporal correlation problem, handles these issues.The temporal correlation algorithm's task is to identify the same storm across adjacent time steps.It does so by building storm tracks and is outlined by Algorithm 2. Users define a maximum travel speed U max appropriate to the type of storm under investigation.The algorithm uses that speed to define D max = U max • t, the maximum possible distance a storm may travel per time step.Beginning with a storm entry in the spatial search output list L at time step i, the algorithm searches all storms detected at time step i + 1.Any storms at time step i + 1 separated by a distance less than D max from the storm at time step i are marked as candidate successors.If zero candidates are found, the track is ended at time step i.If one candidate is found, that candidate is linked to its predecessor at time step i and the algorithm continues to build the track by looking for candidates at time step i + 2. If several candidates are found at time step i + 1, the algorithm chooses the closest candidate to the entry at time step i and disregards the others.Track building proceeds until either zero candidates are found at the next time step or the data are exhausted.
The tracking algorithm (Algorithm 2) is sensitive to its two input parameters U max and t min .Choosing a value of t min too low may not eliminate enough false positives, but choosing a value of t min too high may eliminate weak cyclones that did exist but did not intensify enough to last long.The effects of choosing a too high or too low value for U max are more subtle.A value of U max that is too low can cause storm tracks to fragment into several disjoint pieces.By contrast, choosing an unrealistically high value for U max could cause the tracking algorithm to merge two storms that are in reality separate entities.

P. A. Bosler et al.: Stride Search
Storm tracks provide a natural mechanism to count storms and to dismiss false positives.Tracks that consist of only one point, indicating a storm whose duration was only 1 time step may be dismissed as noise.Tracks that persist for many time steps but do not move may possibly be regarded as topographic effects, particularly if the identification criteria use data that are sensitive to topography, such as geopotential height surfaces.
Storm tracks also provide a straightforward method of applying additional identification criteria.A study concerned with identifying regions of cyclogenesis may reject any storms that do not intensify along their track.Temporal criteria may also be used to perform more detailed classification of candidate storms.For example, a vorticity criterion or a vertical wind speed criterion may detect strong convection due to thunderstorms.To make a distinction between typical summertime afternoon thunderstorms and more persistent mesoscale convective complexes, a temporal criterion may be used to neglect storms that do not persist for longer than 12 h.
Grid point searches are designed for the common case where data are given on a uniform latitude-longitude grid with resolution λ (in radians) so that grid points (λ j , θ i ) are located at where λ is longitude, θ is latitude, n lon = 2π/ λ is the number of longitudinal grid points, and n lat = n lon /2 + 1 is the number of latitude grid points.
In a grid point search algorithm, each grid point (λ j , θ i ) in the search domain is a search sector center.Sector K ij centered at grid point (λ j , θ i ) is defined as where n is a user-specified parameter that corresponds to the scale of a storm in latitude-longitude space.Thus, each sector is a (2n + 1) × (2n + 1) square in grid point space.
To set up a grid point search, users define the search domain by defining a minimum and maximum latitude, θ min and θ max , and a maximum and minimum longitude, λ min and λ max .In this work we assume λ min = 0 and λ max = 2π, while θ min and θ max can vary by application.Users must also select a value for n that relates the spatial scale of the storms they wish to detect to the resolution of the data λ. Figure 1a shows grid point search sectors along the equator with n = 2 for data with resolution λ = 10 • .The sectors are 5 × 5 boxes in grid point space and span approximately 5600 km × 5600 km on the Earth.
For each sector, the software collects data from the (2n + 1)×(2n+1) points centered at (λ j , θ i ).The collected data are compared against the storm identification criteria.If the criteria are met or exceeded in the sector, the algorithm checks if (λ j , θ i ) is the location of the storm within that sector.If so, the algorithm records the storm to its output list.If not, the algorithm cycles to the next grid point, say (λ j +1 , θ i ), and begins again.In Fig. 1a the blue, horizontally striped sector is centered at (λ j , θ i ) = (150 • E, 0 • N) and the next two consecutive sectors are shown by the red, vertically striped sector centered at (λ j +1 , θ i ) = (160 • E, 0 • N) and black, diagonally striped sector whose center is Centering a sector at each grid point in the search domain yields a robust algorithm.It ensures that the entire search domain will be covered and that the same storm will not be recorded twice.Even though a single storm may trigger the identification criteria in several sectors, only the sector whose center corresponds to the storm location will be recorded to output.While the robustness of the grid point search algorithm is an advantage, it comes at the cost of redundant work.The same data points are accessed multiple times because the algorithm only advances one grid point at a time and the overlap of adjacent sectors is considerable.
The data access required by a grid point search and the overlap of adjacent sectors is also illustrated by Fig. 1a.All three sectors read the data from grid points in the region {(λ, θ ): For visual clarity we have not plotted the sectors at (λ j −1 , θ i ) or (λ j , θ i±1 ), which would also overlap a majority of the same grid points.

Stride Search
Instead of squares in grid point space, Stride Search sectors are circles on the surface of an Earth-sized sphere.The sectors are defined using the geodesic distance function where a is the radius of the Earth.Users select an applicationdependent spatial scale s in units of distance such that a maximum of one storm can be located within any spherical circle of radius s.The search domain is divided into a collection of circles on the sphere, each with the same geodesic radius s.
Stride Search sectors, illustrated in Fig. 1b for s = 2220 km, are defined by the following procedure.The number S lat = s/a defines the arc length corresponding to the user-specified scale s.We refer to S lat as the latitude stride and use it to define lines of constant latitude where N θ = (θ max − θ min )/S lat + 1.The set θ I divides the search domain into latitudinal strips of width ≈ s.We also define a longitude stride for each θ I , so that S (I ) lon are the arc lengths along each latitude circle θ I that approximately span a geodesic distance s in the longitudinal (zonal) direction.The minimum function in Eq. ( 5) accounts for the case where θ I is either pole.The longitude stride defines points λ I J along each latitude line θ I , where J = (j − 1)S (I ) lon longitude points along each θ I .Each point (λ I J , θ I ) defines the center of search sector K I J , where K I J is the set of all points on the sphere lying within a distance s of (λ I J , θ I ), We note that the definition of the Stride Search sectors is determined entirely by the application-related spatial scale s and is therefore independent of resolution of the data, λ.By construction, sector K I J overlaps its neighbors K I ±1,J and K I,J ±1 by approximately one radius s in physical space.This is a sufficient condition to ensure that the entire search domain will be covered by the circular search sectors.Figure 1b shows three consecutive circular sectors with s = 2220 km.This value of s corresponds to an arc length of ≈ 20 • , and was chosen to match the scale of the sectors in Fig. 1a.The blue, horizontally striped circle is centered at (λ I J , θ I ) = (150 • E, 0 • N).The red, vertically striped sector is centered at (λ I,J +1 , θ I ) = (170 • E, 0 • N) and the center of the black, diagonally striped sector is located at (λ I,J +2 , θ I ) = (170 • W, 0 Stride Search algorithm therefore covers a much larger geographic area with the same number of search sectors.Stride Search setup is completed by defining the sectors K I J in terms the available data.Computationally, this involves creating a class and/or methods that identify and link each sector to the data points enclosed by its geographic boundary.Sectors used with high-resolution data sets automatically link to more grid points than the same-sized sectors used with lower-resolution data.For uniform longitudelatitude grids, Eq. ( 1) applies and the process is straightforward.For unstructured grids, the mesh's connectivity information may include a node adjacency list or topological data structures such as edges and faces.Any of these may be used to determine a sector's enclosed data points.In the absence of such connectivity information, a kd-tree algorithm (e.g., Samet, 2006) may be used.
Reduced overlap between adjacent sectors leads to improved performance by decreasing the number of redundant data accesses and by reducing the total number of sectors.However, it also creates a new issue.Since each sector is searched independently, several sectors may detect and record the same storm to the linked list, as illustrated by Fig. 2. Before the storms detected by Stride Search can be saved to output, duplicate entries must be removed.
Duplicates are removed by again referencing the userspecified scale s.Each pair of entries in the linked list is compared; if a pair is separated by a distance less than s, the entries are considered duplicates and the less intense entry is deleted.In Fig. 2, this is demonstrated with pressure data.Each of the three search sectors have exceeded the storm identification criteria and independently locate their minimum pressure.The blue (left) sector finds a minimum of 988 hPa, the red (middle) finds 982 hPa, and the black (right) finds 984 hPa.These three entries are clearly separated by a distance less than s, as they are all contained within the red (middle) circle.The duplicate removal procedure will delete the blue (left) and black (right) entries because, compared to the red (middle) entry, they have higher pressures and are less intense.Only the red 982 hPa entry will be saved to output.
In general, the list of detected storms at a particular time step, duplicates included, is much smaller than the spatial size of the data and the time required by the duplicate removal procedure is negligible.

Data description
To demonstrate and test Stride Search we use data produced by the spectral element dynamical core of the Community Atmosphere Model, CAM-SE (Neale et al., 2012;Dennis et al., 2012).The model uses a cubed sphere grid and highresolution experiments set 240 elements per face of the cube for a total of 3 110 402 horizontal grid points.This results in a horizontal resolution of λ ≈ 0.125 • (Worley et al., 2011;Dennis et al., 2012).Due to well-known issues regarding the tuning of physical parameterizations within climate models, this high-resolution simulation may produce highintensity storms with unrealistically high frequencies (Reed and Jablonowski, 2011;Dennis et al., 2012).
The original goals of the high-resolution experiments of Worley et al. (2011) and Dennis et al. (2012) were to demonstrate the parallel scaling of CAM-SE, to document its required run time and related statistics in various highperformance computing environments, and to demonstrate the model's capability to produce well-resolved features like tropical cyclones that cannot be represented well in lowresolution experiments.Here, we choose these data because their high resolution ensures that small-scale storms will exist, which provides a good testing environment for storm detection algorithms.The fact that there may be an unrealistically high number of storms in the data is a benefit in this case.
The data set contains 5 years of simulated data that used CAM5 physics and preindustrial (year 1850) initial conditions.Instead of additional model components, the CAM-SE atmospheric dynamical core is coupled to a set of land, ocean, and sea ice data that also correspond to the year 1850 to provide its boundary conditions (Dennis et al., 2012).The land, ocean, and sea ice boundary conditions are periodic, with a period of 1 year, and simply repeat throughout the 5year atmospheric simulation.
The model's native cubed sphere data were interpolated to a uniform latitude-longitude grid with n lon = 1024 for a resolution of λ = 0.35 • using the regridding software provided by the Earth System Modeling Framework (Balaji et al., 2014).To facilitate a timing experiment (presented in the next section), we also interpolate 3 months of data to resolutions of λ = 2, 1, 0.5, and 0.25 • .

Tropical cyclones
In this section, we apply both Stride Search and TSTORMS to the problem of tropical cyclone detection.Our aim is to validate Stride Search by comparison with TSTORMS, Geosci.Model Dev., 9, 1383-1398, 2016 www.geosci-model-dev.net/9/1383/2016/ which has a proven record.We also discuss the subtle differences between the two algorithms that lead to differences in their final results that may be of importance to climate researchers.
While there are many different combinations of variables available to define a tropical cyclone (Walsh et al., 2007;Horn et al., 2014), to provide both codes with a common set of identification criteria we choose the TSTORMS default.A tropical cyclone is identified within search sector K ij if the following four criteria are met (Vitart et al., 1997): 1.There is a cyclonic vorticity maximum greater than a threshold value, τ ζ : where ζ 850 is the relative vorticity at the 850 hPa level.
2. The distance between the cyclonic vorticity maximum and the sector's sea-level pressure minimum is less than a threshold value τ D 1 : where (λ ζ , θ ζ ) and (λ P , θ P ) are the locations of sector K ij 's vorticity maximum and sea-level pressure minimum, respectively.
3. The difference between the vertically averaged temperature's maximum value and its sector average exceeds a threshold τ T : where T is defined as and T 500 and T 200 are the temperatures at the 500 and 200 hPa pressure levels, respectively.To maintain consistency between both codes, the sector average Avg K ij (T ) is approximated as a simple arithmetic average, where N K ij is the number of data points in sector K ij .
4. The distance between the maximum vertically averaged temperature and the sea-level pressure minimum is less than a threshold value τ D 2 : dist (λ T , θ T ), (λ P , θ P ) < τ D 2 , where (λ T , θ T ) is the location of the sector maximum of T .
Table 1.Threshold values used for tropical cyclone detection.
Differences between the two algorithms' detections arise due to the differences in the algorithms themselves.For computing the collocation criteria, Eqs. ( 8) and ( 12), TSTORMS uses the angular distance function For TSTORMS, whose intended application is in tropical regions, this is a simple and effective strategy because angular distance is a reasonable proxy for geodesic distance near the equator.Stride Search uses the geodesic distance function Eq. ( 3).Users of TSTORMS must specify τ D 1 and τ D 2 in angular units, while users of Stride Search must use units of length.The arithmetic averages of the vertically averaged temperature Eq. ( 11) will be different from one algorithm to the other, because their sectors will contain a different number of data points.As a result criteria 2, 3, and 4 may behave differently for each algorithm.

Spatial search results
We apply both algorithms to the data described in Sect.3. We set TSTORMS n = 12, Stride Search s = 450 km, and use the threshold values shown in Table 1 for Eqs. ( 7), ( 8), (9), and (12).
Results from an arbitrarily chosen 3 months of data, 18 July to 18 October of simulation year 4, are plotted in Fig. 3.Each dot represents a storm detected at one time step.All 6-hourly time steps over the entire 3 months are shown, colored by the windspeed-dependent hurricane categories defined by the Saffir-Simpson intensity scale.
Both algorithms produce qualitatively similar results.Visually they appear to agree nearly perfectly on the identifiable storm tracks and intensities.They both have false positives over land and in the Southern Ocean.Stride Search produces more false positives, particularly in the Southern Ocean, than TSTORMS.This is due to the fact that Stride Search sectors -especially at higher latitudes -typically contain more data points than TSTORMS sectors.The larger number of points per sector reduces the sector average of the vertically averaged temperature, Eq. ( 11), compared to a TSTORMS sector at the same location.Thus, the warm core temperature excess criterion Eq. ( 9) is more easily achieved using Stride Search than TSTORMS for the same value of τ T .For both codes, false positives are eliminated by the temporal correlation algorithm discussed in the next section.However, the consequences of this different behavior of Eq. ( 9) between the codes will propagate into the storm tracks and final output.

Temporal correlation results
In this section we apply the temporal correlation algorithm, Algorithm 2, to the spatial search results.Since tropical cyclones are inherently maritime events (Cotton and Anthes, 1989), at this stage we also apply a land mask to remove any tracks whose origins are not over water.
In Fig. 4 we show the storm tracks that correspond to the spatial search output of Fig. 3 with parameters U max = 15 m s −1 and t min = 2 days.These results show that the temporal correlation algorithm succeeds in eliminating false positives and gives a better representation of the storms within the data set than the raw output from the spatial search algorithm.Table 2 presents the final storm counts for each algorithm for the 3-month data set separated by hurricane category.Again, both algorithms produce nearly identical results which validate the present work.
Table 2 shows that Stride Search detects two fewer category 1 storms and one additional category 2 storm than TSTORMS.Looking for differences between the panels of Fig. 4, we see that the category 1 storms correspond to a storm in the western Pacific off the east coast of Japan near (150 • E, 30 • W) and a storm in the north central Atlantic near (050 • W, 20 • N).
Comparing the panels, we see that Stride Search classified the western Pacific storm as category 2, which accounts for two of the three differences between rows in Table 2. Viewing the data, we note that the Stride Search track for this particular storm is 4 time steps longer than the corresponding TSTORMS track.During its final data points in the Stride Search output, the storm's temperature excess was decreasing and very close to the detection threshold.Since the TSTORMS sector average of T is higher than the Stride Search sector average of T , the storm did not pass criterion 3 (Eq.9) at the end of its life cycle in TSTORMS.We also find that this storm only achieved a category 2 wind speed at the very end of its life cycle, after the point in time where TSTORMS had stopped detecting it, which explains why Stride Search counts the storm as a category 2 and TSTORMS does not.
A similar explanation holds for the Atlantic storm.We see that TSTORMS counts the same storm twice, once as a category 1 and once as a category 2, while Stride Search shows only one longer category 2 track.This is again due to the difference in the computation of the temperature excess between the two codes.This particular storm weakened in its early days to the point where its temperature excess was not sufficient for TSTORMS to detect it before finally intensifying into a category 2 storm.This creates a hole in the TSTORMS track that does not show up in the Stride Search results because the temperature excess criterion is not as strict in Stride Search as it is in TSTORMS.
We point out that this is not a weakness of the TSTORMS software -our choice of τ T = 2 K was somewhat arbitrary and choosing a lower threshold value τ T for TSTORMS would remedy this problem for this particular cyclone.Rather, we stress that these differences arise simply due to the differences in the definition of both algorithms' search sectors.To investigate these differences further we tested Stride Search using a midpoint rule quadrature approximation of the sector average of the average vertical temperature, Eq. ( 11), and found similar results.We therefore chose to use the arithmetic average to keep the tropical cyclone identification criteria the same between the two codes.
Unfortunately, differences in specific storm tracks between each algorithm become more difficult to sort by cause as the  size of the data set grows. Figure 5 shows the storm tracks produced by each algorithm for the entire data set, using the same identification criteria and threshold values as our previous discussion.Since the temperature excess criterion is more easily achieved by Stride Search, we would expect Stride Search to identify more storms than TSTORMS, particularly in the lower-intensity storm categories.We might also expect TSTORMS to count too many storms because, for these threshold values, some storms may be split into multiple tracks.Both of these predictions may be born out in the data, which are tabulated in Table 3. Stride Search does indeed detect more category 0, 1, and 2 storms.TSTORMS also finds a higher number of high-intensity (category ≥ 3) storms than Stride Search, possibly due to track splitting.Without investigating differences in individual tracks, which is impractical for large data sets, one can only state that since the identification criteria used by both codes were identical, the different results can only be due to differences at the level of the algorithms' design.

Performance and timing
As discussed previously, climate data sets are large and expected to increase in size as climate models run at high resolutions with λ < 0.5 • .Storm detection algorithms must be both accurate and efficient.In this section we document the dependence of each search algorithm's run time on the resolution of its input data set.Differences in the structure of the two algorithms result in notable differences in the number of search sectors and the number of times each grid point is accessed in memory per time step.In the top row of Table 4 we present the total number of search sectors required by each algorithm to search the tropical domain, θ min = 40 • S, θ max = 40 • N for each data resolution.For TSTORMS this is equivalent to the number of grid points in the domain, hence the number of search sectors increases by a factor of 4 as the data resolution is halved.The number of Stride Search sectors remains constant across all data resolutions.For data on a uniform latitude-longitude grid, the number of points in a Stride Search sector grows as a function of latitude.The maximum points per sector listed for Stride Search are an upper bound that depends on the search domain, specifically θ min and θ max and the spatial scale s.For TSTORMS, the maximum points per sector are a function of the user-specified parameter n and are equal to (2n + 1) × (2n + 1).
In the last row of the table, the total number of per time step data accesses required by each algorithm are given; these numbers are the product of the number of sectors and the number of points per sector.This number provides an indication of the cost of each algorithm.The smaller number of sec-  Both TSTORMS and the current implementation of Stride Search are written in Fortran and run in serial on a single thread.As a timing experiment, we run both codes on 3 months of data that were interpolated to resolutions of λ = 2, 1, 0.5, and 0.25 • , as described in Sect.3.Each data set contains 360 time steps evenly spread across three NetCDF files.We choose s = 450 km for Stride Search and set the TSTORMS parameter n = 2, 3, 6, 12 for the corresponding λ.The total wall clock time required for each algorithm to search each data set, including I/O, is recorded and the average time per time step is computed by dividing the total by 360.We repeat each experiment three times for each resolution and average the results to produce Fig. 6.
Figure 6a shows the average wall clock time required by each algorithm to search one time step of data as a function of the data resolution λ.The plot shows results from experiments run on a standard desktop workstation using GNU's Fortran compiler.The same tests were run on one node of Sandia's Red Sky High Performance Computing cluster using the Intel Fortran compiler and produced similar results.Both algorithms appear to scale at the expected rate of O( λ −2 ) as λ → 0, but for each resolution Stride Search is faster.The speed advantage of Stride Search over TSTORMS improves as resolution increases.

Polar search
A key motivation for developing Stride Search was to provide a detection algorithm capable of searching all latitudes, including polar regions.The Arctic and Antarctic climates become increasingly frequent subjects of study due to recent significant changes in these environments (Stocker et al., 2013), and a detection algorithm capable of searching data near the poles is necessary.Grid point searches have been used at midlatitudes up to ≈ 60 • N and 60 • S (König et al., 1993;Raible and Blender, 2004), but users must exercise care when choosing the sector size parameter n at high latitudes.A grid point search near the poles may have to use sectors whose physical size is no longer representative of the physical features of the storms it is meant to locate.each sector, along 80 • N, spans only about 1000 km east to west.Users can choose higher values of n at higher latitudes to ensure the sectors will have sufficient zonal extent to capture the desired feature, but the square sectors will still have different spatial scales in the longitude direction compared to the latitude direction.As we have seen already in Sect.4, the definition of search sector size can impact the final output of a search algorithm, independently of the identification criteria.
The constant geodesic radius of Stride Search sectors removes this dependence on the data.Search sectors -even one centered at the pole -have the same geographic size regardless of latitude, and are therefore capable of searching polar regions as effectively as midlatitude and tropical regions.
As an example application, we consider polar lows.Polar lows are distinct from midlatitude low-pressure systems due to their different developmental forcing and a typical lack of associated fronts (Montgomery and Farrell, 1992;Rasmussen and Turner, 2003).They contribute to the break up of sea ice which has significant implications for the polar climate.
Individual polar lows may develop a barotropic structure more similar to tropical cyclones than to baroclinic midlatitude storms (Nordeng and Rasmussen, 1992;Føre et al., 2012).These particularly interesting polar lows are typically smaller in diameter and duration than tropical cyclones, and therefore require a high-resolution model such as the one described in Sect. 3 to resolve.A companion study will investigate the Arctic climatology of variable resolution climate models and their ability to simulate "hurricane-like" polar lows (Roesler et al., 2016).Here, our goal is to test Stride Search's ability to locate such a small-scale, high-latitude storm.
A set of objective identification criteria for polar lows are given by Bracegirdle and Gray (2008), which we adapt to the Stride Search algorithm.Storm intensity is measured with vorticity and pressure and, as was the case with the tropical cyclone identification criteria, a collocation requirement is applied.New to this application is a criterion that identifies regions of increased instability associated with cold air outbreaks over relatively warm ocean water.
Stride Search records a polar low in sector K ij if the following criteria are met: 1.A sea-level pressure minimum of sufficient intensity exists, min i,j ∈K ij P sl (λ j , θ i ) < τ P , where P sl is sea-level pressure and τ P is the pressure threshold.
2. A cold air outbreak exists, where θ 700 is the potential temperature at the 700 hPa level, SST is the sea surface temperature, and τ T is the cold air outbreak threshold.The data include only temperature (not potential temperature) and do not include the 700 hPa pressure level.The required θ 700 data are approximated as where θ 850 = T 850 1000 850 0.286 and θ 500 = T 500 1000 500 0.286 .Results from the entire 5-year data set are presented in Fig. 8, separated by season.Storm tracks are colored by their maximum strength on the US National Weather Service's maritime warning scale (Bowditch, 2002); gale force storms (black) have maximum wind speeds 17.5 ≤ u max < 24.5 m s −1 , storm force (blue) has 24.5 ≤ u max < 33 m s −1 , and hurricane force storms (red) have maximum wind speeds greater than 33 m s −1 .The results show the expected seasonal variation of storm frequencies, with the maximum number of storms and the maximum intensity of storms occurring in the winter (DJF) months.Spring (MAM) months show more activity over the pole than the fall (SON) months, and there are few storms in the summer (JJA).
Once storm tracks are built, users may investigate individual storms more easily.To fulfill our goal of finding a "hurricane-like" polar low near the pole we search the storm tracks for storms that get within 5 • of the pole, then plot the vorticity associated with each storm.Since the size of the storm track list is much smaller than the size of the data set, we quickly find the polar low shown in Fig. 9, from 12:00 UTC 25 December, simulation year 3.At the plotted time step the storm is located at (017.0 • E, 86.5 • N).Spatial scale is illustrated by the 1000 km line segment.The plot shows the 850 hPa relative vorticity of the storm and we note how similar its structure is to the typical tropical cyclone.The diameter of the storm's core is approximately 200 km and the diameter of the whole storm, including its vorticity bands, is approximately 500 km.Due to its small spatial scale this storm would not be resolved in low-resolution data.Additionally, it is located very close to the pole.This storm therefore demonstrates the capability of Stride Search to find specific features in high-resolution data, even in polar regions.
We have introduced the Stride Search algorithm for detection of extreme events within climate data sets.The algorithm is defined independently of a data set's layout and resolution, and depends only on the spatial scale associated with a user's intended application.Stride Search was designed to be a flexible algorithm, capable of searching data sets for a variety of extreme events while treating physical space the same for different data sets.Extreme events must be described by a set of quantifiable identification criteria and a representative spatial scale.As examples we have shown detections of two different varieties of cyclonic storms: tropical cyclones and polar lows.The capability to search polar latitudes in exactly the same manner as tropical and midlatitudes is a new feature introduced by Stride Search, and was the primary motivation for its development.
To validate Stride Search we compare its output to the output of TSTORMS, the current standard tool for tropical cyclone detection.We show that Stride Search performs faster than TSTORMS and its relative speed up increases as the data resolution increases.The final outputs of both algorithms and tropical cyclone tracks, generally agree.However, due to their different definitions of spatial search sectors, results between the two codes can differ in some cases even when both use the same identification criteria and threshold values.
Our results show that the storm track statistics associated with a particular climate data set can depend not only on the storm identification criteria, as is widely reported in the literature (e.g., Bracegirdle and Gray, 2008;Raible et al., 2008;Horn et al., 2014), but also on the spatial search algorithm used to produce the storm tracks.Since the Stride Search algorithm is defined independently of data layout and resolution, we posit that it may provide a more objective analysis tool and be less sensitive to differences in spatial discretizations between data sets.Further experiments are necessary to investigate this claim; they should include variable resolution data, data defined on different types of spherical meshes, and sensitivity analyses covering a range of identification criteria and threshold values.
We anticipate that extending Stride Search to other, more specialized applications such as locating multicentered cyclones (Hanley and Caballero, 2012) and atmospheric rivers (Ralph et al., 2004;Prabhat et al., 2012) will be straightforward.The software uses the object-oriented design capabilities provided by modern Fortran (Adams et al., 2009) and is intended to allow users to extend its data types to new applications.
Finally, we note that the performance of both Stride Search and TSTORMS software may be improved via parallelization.It is already common to take advantage of temporal parallelism by applying the spatial search algorithm to multiple time steps and multiple files concurrently using several compute nodes.This may be implemented with customized run scripts or dedicated software such as NASA's Portable Distributed Script (PoDS) software (Kouatchou and Oloso, 2014), GNU Parallel (Tange, 2011), and the Toolkit for Extreme Climate Analysis (Prabhat et al., 2012).However, there also remains a significant amount of unexploited parallelism in the storm detection problem, as individual search sectors at the same time step may be distributed across intranode threads.We mark the parallel development of the Stride Search software as an additional item for future work.

Algorithm 2
Temporal correlation algorithm Input: List L of potential storms from spatial search output, max storm travel speed Umax, minimum duration tmin, other storm identification criteria (temporal) Output: List T of storm tracks 1: set T = empty list 2: NT = total number of time steps in data set 3: for i = 1 to NT do 4:for all elements li ∈ L at time step i do lj+1 ∈ L at time step j + 1 for possible successors to storm lj

Figure 3 .
Figure 3. Spatial search results, Northern Hemisphere summer simulation year 4. Each dot is a storm detected at one time step; colors correspond to the categories of the Saffir-Simpson hurricane scale.

FigFigure 4 .
Figure 4. Storm tracks, Northern Hemisphere summer, simulation year 4. Each track is colored by the hurricane category corresponding to the maximum wind speed achieved during its lifetime.

FigFigure 5 .
Figure 5. Storm tracks.The ultimate output of each algorithm for the entire 5-year data set.Coloring as in Fig. 4.

Figure 6 .
Figure 6.Timing results.(a) Average wall clock time required to search one time step of data.(b) Speedup due to Stride Search.

Figure
Figure 7a shows three consecutive grid point search sectors along θ = 60 • N. As in Fig. 1, the grid of data points has resolution λ = 10 • and we have used the same n = 2 to set up 5 × 5 grid point search sectors.The blue (horizontal stripes) sector is centered at (λ j , θ i ) = (150 • E, 60 • N) and the red (vertical stripes) and black (diagonal stripes) sectors are at (λ j +1 , θ i ) = (160 • E, 60 • N) and (λ j +2 , θ i ) = (170 • E, 60 • N), respectively.Each grid point search sector spans a distance of 5 grid points in latitude, or approximately 5600 km south to north.The square grid point search sectors may appear correct in the left plot of Fig. 7a, a Mercator projection, but the problem with them is clear in the polar stereographic projection to the right.The southern boundary of each sector lies along the 40 • N latitude circle, where 5 grid points in longitude span 4000 km east to west.However, the northern boundary of

Figure 8 .
Figure 8. Northern Hemisphere polar low storm tracks by season; (a) DJF, (b) MAM, (c) JJA, (d) SON.Tracks are colored by the NOAA warning category corresponding to their lifetime maximum wind speed.
Figure 7b shows three Stride Search sectors along θ = 60 • N, with s = 2500 km.The blue (horizontally striped) sector is centered at (λ J , θ I ) = (150 • E, 60 • N).The red (vertically striped) and black (diagonally striped) sectors are at (λ j +1 , θ i ) = (170 • W, 60 • N) and (λ j +2 , θ i ) = (130 • W, 60 • N), respectively.The longitude stride along θ = 60 • N is twice as large as the longitude stride along the equator, so the three consecutive sectors in Fig. 7b cover twice as many longitude lines than the three sectors in Fig. 1b.The shapes of each sector in the left plot are due to the effects of the Mercator projection.All sectors are still circles on the sphere, as shown in the polar stereographic projection (right).Stride vorticity maximum must be collocated with the pressure minimum, dist (λ P , θ P ), (λ ζ , θ ζ ) < τ D .(17) Stride Search setup uses a sector radius s = 500 km, and search region boundaries θ min = 45 • N, and θ max = 90 • N. Threshold values are set at τ P = 980 hPa, τ T = 7 K, τ ζ = 2.0 × 10 −4 s −1 and τ D = 200 km.To the temporal correlation algorithm we add a minimum duration t min = 12 h and set U max = 20 m s −1 .

Figure 9 .
Figure 9.An example polar low on 25 December, simulation year 3, located at (017.0 • E, 86.5 • N) with structural similarities to a tropical cyclone.The center of the plot is the North Pole and the perimeter is the 80 • N latitude circle.

Table 2 .
Storm count by hurricane category for Northern Hemisphere summer simulation year 4; this table corresponds to the storm tracks shown in

Table 3 .
Storm counts by hurricane category for the entire 5-year data set; this table corresponds to the storm tracks shown in