Atmospheric River Tracking Method Intercomparison Project (ARTMIP): project goals and experimental design

The Atmospheric River Tracking Method Intercomparison Project (ARTMIP) is an international collaborative effort to understand and quantify the uncertainties in atmospheric river (AR) science based on detection algorithm alone. Currently, there are many AR identification and tracking algorithms in the literature with a wide range of techniques and conclusions. ARTMIP strives to provide the community with information on different methodologies and provide guidance on the most appropriate algorithm for a given science question or region of interest. All ARTMIP participants will implement their detection algorithms on a specified common dataset for a defined period of time. The project is divided into two phases: Tier 1 will utilize the Modern-Era Retrospective analysis for Research and Applications, version 2 (MERRA-2) reanalysis from January 1980 to June 2017 and will be used as a baseline for all subsequent comparisons. Participation in Tier 1 is required. Tier 2 will be optional and include sensitivity studies designed around specific science questions, such as reanalysis uncertainty and climate change. High-resolution reanalysis and/or model output will be used wherever possible. Proposed metrics include AR frequency, duration, intensity, and precipitation attributable to ARs. Here, we present the ARTMIP experimental design, timeline, project requirements, and a brief description of the variety of methodologies in the current literature. We also present results from our 1-month “proof-of-concept” trial run designed to illustrate the utility and feasibility of the ARTMIP project.


Introduction
Atmospheric rivers (ARs) are dynamically driven, filamentary structures that account for ∼ 90% of poleward water vapor transport outside of the tropics, despite occupying only ∼ 10% of the available longitude (Zhu and Newell, 1998). ARs are often associated with extreme winter storms and heavy precipitation along the west coasts of midlatitude continents, including the western US, western Europe, and Chile (e.g., Ralph et al., 2004; Neiman et al., 2008; Viale and Nuñez, 2011; Lavers and Villarini, 2015; Waliser and Guan, 2017). Their influence stretches as far as the polar caps as ARs transfer large amounts of heat and moisture poleward, influencing the ice sheets' surface mass and energy budget (Gorodetskaya et al., 2014; Neff et al., 2014; Bonne et al., 2015). Despite their short-term hazards (e.g., landslides, flooding), ARs provide long-term benefits to regions such as California, where they contribute substantially to mountain snowpack (e.g., Guan et al., 2010), and ultimately refill reservoirs. The sequence of precipitating storms that often accompany ARs may also contribute to relieving droughts (Dettinger, 2014). Because ARs play such an important role in the global hydrological cycle (Paltan et al., 2017) as well as for water resources in areas such as the western US, understanding how they may vary from subseasonal to interannual timescales and change in a warmer climate is critical to advancing understanding and prediction of regional precipitation (Gershunov et al., 2017).
The study of ARs has blossomed from 10 publications in its first 10 years in the 1990s to over 200 papers in 2015 alone (Ralph et al., 2017). This growth in scientific interest is founded on the vital role ARs play in the water budget, precipitation distribution, extreme events, flooding, drought, and many other areas with significant societal relevance. It is evidenced by current and past campaigns, including the multiagency-supported CalWater (Precipitation, Aerosols, and Pacific Atmospheric Rivers Experiment) and ACAPEX (ARM Cloud Aerosol Precipitation Experiment) field campaigns in 2015, which deployed a wide range of in situ and remote sensing instruments from four research aircraft, a research vessel, and multiple ground-based observational networks (Ralph et al., 2016; Neiman et al., 2008). The scientific community involved in AR research has expanded greatly, with 100+ participants from five continents attending the First International Atmospheric Rivers Conference in August 2016 (http://cw3e.ucsd.edu/ARconf2016/, last access: 15 June 2018), many of whom enthusiastically expressed interest in the AR definition and detection comparison project described here.
The increased study of ARs has led to the development of many novel and objective AR identification methods for model and reanalysis data that build on the initial model-based method of Zhu and Newell (1998) and the observationally based methods of Ralph et al. (2004, 2013). These different methods have strengths and weaknesses, affecting the resultant AR climatologies and the attribution of high-impact weather and climate events to ARs. Their differences are of particular interest to researchers using reanalysis products to understand the initiation and evolution of ARs and their moisture sources (e.g., Dacre et al., 2015; Ramos et al., 2016a; Ryoo et al., 2015; Payne and Magnusdottir, 2016), to assess weather and subseasonal-to-seasonal (S2S) forecast skill of ARs and AR-induced precipitation (Jankov et al., 2009; Kim et al., 2013; Wick et al., 2013a; Lavers et al., 2014; Nayak et al., 2014; DeFlorio et al., 2018; Baggett et al., 2017), to evaluate global weather and climate model simulation fidelity of ARs (Guan and Waliser, 2017), to investigate how a warmer or different climate is expected to change AR frequency, duration, and intensity (e.g., Lavers et al., 2013; Gao et al., 2015; Payne and Magnusdottir, 2015; Warner et al., 2015; Shields and Kiehl, 2016a, b; Ramos et al., 2016b; Lora et al., 2017; Warner and Mass, 2017), and to attribute and quantify aspects of freshwater variability to ARs (Ralph et al., 2006; Guan et al., 2010; Neiman et al., 2011; Paltan et al., 2017).
Representing the climatological statistics of ARs is highly dependent on the identification method used (e.g., Huning et al., 2017). For example, different detection algorithms may produce different frequency statistics, not only between observation-based reanalysis products but also among future climate model projections. The diversity of information on how ARs may change in the future will not be meaningful if we cannot understand how and why, mechanistically, the range of detection algorithms produces significantly different results. The variety of parameter variable types, and different choices that can be made for each variable in AR detection schemes, is summarized in Fig. 1 and will be described in more detail in Sect. 3.

Figure 1 (recovered panel text):
Condition: if conditions are met, then an AR exists for each time instance at each grid point. This counts time slices at a specific grid point.
Tracking: Lagrangian approach; if conditions are met, an AR object is defined and followed across time and space.
Absolute: value is explicitly defined.
Relative: value is computed based on an anomaly or statistic.
Time slice: consecutive time slices can be counted to compute AR duration, but this is not required to identify an AR.
Time stitching: a coherent AR object is followed through time as a part of the algorithm.

The detection algorithm diversity problem is not unique to ARs. For instance, the CLIVAR (Climate and Ocean: Variability, Predictability, and Change) program's IMILAST (Intercomparison of Midlatitude Storm Diagnostics) project investigated extratropical cyclones in a manner similar to what is proposed here (Neu et al., 2013). That project found considerable differences across definitions and methodologies and helped define future research directions for extratropical cyclones. Hence, it is imperative to facilitate an objective comparison of AR identification methods, develop guidelines that match science questions with the most appropriate algorithms, and evaluate their performance relative to both observations and climate model data so that the community can direct their future work.
The American Meteorological Society (2017) glossary defines an atmospheric river as "A long, narrow, and transient corridor of strong horizontal water vapor transport that is typically associated with a low-level jet stream ahead of the cold front of an extratropical cyclone. The water vapor in atmospheric rivers is supplied by tropical and/or extratropical moisture sources. Atmospheric rivers frequently lead to heavy precipitation where they are forced upward – for example, by mountains or by ascent in the warm conveyor belt. Horizontal water vapor transport in the midlatitudes occurs primarily in atmospheric rivers and is focused in the lower troposphere." ARTMIP strives to evaluate each of the participating algorithms within the context of this AR definition.

ARTMIP Goals
Numerous methods are used to detect ARs on gridded model or reanalysis data; therefore, AR characteristics, such as frequency, duration, and intensity, can vary substantially due to the chosen method. The differences between AR identification methods must be quantified and understood to more fully understand present and future AR processes, climatology, and impacts. With this in mind, ARTMIP has the following goals:
Goal no. 1: Provide a framework that allows for a systematic comparison of how different AR identification methods affect the climatological, hydrological, and extreme impacts attributed to ARs.
The co-chairs and committee have established this framework by facilitating meetings, inviting participants, sharing resources for data and information management, and providing a common structure enabling researchers to participate. The experimental design, described in Sect. 4, is the product of the first ARTMIP workshop, and provides the framework necessary for ARTMIP to succeed. The final design was a collaborative decision and included participation from researchers from around the world who were interested in an AR detection comparison project and who are co-authors on this paper.
Goal no. 2: Understand and quantify the differences and uncertainties in the climatological characteristics of ARs, as a result of different AR identification methods.
The second goal is to quantify the extent to which different AR identification criteria (e.g., feature geometry, intensity, temporal, and regional requirements) contribute to the diversity, and resulting uncertainty, in AR statistics, and evaluate the implications for understanding the thermodynamic and dynamical processes associated with ARs, as well as their societal impacts.
The climatological characteristics of ARs, such as AR frequency, duration, intensity, and seasonality (annual cycle), are all strongly dependent on the method used to identify ARs. It is, however, the precipitation attributable to ARs that is perhaps most strongly affected, and this has significant implications for our understanding of how ARs contribute to regional hydroclimate now and in the future. For example, Fig. 2 highlights the results of three separate studies (Dettinger et al., 2011; Rutz et al., 2014; Guan and Waliser, 2015), which used different AR identification methods to analyze the fraction of total cool-season or annual precipitation attributable to ARs from a variety of reanalysis and precipitation datasets. Differences in AR identification methods as well as the techniques used to attribute precipitation to ARs have important implications for understanding the hydroclimate and managing water resources across the western US. For example, because so much of the western US water supply is accumulated and stored as snowpack during the cool season, scientists and resource managers need to know how much of this water is attributable to ARs, and how changing AR behavior might affect those numbers in the future. The purpose of this figure is not to directly compare these analyses but to motivate ARTMIP and illustrate the different ways of identifying and attributing precipitation that already exist in the literature. These results highlight the importance not only of quantifying the current uncertainty in AR climatology but also of future projections and reliable estimates of their uncertainty.
Goal no. 3: Better understand changes in future ARs and AR-related impacts.
As a key pathway of moisture transport across the subtropical boundary and from ocean to land, ARs are important elements of the global and regional water cycle. ARs also represent a key aspect of the weather-climate nexus as global warming may influence the synoptic-scale weather systems in which ARs are embedded and affect extreme precipitation in multiple ways. Hence, understanding the processes associated with AR formation, maintenance, and decay, and accurately representing these processes in climate models, is critical for the scientific community to develop a more robust understanding of AR changes in the future climate.
A key question that will be addressed is how different AR detection methods may lead to uncertainty in understanding the thermodynamic and dynamical mechanisms of AR changes in a warmer climate. Although the water vapor content in the atmosphere scales with warming following the Clausius-Clapeyron relation, changes in atmospheric circulation such as the jet stream and Rossby wave activity may also have a significant impact on ARs in the future (Barnes et al., 2013; Lavers et al., 2015; Shields and Kiehl, 2016b). Will ARs from different ocean basins respond differently to greenhouse forcing? How do natural modes of climate variability, i.e., the El Niño-Southern Oscillation (ENSO), the Arctic Oscillation (AO), the Madden-Julian oscillation (MJO), the Pacific Decadal Oscillation (PDO), or the Southern Annular Mode (SAM), come into play? How do changes in precipitation efficiency influence regional precipitation associated with ARs in the future? As the simulation fidelity of ARs is somewhat sensitive to model resolution (Hagos et al., 2015; Guan and Waliser, 2017), another important question is whether certain AR detection and tracking methods may be more sensitive to the resolutions of the simulations than others, and what the implications are for understanding uncertainty in projections of AR changes in the future.

Table 1. Algorithm methods participating in the early phases of ARTMIP and content of this paper. The developer is listed along with algorithm details, i.e., type; geometry, threshold, and temporal requirements; region of study; DOI reference. Identifiers for the subset of methods participating in the 1-month proof-of-concept test are in the far-left column and labeled as A1, A2, etc. IVT is integrated vapor transport and IWV is integrated water vapor. ARTMIP is an ongoing project with the addition of new participants as the project progresses. For the most recent list of developers and participants, please refer to the ARTMIP web pages at http://www.cgd.ucar.edu/projects/artmip/ (last access: 15 June 2018).
a ZN relative threshold formula: $Q \geq Q_{\mathrm{zonal\,mean}} + \mathrm{AR}_{\mathrm{coeff}}\,(Q_{\mathrm{zonal\,max}} - Q_{\mathrm{zonal\,mean}})$, where Q is the moisture variable, either IVT (kg m⁻¹ s⁻¹) or IWV (cm). AR coeff = 0.3 except where noted (Zhu and Newell, 1998). The Gorodetskaya method uses Qsat, where Qsat represents maximum moisture holding capacity calculated based on temperature (Clausius-Clapeyron), an important distinction for polar ARs. Additional analysis of the ZN method can be found in Newman et al. (2012).
b Methods used in a 1-month proof-of-concept test (Sect. 5). These methods are assigned an algorithm ID, i.e., A1, A2, etc.
c These 1-month proof-of-concept methods apply a percentile approach to determining ARs. A3 and A8 applied the full Modern-Era Retrospective analysis for Research and Applications, version 2 (MERRA-2) climatology to compute percentiles. A9 applied the February 2017 climatology for this test only. For the full catalogues, A9 will apply extended winter and extended summer climatologies to compute percentiles. Please refer to individual publications (DOI reference column in this table) for climatologies used in earlier published studies by each developer. The climatology used to compute percentile is often dependent on the dataset (reanalysis or model data) being used.
To begin to answer these questions, we first need to understand how the definition and detection of an AR alter the answers. A catalogue of ARs and AR-related information will enable researchers to assess which identification methods are most appropriate for the science question being asked, or region of interest. Applying different identification methods to climate simulations of ARs in the present-day and future climate will facilitate more robust evaluation of model skill in simulating ARs and identification of mechanisms responsible for changes in ARs and associated extreme precipitation in a warmer climate. Finally, determination of the most appropriate methods of identifying ARs will provide a set of best practices and community standards that researchers engaged in understanding ARs and climate change can work with and use to develop diagnostic and evaluation metrics for weather and climate models.

Method types
Table 1 summarizes the different algorithms adopted by the ARTMIP participants. Details for each parameter type and choice (from Fig. 1) are listed as table columns. The developer of the method is listed by row and refers to individuals or groups who developed the algorithm. The identifier in the first column (A1, A2, etc.) will be used for Figs. 3, 5, 7, and 8, and denotes those algorithms participating in the initial proof-of-concept phase of the project. Type choices are "condition" or "track" (see Sect. 3.1 for definition of these choices). Geometry requirements refer to the shape and axis requirements, if any, of an AR object. For example, a condition AR algorithm that tests a grid point may also have a requirement that strings grid points together to meet a minimum length, width, or axis. Threshold requirements refer to any absolute or relative threshold, typically for a moisture-related variable, that must be met for an AR object to be defined. Temporal requirements refer to any time conditions to be met. Tracking algorithms typically contain temporal requirements to define an AR object in time and space. However, many condition algorithms may also specify a minimum number of time instances (non-varying over a grid point) to be met before an AR object is counted for that grid point. Region refers to whether the algorithm is defined to track or count ARs globally or only over specified regions. The reference section lists published papers and datasets and their DOI numbers. "Experimental" algorithms have not been published yet.

Condition vs. tracking algorithms
The subtleties in language when describing different algorithmic approaches are best illustrated with the "tracking" versus "condition" parameter type. For ARTMIP purposes, two basic detection "types," defined at the first ARTMIP workshop, represent two fundamentally different ways of detecting ARs. "Condition" refers to counting algorithms that identify a time instance where AR conditions are met. Condition algorithms typically search grid point by grid point for each unique time instance. If AR geometry (involving multiple grid points) and threshold requirements are met, then an AR condition is found at that grid point and that point in time. Condition methods may also have an added temporal requirement, but this does not impact the fact that conditions are met at a unique point in space (grid point).
"Tracking" refers to a Lagrangian-style detection method where ARs are objects that can be tracked (followed) in time and space. Objects have specified geometric constraints and can span across grid points. Tracking algorithms must include a temporal requirement that stitches time instances together; i.e., a tracked AR would include several 3 h time slices stitched together. An example of an object-oriented tracking method is the Sellars et al. (2015) tracking method.

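The distinction between the two detection types can be illustrated with a minimal sketch. This is not any participant's algorithm: the 1-D transect of IVT values, the function names, the 250 kg m⁻¹ s⁻¹ cutoff, the minimum run length standing in for a geometry requirement, and the overlap rule used for stitching are all assumptions chosen for illustration.

```python
def find_objects(mask):
    """Return (start, end) index pairs (end exclusive) of contiguous True runs."""
    objects, start = [], None
    for i, flag in enumerate(list(mask) + [False]):  # sentinel closes a trailing run
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            objects.append((start, i))
            start = None
    return objects


def condition_mask(ivt_slice, threshold=250.0, min_length=3):
    """Condition type: flag grid points of ONE time slice that meet AR conditions.

    The threshold and the minimum run length (a stand-in for a geometry
    requirement) are hypothetical values.
    """
    above = [value >= threshold for value in ivt_slice]
    mask = [False] * len(ivt_slice)
    for start, end in find_objects(above):
        if end - start >= min_length:
            mask[start:end] = [True] * (end - start)
    return mask


def count_conditions(ivt_series, threshold=250.0, min_length=3):
    """Count, per grid point, the time slices in which AR conditions are met."""
    counts = [0] * len(ivt_series[0])
    for ivt_slice in ivt_series:
        for i, hit in enumerate(condition_mask(ivt_slice, threshold, min_length)):
            counts[i] += int(hit)
    return counts


def overlaps(a, b):
    """True if two (start, end) index ranges share at least one grid point."""
    return a[0] < b[1] and b[0] < a[1]


def stitch_tracks(masks, min_slices=2):
    """Tracking type: link overlapping objects in consecutive slices into tracks.

    Each track is a list of (time_index, (start, end)) entries. Linking by
    spatial overlap is an assumed, deliberately simple stitching rule.
    """
    finished, active = [], []
    for t, mask in enumerate(masks):
        unmatched = find_objects(mask)
        next_active = []
        for track in active:
            last_obj = track[-1][1]
            match = next((obj for obj in unmatched if overlaps(last_obj, obj)), None)
            if match is None:
                finished.append(track)
            else:
                unmatched.remove(match)
                next_active.append(track + [(t, match)])
        next_active.extend([[(t, obj)] for obj in unmatched])
        active = next_active
    finished.extend(active)
    return [track for track in finished if len(track) >= min_slices]


# Made-up IVT values (kg m^-1 s^-1) for four 3-hourly slices along a transect.
series = [
    [300, 300, 300, 100, 100, 100],
    [100, 300, 300, 300, 100, 100],
    [100, 100, 300, 300, 300, 100],
    [100, 100, 100, 100, 300, 300],
]
per_point_counts = count_conditions(series)
tracks = stitch_tracks([condition_mask(s) for s in series])
duration_hours = [3 * len(track) for track in tracks]
```

With these made-up values the condition view reports how many time slices each grid point spent under AR conditions, whereas the tracking view reports a single object that moves along the transect and persists for three slices (9 h).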
Thresholding: absolute versus relative approaches
Another major area where algorithms diverge is in how to determine the intensity of an AR. Some methods follow studies, such as Ralph et al. (2004) and Rutz et al. (2014), that assign an observationally derived value, such as 2 cm of IWV or an IVT value of 250 kg m⁻¹ s⁻¹, to determine the physical threshold required for identification of an AR. Other methods use a statistical approach rather than an absolute value when setting a threshold value, such as the approach developed by Lavers et al. (2012), where an AR is defined by the 85th percentile values of IVT (kg m⁻¹ s⁻¹). Other relative threshold methods, such as Shields and Kiehl (2016a, b) and Gorodetskaya et al. (2014), apply a direct interpretation of the foundational Zhu and Newell (1998) paper that defines ARs by computing anomalies of IWV (cm) or IVT (kg m⁻¹ s⁻¹) by latitude band. Further, Gorodetskaya et al. (2014) used a physical approach to define a threshold for IWV depending on the tropospheric moisture holding capacity as a function of temperature at each pressure level (Clausius-Clapeyron relation). The Lora et al. (2017) method is yet another relative thresholding technique, wherein ARs are detected for IVT at 100 kg m⁻¹ s⁻¹ above a climatologically derived mean IVT value; the threshold thus changes with the climate state. Although all of these methods "detect" ARs, they do not always detect the same object. Observationally based methods may be best for case studies, forecasts, or current climatologies, but future climate research may be better served by relative methodologies, partly because of model biases in the moisture and/or wind fields. Ultimately, however, the best algorithmic choice will be unique to the science being done, rather than depend on general categories.
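The thresholding families described above can be compared with a minimal sketch. The function names, the linear-interpolation percentile, the use of "greater than or equal to" as the test, and the sample values are assumptions for illustration; the published methods apply these ideas to gridded fields with additional geometric and temporal criteria.

```python
def percentile(values, pct):
    """Percentile with linear interpolation between order statistics."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * pct / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (rank - lower) * (ordered[upper] - ordered[lower])


def percentile_threshold(climatology, pct=85.0):
    """Relative threshold: a percentile of a reference climatology."""
    return percentile(climatology, pct)


def zn_threshold(band_values, ar_coeff=0.3):
    """Relative threshold in the style of Zhu and Newell (1998) for one
    latitude band: zonal mean plus a fraction of (zonal max - zonal mean)."""
    zonal_mean = sum(band_values) / len(band_values)
    return zonal_mean + ar_coeff * (max(band_values) - zonal_mean)


def mean_offset_threshold(climatology, offset=100.0):
    """Relative threshold: a fixed offset above a climatological mean."""
    return sum(climatology) / len(climatology) + offset


def detect(field, threshold):
    """Flag every value that meets or exceeds the threshold."""
    return [value >= threshold for value in field]


# Made-up IVT values (kg m^-1 s^-1) around one latitude band.
band = [100.0, 260.0, 300.0, 500.0]
absolute_hits = detect(band, 250.0)             # fixed, observationally derived cutoff
relative_hits = detect(band, zn_threshold(band))  # cutoff depends on the band itself
```

For this made-up band the absolute cutoff flags three points while the latitude-band relative cutoff flags only the strongest one, a small-scale illustration of how methods may not detect the same object.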

Experimental Design
ARTMIP will be conducted using a phased experimental approach. All participants must contribute to the first phase to provide a baseline for all subsequent experiments in the second phase. The first phase will be called Tier 1 and will require that participants provide a catalogue of AR occurrences for a set period of time using a common reanalysis product. This phase will focus on defining the uncertainties amongst the various detection algorithms. The second phase, Tier 2, is optional, and will potentially include creating catalogues for a number of common datasets with different science goals in mind. To some degree, the experiments chosen for Tier 2 will be informed by the outcomes of Tier 1; however, initially, ARTMIP participants have proposed three separate Tier 2 experiments. The first and second experiments will test AR algorithms under climate change scenarios and different model resolutions, and the third experiment will explore the uncertainties associated with the various reanalysis products. Table 2 outlines the timeline for ARTMIP.

Tier 1 description
ARTMIP participants will run their independent algorithms on a common reanalysis dataset and adhere to a common data format. Tier 1 will establish baseline detection statistics for all participants by applying the algorithms to MERRA-2 (Modern-Era Retrospective analysis for Research and Applications, version 2) (Gelaro et al., 2017, data DOI number: 10.5067/QBZ6MG944HW0) reanalysis data, for the period of January 1980 to June 2017. To eliminate any processing differences between algorithm groups, all moisture and wind variables have been processed and made available at the University of California, San Diego (UCSD) Center for Western Weather and Water Extremes (CW3E) (Brian Kawzenuk, personal communication, 2017) at ∼ 50 km (0.5° × 0.625°) spatial resolution and 3-hourly instantaneous temporal resolution. Specifically, ARTMIP participants that require IVT (integrated vapor transport, kg m⁻¹ s⁻¹) information for their algorithms will be using IVT data calculated by UCSD using the MERRA-2 3-hourly zonal and meridional winds and specific humidity variables. IVT is calculated using Eq. (1) (from Cordeira et al., 2013):

$$\mathrm{IVT} = -\frac{1}{g}\int_{P_b}^{P_t} q(p)\,\mathbf{V}_h(p)\,\mathrm{d}p, \tag{1}$$

where q is the specific humidity (kg kg⁻¹), V_h is the horizontal wind vector (m s⁻¹), P_b is 1000 hPa, P_t is 200 hPa, and g is the acceleration due to gravity. The 1-hourly averaged IVT data available from MERRA-2 directly will not be used. A comparison between 3-hourly UCSD IVT-computed data and 1-hourly MERRA-2 data was completed, with details found in the Supplement. Although the 1 h data provide better temporal resolution, the 3-hourly data provide ample temporal information and are sufficient for algorithmic detection comparisons for ARTMIP. Gains using the 1-hourly MERRA-2 IVT data do not outweigh the extra burden in computational resources required for groups to participate in ARTMIP. Not all algorithms require IVT. Instead, some use IWV, integrated water vapor, or precipitable water (cm). This quantity is derived from MERRA-2 data and is computed as Eq. (2):

$$\mathrm{IWV} = -\frac{1}{g}\int_{P_b}^{P_t} q(p)\,\mathrm{d}p, \tag{2}$$

where q is the specific humidity (kg kg⁻¹), P_b is 1000 hPa, P_t is 200 hPa, and g is the acceleration due to gravity. Table 3 summarizes all the MERRA-2 data available for AR tracking.
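As a worked illustration of the two column integrals, the sketch below assumes the standard forms, i.e., the pressure integral of q V_h (for IVT) or of q alone (for IWV) from P_b to P_t, divided by g, and discretizes them with the trapezoidal rule. It is not the UCSD processing code: the function names, the value of g, the use of pascals, the level ordering, and the choice to report the magnitude of the IVT vector are all assumptions.

```python
import math

G = 9.80665  # standard gravity (m s^-2); assumed value, the text only names g


def column_integral(values, pressure_pa):
    """Trapezoidal estimate of -(1/g) * integral of `values` dp from P_b to P_t.

    `pressure_pa` must run from the bottom level (P_b, highest pressure)
    to the top level (P_t, lowest pressure), in Pa.
    """
    total = 0.0
    for k in range(len(pressure_pa) - 1):
        dp = pressure_pa[k + 1] - pressure_pa[k]  # negative when moving upward
        total += 0.5 * (values[k] + values[k + 1]) * dp
    return -total / G


def iwv(q, pressure_pa):
    """Integrated water vapor (kg m^-2) from specific humidity q (kg kg^-1)."""
    return column_integral(q, pressure_pa)


def ivt(q, u, v, pressure_pa):
    """IVT magnitude (kg m^-1 s^-1) from q and the zonal (u) and
    meridional (v) wind components (m s^-1)."""
    zonal_flux = column_integral([qi * ui for qi, ui in zip(q, u)], pressure_pa)
    meridional_flux = column_integral([qi * vi for qi, vi in zip(q, v)], pressure_pa)
    return math.hypot(zonal_flux, meridional_flux)


# Made-up three-level profile between 1000 hPa and 200 hPa.
levels = [100000.0, 60000.0, 20000.0]  # Pa
humidity = [0.010, 0.004, 0.0005]      # kg kg^-1
u_wind = [10.0, 20.0, 30.0]            # m s^-1
v_wind = [5.0, 10.0, 15.0]             # m s^-1

column_iwv_cm = iwv(humidity, levels) / 10.0  # 1 kg m^-2 of water is 1 mm, i.e., 0.1 cm
column_ivt = ivt(humidity, u_wind, v_wind, levels)
```

The minus sign in the integrals cancels the negative pressure increment when integrating upward from P_b to P_t, so both quantities come out positive for a moist column.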
Once catalogues are created for each algorithm, data will be made available to all participants. Data format specifications for each catalogue are found in the Supplement.
Many of the ARTMIP participants focus on the North Pacific (western North America) and North Atlantic (European) regions; however, ARs in other regions, such as the poles and the southeast US, may also be evaluated with ARTMIP data. We are not placing any coverage requirements on participation in ARTMIP, and each group can provide as many global or regional catalogues as desired.

Tier 2 description
Tier 2 will be similar in structure to Tier 1 in that all participants will create catalogues on a common dataset and follow the same formats, etc. However, instead of algorithms creating catalogues for one reanalysis product, a number of sensitivity studies will be conducted, spanning AR detection sensitivity to reanalysis products and AR detection sensitivity under climate change scenarios.

High-resolution climate change catalogues
For climate model resolution studies, CAM5 (Community Atmosphere Model, version 5; Neale et al., 2010) 20th century simulations available at 25, 100, and 200 km resolutions from the C20C+ (Climate of the 20th Century Plus Project) subproject on detection and attribution (http://portal.nersc.gov/c20c, last access: 15 June 2018) are available for participants to create AR catalogues for a period of 27 years. For climate change studies, high-resolution (25 km) historical and end-of-the-century RCP8.5 (2080-2099) CAM5 simulation data are also provided. This version of CAM5 uses the finite volume dynamical core on a latitude-longitude mesh (Wehner et al., 2014) with data freely available at http://portal.nersc.gov/c20c. We use high-resolution data for both the Tier 1 (∼ 50 km) and Tier 2 (25 km) climate change catalogues because it has been shown that high-resolution data are important in replicating AR climatology and regional precipitation. Although some climate models have a tendency to overestimate extreme precipitation related to ARs, these biases tend to decrease when high resolution is applied (Hagos et al., 2015, 2016). In an Earth system modeling framework, regional precipitation is represented more realistically in the higher-resolution version compared to the standard lower-resolution horizontal grids (Delworth et al., 2012; Small et al., 2014; Shields et al., 2016). High-resolution data will have a better representation of topographical features and be better able to represent regional features at a finer scale.

CMIP5 catalogues
A number of studies have analyzed CMIP5 model outputs to explore future changes in ARs and the thermodynamic and dynamical mechanisms for the changes (e.g., Lavers et al., 2013; Payne and Magnusdottir, 2015; Warner et al., 2015; Gao et al., 2016; Shields and Kiehl, 2016b; Ramos et al., 2016b). However, there is a lack of systematic comparison of the results and of how differences in AR detection and tracking may have influenced the conclusions regarding the changes in AR frequency, AR mean and extreme precipitation, spatial and seasonal distribution of landfalling ARs, and other AR characteristics, impacts, and mechanisms. Characterizing uncertainty in projected AR changes associated with detection algorithms will facilitate more in-depth analysis to understand other aspects of uncertainty related to model differences, internal variability, and scenario differences, and how such uncertainties influence our understanding of AR changes in a warming climate.

Reanalysis catalogues
For the reanalysis sensitivity experiment, products chosen may include ERA-I or 5 (European Reanalysis: ERA-Interim, or version 5; Dee et al., 2011), NCEP/NCAR (National Centers for Environmental Prediction/National Center for Atmospheric Research; Kalnay et al., 1996), JRA-55 (Japanese 55-year Reanalysis; Kobayashi et al., 2015), CFSR (Climate Forecast System Reanalysis; Saha et al., 2014), and the NOAA-CIRES 20th Century Reanalysis (Compo et al., 2011). All products will be coarsened to the lowest spatial resolution among them, and the temporal frequency will be set to the lowest frequency available amongst all the various products for the necessary variables (listed in Table 3).

Metrics
Once all the catalogues are complete, analysis will begin. Many candidate metrics are found in the current literature. The frequency, duration, intensity, and climatology of ARs, and their relationship to precipitation, are common. Other metrics, such as those described in Guan and Waliser (2017), can be adapted for ARTMIP. To test the experimental design, we conducted a 1-month proof-of-concept test to help refine the basic design and fine-tune a few metrics. Here, we present a few results from this 1-month test that diagnose frequency, intensity, and duration for two landfalling AR regions, the North Pacific and North Atlantic. For the full Tier 1 analysis in future publications, global views will be added. Landfalling regions are chosen so that both regional algorithms, focused on impacts to specific continental areas, and global algorithms can be compared directly. For the full catalogues in Tier 1, additional regions will be analyzed, including the east Antarctic, which has proven to have large differences between methodologies that implement a global algorithm compared to a regionally specific polar algorithm (Gorodetskaya et al., 2014). February 2017 was chosen because of the frequent landfalling North Pacific ARs during this time. Algorithms participating in the 1-month test are labeled with a "b" in Table 1 and identified with an algorithm ID, i.e., A1, A2, etc. We also conducted a "human" control, where AR conditions and tracks were identified by eye for the month of February for landfalling ARs impacting the western coastlines of North America and Europe. Full details on the human control dataset are explained in the Supplement. We emphasize here that the human control is not considered "truth", nor is it better or worse than automated methods, but merely another (subjective) method to add to the spectrum of detection algorithms participating in ARTMIP.

Frequency
Figure 3 shows frequency (in 3 h instances) by latitude band for landfalling ARs. The human control as well as each of the methods are plotted for February 2017. Each color represents a unique detection algorithm, and the black lines represent the human controls where both IVT and IWV were utilized to identify ARs by eye. The IVT threshold (solid black line) is 250 kg m⁻¹ s⁻¹, and the IWV thresholds (black dashed and dotted lines) are 2 and 1.5 cm, respectively. For western North America, all of the algorithms and the human controls agree on the shape of the latitudinal distribution, with most AR 3 h period detections accumulating along the coast of California. ARs over the North Atlantic are latitudinally more diverse, but the majority of algorithms and controls peak around 53° N. Regarding the actual number of 3 h periods, there is a large spread in the frequency values across all the automated algorithms, with the human control "detections" far exceeding most algorithms. This preliminary result suggests that setting a moisture threshold of 250 kg m⁻¹ s⁻¹ or an IWV value of 2 cm for North Atlantic ARs, as in the human control, is potentially too permissive. Algorithm identifiers (A1, A2, etc.) are specified in Table 1.
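The frequency metric can be sketched as a simple tally of 3-hourly detections per coastal latitude. The catalogue layout used here (one collection of flagged latitudes per time slice) and the function names are hypothetical; the actual ARTMIP catalogue format is specified in the Supplement.

```python
from collections import Counter


def landfall_frequency(catalogue):
    """Count 3-hourly AR instances per coastal latitude.

    `catalogue` is a list of time slices; each slice holds the coastal
    latitudes (degrees north) flagged as AR at that time. This layout is an
    assumption made for illustration.
    """
    counts = Counter()
    for flagged_latitudes in catalogue:
        counts.update(set(flagged_latitudes))  # one count per latitude per slice
    return counts


def peak_latitude(counts):
    """Latitude with the largest number of 3-hourly AR instances."""
    return max(counts, key=counts.get)


# Made-up detections for four consecutive 3-hourly slices.
catalogue = [[35.0, 35.5], [35.5], [35.5, 48.0], []]
frequency = landfall_frequency(catalogue)
hours_under_ar = {lat: 3 * n for lat, n in frequency.items()}
```

Running the same tally on each algorithm's catalogue gives one latitudinal profile per method, which is the kind of comparison described for Fig. 3.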
To help identify case study events, a methodology count of how many (and which) methods detect an AR along the coast can be conducted. Figure 4 plots the number of methods that detect an AR at the North American coastline for a sample of days in February 2017. The number of method detections for each 3 h time instance per day was computed, but only the maximum time instance per day is plotted for simplicity. The polygons represent the number of methods. For example, if only one method detects an AR at a specific grid point along the coast, then a beige circle is plotted at that grid point; if 14 methods detect an AR at a specific grid point along the coast, then a dark blue circle is plotted at that grid point, and so forth. Even with this basic representation, the diversity in numbers of method detections for each day is large. There are days where there is good method agreement in identifying AR conditions along the coastline. For example, for 7 February, most methods identify AR conditions in southern California, and on 9 and 15 February many methods detect ARs in the Pacific Northwest. However, there are many days where only a handful of methods detect ARs (e.g., 22 and 28 February). The ability of individual algorithms to detect the duration of events listed here is examined in further detail in Sect. 5.3.
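The method count described above can be sketched in a few lines. This is an illustration under stated assumptions, not the code used for Fig. 4: the function name `daily_max_method_count` and the input layout (a dictionary mapping a method ID to a list of time steps, each a list of booleans per coastal grid point) are hypothetical, and "maximum per day" is read here as the per-grid-point maximum over that day's 3 h time steps.

```python
STEPS_PER_DAY = 8  # 3-hourly data gives eight time steps per day


def daily_max_method_count(detections, steps_per_day=STEPS_PER_DAY):
    """Return, per day and per coastal grid point, the largest number of
    methods that detect an AR at any single time step of that day."""
    methods = list(detections.values())
    n_steps = len(methods[0])
    n_points = len(methods[0][0])

    # Number of methods flagging each grid point at each time step.
    counts = [
        [sum(1 for m in methods if m[t][p]) for p in range(n_points)]
        for t in range(n_steps)
    ]

    # Reduce each day's block of time steps to a per-point maximum.
    daily = []
    for start in range(0, n_steps, steps_per_day):
        day = counts[start:start + steps_per_day]
        daily.append([max(column) for column in zip(*day)])
    return daily


# Toy input: two methods, two grid points, and two "days" of two steps
# each (steps_per_day=2 keeps the example short; values are made up).
detections = {
    "A1": [[True, False], [True, True], [False, False], [False, True]],
    "A2": [[True, False], [False, True], [False, False], [False, False]],
}
daily = daily_max_method_count(detections, steps_per_day=2)
```

In the toy input both methods agree at some point on the first day, while on the second day only one method detects an AR at one grid point, which mirrors the contrast between high-agreement and low-agreement days.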

Intensity
Intensity can be defined in many ways but often refers to the amount of moisture present in an AR and/or the strength of the winds. IVT is an obvious quantity to use when evaluating the strength of an AR because it incorporates both wind and moisture. There is value, however, in looking at these quantities separately when trying to decompose dynamic and thermodynamic influences. For the 1-month test, we looked at IVT for time instances where ARs exist.
In Figs. 5 and 6, we show two different ways of looking at mean AR-IVT across applicable methods to highlight how the definition of intensity can also vary. Figure 5a and b show composites (for the North Pacific and European sectors, respectively) only at grid points where detection algorithms are implemented and include all time instances. This provides a look at the mean IVT for all ARs at all locations for all times. Not all algorithms search for AR conditions at all points. For example, A14 (Shields and Kiehl) only detects ARs that make landfall along coastal grid points, and A9 (Ramos et al.) detects ARs along reference meridians (for masks for regional algorithms, see Fig. S3 in the Supplement). Figure 6, by comparison, shows IVT composites for each grid point, focusing only on specific time periods where landfalling ARs exist. While Fig. 5 shows mean IVT for all ARs at detection points, Fig. 6 is the composite for landfalling ARs only. Both approaches show intensity, but each looks at a different quantity. The landfalling ARs have a different signature and a less intense distribution compared to the all-location AR composites. As one would expect, for both Figs. 5 and 6, methods with higher thresholds on IVT produce much higher AR average intensities; thus, AR intensity metrics could be thought of as self-selecting for some cases.
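The difference between the two composites can be made concrete with a small sketch. It is an illustration only: the function names, the list-of-lists layout (time steps by grid points), and the IVT values are assumptions, and the real composites are built from MERRA-2 fields rather than toy numbers.

```python
def detection_point_composite(ivt, ar_mask):
    """Fig. 5 style: mean IVT at each grid point over the time steps at
    which an AR is detected at that grid point (None if never detected)."""
    n_points = len(ivt[0])
    totals = [0.0] * n_points
    counts = [0] * n_points
    for field, mask in zip(ivt, ar_mask):
        for p in range(n_points):
            if mask[p]:
                totals[p] += field[p]
                counts[p] += 1
    return [totals[p] / counts[p] if counts[p] else None
            for p in range(n_points)]


def landfall_time_composite(ivt, landfall):
    """Fig. 6 style: mean IVT over the whole region, using only the time
    steps at which an AR is detected along the coastline."""
    selected = [field for field, hit in zip(ivt, landfall) if hit]
    if not selected:
        return None
    return [sum(column) / len(selected) for column in zip(*selected)]


# Toy input: three time steps, two grid points (made-up IVT values).
ivt = [[300.0, 100.0], [500.0, 200.0], [400.0, 600.0]]
ar_mask = [[True, False], [True, False], [False, True]]
landfall = [True, False, True]

by_point = detection_point_composite(ivt, ar_mask)
by_landfall = landfall_time_composite(ivt, landfall)
```

Because the landfall composite averages every grid point in the region, including points outside the AR itself, it tends to be weaker than the detection-point composite, which only samples points already flagged by the algorithm.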

Duration
The duration of ARs must also be defined. Typically, this is expressed as the length of time an AR affects a point location, for example, a coastal location for a landfalling storm. However, for tracking algorithms, duration may be defined as the life cycle of an AR. For the 1-month proof-of-concept test, we use the first definition and look at the duration at coastal locations along the North American west coast and specific European locations. Figure 7a shows a time series of daily IVT anomalies along the western coastlines of the (orange line) Iberian Peninsula, (teal line) United States, and (blue line) Ireland and the United Kingdom. Four human-observed AR tracks for events in each region are shaded, and the composite magnitudes of IVT for each are shown in Fig. 7b-e. These four events are compared over a variety of algorithms, indicated by algorithm ID in Fig. 7a, where each black dot indicates detection of an AR along the coastline. While all algorithms are listed, it is important to note that they are a mix of regional and global algorithms in scope. An example snapshot of IVT from a global view is shown in the Supplement (Fig. S4). The date 19 February 2017, at 21:00 Z, was chosen to illustrate individual ARs in the MERRA-2 dataset during the month examined here.
The four selected events in Fig. 7 demonstrate the large diversity of AR geometry, landfall location, and intensity that must be identified by each algorithm. The agreement between the different algorithms, hinted at in Fig. 4, is apparent in a comparison of the two west coast examples mentioned in Sect. 5.1 (Fig. 7c and e). The three versions of the Sellars et al. (2015) algorithm can be used as a benchmark of AR intensity, in which the IVT threshold increases from 300 kg m⁻¹ s⁻¹ (in A11) to 700 kg m⁻¹ s⁻¹ (in A13). Relatively strong events are well captured by most algorithms (Fig. 7b-d), with few exceptions that are likely related to domain size. Agreement between algorithms on the duration or presence of an AR during weaker events is much more variable, such as that seen in Fig. 7e.
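The point-location definition of duration, and its sensitivity to the IVT threshold, can be sketched as follows. This is a simplified illustration and not the Sellars et al. (2015) algorithm or any other ARTMIP method: the function name `event_durations`, the simple "IVT at or above a threshold" detection rule, and the IVT series are assumptions; only the 300 and 700 kg m⁻¹ s⁻¹ thresholds are taken from the text.

```python
def event_durations(flags, step_hours=3):
    """Return the duration in hours of each unbroken run of AR
    detections at a single location, given one flag per time step."""
    durations = []
    run = 0
    for detected in flags:
        if detected:
            run += 1
        elif run:
            durations.append(run * step_hours)
            run = 0
    if run:  # close a run that reaches the end of the series
        durations.append(run * step_hours)
    return durations


# Toy 3-hourly IVT series at one coastal point (made-up values).
ivt_series = [200, 350, 520, 710, 640, 330, 150, 310]

# Hypothetical detection rule: AR conditions when IVT >= threshold.
low = event_durations([v >= 300 for v in ivt_series])
high = event_durations([v >= 700 for v in ivt_series])
```

With the lower threshold the toy series yields a 15 h event followed by a separate 3 h event, while the higher threshold retains only a single 3 h period, which illustrates why agreement on duration is weakest for marginal events.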

Comparison with precipitation observational datasets
The importance of understanding and tracking ARs ultimately boils down to impacts. AR-related precipitation can be the cause of major flooding, can fill local reservoirs, and can relieve droughts. How much precipitation falls, the rate at which it falls, and when and where it falls, specifically during AR events, are metrics we must consider for this project. The variation among the different algorithms can be seen in a comparison of precipitation characteristics for the event shown in Fig. 7c using MERRA-2 precipitation data (Fig. 8). The inset shows the landfalling mask from Shields and Kiehl (2016), which is used as a common base of comparison for landfall between the different algorithms. Precipitation related to the landfalling AR is isolated by focusing only on grid boxes that are tagged by each algorithm. Comparison shows a positive relationship between the average spatial coverage of the detected landfalling plume (y axis) and the average maximum precipitation rate at each time slice (x axis). Generally, the durations of AR conditions along the coastline are longer for algorithms with broader coverage. The wide range of characteristics for this single well-defined event motivates further investigation. As a part of Tier 1, methods will be evaluated using a variety of precipitation products in addition to MERRA-2, selected as most relevant to the areas of interest. These include the Tropical Rainfall Measuring Mission (TRMM) Multisatellite Precipitation Analysis (TMPA) 3B42 product, version 7 (Huffman et al., 2007), the Global Precipitation Climatology Project (GPCP) dataset (Huffman et al., 2001), the Precipitation Estimation from Remotely Sensed Information Using Artificial Neural Networks (PERSIANN; Sorooshian et al., 2000), Livneh (Livneh et al., 2013), and E-OBS (Haylock et al., 2008). Tier 2 climate studies will use precipitation output, both convective and large-scale, from the CAM5 simulations. Finally, it is important to consider not only the uncertainties in attributing precipitation due to detection method but also the manner or technique used when assigning precipitation values to individual ARs.
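The two quantities compared in Fig. 8 can be sketched for a single algorithm as follows. This is one possible reading of the description, written as an illustration only: the function name `coverage_and_peak_rate`, the input layout (time slices by grid boxes inside the landfall mask), the choice to skip time slices with no tagged grid boxes, and the precipitation values are all assumptions.

```python
def coverage_and_peak_rate(tags, precip):
    """Return (average number of tagged grid boxes, average of the
    per-time-slice maximum precipitation rate over tagged boxes).

    Time slices with no tagged grid boxes are skipped (an assumption).
    Returns None if the algorithm tags nothing at all.
    """
    coverages = []
    peaks = []
    for tag_row, rate_row in zip(tags, precip):
        tagged = [rate for tag, rate in zip(tag_row, rate_row) if tag]
        if not tagged:
            continue
        coverages.append(len(tagged))
        peaks.append(max(tagged))
    if not coverages:
        return None
    return sum(coverages) / len(coverages), sum(peaks) / len(peaks)


# Toy input: three time slices, three grid boxes (made-up values).
tags = [
    [True, True, False],
    [True, False, False],
    [False, False, False],
]
precip = [
    [2.0, 4.0, 9.0],
    [1.0, 5.0, 0.0],
    [3.0, 3.0, 3.0],
]
result = coverage_and_peak_rate(tags, precip)
```

Note that the largest rates in the toy input fall in untagged grid boxes and are therefore excluded, which shows how the footprint chosen by a detection algorithm directly controls the precipitation attributed to the AR.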

Summary
ARTMIP is a community effort designed to diagnose the uncertainties surrounding atmospheric river science based on detection methodology alone. Understanding the uncertainties and, importantly, the implications of those uncertainties, is the primary motivation for ARTMIP, whose goals are to provide the community with a deeper understanding of AR tracking, mechanisms, and impacts for both the weather forecasting and climate community. There are many detection algorithms currently in the literature that are often fundamentally different. Some algorithms detect ARs based on a condition at a certain point in time and space, while others follow, or track, ARs as a whole object through space and time. Some algorithms use absolute thresholds to determine moisture intensity, while others use relative measures, such as statistical or anomaly-based approaches. The many degrees of freedom, in both detection parameter and choice of thresholds or geometry, add to the uncertainty of defining an AR, in particular for gridded datasets such as reanalysis products, or model output. This project aims to disentangle some of these problems by providing a framework to compare detection schemes. The project is divided into two tiers. The first tier is mandatory for all participants and will provide a baseline by applying all algorithms to a common dataset, the MERRA-2 reanalysis. The second tier is optional and will focus on sensitivity studies such as comparison amongst a variety of reanalysis products, and a comparison using climate model

Figure 1. Schematic diagram illustrating the diversity of AR detection algorithms found in current literature by categorizing the variety of parameters used for identification and tracking, and then listing different types of choices available per category.

Figure 2. Examples of different algorithm results. (a, b) The fraction of total cool-season precipitation attributable to ARs from Dettinger et al. (2011) and Rutz et al. (2014). (c) As in panels (a, b) but for annual precipitation from Guan and Waliser (2015). These studies use different AR identification methods, as well as different atmospheric reanalyses and observed precipitation datasets.

Figure 3. Human control vs. method counts (3 h instances) at the coastline for landfalling ARs by latitude for the month of February using MERRA-2 3-hourly data. West refers to North Pacific ARs making landfall along western North America, and east refers to North Atlantic ARs impacting European latitudes. Color lines represent detection algorithms and black lines represent the "human" control. The black solid line represents a static IVT 250 kg m⁻¹ s⁻¹ threshold, and the black dashed (and dotted) lines represent static 2 and 1.5 cm IWV thresholds, respectively. Algorithm identifiers (A1, A2, etc.) are specified in Table 1.

Figure 4. The number of methods that detect an AR at the coastline for sample days in February is plotted; plots are labeled with the date in YYYYMMDD format; i.e., 20170201 is 1 February 2017. Because each day had eight associated time steps, the maximum number of methods for each day is plotted. The polygons represent the number of methods; i.e., if only one method detected an AR at a specific grid point along the coast, then a light beige circle is plotted at that grid point along the coast; if 14 methods detected an AR at a specific grid point along the coast, then the darkest blue star is plotted at that grid point along the coast. Individual methods are not identified.

Figure 5. (a) Composite MERRA-2 IVT (kg m⁻¹ s⁻¹) for western North America for all AR occurrences for all grid points where ARs are detected. Algorithm IDs are found in Table 1. Algorithm A14 computes AR detection only for landfalling ARs at coastline grid points. The absence of color indicates no AR detection. (b) Same as panel (a) except for North Atlantic ARs. Algorithm A9 detects ARs at reference meridians. Note that the number of algorithms in this panel differs from panel (a) due to the regional constraint of the respective definitions.

Figure 6. (a) Composite MERRA-2 IVT (kg m⁻¹ s⁻¹) but for landfalling ARs only along the North American west coast. Time instances where an AR was detected along the coastline were composited for the entire region. Algorithm masks are not necessary. (b) Same as panel (a) except for European coastlines. Note that the number of algorithms in this panel differs from panel (a) due to the regional constraint of the respective definitions.

Figure 7. (a) Time series of daily IVT anomalies for (orange) Iberia, (teal) the US west coast, and (blue) Ireland and the UK. Four events of varying geometry and intensity are shaded in panel (a) and composites for each event are shown in panels (b)-(e). The black dots above the time series in panel (a) indicate time slices in which each event is detected by an algorithm.

Table 2. ARTMIP timeline. Completed targets are in bold.