The method ADAMONT v1.0 for statistical adjustment of climate projections applicable to energy balance land surface models

Abstract. We introduce the method ADAMONT v1.0 to adjust and disaggregate daily climate projections from a regional climate model (RCM) using an observational dataset at hourly time resolution. The method uses a refined quantile mapping approach for statistical adjustment and an analogous method for sub-daily disaggregation. The method ultimately produces adjusted hourly time series of temperature, precipitation, wind speed, humidity, and short- and longwave radiation, which can in turn be used to force any energy balance land surface model. While the method is generic and can be employed for any appropriate observation time series, here we focus on the description and evaluation of the method in the French mountainous regions. The observational dataset used here is the SAFRAN meteorological reanalysis, which covers the entire French Alps split into 23 massifs, within which meteorological conditions are provided for several 300 m elevation bands. In order to evaluate the skill of the method itself, it is applied to the ALADIN-Climate v5 RCM using the ERA-Interim reanalysis as boundary conditions, for the time period from 1980 to 2010. Results of the ADAMONT method are compared to the SAFRAN reanalysis itself. Various evaluation criteria are used for temperature and precipitation but also snow depth, which is computed by the SURFEX/ISBA-Crocus model using the meteorological driving data from either the adjusted RCM data or the SAFRAN reanalysis itself. The evaluation addresses in particular the time transferability of the method (using various learning/application time periods), the impact of the RCM grid point selection procedure for each massif/altitude band configuration, and the intervariable consistency of the adjusted meteorological data generated by the method. Results show that the performance of the method is satisfactory, with similar or even better evaluation metrics than alternative methods. However, results for air temperature are generally better than for precipitation. Results in terms of snow depth are satisfactory, which can be viewed as indicating a reasonably good intervariable consistency of the meteorological data produced by the method. In terms of temporal transferability (evaluated over time periods of 15 years only), results depend on the learning period. Regarding RCM grid point selection, a complex technique taking into account horizontal but also altitudinal proximity to SAFRAN massif centre point/altitude couples generally degrades evaluation metrics for high altitudes compared to a simpler grid point selection method based on horizontal distance.

Introduction
Projections of future climate change in terms of meteorological conditions and their impacts are requested for many scientific and societal applications (IPCC, 2013, 2014a). For a given socio-economic or greenhouse-gas concentration scenario, these projections generally concern future temperature and precipitation, and associated extreme events, and are usually generated using the outputs of global climate models (GCMs) and regional climate models (RCMs). However, GCMs and RCMs suffer from biases compared to local observations (Christensen et al., 2008; Rauscher et al., 2010; Kotlarski et al., 2014). Raw climate projections must therefore be adjusted (Déqué, 2007; Themeßl et al., 2011; Gobiet et al., 2015; Maraun, 2016) before they can be used as such (meteorological conditions) or in order to drive specific impact models. Various downscaling and adjustment methods have been developed (Maraun et al., 2010; Teutschbein and Seibert, 2012, 2013). They all require an observation dataset which (i) meets the data requirements of the application and (ii) is sufficiently long and reliable to be used to infer the relationships between the observations and the raw climate projections during the observation time period. Several approaches, such as the analog method, search for relationships between observed large-scale predictors (generally from reanalyses) and observed local-scale predictands (Vrac et al., 2007a; Dayon et al., 2015). In contrast, model output statistics approaches calibrate model outputs against observations, with various levels of complexity, such as scaling methods (linear scaling, local intensity scaling, variance scaling, etc.), delta-change methods (e.g. Abegg et al., 2007; Hantel and Hirtl-Wielke, 2007; Schmucki et al., 2014) and distribution mapping methods (e.g. Boe et al., 2007; Déqué, 2007; Gobiet et al., 2015; Olsson et al., 2015).
The latter include quantile mapping, which is considered as an efficient and easy to implement adjustment method (Themeßl et al., 2011;Teutschbein and Seibert, 2012;Maurer and Pierce, 2014;Gobiet et al., 2015). The main advantage of this method is that it adjusts deviations in the shape of the distribution, and is thus able to adjust deviations not only for the mean but also for the entire probability distribution function (PDF) (Themeßl et al., 2011). Moreover, the adjustment is not strictly restricted to the range of observed values in the reference period, which is the case for example for methods based on analog weather patterns (e.g. Déqué, 2007;Themeßl et al., 2011;Rousselot et al., 2012;Dayon et al., 2015), provided that values based on the lowermost and uppermost quantiles are handled appropriately (Gobiet et al., 2015). It can thus be used for evaluation of climate extremes or projections at the end of the 21st century, as long as the probability associated with these events is robustly estimated from a long enough sample. The main limits of quantile mapping are the assumption of time-invariant model deviation to observations on which it is based and the fact that the temporal properties of the model are not adjusted. If the model has a chronological behaviour which differs from the observations (too chaotic or too persistent), this will not be adjusted (Déqué, 2007). Moreover, quantile mapping does not guarantee the spatial and intervariable consistency, in contrast to e.g. the analog method. Furthermore, the performance level of quantile mapping methods is sensitive to the observation dataset used and the detailed characteristics of their implementation, which requires specific attention.
Climate projections in mountainous regions, which are motivated by a broad range of geophysical, environmental and societally relevant scientific challenges (Martin et al., 1994; Beniston, 1997; Jomelli et al., 2009; Castebrunet et al., 2014; Piazza et al., 2014; Schmucki et al., 2014; Lafaysse et al., 2014; Boulangeat et al., 2014; Thuiller et al., 2014; Francois et al., 2015; Spandre et al., 2016), are particularly sensitive to the quality of the adjustment method. Indeed, typical RCM resolutions of 10 to 50 km are not sufficient to capture the fine-scale processes and thresholds at play. Resolving altitude dependencies is critical, especially for snow-related issues (because of the temperature dependency of the snow-rain transition). Furthermore, the snowpack responds not only to temperature and precipitation but also to a broader range of meteorological conditions and their diurnal variations. As a consequence, considering only adjusted daily temperature and precipitation would miss some of the non-linear responses of the snowpack. Such phenomena cannot be addressed using delta-change methods, which by definition apply fixed changes to an observed time series, conserving its statistical persistence properties and seasonality (e.g. Abegg et al., 2007; Hantel and Hirtl-Wielke, 2007; Schmucki et al., 2014; Marty et al., 2017), although those could evolve significantly under changed climate conditions.
Here we introduce the ADAMONT v1.0 method to adjust climate model projections in order to provide hourly adjusted meteorological conditions for past and future conditions based on climate model output and observational datasets. Although it could be applied to GCM output, it was primarily designed to process RCM output. Indeed, raw regional climate projection data are increasingly made available, e.g. through the World Climate Research Program (WCRP) Coordinated Regional Downscaling Experiment (CORDEX; Giorgi et al., 2009), whose aim is to improve and distribute regional climate modelling worldwide. Its European branch, EURO-CORDEX (Jacob et al., 2014), gathers regional climate simulations over Europe from 30 different modelling groups at 50 km (EUR-44) and 12.5 km (EUR-11) resolutions. On the observation side, the use of surface meteorological reanalysis is a powerful alternative to station observation data to provide the necessary observational dataset (Berg et al., 2015). Indeed, the process by which such reanalyses are generated addresses the time and space variations in the meteorological conditions, and by design they consist of gap-free and complete time series. Here we describe the use of the ADAMONT method based on RCM output comparable to EURO-CORDEX and on the mountain meteorological reanalysis SAFRAN. SAFRAN was developed specifically to address the needs of snowpack numerical simulations in mountainous regions, and contains hourly time series of temperature, precipitation, wind speed, humidity, and short- and longwave radiation for so-called massifs (ranging between 500 and 2000 km² in the French Alps) by elevation steps of 300 m (Durand et al., 2009a, b). Here, quantile mapping is applied using daily outputs from a given RCM for all the variables provided in the SAFRAN reanalysis.
Following a subdaily disaggregation step based on analog days selection from the reanalysis itself, these hourly-adjusted fields are then used to force the SURFEX/ISBA-Crocus (Vionnet et al., 2012) model over the French Alps. We evaluate the performance of the ADAMONT method by applying it to the ALADIN-Climate v5 RCM (Colin et al., 2010) forced by the ERA-Interim reanalysis (Dee et al., 2011) over the period 1980-2010. Section 2 describes the models used and the evaluation approach. Sections 3 and 4 contain the results and their discussions, respectively, and general conclusions are drawn in Sect. 5.

2 Models and methods

2.1 Description of the ADAMONT method
ADAMONT is primarily a quantile mapping adjustment method (Déqué, 2007; Gobiet et al., 2015). In general, quantile mapping is considered to be one of the most efficient bias adjustment methods available (Themeßl et al., 2011; Maurer and Pierce, 2014; Gobiet et al., 2015). It consists of adjusting the quantiles of the simulated historical distribution based on the quantiles of the observed distribution. The main issues with quantile mapping relate to the assumption of time-invariant model biases, the fact that temporal properties of the RCM are untouched by the adjustment method and that the spatial and intervariable consistency is not guaranteed. Moreover, Driouech et al. (2009) showed that for mid-latitude climates, such as in Morocco, quantile mapping adjustment can vary for different weather regimes, because model biases vary in different regimes. Similarly, Addor et al. (2016) demonstrated the sensitivity of quantile mapping adjustment to circulation biases over the alpine domain. Additionally, the frequency of weather regimes may change in a changing climate (Boe et al., 2006; Cattiaux et al., 2013). To improve the stationarity of our method in a changing climate, weather regimes are thus taken into account, i.e. quantile adjustment functions are computed and applied depending on the weather regime.
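The quantile mapping principle described above can be illustrated with a minimal Python sketch. It is not the ADAMONT implementation: the function names, the choice of 99 percentiles and the constant-shift treatment of values outside the calibration range are assumptions made for illustration only. In a regime-dependent setting, the same two functions would simply be fitted and applied separately on each season/weather-regime subset.

```python
import numpy as np

def fit_quantile_map(obs, sim, n_quantiles=99):
    """Return matching empirical quantiles (sim_q, obs_q) over the learning period."""
    probs = np.linspace(0.01, 0.99, n_quantiles)  # assumed choice: percentiles 1..99
    return np.quantile(sim, probs), np.quantile(obs, probs)

def apply_quantile_map(values, sim_q, obs_q):
    """Adjust simulated values using the simulated-to-observed quantile relation.

    Inside the calibrated range the correction is linearly interpolated
    between quantiles. Outside it, the correction of the nearest extreme
    quantile is applied as a constant shift (an assumption of this sketch).
    Suited to continuous variables; tied quantiles (e.g. many dry days)
    would need dedicated handling.
    """
    values = np.atleast_1d(np.asarray(values, dtype=float))
    adjusted = np.interp(values, sim_q, obs_q)
    low = values < sim_q[0]
    high = values > sim_q[-1]
    adjusted[low] = values[low] + (obs_q[0] - sim_q[0])
    adjusted[high] = values[high] + (obs_q[-1] - sim_q[-1])
    return adjusted

# Synthetic example: a simulated temperature series that is too warm and too variable.
rng = np.random.default_rng(0)
obs = rng.normal(5.0, 2.0, 5000)
sim = rng.normal(8.0, 3.0, 5000)
sim_q, obs_q = fit_quantile_map(obs, sim)
adjusted = apply_quantile_map(sim, sim_q, obs_q)
```

After adjustment, the mean and spread of `adjusted` should be close to those of `obs`, while the day-to-day chronology of `sim` is left unchanged, which mirrors the limitation noted in the text.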
Assuming the availability of a gap-free meteorological observational dataset at hourly time resolution, consisting of one or several geographical locations considered to share similar large-scale meteorological conditions, and daily RCM outputs covering the geographical domain of interest, the statistical adjustment method ADAMONT consists of the following steps:
1. RCM grid point selection: for each observation point, an RCM grid point is selected by minimising the following distance:
$$ d = \sqrt{x^{2} + y^{2} + (N z)^{2}} \qquad (1) $$
where x, y and z represent the longitudinal, latitudinal and vertical distances (in km) between the observation point and the RCM grid points, and N is referred to as the elevation factor. Values of 0, 50 and 100 were tested, but only 0 (N0) and 50 (N50) are reported in this study. The factor N is a scaling factor between horizontal and vertical distances, allowing us to take into account the strong dependence of meteorological variables (mainly precipitation and temperature) on altitude (e.g. Gottardi et al., 2012; Kotlarski et al., 2012).
2. Weather regime computation: each day of the RCM and observational records is assigned to one of several daily weather regimes based on the geopotential height at 500 hPa, following Michelangeli et al. (1995), similar to the method described in Driouech et al. (2010). Weather regime clusters were previously computed on the basis of the large-scale meteorological reanalysis ERA-40 (Uppala et al., 2005). The ERA-Interim reanalysis (Dee and Uppala, 2009) was used to infer weather regimes corresponding to each observation date and for all observation points. RCM weather regimes were determined based on the synoptic field of the GCM used as a boundary condition for the RCM. In Michelangeli et al. (1995) and Driouech et al. (2010), only regimes for the winter season are defined. We chose to apply the same method to determine weather regimes for the other seasons as well. A classification and reproducibility analysis performed by Michelangeli et al. (1995) showed that four weather regimes can reasonably be chosen for Europe. This number is a compromise between accuracy of the correction and robustness of the percentile estimation (more regimes can be used, such as in Ummenhofer et al., 2017): a relatively small number of regimes ensures a sufficiently large size of the datasets used for quantile mapping (which are, as described below, further partitioned into four seasons: DJF, MAM, JJA and SON). Figure 1 represents the different regimes used in this study.
3. Aggregation from hourly to daily observations: the observational data are aggregated from hourly to daily time resolution, depending on the variable considered (see Table 1). For temperature, the daily minimum and maximum values (from 06:00 to 06:00 UTC the next day) are selected (RCMs generally provide daily minimum and maximum temperature); for wind speed and humidity, the last value of each day (at 06:00 UTC) is selected (in order to be comparable to an instantaneous value); and for precipitation and radiation, the daily mean (06:00 to 06:00 UTC) is used.
For precipitation, it can happen that for low quantiles, the probability of precipitation is lower in the RCM than in the observation dataset (i.e. several null values in the RCM, which can correspond to different positive values in the observational data). In this case, a random draw is performed amongst the observation values within the same quantile.
6. Selection of analogue date for sub-daily disaggregation: for each day in the RCM dataset, an analogous date is chosen in the observational dataset, matching the following criteria: the month and the weather regime must be the same as in the RCM dataset, and whenever possible, consecutive time slices are chosen in the observational dataset in order to avoid artificial jumps in the final data linked to the choice of analogues. A further criterion is applied to ensure that the weather situations are even more comparable between the RCM date and the analogous date from the observational record, based on precipitation consistency (wet vs. dry conditions). A threshold of 1 kg m −2 day −1 on total precipitation is applied to partition dates between dry and wet conditions. For the first RCM date, a random draw amongst all available observational dates is performed, then the dates are browsed through chronologically until one meets all the requirements outlined above. This analogous day is then used in the following step for all variables. If the following analogue day in the observations still meets all requirements, it is selected as analogue for the following day in the RCM (to ensure as far as possible consecutive time slices). A new random draw is only performed once the analogue fails to meet all requirements described above.
7. Sub-daily disaggregation: the adjusted RCM dataset is disaggregated from a daily integration period into an hourly time step by using the hourly observational data from each analogous date chosen in the previous step to reconstruct the daily cycle of the data:
$$ X^{h}_{\mathrm{RCM}}(i) = a\,X^{h}_{\mathrm{OBS}} + b \qquad (2) $$
where $X^{h}_{\mathrm{RCM}}(i)$ is the hourly adjusted RCM value of the variable X and $X^{h}_{\mathrm{OBS}}$ is the hourly observational value of the same variable from the chosen analogous date (step 6). Different criteria are chosen to calculate a and b, depending on the variable considered (Table 1). For the disaggregation of RCM-adjusted temperature from daily to hourly (Table 1), a compromise must be made between obtaining minimum and maximum daily values as close as possible to RCM-adjusted daily minimum and maximum and minimising the possible jump in adjusted values between consecutive days. This is achieved by minimising the following function:
$$ Q = \left[T^{h}_{\mathrm{RCM}}(1\,\mathrm{h}, i) - T^{h}_{\mathrm{RCM}}(24\,\mathrm{h}, i-1)\right]^{2} + \alpha\left\{\left[T^{h}_{\min,\mathrm{RCM}}(i) - T^{d,\mathrm{adj}}_{\min,\mathrm{RCM}}(i)\right]^{2} + \left[T^{h}_{\max,\mathrm{RCM}}(i) - T^{d,\mathrm{adj}}_{\max,\mathrm{RCM}}(i)\right]^{2}\right\} \qquad (3) $$
where $T^{h}_{\mathrm{RCM}}(1\,\mathrm{h}, i)$ and $T^{h}_{\mathrm{RCM}}(24\,\mathrm{h}, i-1)$ are the hourly adjusted RCM temperature values at the first time step of day i and at the last time step of day i − 1, $T^{h}_{\min,\mathrm{RCM}}(i)$ and $T^{h}_{\max,\mathrm{RCM}}(i)$ are the hourly minimum and maximum adjusted RCM temperature values, respectively, and $T^{d,\mathrm{adj}}_{\min,\mathrm{RCM}}(i)$ and $T^{d,\mathrm{adj}}_{\max,\mathrm{RCM}}(i)$ are the daily minimum and maximum adjusted RCM temperature values, respectively (Fig. 2). α is a parameter which can be tuned to balance the importance of the minimisation of differences between daily and hourly RCM minima and maxima and the minimisation of the jump between two consecutive days. For a value of 0 for α, there would be no jump in values between consecutive days, but the values of $T^{h}_{\min,\mathrm{RCM}}(i)$ and $T^{h}_{\max,\mathrm{RCM}}(i)$ could be far from the values of $T^{d,\mathrm{adj}}_{\min,\mathrm{RCM}}(i)$ and $T^{d,\mathrm{adj}}_{\max,\mathrm{RCM}}(i)$. For an infinitely large value of α, the minimum and maximum hourly and daily values would match, but the jump between consecutive days could be significant. Sensitivity tests yielded an optimal value of 2 for α. Following Eq. (2), Eq. (3) transforms into
$$ Q = \left[a\,T^{h}_{\mathrm{OBS}}(1\,\mathrm{h}, i) + b - T^{h}_{\mathrm{RCM}}(24\,\mathrm{h}, i-1)\right]^{2} + \alpha\left\{\left[a\,T^{h}_{\min,\mathrm{OBS}}(i) + b - T^{d,\mathrm{adj}}_{\min,\mathrm{RCM}}(i)\right]^{2} + \left[a\,T^{h}_{\max,\mathrm{OBS}}(i) + b - T^{d,\mathrm{adj}}_{\max,\mathrm{RCM}}(i)\right]^{2}\right\} \qquad (4) $$
By searching for the minimum of Q, i.e. ∂Q/∂a = 0 and ∂Q/∂b = 0, a and b can be determined, and the hourly adjusted RCM temperature can be obtained following Eq. (2). For specific cases, i.e. for the first day, where $T^{h}_{\mathrm{RCM}}(24\,\mathrm{h}, i-1)$ does not exist, if the determinant of the system is too close to zero (< 0.1), or if a < 0, a simpler system is used in which we only ensure that final minimum and maximum daily values correspond to the RCM-adjusted minimum and maximum values by solving
$$ \begin{cases} a\,T^{h}_{\min,\mathrm{OBS}}(i) + b = T^{d,\mathrm{adj}}_{\min,\mathrm{RCM}}(i) \\ a\,T^{h}_{\max,\mathrm{OBS}}(i) + b = T^{d,\mathrm{adj}}_{\max,\mathrm{RCM}}(i) \end{cases} \qquad (5) $$
This procedure is only applied for temperature because the use of the maximum and minimum criterion can lead to large jumps between consecutive days, which is not the case for other variables (Table 1). For humidity, Eq.
(2) is solved using b = 0 and a = X d, adj , so that the hourly adjusted RCM value and the hourly observational value at the last time step of day i (X h OBS (24 h, i)) are equal. For wind speed, the same calculation as for humidity is applied, except if a > 1 (i.e. X d, adj RCM (i)/X h OBS (mean, i), so that the mean hourly adjusted RCM value and the mean hourly observation value of day i are equal. For solar radiation, if X h OBS (mean, i) ≤ 10⁻¹⁰, a = 0. For precipitation, if this is the case, a = 1.
8. Snow/rain partitioning: total precipitation is separated into rainfall and snowfall based on hourly adjusted temperature (a threshold of 1 °C is used for the transition from snow to rain). As mentioned above, intervariable consistency is not guaranteed by quantile mapping. Because precipitation and temperature are corrected independently of each other (step 5), and because the adjustment can differ between precipitation phases, quantile mapping may modify the relationship between temperature and precipitation phase, so that the adjusted rain and snow distributions lose consistency. This matters in many applications, in particular in mountainous areas. To avoid this, Olsson et al. (2015) separated temperature data into wet and dry days before adjustment. In our case an additional quantile mapping against the observational dataset is applied for daily cumulated adjusted RCM rainfall and snowfall separately. Hourly adjusted RCM rainfall and snowfall ($a_2$) are then determined by applying the ratio between daily rainfall or snowfall after quantile mapping ($A_2$) and daily rainfall or snowfall before quantile mapping ($A_1$) to the hourly rainfall or snowfall before quantile mapping ($a_1$):
$$ a_2 = a_1\,\frac{A_2}{A_1} \qquad (6) $$
If $A_1 = 0$ and $A_2 = 0$, then $a_2 = 0$. If $A_1 = 0$ and $A_2 \neq 0$, then $a_2 = A_2$.
9. Final adjusted dataset: the resulting adjusted hourly time series for each variable are obtained for each snow year (from 1 August to 31 July of the following year), matching the format of the observational dataset.
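To make the temperature disaggregation of step 7 more concrete, the following Python sketch solves for a and b under stated assumptions. It assumes that the cost function is the sum of the squared jump between consecutive days and α times the squared mismatches on the daily minimum and maximum, as described in the text, which makes the problem a weighted least-squares fit through three points. The function name `temperature_scaling` and its arguments are hypothetical; the value α = 2 and the determinant threshold of 0.1 are those quoted above.

```python
import numpy as np

ALPHA = 2.0    # value of alpha reported as optimal in the text
DET_MIN = 0.1  # determinant threshold quoted in the text

def temperature_scaling(t_obs_hourly, tmin_adj, tmax_adj, t_prev_last=None, alpha=ALPHA):
    """Return (a, b) such that a * t_obs_hourly + b is the hourly adjusted series.

    t_obs_hourly : hourly temperatures of the analogue day (assumed non-constant)
    tmin_adj, tmax_adj : daily adjusted RCM minimum and maximum for the day
    t_prev_last : last hourly adjusted value of the previous day, or None on the first day
    """
    t_obs = np.asarray(t_obs_hourly, dtype=float)
    tmin_obs, tmax_obs = t_obs.min(), t_obs.max()

    def match_extremes():
        # Fallback: reproduce the adjusted daily minimum and maximum exactly.
        a = (tmax_adj - tmin_adj) / (tmax_obs - tmin_obs)
        return a, tmin_adj - a * tmin_obs

    if t_prev_last is None:
        return match_extremes()

    # Weighted least squares on three (x, y) pairs: continuity with the
    # previous day (weight 1), daily minimum and daily maximum (weight alpha).
    x = np.array([t_obs[0], tmin_obs, tmax_obs])
    y = np.array([t_prev_last, tmin_adj, tmax_adj])
    w = np.array([1.0, alpha, alpha])
    matrix = np.array([[np.sum(w * x * x), np.sum(w * x)],
                       [np.sum(w * x), np.sum(w)]])
    rhs = np.array([np.sum(w * x * y), np.sum(w * y)])

    if abs(np.linalg.det(matrix)) < DET_MIN:
        return match_extremes()
    a, b = np.linalg.solve(matrix, rhs)
    if a < 0:
        return match_extremes()
    return a, b
```

Setting the two partial derivatives of the cost function to zero gives the 2 × 2 linear system solved here. The fallback reproduces the simpler case in which only the daily extremes are matched.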

SAFRAN reanalysis and application of ADAMONT method using SAFRAN
Although the ADAMONT method is highly generic and can be applied using any hourly-resolution observational dataset, in the following we focus on the use of ADAMONT using the SAFRAN reanalysis data as an observational dataset. We first describe SAFRAN, then we present specific features of the ADAMONT method when using SAFRAN as the observational dataset.
The SAFRAN system is a regional-scale meteorological downscaling and surface analysis system (Durand et al., 1993), which provides hourly data of temperature, precipitation amount and phase, specific humidity, wind speed, and shortwave and longwave radiation for each mountain region (or "massif") in the French Alps (23 massifs, as illustrated in Fig. 3) but also in the French and Spanish Pyrenees and Corsica. Unlike traditional reanalyses, SAFRAN does not operate on a grid but on French mountain regions subdivided into different polygons known as massifs. Massifs (Durand et al., 1993, 1999) correspond to regions ranging approximately between 500 and 2000 km² for which meteorological conditions are assumed to be spatially homogeneous but vary with altitude. SAFRAN data are available for elevation bands with a resolution of 300 m, i.e. altitude levels 600, 900, 1200, 1500 m etc. are typically considered, making it possible to extract meteorological information at these altitude levels, or in between using altitude interpolation. It was used by Durand et al. (2009b) to create a meteorological reanalysis over the French Alps by combining the ERA-40 reanalysis (Uppala et al., 2005) with various meteorological observations including in situ mountain stations, radiosondes and satellite data. It was complemented after the end of the ERA-40 reanalysis (2002) by large-scale meteorological fields from the ARPEGE analysis so that it now spans the period from 1959 to 2016, making it one of the longest meteorological reanalyses available in the French mountain regions.
When the ADAMONT method is applied using the SAFRAN reanalysis, only one geographic coordinate is used for each massif, corresponding to the centre of the massif (see Fig. 3). However, for each massif several altitude levels are considered, which means that, depending on the N factor considered, different RCM grid points may be selected for a given massif and altitude. Also, in order to maximise the consistency between massifs after the adjustment process, the dry/wet analogue day criterion used for the time disaggregation of RCM-adjusted variables into hourly variables is computed over the entire SAFRAN dataset as a whole, here the 23 massifs of the French Alps. This means that a day is considered dry when the average of all daily precipitation data is below 1 kg m⁻² day⁻¹ and wet when it is above this threshold, the average being taken over all massifs and all altitude levels (from an observational perspective), and over all corresponding adjusted RCM grid points (from an adjusted RCM perspective).
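The effect of the N factor on grid point selection for a given massif and altitude can be illustrated with a short Python sketch. It assumes that the distance being minimised takes the form sqrt(x² + y² + (N z)²), which is one natural reading of N as a scaling factor between horizontal and vertical distances, and that all coordinates are expressed in km in a projected system. The function name and the three grid points are hypothetical.

```python
import numpy as np

def select_grid_point(obs_xyz, grid_xyz, n_factor=0.0):
    """Return the index of the grid point closest to the observation point.

    obs_xyz  : (x, y, z) of the massif centre at a given altitude, in km
    grid_xyz : array of shape (n_points, 3) with RCM grid point coordinates, in km
    n_factor : elevation factor N (0 ignores altitude, 50 weights 1 km of
               altitude difference like 50 km of horizontal distance)
    """
    grid = np.asarray(grid_xyz, dtype=float)
    dx, dy, dz = (grid - np.asarray(obs_xyz, dtype=float)).T
    distance = np.sqrt(dx**2 + dy**2 + (n_factor * dz)**2)
    return int(np.argmin(distance))

# Hypothetical massif centre at 2400 m and three candidate RCM grid points.
massif_point = (0.0, 0.0, 2.4)
rcm_points = [(5.0, 0.0, 1.0),   # horizontally closest, but 1400 m too low
              (20.0, 0.0, 2.4),  # farther away, same altitude
              (40.0, 0.0, 2.1)]  # farthest, 300 m too low
index_n0 = select_grid_point(massif_point, rcm_points, n_factor=0.0)
index_n50 = select_grid_point(massif_point, rcm_points, n_factor=50.0)
```

With N = 0 the horizontally nearest point is retained, whereas with N = 50 the point at matching altitude is preferred even though it lies farther away, which is why different grid points can be selected for the different altitude levels of a single massif.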

SURFEX/ISBA-Crocus model
Crocus (Brun et al., 1989, 1992; Vionnet et al., 2012) is a detailed snowpack model within the SURFEX externalised surface module (Masson et al., 2013). It enables the computation of the exchanges of energy and mass between the snow surface and the atmosphere (radiative balance, turbulent heat and moisture fluxes, etc.), but also between the snowpack and the ground underneath. Similarly to most land surface models, it requires sub-diurnal (ideally hourly) meteorological forcing data including air temperature, humidity, incoming longwave and shortwave radiation, wind speed, and rain and snow precipitation. The one-dimensional multilayer physical snow scheme Crocus is able to simulate the evolution of the snowpack over time by accounting for several processes occurring in the snowpack, such as thermal diffusion, phase changes, metamorphism, etc.
Figure 2. Timeline of the different parameters taken into account in the disaggregation of RCM temperature from a daily integration period into an hourly time step. $T^{h}_{\mathrm{RCM}}(1\,\mathrm{h}, i)$ and $T^{h}_{\mathrm{RCM}}(24\,\mathrm{h}, i-1)$ are the hourly adjusted RCM temperature values at the first time step of day i and at the last time step of the day before (i − 1), $T^{h}_{\min,\mathrm{RCM}}(i)$ and $T^{h}_{\max,\mathrm{RCM}}(i)$ are the hourly minimum and maximum adjusted RCM temperature values, respectively, and $T^{d,\mathrm{adj}}_{\min,\mathrm{RCM}}(i)$ and $T^{d,\mathrm{adj}}_{\max,\mathrm{RCM}}(i)$ are the daily minimum and maximum adjusted RCM temperature values, respectively. α is a parameter which can be tuned to give more importance to the minimisation of differences between daily and hourly RCM minima and maxima. Hourly adjusted RCM temperature time series for values of 0, 2 and infinity for α are shown. $T^{h}_{\mathrm{OBS}}$ corresponds to the hourly series of the chosen daily analogue, and $T^{d,\mathrm{raw}}_{\min,\mathrm{RCM}}(i)$ and $T^{d,\mathrm{raw}}_{\max,\mathrm{RCM}}(i)$ are the daily raw minimum and maximum RCM temperature values (before adjustment).
... Table 3. Projection is in Lambert II étendu (L2E).
The SAFRAN-Crocus model chain has been operationally used for more than 20 years for avalanche hazard forecasting and extensively evaluated over the alpine domain, in particular with snow depth observation stations (Durand et al., 1999, 2009b; Lafaysse et al., 2013). Here we apply the Crocus model using either the SAFRAN reanalysis itself or adjusted fields from a given RCM using the ADAMONT method, in order to compute and compare snow conditions using either driving data. This is both a proof-of-concept of the applicability of the ADAMONT method to generate data appropriate to driving land surface models and a means to assess the intervariable consistency of the ADAMONT output, given that Crocus is simultaneously sensitive to all meteorological fields and potentially disturbed by inconsistencies in the forcing dataset.

ADAMONT method evaluation
To evaluate the ADAMONT method, it was applied to the Météo France ALADIN RCM forced by ERA-Interim over the time period from 1980 to 2010. This RCM was run at 12.5 km resolution and we use the daily time resolution output, which is consistent with typical output from EURO-CORDEX RCMs. This simulation was then adjusted against the SAFRAN reanalysis. The spatial domain (2200 km × 2200 km, centred on France; see Fig. 3) is deliberately smaller than EURO-CORDEX (5000 km × 5000 km domain covering all of Europe; Fig. 3) although both are on the same order of magnitude in order to place more emphasis on the method skills than on the output of the RCM itself, especially in terms of chronology. Indeed, the smaller the domain, the more it is constrained by its driving large-scale model (be it a GCM or a reanalysis) (Alexandru et al., 2007).
Performance indicators described below were computed for temperature and total precipitation but also for the snow depth, which integrates all the meteorological variables considered in the ADAMONT method. Focus was hereby placed on evaluating the ability of the method to correctly represent integrated outputs computed using SURFEX/ISBA-Crocus from meteorological variables adjusted independently of each other. This is often applied to river discharge for downscaling methods used for hydrological applications (e.g. Lafaysse et al., 2014;Olsson et al., 2015).
The method was applied for all 23 massifs of the French Alps and all elevation bands (Fig. 3), totalling 187 massif/altitude configurations. Performance indicators, described below, were either computed spanning all configurations or focusing on a given altitude level (1200 and 2100 m) and/or a subset of massifs (the Vercors massif was taken as an example, and computations were also performed separately for the northern and southern Alps, respectively).
We specifically tested the following aspects of the method:
-RCM grid point neighbour selection techniques (N = 0 or N = 50).
-Learning period: split-sample evaluation was performed using three different learning and application periods (1980-1995, 1995-2010 and 1980-2010) by evaluating the results on an evaluation period different from the learning period (1995-2010 for simulations with the learning period 1980-1995 and vice versa). These two sub-periods correspond to markedly different climate conditions in the French Alps (Reid et al., 2015). For simulations using the entire learning period 1980-2010, the evaluation period was 1980-2010. This case with a 30-year learning period corresponds to the typical duration of the learning period when the method is applied for climate projections.
-Rain/snow quantile mapping: the method was applied with (default case) or without ("no corr") the last adjustment step operating on the rainfall and snowfall separately.
-Raw RCM data: raw RCM simulations, without any adjustment, were considered for some of the variables (temperature and precipitation only) and compared to adjusted results. This comparison cannot be made for snow depth because daily-resolved RCM output cannot be employed to run Crocus.
-The impact of using 6 h input RCM data instead of daily data was also tested but yielded similar results (not shown). Only results based on daily RCM input are presented because GCM/RCM outputs are often available at this time step on data distribution platforms such as the one of EURO-CORDEX.
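The split-sample design listed above can be sketched as follows. This is a minimal illustration, not the ADAMONT code: the names `EVALUATION_FOR`, `in_period` and `split_sample` are hypothetical, and the handling of the shared boundary year 1995 (start year included, end year excluded) is an assumption, since the text does not specify it.

```python
from datetime import date

# Learning period -> evaluation period (years), as listed in the text.
EVALUATION_FOR = {
    (1980, 1995): (1995, 2010),
    (1995, 2010): (1980, 1995),
    (1980, 2010): (1980, 2010),
}


def in_period(day, period):
    """True if `day` falls in `period`.

    Assumption: the start year is included and the end year excluded.
    """
    start, end = period
    return start <= day.year < end


def split_sample(dates, values, learning):
    """Return (learning sample, evaluation sample) for one learning period."""
    evaluation = EVALUATION_FOR[learning]
    learn = [v for d, v in zip(dates, values) if in_period(d, learning)]
    evaluate = [v for d, v in zip(dates, values) if in_period(d, evaluation)]
    return learn, evaluate


# Usage with one dummy value per year.
dates = [date(year, 7, 1) for year in range(1980, 2010)]
values = list(range(30))
learn, evaluate = split_sample(dates, values, (1980, 1995))
```

With the two 15-year sub-periods the learning and evaluation samples are disjoint, whereas with the full 1980-2010 period they coincide, which is why scores for that configuration are not an independent test of transferability.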
The following indicators were analysed for temperature, total precipitation and snow depth:
-the seasonal average time series from 1980 to 2010 in the SAFRAN and the adjusted RCM datasets;
-the mean annual cycle over two distinct periods (1980-1995 and 1995-2010) in the SAFRAN and the adjusted RCM datasets;
-the mean value for each elevation band over the evaluation period in the SAFRAN and the adjusted RCM datasets;
-the correlation and the ratio of standard deviations between time series of the SAFRAN and the adjusted RCM datasets for each variable and as a function of the integration window (from 1 day to several years) over the evaluation period;
-the cumulated PDF of daily variables over the evaluation period in the SAFRAN and the adjusted RCM datasets;
-the root mean square error (RMSE) and the mean bias over the evaluation period, computed over seasonal integration periods based on the SAFRAN and the adjusted RCM datasets (to evaluate the method performance in terms of reproducing amounts);
-scores specific to the detection of occurrence of precipitation events in the SAFRAN and the adjusted RCM datasets over the evaluation period: the probability of detection ($\mathrm{POD} = n_{hh}/(n_{hh} + n_{hd})$), the false alarm rate ($\mathrm{FAR} = n_{dh}/(n_{dh} + n_{hh})$), the probability of false detection ($\mathrm{POFD} = n_{dh}/(n_{dh} + n_{dd})$) and the true skill score ($\mathrm{TSS} = \mathrm{POD} - \mathrm{FAR}$), where $n_{hh}$ is the number of days which are wet in SAFRAN and wet in the adjusted RCM, $n_{dd}$ the number of days which are dry in the reanalysis and dry in the adjusted RCM, $n_{hd}$ the number of days which are wet in the reanalysis but dry in the adjusted RCM and $n_{dh}$ the number of days which are dry in the reanalysis but wet in the adjusted RCM (a threshold of 1 kg m$^{-2}$ d$^{-1}$ was considered for the occurrence of precipitation);
-scores for the duration and persistence of precipitation events over the evaluation period (Wilby et al., 1998; Boe et al., 2006): the relative error on the probability of a dry day ($\mathrm{EPD} = (n^{R}_{d} - n^{S}_{d})/n^{S}_{d}$), the relative error on the probability of a dry day following a dry day ($\mathrm{EPDD} = (n^{R}_{d-d}/n^{R}_{d} - n^{S}_{d-d}/n^{S}_{d})/(n^{S}_{d-d}/n^{S}_{d})$), the relative error on the probability of a wet day following a wet day ($\mathrm{EPHH} = (n^{R}_{h-h}/(n - n^{R}_{d}) - n^{S}_{h-h}/(n - n^{S}_{d}))/(n^{S}_{h-h}/(n - n^{S}_{d}))$) and the relative error on the mean duration of wet periods ($\mathrm{EHD} = (\mathrm{hdur}^{R} - \mathrm{hdur}^{S})/\mathrm{hdur}^{S}$), where $n^{R}_{d}$ and $n^{S}_{d}$ are the number of dry days in the adjusted RCM and in SAFRAN, respectively, $n^{R}_{d-d}$ and $n^{S}_{d-d}$ the number of dry days following a dry day in the adjusted RCM and in SAFRAN, respectively, $n^{R}_{h-h}$ and $n^{S}_{h-h}$ the number of wet days following a wet day in the adjusted RCM and in SAFRAN, respectively, $n$ is the total number of days, and $\mathrm{hdur}^{R}$ and $\mathrm{hdur}^{S}$ are the mean duration of wet periods in the adjusted RCM and in SAFRAN, respectively. A threshold of 1 kg m$^{-2}$ d$^{-1}$ was considered for the occurrence of precipitation.
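As an illustration of the duration and persistence scores defined above, the following minimal sketch computes EPD, EPDD, EPHH and EHD from two daily precipitation series. The function names are hypothetical, a day is assumed to be wet when precipitation is greater than or equal to the threshold (the text does not say whether the bound is inclusive), and both series are assumed to contain at least one dry day, one dry-to-dry transition, one wet day and one wet-to-wet transition in SAFRAN so that no denominator vanishes.

```python
WET_THRESHOLD = 1.0  # kg m-2 d-1, the threshold stated in the text


def persistence_counts(precip, threshold=WET_THRESHOLD):
    """Return (n, n_d, n_dd, n_hh, hdur) for one daily precipitation series."""
    # Assumption: a day is wet when precipitation >= threshold.
    wet = [p >= threshold for p in precip]
    n = len(wet)
    n_d = wet.count(False)  # number of dry days
    pairs = list(zip(wet, wet[1:]))  # (previous day, current day)
    n_dd = sum(1 for a, b in pairs if not a and not b)  # dry after dry
    n_hh = sum(1 for a, b in pairs if a and b)  # wet after wet
    # A wet spell starts on a wet day that is not preceded by a wet day.
    spells = sum(1 for i, w in enumerate(wet) if w and (i == 0 or not wet[i - 1]))
    hdur = (n - n_d) / spells if spells else 0.0  # mean wet-spell duration
    return n, n_d, n_dd, n_hh, hdur


def relative_error(model, reference):
    return (model - reference) / reference


def persistence_scores(rcm, safran):
    """Relative errors of the adjusted RCM with respect to SAFRAN."""
    if len(rcm) != len(safran):
        raise ValueError("both series must cover the same days")
    n, nd_r, ndd_r, nhh_r, hdur_r = persistence_counts(rcm)
    _, nd_s, ndd_s, nhh_s, hdur_s = persistence_counts(safran)
    return {
        "EPD": relative_error(nd_r, nd_s),
        "EPDD": relative_error(ndd_r / nd_r, ndd_s / nd_s),
        "EPHH": relative_error(nhh_r / (n - nd_r), nhh_s / (n - nd_s)),
        "EHD": relative_error(hdur_r, hdur_s),
    }


# Usage on two short synthetic series (values in kg m-2 d-1).
safran = [0, 0, 2, 3, 0, 0, 0, 5, 6, 7]
rcm = [0, 2, 2, 0, 0, 0, 0, 5, 0, 7]
scores = persistence_scores(rcm, safran)
```

Negative values of EPDD, EPHH and EHD indicate that the adjusted series breaks up dry or wet sequences more often than the reference does.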
These indicators are classically employed (e.g. Boe et al., 2006; Vrac et al., 2007b; Lafaysse, 2011; Kotlarski et al., 2014) to assess the following:
1. the ability of a model or method to reproduce the statistical characteristics of the observed meteorological variables (through the RMSE, the mean bias, the ratio of standard deviation values, the duration and persistence of precipitation events and the cumulated PDFs) and their spatial variability (through the mean values at each elevation band and the analysis of different massifs);
2. its capacity to reproduce the low-frequency variability in the observations, i.e. their chronology (through the analysis of seasonal average time series, the correlation as a function of the integration window and the detection of precipitation events);
3. its temporal transferability, i.e. its ability to reproduce the observed variables over a period different from the learning period (through the use of split-sample evaluation, the analysis of the mean annual cycle over two distinct periods and the seasonal average time series);
4. its intervariable consistency, which is assessed here by applying the evaluation indicators to snow depth, an integrated output of the Crocus model.
When available, we compare the indicators with the same criteria applied to analog-resampling-based or transfer-function algorithms by Lafaysse (2011) and Lafaysse et al. (2014), and for other downscaling and adjustment methods by Vrac et al. (2012) and Olsson et al. (2015). Table 1 outlines the input and output variables of Crocus. Table 2 presents a summary of the different configurations used for the evaluation.

Spatial variability and statistical characteristics of the variables
This section provides the evidence needed to assess the performance of the ADAMONT method applied to an RCM driven by a global reanalysis (ERA-Interim) using the SAFRAN meteorological reanalysis as the observational dataset in the French Alps. Adjusted RCM data are compared to SAFRAN itself: the closer the two datasets match, the better the method performs. Figure 4 presents the location of the Vercors massif and its average temperature, precipitation and snow depth for each elevation band, for the evaluation period in the SAFRAN/Crocus reanalysis as well as in the adjusted RCM. The shape of the mean altitudinal evolution of all three variables is well represented compared to SAFRAN, which is also the case for other massifs (see the Supplement). The computed temperature values are very similar to the ones in SAFRAN. This is less so for precipitation, with over- or underestimation depending on the learning period (Fig. 4) and the massif considered (see the Supplement). Despite the differences in the magnitude of average precipitation in the adjusted RCM compared to SAFRAN, the magnitude of average snow depth in the different adjusted RCM simulations is remarkably close to the results obtained using the reanalysis as meteorological input, with slight differences depending on the massif (see the Supplement). For all variables and all massifs, the difference between simulations using the two RCM grid point neighbour selection techniques (N = 0 or N = 50) is smaller than the difference induced by using different learning periods.
Figures 5-7 display the mean bias and the RMSE for each raw and adjusted RCM simulation compared to SAFRAN, for temperature, precipitation and snow depth for the Vercors massif. Additionally, Table 3 presents the corresponding scores on the annual time scale compared to mean values, for the adjusted RCM L. 1980-2010 simulation, for each massif in the French Alps and for the northern and southern Alps at 1200 and 2100 m a.s.l. This highlights the large biases and RMSE values obtained when using raw RCM simulations compared to adjusted simulations, a feature common to all massifs (Figs. 5-6 and the Supplement).
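The mean bias and RMSE over seasonal integration periods can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, each season is averaged per year before the scores are computed, and December is counted with the winter of the following year (the text does not specify how winters are assigned to years).

```python
import math
from collections import defaultdict
from datetime import date, timedelta

SEASON = {12: "DJF", 1: "DJF", 2: "DJF", 3: "MAM", 4: "MAM", 5: "MAM",
          6: "JJA", 7: "JJA", 8: "JJA", 9: "SON", 10: "SON", 11: "SON"}


def seasonal_means(dates, values):
    """Average daily values per (year, season)."""
    totals = defaultdict(lambda: [0.0, 0])
    for day, value in zip(dates, values):
        # Assumption: December belongs to the following year's winter.
        year = day.year + 1 if day.month == 12 else day.year
        entry = totals[(year, SEASON[day.month])]
        entry[0] += value
        entry[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


def bias_and_rmse(model_means, reference_means, season):
    """Mean bias and RMSE of seasonal means for one season."""
    diffs = [model_means[key] - reference_means[key]
             for key in sorted(reference_means)
             if key[1] == season and key in model_means]
    bias = sum(diffs) / len(diffs)
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return bias, rmse


# Usage: a synthetic reference and a model series that is 0.5 K too warm.
dates = [date(1980, 1, 1) + timedelta(days=i) for i in range(730)]
reference = [float(d.month) for d in dates]
model = [v + 0.5 for v in reference]
bias, rmse = bias_and_rmse(seasonal_means(dates, model),
                           seasonal_means(dates, reference), "JJA")
```

A constant offset gives equal bias and RMSE; an RMSE larger than the absolute bias indicates year-to-year differences between the two datasets in addition to a systematic shift.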
For temperature, biases of the adjusted RCM simulations vary with elevation and between massifs (Fig. 5, Table 3 and the Supplement), but always lie within 1 K. Biases are generally smaller in autumn (SON) than in other seasons. RMSEs also vary with elevation and massif, and can differ significantly between simulations using the two different RCM grid point neighbour selection techniques. For elevations above ≈ 2100 m a.s.l., stronger biases and higher RMSEs are found for simulations using the selection technique accounting for altitude differences (N = 50), especially in summer (JJA). Temperature biases and RMSE values also depend on the learning period considered, with the longer learning period 1980-2010 generally yielding smaller biases and RMSEs (Fig. 5 and the Supplement).
For precipitation, biases generally vary with altitude (Fig. 6, Table 3 and the Supplement), but less than for temperature (Fig. 5, Table 3 and the Supplement). Biases of the adjusted simulations remain smaller than 150 kg m$^{-2}$ per month in absolute value, corresponding to up to 90 % depending on the massif and altitude, and are generally stronger in summer. Smaller autumn and winter precipitation biases lead to a good agreement between the magnitude of average snow depth in the different adjusted RCM simulations and the results obtained using the reanalysis as meteorological input (as noted in Fig. 4). RMSE values generally increase with altitude. Using different RCM grid point neighbour selection techniques has less impact on precipitation scores than on temperature scores, except that the N = 50 configuration yields more variability in scores with altitude. This is due to the choice of different grid points for different altitudes of a single massif, because precipitation is spatially more variable than temperature. The influence of the learning period on scores is also visible.
For snow depth, the biases never exceed 50 cm, which corresponds to up to 50 % depending on the altitude and the massif (Fig. 7, Table 3 and the Supplement). The biases are smaller in autumn than in other seasons, similar to temperature (Fig. 5 and the Supplement). Summer biases at high altitudes are almost always negative, which cannot always be explained by a combination of positive biases in temperature and/or negative biases in precipitation, indicating the possible impact of other variables on snow depth (such as longwave radiation, for example). RMSE values increase with altitude due to the effect of increased snow accumulation with altitude. Using the N = 50 configuration generally degrades scores at high elevations, similar to the effect on temperature.
For precipitation and snow depth, simulations without the final quantile mapping on snowfall and rainfall are also presented (by definition it has no impact on temperature). It is clear from Figs. 6-7 and the Supplement that without this final correction (no corr), biases in precipitation and snow depth are much stronger and RMSEs much higher than when this correction is applied.
Figure 8 represents the ratio of the standard deviation values between each adjusted RCM simulation and SAFRAN for temperature, precipitation and snow depth and as a function of the integration window (from 1 day to several years) over the learning period. Ratios are displayed for the Vercors massif, for altitudes of 1200 and 2100 m a.s.l. A ratio lower than 1 means that adjusted RCM simulations have a smaller standard deviation (i.e. variability) than SAFRAN. For temperature, the ratio of standard deviation is very close to 1 for integration windows of 1 day to a few months. It varies more for longer integration windows of 1 year or more. The differences between the two altitudinal levels considered or between massifs are limited (Fig. 8 and the Supplement). For precipitation, ratios of the standard deviation values are also close to 1 (generally between 0.8 and 1.2) for integration windows of 1 day to 1 month. This result is similar to ratios of variance between daily RCMs adjusted with a cumulative distribution function transform and observations for the Mediterranean region in Vrac et al. (2012). For integration windows of 1 month or more, the ratios vary more, with under- or overestimation of variance depending on the massif, the learning period and the grid point neighbour selection technique considered (Fig. 8 and the Supplement). For snow depth, the ratio remains stable up to an integration window of approximately 1 month (Fig. 8 and the Supplement), and shows larger variations for longer windows.
Some differences can be noted for different altitudes and different massifs, but also for different learning periods and the two grid point neighbour selection techniques considered.
Table 3. Mean values and scores for the ADAMONT-adjusted RCM L. 1980-2010 simulation compared to SAFRAN over the period 1980-2010 for each massif of the French Alps (massif numbers indicated in Fig. 3) and for the northern and southern Alps at 1200 and 2100 m elevation: mean annual temperature (T, in K) and precipitation (P, in kg m$^{-2}$ yr$^{-1}$), mean winter (DJFMAM) snow depth (SD, in m); mean annual bias of T and P, mean winter bias of SD; annual root mean square error (RMSE) of T and P, winter RMSE of SD; and annual correlations of T and P.
Figure 5. Temperature mean bias and root mean square error (RMSE) of each raw and adjusted RCM simulation compared to the SAFRAN reanalysis over the evaluation period for the Vercors massif as a function of elevation. Scores computed for the raw RCM simulations concern minimum and maximum daily temperatures.
Figure 9 presents the cumulated PDFs of daily temperature, precipitation and snow depth at 1200 and 2100 m a.s.l. for the Vercors massif. The distributions of daily temperature for adjusted RCM simulations are remarkably close to the distribution of SAFRAN (Fig. 9 and the Supplement). The agreement is better than the one observed in Lafaysse (2011) and Lafaysse et al. (2014) between the different configurations [...]. Only small differences are observed for different altitudes or different massifs (Fig. 9 and the Supplement), and the choice of the learning period or the grid point neighbour selection technique has almost no impact on the PDF. For precipitation, the PDFs of adjusted RCM simulations are also very close to the PDF of SAFRAN, with a slight overestimation or underestimation of moderate to high precipitation depending on the learning period, occurring for most massifs (Fig. 9 and the Supplement). This result is similar to that observed in Lafaysse (2011) for the Durance basin (see Fig. 11.7 therein). As for temperature, altitude and massif location have only a small impact on the distribution, as does the grid point neighbour selection technique considered. The distribution of snow depth, however, depends more on the massif considered and the altitude (Fig. 9 and the Supplement). As for precipitation, the moderate to high snow depth values seem to be slightly overestimated or underestimated for most massifs, depending on the learning period. The choice of the grid point neighbour selection technique also has slightly more impact on snow depth PDFs than on temperature and precipitation PDFs. The fact that PDFs for temperature and precipitation are very close to the ones of SAFRAN is a logical consequence of using a quantile mapping approach.
That it is also true for snow depth indicates that, even if the variables are treated separately, the intervariable consistency of the meteorological fields generated using our method is, in general, appropriate.
The capacity to reproduce the duration and persistence of precipitation events is shown in Fig. 10. The ratio between the number of dry days and the number of rainy or snowy days is very well reproduced for every massif and altitude (Fig. 10 and the Supplement), the relative error on the probability of a dry day being lower than 5 %. This feature was also observed by Lafaysse (2011) in his study of the Durance basin (see Fig. 11.10 therein). The persistence of dry and rainy/snowy events is generally underestimated (up to about −30 %), which was also the case in Lafaysse (2011), even though the error depends on the massif and the altitude considered. In general, errors on the persistence of precipitation events are larger in massifs of the southern Alps than in the northern Alps (see the Supplement). Using different learning periods and different grid point neighbour selection techniques has an impact on scores, but this is small compared to the influence of the massif or the altitude.
Figure 9. Cumulated probability density function (PDF) of daily temperature, precipitation and snow depth (using Crocus in this case) in each adjusted RCM simulation and in the SAFRAN reanalysis [...]
Figure 10. Scores for the duration and persistence of precipitation events in each adjusted RCM simulation compared to the SAFRAN reanalysis over the evaluation period for the Vercors massif at 1200 and 2100 m a.s.l. EPD is the relative error on the probability of a dry day, EPDD is the relative error on the probability of a dry day following a dry day, EPHH is the relative error on the probability of a wet day following a wet day and EHD is the relative error on the mean duration of wet periods.

Mean seasonal variations
Figure 11 represents the mean annual cycle of temperature, precipitation and snow depth for the different adjusted RCM simulations vs. the SAFRAN/Crocus reanalysis, for the periods 1980-1995 and 1995-2010 for the Vercors massif at 1200 and 2100 m a.s.l. The mean annual cycle of temperature is very well reproduced for every massif and altitude (Fig. 11 and Supplement). Using different grid point neighbour selection techniques has a limited impact on the mean annual cycle. For precipitation, the mean annual cycle is relatively well reproduced (Fig. 11 and Supplement). The choice of grid point neighbour selection technique can have slightly more influence on the results than for temperature. For snow depth, the annual cycle is remarkably well reproduced, with peak snow depth in the core of winter (JFM) and no snow or reduced amounts in late summer months (JAS) (Fig. 11 and Supplement). As for temperature, the impact of the grid point neighbour selection technique is very limited.

Interannual variability
The chronology of time series of seasonal averages of temperature, precipitation and snow depth from 1980 to 2010 is shown in Figs. 12-14, for the Vercors massif at 1200 and 2100 m a.s.l. in SAFRAN and the adjusted RCM. Adjusted RCM temperature time series are similar to SAFRAN, with an interannual variability which is well reproduced (Fig. 12 and Supplement). Some significant differences appear when using different learning periods, as already noted in Sect. 3.2. Using different grid point neighbour selection techniques has an impact on the time series of temperature which is generally smaller than the influence of the learning period. However, as already noted in Sect. 3.1, the agreement between observed and simulated time series is degraded for high altitudes under the spatial and altitudinal (N = 50) grid point neighbour selection technique. The interannual variability in precipitation is also well reproduced for most massifs and altitudes (Fig. 13 and Supplement), especially given that the only forcing of the RCM comes from the ERA-Interim reanalysis at the boundaries of the RCM domain. It is slightly less well reproduced in summer (JJA), as observed by Lafaysse (2011) for the analog-resampling-based transfer function algorithm DSCLIM (Pagé et al., 2009) and the Durance basin (see Fig. 10.1 therein). Differences between simulations using different learning periods mostly appear in summer (JJA). The use of different grid point neighbour selection techniques has a rather limited impact on time series of precipitation, whose magnitude depends on the massif and the altitude (Fig. 13 and Supplement). For snow depth, the interannual variability is well reproduced in winter (DJF) and correctly reproduced in intermediate seasons (MAM and SON). Summer snow depths are generally underestimated, as already noted in Sect. 3.1, but represent a small portion of the annual snow accumulation.
Likewise, adjusted data using the spatial and altitudinal (N = 50) RCM grid point selection technique can be degraded at high altitudes, similarly to temperature.
Figure 11. Mean annual cycle of temperature, precipitation and snow depth (using Crocus in this case) in each adjusted RCM simulation and in the SAFRAN reanalysis over the periods 1980-1995 and 1995-2010 for the Vercors massif at 1200 and 2100 m a.s.l. Letters on the x axis correspond to the different months of the calendar (J = January, F = February, etc.).
Figure 15 displays the temporal correlation between each adjusted RCM simulation and SAFRAN over the evaluation period for temperature and precipitation and as a function of the integration window (from 1 day to several years). Correlations are displayed for the Vercors massif, for altitudes of 1200 and 2100 m a.s.l. Additionally, Table 3 presents the same correlation values at the same altitudes, for an integration window of 1 year, and for the adjusted RCM L. 1980-2010 simulation only, for every massif of the French Alps and for the northern and southern Alps. Snow depth values were not included because of their cumulative nature. Correlations for temperature are very high (always above 0.8) for all massifs and altitudes until an integration window of a few months to 1 year (Fig. 15, Table 3 and Supplement), as found by Lafaysse (2011) (see Fig. F.21 therein). The differences between learning periods are negligible. As already observed in Sect. 3.1 and for the time series above, the correlation is clearly degraded for high altitudes (above ≈ 2100 m a.s.l.) in simulations using the N = 50 grid point selection technique. Precipitation also yields satisfactory correlation values (always above 0.4) until a few months integration window, which vary depending on the massif considered (Fig. 15 and Supplement).
Correlations are generally similar to or even better than the ones observed in Lafaysse (2011) for various statistical downscaling models and different configurations of the ALADIN RCM (see Fig. 12.10 therein). The use of the N = 50 grid point neighbour selection technique increases or decreases correlation values depending on the massif and the altitude considered. The choice of learning period has a limited effect on correlation, at least up to integration windows of a few months. Correlations are higher on the scale of the northern and southern Alps than on the massif scale (Table 3). This scale dependence of precipitation downscaling skill was also illustrated by Gangopadhyay et al. (2004) and Mezghani and Hingray (2009).
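The two indicators expressed as a function of the integration window (the ratio of standard deviations of Fig. 8 and the correlation of Fig. 15) can be sketched as follows. This is a minimal illustration: the function names are hypothetical, and averaging over consecutive non-overlapping windows is an assumption, since the text does not state whether running or block averages were used.

```python
import math
import statistics


def window_means(series, window):
    """Average a daily series over consecutive, non-overlapping windows."""
    # Assumption: incomplete trailing windows are discarded.
    n_full = len(series) // window
    return [sum(series[i * window:(i + 1) * window]) / window
            for i in range(n_full)]


def std_ratio(model, reference, window):
    """Standard deviation of the model divided by that of the reference."""
    return (statistics.pstdev(window_means(model, window))
            / statistics.pstdev(window_means(reference, window)))


def correlation(model, reference, window):
    """Pearson correlation between window-averaged series."""
    x = window_means(model, window)
    y = window_means(reference, window)
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread = math.sqrt(sum((a - mean_x) ** 2 for a in x)
                       * sum((b - mean_y) ** 2 for b in y))
    return covariance / spread


# Usage: scores for a few integration windows (in days).
reference = [(7 * i) % 11 for i in range(120)]
model = [2 * value + 1 for value in reference]
scores = {w: (std_ratio(model, reference, w), correlation(model, reference, w))
          for w in (1, 10, 30)}
```

Longer windows leave fewer averaged values to compare (a 30-year record provides only 30 annual means), which is one reason why both indicators become less stable at integration windows of 1 year or more.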
Scores for the detection of precipitation events are presented in Fig. 16 for the Vercors massif for altitudes of 1200 and 2100 m a.s.l. The scores vary depending on massifs and altitude but a general pattern emerges (Fig. 16 and Supplement). The POD is the highest with values between 0.55 and 0.8, very similar to Lafaysse (2011) (see Figs. 11.14 and 12.8 therein). The FAR is rather low (always below 0.5) as is the POFD (below 0.2). TSSs are generally better for massifs of the northern Alps (0.25 to 0.6) than the southern Alps (0.1 to 0.4; see the Supplement), where PODs are lower and FARs much higher. Such results indicate that the method performs well in detecting precipitation events. Using different learning periods has a rather limited impact on the detection of precipitation. The choice of the grid point selection technique has a limited influence at low to mid-altitudes, which increases above ≈ 2100 m a.s.l.
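The detection scores discussed above follow directly from a contingency table of wet and dry days. The sketch below is a minimal illustration with a hypothetical function name; a day is assumed to be wet when precipitation is greater than or equal to the 1 kg m$^{-2}$ d$^{-1}$ threshold. The TSS is computed as POD − FAR, as defined in this paper; note that in the forecast verification literature the true skill statistic of Hanssen and Kuipers is usually written POD − POFD, so values are not directly comparable between the two conventions.

```python
def detection_scores(rcm, safran, threshold=1.0):
    """POD, FAR, POFD and TSS of the adjusted RCM with SAFRAN as reference."""
    n_hh = n_hd = n_dh = n_dd = 0
    for r, s in zip(rcm, safran):
        # Assumption: a day is wet when precipitation >= threshold.
        wet_rcm, wet_safran = r >= threshold, s >= threshold
        if wet_safran and wet_rcm:
            n_hh += 1  # wet in both datasets
        elif wet_safran:
            n_hd += 1  # wet in SAFRAN, dry in the adjusted RCM
        elif wet_rcm:
            n_dh += 1  # dry in SAFRAN, wet in the adjusted RCM
        else:
            n_dd += 1  # dry in both datasets
    pod = n_hh / (n_hh + n_hd)
    far = n_dh / (n_dh + n_hh)
    pofd = n_dh / (n_dh + n_dd)
    # TSS as defined in the text (POD - FAR).
    return {"POD": pod, "FAR": far, "POFD": pofd, "TSS": pod - far}


# Usage on two short synthetic series (values in kg m-2 d-1).
safran = [0, 0, 2, 3, 0, 5, 0, 0]
rcm = [0, 2, 2, 0, 0, 5, 0, 0]
scores = detection_scores(rcm, safran)
```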

Discussion
This section discusses the main limits of the method described and evaluated here, and the limits of the evaluation method itself.

Transferability in time
The temporal transferability of the ADAMONT method, i.e. its capacity to apply adequately to a period which is different from the learning period, can be evaluated from results in Sects. 3.1, 3.2 and 3.3. Figures 11-14 and the Supplement reveal some significant differences when using different learning periods. This feature is generally most visible in summer. It denotes a limit in the temporal transferability of the ADAMONT method, which was also the case in Lafaysse (2011) for the analog-based and transfer function algorithms (see Figs. 11.11 and 11.12 therein). Using the longer learning period of 1980-2010 yields better results, most probably due to the fact that, in this case, the learning and evaluation periods are the same, but also the fact that the learning period is longer.
There are some limits in the conclusions which can be drawn from this transferability assessment. First, reanalysis data used here as forcing for the RCM (ERA-Interim) or for statistical adjustment and evaluation purposes (SAFRAN reanalysis) are heterogeneous in time (Sterl, 2004;Vidal et al., 2010). These heterogeneities are especially marked in summer in the SAFRAN reanalysis, when most observations from mountain stations are not available (Gobiet et al., 2015). Secondly, variations which will occur in the future climate are expected to be much stronger than the variations which can be tested in our evaluation period. Issues related to the time transferability of the adjustment approach may be amplified when applied in the context of climate projections, but their relative impact will probably be lower than shown here given the magnitude of the expected changes.

Impact of the spatial selection technique
The impact of the RCM grid point selection technique is illustrated in Sects. 3.1 and 3.3. Indeed, Figs. 5-7, 12-15 and the Supplement show a clear degradation of scores for elevations above ≈ 2100 m a.s.l. using a selection criterion explicitly accounting for the altitude difference (N = 50). This is linked to the scarcity of high-altitude grid points in ALADIN compared to SAFRAN, resulting in grid points being selected several tens of kilometres from the centre point of most SAFRAN massifs (see Fig. 4 and the Supplement for the location of selected grid points). The impact of this issue depends on the location of massifs relative to high-altitude grid points in ALADIN. For example, most southern Alps massifs are affected, except the southernmost massifs of Ubaye, Alpes Azur and Mercantour (see the Supplement), which are located less than 15 km from high-altitude points. This shows that, although it seems appealing to select RCM grid points at elevations matching the elevation of the observation dataset rather than using RCM grid points with a potentially large elevation difference (hence leading to stronger adjustment requirements), in practice the results are far more homogeneous and quantitatively generally equivalent or better when concentrating only on the horizontal distance between the RCM grid points and the observation dataset.
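The trade-off between horizontal and altitudinal proximity can be illustrated with the following sketch. It is not the ADAMONT selection procedure: this section only states that N = 0 relies on horizontal proximity and that N = 50 also accounts for the altitude difference, so the rule used here (take the N horizontally nearest grid points and keep the one with the smallest altitude difference) is an assumption, as are the function names and the coordinates.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km on a sphere of radius 6371 km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def select_grid_point(target, grid, n_neighbours=0):
    """Pick an RCM grid point for a target (lat, lon, altitude in m).

    n_neighbours = 0: horizontally nearest grid point.
    n_neighbours > 0: among the n horizontally nearest grid points, the one
    with the smallest altitude difference (assumed rule, see lead-in).
    """
    by_distance = sorted(
        grid, key=lambda g: haversine_km(target[0], target[1], g[0], g[1]))
    if n_neighbours == 0:
        return by_distance[0]
    candidates = by_distance[:n_neighbours]
    return min(candidates, key=lambda g: abs(g[2] - target[2]))


# Usage with made-up coordinates: a 2100 m elevation band and three grid points.
target = (45.0, 5.5, 2100.0)
grid = [(45.0, 5.5, 900.0), (45.1, 5.6, 1200.0), (45.6, 6.4, 2050.0)]
nearest = select_grid_point(target, grid, n_neighbours=0)
altitude_matched = select_grid_point(target, grid, n_neighbours=50)
distance_penalty = haversine_km(target[0], target[1],
                                altitude_matched[0], altitude_matched[1])
```

In this made-up case the altitude-matched point lies several tens of kilometres from the target, which mirrors the situation described above for high elevation bands.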

Intervariable consistency
The lack of explicitly enforced intervariable consistency in the quantile mapping method can be a major disadvantage. As we focus on a mountainous region for the evaluation and future use of the method, the consistency between temperature and precipitation phase is crucial. The final quantile mapping applied to rainfall and snowfall separately bears directly on this consistency, and its impact is assessed in Sect. 3.1. Figures 6-7 and the Supplement show that without this final correction (no corr), biases for precipitation are much stronger and RMSEs much higher than with it, highlighting its importance.
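To make the principle of such a correction concrete, the sketch below implements a generic empirical quantile mapping. It is not the refined quantile mapping of ADAMONT: the function names are hypothetical, 99 percentiles are used as an arbitrary choice, the constant correction applied beyond the extreme percentiles is an assumption, and the treatment of dry days is ignored. In the spirit of the final adjustment step, the same mapping would be learned and applied once for rainfall and once for snowfall.

```python
import statistics
from bisect import bisect_right


def learn_quantiles(sample):
    """The 99 empirical percentiles of a learning sample."""
    return statistics.quantiles(sample, n=100, method="inclusive")


def quantile_map(value, model_q, obs_q):
    """Map a model value onto the observed distribution.

    The transfer function is piecewise linear between matching percentiles.
    Assumption: beyond the first and last percentiles, the correction of the
    nearest percentile is applied as a constant offset.
    """
    if value <= model_q[0]:
        return value + (obs_q[0] - model_q[0])
    if value >= model_q[-1]:
        return value + (obs_q[-1] - model_q[-1])
    # Largest i such that model_q[i] <= value < model_q[i + 1].
    i = bisect_right(model_q, value) - 1
    weight = (value - model_q[i]) / (model_q[i + 1] - model_q[i])
    return obs_q[i] + weight * (obs_q[i + 1] - obs_q[i])


# Usage: a model that is systematically 1 unit too high over the learning period.
observed = [float(v) for v in range(101)]
modelled = [v + 1.0 for v in observed]
obs_q = learn_quantiles(observed)
model_q = learn_quantiles(modelled)
adjusted = [quantile_map(v, model_q, obs_q) for v in (1.0, 50.5, 101.0)]
```

Because each variable is mapped onto its own observed distribution, nothing in this procedure constrains the relationship between variables, which is why consistency has to be checked separately, for example through snow depth.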
The intervariable consistency of the ADAMONT method is indirectly assessed by applying the evaluation metrics described above to an integrated output of the Crocus model, the snow depth, which is computed from meteorological variables adjusted independently from each other. As mentioned above, snow depth results are generally satisfying, which tends to indicate a good intervariable consistency. Performance indicators for snow depth are often consistent with temperature and precipitation indicators, even though they cannot always be explained by these two variables alone (for example the analysis of biases in Sect. 3.1), indicating the probable influence of other variables not directly analysed here such as longwave radiation.
Figure 16. Scores for the detection of precipitation events in each adjusted RCM simulation compared to the SAFRAN reanalysis over the evaluation period for the Vercors massif at (a) 1200 and (b) 2100 m a.s.l. POD is the probability of detection, FAR is the false alarm rate, POFD is the probability of false detection and TSS is the true skill score.

Limits of the evaluation method
The spatial consistency of the ADAMONT method has not been evaluated other than by using spatial averages. In future studies, it would be necessary to test it by evaluating spatial correlations (for example using metrics described in Kotlarski et al., 2014) or by using integrated variables requiring spatial variability such as snow cover area or river discharges.
In this study, we evaluated the method using only the ALADIN-Climate RCM. However, Olsson et al. (2015) showed that the choice of RCM could have a significant impact on the evaluation of the performance of the adjustment method. Evaluation using another RCM could thus prove useful, even though we would have to use RCM outputs run on the same spatial domain as the ALADIN-Climate RCM in order to compare them.

Conclusions
The new method ADAMONT is able to statistically adjust daily regional climate model projections and to provide hourly-adjusted outputs of temperature, precipitation, wind speed, humidity and short- and longwave radiation necessary to force energy balance land surface (impact) models.
The method processes daily outputs from an RCM and adjusts them with a sub-daily (typically hourly) observational dataset. The method was evaluated using outputs from the ALADIN-Climate RCM driven by ERA-Interim reanalysis for the time period 1980-2010, using the SAFRAN meteorological reanalysis in the French Alps as an observation dataset. The direct outputs of the ADAMONT method, namely temperature and total precipitation, as well as an indirect output, namely snow depth, computed by the Crocus model from meteorological variables corrected independently of each other were evaluated. The impact of the learning period was tested, as well as the method to select RCM grid points corresponding to each observational point. The evaluation addressed four main concerns: (1) the ability of the ADAMONT method to reproduce the spatial (especially altitudinal) variability and the statistical characteristics of SAFRAN variables; (2) its ability to reproduce the low-frequency variability, i.e. the chronology of SAFRAN, through the analysis of the interannual variability and the annual cycle of adjusted variables; (3) the temporal transferability of the method; and (4) its intervariable consistency.
Performance scores are always better for adjusted RCM simulations than for raw RCM simulations, which highlights the need for such adjustment and demonstrates the skill of the method. In general, the performance of the ADAMONT method concerning temperature is better than for precipitation. However, evaluation indicators for precipitation are generally similar to or even better than the indicators evaluated in Lafaysse (2011) and Lafaysse et al. (2014) for other types of algorithms (analog-based or transfer functions). Snow depth yields good results, considering its integrated nature, i.e. the fact that it was computed from variables corrected independently. The impact of the learning period depends on the evaluation indicator considered, and must be considered when applying the method. The best solution is probably to choose the longest possible learning period. For precipitation and snow depth, the importance of the final quantile mapping applied to snowfall and rainfall (i.e. after a first quantile mapping on total precipitation, an additional quantile mapping against the observational dataset is applied for daily cumulated adjusted RCM rainfall and snowfall separately) is unambiguously demonstrated. Using a grid point selection technique relying on spatial but also altitudinal proximity between SAFRAN massif centre points and RCM grid points either had no impact on the performance indicators or degraded them for altitudes higher than 2100 m a.s.l. As a consequence, the simple spatial grid point neighbour selection technique will be retained for future applications of the method.
The ADAMONT method is generic and can be applied to any observational dataset. Its application using the SAFRAN reanalysis as the observation dataset is somewhat a specific case, initially tailored for French mountainous regions (Durand et al., 2009a). However, beyond the French mountain regions, the method could be applied in France using the SAFRAN-France gridded reanalysis (Vidal et al., 2010). A Spanish version of SAFRAN was also developed recently (Quintana-Seguí et al., 2017). The method could also be applied to other observational datasets or meteorological reanalyses, such as ERA-Interim surface fields (Dee et al., 2011) or MESCAN (Soci et al., 2016).
Code availability. The code of the ADAMONT v1.0 method is available as an open git repository after free registration at https://opensource.cnrm-game-meteo.fr/projects/adamont. The version used for this article is available at https://opensource.cnrm-game-meteo.fr/projects/adamont/repository?rev=ADAMONT-v1.0. The version of the open source code of SURFEX/ISBA-Crocus used in this study is available as a specific branch of an open git repository, after free registration, at https://opensource.cnrm-game-meteo.fr/projects/surfex_git2 (last access: November 2017). For reproducibility of results, the version used in this work is tagged as https://opensource.umr-cnrm.fr/projects/surfex_git2/repository?rev=ADAMONT-1.0 (last access: November 2017).
Competing interests. The authors declare that they have no conflict of interest.