An open-source MEteoroLOgical observation time series DISaggregation Tool (MELODIST v0.1.1)

Meteorological time series with a one-hour time step are required in many applications in geoscientific modelling. These hourly time series generally cover shorter periods of time compared to daily meteorological time series. We present an open-source MEteoroLOgical observation time series DISaggregation Tool (MELODIST). This software package is written in Python and comprises simple methods to temporally downscale (disaggregate) daily meteorological time series to hourly data. MELODIST is capable of disaggregating the most commonly used meteorological variables for geoscientific modelling, including temperature, precipitation, humidity, wind speed, and shortwave radiation. Disaggregation is performed independently for each variable considering a single site without spatial dependencies. The algorithms are validated against observed meteorological time series for five sites in different climates. Results indicate a good reconstruction of diurnal features at those sites. This makes the methodology interesting to users of models operating at hourly time steps who want to apply their models for longer periods of time not covered by hourly observations.


Introduction
Continuous recordings of meteorological data have been available since the late 18th century. During the 20th century, observational networks were refined intensively, even at remote sites. However, these observations are generally not distributed equally in space, and their temporal resolutions range from some hours (e.g., three measurements of temperature for each day) to one day (e.g., rain gauges). Later, in the late 20th century, the instrumentation of meteorological stations was supplemented by the installation of automatic weather stations (AWS), which are capable of collecting meteorological data continuously with a frequency ranging from one hour to one minute or even shorter periods of time (Rassmussen et al., 1993). Figure 1 depicts the global temporal evolution of data availability for daily and hourly meteorological time series during the 20th century and beyond. This diagram has been compiled from two freely available datasets by querying the temporal coverage of each dataset: daily data are collected continuously in the Global Historical Climatology Network-Daily Database (GHCN) (Menne et al., 2012; NOAA, 2015b), whereas the Integrated Surface Database (ISD) provides hourly time series of stations worldwide (Smith et al., 2011; NOAA, 2015a). This comparison reveals that the availability of hourly observations as provided by AWS is restricted to a few decades only. Fig. 1 shows that a large number of AWS have only been installed in the last two or three decades.
given time, dynamical downscaling of this kind of data is a sophisticated way to derive hourly values for that time and arbitrary locations in a realistic manner. However, small-scale precipitation might not be captured as accurately by the LAM in some cases due to the very complex micro-physical nature of precipitation and its variability (e.g., Förster et al., 2014).
3. Using weather generators to derive new synthetic time series that match the statistics of available hourly data: Weather generators calculate statistics of observed time series and apply these statistics using a random number generator to obtain new time series with equal statistical characteristics (Haberlandt et al., 2011; Ailliot et al., 2015). For hourly time steps, resampling techniques are applied in most cases (e.g., Sharif and Burn, 2007; Strasser, 2008). Time series derived by weather generators only match the observations statistically. The sequence of events is different due to its random nature, which is why sub-daily time series do not provide the originally measured values. Weather generators are powerful tools that supplement deterministic modelling by stochastic methods and thus add a probabilistic component to the otherwise purely mechanistic methodology (mixed deterministic-stochastic models, see, e.g., Pechlivanidis et al., 2011).
In this study, we focus on the simplest method among the listed approaches, the disaggregation of daily meteorological data (approach 1). For instance, in hydrological modelling, simple methods are usually sufficient to force conceptual, process-based models (Waichler and Wigmosta, 2003; Debele et al., 2007). To the authors' knowledge, there is neither a "best" way of disaggregating meteorological data to hourly values nor an easy, ready-to-use, and flexible software package that enables this task for different meteorological variables including precipitation, temperature, humidity, solar radiation, and wind speed. Therefore, we propose a robust and fully documented methodology including alternative approaches for all these variables in order to make the best use of available data. Although there are more complex and sophisticated methods available for obtaining hourly values, MELODIST can be viewed as a good balance among several aspects such as data availability, the user's prior knowledge, robustness, and computational costs. MELODIST thus addresses practitioners who need to run their model for long periods of time at one-hour time steps. Here, emphasis is put on single stations rather than on interdependencies among different stations. However, the manuscript includes some specific remarks with respect to this restriction.
The paper is organised as follows: First, the study sites investigated herein are briefly presented in Section 2. The next section gives an overview of the disaggregation methods. In the fourth section, the methods are statistically evaluated with respect to their accuracy to reconstruct sub-daily features. Finally, Section 5 includes concluding remarks and an outlook for possible future work.

Study sites
The accuracy of disaggregation methodologies strongly depends on diurnal characteristics of meteorological variables. In turn, these diurnal characteristics might vary among different climates and environments. To test the robustness of the methods described in the next section, a small number of sites in different climates has been chosen (see Fig. 2 and Tab. 1).
Except for Obergurgl, all station data are available for free. For each station, all relevant meteorological variables have been recorded for at least one decade. Only shortwave radiation and precipitation are not available for Rio de Janeiro and Ny-Ålesund, respectively (Tab. 1).

The available datasets have been subdivided into two independent periods of time, one for calibration purposes, if required, and the other for an independent validation of the disaggregation results. This subdivision has been defined in order to enable a split-sample test (Klemeš, 1986), which requires an independent validation period for testing models. In this study, the split-sample test is applied to the disaggregation methods described in the next section.

Overview
In this section, all disaggregation methods employed in the framework of this paper are described in brief. For each meteorological variable, different options are available (Tab. 2). Deterministic methods generally provide the same output if the input remains unchanged. In contrast, stochastic methods are based on random numbers. This means that the output differs in consecutive runs even if the input dataset remains the same. Thus, stochastic methods require multiple runs and a sound statistical evaluation of these runs before conclusions can be drawn. Some models require the calibration of model parameters that need to be adjusted for each site. Split-sample tests (Klemeš, 1986) are applied to test the methods more rigorously.
The subsequent sections provide details for each of the methods listed in Tab. 2. For each variable, an example figure is provided which gives an idea of how each of the methods works. The times and locations of these figures have been randomly selected.

Temperature (T1)
Temperature on day i is disaggregated to hourly values j using a cosine function whose amplitude is defined by the observed minimum temperature $T_{min,i}$ and maximum temperature $T_{max,i}$ on day i (Eq. 1; e.g., Debele et al., 2007).

The parameter a is determined either by providing an a priori guess of the temporal difference between solar noon and the occurrence of the maximum temperature or through calibration. Three options are provided by MELODIST. In the first option (T1a), minimum and maximum temperatures occur at 7 am and 2 pm, respectively. The second option (T1b) relies on radiation geometry to calculate sunrise as the point in local time for minimum temperatures and solar noon + 2 hours as the point in time for maximum temperatures (see Fig. 3). As the temporal shift of 2 hours might not be acceptable as a general rule of thumb, temporal shifts for each month can be derived through statistical evaluation of observed hourly time series (T1c).
In principle, the methodology is based upon the assumption that the diurnal course of temperature simply tracks the diurnal course of the incoming shortwave radiative flux with a shift in time. This assumption does not hold true during polar nights, which is why another method is applied for Ny-Ålesund. For this station, a linear interpolation between minimum and maximum temperature is applied (T1d, nighttime option). If temperature increases compared to the previous day, minimum temperature is assumed to be representative for the first 12 hours of the current day and the maximum temperature is likewise attributed to the second half of that day. If temperature decreases from one day to the next, the opposite assignment is applied.
Even though this method is rather simple, it preserves minimum and maximum temperatures while disaggregating.
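The cosine-based idea can be illustrated with a minimal sketch. It assumes a symmetric 24-hour cosine that peaks at a prescribed hour (2 pm by default, as in option T1a); this assumed form is not necessarily identical to Eq. (1), and the function name and arguments are hypothetical. With a symmetric cosine the minimum falls 12 hours away from the maximum, so the 7 am minimum of T1a is not reproduced here.

```python
import math


def disaggregate_temperature(t_min, t_max, hour_of_max=14):
    """Return 24 hourly temperatures from a daily minimum and maximum.

    Assumed form (not necessarily Eq. 1 of the paper): a symmetric
    24-hour cosine around the daily mid-range, peaking at `hour_of_max`.
    """
    mid_range = 0.5 * (t_max + t_min)
    amplitude = 0.5 * (t_max - t_min)
    return [
        mid_range
        + amplitude * math.cos(2.0 * math.pi * (hour - hour_of_max) / 24.0)
        for hour in range(24)
    ]


# Example: a day with a minimum of 3 degC and a maximum of 15 degC
hourly = disaggregate_temperature(t_min=3.0, t_max=15.0)
```

By construction, the daily minimum and maximum are preserved, which mirrors the property stated above for the nighttime option.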
It generally follows a diurnal course with the maximum around sunrise and the minimum in the early afternoon (Debele et al., 2007).

Saturation vapour pressure $e_s$ for a given temperature T [°C] is calculated using the Magnus formula (Eq. 4; Alduchov and Eskridge, 1997), while actual vapour pressure $e_a$ for a given temperature T and relative humidity H [%] is calculated as

$$e_a = e_s(T) \cdot \frac{H}{100}. \quad (5)$$

Methods H1 and H2 use a model of the form $T_{dew,day} = a T_{min} + b$ to calculate daily dew point temperature (i.e., no diurnal dew point temperature variation is assumed). For H1, a = 1 and b = 0, i.e., $T_{dew,day}$ is assumed to be equal to the daily minimum temperature. H2 uses hourly observations of temperature and humidity to calculate the best fit for a and b for a given site. $T_{dew}$ is thereby calculated from T and H by inverting Eq. (4).
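The chain from dew point temperature to hourly relative humidity can be sketched as follows. The Magnus coefficients used here (6.1094 hPa, 17.625, 243.04 °C) are an assumption based on values commonly attributed to Alduchov and Eskridge; the coefficients in Eq. (4) may differ. All function names are hypothetical.

```python
import math

# Assumed Magnus coefficients; the paper's Eq. (4) may use other values.
A_HPA, B_COEF, C_DEG = 6.1094, 17.625, 243.04


def saturation_vapour_pressure(temp_c):
    """Saturation vapour pressure in hPa (Magnus form)."""
    return A_HPA * math.exp(B_COEF * temp_c / (C_DEG + temp_c))


def actual_vapour_pressure(temp_c, rel_hum):
    """Actual vapour pressure in hPa, following Eq. (5)."""
    return saturation_vapour_pressure(temp_c) * rel_hum / 100.0


def dew_point(temp_c, rel_hum):
    """Dew point in degC, obtained by inverting the Magnus form."""
    ln_ratio = math.log(actual_vapour_pressure(temp_c, rel_hum) / A_HPA)
    return C_DEG * ln_ratio / (B_COEF - ln_ratio)


def hourly_humidity(hourly_temp_c, t_min, a=1.0, b=0.0):
    """H1/H2-style humidity: constant daily dew point a * t_min + b."""
    e_a = saturation_vapour_pressure(a * t_min + b)
    return [
        min(100.0, 100.0 * e_a / saturation_vapour_pressure(t))
        for t in hourly_temp_c
    ]
```

With the defaults a = 1 and b = 0, the sketch corresponds to the H1 assumption; for H2, a and b would be fitted to dew points computed from hourly observations with `dew_point`.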

H3 assumes a diurnal dew point temperature variation based on the assumptions that dew point temperature varies linearly between consecutive days and that the mean daily dew point temperature occurs around sunrise (Debele et al., 2007). The calculation of dew point temperature for a given day d and hour h involves the parameter $k_r$, which should be set to 6 for sites with average monthly radiation higher than 100 W m$^{-2}$, and to 12 otherwise (Debele et al., 2007). An example application of these methods is shown in Fig. 4.

Minimum and maximum humidity disaggregation (H4)

Method H4 uses records of daily minimum and maximum temperature and daily minimum and maximum relative humidity as well as the disaggregated hourly temperature values to generate hourly humidity values. If $H_{min}$ and $H_{max}$ are available for each day, this method is the best option among all available humidity disaggregation methods (Waichler and Wigmosta, 2003).
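A minimal sketch of the H4 idea is given below. It assumes a linear mapping in which humidity is highest at the daily minimum temperature and lowest at the daily maximum temperature; this specific form is an assumption and may differ from the equation used by the authors. The function name is hypothetical.

```python
def humidity_from_extremes(hourly_temp, t_min, t_max, h_min, h_max):
    """Map hourly temperatures linearly onto the range [h_min, h_max].

    Assumed relation: H = h_max at T = t_min and H = h_min at T = t_max.
    """
    if t_max == t_min:
        # No diurnal temperature range: fall back to the mid-range humidity.
        return [0.5 * (h_min + h_max)] * len(hourly_temp)
    return [
        h_max + (t - t_min) / (t_max - t_min) * (h_min - h_max)
        for t in hourly_temp
    ]


# Example with three hourly temperatures of one day
values = humidity_from_extremes([4.0, 10.0, 16.0], 4.0, 16.0, 40.0, 90.0)
```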

Wind speed
Wind speed is a meteorological variable subject to high variability at small temporal scales. This small-scale variability can be observed, e.g., from eddy-covariance measurements (Stull, 2009). The methods compiled in this study focus on suitable wind speed time series for hourly time steps without taking into account these sub-hourly considerations. This idea corresponds best to averages of wind speed for a given increment of time (e.g., one hour) rather than instantaneous measurements.

Equal distribution (W1)
As for precipitation, this method applies one unique value for each hour of the considered day. The daily mean value is assumed to be valid for hourly values as is (W1). For many applications, this assumption might be sufficient.

Cosine function (W2)
Due to local and microclimatic conditions, wind speed is subject to diurnal variations on days with calm weather in the absence of synoptic-scale weather patterns that obliterate local and microclimatic forcings (Oke, 1987). Typical diurnal patterns in wind speed (and wind direction as well) are related to mountain-valley or land-sea wind systems. Besides these local climatic wind systems, wind speed typically increases during daytime and almost always diminishes after sunset. This phenomenon is related to increased radiation-induced momentum flux on fair weather days. Again, synoptic-scale weather patterns such as low pressure systems might obliterate local-scale effects. These patterns of diurnal wind speed variations can be represented by a cosine function (W2), which requires calibration using data observed at the considered site. This model is similar to the temperature disaggregation method T1 (see Eq. 1; Debele et al., 2007). The wind speed representative for day i is disaggregated to $v_{i,t}$ for hour t (Fig. 5). $a_w$, $b_w$, and $\Delta t_w$ are parameters that need to be calibrated for each site prior to the application of this method.

According to Debele et al. (2007), a random disaggregation of wind speed (W3) might also perform reasonably. In this method, the function rnd is a random number generator which draws random numbers between 0 and 1 from a uniform distribution.

Shortwave radiation in W m$^{-2}$ is computed for hourly time steps using the methodology described by Liston and Elder (2006), which predicts potential shortwave radiation $R_0$ for each time step. A simplified formula that assumes a flat surface is provided (Liston and Elder, 2006). The solar constant (1370 W m$^{-2}$) is scaled according to the solar zenith angle Z, which depends on time (day of year and hour measured from local solar noon) and latitude (Liston and Elder, 2006).
Details on these calculations as well as on the direct and diffuse radiation scaling values Ψ dir and Ψ dif are given by Liston and Elder (2006).
This methodology is applied for all three options. R1 assumes daily averages of shortwave radiation. This type of data is generally only available if hourly recordings of shortwave radiation have been aggregated prior to the data dissemination. In contrast, options R2 and R3 do not require shortwave radiation data as input.
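The following sketch illustrates how a daily mean of shortwave radiation might be distributed over 24 hours according to the diurnal course of potential radiation. It is a simplification under stated assumptions: the declination approximation and its constants are assumed, the atmospheric scaling values Ψ_dir and Ψ_dif are omitted, time is expressed in local solar hours, and the proportional scaling step is an assumption about how the daily mean is used. Function names are hypothetical.

```python
import math

SOLAR_CONSTANT = 1370.0  # W m-2, as stated in the text


def cos_zenith(lat_deg, day_of_year, solar_hour):
    """Cosine of the solar zenith angle on a flat surface (clipped at 0)."""
    # Assumed declination approximation with the solstice on day 173.
    decl = math.radians(23.44) * math.cos(
        2.0 * math.pi * (day_of_year - 173) / 365.25
    )
    lat = math.radians(lat_deg)
    hour_angle = math.pi * (solar_hour - 12.0) / 12.0
    cz = math.sin(decl) * math.sin(lat) + math.cos(decl) * math.cos(
        lat
    ) * math.cos(hour_angle)
    return max(cz, 0.0)


def disaggregate_radiation(daily_mean, lat_deg, day_of_year):
    """Scale the potential diurnal course so its mean equals daily_mean."""
    potential = [
        SOLAR_CONSTANT * cos_zenith(lat_deg, day_of_year, hour + 0.5)
        for hour in range(24)
    ]
    mean_potential = sum(potential) / 24.0
    if mean_potential == 0.0:
        # Polar night: no potential radiation on this day.
        return [0.0] * 24
    return [daily_mean * p / mean_potential for p in potential]


# Example: a mid-latitude site close to the summer solstice
hourly = disaggregate_radiation(daily_mean=200.0, lat_deg=52.1, day_of_year=172)
```

Evaluating the zenith angle at the centre of each hour is a design choice of this sketch; it keeps the diurnal course symmetric around solar noon.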

Disaggregation of sunshine duration (R2)
The method R2 builds upon the same methodology as R1 but runs the Ångström (1924) model prior to the disaggregation computations. This model relates sunshine duration to mean shortwave radiation for daily time steps. Relative sunshine duration $S/S_0$ is transformed to relative global radiation $R/R_0$ through a linear relation with the parameters a and b, and then the Liston and Elder (2006) radiation model is applied using this data.
The parameters a and b are by default set to 0.25 and 0.75, respectively (Ångström, 1924), but can also be determined by optimisation using observations of daily mean solar radiation, if available. Figure 6 shows an example based on method R2 for summertime radiation in De Bilt (Fig. 2). The constants a and b have been obtained through linear regression of the R and S time series covered by the calibration period. If shortwave radiation and sunshine duration recordings are available, it is recommended to calculate these values for the site of interest.
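As an illustration, the linear Ångström-type relation and a site-specific fit of a and b can be sketched as follows. The form R/R0 = a + b · S/S0 is the commonly used linear relation and is assumed here; the ordinary least-squares fit is one possible way to perform the regression mentioned above. Function names are hypothetical.

```python
def angstrom_ratio(rel_sunshine, a=0.25, b=0.75):
    """Relative global radiation R/R0 from relative sunshine duration S/S0."""
    return a + b * rel_sunshine


def fit_angstrom(rel_sunshine, rel_radiation):
    """Ordinary least-squares estimates of a (intercept) and b (slope)."""
    n = len(rel_sunshine)
    mean_x = sum(rel_sunshine) / n
    mean_y = sum(rel_radiation) / n
    sxx = sum((x - mean_x) ** 2 for x in rel_sunshine)
    sxy = sum(
        (x - mean_x) * (y - mean_y)
        for x, y in zip(rel_sunshine, rel_radiation)
    )
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# Synthetic calibration data generated with a = 0.2 and b = 0.6
s_rel = [0.0, 0.25, 0.5, 0.75, 1.0]
r_rel = [0.2 + 0.6 * s for s in s_rel]
a_fit, b_fit = fit_angstrom(s_rel, r_rel)
```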

The Bristow-Campbell model (R3)
If radiation is not available, option R3 might provide reliable radiation estimates based on minimum and maximum temperature.
It is assumed that small differences between maximum and minimum temperatures typically occur on cloudy days, whereas larger differences are common on sunny days with radiative cooling during nighttime and surface heating caused by shortwave radiative flux during daytime. The corresponding method is named after its inventors, Bristow and Campbell (1984). Here, relative global radiation $R/R_0$ is related to the diurnal temperature range $\Delta T$, which is estimated using maximum and minimum temperatures on a specific day i and the subsequent day i + 1. Besides the parameters A = 0.75 and C = 2.4, which might be viewed as constants in a first step, B is a site-specific parameter that depends on $\overline{\Delta T}$. In contrast to $\Delta T$, which refers to a certain day, $\overline{\Delta T}$ is the long-term average of differences between maximum and minimum temperature for the month of the current day. Based on these computations, radiation estimates are used as input to the radiation model R1 (see Fig. 6). A site-specific adjustment of the parameters A and C is possible by optimisation using observations of shortwave radiation and daily minimum and maximum temperature.
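A compact sketch of a Bristow-Campbell-type estimate is given below. The functional forms and the coefficients 0.036 and 0.154 in the expression for B are assumptions taken from the form in which the Bristow and Campbell (1984) model is commonly cited; they are not confirmed by the text above. A and C default to the values stated in the text, and the function names are hypothetical.

```python
import math


def diurnal_range(t_max_i, t_min_i, t_min_next):
    """Assumed form: maximum of day i minus the mean of two minima."""
    return t_max_i - 0.5 * (t_min_i + t_min_next)


def bristow_campbell_ratio(delta_t, mean_delta_t, a=0.75, c=2.4):
    """Relative global radiation R/R0 from the diurnal temperature range."""
    # Assumed empirical relation for the site-specific parameter B.
    b = 0.036 * math.exp(-0.154 * mean_delta_t)
    return a * (1.0 - math.exp(-b * max(delta_t, 0.0) ** c))


# Example: a clear day (large range) versus an overcast day (small range)
clear = bristow_campbell_ratio(diurnal_range(22.0, 8.0, 9.0), 10.0)
overcast = bristow_campbell_ratio(diurnal_range(12.0, 9.0, 10.0), 10.0)
```

The resulting ratio would then be multiplied by the potential radiation and passed on to the R1 disaggregation step, as described above.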

Equal redistribution (P1)
Reconstructing sub-daily precipitation intensities from daily values is challenging as precipitation intensities vary strongly in time and space. In the framework of this study, three methods are presented. The first method (P1) is the simplest way of disaggregating daily precipitation to hourly intensities: the daily value is divided by 24.

In order to provide a more sophisticated model that preserves sub-daily precipitation characteristics and is still less complex than typical weather generators, a simple statistical precipitation disaggregation approach has been set up: the microcanonical, multiplicative cascade model by Olsson (1998) (P2). Some enhancements proposed in the literature (Güntner et al., 2001), such as weighting, have been taken into account as well. This method is a probabilistic approach providing different disaggregation results for each run (realisation). However, the statistical characteristics of each realisation are equal by definition.
The disaggregation is carried out assuming a doubling of temporal resolution for each step. Due to this stepwise doubling of resolution, the model is referred to as a cascade model (see Fig. 1 in Olsson, 1998). The time series of cascade level i with time step $\Delta t_i$ is disaggregated to level i + 1 with time step $\Delta t_{i+1} = \frac{1}{2} \cdot \Delta t_i$. The procedure is applied successively until the desired temporal resolution is reached. The doubling of elements of each subsequently derived time series implies that each box of the higher level's time series has to be split in the next cascade level. Thus, the question arises how the precipitation volume $P_i$ is separated into two temporally equidistant time steps $P_{i+1,1} = W_1 \cdot P_i$ and $P_{i+1,2} = (1 - W_1) \cdot P_i = W_2 \cdot P_i$ (branching), whereby $W_1$ is the relative weight of branching for the first box of the subsequent level with respect to the total precipitation volume to be branched ($W_2$ is the weight assigned to the second box). Three cases are foreseen in the so-called branching generator (Olsson, 1998):

$$W_1, W_2 = \begin{cases} 0 \text{ and } 1 & \text{with probability } P(0/1) \\ 1 \text{ and } 0 & \text{with probability } P(1/0) \\ x \text{ and } 1 - x & \text{with probability } P(x/(1-x)), \; 0 < x < 1 \end{cases} \quad (17)$$

The first case indicates a branching that fills the second box of the subsequent level only, whereas the second case indicates the opposite. In contrast, the third case accounts for a weighted branching into both boxes of the subsequent level. For these cases, probabilities are provided for four different types of wet boxes with $P_i > 0$:

starting box: This type of box indicates a dry box in the previous and a wet box in the next time step.

ending box: An ending box follows a wet box and is followed by a dry box.
These probabilities for the three different branching possibilities (Eq. 17) can be obtained by a reverse scaling procedure.
Highly resolved precipitation time series are aggregated by applying the cascade level branching assumption backwards. Every two boxes are added in each case, representing the respective total volume of the antecedent higher level. Statistics are calculated for the branching types mentioned above (probabilities are derived through dividing counts of each case by the total number of elements of the time series). Separate evaluations are prepared for precipitation intensities below and above the mean precipitation value.
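The reverse scaling idea can be sketched for a single aggregation step as follows. This is a deliberately reduced illustration: it ignores the distinction between box types and intensity classes, and it normalises the counts by the number of wet aggregated boxes, which is an assumption of this sketch. The function name is hypothetical.

```python
def branching_statistics(series):
    """Estimate branching probabilities from one aggregation step.

    Consecutive pairs of boxes are added; each wet pair is classified
    as 0/1, 1/0, or x/(1-x). Weights x are collected for the third case.
    """
    counts = {"0/1": 0, "1/0": 0, "x/(1-x)": 0}
    weights = []
    for k in range(0, len(series) - 1, 2):
        first, second = series[k], series[k + 1]
        total = first + second
        if total <= 0.0:
            continue  # dry boxes carry no branching information
        if first == 0.0:
            counts["0/1"] += 1
        elif second == 0.0:
            counts["1/0"] += 1
        else:
            counts["x/(1-x)"] += 1
            weights.append(first / total)
    n_wet = sum(counts.values())
    probabilities = {
        key: (value / n_wet if n_wet else 0.0)
        for key, value in counts.items()
    }
    return probabilities, weights


# Example: eight hourly values aggregated to four two-hourly boxes
probs, x_values = branching_statistics([0.0, 2.0, 3.0, 0.0, 1.0, 1.0, 0.0, 0.0])
```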
Additional statistics need to be computed for the case P(x/(1 − x)), for which the relative weight x is evaluated as well. For all box types and both intensity classes, the relative weight ranging from zero to one is simply divided into seven bins (see histograms in Olsson, 1998; Güntner et al., 2001) and counted according to the previously mentioned criteria (4 box types, 2 intensity classes, 7 classes of x). This procedure is applied for the successive aggregation steps up to 16 → 32 h ($2^5$ h). According to Güntner et al. (2001), a count-related weight is assigned to the probabilities P(0/1), P(1/0), and P(x/(1 − x)) in each aggregation step prior to averaging the probabilities of all steps. The same procedure is applied to the weights. As a result, matrices of probabilities and weights are derived that represent the station's precipitation scaling. The parametrization is done by applying the empirical distributions of P(0/1), P(1/0), P(x/(1 − x)), and x to a random number generator (without fitting analytical distributions).
In turn, these matrices of probabilities and weights are used to disaggregate daily time series. The type of branching is determined by drawing random numbers for each branching step incorporating the probabilities P(0/1), P(1/0), and P(x/(1 − x)), which are evaluated cumulatively. If the random number is within the range of P(x/(1 − x)), a similar procedure is applied to determine the weight x using another random number. In contrast to the aggregation procedure, disaggregation is applied including the following steps (see Fig. 7; Güntner et al., 2001). The time series with a 45-minute time step is equally distributed to a time series with a 15-minute time step. These, in turn, are transformed uniformly to obtain time series with a one-hour time step.
For all disaggregation steps described above, the cascade model preserves mass, which means that the precipitation total of the disaggregated time series is equal to the respective value of the original time series (microcanonical cascade model).
Despite their simplicity with respect to model complexity and parameter estimation (Molnar and Burlando, 2005), cascade models have already been used successfully in different climates (Güntner et al., 2001). In contrast to more sophisticated models, the autocorrelation structure might not necessarily be preserved (Koutsoyiannis, 2003; Lombardo et al., 2012).
Remarks on spatial representativeness: If this procedure is applied to more than one station, the sub-daily temporal distribution of precipitation is randomly derived for each station. These spatial patterns do not represent the actual spatial structure of the events at sub-daily time scales. For practical applications at the meso-scale, it is therefore suggested to redistribute the sub-daily intensities for each station according to the cumulative relative sum of the station that is subject to the highest daily precipitation depth (Haberlandt and Radtke, 2014), which can be performed using the method described in the next paragraph.
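The mass-preserving branching can be illustrated with a minimal sketch. It is simplified under stated assumptions: a single set of branching probabilities is used for all box types and intensity classes, and the weight x is drawn from a uniform distribution instead of the empirical distributions described above. Five branching steps turn one daily value into 32 boxes of 45 minutes, which are then distributed equally to 15-minute values and summed to hours. All function names are hypothetical.

```python
import random


def branch(volume, probs, rng):
    """Split one wet box into two boxes according to Eq. (17)."""
    u = rng.random()
    if u < probs["0/1"]:
        w1 = 0.0
    elif u < probs["0/1"] + probs["1/0"]:
        w1 = 1.0
    else:
        w1 = rng.random()  # assumption: uniform weight x
    first = w1 * volume
    return first, volume - first


def cascade(daily_total, probs, levels, rng):
    """Apply `levels` successive doublings of temporal resolution."""
    boxes = [daily_total]
    for _ in range(levels):
        next_boxes = []
        for volume in boxes:
            if volume > 0.0:
                next_boxes.extend(branch(volume, probs, rng))
            else:
                next_boxes.extend((0.0, 0.0))
        boxes = next_boxes
    return boxes


def to_hourly(boxes_45min):
    """45-minute boxes -> equal 15-minute values -> hourly sums."""
    quarter_hours = [v / 3.0 for v in boxes_45min for _ in range(3)]
    return [sum(quarter_hours[k:k + 4]) for k in range(0, len(quarter_hours), 4)]


# One realisation for a daily total of 12.4 mm (illustrative probabilities)
rng = random.Random(42)
probs = {"0/1": 0.3, "1/0": 0.3, "x/(1-x)": 0.4}
hourly = to_hourly(cascade(12.4, probs, levels=5, rng=rng))
```

Because the second weight is computed as the remainder of the first, the daily total is retained in every realisation, while the hourly sequence changes with the seed of the random number generator.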
Areal peak intensities at sub-daily time steps might be overestimated due to this assumption, which limits the universal applicability of this approach. However, this overestimation might be acceptable for some applications such as derived flood frequency analyses for hydrologic design purposes (Haberlandt and Radtke, 2014). A more sophisticated but much more complex approach that has been developed recently (… Haberlandt, 2015, 2016) takes spatial consistency explicitly into consideration.

Redistribution according to another station (P3)
Finally, a third method is supplied that addresses the generally higher network density of precipitation gauges compared to other meteorological variables. If a mixed network including hourly and daily observational sites is considered and if the distance among these stations is small, the relative mass curve of the station recordings at one-hour time steps can be transferred to the other sites for which only daily recordings are available. The values for the target sites are obtained by multiplying the relative mass of the highly resolved station's curve with the daily precipitation depth observed at the target site. This methodology is also applied in the tool IDWP, which is part of the hydrological modelling system WaSiM (Schulla, 2015). The applicability is limited to the period of time covered by recordings at one-hour time steps.
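The transfer of the relative mass curve can be sketched in a few lines. The fallback for days on which the reference station is dry while the target station reports precipitation (equal redistribution) is an assumption of this sketch and is not taken from the text; the function name is hypothetical.

```python
def redistribute_from_reference(reference_hourly, target_daily):
    """Distribute a daily total using the hourly shares of a nearby station."""
    reference_total = sum(reference_hourly)
    n = len(reference_hourly)
    if reference_total <= 0.0:
        # Assumed fallback: the reference station is dry on this day.
        return [target_daily / n] * n
    return [target_daily * value / reference_total for value in reference_hourly]


# Example: the reference station recorded 5 mm within three hours
reference = [0.0] * 10 + [1.0, 3.0, 1.0] + [0.0] * 11
target = redistribute_from_reference(reference, target_daily=8.0)
```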

This section follows the same structure as the methodology section. For each variable, long-term averages of disaggregated and observed time series are presented and evaluated in order to assess the model skill of the disaggregation methods. The time series used for disaggregation represent hourly observations aggregated to daily averages and totals, respectively. Emphasis is put on the prediction of diurnal features since most methods described herein are founded upon assumptions that imply a certain diurnal course for a given variable. This holds especially true for temperature, humidity, wind speed, and radiation. For precipitation, results are compiled and discussed for the cascade model. Due to the involvement of a random number generator in this method, evaluations with respect to model skill require the analysis of multiple runs (realisations).
Not all methods provided by MELODIST are evaluated. We focus on a subset of methods which might be relevant to a broad range of users with respect to typical data availability settings and typical applications. For each variable, the same methodology is applied to all stations listed in Section 2.

To shed light on the model skill in a more quantitative way, statistical parameters have been derived for both the observed and the disaggregated time series (see, e.g., Tab. …). RMSE is a measure of deviations between observed and disaggregated time series on an hour-to-hour basis. Smaller values are generally better than larger values. The correlation coefficient is ideally close to one and describes the coincidence of phase for two series without considering biases. In contrast, NSE can be viewed as a combined measure addressing deviations in terms of biases and shifts in phase. It ranges from negative infinity, indicating low skill, to one, indicating a perfect fit. A value of zero means that the model is as good as applying the average value.
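For reference, the three performance measures can be computed as in the following sketch, which uses their standard definitions (root mean square error, Pearson correlation coefficient, and Nash-Sutcliffe efficiency). The function names are hypothetical.

```python
import math


def rmse(obs, sim):
    """Root mean square error between observed and simulated values."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def pearson_r(obs, sim):
    """Pearson correlation coefficient (NaN if one series is constant)."""
    mean_o = sum(obs) / len(obs)
    mean_s = sum(sim) / len(sim)
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(obs, sim))
    var_o = sum((o - mean_o) ** 2 for o in obs)
    var_s = sum((s - mean_s) ** 2 for s in sim)
    if var_o == 0.0 or var_s == 0.0:
        return float("nan")
    return cov / math.sqrt(var_o * var_s)


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 equals the mean."""
    mean_o = sum(obs) / len(obs)
    residual = sum((o - s) ** 2 for o, s in zip(obs, sim))
    variance = sum((o - mean_o) ** 2 for o in obs)
    return 1.0 - residual / variance


observed = [1.0, 2.0, 3.0, 4.0]
mean_model = [2.5, 2.5, 2.5, 2.5]  # applying the average value gives NSE = 0
```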
In order to gain some insight on how well the distributions of disaggregated time series match the observed ones, histograms for each variable and each site are displayed for both disaggregated and observed values in Fig. 8.

Although only one method is available for temperature (T1), the standard-sine method offers different options to define the boundary conditions of the sine function (see Fig. 3). This method uses minimum and maximum temperature as input data. Here, results using the day length dependent option are presented, where maximum temperature is assumed to occur two hours after solar noon. For Ny-Ålesund, the modified nighttime option was activated as well in order to reliably disaggregate nighttime temperatures during polar nights, when the assumption of a distinct diurnal course does not hold true.

Long-term averages of hourly temperature derived for all sites are compiled in Fig. 9. … application of average values might be sufficient as a disaggregation procedure, which can be explained by the lower impact of radiation on diurnal features of meteorological variables for that site. To conclude, temperature disaggregation based on minimum and maximum temperature should provide reliable estimates. This finding is also supported by the good agreement of the histograms constructed for both disaggregated and observed time series (Fig. 8, 1st column).

Humidity
As for temperature, Fig. 10 depicts the long-term mean of the diurnal course of relative humidity for all stations (H3 model).
The diurnal patterns of relative humidity are reasonably disaggregated through simulating a drop in humidity in the afternoon, which is observed at most stations. However, the agreement is less pronounced than for temperature. It is worth noting that the disaggregation of relative humidity depends on hourly temperature values. For these analyses, the results described for temperature in the previous sections have been applied for the disaggregation of relative humidity. Hence, uncertainties involved in the prior step also contribute to deviations between observation and disaggregation.
A closer look at the statistical evaluations derived for humidity disaggregation as compiled in Tab. 4 shows that the model … The RMSE amounts to 20 %, indicating comparably large differences between observed and disaggregated values even though the mean bias is substantially lower. For all but one station, the correlation coefficient is higher than 0.5. In Ny-Ålesund, a correlation close to zero could be interpreted as inadequate model skill, which is underlined by the negative NSE value. It may be assumed that the generally lower impact of radiation on other meteorological variables would suggest using an equal redistribution of humidity values for that station. However, the model performance achieved for the other stations is better given that the RMSE is lower and r and NSE are higher, respectively. In contrast to temperature, the humidity disaggregation performs best for Rio de Janeiro. To summarise, the disaggregation of humidity is reliable considering the fact that disaggregated temperature time series and only one humidity value per day have been used as input. Hence, minimum and maximum humidity are not preserved by this approach. This finding becomes apparent when considering the mismatch of minimum and maximum humidity reconstructions for some sites (e.g., Tucson, see Fig. 8, 2nd column for further details). These findings confirm previous work that also discussed the accuracy of humidity disaggregation techniques (Waichler and Wigmosta, 2003; Bregaglio et al., 2010). If daily minimum and maximum

Wind speed
Wind speed disaggregation has been accomplished using the modified sine-curve (W2). In Fig. 11, the long-term averages of the diurnal course of wind speed are plotted separately for observed and disaggregated wind speed. In this figure, wind speed is scaled as 'normative' wind speed, i.e., the value for each hour is divided by the mean value. Maximum wind speed, which is typically observed during the afternoon hours, is well represented in the disaggregated time series. Small-scale variability, as discussed in the methodology section, is not reproducible by this approach.
As the mean value is simply redistributed according to a sine function, mean values are exactly reproduced by the disaggregation approach. As already mentioned, variability (i.e., fluctuations) is neglected, resulting in lower predicted standard deviations when compared to the corresponding standard deviations derived for the observed time series. This also becomes evident from the falling limb of the histograms of disaggregated values shown in Fig. 8 (3rd column). If these fluctuations are not relevant for further evaluation, this disaggregation methodology for wind speed has an acceptable model skill, which can be seen from the correlation coefficients and NSE values. Although these values are lower than those derived for temperature, they indicate a good model performance for all sites. The best model skill is achieved for De Bilt, whereas the lowest performance is achieved for Tucson, where a secondary wind speed maximum is observed in the morning. This diurnal pattern might be related to a local wind system that is subject to a change in wind direction and, hence, to a change in wind speed. Such phenomena are not addressed by this method.

Radiation
Even though radiation observations are available for most of the sites investigated in this study, daily mean shortwave radiation is rarely available in the absence of sub-daily time series. One exception is climate model output, which is typically aggregated to daily values. A typical real-world case is, however, a long dataset of sunshine duration recordings.
Therefore, method R2 is applied even though it is only applicable to De Bilt and Ny Ålesund. The diurnal course of mean hourly values derived through averaging the observed and disaggregated datasets is displayed in Fig. 12.
Given that the disaggregation is based on sunshine duration, the model skill can be viewed as very good for both sites. The timing of solar noon radiative fluxes as well as the phase of the disaggregated time series track observations very well, which is also underlined by the performance measures presented in Tab. 6. Deviations between the mean values can be related to uncertainties involved in the Ångström (1924) model, which has been fitted prior to disaggregation for both stations using the data from the calibration period. However, the disaggregated time series show variability similar to that of the observed time series, as expressed by the very similar standard deviations and the agreement of the histograms computed for disaggregated and observed time series (Fig. 8, 4th column). As expected, the RMSE is high relative to the mean value of the time series, since shortwave radiation fluctuates with the presence and absence of clouds, which causes rapid changes even over small increments in time. Notwithstanding these restrictions, the model skill expressed through the correlation coefficient and the NSE can be viewed as very good.
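Fitting an Ångström-type model during a calibration period can be sketched as an ordinary least-squares regression. The sketch assumes the commonly used linear form R/R_ref = a + b (n/N), where n/N is the relative sunshine duration and R/R_ref the radiation relative to a reference value; the exact formulation and reference radiation used in MELODIST are not given in this section, and the function name and data below are synthetic illustrations.

```python
def fit_angstrom(rel_sunshine, rel_radiation):
    """Least-squares fit of rel_radiation = a + b * rel_sunshine.

    rel_sunshine:  daily n/N values (0..1)
    rel_radiation: daily R/R_ref values; the choice of R_ref is an
                   assumption left to the caller.
    """
    n = len(rel_sunshine)
    mean_x = sum(rel_sunshine) / n
    mean_y = sum(rel_radiation) / n
    sxx = sum((x - mean_x) ** 2 for x in rel_sunshine)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(rel_sunshine, rel_radiation))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# Synthetic calibration data generated from a = 0.25, b = 0.5.
sunshine = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
radiation = [0.25 + 0.5 * s for s in sunshine]
a, b = fit_angstrom(sunshine, radiation)
```

With real data the fitted coefficients carry sampling uncertainty, which is one plausible source of the deviations between mean values mentioned above.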

Precipitation
In contrast to the meteorological variables previously described, precipitation has been disaggregated using the cascade model (P2), which is a probabilistic model. As already explained, this change from deterministic to probabilistic methods requires a modified evaluation of model performance. Even though the precipitation total is preserved for each day throughout the disaggregation procedure, the occurrence and sequence of precipitation intensities differ from run to run. For rigorous testing and validation of the method, multiple runs are needed and their results have to be statistically evaluated. Figure 8 (5th column) shows histograms for both disaggregated and observed time series for each station. The comparison reveals that the empirical distributions are similar. The falling limb of the histograms is also reliably reconstructed by the cascade model, for which 100 runs have been considered to compute the histograms.
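The two properties stated here, preservation of the daily total and run-to-run differences, follow from the branching principle of a mass-preserving random cascade. The sketch below is a strongly simplified illustration: it uses constant branching probabilities `p01` and `p10` (hypothetical values), whereas the cascade model of Olsson (1998) estimates them from data, and it does not address how the final boxes are mapped to hourly values in MELODIST.

```python
import random


def cascade_split(total, levels, p01=0.2, p10=0.2, rng=None):
    """Split `total` into 2**levels boxes while preserving the sum.

    p01: probability that all mass goes to the second box (0/1 split)
    p10: probability that all mass goes to the first box (1/0 split)
    Otherwise the mass is divided by a uniform random weight.
    The probabilities are illustrative constants, not fitted values.
    """
    rng = rng or random.Random()
    boxes = [total]
    for _ in range(levels):
        next_boxes = []
        for value in boxes:
            if value == 0.0:
                next_boxes.extend([0.0, 0.0])   # dry boxes stay dry
                continue
            u = rng.random()
            if u < p01:
                w = 0.0
            elif u < p01 + p10:
                w = 1.0
            else:
                w = rng.random()
            next_boxes.extend([value * w, value * (1.0 - w)])
        boxes = next_boxes
    return boxes


# Two runs for the same (made-up) daily total of 12 mm.
run_a = cascade_split(12.0, levels=3, rng=random.Random(1))
run_b = cascade_split(12.0, levels=3, rng=random.Random(2))
```

Each run returns eight values that sum to the daily total, while the sequence of intensities depends on the random draws, which is why statistics over many runs are needed for evaluation.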
In addition to this visual comparison, the evaluation has been carried out according to the validation approaches described by Olsson (1998) and Güntner et al. (2001). Following their ideas, quantile-quantile plots (Q-Q plots) of precipitation intensities are shown in Fig. 13, with close attention paid to the highest 1% of precipitation intensities. Since the autocorrelation structure is not explicitly guaranteed by the cascade model, this feature is also tested (see Fig. 14). As the common performance measures cannot be applied appropriately to random distributions of daily disaggregations, other performance criteria have to be considered. An approach similar to that described by Olsson (1998) was chosen for that reason (see Tab. 7).
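Both checks can be sketched with a few lines of code. The helper names `top_fraction` and `autocorrelation` are hypothetical, and whether dry hours are excluded before selecting the highest 1% is left to the caller, since this section does not specify it.

```python
def top_fraction(values, fraction=0.01):
    """Return the largest `fraction` of values in ascending order."""
    k = max(1, int(len(values) * fraction))
    return sorted(values)[-k:]


def autocorrelation(values, lag):
    """Sample autocorrelation of `values` at the given lag."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values)
    cov = sum((values[i] - mean) * (values[i + lag] - mean)
              for i in range(n - lag))
    return cov / var


def qq_pairs(observed, disaggregated, fraction=0.01):
    """Pair sorted upper-tail intensities for a Q-Q plot.

    Pairing by rank assumes both series have the same length.
    """
    return list(zip(top_fraction(observed, fraction),
                    top_fraction(disaggregated, fraction)))


# Made-up checks: a ramp for the tail selection, an alternating
# wet/dry sequence for the autocorrelation.
tail = top_fraction(list(range(1, 201)))
alternating = [0.0, 1.0] * 50
ac_lag1 = autocorrelation(alternating, lag=1)
```

Points close to the 1:1 line in the Q-Q plot indicate that peak intensities are reproduced, and the autocorrelation function can be compared lag by lag between observed and disaggregated series.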
First, the simulation of peak intensities is studied through comparing observed and disaggregated intensities in a Q-Q plot (Fig. 13). For each station for which precipitation is available the highest 1% of disaggregated intensity values is plotted against Other characteristics that are also relevant for evaluations of sub-daily precipitation characteristics are summarised in Tab. 7. The mean duration of events ranges from 3 to 5 hours and is overestimated for all stations, which was also found by Olsson (1998) and Güntner et al. (2001). In contrast, the mean precipitation total of events derived through disaggregation is on average similar to the respective observed value. This finding holds for all stations. It is evident that this value is higher in the subtropics than in the mid-latitudes. Although the total annual rainfall in Tucson is comparatively small and the number of events per year is low, the average rainfall of events is also higher than in the mid-latitudes. This feature is correctly predicted by the cascade model. The duration of dry periods is also in good agreement with observations. Even though the length of events is over-predicted, the characteristics of the observed precipitation time series are captured very well for each site by the cascade model.
To conclude, the cascade model preserves major characteristics of the observed hourly time series. However, these sub-daily characteristics can only be statistically evaluated due to the probabilistic nature of the approach. precipitation features given that intensities, major characteristics of precipitation events, and the autocorrelation structure of the disaggregated time series are in good agreement with observation.
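Event characteristics of the kind discussed above (mean event duration, mean event total, mean duration of dry periods) can be derived from an hourly series as sketched below. The definition of an event as an uninterrupted run of hours above `threshold` is an assumption made for illustration; the function name and the example series are hypothetical.

```python
def event_statistics(hourly, threshold=0.0):
    """Summarise wet events and dry spells of an hourly series.

    An event is assumed to be a run of consecutive hours with
    precipitation above `threshold`; a dry spell is a run of the
    remaining hours.
    """
    events = []       # (duration in hours, precipitation total)
    dry_spells = []   # durations in hours
    wet_len, wet_sum, dry_len = 0, 0.0, 0
    for value in hourly:
        if value > threshold:
            if dry_len:
                dry_spells.append(dry_len)
                dry_len = 0
            wet_len += 1
            wet_sum += value
        else:
            if wet_len:
                events.append((wet_len, wet_sum))
                wet_len, wet_sum = 0, 0.0
            dry_len += 1
    if wet_len:
        events.append((wet_len, wet_sum))
    if dry_len:
        dry_spells.append(dry_len)

    def mean(items):
        return sum(items) / len(items) if items else None

    return {
        "n_events": len(events),
        "mean_duration": mean([d for d, _ in events]),
        "mean_total": mean([t for _, t in events]),
        "mean_dry_duration": mean(dry_spells),
    }


# Made-up hourly series with two events.
stats = event_statistics([0.0, 1.0, 2.0, 0.0, 0.0, 3.0, 0.0])
```

Applying the same function to observed and disaggregated series, and averaging over multiple cascade runs, gives comparable statistics for both.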

Conclusions and outlook
The application of a simple and easy-to-use toolbox of disaggregation methods has been presented. Most of the methods included in MELODIST are parsimonious with respect to theory and computational costs (disaggregating 5 years of daily precipitation recordings using the cascade model takes less than 4 seconds on a notebook with a 2 GHz i7 CPU). The basic levels of complexity have been chosen keeping practitioners in mind who need a package that is capable of disaggregating all relevant meteorological variables needed for environmental modelling. Available studies on disaggregation often focus on single variables such as precipitation rather than providing a unified framework for disaggregation. However, the presented package can be easily extended by more complex methods available in the literature as it provides basic functionalities for Some of the methods provided by MELODIST are based upon analyses of time series for parameter estimation, which requires a certain quality of data to derive sound parameters for performing the disaggregation runs. In general, it is important to note that data homogeneity might not always apply to long time series, as changes in the instrumentation, microclimate, and processing of data might have caused discontinuities in the time series (see, e.g., Rassmussen et al., 1993). For instance, Maturilli et al. (2013a) describe trends in the Ny Ålesund datasets, which are also tested herein. This is especially important if statistical disaggregation methods are applied that have been tuned for short periods of time only. Moreover, the limited availability of hourly observations underlying the statistics obtained in this study has to be carefully reviewed with respect to representativeness from a climatological point of view. In this study, different stations have been considered to investigate the robustness of methods rather than to draw conclusions in terms of climatic differences.
Homogeneity might also be relevant for the disaggregation of time series that are subject to changes in climate. Ideas to cope with changing climatic conditions in disaggregation approaches are currently being investigated. Two examples relevant in this context for the statistics-based cascade model are a circulation-based parameterization to better predict changing weather patterns related to changing climate (Lisniak et al., 2013) and an intensity-based categorisation (Anis and Rode, 2014). Current research also focuses on the incorporation of the Clausius-Clapeyron relation to better predict rainfall intensities in future climates (Bürger et al., 2014). These studies only address single stations or a limited study area without considering different climates. Hence, the applicability of new methods should also be critically reviewed with respect to transferability.

In contrast to weather generators and dynamical downscaling approaches, physical consistency among the meteorological variables considered in this framework is not inherent in the methodology. This limitation might restrict the methodology to deriving input data only for conceptual models that are not purely physics-based approaches, as the latter are more demanding with respect to this consistency. However, for most conceptual "grey box" models (see, e.g., Refsgaard, 1996) the quality of data provided by these disaggregation methods should be sufficient, as tested in the framework of other model experiments (Waichler and Wigmosta, 2003). A better representation of the dependencies among the most relevant meteorological variables should be addressed explicitly in the future. Moreover, further emphasis should be on spatial consistency in disaggregation as already pursued by some authors (see, e.g., Koutsoyiannis, 2003;. The ongoing research on disaggregation methods underlines the need for sound and robust tools for disaggregating meteorological variables. Even though MELODIST provides robust methods that do not include those very recent developments, it might serve as a tool for both practitioners and scientists. For the latter group, MELODIST could be viewed as a framework for performing future research on disaggregation since new disaggregation methods can be easily plugged in.

AG and VLV -Vorarlberger Landes-Versicherung VaG. We also acknowledge the work of all the institutions that collect meteorological data and share these data with the public, especially those whose data have been used in the framework of this study: Koninklijk Nederlands Mete-da Prefeitura do Rio de Janeiro, National Oceanic and Atmospheric Administration (NOAA) / National Climatic Data Center (NCDC). We would like to thank Hannes Müller for the fruitful discussion on rainfall disaggregation.
His helpful remarks on the discussion paper have been addressed in the final version of this manuscript. Last but not least, we wish to thank Ina Pohle and one anonymous referee for their helpful and constructive reviews which greatly helped to improve the manuscript.

Ångström, A.: Solar and terrestrial radiation. Report to the international commission for solar research on actinometric investigations of solar and atmospheric radiation, Q. J. Roy. Meteor. Soc., 50, 121-126, doi:10.1002/qj.49705021008, 1924.
Anis, M. R. and Rode, M.: A new magnitude category disaggregation approach for temporal high-resolution rainfall intensities, Hydrol.

R1 | Scaling of potential shortwave radiation (Liston and Elder, 2006) | deterministic | no
P2 | Cascade model (Olsson, 1998) | stochastic | yes
P3 | Redistribution according to another station | deterministic | no
X1 | Linear interpolation | deterministic | no

Table 6. Model performance measures for radiation disaggregation (R2 model). xo and xs denote the mean values of observed and disaggregated shortwave radiation, respectively. The standard deviations of the observed (σo) and disaggregated (σs) time series are also specified.