We introduce r.randomwalk, a flexible and multi-functional open-source tool
for backward and forward analyses of mass movement propagation.
r.randomwalk builds on GRASS GIS (Geographic Resources Analysis Support System – Geographic Information System), the R software for statistical computing
and the programming languages Python and C. Using constrained random walks,
mass points are routed from defined release pixels of one to many mass
movements through a digital elevation model until a defined break criterion
is reached. Compared to existing tools, the major innovative features of
r.randomwalk are (i) multiple break criteria can be combined to compute an
impact indicator score; (ii) the uncertainties of break criteria can be
included by performing multiple parallel computations with randomized
parameter sets, resulting in an impact indicator index in the range 0–1;
(iii) built-in functions for validation and visualization of the results are
provided; (iv) observed landslides can be back analysed to derive the
density distribution of the observed angles of reach. This distribution can
be employed to compute impact probabilities for each pixel. Further, impact
indicator scores and probabilities can be combined with release indicator
scores or probabilities, and with exposure indicator scores. We demonstrate
the key functionalities of r.randomwalk for (i) a single event, the Acheron rock avalanche in New Zealand; (ii) landslides in a
61.5 km

Mass movement processes such as landslides, debris flows, rock avalanches, or snow avalanches may cause damage or even disasters when interacting with society. Computer models predicting travel distances, hazardous areas, impact energies, or travel times may help society to mitigate the effects of such processes and, consequently, to reduce the risk and the losses (Hungr et al., 2005).

Physically based dynamic models are used for detailed analyses of specific events or situations (e.g., Savage and Hutter, 1989; Takahashi et al., 1992; Iverson, 1997; Pudasaini and Hutter, 2007; McDougall and Hungr, 2004, 2005; Pitman and Le, 2005; Christen et al., 2010a, b; Mergili et al., 2012b; Pudasaini, 2012; Hergarten and Robl, 2015; Mergili et al., 2015). Since the processes are complex in detail and the input parameters are uncertain, simplified conceptual models for the motion of mass flows are today used in combination with GIS (Geographic Information System). These models may be used for single events. However, they are particularly useful to indicate potential impact areas at broader scales. Hypothetical mass points are routed from a release pixel through a digital elevation model (DEM) until a defined break criterion is reached. Monte Carlo techniques (random walks, Pearson, 1905; Gamma, 2000) or multiple flow direction algorithms (Horton et al., 2013) are employed to simulate the lateral spreading of the flow.

The break criteria often consist of threshold values of the angle of reach (i.e., the average slope of the path) or horizontal and vertical distances (Lied and Bakkehøi, 1980; Vandre, 1985; McClung and Lied, 1987; Burton and Bathurst, 1998; Corominas et al., 2003; Haeberli, 1983; Zimmermann et al., 1997; Huggel et al., 2002, 2003, 2004a, b), sometimes related to volume (Rickenmann, 1999; Scheidl and Rickenmann, 2010). However, those relationships usually display a large degree of scatter. Further, key parameters for design issues, such as impact pressures, are not provided (Hungr et al., 2005).
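As a minimal illustration of an angle-of-reach break criterion, the following Python sketch computes the average slope of a path and compares it with a threshold. The function names and the representation of the path as (x, y, z) tuples are assumptions made for this example only; the horizontal distance is measured along the path, which is one possible reading of "average slope of the path".

```python
import math


def angle_of_reach(path):
    """Return the average slope (degrees) of a path of (x, y, z) tuples.

    Computed as arctan(vertical drop / horizontal length along the path).
    """
    horizontal = 0.0
    for (x0, y0, _), (x1, y1, _) in zip(path, path[1:]):
        horizontal += math.hypot(x1 - x0, y1 - y0)
    if horizontal == 0.0:
        raise ValueError("path has no horizontal extent")
    drop = path[0][2] - path[-1][2]  # elevation loss from start to end
    return math.degrees(math.atan2(drop, horizontal))


def break_reached(path, threshold_deg):
    """True once the average slope of the path is at or below the threshold."""
    return angle_of_reach(path) <= threshold_deg
```

For a path that drops 100 m over a horizontal distance of 100 m, the sketch yields an angle of reach of 45°, so a threshold of 30° would not yet stop the motion.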

Some approaches include simplified physically based models going back to the mass flow model of Voellmy (1955), relating the shear traction to the square of the velocity and assuming an additional Coulomb friction effect (Pudasaini and Hutter, 2007). They consider only the centre of the flowing mass, but not its deformation and the spatial distribution of the flow variables. This type of model is mainly used for snow avalanches and debris flows (Perla et al., 1980; Gamma, 2000; Wichmann and Becht, 2003; Mergili et al., 2012a; Horton et al., 2013).

Various – mostly open-source – software tools for conceptual modelling of
mass movements (mainly flows) at medium or broad scales are available (e.g.,
Gamma, 2000; Wichmann and Becht, 2003; Mergili et al., 2012a; Horton et al.,
2013). However, most of these tools lack substantial features: (i) they are limited to one single type of break criterion; (ii) they do not
allow one to directly account for the uncertainty of the break criteria; (iii) they do not allow one to back calculate the statistics of a set of observed mass
movements; and (iv) they do not offer built-in functionalities for evaluating
the model results against observations. Consequently, the key objectives of
the present study are

to introduce r.randomwalk, a freely available, comprehensive and flexible tool for routing mass movements;

to demonstrate the various functionalities of r.randomwalk, particularly in terms of overcoming the issues (i)–(iv);

to discuss the potentials and limitations of this tool.

Next, we will describe the r.randomwalk software tool (Sect. 2). Furthermore, we will present the test areas and the results (Sect. 3). Finally, we will discuss the findings (Sect. 4) and conclude with some key messages of the work (Sect. 5).

r.randomwalk is implemented as a raster module of the open-source software
package GRASS (Geographic Resources Analysis Support System) GIS 7 (Neteler and Mitasova, 2007; GRASS Development Team,
2015). We use the Python programming language for data management,
pre-processing and post-processing tasks (module r.randomwalk). The routing
procedure (see Sect. 2.2–2.4) is written in the C programming language
(sub-module r.randomwalk.main). The R software environment for statistical
computing and graphics (R Core Team, 2015) is employed for built-in
validation and visualization functions (see Sect. 2.5). Multiple model runs
can be executed in parallel, which exploits all available computational
cores and speeds up the analysis. The
parallelization procedure is implemented at the Python level (analogous to
the way described in Mergili et al., 2014): the module r.randomwalk produces
a batch file for each model run. This batch file calls the Python-based
sub-module r.randomwalk.mult, which is then used to launch r.randomwalk.main
with the specific parameters for the associated model run. For this purpose,
the Python library “Threading”, a higher-level threading interface, and the
Python module “Queue”, a class helping to block execution until all the
items in the queue have been processed, are used. Parallel processing
reduces the computational time in the following contexts:

Analyses with multiple random subsets of the release areas or coordinates. In each model run, one subset is used for back calculating the probability density function (PDF) of the angle of reach, and the other subset is employed for validating the distribution of the impact probability derived with this PDF against the observed deposition areas.

Analyses with multiple combinations of input parameters varied in a controlled or randomized way, enabling one to consider parameter uncertainties and to explore parameter sensitivity.
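The thread-and-queue pattern described above can be sketched as follows. This is a generic illustration written for Python 3, where the module is named `queue` (it was `Queue` in Python 2); it is not the code of r.randomwalk.mult. The names `run_parallel` and `worker_fn` are hypothetical, and `worker_fn` stands in for whatever launches a single model run.

```python
import queue
import threading


def run_parallel(jobs, worker_fn, n_workers=4):
    """Run worker_fn(job) for every job using a pool of threads.

    Returns a dict mapping the index of each job to its result, or to the
    exception raised by that job.
    """
    tasks = queue.Queue()
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            item = tasks.get()
            if item is None:  # sentinel: no more work for this thread
                tasks.task_done()
                break
            index, job = item
            try:
                outcome = worker_fn(job)
            except Exception as exc:  # keep the pool alive if one run fails
                outcome = exc
            with lock:
                results[index] = outcome
            tasks.task_done()

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for thread in threads:
        thread.start()
    for index, job in enumerate(jobs):
        tasks.put((index, job))
    for _ in threads:
        tasks.put(None)  # one sentinel per worker thread
    tasks.join()  # block until every queued item has been processed
    for thread in threads:
        thread.join()
    return results
```

In a setting like the one described here, `worker_fn` would typically start an external process for one parameter set, so that the threads merely wait for the model runs while the operating system distributes these runs over the available cores.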

r.randomwalk was developed and tested with Ubuntu 12.04 LTS and is expected to also work on other UNIX systems. A simple user interface is available. However, the tool may be started more efficiently through command line parameters, enabling straightforward batching at the shell script level. This feature facilitates model testing, the combination with other GRASS GIS modules and the consideration of process chains (i.e., using the output of one analysis as the input for the next one). The logical framework is illustrated in Fig. 1; the key variables used in r.randomwalk are summarized in Table 1.

Logical framework of r.randomwalk. Only those components covered in the present article are shown.

All tests (see Sect. 3) are performed on an Intel® Core i7 975
with 3.33 GHz and 16 GB RAM (DDR3, PC3-1333 MHz), exploiting a maximum of eight
cores through hyperthreading.

The term random walk refers to a Monte Carlo approach for routing an object through any type of space. The term was introduced by Pearson (1905). Constrained random walk approaches are used for routing mass movements such as debris flows through digital elevation models (DEMs), e.g. by Gamma (2000), Wichmann and Becht (2003), Mergili et al. (2012a), and Gruber and Mergili (2013). Such methods enable a certain degree of spreading of the movement by also considering routing directions other than the steepest descent. This avoids the concentration of flows, or of any other type of mass movement, along linear features, which would not be realistic for debris flows, snow avalanches, or other types of mass movements. However, the routing is constrained or weighted by factors such as the slope or the persistence of the flow direction. An alternative to constrained random walk routing would be a multiple flow direction algorithm (Horton et al., 2013).
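The general idea of a constrained random walk can be illustrated with a short Python sketch. It is deliberately simplified and does not reproduce the routing rules of r.randomwalk: the constraint (only downslope neighbours), the weighting (probability proportional to the slope), the threshold on the average slope of the path, and all names such as `cell_size` and `min_angle_deg` are assumptions made for this example. The DEM is a plain list of rows.

```python
import math
import random

# Offsets (row, column) of the eight neighbours of a raster cell.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
              (0, 1), (1, -1), (1, 0), (1, 1)]


def random_walk(dem, start, cell_size, min_angle_deg, rng, max_steps=10000):
    """Route one mass point downslope and return the list of visited cells.

    The walk stops when a neighbour lies outside the raster, when no lower
    neighbour exists, or when the average slope of the path would drop
    below min_angle_deg.
    """
    rows, cols = len(dem), len(dem[0])
    r, c = start
    path = [(r, c)]
    travelled = 0.0  # horizontal distance along the path
    for _ in range(max_steps):
        candidates, weights = [], []
        for dr, dc in NEIGHBOURS:
            nr, nc = r + dr, c + dc
            if not (0 <= nr < rows and 0 <= nc < cols):
                return path  # a neighbour lies outside the study area
            step = cell_size * math.hypot(dr, dc)
            slope = (dem[r][c] - dem[nr][nc]) / step
            if slope > 0.0:  # constraint: downslope moves only
                candidates.append((nr, nc, step))
                weights.append(slope)  # steeper neighbours are more likely
        if not candidates:
            return path  # pit or flat area
        nr, nc, step = rng.choices(candidates, weights=weights, k=1)[0]
        travelled += step
        drop = dem[start[0]][start[1]] - dem[nr][nc]
        if math.degrees(math.atan2(drop, travelled)) < min_angle_deg:
            return path  # break criterion reached
        r, c = nr, nc
        path.append((r, c))
    return path


def impact_frequency(dem, start, cell_size, min_angle_deg, n_walks, seed=0):
    """Count, per cell, how many of n_walks random walks pass through it."""
    rng = random.Random(seed)
    counts = [[0] * len(dem[0]) for _ in dem]
    for _ in range(n_walks):
        walk = random_walk(dem, start, cell_size, min_angle_deg, rng)
        for r, c in set(walk):
            counts[r][c] += 1
    return counts
```

Because each step is drawn at random, repeated walks from the same release cell follow different paths, and the per-cell counts show how the set of walks spreads laterally instead of following the single steepest descent.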

Control length

In the context of r.randomwalk, each random walk routes a hypothetical mass
point from a release pixel through the DEM until a break criterion is
reached (see Sect. 2.3). A large set of random walks is required for each
mass point in order to achieve a satisfactory coverage of the possible impact
area. r.randomwalk is designed for

one set of random walks for one mass movement, starting from a defined set of coordinates;

multiple sets of random walks for one mass movement, one set starting from each pixel of the release area;

sets of random walks for multiple mass movements in a study area (either starting from one set of coordinates per mass movement, or from all pixels defined as release areas);

one set of random walks starting from each pixel in the study area.

Overlay rules for different random walks and sets of random walks are applied (see Sect. 2.4).

Summary of the key variables used in r.randomwalk.

During the pixel-to-pixel routing procedure, turns of > 90

In order to constrain upward movements, a user-defined maximum vertical
run-up height

Certain types of mass flows (i.e., those with high viscosity) hardly change
their flow direction sharply. The user-defined horizontal control distance

Possibilities to define the break criteria. The flags provided
through the command line or the user interface define the type of break
criterion. RC is release coordinates (release from highest points of
release areas), RP is release pixels (release from all pixels within
release areas),

The probability

The break criteria for the random walks (see Sect. 2.3) are directly or
indirectly related to the travel distance

Each random walk continues until at least one neighbour pixel is outside the
study area, or until the user-defined break criterion is fulfilled. The
break criteria are the key parameters for estimating the mass flow impact
areas and can be defined in various ways (Table 2):

The angle of reach

Empirical–statistical relationships or the semi-deterministic model may be
applied in a large number of parallel computations with randomized values of
the parameters

An impact probability raster map

If an inventory of events is available, the observed impact areas may be
back calculated by routing each random walk until it leaves the observed
impact area of the corresponding mass movement. This mode can be used to
explore the statistical distribution of

Types of rules and relationships supported by r.randomwalk.

The overlay of individual
random walks operates at two levels:

Random walks of the same mass point: impact frequency (IF) is increased by 1 for each random walk
predicting an impact. IIS is increased by 1 for each model where at least 1
random walk predicts an impact. The average angle of path – and therefore
also

Sets of random walks for different mass points: the values of IF for all
random walks impacting a pixel are just added up whilst the maximum of IIS is
applied to each pixel. The issue gets more complex when it comes to

The resulting maps of

r.randomwalk includes three possibilities for validation of the model results. All three build on the availability of a raster map of the observed deposition area of the mass movement(s) under investigation. All parts of the observed impact areas outside of the observed deposition areas are set to no data (Fig. 3).
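A pixel-based comparison of this kind can be sketched in Python as follows. The sketch assumes rasters stored as lists of rows, observed values of 1 (deposition) and 0 (no deposition), and `None` for no data; the function names and the trapezoidal integration of the area under the ROC curve are choices made for this illustration and are not taken from the built-in R functions of r.randomwalk.

```python
def confusion_counts(predicted, observed):
    """Count TP, TN, FP, FN over two equally shaped 0/1 rasters.

    Pixels whose observed value is None (no data) are skipped.
    """
    tp = tn = fp = fn = 0
    for pred_row, obs_row in zip(predicted, observed):
        for pred, obs in zip(pred_row, obs_row):
            if obs is None:
                continue
            if pred and obs:
                tp += 1
            elif pred and not obs:
                fp += 1
            elif obs:
                fn += 1
            else:
                tn += 1
    return tp, tn, fp, fn


def roc_auc(score, observed, thresholds):
    """Return the ROC points (FPR, TPR) and the area under the curve."""
    points = [(0.0, 0.0), (1.0, 1.0)]
    for threshold in thresholds:
        predicted = [[1 if s >= threshold else 0 for s in row] for row in score]
        tp, tn, fp, fn = confusion_counts(predicted, observed)
        if tp + fn == 0 or fp + tn == 0:
            raise ValueError("need observed positives and observed negatives")
        points.append((fp / (fp + tn), tp / (tp + fn)))
    points.sort()
    # Trapezoidal rule over the sorted ROC points.
    auc = sum((x1 - x0) * (y0 + y1) / 2.0
              for (x0, y0), (x1, y1) in zip(points, points[1:]))
    return points, auc
```

An area under the curve close to 1 indicates that high scores coincide with observed deposition, whereas a value of about 0.5 corresponds to a prediction no better than random.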

Model validation with an ROC plot, relating the false positive
rate

For IIS, the true positive (TP), true negative (TN), false positive (FP), and
false negative (FN) predictions are counted on the basis of pixels and put
in relation. All pixels with IIS

ROC (receiver operating characteristics) plots are produced for III or

If only one mass movement is considered, a longitudinal profile may be
defined by a set of coordinates of the profile vertices. The observed and
predicted (IIS

The Acheron rock avalanche in Canterbury, New Zealand (Fig. 4), was
triggered approx. 1100 years BP (Smith et al., 2006). Within the present
study, the release volume,

We use this case study for demonstrating how to compute the impact indicator
index III from an elevation map, the release area, and the release volume.
Before doing so, we have to analyse the influence of the pixel size and the
parameters

III is computed by executing r.randomwalk 100 times, with the parameter
values optimized according to Table 4. We explore an empirical–statistical
relationship for

Acheron rock avalanche.

Empirical–statistical relationship relating the angle of reach

Figure 6 summarizes the findings of tests 1–3 (see Table 4). Test 1
leads to the expected result that the predicted impact area increases with
the number of random walks. However, the predicted impact area is also a
function of the pixel size: with larger pixels, fewer random walks are needed
to cover an area of similar size than with smaller pixels. Figure 6a further
indicates that the possible impact area is not fully covered even at 10

Tests of the parameters

Test criteria:

Results of the tests 1–3 (number of test indicated in the yellow
circle). Number of random walks plotted against

On the other hand, the quality of the prediction in terms of AUC

Sensitivity of impact area and AUC

Figure 6c illustrates that, at

On the other hand, the value of

Within the tested ranges of parameter values, the quality of the prediction
is highest at values of

Impact indicator score for the Acheron rock avalanche.

Figures 6 and 7 indicate that the initial values of

III was generated within a computational time of 188 s.

Between 7 and 9 August 2009, Typhoon Morakot struck Taiwan and triggered enormous landslides, causing significant land cover change (Fig. 9). More than 22 000 landslides were recorded in southern Taiwan (Lin et al., 2011). One of the hot spots of mass wasting was the Kao Ping Watershed (Wu et al., 2011), where the extremely heavy rainfall (in total, more than 2000 mm depth and 90 h duration) triggered a catastrophic landslide in the Hsiaolin Village (Kuo et al., 2013).

Location, terrain and landslide inventory of the Kao Ping Watershed, Taiwan. Comparison of the satellite images illustrates the landslide-induced land cover changes associated with the Typhoon Morakot. The landslide inventory builds on the interpretation of the FORMOSAT-2 imagery.

We consider a 61.5 km

A set of random walks (

After completing all random walks for the study area, the statistical
distribution of

We perform a forward analysis of

Steps 2 and 3 are repeated for 100 randomly selected subsets (parallel
processing is applied). The final map of

We refer to this workflow as test 1 and repeat the analysis, starting
random walks not only from the release points but also from all the pixels
within the observed release areas (test 2). This means that the CDF is
derived from a much larger sample of data than when considering only one
point per landslide for starting random walks. We exclude all sets of random
walks yielding

Starting sets of 10

Each of the impact probabilities shown in Fig. 11 represents the overlay of
100 analyses where random sets of 80 % of the landslides are used for
deriving the CDF and the remaining 20 % are used for computing the impact
probabilities. The maps illustrate the maximum values of

Histograms, probability densities, and cumulative densities of

The prediction quality is tested for each of the 100 model runs for the two
tests, producing sets of 100 ROC curves (Fig. 12).
AUC

In contrast, the procedures demonstrated in the two tests vary strongly in
their scope of applicability. We have demonstrated the methodologies by
back calculating observed landslides. As soon as this is done, one may go
one step further:

The methodology shown in test 1 can be employed to make forward predictions for defined expected future landslides, given that a sufficient set of observed landslides of similar behaviour is available to derive the CDF.

The methodology demonstrated in test 2 can be used in combination with maps of landslide release probability to explore the composite probability of a landslide impact (see Sects. 2.4 and 4).
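The separation of the inventory into a subset for deriving the CDF and a subset for validation, and the use of the CDF for a forward prediction, can be sketched in Python as follows. The sketch uses an empirical CDF and assumes that the impact probability of a pixel equals the cumulative density at the average slope of the path leading to that pixel, i.e. the fraction of observed events with an angle of reach at or below this value. This mapping, the function names, and the default 80 % fraction are assumptions made for this illustration and do not reproduce the r.randomwalk code.

```python
import bisect
import random


def split_events(events, fraction=0.8, seed=0):
    """Randomly split events into a calibration and a validation subset."""
    rng = random.Random(seed)
    shuffled = list(events)
    rng.shuffle(shuffled)
    cut = int(round(fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def empirical_cdf(samples):
    """Return a function x -> fraction of samples that are <= x."""
    ordered = sorted(samples)
    n = len(ordered)
    if n == 0:
        raise ValueError("at least one sample is required")

    def cdf(x):
        return bisect.bisect_right(ordered, x) / n

    return cdf


def impact_probability(path_angle_deg, cdf):
    """Probability that an event travels at least as far as the pixel.

    Assumption: an event reaches the pixel if its angle of reach is not
    larger than the average slope of the path to the pixel.
    """
    return cdf(path_angle_deg)
```

With observed angles of reach of 20°, 25°, 30°, and 35°, a pixel reached at an average path slope of 30° would receive an impact probability of 0.75 under these assumptions, since three of the four events travelled at least that far in relative terms.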

Impact probability in the range 0–1.

ROC plots illustrating the prediction quality of

In either case the statistics (see Fig. 10) have to be derived with the same
type of approach later used for producing the

Like most mountain areas worldwide, the Pamir of Tajikistan experiences a significant retreat of the glaciers. One consequence is the formation and growth of lakes, some of which are subject to glacial lake outburst floods (GLOFs), which may evolve into destructive debris flows (Mergili and Schneider, 2011; Mergili et al., 2013; Gruber and Mergili, 2013). No records of historic GLOFs in the test area are known to the authors. However, in August 2002 a GLOF in the nearby Shakhdara Valley evolved into a debris flow, which destroyed the village of Dasht, claiming dozens of lives (Mergili et al., 2011).

The frequency of such events is low and historical data are sparse. Consequently, possible travel distances of GLOFs cannot be derived in a purely statistical way. Instead, we have to use published empirical–statistical relationships and simple rules to produce an impact indicator score (IIS) map.
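An impact indicator score of this kind counts, for each pixel, how many rules or relationships predict an impact. A minimal Python sketch, assuming one 0/1 raster (a list of rows) per rule and using a hypothetical function name, could look as follows:

```python
def impact_indicator_score(impact_maps):
    """Sum equally shaped 0/1 rasters, one raster per rule or relationship."""
    if not impact_maps:
        raise ValueError("at least one impact map is required")
    rows, cols = len(impact_maps[0]), len(impact_maps[0][0])
    score = [[0] * cols for _ in range(rows)]
    for impact_map in impact_maps:
        if len(impact_map) != rows or any(len(row) != cols for row in impact_map):
            raise ValueError("all impact maps must have the same shape")
        for r in range(rows):
            for c in range(cols):
                if impact_map[r][c]:
                    score[r][c] += 1  # this rule predicts an impact here
    return score
```

A pixel with a score equal to the number of rules is predicted as impacted by all of them, whereas a score of 0 means that no rule predicts an impact.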

We compute IIS with regard to GLOFs for a 2106 km

A set of random walks (

Figure 14 illustrates the possible impact areas of GLOFs in the Gunt Valley study area according to the relationships listed in Table 5.

Figure 14a shows the impact indicator score IIS, i.e., the number of relationships
predicting an impact, resulting from test 1 (rule 1 applied with

The test area in the Gunt Valley, Tajikistan.

Empirical–statistical relationships and simple rules used for computing the IIS of GLOFs in the Gunt Valley (see Table 3).

Note that Fig. 14 only indicates the tendency of an already released GLOF to
impact certain pixels. It does not provide any information on the
susceptibility of a certain lake to produce a GLOF at all. Earlier,
Mergili and Schneider (2011) and Gruber and Mergili (2013) have attempted to
combine GLOF release indicators with impact indicators and land cover maps
to generate hazard and risk indicator maps. However, the results of their
studies may underestimate the possible impact areas as the travel distance
was computed on a pixel-to-pixel basis, possibly yielding too low values of

The robustness and appropriateness of the rules and relationships for
low-frequency events, such as GLOFs (see Table 5), are questionable. The rules
building on a unique value of

We have measured computational times of 1520 s for test 1 and 1556 s for test 2.

Possible GLOF impact areas in the Gunt Valley, Tajikistan.

Whilst conceptual tools are commonly applied for routing mass movements at medium and broad scales, most of them use single values or rules as break criteria, disregarding the high degree of uncertainty (e.g., Gamma, 2000; Wichmann and Becht, 2003; Huggel et al., 2002; Horton et al., 2013; Blahut et al., 2010). r.randomwalk introduces a set of tools to deal with uncertain break criteria in a flexible way, depending on the quality of rules or relationships available. In general, empirical–statistical relationships represent rough simplifications as mass movement processes may also stop when reaching valleys of higher order, run against opposite slopes or lose energy when bending sharply. However, relatively robust rules or relationships exist for the most common types of processes such as rock avalanches (Scheidegger, 1973; see Fig. 5) or debris flows (Rickenmann, 1999). They build on data sets large enough to derive meaningful envelopes and to compute impact indicator indices with r.randomwalk. Relationships for less frequent types of processes are less robust, as was illustrated for GLOFs (Haeberli, 1983; Zimmermann et al., 1997; Huggel et al., 2002; Huggel, 2004; see Sect. 3.3.2). In such cases we recommend computing impact indicator scores building on more than one model, as shown by Gruber and Mergili (2013) and in the present work. Impact indicator indices and scores are mainly useful for anticipating the possible impact area of expected single events (see Sect. 3.1.2), or for application at broader scales (see Sect. 3.3.2).

The impact probability is useful for predicting possible impact areas of
mass movements in areas where many events are documented, but the volumes of
possible future events are not known. Whilst in the present paper it was
demonstrated how to compute impact probabilities related to observed release
areas, r.randomwalk also includes the option to combine the impact
probability with the release probability

The sensitivity of r.randomwalk to variations of the parameters

Overestimating the travel distance at a certain pixel is avoided by choosing
sufficiently high values of

We have demonstrated how to estimate the prediction quality of III and

r.randomwalk includes a break criterion building on the two-parameter friction model of Perla et al. (1980) (see Sect. 1 and Table 3), which can be used to compute flow velocities (e.g., Wichmann and Becht, 2003; Mergili et al., 2012a; Horton et al., 2013). Evaluating this functionality has to build on (i) specific strategies for the sensitivity analysis and optimization of multiple parameters and (ii) a sound comparison with the outcome of physically based models. This effort will be presented in a separate article (Krenn et al., 2015). Further, the parameter sensitivity and optimization code AIMEC (Fischer, 2013) can be directly coupled to r.randomwalk.

We have introduced the open-source GIS tool r.randomwalk, designed for conceptual modelling of the propagation of mass movements. r.randomwalk offers built-in functions for considering uncertainties and for validation. Employing a set of three contrasting test areas, we have demonstrated (i) the possibility to combine results obtained with various break criteria into one impact indicator score; (ii) the option to exploit multiple computational cores for combining the results obtained with many randomized parameter combinations into an impact indicator index; (iii) the possibility to back calculate the CDF of the angles of reach of observed landslides, and to use this CDF to make forward predictions of the impact probability; and (iv) integrated functions for the validation and visualization of the results. This includes strategies to properly separate the data sets for parameter optimization and model validation.

We have further shown that controls for smoothing of the flow path and the avoidance of circular flows have to be introduced to avoid underestimating travel distances and impact areas. The number of random walks executed for each mass point and the pixel size influence the level of conservativeness of the results rather than the quality of the prediction. The scope of applicability of r.randomwalk strongly depends on the availability of robust break criteria and on the availability of reference data for evaluation.

The model codes, a user manual, the scripts used for starting the tests
presented in Sect. 3 and some of the test data are available at

The work was conducted as part of the international cooperation project “A GIS simulation model for avalanche and debris flows” funded by the Austrian Science Fund (FWF) and the German Research Foundation (DFG). Further, the support of Massimiliano Alvioli, Matthias Benedikt, Yi-Chin Chen, Ivan Marchesini, and Tim Davies is acknowledged.

Edited by: T. Poulet