CranSLIK v2.0: improving the stochastic prediction of oil spill transport and fate using approximation methods

Abstract. Oil spill models are used to forecast the transport and fate of oil after it has been released. CranSLIK is a model that predicts the movement and spread of a surface oil spill at sea via a stochastic approach. The aim of this work is to identify parameters that can further improve the forecasting algorithms and expand the functionality of CranSLIK, while maintaining the run-time efficiency of the method. The results from multiple simulations performed using the operational, validated oil spill model, MEDSLIK-II, were analysed using multiple regression in order to identify improvements which could be incorporated into CranSLIK. This has led to a revised model, namely CranSLIK v2.0, which was validated against MEDSLIK-II forecasts for real oil spill cases. The new version of CranSLIK demonstrated significant forecasting improvements by capturing the oil spill accurately in real validation cases and also proved capable of simulating a broader range of oil spill scenarios.


Introduction
Oil spills can have damaging effects on the environment and also on human activities and infrastructure such as fishing, recreation, harbours and power plants.Adding to the tragic direct fatalities and injuries caused, an oil spill accident can also have an adverse long-term impact on human health.Oil spill accidents can lead to severe financial implications too.Costs emerge not only with the level of damage caused, but also during the implementation of the response strategy and subsequently during the restoration phase.The National Commission on the BP Deepwater Horizon Oil Spill and Off-shore Drilling prepared a very comprehensive and detailed report for the US President about the Deepwater Horizon Spill, where it is reported (Graham et al., 2011, p. 210) that just the cost for the gulf restoration will require USD 15 billion to USD 20 billion.For all these reasons, it is crucial that oil spill planning, preparedness, and response requirements are in place to deal with any eventuality.Consequently, considerable effort and resources are expended on mitigating these effects, including the planning of response strategies.These contingency plans typically identify appropriate protective and clean-up measures for different scenarios and plan the response operations for fast, efficient execution and targeting of those areas most at risk from the oil pollution.
Planning at both the strategic and operational level can benefit from the use of computerised models which predict the path and spread of spilt oil and changes in its state.A substantial amount of money has been spent by governments and the oil industry in an effort to develop the capability to predict the fate of spilt oil.Several models have been developed over the years (Reed et al., 1999), some relying on commercial software (e.g.Li et al., 2013), while others are developed using in-house expertise and coding (e.g.De Dominicis et al., 2013a, b).While the degree of sophistication varies and different methodologies can be employed to account for the involved processes, these models typically solve, using numerical methods, partial differential transport equations for the advection-diffusion of the oil and incorporate submodels to simulate a selection of the processes that operate on the oil such as evaporation, emulsification, dispersion and so on (Fingas, 2011).Figure 1   as weathering (MEDESS-4MS, 2014;ITOPF, 2014a).These models can be quite complex and computationally expensive, which means that faster-response models using less resourceintensive, approximation methods potentially have a complementary role to play in preliminary scenario planning when it is not necessary to employ the full capabilities of a more comprehensive model.A fast-response model is particularly appropriate for stochastic methods such as Monte Carlo simulation because the short computational time allows many simulations to be run quickly, and the results can then be processed to predict the likelihood of several possible outcomes.Another advantage of this approach is that it can efficiently encapsulate the uncertainty that arises as a result of the stochastic nature of certain variables, such as current velocity, which cannot be known beforehand.This is potentially very useful during a response operation, because it provides planners with a richer forecast on which to base decisions than a simple "best guess" of the trajectory that the oil slick will follow.
The purpose of the CranSLIK oil spill model is to fulfil this responsive role.The first version of this model, namely CranSLIK v1.0, is described in Snow et al. (2014).

Aims
The primary objectives of this study are to investigate whether the forecasting accuracy of CranSLIK v1.0 can be improved by taking into account additional factors -specifically sub-surface currents, oil density and sea surface temperature -and also to extend and validate the model's functionality to a greater range of scenarios.Collecting comprehensive data on real oil spills in the field is difficult and such data sets are not readily available.For this reason, CranS-LIK bases its development and validation on results from MEDSLIK-II (De Dominicis et al., 2013b), which is a currently operational Mediterranean oil spill model that is supported by a consortium of four European institutions.
Data from two real oil spill cases are available from the MEDSLIK-II team.The first case is an oil spill of 680 t of crude oil that occurred approximately 130 km off the coast of Algeria in 2008 and was studied by De Dominicis et al. (2013a).The second case is a spill in 2006 from a coastal power station at Jieh in Lebanon and was the subject of a study by Coppini et al. (2011).We validate our methodology in the context of these two real case scenarios.
The key steps of this work were as follows.
1. Identify additional parameters which could be significant for short-term and long-term oil spill prediction.
2. Create samples of input parameter values.
3. Run the MEDSLIK-II simulation using the samples as inputs.
4. Based on the MEDSLIK-II output, fit regression models to map the inputs to the response variables.
5. Incorporate the regression models into the CranSLIK prediction code.
6. Test the developed code against MEDSLIK-II forecasts for real scenarios, by running both models using identical inputs and comparing the results.
2 Stochastic and deterministic modelling 2.1 CranSLIK v1.0 CranSLIK v1.0 forecasts the transport, shape and size of an oil slick in hourly time steps after instantaneous release of the oil at a point in space.
The trajectory of the slick's centre of mass is predicted using analyses of forecasts of wind and surface current velocity which are provided by atmospheric and oceanographic models.Wind forcing, i.e. the wind velocity components at 10 m above the sea surface, is provided by meteorological models, while currents and temperature are provided by oceanographic models.The atmospheric forcing is provided by the European Centre for Medium-Range Weather Forecasts (ECMWF), with 0.25 • space and 6 h temporal resolution.The current velocities used in this work come from the Mediterranean Forecasting System (MFS) described in Pinardi et al. (2003) and Pinardi and Coppini (2010).The MFS system is composed of an ocean general circulation model (OGCM) at 6.5 km horizontal resolution and 72 vertical levels (Tonani et al., 2008;Oddo et al., 2009).Every day MFS produces forecasts of temperature, salinity, intensity and direction of currents for the next 10 days.Once a week, an assimilation scheme, as described in Dobricic and Pinardi (2008), corrects the model's initial guess with all the available in situ and satellite observations, producing analyses that are initial conditions for 10-day ocean current forecasts.The modelled currents and wind fields can be affected by uncertainties that arise from model initial conditions, boundaries, forcing fields, parameterisations, etc.The hourly mean analyses are used to eliminate the additional uncertainty connected with forecasts for both atmospheric and oceanographic input data.Analyses using the output of the same oceanographic and atmospheric models have been used for this investigation.
CranSLIK v1.0 can be run in both deterministic and stochastic mode.The former uses the wind and current forecasts exactly as output by the oceanographic and atmospheric models and produces a single forecast, whereas stochastic mode recognises that there is inherent uncertainty in the forecasts and runs the model as a Monte Carlo simulation.For a Monte Carlo run, the wind and current forecasts are sampled randomly from representative probability distributions.Based on the results of multiple runs using different samples, the results can be analysed as a range of possible outcomes with associated probability estimates.This stochastic capability is believed to be one of CranSLIK's key benefits and is made practical by the speed at which it runs.
In CranSLIK v1.0 the shape of the slick is modelled as circular and the radius forecast is based on the mass of the spill and the age of the slick.Validation of the model was carried out using the point-mass Algeria test case which is described in Sect.4.1.
CranSLIK v1.0 established the methodology and demonstrated the potential of using approximation methods with stochastic capability.However, it was recognised that there was scope for developing the model further, because its algorithms had been limited to using wind and surface current velocities, spill mass and slick age.It was found that the resulting trajectory forecast needed to be reset to the MEDSLIK-II forecast every few hours to achieve reasonable accuracy, so investigation was recommended to identify other significant factors which could be used to improve forecasting accuracy.The model had also been limited to a particular type of spill, namely, a release of oil at a single point in both time and space in the open sea, so extending its scope to continuous spills and interaction with coastlines, for example, was another potential area for development.

MEDSLIK-II
MEDSLIK-II simulates the evolution of a surface oil spill to produce forecasts of the oil's movement and the change in its condition due to weathering.The advection-diffusion of the oil is modelled using a Lagrangian approach whereby the slick is represented as a collection of constituent particles.Four weathering processes -spreading, evaporation, dispersion and emulsification -are modelled using empirical formulae which are largely based on the methods of Mackay et al. (1980).The resulting forecasts are given as oil concentrations at the water surface, in the water column and on the coast.In addition to starting from the initial release of the oil, MEDSLIK-II can be initialised to an existing slick using satellite data.
The key input parameters to MEDSLIK-II are the forecasts of wind and sea current, which can be obtained from a variety of atmospheric and oceanographic models already described in the previous section.Other important input parameters are the sea surface temperature forecast, the type and properties of the oil and the spill's location, date and time, release rate and release period.
Further details of MEDSLIK-II, including the theoretical foundations as well as the numerical validation and simulations, can be found in De Dominicis et al. (2013a, b).

Methodology
In order to obtain the data required to investigate whether additional factors should be included in the CranSLIK forecasting algorithms, multiple scenarios were simulated in MEDSLIK-II, with each run using a different combination of values of the input variables, viz.wind velocity, current velocity, oil density, sea surface temperature and spill mass.The location and time of the spill were the same for each scenario and were based on the Algeria test case.The other input variables used values taken from the following intervals: For wind velocity, current velocity and sea surface temperature, the mean and range of each interval were loosely based on the forecasts used for the test cases.The range of oil densities was determined by MEDSLIK's capability and the maximum spill size was set to 1000 t to capture the majority of oil spills (ITOPF, 2013).From each interval, several values were chosen, and each simulation run then used a different combination of the selected values, which were held constant over time and space for the duration of the run.The results were then analysed by multiple polynomial linear regression to identify relationships between these inputs and the variables describing the slick's advection and spread.Similar approaches have been adopted to stochastically approach complex engineering problems, as can be found in Hanak et al. (2015) and Salonitis and Kolios (2014).
Polynomial linear regression is a common technique used to derive a model of the form where y is the response, dependent, variable; x 1 , . .., x n are the predictor, independent, variables, and is the residual error (Choi et al., 2007).In simple linear regression there is one predictor variable, i.e., n = 1, whereas in multiple linear regression there is more than one, i.e., n > 1, predictor variable.Each function f 1 , . .., f k is a single term that is a product of non-negative, integer powers of the predictor variables,  so it has the following form: x α 1 1 x α 2 2 . ..x α n n , where α i 0 for i = 1, . .., n.Some typical examples are x 1 , x 2 1 , x 1 x 2 and x 3 2 x 3 x 2 4 .Consequently, the number of possible models is unlimited and a principal aim of the analysis is to select terms that are significant predictors.
The method is called linear regression because the model is linear in the coefficients β 0 , . .., β k .Once the f i terms have been selected, the coefficient values that make the model "best fit" the data are derived.There are different techniques for doing this, but a widely used method that was employed in this study is least squares estimation.This calculates the coefficient values that minimise the sum of squares of the errors , where the sum is over all predicted data points.

Input parameter study
The initial part of the investigation considered the effect of the following input parameters: wind velocity, sea current velocity, oil density, sea surface temperature and spill mass, which essentially represent the x i independent variables in the polynomial linear regression.The prime focus was on the forecast of the oil slick's trajectory, since in CranSLIK v1.0 this was proving more difficult to predict than the radius of the slick.
Various different models were generated that included some or all of the studied parameters either as linear terms, as higher power terms, or combined as mixed terms, which essentially constitute the f i terms in the polynomial linear regression.All models were investigated in terms of the accuracy of their forecasts for the Algeria test case.Performance was measured and compared using the mean absolute error in CranSLIK's prediction of the location of the slick's centre of mass over a 36 h forecast.The preferred model was a straightforward linear combination of wind velocity and the advective current velocity.When other models that included additional terms were compared with this model, in most cases they performed slightly better in the Algeria point case (up to 6.6 %), whereas in the Algeria contour case they all increased the error (up to 4.6 %).Moreover, the models which performed best in the Algeria point case tended to be the least accurate for the Algeria contour case.Consequently, Eq. ( 1) below was judged to be the best solution in terms of accuracy and simplicity.

Trajectory
Based on the above study we conclude that the wind and the advective current are the main drivers of the MEDSLIK-II forecast.In particular, the regression analysis showed that, in  Once the slick speed has been calculated, the displacement during a time step can be derived easily.The new model uses this method to calculate the displacement east and north and their vector sum then represents the resultant displacement.The sensitivity of the oil spill trajectory forecast due to the choice of current velocity components has been assessed in De Dominicis et al. (2013a, b).MEDSLIK-II allows the user to specify the depth at which the current velocity component should be considered.CranSLIK adopts the same approach, allowing the depth of the advective current in Eq. ( 1) to be selected.It is interesting to note that Eq. ( 1) resembles a rule of thumb given by Fingas (2011, p. 197) and ITOPF (2014b), except it uses 0.935 % of the wind speed instead of 3 %.

Reconstruction of surface
As they age, slicks tend to deform, break up and spread out over a large area (Reed et al., 1999;ITOPF, 2014a).Consequently, modelling a slick as circular can be a limitation.In order to be able to represent slicks of any shape, we decided to treat the slick as a collection of mini-slicks (numbering anything from one to a few thousand), which is a similar methodology to the Lagrangian approach used by many oil spill models, including MEDSLIK-II.Each mini-slick was modelled as a separate entity of circular shape and fixed mass.At each simulated time step its displacement was calculated using Eq. ( 1).In this way, the whole slick could take on any shape, but this was not the only motivation for using this approach -it also had several other advantages, namely the following: -variations of wind and current over large slicks are handled more accurately; -the model can be initialised using data for an observed slick; -continuous spills can be modelled by adding to the collection of mini-slicks as time is advanced during the period when oil is released; -stranding of oil on shorelines can be simulated by removing mini-slicks from the collection.
In addition to calculating mini-slick paths, spreading of each mini-slick was modelled by updating its radius at each time step via a method similar to that used in CranSLIK v1.0 but with the following revisions.Firstly, the oil's density was added to the formula, which slightly improved its accuracy.
Secondly, this formula used the slick's age, so in cases where the model was initialised to an observed slick, its age at the time of observation had to be estimated.For this purpose, an additional formula based on the ratio of the spill's mass to its average concentration was derived.Thirdly, the revised formula for the absolute value of the slick radius was only used to calculate the radius at the end of the first time step.Thereafter, a new formula which expressed the radius as a multiple of this value was used.This was because it gave similar results for a 36 h forecast and continued to give good results beyond this point, unlike the formula for calculating the absolute radius, which became increasingly inaccurate as the length of the forecast increased.This new formula was a function of one variable, the slick age, and was initially a seventh-degree polynomial.However, it was found that this could be closely approximated by Eq. ( 2), which was used in the final version of the model because it had a much simpler form.
slick radius = radius after 1 h × (age of slick in hours) 0.59   (2) A point worth noting here is that if the slick is represented by a sizeable quantity of mini-slicks, the model is not sensitive to the accuracy of the spreading algorithm and mini-slick trajectories will generally be much more important than mini-slick size for accurate forecasting of the way in which the whole slick spreads.
Given that the new methods model a slick as a number of constituent mini-slicks, does this increase the computational expense to unacceptable levels?The answer is no, which is most easily demonstrated by the Algeria test case.The processing for the two scenarios, point and contour, is virtually identical, but the former is represented as 1 mini-slick and the latter by more than 4000, which is probably towards the upper limit likely to be required because the initial observed slick was spread out over a large area.For both of these scenarios, generation of a 36 h forecast took approximately 7 s, with the contour case taking typically around half a second longer.When the same simulation was run in Monte Carlo mode, creation of 10 000 forecasts took about 8 s for the point scenario and around 45 s for the contour scenario.
The results obtained using these proposed new methods on the test cases are presented and analysed in the next section.

Algeria case
The Algeria case was modelled both by initialising it as an instantaneous spill at a point (referred to below as the point scenario) and starting from observed slick data (referred to below as the contour scenario).In both cases, a 36 h CranS-LIK forecast was evaluated against a corresponding forecast generated by MEDSLIK-II.
For both Algeria scenarios, the CranSLIK and MEDSLIK-II trajectory forecasts showed good agreement as seen in Figs. 2 and 3, with the maximum CranSLIK error always below 600 m as shown in Fig. 4.
The slick forecasts after 36 h for both the point scenario and the contour scenario, and for advective currents considered at 0, 10, and 30 m depths, are plotted in Figs. 5 and  6.Oil concentration is not currently modelled in CranSLIK forecasts and is therefore not shown in the respective plots.
By taking the boundary of the CranSLIK slick forecast to be the convex hull of the mini-slicks, it was straightforward to calculate a useful measure of CranSLIK's accuracy, namely the percentage of the oil's mass captured within this closed curve.This measure is plotted in Fig. 7, which shows that in both scenarios and for all advective currents, the minimum amount of oil captured was always above 98 %.Whether this high accuracy was achieved simply because slick size was overestimated was naturally a legitimate question we had.To answer this, we evaluated plots similar to

Lebanon case
For the Lebanon case, it was decided to create a 600 h (25day) forecast which was similar to the forecast period used by Coppini et al. (2011) and provided a significant challenge to the revised model.The CranSLIK and MEDSLIK-II trajectory forecasts for this case are compared in Fig. 8.When the surface current was used as the advective current, CranS-LIK beached all of the oil in hour 323, probably because CranSLIK over-predicted the amount of oil beached compared with MEDSLIK (roughly by a factor of 2).Consequently, for this particular case results are only considered for the first 300 h.From Fig. 9 it can be seen that for roughly the first three-quarters of the forecast period, the distance between the MEDSLIK and CranSLIK centre of mass forecasts was never more than about 5 km, while from Fig. 10 it can be seen that the oil captured by CranSLIK was above 95 %.After this point there is a marked change in CranSLIK's accuracy which is attributed to the onset of oil beaching on the coast.Some example plots to illustrate "good" and "bad" forecasts are given in Figs.11 and 12. Figure 11 shows a representative hour near the point where the forecast started to degrade but when the oil captured was still above 99 %, and Fig. 12 is for the hour when the amount of oil captured was a minimum (66, 82 and 88 % for the 0, 10 and 30 m cases, respectively).

Comparisons with previous version of the model
Where possible, the revised model's forecasts were compared with those given by CranSLIK v1.0.For the Algeria point scenario, the new trajectory forecast is significantly better than the v1.0 forecast, which had a maximum error of approximately 8 km according to Snow et al. (2014).In view of this, it was not surprising that the percentage of oil captured by the new model was much higher.In Snow et al. (2014) this percentage dropped to zero after a few hours of the forecast if no update was provided to the model.Similar comparisons for the Algeria contour scenario and the Lebanon case could not be made because v1.0's functionality did not include initialisation to an observed slick or beaching of oil on the coast.

Conclusions
This paper discusses further development and enhancement of the CranSLIK stochastic model.The forecasts of oil slick trajectory, shape and size obtained using the revised model, namely CranSLIK v2.0, showed good agreement with MEDSLIK-II in the test cases and significant improvement in forecasting accuracy compared with CranSLIK v1.0.For the Algeria test case, CranSLIK v2.0 captured a minimum of 98 % of the amount of oil for a 36 h prediction, while for the Lebanon case with a 10 or 30 m advective current (and ignoring the first hour), it captured a minimum of 97 % for a 19-day prediction.For the Lebanon case using a surface advective current (and again ignoring the first hour), the oil captured never dropped below 92 % for an 11-day forecast.
CranSLIK v2.0 incorporates additional functionality which increases the model's flexibility and its ability to handle a wider range of scenarios.Firstly, it can model both an instantaneous and a continuous release of oil.Secondly, the new model can be initialised to an observed slick.Thirdly, the beaching of oil can now be modelled, which is important because it is often near the coast where the potentially damaging effects of the pollution are most keenly felt.The inclusion of oil-shoreline interaction in the new model appears to give reasonable results, but has not yet been fully developed and requires further work.In particular, the ability to automatically handle any shape of coastline needs to be incorporated into the model, and the accuracy of the beaching algorithm and the way it appears to affect the trajectory forecast needs investigation.A major strength of the developed model is its computational efficiency and the minimal time required to perform Monte Carlo simulations and thus generate maximum likelihood regions.It is believed that CranSLIK has a role to play in both planning and operational mode and merits further development.In addition to the points mentioned above, the model would be enhanced by the development of the stochastic methods used to associate estimates of uncertainty with the forecasts.
presents a schematic illustration of the involved processes that are collectively referred to Published by Copernicus Publications on behalf of the European Geosciences Union.

Figure 2 .
Figure 2. Centre of mass trajectory forecasts (hours 1-36) for the Algeria point case using the current at different depths as the advective current.

Figure 3 .
Figure 3. Centre of mass trajectory forecasts (hours 1-36) for the Algeria contour case using the current at different depths as the advective current.

Figure 4 .
Figure 4. Distance between MEDSLIK and CranSLIK centre of mass forecasts for the Algeria test case (point and contour scenarios), using the current at different depths as the advective current.

Figure 5 .
Figure 5. MEDSLIK and CranSLIK forecasts of slick after 36 h for the Algeria point case, using the current at different depths as the advective current.

Figure 6 .
Figure 6.MEDSLIK and CranSLIK forecasts of slick after 36 h for the Algeria contour case, using the current at different depths as the advective current.

Figure 7 .
Figure 7. Percentage of oil captured by CranSLIK's forecast of the slick's convex hull for the Algeria case (point and contour scenarios), using the current at different depths as the advective current.

Figure 8 .
Figure 8. Centre of mass trajectory forecasts (hours 1-600) for the Lebanon case using the current at different depths as the advective current.

Figure 9 .Figure 10 .
Figure 9. Distance between MEDSLIK and CranSLIK centre of mass forecasts for the Lebanon test case, using the current at different depths as the advective current.

Figs. 5
Figs. 5 and 6 at different simulated times.The indication we had was that this was not the case because the slick size predicted by CranSLIK closely matched that forecast by MEDSLIK-II.

Figure 11 .
Figure11.MEDSLIK and CranSLIK "good" (≥ 99 % oil captured) slick forecasts for the Lebanon case, using the current at different depths as the advective current.Note that CranSLIK does not predict oil concentration.

Figure 12 .
Figure12.MEDSLIK and CranSLIK "bad" (minimum % oil captured) slick forecasts for the Lebanon case, using the current at different depths as the advective current.Note that CranSLIK does not predict oil concentration.