Despite the availability of both commercial and open-source software, an ideal tool for digital rock physics analysis that delivers accurate automatic image analysis at reasonable computational cost is difficult to pinpoint. More often than not, image segmentation is performed manually, and its performance remains limited to two phases. Discrepancies due to artefacts cause inaccuracies in image analysis. To overcome these problems, we have developed CobWeb 1.0, which is automated and explicitly tailored for accurate greyscale (multiphase) image segmentation using unsupervised and supervised machine learning techniques. In this study, we demonstrate image segmentation using unsupervised machine learning techniques. The simple and intuitive layout of the graphical user interface enables easy access to image enhancement and image segmentation, and further to the accuracy of the different segmented classes. The graphical user interface not only enables processing of a full 3-D digital rock dataset but also provides a quick and easy region-of-interest selection, from which a representative elementary volume can be extracted and processed. The CobWeb software package builds on the image processing and machine learning libraries of MATLAB®, used for image enhancement and image segmentation operations, which are compiled into a series of Windows-executable binaries. Segmentation can be performed using unsupervised, supervised and ensemble classification tools. Additionally, based on the segmented phases, geometrical parameters such as pore size distribution, relative porosity trends and volume fraction can be calculated and visualized. CobWeb allows the export of data to various formats such as ParaView (.vtk) and DSI Studio (.fib) for visualization and animation, and Microsoft® Excel and MATLAB® for numerical calculations and simulations.
The capability of this new software is verified using high-resolution synchrotron tomography datasets, as well as lab-based (cone-beam) X-ray microtomography datasets. Despite the high spatial resolution (submicrometre), the synchrotron dataset contained edge enhancement artefacts, which were eliminated using a novel dual filtering and dual segmentation procedure.
Market survey of the currently available commercial software
A vast number of commercial and open-source software packages for pore-scale analysis and modelling currently exist (compiled in Fig. 1), but dedicated approaches to verify the accuracy of the segmented phases are lacking. To the best of our knowledge, the current practice among researchers is to alternate between different available software tools and to synthesize the different datasets using individually aligned workflows. Porosity and, in particular, permeability can vary dramatically with small changes in segmentation, as significant features on the pore scale get lost when thresholding greyscale tomography images to binary images, even when using the most advanced data acquisition techniques such as synchrotron tomography (Leu et al., 2014). Our new CobWeb 1.0 visualization and image analysis toolkit addresses some of the challenges of selecting a representative elementary volume (REV) for X-ray computed tomography (XCT) datasets reported earlier by several researchers (Zhang et al., 2000; Gitman et al., 2006; Razavi et al., 2007; Al-Raoush and Papadopoulos, 2010; Costanza-Robinson et al., 2011; Leu et al., 2014). The software is built on scientific studies which have been peer-reviewed and accepted in the scientific community (Chauhan et al., 2016a, b). The motivation for these studies was not a lack of accuracy in manual segmentation schemes but the subjective assessment and non-comparability caused by the individual human assessor. Automated segmentation schemes offer speed, accuracy and the possibility to intercompare results, enhancing traceability and reproducibility in the evaluation process. To our knowledge, none of the XCT software used in the rock science community relies explicitly on machine learning to perform segmentation, which makes CobWeb unique.
Despite many review articles and scientific publications highlighting the potential of machine learning and deep learning (Iassonov et al., 2009; Cnudde and Boone, 2013; Schlüter et al., 2014), software libraries or toolboxes are seldom made available. Thus, with CobWeb, we have started to fill this gap. Despite its limited volume-rendering capabilities, it is a useful tool, and the current version of the software can be applied in scientific and industrial studies. CobWeb provides an appropriate test platform where new segmentation and filtration schemes can be tested, and it can serve as a complementary tool to simulation software such as GeoDict and Volume Graphics. These simulation packages have benchmarked solvers for performing flow, diffusion, dispersion and advection-type simulations, but their accuracy relies heavily on finely segmented datasets. CobWeb is based on a machine learning approach with great potential for segmentation analysis, as introduced previously (Chauhan et al., 2016a, b). Further, the software package was developed on a MATLAB® workbench and can be used as a Windows stand-alone executable (.exe) file or as a MATLAB® plugin. The dataset for the gas hydrate (GH) sediment geomaterials was acquired using monochromatic synchrotron X-rays and is therefore unhampered by beam hardening; however, Sell et al. (2016) highlighted problems with edge enhancement (ED) artefacts and recommended image morphological strategies to tackle this challenge. In this paper, we therefore also describe a strategy to eliminate ED artefacts using the same dataset but applying the new machine learning approach.
Image pre-processing is one of the essential and precautionary steps before
image segmentation (Iassonov et al., 2009; Schlüter et al., 2014). Image
enhancement filtering techniques help to reduce artefacts such as blur,
background intensity and contrast variation, whereas denoise filters such as
the median filter, non-local means filter and anisotropic diffusion filter can
assist in lowering the phase misclassification and improving the convergence
rate of automatic segmentation schemes. CobWeb 1.0 is equipped with image
enhancement and denoise filters, namely
Although different measures can be taken at the instrument level to improve the resolution of the X-ray volumetric data, the contrast in the XCT images depends particularly on the composition and corresponding densities (optical depth) of the test sample. Therefore, it is somewhat difficult to enhance contrast at the experimental setup or X-ray system design stage. Thus, the contrast needs to be enhanced or adjusted after the volumetric image has been generated. For this purpose, image sharpening can be used. Image sharpening is a form of contrast enhancement. Contrast enhancement generally takes place at the contours, where high and low greyscale pixel intensities meet (Parker, 2010).
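The sharpening described above can be sketched numerically. The following Python snippet is purely illustrative (unsharp masking, with assumed parameters `sigma` and `amount`; CobWeb itself implements its contrast adjustments in MATLAB®): it adds the high-frequency residual back onto the image, so the contrast grows exactly at the contours.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def unsharp_mask(img, sigma=2.0, amount=1.0):
    """Sharpen by adding the high-frequency residual (image minus blur) back
    onto the image; contrast increases exactly at the contours."""
    img = img.astype(float)
    return img + amount * (img - gaussian_filter(img, sigma=sigma))

# A greyscale step edge: sharpening overshoots on the bright side and
# undershoots on the dark side, widening the local contrast at the contour.
edge = np.zeros((32, 32))
edge[:, 16:] = 100.0
sharp = unsharp_mask(edge, sigma=2.0, amount=1.0)
```

Far away from the contour the residual is zero, so flat regions are left untouched, which is why sharpening acts as a contrast enhancement at edges only.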
For intuition, an anisotropic diffusion (AD) filter can be thought of as a (Gaussian) blur filter: AD blurs the image, but it carefully smooths the textures in the image while preserving its edges (Kaestner et al., 2008; Porter et al., 2010; Schlüter et al., 2014). To achieve smoothing along with edge preservation, the AD filter iteratively solves a non-linear partial differential equation (PDE) of diffusion:
Perona and Malik (1990) introduced a flux function
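The flux-controlled diffusion is easiest to grasp numerically. The following illustrative Python sketch implements the Perona and Malik (1990) iteration with the exponential flux function; the parameter values (`kappa`, `lam`, `n_iter`) are assumptions for the demonstration, not CobWeb's defaults.

```python
import numpy as np

def perona_malik(img, n_iter=20, kappa=30.0, lam=0.2):
    """Iterative Perona-Malik anisotropic diffusion: the flux function g
    lets diffusion act in flat regions but shuts it off across strong edges."""
    u = img.astype(float).copy()
    g = lambda d: np.exp(-(d / kappa) ** 2)   # edge-stopping flux function
    for _ in range(n_iter):
        # finite differences towards the four nearest neighbours
        dn = np.roll(u, -1, axis=0) - u
        ds = np.roll(u, 1, axis=0) - u
        de = np.roll(u, -1, axis=1) - u
        dw = np.roll(u, 1, axis=1) - u
        u = u + lam * (g(dn) * dn + g(ds) * ds + g(de) * de + g(dw) * dw)
    return u

# noisy step edge: diffusion should flatten the noise but keep the edge sharp
rng = np.random.default_rng(0)
step = np.zeros((64, 64))
step[:, 32:] = 100.0
noisy = step + rng.normal(0.0, 5.0, step.shape)
smoothed = perona_malik(noisy)
```

Because the flux g tends to zero for large gradients, the 100-greyscale step survives the iterations while the small noise gradients are diffused away.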
The non-local means (NLM) filter is based on the assumption that the image contains an extensive amount of self-similarity (Buades et al., 2005; Shreyamsha Kumar, 2013). Based on this assumption, Buades et al. (2005) extended the linear neighbourhood SUSAN filter (Smith and Brady, 1997) with a non-local class. Through this non-local class, the spatial search for similar pixel values is not restricted to a constrained pixel neighbourhood; instead, the whole image takes part in the search for similar pixel values. This is given by the following equation:
However, for practical and computational reasons, the search is performed within a search window or neighbourhood patches, and
Similarity weights decrease exponentially with the Euclidean distance between
the local neighbourhood patches.
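This weighting scheme can be illustrated with a minimal (and deliberately slow) Python sketch of the NLM filter; patch size, search window and the smoothing parameter `h` are assumed values, not the settings used in CobWeb.

```python
import numpy as np

def nlm_denoise(img, patch=1, search=4, h=10.0):
    """Non-local means: each pixel becomes a weighted average of pixels whose
    local neighbourhoods look similar; weights decay exponentially with the
    Euclidean distance between patches (Buades et al., 2005)."""
    img = img.astype(float)
    pad = patch + search
    padded = np.pad(img, pad, mode='reflect')
    out = np.zeros_like(img)
    rows, cols = img.shape
    for i in range(rows):
        for j in range(cols):
            ic, jc = i + pad, j + pad
            ref = padded[ic - patch:ic + patch + 1, jc - patch:jc + patch + 1]
            weights, values = [], []
            # restrict the "non-local" search to a window for tractability
            for di in range(-search, search + 1):
                for dj in range(-search, search + 1):
                    cand = padded[ic + di - patch:ic + di + patch + 1,
                                  jc + dj - patch:jc + dj + patch + 1]
                    d2 = ((ref - cand) ** 2).mean()   # patch distance
                    weights.append(np.exp(-d2 / h ** 2))
                    values.append(padded[ic + di, jc + dj])
            w = np.array(weights)
            out[i, j] = (w / w.sum() * np.array(values)).sum()
    return out

rng = np.random.default_rng(1)
clean = np.zeros((20, 20))
clean[:, 10:] = 50.0
noisy = clean + rng.normal(0.0, 5.0, clean.shape)
denoised = nlm_denoise(noisy)
```

Patches that straddle the edge receive vanishing weights, so the filter denoises the flat regions without blurring the phase boundary.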
The
A digital image comprises pixels of colour or greyscale intensities. Image segmentation is the partitioning or classification of the pixel intensities into disjoint regions that are homogeneous with respect to some characteristics (Bishop, 2006). Various international groups make continuous research efforts to improve and develop image segmentation approaches (Mjolsness and DeCoste, 2001). In particular, the most popular and relevant image segmentation approaches for analysing X-ray tomographic rock images are presented in the review studies of Iassonov et al. (2009) and Schlüter et al. (2014). We use machine learning techniques for image segmentation and have implemented algorithms such as
The
The fuzzy
Unlike
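The unsupervised clustering idea common to these algorithms, binning greyscale values to the nearest representative value and iteratively updating that value, can be sketched in Python as a minimal 1-D k-means; the quantile initialisation and the synthetic three-phase data are assumptions for the illustration, not CobWeb's MATLAB® implementation.

```python
import numpy as np

def kmeans_grey(values, k=3, n_iter=50):
    """1-D k-means on greyscale values: assign each pixel to its nearest
    representative value (centroid), then update centroids to class means."""
    # deterministic initialisation at evenly spaced quantiles of the data
    centroids = np.quantile(values, [(i + 0.5) / k for i in range(k)])
    for _ in range(n_iter):
        labels = np.argmin(np.abs(values[:, None] - centroids[None, :]), axis=1)
        centroids = np.array([values[labels == c].mean() for c in range(k)])
    return labels, centroids

# three synthetic greyscale populations, e.g. void, matrix and mineral phase
rng = np.random.default_rng(2)
vals = np.concatenate([rng.normal(30, 3, 300),
                       rng.normal(120, 3, 300),
                       rng.normal(200, 3, 300)])
labels, centroids = kmeans_grey(vals, k=3)
```

With well-separated greyscale populations, the centroids converge to the population means and the labels form three disjoint classes.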
Similar to unsupervised techniques, the objective of a supervised machine learning technique is to separate data. The advantage the supervised technique offers over the unsupervised one is that it is effective in separating non-linearly separable data (Haykin, 1995; Bishop, 2006). A dataset is linearly separable if its points can be partitioned into two classes using a threshold function (the threshold should not be a piecewise discontinuous function). Loosely speaking, the threshold function fits a line to produce the partition. In contrast, if we try to fit a threshold function to a substantially overlapping dataset, this usually leads to wrong partitioning (Bishop, 2006; Haykin, 1995). Therefore, a dataset whose values lie very close to each other is regarded as linearly inseparable (Bishop, 2006). In a supervised technique, the prediction is made by a model. The model is a mathematical function which fits a line or a plane between linearly or non-linearly separable data to classify them into different categories. The model's ability, or intuition, regarding where to place the line or plane between the datasets to clearly separate (classify) them is based on its a priori knowledge of the dataset; this a priori knowledge is called the training dataset. Therefore, unlike the unsupervised technique, the supervised model needs to be trained on a subset of the dataset. The training dataset is the only “window” through which the model learns the patterns of the linearly or non-linearly separable dataset. How well the model has acquired the knowledge of the training dataset determines its success in prediction. If it learns the training data too exactly, it picks up noise along with the pattern and loses its generalization ability, thus failing when introduced to an (unknown) separable dataset.
Conversely, prediction can fail due to inadequate training information provided to the model, or because the selected model is incapable of learning the information provided in the training dataset. Therefore, to strike a good tradeoff, cross-validation techniques are used to monitor the learning rate of the model (Haykin, 1995).
The support vector machine (SVM) (Haykin, 1995) and its modified version, the least-squares support vector machine (LSSVM) (Suykens and Vandewalle, 1999), are one such category of supervised ML techniques and use the principles mentioned above. The plane separating the data is termed a hyperplane. The hyperplane has a boundary around it, called the margin, and the data points that lie closest to or on the margin are called the support vectors. The width of the margin governs the tradeoff, i.e. whether the model is overfitted or underfitted to the training dataset, and can be verified through cross-validation techniques. If the margin is too narrow (high learning rate), the model is overfitted (high variance) to the training dataset; it loses its generalization capability and may not separate unknown linearly or non-linearly separable data accurately. If the margin is too wide (very low learning rate), the model is underfitted (high bias) to the training dataset and will likewise fail. An optimal learning model has just the appropriate margin width to maintain generalization while still learning the patterns in the dataset.
If the training dataset is non-linearly inseparable in a 2-D coordinate system, it is useful to project the dataset into a 3-D coordinate system; the added dimension helps to visualize the data and to find a place to fit a hyperplane that separates them (Cover, 1965). SVM and LSSVM use the principle of Cover's theorem (Cover, 1965) to project the data into a higher dimension, make them linearly separable there, and transform them back to the original coordinate system (Suykens and Vandewalle, 1999). The type of projection performed by the SVM or LSSVM is determined by choosing an appropriate kernel function (van Gestel et al., 2004). This gives these classifiers the capability to acquire knowledge of the data while preserving the generalization behaviour of the model. In the original 2-D coordinate system, the hyperplane is then no longer a line but a convex-shaped curve which clearly separates the data with suitable margins to the support vectors. Here, 3-D implies a
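Cover's theorem can be illustrated without any SVM machinery: the explicit lift below plays the role that the kernel function performs implicitly. The threshold value 2.25 and the disc/ring data are assumptions chosen for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
# inner disc (class 0) and outer ring (class 1): no straight line separates them
radius = np.concatenate([rng.uniform(0.0, 1.0, n), rng.uniform(2.0, 3.0, n)])
theta = rng.uniform(0.0, 2.0 * np.pi, 2 * n)
x, y = radius * np.cos(theta), radius * np.sin(theta)
true_class = np.concatenate([np.zeros(n), np.ones(n)])

# Lift to 3-D with z = x^2 + y^2: a single horizontal hyperplane z = 2.25 now
# separates the classes, since z < 1 for the disc and z > 4 for the ring.
z = x ** 2 + y ** 2
predicted = (z > 2.25).astype(float)
accuracy = (predicted == true_class).mean()
# Mapped back to 2-D, the hyperplane z = 2.25 is the circle x^2 + y^2 = 2.25,
# i.e. a convex curve separating the classes in the original coordinates.
```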
As the name implies, an ensemble classifier is an approach in which the decisions of several simple models are combined to improve prediction performance. The idea behind ensemble methods emanates from the typical human approach of exploring several options before making a decision. The ensemble technique is faster compared to supervised techniques. Basically, the decisions predicted by the simple models can be evaluated either sequentially (bagging or boosting) or in parallel (random forest). Our toolbox uses the sequential approach with variations of bagging and boosting for classification. These bagging and boosting evaluations use tree learners (Seiffert et al., 2008; Breiman, 1996) inherited from the MATLAB® libraries.
The main differences between bagging and boosting are as follows. Bagging generates a set of simple models: first, it trains these models on random samples and evaluates the classification performance of each model using the test subset of the data. In a second step, only those models whose classification performance was low are retrained. The final predictive performance of the bagging classifier is an average of the individual model performances. This approach minimizes the variance of the prediction, meaning that if several bagging classifiers are generated from the same sample of data, their predictions, when exposed to an unknown dataset, will not differ much. The main difference between boosting and bagging is that bagging retrains the selected models (those with a high misclassification rate) on the complete training dataset until their respective accuracy increases, whereas in boosting, the proportion of the misclassified data is increased relative to the accurately classified data, and thereafter all models are retrained sequentially. The predictive performance is calculated in the same way as in bagging, by averaging the predictive performance of the individual models. This approach of boosting minimizes the bias of the prediction.
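The variance-reducing effect of the bootstrap-and-vote idea can be sketched with resampled decision stumps and a majority vote; this illustrative Python example is not the MATLAB® tree-learner ensemble that CobWeb inherits, and all names and parameters are assumptions.

```python
import numpy as np

def stump_fit(x, y):
    """Best single-threshold classifier (a depth-1 'tree learner') on 1-D data."""
    best_acc, best = -1.0, None
    for t in np.unique(x):
        for pol in (1, -1):
            pred = np.where(x > t, pol, -pol)
            acc = float((pred == y).mean())
            if acc > best_acc:
                best_acc, best = acc, (float(t), pol)
    return best

def stump_predict(model, x):
    t, pol = model
    return np.where(x > t, pol, -pol)

def bagging_fit(x, y, n_models=15, seed=4):
    """Bagging: train each stump on a bootstrap resample and combine the
    models by majority vote, which lowers the variance of the prediction."""
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_models):
        idx = rng.integers(0, len(x), len(x))   # bootstrap resample
        models.append(stump_fit(x[idx], y[idx]))
    def predict(xq):
        votes = np.sum([stump_predict(m, xq) for m in models], axis=0)
        return np.where(votes >= 0, 1, -1)
    return predict

# two 1-D classes with labels -1 / +1
rng = np.random.default_rng(5)
x = np.concatenate([rng.normal(0.0, 1.0, 100), rng.normal(4.0, 1.0, 100)])
y = np.concatenate([-np.ones(100, int), np.ones(100, int)])
predict = bagging_fit(x, y)
train_acc = (predict(x) == y).mean()
```

Each stump alone is a weak learner; averaging many bootstrap-trained stumps stabilizes the decision threshold, which is the variance reduction described above.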
It is necessary to monitor the performance of an ML model. This ensures that the trained model does not overfit or underfit the training dataset. The tendency of a model to overfit or underfit the training dataset is directly related to the complexity of the ML model. An overfitted ML model captures noise along with the information pattern of the training dataset and loses its ability to generalize, leading to inaccurate classification when exposed to an unknown dataset, as it has high variance with respect to the training dataset. Conversely, when the ML model is underfitted, it is unable to learn or capture the essence of the training dataset; this can happen either due to the choice of too simple a model type (e.g. linear instead of quadratic) or due to too little data to build a reliable model. As a consequence, the ML model fails to predict, as it is highly biased with respect to the training dataset (Dietterich, 1998). The performance of the ML model (low variance and low bias) is thus an indication of how accurately it can predict. The above explanation is valid for supervised ML techniques. For unsupervised clustering techniques, where there is no model to train, the quality of the classification is judged from the classified result itself. One commonly used metric is entropy (Stehl, 2002; Meilǎ, 2003; Amigó et al., 2009). In CobWeb, the performance of the ML models and the quality of the classification can be evaluated using 10-fold cross validation, entropy and receiver operating characteristic (ROC) curves. These methods are briefly described in the subsections below. For detailed information, the reader is referred to Stehl (2002), Dietterich (1998), Bradley (1997), and references therein.
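The entropy measure for clustering quality can be sketched as follows, assuming reference labels are available for the scored pixels; the exact formulation used in CobWeb may differ (cf. Stehl, 2002; Meilǎ, 2003).

```python
import numpy as np

def clustering_entropy(cluster_labels, true_labels):
    """Weighted average entropy of the clusters: 0 if every cluster contains a
    single true phase; larger values indicate mixing between phases."""
    cluster_labels = np.asarray(cluster_labels)
    true_labels = np.asarray(true_labels)
    total = len(cluster_labels)
    h = 0.0
    for c in np.unique(cluster_labels):
        members = true_labels[cluster_labels == c]
        p = np.bincount(members) / len(members)   # class proportions in cluster c
        p = p[p > 0]
        h += (len(members) / total) * float(-(p * np.log2(p)).sum())
    return h

pure = clustering_entropy([0, 0, 1, 1], [0, 0, 1, 1])    # perfectly separated
mixed = clustering_entropy([0, 0, 1, 1], [0, 1, 0, 1])   # maximally mixed
```

A perfectly separated clustering scores 0 bits, while a cluster that mixes two phases equally scores 1 bit, so lower entropy indicates a better segmentation.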
The idea for
The entropy of a class reflects how the members of the
ROC curves are one of the popular
methods to cross validate ML model performance (probability of models'
correct response
The accuracy is determined by calculating the area under the curve (AUC); the simplest way to do this is by using the trapezoidal approximation.
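The trapezoidal AUC computation can be written down directly. The sketch below builds the ROC curve from classifier scores and sums the trapezoids; it is an illustrative Python example with toy score vectors, not CobWeb's MATLAB® routine.

```python
import numpy as np

def roc_auc(scores, labels):
    """ROC curve from classifier scores; AUC via the trapezoidal approximation."""
    order = np.argsort(-np.asarray(scores, dtype=float))   # rank by score
    labels = np.asarray(labels)[order]
    tpr = np.concatenate([[0.0], np.cumsum(labels) / labels.sum()])
    fpr = np.concatenate([[0.0], np.cumsum(1 - labels) / (1 - labels).sum()])
    # area under the curve as a sum of trapezoids
    return float(np.sum((fpr[1:] - fpr[:-1]) * (tpr[1:] + tpr[:-1]) / 2))

# a perfect ranker (every positive scored above every negative) gives AUC = 1
auc_perfect = roc_auc([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0])
# an inverted ranker gives AUC = 0
auc_worst = roc_auc([0.1, 0.2, 0.8, 0.9], [1, 1, 0, 0])
```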
Snapshots of the CobWeb GUI. XCT stack of Grosmont carbonate rock
is shown as an example of representative elementary volume analysis. The
top panel displays the XCT raw sample, the
The first version of CobWeb offers the possibility to read and to process
reconstructed XCT files in both .tiff and .raw formats. The graphical user
interface (GUI) is embedded with visual inspection tools to zoom in/out,
crop, colour and scale to assist in the visualization and
interpretation of 2-D and 3-D stack data. Noise filters such as non-local
means, anisotropic diffusion, median and contrast adjustments are
implemented to increase the signal-to-noise ratio. The user has a choice of
five different segmentation algorithms, namely
The main GUI window panel is divided into three main parts (Fig. 2): the tool
menu strip, the inspector panel and the visualization panel. The tool strip
contains menus to zoom in and out, pan, rotate, point selection, colour
bar, legend bar and measurement scale functionalities. The inspector panel
is divided into subpanels where the user can configure the initial process
settings such as segmentation schemes (supervised, unsupervised, ensemble
classifiers), filters (contrast, non-local means, anisotropic filter,
As a stand-alone module, the CobWeb GUI can be executed on different PC and HPC
clusters without any license issues. The framework of CobWeb 1.0 is
schematically illustrated in Fig. 3, and the direction for the arrow (left
to right) represents the series in which the various functions are executed.
The back-end architecture can be broadly classified into three different
categories, namely the
control module, analysis module and visualization module.
Initially, the main figure panel is generated, followed by the tool strip
dividing the main figure into different panels and subpanels as shown in
Fig. 2. After that, the control buttons
The next step is data processing, triggered by pressing the
The general workflow of the CobWeb software tool, where the arrow denotes the series in which different modules (represented in dark blue boxes) are compiled and executed. A separate file script is used to generate .dll binaries and executables.
The next step is the segmentation process; an unsupervised or supervised algorithm is initialized based on the selection made by the user in the pre-processing
First, the visualization panel displays a single 2-D slice of the REV or 3-D image stack in a resizable pan window. The embedded
Second, by pressing the
In the third step, the user has to identify features, such as pores, minerals, matrix and noise/specks, in the 2-D image using the zoom-in and zoom-out tools available in the toolbar. The
In the fourth step, the data are gathered and exported for training. This is done by pressing the export button placed on the
In the fifth step, the model is trained. This is done by using the
A progress bar allows the user to monitor the state of the process. Further, the
Once the processing is finished, the segmented data can be visualized in 2-D format using the geometrical parameters, performance and export stack
The methods used to calculate geometrical parameters and validation schemes
are benchmarked in Chauhan et al. (2016a, b).
Therefore, the selection of desired options initializes respective
subroutines (
In the following sections, the CobWeb toolbox is demonstrated by means of three showcase examples, which are briefly introduced in terms of underlying imaging settings, research question and challenges for image processing.
The in situ synchrotron-based tomography experiment and post-processing of
synchrotron data conducted to resolve the microstructure of GH-bearing sediments are given in detail by Chaouachi et al. (2015), Falenty et al. (2015) and Sell et al. (2016). In brief, the
tomographic scans were acquired with a monochromatic X-ray beam energy of
21.9 keV at the Swiss Light Source (SLS) synchrotron facility
(Paul Scherrer Institute, Villigen, Switzerland) using the TOMCAT beamline
(Tomographic Microscope and Coherent Radiology Experiment; Stampanoni et al.,
2006). Each tomogram was reconstructed from sinograms by using the gridded
Fourier transformation algorithm (Marone and Stampanoni, 2012). Later, a 3-D
stack of
The ED artefact is the high and low image contrast seen between the edges of the void, quartz and GH phases in the GH tomograms. It certainly aids in clear visual distinction of these phases but becomes a nuisance during the segmentation process. Several approaches to reduce ED artefact in GH tomograms and its effect on segmentation and numerical simulation have been discussed in Sell et al. (2016). Based on our experience, a combination of the NLM filter and the AD filter, implemented using Avizo (Thermo Scientific), works best in removing ED artefacts for our GH data. In short, AD was used for edge preservation and NLM for denoising. In this study, the NLM filter was set to a search window of 21, local neighbourhood of 6 and a similarity value of 0.71. The NLM filter was implemented in 3-D mode to attain desired spatial and temporal accuracy and was processed on a CPU device.
The edge enhancement effect was significant in all the reconstructed slices
of the GH dataset. The ED effect was noticeable around the quartz grains,
with high and low pixel intensities adjacent to each other. The
high-intensity pixel values (EDH) were very close to GH pixel values, while
the low-intensity pixel values (EDL) showed a variance between noise and
void phase pixel values. Therefore, immediate segmentation performed on the
pre-filtered GH datasets using CobWeb 1.0 resulted in misclassification.
Further parameterizing and tuning the unsupervised (
First, through visual inspection of the segmented image (step 2), the different phases and their corresponding labels were identified, as shown in Table 1. Thereafter, the pixel indices of these phases were extracted from the segmented image based on their labels. Further, these indices were used as a reference mask to retrieve the pixel values of the phases from the 16-bit raw REV stacks.
The obtained pixel values represent the noise, void (liquid), EDL, quartz, EDH and GH phases in the raw images. Then, the histogram distribution of the pixel values of each phase was plotted. The skewness of the histograms was investigated, and the max, min, mean and standard deviation of each histogram were calculated. Thereafter, the max and min of the histograms were compared, and the indexing limits were adjusted until no overlap was found between the histogram boundaries.
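The indexing check described above, masking the raw 16-bit values with the segmented labels and testing the histogram boundaries for overlap, can be sketched as follows; the toy two-phase slice and the helper names are assumptions for the illustration.

```python
import numpy as np

def phase_ranges(raw, seg):
    """Min/max raw grey value of each segmented class; the segmented image is
    used as an index mask into the 16-bit raw data."""
    return {int(c): (int(raw[seg == c].min()), int(raw[seg == c].max()))
            for c in np.unique(seg)}

def ranges_overlap(ranges):
    """True if any two phase histograms share grey values at their boundaries."""
    bounds = sorted(ranges.values())
    return any(lo2 <= hi1 for (_, hi1), (lo2, _) in zip(bounds, bounds[1:]))

# toy 16-bit raw slice with two well separated phases (e.g. void and quartz)
raw = np.array([[100, 120, 5000],
                [5100, 110, 5050]], dtype=np.uint16)
seg = np.array([[0, 0, 1],
                [1, 0, 1]])
ranges = phase_ranges(raw, seg)
```

If `ranges_overlap` reports shared grey values, the indexing limits would be adjusted and the check repeated, mirroring the iterative procedure above.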
Class labels of different phases.
The digital rock images of the Grosmont carbonate rock were obtained from
the FTP server GitHub (
The Berea sandstone digital rock images were part of a benchmark project
published by Andrä et al. (2013a, b) and obtained from the GitHub FTP
server. The Berea sandstone sample plug was acquired from Berea
Sandstone™ petroleum cores (Ohio, USA). The porosity value of
20 % (
The REV selection was basically a combination of visual inspection and consecutive segmentation and plotting of trends in relative porosity, pore size distribution and volume fraction.
This was done by loading the complete stack in the CobWeb software; during
the loading process, a 2-D movie of the tomogram was displayed in the display
window and saved in the root folder. Carefully monitoring the movie gives an
objective evaluation of the heterogeneity of the respective XCT sample. We
observed several subsample volumes at various locations (
The most suitable ROIs and corresponding REV dimensions of Berea
sandstone and Grosmont carbonate GH-bearing sediment are shown in
panels
In the case of Berea sandstone, four different ROIs were investigated,
whereas with Grosmont carbonate rock seven different ROIs were needed to identify
the best REVs. Cubical stack sizes between 300
2-D slices of REV 1 are represented above. The raw image is first filtered with an anisotropic diffusion filter and subsequently with a non-local means filter. Thereafter, the different phases were segregated using a segmentation and indexing approach, and the raw image(s) were rescaled such that there was no overlap or mixed phases within the raw image; an example is shown as the rescaled 2-D ROI plot. Thereafter,
In the case of Berea sandstone, the 3-D reconstructed raw images (1024
Hence, based on the experiments conducted in Sell et al. (2016), dual filtration was one of the best approaches that we could include in the pre-processing step. This dual filtering did not remove the ED completely but rather normalized it to a reasonable range. Through the approach of rescaling and (hard)
In general, our observation is that, depending on the resolution of the dataset, the fixed parameters of the NLM and other filters should do a fairly good job. In the event that noise and artefacts persist, we recommend using the supervised techniques. The supervised techniques offer the possibility to select the residual noise or artefact pixel values before or after the filtration (pre-processing) through proper feature vector selection, and then to train the appropriate model and perform classification. In this way, the remaining noise and artefacts can be isolated and segmented as separate labels. Another option is to pre-process the data with the desired filters and import them into CobWeb for segmentation and analysis.
Another implementation issue of the image segmentation has to be explained in more detail. CobWeb 1.0 uses a slice-by-slice 2-D approach. It was observed that the ML techniques tend to underestimate porosity values
compared to manually segmented analysis at a REV scale size
Panels
The major problem for all multiphase segmentation is that phases having
intermediate greyscale values get sandwiched between two different phases.
These intermediate phases sometimes represent some of the vital material
properties such as connectivity. Therefore, it is vital to emphasize how ML
can assist in issues related to multiphase segmentation. In a practical
sense, machine learning tries to separate greyscale values into disjoint
sets. The creation of these disjoint sets is commonly done in two ways:
The first way is by binning the greyscale values to the nearest representative values which are iteratively updated using an optimization function. This optimization function can be a simple regression or distance function (Jain et al., 1999), commonly used in unsupervised techniques. The second way is by regularizing pre-trained models which store certain pattern information of the datasets such as topology features, contour intensities, pixel value, etc. (Hopfield, 1982; Haykin, 1995; Suykens and Vandewalle, 1999) or by using a voting system in a bootstrap ensemble of linear models (Breiman, 1996).
Panel
So, in this process, the intermediate greyscale values, which correspond to a low volume fraction and show multi-modal distributions, are merged with the greyscale values of high volume fraction to create disjoint boundaries. Thereby, the intermediate phase information is misclassified and hence destroyed. One way to overcome this problem is to use supervised techniques such as LSSVM or ensemble classifiers. When constructing a training dataset (feature vector selection), a careful selection of intermediate phases, with a sufficiently large sample size compared to the predominant phases, will preserve the intermediate phases. In addition, the likelihood that the trained model will identify them and cluster them separately is higher (Chauhan et al., 2016a). In this study in particular, we made tests using supervised techniques (LSSVM, ensemble classifiers) and an unsupervised technique (FCM), but the results were not superior compared to
Note that the purpose of this study was to demonstrate the capabilities of CobWeb and the removal of edge enhancement artefacts through dual filtration and dual segmentation schemes. A detailed verification with LSSVM and ensemble classifiers therefore falls outside the scope of this work, and readers are referred to the previous work of Chauhan et al. (2016a), on which CobWeb is based. That work benchmarks different ML algorithms and quantifies their respective accuracies and performance.
The PSDs of the respective REVs were calculated
using the CobWeb PSD module. The PSD module is based on an image processing
morphological scheme (watershed transformation) suggested by Rabbani et al. (2014). As stated in Rabbani et al. (2014), the aim is to
break down the monolithic void structure of rock into specific pores and
throats connecting each other. Rabbani et al. (2014) used unsegmented
images and performed image filtration and thereafter segmented using
watershed transformation. In our case, the tomograms were already pre-processed and segmented using ML techniques. These images are converted to binary images and thereafter subjected to the image processing distance function (Rosenfeld, 1969) and the watershed algorithm (Myers et al., 2007) to extract pores and throats. The city-block distance function is used to locate the void pixels (pores), and a watershed with eight-connected neighbourhoods is used to obtain the interconnectivity. Since the watershed algorithm is very sensitive to noise, despite the pre-processing and ML segmentation, a median filter was applied before the watershed segmentation.
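A sketch of this pore-separation pipeline in the spirit of Rabbani et al. (2014), using SciPy's city-block distance transform and an eight-connected watershed, is given below. It is illustrative only: CobWeb's MATLAB® implementation differs in details such as the median pre-filter and 3-D connectivity, and the two-pore test image is an assumption.

```python
import numpy as np
from scipy import ndimage

def pore_size_distribution(pores):
    """Split binary pore space into individual pores: city-block distance
    transform, one marker per local distance maximum, then a watershed with
    an eight-connected neighbourhood."""
    dist = ndimage.distance_transform_cdt(pores, metric='taxicab')
    # markers at the local maxima of the distance map (pore centres)
    maxima = (dist == ndimage.maximum_filter(dist, size=3)) & pores
    markers, _ = ndimage.label(maxima)
    # flood the inverted distance map outward from the markers
    flood = (dist.max() - dist).astype(np.uint16)
    ws = ndimage.watershed_ift(flood, markers, structure=np.ones((3, 3), int))
    ws[~pores] = 0                       # restrict labels to the pore space
    sizes = np.bincount(ws.ravel())[1:]  # pixels per individual pore
    return ws, sizes[sizes > 0]

# two square pores separated by a solid wall
img = np.zeros((12, 12), dtype=bool)
img[2:6, 2:6] = True
img[7:11, 7:11] = True
pore_labels, pore_sizes = pore_size_distribution(img)
```

The resulting per-pore pixel counts are the raw material of the pore size distribution; throats would appear as watershed boundaries between adjacent pore labels.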
Thereafter, the mean relative porosity value obtained for Berea sandstone was
Similarly, the porosity and PSD of the four GH REVs were analysed using
CobWeb 1.0 and are shown in Fig. 7. The low
Segmented REVs of a gas hydrate sample displayed as surface, and volume rendered and analysed using CobWeb 1.0 and exported to .vtk format using the CobWeb 1.0 ParaView plugin. The quartz grain phase is represented in green colour, gas hydrate is in red, and in blue is the void space.
This paper introduces CobWeb 1.0, a new visualization and image analysis toolkit dedicated to representative elementary volume analysis of digital rocks. CobWeb 1.0 is developed on the MATLAB® framework and can be used as a MATLAB® plugin or as a stand-alone executable. It offers robust image segmentation schemes
based on ML techniques (unsupervised and supervised),
where the accuracy of the segmentation schemes can be determined and results
can be compared. Dedicated image processing filters such as the non-local
means, anisotropic diffusion, averaging and the contrast enhancement
functions help to reduce artefacts and increase the signal-to-noise ratio.
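As an illustration of how such an edge-preserving filter raises the signal-to-noise ratio, here is a minimal NumPy sketch of Perona-Malik anisotropic diffusion (one of the filter families named above; the parameters and the synthetic slice are illustrative, not CobWeb's implementation):

```python
import numpy as np

def anisotropic_diffusion(img, n_iter=20, kappa=0.3, gamma=0.2):
    """Perona-Malik diffusion: smooth within phases, preserve phase edges.

    Uses periodic boundaries via np.roll for brevity.
    """
    u = img.astype(float).copy()
    for _ in range(n_iter):
        # Finite differences to the four nearest neighbours.
        dn = np.roll(u, -1, axis=0) - u
        ds = np.roll(u, 1, axis=0) - u
        de = np.roll(u, -1, axis=1) - u
        dw = np.roll(u, 1, axis=1) - u
        # Edge-stopping function: diffusion is suppressed across gradients
        # much larger than kappa (i.e. across phase boundaries).
        flux = sum(np.exp(-(d / kappa) ** 2) * d for d in (dn, ds, de, dw))
        u += gamma * flux
    return u

# Synthetic greyscale slice: one bright "grain" phase plus Gaussian noise.
rng = np.random.default_rng(0)
clean = np.zeros((64, 64))
clean[16:48, 16:48] = 1.0
noisy = clean + rng.normal(0.0, 0.1, clean.shape)

smoothed = anisotropic_diffusion(noisy)
err_before = float(np.mean((noisy - clean) ** 2))
err_after = float(np.mean((smoothed - clean) ** 2))
```

Within-phase noise is diffused away while the sharp phase boundary survives, so the mean squared error against the clean slice drops.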
The petrophysical and geometrical properties such as porosity, pore size distribution and volume fractions can be computed quickly on a single representative 2-D slice or on a complete 3-D stack. This has been validated using synchrotron datasets of the Berea sandstone (at a spatial resolution of 0.74
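On a segmented stack, the porosity, volume fraction and relative porosity trend computations reduce to voxel counting; a minimal Python sketch (the phase labels below are assumed for illustration, not CobWeb's convention):

```python
import numpy as np

# Hypothetical segmented 3-D stack: 0 = pore, 1 = grain, 2 = hydrate.
rng = np.random.default_rng(0)
seg = rng.choice([0, 1, 2], size=(32, 32, 32), p=[0.2, 0.7, 0.1])

total = seg.size
porosity = np.count_nonzero(seg == 0) / total
volume_fractions = {int(p): np.count_nonzero(seg == p) / total
                    for p in np.unique(seg)}

# Relative porosity trend: pore fraction of each 2-D slice along the z axis.
porosity_trend = (seg == 0).mean(axis=(1, 2))
```

The volume fractions of all phases sum to one, and averaging the slice-wise trend recovers the bulk porosity.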
CobWeb 1.0 is still somewhat limited in its volume-rendering capabilities, which will be one of the features to improve in the next version. The volume-rendering algorithms implemented in CobWeb 1.0 so far do not reach the capabilities offered by ParaView or DSI Studio, which rely on the OpenGL marching cubes scheme. At present, the densely nested loop structure appears to be the best choice for systematic processing. As an outlook, vectorization and indexing approaches (
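The trade-off in this outlook, densely nested loops versus vectorized indexing, can be illustrated in Python/NumPy (the MATLAB® analogue would be logical indexing; the thresholding task below is purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
tomo = rng.random((40, 40, 40))   # stand-in greyscale volume
threshold = 0.5

# Densely nested loop version: simple and systematic, but slow in an
# interpreted language because every voxel access is interpreted.
count_loop = 0
for i in range(tomo.shape[0]):
    for j in range(tomo.shape[1]):
        for k in range(tomo.shape[2]):
            if tomo[i, j, k] < threshold:
                count_loop += 1

# Vectorized/indexing version: one boolean mask, no explicit loops,
# typically orders of magnitude faster for the same result.
count_vec = int(np.count_nonzero(tomo < threshold))
```

Both variants count the same voxels; only the vectorized form pushes the inner loop into compiled code.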
Regarding code availability, the MATLAB® code for the removal of edge enhancement artefacts from the GH-bearing sediment is attached in the Supplement. The CobWeb executable, as well as the user manual and the GH-bearing sediment XCT datasets, are available to the public on the Zenodo repository
The CobWeb executable requires the MATLAB® Runtime R2017b (9.3), which can be downloaded and installed from
The supplement related to this article is available online at:
SC conceptualized, investigated and performed the study, implemented the machine learning workflow and the graphical user interface design, performed the formal analysis, and developed the software code for the removal of the edge enhancement artefacts using the dual clustering approach. Further contributions of SC included data curation of the CobWeb software; writing the software manual and preparing the figures; and writing, reviewing and editing the manuscript.
KS conceptualized, investigated and performed the case study on gas hydrates, including the removal of edge enhancement artefacts and phase segmentation of the methane hydrate XCTs. She performed the formal analysis by implementing the dual filtration approach to reduce the edge enhancement artefacts, participated in discussions to validate phase segmentation using the dual segmentation approach, and was involved in writing, reviewing and editing the manuscript.
WR was involved in the project administration of the CobWeb activities, provided resources for the GUI and gave input on improving GUI functionalities.
TW was involved in funding acquisition and sponsoring of the CobWeb project under the framework of the SUGAR (Submarine Gashydrat Ressourcen) III project, funded by the German Federal Ministry of Education and Research (grant no. 03SX38IH). He was involved in project administration and provided feedback on GUI functionalities.
IS was involved in the conceptualization and funding acquisition of the CobWeb project under the framework of the SUGAR (Submarine Gashydrat Ressourcen) III project, funded by the German Federal Ministry of Education and Research (grant no. 03SX38IH). He also provided supervision, project administration, resources and periodic reviews to improve GUI functionalities.
The authors declare that they have no conflict of interest.
We thank Heiko Andrä and his team at Fraunhofer ITWM, Kaiserslautern, Germany, for providing us with the synchrotron tomography benchmark dataset of the Berea sandstone. We also thank Michael Kersten, Frieder Enzmann and his group at the Institute for Geosciences, Johannes Gutenberg-Universität Mainz, for providing the high-resolution gas hydrate synchrotron data. The acquisition of the GH synchrotron data was funded by the German Science Foundation (DFG grants Ke 508/20 and Ku 920/18). This study was funded within the framework of the SUGAR (Submarine Gashydrat Ressourcen) III project by the German Federal Ministry of Education and Research (BMBF grant 03SX38IH). The sole responsibility of the paper lies with the authors.
We thank Kirill Gerke, two anonymous reviewers and the editor, Thomas Poulet, for their valuable comments and suggestions which significantly improved the manuscript.
This research has been supported by the BMBF (grant no. 03SX38IH).
This paper was edited by Thomas Poulet and reviewed by Kirill Gerke and two anonymous referees.