PyXRD v 0 . 6 . 7 : a free and open source program to quantify disordered phyllosilicates using multispecimen X-ray diffraction profile fitting

This paper presents a free and open-source program called PyXRD (short for Python Xray diffraction) to improve the quantification of complex, poly-phasic mixed-layer phyllosilicate assemblages. The validity of the program was checked by comparing its output with Sybilla v2.2.2, which shares the same mathematical formalism. The novelty of this program is the ab initio incorporation of the multispecimen method, making it possible to share phases and (a selection of) their parameters across multiple specimens. PyXRD thus allows modelling multiple specimens side by side, and this approach speeds up the manual refinement process significantly. To check the hypothesis that this multispecimen set-up – as it effectively reduces the number of parameters and increases the number of observations – can also improve automatic parameter refinements, we calculated X-ray diffraction patterns for four theoretical mineral assemblages. These patterns were then used as input for one refinement employing the multispecimen set-up and one employing the single-pattern set-ups. For all of the assemblages, PyXRD was able to reproduce or approximate the input parameters with the multispecimen approach. Diverging solutions only occurred in single-pattern set-ups which do not contain enough information to discern all minerals present (e.g. patterns of heated samples). Assuming a correct qualitative interpretation was made and a single pattern exists in which all phases are sufficiently discernible, the obtained results indicate a good quantification can often be obtained with just that pattern. However, these results from theoretical experiments cannot automatically be extrapolated to all real-life experiments. In any case, PyXRD has proven to be useful when X-ray diffraction patterns are modelled for complex mineral assemblages containing mixed-layer phyllosilicates with a multispecimen approach.


Introduction
Clay minerals (i.e.phyllosilicates) are among the most difficult minerals to study in detail due to their inherent chemical and structural variability ( Środoń, 2006;Velde and Meunier, 2008;Hubert et al., 2012).Nonetheless, these minerals are one of the most abundant constituents of the Earth's upper crust, and have an important influence on various physical (e.g.plasticity, shear strength, porosity) and chemical (e.g.buffering and exchange capacities, pH, electrical conductivity) properties (Agbenin and Tiessen, 1995;Vernik and Liu, 1997;Righi et al., 1999;Wen and Aydin, 2003;Lado and Ben-Hur, 2004;Caner et al., 2010).Phyllosilicates are also very reactive phases responding quickly to changes in their environment (Pai et al., 2004;Meunier, 2007;Velde and Meunier, 2008;Cornelis et al., 2014).
Therefore, quantitative information on the mineralogical composition of clay-bearing samples is an important step in characterizing and understanding them.Different techniques can be used to quantify clay minerals, but those using X-ray diffraction are the most abundant and have proven to be the most reliable (Plançon, 1981;Reynolds Jr., 1985;Drits and Tchoubar, 1990;Righi et al., 1999;Sakharov et al., 1999a;Środoń, 2006;Hubert et al., 2009Hubert et al., , 2012;;Ufer et al., 2012a, b;Viennet et al., 2015).Programs calculating X-ray diffraction patterns usually provide the highest level of detail because the input for such models can be considered an Published by Copernicus Publications on behalf of the European Geosciences Union.approximation of the real structure of the minerals (e.g.layer structures, composition, stacking parameters, interlayer composition, orientation).As such, this approach yields not only quantitative data but also structural and compositional information.However, this also means a large number of variables are involved, some of which are very difficult to predict or estimate in advance.In combination with the complex, polyphasic nature of many natural samples, it is a challenge to create software that allows for the quantification of clay minerals.
Two complementary methods exist to analyse clay minerals using X-ray diffraction.One uses powder samples, for which the orientation of crystallites is considered to be approximately random, and the other uses oriented samples, in which the orientation of crystallites occurs mainly along a plane of preferred orientation.Originally, powder X-ray diffraction was and still is used to determine crystal structures for unknown phases (not just phyllosilicates), which then developed into quantitative analysis.However, for disordered structures like mixed-layered clay minerals, powder patterns are often difficult to interpret.In such cases oriented patterns can be used to focus on the stacking (dis)order along the c* axis.Since the 1970s, several computer programs have been developed to calculate X-ray diffraction patterns for (disordered) clay minerals (Kakinoki and Komura, 1965;Reynolds, 1967;Ergun, 1970;Sakharov and Drits, 1973;Drits and Sakharov, 1976;Plançon, 1981).Examples of commonly used programs are the NEWMOD © family (Reynolds Jr., 1985;Pevear and Schuette, 1993;Reynolds Jr. and Reynolds III, 1996;Yuan and Bish, 2010), MLM2C/3C and derivatives (Plançon and Drits, 2000), Sybilla (Aplin et al., 2006;D. McCarty, personal communication, 2015), DIF-FaX (Treacy et al., 1991), and BGMN (Ufer et al., 2012a, b).Some of these programs (e.g.DIFFaX, BGMN, Wildfire, Sybilla 3-D) are able to calculate X-ray diffraction patterns for random powder diffraction patterns, while others (NEWMOD © , MLM2C, MLM3C, Sybilla) focus only on calculating one-dimensional (00l) patterns.
Another aspect to consider is the ability of these programs to automatically refine parameters.For instance, the last version of NEWMOD © uses a simple linear least-squares algorithm, Sybilla makes use of a genetic algorithm, and BGMN has a custom least-squares algorithm.In essence, all of these algorithms try to find a solution by minimizing a target function, usually a measure for the difference between the calculated and observed data.This difference is usually defined as the sum of the squares of the errors or as the pattern's Rp factor (Toby, 2006).A linear or ordinary least-squares algorithm works well when there is a well-defined global minimum and the target function is relatively smooth.However, for more complex cases this is often not the case, and as a result an ordinary least-squares might not converge at all.Algorithms using a more stochastic approach, like genetic algorithms, can partly overcome problems related to target function smoothness or poorly defined minima (also see Sect.2.4).Nonetheless, any algorithm will require some guidance, e.g., by not releasing all parameters for automatic refinement at once, by adjusting some parameters manually, by setting upper and lower limits or by choosing starting values close enough to the actual solution.The reason is that models describing X-ray diffraction by disordered layered minerals can not always be constrained adequately, and a successful quantification is still very dependent on the skill of the individual modeller.As a result, most published quantifications of complex mixed-layer assemblages employ a time-consuming trial-and-error approach at some point in the modelling process.
Several authors used a multi-specimen approach to further constrain their models (Drits, 1997;Sakharov et al., 1999a, b;Hubert et al., 2012, and references therein).This approach involves recording multiple "specimens" or patterns (e.g.airdried, glycolated, heat treatments) of the same sample and creating a structural model that can explain the observed features for all patterns.The reason for doing this is that swelling layers (like smectite and vermiculite layers) will expand or contract in response to these treatments.The level of expansion or contraction can be related to layer charges, and helps in discerning the different swelling phases present and understanding their stacking (dis)order (Ferrage et al., 2005a(Ferrage et al., , b, 2007;;Michot et al., 2005;Dazas et al., 2015).In short, this approach allows one to determine the structure and type of (mixed-layer) clay minerals present in the sample with higher certainty.However, today not a single program allows for a side-by-side calculation of these patterns.Because of this, modellers are still forced to refine their model parameters on one specimen and then check if the solution also explains the other observations.As long as a manual trial-and-error refinement process is used, this does not pose too many practical problems aside from the time needed.In theory, however, a program able to integrate all the observations and calculate patterns for them could lead to better automatic parameter refinements, a hypothesis tested in this paper using theoretical assemblages.
The program presented in this paper, called PyXRD (short for Python X-ray Diffraction), was designed with this multispecimen approach in mind.It (selectively) shares phase parameters across specimens and keeps phase quantities identical in each specimen, thus reducing the number of parameters while at the same time increasing the number of observations.Other design goals for PyXRD were (i) to have an easy-to-use interface, (ii) to be an open program allowing as many aspects of the input to be changed as possible, (iii) to provide a means for automatic parameter refinement, and (iv) to provide an open-source program for others, allowing them to use the software freely and make improvements where they see fit.
This paper illustrates the general structure of this program and presents the results from a comparison between PyXRD and Sybilla v2.2.2 and between automatic parameter refinements for several theoretical mineral assemblages, with and without the use of the multi-specimen approach.The software manual contains more detailed information about the numerical solutions used for calculating the X-ray diffraction patterns and a guided example on how to create projects using the graphical user interface (GUI).

Model implementation and licence
PyXRD is written in Python 2.7 and uses a number of opensource third-party modules.The graphical user interface (GUI) utilizes PyGTK as widget toolkit and has an internal model-view-controller framework.To improve calculation speed, PyXRD makes use of the NumPy and SciPy libraries.NumPy provides multi-dimensional array objects and many related routines for manipulating them, while SciPy provides more complex mathematical and scientific algorithms built on top of NumPy (Jones et al., 2001;van der Walt et al., 2011).The Matplotlib library is used for plotting patterns and data (Hunter, 2007).Finally, the Distributed Evolutionary Algorithms for Python (DEAP) library is used to harness to power of evolutionary algorithms to automatically refine parameters (Fortin et al., 2012).
PyXRD is released under a BSD (Berkeley Software Distribution) licence, except for the model-view-controller (mvc) module, which, as it is a derived work from the gtkmvc project, is licensed as GNU Lesser General Public License (LGPL) v2.

Program data structure
PyXRD is implemented according to a mvc paradigm separating data and calculations from GUI-related aspects.In the following section, an overview is given of the most important objects found in the data layer and their associations.More details can also be found in the manual and the source code documentation.

Project object
The user interface of PyXRD can create (or load) a single Project object.It is a container object grouping lists of AtomType, Phase, Specimen, and Mixture objects together.These are the four top-level objects, which are used to calculate X-ray diffraction patterns.Their associations are shown schematically in Fig. 1.The purpose of each of them will be explained in more detail below.

Atom type object
The AtomType object is the most basic building block.This object bundles all the physical constants (e.g.charge, atomic weight, scattering factors) for a single ion (e.g.Fe 2+ , Fe 3+ ) or for a molecule (e.g.H 2 O and ethylene glycol) small enough to be considered having a spherical electron cloud.When a new Project is created, a default list of these Atom-Type objects is loaded, using the atomic scattering factors as published by Waasmaier and Kirfel (1995).

Phase and component objects
Phase objects contain all the information needed to calculate a one-dimensional X-ray diffraction pattern of a (mixedlayer) mineral.A Phase combines (i) a Probability object, (ii) an object describing the coherent scattering domain size (CSDS), and (iii) one or more Component objects, which contain information about the structure of the different types of layers in the phase.
The Probability object describes how these layers are stacked by means of Markovian statistics and the Reichweite concept (Drits and Tchoubar, 1990).Currently PyXRD has implemented probability models for R values ranging from 0 to 3. For each combination of Reichweite and number of components there are a number of independent parameters required to calculated the remaining parameters, which describe the stacking order or disorder.The values of these independent parameters can be based on another phase with the same combination of Reichweite and number of components.For example, this means it is possible to share the illite content in an illite-smectite mixed layer across its AD and EG phase, but have different weight fractions (or junction probabilities) for the different types of smectite in those phases.For a complete explanation on how these calculations work and which parameters were chosen to be independent we refer to the manual.
The CSDS object describes what type of coherent scattering domain size distribution should be used and contains the necessary parameter values to describe this distribution (e.g.average CSDS).Two types of CSDS distributions are currently implemented: a generic log-normal distribution and a log-normal distribution in which the distribution constants published in Drits et al. (1997) are employed and the average CSDS is the only remaining unknown variable.Each phase also has a σ * factor, which makes it possible to correct for incomplete preferred orientation (Reynolds Jr., 1986;Dohrmann et al., 2009).
The Component object describes the size, structure, composition, and (variation in) basal spacing of a single layer type in that phase.A Component contains two lists that combine an AtomType from the project with its (projected) coordinate along the c* axis (also known as the z coordinate) and the number of projected ions of that type at that coordinate.The first list involves atoms in the silicate lattice, while the other list describes the (variable) interlayer space.With this approach, the silicate structure can be shared between different phases (e.g.AD and EG states), while the interlayer contents may still be different.

Specimen objects
Specimen objects provide all the information regarding the experimental data (the actual measurements, sample size, etc.) and the Goniometer set-up (radius, slit sizes, etc.).They do not hold a direct reference to phases, but are linked with them through Mixture objects.

Mixture objects
Mixture objects are the starting point for the actual calculations as they link phases and specimens together.In the user interface, a table can be created by adding just as many rows as there are phases and just as many columns as there are Specimens.In the column headers, there are slots where the user can select the Specimen.Similarly, the user can select the corresponding Phase in each cell of the table.This enables the user to select different states of smectite for an AD (air dry) and an EG (ethylene glycol) specimen (see Fig. 2 for a screenshot of the GUI), while keeping unaffected phases (e.g.kaolinites, micas, and chlorites) unchanged.
Once a Mixture is created in this way, a number of parameters are available for automatic refinement (e.g.weight fractions from the Probability object, the average CSDS, etc.).In a refinement dialog, the user can select which parameters she/he would like to improve and the minimum and maximum values between which the ideal value should lie.A number of different refinement methods are also availablesome of them more complex or specialized than others.Yet, as a complete description of all methods is beyond the scope of this article, only the algorithm used for the refinements will be explained in detail below.

Numerical calculations
The X-ray diffraction patterns are calculated using the matrix formalism, for which a very good summary can be found in Drits and Tchoubar (1990).Later developments incorporated can be found in Drits et al. (1997) and Plançon (2001).Since the complete mathematical deduction followed for PyXRD is rather long, in itself does not contain new developments, and is not the aim of this paper it is not included here.However, an overview of the mathematical deductions and calculations, as they are implemented in the calculations module, can be found in the online manual (http://users.ugent.be/~madumon/pyxrd/Manual.pdf)or in the manual included in this article's Supplement.
To improve calculation speed, programs can make use of multi-threading, spreading the load from the different threads evenly over the different cores in a multi-core CPU.However, multi-threading is not very effective in Python because of the global interpreter lock (GIL).This lock can only be obtained by a single active thread, while the others have to wait for it to be released again.So instead of multi-threading, PyXRD uses multi-processing, which creates a new python interpreter for each process, circumventing the GIL problem.The downside is that processes, unlike threads, do not share memory.Therefore, each process needs to be given all the data required to run the calculation.This is achieved by isolating the calculation functions from objects and by extracting the required data from the objects described in the previous section.As a result, the data exchanged between processes are reduced to a minimum.
This approach also makes it possible to run PyXRD refinements effectively on high-performance computing (HPC) clusters.The experiments presented in this paper were run on the HPC clusters of the Stevin Supercomputer Infrastructure at Ghent University.The main reason to run these experiments on an HPC cluster was the large number of replicates (50) involved in this work and the practical aspect of not having to dedicate a separate PC.This does not mean refinements take too long on a regular PC; e.g. a refinement with a dozen parameters finishes in less then 15 min on a 64-bit quad-core 3.10 GHz Intel ® Core i5-2400 system.The set-up on an HPC cluster is also more cumbersome as it requires the user to get PyXRD to work on the cluster and submit a W 1 = 0.9 P 22 = 0 1.9 2.3 P 2112 = 0.5 Illite-smectite 3 2 AD 15 W 1 = 0.9 P 22 = 0 0.5 1.0 P 2112 = 0.0 working job script for her/his refinement from a command line.Running a refinement on a local PC is much easier as the refinement algorithm (see below) and its parameters can be selected and run from the GUI itself.

Refinement algorithm
PyXRD supports several refinement algorithms, but for more complex problems involving several parameters, the genetic algorithms or evolutionary strategies are found to be most reliable.PyXRD implements several evolutionary strategies, among which are a covariance matrix adaptation evolutionary strategy (Hansen and Ostermeier, 2001) and a (multiple) particle swarm optimization (Blackwell et al., 2008).While the particle swarm optimization is effective at searching the parameter space for minima, being able to escape local minima easily, it can take a lot of function calls for it to converge.On the other hand, the covariance matrix adaptation evolutionary strategy is much more effective for local searches, but does get stuck in local minima more easily.Therefore, PyXRD also implements a particle swarm covariance matrix adaptation evolutionary strategy algorithm, which extends the covariance matrix adaptation evolutionary strategy with collaborative concepts from a particle swarm optimization (Muller et al., 2009), making it the more robust choice.This particle swarm covariance matrix adaptation evolution-ary strategy was also used for the experiments presented below.

Results
In the following sections, PyXRD's output is compared with Sybilla's output.In the first section, single phases are tested to check the implementation of the model.In the second section a number of assemblages are tested to check if the obtained weight fractions are correct.In the last section a comparison is made between single-vs.multi-specimen refinements.

Comparison between Sybilla and PyXRD results: calculated 00l reflections for single discrete and mixed-layer phyllosilicates
In total, 13 phases were tested.An overview of these phases with their most important structural parameters are given in Table 1.The original Sybilla projects, the produced patterns, and the PyXRD projects used can be found in this paper's Supplement.All patterns were calculated using a fixed σ * value of 12, a sample length of 1.25 cm, a goniometer radius of 17.3 cm, a divergence slit of 0.5   cell dimension in the z direction, the octahedral iron content (for illite, chlorite, and smectite components), the interlayer water, ethylene glycol and cation contents (for smectite and illite components), and the average coherent scattering domain size.The probability parameters were entered as such that identical P and W matrices were obtained.For most of the phases this meant identical parameters could be entered.
Only for the R2 illite-smectite with two components, two additional parameters were entered in comparison with Sybilla, which has a more restricted probability model for this combination of Reichweite and components.These parameters are the junction probabilities P 21 (fixed at 1.0 in Sybilla) and P 221 (fixed at 0.0 in Sybilla).A complete deduction on how the entered probabilities and weight fractions are used to calculate the unknown weight and probability fractions is present in the PyXRD manual.Sybilla uses scattering factors for the atoms in the silicate lattice assuming 50 % ionization, with the exception of Mg, which is fully ionized (D.Mc-Carty, personal communication, 2015).The scattering factors used in PyXRD for this study have been set to match this.The kaolinite, illite, talc, and chlorite phases are composed of a single component.As such, these are testing the basic aspects of the model such as the orientation factor σ *, the calculation of the coherent scattering domain size, and the calculation of the atomic scattering and structure factors.To test whether PyXRD can handle different sample states correctly, an R0 two-component smectite in air-dried and glycolated state is modelled as well.To further test the implementation of the matrix algorithm for mixed-layer phases, and the related probability models, a number of illite-smectite phases were used.In total seven phases were tested, four of which are two-component illite-smectite phases with Reichweite values varying from 0 to 3 and three of which are three-component illite-smectite phases with Reichweite values varying from 0 to 2. The different smectite components have different hydration states, i.e. the first component always has two planes of water (AD state) or two planes of ethylene glycol molecules (EG state) in its interlayer space while the second component has only a single layer of water or ethylene glycol molecules.For these illite-smectite phases two variants were calculated: one with a low CSDS not at maximum possible degree of ordering (MPDO) and one with a higher CSDS at MPDO.Table 1 contains the Rp factor obtained for these test cases.A few of these patterns are presented in Figs. 3 and 4. From them and from the Rp and Rwp factors, it is clear PyXRD can produce patterns almost identical to those produced by Sybilla.The small deviations can probably be explained by different physical constants (e.g.atomic scattering factors), although it is impossible to know exactly.

Comparison between Sybilla and PyXRD results: calculated 00l reflections for mixtures of discrete and mixed-layer phyllosilicates
To further validate the model, five patterns were produced in PyXRD for mixtures of increasing complexity.These patterns were imported in Sybilla and modelled using the same phases and the same parameters.This should allow one to validate whether the weight fractions in PyXRD can also be obtained by Sybilla.The entered and obtained weight fractions and the corresponding Rp and Rwp factors are presented in Table 2. Figure 5 shows the comparison between the calculated patterns for mixture 5 from Sybilla and PyXRD.The used phases are largely identical to the phases used in the previous validation, except for the addition of a few phases for which details are also given in Table 2.The input files for PyXRD and Sybilla are included in this paper's Supplement.
As can be observed, the weight fractions in PyXRD and Sybilla are reasonably close to each other, with all of the deviations being smaller then 0.5 wt%.Such differences are not to be considered significant for a real sample, but when using "ideal" theoretical phases they do indicate there are differences between Sybilla and PyXRD.In order to check if some phases contribute more or less then others to the whole-pattern misfit, Rp and Rwp factors are calculated for angular ranges corresponding to first-order reflections for the phases in mixtures 1, 2, and 3 (Table 3).This was not done for mixtures 4 and 5 due to the presence of too many overlapping peaks, making statements about the contribution of separate phases to the total misfit difficult.The Rp and Rwp factors obtained in this way are all of the same order of magnitude.Therefore, each phase must be contributing more or less equally towards the whole-pattern misfit.The remaining differences between the patterns can be explained by small differences in physical constants, while the remaining differences in wt% can be explained by differences in unit cell dimensions.

Assemblage set-up
In total, four theoretical mineral assemblages were tested (Table 4): Assemblage 1 is a very simple test because of the absence of overlapping and similar phases.Its main purpose was to see whether the program and, more importantly, the selected refinement strategy, can produce a reliable result.The assemblage consists of equal amounts of a discrete kaolinite, a discrete illite, and an R0 illite-smectite with only 10 % illite layers.
Assemblage 2 is more complex, comprising six different phases: a discrete illite, a discrete kaolinite, an R0 illitesmectite with 65 % illite layers, an R0 kaolinite-smectite with 80 % kaolinite layers, a smectite, and a poorly crystalline chlorite.The idea behind this assemblage was to mimic phases encountered in some soils.The poorly crystalline chlorite component can be interpreted as a small amount of hydroxy-interlayered smectite (or vermiculite) and is not to be considered a primary trioctahedral chlorite, while the kaolinite-smectite represents a neoformed, defective kaolinite or smectite.This kind of phase has been reported a number of times, usually in finer clay fractions (≤ 0.2 µm) of certain soils (Hubert et al., 2009(Hubert et al., , 2012;;Ryan and Huertas, 2009;Dumon et al., 2014).The different phases are also present in different quantities, with the illite-bearing phases each contributing 25.0 wt%, the smectite taking up 20.0 wt%, the kaolinite phases each accounting for 12.5 wt% and the chlorite being a minor phase with only 5.0 wt%.
Assemblage 3 is composed of 30 % discrete illite, 35 % kaolinite, 20 % high-charge smectite (vermiculite like), and 15 % low-charge smectite.The main idea behind this test assemblage was to see whether the presence of high-charge and low-charge phases (which in this case produced similar patterns under AD and heated conditions, but different patterns under EG conditions) has an influence on the refinement and the quantification in the different set-ups.
For assemblages 1 and 2, both the smooth and noisy patterns were used in separate refinements to assess the influence of this treatment.For assemblages 3 and 4, only the noisy patterns were used, because the previous two experiments showed little influence of the noise on the final results (see below).
Since evolutionary refinement strategies have a stochastic component, each refinement will be different, even if starting and boundary conditions are identical.Nonetheless, the starting point may also have an influence on the final result.To average out these differences and to check if the final output is reproducible, 50 random starting points were sampled so that a normal distribution over the parameter space was obtained.For each of these points a refinement was started.At the end of these refinements, average parameter values and their standard deviations were calculated for these 50 iterations.Additionally, the model kept track of the best solution found at each generation in each iteration, allowing us to create parameter evolution plots.

Assemblage 1
An overview of the obtained average parameter values and standard deviations for assemblage 1 can be found in Tables 5 and 6.Parameter evolution plots for two selected parameters (the average CSDS and the fraction of illite layers in the illite-smectite) are also shown in Fig. 6.Most parameters are determined accurately and with very high precision.The difference between noisy patterns and smooth patterns is marginal, and no difference can be observed between the runs where multiple specimens are combined and those where only a single specimen was used for refinement.As a result of this, the obtained weight fractions for the three phases are also very accurate.The obtained level of accuracy is not a realistic level for natural samples, but stems from the simplicity of this set-up.For the runs using the noisy patterns, a very small (and systematic) deviation in the obtained weight fractions can be observed.This is probably the result of the added noise, since the deviation is not present for runs using the smooth patterns.

Assemblage 2
An overview of the obtained average parameter values and standard deviations for assemblage 2 can be found in Tables 7 and 8.As was the case in the previous assemblage, no significant difference can be observed between runs that use smooth patterns and those that use noisy ones.Both types produced similar parameter accuracies and precisions.Overall, the results are less accurate and precise compared to as-semblage 1, but still very good.Most notably, the weight fractions of the smectite layer types in the kaolinite-smectite show a much larger imprecision.This is also the case in the parameter evolution plots (Fig. 7) for these fractions.An explanation can be found in the sensitivity of these parameters: since the kaolinite fraction in this mixed-layer is relatively high (80 %), the relative amounts of the different types of smectite layers do not have such a large influence on the calculated pattern.Some differences are also noticeable between runs that combine multiple specimens and those where only heated patterns were used.For the latter, the imprecision on the weight fractions for the illite, illite-smectite and smectite phases is significantly larger compared to the other runs.This is to be expected, as heating collapses swelling layers, causing significant peak overlap with the illite peaks.Despite this overlap, it was still possible to obtain accurate and precise averages for the other parameters, comparable to the other runs.

Assemblage 3
An overview of the obtained average parameter values and standard deviations for assemblage 3 can be found in Table 9.With this assemblage, the combined set-up and the set-up using only the EG pattern both resulted in the same www.geosci-model-dev.net/9/41/2016/Geosci.Model Dev., 9, 41-57, 2016       be observed that the weight fractions and parameter values of phases that were unaffected by the treatments (i.e.kaolinite and illite) are more accurate and precise in these set-ups.
It is mainly for the overlapping phases (i.e.smectites) that the errors occur.
Figure 8 shows the parameter plots for the multi-specimen set-up and the AD set-up for a few selected parameters.This figure illustrates the divergent nature of some parameters in the AD set-up very well, while it is clear that the combined set-up does not suffer from this as it has access to the EG pattern as well.
The outcome of this experiment is in line with our expectations, as only the EG pattern contains enough information to distinguish these two smectites from each other.When the EG pattern is absent, the results become divergent, resulting in the high imprecision observed for the AD and heated pattern set-ups.

Assemblage 4
An overview of the obtained average parameter values and standard deviations for assemblage 4 can be found in   mixed-layer illite-smectite as an illite and overlooked the presence of two populations of kaolinite instead of one.Nevertheless the flawed structural model is able to give us decent parameter accuracies.These kinds of "mistakes" are quite common in the real-life use of this kind of program, and apparently do not matter too much either, as long as they are related to natural inhomogeneities.In contrast, a model based on a completely wrong interpretation will never yield any good output, and will result in a very obvious mismatch between the calculated and observed patterns.Even in this assemblage, the (residual) XRD patterns (Fig. 9) show a clear mismatch for these phases.An observant user should notice this and as such be able to identify wrong and/or missing phases.

Summary
For all four assemblages, PyXRD has been able to reproduce the input parameters or at least approximate them with the multi-specimen approach.The only complications occur when single patterns are used, which do not contain enough information on their own (in most cases heated patterns).
The results for these theoretical assemblages seem to suggest that the multi-specimen approach does not add a lot of constraints to the mathematical model.Instead, it appears far more important to correctly identify the phases using multiple specimens than to use these for the parameter refinement.As a result, once the phases are correctly identified, a good quantification can often be obtained with only a single pattern if all phases can be sufficiently discerned from one another in that state.For most natural samples, this could imply that it is sufficient to model the EG and/or the AD pattern.Indeed, many papers presenting modelled X-ray diffraction patterns of phyllosilicates only use the AD and/or EG patterns (Plançon and Roux, 2010;Hubert et al., 2012;Ufer et al., 2012a;Dumon et al., 2014).However, it is important to realize that these results from theoretical experiments cannot be extrapolated automatically to all real-life modelling experiments.
In this context, one needs to understand how realistic it is to share some of the parameters between the different specimens during the refinement.Some of them are rather difficult or impossible to control from measurement to measurement.For example, the number of water or ethylene glycol planes intercalated into smectite-bearing phases is dependent not only on layer charge and the saturating cation but also on the ambient conditions (i.e.temperature and relative humidity) (Tamura et al., 2000).Because of this, a lot of the parameters cannot and should not be shared, and the advantage of having added more observations is partially lost.

Conclusion
In this paper we have presented PyXRD, a new free and open-source program to perform a (semi-)quantitative analysis of disordered layered minerals using multi-specimen Xray diffraction profile fitting.It is the authors' sincere hope that others will pick up on the program and improve it.The novelty of this program lies specifically in the ab initio incorporation of the multi-specimen method, making it possible to share phases and (a selection of) their parameters across multiple specimens.This allows one to model several specimens side-by-side, and is an important step forward.In theory, this could also help in further constraining the mathematical model and thus improving the automatic parameter refinement results (Sakharov et al., 1999a;Meunier, 2005;Lanson, 2011).However, results from theoretical experiments indicate that a multi-specimen refinement set-up is not always required to obtain good parameter estimates.Finally, it remains of paramount importance to use the multispecimen method to obtain a correct identification, as without it no meaningful quantification can ever be obtained.We can conclude that PyXRD has proven to be very useful when X-ray diffraction patterns for complex mineral assemblages containing (mixed-layer) phyllosilicates are modelled with a multi-specimen approach.

Figure 1 .
Figure 1.Schematic overview of the most important objects in PyXRD and their relations.Arrows indicate "is referenced x times by" relations and the numbers indicate the multiplicity of that relation (e.g.Project holds 0 or more references to AtomTypee).

Figure 3 .
Figure 3. Calculated patterns for discrete illite (top) and talc (bottom), showing nearly identical output for PyXRD (solid line) and Sybilla (crosses).For clarity the residual patterns are scaled to 5 times their original intensity.

Figure 5 .
Figure 5. Calculated patterns for mixture 5 for PyXRD (solid line) and Sybilla (crosses).For clarity the residual pattern is scaled to 5 times its original intensity.

Figure 6 .
Figure 6.Parameter evolution plots (left: average CSDS; right: illite content) for the noisy patterns of assemblage 1 for the multi-specimen run (top plots) and the isolated AD run (bottom plots).Minimum and maximum values during the refinement are indicated with dashed lines, iterations' best solutions at each generation indicated by dots and average solution with a solid line.The higher the density of the dots, the lighter they are coloured.

Figure 7 .
Figure 7. Parameter evolution plots for the smectite fractions in the kaolinite-smectite mixed layer of assemblage 2 using the multi-specimen set-up.Plots for the smooth patterns are in the top row, for noisy patterns in the bottom row.Legend as in Fig. 6.

Figure 8 .
Figure 8. Parameter evolution plots for the low-charge smectite in assemblage 3. Plots for the multi-specimen set-up are in the top row, for the AD single-pattern set-up in the bottom row.Legend as in Fig. 6.

Figure 9 .
Figure 9.The input (black solid line) and refined (grey solid line) AD pattern and their difference (grey solid line at the bottom) for the multispecimen set-up of assemblage 4.An observant user should see the mismatches in the patterns and realize his model needs improvement.

Table 1 .
Overview of the discrete phases used to compare the output from PyXRD with the output of Sybilla (R is Reichweite, G is the number of components, N is the average CSDS, AD is air dry, EG is ethylene glycol, relevant probability (P ) and weight (W ) factors are given, Rp and Rwp are the unweighted, and weighted residual errors of the patterns respectively).

Table 2 .
Overview of the test mixtures used to compare the weight fraction output from PyXRD with the output of Sybilla, with details for the different phases (R is Reichweite, N is the average CSDS, d 001 is the basal spacing, relevant probability (P ), and weight (W ) factors are given).

Table 3 .
Calculated Rp and Rwp values for selected intervals of mixtures 1, 2, and 3 calculated in PyXRD and Sybilla.(Rp and Rwp are the unweighted and weighted residual errors of the selected intervals, respectively).

Table 5 .
Overview of the means and standard deviations for weight fractions and refined parameters for assemblage 1 using smooth patterns.

Table 7 .
Overview of the means and standard deviations of weight fractions and refined parameters for assemblage 2 using smooth patterns.

Table 8 .
Overview of the means and standard deviations of weight fractions and refined parameters for assemblage 2 using noisy patterns.
performance, giving accurate and precise parameter values.The set-up with AD or heated patterns, on the other hand, led to inaccurate and imprecise results, especially when the weight fractions are taken into account.Finally, it can also

Table 9 .
Overview of the means and standard deviations of weight fractions and refined parameters for assemblage 3.

Table 10 .
Overview of the means and standard deviations of weight fractions and refined parameters for assemblage 4. Assemblage no. 4 -noisy patterns Multiple specimens (n = 50) Only AD (n = 50) Only EG (n = 50) Only 350 • C (n = 50)