STRAPS v1.0: Evaluating a methodology for predicting electron impact ionisation mass spectra for the aerosol mass spectrometer

Our ability to model the chemical and thermodynamic processes that lead to secondary organic aerosol (SOA) formation is thought to be hampered by the complexity of the system. While there are fundamental models now available that can simulate the tens of thousands of reactions thought to take place, validation against experiments is highly challenging. Techniques capable of identifying individual molecules, such as chromatography, are generally only capable of quantifying a subset of the material present, making them unsuitable for a carbon budget analysis. Integrative analytical methods such as the Aerosol Mass Spectrometer (AMS) are capable of quantifying all mass, but because of their inability to isolate individual molecules, comparisons have been limited to simple data products such as total organic mass and O:C ratio. More detailed comparisons could be made if more of the mass spectral information could be used, but because a discrete inversion of AMS data is not possible, this activity requires a system of predicting mass spectra based on molecular composition. In this proof of concept study, the ability to train supervised methods to predict electron impact ionisation (EI) mass spectra for the AMS is evaluated. Supervised Training Regression for the Arbitrary Prediction of Spectra (STRAPS) is not built from first principles. A methodology is constructed whereby the presence of specific mass-to-charge ratio (m/z) channels is fit as a function of molecular structure before the relative peak height for each channel is similarly fit using a range of regression methods. The widely-used AMS mass spectral database is used as a basis for this, using unit mass resolution spectra of laboratory standards.


Introduction
Volatile organic compounds (VOCs), emitted from both natural and anthropogenic sources, are oxidised in the atmosphere to form lower-volatility species that condense onto aerosol particles or contribute to new particle formation (Laaksonen et al., 2008; Sipila et al., 2016; Ehn et al., 2014). With an enormous number of species present, this diversity in chemistry is reflected in the extensive range of species and chemical signatures identified in ambient studies (Hamilton et al., 2013).
Within atmospheric science, it is desirable to develop models for secondary organic aerosol (SOA) formation based on a given set of precursors and photochemical processing. Within most global and regional models, often-used techniques include modelling representative photochemical yields from specific precursors and tuning accordingly (Spracklen et al., 2011) or employing a parametric model such as the volatility basis set (Robinson et al., 2007). While both of these approaches can deliver realistic absolute concentrations, because they are not based on explicit physical processes, their predictive skill is always subject to question (Hallquist et al., 2009; Bergstrom et al., 2012). It is therefore desirable to develop SOA models based around actual molecular processes and kinetics constrained through laboratory experiments (where available), such that this skill can be evaluated. Such models rely on explicit chemical mechanisms such as the Master Chemical Mechanism (MCM) (Saunders et al., 1997) or the GECKO model (Aumont et al., 2005). While this mechanistic approach has resulted in poor performance in terms of absolute mass concentrations in the past (Volkamer et al.,
This is not the first study on predicting EI mass spectra based on molecular composition, or to demonstrate the potential for predicting instrument response functions (Camredon et al., 2007). Bauer and Grimmer (2016) recently reviewed the current performance of quantum chemistry methodologies in predicting EI mass spectrometry for small to medium sized molecules from first principles. Whilst that study documents improving general applicability, these methodologies are not immediately suitable for predicting AMS mass spectra because the thermal desorption promotes further fragmentation and, in some cases, pyrolysis (Canagaratna et al., 2015). While the standard AMS analysis takes these processes into account through empirical calibrations, the exact physical processes taking place within the vaporiser system are still the subject of considerable debate (Murphy, 2016; Drewnick et al., 2015; Robinson et al., 2016), so the bottom-up modelling of this is not possible with the current state of knowledge.
Distinct from all previous approaches, the approach presented here relies on supervised learning methods to automatically optimise the relationship between spectral characteristics and molecular features from the instrument in question. Therefore, any internal mechanisms or instrument features impacting on fragmentation are implicitly accounted for in the fitted model.
In section 2 the methodology behind constructing a predictive model is presented, whereas section 3 focuses on results regarding the accuracy of a model with respect to comparisons with spectra for individual components. In addition, we present results from simulating the mass spectra of α-pinene aerosol using the GECKO-A model before we discuss future data requirements in section 4.

Methodology
Figure 1 displays the workflow used in building the predictive model. First, a model is trained to predict the occurrence of specific m/z channels as a function of molecular composition before a model for each m/z channel is trained to predict peak height within that channel. It is worthwhile detailing the molecular information used to train each model. Each molecule has varying levels of structural features, which can be written in terms of a 'fingerprint'. This fingerprint is a numerical identification of a given structure that can equally be thought of as stoichiometric information for distinct features. For example, for a collection of 10 compounds we would construct a matrix of stoichiometric information where each row represents a specific molecule and each column the stoichiometry of a given feature. We now refer to each column as a 'key', which might be a specific functional group or feature associated with that molecule. We retain the use of the word 'key' since it can provide more generic information than a functional group. To re-iterate, the entire row we refer to as the molecular fingerprint. For example, identifying the occurrence of carboxylic acid groups is a key within the AIOMFAC fingerprint (Zuend et al., 2011). We then take this information and use it to train a model to predict both the occurrence of a specific m/z channel and then peak heights. In summary, in constructing a model that can predict AMS mass spectra, a library of compounds with measured spectra is used to train a series of regression techniques. This collection of molecules, represented as SMILES strings, is parsed to produce a matrix where each column represents the stoichiometry of a particular key, or feature. This entire matrix is used to fit a predictive model for each m/z channel.
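The matrix of keys described above can be sketched in a few lines of Python. This is a minimal illustration only: the molecule names, key labels and counts are assigned by hand for the example rather than produced by SMARTS matching, and `build_key_matrix` is a hypothetical helper, not a function from UManSysProp or the STRAPS code.

```python
# Hypothetical key counts per molecule, assigned by hand for illustration.
# In the study these counts come from SMARTS matching against SMILES strings.
key_counts = {
    "oxalic acid": {"COOH": 2},
    "ethanol": {"CH3": 1, "CH2": 1, "OH": 1},
    "propane": {"CH3": 2, "CH2": 1},
}


def build_key_matrix(counts_by_molecule):
    """Return (molecule names, key names, matrix) with one row per molecule."""
    names = list(counts_by_molecule)
    # Columns are the union of all keys seen in any molecule, in sorted order.
    keys = sorted({k for counts in counts_by_molecule.values() for k in counts})
    # A key that is absent from a molecule has stoichiometry zero.
    matrix = [[counts_by_molecule[n].get(k, 0) for k in keys] for n in names]
    return names, keys, matrix


names, keys, matrix = build_key_matrix(key_counts)
for name, row in zip(names, matrix):
    print(name, dict(zip(keys, row)))
```

Each row of `matrix` is one molecular fingerprint and each column is one key, which is the layout that a regression method would receive as its input features.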
The underlying physical principles of EI (F. W. McLafferty, 1994), adjusted to the AMS (Gasteiger et al., 1992), do not exist in algorithmic form, so there is currently no a priori basis for choosing the most appropriate fingerprint for this work. Therefore a collection of common fingerprints, and their combination, is tested in this study and their performance critically evaluated. This is an important sensitivity since one might expect a collection of keys that relate to EI fragmentation principles to offer a more robust basis for fitting any method used here. We discuss this further in section 4.
Fingerprints used in this study include those employed in activity coefficient and vapour pressure predictive techniques provided by the UManSysProp package (Topping et al., 2016; Zuend et al., 2011; Nannoolal et al., 2008), alongside more general fingerprints including the MACCS keys and FP4 keys (Putta et al., 2003). It is difficult to find information on the provenance of these latter generic fingerprints (Putta et al., 2003), other than that they are designed to cover a set of molecular features that would be used across a broad range of applications. The MACCS fingerprint provides up to 162 unique keys for any given molecule, the FP4 fingerprint up to 320. The current implementation of the MACCS keys from the Pybel package (O'Boyle et al., 2011) is used, whereas the FP4 keys are extracted from the RDKit open source informatics package (http://www.rdkit.org/docs/index.html). Each key is represented in the UManSysProp package (Topping et al., 2016) using SMARTS notation, and each molecule using the SMILES format. The matrix of keys used to fit each method is constructed by systematically parsing each molecule. Figure 2 demonstrates the use of the MACCS SMARTS to populate a matrix of keys. There are some common features between each fingerprint library, but also a range of differences. For example, all libraries identify the presence of the CH2 group, but then differ in the optional connecting groups. The FP4 keys cycle through systematic groupings, such as: primary carbon, secondary carbon, tertiary carbon…primary alcohol, secondary alcohol, tertiary alcohol etc. Similar groups are detected using the activity coefficient and vapour pressure keys. The full collection of SMARTS keys can be found in the source code and we discuss suggestions for future work on refining fingerprints in section 4. Please refer to section 5 on code availability.
With regards to the supervised methods used, an ensemble tree is trained to predict the occurrence of specific m/z channels as a function of any given fingerprint. To predict peak height per m/z channel, we evaluate a number of supervised methods available in the scikit-learn package: Generalised Linear methods, Support Vector Machines [with 3 separate kernels], Stochastic Gradient Descent, Bayesian Ridge, Ordinary Least Squares, Decision Trees and Ensemble methods (Pedregosa et al., 2011). There are a number of other methods available yet, as we will discuss in section 4, the results from this study demonstrate a potential whilst further data is needed to confirm general applicability, including the use of other methods. For a brief overview of each method, we refer the reader to Ruske (2016) and references therein. Before training each method, the matrix of identified keys was standardized between zero and one using the MinMaxScaler pre-processing feature within the scikit-learn package. In addition, the use of variable selection is designed to use only those features deemed important to construct fingerprint-peak height relationships to try and mitigate any under or over fitting. The sensitivity to these procedures is discussed in section 3.2. To compare modelled and measured mass spectra, the cosine angle from a dot product of the two is used, focusing on specific m/z channels that are typically found as features within atmospheric and smog chamber mass spectra (Ulbrich et al., 2009): 15, 18, 28, 29, 39, 41, 43, 44, 50, 51, 53, 55, 57, 60, 73, 77, 91. The ability of each method to replicate the entire database is first evaluated. Whilst training on a subset and comparing with the entire database will test wider applicability, this initial comparison quantifies the appropriateness of the different fingerprints in building an accurate model.
19), with the MACCS fingerprint providing the most (74) and the FP4 keys the second highest (30). The use of more or less information in the fitting procedure should not be assumed to automatically lead to a more accurate predictive model. Ideally there should be a balance between the number of features identified and how those features relate to the mechanisms of fragmentation of the molecule within the instrument in question. As we have already noted, comparing the information provided by each fingerprint with a working knowledge of the mechanics of EI fragmentation might help understand why a given fingerprint is more suitable. However, we first and foremost wish to demonstrate the efficacy of using pre-defined fingerprints as they are available in the literature or within existing open-source software packages. The exact physical processes taking place within the instrument are still the subject of considerable debate.
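The scaling and comparison steps described in this section can be illustrated with a short, standard-library Python sketch. It is not the STRAPS implementation: `min_max_scale` is a hand-written stand-in for the behaviour of scikit-learn's MinMaxScaler on a single column, and the representation of a spectrum as a dictionary of m/z to peak height, together with the example peak heights, is an assumption made for the example.

```python
import math

# m/z channels used for the comparison, as listed in the text.
CHANNELS = [15, 18, 28, 29, 39, 41, 43, 44, 50, 51, 53, 55, 57, 60, 73, 77, 91]


def min_max_scale(column):
    """Rescale one column of key counts to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Constant column: map every entry to 0.0 to avoid dividing by zero.
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def cosine_similarity(measured, predicted, channels=CHANNELS):
    """Cosine of the angle between two spectra, restricted to `channels`.

    Spectra are dicts mapping m/z -> peak height; missing channels count as 0.
    """
    a = [measured.get(mz, 0.0) for mz in channels]
    b = [predicted.get(mz, 0.0) for mz in channels]
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


# Hypothetical peak heights, chosen only to exercise the functions.
measured = {43: 1.0, 44: 0.5}
predicted = {43: 0.5, 44: 1.0}
print(min_max_scale([0, 1, 2]))
print(round(cosine_similarity(measured, predicted), 3))
```

A value of 1 indicates identical relative peak heights across the selected channels, while a value near 0 indicates that the two spectra share almost no signal in those channels.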

Sensitivity to choice of molecular fingerprint
Table 1 presents the median cosine angle of modelled spectra fit to the entire AMS database derived from the different supervised methods and different fingerprints, either isolated or combined into one, to 2 decimal places. The left-hand box-plots in figure 4a-d display the entire cosine angle spread for each method for the isolated MACCS (4a), FP4 (4b), AIOMFAC (4c) and Nannoolal fingerprints (4d). When fitting to the entire library of AMS spectra, initial results suggest that the tree-based methods ['Tree', 'Forest'] perform better than others, with the MACCS keys leading to improved model performance over other fingerprints. However, the difference between using either the MACCS or Nannoolal keys, for example, is not significant for any given supervised method, as noted in Table 1. Rather than demonstrating 100% accuracy, the values of 1.00 must be taken with caution, as we demonstrate in the following analyses. Whichever fingerprint is used, the ranking of performance between supervised methods remains similar, with the tree-based methods, Ordinary Least Squares and Bayesian Ridge outperforming Stochastic Gradient Descent and all Support Vector Machine kernels. Along with higher median values, the spread of cosine angles from the tree-based methods and Ordinary Least Squares is much lower than all other methods. Whilst the use of MACCS and FP4 provides, in theory, more information, there is some similarity in the structural information provided in all keys, as already discussed. For example, each fingerprint identifies key functional groups such as alkanes, alcohols, ketones etc., whilst the FP4 and MACCS keys in particular include more positional detail, including relative positions of groups. At least for the 100 compounds in the AMS library, that additional information leads to a slight increase in cosine angle agreement of around 0.02 between methods, if we use only results from table 1 and figure 4.
A key objective of this study, noted above, is to demonstrate the use of pre-defined fingerprints in constructing a predictive model. However, it is useful to also demonstrate the efficacy of combining the information from each fingerprint into one, without relating variable performance according to physical processes taking place within the instrument. The performance of combining all fingerprints into one, represented in table 1 under the column heading 'combined', illustrates a similar trend in performance between methods.
We discuss the significance of values displayed in table 1 after performance is re-evaluated following a more general approach of training to a subset of compounds, and the use of variable selection, in the next section.

Training to a subset, variable selection and dimensionality reduction
Table 2 presents the median cosine angle between measured and predicted mass spectra, as a function of fingerprint, either isolated or combined into one, and regression technique, when training to a subset of the entire database and using variable selection. To minimise over fitting any model to specific features, the process of variable selection allows us to refit the model to those keys deemed most important. The combination of both strategies might be considered the most suitable test of the methodology presented, with the full spread of statistics presented in the right-hand column of figures 4a-d. It should be noted that randomly selecting the subset used for training leads to a significant decrease in model performance. This is due to missing keys within the training subset that are deemed important in predicting spectra for those compounds outside of the subset. A different approach is to select the subset so as to maximise the number of keys represented across the molecules in the training subset, and this is used in our subsequent analysis.
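One way to read the subset selection described above is as a greedy coverage problem: repeatedly add the molecule that contributes the most keys not yet represented in the training subset. The sketch below illustrates that reading only. The text does not specify the exact selection rule used in STRAPS, so the greedy strategy, the function name `select_training_subset`, and the example molecules and keys are all assumptions.

```python
def select_training_subset(key_sets, subset_size):
    """Greedily pick molecules so the subset covers as many distinct keys as possible.

    key_sets maps molecule name -> set of keys present in that molecule.
    """
    remaining = dict(key_sets)
    covered = set()
    chosen = []
    while remaining and len(chosen) < subset_size:
        # Pick the molecule adding the most not-yet-covered keys;
        # ties are broken alphabetically so the result is deterministic.
        best = max(sorted(remaining), key=lambda name: len(remaining[name] - covered))
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen, covered


# Hypothetical molecules and keys, for illustration only.
key_sets = {
    "A": {"CH3", "CH2"},
    "B": {"CH3", "OH"},
    "C": {"COOH"},
    "D": {"CH2", "OH", "C=O"},
}
chosen, covered = select_training_subset(key_sets, subset_size=2)
print(chosen, sorted(covered))
```

In this toy case the key "COOH" remains outside the training subset, which is the kind of missing key that the text identifies as degrading predictions for compounds outside the subset.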
In some cases, such as with the Ordinary Least Squares and Forest methods, the data provided in Table 2 suggest that using both strategies leads to a lower median cosine angle, and thus slightly reduced model performance, when using isolated fingerprints. However, in practice, the statistics presented in Table 1 should not be considered a true test of the methodology, but rather a precursor demonstration of the sensitivity to choice of fingerprint, and perhaps any variability in instrument response across the AMS library. On this, the use of the 'combined' fingerprint demonstrates the ability to retain information from those keys that improve overall performance.
Given their wide use across many disciplines, it is difficult to quantify the reasons behind the poor performance of the Support Vector Machines relative to other methods. To assess whether dimensionality reduction procedures would improve accuracy, table 3 presents the median and overall spread of cosine angles when using Principal Component Analysis (PCA) on the 'combined' fingerprints. The number of principal components was varied between 20, 10, 8 and 4. Generally, reducing the number of keys from up to 278 to 20 components leads to an improvement of around 0.01-0.02 in all methods apart from Ordinary Least Squares and Support Vector Machines with both the polynomial and linear kernels. Results demonstrate clear sensitivity to the number of components when combined with the RBF Support Vector Machine kernel, performance varying from 0.84 to 0.67 on reducing the number of components from 20 to 4.
On the significance of the value of cosine angle, Figures 5 and 6 display predicted spectra for compounds not included in a training set, along with the cosine angle between modelled and measured spectra. From this point on we use isolated fingerprints to demonstrate the efficacy of our approach. For oxalic acid, in Figure 5, the difference in performance between the FP4 and MACCS fingerprint [cosine of 0.83 and 0.77] is apparent through certain features, including the relative proportion of peak heights for the 3 dominant channels, and the ratio of f44 to f43. In Figure 6, a similar pattern is found for leucine, including a marked difference in whether the model predicted non-zero entries across f41-f44. Whilst a small subset, these results suggest use of the cosine angle alone is not sufficient to validate model performance, which is confirmed in section 3.3 when applied to the α-pinene system. Based on these comparisons, a tentative suggestion of using a cosine angle of 0.8 might go some way to clarifying the performance statistics provided in Tables 1 and 2 and Figure 4. Indeed, results demonstrate that, whilst statistics in Table 2 and Figure 4 suggest similar performance for both MACCS and FP4 keys, this performance is composition dependent. This reflects sensitivity to information used in the training process and how similarity between performances should be taken with caution in prescribing which method to take forward. This is better highlighted in the following section with regards to a model SOA system.
Results suggest the tree-based methods are at least the most stable given the higher range of cosine angles presented in Figures 4a-d, and the decision tree method will be used in all subsequent analysis.

Example application to a model aerosol system.
In this section we apply the trained methods to a model SOA system, using output from the GECKO-A model used by Valorso et al. (2011) to study SOA formation from α-pinene in a simulated chamber experiment. The purpose of this exercise is to explore sensitivity of predicted mass spectra to combined speciated output from a fixed model configuration through varying fingerprints to support the comparisons made in the previous section. It is not designed as a thorough quantitative analysis of spectra comparisons, but rather to demonstrate the ability to extract specific features and highlight sensitivities to choice of model configuration. A recent study by McVay et al. (2016) presented results demonstrating sensitivity of aerosol mass and composition to processes included in a box model, including the addition of autoxidation mechanisms. They proposed that autoxidation might resolve some or all of the measurement-model discrepancy from chamber simulations, but that this hypothesis could not be confirmed until more explicit mechanisms are established for α-pinene autoxidation (McVay et al., 2016). One might imagine an ideal sensitivity study would be to use speciated output from these updated models and add additional constraint to prescribing model performance through a comparison between measured and predicted mass spectra. Indeed, that is a rationale behind the study presented here. However, as the following results will demonstrate, with the existing training data and lack of validation on simple mixtures, there is potential for false positives in the predicted spectra to confuse a diagnosis of accurate model configurations. Specifically, the composition space derived from a series of box-model configurations would need to be mapped onto the existing space covered by the AMS spectral library. Combined with additional measurements of mixed systems of known composition, we could then prescribe a more robust set of regression model configurations through which a more detailed sensitivity study could take place.
Nonetheless, to illustrate sensitivity to choice of fingerprints in a complex system, Figure 7 displays the predicted mass spectra for the GECKO-A model results of Valorso et al. (2011) combined with the experimental data taken from a chamber-based α-pinene SOA formation experiment reported by Alfarra et al. (2013). This spectrum represents "aged" aerosol, after 4 hours of experiment, during which the VOC/NOx ratio was ~2. Without further refinement of model and measurement conditions, these results exhibit large errors in the predicted mass spectra when using MACCS keys, despite the brief analysis presented in section 3.2. This demonstrates that over fitting to distinct features in the training set, and the difference between this composition space and that provided by the box-model output, are leading to features that are missed in the final spectra. This is further supported by the abundance of features extracted from the training set displayed in figure 3.
To expand on this performance, Figure 8 Figure 9 displays the predicted f44 to f43 peak heights from the model system using the commonly used 'triangle plot' (Morgan et al., 2010; Ng et al., 2011), compared with the experimental data taken from the chamber experiments of Alfarra et al. (2013) and also Chhabra et al. (2011), who studied the formation of α-pinene oxidation in response to different oxidants.
Note the trajectories in this space are not monotonic for either the experimental or simulated data, which indicates the complexities in interpreting spectra based on these metrics. Results suggest that f43 values when using the FP4 and Nannoolal keys are plausible when compared to published studies. The f44 peak height is systematically low for all fingerprints, as also shown in figures 5-7. However, rather than a deficiency in the mass spectral prediction methods, this is likely due to a deficiency in the Valorso et al. (2011) model treatment. It has recently been shown how important mechanisms such as autoxidation are to the α-pinene SOA system (Ehn et al., 2014), which are capable of rapidly adding oxygenated functional groups to the molecules that are responsible for both the suppression of vapour pressures necessary for SOA formation and also the increase in the f44 metric (Canagaratna et al., 2015). More recent versions of GECKO-A have included such mechanisms (McVay et al., 2016); however, a systematic comparison of the predicted spectra based on these inclusions is beyond the scope of this proof-of-concept paper and will be presented in a future publication.
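For readers unfamiliar with the f43 and f44 metrics used in the triangle plot, the short sketch below shows how they could be computed from a predicted spectrum. It assumes the usual AMS convention that fNN is the fraction of the total signal in the spectrum found at m/z NN; the text does not define the metrics explicitly, and the function name and example spectrum are hypothetical.

```python
def fraction_of_signal(spectrum, mz):
    """Fraction of the total signal found at a single m/z channel."""
    total = sum(spectrum.values())
    return spectrum.get(mz, 0.0) / total if total > 0 else 0.0


# Hypothetical unit-mass-resolution spectrum (m/z -> peak height).
spectrum = {29: 2.0, 43: 3.0, 44: 1.0, 55: 4.0}
f43 = fraction_of_signal(spectrum, 43)
f44 = fraction_of_signal(spectrum, 44)
print(f43, f44)
```

Each (f43, f44) pair gives one point in the triangle-plot space, so a time series of predicted spectra traces out a trajectory that can be compared with chamber measurements.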

Discussion and future work
The preceding analysis demonstrates the potential for the methodology presented to lead to interesting investigations on model versus measured mass spectra. However, there are a number of remaining improvements that need to be made. It is inevitable that not all of the chemical species predicted by the models will be covered by previous laboratory work. If a class of species predicted by any chemical mechanism is identified as not covered by existing SMARTS-based fragmentation rules, it could be characterised in the laboratory using the same facilities and methodologies employed for previous characterisation work (Canagaratna et al. (2015) and references therein).
On the sensitivity to choice of fingerprint, our results demonstrate compound specific trends that lead to performance variability when applied to a complex SOA system that is not apparent when analysing median cosine angle statistics.
Combining available fingerprints into one can slightly improve performance in some cases, but as the comparison of isolated MACCS versus FP4 performance illustrates, there is potential danger in over fitting to distinct features in the training set that is not provided by the box-model output. To re-iterate, one might expect a collection of keys that relate to EI fragmentation principles to offer a more robust basis for fitting any method used here. However, that requires further work with additional laboratory data to validate the efficacy of any new bespoke fingerprint.
The methods here have a number of uses, although it must be re-iterated that the predicted mass spectra are not definitive.
The performance of this method will be improved by the addition of further training data. Following the development of group contribution methods, this could include studies on compounds within a specific series and mixtures of those compounds. As outlined in the introduction, the ability of this model to predict AMS spectra will be useful in the development and validation of explicit SOA mechanisms in the laboratory, meaning that the models can be challenged by the entire mass spectrum and not just the mass and O:C ratio. This method can also be used at the experiment design stage, allowing predictions of whether an AMS will be able to discern expected changes in composition associated with a process and thus whether it will be useful to test particular hypotheses.
The method could also be used to simulate atmospheric aerosol, probably if the chemical model is used in a Lagrangian configuration. In addition to the insights gained in atmospheric processes, this could be used to critically test the data model used in positive matrix factorisation (PMF) (Ulbrich et al., 2009). Because of the condition that PMF factors have fixed profiles, the reduction of the complexity associated with atmospheric SOA to (typically) two factors results in an increase in 'rotational ambiguity' associated with the factorisation. A two-component factorisation of SOA is often interpreted as representing the 'low volatility' and 'semivolatile' components of the SOA (Jimenez et al., 2009), although this has been shown not to be applicable to all environments, where other sources of variability contribute to the split in the factors (Young et al., 2015). If the mass spectral response to atmospheric SOA could be more explicitly simulated using this technique, a synthetic AMS dataset could be used as the subject of PMF analysis in a manner similar to Ulbrich et al. (2009). This in turn could be used to investigate the contributions of the factorisation on a more explicit level and investigate the effects this has on rotational ambiguity and the validity of solutions.

Code availability
A publicly available copy of the code used to derive performance statistics of the chosen regression methods can be found at: https://github.com/loftytopping/STRAPS, covered by a GPL v3.0 license. This includes a copy of the AMS spectral files that now also include appropriate SMILES strings. The code separates the four fingerprint libraries used in this study. We also provide an associated DOI for the exact model version given in this paper as provided by the Zenodo service: https://zenodo.org/record/213068#.WFlryyiPD3s Please note that an extension to the SMARTS libraries included in UManSysProp was carried out in this project.

Please find attached our responses to each reviewer and changes in the manuscript, as highlighted in the track changes version uploaded. Key changes are in response to reviewer #1, in which I had to re-run some simulations with a 'combined' fingerprint. These new simulations again illustrate the potential of the methodology, but also leave the proof of concept study open for the required future work to take this forward.

Reviewer #1
We would like to thank the reviewer for their recognition of the potential of the approach presented here. In the following we respond to all comments, including detailing some additional work that has been carried out with regards to fingerprint analysis. In the following response we separate and number all distinct comments in order of their appearance in the review, highlighting new text added to the manuscript where appropriate.
1) Is the MACCS fingerprint most successful just because of the sheer number of keys, each of which contributes to predictions, or are there particular structural elements not present in the others that improve the predictions?
Response: Comparing average performance statistics in section 3.1 at first implies this might be the case. However, the comparison with spectra from the Alfarra et al. (2013) paper illustrates that the MACCS keys perform poorly. Interrogating the performance from predictions using the MACCS keys for specific compounds illustrates a few problems that might reflect a lack of generality across the MACCS keys. For example, the FP4 keys cycle through systematic functional groupings such as: primary carbon, secondary carbon, tertiary carbon…primary alcohol, secondary alcohol, tertiary alcohol etc. This would lead to a maximum of 320 keys per molecule. MACCS keys, on the other hand, are almost seemingly designed to capture a random, although extensive, set of features, leading to a maximum of 162 features for any given molecule. As we note in the manuscript, it is difficult to find the provenance behind the MACCS keys. However, we have added the following text in section 2, page 5, to try and clarify the issue [new text presented in italics]: 'There are some common features between each fingerprint library, but also a range of differences. For example, all libraries identify the presence of the CH2 group, but then differ in optional connecting groups. The FP4 keys cycle through systematic groupings, such as: primary carbon, secondary carbon, tertiary carbon…primary alcohol, secondary alcohol, tertiary alcohol etc. Similar groups are detected using the activity coefficient and vapour pressure keys. The full collection of SMARTS keys can be found in the source code and we discuss suggestions for future work on refining fingerprints in section 4. Please refer to section 5 on code availability.'
2) The generally poor performance of SVMs for all keys is surprising, is it possibly due to the high dimensionality in the underlying representations that is not present in the others, or is there a more obvious reason to the authors?
Response: We agree this is surprising, especially given the extent of applications to which SVMs are applied. At first we assumed this was down to how the data was normalized prior to training. However, using a maximum/minimum scaler prior to training did not improve performance. There are differences according to which kernel is used. It might be true that dimension reduction procedures, such as PCA, might improve performance. With this in mind, we have conducted tests on using PCA prior to training, using the combined set of fingerprints as requested in point '6' addressed shortly. Based on these results we have added an additional table [table 3] demonstrating the effect of dimension reduction procedures on the performance of all methods, using the combined fingerprint approach:

Table 3 - Median cosine angle between measured and predicted spectra, applying PCA to the 'combined' fingerprints, as a function of the number of principal components used, given above each column. The method labels are as follows: SVM [Support Vector Machine with 3 kernels (RBF, Poly[nomial] and Lin[ear])], BRR: Bayesian Ridge, OLS: Ordinary Least Squares, SGDR: Stochastic Gradient Descent, Tree: Decision Tree and Forest: Random Forest.
We have also added the following text to section 3.2 [new text presented in italics], which is renamed to: 3.2 Training to a subset, variable selection and dimension reductions.
'In practice, the statistics presented in Table 1 should not be considered a true test of the methodology, but rather a precursor demonstration of the sensitivity to choice of fingerprint, and perhaps any variability in instrument response across the AMS library. On this, the use of the 'combined' fingerprint demonstrates the ability to retain information from those keys that improve overall performance. Given their wide use across many disciplines, it is difficult to quantify the reasons behind the poor performance of the Support Vector Machines relative to other methods. To assess whether dimension reduction procedures would improve accuracy, table 3 presents the median and overall spread of cosine angles when using Principal Component Analysis (PCA) on the 'combined' fingerprints. The number of principal components was varied between 20, 10, 8 and 4. Generally, reducing the number of keys from, up to, 278 to 20 components leads to an improvement of around 0.01-0.02 in all methods apart from Ordinary Least Squares and Support Vector Machines with both the polynomial and linear kernels. Results demonstrate clear sensitivity to the number of components when combined with the RBF Support Vector Machine kernel, performance varying from 0.84 to 0.67 on reducing the number of components from 20 to 4.'
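As an illustration of the dimension reduction step described above, the following is a minimal sketch of PCA applied to a fingerprint matrix before training. It is not taken from the STRAPS code release: the function name `pca_reduce`, the use of a NumPy SVD, and the random binary matrix standing in for the 100 compounds and up to 278 combined keys are all assumptions made for illustration.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project the rows of X onto the leading principal components via SVD."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    Xc = X - mean                      # centre each key (column)
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]     # rows are orthonormal directions
    scores = Xc @ components.T         # reduced representation per compound
    explained = float((S[:n_components] ** 2).sum() / (S ** 2).sum())
    return scores, components, mean, explained

# Hypothetical stand-in data: 100 compounds x 278 binary keys.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(100, 278))

# Same component counts as those varied in the text.
for k in (20, 10, 8, 4):
    scores, _, _, explained = pca_reduce(X, k)
    print(k, scores.shape, round(explained, 3))
```

The reduced `scores` matrix would then replace the raw key matrix as the input to each regression method. Any new compound must be centred with the stored `mean` and projected with the stored `components`, not refitted.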
We cannot say with any certainty what the true cause of variability within each regression technique is. Ultimately, we feel this proof of concept study needs building on with appropriate laboratory data before further quantification of dependencies would be possible. Whilst we state the rationale in the original manuscript, we have added the following text in section 4 to re-iterate this:
'On the sensitivity to choice of fingerprint, our results demonstrate compound-specific trends that lead to performance variability when applied to a complex SOA system that is not apparent when analysing median cosine angle statistics. Combining available fingerprints into one can slightly improve performance in some cases, but as the comparison of isolated MACCS versus FP4 performance illustrates, there is potential danger in over fitting to
5 and 6, which are incidentally missing axes labels). There is some mention about f43 being somewhat reasonable and f44 being under predicted, but this seems a bit buried in the presentation.
Response: There are indeed other metrics we could have employed to measure distance between mass spectra, however we considered cosine to be the most appropriate. Firstly, because our aim is to replicate the AMS instrument response function, which can be modelled as a linear addition of multiple component mass spectra, we reason that it would make the most sense to use a metric that places linear weight on the peaks' relative intensities. Secondly, while a different metric may place a relatively greater weight on intermediate peaks (thus ensuring a more general agreement over a larger number of peaks), we would have to take care not to also unduly weight the minor peaks, which can be problematic. As such, an element of subjectivity would have been introduced in the choice of algorithm, which in itself would require more testing. It is possible that there is a better closeness metric that could be tested as part of future work and this would be easily testable within the STRAPS framework, however we see that as outside the scope of this particular paper. Concerning the comparison between f43 and f44, this refers to the specific comparison between the GECKO-A run and roughly comparable chamber experiments, however we must stress that this test was only to demonstrate proof-of-concept and not to perform a systematic comparison to assess the performance. We merely show that the values produced for these two common AMS metrics are plausible in magnitude. For this to be done properly, a chemical model run matched to the exact chamber system should be performed with a state-of-the-art model; this will form part of future work and a full, systematic comparison of peak magnitudes will be performed there.
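For concreteness, the cosine metric discussed above, together with the per-channel residuals suggested by the reviewer, can be sketched as follows. This is an illustrative sketch only: the function names and the five-channel toy spectra are hypothetical, and it assumes that 'cosine angle' refers to the uncentred cosine of the angle between two spectra defined on the same m/z grid.

```python
import math

def cosine_similarity(a, b):
    """Uncentred cosine between two spectra on the same m/z grid."""
    if len(a) != len(b):
        raise ValueError("spectra must share the same m/z grid")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for an empty spectrum
    return dot / (norm_a * norm_b)

def normalise(spectrum):
    """Scale intensities so they sum to one (fractional contributions)."""
    total = sum(spectrum)
    return [x / total for x in spectrum] if total > 0 else list(spectrum)

def channel_residuals(measured, predicted):
    """Predicted minus measured intensity for every m/z channel."""
    return [p - m for m, p in zip(measured, predicted)]

# Hypothetical toy spectra on a five-channel grid.
measured = normalise([10.0, 0.0, 5.0, 1.0, 4.0])
predicted = normalise([8.0, 1.0, 5.0, 0.0, 6.0])

score = cosine_similarity(measured, predicted)
residuals = channel_residuals(measured, predicted)
print(score, residuals)
```

Because each term in the dot product is a product of two intensities, the largest peaks dominate the score, whereas the residual list exposes disagreement in intermediate and minor channels that the single cosine value can hide.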
5) Is it reasonable to try to predict 300 m/z's in the AMS spectrum (in Figures 5-8 only 100 are shown, but is the model trained only to predict 100 m/z's)? Would not the authors benefit from trying to reproduce a "reduced" set of spectra (e.g., reconstructed from a truncated set of PCA or PMF components)?
Response: The methodology presented here is based on predicting a response for each channel, and then predicting the peak height for each channel. Each m/z therefore has its own model and there is no dependency on whether 100, 150 or 300 m/z's are chosen. There is no penalty to predicting the high m/z peaks, as these generally represent a low mass fraction and contribute little to the cosine of the comparisons. However, there will be a tangible disadvantage to operating on a reduced dataset because the data reduction in itself will inherently remove information that is possibly of value for training, so there is a very real risk of an inferior training.
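The per-channel structure described above can be sketched as follows. This is a hedged illustration, not the STRAPS implementation: a nearest-neighbour lookup on Hamming distance stands in for the regression methods evaluated in the paper, the toy fingerprints and spectra are hypothetical, and fitting the height stage only to compounds in which the channel is present is an assumption.

```python
def hamming(a, b):
    """Number of positions at which two fingerprints differ."""
    return sum(x != y for x, y in zip(a, b))

def nearest_label(pairs, query):
    """Label of the (fingerprint, label) pair closest to the query."""
    return min(pairs, key=lambda pair: hamming(pair[0], query))[1]

def train_channel_models(fingerprints, spectra):
    """Build one independent (presence, height) model per m/z channel."""
    models = []
    for ch in range(len(spectra[0])):
        # Stage 1 data: does the channel appear for each compound?
        presence = [(fp, s[ch] > 0.0) for fp, s in zip(fingerprints, spectra)]
        # Stage 2 data: peak heights, only where the channel is present.
        heights = [(fp, s[ch]) for fp, s in zip(fingerprints, spectra)
                   if s[ch] > 0.0]
        models.append((presence, heights))
    return models

def predict_spectrum(models, query):
    """Predict each channel separately, then renormalise the spectrum."""
    raw = []
    for presence, heights in models:
        if heights and nearest_label(presence, query):
            raw.append(nearest_label(heights, query))
        else:
            raw.append(0.0)
    total = sum(raw)
    return [h / total for h in raw] if total > 0 else raw

# Hypothetical toy training set: 3 compounds, 3 keys, 3 m/z channels.
fingerprints = [[1, 0, 0], [0, 1, 0], [1, 1, 0]]
spectra = [[0.7, 0.3, 0.0], [0.0, 0.5, 0.5], [0.4, 0.4, 0.2]]

models = train_channel_models(fingerprints, spectra)
print(predict_spectrum(models, [0, 1, 1]))
```

Since `models` is simply a list with one entry per channel, training on 100, 150 or 300 channels changes only its length; no channel's model depends on any other.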
6) Is there a reason why all keys were not combined into a single fingerprint? It would be simple to remove redundant keys simply by inspection, if that were a concern. Regarding the comparison of f44 and O:C (Figure 8), is not the COO+ associated with m/z 44 more sensitive to dicarboxylic acids (Russell et al., 2009)?
Response: This is a good point, and we have conducted additional simulations to investigate this. It is worth noting the initial aim of the paper was to illustrate the use of 'standard' fingerprint libraries, as they exist as distinct developments. As noted in the manuscript, ideally we would like to take this proof of concept work forward by constructing a library of keys that better represents the mechanism of fragmentation within the AMS. It might be that converting general rules of EI fragmentation would be a useful starting point. Tables 1-2

Figure 3 visually compares the number of keys extracted from the 100 compounds in the AMS library according to choice of fingerprint. Data is presented according to the use of AIOMFAC [bottom left], MACCS [top left], Nanoolal [bottom right] and FP4 [top right] keys. Using the AIOMFAC fingerprint leads to, at most, 17 keys identified from the AMS library. The Nanoolal fingerprint leads to a larger set of keys (19), with the MACCS fingerprint providing the most (74) and the FP4 keys
displays the predicted mass spectra f44 peak height versus O:C ratio from the GECKO-A model results of Valorso et al. (2011) in a manner similar to Aiken et al. (2008). There are 9 points on each curve, representing points in time during the GECKO-A simulation, with the model predicting a monotonic increase in O:C over time. It is worth noting the values are low compared to typical atmospheric LV-OOA (Aiken et al., 2008; Kroll et al., 2011). Overall, use of the FP4 and Nanoolal keys gives absolute f44s that compare well with published calibrations relative to O:C, specifically Aiken et al. (2008) and the updated calibration presented by Canagaratna et al. (2015). The direction of the trend in f44 versus O:C is reversed when using the Nanoolal keys, with f44 decreasing with O:C, which runs contrary to expectations. However, it should be noted that the values are within the spread of values used to generate the Aiken et al. (2008) and Canagaratna et al. (2015) calibrations, as these performed regressions over much bigger ranges of O:C than obtained in this simulation, so the prediction based on Nanoolal keys could still be plausible.

Figure 1 - Schematic of workflow used in the training process. For a normalised mass spectrum, the SMILES string associated with each compound is combined with a given molecular fingerprint to train methods to predict the occurrence of a given m/z channel and then a peak height.

Figure 2. Basic schematic of interrogating a SMILES string with a SMARTS library to construct a molecular fingerprint.

Figure 3 - Sparsity of keys extracted (x axes) from each compound (y axes) as a function of molecular fingerprint used (Top left: MACCS, Top right: FP4, Bottom left: AIOMFAC, Bottom right: Nanoolal). Keys are coloured according to normalised stoichiometry across all compounds.

Figure 4a - Spread of cosine angle between experimental and predicted mass spectra [y axes] for all 100 compounds in the AMS library as a function of supervised method [x axes] using the MACCS fingerprint. Left: using all compounds in the training process. Right: using 80% of the compounds in the training process with variable selection. The method labels are as follows: SVM [Support Vector Machine with 3 kernels (RBF, Poly[nomial] and Lin[ear])], BRR: Bayesian Ridge, OLS: Ordinary Least Squares, SGDR: Stochastic Gradient Descent, Tree: Decision Tree and Forest: Random Forest.

Figure 4b - Spread of cosine angle between experimental and predicted mass spectra [y axes] for all 100 compounds in the AMS library as a function of supervised method [x axes] using the FP4 fingerprint. Left: using all compounds in the training process. Right: using 80% of the compounds in the training process with variable selection. The method labels are as follows: SVM [Support Vector Machine with 3 kernels (RBF, Poly[nomial] and Lin[ear])], BRR: Bayesian Ridge, OLS: Ordinary Least Squares, SGDR: Stochastic Gradient Descent, Tree: Decision Tree and Forest: Random Forest.

Figure 5 - Measured mass spectra for Oxalic acid [top] versus predicted mass spectra from an ensemble tree using the FP4 fingerprint [middle, cosine of 0.83] and the MACCS fingerprint [bottom, cosine of 0.77].

Figure 6 -Figure 7 -
Figure 6 -Measured mass spectra for Leucine [top] versus predicted mass spectra from an ensemble tree using the FP4 fingerprint [middle, cosine of 0.70] and the MACCS fingerprint [bottom, cosine of 0.94].

Figure 8 - Comparison of O:C ratios and predicted fractional contribution to the AMS m/z 44 channel (f44) for the Valorso et al. (2011) GECKO-A simulation, compared against the regressions performed by Aiken et al. (2008) and Canagaratna et al. (2015). The highlighted points indicate the final points in the simulation.

Figure 9 - 'Triangle plot' comparing predicted f44 and f43 values for the Valorso et al. (2011) GECKO-A α-pinene SOA simulation with chamber experiments. The Chhabra et al. (2011) data compares different oxidant systems and is taken from figure 2A of that paper. The chronological final points in each dataset are highlighted.
now includes median cosine angles from each regression technique when combining all keys into one fingerprint:
Table 1 - Median cosine angle between measured and predicted spectra when fitting to the entire dataset as a function of molecular fingerprint [given above each column]. Please note, the term 'Combined' refers to a combination of all individual fingerprints into one. The method labels are as follows: SVM [Support Vector Machine with 3 kernels (RBF, Poly[nomial] and Lin[ear])], BRR: Bayesian Ridge, OLS: Ordinary Least Squares, SGDR: Stochastic Gradient Descent, Tree: Decision Tree and Forest: Random Forest.

the training set that is not provided by the box-model output. To re-iterate, one might expect a collection of keys that relate to EI fragmentation principles to offer a more robust basis for fitting any method used here. However, that requires further work with additional laboratory data to validate the efficacy of any new bespoke fingerprint.'
3) How are the tuning parameters for the model parameters determined? For instance, the penalty factor for SVM, etc.?
Response: Using the cosine angle between spectra as a measure of good fit, parameters for each method, where required, are cycled until the most effective combination was found. These parameter ranges are presented in the code release and are specific to each algorithm.
4) Are cosine angles (uncentered correlations) sufficient to capture agreement that represents more than the range (minimum and maximum) relative ion counts for each spectrum? This angle may not represent disagreement in relative ion counts that are of intermediate value very well. In that there is precedent for cosine angles for mass spectra comparison, it is a safe metric, but the authors may look at analyzing residuals for each mass fragment to understand what their model gets right and less right (to generalize on illustrations provided in Figures