BUMPER v1.0: A Bayesian User-friendly Model for Palaeo-Environmental Reconstruction

. We describe the B ayesian U ser-friendly M odel for P alaeo-E nvironmental R econstruction (BUMPER), a Bayesian transfer function for inferring past climate from microfossil assemblages. BUMPER is fully self-calibrating, straightforward to apply, and computationally fast, requiring ~2 seconds to build a 100-species model from a 100-site training-set on a standard personal computer. We apply the model’s probabilistic framework to generate thousands of artificial training-sets 20 under ideal assumptions. We then use these data to demonstrate the sensitivity of reconstructions to the characteristics of the training-set, considering assemblage richness, species tolerances, and the number of training sites. We find that a useful guideline for the size of a training-set is to provide, on average, at least ten samples of each species. We demonstrate general applicability to real data, considering three different organism types (chironomids, diatoms, pollen) and different reconstructed variables. An identically configured model is used in each application, the only change being the input files 25 that provide the training-set environment and species-count data. The performance of BUMPER is shown to be comparable


Introduction
Transfer functions are numerical tools that are widely used to infer past environment from species assemblages of microfossils preserved in lake or ocean sediments (Imbrie and Kipp, 1971;Birks and Seppä, 2004;Birks et al., 2010).These species assemblages can provide a strong indicator of the environmental characteristics of the habitat in which the microfossils were found.Numerical relationships can be developed between present-day species distributions and the environment, and by applying the principle of uniformitarianism (Rymer, 1978), these relationships can estimate palaeoenvironments by examining the presence and abundance of palaeo-species.Here we take a Bayesian approach and develop a transfer-function model that is tested on three different organism types (chironomids, diatoms, pollen).Although Bayesian transfer functions have a moderately long history (Vasko et al., 2000;Korhola et al., 2002;Haslett et al., 2006) they are still not routinely employed by palaeoecologists, who generally apply frequentist approaches, such as weighted-averaging (Birks et al., 1990).The advantages of frequentist approaches are that they are relatively straightforward to understand and apply, and that they are computationally fast.In contrast, Bayesian approaches tend to involve more complex mathematics and can be computationally demanding.The advantages of Bayesian approaches, however, are numerous and include the use of prior information (van Woesik, 2013).
The principal motivation for a Bayesian approach is that the palaeoenvironment is treated probabilistically, and can be updated as additional data become available.Bayesian approaches therefore provide a reconstruction-specific quantification of the uncertainty in the data and in the model.Bayesian models are in principle ideally suited to multi-proxy reconstructions because the posteriors that are derived from independent proxies can be combined.However, to our knowledge, no Bayesian approach has yet been applied to a multi-proxy reconstruction, at least in part because of the aforementioned difficulties with application.
Here we describe the "Bayesian User-friendly Model for Palaeo-Environmental Reconstruction" BUMPER and demonstrate its general validity and ease of use through applications to both artificial and real datasets.The model can be regarded as a Bayesian analogue to weighted averaging-type approaches (Birks et al., 1990) because it assumes a unimodal response of each taxon to the reconstructed environmental variable of interest.It therefore shares many of the weaknesses of weighted averaging-type approaches (Huntley, 2012;Juggins, 2013), most notably by assuming that the responses to different environmental variables are independent.It is the task of the ecologist, assisted by multivariate techniques such as ordination (Hill and Gauch, 1990;Juggins and Birks, 2012;Juggins 2013), to ascertain which environmental variable is the dominant driver of change in the assemblage through time.When strong interactions are suspected, approaches that concurrently reconstruct the interacting variables are advised (Huntley et al., 1993;Huntley, 2012).
The mathematics of BUMPER was developed and applied to the Surface Water Acidification Programme (SWAP) diatom training-set (Stevenson et al., 1991) in Holden et al. (2008).The mathematics is unchanged here.Instead the primary motivation of the present study is to document the developments to the model needed to make it user-friendly.The two important steps were first to develop an approach to generalise and automate the priors for the model parameters (we do not assume that the user has the necessary expertise to make these decisions), and second to demonstrate that the model can be applied to other organism types and environmental variables.We apply the model to both artificial and real data, considering a range of different organism types to demonstrate the model's general applicability.

The model
The mathematics of BUMPER has been described in detail (Holden et al., 2008).In this section we summarize the underlying philosophy.The principal assumption underlying the model is that biological species exhibit a unimodal response to an environmental variable, so that both the probability of presence and the expected abundance of a given species are maximized at the environmental optimum favored by that species.Moreover, the species optima are fixed, so that the response to the environmental variable of interest is assumed to be independent of other environmental variables.

Model selection
A species response curve (SRC) defines the probability of an observed count of a species, or taxon, as a function of the environmental variable of interest.An SRC is defined by five (initially unknown) parameters.These parameters specify the environmental optimum and tolerance of each taxon, together with the probability of its presence, and its expected abundance under optimal conditions. (1) The second term on the right hand side is the prior, the probability that the model is correct before we consider the data.
Equation 1 is applied sequentially across all training-set sites.At each application, the posterior derived from the previous training-set site becomes the prior for application to the next site, so that the probabilities assigned to each SRC become progressively better defined.The proportionality constant means that we have to normalise the calculated probabilities after completing this process for all SRCs.This is achieved from the assumed constraint that for each species k, as follows:

Environmental reconstruction
To reconstruct the environment from an observed fossil count  !! of each taxon  within the fossil assemblage, the probability-weighted SRCs are used to derive likelihood functions for each taxon of the reconstructed variable .Here we again use the Bayes relationship, this time stating that the probability that a reconstructed value is correct in the light of an observed species count is proportional to the probability that the species count would be observed in that environment.As we do not know the true SRC with certainty (in general, the calibration will have resulted in many SRCs with non-zero probability), we perform this calculation for all significant SRCs2 and combine them into a single likelihood function using the probability weights calculated in Section 2.1, as follows: An alternative and less constrained presence-absence likelihood function can also be derived as follows: The likelihood function assumed by BUMPER is derived as a linear combination of these two functions, as follows: The motivation for this blended likelihood function is that  !provides a tighter constraint on the posterior, but the wide tails contributed by  !reduce the possibility of the correct solution being ruled out by an outlying species, for instance resulting from misidentification or unusual taphonomic processes.We assume  = 0.5, except when building a purely binary, presence/absence model, when  = 1.0.
The posterior probability distribution for the reconstructed variable   is derived by combining any prior knowledge   ! with the assemblage of likelihood functions, as follows: A point estimate for the reconstructed variable is derived as: A simple uncertainty metric for the reconstruction is derived as:

Probability distribution
To apply Bayes' equation we require an expression for the probability of observing some count  as a function of the environmental variable .The assumption of a unimodal response leads us to assume that the expected abundance (given presence)  !" of species  in site  follows a Gaussian distribution about some optimum value  ! of the environmental variable  !, written as: Here  ! is the expected abundance (given presence) at the species optimum, and the tolerance  ! is a measure of how rapidly the expected abundance falls off away from this optimum.
The probability of presence (e.g.used to calculate the  !likelihood function, Equation 4) is also assumed to follow a Gaussian distribution about the same optimum, though not necessarily of the same tolerance, and can be written in terms of the expected abundance, as: where  ! is needed because we do not require the same tolerance for  ! and  !, although we can achieve this with  != 1.
We illustrate the role of  !through example;  !≪ 1 describes a species that is found at high abundance only in an environment that is relatively close to its environmental optimum, but may also be present (albeit only at low abundance) far away from this optimum.
The probability of a non-zero count  !" (used to calculate the  !likelihood function, Equation 3) is assumed to follow an exponential decay, with the decay constant defined by the expected count = 1  !" , normalized so that the total probability of all (non-zero) percentage counts equals the probability of presence  !" , and is expressed as: The denominator normalizes the integral over the range 0 <  !" ≤ 100.We note that the denominator is missing from the description of Holden et al. ( 2008), although it was correctly implemented in the model itself.The probability of absence ( !" = 0) is given by 1 −  !" .

Species Response Curve (SRC) priors
The SRCs are defined by five parameters: species optimum  !, tolerance  !, tolerance scaling  !, probability of presence at environment optimum  !, and expected abundance (given presence) at environmental optimum  ! .In general we have little a priori knowledge of appropriate SRC values, and require their priors to be uninformative.However, for computational tractability, we need to eliminate implausible parameter values.
Previous applications (Holden et al., 2008;Matthews-Bird et al., 2016) used uniform priors for the SRC parameters over ranges that were chosen largely on the basis of subjective judgment (and ascribing a probability of zero outside of those ranges).However, requiring subjective judgement is not desirable, given our desire to maximize the user-friendliness of the model, and it can be rather time consuming.To prevent subjectivity, BUMPER introduces a simple automated process, based upon an indicative species tolerance  ! that is a characteristic of the training-set: For each species  we consider the environmental variable  ! at each site where the species is present, and derive a Root Mean Square (RMS) distance from the weighted-average optima,  !!" .We average across the  species that have at least two training-set observations ( !≥ 2).The weighted-average optima are defined in the usual way: This metric  ! is used to automate the definition of plausible ranges for species optima and tolerance.The approach we take is analogous to precalibrating the parameters of a physical model (Edwards et al., 2011).We only attempt to rule out implausible parameter values and we ascribe equal probabilities to all plausible parameter combinations.The precalibration only seeks to eliminate SRCs with a probability close to zero, and is thereby designed to avoid the trap of over-constraining the posterior by using the training-set data twice.
We consider ten possible values for the species optima, and four possible values for each of the four other parameters, giving a total of 2560 SRC combinations.The priors for the five SRC parameters are uniform within specified ranges: 1) Species optima  ! are allowed to take values in the range  !"# −  ! to  !"# +  !, where  !"# and  !"# are the extremes of the training-set environments.We allow species optima beyond the sampled environment range, but 2) Species tolerances  ! are allowed to take values in the range 2 ! 3 to 3 ! .Tolerances at the upper limit are presumably unlikely, and in any event would only weakly constrain the reconstruction.Although tolerances below the lower limit may be reasonable, this choice represents a conservative constraint that, for instance, prevents 3) The probability of presence at the environmental optimum  ! is allowed to take values in the range where  !! is the percentage of sites in which the species is present.A value below  !! is clearly unlikely as the probability of presence at the optimum must be at least as high as the probability of presence across the training-set.
The upper limit is less straightforward to define; this choice is discussed further below.
4) The expected abundance (given presence) at the environmental optimum  ! is allowed to take values in the range where  !! is the average count of all non-zero observations across the training-set.A value below  !! is clearly unlikely as the expected abundance at the optimum must be at least as high as the average abundance across the training-set.Again, the choice of upper limit is discussed below.
5) The tolerance scaling  !(Eq.7) is allowed to take values in the range 0.2 to 1.0.Low values ( !≪ 1) are required to represent species that need near-optimum conditions to flourish at high abundance, but can survive even when conditions are far removed from this optimum.Values  !> 1 can be ruled out because this would (nonsensically) describe a species that can flourish at high abundance away from its optimum, but cannot survive there.In Holden et al. ( 2008),  ! was allowed to take values in the range 0.4 to 1.0.Here we reduce the lower limit to 0.2, for reasons discussed below.
While aspects of the motivation for ranges 3-5 are discussed above, other aspects are not clearly defined by logical considerations, these being the upper limits for  ! and  !, and the lower limit for  ! .These ranges were selected empirically on the basis of application of the model to three training-sets, being chironomid-based temperature (Matthews-Bird et al., 2016), diatom-based pH (Stevenson et al., 1991) and pollen-based temperature (Bush et al., in prep).We define the posterior value for a parameter to be the probability-weighted value of that parameter across the 2,560 SRCs of the respective taxon.
The posterior values of  !,  ! and  ! for each taxon were sorted from smallest to largest and are plotted sequentially in Figure 1, for each of the three training-sets.The horizontal grid lines represent the four values that each parameter is allowed to take.(Note that although the SRC parameters can only take one of four discrete values, the probability-weighted average can take any value within the prior range).It is apparent that the full range of allowed values is required to describe the species responses of all taxa in all three training-sets (i.e., because the extremes of the probability-weighted averages approach the extremes of the allowed parameter range).Furthermore it is apparent that a wider range is not required, although there is some suggestion that a slightly higher upper limit for  !could have been allowed for the SWAP data (evidenced by the fact that seven of the 225 species take precisely the upper limit  != 2.5 !! ).

Artificial training-set data
The probabilistic framework of BUMPER is well suited to the generation of artificial training-sets.In part, our motivation for artificial training-sets is to investigate how the characteristics of the assemblage affect the performance of the transfer function.Additionally, an important motivation is to apply BUMPER to a selection of training-sets with different characteristics to demonstrate that the model can be applied to an arbitrary training-set without tuning or user modification.
In all of the analyses that follows (in both Sections 3 and 4) the identical model is applied (although we consider the Consider, for instance on the one hand, that by applying the probabilistic framework to generate the training-set, species responses are forcibly defined to be unimodal and with optima that are independent of other environmental variables. Furthermore, there are no misidentifications, nor any possibility of taphonomic error.On the other hand, it is worth noting that the artificial training-sites are statistically independent; this eliminates potentially over-optimistic performance statistics that can arise from pseudo-replication in real spatial data that have similar assemblage composition because of geography proximity, and not necessarily because they experience the same environment (Telford and Birks, 2005).

Generating artificial training-sets
We generate an artificial training-set for an arbitrary environmental variable, under the following assumptions. i) The training-set sites have an observed environment that is randomly sampled from a uniform distribution over some range  ! to  ! .This choice is arbitrary, we select  != 100,  != 200.The SRC parameters and performance statistics that follow are expressed as a percentage of the environmental gradient  !−  != 100. ii) The taxa have optima  ! that are randomly sampled from a uniform distribution in the range  ! to  ! .
Although BUMPER allows optima beyond the sampled gradient, we here simply assume the range of taxa optima is equal to the environmental gradient ( !=  !,  !=  ! ). iii) The taxa have tolerances  ! that are randomly sampled from a uniform distribution in the range  ! to  ! .The characteristic tolerance of a training-set is likely to impact greatly upon the performance of the transfer function because it drives the species-turnover rate across the environmental gradient. iv) The taxa have probabilities-of-presence-at-optimum  ! that are randomly sampled from a modified exponential with scale parameter  !, defined for 0 ≤  ≤ 1.
The denominator adjusts the exponential so that the cumulative probability is 1 at  = 1.The scale parameter  ! is used to tune the assemblage for species richness.
v) The taxa's tolerance scaling  !(Eq.7) are randomly sampled from a uniform distribution in the range 0.2 to 1.0.
The values of the abundance parameter  ! are drawn from a modified exponential distribution (c.f.Equation 14) with scale parameter  ! .The value of  ! is not an input because an arbitrary choice will lead to an expected total number of counts (i.e., the sum of all species counts within a site) that differs from 100%.An appropriate value of  ! is derived from a Monte-Carlo approach; the expected total count is derived from an exploratory 10,000-site training-set with  != 10% and then  ! is scaled appropriately and the scaled value applied to generate the actual training-set.BUMPER does not model a multinomial probability distribution so that percentage counts generated under this approach are not constrained to sum precisely to 100%, even when the appropriate value of  ! is chosen.A pragmatic approach is taken to address this issue by randomly drawing assemblages, accepting only those with a total count of between 90% and 110% until the required number of sites has been generated.In accepted sites the counts are scaled to total 100%.to 57).The improvement in performance with training-set size is expected as the species optima become more accurately defined.However, there is only a modest improvement in performance as the training-set size increases from 250 to 1000 (sampling density increasing from 14 to 57).These diminishing returns are seen across all three of the assemblage types, and together suggest that a requirement for an average of at least 10 observations per species is a useful indicator for a robustly characterised training-set, although we note that idealised data are presumably easier to characterise than real (noisier) data.
An increase in assemblage richness (by a factor of approximately three) or a decrease in assemblage tolerance (by a factor of two), both approximately halve the WAPLS1 prediction error (RMSEP reductions of between 37% and 52%).We note that while these improvements cannot in general be controlled (they are characteristic of the assemblage), low-richness assemblages are likely to benefit especially from high sample counts (high number of counts per training-set site) which tends to increase the observed species richness.

BUMPER performance and CPU demand
We now consider the performance of BUMPER.The previous applications of BUMPER (to real ecological data) have made two pragmatic assumptions.First,  = 0.5 (Equation 5), blending the abundance likelihood function  !(Equation 3) with the presence/absence likelihood function  !(Equation 4) so that the wide tails contributed by  !allow for the possibility of outlying species counts.Second, only including species that have abundances >2% when performing a reconstruction; while very low counts do contribute information, they are generally less informative than high counts, they increase computational load, and they are likely to be less robust.Neither of these conservative assumptions is necessary for application to the idealized data considered here.However, we choose to retain them for this analysis in order to avoid the risk of overstating the performance statistics (relative to WAPLS) that are likely to be possible in practice.
The crosses on Figure 2 plot the leave-one-out RMSEP of BUMPER.For each of the twelve scenarios we select one training-set, being the first of the randomly generated ensemble members that exhibits a WAPLS1 RMSEP within 0.1 of the ensemble mean.BUMPER performance is generally comparable to WAPLS1 performance.This has previously been found to be the case in SWAP diatoms (Holden et al., 2008) and tropical-Andean chironomids (Matthews-Bird et al., 2016).We note that individual reconstructions (as distinct from the summary statistics of a training-set) can differ significantly between the two approaches (Matthews-Bird et al., 2016), in particular because of the increased sensitivity of the reconstruction to 'low-count' species (that are only ever observed in low abundance).
The CPU demands of BUMPER are dominated by the calculation of the SRC probabilities (i.e., applying Eq. 1 across all training-set sites for each species).This takes 0.022 seconds per species on a standard personal computer (2.5GHzIntel Core

BUMPER uncertainty
The solid lines in Figure 2 plot the BUMPER uncertainty metric ∆ (Equation 8) from the reconstructions described in Section the blended presence/absence likelihood function ( = 0.5, Equation 5).This intentionally broadens the likelihood functions to prevent an outlier, such as a taxon misidentification, from ruling out the correct solution.While this assumption has proved useful for real data, it is unnecessarily conservative for idealised data and is therefore expected to overstate uncertainty.Additional BUMPER models (not illustrated) were built for each training-set with  = 0.0 and including all present species.These models exhibited improved RMSEP relative to the default, reduced by typically 10%.Moreover, these models produced posteriors that better reflect the reconstruction error, with an average uncertainty of 112±17% RMSEP, demonstrating that the uncertainty is meaningfully quantified when the likelihood function is not artificially broadened.
(Nevertheless, we retain the broadened likelihood functions for applications to real ecological data.)

Applications to real data
We consider three training-sets of real data, using three different biological proxies and having different assemblage characteristics, summarized in Table 1.The tropical chironomid (Mean Annual Temperature) dataset of Matthews-Bird et al.
( 2016) is a 59-site species-poor training-set comprised largely of species with narrow environmental tolerances (expressed as a percentage of the training-set environmental gradient).The SWAP diatom (pH) dataset (Stevenson et al., 1991) is a 167site species-rich training-set, significantly larger than the Matthews-Bird dataset, but with broader characteristic tolerances.
The NIMBIOS tropical pollen (mean annual temperature) dataset (Bush et al., in prep.) is a 682-site species-rich training-set that is currently under development and characterized by narrow tolerances.These pollen data are not (in their current state) expected to have the high quality of SWAP, because some taxa are identified to family level whereas other taxa are identified to genus.
We consider four alternative BUMPER models.First, we test the requirement for a 2% abundance threshold (Section 3.2.2) by generating reconstructions with and without this constraint.Removing this constraint is expected to decrease the uncertainty of the reconstruction (all present species are included, thereby narrowing the posterior).We wish to test whether this decreased uncertainty is associated with a reduction in reconstruction error when applied to real data.Secondly, we consider sensitivity to the form of the likelihood function.The default model applies  = 0.5 (Section 2.2).However, in Holden et al. (2008), RMSEP was found to be only weakly dependent upon this parameter so that presence/absence data alone ( = 1.0) contain sufficient information to derive a useful predictive model.Compared with the full model, the presence/absence model is expected to increase robustness at the expense of increased uncertainty, with a broader posterior.
There may be situations where this conservative approach is preferable, for instance when SRCs are poorly constrained in small training-sets, or when misclassification is a concern as the presence/absence model is less sensitive to outliers.The cross-validated performances of the four models ( !> 2% or 0%,  = 0.5 or 1.0) are summarized for each of the three training-sets in Table 2, and are plotted for two models ( = 0.5 or 1.0,  !> 2%) in Figure 3.

Chironomids (Matthews-Bird et al. 2016)
The chironomid training-set of Matthews-Bird et al. (2016) comprises 59 lakes across tropical South America.The prediction uncertainty associated with this dataset (WAPLS RMSEP 2.4°C) is approximately twice the uncertainty that has been achieved for European chironomid transfer functions, which typically approach ~1°C (e.g.Brooks et al., 2012).It has been postulated that this reduced performance is caused by the small size of the training-set and uneven sampling across the environmental gradient (Matthews-Bird et al., 2016).
The default model ( !> 2%,  = 0.5) exhibits an RMSEP of 2.41°C.However, the best model (RMSEP 2.27°C) is the presence/absence model that requires 2% abundance to include a species in a reconstruction.This model version is the most conservative and might therefore be expected to be the least well performing.However, the training-set is small, so the SRCs are not expected to be well defined, and furthermore the number of counts per site is relatively low.These data may therefore favour a more conservative model that does not over-constrain the reconstructions.
When all species are included, the reconstruction uncertainty is reduced as expected, from 12.0% to 11.0% in the default model and from 13.3% to 11.8% in the presence/absence model.However, this is not accompanied by a reduction in RMSEP.The RMSEP increases from 9.7% to 9.8% in the default model and from 9.1% to 9.6% in the presence/absence model.These data support our choice to retain the 2% abundance threshold in the default model.Including very low abundance counts increases the computational demand, does not in general improve model performance, and is likely to under-state the uncertainty associated with the reconstruction.We note for completeness that the default BUMPER model was also applied in Matthews-Bird et al. (2016) and has a RMSEP of 2.37°C.The slight difference in performance here results from the different SRCs priors used (see Section 2.4).
The WAPLS1 transfer functions built from these data exhibits a mean RMSEP of 6.3±0.6% of the environmental gradient.
Although a direct comparison is difficult, given that an improvement in performance is expected under idealized assumptions, these statistics are broadly consistent with those of the real transfer functions (ranging from 9.1% to 9.8%).
We consider the training-set characteristics in order to investigate whether they might explain the larger uncertainties of the Matthews-Bird transfer function when compared with European chironomid transfer functions.First, we increase the idealized-analogue training-set size from 59 lakes to 118 lakes, thereby better defining the species characteristics.This improves the WAPLS1 RMSEP from 6.3±0.6% to 5.9±0.4%.Second, we increase the species richness from 6.7 to 13.5 by doubling the number of species to 118.This improves the WAPLS1 performance to 5.1±0.5%.Although both factors are likely to contribute to explaining the reduced uncertainty of the European transfer functions, in isolation they do not appear sufficient.We consider a specific example.The Norwegian-chironomid training-set of Brooks et al. (2012) comprises 140 taxa sampled from 157 lakes, and was used to construct a WAPLS2 transfer function for July temperature with RMSEP of 1.06°C when four outliers were deleted (Brooks et al., 2012).We note that the Matthews

SWAP (Stevenson et al. 1991)
The SWAP training-set (Stevenson et al., 1991) was developed as part of a substantial scientific effort directed at understanding the impacts of acid rain.Taxonomic workshops resolved problems of nomenclature, splitting and amalgamation of species, and identification criteria (Munro et al., 1990).Approximately 500 counts per sample were made.
Geosci.Model Dev.Discuss., doi:10.5194/gmd-2016-227,2016 Manuscript under review for journal Geosci.Model Dev.Published: 21 September 2016 c Author(s) 2016.CC-BY 3.0 License.BUMPER considers 2,560 possible SRCs for each taxon 1 and applies Bayesian model selection to quantify how well each of the SRCs fits the observations of the training-set.Bayes equation states that the probability that a model (each of the SRCs) is correct, in light of observational data, is proportional to the probability that the data would be observed if the model were correct.Expressed more formally, for each of the j=1,2560 SRCs considered for each taxon k, probability weights are derived from the observed environmental variable  ! and the species count  !" in each training-set site :
Geosci.ModelDev.Discuss., doi:10.5194/gmd-2016-227,2016   Manuscript under review for journal Geosci.Model Dev.Published: 21 September 2016 c Author(s) 2016.CC-BY 3.0 License.sensitivity of the model to two important assumptions).This model internally generates precalibrated priors from the characteristic tolerance  ! and the environmental gradient as described in Section 2.4.The artificial data are generated from the BUMPER probabilistic framework.As a result of generation, the data are idealised and are expected to over-state performance relative to a training-set of real data with otherwise similar characteristics.

i5 processor, 4
GB 1600 MHz DDR3 memory) for a 100-site training-set.The CPU demand scales approximately linearly with number of sites.To illustrate, deriving the leave-one-out cross-validated performance of a training-set of 500 sites with an average species richness of 10, would require on average ~4.99×10×0.022= 1.1 seconds per site (i.e. to derive leaveone-out SRC probabilities for ten species from 499 training-set sites), thereby taking ~9 minutes to cross-validate the entire training-set.
3.2.2.Although the uncertainties broadly reflect the trends in the reconstruction error, uncertainty is significantly overstated, being 151±21% of the RMSEP across the twelve training-sets.This overstatement is expected for idealised data because of Geosci.Model Dev.Discuss., doi:10.5194/gmd-2016-227,2016 Manuscript under review for journal Geosci.Model Dev.Published: 21 September 2016 c Author(s) 2016.CC-BY 3.0 License.

Figure 1 :Figure 2 :
Figure 1: Probability-weighted values of the SRC parameters   (left-hand axis),   (left-hand axis) and   (righthand axis).Three training-sets are considered: chironomid-based temperature (Matthews-Bird et al., 2016), diatombased pH (Stevenson et al., 1991) and pollen-based temperature (Bush et al., in prep).For each SRC parameter, probability-weighted values across for the taxa of each training-set are ordered and plotted sequentially.Horizontal 5 gridlines represent the discrete values allowed within individual SRCs.

Figure 3 .
Figure 3. Cross-validated reconstructions (Eq.7) together with associated uncertainty (Eq.8) are plotted against observed training-set environment for each of the three training-sets (Matthews-Bird et al., 2016; Stevenson et al., 1991; Bush et al., in prep.)We plot the full model ( = .) and the presence/absence model (p/a) ( = .).All 5 analyses impose the constraint that 2% abundance is required for inclusion.
-Bird training-set reconstructs MAT, not July temperature.The Brooks and Birks training-set has an average richness of 24 and an indicative tolerance of 16%.An idealized dataset was built with these characteristics (159 lakes, 140 taxa,  != 55%,  != 11%,  != 21%, yielding an ensemble averaged species richness of 22) and was found to exhibit an WAPLS1 RMSEP of 5.1±0.3%(=0.64°C).When expressed as a percentage of the environmental gradient, modest improvements in performance are apparent, consistent with that expected from increased training -set size and species richness.However, most of the improvement in predictive power is due to tolerance, but expressed in absolute terms.The Matthews-Bird indicative tolerance is 3.2°C.The Brooks and Birks indicative tolerance is 2.1°C.These factors together (increased training-set size, increased species richness and, most significantly, narrower absolute tolerances) appear sufficient to explain the improved performances of the Brooks and Birks transfer function over the Matthews-Bird transfer function.