Large ensemble modeling of the last deglacial retreat of the West Antarctic Ice Sheet: comparison of simple and advanced statistical techniques

A 3-D hybrid ice-sheet model is applied to the last deglacial retreat of the West Antarctic Ice Sheet over the last ∼ 20 000 yr. A large ensemble of 625 model runs is used to calibrate the model to modern and geologic data, including reconstructed grounding lines, relative sea-level records, elevation–age data and uplift rates, with an aggregate score computed for each run that measures overall model–data misfit. Two types of statistical methods are used to analyze the large-ensemble results: simple averaging weighted by the aggregate score, and more advanced Bayesian techniques involving Gaussian process-based emulation and calibration, and Markov chain Monte Carlo. The analyses provide sea-level-rise envelopes with well-defined parametric uncertainty bounds, but the simple averaging method only provides robust results with full-factorial parameter sampling in the large ensemble. Results for best-fit parameter ranges and envelopes of equivalent sea-level rise with the simple averaging method agree well with the more advanced techniques. Best-fit parameter ranges confirm earlier values expected from prior model tuning, including large basal sliding coefficients on modern ocean beds.


Introduction
Modeling studies of future variability of the Antarctic Ice Sheet have focused to date on the Amundsen Sea Embayment (ASE) sector of West Antarctica, including the Pine Island and Thwaites Glacier basins. These basins are currently undergoing rapid thinning and acceleration, producing the largest Antarctic contribution to sea-level rise (Shepherd et al., 2012; Rignot et al., 2014). The main cause is thought to be increasing oceanic melt below their floating ice shelves, which reduces back pressure on the grounded inland ice (buttressing; Pritchard et al., 2012; Dutrieux et al., 2014). There is a danger of much more drastic grounding-line retreat and sea-level rise in the future, because bed elevations in the Pine Island and Thwaites Glacier basin interiors deepen to a kilometer or more below sea level, potentially allowing marine ice-sheet instability (MISI) due to the strong dependence of ice flux on grounding-line depth (Weertman, 1974; Mercer, 1978; Schoof, 2007; Vaughan, 2008; Rignot et al., 2014; Joughin et al., 2014).
Recent studies have mostly used high-resolution models and/or relatively detailed treatments of ice dynamics (higher-order or full Stokes dynamical equations; Morlighem et al., 2010; Gladstone et al., 2012; Cornford et al., 2013; Parizek et al., 2013; Docquier et al., 2014; Favier et al., 2014; Joughin et al., 2014). Because of this dynamical and topographic detail, models with two horizontal dimensions have been confined spatially to limited regions of the ASE and temporally to durations on the order of centuries to a millennium. On the one hand, these types of models are desirable because highly resolved bed topography and accurate ice dynamics near the modern grounding line could well be important on timescales of the next few decades to a century (references above, and Durand et al., 2011; Favier et al., 2012). On the other hand, the computational run-time demands of these models limit their applicability to small domains and short timescales, and they can only be calibrated against the modern observed state and decadal trends at most.
Here we take an alternate approach, using a relatively coarse-grid ice-sheet model with hybrid dynamics. This allows run durations of several tens of thousands of years, so that model parameters can be calibrated against geologic data of major retreat across the continental shelf since the Last Glacial Maximum (LGM) over the last ∼ 20 000 yr. The approach is a tradeoff between (i) less model resolution and dynamical fidelity, which degrades future predictions on scales of ∼ 10s of km and the next few decades (sill-to-sill retreat immediately upstream from modern grounding lines), and (ii) more confidence on larger scales of 100s of km and 1000s of yr (deeper into the interior basins, further into the future) provided by calibration against LGM extents and deglacial retreat of the past 20 000 yr. The approach also allows more thorough exploration of uncertain parameter ranges and their interactions, such as sliding coefficients on modern ocean beds, oceanic melting strengths, and different Earth treatments of bedrock deformation.
A substantial body of geologic data is available for the last deglacial retreat in the ASE and other Antarctic sectors. Notably, this includes recent reconstructions of grounding-line locations over the last 25 kyr by the RAISED Consortium (2014). Other types of data at specific sites include relative sea-level records, cosmogenic elevation–age data, and modern uplift rates (compiled in RAISED Consortium, 2014; Briggs and Tarasov, 2013; Briggs et al., 2013, 2014; Whitehouse et al., 2012a, b). Following several recent Antarctic modeling studies (Briggs et al., 2013, 2014, and Whitehouse et al., 2012a, b, as above; Golledge et al., 2014; Maris et al., 2015), we utilize these data sets in conjunction with large ensembles (LEs), i.e., sets of hundreds of simulations over the last deglacial period with systematic variations of selected model parameters. LE studies have also been performed for past variations of the Greenland Ice Sheet, for instance by Applegate et al. (2012) and Stone et al. (2013).
This paper follows on from Chang et al. (2015, 2016), who apply relatively advanced Bayesian statistical techniques to LEs generated by our ice-sheet model. The statistical steps are described in detail in Chang et al. (2015, 2016), and include the following.
-Statistical emulators, used to interpolate results in parameter space, constructed using a new emulation technique based on principal components.
-Probability models, replacing raw square-error model–data misfits with formal likelihood functions, using a new approach for binary spatial data such as grounding-line maps.
-Markov chain Monte Carlo (MCMC) methods, used to produce posterior distributions, i.e., continuous probability density functions of parameter estimates and projected results, by formally combining the information from the above two steps in a Bayesian inferential framework.
Some of these techniques were applied to LE modeling for Greenland in Chang et al. (2014). McNeall et al. (2013) used a Gaussian process emulator, and scoring similar to our simple method, in their study of observational constraints for a Greenland Ice Sheet model ensemble. Tarasov et al. (2012) used artificial neural nets in their LE calibration study of North American ice sheets, and have mentioned their potential application to Antarctica (Briggs and Tarasov, 2013).
Apart from these examples, to our knowledge the statistical techniques in Chang et al. (2015, 2016) are considerably more advanced than the simpler averaging method used in most previous LE ice-sheet studies. These previous studies have involved (i) computing a single objective score for each LE member that measures the misfit between the model simulation and geologic and modern data, and (ii) calculating parameter ranges and envelopes of model results by straightforward averaging over all LE members, weighted by the scores.
The more advanced statistical techniques offer substantial advantages over the simple averaging method, such as providing robust and smooth probability density functions in parameter space. For instance, Applegate et al. (2012) and Chang et al. (2014) show that the simple averaging method fails to provide reasonable results for LEs with coarsely spaced Latin HyperCube sampling, whereas for the same LE, the advanced techniques successfully interpolate in parameter space and provide smooth and meaningful probability densities. However, the advanced techniques in Chang et al. (2015, 2016) require statistical expertise not readily available to most ice-sheet modeling groups. It may be that the simple averaging method still gives reasonable results, especially for LEs with full-factorial sampling, i.e., with every possible combination of selected parameter values (also referred to as a grid or Cartesian product; Urban and Fricker, 2010). The purpose of this paper is to apply both the advanced statistical and simple averaging methods to the same Antarctic LE, compare the results, and thus assess whether the simple (and commonly used) method is a viable alternative to the more advanced techniques, at least for full-factorial LEs. The results include probabilistic ranges of model parameter values, and envelopes of model results such as equivalent sea-level rise. Further types of results related to specific glaciological problems (LGM ice volume, MeltWater Pulse 1A, future retreat) will be presented in Pollard et al. (2016) using the simple averaging method, and do not modify or extend the comparisons of the two methods in this paper.
Sections 2.1 and 2.2 describe the model, the setup for the last deglacial simulations, and the model parameters chosen for the full-factorial LE. Sections 2.3 and 2.4 describe the objective scoring against past and modern data used in the simple averaging method, and Sect. 2.5 provides an overview of the advanced statistical techniques. Results are shown for best-fit model parameter ranges and equivalent sea-level envelopes in Sects. 3 and 4, comparing simple and advanced techniques. Conclusions and steps for further work are described in Sect. 5.

Ice-sheet model and simulations
The 3-D ice-sheet model has previously been applied to past Antarctic variations in Pollard and DeConto (2009), DeConto et al. (2012) and Pollard et al. (2015). The model predicts ice thickness and temperature distributions, evolving due to slow deformation under its own weight, and to mass addition and removal (precipitation, basal melt and runoff, oceanic melt, and calving of floating ice). Floating ice shelves and grounding-line migration are included. It uses hybrid ice dynamics and an internal condition on ice velocity at the grounding line (Schoof, 2007). The simplified dynamics (compared to full Stokes or higher-order) capture grounding-line migration reasonably well (Pattyn et al., 2013), while still allowing runs of O(10 000) yr to be feasible. As in many long-term ice-sheet models, bedrock deformation is modeled as an elastic lithospheric plate above local isostatic relaxation. Details of the model formulation are described in Pollard and DeConto (2012a, b). The drastic ice-retreat mechanisms of hydrofracturing and ice-cliff failure proposed in Pollard et al. (2015) are only triggered in warmer-than-present climates and so do not play any role in the glacial–deglacial simulations here; in fact, they are switched off in all runs. Tests show that they play no perceptible role in simulations over the last 40 kyr.
The model is applied to a limited-area nested domain spanning all of West Antarctica, with 20 km grid resolution. Lateral boundary conditions on ice thicknesses and velocities are provided by a previous continental-scale run. The model is run over the last 30 000 yr, initialized at 30 ka (30 000 yr before present, relative to 1950 AD) from a previous longer-term run. Atmospheric forcing is computed using a modern climatological Antarctic data set (ALBMAP: Le Brocq et al., 2010), with uniform cooling perturbations proportional to a deep-sea-core δ18O record (as in Pollard and DeConto, 2009, 2012a). Oceanic forcing uses archived ocean temperatures from a global climate model simulation of the last 22 kyr (Liu et al., 2009). Sea-level variations vs. time, which are controlled predominantly by Northern Hemisphere ice-sheet variations, are prescribed from the ICE-5G data set (Peltier, 2004). Modern bedrock elevations are obtained from the Bedmap2 data set (Fretwell et al., 2013).

Large ensemble and model parameters
The large ensemble analyzed in this study uses full-factorial sampling, i.e., a run for every possible combination of parameter values, with four parameters varied and each parameter taking five values, requiring 625 (= 5⁴) runs. As discussed above, results are analyzed in two ways: (1) using the relatively advanced statistical techniques (emulators, likelihood functions, MCMC) in Chang et al. (2015, 2016), and (2) using the much simpler averaging method of calculating an aggregate score for each run that measures model–data misfit, and computing results as averages over all runs weighted by their score. Because the second method has no means of interpolating results between sparsely separated points in multi-dimensional parameter space, it is effectively limited to full-factorial sampling with only a few parameters and a small number of values each. The small number of values is nevertheless sufficient to span the full reasonable "prior" range for each parameter, and although the results are very coarse with the second method, they show the basic patterns adequately. Furthermore, envelopes of results of all model runs are compared in Appendix D with corresponding data, and demonstrate that the ensemble results do adequately span the data; i.e., there are no serious outliers of data far from the range of model results.
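Full-factorial sampling of this kind is straightforward to enumerate programmatically. The sketch below is illustrative only: the parameter grids shown are assumptions for the example (only the CALV extremes of 0.3 and 1.7 and the CSHELF and TAUAST values cited in Sect. 3 appear in the text), not the actual LE values.

```python
# Hypothetical full-factorial enumeration of a 4-parameter, 5-value ensemble.
from itertools import product

# Placeholder five-value grids; the true LE values are given in the paper's
# parameter description (not reproduced in this excerpt).
OCFAC  = [0.1, 0.3, 1.0, 3.0, 10.0]      # oceanic melt factor (assumed)
CALV   = [0.3, 0.65, 1.0, 1.35, 1.7]     # calving factor (extremes per Sect. 3)
CSHELF = [1e-9, 1e-8, 1e-7, 1e-6, 1e-5]  # ocean-bed sliding coefficient (assumed)
TAUAST = [1, 2, 3, 5, 7]                 # asthenospheric relaxation time, kyr

ensemble = list(product(OCFAC, CALV, CSHELF, TAUAST))
assert len(ensemble) == 5 ** 4           # 625 runs, one per combination
```

Each tuple in `ensemble` defines one model run; the aggregate score S is later attached to each run for the weighted averaging.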
The four parameters and their five values are the following.
The four parameters were chosen based on prior experience with the model; each has a strong effect on the results, yet is uncertain over a wide range.

Individual data types and scoring
Following Whitehouse et al. (2012a, b), Briggs and Tarasov (2013) and Briggs et al. (2013, 2014), we test the model against three types of data for the modern observed state, and five types of geologic data relevant to ice-sheet variations of the last ∼ 20 000 yr, using straightforward mean-squared or root-mean-square misfits in most cases. Each misfit (M_i, i = 1 to 8) is normalized into an individual score (S_i), and the individual scores are then combined into one aggregate score (S) for each member of the LE. Only data within the domain of the model (West Antarctica) are used in the calculation of the misfits.
One approach to calculating misfits and scores is to borrow from Gaussian error-distribution concepts, i.e., individual misfits M of the form [(mod − obs)/σ]² and overall scores of the form e^(−M/s), where mod is a model quantity, obs is a corresponding observation, σ is an observational or scaling uncertainty, M is an average of individual misfits over data sites and types of measurements, and s is another scaling value (Briggs and Tarasov, 2013; Briggs et al., 2014). However, the choice of these forms is somewhat heuristic, and different choices are also appropriate for complex model–data comparisons with widespread data points, very different types of data, and many model–data error types that are not strictly Gaussian. In order to determine the influence of these choices on the results, we compare two approaches: (a) with formulae adhering closely to Gaussian forms throughout, and (b) with some non-Gaussian aspects attempting to provide more straightforward and interpretable scalings between different data types. Both approaches are described fully below (next section, and Appendix B). They yield very similar results, with no significant differences between the two, as shown in Appendix C. The second, more heuristic approach (b) is used for results in the main paper.
The eight individual data types and model–data misfits are listed below, with basic information that applies to both of the above approaches. More details are given in Appendix B, including formulae for the two approaches, and the "intra-data-type weighting" that is important for closely spaced sites (Briggs and Tarasov, 2013). The two approaches for combining the individual scores into one aggregate score S for the simple averaging method are described in Sect. 2.4. The more advanced statistical techniques (Chang et al., 2015, 2016) use elements of these calculations but differ fundamentally in some aspects, as outlined in Sect. 2.5.
The eight individual data types are the following.
4. TROUGH: past grounding-line distance vs. time along the centerline trough of Pine Island Glacier. Centerline data for the Ross and Weddell basins could also be used, but are not in this study. Misfit M_4: based on model–data differences over the period 20 to 0 ka. Data: RAISED Consortium (2014) (Anderson et al., 2014, for the Ross; Hillenbrand et al., 2014, for the Weddell; Larter et al., 2014, for the Amundsen Sea).

Combination into one aggregate score for a simple averaging method
Each of the misfits above is first transformed into a normalized individual score for each data type i = 1 to 8. The transformations differ for the two approaches mentioned above.

[Figure 1 caption (beginning truncated): ... (Fretwell et al., 2013). Blue and purple areas show expanded grounded-ice extents at 5, 10, 15 and 20 ka (thousands of years before present) reconstructed by the RAISED Consortium (2014), plotted using their vertex information (S. Jamieson, personal communication, 2015), and choosing their Scenario A for the Weddell embayment (Hillenbrand et al., 2014). These maps are used in the large-ensemble scoring (TOTE, TROUGH and GL2D data types, Sect. 2.3).]

2.4.1 For approach (a), closely following Gaussian error forms, using misfits M_i as described in Appendix B
-For a given data type i, the misfits M_i for all runs (1 to 625) are sorted, and normalized using the median value M_i^50, i.e., M′_i = M_i / M_i^50. This is somewhat analogous to the heuristic scaling for overall scores in Briggs et al. (2014, their Sect. 2.3), but applied here to individual types.
-The individual score S_i for data type i and each run is set to e^(−M′_i).
-The aggregate score for each run is S = S_1 S_2 S_3 S_4 S_5 S_6 S_7 S_8, i.e., e^(−Σ_i M′_i).
Of the two approaches, this most closely follows Briggs and Tarasov (2013) and Briggs et al. (2014), except for their inter-data-type weighting, which assigns very different weights to the individual types based on spatial and temporal volumes of influence (Briggs and Tarasov, 2013, their Sect. 4.3.2; Briggs et al., 2014, their Sect. 2.2). Here, we assume that each data type is of equal importance to the overall score, and that if any one individual score is very bad (S_i ≈ 0), the overall score S should also be ≈ 0. This corresponds to the notion that if any single data type is completely mismatched, the run should be rejected as unrealistic, regardless of the fit to the other data types. The fits to past data, even if more uncertain and sparser than modern data, seem equally important to the goal of obtaining the best calibration for future applications with very large departures from modern conditions.
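Approach (a) reduces to a few array operations. A minimal numpy sketch, assuming a precomputed (625 × 8) array of the raw misfits M_i from Appendix B (synthetic values are used here purely for illustration):

```python
import numpy as np

def aggregate_scores_a(misfits):
    """Approach (a): median-normalize each data type, then multiply
    Gaussian-form individual scores into one aggregate score per run."""
    med = np.median(misfits, axis=0)   # M_i^50, the median misfit per data type
    norm = misfits / med               # normalized misfits M'_i
    s_i = np.exp(-norm)                # individual scores S_i = e^(-M'_i)
    return np.prod(s_i, axis=1)        # S = S_1 S_2 ... S_8 = e^(-sum_i M'_i)

rng = np.random.default_rng(0)
misfits = rng.gamma(2.0, 1.0, size=(625, 8))   # synthetic misfits, not real data
S = aggregate_scores_a(misfits)
assert S.shape == (625,) and float(S.max()) <= 1.0
```

Because the individual scores multiply, a run with any one S_i near zero receives an aggregate score near zero, matching the rejection behavior described above.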

2.4.2 For the more heuristic approach (b), using misfits M_i as described in Appendix B

-For a given data type i, a "cutoff" value C_i is set by taking the geometric mean (i.e., the square root of the product) of (i) the minimum (best) misfit M_i over all runs 1 to 625, and (ii) the algebraic average of the 10 largest (worst) values. The 10 worst values are used to avoid a single outlier that could be unbounded; the single best value is used because it is bounded by zero, and is not an outlier but represents the closest-to-best possible simulation with the current model. The geometric mean rather than the algebraic mean of these two numbers is more appropriate when the values range over many orders of magnitude.
-The normalized misfit M′_i for data type i and each run is set to M_i / C_i. We implicitly assume that M′_i values close to 0 (M_i ≪ C_i) represent very good simulations of this data type, close to the best possible within the LE. M′_i values ≥ 1 (M_i ≥ C_i) represent very poor simulations, diverging from this data type so much that the run should be rejected no matter what the other scores are.
-The individual score S_i for data type i and each run is set to max(0, 1 − M′_i), decreasing linearly from 1 for a perfect fit to 0 at the cutoff and beyond.

-The aggregate score for each run is S = (S_1 S_2 S_3 S_4 S_5 S_6 S_7 S_8)^(1/8).
In both approaches, the formulae apply equal weights to the individual data types, and do not use "inter-data-type" weighting (Briggs and Tarasov, 2013; Briggs et al., 2014).
As in (a), if any individual score S_i is ≈ 0, then the overall score S is ≈ 0, and the discussion above also applies to approach (b). Both approaches have loose links to the calibration technique in Gladstone et al. (2012) and the more formal treatment in McNeall et al. (2013).
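The cutoff-based scoring of approach (b) can be sketched similarly. The individual-score form S_i = max(0, 1 − M_i/C_i) used below is inferred from the description in Sect. 2.4.2 (scores near 1 for M_i ≪ C_i, zero at and beyond the cutoff) and should be checked against Appendix B:

```python
import numpy as np

def aggregate_scores_b(misfits):
    """Approach (b): cutoff-normalized misfits and a geometric-mean aggregate."""
    best = misfits.min(axis=0)                             # minimum misfit per type
    worst10 = np.sort(misfits, axis=0)[-10:].mean(axis=0)  # mean of the 10 worst
    cutoff = np.sqrt(best * worst10)                       # geometric mean: C_i
    s_i = np.clip(1.0 - misfits / cutoff, 0.0, None)       # S_i (assumed form)
    return np.prod(s_i, axis=1) ** (1.0 / misfits.shape[1])  # (S_1...S_8)^(1/8)

rng = np.random.default_rng(1)
misfits = rng.gamma(2.0, 1.0, size=(625, 8))   # synthetic misfits, not real data
S = aggregate_scores_b(misfits)
assert S.shape == (625,) and float(S.min()) >= 0.0
```

As in approach (a), any single S_i of zero forces the aggregate to zero; the 1/8 power simply rescales the product onto the same 0 to 1 scale as the individual scores.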

Advanced statistical techniques
The more advanced statistical techniques (Chang et al., 2015, 2016) consist of an emulation stage and a calibration stage, involving the same four model parameters and the 625-member LE as above. The aggregate scores S described in Sect. 2.4 are not used at all. The techniques are outlined here; full accounts are given in Chang et al. (2015, 2016).

Emulation phase
Emulation is the statistical approach by which a computer model is approximated by a statistical model. Here, emulators are constructed using a principal-component-based technique that addresses the complications arising from the non-Gaussian (binary) nature of the grounding-line data. Details are available in Chang et al. (2015, 2016). The emulators provide a statistical model that essentially replaces the data types TOTE, TROUGH and GL2D described in Sect. 2.3. In an extension to Chang et al. (2016), Gaussian process emulators are also used here to estimate distributions of individual score values for the five data types TOTI, TOTDH, RSL, ELEV/DSURF and UPL (S_2, S_3, S_6, S_7, S_8, approach (b), Sect. 2.3 and Appendix B), one emulator per score. Again, emulators are developed for each of the scores by using the Gaussian process machinery and maximum likelihood estimation.
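As a schematic of what an emulator does (not the authors' principal-component implementation), the following fits a bare-bones Gaussian process interpolator to score values at design points in a 2-D parameter slice, with fixed kernel hyperparameters rather than maximum likelihood estimation:

```python
import numpy as np

def rbf_kernel(A, B, length):
    """Squared-exponential covariance between two sets of points."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length ** 2)

class GPEmulator:
    def __init__(self, X, y, length=0.3, noise=1e-8):
        self.X, self.length = X, length
        K = rbf_kernel(X, X, length) + noise * np.eye(len(X))
        self.alpha = np.linalg.solve(K, y)      # fit: alpha = K^-1 y

    def predict(self, Xstar):
        """Posterior mean at new parameter settings."""
        return rbf_kernel(Xstar, self.X, self.length) @ self.alpha

# Toy usage: emulate a synthetic score surface peaking at the slice center.
rng = np.random.default_rng(2)
X = rng.uniform(0.0, 1.0, size=(25, 2))         # design (parameter) points
y = np.exp(-((X - 0.5) ** 2).sum(1) / 0.1)      # synthetic score values
emu = GPEmulator(X, y)
center, corner = emu.predict(np.array([[0.5, 0.5], [0.0, 0.0]]))
assert center > corner   # emulated surface preserves the score maximum
```

The point of the emulator is that `predict` is cheap, so the parameter space can be explored far more densely than the 625 actual model runs allow.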

Calibration phase
The calibration stage solves the following problem in a statistically rigorous fashion: given observations and model runs at various parameter settings, which parameters of the model are most likely? In a Bayesian inferential framework, this translates to learning about the posterior probability distribution of the parameter values given all the available computer-model runs and observations. The approach may be sketched out as follows. The emulation phase provides a statistical model connecting the parameters to the model output. Suppose it is assumed that the model at a particular (ideal) set of parameter values produces output that resembles the observations of the process. We also allow for measurement error and systematic discrepancies between the computer model and the real physical system. We do this via a "discrepancy function" that accounts for both simultaneously; this is reasonable because both sources of error are important while also being difficult to tease apart. Hence, one can think of our approach as assuming that the observations are modeled as the model output at an ideal parameter setting, plus a discrepancy function. Once we are able to specify a model in this fashion, Bayesian inference provides a standard approach to obtain the resulting posterior distribution of the parameters: we start with a prior distribution for the parameters, where we assume that all values are equally likely before any observations are obtained, and then use Bayes' theorem to find the posterior distribution given the data. The posterior distribution cannot be found in analytical form. Hence, in this second "calibration" stage, posterior densities of the model parameters are inferred via Markov chain Monte Carlo (MCMC). The observation and model quantities used in emulation and calibration consist of the modern and past grounding-line locations, and the five individual scores. The discrepancy function is accounted for in assessing model vs.
observed grounding-line fits in our Bayesian approach. It is based in part on the locations and times at which grounded ice occurs in the model and not in the observations, or vice versa, in 50 % or more of the LE runs (Chang et al., 2015, 2016). For the individual scores, we use exponential marginal densities, whose rate parameters receive gamma priors scaled so that the 90th percentile of the prior density coincides with each score's cutoff value C_i in Sect. 2.4.2.
In the above procedures, observational error enters for the individual scores RSL, ELEV/DSURF and UPL via the calculations described in Appendix B. It is implicitly taken into account by the discrepancy function for grounding-line locations. Observational error is considered to be negligible for the modern TOTI and TOTDH scores.
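The MCMC step can be illustrated with a toy Metropolis-Hastings sampler over a single parameter. The "emulated log-likelihood" below is a stand-in Gaussian, not the actual likelihood models of Chang et al. (2015, 2016):

```python
import numpy as np

def log_post(theta):
    """Log-posterior: uniform prior on [0, 1] plus a stand-in emulated
    log-likelihood peaking at theta = 0.7 (illustrative only)."""
    if not (0.0 <= theta <= 1.0):
        return -np.inf
    return -0.5 * ((theta - 0.7) / 0.05) ** 2

rng = np.random.default_rng(3)
theta, chain = 0.5, []
for _ in range(20000):
    prop = theta + rng.normal(0.0, 0.1)      # random-walk proposal
    if np.log(rng.uniform()) < log_post(prop) - log_post(theta):
        theta = prop                          # accept the proposal
    chain.append(theta)

samples = np.array(chain[5000:])              # discard burn-in
assert abs(samples.mean() - 0.7) < 0.05       # posterior concentrates near 0.7
```

The retained `samples` approximate draws from the posterior density; in the actual study the chain runs over all four parameters jointly, with the emulator supplying the likelihood at each proposed setting.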

Results: aggregate scores with a simple averaging method
Figure 2 shows the aggregate scores S for all 625 members of the LE, over the 4-D space of the parameters CSHELF, TAUAST, OCFAC and CALV.Each individual subpanel shows TAUAST vs. CSHELF, and the subpanels are arranged left-to-right for varying CALV, and bottom-to-top for varying OCFAC.

"Outer" variations, CALV and OCFAC
All scores with the largest CALV value of 1.7 (right-hand column of subpanels) are 0. In these runs, excessive calving results in very little floating ice and far too much grounding-line retreat. Conversely, with the smallest CALV value of 0.3 (left-hand column of subpanels), most runs have too much floating ice and too-advanced grounding lines during the runs, so most of this column also has zero scores. However, small CALV can be partially compensated for by large OCFAC (strong ocean melting), so there are some nonzero scores in the upper-left subpanels.

"Inner" variations, CSHELF and TAUAST
For mid-range CALV and OCFAC (subpanels near the center of the figure), the best scores require high CSHELF (inner x axis) values, i.e., slippery ocean-bed coefficients of 10^−6 to 10^−5 m a^−1 Pa^−2. This is the most prominent signal in Fig. 2, and is consistent with the widespread extent of deformable sediments on continental shelves noted above. Ideally the LE should have included CSHELF values greater than 10^−5. However, we note that values of 10^−5 to 10^−6 have been found to represent active Siple Coast ice-stream beds well in model inversions (Pollard and DeConto, 2012b). Subsequent work with wider CSHELF ranges confirms that values around 10^−5 are in fact optimal, with unrealistic behavior for larger values (Pollard et al., 2016). Somewhat lower but still reasonable scores exist for lower CSHELF values of 10^−7, but only for higher OCFAC (3 to 10) and smaller TAUAST (1 to 2 kyr). This is of interest because smaller CSHELF values support greater ice thicknesses at the LGM where grounded ice has expanded over the continental shelves, producing greater equivalent sea-level lowering and alleviating the LGM "missing-ice" problem (Clark and Tarasov, 2014). In order for the extra ice to be melted by the present day, ocean melting needs to be more aggressive (higher OCFAC), and to recover in time from the greater bedrock depression at the LGM, TAUAST has to be smaller (more rapid bedrock rebound). This glaciological aspect is explored in Pollard et al. (2016).
Scores are quite insensitive to the TAUAST asthenospheric rebound timescale (inner y axis), although there is a tendency to cluster around 2 to 3 kyr and to disfavor higher values (5 to 7 kyr), especially for high OCFAC.
4 Results: comparisons of simple averaging vs. advanced statistical techniques

Single parameter ranges
The main results seen in Fig. 2 are borne out in Fig. 3. The left-hand panels show results using the simple averaging method, i.e., the average score for all runs in the LE with a particular parameter value. Triangles in these panels show the mean parameter value V_m = Σ_n S^(n) V^(n) / Σ_n S^(n), where S^(n) is the aggregate score and V^(n) is the value of this parameter for run n (1 to 625), and whiskers show the standard deviation. The prominent signal of high CSHELF values (slippery ocean beds) is evident, along with the absence (near absence) of positive scores for the extreme CALV values of 1.7 (0.3), and the more subtle trends for OCFAC and TAUAST. The right-hand panels of Fig. 3 show the corresponding single-parameter "marginal" probability density functions for this LE, using the advanced statistical techniques described in Chang et al. (2015, 2016) and summarized above. For OCFAC, CSHELF and TAUAST, there is substantial agreement with the simple-averaging results in both the peak "best-fit" values and the widths of the ranges. For CALV, the peak values agree quite well, but the simple-averaging distribution has a significant tail at lower CALV values that is not present in the advanced results; this might be due to the discrepancy function in the advanced method (Sect. 2.5), which has no counterpart in the simple averaging method.
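The score-weighted mean and standard deviation behind the triangles and whiskers amount to a short calculation; a minimal sketch with toy values:

```python
import numpy as np

def weighted_mean_std(V, S):
    """Score-weighted mean V_m = sum(S_n V_n) / sum(S_n) and the matching
    weighted standard deviation (whiskers in Fig. 3)."""
    w = S / S.sum()
    vm = (w * V).sum()
    sd = np.sqrt((w * (V - vm) ** 2).sum())
    return vm, sd

# Toy example: high scores concentrated at large parameter values.
V = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # one parameter's five values
S = np.array([0.0, 0.0, 0.1, 0.4, 0.5])   # aggregate scores for those runs
vm, sd = weighted_mean_std(V, S)          # vm = 4.4 here
assert abs(vm - 4.4) < 1e-9
```

Runs with zero score drop out of both statistics entirely, which is how the simple method rejects unrealistic parameter combinations.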

Paired parameter ranges
Probability densities for pairs of parameter values are useful in evaluating the quality of the LE analysis, and can display offsetting physical processes that together maintain realistic results, e.g., greater OCFAC and lesser CALV (Chang et al., 2014, 2015, 2016). In Fig. 4, the left-hand panels show mean scores for pairs of the four parameters, using the simple averaging method and averaging over all LE runs with a particular pair of values. The right-hand panels show corresponding densities for the same parameter pairs using the advanced statistical techniques. Overall, the same encouraging agreement is seen as for the single-parameter densities in Fig. 3, with the locations of the main maxima being roughly the same for each parameter pair. There are some differences in the extents of the maxima, notably for CALV, where the zone of high scores with the simple averaging method extends to lower CALV values than with the advanced techniques, as seen for the individual parameters in Fig. 3. In general, though, there is good agreement between the two methods regarding parameter ranges in Figs. 3 and 4, suggesting that the simple averaging method is viable, at least for LEs with full-factorial sampling of parameter space.

Equivalent sea-level contribution
Figure 5 illustrates the use of the LE to produce envelopes of past model simulations. Figure 5a and b show equivalent sea-level (ESL) scatterplots for all 625 runs. Early in the runs around the LGM (20 to 15 ka), the curves cluster into noticeable groups with the same CSHELF values, due to the relatively weak effects of the other parameters (OCFAC, CALV and TAUAST) for cold climates and ice sheets in near equilibrium. Figure 5c and d show the mean and one-sided standard deviations for the simple method. Most of the retreat and sea-level rise occurs between ∼ 14 and 10 ka. Glaciological aspects of the retreat will be discussed in more detail in Pollard et al. (2016).
Figure 5e and f show the equivalent mean and standard deviations derived from the advanced statistical techniques. There is substantial agreement with the simple-method curves in Fig. 5c and d for most of the duration of the runs. The largest difference is around the Last Glacial Maximum, ∼ 20 to 15 ka, when mean sea levels are nearly 2 m lower (larger LGM ice volumes) in the simpler method compared to the advanced. This may be due to the simpler method's scores using past 2-D grounding-line reconstructions (data type GL2D), which are not used in the advanced techniques.
Figure 6 shows probability densities of equivalent sea-level rise at particular times in the runs. Figure 6a–d show results with the simple averaging method, computed using score-weighted densities and 0.2 m wide ESL bins; the weighted mean at each time t is Σ_n S^(n) ESL^(n)(t) / Σ_n S^(n), where S^(n) is the aggregate score for run n, ESL^(n)(t) is the equivalent sea-level rise for run n at time t, and the sums are over all n (1 to 625) in the LE. The one-sided standard deviations are the root mean square of deviations of ESL^(n) above the mean (upper curve) or below the mean (lower curve) at each time t. The uneven noise in this figure is due to the small number of parameter values in our LE. The separate peaks for the LGM (−15 000 yr) in Fig. 6a and b are due to the widely separated CSHELF values and the relatively weak effects of the other parameters (OCFAC, CALV and TAUAST) for cold climates and ice sheets in near equilibrium. Figure 6e shows the equivalent but much smoother probability densities using the advanced statistical techniques. All major aspects agree reasonably well with the simple averaging results, and the separate peaks for −15 000 yr are smoothed into a single broad range.
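The score-weighted mean and one-sided standard deviations for the ESL envelopes can be sketched as follows, with synthetic trajectories standing in for the 625 model ESL curves (the one-sided RMS here is normalized by the total weight, a simplifying assumption about the exact normalization used in the figures):

```python
import numpy as np

def esl_envelope(esl, S):
    """Score-weighted ensemble mean and one-sided standard deviations.
    esl: (n_runs, n_times) ESL time series; S: (n_runs,) aggregate scores."""
    w = S / S.sum()
    mean = w @ esl                          # weighted mean at each time
    dev = esl - mean
    up = np.where(dev > 0, dev, 0.0)        # deviations above the mean
    dn = np.where(dev < 0, dev, 0.0)        # deviations below the mean
    sd_up = np.sqrt((w[:, None] * up ** 2).sum(axis=0))   # one-sided RMS, upper
    sd_dn = np.sqrt((w[:, None] * dn ** 2).sum(axis=0))   # one-sided RMS, lower
    return mean, mean + sd_up, mean - sd_dn

# Synthetic stand-in for the 625 ESL trajectories at 50 output times.
rng = np.random.default_rng(4)
esl = rng.normal(0.0, 1.0, size=(625, 50)).cumsum(axis=1) * 0.1
S = rng.uniform(0.0, 1.0, size=625)
mean, upper, lower = esl_envelope(esl, S)
assert np.all(upper >= mean) and np.all(lower <= mean)
```

Splitting the deviations by sign keeps the envelope asymmetric, which matters when, as here, the distribution of runs about the weighted mean is skewed.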

Conclusions and further work
1. The simple averaging method, with quantities weighted by aggregate scores, produces results that are reasonably compatible with relatively sophisticated statistical techniques involving emulation, probability model/likelihood functions, and MCMC (Chang et al., 2015, 2016; Sect. 2.5). Both techniques are applied to the same LE with full-factorial sampling in parameter space, for which both yield smooth and robust results, and the advanced technique acts as a benchmark against which the simple method can be compared.
Unlike the advanced techniques, the simple averaging method cannot interpolate in parameter space, and so is limited practically to relatively few parameters (four here) and a small number of values for each (five here). Previous work using LEs with Latin hypercube sampling (Applegate et al., 2012; Chang et al., 2014, 2015) has shown that the simple averaging method can fail if the sampling is too coarse, whereas the advanced technique provides smooth and meaningful results. This is primarily because emulation and MCMC in the advanced techniques can still interpolate successfully in the coarsely sampled parameter space. Of course, this distinction depends on the size of the LE and the coarseness of the sampling; somewhat larger LEs with Latin hypercube sampling and fewer parameters may be amenable to the simple method. Note that this is not addressed in this paper, where just one full-factorial LE is used.
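The full-factorial design that makes the simple method viable here is easy to generate; the particular parameter values below are illustrative placeholders, not the LE's actual sampled values.

```python
import itertools

# Five values per parameter; these specific values are illustrative
# placeholders, not the paper's actual sampled values.
param_values = {
    "OCFAC":  [0.1, 0.3, 1.0, 3.0, 10.0],
    "CALV":   [0.5, 0.75, 1.0, 1.5, 2.0],
    "CSHELF": [-7, -6, -5, -4, -3],        # exponent x in 10**(-x)
    "TAUAST": [1.0, 1.5, 2.0, 3.0, 4.0],   # kyr
}

# Full-factorial sampling: every combination of every value,
# 5**4 = 625 runs, giving a complete, evenly covered grid so that
# simple score-weighted averages are robust.
runs = [dict(zip(param_values, combo))
        for combo in itertools.product(*param_values.values())]
```

By contrast, a Latin hypercube design of the same size would place the 625 points irregularly, leaving gaps that only an interpolating emulator can bridge.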
2. The best-fit parameter ranges deduced from the LE analysis generally fit prior expectations. In particular, the results strongly confirm that large basal sliding coefficients (i.e., slippery beds) are appropriate for modern continental-shelf oceanic areas. In further work we will assess heterogeneous bed properties such as the inner region of hard outcropping basement observed in the ASE (Gohl et al., 2013). The best-fit range for the asthenospheric relaxation timescale TAUAST is quite broad, including the prior reference value ∼ 3 kyr but extending to shorter times ∼ 1 kyr. This may be connected with the low upper-mantle viscosities and thin crust suggested in recent work (Whitehouse et al., 2012b; Chaput et al., 2014), which will be examined in further work with full Earth models (Gomez et al., 2013, 2015; Konrad et al., 2015).
3. The total Antarctic ice amount at the Last Glacial Maximum is equivalent to ∼ 5 to 10 m of global equivalent sea level below modern (Fig. 5). This is consistent with the trend in recent modeling studies (Ritz et al., 2001; Huybrechts, 2002; Philippon et al., 2006; Briggs et al., 2014; Whitehouse et al., 2012a, b; Golledge et al., 2012, 2013, 2014), whose LGM amounts are generally less than in older papers. (Note that Fig. 5 shows contributions only from our limited West Antarctic domain, but as shown in Mackintosh et al., 2011, the contribution from East Antarctica at the LGM is much smaller, ∼ 1 m ESL.) This suggests that Antarctic expansion is insufficient to explain the "missing ice" problem; i.e., the total volume of reconstructed ice sheets worldwide falls short of the equivalent fall in sea-level records at that time by 15 to 20 m (Clark and Tarasov, 2014). A subsequent paper (Pollard et al., 2016) examines this glaciological aspect in more detail but does not alter the conclusions here.
4. There are only minor episodes of accelerated West Antarctic Ice Sheet (WAIS) retreat and equivalent sea-level rise in the simulations (Fig. 5), and none with magnitudes comparable to Meltwater Pulse 1A, for instance, with ∼ 15 m ESL rise in ∼ 350 yr around ∼ 14.5 ka (Deschamps et al., 2012). This is in apparent conflict with the significant Antarctic contribution implied by sea-level fingerprinting studies (Bassett et al., 2005; Deschamps et al., 2012) and ice-rafted debris (IRD) core analysis (Weber et al., 2014). Model retreat rates are examined in more detail in Pollard et al. (2016), again without altering the findings here.
A natural extension of this work is to extend the Antarctic model simulations and LE methods into the future, using climates and ocean warming following Representative Concentration Pathway scenarios (Meinshausen et al., 2011). In these warmer climates we expect marine ice-sheet instability to occur in WAIS basins, consistent with past retreats simulated in Pollard and DeConto (2009). Also, the drastic retreat mechanisms of hydrofracture and ice-cliff failure, not triggered in the colder-than-present simulations of this paper, may play a role, as found for the Pliocene in Pollard et al. (2015). Future applications with simple-average LEs are described in Pollard et al. (2016), and detailed future scenarios with another type of LE are described in DeConto and Pollard (2016).

Appendix A: Model parameters varied in the large ensemble
The four model parameters (OCFAC, CALV, CSHELF and TAUAST) and their ranges in the large ensemble are summarized in Sect. 2.2. Their physical effects in the model and associated uncertainties are discussed in more detail here.
OCFAC is the main coefficient in the parameterization of sub-ice-shelf oceanic melt, which is proportional to the square of the difference between nearby water temperature at 400 m and the pressure-melting point of ice. Oceanic melting (or freezing) erodes (or grows on) the base of floating ice shelves, as warm waters at intermediate depths flow into the cavities below the shelves. The resulting ice-shelf thinning reduces pinning points and lateral friction, and thus back stress on grounded interior ice. As mentioned above, recent increases in ocean melt rates are considered to be the main cause of the ongoing drawdown and acceleration of interior ice in the ASE sector of WAIS (Pritchard et al., 2012; Dutrieux et al., 2014). High-resolution dynamical ocean models (Hellmer et al., 2012) are not yet practical on these timescales, and simple parameterizations of sub-ice-shelf melting such as the one used here are quite uncertain (e.g., Holland et al., 2008). For small (large) OCFAC values, oceanic melting is reduced (increased), ice shelves thicken (thin), discharge of interior ice across the grounding line decreases (increases), and grounding lines tend to advance (retreat).
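As a concrete (hypothetical) sketch of such a quadratic melt law: melt scales as OCFAC times the squared temperature difference, with the sign preserved so that water below the pressure-melting point produces basal freeze-on. The base coefficient k, its units, and the sign-preserving form are assumptions of this sketch, not the paper's calibrated formulation.

```python
def sub_shelf_melt(t_ocean, t_freeze, ocfac, k=0.1):
    """Sub-ice-shelf melt rate (m/yr, illustrative units): proportional
    to the square of the difference between nearby 400 m ocean
    temperature and the pressure-melting point, scaled by OCFAC.
    The base coefficient k and the sign-preserving form (freeze-on
    for sub-freezing water) are assumptions of this sketch."""
    dt = t_ocean - t_freeze
    return ocfac * k * dt * abs(dt)
```

The quadratic form means that, for example, doubling the thermal forcing above freezing quadruples the melt rate, which is why OCFAC interacts strongly with the prescribed ocean temperatures.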
CALV is the main factor in the parameterization of iceberg calving at the oceanic edges of floating shelves. Calving has important effects on ice-shelf extent, with strong feedbacks via buttressing of interior ice. However, the processes controlling calving are not well understood, probably depending on a combination of pre-existing fracture regime, large-scale stresses, and hydrofracturing by surface meltwater. There is little consensus on calving parameterizations. We use a common approach based on parameterized crevasse depths and their ratio to ice thickness (Benn et al., 2007; Nick et al., 2010). For small (large) CALV, calving is decreased (increased), producing more (less) extensive floating shelves and greater (lesser) buttressing of interior ice.
CSHELF is the basal sliding coefficient for ice grounded on areas that are ocean bed today (where the ice is not frozen to the bed). Coefficients under modern grounded ice are deduced by inverse methods (Pollard and DeConto, 2012b; Morlighem et al., 2013), but they are relatively unconstrained for modern oceanic beds, across which grounded ice advanced at the Last Glacial Maximum ∼ 20 to 15 ka. Most oceanic beds around Antarctica are covered in deformable sediment today, due to Holocene marine sedimentation and to earlier transport and deposition of till by previous ice advances. For these regions, coefficients are expected to be relatively high (i.e., a slippery bed), but there is still a plausible range that has significant effects on model results, because the coefficient strongly controls the steepness of the ice-sheet surface profile and ice thicknesses, and thus the sensitivity to climate change. In this paper, we vary the sliding coefficient CSHELF uniformly for all modern-oceanic areas. (In further work, we will allow for heterogeneity such as the hard crystalline bedrock zone observed in the inner Amundsen Sea Embayment; Gohl et al., 2013.)
TAUAST is the e-folding time of asthenospheric relaxation in the bedrock model component. Ice-sheet evolution on long timescales is affected quite strongly by the bedrock response to varying ice loads, especially for marine ice sheets in contact with the ocean, where bathymetry determines grounding-line depths. During deglacial retreat, the bedrock rebounds upwards due to the reduced ice load, which slows ice retreat through shallower grounding-line depths and less discharge of interior ice. However, the O(10^3) yr lag in this process is important in reducing this negative feedback, and it accelerates the positive feedback of marine ice-sheet instability if the bed deepens into the ice-sheet interior. As in many large-scale ice-sheet models, our bedrock response is represented by a simple Earth model consisting of an elastic plate over a local e-folding relaxation towards isostatic equilibrium (elastic lithosphere, relaxing asthenosphere). Based on more sophisticated global Earth models, the asthenospheric e-folding timescale is commonly set to 3 kyr (e.g., Gomez et al., 2013), but note that recent geophysical studies suggest considerably shorter timescales for some West Antarctic regions (Whitehouse et al., 2012b; Chaput et al., 2014). In further work we plan to perform large ensembles with the ice-sheet model coupled to a full Earth model, extending Gomez et al. (2013, 2015).
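The local relaxation toward isostatic equilibrium can be sketched as an exact exponential update with e-folding time TAUAST. This omits the elastic-plate lithosphere and the computation of the equilibrium bed, so it is only a schematic of the role TAUAST plays; the example numbers are hypothetical.

```python
import math

def bed_step(b, b_eq, tauast_yr, dt_yr):
    """One time step of local asthenospheric relaxation: the bed b
    relaxes toward its isostatic equilibrium b_eq with e-folding
    time TAUAST (exact exponential update of the linear ODE
    db/dt = -(b - b_eq)/tau)."""
    return b_eq + (b - b_eq) * math.exp(-dt_yr / tauast_yr)

# Hypothetical example: ice unloading raises the equilibrium bed by
# 100 m; step the bed forward for one e-folding time (3000 yr).
b, b_eq = -500.0, -400.0
for _ in range(3000):
    b = bed_step(b, b_eq, 3000.0, 1.0)
# after one e-folding time, ~63% of the rebound is complete
```

A shorter TAUAST (∼ 1 kyr) completes the same rebound sooner, strengthening the stabilizing shallowing of the grounding line during retreat.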

Appendix B: Data types and individual misfits
The eight types of modern and past data used in evaluating the model simulations are summarized in Sect. 2.3. More details on the algorithms used to compute the individual mismatches M1 to M8 with model quantities are given below. The term "domain" refers to the nested model grid that spans all of West Antarctica, and we only compare with observational sites and data within this domain. Modern observed data are from the Bedmap2 data set (Fretwell et al., 2013).
As discussed in Sects. 2.3 and 2.4, we use two approaches in scoring: (a) more closely following Gaussian error forms, and (b) with more heuristic forms. Some of the algorithms for individual misfits differ between the two, as indicated by bullets (a) and (b) below. For most data types, approach (a) uses mean-square errors, and (b) uses root-mean-square errors. For some data types, the errors are normalized not by observational uncertainty, but by an "acceptable model error magnitude" representing typical model departures from observations in reasonably realistic runs, if this is larger than observational error. Note that if this scaling uncertainty is the same for all data of a given type, it cancels out in the normalization of the individual misfits Mi (Sect. 2.4), and so has no effect on the further results.

B1 TOTE
Modern grounding-line locations.
A = total area of mismatch where the model is grounded and the observed state is floating ice or ocean, or vice versa. A_tot = total area of the domain.
Approach (a): Misfit M1 = (A/B)^2, where B = (A_tot)^(1/2) σ_w. Here B is the product of the linear domain size (A_tot)^(1/2) and σ_w = 30 km, representing the typical size of modern grounding-line location errors in "reasonable" model runs.
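In code, this normalization reads as follows; the example areas are hypothetical and units (km, km^2) are an assumption for illustration.

```python
import math

def misfit_tote(mismatch_area, domain_area, sigma_w=30.0):
    """Approach (a) misfit for modern grounding-line extent:
    M1 = (A / B)**2 with B = sqrt(A_tot) * sigma_w, i.e. the mismatch
    area A normalized by the product of the linear domain size and an
    acceptable error width sigma_w (areas in km^2, sigma_w in km)."""
    b = math.sqrt(domain_area) * sigma_w
    return (mismatch_area / b) ** 2
```

A mismatch strip of average width σ_w running the full linear extent of the domain thus gives M1 = 1.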

B2 TOTI
Modern floating ice-shelf locations. A = total area of mismatch where the model has floating ice and the observations do not, or vice versa. A_tot = total area of the domain.
Approach (a): Misfit M2 = (A/B)^2 as for TOTE, where B is the product of the linear domain size and σ_w = 30 km, representing the typical size of modern floating-ice extent errors in "reasonable" model runs.

B3 TOTDH
Modern grounded ice thicknesses. Approach (a): Misfit M3 is the mean of ((h − h_obs)/σ_h)^2, where h is model ice thickness, h_obs is observed ice thickness, and σ_h = 10 m represents the typical size of modern ice thickness errors in "reasonable" model runs. The mean is taken over areas with observed modern grounded ice.
Approach (b): Misfit M3 is the root mean square of (h − h_obs), over areas with observed modern grounded ice.
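The contrast between the two scoring approaches is easiest to see for this data type. A sketch with hypothetical thickness arrays:

```python
import numpy as np

def misfit_thickness_a(h_model, h_obs, sigma_h=10.0):
    """Approach (a): mean of squared errors normalized by an
    acceptable error magnitude (close-to-Gaussian form)."""
    return np.mean(((h_model - h_obs) / sigma_h) ** 2)

def misfit_thickness_b(h_model, h_obs):
    """Approach (b): heuristic root-mean-square error, unnormalized."""
    return np.sqrt(np.mean((h_model - h_obs) ** 2))

# Hypothetical example: two grid points, each 10 m off.
h_model = np.array([1010.0, 990.0])
h_obs = np.array([1000.0, 1000.0])
```

With σ_h = 10 m, errors of exactly σ_h give M3 = 1 under approach (a) and an RMS of 10 m under approach (b).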

B4 TROUGH
Past grounding-line distance vs. time along the centerline troughs of Pine Island Glacier, and optionally the Ross and Weddell basins. Observed distances at ages 20, 15, 10 and 5 ka are obtained from grounding-line reconstructions of the RAISED Consortium (2014): Anderson et al. (2014) for the Ross, Larter et al. (2014) for the Amundsen Sea, and Hillenbrand et al. (2014) for the Weddell, using their Scenario A of most retreated Weddell ice. Distances are then linearly interpolated in time between these dates. The centerline trough for Pine Island Glacier is extended across the continental shelf following the paleo-ice-stream trough shown in Jakobsson et al. (2011). The resulting Pine Island Glacier transect vs. time is similar to that in Smith et al. (2014).
Approach (a): Misfit M4 is the mean of ((x − x_obs)/σ_x)^2, where x is the model grounding-line position on the transect at a given time, x_obs is the reconstructed position, and σ_x = 30 km represents a typical difference in "reasonable" model runs, and is also midway between "measured" and "inferred" uncertainties in the reconstructed data (RAISED Consortium, 2014). The mean is taken over the period 20 to 0 ka. Approach (b): Misfit M4 is the root mean square of (x − x_obs), over the period 20 to 0 ka.
In this study just the Pine Island Glacier trough is used, but if the Ross and Weddell are used also, the means are taken over all three troughs.

B5 GL2D
Past grounding-line locations. This uses reconstructed grounding-line maps for 20, 15, 10, and 5 ka by the RAISED Consortium (2014; Anderson et al., 2014; Hillenbrand et al., 2014; Larter et al., 2014; Mackintosh et al., 2014; Ó Cofaigh et al., 2014), with vertices provided by S. Jamieson (personal communication, 2015), and choosing their Scenario A for the Weddell embayment (Hillenbrand et al., 2014). The modern grounding line (0 ka) is derived from the Bedmap2 data set (Fretwell et al., 2013). For this study only the Amundsen Sea region is considered. We allow for uncertainty in the past reconstructions by setting a probability of reconstructed floating ice or open ocean, P_obs, at each point as follows:
i. Computing the distance D_1 from the reconstructed grounding line.
ii. Dividing this distance by the sum D_2 of the (kriged) reported uncertainty of nearby vertices (interpreting their "measured" = 10 km, "inferred" = 50 km, "speculative" = 100 km) and a distance that ramps up to
iii. Setting the probability P_obs to a value decaying upwards or downwards from 0.5, i.e., to 0.5 e^(−D_s) if on the grounded side of the grounding line, or to 1 − 0.5 e^(−D_s) if on the non-grounded side, where D_s = D_1/D_2.
Then the "mismatch probability" P_mis at each model grid point is set to 2(0.5 − P_obs) if P_obs < 0.5 and the model is not grounded, or to 2(P_obs − 0.5) if P_obs > 0.5 and the model is grounded. P_mis is zero at points on the non-grounded side of the observed grounding line where the model is not grounded, and at points on the grounded side where the model is grounded. Thus, if the model and observed grounding lines coincide exactly everywhere, then P_mis is zero at all points, regardless of the observational uncertainty reflected in P_obs (which seems a desirable feature). Approach (a): Misfit M5 is the mean of the squared mismatch probabilities (P_mis)^2, with means computed over three separate subdomains: the Ross Sea, Amundsen Sea, and Weddell Sea embayments (defined crudely by intervals of longitude: 150° E to 120° W, 120° W to 90° W, and 90° W to 0°, respectively). In this study we only use the mean for the Amundsen Sea sector. Similarly to TOTE and TOTI, the areal mean is increased by a factor (A_tot)^(1/2)/σ_w, where A_tot is the total subdomain area and σ_w = 100 km is a representative width scale of reasonable past grounding-zone mismatches. Finally, the mean values for each of the reconstructed past times (20, 15, 10 and 5 ka) are averaged together equally.
Approach (b): Misfit M5 is the mean of P_mis over the Amundsen Sea sector subdomain, with no adjustment factor involving A_tot, and otherwise as for (a) above.
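The P_obs / P_mis construction above can be sketched as follows. The sign convention (negative distances on the grounded side) and the handling of a point exactly on the reconstructed line are assumptions of this sketch.

```python
import math

def p_obs(d_signed, d2):
    """Probability of reconstructed floating ice or open ocean at a
    point: decays from 0.5 at the reconstructed grounding line toward
    0 on the grounded side (d_signed < 0, an assumed convention) and
    toward 1 on the non-grounded side, over the uncertainty scale D_2."""
    ds = abs(d_signed) / d2
    if d_signed < 0:                      # grounded side
        return 0.5 * math.exp(-ds)
    return 1.0 - 0.5 * math.exp(-ds)      # non-grounded side

def p_mis(p, model_grounded):
    """Mismatch probability at a model grid point: nonzero only when
    the model state contradicts the probable reconstructed state."""
    if p < 0.5 and not model_grounded:    # reconstruction says grounded
        return 2.0 * (0.5 - p)
    if p > 0.5 and model_grounded:        # reconstruction says not grounded
        return 2.0 * (p - 0.5)
    return 0.0
```

Far inside the reconstructed ice, an ungrounded model point scores a mismatch near 1; at the reconstructed line itself, P_obs = 0.5 and the mismatch vanishes either way.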

B6 RSL
Past relative sea-level (RSL) records. This uses the compilation by Briggs and Tarasov (2013) of published RSL data vs. time at sites close to the modern coastline. Following those authors, the model RSL at time t is computed as [SL(t) − h_b(t)] − [SL(0) − h_b(0)], where SL(t) is global sea level (with t = 0 at modern) and h_b is bed elevation, at the closest model grid point to the observed site. The minimum model-minus-observed difference δRSL for each observed datum is used, i.e., the minimum elevation difference over all model times within the range of the observational time uncertainty (t_obs ± σ_to).
Approach (a): Misfit M6 is the weighted mean of (δRSL/σ_zo)^2, where σ_zo is the observational RSL uncertainty. Just as in Briggs and Tarasov (2013), the default σ_zo is much larger for one-sided constraints (50 m) than for absolute constraints (2 m). To reduce the influence of many nearby (and presumably correlated) data, we closely follow Briggs and Tarasov (2013) and apply "intra-data-type weighting" in calculating the mean. The weights are inversely proportional to the number of measurements within a distance L of each other, where L is equivalent to 5° latitude (∼ 550 km), so that each ∼ L-sized cluster of data contributes ∼ equally to the overall mean.
Approach (b): Misfit M6 is the weighted mean of max[0, |δRSL| − σ_zo]. The uncertainties σ_zo and the intra-data-type weights are the same as in (a).
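A sketch of the intra-data-type weighting, assuming simple planar site coordinates (the paper works with geographic positions, so the distance metric here is an assumption):

```python
import numpy as np

def intra_weights(x_km, y_km, L_km=550.0):
    """Weights inversely proportional to the number of measurements
    within a distance L of each datum (itself included), so that each
    ~L-sized cluster of data contributes roughly equally to a
    weighted mean."""
    x, y = np.asarray(x_km, float), np.asarray(y_km, float)
    dist = np.hypot(x[:, None] - x[None, :], y[:, None] - y[None, :])
    counts = np.sum(dist <= L_km, axis=1)
    return 1.0 / counts
```

Two sites 100 km apart and one isolated site then receive weights 0.5, 0.5 and 1, so each cluster contributes one unit to the weighted mean.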

B7 ELEV/DSURF
This uses a combination of two compilations of cosmogenic data: elevation vs. age in Briggs and Tarasov (2013) for ELEV, and thickness change from modern vs. age in the RAISED Consortium (2014) (with individual citations as above) for DSURF.
For ELEV, the calculations closely follow Briggs and Tarasov (2013, their Sect. 4.2):
i. A time series of model ice-surface elevation is used, with sea-level and bedrock elevation changes subtracted out, for the closest model grid point to each ELEV datum.
ii. Only model elevations with a "deglaciating trend" are used; i.e., the model elevation for each time is replaced by the maximum elevation between that time and the present, if the latter is greater, allowing for an uncertainty of √2 σ_h, as in Briggs and Tarasov (2013).
iii. The mismatch for each datum is the minimum of (δh/σ_h)^2 + (δt/σ_t)^2 over the time series, where δh is the elevation difference from observed and δt is the time difference, σ_h = [σ_hobs^2 + (100 m)^2]^(1/2), and σ_hobs and σ_t are the observational uncertainties in elevation and time, respectively.
Approach (a): Misfit M7 is the weighted mean of the mismatches for ELEV above, with intra-data-type weighting exactly as described for RSL above. The DSURF type is not used in approach (a).
Approach (b): ELEV calculations as above are combined with DSURF calculations.
The DSURF calculations are simpler: for each datum, the time series of model surface elevations h_s at the closest model grid point is used. The minimum model-minus-observed difference δh_min is found, i.e., the minimum difference over all model times within the range of the observational time uncertainty (t_obs ± σ_to). The mismatch for the datum is max[0, δh_min − σ_h], where σ_h is the observational elevation uncertainty. The mean over all data is taken, weighted by intra-data-type weighting as described above. Finally, the ELEV and DSURF misfits are converted into separate normalized scores (S_7a, S_7b) as in Sect. 2.4.2, which are then combined into one individual score S_7 = (S_7a S_7b)^(1/2).

B8 UPL
This uses modern uplift rates on rock outcrops, from the compilation in Whitehouse et al. (2012b). For each observed site, the model's modern ∂h_b/∂t at the closest model grid point is used.
Approach (a): The mismatch at each datum is [(U_mod − U_obs)/σ_uobs]^2, where U_mod and U_obs are the model and observed uplift rates, respectively, and σ_uobs is the observed 1-σ uncertainty. The misfit M8 is the mean over all data points, using intra-data-type weighting as above.
Approach (b): The mismatch at each datum is (U_mod − U_obs)^2, and the misfit M8 is the root mean square over all data points, with no intra-data-type weighting (justified by the relatively uniform distribution of data points).

Appendix D: Span of data by the large ensemble
This appendix compares envelopes of model results with the corresponding types of geologic data used in the LE scoring. The main goal is to demonstrate that the envelopes of the 625-member ensemble adequately span the data; i.e., at least some runs yield results that fall on both sides of each type of data, so that ensemble averages may potentially represent reasonably realistic ice-sheet behavior (even if no single model run is close to all data types).
For modern data (grounded and floating ice extents, grounded ice thicknesses), the standard model has previously been shown to yield quite realistic simulations, both for perpetual modern climate and at the end of long-term glacial-interglacial runs (Pollard and DeConto, 2012a). Modern grounded ice thicknesses are close to observed mainly because of the inverse procedure in specifying the distribution of basal sliding coefficients (Pollard and DeConto, 2012b).
Here we concentrate on fits to geologic data.
Figure D1 compares scatterplots of relative sea level in all 625 runs with RSL records, for the three sites within the model's West Antarctic domain (Briggs and Tarasov, 2013). The data for each site fall well within the overall model envelope, and in most cases within the envelopes of the top 120 scoring runs (colored curves). Similar comparisons for single runs are shown in Gomez et al. (2013), both using the simple bedrock model as here (their "uncoupled" runs) and coupled to a global Earth-sea-level model.
Similarly, Fig. D2 compares elevation vs. age time series for all 625 runs with cosmogenic data at the 18 sites within the model domain (Briggs and Tarasov, 2013). With a few exceptions, the data lie within the LE model envelopes, although elevations at many of the sites are lower than in most of the model runs. At Reedy Glacier, the model exhibits oscillations of ∼ 200 m amplitude and periods of several hundred years; these might be due to internal variability of ice streams, as seen elsewhere in West Antarctica in Pollard and DeConto (2009).
Figure D3 shows modern uplift rates for all model runs, at the 26 sites in the Whitehouse et al. (2012b) compilation that lie within the model domain. Again, nearly all of the observed values lie within the overall model envelope. The geographic distribution for single runs is compared with that observed in Gomez et al. (2013), both using a simple bedrock model ("uncoupled") and coupled to a global Earth-sea-level model.
The remaining past data types (GL2D and TROUGH) concern grounding-line locations during the last deglacial retreat, and are less amenable to scatterplots, but can be compared with model-averaged results. Figure D4 shows maps of the probability (0-1) of the presence of grounded ice at particular times, deduced by score-weighted averages over the ensemble. The thick black lines at 20, 15, 10 and 5 ka show grounding-line positions in the reconstructions of the RAISED Consortium (2014). (The figures do not show the uncertainty information associated with the data, which is used in the scoring; Appendix B.) At all of these times, the envelopes of the model "grounding zone", i.e., the areas with intermediate probability values, span or are close to the observed positions.
Similarly, Fig. D5 shows model probabilities (0-1) of grounded ice vs. time along the centerline transects of the major West Antarctic embayments.Again, the model envelopes mostly span the various observed estimates for each transect (from RAISED Consortium, 2014, and various earlier studies).
Taken together, the various model vs. data comparisons in this appendix show that the model's ensemble envelopes do encompass the ranges of data satisfactorily, as necessary for meaningful interpretations of the statistical results.

Figure 1 .
Figure 1. Geographical map of West Antarctica. Light yellow shows the modern extent of grounded ice (using Bedmap2 data; Fretwell et al., 2013). Blue and purple areas show expanded grounded-ice extents at 5, 10, 15 and 20 ka (thousands of years before present) reconstructed by the RAISED Consortium (2014), plotted using their vertex information (S. Jamieson, personal communication, 2015), and choosing their Scenario A for the Weddell embayment (Hillenbrand et al., 2014). These maps are used in the large ensemble scoring (TOTE, TROUGH and GL2D data types, Sect. 2.3).

Figure 2 .
Figure 2. Aggregate scores for the complete large ensemble suite of runs (625 runs, 4 model parameters, 5 values each, Sect. 2.2), used in the simple method with score-weighted averaging. The score values range from 0 (white, no skill) to 100 (dark red, perfect fit). The figure is organized to show the scores in the 4-D space of parameter variations. The four parameters are CSHELF = basal sliding coefficient in modern oceanic areas (exponent x, 10^−x m a^−1 Pa^−2), TAUAST = e-folding time of bedrock-elevation isostatic relaxation (kyr), OCFAC = oceanic-melt-rate coefficient at the base of floating ice shelves (non-dimensional), and CALV = calving-rate factor at the edge of floating ice shelves (non-dimensional). Since each parameter takes only five values, the results are blocky, but they effectively show the behavior of the score over the full range of plausible parameter values.

Figure 3 .
Figure 3. (Left) Ensemble-mean scores for individual parameter values, using the simple averaging method. The red triangle shows the mean, and whiskers show the 1-sigma standard deviations. (Right) Probability densities for individual parameters, using the advanced statistical techniques in Chang et al. (2016), extended as described in Sect. 2.5.

Figure 4 .
Figure 4. (Left) Ensemble-mean scores for pairs of parameters, using the simple averaging method. (Right) Probability densities for pairs of parameters, using the advanced statistical techniques in Chang et al. (2016), extended as described in Sect. 2.5.

Figure 5 .
Figure 5. Equivalent global-mean sea-level contribution (ESL) relative to modern vs. time. Time runs from 20 000 yr before present to modern. ESL changes are calculated from the total ice amount in the domain divided by global ocean area, allowing for the lesser contribution from ice grounded below sea level. (a) Scatterplot of all 625 individual runs in the LE. ESL amounts are calculated relative to modern observed Antarctica, so non-zero values at time = 0 imply departures from the observed ice state. Grey curves are for runs with the aggregate score S equal to or very close to 0, and colored curves are for the 120 top-scoring runs in descending S order with 20 curves per color (red, orange, yellow, green, cyan, blue). The best-scoring individual run is shown by a thick black curve (OCFAC = 3, CALV = 1, CSHELF = −5, TAUAST = 3, with S = 0.571). (b) As (a) but with ESL amounts relative to each run's modern value, so the curves pass exactly through zero at time = 0. (c) Score-weighted curves over the whole LE, using the simple statistical method. The red curve is the score-weighted mean, i.e., Σ_n {S(n) ESL(n)(t)} / Σ_n {S(n)}, where S(n) is the aggregate score for run n, ESL(n)(t) is the equivalent sea-level rise for run n at time t, and the sums are over all n (1 to 625) in the LE. Black curves show the one-sided standard deviations, i.e., the root mean square of deviations for ESL(n) above the mean (upper curve) or below the mean (lower curve) at each time t. ESL(n)(t) are relative to modern observed Antarctica, as in (a). (d) As (c) but with ESL(n)(t) relative to each run's modern value, as in (b). (e, f) Corresponding results to (c) and (d), respectively, using the advanced statistical techniques in Chang et al. (2016), extended as described in Sect. 2.5.

Figure 6 .
Figure 6. (a) Probability densities of equivalent sea-level (ESL) rise at particular times in the LE simulations, computed with the simple averaging method. At a given time t, the density P(E) is the sum of the aggregate scores S(n) for all runs n with equivalent sea-level rise ESL(n)(t) within the bin E − 0.1 to E + 0.1 m, i.e., using equispaced bins 0.2 m wide. The resulting P(E) are normalized so that the integral with respect to E is 1. ESL(n)(t) are relative to modern observed Antarctica, as in Fig. 5a. (b) As (a) but with ESL(n)(t) relative to each run's modern value, as in Fig. 5b. (c, d) Corresponding results to (a) and (b), respectively, using the advanced statistical techniques in Chang et al. (2016), extended as described in Sect. 2.5.

Figure C4 .
Figure C4. Equivalent global-mean sea-level contribution (ESL) relative to modern vs. time, as in Fig. 5. (a) Scatterplot of all 625 individual runs in the LE, using the close-to-Gaussian scoring approach (a) (Sect. 2.4, Appendix B). (b) As (a) except using the more heuristic approach (b) (Sect. 2.4, Appendix B), exactly as in Fig. 5. (c) Score-weighted mean and one-sided standard deviations, using the close-to-Gaussian scoring approach (a). (d) As (c) except using the more heuristic approach (b), exactly as in Fig. 5.

Figure D1 .
Figure D1. Model vs. observed relative sea-level (RSL) data, for the three RSL sites (Briggs and Tarasov, 2013) that lie within and away from the edges of the model's West Antarctic domain. The observations and uncertainty ranges are shown as black dots and whiskers. Model curves are shown for all 625 runs, with aggregate scores S indicated by colors as in Fig. 5. The run with the best individual score for each site is shown as a thick black line, and the run with the best aggregate score S is shown as a thick blue line. (a) Southern Scott Coast, ∼ 77.3° S, 163.6° E. (b) Terra Nova Bay, ∼ 74.9° S, 163.8° E. (c) Marguerite Bay, ∼ 67.7° S, 67.3° W.

Figure D4 .
Figure D4. Score-weighted probability (0 to 1) of grounded ice vs. floating ice or open ocean at each grid point (see text), for various times over the last 20 000 yr, concentrating on the period of rapid retreat between 15 and 10 ka. The LE and model version is essentially the same as above, except with all-Antarctic coverage to include East Antarctic variations. The quantity shown is the sum of the scores S(n) for runs n with grounded ice at each grid point and time, divided by the sum of the scores for all runs in the ensemble. Thick black lines in the panels for 20, 15, 10 and 5 ka show grounding lines reconstructed for West Antarctica by the RAISED Consortium (2014), plotted using their vertex information (S. Jamieson, personal communication, 2015), and choosing their Scenario A for the Weddell embayment (Hillenbrand et al., 2014). For 20 and 15 ka around East Antarctica, the black line is from the 20 ka RAISED time slice, which for the EAIS is based on Livingstone et al. (2012) and Mackintosh et al. (2014). Similarly, the modern grounding line (Fretwell et al., 2013) is shown by a thick black line for 0 ka, which is also used around East Antarctica for 10 and 5 ka.

www.geosci-model-dev.net/9/1697/2016/ Geosci. Model Dev., 9, 1697-1723, 2016 D. Pollard et al.: Large ensemble modeling of the last deglacial retreat of the West Antarctic Ice Sheet
by running the model at many parameter settings and then "fitting" a Gaussian process model to the input-output combinations, analogous to fitting a regression model that relates independent variables (parameters) to dependent variables (model output) in order to make predictions of the dependent variable at new values of the independent variables. Of course, unlike basic regression, the model output may itself be multivariate. An emulator is useful because (i) it provides a computationally inexpensive method for approximating the output of a computer model at any parameter setting without having to actually run the model each time, and (ii) it provides a statistical model relating parameter values to computer model output; this means the approximations automatically include uncertainties, with larger uncertainties at parameter settings that are far from parameter values where the computer model has already been run. Specifically, the model outputs, consisting of (i) modern grounding-line maps and (ii) past locations of grounding lines vs. time along the centerline trough of Pine Island Glacier, are first reduced in dimensionality by computing principal components (PCs) over all LE runs. (Principal components are often referred to in the atmospheric science literature as empirical orthogonal functions or EOFs.) The first 10 PCs are used for modern maps, and the first 20 for past trough locations. We then develop a Gaussian process emulator for each of the above PCs; Gaussian process emulators work especially well for model outputs that are scalars. The emulators are "fitted" to the PCs using a maximum likelihood estimation-based approach developed in Chang et al. (2015)
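A compact numpy-only sketch of this emulation workflow (dimension reduction by PCs, then one Gaussian process per PC score) is given below. The toy parameter settings, the toy output field, the kernel choice and its fixed length scale are all illustrative assumptions; the paper's emulators estimate kernel parameters by maximum likelihood and keep 10 or 20 PCs.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-ins for the LE: 625 parameter settings (scaled to [0, 1])
# and a smooth multivariate output (e.g. a flattened grounding-line
# field); both are illustrative, not the paper's actual data.
theta = rng.random((625, 4))
basis = np.column_stack([np.sin(2 * np.pi * theta[:, 0]),
                         theta[:, 1] * theta[:, 2],
                         np.cos(np.pi * theta[:, 3])])
field = basis @ rng.random((3, 50))

# Dimension reduction: principal components over all LE runs.
U, s, _ = np.linalg.svd(field - field.mean(axis=0), full_matrices=False)
n_pc = 3                       # the paper keeps 10 (maps) or 20 (troughs)
pcs = U[:, :n_pc] * s[:n_pc]   # one PC-score vector per run

# A minimal GP emulator for one scalar PC score, with a fixed
# squared-exponential kernel (length scale chosen by hand here).
def sq_exp(a, b, ell=0.5):
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ell ** 2)

train = slice(0, 200)          # "fit" to a subset of the runs
K = sq_exp(theta[train], theta[train]) + 1e-6 * np.eye(200)
alpha = np.linalg.solve(K, pcs[train, 0])

def emulate_pc0(theta_new):
    """Predict the first PC score at new parameter settings without
    re-running the ice-sheet model."""
    return sq_exp(theta_new, theta[train]) @ alpha
```

Reconstructing an approximate full output at a new parameter setting then amounts to emulating each retained PC score and summing the PC patterns, which is what makes MCMC over the parameter space affordable.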