A 3-D hybrid ice-sheet model is applied to the last deglacial retreat of the
West Antarctic Ice Sheet over the last

Modeling studies of future variability of the Antarctic Ice Sheet have focused to date on the Amundsen Sea Embayment (ASE) sector of West Antarctica, including the Pine Island and Thwaites Glacier basins. These basins are currently undergoing rapid thinning and acceleration, producing the largest Antarctic contribution to sea-level rise (Shepherd et al., 2012; Rignot et al., 2014). The main cause is thought to be increasing oceanic melt below their floating ice shelves, which reduces back pressure on the grounded inland ice (buttressing; Pritchard et al., 2012; Dutrieux et al., 2014). There is a danger of much more drastic grounding-line retreat and sea-level rise in the future, because bed elevations in the Pine Island and Thwaites Glacier basin interiors deepen to depths of a kilometer or more below sea level, potentially allowing marine ice-sheet instability (MISI) due to the strong dependence of ice flux on grounding-line depth (Weertman, 1974; Mercer, 1978; Schoof, 2007; Vaughan, 2008; Rignot et al., 2014; Joughin et al., 2014).

Recent studies have mostly used high-resolution models and/or relatively detailed treatments of ice dynamics (higher-order or full Stokes dynamical equations; Morlighem et al., 2010; Gladstone et al., 2012; Cornford et al., 2013; Parizek et al., 2013; Docquier et al., 2014; Favier et al., 2014; Joughin et al., 2014). Because of this dynamical and topographic detail, models with two horizontal dimensions have been confined spatially to limited regions of the ASE and temporally to durations on the order of centuries to a millennium. On the one hand, these types of models are desirable because highly resolved bed topography and accurate ice dynamics near the modern grounding line could well be important on timescales of the next few decades to a century (references above, and Durand et al., 2011; Favier et al., 2012). On the other hand, the computational run-time demands of these models limit their applicability to small domains and short timescales, and they can only be calibrated against the modern observed state and decadal trends at most.

Here we take an alternate approach, using a relatively coarse-grid ice-sheet
model with hybrid dynamics. This allows run durations of several tens of thousands of years,
so that model parameters can be calibrated against geologic data of major
retreat across the continental shelf since the Last Glacial Maximum (LGM)
over the last

A substantial body of geologic data is available for the last deglacial retreat in the ASE and other Antarctic sectors. Notably this includes recent reconstructions of grounding-line locations over the last 25 kyr by the RAISED Consortium (2014). Other types of data at specific sites include relative sea-level records, cosmogenic elevation–age data, and modern uplift rates (compiled in the RAISED Consortium, 2014; Briggs and Tarasov, 2013; Briggs et al., 2013, 2014; Whitehouse et al., 2012a, b). Following several recent Antarctic modeling studies (Briggs et al., 2013, 2014, and Whitehouse et al., 2012a, b, as above; Golledge et al., 2014; Maris et al., 2015), we utilize these data sets in conjunction with large ensembles (LE), i.e., sets of hundreds of simulations over the last deglacial period with systematic variations of selected model parameters. LE studies have also been performed for past variations of the Greenland Ice Sheet, for instance by Applegate et al. (2012) and Stone et al. (2013).

This paper follows on from Chang et al. (2015, 2016), who apply relatively
advanced Bayesian statistical techniques to LEs generated by our ice-sheet
model. The statistical steps are described in detail in Chang et al. (2015,
2016), and include the following.

Statistical emulators, used to interpolate results in parameter space, constructed using a new emulation technique based on principal components.

Probability models, replacing raw square-error model–data misfits with formal likelihood functions, using a new approach for binary spatial data such as grounding-line maps.

Markov chain Monte Carlo (MCMC) methods, used to produce posterior distributions that are continuous probability density functions of parameter estimates and projected results based on formally combining the information from the above two steps in a Bayesian inferential framework.

By contrast, the simple averaging method evaluated here consists of computing a single objective score for each LE member that measures the misfit between the model simulation and geologic and modern data, and

calculating parameter ranges and envelopes of model results by straightforward averaging over all LE members, weighted by the scores.

However, the advanced techniques in Chang et al. (2015, 2016) require statistical expertise not readily available to most ice-sheet modeling groups. It may be that the simple averaging method still gives reasonable results, especially for LEs with full-factorial sampling, i.e., with every possible combination of selected parameter values (also referred to as a grid or Cartesian product; Urban and Fricker, 2010). The purpose of this paper is to apply both the advanced statistical and simple averaging methods to the same Antarctic LE, compare the results, and thus assess whether the simple (and commonly used) method is a viable alternative to the more advanced techniques, at least for full-factorial LEs. The results include probabilistic ranges of model parameter values, and envelopes of model results such as equivalent sea-level rise. Further types of results related to specific glaciological problems (LGM ice volume, Meltwater Pulse 1A, future retreat) will be presented in Pollard et al. (2016) using the simple-averaging method, and do not modify or extend the comparisons of the two methods in this paper.
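The core of the simple averaging method can be sketched in a few lines: each model quantity (a parameter value, or an output such as equivalent sea level) is averaged over all LE members, weighted by each member's aggregate score. The sketch below is illustrative; the OCFAC values are from Sect. 2.2, while the scores are hypothetical.

```python
import numpy as np

def score_weighted_stats(values, scores):
    """Score-weighted mean and standard deviation of a quantity across
    large-ensemble members (simple averaging method sketch)."""
    values = np.asarray(values, dtype=float)
    w = np.asarray(scores, dtype=float)
    w = w / w.sum()                       # normalize scores to weights
    mean = np.sum(w * values)
    var = np.sum(w * (values - mean) ** 2)
    return mean, np.sqrt(var)

# Hypothetical aggregate scores for five runs differing only in OCFAC
ocfac = [0.1, 0.3, 1.0, 3.0, 10.0]
scores = [0.0, 10.0, 60.0, 25.0, 5.0]
m, s = score_weighted_stats(ocfac, scores)   # best-fit estimate and 1-sigma range
```

A run with score 0 (no skill) contributes nothing to the weighted mean, so clearly unrealistic ensemble members are automatically excluded.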

Sections 2.1 and 2.2 describe the model, the setup for the last deglacial simulations, and the model parameters chosen for the full-factorial LE. Sections 2.3 and 2.4 describe the objective scoring vs. past and modern data used in the simple averaging method, and Sect. 2.5 provides an overview of the advanced statistical techniques. Results are shown for best-fit model parameter ranges and equivalent sea-level envelopes in Sects. 3 and 4, comparing simple and advanced techniques. Conclusions and steps for further work are described in Sect. 5.

The 3-D ice-sheet model has previously been applied to past Antarctic
variations in Pollard and DeConto (2009), DeConto et al. (2012) and Pollard
et al. (2015). The model predicts ice thickness and temperature
distributions, evolving due to slow deformation under its own weight, and to
mass addition and removal (precipitation, basal melt and runoff, oceanic
melt, and calving of floating ice). Floating ice shelves and grounding-line
migration are included. It uses hybrid ice dynamics and an internal condition
on ice velocity at the grounding line (Schoof, 2007). The simplified dynamics
(compared to full Stokes or higher-order) captures grounding-line migration
reasonably well (Pattyn et al., 2013), while still allowing

The model is applied to a limited-area nested domain spanning all of West
Antarctica, with a 20 km grid resolution. Lateral boundary conditions on ice
thicknesses and velocities are provided by a previous continental-scale run.
The model is run over the last 30 000 yr, initialized appropriately at
30 ka (30 000 yr before present, relative to 1950 AD) from a previous
longer-term run. Atmospheric forcing is computed using a modern
climatological Antarctic data set (ALBMAP: Le Brocq et al., 2010), with
uniform cooling perturbations proportional to a deep-sea core

The large ensemble analyzed in this study uses full-factorial sampling, i.e.,
a run for every possible combination of parameter values, with four
parameters varied and with each parameter taking five values, requiring 625
(= 5^4) runs.
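Generating a full-factorial ensemble is simply a Cartesian product over the candidate values of each parameter. In the sketch below, the OCFAC and CALV values are those listed in Sect. 2.2; the CSHELF and TAUAST entries are placeholder indices, since their physical values are not reproduced here.

```python
from itertools import product

# Five candidate values per parameter (OCFAC and CALV from Sect. 2.2;
# CSHELF and TAUAST entries are placeholder indices, not physical values)
param_values = {
    "OCFAC":  [0.1, 0.3, 1.0, 3.0, 10.0],
    "CALV":   [0.3, 0.7, 1.0, 1.3, 1.7],
    "CSHELF": [1, 2, 3, 4, 5],
    "TAUAST": [1, 2, 3, 4, 5],
}

# Cartesian product of all value combinations: 5**4 = 625 ensemble members
ensemble = [dict(zip(param_values, combo))
            for combo in product(*param_values.values())]
assert len(ensemble) == 625
```

Each element of `ensemble` is the parameter setting for one LE run, so the model can be launched once per dictionary.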

The four parameters and their five values are the following.

OCFAC: sub-ice oceanic melt coefficient.
Values are 0.1, 0.3, 1, 3, and 10 (non-dimensional). Corresponds to

CALV: factor in calving of icebergs at the oceanic edge of floating ice
shelves. Values are 0.3, 0.7, 1, 1.3, and 1.7 (non-dimensional). Multiplies
the combined crevasse-depth-to-ice-thickness ratio

CSHELF: basal sliding coefficient for ice grounded on modern-ocean beds.
Values are 10

TAUAST:

Following Whitehouse et al. (2012a, b), Briggs and Tarasov (2013) and Briggs
et al. (2013, 2014), we test the model against three types of data for the
modern observed state, and five types of geologic data relevant to ice-sheet
variations of the last

One approach to calculating misfits and scores is to borrow from Gaussian
error distribution concepts, i.e., individual misfits

The eight individual data types and model–data misfits are listed below,
with basic information that applies to both of the above approaches. More
details are given in Appendix B, including formulae for the two approaches,
and “intra-data-type weighting” that is important for closely spaced sites
(Briggs and Tarasov, 2013). The two approaches of combining the individual
scores into one aggregate score

The eight individual data types are the following.

TOTE: modern grounding-line locations.
Misfit

TOTI: modern floating ice-shelf locations.
Misfit

TOTDH: modern grounded ice thicknesses.
Misfit

TROUGH: past grounding-line distance vs. time along the centerline
trough of Pine Island Glacier. Centerline data for the Ross and Weddell
basins can also be used, but not in this study. Misfit

GL2D: past grounding-line locations (see Fig. 1). Only the Amundsen Sea
region is used in this study. Misfit

RSL: past relative sea-level (RSL) records.
Misfit

ELEV/DSURF: past cosmogenic elevation vs. age (ELEV) and thickness vs.
age (DSURF). Misfits

UPL: modern uplift rates on rock outcrops.
Misfit

Geographical map of West Antarctica. Light yellow shows the modern extent of grounded ice (using Bedmap2 data; Fretwell et al., 2013). Blue and purple areas show expanded grounded-ice extents at 5, 10, 15 and 20 ka (thousands of years before present) reconstructed by the RAISED Consortium (2014), plotted using their vertex information (S. Jamieson, personal communication, 2015), and choosing their Scenario A for the Weddell embayment (Hillenbrand et al., 2014). These maps are used in the large ensemble scoring (TOTE, TROUGH and GL2D data types, Sect. 2.3).

Each of the misfits above is first transformed into a normalized individual
score for each data type

For a given data type

The individual score

The aggregate score for each run is

For a given data type

The normalized misfit

The individual score

The aggregate score for each run is
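The misfit-to-score pipeline above can be sketched as follows. The Gaussian-style transform and the product aggregate are illustrative assumptions; the paper's exact formulae (given in Appendix B for the two approaches) are not reproduced here.

```python
import numpy as np

def individual_score(misfit):
    """Gaussian-style transform of a normalized misfit into a score in (0, 1]:
    a perfect fit (misfit 0) scores 1, and large misfits decay toward 0.
    The paper's exact transforms (Appendix B) are not reproduced here."""
    return float(np.exp(-0.5 * misfit ** 2))

def aggregate_score(misfits):
    """Illustrative aggregate: product of the individual scores, rescaled
    to the 0-100 range used in the figures."""
    return 100.0 * float(np.prod([individual_score(m) for m in misfits]))

# Hypothetical normalized misfits for the eight data types of one LE run
misfits = [0.2, 0.5, 0.1, 1.0, 0.8, 0.3, 0.6, 0.4]
agg = aggregate_score(misfits)
```

With a product aggregate, a single badly failing data type drives the run's aggregate score toward zero, which matches the behavior seen in Fig. 2 for runs with excessive calving.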

The more advanced statistical techniques (Chang et al., 2015, 2016) consist
of an emulation and a calibration stage, involving the same four model
parameters and the 625-member LE as above. The aggregate scores

Emulation is the statistical approach by which a computer model is approximated by a statistical model. This statistical approximation is obtained by running the model at many parameter settings and then "fitting" a Gaussian process model to the input–output combinations, analogous to fitting a regression model that relates independent variables (parameters) to dependent variables (model output) in order to make predictions of the dependent variable at new values of the independent variables. Of course, unlike basic regression, the model output may itself be multivariate.

An emulator is useful because (i) it provides a computationally inexpensive method for approximating the output of a computer model at any parameter setting without having to actually run the model each time, and (ii) it provides a statistical model relating parameter values to computer model output. This means the approximations automatically include uncertainties, with larger uncertainties at parameter settings that are far from parameter values where the computer model has already been run.

Specifically, the model outputs, consisting of (i) modern grounding-line maps and (ii) past locations of grounding lines vs. time along the centerline trough of Pine Island Glacier, are first reduced in dimensionality by computing principal components (PCs) over all LE runs. (Principal components are often referred to in the atmospheric science literature as empirical orthogonal functions, or EOFs.) The first 10 PCs are used for modern maps, and the first 20 for past trough locations. We then develop a Gaussian process emulator for each of the above PCs; Gaussian process emulators work especially well for model outputs that are scalars. The emulators are fitted to the PCs using a maximum-likelihood-based approach developed in Chang et al. (2015) that addresses the complications arising because the data are non-Gaussian. Details are available in Chang et al. (2015, 2016).
The emulators provide a statistical model that essentially replaces the data types TOTE, TROUGH and GL2D described in Sect. 2.3.
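The two-step structure (dimension reduction, then one Gaussian-process emulator per principal component) can be sketched with a toy ensemble. This is a minimal illustration using scikit-learn with fixed kernel hyperparameters for brevity; the paper's emulators additionally handle the non-Gaussian, binary nature of grounding-line data, which is not reproduced here, and the field dimensions are arbitrary stand-ins.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)

# Toy stand-in for the LE: 625 parameter settings (4 parameters) and a
# 50-dimensional output field per run (e.g. a flattened map)
X = rng.uniform(0.0, 1.0, size=(625, 4))
field = np.sin(6.0 * X[:, :1]) + 0.1 * rng.standard_normal((625, 50))

# Step 1: reduce output dimensionality with principal components
pca = PCA(n_components=10)
pc_scores = pca.fit_transform(field)                    # shape (625, 10)

# Step 2: one Gaussian-process emulator per principal component
# (fixed kernel hyperparameters, optimizer=None, for brevity)
emulators = []
for j in range(pc_scores.shape[1]):
    gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), optimizer=None)
    gp.fit(X, pc_scores[:, j])
    emulators.append(gp)

# Emulate the full output field at a new, unrun parameter setting
x_new = np.array([[0.5, 0.5, 0.5, 0.5]])
pc_pred = np.array([gp.predict(x_new)[0] for gp in emulators])
field_pred = pca.inverse_transform(pc_pred[None, :])    # back to output space
```

The inverse PCA transform maps the emulated PC scores back to the original output space, so a full (approximate) model output is available at any parameter setting without running the ice-sheet model.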

In an extension to Chang et al. (2016), Gaussian process emulators are also
used here to estimate distributions of individual score values for the five
data types TOTI, TOTDH, RSL, ELEV/DSURF and UPL (

The calibration stage solves the following problem in a statistically
rigorous fashion: given observations and model runs at various parameter
settings, which parameters of the model are most likely? In a Bayesian
inferential framework, this translates to learning about the posterior
probability distribution of the parameter values given all the available
computer model runs and observations. The approach may be sketched out as
follows. The emulation phase provides a statistical model connecting the
parameters to the model output. Suppose it is assumed that the model at a
particular (ideal) set of parameter values produces output that resembles the
observations of the process. We also allow for measurement error and
systematic discrepancies between the computer model and the real physical
system. We do this via a “discrepancy function” that simultaneously
accounts for both; this is reasonable because both sources of error are
important while also being difficult to tease apart. Hence, one can think of
our approach as assuming that the observations are modeled as the model
output at an ideal parameter setting, added to a discrepancy function. Once
we are able to specify a model in this fashion, Bayesian inference provides a
very standard approach to obtaining the resulting posterior distribution of
the parameters: we start with a prior distribution for the parameters, where
we assume that all the values are equally likely before any observations are
obtained, and then use Bayes' theorem to find the posterior distribution
given the data. The posterior distribution cannot be found in analytical
form. Hence, in this second “calibration” stage, posterior densities of the
model parameters are inferred via Markov chain Monte Carlo (MCMC). The
observation and model quantities used in emulation and calibration consist of
the modern and past grounding-line locations, and five individual scores. The
discrepancy function is accounted for in assessing model vs. observed
grounding-line fits in our Bayesian approach. It is based in part on the
locations and times in which grounded ice occurs in the model and not in the
observations, or vice versa, in 50 % or more of the LE runs (Chang et
al., 2015, 2016). For the individual scores, we use exponential marginal
densities, whose rate parameters receive gamma priors scaled in such a way
that the 90th percentile of the prior density coincides with each score's
cutoff value

In the above procedures, observational error enters for the individual scores RSL, ELEV/DSURF and UPL via the calculations described in Appendix B. It is implicitly taken into account by the discrepancy function for grounding-line locations. Observational error is considered to be negligible for modern TOTI and TOTDH scores.

Figure 2 shows the aggregate scores

Aggregate scores for the complete large ensemble suite of runs
(625 runs, 4 model parameters, 5 values each, Sect. 2.2), used in the simple
method with score-weighted averaging. The score values range from 0 (white,
no skill) to 100 (dark red, perfect fit). The figure is organized to show the
scores in the 4-D space of parameter variations. The four parameters are
CSHELF

All scores with the largest CALV value of 1.7 (right-hand column of subpanels) are 0. In these runs, excessive calving leaves very little floating ice and produces far too much grounding-line retreat. Conversely, with the smallest CALV value of 0.3 (left-hand column of subpanels), most runs have too much floating ice and overly advanced grounding lines during the runs, so most of this column also has zero scores. However, small CALV can be partially compensated for by large OCFAC (strong ocean melting), so there are some non-zero scores in the upper-left subpanels.

For mid-range CALV and OCFAC (subpanels near the center of the figure), the
best scores require high CSHELF (inner

Somewhat lower but still reasonable scores exist for lower CSHELF values of
10

Scores are quite insensitive to the TAUAST asthenospheric rebound timescale
(inner

The main results seen in Fig. 2 are borne out in Fig. 3. The left-hand panels
show results using the simple averaging method, i.e., the average score for
all runs in the LE with a particular parameter value. Triangles in these
panels show the mean parameter value

(Left) Ensemble-mean scores for individual parameter values, using the simple averaging method. The red triangle shows the mean, and whiskers show the 1-sigma standard deviations. (Right) Probability densities for individual parameters, using the advanced statistical techniques in Chang et al. (2016) extended as described in Sect. 2.5.

(Left) Ensemble-mean scores for pairs of parameters, using the simple averaging method. (Right) Probability densities for pairs of parameters, using the advanced statistical techniques in Chang et al. (2016) extended as described in Sect. 2.5.

The right-hand panels of Fig. 3 show the same single-parameter "marginal" probability density functions for this LE, using the advanced statistical techniques described in Chang et al. (2015, 2016) and summarized above. For OCFAC, CSHELF and TAUAST, there is substantial agreement with the simple-averaging results in both the peak "best-fit" values and the width of the ranges. For CALV, the peak values agree quite well, but the simple-averaging distribution has a significant tail for lower CALV values that is not present in the advanced results; this might be due to the discrepancy function in the advanced method (Sect. 2.5), which has no counterpart in the simple averaging method.

Equivalent global-mean sea-level contribution (ESL) relative to
modern vs. time. Time runs from 20 000 yr before present to modern. ESL
changes are calculated from the total ice amount in the domain divided by
global ocean area, allowing for less contribution from ice grounded below sea
level.

Probability densities for pairs of parameter values are useful in evaluating the quality of LE analysis, and can display offsetting physical processes that together maintain realistic results, e.g., greater OCFAC and lesser CALV (Chang et al., 2014, 2015, 2016). In Fig. 4, the left-hand panels show mean scores for pairs of the four parameters, using the simple averaging method and averaged over all LE runs with a particular pair of values. The right-hand panels show corresponding densities for the same parameter pairs using the advanced statistical techniques. Overall the same encouraging agreement is seen as for the single-parameter densities in Fig. 3, with the locations of the main maxima being roughly the same for each parameter pair. There are some differences in the extents of the maxima, notably for CALV, where the zone of high scores with the simple averaging method extends to lower CALV values than with the advanced techniques, as seen for the individual parameters in Fig. 3. In general, though, there is good agreement between the two methods regarding parameter ranges in Figs. 3 and 4, suggesting that the simple averaging method is viable, at least for LEs with full-factorial sampling of parameter space.
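For a full-factorial LE, the single-parameter and parameter-pair mean scores of Figs. 3 and 4 amount to averaging the score array over the remaining parameter axes. A minimal sketch, using random placeholder scores on the 5x5x5x5 grid:

```python
import numpy as np

# Placeholder scores on the full-factorial grid,
# axes ordered (OCFAC, CALV, CSHELF, TAUAST)
rng = np.random.default_rng(2)
scores = rng.uniform(0.0, 100.0, size=(5, 5, 5, 5))

# Pair plot for (OCFAC, CALV): average over the other two parameter axes,
# giving one mean score per (OCFAC, CALV) value pair
pair_mean = scores.mean(axis=(2, 3))        # shape (5, 5)

# Single-parameter curve for OCFAC: average over all other axes
ocfac_mean = scores.mean(axis=(1, 2, 3))    # shape (5,)
```

The same reductions, applied to the real 625 aggregate scores, produce the left-hand panels of Figs. 3 and 4.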

Figure 5 illustrates the use of the LE to produce past envelopes of model
simulations. Figure 5a, b show equivalent sea-level (ESL) scatterplots for
all 625 runs. Early in the runs around LGM (20 to 15 ka), the curves cluster
into noticeable groups with the same CSHELF values, due to the relatively
weak effects of the other parameters (OCFAC, CALV and TAUAST) for cold
climates and ice sheets in near equilibrium. Figure 5c, d show the mean and
one-sided standard deviations for the simple method. Most of the retreat and
sea-level rise occurs between
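The score-weighted envelope of Fig. 5c, d can be sketched as follows. The particular one-sided definition (spread computed separately above and below the weighted mean) is an illustrative assumption, and the toy ESL curves and scores are hypothetical.

```python
import numpy as np

def weighted_envelope(esl, scores):
    """Score-weighted ensemble mean and one-sided standard deviations of
    equivalent sea level vs. time. esl: (n_runs, n_times); scores: (n_runs,).
    The one-sided definition here is an illustrative choice."""
    w = np.asarray(scores, float)
    w = w / w.sum()
    mean = w @ esl                        # weighted mean curve, shape (n_times,)
    dev = esl - mean                      # deviation of each run from the mean
    above = dev > 0.0
    w2 = w[:, None]
    sd_up = np.sqrt((w2 * np.where(above, dev, 0.0) ** 2).sum(0)
                    / np.maximum((w2 * above).sum(0), 1e-12))
    sd_dn = np.sqrt((w2 * np.where(~above, dev, 0.0) ** 2).sum(0)
                    / np.maximum((w2 * ~above).sum(0), 1e-12))
    return mean, sd_up, sd_dn

# Toy ESL curves (m) for 4 runs at 3 time slices, with hypothetical scores
esl = np.array([[10., 5., 0.], [12., 6., 1.], [8., 4., 0.], [11., 5., 0.5]])
mean, sd_up, sd_dn = weighted_envelope(esl, np.array([20., 40., 10., 30.]))
```

Plotting `mean` with `mean + sd_up` and `mean - sd_dn` as the upper and lower envelope bounds gives curves of the type shown in Fig. 5c, d.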

Figure 5e, f show the equivalent mean and standard deviations derived from
the advanced statistical techniques. There is substantial agreement with the
simple-method curves in Fig. 5c, d for most of the duration of the runs. The
largest difference is around the Last Glacial Maximum

Figure 6 shows probability densities of equivalent sea-level rise at
particular times in the runs. Figure 6a–d show results with the simple
averaging method, computed using score-weighted densities and 0.2 m wide ESL
bins (see caption). The uneven noise in this figure is due to the small
number of parameter values in our LE. The separate peaks for LGM
(
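A score-weighted density of this kind is a weighted histogram over fixed-width ESL bins. A minimal sketch, with hypothetical ESL values and scores and the 0.2 m bin width mentioned above:

```python
import numpy as np

def esl_density(esl_values, scores, bin_width=0.2):
    """Score-weighted probability density of equivalent sea level at one
    time slice, using fixed-width ESL bins (simple-method sketch)."""
    esl_values = np.asarray(esl_values, float)
    lo = np.floor(esl_values.min() / bin_width) * bin_width
    hi = np.ceil(esl_values.max() / bin_width) * bin_width
    edges = np.arange(lo, hi + bin_width / 2, bin_width)
    dens, edges = np.histogram(esl_values, bins=edges,
                               weights=scores, density=True)
    return dens, edges

# Hypothetical ESL values (m) at one time slice, with aggregate scores
esl = [1.1, 1.3, 1.35, 2.0, 2.1]
scores = [10.0, 50.0, 40.0, 5.0, 5.0]
dens, edges = esl_density(esl, scores)
```

With `density=True` and score weights, the histogram integrates to 1, so it can be compared directly with the continuous posterior densities from the advanced method.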

The simple averaging method, with quantities weighted by aggregate scores, produces results that are reasonably compatible with relatively sophisticated statistical techniques involving emulation, probability model/likelihood functions, and MCMC (Chang et al., 2015, 2016; Sect. 2.5). Both methods are applied to the same LE with full-factorial sampling in parameter space, for which both yield smooth and robust results, and the advanced technique acts as a benchmark against which the simple method can be compared.

Unlike the advanced techniques, the simple averaging method cannot interpolate in parameter space, and so is limited practically to relatively few parameters (four here) and a small number of values for each (five here). Previous work using LEs with Latin hypercube sampling (Applegate et al., 2012; Chang et al., 2014, 2015) has shown that the simple averaging method can fail if the sampling is too coarse, whereas the advanced technique provides smooth and meaningful results. This is primarily due to emulation and MCMC in the advanced techniques, which still interpolate successfully in the coarsely sampled parameter space. Of course, this distinction depends on the size of the LE and the coarseness of the sampling; somewhat larger LEs with Latin hypercube sampling and fewer parameters can be amenable to the simple method. This question is not addressed in this paper, where just one full-factorial LE is used.

The best-fit parameter ranges deduced from the LE analysis generally fit
prior expectations. In particular, the results strongly confirm that large
basal sliding coefficients (i.e., slippery beds) are appropriate for modern
continental-shelf oceanic areas. In further work we will assess heterogeneous
bed properties such as the inner region of hard outcropping basement observed
in the ASE (Gohl et al., 2013). The best-fit range for the asthenospheric
relaxation timescale TAUAST values is quite broad, including the prior
reference value

The total Antarctic ice amount at the Last Glacial Maximum is equivalent
to

There are only minor episodes of accelerated West Antarctic Ice Sheet (WAIS)
retreat and equivalent sea-level rise in the simulations (Fig. 5), and none
with magnitudes comparable to Meltwater Pulse 1A for instance, with

A natural extension of this work is to extend the Antarctic model simulations and LE methods into the future, using climates and ocean warming following Representative Concentration Pathway scenarios (Meinshausen et al., 2011). In these warmer climates we expect marine ice-sheet instability to occur in WAIS basins, consistent with past retreats simulated in Pollard and DeConto (2009). Also, drastic retreat mechanisms of hydrofracture and ice-cliff failure, not triggered in the colder-than-present simulations of this paper, may play a role, as found for the Pliocene in Pollard et al. (2015). Future applications with simple-average LEs are described in Pollard et al. (2016), and detailed future scenarios with another type of LE are described in DeConto and Pollard (2016).

The code for the ice-sheet model (PSUICE-3D) is available on request from the corresponding author. The post-processing codes for the large-ensemble statistical analyses are highly tailored to specific sets of model output and are not made available; however, modules that compute scores for the individual data types are available on request.

The four model parameters (OCFAC, CALV, CSHELF and TAUAST) and their ranges in the large ensemble are summarized in Sect. 2.2. Their physical effects in the model and associated uncertainties are discussed in more detail here.

OCFAC is the main coefficient in the parameterization of sub-ice-shelf oceanic melt, which is proportional to the square of the difference between nearby water temperature at 400 m and the pressure-melting point of ice. Oceanic melting (or freezing) erodes (or grows on) the base of floating ice shelves, as warm waters at intermediate depths flow into the cavities below the shelves. The resulting ice-shelf thinning reduces pinning points and lateral friction, and thus back stress on grounded interior ice. As mentioned above, recent increases in ocean melt rates are considered to be the main cause of ongoing drawdown and acceleration of interior ice in the ASE sector of WAIS (Pritchard et al., 2012; Dutrieux et al., 2014). High-resolution dynamical ocean models (Hellmer et al., 2012) are not yet practical on these timescales, and simple parameterizations of sub-ice-shelf melting such as the one used here are quite uncertain (e.g., Holland et al., 2008). For small (large) OCFAC values, oceanic melting is reduced (increased), ice shelves thicken (thin), discharge of interior ice across the grounding line decreases (increases), and grounding lines tend to advance (retreat).
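The quadratic melt law scaled by OCFAC can be sketched as below. The base coefficient `k_melt` and the clamping of negative temperature differences to zero are illustrative assumptions, not the model's tuned values (the model also allows basal freeze-on, omitted here).

```python
def oceanic_melt(T_ocean, T_freeze, ocfac, k_melt=0.1):
    """Sub-ice-shelf melt rate (m/yr) proportional to the square of the
    difference between 400 m water temperature and the pressure-melting
    point, scaled by the ensemble parameter OCFAC. k_melt is a
    hypothetical base coefficient, not the paper's tuned value."""
    dT = max(T_ocean - T_freeze, 0.0)   # no quadratic melt when water is below freezing
    return ocfac * k_melt * dT ** 2
```

The quadratic dependence means the OCFAC values spanning 0.1 to 10 translate into a very wide range of melt rates for the same ocean temperature, which is why OCFAC so strongly controls grounding-line advance vs. retreat in the LE.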

CALV is the main factor in the parameterization of iceberg calving at the oceanic edges of floating shelves. Calving has important effects on ice-shelf extent with strong feedback effects via buttressing of interior ice. However, the processes controlling calving are not well understood, probably depending on a combination of pre-existing fracture regime, large-scale stresses, and hydrofracturing by surface meltwater. There is little consensus on calving parameterizations. We use a common approach based on parameterized crevasse depths and their ratio to ice thickness (Benn et al., 2007; Nick et al., 2010). For small (large) CALV, calving is decreased (increased), producing more (less) extensive floating shelves, and greater (lesser) buttressing of interior ice.

CSHELF is the basal sliding coefficient for ice grounded on areas that are
ocean bed today (where the ice is not frozen to its bed). Coefficients under modern
grounded ice are deduced by inverse methods (Pollard and DeConto, 2012b;
Morlighem et al., 2013), but they are relatively unconstrained for modern
oceanic beds, across which grounded ice advanced at the Last Glacial Maximum

TAUAST is the

The eight types of modern and past data used in evaluating the model
simulations are summarized in Sect. 2.3. More details on the algorithms used
to compute the individual mismatches

As discussed in Sects. 2.3 and 2.4, we use two approaches in scoring: (a) one
more closely following Gaussian error forms, and (b) one using more heuristic forms.
Some of the algorithms for individual misfits differ between the two, as
indicated by bullets (a) and (b) below. For most data types, approach (a)
uses mean-square errors, and (b) uses root-mean-square errors. For some data
types, the errors are normalized not by observational uncertainty, but by an
“acceptable model error magnitude” representing typical model departures
from observations in reasonably realistic runs, if this is larger than
observational error. Note that if this scaling uncertainty is the same for
all data of a given type, it cancels out in the normalization of individual
misfits (

Modern grounding-line locations.

Approach (a): Misfit

Approach (b): Misfit

Modern floating ice-shelf locations.

Approach (a): Misfit

Approach (b): Misfit

Modern grounded ice thicknesses.

Approach (a): Misfit

Approach (b): Misfit

Past grounding-line distance vs. time along centerline troughs of Pine Island Glacier, and optionally the Ross and Weddell basins. Observed distances at ages 20, 15, 10 and 5 ka are obtained from grounding-line reconstructions of the RAISED Consortium (2014): Anderson et al. (2014) for the Ross, Larter et al. (2014) for the Amundsen Sea, and Hillenbrand et al. (2014) for the Weddell, using their Scenario A with the most retreated Weddell ice. Distances are then linearly interpolated in time between these dates. The centerline trough for Pine Island Glacier is extended across the continental shelf following the paleo-ice-stream trough shown in Jakobsson et al. (2011). The resulting Pine Island Glacier transect vs. time is similar to that in Smith et al. (2014).

Approach (a): Misfit

Approach (b): Misfit

In this study just the Pine Island Glacier trough is used, but if the Ross and Weddell are used also, the means are taken over all three troughs.
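The linear interpolation of observed grounding-line distance between the reconstruction ages can be sketched with `np.interp`. The ages follow the RAISED reconstruction dates above; the distances are hypothetical placeholders.

```python
import numpy as np

# Observed grounding-line distances (km along the Pine Island trough) at
# the RAISED reconstruction ages; the distances here are hypothetical
ages_ka = np.array([20.0, 15.0, 10.0, 5.0, 0.0])
dist_km = np.array([450.0, 380.0, 250.0, 120.0, 0.0])

def observed_distance(age_ka):
    """Linearly interpolate distance in time between reconstruction dates.
    np.interp needs increasing x, so interpolate on negative age."""
    return np.interp(-age_ka, -ages_ka, dist_km)
```

Model grounding-line distances at any output time can then be compared directly with `observed_distance` at that time when computing the TROUGH misfit.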

Past grounding-line locations. This uses reconstructed grounding-line maps
for 20, 15, 10, and 5 ka by the RAISED Consortium (2014; Anderson et al.,
2014; Hillenbrand et al., 2014; Larter et al., 2014; Mackintosh et al., 2014;
O Cofaigh et al., 2014), with vertices provided by S. Jamieson, personal
communication, 2015, and
choosing their Scenario A for the Weddell embayment (Hillenbrand et al.,
2014). The modern grounding line (0 ka) is derived from the Bedmap2 data set
(Fretwell et al., 2013). For this study only the Amundsen Sea region is
considered. We allow for uncertainty in the past reconstructions by setting a
probability of reconstructed floating ice or open ocean at each point

Computing the distance

Dividing this distance by the sum

Setting the probability

Approach (a): Misfit

Approach (b): Misfit

Past relative sea-level (RSL) records. This uses the compilation by Briggs
and Tarasov (2013) of published RSL data vs. time at sites close to the
modern coastline. Following those authors, the model RSL

Approach (a): Misfit

Approach (b): Misfit

This uses a combination of two compilations of cosmogenic data: elevation vs. age in Briggs and Tarasov (2013) for ELEV, and thickness change from modern vs. age in RAISED Consortium (2014) (with individual citations as above) for DSURF.

For ELEV, the calculations closely follow Briggs and Tarasov (2013, their
Sect. 4.2):

A time series of the model ice surface is used, with sea-level and bedrock elevation changes subtracted out, for the closest model grid point to each ELEV datum.

Only model elevations with a “deglaciating trend” are used; i.e., the
model elevation for each time is replaced by the maximum elevation between
that time and the present, if the latter is greater, allowing for an
uncertainty

The mismatch for each datum is the minimum of (
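The "deglaciating trend" replacement described above (each model elevation replaced by the maximum elevation between that time and the present, if greater) can be sketched as a running maximum taken from the present backwards in time; the uncertainty allowance is omitted here, and the toy elevations are hypothetical.

```python
import numpy as np

def deglaciating_trend(elev):
    """Enforce a monotonically thinning ('deglaciating') surface history:
    each elevation is replaced by the maximum elevation between that time
    and the present. `elev` is ordered from past to present."""
    # Running maximum computed from the present backwards in time
    return np.maximum.accumulate(elev[::-1])[::-1]

# Toy model surface-elevation series (m), ordered past -> present
elev = np.array([900.0, 950.0, 800.0, 850.0, 700.0])
trend = deglaciating_trend(elev)   # [950, 950, 850, 850, 700]
```

This makes the model history monotonic in the same sense as the exposure-age data, which record only the last time a sample was uncovered.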

Approach (a): Misfit

Approach (b): for approach (b), ELEV calculations as above are combined with DSURF calculations.

The DSURF calculations are simpler: for each datum, the time series of model
surface elevations

This uses modern uplift rates on rock outcrops, using the compilation in
Whitehouse et al. (2012b). For each observed site, the model's modern

Approach (a): the mismatch at each datum is [(

Approach (b): the mismatch at each datum is
(

As discussed in Sect. 2.3, the choice of formulae and algorithms to calculate model vs. data misfits and scores in the simple averaging method is somewhat heuristic, and several different choices could be considered appropriate for complex model–data comparisons with widespread data points, very different types of data, and many model–data error types that are not strictly Gaussian. Two possible approaches are described above (Sect. 2.4, Appendix B): approach (a) uses formulae closely following Gaussian error-distribution forms, and approach (b) uses more heuristic forms. Approach (b) is used for all results in the main paper. In this appendix the simple-averaging results (Figs. 2–5) are compared using both approaches. No significant differences are found, especially in the LE-averaged results, which suggests that different reasonable approaches to misfits and scoring yield robust statistical results for the ensemble.

In Fig. C1, the individual scores have much the same patterns over 4-D parameter space. There are some minor differences in the relative magnitudes of very good versus poor-but-still-meaningful scores, which the two color scales compensate for to some extent, but these do not lead to any significant differences in the averaged results in the following figures.

Aggregate scores for the complete large ensemble suite of runs
(625 runs, 4 model parameters, 5 values each), used in the simple method with
score-weighted averaging. The organization of the figure regarding the
4 parameter ranges is as described in Fig. 2.

In the parameter-pair scores (Fig. C2), the overall patterns are very similar. The biggest difference is for CALV vs. TAUAST, where the scores for approach (a) are higher and more tightly concentrated.

In the plots of equivalent sea level vs. time (Fig. C3), approach (a) generally favors runs with less ice volume during LGM and retreat, compared to approach (b) (red curves, Fig. C3c vs. d). On the other hand, the single best-scoring run in approach (a) retreats later than the corresponding run in approach (b) (black curves, Fig. C3a vs. b). Generally, these differences are minor compared to the overall model behavior through the deglaciation.

In the density distributions of equivalent sea level at particular times
(Fig. C4), there is very little difference between the two approaches. The
size of the

Ensemble-mean scores for individual parameter values, using the
simple averaging method as in Fig. 3.

Ensemble-mean scores for pairs of parameters, using the simple
averaging method as in Fig. 4.

Equivalent global-mean sea-level contribution (ESL) relative to
modern vs. time as in Fig. 5.

Probability densities of equivalent sea-level (ESL) rise at
particular times as in Fig. 6.

This appendix compares envelopes of model results with corresponding types of geologic data used in the LE scoring. The main goal is to demonstrate that the envelopes of the 625-member ensemble adequately span the data; i.e., at least some runs yield results that fall on both sides of each type of data, so that ensemble averages may potentially represent reasonably realistic ice-sheet behavior (even if no single model run is close to all data types).
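The "envelope spans the data" criterion can be stated concretely: for each datum, at least one run should fall at or below the observed value and at least one at or above it. A minimal sketch of that check, assuming model values and observations are gathered into arrays (the names are hypothetical, for illustration only):

```python
import numpy as np

def data_within_envelope(model_values, obs_values):
    """Check whether each observation lies within the ensemble envelope,
    i.e., runs fall on both sides of (or at) the datum.
    model_values: shape (n_runs, n_data) -- one value per run per datum
    obs_values:   shape (n_data,)        -- the observed values
    Returns a boolean array, one entry per datum.  Hypothetical helper
    illustrating the criterion in the text."""
    model = np.asarray(model_values, dtype=float)
    obs = np.asarray(obs_values, dtype=float)
    lo = model.min(axis=0)   # lower edge of the ensemble envelope
    hi = model.max(axis=0)   # upper edge of the ensemble envelope
    return (lo <= obs) & (obs <= hi)
```

A datum flagged `False` lies outside the envelope, meaning all runs are biased to one side there and the score-weighted average cannot be pulled toward the observation.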

For modern data (grounded and floating ice extents, grounded ice thicknesses), the standard model has previously been shown to yield quite realistic simulations, both for perpetual modern climate and at the end of long-term glacial–interglacial runs (Pollard and DeConto, 2012a). Modern grounded ice thicknesses are close to observed mainly because of the inverse procedure in specifying the distribution of basal sliding coefficients (Pollard and DeConto, 2012b). Here we concentrate on fits to geologic data.

Figure D1 compares scatterplots of relative sea level in all 625 runs with RSL records, for the three sites within the model's West Antarctic domain (Briggs and Tarasov, 2013). The data for each site fall well within the overall model envelope, and in most cases within the envelopes of the top 120 scoring runs (colored curves). Similar comparisons for single runs are shown in Gomez et al. (2013), both using the simple bedrock model as here (their “uncoupled” runs), and coupled to a global Earth-sea-level model.

Similarly, Fig. D2 compares elevation vs. age time series for all 625 runs
with cosmogenic data at the 18 sites within the model domain (Briggs and
Tarasov, 2013). With a few exceptions, the data lie within the LE model
envelopes, although elevations at many of the sites are lower than in most of
the model runs. At Reedy Glacier, the model exhibits oscillations of

Figure D3 shows modern uplift rates for all model runs, at the 26 sites in the Whitehouse et al. (2012b) compilation that lie within the model domain. Again, nearly all of the observed values lie within the overall model envelope. The geographic distribution for single runs is compared with that observed in Gomez et al. (2013), both using a simple bedrock model (“uncoupled”) and coupled to a global Earth-sea-level model.

The remaining past data types (GL2D and TROUGH) concern grounding-line locations during the last deglacial retreat, and are less amenable to scatterplots, but can be compared with model averaged results. Figure D4 shows maps of probability (0–1) of the presence of grounded ice at particular times, deduced by score-weighted averages over the ensemble. The thick black lines at 20, 15, 10 and 5 ka show grounding-line positions in the reconstructions of the RAISED Consortium (2014). (The figures do not show the uncertainty information associated with the data, which is used in the scoring; Appendix B.) At all of these times, the envelopes of the model “grounding zone”, i.e., the areas with intermediate probability values, span or are close to the observed positions.
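The score-weighted probability of grounded ice at a grid point is the summed score of runs with grounded ice there, divided by the total score of all runs. A minimal sketch of that map computation, assuming each run supplies a boolean grounded-ice mask (names and array layout are illustrative assumptions):

```python
import numpy as np

def grounded_probability(scores, grounded_masks):
    """Score-weighted probability (0 to 1) of grounded ice at each grid
    point at a given time: sum of scores of runs with grounded ice there,
    divided by the total score of all runs.
    scores:         shape (n_runs,)         -- aggregate score per run
    grounded_masks: shape (n_runs, ny, nx)  -- True where ice is grounded
    Illustrative sketch of the averaging described in the text."""
    scores = np.asarray(scores, dtype=float)
    masks = np.asarray(grounded_masks, dtype=float)
    # Sum scores over runs with grounded ice at each point, then normalize:
    return np.tensordot(scores, masks, axes=1) / scores.sum()
```

Values near 0 or 1 indicate points where nearly all well-scoring runs agree (ocean/shelf or grounded ice, respectively); intermediate values delineate the model "grounding zone" compared against the RAISED reconstructions.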

Similarly, Fig. D5 shows model probabilities (0–1) of grounded ice vs. time along the centerline transects of the major West Antarctic embayments. Again, the model envelopes mostly span the various observed estimates for each transect (from RAISED Consortium, 2014, and various earlier studies).

Taken together, the various model vs. data comparisons in this appendix show that the model's ensemble envelopes do encompass the ranges of data satisfactorily, as necessary for meaningful interpretations of the statistical results.

Model vs. observed relative sea-level (RSL) data, for the three RSL
sites (Briggs and Tarasov, 2013) that lie within and away from the edges of
the model's West Antarctic domain. The observations and uncertainty ranges
are shown as black dots and whiskers. Model curves are shown for all
625 runs, with aggregate scores

Model vs. observed elevation vs. age data, for the 18 sites in the
compilation (Briggs and Tarasov, 2013) that lie within and away from the
edges of the model's West Antarctic domain, shown roughly in west-to-east
order. The observations and uncertainty ranges are shown as black dots and
whiskers. Model curves are shown for all 625 runs, with aggregate scores

Model vs. observed modern uplift rates, for the 25 sites in the
compilation (Whitehouse et al., 2012b) that lie within the model's West
Antarctic domain, shown roughly in west-to-east order. The observations and
uncertainty ranges are shown as black dots and whiskers. Model rates are
shown for all 625 runs, with straight lines joining the sites, and aggregate
scores

Score-weighted probability (0 to 1) of grounded ice vs. floating ice
or open ocean at each grid point (see text), for various times over the last
20 000 yr, concentrating on the period of rapid retreat between 15 and
10 ka. The LE and model version are essentially the same as above, except
with all-Antarctic coverage to include East Antarctic variations. The
quantity shown is the sum of scores

Upper panels: score-weighted probability (0 to 1) of grounded ice vs. time, as in Fig. D4 but along centerline transects of (i) Pine Island Glacier and its paleo-trough, (ii) Ross embayment and (iii) Weddell embayment. Black symbols show various published data: Pine Island, circles: Larter et al. (2014) (the RAISED Consortium, 2014). Pine Island, crosses: Kirshner et al. (2012), Hillenbrand et al. (2014) and Smith et al. (2014). Ross, circles: Anderson et al. (2014) (the RAISED Consortium, 2014). Ross, crosses: Conway et al. (1999) and McKay et al. (2008). Weddell, “A” and “B”: Hillenbrand et al. (2014) (the RAISED Consortium, 2014), Scenarios A and B, respectively. Lower panels: modern bathymetric profiles along each transect (from Bedmap2; Fretwell et al., 2013).

We thank Zhengyu Liu and his group at U. Wisconsin for providing output of
their coupled GCM simulation (TraCE-21 ka; Liu et al., 2009; He et al.,
2013;