LS3MIP (v1.0) contribution to CMIP6: the Land Surface, Snow and Soil moisture Model Intercomparison Project – aims, setup and expected outcome

The Land Surface, Snow and Soil Moisture Model Intercomparison Project (LS3MIP) is designed to provide a comprehensive assessment of land surface, snow and soil moisture feedbacks on climate variability and climate change, and to diagnose systematic biases in the land modules of current Earth system models (ESMs). The solid and liquid water stored at the land surface has a large influence on the regional climate, its variability and predictability, including effects on the energy, water and carbon cycles. Notably, snow and soil moisture affect surface radiation and flux partitioning properties, moisture storage and land surface memory. They both strongly affect atmospheric conditions, in particular surface air temperature and precipitation, but also large-scale circulation patterns. However, models show divergent responses and representations of these feedbacks as well as systematic biases in the underlying processes. LS3MIP will provide the means to quantify the associated uncertainties and better constrain climate change projections, which is of particular interest for highly vulnerable regions (densely populated areas, agricultural regions, the Arctic, semi-arid and other sensitive terrestrial ecosystems). The experiments are subdivided into two components, the first addressing systematic land biases in offline mode ("LMIP", building upon the third phase of the Global Soil Wetness Project; GSWP3) and the second addressing land feedbacks attributed to soil moisture and snow in an integrated framework ("LFMIP", building upon the GLACE-CMIP blueprint).


Introduction
Land surface processes, including heat fluxes, snow, soil moisture, vegetation, turbulent transfer and runoff, continue to rank highly on the list of the most relevant yet complex and poorly represented features in state-of-the-art climate models. People live on land, exploit its water and natural resources and experience day-to-day weather that is strongly affected by feedbacks with the land surface. The six Grand Challenges of the World Climate Research Program (WCRP; http://www.wcrp-climate.org/grand-challenges) include topics governed primarily (Water Availability, Cryosphere) or largely (Climate Extremes) by land surface characteristics.
Despite the importance of a credible representation of land surface processes in Earth system models (ESMs), a number of systematic biases and uncertainties persist. Biases in hydrological characteristics (e.g., moisture storage in soil and snow, runoff, vegetation and surface water bodies), the partitioning of energy and water fluxes, the definition of initial and boundary conditions at the appropriate spatial scale, feedback strengths (Koster et al., 2004; Qu and Hall, 2014) and inherent land surface related predictability (Douville et al., 2007; Dirmeyer et al., 2013) are still subjects of considerable research effort.
These biases and uncertainties are problematic because they affect, among other things, forecast skill (Koster et al., 2010a), regional climate change patterns (Campoy et al., 2013; Seneviratne et al., 2013; Koven et al., 2012) and the interpretation of trends in water resources (Lehning, 2013). In addition, there is evidence of large-scale systematic biases in some aspects of land hydrology in current climate models and in the terrestrial component of the carbon cycle (Mystakidis et al., 2016). Notably, land surface processes can be an important reason for a direct link, at the regional scale, between the temperature biases of climate models in the present period and those in future projections with increased radiative forcing (Cattiaux et al., 2013).
For snow cover, a better understanding of the links with climate is critical to interpret the dramatic reduction in springtime snow cover observed over recent decades (e.g., Derksen and Brown, 2012; Brutel-Vuilmet et al., 2013), to improve the seasonal to interannual forecast skill of temperature, runoff and soil moisture (e.g., Thomas et al., 2016; Peings et al., 2011) and to adequately represent polar warming amplification in the Arctic (e.g., Holland and Bitz, 2003). Snow-related biases in climate models may arise from the snow-albedo feedback (Qu and Hall, 2014; Thackeray et al., 2015), but also from the energy sink induced by snowmelt in spring and the thermal insulation effect of snow on the underlying soil (Gouttevin et al., 2012). The temporal dynamics of snow-atmosphere coupling during the various phases of snow depletion (Xu and Dirmeyer, 2011, 2012) are crucial for a proper representation of the timing of, and the atmospheric response to, snowmelt. Phases 1 and 2 of the Snow Model Intercomparison Project (SnowMIP; Etchevers et al., 2004; Essery et al., 2009) provided useful insights into the capacity of snow models of different complexity to simulate snowpack evolution from local meteorological forcing, but did not explore snow-climate interactions. Because of strong snow-atmosphere interactions, it remains difficult to distinguish and quantify the various potential causes of disagreement between observed and modeled snow trends and the related climate feedbacks.
Soil moisture plays a central role in the coupled land-vegetation-snow-water-atmosphere system (van den Hurk et al., 2011), where interactions are evident at many relevant timescales: diurnal cycles of land surface fluxes, seasonal and subseasonal predictability of droughts, floods and hot extremes, annual cycles governing the water buffer in dry seasons, and shifts in climatology in response to changing patterns of precipitation and evaporation. The representation of historical variations in land water availability and droughts still suffers from large uncertainties due to model parameterizations, unrepresented hydrologic processes (such as lateral groundwater flow, lateral flows connected to re-infiltration of river water, or irrigation with river water) and/or atmospheric forcings (Zampieri et al., 2012; Trenberth et al., 2014; Greve et al., 2014; Clark et al., 2015). This also applies to the energy and carbon exchanges between the land and the atmosphere (e.g., Mueller and Seneviratne, 2014; Friedlingstein et al., 2013).
B. van den Hurk et al.: LS3MIP (v1.0) contribution to CMIP6
It is difficult to generate reliable observations of soil moisture and land surface fluxes that can be used as boundary conditions for modeling and predictability studies. Satellite retrievals, in situ observations, offline model experiments (Second Global Soil Wetness Project, GSWP2; Dirmeyer et al., 2006) and indirect estimates all have the potential to provide relevant information, but they are largely inconsistent, cover different components of the system and suffer from methodological flaws (Mueller et al., 2013; Mao et al., 2015). As a consequence, the pioneering work on deriving soil moisture-related land-atmosphere coupling strength (Koster et al., 2004) and regional/global climate responses in both present and future climate (Seneviratne et al., 2006) has been carried out using (ensembles of) modeling experiments. The second Global Land-Atmosphere Coupling Experiment (GLACE2; Koster et al., 2010a) measured the actual improvement in temperature and precipitation forecast skill obtained from GSWP2 soil moisture initializations, which was found to be much lower than suggested by the coupling strength diagnostics. Limited quality of the initial states, limited predictability and poor representation of essential processes determining the propagation of information through the hydrological cycle in the models all play a role.
Altogether, there are substantial challenges concerning both the representation of land surface processes in current-generation ESMs and the understanding of related climate feedbacks. The Land Surface, Snow and Soil moisture Model Intercomparison Project (LS3MIP) is designed to allow the climate modeling community to make substantial progress in addressing these challenges. It is part of the sixth phase of the Coupled Model Intercomparison Project (CMIP6). The following section further develops the objectives and rationale of LS3MIP. The experimental design and analysis plan are presented thereafter, and the final discussion section describes the expected outcome and impact of LS3MIP.

Objectives and rationale
The goal of the LS3MIP experiments is to provide a comprehensive assessment of land surface, snow and soil moisture-climate feedbacks, and to diagnose systematic biases and process-level deficiencies in the land modules of current ESMs. While vegetation, carbon cycle, soil moisture, snow, surface energy balance and land-atmosphere interaction are all intimately coupled in the real world, LS3MIP necessarily focuses on the physical land surface in this complex system: interactions with vegetation and the carbon cycle are included in the analyses wherever possible without losing this essential focus. In the complementary Land Use MIP (LUMIP; see Lawrence et al., 2016) and C4MIP, vegetation, the terrestrial carbon cycle and land management are the central topics of analysis. LS3MIP and LUMIP share some model experiments and analyses (see below), so that the complex interactions at the land surface can be addressed while each project remains focused on well-posed hypotheses and research approaches.
LS3MIP will provide the means to quantify the associated uncertainties and better constrain climate change projections, of particular interest for highly vulnerable regions (including densely populated regions, the Arctic, agricultural areas, and some terrestrial ecosystems).
The LS3MIP experiments collectively address the following objectives: (i) to evaluate the current state of land processes, including surface fluxes, snow cover and soil moisture representation, in the CMIP DECK (Diagnostic, Evaluation and Characterization of Klima) experiments and CMIP6 historical simulations, and to identify the main systematic biases and their dependencies; (ii) to estimate multi-model long-term terrestrial energy, water and carbon cycles, using the land modules of CMIP6 models under observation-constrained historical (land reanalysis) and projected future (impact assessment) climatic conditions, considering land use/land cover changes; (iii) to assess the role of snow and soil moisture feedbacks in the regional response to altered climate forcings, focusing on controls of climate extremes, water availability and high-latitude climate in historical and future scenario runs; and (iv) to assess the contribution of land surface processes to systematic Earth system model biases and to the current and future predictability of regional temperature and precipitation patterns.
These objectives address each of the three CMIP6 overarching questions: (1) What are regional feedbacks and responses to climate change?; (2) What are the systematic biases in the current climate models?; and (3) What are the perspectives concerning the generation of predictions and scenarios? LS3MIP encompasses a family of model experiments building on earlier multi-model experiments, particularly (a) offline land surface experiments (GSWP2 and its successor GSWP3), (b) the coordinated snow model intercomparisons SnowMIP phases 1 and 2 (Etchevers et al., 2004; Essery et al., 2009), and (c) the coupled climate timescale GLACE-type configuration (GLACE-CMIP; Seneviratne et al., 2013). Within LS3MIP the land-only experimental suite is referred to as LMIP (Land Model Intercomparison Project) with the experiment ID Land, while the coupled suite is labeled LFMIP (Land Feedback MIP). A detailed description of the experimental design is given below, and a graphical overview of the components of LS3MIP is shown in Fig. 1.
As illustrated in Fig. 2, LS3MIP addresses multiple WCRP Grand Challenges and core projects. The LMIP experiment will provide better estimates of historical changes in snow and soil moisture at the global scale, thus allowing the evaluation of changes in freshwater, agricultural drought and streamflow extremes over the continents and a better understanding of the main drivers of these changes. The LFMIP experiments are highly relevant for assessing key feedbacks and systematic biases of land surface processes in coupled mode, and focus in particular on two of the main feedback loops over land: the snow-albedo-temperature feedback involved in Arctic amplification, and the soil moisture-temperature feedback leading to major changes in temperature extremes (Douville et al., 2016). In addition, LS3MIP will facilitate the exchange of data and knowledge between the snow and soil moisture research communities, which address a common physical topic: terrestrial water in liquid and solid form. Snow and soil moisture dynamics are often interrelated (e.g., Hall et al., 2008; Xu and Dirmeyer, 2012) and jointly contribute to hydrological variability (e.g., Koster et al., 2010b).
LS3MIP will also provide relevant insights for other research communities, such as global reconstructions of land variables that are not directly observed, for detection and attribution studies; estimates of freshwater inputs to the oceans (which are relevant for sea-level changes and regional impacts; Carmack et al., 2015); the assessment of feedbacks shown to strongly modulate regional climate variability, relevant for regional climate information; and the investigation of land climate feedbacks on large-scale circulation patterns and cloud occurrence (Zampieri and Lionello, 2011). This implies potential contributions to programmes such as the Inter-Sectoral Impact Model Intercomparison Project (ISIMIP; Warszawski et al., 2014) and the International Detection and Attribution Group (IDAG). LS3MIP is geared to extend and consolidate available data, models and theories to support human awareness of, and resilience to, highly variable environmental conditions in a wide range of sectors, including disaster risk reduction, food security, public safety, nature conservation and societal well-being. Figure 3 illustrates the embedding of LS3MIP within CMIP6. LS3MIP fills a major gap by considering systematic land biases and land feedbacks. In this context, LS3MIP is part of a larger "LandMIP" series of CMIP6 experiments fully addressing biases, uncertainties, feedbacks and forcings from the land surface (Fig. 1), which are complementary to similar experiments for ocean or atmospheric processes. In particular, while LS3MIP focuses on systematic biases in land surface processes (Land) and on feedbacks from land surface processes on the climate system (LFMIP), the complementary Land Use MIP (LUMIP) addresses the role of land use forcing on the climate system. The role of vegetation and carbon stores in the climate system is a point of convergence between LUMIP, C4MIP and LS3MIP, and the offline LMIP experiment will serve as a land-only reference for both the LS3MIP and LUMIP experiments. In addition, there will be links to C4MIP with respect to the impacts of snow and soil moisture processes (in particular droughts and floods) on terrestrial carbon exchanges and the resulting feedbacks to the climate system.
Figure 1. Structure of the "LandMIPs". LS3MIP includes (1) the offline representation of land processes (LMIP) and (2) the representation of land-atmosphere feedbacks related to snow and soil moisture (LFMIP). Forcing associated with land use is assessed in LUMIP. Substantial links also exist to C4MIP (terrestrial carbon cycle). Furthermore, a land albedo test bed experiment is planned within GeoMIP. From Seneviratne et al. (2014).

Experimental design
The experimental design of LS3MIP consists of a series of offline land-only experiments (LMIP) driven by a land surface forcing data set and a variety of coupled model simulations (LFMIP) (see Fig. 4 and Table 1). Meteorological forcings, ancillary data (e.g., land use/cover changes, surface parameters, CO2 concentration and nitrogen deposition) and documented protocols to spin up and execute the experiments are essential ingredients for a successful offline land model experiment (Wei et al., 2014). The first Global Soil Wetness Project (GSWP; Dirmeyer et al., 1999), covering two annual cycles (1987-1988), established a successful template, which was updated and fine-tuned in a number of follow-up experiments with both global (Sheffield et al., 2006) and regional (Boone et al., 2009) coverage.

Available data sets for meteorological forcing
Offline experiments will primarily use the GSWP3 forcing (Kim et al., 2016) in Tier 1, with alternative forcings used in Tier 2 experiments.
The third Global Soil Wetness Project (GSWP3) provides meteorological forcings for the entire 20th century and beyond, making extensive use of the 20th Century Reanalysis (20CR; Compo et al., 2011). In this reanalysis product only surface pressure observations are assimilated, with monthly sea-surface temperature and sea-ice concentration prescribed as boundary conditions. The ensemble uncertainty in the synoptic variability of 20CR varies with the time-changing observation network. High correlations of 500 hPa geopotential height and 850 hPa air temperature with an independent long record of upper-air data were found (Compo et al., 2011), comparable to the skill of a state-of-the-art forecasting system at 3 days lead time. GSWP3 forcing data are generated by dynamical downscaling of 20CR. A simulation with the Global Spectral Model (GSM), run at T248 resolution (∼50 km), is nudged to the vertical structure of the 20CR zonal and meridional winds and air temperature using a spectral nudging technique that effectively retains synoptic features at the higher spatial resolution (Yoshimura and Kanamitsu, 2008). Additional bias corrections using observations, vertical damping (Hong and Chang, 2012) and single ensemble member correction (Yoshimura and Kanamitsu, 2013) are applied, giving considerable improvements. Weedon et al. (2011) provide the meteorological forcing data for the EU Water and Global Change (WATCH) programme, designed to evaluate global hydrological trends and impacts using offline modeling. The half-degree, 3-hourly WATCH Forcing Data (WFD) were based on the ECMWF ERA-40 reanalysis and included elevation correction and monthly bias correction using CRU observations (and, alternatively, GPCC precipitation totals). WATCH hydrological modeling led to the WaterMIP study (Haddeland et al., 2011). The WFD stop in 2001, but within the follow-up project EMBRACE, Weedon et al. (2014) generated the WFDEI data set, which starts in 1979 and was recently extended to 2014. The WFDEI data set was based on the WATCH Forcing Data methodology but used the ERA-Interim reanalysis (4D-Var assimilation and a higher spatial resolution than ERA-40), so there are offsets for some variables in the period of overlap with the WFD. The forcing consists of 3-hourly ECMWF ERA-Interim reanalysis data interpolated to half-degree spatial resolution. The 2 m temperatures are bias-corrected in terms of monthly means and monthly average diurnal temperature range using CRU half-degree observations. The 2 m temperature, surface pressure, specific humidity and downward longwave radiation fluxes are sequentially elevation corrected. Shortwave radiation fluxes are corrected using CRU cloud cover observations and for the effects of seasonal and interannual changes in aerosol loading. Rainfall and snowfall rates are corrected using the CRU number of wet days per month and according to CRU or GPCC observed monthly precipitation gauge totals. The WFDEI data set is also used as forcing for the ISIMIP2.1 project, which focuses on historical validation of the global water balance under transient land use change (Warszawski et al., 2014).
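The monthly corrections described above can be illustrated with a deliberately simplified sketch, assuming one month of 3-hourly values for a single grid cell: temperature is shifted additively so that its monthly mean matches the observed (e.g., CRU) mean, and precipitation is rescaled multiplicatively so that its monthly total matches the gauge total. The function names are hypothetical, and the actual WFD/WFDEI procedure additionally adjusts the diurnal temperature range, wet-day frequency, elevation and radiation, all of which this sketch omits.

```python
from statistics import fmean


def shift_to_monthly_mean(temps_k: list[float], observed_mean_k: float) -> list[float]:
    """Additive correction: shift sub-daily temperatures so their mean equals the observed monthly mean."""
    offset = observed_mean_k - fmean(temps_k)
    return [t + offset for t in temps_k]


def scale_to_monthly_total(precip_mm: list[float], observed_total_mm: float) -> list[float]:
    """Multiplicative correction: rescale precipitation so the monthly total matches the observed total.

    Scaling (rather than shifting) keeps dry time steps at zero and never yields negative rates.
    """
    model_total = sum(precip_mm)
    if model_total == 0.0:
        # A dry month cannot be rescaled; leaving it unchanged is an assumption of this sketch.
        return list(precip_mm)
    factor = observed_total_mm / model_total
    return [p * factor for p in precip_mm]
```

Shifting temperature and scaling precipitation is a common design choice because it preserves the sub-monthly pattern of the reanalysis while imposing the observed monthly statistics.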
To support the Global Carbon Project (Le Quéré et al., 2009) with annual updates of global carbon pools and fluxes, the offline modeling framework TRENDY applies an ensemble of terrestrial carbon allocation and land surface models. For this purpose a forcing data set has been prepared in which NCEP reanalysis data are bias-corrected using the gridded in situ climate data from the Climatic Research Unit (CRU), the so-called CRU-NCEP data set. This data set is currently available from 1901 to 2014 at 0.5° horizontal resolution and a 6-hourly time step, and it is updated annually.
The Princeton Global Forcing data set (Sheffield et al., 2006) was developed as forcing for land surface and other terrestrial models, and for analyzing changes in near-surface climate. It is based on 6-hourly surface climate from the NCEP-NCAR reanalysis, corrected for biases at diurnal, daily and monthly timescales using a variety of observational data sets. The data are available at 1.0, 0.5 and 0.25° resolution and a 3-hourly time step. The latest version (V2.2) covers 1901-2014, with a real-time extension based on satellite precipitation and weather model analysis fields. The reanalysis precipitation is corrected by adjusting the number of rain days and the monthly accumulations to match observations from CRU and the Global Precipitation Climatology Project (GPCP). Precipitation is downscaled in space using statistical relationships based on GPCP and the TRMM Multi-satellite Precipitation Analysis (TMPA), and in time to 3-hourly resolution based on TMPA. Temperature, humidity, pressure and longwave radiation are downscaled in space, accounting for elevation. Daily mean temperature and diurnal temperature range are adjusted to match the CRU monthly data. Shortwave and longwave surface radiation are adjusted to match satellite-based observations from the University of Maryland (Zhang et al., 2016) and to be consistent with CRU cloud cover observations outside the satellite period. An experimental version (V3) assimilates station observations into the background gridded field to provide local-scale corrections (J. Sheffield, personal communication, February 2016). Figure 5 shows the performance of the forcing data sets in terms of correlation and standard deviation compared with daily observations from 20 globally distributed in situ FLUXNET sites (Baldocchi et al., 2001).
Although the intrinsic spatial heterogeneity of precipitation leads to significant differences from the in situ observations, air temperature and longwave and shortwave downward radiation (not shown) show variability characteristics similar to those of the observations.
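The statistics summarized in a Taylor diagram such as Fig. 5 can be computed as in the sketch below (function name hypothetical; population statistics assumed). They are linked by the identity E'² = σ_f² + σ_r² − 2σ_fσ_rR between the centered RMS difference E', the standard deviations of the forcing (σ_f) and reference (σ_r) series and their correlation R, which is what allows E' to be read off the diagram as a distance. The sketch does not reproduce the processing of Best et al. (2015).

```python
import math
from statistics import fmean, pstdev


def taylor_statistics(model: list[float], obs: list[float]) -> dict[str, float]:
    """Correlation, normalized standard deviation and centered RMS difference of model vs. observations."""
    if len(model) != len(obs) or len(obs) < 2:
        raise ValueError("series must have equal length >= 2")
    mean_m, mean_o = fmean(model), fmean(obs)
    std_m, std_o = pstdev(model), pstdev(obs)
    if std_m == 0.0 or std_o == 0.0:
        raise ValueError("correlation undefined for constant series")
    anom_m = [m - mean_m for m in model]  # anomalies about each series' own mean
    anom_o = [o - mean_o for o in obs]
    corr = fmean([a * b for a, b in zip(anom_m, anom_o)]) / (std_m * std_o)
    crmsd = math.sqrt(fmean([(a - b) ** 2 for a, b in zip(anom_m, anom_o)]))
    return {"correlation": corr, "normalized_std": std_m / std_o, "centered_rmsd": crmsd}
```

Normalizing the standard deviation by that of the observations lets sites with very different variability share one diagram.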
The participating modeling groups are invited to run a number of experiments in this land-only branch of LS3MIP.

Historical offline simulations: Land-Hist
The Tier 1 experiments of the offline LMIP experiment consist of simulations using the GSWP3 forcing data for a historical (1831-2014) interval. The land model configuration should be identical to that used in the DECK and CMIP6 historical simulations for the parent coupled model.
The atmospheric forcing will be prepared at a standard 0.5° × 0.5° spatial resolution at 3-hourly intervals and distributed with a package to regrid the data to the native grids of the global climate models (GCMs). Vegetation, soil, topography and land/sea mask data will also be prescribed following the protocol used for the CMIP6 DECK simulations. Spinup of the land-only simulations should follow the TRENDY protocol, which calls for recycling the climate mean and variability from two decades of the forcing data set (e.g., 1831-1850 for GSWP3, 1901-1920 for the alternative land surface forcings). Land use should be held constant at 1850 levels, as in the DECK 1850 coupled control simulation (piControl); see the discussion and definition of "constant land use" in Sect. 2.1 of the LUMIP protocol paper (Lawrence et al., 2016). CO2 and all other forcings should be held constant at 1850 levels during spinup. For the period from 1850 to the first year of the forcing data set, the forcing data should continue to be recycled, but all other forcings (land use, CO2, etc.) should be as in the CMIP6 historical simulation. Transient land use is a prescribed CMIP6 forcing and is described in the LUMIP protocol (Lawrence et al., 2016).
Figure 5. Taylor diagram for evaluating the forcing data sets against daily observations from FLUXNET sites, as used by Best et al. (2015): (a) 2 m air temperature and (b) precipitation. Red, blue and green dots indicate GSWP3, WATCH Forcing Data and Princeton forcing (Sheffield et al., 2006), respectively. Grey and orange dots indicate 20CR and its dynamically downscaled product (GSM248).
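The recycling rule can be made concrete with a small helper, a sketch assuming that simulation years before the first year of the forcing data set simply cycle through its first 20 years. The function name and the phase of the cycle (which forcing year is paired with which simulation year) are our assumptions; the protocol as summarized here does not fix them. Non-meteorological forcings (land use, CO2) are handled separately, as described above.

```python
def forcing_year(sim_year: int, first_forcing_year: int, cycle_length: int = 20) -> int:
    """Return the meteorological forcing year to read for a given simulation year (hypothetical helper).

    Years on or after the start of the forcing data set use their own data; earlier years
    recycle the window [first_forcing_year, first_forcing_year + cycle_length - 1].
    """
    if cycle_length <= 0:
        raise ValueError("cycle_length must be positive")
    if sim_year >= first_forcing_year:
        return sim_year
    # Python's % returns a non-negative result for a positive divisor,
    # so negative offsets map cleanly into the recycled window.
    return first_forcing_year + (sim_year - first_forcing_year) % cycle_length
```

For an alternative forcing starting in 1901, this mapping reads 1910 for simulation year 1850 and 1920 for 1900, and switches to the actual data from 1901 onwards.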
Interactions with the Ocean Model Intercomparison Project (OMIP; Griffies et al., 2016) are arranged through the use of terrestrial freshwater fluxes produced in the LMIP simulations as a boundary condition for the forced ocean-only simulations in OMIP, in addition to the forcing provided by Dai and Trenberth (2002).
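Passing land-model runoff to an ocean model requires converting a per-area mass flux (kg m-2 s-1) into a volumetric discharge. A minimal sketch, assuming a spherical Earth, regular latitude-longitude cells and that each cell has already been assigned to a coastal outlet (the river routing itself is not shown), is given below. The names and the cell tuple layout are hypothetical; the actual coupling between LMIP output and OMIP forcing is not specified here.

```python
import math

EARTH_RADIUS_M = 6.371e6       # mean Earth radius; spherical-Earth assumption
WATER_DENSITY_KG_M3 = 1000.0


def cell_area_m2(lat_south_deg: float, lat_north_deg: float, dlon_deg: float) -> float:
    """Exact area of a latitude-longitude cell on a sphere."""
    return (EARTH_RADIUS_M ** 2
            * math.radians(dlon_deg)
            * (math.sin(math.radians(lat_north_deg)) - math.sin(math.radians(lat_south_deg))))


def outlet_discharge_m3_s(cells: list[tuple[float, float, float, float]]) -> float:
    """Sum runoff (kg m-2 s-1) over cells draining to one outlet and convert it to m3 s-1.

    Each cell is (runoff_kg_m2_s, lat_south_deg, lat_north_deg, dlon_deg).
    """
    mass_flux_kg_s = sum(r * cell_area_m2(s, n, d) for r, s, n, d in cells)
    return mass_flux_kg_s / WATER_DENSITY_KG_M3
```

Using the exact spherical cell area rather than a cosine-of-latitude approximation keeps the global total consistent with the sphere's surface area.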
Single-site time series of in situ observed forcing variables from selected reference locations (from FLUXNET; Baldocchi et al., 2001) are supplied in addition to the gridded forcing data for site-level validation. This allows the evaluation of the land surface models in current GCMs, as applied by Best et al. (2015) and in ESM-SnowMIP (Earth System Model-Snow Module Intercomparison Project; see below). For snow evaluation, an international network of well-instrumented sites has been identified, covering the major climate classes of seasonal snow, each of which poses unique challenges for the parameterization of snow-related processes (see analysis strategy below).
Although Land-Hist is not a formal component of the DECK simulations that form the core of CMIP6 (see Fig. 3), the WCRP Working Group on Coupled Modelling (WGCM) recognized the importance of these land-only experiments for model development and benchmarking. It is proposed that the full historical run, or a subset of it, become part of the DECK in future CMIP exercises; it is included as a Tier 1 experiment in LS3MIP. Land surface model output from this subset of LMIP will also be used as a boundary condition in some of the coupled climate model simulations described below.

Historical simulations with alternative forcings
Additional Tier 2 experiments are solicited with an experimental setup similar to that of the Tier 1 simulations, but using three alternative meteorological forcing data sets that differ from GSWP3: the Princeton forcing (Sheffield et al., 2006), WFD and WFDEI combined (allowing for offsets as needed; Weedon et al., 2014) and the CRU-NCEP forcing (Wei et al., 2014) used in TRENDY (Sitch et al., 2015). These Tier 2 experiments cover the period 1901-2014. The model outputs will allow assessment of the sensitivity of land-only simulations to uncertainties in the forcing data; differences with respect to the primary runs with GSWP3 forcing will help in understanding how the choice of forcing data set affects the simulations. Kim (2010) used a similarity index (Koster et al., 2000) to estimate the uncertainty derived from an ensemble of precipitation observation data sets relative to the uncertainty from an ensemble of model simulations of evapotranspiration and runoff. Because the various forcing data sets share common monthly observations, the index is high when evaluated using monthly mean values. However, evaluating the consistency of the monthly variance across data sets reveals much larger disparities and considerably lower index values (Fig. 6). This uncertainty will propagate differently to other hydrological variables, such as runoff or evapotranspiration (Kim, 2010).
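For reference, the similarity index in the form used in the GLACE experiments compares the variance of the ensemble-mean time series with the variance of all ensemble values pooled together: for N members, Ω = (N σ²(ensemble mean) − σ²(all values)) / ((N − 1) σ²(all values)), which equals 1 when all members are identical and is close to 0 when they are mutually independent. The sketch below assumes that each "member" is one forcing data set's time series at a grid cell (for example of monthly means or of monthly variances); this is our illustration, not a confirmed detail of how Kim (2010) applied the index, and the function name is hypothetical.

```python
from statistics import fmean, pvariance


def similarity_index(members: list[list[float]]) -> float:
    """GLACE-style similarity index for N equally long time series.

    omega = (N * var(ensemble mean) - var(all values)) / ((N - 1) * var(all values))
    """
    n = len(members)
    if n < 2:
        raise ValueError("need at least two members")
    length = len(members[0])
    if length < 2 or any(len(m) != length for m in members):
        raise ValueError("members must be equally long time series")
    ensemble_mean = [fmean(values) for values in zip(*members)]  # mean across members at each time
    pooled = [v for m in members for v in m]                     # all N*T values together
    var_all = pvariance(pooled)
    if var_all == 0.0:
        raise ValueError("index undefined for constant data")
    return (n * pvariance(ensemble_mean) - var_all) / ((n - 1) * var_all)
```

Applying the function once to monthly means and once to monthly variances of the same data sets would express the contrast described above as two index values.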

Climate change impact assessment: Land-Future
A set of future land-only time slice simulations (2015-2100) will be generated using forcing data obtained from at least two future climate scenarios from ScenarioMIP (O'Neill et al., 2016) and will be executed at a later stage of CMIP6. Tentatively, the Shared Socioeconomic Pathways SSP5-8.5 and SSP4-3.7 will be selected, each run with three model realizations. The models will be chosen based on an evaluation of the historical simulations from the CMIP6 Nucleus, in order to represent the ensemble spread efficiently and reliably (Evans et al., 2013). To generate a set of ensemble forcing data for the future, a trend-preserving statistical bias correction method will be applied to the 3-hourly surface meteorological variables (Table A4) from the scenario output (Hempel et al., 2013; Watanabe et al., 2014). Gridded forcings will be provided in a data format similar to that of the historical simulations.
Land-Future is a Tier 2 experiment in LS3MIP and focuses on assessment of climate change impact (e.g., shifts of the occurrence of critical water availability due to changing statistical distributions of extreme events) and on the assessment of the land surface analogue of climate sensitivity for various key land variables (Perket et al., 2014;Flanner et al., 2011).
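The idea behind a trend-preserving correction can be shown with the simplest member of this family: a constant-offset (additive) correction for temperature and a constant-factor (multiplicative) correction for precipitation, both estimated over a historical reference period. Because the same offset or factor is applied to every future value, the model's absolute (temperature) or relative (precipitation) change is retained. This is a sketch under that simplifying assumption, not the method of Hempel et al. (2013), which works on monthly statistics and also adjusts variability; the function names are hypothetical.

```python
from statistics import fmean


def additive_correction(model_future: list[float], model_hist: list[float],
                        obs_hist: list[float]) -> list[float]:
    """Remove the historical mean bias; preserves the model's absolute change (e.g. temperature)."""
    offset = fmean(obs_hist) - fmean(model_hist)
    return [v + offset for v in model_future]


def multiplicative_correction(model_future: list[float], model_hist: list[float],
                              obs_hist: list[float]) -> list[float]:
    """Rescale by the historical mean ratio; preserves the model's relative change (e.g. precipitation)."""
    hist_mean = fmean(model_hist)
    if hist_mean <= 0.0:
        raise ValueError("historical model mean must be positive")
    factor = fmean(obs_hist) / hist_mean
    return [v * factor for v in model_future]
```

A multiplicative form is used for precipitation because an additive offset could produce negative amounts in dry periods.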

Prescribed land surface states in coupled models for land surface feedback assessment ("Land Feedback MIP", LFMIP)
Land surface processes do not act in isolation in the climate system; a tight coupling with the overlying atmosphere takes place on multiple temporal and spatial scales. A systematic assessment of the strength and spatial structure of land surface interaction at subcontinental and seasonal scales has been performed with the initial GLACE setup (GLACE1 and GLACE2 experiments; Koster et al., 2004), in which the spread in an ensemble simulation of a coupled land-atmosphere model was compared with that of a configuration in which land-atmosphere interaction was largely suppressed by prescribing soil conditions throughout the simulation in all members of the ensemble. The significance of land-atmosphere feedbacks at the centennial climate timescale was later explored at the regional scale in a single-model study (Seneviratne et al., 2006) and at the global scale in the GLACE-CMIP5 experiment with a small model ensemble. A protocol very similar to the design of GLACE-CMIP5 is followed in LFMIP. In parallel to a set of reference simulations taken from the CMIP6 DECK, a set of forced experiments is carried out in which land surface states are prescribed from, or nudged towards, fields derived from coupled simulations. The land surface states are prescribed or nudged at a daily timescale. This setup is similar to that of the Flux Anomaly Forced MIP (FAFMIP; Gregory et al., 2016), where the role of ocean-atmosphere interaction at climate timescales is diagnosed by idealized surface perturbation experiments.
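Operationally, prescribing or nudging a land water store amounts to replacing the model's prognostic value with a relaxation towards the prescribed field. A minimal sketch, assuming a single scalar store (e.g., column soil water in kg m-2), a daily target held fixed within the day and a hypothetical constant model tendency standing in for the model physics: with a relaxation time equal to the time step the target is imposed outright (prescription), while longer relaxation times give weaker nudging. The increments are returned so that the artificial water source or sink created by the procedure can be accounted for. All names are hypothetical, and actual implementations are model specific.

```python
def relax_towards(state: float, target: float, dt_s: float, tau_s: float) -> tuple[float, float]:
    """One relaxation step of a land water store towards a prescribed value.

    Returns (new_state, increment); the increment is water added (>0) or removed (<0)
    by the nudging itself rather than by any physical process.
    """
    if dt_s <= 0.0 or tau_s <= 0.0:
        raise ValueError("time step and relaxation time must be positive")
    weight = min(dt_s / tau_s, 1.0)  # weight 1 imposes the target outright
    increment = weight * (target - state)
    return state + increment, increment


def run_day(state: float, daily_target: float, tendency_per_step: float,
            steps_per_day: int = 48, tau_s: float = 86400.0) -> tuple[float, float]:
    """Integrate one day with a hypothetical constant model tendency plus nudging."""
    dt_s = 86400.0 / steps_per_day
    nudged_total = 0.0
    for _ in range(steps_per_day):
        state += tendency_per_step  # stand-in for model physics (e.g. evaporation loss)
        state, increment = relax_towards(state, daily_target, dt_s, tau_s)
        nudged_total += increment   # accumulated non-physical water source/sink
    return state, nudged_total
```

Tracking the accumulated increment makes the non-conservation inherent in prescribed-state experiments an explicit diagnostic rather than a hidden residual.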
While earlier experiments used model configurations with prescribed SST and sea ice conditions, the Tier 1 experiment in LFMIP will be based on coupled atmosphere-ocean global climate model (AOGCM) simulations and comprise simulations for historical and future (2015-2100) time ranges. The selection of the future scenario (from ScenarioMIP) will be based on the choices made in the offline LMIP experiment (see above). In GLACE-CMIP5 only soil moisture states were prescribed in the forced experiments. The configuration of a particular land surface model may require a different selection of land surface states to be prescribed, for instance to avoid strong inconsistencies in the case of frozen ground (soil moisture rather than soil water state should be prescribed; M. Hauser, ETH Zurich, personal communication, 2015), melting snow or growing vegetation. Prescribing only surface soil moisture (experiment "S" in Koster et al., 2006) gave unrealistic values of the surface Bowen ratio. Standardizing this selection is difficult because the implementation and its consequences may be highly model specific. Here we recommend prescribing only the water reservoirs (soil moisture, snow mass). The diversity of possible implementations adds to the uncertainty range generated by the model ensemble, similar to the way in which the implementation of land use, flux corrections or downscaling adds to this range. Participating modeling groups are encouraged to perform test simulations, focusing on both technical feasibility and experimental impact, to evaluate different procedures for prescribing land surface conditions.
Earlier experience with GLACE-type experiments has revealed a number of technical and scientific issues. Because in most GCMs the land surface module is an integral part of the atmospheric code, prescribing land surface dynamics requires a non-conventional technical interface that reads and replaces variables throughout the entire simulation. Many LS3MIP participants have taken part in earlier GLACE-type experiments, but for some groups the code adjustments will require some technical effort. Interpreting the effect of the different implementations of prescribed land surface variables across modeling groups (see above) will be aided by careful documentation of how each group has implemented this interface. Tight coordination and frequent exchange among the participating modeling groups on the technical details of the required forcing methods will be ensured during the preparatory phase of LS3MIP, in order to maximize the coherence of the modeling exercise and to facilitate the interpretation of the results.
By design, the prescribed land surface experiments do not fully conserve water and energy, similar to the setup of the Atmospheric Model Intercomparison Project (AMIP) and of nudged and data assimilation experiments. A systematic addition or removal of water or energy can even emerge from asymmetric land surface responses to dry and wet conditions, e.g., when surface evaporation or runoff depends strongly non-linearly on soil moisture or snow states (e.g., Jaeger and Seneviratne, 2011). Unrepresented processes (such as water extraction for irrigation or exchange with groundwater) may also lead to imbalances in the budget (Wada et al., 2012). This systematic alteration of the water and energy balance may not only perturb the simulated present-day climate (e.g., Douville, 2003; Douville et al., 2016) but may also interact with the projected climate change signal: altered climatological soil conditions can contribute to the climate-change-induced temperature or precipitation signal, and water imbalances can impose runoff changes that could affect ocean circulation and SSTs. Earlier GLACE-type experiments revealed that water conservation problems are often reduced when the prescribed soil water conditions are taken as the median rather than the mean of the sample from which the climatology is computed (Hauser et al., 2016). In the analyses of the experiments, this asymmetry and the lack of energy and water balance closure will be examined and put in the context of the climatological energy and water balance and its climatic trends.
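Computing the prescribed climatology as a median rather than a mean, as mentioned above, is a one-line change. The sketch below assumes daily data already arranged on a 365-day (no-leap) calendar as an array of shape (n_years, n_days); the function name and the toy numbers are illustrative only.

```python
import numpy as np


def daily_climatology(data, statistic="median"):
    """Climatological annual cycle from an (n_years, n_days) array.

    Axis 0 holds the sample of years over which the climatology is taken.
    """
    data = np.asarray(data, dtype=float)
    if statistic == "median":
        return np.median(data, axis=0)
    if statistic == "mean":
        return np.mean(data, axis=0)
    raise ValueError(f"unknown statistic: {statistic!r}")


# Toy volumetric soil moisture for two calendar days over five years;
# day 0 contains one anomalously wet year, day 1 is symmetric.
sample = np.array([
    [0.10, 0.20],
    [0.11, 0.21],
    [0.12, 0.22],
    [0.13, 0.23],
    [0.40, 0.24],
])
print(daily_climatology(sample, "mean"))    # day 0 pulled up by the wet year
print(daily_climatology(sample, "median"))  # day 0 insensitive to the outlier
```

For skewed samples the two statistics differ, while for symmetric samples they coincide. Whether the median actually limits budget drift in a given model is what the experiments are meant to show; the sketch only illustrates the computation.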
To quantify the forcing represented by prescribing the land surface state, the snow and soil moisture increments imposed by the prescription are requested as additional output. This will allow us to estimate the magnitude of the implicit water and energy fluxes introduced by the forcing procedure.
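As a rough illustration of how the requested increments translate into an implicit water flux, the sketch below averages daily increments of soil moisture plus snow mass (in kg m-2, where 1 kg m-2 of water equals 1 mm) and converts them to a flux. The function name, the once-per-day application and the 365-day year are assumptions; the energy counterpart is not shown.

```python
import numpy as np

SECONDS_PER_DAY = 86400.0
DAYS_PER_YEAR = 365.0  # assumes a no-leap calendar


def implied_water_flux(daily_increments):
    """Mean implicit water flux imposed by the forcing procedure.

    daily_increments: array of shape (n_days, ...) with the soil moisture plus
    snow mass increments (kg m-2) applied once per day; positive = water added.
    Returns (flux in kg m-2 s-1, equivalent total in mm per year).
    """
    inc = np.asarray(daily_increments, dtype=float)
    mean_per_day = inc.mean(axis=0)  # kg m-2 per day
    return mean_per_day / SECONDS_PER_DAY, mean_per_day * DAYS_PER_YEAR


flux, per_year = implied_water_flux([1.0, -0.5, 0.5, 1.0])
print(flux, per_year)
```

Comparing the annual total with the climatological precipitation or runoff of a region gives a first indication of whether the artificial source or sink is negligible.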
Complementary experiments with an almost identical setup to LFMIP, but restricting the prescription of land surface variables to snow-related variables and thus leaving soil moisture free-running, are carried out in the framework of ESM-SnowMIP within the WCRP Grand Challenge "Melting Ice and Global Consequences". Because ESM-SnowMIP is tightly linked to LS3MIP, these complementary experiments will allow the effects of soil moisture and snow feedbacks to be separated.

Tier 1 experiments in LFMIP
Similar to the setup of GLACE-CMIP5, the core experiments of LFMIP (Tier 1) evaluate two different sets of prescribed land surface conditions (snow and soil moisture):
-LFMIP-pdLC: transient coupled atmosphere-ocean simulations in which a selection of land surface characteristics is prescribed rather than calculated interactively by the model. This "climatological" land surface forcing is calculated as the mean annual cycle over the period 1980-2014 from the historical GCM simulations. The experiment aims at diagnosing the role of land-atmosphere feedback at the climate timescale. Seneviratne et al. (2013) found a substantial effect of changes in climatological soil moisture on projected temperature change in a future climate, both for seasonal mean and for daytime extreme temperature in summer. Effects on precipitation are less clear, and the multi-model nature of LS3MIP is designed to sharpen these quantitative estimates. In contrast to GLACE-CMIP5, LS3MIP will also account for a potential damping (or amplifying) effect of oceanic responses to altered land surface conditions. Experiments using this setup (i.e., with a coupled ocean) in a single-model study showed that the results could be slightly affected by the inclusion of an interactive ocean, although the effects were not found to be large overall.
-LFMIP-rmLC: a prescribed climatology using a transient 30-yr running mean; comparison with the standard CMIP6 runs allows diagnosis of shifts in the regions of strong land-atmosphere coupling, as documented by, e.g., Seneviratne et al. (2006), and of shifts in potential predictability related to land surface states.
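The two Tier 1 forcing fields can be sketched from a historical run stored as an (n_years, n_days) array. The fixed 1980-2014 window follows the text; centring the 30-year window and shortening it at the ends of the record are assumptions of this sketch, since the protocol text does not specify how the running mean is positioned or how the record edges are treated. Function names and the synthetic data are hypothetical.

```python
import numpy as np


def pdlc_forcing(data, years, start=1980, end=2014):
    """Mean annual cycle over start..end from an (n_years, n_days) array."""
    years = np.asarray(years)
    selected = (years >= start) & (years <= end)
    return np.asarray(data, dtype=float)[selected].mean(axis=0)


def rmlc_forcing(data, window=30):
    """Transient running-mean annual cycle, one climatology per year.

    Assumption: the window is centred on each year (years i-15 .. i+14 for
    window=30) and simply shortened where it runs past the record ends.
    """
    data = np.asarray(data, dtype=float)
    n_years = data.shape[0]
    half = window // 2
    out = np.empty_like(data)
    for i in range(n_years):
        lo = max(0, i - half)
        hi = min(n_years, i - half + window)
        out[i] = data[lo:hi].mean(axis=0)
    return out


years = np.arange(1850, 2101)
doy = np.arange(365)
# Synthetic soil moisture (kg m-2): linear drying trend plus a seasonal cycle.
sm = (300.0 - 0.2 * (years - 1850))[:, None] + 20.0 * np.sin(2 * np.pi * doy / 365.0)[None, :]

pd_clim = pdlc_forcing(sm, years)      # one annual cycle, shape (365,)
rm_clim = rmlc_forcing(sm, window=30)  # one annual cycle per year, shape (251, 365)
```

The pdLC field is the same every year, so any drift in the land state is removed; the rmLC field follows the slow trend while still suppressing interannual land variability.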
Both sets of simulations cover the historical period (1850-2014) and extend to 2100, based on a forcing scenario to be identified at a later stage. The procedure for initializing the land surface states in the ensemble members is left to the participants, but it should generate sufficient spread to be considered representative of the climate system under study. Koster et al. (2006) proposed a preference hierarchy of methods depending on the availability of initialization fields, and LS3MIP will follow this proposal. Output at high temporal resolution (daily, as well as subdaily for some fields and time slices) is required to address the role of land surface-climate feedbacks in climate extremes over land.
Multi-member experiments are encouraged, but the mandatory tier 1 simulations are limited to one realization for each of the two prescribed land surface time series described above.

Tier 2 experiments in LFMIP
To analyze a number of additional features of land-atmosphere feedbacks, a collection of Tier 2 simulations is proposed in LS3MIP.
-Simulations with observed SST -The Tier 1 AOGCM simulations are repeated with the atmospheric global climate model (AGCM) in the prescribed-SST configuration of the DECK AMIP runs, in order to isolate the role of the ocean in propagating and damping or reinforcing land surface effects on climate (Koster et al., 2000). Both the present-day climatology and the running-mean land surface simulations are requested (LFMIP-pdLC + SST and LFMIP-rmLC + SST, respectively).
-Simulations with observed SST and Land-Hist output -A "pseudo-observed boundary condition" set of experiments uses the AMIP SSTs and the Land-Hist land boundary conditions generated by the land surface model of the participating ESM, leading to simulations driven by surface fields that are strongly controlled by observed forcings. These experiments cover only the historical period (1901-2014) (LFMIP-PObs + SST). For this purpose, the land-only simulations from LMIP need to be interpolated to the native GCM grid, preserving land-sea boundaries and other characteristics.
-Separate effects of soil moisture and snow, and role of additional land parameters and variables -Additional experiments in which only snow, snow albedo or soil moisture is prescribed will be conducted to assess the respective feedbacks in isolation and to control for possible interactions between snow cover and soil moisture content. Vegetation parameters and variables (e.g., leaf area index, canopy height and thickness) are also considered. These experiments are not listed in Table 1, but will be detailed in a follow-up protocol.
-Fixed land use conditions -In conjunction with the Land Use MIP (LUMIP), a repetition of the Tier 1 experiment under fixed 1850 land cover and land use conditions highlights the role of soil moisture in modulating the climate response to land cover and land use (not listed in Table 1).
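Mapping LMIP land-only output onto the native GCM grid while keeping land-sea boundaries intact can be done in several ways. The sketch below uses the simplest one, a nearest-neighbour search restricted to source land cells, so that no ocean value leaks onto GCM land. Grids are given as flattened 1-D arrays of cell centres; the function names are hypothetical, and a production workflow would more likely use conservative remapping to preserve area-integrated water and energy amounts.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between points given in degrees."""
    lat1, lon1, lat2, lon2 = (np.radians(v) for v in (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2.0 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


def remap_land_nearest(src_lat, src_lon, src_val, src_land,
                       dst_lat, dst_lon, dst_land):
    """Fill every destination land cell from the nearest source land cell.

    Destination ocean cells are set to NaN. All arguments are 1-D arrays
    of cell centres (degrees), values, or boolean land masks.
    """
    src_land = np.asarray(src_land, dtype=bool)
    dst_land = np.asarray(dst_land, dtype=bool)
    s_lat = np.asarray(src_lat, dtype=float)[src_land]
    s_lon = np.asarray(src_lon, dtype=float)[src_land]
    s_val = np.asarray(src_val, dtype=float)[src_land]
    dst_lat = np.asarray(dst_lat, dtype=float)
    dst_lon = np.asarray(dst_lon, dtype=float)

    out = np.full(dst_lat.shape, np.nan)
    for j in np.flatnonzero(dst_land):
        dist = great_circle_km(dst_lat[j], dst_lon[j], s_lat, s_lon)
        out[j] = s_val[np.argmin(dist)]
    return out


# Toy example: the third source cell is ocean and must not be used.
print(remap_land_nearest(
    src_lat=[0.0, 0.0, 0.0], src_lon=[0.0, 10.0, 20.0],
    src_val=[1.0, 2.0, 99.0], src_land=[True, True, False],
    dst_lat=[0.0, 0.0, 0.0, 0.0], dst_lon=[1.0, 9.0, 19.0, 30.0],
    dst_land=[True, True, True, False],
))
```

The third destination cell lies closest to a source ocean cell, yet it receives the value of the nearest land cell; this is the "preserving land-sea boundaries" property in its crudest form.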

Prescribed land surface states derived from pseudo-observations (LFMIP-Pobs)
The use of LMIP (land-only simulations) to initialize the AOGCM experiments (LFMIP) enables a set of predictability experiments in line with the GLACE2 setup (Koster et al., 2010a). The LFMIP-Pobs experiment extends GLACE2 by (a) allowing more models to participate, (b) improving the statistics by extending the original 1986-1995 record to 1980-2014, (c) evaluating the quality of newly available land surface forcings and (d) running the experiments in AOGCM mode. Koster et al. (2010a) and van den Hurk et al. (2012) concluded that the improvement in forecast skill from realistic soil moisture initialization was relatively low. Possible causes of this low skill are the limited record length and the limited quality of the (precipitation) observations used to generate the soil conditions. These issues are explicitly addressed in LFMIP-Pobs. All LFMIP-Pobs experiments are Tier 2, which also leaves room for additional design elements such as the evaluation of various observational data sources (e.g., for snow mass (snow water equivalent, SWE) or snow albedo, using satellite-derived, reanalysis and land surface model products). The predictability assessments include evaluating the contribution of snow cover melt and its related feedbacks to the underestimation of recent boreal polar warming by climate models.
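In the GLACE2 setup, as we understand it, forecast skill was measured as the squared correlation (r²) between forecast ensemble-mean anomalies and observed anomalies, and the land contribution as the difference between forecasts with realistic and with non-informative land initialization. A sketch of that comparison follows; the detailed LFMIP-Pobs metric is still to be defined, so this is one plausible choice, and the function names and toy series are hypothetical.

```python
import numpy as np


def r_squared(forecast, observed):
    """Squared Pearson correlation between forecast and observed anomalies."""
    r = np.corrcoef(forecast, observed)[0, 1]
    return r * r


def land_skill_gain(fc_realistic_init, fc_uninformed_init, observed):
    """Skill with realistic land initialization minus skill without it.

    Each argument is a 1-D series over forecast start dates, e.g.
    ensemble-mean temperature anomalies at one grid cell and lead time.
    """
    return (r_squared(fc_realistic_init, observed)
            - r_squared(fc_uninformed_init, observed))


obs = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
gain = land_skill_gain(2.0 * obs, np.array([1.0, -1.0, 1.0, -1.0, 1.0]), obs)
print(gain)
```

In the toy case the realistically initialized forecast is perfectly correlated with the observations and the other is uncorrelated, so the gain is close to 1; real gains reported for GLACE2 were far smaller.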
The experimental protocol (number of simulation years, ensemble size, initialization, model configuration, output diagnostics) strongly affects the results of the experiment (e.g., Guo and Dirmeyer, 2013). The careful design of LFMIP-Pobs required for a successful implementation has not yet been completed. These experiments are therefore listed as Tier 2 in Table 1, with the note that the detailed experimental protocol still needs to be defined.

Analysis strategy
LS3MIP is designed to push the land surface component of climate models, observational data sets and projections to a higher level of maturity. Understanding the propagation of model and forecast errors and the design of model parameterizations is essential to realize this goal. The LS3MIP steering group is a multi-disciplinary team (climate modelers, snow and soil moisture model specialists, experts in local and remotely sensed data of soil moisture and snow properties) that ensures that the experiment setups, model evaluations and analyses/interpretations of the results are pertinent.
For both snow and soil moisture, the starting point will be a careful analysis of model results from (a) the DECK historical simulations (both the AMIP and the coupled historical simulations) and (b) the offline LMIP historical simulations.
For the evaluation of snow in the models, large-scale, high-quality data sets of snow mass (SWE) and snow cover extent (SCE) with quantitative uncertainty characteristics will be provided by the Satellite Snow Product Intercomparison and Evaluation Experiment (SnowPEX). Analysis within SnowPEX is providing the first evaluation of satellite-derived snow extent (15 participating data sets) and of SWE derived from satellite measurements, land surface assimilation systems, physical snow models and reanalyses (7 participating data sets). Internal consistency between products and biases relative to independent reference data sets are being derived with standardized and consistent protocols. Variability and trends in terrestrial snow cover extent and mass were previously evaluated for CMIP3 and CMIP5 by, e.g., Brown and Mote (2009), Derksen and Brown (2012) and Brutel-Vuilmet et al. (2013). Those assessments were based on single observational data sets and hence provide no perspective on observational uncertainty or on spread relative to multi-model ensembles; the standardized multi-source data sets generated by SnowPEX will allow assessment against a multi-data-set observational ensemble (e.g., Mudryk et al., 2015). For snow albedo, multiple satellite-derived data sets are available, including 16-day MODIS data from 2001 to present, the ESA GlobAlbedo product, the recently updated twice-daily APP-x product, and a derivation of the snow shortwave radiative effect for 2001-2013 (Singh et al., 2015). Satellite retrievals of snow cover fraction in forested and mountainous areas remain an area of uncertainty (Thackeray et al., 2015), which affects essential diagnostics of the climate sensitivity of snow cover (Qu and Hall, 2014; Fletcher et al., 2012).
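Evaluating a model against a multi-data-set observational ensemble, rather than against a single product, can be sketched as follows: the spread among products defines an observational range, and the model is scored by how often it falls inside it. The helper names and the min-max envelope criterion are assumptions for illustration; SnowPEX may characterize product uncertainty differently.

```python
import numpy as np


def obs_ensemble_stats(products):
    """Mean, sample standard deviation, minimum and maximum across products.

    products: array (n_products, n_times), e.g. monthly SWE from several
    observational data sets averaged over a common region.
    """
    p = np.asarray(products, dtype=float)
    return p.mean(axis=0), p.std(axis=0, ddof=1), p.min(axis=0), p.max(axis=0)


def fraction_within_obs_range(model, products):
    """Fraction of time steps where the model lies inside the product envelope."""
    _, _, lo, hi = obs_ensemble_stats(products)
    model = np.asarray(model, dtype=float)
    return float(np.mean((model >= lo) & (model <= hi)))


products = [[1.0, 2.0, 3.0],
            [2.0, 3.0, 4.0],
            [3.0, 4.0, 5.0]]
model = [2.0, 5.0, 4.0]
print(fraction_within_obs_range(model, products))
```

A model that sits outside the envelope only occasionally is hard to call biased, whereas a comparison against a single product would flag every departure.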
For soil moisture, land hydrology and vegetation state, several observation-based data sets will be used to evaluate the coupled DECK simulations and the offline land experiments. These include the first multi-decadal satellite-based global soil moisture record (Essential Climate Variable Soil Moisture, ECVSM; Dorigo et al., 2012), long-term (2002-2015) records of terrestrial water storage from the GRACE mission (Rodell et al., 2009; Reager et al., 2016; Kim et al., 2009), the multi-product LandFlux-EVAL evapotranspiration synthesis (Mueller et al., 2013), multi-decadal satellite retrievals of the Fraction of Absorbed Photosynthetically Active Radiation (FPAR; e.g., Zscheischler et al., 2015), and upscaled FLUXNET-based products (Jung et al., 2010).
Several details of snow and soil moisture dynamics can be inferred indirectly from river discharge (Orth et al., 2013; Zampieri et al., 2015). Variables simulated by the routing schemes included in the land surface models can be compared with station data available from the Global Runoff Data Centre (GRDC). Combined use of in situ discharge observations and GRACE terrestrial water storage changes will verify how the land surface simulations partition the terms of the water balance equation (i.e., precipitation, evapotranspiration, runoff and water storage changes) (Kim et al., 2009).
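The partitioning check described above rests on the basin water balance P - ET - R = dS. The sketch computes the residual assuming basin-mean monthly totals in mm and storage values at the start and end of each month; GRACE actually provides monthly mean anomalies, so in practice an extra step is needed to derive month-to-month changes. Function names are hypothetical.

```python
import numpy as np

SECONDS_PER_DAY = 86400.0


def discharge_to_mm(q_m3_per_s, basin_area_km2, n_days):
    """Convert mean discharge at a gauge to basin-average runoff depth in mm."""
    volume_m3 = np.asarray(q_m3_per_s, dtype=float) * n_days * SECONDS_PER_DAY
    area_m2 = basin_area_km2 * 1.0e6
    return volume_m3 / area_m2 * 1000.0  # metres of water -> mm


def water_balance_residual(precip_mm, evap_mm, runoff_mm, storage_mm):
    """Residual P - ET - R - dS per month; zero for a closed balance.

    storage_mm has one more entry than the flux series (start and end of
    each month), so np.diff gives the storage change of each month.
    """
    d_storage = np.diff(np.asarray(storage_mm, dtype=float))
    return (np.asarray(precip_mm, dtype=float) - np.asarray(evap_mm, dtype=float)
            - np.asarray(runoff_mm, dtype=float) - d_storage)


print(discharge_to_mm(1.0, 864.0, 10))  # 1 m3/s for 10 days over 864 km2 is 1 mm
print(water_balance_residual([100.0, 80.0], [50.0, 60.0], [30.0, 30.0],
                             [200.0, 220.0, 210.0]))  # closed balance -> zeros
```

A persistent non-zero residual in a simulation points either to a model imbalance or, for the prescribed-land experiments, to the implicit fluxes introduced by the forcing.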
The coupled LS3MIP (LFMIP) simulations will be analyzed in concert with the control runs to quantify various climatic effects of snow and soil moisture, detect systematic biases and diagnose feedbacks. Anticipated analyses include the following.
-Drivers of variability at multiple timescales -Comparison of the simulations with prescribed soil moisture and snow (LFMIP-pdLC) with the reference simulations allows quantification of the impact of land surface state variability on the variability of climate variables such as temperature, relative humidity, cloudiness, precipitation and river discharge at several timescales. The LFMIP-rmLC simulation allows evaluation of this contribution on seasonal timescales, and of changes in the patterns of high and low land surface impact in a future climate. In particular, the focus will be on impacts on climate extremes (temperature extremes, heavy precipitation events; see e.g., Seneviratne et al., 2013) and on the possible role of land-based feedbacks in amplifying regional climate responses relative to changes in global mean temperature. A secondary focus will be on the impacts of snow and soil moisture variability on extremes of river discharge, which can be related to large-scale floods and to the nonlocal propagation of drought signals. These aspects will be analyzed in the context of water management, and to quantify feedbacks of river discharge on the climate system (through discharge into the oceans; Materia et al., 2012; Carmack et al., 2015) and on the carbon cycle (through methane produced in flooded areas; Meng et al., 2015).
-Attribution of model disagreement -The multi-model setup of the experiment allows closer inspection of the effects of modeled soil moisture and snow (and related processes such as plant transpiration, photosynthesis or snowmelt) on simulated land temperature, precipitation, runoff, vegetation state and gross primary production. The comparison of LFMIP-pdLC and LFMIP-rmLC will be useful to isolate model disagreement in land surface feedbacks potentially induced by including coupling to a dynamic ocean despite similar land response to climate change.
-Emergent constraints -The annual cycle of snow cover and local temperature (Qu and Hall, 2014) and the relation between global mean temperature fluctuations and CO2 concentration provide observational constraints on the snow-albedo and carbon-climate feedbacks, respectively. Similar emergent constraints may be defined for (regional) soil moisture- or snow-related feedbacks on temperature or on hydrological processes, for instance the timing of spring onset, which may be related to snowmelt, spring river discharge (Zampieri et al., 2015) and vegetation phenology (Xu et al., 2013). Using appropriate observations and diagnostics as emergent constraints will reduce uncertainties in projections of mean climate and extremes (heat extremes, droughts, floods) (Hoffman et al., 2014). Analysis of the amplitude and timing of the seasonality of hydrological and ecosystem processes will provide additional diagnostics.
-Attribution of model bias -A positive relationship between model temperature bias in the current climate and (regional) climate response can partly be attributed to the soil moisture-climate feedback, which acts on both the seasonal and the climate timescale (Cheruy et al., 2014). LS3MIP enables a multi-model assessment of this relationship. The comparison of AMIP-DECK, LFMIP-CA and LFMIP-LCA will be used to assess the impact of atmosphere-related errors in land boundary conditions on AGCM biases.
-Changes in feedback hotspots and predictability patterns -Land surface conditions do not exert a uniform influence on the atmosphere across the globe: there is a distribution of strong interaction "hotspots" and of areas where the land surface contributes strongly to potential predictability (e.g., Koster et al., 2004). These patterns may change in a future climate (e.g., Seneviratne et al., 2006). A multi-model assessment such as the one foreseen in LS3MIP allows mapping changes in these patterns, with implications for the occurrence of droughts, heat waves, irrigation limitations or river discharge anomalies and their predictability.
-Snow shortwave radiative effect analysis -The snow shortwave radiative effect (SSRE) can be diagnosed through parallel calculations of surface albedo and shortwave fluxes with and without model snow on the ground or in the vegetation canopy (Perket et al., 2014). This metric provides a precise, overarching measure of the snow-induced perturbation to solar absorption in each model, integrating over the variable influences of vegetation masking, snow grain size, snow cover fraction, soot content, etc. SSRE is analogous to the widely used cloud radiative effect diagnostic, and its time evolution provides a measure of snow albedo feedback in the context of changing climate (Flanner et al., 2011). We recommend that the diagnostic snow shortwave radiative effect (SSRE) calculation be implemented in standard LS3MIP simulations (Tiers 1 and 2). This will enable us to evaluate the integrated effect of model snow cover on surface radiative fluxes.
-Complementary snow-related offline experiments -Additional offline experiments are enabled by the provision of a collection of localized forcing data in the Land-Hist experiment (see above). For snow, a network of well-equipped sites is analyzed in detail for characteristic features (for example, snow-vegetation interactions for taiga snow, wind-driven processes for tundra snow, and snow-rain partitioning for maritime snow). Reference simulations at these sites, consistent with previous SnowMIP experiments (Essery et al., 2009), will be complemented by additional experiments with (1) a fixed snow albedo and (2) the insulating properties of snow removed, in order to isolate the contributions of snow to the surface energy budget and the ground thermal regime. This will be implemented within the ESM-SnowMIP initiative, aimed at improving our understanding of the sources of coupled model biases (global offline and site-scale experiments) in order to identify priority avenues for future model development.
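One simple way to express the land contribution named under "Drivers of variability at multiple timescales" is the fraction of variance that disappears when land states follow a climatology instead of evolving freely. The sketch assumes detrended series of the same variable at one grid cell from a reference run and from LFMIP-pdLC; the metric and the function name are illustrative choices, not the LS3MIP analysis protocol.

```python
import numpy as np


def land_variance_fraction(reference, prescribed, axis=0):
    """1 - var(prescribed) / var(reference) along the time axis.

    Positive values mean that interactive land states add variability.
    Inputs are assumed to be detrended (e.g. interannual summer means).
    """
    v_ref = np.var(np.asarray(reference, dtype=float), axis=axis, ddof=1)
    v_pre = np.var(np.asarray(prescribed, dtype=float), axis=axis, ddof=1)
    return 1.0 - v_pre / v_ref


ref = np.array([0.0, 2.0, 4.0, 6.0, 8.0])   # toy reference-run series
pdlc = np.array([0.0, 1.0, 2.0, 3.0, 4.0])  # toy series with climatological land
print(land_variance_fraction(ref, pdlc))
```

Because `axis` defaults to time, the same call works on (time, lat, lon) arrays and yields a map of the land contribution.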
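The logic of the "Emergent constraints" analysis can be illustrated with an across-model linear regression of a projected quantity on an observable present-day quantity, evaluated at the observed value. This is a bare-bones sketch with made-up numbers and a hypothetical function name; a real application would also propagate observational and regression uncertainty.

```python
import numpy as np


def emergent_constraint(x_models, y_models, x_obs):
    """Constrained estimate of y from an across-model linear relation y = a + b x.

    x_models: present-day observable per model (e.g. a seasonal-cycle metric);
    y_models: projected feedback or change per model; x_obs: observed value.
    Returns (constrained y, slope b, across-model correlation r).
    """
    x = np.asarray(x_models, dtype=float)
    y = np.asarray(y_models, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)  # coefficients, highest power first
    r = np.corrcoef(x, y)[0, 1]
    return intercept + slope * x_obs, slope, r


y_hat, b, r = emergent_constraint([1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.2, 7.8], x_obs=2.5)
print(y_hat, b, r)
```

The constraint is only meaningful when the across-model correlation is strong and physically explained; otherwise the regression merely reflects how the ensemble happens to be sampled.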
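The coupling "hotspots" referred to under "Changes in feedback hotspots and predictability patterns" were mapped in GLACE with the similarity index Omega. This index compares the variance of the ensemble mean with the total variance across members. Coupling strength is then the difference in Omega between an ensemble with prescribed (identical) land states and one with interactive land. The sketch implements that definition for one grid cell; the text does not state whether LS3MIP will use this exact diagnostic, and the synthetic ensembles are illustrative only.

```python
import numpy as np


def glace_omega(ensemble):
    """GLACE-type similarity index for an (n_members, n_times) array.

    Omega = (N * var(ensemble mean) - var(all values)) / ((N - 1) * var(all)).
    It is 1 when all members are identical and near 0 when they are
    mutually independent with no common signal.
    """
    x = np.asarray(ensemble, dtype=float)
    n = x.shape[0]
    var_mean = np.var(x.mean(axis=0))
    var_all = np.var(x)
    return (n * var_mean - var_all) / ((n - 1) * var_all)


def coupling_strength(ensemble_prescribed, ensemble_interactive):
    """Delta Omega = Omega(prescribed land) - Omega(interactive land)."""
    return glace_omega(ensemble_prescribed) - glace_omega(ensemble_interactive)


rng = np.random.default_rng(42)
signal = rng.normal(size=100)  # common response forced by the prescribed land state
prescribed = signal + 0.5 * rng.normal(size=(16, 100))
interactive = rng.normal(size=(16, 100))  # no common land-forced signal
print(coupling_strength(prescribed, interactive))
```

In the toy setup the prescribed ensemble shares a signal with four times the variance of the member noise, so its Omega is near 0.8 while the interactive ensemble's is near 0.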
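For the "Snow shortwave radiative effect analysis", a simplifying assumption helps: if the downwelling shortwave flux is the same in the two parallel calculations, the SSRE reduces to the downwelling flux times the albedo difference between snow-free and snow-covered conditions. It is therefore negative where snow brightens the surface. The sketch also forms an area-weighted mean with cos(latitude) weights. A model's radiation scheme would compute both albedos internally, so the function names and inputs here are illustrative.

```python
import numpy as np


def snow_shortwave_radiative_effect(sw_down, albedo_with_snow, albedo_snow_free):
    """SSRE (W m-2) = SW_net(with snow) - SW_net(without snow).

    With SW_net = SW_down * (1 - albedo) and identical SW_down in both
    calculations, this equals SW_down * (albedo_snow_free - albedo_with_snow).
    """
    sw = np.asarray(sw_down, dtype=float)
    return sw * (np.asarray(albedo_snow_free, dtype=float)
                 - np.asarray(albedo_with_snow, dtype=float))


def area_weighted_mean(field, lat_deg):
    """Mean over cells on a regular lat-lon grid, weighted by cos(latitude)."""
    return np.average(field, weights=np.cos(np.radians(lat_deg)))


ssre = snow_shortwave_radiative_effect(
    sw_down=[200.0, 150.0], albedo_with_snow=[0.7, 0.3], albedo_snow_free=[0.2, 0.3]
)
print(ssre, area_weighted_mean(ssre, lat_deg=[60.0, 0.0]))
```

Tracking this quantity over time, as noted for the SSRE diagnostic, gives a measure of snow albedo feedback analogous to the cloud radiative effect.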
Regarding the snow analyses, the initial geographical focus of LS3MIP is on the continental snow cover of both hemispheres, both in ice-free areas (Northern Eurasia and North America) and on the large ice sheets (Greenland and Antarctica). Effects of snow on sea ice, and the quality of the representation of snow on sea ice in climate models, will be explored later; they are of interest because of the strong recent decline of Arctic sea ice and the potential amplifying effect of earlier spring snowmelt over land. For soil moisture, the geographical focus is on all land areas, with special interest in agricultural locations with strong land-atmosphere interaction (transition zones between wet and dry climates), extensive irrigation areas, and high interannual variability of warm season climate in densely populated areas.
The analyses are carried out on a standardized model output data set. A summary of the requested output data is given in tables in the Appendix.

Time line, participating models and interaction strategy
The offline land surface experiments (Land-Hist) are expected to be completed in early 2017. Future time slices can only be performed once the ScenarioMIP results become available. All coupled LS3MIP simulations and their subsequent analyses will be carried out after the completion of the DECK and historical 20th century simulations, expected by mid-2017. Table 2 lists the participating Earth system modeling groups. The organizational structure of LS3MIP relies on the active participation of modeling groups. Coordination structures are in place for the collection and dissemination of data and model results, and for the organization of meetings and seminars (by the core team members of LS3MIP, the first six authors of this manuscript). Unlike earlier experiments such as GSWP2 and GLACE1/2, LS3MIP has no central "analysis group" responsible for the analyses proposed in this manuscript. The execution and publication of analyses is considered a community effort of participating researchers, coordinated to avoid duplication of effort and to organize the production of scientific papers.
6 Discussion: expected outcome and impact of LS3MIP
The treatment of the land surface in the current generation of climate models plays a critical role in the assessment of potential effects of widespread changes in radiative forcing, land use and biogeochemical cycles. The land surface both "receives" climatic variations (through its atmospheric forcing) and "returns" these variations as feedbacks or as land surface features that are highly relevant to the people living on it. The strong coupling between land surface, atmosphere, hydrosphere and cryosphere makes an analysis of its performance characteristics challenging: the response and the state of the land surface depend strongly on the climatological context, and metrics of interactions or feedbacks are difficult to define and observe (van den Hurk et al., 2011). LS3MIP addresses these challenges by building on earlier diagnostic studies and experimental designs. Within the limits to which complex models such as ESMs can be evaluated with currently available observational evidence (see e.g., the philosophical discussion of climate model evaluation by Lenhard and Winsberg, 2010), it will enhance understanding of the contribution of land surface treatment to overall climate model performance; give inspiration on how to optimize land surface parameterizations or their forcing; support the development of better forecasting tools, where initial conditions affect the trajectory of the forecast and can be used to optimize forecast skill; and, last but not least, provide a better historical picture of the evolution of our vital water resources during the past century. In particular, LS3MIP will provide a solid benchmark for assessing water- and climate-related risks and their trends.
Given the critical importance of changes in land water availability and of impacts of changes in snow, soil moisture and land surface states for the projected evolution of climate mean and extremes, we expect that LS3MIP will help the research community make fundamental advances in this area.

Data availability
The offline forcing data for the Land-Hist experiments and the output from the model simulations described in this paper will be distributed through the Earth System Grid Federation (ESGF) with digital object identifiers (DOIs) assigned. The model output required for LS3MIP is listed in the Appendix. Model data distributed via ESGF will be freely accessible through data portals after registration. This infrastructure makes it possible to carry out the experiments in a distributed manner and to allow later participation of additional modeling groups. Links to all forcing data sets will be made available via the CMIP Panel website. Information about accreditation, data infrastructure, metadata structure, citation and acknowledgment is provided by .
B. van den Hurk et al.: LS3MIP (v1.0) contribution to CMIP6

Appendix A: Output data tables requested for LS3MIP
Table A1. Variable request table "LEday": daily variables related to the energy cycle. Priority index (p*) in column 1 indicates 1: "Mandatory" and 2: "Desirable". The dimension (dim.) column indicates T: time, Y: latitude, X: longitude, and Z: soil or snow layers. "Direction" identifies the direction of positive numbers.