Geoscientific Model Development An interactive open-access journal of the European Geosciences Union
Volume 9, issue 12 | Copyright
Geosci. Model Dev., 9, 4381-4403, 2016
https://doi.org/10.5194/gmd-9-4381-2016
© Author(s) 2016. This work is distributed under
the Creative Commons Attribution 3.0 License.

Development and technical paper 07 Dec 2016

Evaluating lossy data compression on climate simulation data within a large ensemble

Allison H. Baker (1), Dorit M. Hammerling (1), Sheri A. Mickelson (1), Haiying Xu (1), Martin B. Stolpe (2), Phillipe Naveau (3), Ben Sanderson (1), Imme Ebert-Uphoff (4), Savini Samarasinghe (4), Francesco De Simone (5), Francesco Carbone (5), Christian N. Gencarelli (5), John M. Dennis (1), Jennifer E. Kay (6), and Peter Lindstrom (7)
  • (1) The National Center for Atmospheric Research, Boulder, CO, USA
  • (2) Institute for Atmospheric and Climate Science, ETH Zurich, Zurich, Switzerland
  • (3) Laboratoire des Sciences du Climat et l'Environnement, Gif-sur-Yvette, France
  • (4) Department of Electrical and Computer Engineering, Colorado State University, Fort Collins, CO, USA
  • (5) CNR-Institute of Atmospheric Pollution Research, Division of Rende, UNICAL-Polifunzionale, Rende, Italy
  • (6) Department of Oceanic and Atmospheric Sciences, University of Colorado, Boulder, CO, USA
  • (7) Center for Applied Scientific Computing, Lawrence Livermore National Laboratory, Livermore, CA, USA

Abstract. High-resolution Earth system model simulations generate enormous data volumes, and retaining the data from these simulations often strains institutional storage resources. Further, these exceedingly large storage requirements negatively impact science objectives, for example, by forcing reductions in data output frequency, simulation length, or ensemble size. To lessen data volumes from the Community Earth System Model (CESM), we advocate the use of lossy data compression techniques. While lossy data compression does not exactly preserve the original data (as lossless compression does), lossy techniques have an advantage in terms of smaller storage requirements. To preserve the integrity of the scientific simulation data, the effects of lossy data compression on the original data should, at a minimum, not be statistically distinguishable from the natural variability of the climate system, and previous preliminary work with data from CESM has shown this goal to be attainable. However, to ultimately convince climate scientists that it is acceptable to use lossy data compression, we provide climate scientists with access to publicly available climate data that have undergone lossy data compression. In particular, we report on the results of a lossy data compression experiment with output from the CESM Large Ensemble (CESM-LE) Community Project, in which we challenge climate scientists to examine features of the data relevant to their interests, and attempt to identify which of the ensemble members have been compressed and reconstructed. We find that while detecting distinguishing features is certainly possible, the compression effects noticeable in these features are often unimportant or disappear in post-processing analyses. In addition, we perform several analyses that directly compare the original data to the reconstructed data to investigate the preservation, or lack thereof, of specific features critical to climate science. 
Overall, we conclude that applying lossy data compression to climate simulation data is both advantageous in terms of data reduction and generally acceptable in terms of effects on scientific results.
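
The criterion stated in the abstract, that compression effects should not be distinguishable from the natural variability of the climate system, can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the ensemble is synthetic random data, `quantize` is a toy uniform quantizer standing in for a real lossy compressor, and the root-mean-square z-score in `rmsz` is only one possible way to compare a member against the ensemble spread. It is not the procedure or the compressor used in the paper.

```python
import numpy as np


def quantize(field, step):
    """Toy lossy 'compressor' (assumption): snap values to a grid of width `step`."""
    return np.round(field / step) * step


def rmsz(member, others):
    """Root-mean-square z-score of one member against the remaining members."""
    mean = others.mean(axis=0)
    std = others.std(axis=0, ddof=1)  # sample standard deviation per grid point
    return float(np.sqrt(np.mean(((member - mean) / std) ** 2)))


rng = np.random.default_rng(0)

# Hypothetical ensemble: 30 members on a 40 x 60 grid of synthetic temperatures (K).
ensemble = 288.0 + rng.normal(0.0, 1.0, size=(30, 40, 60))

original = ensemble[0]
reconstructed = quantize(original, step=0.01)

# Pointwise error introduced by the toy compressor (bounded by step / 2).
max_abs_error = float(np.max(np.abs(original - reconstructed)))

# How far each version of the member sits from the rest of the ensemble.
score_original = rmsz(original, ensemble[1:])
score_reconstructed = rmsz(reconstructed, ensemble[1:])

print(f"max abs error:        {max_abs_error:.4f}")
print(f"RMSZ (original):      {score_original:.4f}")
print(f"RMSZ (reconstructed): {score_reconstructed:.4f}")
```

If the two scores are nearly identical, the reconstructed member is as typical of the ensemble as the original was, which is the sense in which the compression error is small relative to ensemble variability. A real evaluation would use actual model output and a real compressor, and would examine many variables and derived quantities rather than a single summary score.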

Short summary
We apply lossy data compression to output from the Community Earth System Model Large Ensemble Community Project. We challenge climate scientists to examine features of the data relevant to their interests and identify which of the ensemble members have been compressed, and we perform direct comparisons on features critical to climate science. We find that applying lossy data compression to climate model data effectively reduces data volumes with minimal effect on scientific results.