Articles | Volume 12, issue 9
https://doi.org/10.5194/gmd-12-4099-2019
Development and technical paper
23 Sep 2019

Evaluation of lossless and lossy algorithms for the compression of scientific datasets in netCDF-4 or HDF5 files

Xavier Delaunay, Aurélie Courtois, and Flavien Gouillon

Short summary
This research aimed to find a compression method suitable for the ground processing of CFOSAT and SWOT satellite datasets. Because lossless algorithms did not achieve sufficient compression, we turned to lossy alternatives. This work introduces the digit rounding algorithm, which reduces the volume of scientific datasets by keeping only the significant digits of each sample value. The number of digits kept is defined relative to each sample's own magnitude, so that small and large values are preserved with similar relative precision.
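The digit rounding idea can be illustrated with a small Python sketch. It is hypothetical and is not the authors' implementation: the function names `digit_round` and `trailing_zero_bits`, the parameter `nsd` (number of significant digits to keep), and the synthetic input signal are assumptions made for illustration. For each sample, the sketch counts the decimal digits d before the decimal point, picks a power-of-two quantization step q no larger than 10^(d - nsd), and maps the sample to the centre of its q-wide interval. The absolute error therefore stays at or below 0.5 × 10^(d - nsd), and the low-order mantissa bits of the result become zero, which a lossless coder can then exploit. Here zlib (Deflate) stands in for the lossless stage, and no shuffle filter is applied.

```python
import math
import struct
import zlib

# Hypothetical sketch of digit rounding on float64 values; not the reference implementation.
LOG2_10 = math.log2(10.0)


def digit_round(x: float, nsd: int) -> float:
    """Keep roughly `nsd` significant decimal digits of x.

    The quantization step q is a power of two with q <= 10**(d - nsd), where d is
    the number of decimal digits before the point of |x|. Mapping |x| to the centre
    of its q-wide interval bounds the absolute error by q / 2.
    """
    if x == 0.0 or not math.isfinite(x):
        return x  # zeros, infinities and NaNs pass through unchanged
    d = math.floor(math.log10(abs(x))) + 1       # decimal digits before the point
    p = math.floor((d - nsd) * LOG2_10)           # largest p with 2**p <= 10**(d - nsd)
    q = math.ldexp(1.0, p)                        # exact power of two
    rounded = q * (math.floor(abs(x) / q) + 0.5)  # odd multiple of q / 2
    return math.copysign(rounded, x)


def trailing_zero_bits(x: float) -> int:
    """Count the zero bits at the low end of the 52-bit mantissa of a float64."""
    mantissa = struct.unpack("<Q", struct.pack("<d", x))[0] & ((1 << 52) - 1)
    if mantissa == 0:
        return 52
    return (mantissa & -mantissa).bit_length() - 1


if __name__ == "__main__":
    # Synthetic smooth signal; no CFOSAT or SWOT data are used here.
    samples = [1000.0 * math.sin(0.01 * i) + 0.123456789 * i for i in range(10_000)]
    rounded = [digit_round(v, 3) for v in samples]
    raw = struct.pack(f"<{len(samples)}d", *samples)
    lossy = struct.pack(f"<{len(rounded)}d", *rounded)
    # zlib (Deflate) stands in for the lossless stage; sizes depend on the data.
    print("raw bytes  :", len(raw), "->", len(zlib.compress(raw, 9)))
    print("digit bytes:", len(lossy), "->", len(zlib.compress(lossy, 9)))
    print(digit_round(123.456, 3))  # 123.5
    print(trailing_zero_bits(math.pi), trailing_zero_bits(digit_round(math.pi, 3)))
```

Because q is tied to each sample's own magnitude, a value near 0.01 and a value near 1000 both keep about `nsd` significant digits, which matches the relative-precision behaviour described in the summary. The sketch assumes float64 input and a modest `nsd` (roughly 1 to 15); larger values ask for more digits than a double can hold.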