Bit Grooming: statistically accurate precision-preserving quantization with compression, evaluated in the netCDF Operators (NCO, v4.4.8+)

. Geoscientiﬁc models and measurements generate false precision (scientiﬁcally meaningless data bits) that wastes storage space. False precision can mislead (by implying noise is signal) and be scientiﬁcally pointless, especially for measurements. By contrast, lossy compression can be both economical (save space) and heuristic (clarify data limitations) without compromising the scientiﬁc integrity of data. Data quantization can thus be appropriate regardless of whether space limitations are a concern. We introduce, implement, and characterize a new lossy compression scheme suitable for IEEE ﬂoating-point data. Our new Bit Grooming algorithm alternately shaves (to zero) and sets (to one) the least signiﬁcant bits of consecutive values to preserve a desired precision. This is a symmetric, two-sided variant of an algorithm sometimes called Bit Shaving that quantizes values solely by zeroing bits. Our variation eliminates the artiﬁ-cial low bias produced by always zeroing bits, and makes Bit Grooming more suitable for arrays and multi-dimensional ﬁelds whose mean statistics are important. Bit Grooming relies on standard lossless compression to achieve the actual reduction in storage space, so we tested Bit Grooming by applying the


Introduction
The increased resolution of geoscientific models and measurements (GSMMs) leads to increases in dataset size that outpace improvements in both accuracy (nearness to true values) and precision (degree of repeatability).Numerical precision that exceeds true or assumed knowledge of the underlying phenomena is called false precision and a significant fraction of GSMM storage bits archive this false precision as essentially random (and therefore hard to compress) bits that lack scientific content.Lossy compression techniques can reduce storage requirements without sacrificing scientific content by eliminating unused ranges and/or false precision of stored fields.We introduce a new algorithm, Bit Grooming, that preserves a specified level of precision, is statistically unbiased, retains the full representable range of floating-point data, and yet requires no additional software tools or filters to read or write.
For measurements there is never a scientific reason to retain false precision, as it amounts to storing random bits.Rea-sons to retain false precision during prognostic integrations of geoscientific models include numerical stability, conservation checks (e.g., mass, energy, momentum), and correct treatment of threshold and resonance phenomena.There are fewer reasons to retain false precision after than during a simulation.Most GSMMs store their data as either four-or eightbyte IEEE floating-point numbers.IEEE single-precision (SP, four-byte) and double-precision (DP, eight-byte) formats (IEEE, 2008) represent 6 and 15 decimal digits of precision, respectively.Even SP often exceeds the precision to which the data are trusted.Lossy data compression can exploit the gap between the precision representable by the data type (SP or DP) and the precision associated with the values to be stored.
Data compression is well-studied (e.g., Sayood, 2003;Salomon and Molta, 2010) and before attempting lossy data compression, most researchers will check whether lossless data compression adequately serves their needs.Widely used lossless algorithms are embedded in ubiquitous (and free and patent-unencumbered) tools such as gzip/zlib (Gailly and Adler, 2000), bzip2 (Seward, 2007), and lz4 (Collet, 2013).These tools operate on generic byte steams.Special-purpose lossless compressors designed for scientific data can exploit the four-byte or eight-byte structure of floating-point data (e.g., Isenburg et al., 2005;Burtscher and Ratanaworabhan, 2009).Temporal and/or spatial correlations in GSMM data with large-scale patterns (e.g., climate data) can further enhance lossless compression (Liu et al., 2014).
The compression ratios of lossless techniques are limited by the need to recover the exact data compressed.Lossy compression (also called quantization) relaxes this requirement and can "trade off" precision for compression.Losses acceptable with some forms of data can only be determined subjectively, as for example the quality of photographic images.In contrast, researchers can, at least in principle, know a priori the scientifically defensible precision of GSMMs.False precision can mislead (by implying noise is signal) and be scientifically pointless, especially for measurements.By contrast, lossy compression can be both economical (save space) and heuristic (clarify data limitations).Data quantization can thus be appropriate regardless of whether space limitations are a concern.Thus after presenting our quantitative results, we describe techniques that make Bit Grooming simple and practical.
This paper is organized into four more sections.Section 2 describes the lossy and lossless compression algorithms that this paper will intercompare.Section 3 defines the comparison metrics and evaluates the statistical properties and compression ratios of Bit Grooming.Section 4 discusses implementation features of all lossy and lossless compression algorithms in NCO, with particular focus on Bit Grooming.Section 5 summarizes our conclusions.

Methods
A primary motivation in developing Bit Grooming is to reduce the storage of climate-related datasets.We implemented and tested Bit Grooming in the netCDF Operators, NCO (Zender and Mangalam, 2007;Zender, 2008), a freely available suite of tools for manipulating data stored in the netCDF and HDF formats (Rew et al., 2006;HDF Group, 2015) that are widely used in the geosciences for both modeled and satellite-measured data.NCO implements or accesses four different compression algorithms; one is lossless and three are lossy.All four algorithms reduce the on-disk size of a dataset while sacrificing no (lossless) or a specified amount (lossy) of precision.
First, NCO can read and write data encoded with the (lossless) DEFLATE algorithm (Deutsch, 1996) accessible to both netCDF4 and HDF5 (Rew et al., 2006;HDF Group, 2015).DEFLATE is a widely used, freely available, and efficient compression technique that combines Lempel-Ziv compression (Ziv andLempel, 1977, 1978) with Huffman coding.It identifies patterns at the bit level and always identifies, encodes, and compresses space freed by the simple Bit Shaving (setting to zero) and Bit Setting (to one) techniques described here.DEFLATE works equally well on Bit Grooming, which is simply an alternation between Bit Shaving and Bit Setting.Some users and many data centers manually DEFLATE and re-inflate netCDF3 files with gzip and gunzip, respectively, so DEFLATE is effectively available for all netCDF and HDF datasets.Hence our metrics will show the volume of uncompressed data, the same data (losslessly) deflated as the base case for compression, and the same data (lossily) quantized with Bit Grooming in tandem with DE-FLATE.

Packing
The three lossy compression algorithms NCO employs are packing and two precision-preserving algorithms (including Bit Grooming).Packing quantizes (usually) floatingpoint data into a lower precision type (fewer bytes per value) that represents a much smaller range.By convention netCDF defines a linear-packing algorithm that depends on two parameters (scale_factor and add_offset) (Rew et al., 2016;Caron, 2014a).Linear Packing quantizes SP and DP data into (usually) two-byte signed integers.NetCDF uses the nomenclature NC_FLOAT for SP (float32), NC_DOUBLE for DP (float64), NC_SHORT for int16, and NC_INT for int32.In netCDF nomenclature, packing converts NC_FLOATs and NC_DOUBLEs into NC_SHORTs.Since packing works at the byte level, the space saved is usually a factor of 2 (NC_FLOAT → NC_SHORT) or 4 (NC_DOUBLE → NC_SHORT), and cannot be specified at finer levels.Packed data can be (losslessly) deflated for additional space savings.
Packing floating-point data into integers has benefits and drawbacks.The type conversion frees up the IEEE754 exponent bits (8 bits for SP, and 11 bits for DP), which then contribute to the dynamic range of the packed integers (16 and 32 bits for NC_SHORT and NC_INT, respectively).However, integers have a much-reduced dynamic range relative to floating-point numbers.The dynamic ranges of SP and DP numbers are ∼ 10 37 and ∼ 10 308 , respectively, whereas data packed linearly into two-byte and four-byte integers have dynamic ranges of ∼ 10 5 and ∼ 10 10 , respectively.Variables packed as NC_SHORT, for example, can represent only about 64 000 discrete values in the range −32768 × scale_factor + add_offset to 32767×scale_factor+add_offset.The optimal add_offset parameter for Linear Packing is the midpoint of the data to be packed, and the optimal scale_factor is the data dynamic range (i.e., maximum minus minimum) divided by 2 16  − 1 = 65 535 (Zender, 2016a).Unpacked values must cluster within a dynamic range of ∼ 10 5 that may itself reside anywhere within the full (∼ 10 37 ) floating-point range.Thus archived fields that meaningfully span more than 5 orders of magnitude (equals 5 decades) are not well-suited for Linear Packing into two-byte integers.The presence of such fields depends on the GSMM.Candidates in climate models include aerosol number concentrations, pressure, solar heating rates, and (some) tracer mixing ratios.Astrophysical and stellar models span larger scales and are replete with such fields, e.g., plasma density, pressure, and thermal radiation.
Another limitation of Linear Packing is that the precision of packed data cannot be specified or guaranteed in advance because it depends on the distribution of the values to be packed.While the numeric resolution (i.e., the smallest resolvable difference) of unpacked data always equals scale_factor, the number of significant digits of precision depends on the dynamic range (maximum minus minimum) of values to quantize, and rapidly degrades beyond the first decade of unpacked values.To illustrate this, consider a pressure field p (Pa) uniformly spanning values 0.0 ≤ p ≤ 65 535.0.Linear Packing exactly represents integer values in this range and quantizes all fractional values to integers.For example, p = 1.23456Pa and p = 65 534.23456Pa would be quantized as 1 and 65 534, respectively, which have 1 and 4 significant digits (nsd = 1 and 4 in the terminology defined in Sect.2.2 below), respectively.Packing this distribution of values achieves its highest precision (4 decimal digits for two-byte integers) only for the greatest (in absolute value) unpacked value.Unpacked values of lesser magnitude lose precision at a rate of approximately 1 significant digit per decade from the maximum.Since the precision of Linear Packing degrades by about 1 digit per decade, only values within 1 decade of the maximum regularly achieve the highest possible precision (4 decimal digits for two-byte integers).This is the maximum precision that packing guarantees for an arbitrary distribution of values.
Consider the same dynamic range used previously except now offset by 10 5 (i.e., add_offset = 10 5 ); so, 100 000.0 ≤ p ≤ 165 535.0.The previously examined values, offset by 10 5 , are p = 100 001.23456Pa and p = 165 534.23456Pa.These would be quantized as 100 001 and 165 534, respectively, which both have 6 significant digits.Thus the add_offset parameter can provide additional precision to unpacked values, bringing the total precision up to 6 digits, for some but not all distributions of values.Except where otherwise indicated in this work we state the best precision that a compression algorithm guarantees for any distribution of values, not the best precision it can achieve for special distributions of values.

Precision-preserving compression
The other two lossy compression algorithms considered both perform precision-preserving compression (PPC).The operational definition of "significant digit" in our precisionpreserving algorithms is that the exact value, before rounding or quantization, is within one-half of the value of the decimal place occupied by the least significant digit (LSD) of the rounded value.For example, the value π = 3.14 correctly represents the exact mathematical constant π to 3 significant digits because the LSD of the rounded value (i.e., 4) is in the one-hundredths digit place, and the difference between the exact value and the rounded value is less than one-half of one one-hundredth, i.e., 3.14159265358979323844−3.14 = 0.00159 < 0.005.
One PPC algorithm preserves the specified total number of significant digits (NSD) of the value.For example, there is only one significant digit in the weight of most "eighthundred pound gorillas" that you will encounter, i.e., so nsd = 1.NSD is the most straightforward measure of precision, and is the default PPC algorithm in NCO.Bit Grooming combines two NSD algorithms (described below) to yield more accurate statistical properties.
The other PPC algorithm preserves the number of decimal significant digits (DSD), i.e., the number of significant digits following (positive, by convention) or preceding (negative) the decimal point.For example, 0.008 and 800 have, respectively, 3 and −2 decimal digits following the decimal point, and correspond to dsd = 3 and dsd = −2.
Their fundamental difference is that NSD is independent of dimensional units and DSD is not.The NSD for a given GSMM value depends on the intrinsic accuracy and error characteristics of the model or measurements.The appropriate DSD for a given value depends on these intrinsic characteristics and, in addition, the dimensional units with which values are stored.Our eight-hundred pound gorilla has nsd = 1 regardless of whether the value is stored in pounds or in some other unit.The DSD corresponding to this weight is dsd = −2 if the value is stored in pounds (8 × 10 2 lb), and dsd = 4 if stored in megapounds (8 × 10 −4 Mlb).

Algorithms
The time penalty for compressing and uncompressing data varies according to the algorithm.Silver and Zender (2016) show that lossless compression dominates the total compression time, and that quantization via Bit Grooming or Linear Packing can actually shorten total compression time because they reduce the amount of data to compress.At least in our implementations and for the purposes of this discussion, a number of significant digit (NSD) algorithm quantizes by bitmasking and employs no floating-point math.By contrast, a decimal significant digit (DSD) algorithm quantizes by rounding, and thus does require floating-point math.
Hence NSD is likely faster than DSD, though the difference has not been measured.NSD algorithms create a bitmask to alter the significand (mantissa or fraction) of IEEE 754 floating-point data.For instance, the bitmask for the NSD technique called Bit Shaving is one for all bits to be retained and zero for ignored bits (Caron, 2014b).The logical AND of this mask with the exact IEEE value produces the quantized IEEE value.The bitmask for the NSD technique we call Bit Setting is zero for retained bits and one for discarded bits.The logical OR of this mask with the exact IEEE value produces the quantized IEEE value.These algorithms assume that the number of binary digits (i.e., bits) necessary to represent a single base-10 digit is ln(10)/ ln(2) = 3.32.The exact numbers of explicit mantissa bits retained for single-and double-precision values are ceil(3.32× nsd) + 1 and ceil(3.32× nsd) + 2, respectively.(The IEEE format includes a single mantissa bit that is implicit and that is not included in these counts because it consumes no memory.)This is more than predicted by the simple rule that the required number of bits is nsd × ln(10)/ ln(2).The extra bits are the (experimentally determined) overhead required to guarantee that terminal significant digits are accurate to within half the minimal value of their decimal position.Once the number of bits required exceeds the IEEE SP and DP storage standards of 23 and 53 explicit mantissa bits, respectively, bitmasking is completely ineffective.This occurs at nsd = 6.3 and 15.4, respectively.To guarantee preserving 1-7 digits of precision, Bit Grooming must retain 5, 8, 11, 15, 18, 21, and 25 explicit mantissa bits, respectively.Thus Bit Grooming (and IEEE) require the DP format to guarantee nsd ≥ 7.
The DSD algorithm, by contrast, uses rounding to remove undesired precision.The rounding zeroes the greatest number of (base-2) significand bits consistent with the desired (base-10) decimal precision.Our NCO implementation rounds with the internal math library rint() family of functions that were standardized in C99.The exact algorithm NCO employs is val = rint(scale×val)/scale, where scale is the nearest power of 2 that exceeds 10 prc and the inverse of scale is used when prc < 0. For ppc = 3 or ppc = −2, for example, we have scale = 1024 and scale = 1/128.Because our DSD algorithm rounds a base-10 integer to achieve a base-10 precision, we call it the Decimal Rounding algorithm.The Decimal Rounding algorithm implemented in the nc3tonc4 software tool by J. Whitaker is distinct from but consistent with and equivalent to (though not bit-for-bit) NCO's.
Maintaining non-biased statistical properties during lossy compression requires special attention.Decimal Rounding uses rint() to round toward the nearest even integer.Thus our DSD algorithm has no systematic bias.However, NSD algorithms use a bitmask technique that is susceptible to statistical bias.Zeroing all non-significant bits is guaranteed to produce numbers quantized to the specified tolerance, i.e., half of the decimal value of the position occupied by the LSD.However, always zeroing the non-significant bits results in quantized numbers that never exceed the exact number.Thus Bit Shaving produces a negative bias in statistical quantities (e.g., the average) subsequently derived from the quantized numbers.Likewise Bit Setting produces a positive statistical bias.To avoid bias, Bit Grooming (our new NSD algorithm) rounds non-significant bits down (to zero) or up (to one) in an alternating fashion when processing array data.In general, the first element is rounded down, the second up, the third down, etc. Hence Bit Grooming can nearly eliminate the mean quantization bias.Our Bit Grooming implementation has one exception to the rule of alternately setting and shaving bits: never quantize upwards the floating-point value of zero.This exception prevents creation of quantization fluctuations in arrays of zeros.Finally, for simplicity, our implementation of Bit Grooming always rounds scalars downwards.
To demonstrate the change in IEEE representation caused by quantization, consider again the case of π , represented as an NC_FLOAT: the IEEE 754 single-precision representations of the exact value (3.141592...), the value with only 3 significant digits treated as exact (3.140000...), and the value as stored (3.140625) after NSD (prc = 3) and DSD (prc = 2) quantization (Table 1).The string of 16 trailing zero-bits in the rounded values facilitates both byte-stream and bitwise compression.NSD and DSD algorithms do not always produce results that are bit-for-bit identical, although they do in this particular case when the NSD algorithm is Bit Grooming or Bit Shaving (which are identical algorithms for a single scalar value).When the NSD algorithm is Bit Setting we obtain the fifth row where insignificant bits are set to one, not zero.
Reducing the preserved precision of NSD rounding produces increasingly long strings of identical bits amenable to compression (Table 2).
The consumption of about 3 bits per digit of base-10 precision is evident, as is the coincidence of a quantized value that greatly exceeds the mandated precision for NSD = 2.Although the NSD algorithm generally masks some bits for all nsd ≤ 6 (for NC_FLOAT), compression algorithms like DEFLATE may need byte-size-or-greater (i.e., at least eight-bit) bit patterns before their algorithms can take advan-Table 1. Exact and lossy IEEE single-precision floating-point Pi.IEEE-754 single-precision binary representations of π stored exactly, with 3 significant digits, and with three quantization algorithms.a Bit 0 is s, which the IEEE-754 format uses to encode signedness as −1 s .b Bits 1-8 form base-2 exponent q in the factor 2 q−127 , which in IEEE-754 multiplies the significand.c Bits 9-31 are base-2 significand (or mantissa or fraction) c in the IEEE-754 representation of the full value −1 s × (1 + c) × 2 q−127 .d Bit Grooming and Bit Shaving are identical for a single value.tage of encoding such patterns for compression.Do not expect significantly enhanced compression from nsd > 5 (for NC_FLOAT) or nsd > 14 (for NC_DOUBLE).Clearly values stored as NC_DOUBLE (i.e., eight-byte) are susceptible to much greater compression than NC_FLOAT for a given precision because their significands explicitly contain 53 bits rather than 23 bits.

Metrics
How can one be sure lossy data are sufficiently precise?
We define several metrics to quantify quantization error.The mean error and mean absolute error + incurred in quantizing a variable from true values x i to quantized values q i are, respectively, , where µ i is 1 unless x i is a missing value, m i is 1 unless x i is masked, and w i is the weight.The maximum and minimum errors max and min are both signed max = max (x i − q i ) and min = min (x i − q i ) , while the maximum and minimum absolute errors + max and + min are positive-definite.
Typically + min = 0 for quantization, since many exact values need no quantization.
The three most important error metrics for quantization are + max , + , and .The upper bound (worst case) quantization performance is + max .The mean accuracy indicates whether statistical properties of quantized numbers will accurately reflect the true values.However, allows positive and negative offsets to compensate each other and conceal poor performance.+ measures the absolute mean accuracy of quantization, so that all errors accumulate and (unlike ) do not compensate.The difference between + max and + indicates how much of an outlier the worst case error is.All three metrics are expressed in terms of the fraction of the tens place occupied by the LSD.If the LSD is the hundreds digit or the thousandths digit, then the metrics are fractions of 100 or 1/1000, respectively.PPC algorithms should produce maximum absolute errors less than 0.5 in these units.If the LSD is the hundreds digit, then quantized versions of true values will be within 50 of the true value.It is much easier to satisfy this tolerance for a true value of 100 (only 50 % accuracy required) than for 999 (5 % accuracy required).Thus the minimum accuracy guaranteed for nsd = 1 ranges from 5 to 50 %.For this reason, the best and worst cast performances usually occur for true values whose LSD value is close to 1 and 9, respectively.Of course most users prefer prc > 1 because accuracies increase exponentially with prc.Continuing the previous example to prc = 2, quantized versions of true values from 1000 to 9999 will also be within 50 of the true value, i.e., have accuracies from 0.5 to 5 %.In other words, only 2 significant digits are necessary to guarantee better than 5 % accuracy in quantization.We recommend that dataset producers and users consider quantizing datasets with nsd = 3.This guarantees an accuracy of 0.05 to 0.5 % for individual values.Statistics computed from ensembles of quantized values will, assuming the mean error is small, have much better accuracy than 0.5 %.This accuracy is the most that many applications can justify.
To demonstrate these principles we conduct error analyses on an artificial, reproducible dataset, and on an actual dataset 1 of values from a re-analysis of observed weather data.Table 3 summarizes quantization accuracy for each NSD based on the three metrics: the maximum absolute error + max , the mean absolute error + , and the mean error .PPC quantization performs as expected.First, absolute max-1 The artificial dataset employed is one million evenly spaced values from 1.0 to 2.0.The analysis data are N = 13 934 592 values of the temperature field from the NASA MERRA analysis of 20 130 601.imum errors + max < 0.5 for all prc.We increased the exact number of bits shaved or groomed until the worst performance ( + max = 0.49 for prc = 3) was better than + max = 0.5.This guarantees that Bit Grooming always produces precision that meets or exceeds the requested number of significant digits.
For 1 ≤ prc ≤ 6, quantization results in comparable maximum absolute and mean absolute errors + max and + , respectively (Table 3).Mean errors are orders of magnitude smaller because quantization produces over-and underestimated values in balance.When prc = 7, quantization of single-precision values is ineffective, because all available bits are used to represent the maximum precision of 7 digits.The maximum and mean absolute errors + max and + are nearly identical across algorithms, precisions, and dataset types.This is consistent with both the artificial data and empirical data being random, and thus exercising equally strengths and weaknesses of the algorithms over the course of millions of input values.We generated artificial arrays with many different starting values and interval spacings, and all gave qualitatively similar results.The results presented are the worst obtained.
The artificial data have a much smaller mean error than the observational analysis.The reason why is unclear.It may be because the temperature field is concentrated in particular ranges of values (and associated quantization errors) prevalent on Earth, e.g., 200 < T < 320.It is worth noting that the mean error < 0.01 for 1 ≤ prc < 6, and that is typically at least 2 or more orders of magnitude less than + max .Thus quantized values with precisions as low as prc = 1 still yield highly significant statistics by contemporary scientific standards.

Compressing real climate datasets
PPC quantization enhances compression of typical climate datasets.The degree of enhancement depends, of course, on the required precision.Model results are often computed as NC_DOUBLE and then archived as NC_FLOAT to save space, while, in our experience, observations are usually stored as NC_FLOAT because most sensors lack the precision required to justify NC_DOUBLE.We evaluated compression performance of lossless and lossy compression techniques on four datasets representative of model simulations and satellite retrievals.Only floating-point data were compressed.No attempt was made to compress integer-type variables as they occupy an insignificant fraction of most climate datasets.
The first dataset tested (Table 4) comes from a global aerosol simulation (Zender et al., 2003) of horizontal resolution latitude × longitude = 64 × 128 (i.e., 8192 grid points).This dataset is the smallest (35 MB, Row A) relative to the others tested, and was produced uncompressed, as is still the norm for most climate models.Weak and strong compression (BZ1 and BZ9) with bzip2 (Seward, 2007) both achieve compression ratios CR ∼ 84 % (Rows B-C).Conversion from   4 and 5 were not yet compressed, and those in Tables 6 and 7 were already compressed.i Compression method, if any.The Supplement provides full commands to reproduce results.j A dash (-) indicates the associated compression feature was not employed.
Both the weak and strong HDF implementation of DEFLATE (Deutsch, 1996) shrink the data to CR ∼ 81 % (Rows E-F), slightly better than bzip2 (Rows B-C).There continues to be little difference between weak and strong lossless compression of a given mode (bzip2 or DEFLATE), so for brevity in the following we focus on performance with weak (DF1) DEFLATE compression (e.g., Rows H-O).
Packing SP floating-point data into two-byte integers yields CR ∼ 51 % (Row G).Lossless compression more than halves that CR to ∼ 23 % (Row H).For this dataset, the Bit Grooming range is 81 ≥ CR ≥ 29 % for 7 ≥ NSD ≥ 1 (Rows I-O).Table 4 shows packing as having 1 NSD 4. Packing (into two-byte integers) uses 16-bit integers, the same as the number of mantissa bits Bit Grooming uses (as discussed in Sect.2.3, and including the implicit IEEE bit) to guarantee NSD = 4. Section 2.1 describes why linear Packing guarantees NSD 4 precision only for the greatest decade of unpacked values, and degrades to NSD 1 for the smallest decade of unpacked values.
The second dataset tested (Table 5) contains an atmospheric GCM simulation (Dennis et al., 2012) on a higher horizontal resolution unstructured grid (with 48 602 grid points), and occupies 840 MB uncompressed (Row A).It is about 15 % more susceptible to both bzip2 (Rows B-C) and DEFLATE (Row E-F) compression than the dataset in Table 4.The reasons for this are unclear, though at ∼ 25 times the size of the first dataset, it seems possible that the internal metadata stored by DEFLATE are more efficient with larger datasets.Packing is nearly as efficient as before (Row G), since the CR of packing is independent of the values packed.
NASA uses HDF4 format to store and distribute the third dataset tested (Table 6).Satellite-borne remote sensing datasets may be most commonly found in HDF4 format due to its early availability and the long mission duration of satellites.This dataset contains compressed (DF5) meteorological data from MERRA re-analysis (Rienecker et al., 2011) on a medium-resolution (latitude × longitude = 144 × 288, 41 472 grid points) grid and is 244 MB compressed (Row A) and 617-694 MB uncompressed (Rows D3-D4).bzip2 compression has no effect on the dataset as distributed in HDF4 format (Row B).However, converting from HDF4 format to netCDF4 format reduces its size by 13 % to CR ∼ 87 % (Rows D1-D2).Neither of these formats affords any help to bzip2 (Rows B2-C).The uncompressed data occupy 10 % less space in netCDF3 than in netCDF4 format (Rows D3-D4).The HDF5 implementation of DEFLATE yields a moderately more dynamic range (91 % ≥ CR ≥ 85) (Rows E-F) than in the previous two datasets.The reasons for this are unclear.Packing once again yields a 50 % reduction relative to the uncompressed dataset size (Row G), and compressing that yields CR ∼ 55 % (Row H).Bit Grooming yields 92 ≥ CR ≥ 41 % for 7 ≥ NSD ≥ 1 (Rows I-O).
NASA uses HDF5 format to store and distribute the fourth dataset tested (Table 7), which is representative of current storage practices.HDF5 and netCDF4 are used by all new satellite missions to our knowledge.This dataset contains compressed (DF5) satellite retrievals, a swath from the OMI instrument (Krotkov et al., 2008), in a curvilinear (Time × Cross-track = 1644 × 60, 98 000 grid points) grid and is 30 MB compressed (Row A) and 50 MB un- compressed (Row D2).The dataset can be converted directly to netCDF4 (Row D1) at no additional cost in storage.However, it cannot be converted to netCDF3 because it uses so-called "enhanced" features (such as hierarchical groups) available only in netCDF4/HDF5.Once again the already-compressed data are insensitive to the level of DEFLATE (Rows E-F).Packing reduces the uncompressed size by nearly 50 % (Row G), and compressing that yields CR ∼ 44 %.Bit Grooming yields 100 ≥ CR ≥ 33 % for 7 ≥ NSD ≥ 1 (Rows I-O).

Discussion
PPC algorithms preserve all significant digits of every value.The Bit Grooming (NSD) algorithm uses a theoretical approach (3.32 bits per base-10 digit), tuned and tested to en-sure the worst case quantization error is less than half the value of the minimum increment in the least significant digit.The Decimal Rounding (DSD) algorithm uses floating-point math to round each value optimally so that it has the maximum number of zeroed bits that preserve the specified precision.
While Bit Grooming works on top of any lossless compression technique, we demonstrated it with the DEFLATE algorithm (Deutsch, 1996), which is free and ubiquitous.Byte-stream compression techniques such as DEFLATE (which is accessible through the netCDF4/HDF5 interfaces) always compress strings of zeros and of ones more efficiently than random digits.We expect the additional compression achieved by Bit Grooming to remain roughly the same with different underlying lossless compression techniques.

Comparison of lossy compression techniques
Factors influencing the choice of lossy compression technique include precision, accuracy, dynamic range, compression ratio, and portability (Silver and Zender, 2016).Section 3 evaluates Bit Grooming performance alongside Linear Packing, a widely used, well-known lossy compression method.Packing four-byte SP floating-point data into twobyte integers produces a compression ratio CR ∼ 50 % relative to uncompressed data (Tables 4-7, Row G).Lossless compression more than halves that CR, so that Linear Packing followed by DEFLATE achieves ∼ 26 % ≥ CR ≥ 19 % (Row H) relative to uncompressed data.All other things being equal, a competitive lossy compression algorithm should produce a comparable CR to be considered as a sensible option to packing plus DEFLATE.For the tested datasets, Bit Grooming produces 43 ≥ CR ≥ 21 % for NSD = 2 and 29 ≥ CR ≥ 15 % for NSD = 1 (Rows I-O) relative to uncompressed data.Thus Bit Grooming is only competitive with compressed packing if used aggressively (i.e., preserving only 1 or 2 digits) and/or if other factors are considered as important as CR.
These other factors may include the greater transparency, dynamic range, and guaranteed precision of Bit Grooming relative to packing.Regarding transparency, Bit-Groomed data are valid IEEE floating-point data immediately suitable for analysis and plotting, whereas packed data must first be unpacked and reconstituted into intelligible floating-point data.Hence Bit-Groomed data are more portable than packed data.
Another important consideration is precision.Bit Grooming guarantees that its lossy quantization will preserve a specified number of (decimal) significant digits.Packing into two-byte integers always provides 16 bits for discretization, which can potentially yield the same precision as Bit Grooming, with NSD = 4.However, as described in Sect.2.1, Linear Packing guarantees NSD 4 precision only for the single greatest decade of unpacked values.Unpacked values of lesser absolute magnitude lose approximately 1 guaranteed significant digit per decade.By contrast, Bit Grooming guarantees the specified minimum precision level over the entire IEEE range.Other types of packing, e.g., logarithmic packing or "layer packing", can alleviate though not eliminate precision issues that affect Linear Packing (Silver and Zender, 2016).However, only Linear Packing is a netCDF convention (Rew et al., 2016).Thus other forms of packing are less portable than Linear Packing, which (as mentioned above) is itself less portable than Bit Grooming.
In terms of range, Bit Grooming has the same dynamic ranges as IEEE SP and DP data, ∼ 10 37 and ∼ 10 308 , respectively.Linear packing into two-byte integers (the usual case) reduces the dynamic range to 2 16  −1 = 65 535 discretely representable values that lay in a 5-decade cluster within the IEEE range.The greater range of Bit Grooming relative to packing (∼ 10 37 vs. 10 5 ) favors it for GSMM fields that span multiple orders of magnitude, such as aerosol number concentrations, pressure, solar heating rates, and (some) tracer mixing ratios.

Implementation in NCO
Offering multiple quantization and compression algorithms with a consistent and simple interface is important so that users can easily find the algorithm that best suits their needs.This section describes the NCO implementation of the three quantization algorithms and single lossless compression algorithm that NCO exposes to user control.We focus on the new PPC algorithms (Bit Grooming and Decimal Rounding) whose characteristics are the subject of most of this study, but we begin with a brief summary of the DEFLATE and packing implementations that have been in NCO for 10-20 years.NCO triggers lossless DEFLATE compression with the -L switch followed by a compression-level argument on a scale from 0 (no compression) to 9 (full compression, much slower): # DEFLATE compression level 5 ncks -L 5 in.ncout.nc The NCO operator ncpdq performs packing quantization: # Pack Data Quickly (quantization) ncpdq in.nc out.ncNCO implements numerous packing policies (which variables should be packed) and packing maps (which data type should a higher-precision data type be stored as).The Users Guide (Zender, 2016a) contains a full description of policies and maps.Packing followed by lossless compression is simple and yields the most impressive compression ratios in Tables 4-7.
# Pack then compress ncpdq -L 5 in.ncout.ncAlthough Bit Grooming instantly reduces data precision, on-disk storage reduction occurs only once the data are compressed either internally (e.g., by netCDF) or externally (by a user-supplied mechanism).It is straightforward to compress data internally using the built-in compression and decompression supported by netCDF4/HDF5.For convenience, NCO automatically activates file-wide DEFLATE deflation level one (i.e., -L 1) when PPC is requested for any variable in a netCDF4 output file.This makes PPC easier to use, since the user need not explicitly specify deflation.Any explicitly specified deflation (including no deflation, or -L 0 with NCO) overrides the PPC deflation default.If the output file is netCDF3 format, NCO emits a message that suggests internal netCDF4 or external netCDF3 compression.netCDF3 files compressed by an external utility such as gzip accrue approximately the same benefits (shrinkage) as netCDF4, although with netCDF3 the user or provider must uncompress (e.g., gunzip) the file before accessing the data.There is no storage benefit to rounding numbers and storing them in netCDF3 files unless such custom compression/decompression is employed.Without compression, one may as well maintain the undesired precision.
NCO users can invoke PPC with the long option --ppc var=prc, or give the same arguments to the synonyms --precision_preserving_compression or --quantize.Here var is the variable to quantize, and prc is the precision.NCO assumes that prc specifies Bit Grooming (i.e., NSD precision) so, e.g., T=2 means nsd = 2.In NCO, users may prepend prc with a decimal point to specify decimal rounding (i.e., DSD precision); e.g., T=.2 means dsd = 2. Bit Grooming precision must be specified as a positive integer.Rounding precision may be a positive or negative integer, and is specified as the negative base-10 logarithm of the desired precision, in accord with common usage.For example, specifying T=.3 or T=.-2 tells the Decimal Rounding algorithm to store only enough bits to preserve the value of T rounded to the nearest thousandth or hundred, respectively.
NCO users can specify the precision of an entire dataset with many variables in one simple command.Setting var to default has the special meaning of applying the associated PPC algorithm to all normal floating-point variables.The exceptions, i.e., variables not affected by default, include integer and non-numeric atomic types, dimensional coordinates (such as longitude, latitude), and, in accordance with the CF Metadata Convention (Gregory, 2003;Eaton et al., 2016), variables mentioned in the bounds, climatology, or coordinates attributes of any variable.These exceptions prevent the coordinate grid itself, and the variables needed to describe it, from losing precision.Usually the coordinate grid is known to much higher precision than the fields stored on the grid.NCO applies PPC to coordinate grid variables only if those variables are explicitly specified (i.e., not with default=prc).NCO applies PPC to integer-type variables only if those variables are explicitly specified (i.e., not with the default=prc, and only if the DSD algorithm is invoked with a negative prc).To prevent PPC in NCO from applying to certain non-coordinate variables (e.g., gridcell_area or gaussian_weight), explicitly specify a precision exceeding 7 (for NC_FLOAT) or 15 (for NC_DOUBLE) for those variables.Since these are the maximum representable precisions in decimal digits, NCO turns off PPC (i.e., does nothing) when more precision is requested.
NCO users access PPC through a single switch, --ppc, repeated as many times as necessary.To request Bit Grooming only for variable u, use, e.g., ncks -7 --ppc u=2 in.nc out.nc The output file will preserve 2 significant digits of u.The options -4 or -7 ensure a netCDF4-format output (regardless of the input file format) to support internal compression.NCO recommends though does not require writing netCDF4 files after PPC.However, for conciseness the -4/-7 switches are omitted in subsequent examples.To maintain data-processing provenance, NCO attaches attributes that indicate the algorithm used and degree of precision retained for each variable affected by PPC.The Bit Grooming (i.e., NSD) and Decimal Rounding (i.e., DSD) algorithms store the attributes number_of_significant_digits and least_significant_digit2 , respectively.It is safe to attempt PPC on input that has already been rounded.Variables can be made rounder, not sharper; i.e., variables cannot be "un-rounded".Thus PPC attempted on an input variable with an existing PPC attribute proceeds only if the new rounding level exceeds the old; otherwise, no new rounding occurs (i.e., a "no-op"), and the original PPC attribute is retained rather than replaced with the newer value of prc.
To request, say, 5 significant digits (nsd = 5) for all fields, except, say, wind speeds u and v, which are only known to integer values (dsd = 0) in the supplied units, use --ppc twice: ncks --ppc default=5 --ppc u,v=.0 in.nc out.nc To preserve 5 digits in all variables except coordinate variables and u and v, use the default option and separately specify the exceptions: ncks --ppc default=5 --ppc u,v=20 in.nc out.ncSpecify the --ppc option any number of times to support varying precision types and levels.Each option may aggregate all the variables with the same precision: ncks --ppc p,w,z=5 --ppc q,RH=4 --ppc T,u,v=3 in.nc out.nc C. S. Zender: Statistically accurate precision-preserving quantization This type of per-variable approach to PPC may yield the best balance of precision and compression.It does require that the dataset producer understand the intrinsic precision of each variable treated in a non-default manner.For convenience, variable names may be extended regular expressions.This simplifies generating lists of related variables: ncks --ppc Q.?=5 --ppc FS.?,FL.?=4 --ppc RH=.3 in.nc out.nc

Conclusions
We introduced a new lossy and precision-preserving compression (PPC) algorithm called Bit Grooming, and evaluated it against its nearest cousin, Bit Shaving, as well as against packing and lossless techniques.Bit Grooming replaces the (unwanted) least significant bits of the IEEE significand with a string of identical values that alternates between zeroes and ones for consecutive elements of an array.We quantified the trade-offs involved in the choice of lossy-packing technique for four climate-related datasets.We found that PPC compression reduces the volume of singleprecision compressed data by roughly 10 % per decimal digit quantized (or "groomed") after the third such digit.Bit Grooming reduces the storage space required by initially uncompressed and compressed climate data by 25-80 and 5-65 %, respectively, for single-precision values (the most common case for climate data) quantized to retain 1-5 decimal digits of precision.Bit Groomed and Bit Shaved data are equally efficiently compressed, and Bit Grooming eliminates undesirable statistical artifacts of Bit Shaving.By alternately using zero and one as the fill bit, Bit Grooming produces no mean absolute bias, whereas Bit Shaving is negatively biased.
The lossy technique of Linear Packing, followed by lossless compression, produces significantly better compression ratios than PPC algorithms like Bit Grooming for most precision levels.Bit Grooming yields comparable or better compression than packing only when retaining 1 or 2 significant digits of precision.Packing, however, can only encode values from a much smaller dynamic range than Bit Grooming, and its guaranteed precision degrades rapidly (1 digit per decade) outside the largest decade of values to be quantized.Moreover, packed data require additional software overhead to unpack.Bit Grooming, in contrast, works on all ranges of floating-point values, has well-defined and guaranteed precision, and requires no additional software interface to read.By understanding the trade-offs between precision, statistical accuracy, numerical range and storage space of common lossypacking techniques, producers can make better decisions regarding how much precision to archive in their datasets, and how to discard the false precision.
6 Code and data availability NCO source code is available from GitHub at https://github.com/nco.The NCO software version 4.6.1 (Zender, 2016b) used to produce this paper is permanently archived with doi:10.5281/zenodo.61341,though any version since 4.4.8 should be functionally equivalent with regards to features described here.NCO executables are available on most modern Linux and OS X systems using standard commands (aptget install nco, brew install nco, conda install -c conda-forge nco, dnf install nco, port install nco, yum install nco).Additional binaries are available for easy installation; see the homepage http://nco.sf.net for more details.Detailed documentation and help pages are also at http://nco.sf.net.The Supplement details the commands and datasets necessary to reproduce the results.
The Supplement related to this article is available online at doi:10.5194/gmd-9-3199-2016-supplement.

Table 2 .
Bit Grooming Pi.Same as Table 1 but after varying degrees of Bit Grooming.
(Caron, 2014b)accurate precision-preserving quantization 3.2 Bit Grooming vs. Bit Shaving Traditional Bit Shaving bit-shifts zeros into the least significant bits (LSBs) of true values(Caron, 2014b).Thus Bit-Shaving nearly always underestimates true values, and this produces max = 0. Conversely, bit-shifting ones into the LSBs, a procedure that might be called Bit Setting, would nearly always overestimate true values and result in min = 0.The intrinsic compression efficiencies of Bit Shaving and Bit Setting are identical.The key innovation in Bit Grooming is to alternately bit-shift zeroes and ones into the consecutive true values in an array.By alternating high with low quantization errors, Bit Grooming balances the mean quantization error.As a result, statistical operations produce less-biased results when operating on values quantized by Bit Grooming than by Bit Shaving or Bit Setting.Balanced algorithms like Bit Grooming should yield max ≈ − min , + www.geosci-model-dev.net/9/3199/2016/Geosci.Model Dev., 9, 3199-3211, 2016 C. S. Zender:

Table 3 .
Error metrics for Bit Grooming vs. Bit Shaving.N = 13 934 592 values of the temperature field from the NASA MERRA analysis of 20 130 601.c BG is Bit Grooming, BS is Bit Shaving, SP is single-precision, and DP is double-precision.Values for + max and + are shown only once.They are identical to two significant figures for BG and BS in both SP and DP, for both artificial and observed data.
b d NSD is the number of significant digits.

Table 4 .
Compression ratios for low-resolution initially uncompressed model netCDF3 data.Row a Fmt b LLC c Qnt d Rng e NSD f Size g CR h Method i N3 for netCDF CLASSIC, N4 for NETCDF4, N7 for NETCDF4_CLASSIC (which comprises netCDF3 data types and structures with netCDF4 storage features like compression), H4 for HDF4, and H5 for HDF5.N4/7 means results apply to both N4 and N7 file types.c Lossless compression method (if any) employed.Numbers prefixed by DF refer to the strength of the DEFLATE algorithm employed internally by netCDF4/HDF5, while numbers prefixed by BZ refer to the block size employed by the Burrows-Wheeler algorithm in bzip2.Number of significant digits retained.The similarity symbol indicates the value is approximate, not guaranteed.Full IEEE single precision has nsd ∼ 7 and guarantees nsd ≥ 6. Bit Grooming guarantees a specified number of digits.Linear Packing achieves nsd 4 in the largest decade of unpacked values, decreasing by 1 digit per decade to nsd 1 in the smallest decade of unpacked values.
a Row; also labels the compression configuration in that row.b Format on disk: d Quantization (lossy compression) method (if any) employed: BG for Bit Grooming and LP for the default ncpdq linear-packing algorithm (convert floating-point types to NC_SHORT).e Dynamic range of values compressible to indicated precision.f g Resulting file size in MB. h Compression ratio in %, i.e., file size after compression divided by its original size, times 100.Compression ratios reported are relative to the size of the original file as distributed (e.g., by NASA).The original files in Tables

Table 5 .
Compression ratios for high-resolution initially uncompressed model data * .

Table 6 .
Compression ratios for high-resolution initially compressed observed HDF4 data.

Table 7 .
Compression ratios for initially compressed HDF5 data.