  • Works with all HDF5 types
  • Fast compression and decompression
  • Is only available in Python (ships with h5py); C source available

Extra filters in HDF5

SHUFFLE

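Rearranges the bytes of each chunk so that the first bytes of all elements are stored together, then the second bytes, and so on. For 16-bit data this means low and high bytes are treated separately.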

Code Block
>>> dset = myfile.create_dataset("Data", shape=(32,185,388), dtype=np.int16, chunks=(1,185,388),
...                              compression="gzip", shuffle=True)

SHUFFLE features:

  • Available with all HDF5 distributions
  • Very fast (negligible compared to the compression time)
  • Only useful in conjunction with filters like GZIP or LZF
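
The effect of byte shuffling can be illustrated without HDF5. The sketch below is not the HDF5 filter itself: `shuffle` and `unshuffle` are hypothetical helper names, the input is synthetic int16 data (a pedestal of 512 plus 8-bit noise, an assumption), and `zlib` at level 4 stands in for the gzip filter.

```python
import random
import zlib
from array import array

def shuffle(raw: bytes, itemsize: int) -> bytes:
    """Group byte 0 of every element, then byte 1, and so on."""
    return b"".join(raw[i::itemsize] for i in range(itemsize))

def unshuffle(data: bytes, itemsize: int) -> bytes:
    """Inverse of shuffle(): interleave the byte planes back into elements."""
    n = len(data) // itemsize
    out = bytearray(len(data))
    for i in range(itemsize):
        out[i::itemsize] = data[i * n:(i + 1) * n]
    return bytes(out)

# Synthetic 16-bit "detector" data (an assumption, not real LCLS data):
# the high byte is constant, only the low byte varies.
rng = random.Random(0)
values = array("h", (512 + rng.randrange(256) for _ in range(20000)))
raw = values.tobytes()

plain = zlib.compress(raw, 4)                              # compress as is
shuffled = zlib.compress(shuffle(raw, values.itemsize), 4) # shuffle first

print("ratio without shuffle: %.3f" % (len(raw) / len(plain)))
print("ratio with shuffle:    %.3f" % (len(raw) / len(shuffled)))

# The transformation is lossless: unshuffling restores the original bytes.
assert unshuffle(shuffle(raw, values.itemsize), values.itemsize) == raw
```

Shuffling alone does not shrink the data; it only groups similar bytes so that the compressor that follows finds longer runs, which is why it is only useful together with GZIP or LZF.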

FLETCHER32 Filter

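Adds a Fletcher-32 checksum to each chunk so that data corruption can be detected when the chunk is read back.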

Code Block
>>> dset = myfile.create_dataset("Data2", shape=(32,185,388), dtype=np.int16, chunks=(1,185,388), fletcher32=True, ...)
>>> dset.fletcher32
True

FLETCHER32 features:

  • Available with all HDF5 distributions
  • Very fast
  • Compatible with all lossless filters
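
For illustration, a textbook Fletcher-32 checksum can be written in a few lines. This is a sketch only: it sums little-endian 16-bit words modulo 65535, and the HDF5 library's own implementation may differ in byte order and other details, so the values are not guaranteed to match what HDF5 stores. The function name `fletcher32` and the test chunk are made up for the example.

```python
def fletcher32(data: bytes) -> int:
    """Textbook Fletcher-32 over little-endian 16-bit words (zero-padded)."""
    if len(data) % 2:
        data += b"\x00"                      # pad odd-length input
    sum1 = sum2 = 0
    for i in range(0, len(data), 2):
        word = data[i] | (data[i + 1] << 8)  # little-endian 16-bit word
        sum1 = (sum1 + word) % 65535
        sum2 = (sum2 + sum1) % 65535
    return (sum2 << 16) | sum1

# Checksum stored alongside a chunk when it is written ...
chunk = bytes(range(200))
stored = fletcher32(chunk)

# ... and compared on read: a single flipped bit is detected.
corrupted = bytearray(chunk)
corrupted[17] ^= 0x01
print("chunk intact:    ", fletcher32(chunk) == stored)
print("corrupted intact:", fletcher32(bytes(corrupted)) == stored)
```

The checksum only detects corruption; it cannot repair the data.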

 

Code Block
gzip, szip, lzf compression results
gzip default compression_opts level=4
  raw:  gzip  t1(create)=0.003280(sec)  t2(+save)=0.216324(sec)  input size=4594000(byte)  ratio=1.583  shuffle=False  fletcher32=False
  raw:  gzip  t1(create)=0.003025(sec)  t2(+save)=0.146706(sec)  input size=4594000(byte)  ratio=1.958  shuffle=True   fletcher32=False

calib:  gzip  t1(create)=0.002738(sec)  t2(+save)=0.168040(sec)  input size=4594000(byte)  ratio=2.072  shuffle=False  fletcher32=False
calib:  gzip  t1(create)=0.002926(sec)  t2(+save)=0.178174(sec)  input size=4594000(byte)  ratio=2.188  shuffle=True   fletcher32=False
calib:  gzip  t1(create)=0.002579(sec)  t2(+save)=0.182965(sec)  input size=4594000(byte)  ratio=2.187  shuffle=True   fletcher32=True


calib:  lzf  t1(create)=0.003225(sec)  t2(+save)=0.100822(sec)  input size=4594000(byte)  ratio=1.351  shuffle=False  fletcher32=False
calib:  lzf  t1(create)=0.002815(sec)  t2(+save)=0.086916(sec)  input size=4594000(byte)  ratio=1.473  shuffle=True   fletcher32=False
  raw:  lzf  t1(create)=0.003125(sec)  t2(+save)=0.108339(sec)  input size=4594000(byte)  ratio=1.045  shuffle=False  fletcher32=False
  raw:  lzf  t1(create)=0.003071(sec)  t2(+save)=0.075530(sec)  input size=4594000(byte)  ratio=1.698  shuffle=True   fletcher32=False

Compression filter "szip" is unavailable
Compression filter "lzo" is unavailable
Compression filter "blosc" is unavailable
Compression filter "bzip2" is unavailable

 

References

Igor's compressor

https://pswww.slac.stanford.edu/svn-readonly/psdmrepo/

Compressor designed for LCLS detector uint16 data:

  1. estimates the dataset spread,
  2. uses 16-bit and 8-bit words to save data.

Features

  • Optimized to work with 16-bit detector data only (not with xtc or hdf5 files containing metadata).
  • By design, the Hist16 compression factor is ≤ 2.
  • A single data array is split and processed in multiple threads (inside the compression algorithm).
  • According to Igor, it is up to ~two orders of magnitude faster than gzip.
  • Igor thinks that further specialization of the data (separating signal and background regions between threads) may improve the compression factor.
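
The two steps listed above (estimate the spread, then store values as 8-bit or 16-bit words) can be sketched as follows. This is NOT Igor's implementation or file format: the byte layout, the escape marker and all names (`best_base`, `compress`, `decompress`) are assumptions made for illustration, and the sketch is single-threaded.

```python
import struct
from itertools import accumulate

ESCAPE = 0xFF   # marker byte: the next two bytes hold a full 16-bit value
WINDOW = 255    # offsets 0..254 fit in one byte; 255 is reserved for ESCAPE

def best_base(values):
    """Estimate the spread: find the 255-wide window covering most values."""
    hist = [0] * 65536
    for v in values:
        hist[v] += 1
    prefix = [0] + list(accumulate(hist))    # prefix[i] = count of values < i
    def covered(base):
        return prefix[min(base + WINDOW, 65536)] - prefix[base]
    return max(range(65536), key=covered)

def compress(values):
    """Store in-window values as 1 byte, all others as ESCAPE + 2 bytes."""
    base = best_base(values)
    out = bytearray(struct.pack("<HI", base, len(values)))  # 6-byte header
    for v in values:
        offset = v - base
        if 0 <= offset < WINDOW:
            out.append(offset)
        else:
            out.append(ESCAPE)
            out += struct.pack("<H", v)
    return bytes(out)

def decompress(blob):
    base, count = struct.unpack_from("<HI", blob, 0)
    pos = struct.calcsize("<HI")
    values = []
    for _ in range(count):
        b = blob[pos]
        pos += 1
        if b == ESCAPE:
            values.append(struct.unpack_from("<H", blob, pos)[0])
            pos += 2
        else:
            values.append(base + b)
    return values

# Synthetic uint16 frame (an assumption): narrow pedestal plus two bright pixels.
data = [1000 + (i * 7) % 41 for i in range(1000)]
data[10], data[500] = 40000, 12345

blob = compress(data)
assert decompress(blob) == data
print("compression factor: %.3f" % (2 * len(data) / len(blob)))
```

Because every value occupies at least one byte in the output, a scheme of this kind cannot exceed a factor of 2, which is consistent with the limit stated above.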

Matt's Hist16 and HistN compressors

Available in the external package pdsdata/compress/

  1. Hist16 - the same as Igor's compressor, but without multi-threading, hence slow.
  2. HistN - developed by Matt; uses 16-bit and 8-, 7-, 6-, ...-bit words; compression factor up to ~2.

SZ compressor from Argonne

https://github.com/disheng222/SZ

On the GitHub page: Clone or download -> Download ZIP; installed under ~/lib/sz/sz-1.4.9/

Run tests like:

~/lib/sz/sz-1.4.9/SZ-master/example]$ ./testfloat_compress sz.config testdata/x86/testfloat_8_8_128.dat 8 8 128

  • Works with float and double.
  • int16 and uint16 are not implemented.

Compression factors are ~56, 110, and 49 for
- testfloat_8_8_128.dat,
- testdouble_8_8_128.dat, and
- testdouble_8_8_8_128.dat, respectively.
 

But these factors are obtained for data with VERY NARROW SPECTRA:

testfloat_8_8_128.txt      mean=1.000000  std=1.232407
testdouble_8_8_128.txt     mean=1.000000  std=1.254261
testdouble_8_8_8_128.txt   mean=1.300935  std=0.502083
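
Statistics of this kind can be obtained with a short script. The sketch below assumes that the .txt files contain plain numbers separated by whitespace (the actual layout is not documented here) and uses the population standard deviation, which may differ from the estimator used for the table above. The helper name `spectrum_stats` is made up.

```python
import statistics

def spectrum_stats(lines):
    """Return (mean, std) of all whitespace-separated numbers in text lines."""
    values = [float(tok) for line in lines for tok in line.split()]
    return statistics.mean(values), statistics.pstdev(values)

# Hypothetical usage with one of the test files (layout assumed):
# with open("testfloat_8_8_128.txt") as f:
#     print("mean=%f  std=%f" % spectrum_stats(f))

# Small in-memory sample instead of a file:
sample = ["1.0 2.5", "0.5", "0.0"]
print("mean=%f  std=%f" % spectrum_stats(sample))
```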

References