  • Works with all HDF5 types
  • Fast compression and decompression
  • Available only in Python (ships with h5py); the C source is available

Extra filters in HDF5

SHUFFLE

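Rearranges the bytes within each chunk so that bytes of the same significance from all elements are stored together. Low and high bytes are then compressed separately, which usually improves the compression ratio.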

Code Block
>>> dset = myfile.create_dataset("Data", shape=(32,185,388), dtype=np.int16, chunks=(1,185,388),
...                              compression="gzip", shuffle=True)

SHUFFLE features:
  • Available with all HDF5 distributions
  • Very fast (negligible compared to the compression time)
  • Only useful in conjunction with filters like GZIP or LZF
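
The reason SHUFFLE only pays off together with a compressor can be sketched in pure Python. This is an illustration of the idea, not HDF5's implementation: the helper names `byte_shuffle` and `byte_unshuffle` and the synthetic int16 data (a constant pedestal plus small noise) are assumptions made for this example.

```python
import random
import struct
import zlib


def byte_shuffle(buf: bytes, itemsize: int) -> bytes:
    """Store byte 0 of every element first, then byte 1, and so on."""
    return b"".join(buf[i::itemsize] for i in range(itemsize))


def byte_unshuffle(buf: bytes, itemsize: int) -> bytes:
    """Inverse of byte_shuffle: interleave the byte planes again."""
    n = len(buf) // itemsize
    out = bytearray(len(buf))
    for i in range(itemsize):
        out[i::itemsize] = buf[i * n:(i + 1) * n]
    return bytes(out)


# Synthetic little-endian int16 data: the high byte is constant,
# the low byte is noisy (hypothetical stand-in for detector data).
rng = random.Random(0)
values = [1100 + rng.randrange(50) for _ in range(10000)]
raw = struct.pack("<%dh" % len(values), *values)

shuffled = byte_shuffle(raw, 2)
assert byte_unshuffle(shuffled, 2) == raw  # the shuffle itself is lossless

# The shuffle does not shrink anything by itself; it only helps the compressor.
plain_size = len(zlib.compress(raw, 4))
shuffled_size = len(zlib.compress(shuffled, 4))
print("gzip level 4 without shuffle:", plain_size, "bytes")
print("gzip level 4 with shuffle:   ", shuffled_size, "bytes")
```

After the shuffle, all high bytes form one long run of identical values, which the compressor encodes in very few bytes.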

FLETCHER32 Filter

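Stores a Fletcher-32 checksum with each chunk. The checksum is verified when the chunk is read, so corrupted data is detected.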

Code Block
>>> dset = myfile.create_dataset("Data2", shape=(32,185,388), dtype=np.int16, chunks=(1,185,388), fletcher32=True, ...)
>>> dset.fletcher32
True

FLETCHER32 features:

  • Available with all HDF5 distributions
  • Very fast
  • Compatible with all lossless filters
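
The checksum idea can be sketched as follows. This is a minimal textbook Fletcher-32 over little-endian 16-bit words; it is meant to show how a per-chunk checksum exposes corruption and is not guaranteed to produce the same values as the HDF5 filter. The function name and the sample chunk are assumptions made for this example.

```python
def fletcher32(data: bytes) -> int:
    """Textbook Fletcher-32: two running sums over 16-bit words, modulo 65535."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    sum1 = 0
    sum2 = 0
    for i in range(0, len(data), 2):
        word = data[i] | (data[i + 1] << 8)  # little-endian 16-bit word
        sum1 = (sum1 + word) % 65535
        sum2 = (sum2 + sum1) % 65535
    return (sum2 << 16) | sum1


# A hypothetical chunk and the checksum stored alongside it.
chunk = bytes(range(256)) * 4
stored = fletcher32(chunk)

# Flip a single bit to simulate corruption on disk.
corrupted = bytearray(chunk)
corrupted[100] ^= 0x01
corrupted = bytes(corrupted)

print("intact chunk passes:   ", fletcher32(chunk) == stored)
print("corrupted chunk passes:", fletcher32(corrupted) == stored)
```

Because the checksum is computed independently of the compression step, it can be combined with any lossless filter.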

 

Code Block
gzip, szip, lzf compression results
gzip default compression_opts level=4
  raw:  gzip  t1(create)=0.003280(sec)  t2(+save)=0.216324(sec)  input size=4594000(byte)  ratio=1.583  shuffle=False  fletcher32=False
  raw:  gzip  t1(create)=0.003025(sec)  t2(+save)=0.146706(sec)  input size=4594000(byte)  ratio=1.958  shuffle=True   fletcher32=False

calib:  gzip  t1(create)=0.002738(sec)  t2(+save)=0.168040(sec)  input size=4594000(byte)  ratio=2.072  shuffle=False  fletcher32=False
calib:  gzip  t1(create)=0.002926(sec)  t2(+save)=0.178174(sec)  input size=4594000(byte)  ratio=2.188  shuffle=True   fletcher32=False
calib:  gzip  t1(create)=0.002579(sec)  t2(+save)=0.182965(sec)  input size=4594000(byte)  ratio=2.187  shuffle=True   fletcher32=True


calib:  lzf  t1(create)=0.003225(sec)  t2(+save)=0.100822(sec)  input size=4594000(byte)  ratio=1.351  shuffle=False  fletcher32=False
calib:  lzf  t1(create)=0.002815(sec)  t2(+save)=0.086916(sec)  input size=4594000(byte)  ratio=1.473  shuffle= True  fletcher32=False
  raw:  lzf  t1(create)=0.003125(sec)  t2(+save)=0.108339(sec)  input size=4594000(byte)  ratio=1.045  shuffle=False  fletcher32=False
  raw:  lzf  t1(create)=0.003071(sec)  t2(+save)=0.075530(sec)  input size=4594000(byte)  ratio=1.698  shuffle= True  fletcher32=False

Compression filter "szip" is unavailable
Compression filter "lzo" is unavailable
Compression filter "blosc" is unavailable
Compression filter "bzip2" is unavailable

 

Igor's compressor Hist16 and Matt's HistN

Compressors designed for LCLS detector uint16 data:

  1. Estimate the spread of the dataset values.
  2. Use 16-bit and 8-bit words to save the data. HistN (developed by Matt) uses 16-bit and 8,7,6...-bit words.
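
The general idea behind the two steps above can be illustrated with a small sketch. This is a hypothetical scheme written for this page, not Igor's or Matt's actual code, which is not shown here and may work differently. The function names, the escape byte 255, and the test data are all assumptions: a histogram picks the 255-value window that covers most of the data, values inside the window are stored as 8-bit offsets, and the rest are stored as full 16-bit words.

```python
import struct
from collections import Counter


def hist16_like_encode(values):
    """Hypothetical encoder: 8-bit offsets for common values, 16-bit for outliers."""
    if not values:
        return 0, b"", b""
    # Step 1: estimate the spread with a histogram and find the start of the
    # 255-value window that covers the largest number of entries.
    hist = Counter(values)
    keys = sorted(hist)
    best_base, best_cover = keys[0], 0
    cover, j = 0, 0
    for lo in keys:
        while j < len(keys) and keys[j] < lo + 255:
            cover += hist[keys[j]]
            j += 1
        if cover > best_cover:
            best_base, best_cover = lo, cover
        cover -= hist[lo]
    # Step 2: store in-window values as one byte; byte 255 is an escape code
    # meaning "take the next full 16-bit value from the outlier stream".
    small = bytearray()
    wide = []
    for v in values:
        offset = v - best_base
        if 0 <= offset < 255:
            small.append(offset)
        else:
            small.append(255)
            wide.append(v)
    return best_base, bytes(small), struct.pack("<%dH" % len(wide), *wide)


def hist16_like_decode(base, small, wide_bytes):
    """Inverse of hist16_like_encode."""
    wide = iter(struct.unpack("<%dH" % (len(wide_bytes) // 2), wide_bytes))
    return [next(wide) if b == 255 else base + b for b in small]


# Hypothetical data: a narrow pedestal band plus two outliers.
values = [1000 + (i * 37) % 200 for i in range(1000)] + [40000, 5]
base, small, wide = hist16_like_encode(values)
assert hist16_like_decode(base, small, wide) == values
print("raw bytes:", 2 * len(values), "encoded bytes:", len(small) + len(wide))
```

In a scheme of this kind every value costs at least one byte, so the compression factor cannot exceed 2 for 16-bit input.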

Features

  • Optimized to work with 16-bit detector data only (not with xtc or hdf5 file metadata).
  • By design, the compression factor of Hist16 is ≤2 and that of HistN is ~2.
  • A single data array is split and processed in multiple threads (inside the compression algorithm).
  • According to Igor, it is up to ~2 orders of magnitude faster than gzip.
  • Igor thinks that further specialization of the data (separating signal and background regions between threads) may improve the compression factor.

 

References