Page History
...
- Works with all HDF5 types
- Fast compression and decompression
- Is only available in Python (ships with h5py); C source available
Extra filters in HDF5
SHUFFLE
Treats low and high bytes separately
Code Block |
---|
>>> dset = myfile.create_dataset("Data", shape=(32,185,388), dtype=np.int16, chunks=(1,185,388), compression="gzip", shuffle=True) |
- Available with all HDF5 distributions
- Very fast (negligible compared to the compression time)
- Only useful in conjunction with filters like GZIP or LZF
FLETCHER32 Filter
Check-sum
Code Block |
---|
dset = myfile.create_dataset("Data2", shape=(32,185,388), dtype=np.int16, chunks=(1,185,388), fletcher32=True, ...) >>> dset.fletcher32 True |
FLETCHER32 features:
- Available with all HDF5 distributions
- Very fast
- Compatible with all lossless filters
Code Block | ||
---|---|---|
| ||
gzip default compression_opts level=4 raw: gzip t1(create)=0.003280(sec) t2(+save)=0.216324(sec) input size=4594000(byte) ratio=1.583 shuffle=False fletcher32=False raw: gzip t1(create)=0.003025(sec) t2(+save)=0.146706(sec) input size=4594000(byte) ratio=1.958 shuffle=True fletcher32=False calib: gzip t1(create)=0.002738(sec) t2(+save)=0.168040(sec) input size=4594000(byte) ratio=2.072 shuffle=False fletcher32=False calib: gzip t1(create)=0.002926(sec) t2(+save)=0.178174(sec) input size=4594000(byte) ratio=2.188 shuffle=True fletcher32=False calib: gzip t1(create)=0.002579(sec) t2(+save)=0.182965(sec) input size=4594000(byte) ratio=2.187 shuffle=True fletcher32=True calib: lzf t1(create)=0.003225(sec) t2(+save)=0.100822(sec) input size=4594000(byte) ratio=1.351 shuffle=False fletcher32=False calib: lzf t1(create)=0.002815(sec) t2(+save)=0.086916(sec) input size=4594000(byte) ratio=1.473 shuffle= True fletcher32=False raw: lzf t1(create)=0.003125(sec) t2(+save)=0.108339(sec) input size=4594000(byte) ratio=1.045 shuffle=False fletcher32=False raw: lzf t1(create)=0.003071(sec) t2(+save)=0.075530(sec) input size=4594000(byte) ratio=1.698 shuffle= True fletcher32=False Compression filter "szip" is unavailable Compression filter "lzo" is unavailable Compression filter "blosc" is unavailable Compression filter "bzip2" is unavailable |
References
Igor's compressor
https://pswww.slac.stanford.edu/svn-readonly/psdmrepo/
Compressor designated for LCLS detector uint16 data:
- estimates dataset spread,
- use 16-bit and 8-bit words to save data.
Features
- Optimized to work with 16-bit detector data only (not with xtc or hdf5 files containing metadata).
- By design Hist16 compression factor ≤2.
- Single array of data is split and processed in multi-threads (inside compression algorithm).
- Igor statement: up to ~two order of magnitude faster than gzip.
- Igor thinks that further specialization of data (separation of signal and background regions between threads) may improve compression factor.
Matt's Hist16 and HistN compressors
Available in external package pdsdata/compress/
- Hist16 - the same as Igor's compressor, but does not use multi-threading - slow
- HistN - developed by Matt, uses 16-bit and 8,7,6...-bit words, compression factor HistN upto ~2.
SZ compressor from Argonne
https://github.com/disheng222/SZ
-> Clone or download -> Download ZIP -> installed under ~/lib/sz/sz-1.4.9/
Run tests like:
~/lib/sz/sz-1.4.9/SZ-master/example]$ ./testfloat_compress sz.config testdata/x86/testfloat_8_8_128.dat 8 8 128
- works with float and double.
- int16 and uint16 not implemented
compression factors ~ 56, 110, and 49 for
- testfloat_8_8_128.dat,
- testdouble_8_8_128.dat, and
- testdouble_8_8_8_128.dat, respectively.
But for data with VERY NARROW SPECTRA:
testfloat_8_8_128.txt mean=1.000000 std=1.232407
testdouble_8_8_128.txt mean=1.000000 std=1.254261
testdouble_8_8_8_128.txt mean=1.300935 std=0.502083
References
- Using compression in HDF5
- Szip Compression in HDF
- Third-party compression filters
- HDF5 Tutorial
- HDF5 Software Documentation
- Using HDF5 filters
- HDF5 Data Compression Demystified
- gzip - CLI, zlib - API
- Entropy (information_theory)
- Dictionary compression
- Python and HDF5
- LibLZF
- SZ compressor (sz-1.4-user-guide.pdf) Authors: Sheng Di, Dingwen Tao, supervisor: Franck Cappello
- 2017-02-24-Report-of-SZ-Lossy-Compression-on-EXAFEL-Datasets-v5.pdf
- 2013-04-17-igor-pyana_xtc_decompression_status.pdf, psdmrepo
- 2017-03-21-Lossless-compression.pdf
- 2017-09-27-Lossless-compression.pdf - on Users mtg
- Using compression in HDF5
- Szip Compression in HDF
- Third-party compression filters
- HDF5 Tutorial
- HDF5 Software Documentation
- Using HDF5 filters
- HDF5 Data Compression Demystified
- gzip - CLI, zlib - API
- Entropy (information_theory)
- Dictionary compression
- Python and HDF5
- LibLZF
- SZ compressor (sz-1.4-user-guide.pdf) Authors: Sheng Di, Dingwen Tao, supervisor: Franck Cappello