...
- Works with all HDF5 types
- Fast compression and decompression
- Is only available in Python (ships with h5py); C source available
Extra filters in HDF5
SHUFFLE
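Rearranges the bytes of each chunk so that the low bytes and the high bytes of all elements are stored, and therefore compressed, separately.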
```python
>>> dset = myfile.create_dataset("Data", shape=(32,185,388), dtype=np.int16, chunks=(1,185,388), compression="gzip", shuffle=True)
```
- Available with all HDF5 distributions
- Very fast (negligible compared to the compression time)
- Only useful in conjunction with filters like GZIP or LZF
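The effect of separating low and high bytes can be illustrated without HDF5. The sketch below is only an imitation of the idea using the Python standard library: `shuffle_bytes` and `unshuffle_bytes` are hypothetical helper names, the input is synthetic 16-bit data (a constant offset plus random noise), and `zlib` at level 4 stands in for the GZIP filter. It is not the HDF5 implementation.

```python
import random
import struct
import zlib


def shuffle_bytes(raw, itemsize):
    """Group byte 0 of every element, then byte 1, and so on."""
    return b"".join(raw[i::itemsize] for i in range(itemsize))


def unshuffle_bytes(shuffled, itemsize):
    """Invert shuffle_bytes by interleaving the byte planes again."""
    n = len(shuffled) // itemsize
    planes = [shuffled[i * n:(i + 1) * n] for i in range(itemsize)]
    return bytes(b for group in zip(*planes) for b in group)


# Synthetic int16 data (an assumption): values 1100..1199, so the high byte
# is constant and only the low byte carries noise.
rng = random.Random(0)
values = [1100 + rng.randrange(100) for _ in range(20000)]
raw = struct.pack("<%dh" % len(values), *values)

shuffled = shuffle_bytes(raw, 2)
plain_size = len(zlib.compress(raw, 4))
shuffled_size = len(zlib.compress(shuffled, 4))

print("input bytes:         ", len(raw))
print("zlib without shuffle:", plain_size)
print("zlib with shuffle:   ", shuffled_size)
```

The shuffle step itself saves nothing, since the shuffled buffer has the same length as the input. The gain appears only once a compressor sees the long run of nearly identical high bytes.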
FLETCHER32 Filter
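Checksum: stores a Fletcher-32 checksum with each chunk so that corrupted data is detected when the chunk is read.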
```python
>>> dset = myfile.create_dataset("Data2", shape=(32,185,388), dtype=np.int16, chunks=(1,185,388), fletcher32=True, ...)
>>> dset.fletcher32
True
```
FLETCHER32 features:
- Available with all HDF5 distributions
- Very fast
- Compatible with all lossless filters
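To show how such a checksum detects corruption, here is a minimal sketch of the textbook Fletcher-32 algorithm in pure Python. It assumes little-endian 16-bit words and zero padding for odd-length input. The function name `fletcher32` is chosen for illustration, and the result is not guaranteed to be bit-identical to the value HDF5 stores, because the library's internal byte handling may differ.

```python
def fletcher32(data):
    """Textbook Fletcher-32 over little-endian 16-bit words."""
    data = bytes(data)
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with one zero byte
    sum1 = sum2 = 0
    for i in range(0, len(data), 2):
        word = data[i] | (data[i + 1] << 8)
        sum1 = (sum1 + word) % 65535
        sum2 = (sum2 + sum1) % 65535
    return (sum2 << 16) | sum1


# A stand-in for one chunk of data and the checksum saved next to it.
chunk = bytes(range(200))
stored = fletcher32(chunk)

# Flip a single bit, as a storage or transfer error might.
corrupted = bytearray(chunk)
corrupted[17] ^= 0x01

print("chunk intact:   ", fletcher32(chunk) == stored)
print("corruption seen:", fletcher32(corrupted) != stored)
```

The loop is a single pass of additions over the data, which is consistent with the filter being very fast compared to compression.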
```
gzip default compression_opts level=4
raw: gzip t1(create)=0.003280(sec) t2(+save)=0.216324(sec) input size=4594000(byte) ratio=1.583 shuffle=False fletcher32=False
raw: gzip t1(create)=0.003025(sec) t2(+save)=0.146706(sec) input size=4594000(byte) ratio=1.958 shuffle=True fletcher32=False
calib: gzip t1(create)=0.002738(sec) t2(+save)=0.168040(sec) input size=4594000(byte) ratio=2.072 shuffle=False fletcher32=False
calib: gzip t1(create)=0.002926(sec) t2(+save)=0.178174(sec) input size=4594000(byte) ratio=2.188 shuffle=True fletcher32=False
calib: gzip t1(create)=0.002579(sec) t2(+save)=0.182965(sec) input size=4594000(byte) ratio=2.187 shuffle=True fletcher32=True
calib: lzf t1(create)=0.003225(sec) t2(+save)=0.100822(sec) input size=4594000(byte) ratio=1.351 shuffle=False fletcher32=False
calib: lzf t1(create)=0.002815(sec) t2(+save)=0.086916(sec) input size=4594000(byte) ratio=1.473 shuffle= True fletcher32=False
raw: lzf t1(create)=0.003125(sec) t2(+save)=0.108339(sec) input size=4594000(byte) ratio=1.045 shuffle=False fletcher32=False
raw: lzf t1(create)=0.003071(sec) t2(+save)=0.075530(sec) input size=4594000(byte) ratio=1.698 shuffle= True fletcher32=False
Compression filter "szip" is unavailable
Compression filter "lzo" is unavailable
Compression filter "blosc" is unavailable
Compression filter "bzip2" is unavailable
```
Igor's compressor Hist16 and Matt's HistN
Compressor designed for LCLS detector uint16 data:
- estimates the spread of values in the dataset,
- uses 16-bit and 8-bit words to store the data. HistN (developed by Matt) uses 16-bit and 8-, 7-, 6-, ...-bit words.
Features
- Optimized to work with 16-bit detector data only (not with xtc or hdf5 file metadata).
- By design, the compression factor of Hist16 is ≤2; for HistN it is ~2.
- A single data array is split and processed in multiple threads (inside the compression algorithm).
- Igor's statement: up to ~2 orders of magnitude faster than gzip.
- Igor thinks that further specialization of the data (separating signal and background regions between threads) may improve the compression factor.
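Why a scheme that mixes 8-bit and 16-bit words cannot exceed a factor of 2 can be seen in a toy model. The sketch below is not Igor's Hist16 code. It is a hypothetical single-threaded illustration that assumes the following: the most populated 255-wide window of values is found by sorting, values inside it are stored as 8-bit offsets from the window start, and all other values are stored as an escape byte followed by the full 16-bit word. All names (`choose_base`, `encode`, `decode`, `ESCAPE`) and the header layout are invented for this example, and the input must be non-empty.

```python
import random
import struct

ESCAPE = 255  # marker byte: the next two bytes hold a full 16-bit value


def choose_base(values, width=ESCAPE):
    """Return the start of the `width`-wide window that covers the most values."""
    ordered = sorted(values)
    best_base, best_count, lo = ordered[0], 0, 0
    for hi, v in enumerate(ordered):
        while v - ordered[lo] >= width:
            lo += 1
        if hi - lo + 1 > best_count:
            best_base, best_count = ordered[lo], hi - lo + 1
    return best_base


def encode(values):
    """Store in-window values as one byte, all others as escape + 16-bit word."""
    base = choose_base(values)
    out = bytearray(struct.pack("<HI", base, len(values)))  # toy header
    for v in values:
        offset = v - base
        if 0 <= offset < ESCAPE:
            out.append(offset)
        else:
            out.append(ESCAPE)
            out += struct.pack("<H", v)
    return bytes(out)


def decode(blob):
    """Invert encode()."""
    base, count = struct.unpack_from("<HI", blob, 0)
    pos, values = struct.calcsize("<HI"), []
    while len(values) < count:
        byte = blob[pos]
        pos += 1
        if byte == ESCAPE:
            values.append(struct.unpack_from("<H", blob, pos)[0])
            pos += 2
        else:
            values.append(base + byte)
    return values


# Synthetic detector-like data (an assumption): a noisy pedestal plus rare hits.
rng = random.Random(0)
values = [1000 + rng.randint(-50, 50) for _ in range(10000)]
for i in range(0, len(values), 500):
    values[i] = 5000

blob = encode(values)
ratio = 2 * len(values) / len(blob)
print("compression factor: %.3f" % ratio)
```

In this model every value costs at least one byte instead of two, so the factor approaches 2 only when nearly all values fall inside the chosen window.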
References
- Using compression in HDF5
- Szip Compression in HDF
- Third-party compression filters
- HDF5 Tutorial
- HDF5 Software Documentation
- Using HDF5 filters
- HDF5 Data Compression Demystified
- gzip - CLI, zlib - API
- Entropy (information_theory)
- Dictionary compression
- Python and HDF5
- LibLZF
- SZ compressor (sz-1.4-user-guide.pdf) Authors: Sheng Di, Dingwen Tao, supervisor: Franck Cappello
- 2013-04-17-igor-pyana_xtc_decompression_status.pdf