...

Code Block
>>> dset = f.create_dataset("BigDataset", shape=(32,185,388), dtype=np.int16, chunks=(1,185,388), compression="gzip")
>>> dset.compression
'gzip'
>>> dset.compression_opts
4
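
The HDF5 gzip filter is built on zlib deflate, and compression_opts selects the deflate level (0 to 9). The effect of that level can be sketched without h5py by calling zlib directly. The sample data and the helper name deflate_ratio below are hypothetical and only illustrate the trade-off; this is not a measurement of detector data.

```python
import array
import zlib

# Hypothetical stand-in for one (185, 388) chunk of 16-bit samples.
samples = array.array("h", [(i // 16) % 512 for i in range(185 * 388)])
raw = samples.tobytes()


def deflate_ratio(data: bytes, level: int) -> float:
    """Return input size divided by the deflate-compressed size."""
    return len(data) / len(zlib.compress(data, level))


# Level 4 is the h5py default for gzip; 1 and 9 are shown for comparison.
for level in (1, 4, 9):
    print(f"level={level}  ratio={deflate_ratio(raw, level):.3f}")
```

Higher levels generally trade longer compression time for a smaller output, so the best setting depends on the data.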

...

Code Block
dset = myfile.create_dataset("Dataset4", shape=(32,185,388), dtype=np.int16, chunks=(1,185,388), compression="lzf")

LZF features:

  • Works with all HDF5 types
  • Fast compression and decompression
  • Is only available in Python (ships with h5py); C source available

Extra filters in HDF5

SHUFFLE

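Rearranges the bytes of each chunk so that the low bytes of all elements are stored together, followed by the high bytes. On slowly varying data this often lets GZIP or LZF compress better.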

Code Block
>>> dset = myfile.create_dataset("Data", shape=(32,185,388), dtype=np.int16, chunks=(1,185,388), compression="gzip", shuffle=True)

SHUFFLE features:

  • Available with all HDF5 distributions
  • Very fast (negligible compared to the compression time)
  • Only useful in conjunction with filters like GZIP or LZF
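
The byte rearrangement performed by SHUFFLE can be sketched in pure Python. The helpers shuffle_bytes and unshuffle_bytes and the synthetic 16-bit data are hypothetical; they illustrate the idea only and are not the HDF5 implementation.

```python
import array
import random
import zlib


def shuffle_bytes(raw: bytes, itemsize: int) -> bytes:
    """Group byte 0 of every element, then byte 1, and so on."""
    return b"".join(raw[k::itemsize] for k in range(itemsize))


def unshuffle_bytes(shuffled: bytes, itemsize: int) -> bytes:
    """Invert shuffle_bytes and restore the original element layout."""
    n = len(shuffled) // itemsize
    out = bytearray(len(shuffled))
    for k in range(itemsize):
        out[k::itemsize] = shuffled[k * n:(k + 1) * n]
    return bytes(out)


# Hypothetical data: noisy 16-bit values around a constant pedestal.
rng = random.Random(0)
values = array.array("h", [1000 + int(rng.gauss(0, 20)) for _ in range(20000)])
raw = values.tobytes()

plain = zlib.compress(raw, 4)
shuffled = zlib.compress(shuffle_bytes(raw, values.itemsize), 4)

print("ratio without shuffle:", round(len(raw) / len(plain), 3))
print("ratio with shuffle:   ", round(len(raw) / len(shuffled), 3))
```

Shuffling alone does not shrink the data, since it only reorders bytes. Any gain comes from the compressor that runs after it.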

FLETCHER32 Filter

Adds a checksum to each chunk so that data corruption is detected when the chunk is read back.

Code Block
dset = myfile.create_dataset("Data2", shape=(32,185,388), dtype=np.int16, chunks=(1,185,388), fletcher32=True, ...)
>>> dset.fletcher32
True

FLETCHER32 features:

  • Available with all HDF5 distributions
  • Very fast
  • Compatible with all lossless filters
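
The idea behind a Fletcher-32 checksum can be sketched as follows. This is a generic textbook version over little-endian 16-bit words, written for illustration; the HDF5 filter's internal implementation may differ in details such as byte order, and the function name fletcher32 here is just a local helper.

```python
def fletcher32(data: bytes) -> int:
    """Generic Fletcher-32 over little-endian 16-bit words (zero-padded)."""
    if len(data) % 2:
        data += b"\x00"
    sum1 = 0
    sum2 = 0
    for i in range(0, len(data), 2):
        word = data[i] | (data[i + 1] << 8)
        sum1 = (sum1 + word) % 65535
        sum2 = (sum2 + sum1) % 65535
    return (sum2 << 16) | sum1


# Hypothetical chunk contents; the checksum is stored alongside the data.
chunk = bytes(range(256)) * 4
stored = fletcher32(chunk)

# Flip a single bit to simulate corruption, then re-check on "read".
corrupted = bytearray(chunk)
corrupted[10] ^= 0x01
print("corruption detected:", fletcher32(bytes(corrupted)) != stored)
```

The checksum only detects corruption. It cannot repair the data.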

 

Code Block
gzip, szip, lzf compression results
gzip default compression_opts level=4
  raw:  gzip  t1(create)=0.003280(sec)  t2(+save)=0.216324(sec)  input size=4594000(byte)  ratio=1.583  shuffle=False  fletcher32=False
  raw:  gzip  t1(create)=0.003025(sec)  t2(+save)=0.146706(sec)  input size=4594000(byte)  ratio=1.958  shuffle=True   fletcher32=False

calib:  gzip  t1(create)=0.002738(sec)  t2(+save)=0.168040(sec)  input size=4594000(byte)  ratio=2.072  shuffle=False  fletcher32=False
calib:  gzip  t1(create)=0.002926(sec)  t2(+save)=0.178174(sec)  input size=4594000(byte)  ratio=2.188  shuffle=True   fletcher32=False
calib:  gzip  t1(create)=0.002579(sec)  t2(+save)=0.182965(sec)  input size=4594000(byte)  ratio=2.187  shuffle=True   fletcher32=True


calib:  lzf  t1(create)=0.003225(sec)  t2(+save)=0.100822(sec)  input size=4594000(byte)  ratio=1.351  shuffle=False  fletcher32=False
calib:  lzf  t1(create)=0.002815(sec)  t2(+save)=0.086916(sec)  input size=4594000(byte)  ratio=1.473  shuffle=True   fletcher32=False
  raw:  lzf  t1(create)=0.003125(sec)  t2(+save)=0.108339(sec)  input size=4594000(byte)  ratio=1.045  shuffle=False  fletcher32=False
  raw:  lzf  t1(create)=0.003071(sec)  t2(+save)=0.075530(sec)  input size=4594000(byte)  ratio=1.698  shuffle=True   fletcher32=False

Compression filter "szip" is unavailable
Compression filter "lzo" is unavailable
Compression filter "blosc" is unavailable
Compression filter "bzip2" is unavailable

Code Block
>>> dset = myfile.create_dataset("Data", (1000,), compression="gzip", shuffle=True)

 

References