...

Results for CSPAD image with swapped bytes:
load_nda_from_file:
Data from file nda-cxitut13-r0010-e000011-raw.npy:  shape:(32, 185, 388)  size:2296960  dtype:int16 [1028 1082 1101 1072 1131]...
H(8-bit) = 5.080
nda8 :  shape:(2296960, 2)  size:4593920  dtype:uint8 [  4   4  58   4  77   4  48   4 107   4]...

# split data array for two with even and odd bytes: 
nda8L:  shape:(2296960,)    size:2296960  dtype:uint8 [  4  58  77  48 107  73  45 103  28  89]...
nda8H:  shape:(2296960,)    size:2296960  dtype:uint8 [4 4 4 4 4 4 4 4 4 4]...
H(low -byte) = 7.821
H(high-byte) = 0.376
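
The numbers above can be reproduced in spirit with a short sketch. It is not the original script: the `entropy` helper, the synthetic Gaussian data standing in for `nda-cxitut13-r0010-e000011-raw.npy`, and the little-endian byte order are all assumptions made for illustration. It computes the Shannon entropy (bits per byte) of the full byte stream and of the low and high bytes separately.

```python
import numpy as np

def entropy(arr):
    """Shannon entropy, in bits per symbol, of an array of uint8 values."""
    counts = np.bincount(arr.ravel(), minlength=256)
    p = counts[counts > 0] / arr.size
    return float(-(p * np.log2(p)).sum())

# Synthetic stand-in for the detector image (assumption, not the real data).
rng = np.random.default_rng(0)
nda = rng.normal(1100, 40, size=(32, 185, 388)).astype("<i2")  # force little-endian int16

# View each 16-bit value as two bytes: column 0 is the low byte, column 1 the high byte.
nda8 = nda.ravel().view(np.uint8).reshape(-1, 2)
nda8L = nda8[:, 0]
nda8H = nda8[:, 1]

print("H(8-bit)     = %.3f" % entropy(nda8))
print("H(low -byte) = %.3f" % entropy(nda8L))
print("H(high-byte) = %.3f" % entropy(nda8H))
```

The high byte barely changes from pixel to pixel, so its entropy is far below that of the low byte. This is why separating the two byte streams before compression tends to help.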

 

Compression in HDF5

GZIP

"A number of compression filters are available in HDF5. By far the most commonly used is the GZIP filter. "

import h5py

f = h5py.File("example.h5", "w")  # hypothetical file name, for illustration only
dset = f.create_dataset("BigDataset", (1000, 1000), dtype='f', compression="gzip")
dset.compression  # 'gzip'

GZIP features:
  • Works with all HDF5 types
  • Built into HDF5 and available everywhere
  • Moderate to slow speed compression
  • Performance can be improved by also using SHUFFLE
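
As a minimal sketch of the last point, the example below writes the same array twice, once with GZIP alone and once with GZIP plus SHUFFLE, and compares the stored sizes. The data is synthetic, and the file and dataset names are placeholders chosen for this example. The file uses the in-memory `core` driver so nothing is written to disk.

```python
import h5py
import numpy as np

# Synthetic int16 data standing in for a detector image (assumption).
rng = np.random.default_rng(0)
data = rng.normal(1100, 40, size=(8, 185, 388)).astype(np.int16)

# In-memory file; "gzip_demo.h5" is only a placeholder name.
with h5py.File("gzip_demo.h5", "w", driver="core", backing_store=False) as f:
    plain = f.create_dataset("gzip_only", data=data,
                             compression="gzip", compression_opts=4)
    shuffled = f.create_dataset("gzip_shuffle", data=data,
                                compression="gzip", compression_opts=4,
                                shuffle=True)
    f.flush()  # make sure all chunks are compressed and accounted for

    size_plain = plain.id.get_storage_size()
    size_shuffled = shuffled.id.get_storage_size()
    shuffle_enabled = shuffled.shuffle
    roundtrip_ok = np.array_equal(shuffled[...], data)

print("raw bytes:          ", data.nbytes)
print("gzip only:          ", size_plain)
print("gzip + shuffle:     ", size_shuffled)
```

`compression_opts` sets the GZIP level (0 to 9). How much SHUFFLE helps depends on the data, so it is worth measuring on a representative sample.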

SZIP

"SZIP is a patented compression technology used extensively by NASA. Generally you only have to worry about this if you’re exchanging files with people who use satellite data. Because of patent licensing restrictions, many installations of HDF5 have the compressor (but not the decompressor) disabled."

...

SZIP features:

  • Integer (1, 2, 4, 8 byte; signed/unsigned) and floating-point (4/8 byte) types only
  • Fast compression and decompression
  • A decompressor that is almost always available
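
Because the SZIP compressor may be disabled in a given HDF5 build, it can be useful to check what the local installation supports before requesting it. The sketch below relies on the `encode` and `decode` tuples in `h5py.filters`. The fallback to GZIP is an illustrative choice, not a recommendation from the quoted text.

```python
import h5py

# Is the SZIP filter registered with this HDF5 build at all?
szip_registered = bool(h5py.h5z.filter_avail(h5py.h5z.FILTER_SZIP))

# h5py lists the filters it can write (encode) and read (decode).
can_write_szip = "szip" in h5py.filters.encode
can_read_szip = "szip" in h5py.filters.decode

# Fall back to GZIP when SZIP compression is unavailable (illustrative choice).
compression = "szip" if can_write_szip else "gzip"

print("SZIP registered:", szip_registered)
print("SZIP write     :", can_write_szip)
print("SZIP read      :", can_read_szip)
print("using          :", compression)
```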

LZF

"For files you’ll only be using from Python, LZF is a good choice. It ships with h5py; C source code is available for third-party programs under the BSD license. It’s optimized for very, very fast compression at the expense of a lower compression ratio compared to GZIP. The best use case for this is if your dataset has large numbers of redundant data points."

...
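
A minimal sketch of LZF on the kind of data the quote describes: an array that is mostly zeros, with a few non-zero points. The array contents and the file name are made up for this example. LZF takes no compression level, so only `compression="lzf"` is passed.

```python
import h5py
import numpy as np

# Highly redundant synthetic data: mostly zeros, a sparse grid of ones (assumption).
data = np.zeros((1000, 1000), dtype=np.float32)
data[::100, ::100] = 1.0

# In-memory file; "lzf_demo.h5" is only a placeholder name.
with h5py.File("lzf_demo.h5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset("BigDataset", data=data, compression="lzf")
    f.flush()  # make sure all chunks are compressed and accounted for

    filter_name = dset.compression
    stored_bytes = dset.id.get_storage_size()
    roundtrip_ok = np.array_equal(dset[...], data)

print("filter      :", filter_name)
print("raw bytes   :", data.nbytes)
print("stored bytes:", stored_bytes)
```

Files written this way need an LZF-capable reader, which fits the quote's advice to use LZF mainly for files read from Python.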