...

The example below specifies a chunk size of 2048 elements for the small dataset and 12 elements for the large one. Each large element is about 9 MB, so each chunk of the large dataset is about 100 MB. If you run this example over a large number of events, you will notice that every 12th event takes slightly longer to process: that is when a chunk of the large dataset is filled and flushed to disk. If you run the example as is, over just 3 events, you will notice that the output file is still quite large, about 100 MB, because HDF5 never writes partial chunks, only complete ones. This script lives in /reg/g/psdm/tutorials/examplePython/userSmallHDF5_2userLargeHDF5.py:

Code Block
languagepython
 
import numpy as np
import psana
import h5py
 
NUM_EVENTS_TO_WRITE = 3
 
ds = psana.DataSource('exp=xpptut15:run=54:smd')
 
h5out = h5py.File("userData.h5", 'w')
# small dataset: one float64 sum per event, chunked 2048 elements at a time
smallDataSet = h5out.create_dataset('cspad_sums', (0,), dtype='f8',
                                    chunks=(2048,), maxshape=(None,))
# large dataset: one calibrated cspad per event, chunked 12 events at a time
largeDataSet = h5out.create_dataset('cspads', (0,32,185,388), dtype='f4',
                                    chunks=(12,32,185,388),
                                    maxshape=(None,32,185,388))
cspad = psana.Detector('cspad', ds.env())
 
for idx, evt in enumerate(ds.events()):
    if idx >= NUM_EVENTS_TO_WRITE: break   # stop after NUM_EVENTS_TO_WRITE events
    calib = cspad.calib(evt)
    if calib is None: continue             # skip events with no cspad data
    # grow both datasets by one event, then fill in the new row
    smallDataSet.resize((idx+1,))
    largeDataSet.resize((idx+1,32,185,388))
    smallDataSet[idx] = np.sum(calib)
    largeDataSet[idx,:] = calib[:]
 
h5out.close()
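
As a quick check, the output file can be read back with h5py to confirm the dataset shapes and chunk layouts, and to see the roughly 9 MB per event / 100 MB per chunk arithmetic. The read-back sketch below is not part of the tutorial script above; it assumes the userData.h5 file written by the writer, and the printed shapes will depend on how many events were actually written.

Code Block
languagepython
 
import h5py
 
with h5py.File('userData.h5', 'r') as h5in:
    sums = h5in['cspad_sums']
    cspads = h5in['cspads']
    # shape grows with the number of events written; chunks is the fixed chunk layout
    print('cspad_sums: shape=%s chunks=%s' % (sums.shape, sums.chunks))
    print('cspads:     shape=%s chunks=%s' % (cspads.shape, cspads.chunks))
    # size of one calibrated cspad event, and of one 12-event chunk, in bytes
    bytes_per_event = cspads.dtype.itemsize * 32 * 185 * 388
    print('one cspad event:  %.1f MB' % (bytes_per_event / 1e6))
    print('one 12-event chunk: %.1f MB' % (12 * bytes_per_event / 1e6))
    # the per-event sums are small enough to read into memory at once
    print('sums:', sums[:])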

...