...

  • The compound type
    • when the number of fields gets large, this does not print well when interactively exploring the data in h5py
    • Field names are far from the data
    • dtypes, especially with enums in them, can be very complicated
  • enums: these are well-defined objects in HDF5. In the data they are stored as integers, and a dictionary that maps symbolic names to integers is stored in one place in the dataset.
    • In h5py they are displayed as ints. Extra steps are required to obtain the enum dictionary and translate the ints to strings, or to use strings to test the values of the enum.
  • vlen data
    • As of version 2.2, h5py supports only variable-length strings
    • Our EvrData uses general vlen data to represent the variable number of event codes that occur with each event
    • We have patched h5py to be able to read general vlen data, such as what is in our EvrData
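
The extra steps for enums mentioned in the list above might look like the following minimal sketch. It assumes h5py 2.10 or later (for h5py.enum_dtype and h5py.check_enum_dtype) and uses a made-up enum and an in-memory file rather than any LCLS data type.

```python
import h5py
import numpy as np

# Hypothetical enum, for illustration only; not an LCLS data type.
mapping = {"Off": 0, "On": 1, "Error": 2}
enum_dt = h5py.enum_dtype(mapping, basetype="i")

# In-memory file (core driver, no backing store), so nothing is written to disk.
with h5py.File("enum_demo.h5", "w", driver="core", backing_store=False) as f:
    ds = f.create_dataset("state", (4,), dtype=enum_dt)
    ds[...] = [0, 1, 1, 2]

    values = ds[...]                                # read back as plain integers
    name_to_int = h5py.check_enum_dtype(ds.dtype)   # the enum dictionary

# Translate ints to strings.
int_to_name = {v: k for k, v in name_to_int.items()}
names = [int_to_name[int(v)] for v in values]

# Use a symbolic name to test the values.
is_on = values == name_to_int["On"]
```

Looking the integer up once by name and comparing against it keeps the test vectorized, which matters when the dataset has many events.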

The current version of h5py works fine with vlen data, but older versions did not support it.

vlen data

Here is an example of how one might work with vlen data, using the EvrData. For each event, the EvrData includes fifoEvents, a variable length list. Each element in the list has three parts: timestampHigh, timestampLow and eventCode. Let's write an example that takes the EvrData and flattens it out into a table, where each entry in the table is 0 or 1 depending on whether or not that eventCode fired.

Code Block
languagepython
import h5py
import numpy as np

f=h5py.File('/reg/d/psdm/xpp/xpptut13/hdf5/xpptut13-r0179.h5','r')
evrData=f['/Configure:0000/Run:0000/CalibCycle:0000/EvrData::DataV3/NoDetector.0:Evr.0/data']
numberOfEvents = len(evrData)    # this gives 483
largestEventCode = max([max([fifoEvent['eventCode'] for fifoEvent in fifoEvents]) for fifoEvents in  evrData['fifoEvents']])
# this gives 162, this is the largest eventCode that occurs in this calib cycle.
eventCodes = np.zeros((numberOfEvents, largestEventCode+1), np.int8)
for eventIndex,fifoEvents in enumerate(evrData['fifoEvents']):
    for fifoEvent in fifoEvents:
        eventCodes[eventIndex, fifoEvent['eventCode']]=1

At this point, eventCodes is a 483 x 163 table of 0/1 values (163 columns, because the column index runs from 0 through the largest event code, 162). The rows are the events, and the columns are the event codes. To find which event codes were present and how often each one fired, one could do:

Code Block
languagepython
eventCodesInData = np.where(np.sum(eventCodes,0)!=0)[0]
numberOfTimesEachEventCodeFired = dict(zip(eventCodesInData, np.sum(eventCodes,0)[eventCodesInData]))
# this dict will be
# {41: 242, 
#  42: 121, 
#  67: 98, 
# 140: 483, 
# 162: 69}
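
For readers without access to the LCLS file, the same flattening and counting can be tried on synthetic data. This is only a sketch: the list fifo_events_per_event, the helper make_fifo_events and the event codes in it are made up for illustration and merely mimic the layout of evrData['fifoEvents'] described above.

```python
import numpy as np

# Synthetic stand-in for evrData['fifoEvents']: one variable-length
# structured array per event. Field names follow the text above.
fifo_dtype = np.dtype([("timestampHigh", np.uint32),
                       ("timestampLow", np.uint32),
                       ("eventCode", np.uint32)])

def make_fifo_events(codes):
    """Hypothetical helper: build one event's fifoEvents from a list of codes."""
    events = np.zeros(len(codes), dtype=fifo_dtype)
    events["eventCode"] = codes
    return events

# Four events; the last one has no event codes at all.
fifo_events_per_event = [make_fifo_events(c)
                         for c in ([140, 41], [140], [140, 41, 42], [])]

# Largest event code, skipping events with an empty list.
largest = max((int(ev["eventCode"].max())
               for ev in fifo_events_per_event if len(ev)), default=0)

# Rows are events, columns are event codes 0..largest.
table = np.zeros((len(fifo_events_per_event), largest + 1), np.int8)
for i, ev in enumerate(fifo_events_per_event):
    table[i, ev["eventCode"]] = 1

# Count how many events each code fired in.
counts = table.sum(axis=0)
codes_present = np.flatnonzero(counts)
times_fired = {int(c): int(counts[c]) for c in codes_present}

print(table.shape)    # (4, 141)
print(times_fired)    # {41: 2, 42: 1, 140: 3}
```

Skipping empty lists when computing the largest code avoids calling max on an empty sequence, a case the snippet above does not guard against.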

One could then construct a logical index array to quickly average the cspad over the 121 events where event code 42 fired:

Code Block
languagepython
eventsWith42 = eventCodes[:,42]==1
cspad=f['/Configure:0000/Run:0000/CalibCycle:0000/CsPad::ElementV2/XppGon.0:Cspad.0/data']
# Datasets for different types are not always aligned (for example, because of damage),
# so equal lengths are only a minimal check. It is best to also check that the
# time datasets are the same between the cspad and the EvrData.
assert len(eventsWith42)==len(cspad), "EvrData and cspad datasets differ in length"
cspadAt42 = cspad[eventsWith42]
cspadAt42.shape    # this returns  (121, 32, 185, 388)
avgAt42 = np.mean(cspadAt42,0)    # mean over the 121 selected events (np.sum would give the total, not the average)
avgAt42.shape      # this returns  (32, 185, 388)
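
The stronger check suggested in the assertion comment, comparing event times rather than only lengths, could be sketched as follows. The layout of the time records (fields named seconds and nanoseconds) and all values here are assumptions made for illustration; check the actual time datasets in your file before relying on this.

```python
import numpy as np

# Assumed layout of a per-event time record (hypothetical field names).
time_dtype = np.dtype([("seconds", np.uint32), ("nanoseconds", np.uint32)])

def event_keys(times):
    """Combine seconds and nanoseconds into one 64-bit key per event."""
    return (times["seconds"].astype(np.uint64) * np.uint64(1000000000)
            + times["nanoseconds"].astype(np.uint64))

# Made-up times: the cspad is missing the second event.
evr_times = np.array([(10, 0), (10, 500), (11, 0), (11, 500)], dtype=time_dtype)
cspad_times = np.array([(10, 0), (11, 0), (11, 500)], dtype=time_dtype)

# Events present in both datasets, with their row index in each.
common, evr_idx, cspad_idx = np.intersect1d(
    event_keys(evr_times), event_keys(cspad_times), return_indices=True)

# Made-up per-EvrData-row mask, playing the role of eventsWith42.
fired_42 = np.array([True, False, True, False])

# cspad rows whose matching EvrData row has the event code set.
cspad_rows = cspad_idx[fired_42[evr_idx]]
```

When the two time arrays are identical, evr_idx and cspad_idx are both simply 0, 1, 2, ... and the logical index array can be applied directly, as in the code above.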

With older versions of h5py that support only variable length strings, working with the vlen field in the EvrData through the high level h5py functions requires the patched version of h5py from LCLS. One can also use the low level interface to the HDF5 library that h5py provides (our patched version is not needed in this case).

Code to Print Datasets

Below is code that provides a function, printds, to display h5py datasets. The output has the following features:

...