Below is a Python example that gathers some data from an experiment and saves it to an HDF5 file for future use. The gathered data can then be used to select which shots to include in later analysis. The example demonstrates how to use h5py to write and read compound data types in HDF5 files.
The data used is a CXI run from the Tutorial test data.
To use this script:
- place it in your release
- run ipython
- enter the commands:
    import gather_save
    data,timePair = gather_save.gatherData()
    gather_save.writeCompoundDataToH5(data,timePair,"saved_output.h5")
    data,timePair = gather_save.readCompoundDataFromH5("saved_output.h5")
The function gatherData() must be modified for different datasets. writeCompoundDataToH5() and readCompoundDataFromH5() should not need changes.
data is a numpy array with 6 named fields that hold values gathered from the events, including EPICS PVs, a value from the gas detector, and the voltage sum of a diode. The fields have names like 'aD' (the diode sum) and 'aG' (the gas detector value).
One can work with the data using numpy features as follows:
    logicalIndex = data['aG'] > 0.84   # a boolean mask that is True where 'aG' is greater than 0.84
    data['aD'][logicalIndex]           # the mask is used to get diode values where 'aG' is > 0.84
    logicalIndex.nonzero()[0]          # turn the mask into an array of positions, see the documentation
                                       # on the numpy function nonzero:
                                       # http://docs.scipy.org/doc/numpy/reference/generated/numpy.nonzero.html
    # likewise one can do
    import numpy as np
    np.where(data['aG'] > 0.84)[0]     # the [0] is necessary to get the indices along the first axis
Details
Below we discuss how things are done in the example.
Use of Default Dict
    from collections import defaultdict
    ...
    dataLists = defaultdict(list)
    for num,evt in enumerate(ds.events()):
        ...
        dataLists['aTime'].append(eventId.time())
The use of the standard Python defaultdict allows us to avoid initializing the keys in dataLists. The price is that if we later make a typo when we retrieve 'aTime' from dataLists, no KeyError is raised. The misspelled key is silently created with an empty list, so the mistake only shows up later as missing data.
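The pitfall can be reproduced in a few lines. This is a minimal sketch: the time values are made up, and converting to a plain dict after gathering is one possible safeguard, not something the example script is known to do.

```python
from collections import defaultdict

# Stand-in for the lists filled inside the event loop (values are made up).
dataLists = defaultdict(list)
dataLists['aTime'].append((1364147551, 107587445))
dataLists['aTime'].append((1364147551, 115924241))

# A misspelled key does not raise KeyError; it silently creates an empty list.
typo_result = dataLists['atime']
print(len(dataLists['aTime']), len(typo_result))

# Converting to a plain dict once gathering is done restores the KeyError
# for any key that was never filled.
frozen = dict(dataLists)
try:
    frozen['aTme']
except KeyError as err:
    print('KeyError:', err)
```

Note that the first typo ('atime') is now a real key in dataLists, which is why such mistakes are hard to spot afterwards.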
Creation of Numpy Array with Named Fields
When creating numpy arrays, it is more efficient to create them with a known size. You can append to an existing numpy array, but doing this for every event may lead to a great deal of memory reallocation and copying. In this example, we read the data into Python lists. Once we have all the data, we create the numpy array of the known size.
To create a numpy array with named fields you must define a dtype. For this example where each field is a float, it is fairly straightforward:
    import numpy as np
    ...
    # each field is a 64-bit float
    compoundDataType = np.dtype([('aG',np.float64), ('aD',np.float64), ('aH',np.float64),
                                 ('aX',np.float64), ('aY',np.float64), ('aZ',np.float64)])
    # one record per event, all fields initialized to zero
    compoundData = np.zeros(len(dataLists['aTime']), dtype=compoundDataType)
numpy dtypes can get quite complicated. For more information, visit the documentation: http://docs.scipy.org/doc/numpy/reference/arrays.dtypes.html
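Once the array exists, each field can be filled from its list in one assignment. The following is a sketch under assumptions: only two of the six fields are used, the values are invented, and the loop over field names is an illustration rather than the code in gatherData().

```python
import numpy as np

# Hypothetical per-event lists, standing in for what the event loop collects.
dataLists = {
    'aG': [0.80, 0.91, 0.86],
    'aD': [1.2, 1.5, 1.4],
}

compoundDataType = np.dtype([('aG', np.float64), ('aD', np.float64)])
compoundData = np.zeros(len(dataLists['aG']), dtype=compoundDataType)

# Assign each list to its field; numpy copies the values into that column.
for name in compoundDataType.names:
    compoundData[name] = dataLists[name]

# Diode values for the events where the gas detector value exceeds 0.84.
print(compoundData['aD'][compoundData['aG'] > 0.84])
```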
Writing an HDF5 File of Compound Data
Once we have created the numpy array of named fields, it is straightforward to make an HDF5 file with one dataset that contains the numpy array. See the h5py documentation for more information.
    import h5py
    f = h5py.File('myfile.h5', 'w')   # open for writing, replacing any existing file
    data_comp_type = compoundData.dtype
    # create a dataset with the same shape and compound type as the array
    data_dset = f.create_dataset('data', compoundData.shape, data_comp_type)
    data_dset[:] = compoundData       # copy the array into the dataset
    f.close()
It is important to call the close() method of the h5py.File object when done.
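One way to guarantee the close() call is to open the file in a with block, which h5py.File supports. The sketch below writes a small compound array and reads it back. The field values and the temporary path are invented for illustration, and it is not necessarily how writeCompoundDataToH5 or readCompoundDataFromH5 are implemented.

```python
import os
import tempfile

import h5py
import numpy as np

# A small compound array with made-up values.
compoundDataType = np.dtype([('aG', np.float64), ('aD', np.float64)])
compoundData = np.zeros(3, dtype=compoundDataType)
compoundData['aG'] = [0.80, 0.91, 0.86]
compoundData['aD'] = [1.2, 1.5, 1.4]

# Write into a temporary directory so no existing file is overwritten.
path = os.path.join(tempfile.mkdtemp(), 'myfile.h5')

# The with block closes the file even if an exception is raised inside it.
with h5py.File(path, 'w') as f:
    f.create_dataset('data', data=compoundData)

# Reading with [:] returns a numpy array with the same named fields.
with h5py.File(path, 'r') as f:
    restored = f['data'][:]

print(restored.dtype.names)
```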