Below is a Python example that gathers some data from an experiment run and saves it into an hdf5 file, so that the data can be read back later without reprocessing the run. The gathered data will be used to select which shots to use for future analysis. The example is written for a particular CXI experiment, but it could be modified for other experiments. It demonstrates the use of h5py to write and read hdf5 compound data types, which it uses for the storage.
The data used is a CXI run from the Tutorial test data.
Code Block
import numpy as np
import h5py
from collections import defaultdict
import psana

def getFirstEpicsValue(epics, pvName):
    '''convenience function to get the first epics value from the given pv'''
    return epics.getPV(pvName).data()[0]

def gatherData(eventsToProcess=None):
    '''Goes through a specific experiment and run, gathers specific values
    from the data, and returns two numpy arrays:
       compoundData: the gathered data
       compoundTime: the event times for each row of the gathered data
    '''
    ExperimentName = 'cxi80410'
    RunNum = 1116
    DataType = ':h5'    # set to ':h5' for HDF5 and '' for XTC
    dataset_name = 'exp=CXI/%s:run=%d%s' % (ExperimentName, RunNum, DataType)
    ds = psana.DataSource(dataset_name)
    src_diode = psana.Source('DetInfo(CxiDg4.0:Ipimb.0)')
    src_gasdet = psana.Source('BldInfo(FEEGasDetEnergy)')
    epics = ds.env().epicsStore()
    dataLists = defaultdict(list)
    for num, evt in enumerate(ds.events()):
        if eventsToProcess is not None and num >= eventsToProcess:
            break
        diode = evt.get(psana.Ipimb.DataV2, src_diode)
        dataLists['aDiode'].append(diode.channel0Volts() + diode.channel1Volts() +
                                   diode.channel2Volts() + diode.channel3Volts())
        GasDet = evt.get(psana.Bld.BldDataFEEGasDetEnergy, src_gasdet)
        dataLists['aGasDet'].append(GasDet.f_11_ENRC())
        eventId = evt.get(psana.EventId)
        dataLists['aTime'].append(eventId.time())
        dataLists['aSampleX'].append(getFirstEpicsValue(epics, 'CXI:SC1:MZM:08:ENCPOSITIONGET'))
        dataLists['aSampleY'].append(getFirstEpicsValue(epics, 'CXI:SC1:MZM:09:ENCPOSITIONGET'))
        dataLists['aSampleZ'].append(getFirstEpicsValue(epics, 'CXI:SC1:MZM:10:ENCPOSITIONGET'))
        dataLists['aKB1Hpitch'].append(getFirstEpicsValue(epics, 'CXI:KB1:MMS:07.RBV'))
        dataLists['aKB1Vpitch'].append(getFirstEpicsValue(epics, 'CXI:KB1:MMS:11.RBV'))
        if num % 1000 == 0:
            print "event number=%8d diode.channel0Volts=%f GasDet.f_11_ENRC=%f" % \
                  (num, diode.channel0Volts(), GasDet.f_11_ENRC())

    # motor calibration
    EncoderScale = 0.005
    Xoffset = 17050.
    Yoffset = 1056510.
    Zoffset = 1830400.

    compoundDataType = np.dtype([('aG',np.float), ('aD',np.float), ('aH',np.float), ('aV',np.float),
                                 ('aX',np.float), ('aY',np.float), ('aZ',np.float)])
    compoundData = np.zeros(len(dataLists['aTime']), dtype=compoundDataType)
    compoundData['aG'] = dataLists['aGasDet']
    compoundData['aD'] = dataLists['aDiode']
    compoundData['aH'] = dataLists['aKB1Hpitch']
    compoundData['aV'] = dataLists['aKB1Vpitch']
    compoundData['aX'] = np.array(dataLists['aSampleX']) * EncoderScale - Xoffset
    compoundData['aY'] = np.array(dataLists['aSampleY']) * EncoderScale - Yoffset
    compoundData['aZ'] = np.array(dataLists['aSampleZ']) * EncoderScale - Zoffset

    timeDataType = np.dtype([('seconds',np.uint32), ('nanoseconds',np.uint32)])
    compoundTime = np.zeros(len(dataLists['aTime']), dtype=timeDataType)
    compoundTime[:] = dataLists['aTime']

    return compoundData, compoundTime

def writeCompoundDataToH5(compoundData, compoundTime, h5filename):
    f = h5py.File(h5filename, 'w')
    data_comp_type = compoundData.dtype
    data_dset = f.create_dataset('data', compoundData.shape, data_comp_type)
    data_dset[:] = compoundData
    time_comp_type = compoundTime.dtype
    time_dset = f.create_dataset('time', compoundTime.shape, time_comp_type)
    time_dset[:] = compoundTime
    f.close()

def readCompoundDataFromH5(h5filename):
    f = h5py.File(h5filename, 'r')
    data_dset = f['data']
    data_array = data_dset[:]
    time_dset = f['time']
    time_array = time_dset[:]
    f.close()
    return data_array, time_array

if __name__ == '__main__':
    data, timePair = gatherData()
    writeCompoundDataToH5(data, timePair, "saved_output.h5")
    data, timePair = readCompoundDataFromH5("saved_output.h5")
To use this script:
- place it in your release
- run ipython
- enter the commands:
import gather_save
data,timePair = gather_save.gatherData()
gather_save.writeCompoundDataToH5(data,timePair,"saved_output.h5")
data,timePair = gather_save.readCompoundDataFromH5("saved_output.h5")
data is a numpy array with 7 named fields that gather different values from the events: epics pv's (the sample position and the KB1 mirror pitches), a value from the gas detector, and the voltage sum of a diode. The fields have names like 'aD' (the diode sum) and 'aG' (the gas detector value). The named fields make it easy to select events, for example:
Code Block
logicalIndex = data['aG'] > 0.84   # a mask that is True where 'aG' is greater than 0.84
data['aD'][logicalIndex]           # the mask is used to get diode values when 'aG' is > 0.84
logicalIndex.nonzero()[0]          # turn the mask into a list of positions, see the documentation on the
                                   # numpy function nonzero:
                                   # http://docs.scipy.org/doc/numpy/reference/generated/numpy.nonzero.html

# likewise one can do
import numpy as np
np.where(data['aG'] > 0.84)[0]     # the [0] is necessary to get the indices along the first axis
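The same mask can also be used to pull out the event times of the selected shots from the time array. Below is a small sketch along these lines; it assumes data and timePair are the arrays returned by gatherData (or read back with readCompoundDataFromH5), and the 0.84 cut is just an illustrative value.

Code Block
mask = data['aG'] > 0.84         # select shots with a large gas detector value
selectedTimes = timePair[mask]   # rows of (seconds, nanoseconds) for the selected shots

for row in selectedTimes[:5]:    # print the first few selected event times
    print "selected event: seconds=%d nanoseconds=%d" % (row['seconds'], row['nanoseconds'])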
Details
Below we discuss how things are done in the example.
Use of Default Dict
Code Block
from collections import defaultdict
...
    dataLists = defaultdict(list)
    for num, evt in enumerate(ds.events()):
        ...
        dataLists['aTime'].append(eventId.time())
The use of the standard Python defaultdict allows us to avoid initializing the keys in dataLists before appending to them. The price is that if we later make a typo when retrieving a key such as 'aTime' from dataLists, we get an empty list back rather than an error message that makes the problem clear.
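As a small standalone illustration of that pitfall (not part of the example code above), a mistyped key quietly creates a new empty list instead of raising a KeyError:

Code Block
from collections import defaultdict

dataLists = defaultdict(list)
dataLists['aTime'].append((0, 0))   # append a dummy (seconds, nanoseconds) pair

print len(dataLists['aTime'])   # prints 1, as expected
print len(dataLists['aTyme'])   # typo: defaultdict silently creates an empty list and prints 0
                                # instead of raising a KeyError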
Creation of Numpy Array with Named Fields
When creating numpy arrays, it is more efficient to create with a known size. You can append to an existing numpy array, but to do this with every event may lead to a great deal of memory reallocation. In this example, we read the data into Python lists. Once we have all the data, we create the numpy array of the known size.
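The sketch below illustrates the difference (the names here are illustrative only); the dtype and allocation used by the actual example are shown in the next code block.

Code Block
import numpy as np

# preferred: accumulate per-event values in a plain Python list,
# then convert to a numpy array once at the end (a single allocation)
values = []
for num in range(10000):        # stands in for the event loop
    values.append(float(num))
arr = np.array(values)

# by contrast, growing a numpy array inside the loop with np.append
# copies the entire array on every call
arr2 = np.zeros(0)
for num in range(10000):
    arr2 = np.append(arr2, float(num))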
Code Block
import numpy as np
...
compoundDataType = np.dtype([('aG',np.float), ('aD',np.float), ('aH',np.float), ('aV',np.float),
('aX',np.float), ('aY',np.float), ('aZ',np.float)])
compoundData = np.zeros(len(dataLists['aTime']), dtype=compoundDataType)
numpy dtypes can get quite complicated; for more information, visit the documentation: http://docs.scipy.org/doc/numpy/reference/arrays.dtypes.html
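As a quick check, one can inspect the compound dtype of the array built above (a small sketch; compoundData is the array returned by gatherData):

Code Block
print compoundData.dtype.names      # ('aG', 'aD', 'aH', 'aV', 'aX', 'aY', 'aZ')
print compoundData.dtype.itemsize   # number of bytes in one row of the compound type
print compoundData.dtype['aX']      # the dtype of a single field (float64)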
Writing an HDF5 File of Compound Data
Once we have created the numpy array of named fields, it is straightforward to make an hdf5 file with one dataset that contains the numpy array, as sketched below. See the h5py documentation for more information.
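The following is a minimal sketch of the write and read steps; it mirrors writeCompoundDataToH5 and readCompoundDataFromH5 from the full example and assumes compoundData and compoundTime are the arrays returned by gatherData. Passing data= to create_dataset is an alternative to giving the shape and dtype explicitly.

Code Block
import h5py

# write: one dataset per compound numpy array; h5py maps the numpy
# compound dtype to an hdf5 compound type automatically
f = h5py.File('saved_output.h5', 'w')
f.create_dataset('data', data=compoundData)
f.create_dataset('time', data=compoundTime)
f.close()

# read back: slicing the dataset returns a numpy array with the same named fields
f = h5py.File('saved_output.h5', 'r')
data_array = f['data'][:]
time_array = f['time'][:]
f.close()

print data_array.dtype.names   # ('aG', 'aD', 'aH', 'aV', 'aX', 'aY', 'aZ')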