Introduction
Matlab provides both a high-level and a low-level interface to the HDF5 library. The high-level functions include h5read and h5write for reading from and writing to HDF5 files. Many of our datasets are one-dimensional arrays of a compound type. A compound type is a well-defined concept in HDF5; it is like a C struct. In HDFView (a useful tool for viewing HDF5 files, provided by the HDF Group), these datasets often look like two-dimensional arrays, but the columns are really the fields of the compound type. When you read such a dataset with h5read, each field is returned as its own 1D array. For instance, if the dataset looks like

         fieldA   fieldB
    0    101      23.3
    1    110      99.1
    2    784      13.3

in HDFView, Matlab will return a Matlab struct with two fields:

    dataset =
        fieldA: [3x1 uint16]
        fieldB: [3x1 single]
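As a minimal sketch of working with such a struct, the snippet below builds a stand-in for the value h5read would return (so it runs without an HDF5 file) and shows field access and logical indexing. The names filename and datasetPath in the comment are hypothetical placeholders, not paths from a real file.

```matlab
% Stand-in for the struct that h5read would return for the example dataset.
% With a real file this would be: dataset = h5read(filename, datasetPath);
% (filename and datasetPath are hypothetical placeholders).
dataset = struct('fieldA', uint16([101; 110; 784]), ...
                 'fieldB', single([23.3; 99.1; 13.3]));

% Each field is an ordinary column vector, so "row" k of the HDFView table
% is element k of every field (Matlab indices start at 1, HDFView rows at 0).
k = 2;
rowA = dataset.fieldA(k);   % value from HDFView row 1
rowB = dataset.fieldB(k);

% Logical indexing on one field selects the matching entries of another.
bigA = dataset.fieldA > 105;
selectedB = dataset.fieldB(bigA);

% Fields of different types can be combined after converting to double.
asMatrix = [double(dataset.fieldA), double(dataset.fieldB)];   % 3x2 double
```

Keeping the fields as separate typed vectors avoids losing precision or range; converting to a double matrix is only worthwhile when a single numeric array is really needed.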
vlen data
Although much of the LCLS data fits the simpler static model of arrays, or structs whose fields are arrays, some of it can be quite complex. In particular, data written as HDF5 variable-length data (vlen data) shows up as cell arrays in Matlab. One example that users may want to work with is the Evr data, which includes a list of the event codes that fired during an event. Below we take a look at working with the Evr data in Matlab. We unpack the cell-array-based data into a flat array. From the flat array, we make a logical index array of the events with a certain event code. Then we use this index to average the cspad data over those events.
    % We'll read evr data from the first calib cycle of a tutorial file:
    evrData = h5read('/reg/d/psdm/xpp/xpptut13/hdf5/xpptut13-r0179.h5', ...
        '/Configure:0000/Run:0000/CalibCycle:0000/EvrData::DataV3/NoDetector.0:Evr.0/data');

    % evrData is a struct with one field, fifoEvents. fifoEvents is a cell
    % array with one entry per event in the CalibCycle.
    fifoEvents = evrData.fifoEvents;
    numEvents = length(fifoEvents);
    % numEvents is 483

    % Let's identify the smallest and largest event codes in this calib cycle.
    minEventCode = 99999;
    maxEventCode = 0;
    for eventIdx = 1:numEvents
        maxEventCode = max(maxEventCode, max(fifoEvents{eventIdx}.eventCode));
        minEventCode = min(minEventCode, min(fifoEvents{eventIdx}.eventCode));
    end
    assert(minEventCode > 0, ['unexpected - minimum event code should be greater than 0. ' ...
        'Matlab uses 1-up array indexing, so 0 will not work with the code below']);

    % Let's fill out a flat array: each row is an event, and column k is 1
    % only if event code k occurred in that event.
    eventCodesFlat = zeros(numEvents, double(maxEventCode), 'int8');
    for i = 1:numEvents
        eventCodesFlat(i, fifoEvents{i}.eventCode) = 1;
    end

    % Here is an example of how you might work with the flat array:
    numberOfEventsWithEventCode42 = sum(eventCodesFlat(:,42));
    % There are 121 events with event code 42.

    % Let's say we want to average over the cspad data for just those 121 events.
    % The code below is easy to write, although it uses quite a bit of memory:
    cspad = h5read('/reg/d/psdm/xpp/xpptut13/hdf5/xpptut13-r0179.h5', ...
        '/Configure:0000/Run:0000/CalibCycle:0000/CsPad::ElementV2/XppGon.0:Cspad.0/data');
    % That reads in about 2 GB: 483 events * 2 bytes per element * 32 * 185 * 388.
    % size(cspad) returns 388 185 32 483
    eventsWith42 = eventCodesFlat(:,42) == 1;
    cspadEventCode42 = cspad(:,:,:,eventsWith42);
    % size(cspadEventCode42) returns 388 185 32 121
    cspadAverageEventCode42 = mean(cspadEventCode42, 4);
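If only one event code is of interest, the flat array can be skipped and the logical index built directly from the cell array. The following is a minimal sketch on made-up data, not the tutorial file: the fifoEvents and detector variables are small stand-ins invented for illustration, shaped like the values h5read returns above (one struct per event, and the event index as the last array dimension).

```matlab
% Made-up stand-in for evrData.fifoEvents: one struct per event, each with
% an eventCode column vector (real data would come from h5read).
fifoEvents = { struct('eventCode', uint32([40; 42])); ...
               struct('eventCode', uint32(140)); ...
               struct('eventCode', uint32([42; 140; 41])) };

% Logical column vector: true where event code 42 fired in that event.
hasCode42 = cellfun(@(ev) any(ev.eventCode == 42), fifoEvents);

% Made-up stand-in for detector data, with the event index last.
detector = cat(3, ones(2), 2 * ones(2), 3 * ones(2));

% Average only over the selected events (events 1 and 3 here).
averageCode42 = mean(detector(:, :, hasCode42), 3);
```

The flat array is still the better choice when several event codes will be queried, since it is built once and then indexed by column.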
Matlab Issue with Enums
When a Matlab user loads a dataset that is an array of enums with h5read, it returns a cell array of strings, so the user can work with the strings. However, Matlab (as of version 2013b) fails to do this with our datasets, where the enum is a field within a compound type. We have made MathWorks aware of this problem and they understand the issue. Users for whom enum-to-string translation is an important feature should feel free to contact MathWorks and reference the service request that we have made: 1-O0DQBH, enum field in compound type of hdf5.
In the meantime, MathWorks has provided some workaround code that we have reworked into a function users can use. You can download the file here:
or you will soon be able to get it from the src directory of the h5tools package of the analysis release: /reg/g/psdm/sw/releases/ana-current/h5tools/src (as of today, 10/3/2013, it is not yet part of ana-current, but will be soon). Once you have the file, add it to a directory in your Matlab search path, and call the function translate_enums to have the enum fields translated into string fields. One would use this as follows:
    filename = 'somefile.h5';
    datasetname = '/path/to/dataset';
    ds = h5read(filename, datasetname);
    ds = translate_enums(ds, filename, datasetname);
If ds is a dataset whose base type is a compound type, translate_enums will replace fields that are enums with their string counterparts. For example, suppose one had

    ds
        field1 [3x1 int16]
        field2 [3x1 single]

where field1 was for an enum ONE=1, TWO=2, THREE=3 and was the array [2,1,3]. Then the output of translate_enums will be

    ds
        field1 {3x1 cell array}
        field2 [3x1 single]

where field1 is now

    { 'TWO', 'ONE', 'THREE' }
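To make the translation concrete, here is a minimal sketch of the value-to-name lookup for the example above. It is not the translate_enums implementation (which reads the enum definition from the file); the enumNames and enumValues variables are written out by hand here purely for illustration.

```matlab
% Enum definition from the example above, written out by hand
% (an assumption for illustration; translate_enums gets it from the file).
enumNames  = {'ONE'; 'TWO'; 'THREE'};
enumValues = [1; 2; 3];

% field1 as h5read returned it: raw integer values.
field1 = int16([2; 1; 3]);

% Look up the position of each raw value in enumValues, then pick the names.
[found, position] = ismember(double(field1), enumValues);
assert(all(found), 'field1 contains a value that is not in the enum');
field1Names = enumNames(position);   % 3x1 cell array of strings
```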
Performance
One challenge Matlab users face is how Matlab handles the large datasets in the LCLS files. If a dataset is too large to fit in memory, you will have to read a subset of the rows (h5read takes optional start, count, and stride arguments to do this).
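One way to use these arguments is to read a few events at a time and keep only a running sum. The sketch below assumes a small made-up file created with h5create and h5write so that it runs anywhere; the file name, the /data dataset, its dimensions, and the selected mask are all invented for illustration. With real LCLS data the dataset path and the logical index would come from the file and the event codes instead.

```matlab
% Small made-up file standing in for a large dataset; the last dimension
% plays the role of the event index.
filename = [tempname '.h5'];
dims = [4 3 10];
full = reshape(1:prod(dims), dims);
h5create(filename, '/data', dims);
h5write(filename, '/data', full);

% Made-up logical index of the events to average over.
selected = false(1, dims(3));
selected([2 5 6 9]) = true;

% Read chunkSize events at a time and accumulate a running sum.
chunkSize = 4;
runningSum = zeros(dims(1), dims(2));
for first = 1:chunkSize:dims(3)
    n = min(chunkSize, dims(3) - first + 1);
    % start and count are given per dimension; start is 1-based.
    chunk = h5read(filename, '/data', [1 1 first], [dims(1) dims(2) n]);
    keep = selected(first:first + n - 1);
    runningSum = runningSum + sum(chunk(:, :, keep), 3);
end
chunkedMean = runningSum / nnz(selected);

delete(filename);
```

Only one chunk is held in memory at a time, so the peak memory use is set by chunkSize rather than by the number of events in the file.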