Introduction

Matlab provides both a high-level and a low-level interface to the HDF5 library. The high-level functions include h5read and h5write for reading and writing HDF5 files. Many of our datasets are one-dimensional arrays of a compound type. A compound type is a well-defined concept in HDF5; it is like a C struct. In hdfview (a useful tool for viewing HDF5 files, provided by the HDF Group), these datasets often look like two-dimensional arrays, but the columns are really the fields of the compound type. When you read such a dataset with h5read, each field is returned as its own 1D array. For instance, if the dataset looks like

    fieldA  fieldB
0   101     23.3
1   110     99.1
2   784     13.3

in hdfview, then h5read will return a Matlab struct with two fields:

dataset =

      fieldA: [3x1 uint16]
      fieldB: [3x1 single]
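
As a minimal sketch of working with such a struct, the snippet below builds a stand-in for the h5read result by hand (so that it runs without an HDF5 file) and then indexes the fields. The variable names other than fieldA and fieldB are illustrative only.

```matlab
% Stand-in for the struct that h5read returns for the compound dataset above
% (built by hand here so the example runs without an HDF5 file).
dataset.fieldA = uint16([101; 110; 784]);
dataset.fieldB = single([23.3; 99.1; 13.3]);

% Each field is an ordinary column vector with one entry per dataset row.
numRows = numel(dataset.fieldA);

% One "row" of the compound dataset is spread across the fields. Matlab
% indexing is 1-up, so index 2 is the row that hdfview labels 1.
rowA = dataset.fieldA(2);
rowB = dataset.fieldB(2);

% A logical mask computed from one field selects matching entries of another.
mask = dataset.fieldB > 20;
selectedA = dataset.fieldA(mask);
```

Because every field has the same length, a mask or index computed from one field can be applied to any of the others.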

New users who will work with Matlab should contact their experiment POCs to see if they have any recommended code or tools. For instance, XPP has developed numerous functions to help Matlab users work with LCLS data efficiently, including functions for the Evr vlen data discussed below.

vlen data

Although much of the LCLS data fits the simpler static model of arrays, or structs whose fields are arrays, some of it is more complex. In particular, data written as HDF5 variable-length data (vlen data) shows up as cell arrays in Matlab. One example that users may want to work with is the Evr data, which includes a list of the event codes that fired during an event. As suggested above, contact your experiment POC about any Matlab code that is recommended for this kind of task. Below we provide a simple example of working with the Evr data in Matlab. We unpack the cell-array-based data into a flat array, use the flat array to make a logical index array of the events with a certain event code, and then use that index to average the cspad data over those events.

For recent data that uses the Evr::DataV4 type, this is not necessary, because the event codes are already translated into such a flattened table, called 'present'. For the purposes of demonstrating how to work with vlen data, we ignore that here. Note that the 'present' table in the HDF5 file is a 0-up table, i.e., column 1 is for event code 0. Below we build a 1-up table (since event code 0 should not be used).

% we'll read evr data from the first calib cycle of a tutorial file using Matlab's high level read function
evrData = h5read('/reg/d/psdm/xpp/xpptut13/hdf5/xpptut13-r0179.h5','/Configure:0000/Run:0000/CalibCycle:0000/EvrData::DataV3/NoDetector.0:Evr.0/data');

% evrData is a struct with one field, fifoEvents. fifoEvents is a cell array with one entry per event in the CalibCycle
fifoEvents = evrData.fifoEvents;
numberOfEvents = length(fifoEvents);
% numberOfEvents is 483

% let's identify the smallest and largest event codes in this calib cycle
minEventCode = 99999;
maxEventCode = 0;
for eventIdx = 1:numberOfEvents
    codes = fifoEvents{eventIdx}.eventCode;
    % defensive: skip events with no event codes, since min/max of an empty array is empty
    if isempty(codes), continue; end
    minEventCode = min(minEventCode, min(codes));
    maxEventCode = max(maxEventCode, max(codes));
end
assert(minEventCode>0, 'unexpected - minimum event code should be greater than 0. matlab uses 1-up array indexing, 0 will not work with below code');

% let's fill out a flat array: each row is an event, and column k is 1 only if event code k occurred in that event
eventCodesFlat = zeros(numberOfEvents, double(maxEventCode), 'int8');
for evtIdx = 1:numberOfEvents
    % index with the whole list of codes for this event at once
    eventCodesFlat(evtIdx, fifoEvents{evtIdx}.eventCode) = 1;
end

% here is an example of how you might work with the flat array
numberOfEventsWithEventCode42 = sum(eventCodesFlat(:,42));
% there are 121 events with event code 42
% let's say we want to average the cspad data over just those 121 events.
% The code below is easy to write, although it uses quite a bit of memory:

cspad = h5read('/reg/d/psdm/xpp/xpptut13/hdf5/xpptut13-r0179.h5','/Configure:0000/Run:0000/CalibCycle:0000/CsPad::ElementV2/XppGon.0:Cspad.0/data');
% That reads in 2GB - 483 events * 2 bytes per element * 32 * 185 * 388.
% cspad's size is 388   185    32   483

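eventsWith42 = eventCodesFlat(:,42)==1;
% events run along the 4th dimension of cspad; length(cspad) would return the largest dimension, not the event count
assert(length(eventsWith42)==size(cspad,4), 'number of events with cspad != number of events with evr data');
% even when the counts agree, a more careful check would confirm that all entries in the time datasets are the same
cspadEventCode42 = cspad(:,:,:,eventsWith42);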
% size(cspadEventCode42) returns  388   185    32   121

cspadAverageEventCode42 = mean(cspadEventCode42,4);
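
The approach above holds the whole cspad dataset in memory. One possible alternative, sketched here under stated assumptions, is to read one event at a time with the start and count arguments of h5read and accumulate a running sum. To stay self-contained, the sketch writes a small stand-in file first; the dataset name /cspadStandIn, the sizes, and the choice of selected events are hypothetical and are not taken from the tutorial file.

```matlab
% Build a small stand-in file (hypothetical dataset name and sizes).
filename = [tempname '.h5'];
dsName = '/cspadStandIn';
dims = [4 3 2 10];                          % last dimension is the event index
h5create(filename, dsName, dims, 'Datatype', 'int16');
fakeData = int16(reshape(1:prod(dims), dims));
h5write(filename, dsName, fakeData);

% Logical mask of the events to average, standing in for eventsWith42.
selected = false(dims(4), 1);
selected([2 5 9]) = true;

% Accumulate in double (the stored type is int16), reading one event at a time.
runningSum = zeros(dims(1:3));
for evtIdx = find(selected)'
    % start and count are 1-up and given per dimension
    oneEvent = h5read(filename, dsName, [1 1 1 evtIdx], [dims(1:3) 1]);
    runningSum = runningSum + double(oneEvent);
end
averageSelected = runningSum / nnz(selected);

delete(filename);
```

This trades memory for many small reads; reading a block of several events per call is a natural middle ground.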

 

Matlab Issue with Enums

When a Matlab user loads a dataset that is an array of enums with h5read, it returns a cell array of strings, so the user can work with the strings. However, Matlab (as of version R2013b) fails to do this for our datasets, where the enum is a field within a compound type. We have reported this problem to MathWorks, and they understand the issue. Users for whom enum-to-string translation is an important feature should feel free to contact MathWorks and reference the service request that we have made: 1-O0DQBH, enum field in compound type of hdf5.

In the meantime, MathWorks has provided some workaround code that we have reworked into a function that users can call. You can download the file here:

translate_enums.m

It will also soon be available in the src directory of the h5tools package of the analysis release: /reg/g/psdm/sw/releases/ana-current/h5tools/src (as of today, 10/3/2013, it is not yet part of ana-current). Once you have the file, add it to a directory on your Matlab search path and call the function translate_enums to have the enum fields translated into string fields. One would use this as follows:

filename='somefile.h5';
datasetname='/path/to/dataset';
ds = h5read(filename,datasetname);
ds = translate_enums(ds, filename, datasetname);

If ds is a dataset whose base type is a compound type, translate_enums will replace fields that are enums with their string counterparts. For example, suppose one had

  ds
     field1  [3x1 int16]
     field2  [3x1 single]

where field1 holds an enum with members ONE=1, TWO=2, THREE=3 and contains the array [2,1,3]. Then the output of translate_enums will be

  ds
     field1  {3x1 cell}
     field2  [3x1 single]

where field1 is now

{ 'TWO'; 'ONE'; 'THREE' }
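
To illustrate the kind of mapping involved, here is a minimal sketch that translates raw enum integers into names by hand. It is not the implementation of translate_enums; the enum definition is typed in directly rather than read from the file, and all variable names are illustrative.

```matlab
% Hypothetical enum definition: member names and their integer values.
enumNames  = {'ONE'; 'TWO'; 'THREE'};
enumValues = [1; 2; 3];

% Raw integer field, as h5read might return it for an enum inside a compound type.
field1 = int16([2; 1; 3]);

% Look up the position of each raw value in the enum definition.
[found, position] = ismember(double(field1), enumValues);
assert(all(found), 'field contains a value that is not a member of the enum');

% Replace the integers with the matching names (a 3x1 cell array of strings).
field1Strings = enumNames(position);
```

Looking values up with ismember, rather than indexing enumNames directly with the raw integers, keeps the sketch valid for enums whose values are not 1, 2, 3, and so on.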

Performance

One challenge Matlab users face is how Matlab handles the large datasets in the LCLS files. If a dataset is too large to fit in memory, you will have to read a subset of the rows (h5read has optional start and count arguments for this).
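
As a rough sketch of this, the snippet below uses h5info to check a dataset's size before reading anything, and then reads only a block of it with h5read. It writes a small stand-in file so that it is self-contained; the dataset name /largeStandIn and the sizes are hypothetical. As with the cspad example above, the sketch assumes the row (event) index is the last dimension on the Matlab side.

```matlab
% Build a small stand-in file (hypothetical dataset name and sizes).
filename = [tempname '.h5'];
dsName = '/largeStandIn';
h5create(filename, dsName, [6 1000], 'Datatype', 'int16');
h5write(filename, dsName, int16(repmat(1:1000, 6, 1)));

% Inspect the dataset size without reading any data.
info = h5info(filename, dsName);
dsSize = info.Dataspace.Size;               % size as Matlab sees it
bytesIfReadWhole = prod(dsSize) * 2;        % int16 is 2 bytes per element

% Read only entries 101 to 150 along the last dimension
% (start is 1-up; count is the size of the block to read).
start = [1 101];
count = [dsSize(1) 50];
block = h5read(filename, dsName, start, count);

delete(filename);
```

Comparing bytesIfReadWhole against the available memory is one way to decide whether a full read is reasonable or whether the dataset should be processed block by block.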
