
Introduction

Matlab provides both a high-level and a low-level interface to the HDF5 library. The high-level functions include h5read and h5write, for reading from and writing to HDF5 files. Many of our datasets are one-dimensional arrays of a compound type. A compound type is a well-defined concept in HDF5; it is like a C struct. In HDFView (a useful tool for viewing HDF5 files, provided by The HDF Group) such datasets often look like two-dimensional arrays, but the columns are really the fields of the compound type. When you read the dataset with h5read, each field is returned as its own 1D array. For instance, if the dataset looks like

    fieldA  fieldB
0   101     23.3
1   110     99.1
2   784     13.3

in HDFView, h5read will return a Matlab struct with two fields:

dataset =

      fieldA: [3x1 uint16]
      fieldB: [3x1 single]
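To get back the row-by-row view that HDFView shows, the struct returned by h5read can be converted to a Matlab table. The sketch below builds the struct by hand from the example values instead of reading a file, and assumes a Matlab release with table support (struct2table arrived with tables in R2013b). It also shows a table-free way to pull one row out of every field.

```matlab
% Hand-built stand-in for the struct h5read returns (values from the example above).
dataset.fieldA = uint16([101; 110; 784]);
dataset.fieldB = single([23.3; 99.1; 13.3]);

% Every field is a column vector of the same length, so the struct maps
% directly onto a table with one row per element of the compound dataset.
T = struct2table(dataset);
disp(T)

% Matlab row 2 corresponds to row 1 in HDFView, which counts from 0.
secondRow = T(2, :);

% Without tables: take element k from every field (fields in struct order).
k = 2;
rowValues = structfun(@(col) double(col(k)), dataset);
```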

vlen data

Although much of the LCLS data fits the simpler static model of arrays, or structs whose fields are arrays, some LCLS data can be quite complex. In particular, data written as HDF5 variable-length (vlen) data shows up in Matlab as cell arrays. One example that users may want to work with is the EVR data, which includes a list of the event codes that fired during each event. Below we take a look at working with the EVR data in Matlab: we unpack the cell-array-based data into a flat array, use the flat array to build a logical index of the events with a given event code, and then use that index to average the CSPAD data over those events.

% we'll read evr data from the first calib cycle of a tutorial file:
evrData = h5read('/reg/d/psdm/xpp/xpptut13/hdf5/xpptut13-r0179.h5','/Configure:0000/Run:0000/CalibCycle:0000/EvrData::DataV3/NoDetector.0:Evr.0/data');

% evrData is a struct with one field, fifoEvents: a cell array with one entry per event in the CalibCycle
fifoEvents = evrData.fifoEvents;
numEvents = length(fifoEvents);
% numEvents is 483

% let's identify the smallest and largest event codes in this calib cycle
minEventCode=99999;
maxEventCode=0;
for eventIdx = 1:numEvents; maxEventCode = max(maxEventCode, max(fifoEvents{eventIdx}.eventCode)); end;
for eventIdx = 1:numEvents; minEventCode = min(minEventCode, min(fifoEvents{eventIdx}.eventCode)); end;
assert(minEventCode>0, 'unexpected: the minimum event code should be greater than 0. Matlab uses 1-based array indexing, so code 0 will not work with the code below');

% let's fill out a flat array: each row is an event, and column k is 1 only if event code k occurred in that event
eventCodesFlat = zeros(numEvents, double(maxEventCode), 'int8');
for i=1:numEvents; eventCode = fifoEvents{i}.eventCode; for ec=eventCode; eventCodesFlat(i,ec)=1; end; end;
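The loops above are easy to follow. A loop-free variant is sketched below, assuming, as the code above does, that every cell of fifoEvents holds a struct with an eventCode field and that all codes are at least 1. It runs on a small hand-made fifoEvents with hypothetical values so it can be tried without the tutorial file. Because sparse adds up repeated (event, code) pairs, the result is compared with 0, which yields a logical matrix rather than int8; sum and the ==1 test used below behave the same on it.

```matlab
% Hypothetical stand-in for evrData.fifoEvents: one cell per event.
fifoEvents = {struct('eventCode', uint8([40 42])); ...
              struct('eventCode', uint8(140)); ...
              struct('eventCode', uint8([]))};   % an event with no codes
numEvents = numel(fifoEvents);

% Event codes of each event as double column vectors (sparse needs doubles).
codesPerEvent = cellfun(@(s) double(s.eventCode(:)), fifoEvents, ...
                        'UniformOutput', false);
codesPerEventLen = cellfun(@numel, codesPerEvent);

% A matching event (row) index for every code.
rowIdx = arrayfun(@(i) repmat(i, codesPerEventLen(i), 1), (1:numEvents)', ...
                  'UniformOutput', false);
allRows  = vertcat(rowIdx{:});
allCodes = vertcat(codesPerEvent{:});

% One logical flag per (event, code) pair.
maxEventCode = max(allCodes);
eventCodesFlat = full(sparse(allRows, allCodes, 1, numEvents, maxEventCode)) > 0;
```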

% here is an example of how you might work with the flat array
numberOfEventsWithEventCode42 = sum(eventCodesFlat(:,42));
% there are 121 events with event code 42
% let's say we want to average the cspad data over just those 121 events.
% The code below is easy to write, but it uses quite a bit of memory:

cspad = h5read('/reg/d/psdm/xpp/xpptut13/hdf5/xpptut13-r0179.h5','/Configure:0000/Run:0000/CalibCycle:0000/CsPad::ElementV2/XppGon.0:Cspad.0/data');
% That reads in about 2.2 GB: 483 events * 32 * 185 * 388 elements * 2 bytes per element.
% cspad's size is 388   185    32   483

eventsWith42 = eventCodesFlat(:,42)==1;
cspadEventCode42 = cspad(:,:,:,eventsWith42);
% size(cspadEventCode42) returns  388   185    32   121

cspadAverageEventCode42 = mean(cspadEventCode42,4);
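To avoid holding all 483 events in memory, one option is to read one event at a time with the start and count arguments of h5read and keep a running sum over only the selected events. The sketch below writes a small synthetic int16 dataset to a temporary file so it is self-contained; the dataset name '/cspad', its size, and the selection vector are made up. With the tutorial file you would use its file and dataset names and eventsWith42 instead. One read per event trades speed for memory; reading blocks of consecutive events is a middle ground.

```matlab
% Synthetic 4-D int16 dataset (hypothetical name and size) in a temp file.
fname  = [tempname '.h5'];
dsname = '/cspad';
data   = int16(randi([0 100], 4, 3, 2, 6));
h5create(fname, dsname, size(data), 'Datatype', 'int16');
h5write(fname, dsname, data);

selected = logical([1 0 1 1 0 0]);   % stands in for eventsWith42

% h5info reports the size in the same dimension order h5read uses.
info = h5info(fname, dsname);
dims = info.Dataspace.Size;
frameCount = [dims(1:3) 1];          % one event (last dimension) per read

runningSum = zeros(dims(1:3));
for k = find(selected(:))'           % row vector, so the loop visits each index
    frame = h5read(fname, dsname, [1 1 1 k], frameCount);
    runningSum = runningSum + double(frame);
end
cspadAverage = runningSum / nnz(selected);
delete(fname);
```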

 

Matlab Issue with Enums

When a Matlab user loads a dataset that is an array of enums with h5read, Matlab returns a cell array of strings, so the user can work with the enum names. However, Matlab (as of version 2013b) fails to do this for our datasets, where the enum is a field within a compound type. We have made MathWorks aware of this problem and they understand the issue. Users for whom enum-to-string translation is important are welcome to contact MathWorks and reference our service request 1-O0DQBH, "enum field in compound type of hdf5".

In the meantime, MathWorks has provided some workaround code that we have reworked into a function users can call. You can download the file here:

translate_enums.m

It will also soon be available from the src directory of the h5tools package in the analysis release, /reg/g/psdm/sw/releases/ana-current/h5tools/src (as of 10/3/2013 it is not yet part of ana-current). Once you have the file, put it in a directory on your Matlab search path and call translate_enums to have the enum fields translated into string fields, as follows:

filename='somefile.h5';
datasetname='/path/to/dataset';
ds = h5read(filename,datasetname);
ds = translate_enums(ds, filename, datasetname);

If the dataset's base type is a compound type, translate_enums replaces each enum field with the corresponding strings. For example, suppose one had

  ds
     field1  [3x1 int16]
     field2  [3x1 single]

where field1 holds values of the enum ONE=1, TWO=2, THREE=3, namely [2,1,3]. Then the output of translate_enums will be

ds
  field1  {3x1 cell array}
  field2  [3x1 single]

where field1 is now

{ 'TWO'; 'ONE'; 'THREE' }
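If translate_enums is not at hand, the core of the translation is a lookup from enum values to names. The sketch below shows that lookup with ismember for the ONE/TWO/THREE example. It is not the translate_enums implementation: the enum names and values are hard-coded here (a real version would read them from the dataset's datatype), and the field2 values are hypothetical.

```matlab
% Hard-coded enum definition and raw values from the example above.
enumNames  = {'ONE', 'TWO', 'THREE'};
enumValues = [1 2 3];
ds.field1 = int16([2; 1; 3]);
ds.field2 = single([0.5; 1.5; 2.5]);   % hypothetical values

% For each raw value, ismember gives its position in enumValues (0 if absent).
[isKnown, loc] = ismember(double(ds.field1), enumValues);
assert(all(isKnown), 'field1 contains values outside the enum');

% Replace the integer field with a 3x1 cell array of names.
ds.field1 = reshape(enumNames(loc), [], 1);
```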

Performance

One challenge for Matlab users is the size of the datasets in the LCLS files. If a dataset is too large to fit in memory, you will have to read a subset of the rows at a time (h5read accepts optional start, count, and stride arguments for this).
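A sketch of block-wise reading: h5read(filename, dataset, start, count) reads count elements beginning at the 1-based start position, and an optional fifth stride argument skips elements. The example writes a synthetic [1000 1] double dataset to a temporary file (the name, size, and block size are made up) and sums it 256 rows at a time, so only one block is in memory at once. For a real LCLS dataset you would take the number of rows from h5info; for a compound dataset each read should return a struct whose fields hold just the rows in that block.

```matlab
% Synthetic dataset (hypothetical name and size) in a temp file.
fname  = [tempname '.h5'];
dsname = '/values';
nRows  = 1000;
h5create(fname, dsname, [nRows 1]);          % default Datatype is double
h5write(fname, dsname, (1:nRows)');

blockSize = 256;                             % rows per read; tune to available memory
total = 0;
for first = 1:blockSize:nRows
    n = min(blockSize, nRows - first + 1);   % the last block may be shorter
    block = h5read(fname, dsname, [first 1], [n 1]);
    total = total + sum(block);
end
delete(fname);
```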
