Objective
Currently LCLS does not offer a uniform approach to the analysis of accumulated experimental data. Users rely on myana, pyana, MatLab, IDL, CASS, and probably other tools. Work on the long-awaited psana project is in progress. psana is going to be quite generic and probably not so simple to use. On this page we discuss a simple but very flexible approach to the analysis of data stored in HDF5 files. It is based on Python code that makes extensive use of standard libraries. A few examples of how to access and process data are presented at the end of this page.
There are obvious advantages to this approach:
- It is very flexible: an HDF5 file has an indexed structure, which means your code has direct access to any event data in any file.
- Python is a high-level scripting language that allows you to write transparent and compact code based on mature standard libraries.
- In general, Python code runs slowly compared to C++, but there are libraries such as NumPy, written in C, which solve this problem for operations on large arrays.
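The last point can be illustrated with a minimal sketch. The array `frame` below is a hypothetical stand-in for one camera image, not real LCLS data, and the code assumes Python 3 syntax. It computes the same sum twice: once with an interpreted loop over every pixel, and once with a single NumPy call that runs in compiled code.

```python
import numpy as np

# Hypothetical stand-in for one camera frame; real data would come from an HDF5 file.
frame = np.arange(12, dtype=np.float64).reshape(3, 4)

# Pure-Python approach: every pixel is visited in an interpreted loop.
total_loop = 0.0
for row in frame:
    for pixel in row:
        total_loop += pixel

# NumPy approach: the same sum is computed inside compiled code.
total_numpy = frame.sum()

print('loop sum  =', total_loop)
print('numpy sum =', total_numpy)
```

For a frame this small the difference is negligible; the gain is expected to show up for large detector images, where the per-pixel overhead of the interpreted loop dominates.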
There are a couple of drawbacks to this approach:
- you have to know or learn Python;
- the current version of the h5py library works quite slowly with long HDF5 files.
The first issue, about Python, is not really a drawback. The basic concepts of this high-level language can be learned from scratch in a couple of days. Within a week you will feel like an expert and will enjoy programming in this powerful language. The second issue, the slow h5py library, is really annoying, but we hope that the authors will take our comments into account and that its performance will improve soon.
Below we assume that everything is set up to work on the LCLS analysis farm; otherwise see Computing and Account Setup.
Libraries
Here is a list of libraries which we are going to use in our examples:
- h5py
- NumPy
- SciPy
- matplotlib
These libraries can be imported near the top of the Python file, for example:

    #!/usr/bin/env python
    import h5py
    import numpy as np
    import scipy as sp
    import scipy.ndimage as spi
    import matplotlib.pyplot as plt

Basic operations
Let us consider the basic operations which you have to code in order to access HDF5 data.
- Open file, get dataset, get array for current event, and close file:

    file = h5py.File(hdf5_file_name, 'r')   # Open hdf5 file in read-only mode
    dataset = file[dataset_name]
    arr1ev = dataset[event_number]
    file.close()

where we assume that all necessary parameters were defined earlier, for example:

    hdf5_file_name = '/reg/d/psdm/XPP/xppcom10/hdf5/xppcom10-r0546.h5'
    dataset_name = '/Configure:0000/Run:0000/CalibCycle:0000/Camera::FrameV1/XppSb4Pim.1:Tm6740.1/image'
    event_number = 5

The arr1ev object is returned as a NumPy array. There are many methods which allow you to manipulate this object. For example, one can
- print array shape and content:

    print('arr1ev.shape =', arr1ev.shape)
    print('arr1ev =\n', arr1ev)

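A few more manipulations of this kind are sketched below. To keep the sketch runnable without access to the data files, `arr1ev` is replaced here by a small hypothetical array; the region of interest and the threshold value are likewise arbitrary choices for illustration. Python 3 syntax is assumed.

```python
import numpy as np

# Hypothetical stand-in for the image array of one event.
arr1ev = np.array([[1, 2, 3],
                   [4, 5, 6]], dtype=np.uint16)

# Basic properties and statistics of the array.
print('shape    =', arr1ev.shape)
print('dtype    =', arr1ev.dtype)
print('min, max =', arr1ev.min(), arr1ev.max())
print('mean     =', arr1ev.mean())

# Slicing selects a region of interest: rows 0..1, columns 1..2.
roi = arr1ev[0:2, 1:3]
print('roi =\n', roi)

# A boolean mask counts the pixels above an (arbitrary) threshold.
threshold = 3
n_above = int((arr1ev > threshold).sum())
print('pixels above threshold =', n_above)
```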
Advanced operations
- Get item attributes
- Get group name and the list of daughters
- Check if the HDF5 item is "File", "Group", or "Data"
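The three operations listed above could be sketched as follows. This is only an illustration, not the final Example 2: it first writes a small temporary HDF5 file so that it can run anywhere, and the file, group, dataset, and attribute names, as well as the helper `item_kind`, are hypothetical. It relies on the h5py `attrs`, `name`, and `keys()` interfaces and on `isinstance` checks against `h5py.File`, `h5py.Group`, and `h5py.Dataset`. Python 3 syntax is assumed.

```python
import os
import tempfile

import h5py
import numpy as np

# Build a small throwaway file so the sketch is runnable anywhere.
# All group, dataset, and attribute names here are hypothetical.
tmp_dir = tempfile.mkdtemp()
file_name = os.path.join(tmp_dir, 'example.h5')
with h5py.File(file_name, 'w') as f:
    grp = f.create_group('Run:0000')
    grp.attrs['comment'] = 'test run'
    grp.create_dataset('image', data=np.zeros((4, 3)))


def item_kind(item):
    """Return 'File', 'Group', or 'Data' for an h5py object."""
    # h5py.File is a subclass of h5py.Group, so File must be tested first.
    if isinstance(item, h5py.File):
        return 'File'
    if isinstance(item, h5py.Group):
        return 'Group'
    if isinstance(item, h5py.Dataset):
        return 'Data'
    return 'Unknown'


with h5py.File(file_name, 'r') as f:
    group = f['Run:0000']

    # Get item attributes as an ordinary dictionary.
    attributes = dict(group.attrs)

    # Get the group name and the list of its daughters.
    group_name = group.name
    daughters = list(group.keys())

    # Check the kind of each item.
    kinds = [item_kind(f), item_kind(group), item_kind(group['image'])]

print('attributes =', attributes)
print('group name =', group_name)
print('daughters  =', daughters)
print('kinds      =', kinds)
```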
Code examples
- Example 1, basic operations:

    #!/usr/bin/env python
    import h5py
    import numpy as np

    eventNumber = 5

    # Open the HDF5 file in read-only mode
    file = h5py.File('/reg/d/psdm/XPP/xppcom10/hdf5/xppcom10-r0546.h5', 'r')
    dataset = file['/Configure:0000/Run:0000/CalibCycle:0000/Camera::FrameV1/XppSb4Pim.1:Tm6740.1/image']

    # Read the array for one event into memory, then close the file
    arr1ev = dataset[eventNumber]
    file.close()

    print('arr1ev.shape =', arr1ev.shape)
    print('arr1ev =\n', arr1ev)

- Example 2, advanced operations:
Needs to be added