
Objective

Currently LCLS does not offer a uniform approach to analyzing the data accumulated in experiments. Users rely on myana, pyana, MATLAB, IDL, CASS, and probably other tools as well. Work on the long-awaited psana project is in progress, but it is going to be quite generic and probably not so simple for new users. On this page we discuss a simple but fully flexible approach to analyzing data stored in HDF5 files. It is based on Python code that makes extensive use of standard libraries. A few examples of how to access and process the data are presented at the end of this page.

This approach has several clear advantages:

  • It is fully flexible: an HDF5 file has an indexed structure, so your code can directly access the data of any event in any file.
  • Python is a high-level scripting language that lets you write clear, compact code on top of well-developed standard libraries.
  • Pure Python code is generally slow compared to C++, but libraries such as NumPy, whose core is written in C, solve this problem for operations on large arrays.
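To illustrate the last point, here is a minimal sketch (Python 3, NumPy 1.17 or later) that sums the pixels of a synthetic 480x640 image once with explicit Python loops and once with a single vectorized NumPy call. The image size and random pixel values are assumptions, not real LCLS data. Both give the same total; the NumPy version runs in compiled code and is typically much faster, although the exact timings depend on the machine.

```python
import time

import numpy as np

# Synthetic "camera frame": 480 x 640 pixels of random 12-bit counts (illustrative only)
rng = np.random.default_rng(seed=0)
image = rng.integers(0, 4096, size=(480, 640), dtype=np.uint16)


def sum_with_loop(arr):
    """Sum all pixels with explicit Python loops (slow)."""
    total = 0
    for row in arr:
        for value in row:
            total += int(value)
    return total


def sum_with_numpy(arr):
    """Sum all pixels with one vectorized NumPy call (fast)."""
    # Accumulate in 64-bit integers so the uint16 input cannot overflow
    return int(arr.sum(dtype=np.int64))


t0 = time.perf_counter()
loop_total = sum_with_loop(image)
t1 = time.perf_counter()
numpy_total = sum_with_numpy(image)
t2 = time.perf_counter()

print('loop :', loop_total, 'in %.4f s' % (t1 - t0))
print('numpy:', numpy_total, 'in %.4f s' % (t2 - t1))
```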

There are a couple of drawbacks:

  • you have to know or learn Python;
  • the current version of the h5py library is quite slow with long HDF5 files.

The first issue is not really a drawback. The basic concepts of Python can be learned from scratch in a couple of days, and within a week you will feel like an expert and enjoy programming in this powerful language. The second issue, the slow h5py library, is really annoying, but we hope that its authors will take our comments into account and that its performance will improve soon.

Below we assume that everything is set up to work on the LCLS analysis farm; otherwise, see Computing and Account Setup.

Libraries

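Here is a list of the libraries we are going to use in our examples:

  • h5py: reading and writing HDF5 files
  • NumPy: N-dimensional arrays and fast numerical operations on them
  • SciPy: scientific algorithms (here scipy.ndimage, for image processing)
  • Matplotlib: plotting (matplotlib.pyplot)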

These libraries can be imported near the top of the Python file, for example:

#!/usr/bin/env python
import h5py
import numpy as np
import scipy as sp
import scipy.ndimage as spi
import matplotlib.pyplot as plt

Basic operations

Let us consider the basic operations you need to code in order to access HDF5 data.

  • Open the file, get the dataset, get the array for the current event, and close the file:
    with h5py.File(hdf5_file_name, 'r') as f:   # open the HDF5 file in read-only mode
        dataset = f[dataset_name]
        arr1ev  = dataset[event_number]          # NumPy array for one event
    # the file is closed automatically at the end of the with-block

where we assume that all necessary parameters were defined earlier, for example

    hdf5_file_name = '/reg/d/psdm/XPP/xppcom10/hdf5/xppcom10-r0546.h5'
    dataset_name   = '/Configure:0000/Run:0000/CalibCycle:0000/Camera::FrameV1/XppSb4Pim.1:Tm6740.1/image'
    event_number   = 5
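Because the path above exists only on the LCLS analysis farm, here is a self-contained sketch (Python 3) that repeats the same steps on a small temporary file. Only the dataset path is copied from above; the file name, the shape of 10 events of 4x6 pixels and the pixel values are invented. It also shows that a slice such as dataset[2:5] selects a block of events without reading the rest of the file.

```python
import os
import tempfile

import h5py
import numpy as np

# Stand-in for the real run file: a temporary HDF5 file with the same dataset path.
# The shape (10 events of 4 x 6 pixels) and the values are invented for illustration.
dataset_name = '/Configure:0000/Run:0000/CalibCycle:0000/Camera::FrameV1/XppSb4Pim.1:Tm6740.1/image'
hdf5_file_name = os.path.join(tempfile.mkdtemp(), 'fake-run.h5')

with h5py.File(hdf5_file_name, 'w') as f:
    frames = np.arange(10 * 4 * 6, dtype=np.uint16).reshape(10, 4, 6)
    f.create_dataset(dataset_name, data=frames)  # intermediate groups are created automatically

event_number = 5
with h5py.File(hdf5_file_name, 'r') as f:
    dataset = f[dataset_name]
    print('dataset.shape =', dataset.shape)  # (n_events, rows, columns)
    arr1ev = dataset[event_number]           # only this event is selected and read
    arr3ev = dataset[2:5]                    # a contiguous block of three events

print('arr1ev.shape =', arr1ev.shape)
print('arr3ev.shape =', arr3ev.shape)
```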

arr1ev is returned as a NumPy array (numpy.ndarray), which provides many methods for manipulating its contents. For example, one can

  • print the array shape and content:
    print('arr1ev.shape =', arr1ev.shape)
    print('arr1ev =\n', arr1ev)
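A few other common manipulations are sketched below on a small synthetic image standing in for arr1ev. The 6x8 shape, pixel values, pedestal and threshold are made up for illustration; all calls are standard NumPy.

```python
import numpy as np

# Synthetic 2-D event image standing in for arr1ev (values are made up)
arr1ev = np.zeros((6, 8), dtype=np.uint16)
arr1ev[2, 5] = 900        # a single bright "peak"
arr1ev[3:5, 1:3] = 100    # a dim 2x2 patch

# Work in floating point so that subtracting a pedestal cannot wrap around
img = arr1ev.astype(np.float64)
print('dtype, min, max, mean:', arr1ev.dtype, img.min(), img.max(), img.mean())

# Pixel coordinates (row, column) of the maximum
peak = np.unravel_index(np.argmax(img), img.shape)
print('peak at', peak)

# Region of interest: rows 2..4, columns 0..3 (basic slicing returns a view, not a copy)
roi = img[2:5, 0:4]
print('ROI sum =', roi.sum())

# Subtract a constant pedestal, then zero out pixels below a threshold
pedestal = 10.0
threshold = 50.0
corrected = img - pedestal
corrected[corrected < threshold] = 0.0
print('pixels above threshold:', np.count_nonzero(corrected))
```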

Advanced operations

  • Get item attributes
  • Get a group name and the list of its daughters
  • Check whether an HDF5 item is a "File", "Group", or "Data" (dataset)
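The three operations listed above can be sketched with h5py as follows (Python 3). The file is a small temporary stand-in, and the group path, attribute names and values are hypothetical. Note that h5py.File is a subclass of h5py.Group, so the File check has to come before the Group check.

```python
import os
import tempfile

import h5py
import numpy as np

# Build a tiny stand-in file; group names, attribute names and values are hypothetical
path = os.path.join(tempfile.mkdtemp(), 'demo.h5')
with h5py.File(path, 'w') as f:
    run = f.require_group('Configure:0000').create_group('Run:0000')
    run.attrs['comment'] = 'demo run'
    image = run.create_dataset('image', data=np.zeros((3, 4, 4)))
    image.attrs['units'] = 'ADU'


def item_kind(item):
    """Return 'File', 'Group' or 'Data' for an h5py object."""
    # h5py.File is a subclass of h5py.Group, so the File test must come first
    if isinstance(item, h5py.File):
        return 'File'
    if isinstance(item, h5py.Group):
        return 'Group'
    if isinstance(item, h5py.Dataset):
        return 'Data'
    return 'Unknown'


with h5py.File(path, 'r') as f:
    group = f['/Configure:0000/Run:0000']
    dset = group['image']

    # 1. Item attributes: .attrs behaves like a dictionary
    attrs = dict(dset.attrs)
    print('attributes of', dset.name, ':', attrs)

    # 2. Group name and the list of daughters (its direct members)
    group_name = group.name
    daughters = list(group.keys())
    print('group name:', group_name)
    print('daughters :', daughters)

    # 3. Kind of each item
    kinds = [item_kind(obj) for obj in (f, group, dset)]
    print('kinds:', kinds)
```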

Code examples

  • Example 1, basic operations:
#!/usr/bin/env python

import h5py
import numpy as np

hdf5_file_name = '/reg/d/psdm/XPP/xppcom10/hdf5/xppcom10-r0546.h5'
dataset_name   = '/Configure:0000/Run:0000/CalibCycle:0000/Camera::FrameV1/XppSb4Pim.1:Tm6740.1/image'
event_number   = 5

# Open the HDF5 file read-only; it is closed automatically when the with-block ends
with h5py.File(hdf5_file_name, 'r') as f:
    dataset = f[dataset_name]
    arr1ev  = dataset[event_number]   # NumPy array holding a single event

print('arr1ev.shape =', arr1ev.shape)
print('arr1ev =\n', arr1ev)
  • Example 2, advanced operations:
Needs to be added
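A minimal sketch of such a script is given here, assuming only h5py and Python 3: it walks the whole file with Group.visititems and prints the kind, shape, data type and attributes of every item. The function name list_items and the script name in the usage comment are hypothetical, not part of any LCLS package.

```python
#!/usr/bin/env python
# Sketch of an "advanced operations" script: walk every item of an HDF5 file
# and report its kind, shape/dtype (for datasets) and attributes.
import sys

import h5py


def list_items(hdf5_file_name):
    """Print and return (kind, path) for every group and dataset in the file."""
    items = []

    def visitor(name, obj):
        # h5py calls this once per object below the root; name is relative to '/'
        path = '/' + name
        if isinstance(obj, h5py.Dataset):
            items.append(('Data', path))
            print('Data  %s  shape=%s dtype=%s' % (path, obj.shape, obj.dtype))
        elif isinstance(obj, h5py.Group):
            items.append(('Group', path))
            print('Group %s  daughters=%d' % (path, len(obj)))
        for key, value in obj.attrs.items():
            print('      attr %s = %s' % (key, value))
        # Returning None (implicitly) tells visititems to keep going

    with h5py.File(hdf5_file_name, 'r') as f:
        f.visititems(visitor)
    return items


if __name__ == '__main__' and len(sys.argv) > 1:
    # usage: python list_items.py /path/to/file.h5
    list_items(sys.argv[1])
```

Only metadata (names, shapes, data types, attributes) is read during the traversal, not the event data itself.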