1. About this tutorial
The main objective of this session is to introduce and explain the new Python interface for accessing LCLS data from analysis applications. The new software framework is known as "Interactive psana", or just ipsana. The idea of implementing such a tool was first suggested around 1.5 years ago at the joint PCDS/SRD meeting (look for ipsana). Though its underlying machinery is largely based on the batch version of psana, the interface to the data is much simpler and more intuitive, and it requires much less code to be written by a user in order to get to "that CSPad image" (EPICS PV, etc.). The new framework won't work for everyone, especially for users who have either invested heavily in the modular code of the batch frameworks or who need the performance of modules written in C++. Still, our intent is to demonstrate the power of the new approach and to encourage using the tool where it seems appropriate.
What is beyond the scope of this tutorial
- This isn't an explanation of how to do data analysis. Our goal is to explain basic techniques for getting to your data, not for using it!
- This is not a User's Guide or a Reference Guide for the interactive framework
Attention, pyana users!
As announced earlier, the pyana framework will be phased out at some point. There are a variety of reasons why:
- first of all, as our understanding of what kind of analysis framework would work better for our users and for ourselves as developers grew over time, we realized that we needed a tool with a better interface to the data, a better internal architecture, and one that would be much easier to maintain and extend for new data types of LCLS instruments. Hence we ended up with psana.
- for its users, the psana framework has a number of advantages, among which:
- better API to the data
- the possibility of writing (and mixing) modules in C++ and Python. Modules written in different languages will still see the same data, and they can also exchange data within the framework.
- supporting data sets in both XTC and HDF5
- an ability to read the "live" files (while they're being recorded by the DAQ system or data movers)
- an ability to read data from shared memory of the DAQ (DSS) machines
The last two features open up an interesting possibility: using psana for real-time monitoring of data while reusing the same code that might be developed for traditional OFFLINE processing/analysis.
2. Things to know before running the examples
The location of examples
We put all examples for today's session in the following directory:
/reg/g/psdm/tutorials
Data files used in the examples
In order to make our examples as close to the "real" analysis environment as possible we chose to create 6 pseudo experiments (one per instrument):
- AMO/amotut13
- SXR/sxrtut13
- XPP/xpptut13
- XCS/xcstut13
- CXI/cxitut13
- MEC/mectut13
Each experiment's directory has the standard structure:
ls -al /reg/d/psdm/XPP/xpptut13/
drwxr-sr-x   8 psdatmgr ps-data 4096 Jun 4 12:02 .
drwxrwsr-x  30 psdatmgr ps-data 4096 Jun 4 12:02 ..
drwxrwsr-x+  3 psdatmgr ps-data 4096 Jun 5 10:53 calib
drwxrwsr-x+  2 psdatmgr ps-data 4096 Jun 4 12:02 ftc
drwxr-sr-x   2 psdatmgr ps-data 4096 Jun 6 16:41 hdf5
drwxrwsr-x+  2 psdatmgr ps-data 4096 Jun 4 12:02 res
drwxrwsr-x+  2 psdatmgr ps-data 4096 Jun 4 12:03 scratch
drwxr-sr-x   2 psdatmgr ps-data 4096 Jun 6 16:33 xtc

ls -al /reg/d/psdm/XPP/xpptut13/hdf5/
drwxr-sr-x 2 psdatmgr ps-data       4096 Jun 6 16:41 .
drwxr-sr-x 8 psdatmgr ps-data       4096 Jun 4 12:02 ..
-r--r--r-- 1 psdatmgr ps-data 1765036151 Jun 6 16:39 xpptut13-r0178.h5
-r--r--r-- 1 psdatmgr ps-data  394528165 Jun 6 16:39 xpptut13-r0179.h5
-r--r--r-- 1 psdatmgr ps-data  128185688 Jun 6 16:39 xpptut13-r0180.h5
-r--r--r-- 1 psdatmgr ps-data  912811050 Jun 6 16:39 xpptut13-r0181.h5
All data files are open for reading by anyone who can log onto PCDS computers. Moreover, some directories (such as scratch/ and ftc/) are open for writing by anyone. And yes, one can also see these experiments in the Web Portal.
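The listing above suggests a regular layout: an experiment lives under /reg/d/psdm/<INSTRUMENT>/<experiment>/, and its HDF5 files are named <experiment>-r<4-digit run>.h5. A minimal sketch of composing such paths follows. The helper names are hypothetical, and the naming pattern is inferred only from the listing shown here, so treat it as an assumption rather than a rule.

```python
import posixpath

DATA_ROOT = "/reg/d/psdm"  # root directory seen in the listing above


def experiment_dir(instrument, experiment, subdir=None):
    """Return the path of an experiment directory, or of one of its subdirectories."""
    parts = [DATA_ROOT, instrument.upper(), experiment]
    if subdir:
        parts.append(subdir)
    return posixpath.join(*parts)


def hdf5_file(instrument, experiment, run):
    """Return the assumed path of the HDF5 file of a run (pattern inferred from the listing)."""
    name = "%s-r%04d.h5" % (experiment, run)
    return posixpath.join(experiment_dir(instrument, experiment, "hdf5"), name)
```

For example, hdf5_file("XPP", "xpptut13", 178) composes the path of the first file in the listing.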
Setting up your environment
- make sure you can run X11 applications. Most examples of this tutorial will do a simple visualization.
- log onto any machine of the interactive analysis clusters psananeh or psanafeh
- make sure you source (just once) one of the following scripts (depending on which UNIX shell you are using):
. /reg/g/psdm/etc/sit_env.sh
source /reg/g/psdm/etc/sit_env.csh
- run (just once) the following command which will set up a proper OFFLINE Analysis environment for the latest analysis release:
sit_setup ana-current
At this point you should be ready to go. To test that your environment is set up correctly, try running psana without any parameters. If your environment is properly set, you should see something like this:
psana
[error:2013-06-06 20:54:44.131:PSAnaApp.cpp:218] no analysis modules specified
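Since the examples in this tutorial are Python scripts that import the psana module, a complementary check can be made from Python itself. The sketch below is only an illustration; the function name is hypothetical and it tests nothing beyond whether the import succeeds.

```python
def psana_importable():
    """Return True if `import psana` succeeds in the current environment."""
    try:
        import psana  # noqa: F401  (imported only to test availability)
    except ImportError:
        return False
    return True
```

If this returns False after the setup steps above, the environment scripts were most likely not sourced in the current shell.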
3. Basic examples
This section presents a few simple scripts which have been developed to underline the main ideas behind the framework's API. The code of the examples along with a simple HOWTO file can be found at:
/reg/g/psdm/tutorials/common/data_access_methods/
Printing identifiers of all events of a run
First try this:
./print_event_id.py
Then look at the code. It will do three things:
- import the psana module:
import psana
- open the data set. Note the syntax for the data set specification string:
dsname = "exp=sxrtut13:run=366"
ds = psana.DataSet(dsname)
- note that by default the framework will look for XTC files at the standard location where all experimental data are supposed to be. If you want to work with HDF5 instead (provided an HDF5 version of the run exists), slightly change that string by appending :h5 at the end:
dsname = "exp=sxrtut13:run=366:h5"
- the next thing the code does is iterate over all events. At each step you get a reference to an event object, evt, from which the code extracts and prints the identifier of the event:
for i, evt in enumerate(ds.events()):
    evtnum = i + 1
    id = evt.get(psana.EventId)
    print "%6d:" % evtnum, id
In the end you're supposed to see something like this:
./print_event_id.py
     1: XtcEventId(run=366, time=2013-04-21 04:37:39.343773772-07, fiducials=38877, ticks=329342, vector=19553)
     2: XtcEventId(run=366, time=2013-04-21 04:37:39.360457259-07, fiducials=38883, ticks=331442, vector=19554)
     3: XtcEventId(run=366, time=2013-04-21 04:37:39.377123777-07, fiducials=38889, ticks=330560, vector=19555)
     4: XtcEventId(run=366, time=2013-04-21 04:37:39.393797466-07, fiducials=38895, ticks=329762, vector=19556)
     5: XtcEventId(run=366, time=2013-04-21 04:37:39.410477971-07, fiducials=38901, ticks=331204, vector=19557)
     6: XtcEventId(run=366, time=2013-04-21 04:37:39.427145705-07, fiducials=38907, ticks=331036, vector=19558)
     7: XtcEventId(run=366, time=2013-04-21 04:37:39.443816588-07, fiducials=38913, ticks=329370, vector=19559)
     8: XtcEventId(run=366, time=2013-04-21 04:37:39.460499778-07, fiducials=38919, ticks=331414, vector=19560)
     9: XtcEventId(run=366, time=2013-04-21 04:37:39.477167658-07, fiducials=38925, ticks=330616, vector=19561)
    10: XtcEventId(run=366, time=2013-04-21 04:37:39.493840079-07, fiducials=38931, ticks=329720, vector=19562)
...
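The script prints every event of the run, which can take a while when you only want a quick look. Below is a minimal sketch of two small conveniences: composing the data set specification string from its parts, and stopping after the first few events. Both helper names are hypothetical. The sketch assumes that ds.events() behaves like an ordinary Python iterable, as the loop in the script suggests, and it does not import psana so that it can be read and run anywhere.

```python
from itertools import islice


def make_dsname(experiment, run, hdf5=False):
    """Build a data set specification string such as 'exp=sxrtut13:run=366'."""
    dsname = "exp=%s:run=%d" % (experiment, run)
    if hdf5:
        dsname += ":h5"  # ask for the HDF5 version of the run
    return dsname


def first_events(events, limit):
    """Return an iterator of (event number, event) pairs, numbered from 1,
    covering at most `limit` events."""
    return enumerate(islice(events, limit), 1)


# Intended use on a machine where psana is set up:
#   ds = psana.DataSet(make_dsname("sxrtut13", 366))
#   for evtnum, evt in first_events(ds.events(), 10):
#       print("%6d: %s" % (evtnum, evt.get(psana.EventId)))
```

Using islice keeps the loop body unchanged and avoids a manual counter with a break statement.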
Printing a catalog of event components
The sample can be run like this:
./print_event_keys.py
Components of the first event found in the dataset:
  EventKey(type=psana.EvrData.DataV3, src='DetInfo(NoDetector.0:Evr.0)')
  EventKey(type=psana.Acqiris.DataDescV1, src='DetInfo(SxrEndstation.0:Acqiris.0)')
  EventKey(type=psana.Acqiris.DataDescV1, src='DetInfo(SxrEndstation.0:Acqiris.1)')
  EventKey(type=psana.Bld.BldDataEBeamV3, src='BldInfo(EBeam)')
  EventKey(type=psana.Bld.BldDataPhaseCavity, src='BldInfo(PhaseCavity)')
  EventKey(type=psana.Bld.BldDataFEEGasDetEnergy, src='BldInfo(FEEGasDetEnergy)')
  EventKey(type=psana.Bld.BldDataGMDV1, src='BldInfo(GMD)')
  EventKey(type=psana.EventId)
  EventKey(type=None)
Why is it so important to know this information? Because these parameters will tell you:
- what's inside the event
- and how to extract the corresponding data objects associated with these keys
The output shown above translates into the following getters (similar to the one used in the very first example to extract event identifiers):
obj = evt.get(psana.EvrData.DataV3, psana.Source('DetInfo(NoDetector.0:Evr.0)'))
obj = evt.get(psana.Acqiris.DataDescV1, psana.Source('DetInfo(SxrEndstation.0:Acqiris.0)'))
obj = evt.get(psana.Acqiris.DataDescV1, psana.Source('DetInfo(SxrEndstation.0:Acqiris.1)'))
obj = evt.get(psana.Bld.BldDataEBeamV3, psana.Source('BldInfo(EBeam)'))
obj = evt.get(psana.Bld.BldDataPhaseCavity, psana.Source('BldInfo(PhaseCavity)'))
obj = evt.get(psana.Bld.BldDataFEEGasDetEnergy, psana.Source('BldInfo(FEEGasDetEnergy)'))
obj = evt.get(psana.Bld.BldDataGMDV1, psana.Source('BldInfo(GMD)'))
obj = evt.get(psana.EventId)
Note that event components obtained through this API will be objects of various classes. A full catalog of those objects can be found in the DOXYGEN documentation which is auto-generated from the code of the OFFLINE releases.
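The translation from printed keys to getters is mechanical: the type goes in as the first argument, and the src string, when present, is wrapped in psana.Source. The sketch below spells that rule out as plain text processing on the printed lines. The function name and the regular expression are illustrative assumptions based only on the output format shown in this section; it produces the text of the call and does not invoke psana.

```python
import re

# Matches printed keys such as:
#   EventKey(type=psana.EvrData.DataV3, src='DetInfo(NoDetector.0:Evr.0)')
#   EventKey(type=psana.EventId)
KEY_RE = re.compile(r"^EventKey\(type=(?P<type>[\w.]+)(?:, src='(?P<src>[^']*)')?\)$")


def getter_for(key_line):
    """Return the text of the evt.get(...) call for one printed key.

    Returns None for lines that are not keys and for keys without a type
    (printed as type=None), which have no getter in the list above.
    """
    match = KEY_RE.match(key_line.strip())
    if match is None or match.group("type") == "None":
        return None
    if match.group("src") is None:
        return "evt.get(%s)" % match.group("type")
    return "evt.get(%s, psana.Source('%s'))" % (match.group("type"), match.group("src"))
```

Feeding it the lines printed by print_event_keys.py reproduces the getter list above, one call per key.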
Iterating over scans and events
Some of our experiments (in particular XPP) rely heavily on so-called scans (also known as "Calibration Transitions") while taking their data. Each DAQ run has one or many scans. Events are recorded within the scope of a scan. The new framework has a special provision for scans through the iterator of scans. The idea behind the following example is:
- open a data set which has multiple scans in each run
- iterate over scans
- iterate over events in each scan
This simple application knows about scan boundaries. Moreover, this example illustrates how to open a data set composed of many runs (processing a series of runs at once). There are two examples in this set:
./scans_in_runs_xtc.py
./scans_in_runs_hdf.py
They both do the same thing. The only subtle difference is which data format they're using. The first example reads XTC files, while the second one reads HDF5 files. When running these examples you should notice differences in their performance. These are explained by the different organization of data in the XTC and HDF5 formats. We'll be happy to provide an explanation if you are interested.
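The nested iteration described in this section (runs, then scans, then events) can be sketched as follows. The method names runs(), steps() and events() are assumptions made for this illustration, since the text does not show them; check the two example scripts for the calls actually used. The small stand-in classes exist only so that the sketch runs without psana.

```python
def count_events_per_scan(ds):
    """Return a list of (run number, scan number, event count), numbered from 1."""
    counts = []
    for run_num, run in enumerate(ds.runs(), 1):
        for scan_num, scan in enumerate(run.steps(), 1):
            # Events are consumed one at a time; only the count is kept.
            n_events = sum(1 for _ in scan.events())
            counts.append((run_num, scan_num, n_events))
    return counts


# Stand-ins that mimic the assumed iterator methods (not part of psana).
class FakeScan(object):
    def __init__(self, n_events):
        self._n_events = n_events

    def events(self):
        return iter(range(self._n_events))


class FakeRun(object):
    def __init__(self, scan_sizes):
        self._scans = [FakeScan(n) for n in scan_sizes]

    def steps(self):
        return iter(self._scans)


class FakeDataSet(object):
    def __init__(self, runs):
        self._runs = [FakeRun(sizes) for sizes in runs]

    def runs(self):
        return iter(self._runs)
```

Because the counting function only relies on the three iterator methods, the same loop structure would apply to a real data set object if it exposes them under these names.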
4. Instrument-specific examples
This section includes a number of examples which are relevant for different instruments. Their primary meaning is to illustrate how to access data objects which are
XCS: movie
The code for the examples is found at:
/reg/g/psdm/tutorials/xcs/princeton_movie/
SXR: correlation plots for signals from GMD and Diode
The code for the examples is found at:
/reg/g/psdm/tutorials/sxr/gmd_vs_diode/
CXI: diffraction patterns on the CSPad detector
The code for the examples is found at:
/reg/g/psdm/tutorials/cxi/cspad_imaging/
5. Doing something less trivial
Custom HDF5 translator written by a user
A problem:
- Let's suppose we need to write a data extraction tool to extract images (CSPad, Princeton, etc.) from XTC files and make them available for further analysis in Matlab. At this point we should already know how to get images from the raw files using ipsana. The only remaining problem is to store them in a form that is readable from Matlab with reasonable performance.
Perhaps the best way to solve the problem is to store those images in an HDF5 file using some library, and this is what this example offers. It uses the PyTables package to dump numpy arrays into an output file. This package is known for its simple API, which doesn't require the user to learn the lower-level h5py library.
The code for the example is found at:
/reg/g/psdm/tutorials/common/hdf5_translator/
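To give a feel for the approach before opening the tutorial code, here is a minimal sketch of writing a sequence of images to an HDF5 file with PyTables. It is not the tutorial's translator: the function names, the group name "images" and the node naming scheme are hypothetical, and the calls use the PyTables 3.x names (open_file, create_group, create_array). The images are assumed to be numpy arrays already extracted from the events.

```python
def image_node_name(evtnum):
    """Return the name of the HDF5 node holding the image of event `evtnum`."""
    return "image_%06d" % evtnum


def save_images(path, images):
    """Write each array from `images` as a separate HDF5 array node.

    Returns the number of images written.
    """
    import tables  # PyTables; imported here so image_node_name works without it

    count = 0
    with tables.open_file(path, mode="w", title="Extracted images") as h5:
        group = h5.create_group("/", "images", "One array per event")
        for evtnum, image in enumerate(images, 1):
            h5.create_array(group, image_node_name(evtnum), image)
            count += 1
    return count
```

Storing one array per event keeps the sketch short. For long runs of equally sized images, a single extendable array may be a better fit, at the cost of a little more setup.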