1. About this Tutorial

The main objective of this session is to introduce and explain the new Python interface for accessing LCLS data. This software framework is known as "Interactive psana", or ipsana for short. The idea of implementing such a tool was first suggested around 1.5 years ago at the joint PCDS/SRD meeting (look for ipsana). Though its underlying machinery is largely based on the batch version of psana, the interface to the data is simpler and more intuitive, and it requires less user code to get to that CSPad image or that EPICS PV. The new framework won't work for everyone, specifically not for users who have invested heavily in the modular code of the batch framework or who need the performance of modules written in C++. Still, our intent is to demonstrate the power of the new approach and to encourage using the tool where appropriate.

Scope of this Tutorial

  • This tutorial doesn't provide any real data analysis; it only explains basic techniques for accessing your data
  • This is not a Reference Manual for the interactive framework

(warning) Users of pyana and myana, attention: The pyana and myana frameworks will be phased out at some point. As our understanding of what kind of analysis framework works better for the users and for ourselves as developers evolved, we realized that we needed a tool with an easier interface to the data and a better internal architecture, one that is easier to maintain and to extend for new data types. Hence we developed the psana framework, which has a number of advantages:

  1. Better API to the data.
  2. Possibility of writing (mixing) modules in C++ and Python. Modules written in different languages will still see the same data, and they can also exchange data within the framework.
  3. Support for both XTC and HDF5 files.
  4. Ability to read the "live" files while they're being recorded by the DAQ system or data movers.
  5. Ability to read data from shared memory on the monitoring machines.

The last two features open up the possibility of using psana for real-time monitoring of data while reusing the same code that was developed for traditional OFFLINE processing and analysis.

2. Getting Started

Example Location

All examples can be found under the following directory:

/reg/g/psdm/tutorials

Data Files

In order to make our examples as close to the "real" analysis environment as possible we chose to create 6 pseudo experiments (one per instrument):

  • AMO/amotut13
  • SXR/sxrtut13
  • XPP/xpptut13
  • XCS/xcstut13
  • CXI/cxitut13
  • MEC/mectut13

Each experiment's directory has the standard structure:

ls -al /reg/d/psdm/XPP/xpptut13/
drwxr-sr-x   8 psdatmgr ps-data 4096 Jun  4 12:02 .
drwxrwsr-x  30 psdatmgr ps-data 4096 Jun  4 12:02 ..
drwxrwsr-x+  3 psdatmgr ps-data 4096 Jun  5 10:53 calib
drwxrwsr-x+  2 psdatmgr ps-data 4096 Jun  4 12:02 ftc
drwxr-sr-x   2 psdatmgr ps-data 4096 Jun  6 16:41 hdf5
drwxrwsr-x+  2 psdatmgr ps-data 4096 Jun  4 12:02 res
drwxrwsr-x+  2 psdatmgr ps-data 4096 Jun  4 12:03 scratch
drwxr-sr-x   2 psdatmgr ps-data 4096 Jun  6 16:33 xtc


ls -al /reg/d/psdm/XPP/xpptut13/hdf5/
drwxr-sr-x 2 psdatmgr ps-data       4096 Jun  6 16:41 .
drwxr-sr-x 8 psdatmgr ps-data       4096 Jun  4 12:02 ..
-r--r--r-- 1 psdatmgr ps-data 1765036151 Jun  6 16:39 xpptut13-r0178.h5
-r--r--r-- 1 psdatmgr ps-data  394528165 Jun  6 16:39 xpptut13-r0179.h5
-r--r--r-- 1 psdatmgr ps-data  128185688 Jun  6 16:39 xpptut13-r0180.h5
-r--r--r-- 1 psdatmgr ps-data  912811050 Jun  6 16:39 xpptut13-r0181.h5

All data files are open for reading by anyone who can log onto PCDS computers. Moreover, directories such as scratch/ and ftc/ are open for writing by anyone. These experiments are also visible in the Web Portal.

Setting up the Environment

  • Make sure you can run X11 applications. Most examples in this tutorial do some simple visualization. You can pass the -X or -Y option to ssh to forward the display to your local machine, e.g.:
    ssh -Y psexport
    
  • Log onto any machine of the interactive pools psananeh or psanafeh, e.g.:
    ssh psananeh
    
  • Make sure you source (just once) one of the following scripts. When using the bash shell:
    . /reg/g/psdm/etc/sit_env.sh
    
    Or, when using the csh shell family:
    source /reg/g/psdm/etc/sit_env.csh
    
    Note that the default shell for most LCLS users is bash.
  • Run (just once) the following command which will set up a proper analysis environment for the latest analysis release:
    sit_setup ana-current
    

At this point you are ready to go. To test that your environment is set up correctly, try running psana without any parameters. If the environment is properly set, you should see something like this:

psana
[error:2013-06-06 20:54:44.131:PSAnaApp.cpp:218] no analysis modules specified

3. Basic Examples

This section presents a few simple scripts which have been developed to underline the main ideas behind the framework's API. The code of the examples along with a simple HOWTO file can be found at:

/reg/g/psdm/tutorials/common/data_access_methods/

Printing Identifiers for all Events in a Run

First try this:

./print_event_id.py

Then look at the code. It will do three things:

  • Import the psana module:
    import psana
    
  • Open the data set. Note the syntax for the data set specification string:
    dsname = "exp=sxrtut13:run=366"
    ds = psana.DataSet(dsname)
    
  • Note that by default the framework will look for XTC files at the standard location where all experimental data are supposed to be. If you want to work with HDF5 (provided an HDF5 version of the run exists), append :h5 to the end of that string:
    dsname = "exp=sxrtut13:run=366:h5"
    
  • Next, the code iterates over all events. At each step you get a reference to an event object evt, from which the code extracts and prints the event identifier:
    for evtnum, evt in enumerate(ds.events()):
        id = evt.get(psana.EventId)
        print "%6d:" % evtnum, id
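
The data set specification string is just a colon-separated list of fields, so it can be assembled programmatically. The following is a minimal sketch; the helper name dataset_spec and its parameters are hypothetical and not part of psana, and only the exp, run and h5 fields shown above are used:

```python
def dataset_spec(experiment, run, hdf5=False):
    """Build a data set specification string such as 'exp=sxrtut13:run=366'.

    Hypothetical helper for illustration only; it is not part of psana.
    """
    spec = "exp=%s:run=%d" % (experiment, run)
    if hdf5:
        # Appending ':h5' selects the HDF5 version of the run, if one exists.
        spec += ":h5"
    return spec

# Usage on a machine where the analysis environment has been set up:
#   import psana
#   ds = psana.DataSet(dataset_spec("sxrtut13", 366))
```

Keeping the string construction in one place makes it easy to switch a script between the XTC and HDF5 versions of the same run.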
    

You should see output like this:

./print_event_id.py
     1: XtcEventId(run=366, time=2013-04-21 04:37:39.343773772-07, fiducials=38877, ticks=329342, vector=19553)
     2: XtcEventId(run=366, time=2013-04-21 04:37:39.360457259-07, fiducials=38883, ticks=331442, vector=19554)
     3: XtcEventId(run=366, time=2013-04-21 04:37:39.377123777-07, fiducials=38889, ticks=330560, vector=19555)
     4: XtcEventId(run=366, time=2013-04-21 04:37:39.393797466-07, fiducials=38895, ticks=329762, vector=19556)
     5: XtcEventId(run=366, time=2013-04-21 04:37:39.410477971-07, fiducials=38901, ticks=331204, vector=19557)
     6: XtcEventId(run=366, time=2013-04-21 04:37:39.427145705-07, fiducials=38907, ticks=331036, vector=19558)
     7: XtcEventId(run=366, time=2013-04-21 04:37:39.443816588-07, fiducials=38913, ticks=329370, vector=19559)
     8: XtcEventId(run=366, time=2013-04-21 04:37:39.460499778-07, fiducials=38919, ticks=331414, vector=19560)
     9: XtcEventId(run=366, time=2013-04-21 04:37:39.477167658-07, fiducials=38925, ticks=330616, vector=19561)
    10: XtcEventId(run=366, time=2013-04-21 04:37:39.493840079-07, fiducials=38931, ticks=329720, vector=19562)
    ...

Printing a Catalog of Event Components

The example can be run like this:

./print_event_keys.py

Components of the first event found in the dataset:
  EventKey(type=psana.EvrData.DataV3, src='DetInfo(NoDetector.0:Evr.0)')
  EventKey(type=psana.Acqiris.DataDescV1,         src='DetInfo(SxrEndstation.0:Acqiris.0)')
  EventKey(type=psana.Acqiris.DataDescV1,         src='DetInfo(SxrEndstation.0:Acqiris.1)')
  EventKey(type=psana.Bld.BldDataEBeamV3,         src='BldInfo(EBeam)')
  EventKey(type=psana.Bld.BldDataPhaseCavity,     src='BldInfo(PhaseCavity)')
  EventKey(type=psana.Bld.BldDataFEEGasDetEnergy, src='BldInfo(FEEGasDetEnergy)')
  EventKey(type=psana.Bld.BldDataGMDV1,           src='BldInfo(GMD)')
  EventKey(type=psana.EventId)
  EventKey(type=None)
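
A listing like the one above could be produced along the following lines. This is a sketch, not the code of print_event_keys.py: it assumes that the event object offers a keys() method returning one key object per component, and the function name first_event_keys is hypothetical. The data set is passed in as an argument so that the function itself does not depend on psana:

```python
def first_event_keys(dataset):
    """Return printable descriptions of the components of the first event.

    Assumes dataset.events() yields event objects and that each event
    has a keys() method (an assumption about the psana event API).
    """
    for evt in dataset.events():
        # Stop at the first event; later events are not inspected.
        return [str(key) for key in evt.keys()]
    return []  # the data set contained no events

# Usage on a machine where the analysis environment has been set up:
#   import psana
#   ds = psana.DataSet("exp=sxrtut13:run=366")
#   for line in first_event_keys(ds):
#       print(line)
```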

Why is this information so important? Because these parameters will tell you:

  • What's inside the event
  • and how to extract the corresponding data objects associated with these keys

The type and the source of the output shown above will allow you to access the data you need by using the following get functions:

obj = evt.get( psana.EvrData.DataV3,             psana.Source('DetInfo(NoDetector.0:Evr.0)'))
obj = evt.get( psana.Acqiris.DataDescV1,         psana.Source('DetInfo(SxrEndstation.0:Acqiris.0)'))
obj = evt.get( psana.Acqiris.DataDescV1,         psana.Source('DetInfo(SxrEndstation.0:Acqiris.1)'))
obj = evt.get( psana.Bld.BldDataEBeamV3,         psana.Source('BldInfo(EBeam)'))
obj = evt.get( psana.Bld.BldDataPhaseCavity,     psana.Source('BldInfo(PhaseCavity)'))
obj = evt.get( psana.Bld.BldDataFEEGasDetEnergy, psana.Source('BldInfo(FEEGasDetEnergy)'))
obj = evt.get( psana.Bld.BldDataGMDV1,           psana.Source('BldInfo(GMD)'))
obj = evt.get( psana.EventId)

You already encountered one of these get functions in the very first example, where you extracted the event identifier. The type field indicates what kind of data you are accessing, and the source field identifies the particular detector instance. In this example there are two Acqiris digitizers with the same data type, distinguished only by their sources. Note that event components obtained through this API are objects of various classes. A full catalog of those classes can be found in the DOXYGEN documentation, which is auto-generated from the code of the OFFLINE releases.
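
Not every event necessarily carries every component, so it can be useful to check the result of get before using it. The sketch below assumes that evt.get returns None when the requested component is absent from an event; that behaviour is an assumption to verify against your release, and the function name count_present is hypothetical:

```python
def count_present(events, data_type, source=None):
    """Count the events that actually carry the requested component.

    data_type and source are whatever you would pass to evt.get, e.g.
    psana.Bld.BldDataEBeamV3 and psana.Source('BldInfo(EBeam)').
    Assumes evt.get returns None for a missing component.
    """
    n_present = 0
    for evt in events:
        if source is None:
            obj = evt.get(data_type)          # e.g. psana.EventId needs no source
        else:
            obj = evt.get(data_type, source)
        if obj is not None:
            n_present += 1
    return n_present

# Usage on a machine where the analysis environment has been set up:
#   import psana
#   ds = psana.DataSet("exp=sxrtut13:run=366")
#   n = count_present(ds.events(), psana.Bld.BldDataGMDV1,
#                     psana.Source('BldInfo(GMD)'))
```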

Iterating over Scans and Events

Some of our experiments (XPP in particular) rely heavily on so-called scans, also known as Calibration Transitions. Each DAQ run has one or many scans, and events are recorded within the scope of a scan. The new framework makes special provision for scans through a scan iterator. The idea behind the following example is:

  • Open a data set which has multiple scans in each run
  • Iterate over scans
  • Iterate over events in each scan
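
The nested iteration described above can be sketched as follows. The iterator names runs(), steps() and events(), and the run() accessor for the run number, are assumptions about the interactive API; check scans_in_runs_xtc.py for the calls the tutorial actually uses. The function name events_per_scan is hypothetical:

```python
def events_per_scan(dataset):
    """Return a list of (run number, scan index, event count) tuples.

    Assumes dataset.runs() yields run objects, run.steps() yields the
    scans of a run, and scan.events() yields the events of one scan.
    """
    summary = []
    for run in dataset.runs():
        for scan_index, scan in enumerate(run.steps()):
            # Count events without keeping them in memory.
            n_events = sum(1 for _ in scan.events())
            summary.append((run.run(), scan_index, n_events))
    return summary

# Usage on a machine where the analysis environment has been set up:
#   import psana
#   ds = psana.DataSet("exp=xpptut13:run=178")
#   for run_number, scan_index, n_events in events_per_scan(ds):
#       print("run %d scan %d: %d events" % (run_number, scan_index, n_events))
```

Because events are reached only through their enclosing scan, per-scan quantities (such as an average over one scan step) can be accumulated inside the middle loop.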

This simple application knows about scan boundaries. Moreover, the example illustrates how to open a data set composed of many runs (processing a series of runs at once). There are two examples in this set:

./scans_in_runs_xtc.py
./scans_in_runs_hdf.py

(info) These two scripts perform the same processing; the only difference is the data format they access. The first reads XTC files, while the second reads HDF5 files. When running these examples you will notice that the HDF5 version is significantly faster, because we don't yet support indexing for XTC files.

4. Instrument Specific Examples

This section includes a number of examples which are relevant to different instruments. Their primary goal is to illustrate how to access data objects specific to each instrument.

XCS

Princeton Movie

The code for these examples is found at:

/reg/g/psdm/tutorials/xcs/princeton_movie/

SXR

Correlation Plots for Signals from GMD and Diode

The code for these examples is found at:

/reg/g/psdm/tutorials/sxr/gmd_vs_diode/

CXI

Diffraction Patterns on the CSPad Detector

The code for these examples is found at:

/reg/g/psdm/tutorials/cxi/cspad_imaging/

There are three examples in this directory showing increasingly complex processing of CSPad detector data. These examples also introduce the ability to provide a configuration file describing the parameters of the analysis. These configuration files follow the usual psana syntax.

  1. dump_2x1_elements: The first example illustrates how individual 2x1 structures can be located in an event and displayed as-is, without any processing. This example won't use any psana modules.
  2. frame_reco: The second example adds one of the standard psana modules in order to reconstruct a full CSPad frame from the corresponding 2x1 components. This example is based on the interactive psana's ability to run events through an optional chain of modules. The modules are specified and configured via an external configuration file, frame_reco.cfg.
  3. frame_reco_calib: The last example adds one more module to calibrate the reconstructed CSPad images (pedestal subtraction and gain correction). The modules are configured in an external configuration file, frame_reco_calib.cfg.

Please run these examples in the order described above:

./dump_2x1_elements.py
./frame_reco.py
./frame_reco_calib.py

5. Something Less Trivial

Custom HDF5 translator

Let's suppose the user doesn't want to use the standard HDF5 translator, but prefers to write a tool that extracts the data of one particular detector from XTC files and makes them available for further analysis in Matlab. At this point we already know how to get images from the raw files using ipsana. The only remaining problem is to store them in a form that is readable from Matlab. This example uses the PyTables package to dump numpy arrays into an HDF5 output file. This package is known for its simple API, which doesn't require a user to learn the lower-level h5py library.

The code for this example is found at:

/reg/g/psdm/tutorials/common/hdf5_translator/