Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

Table of Contents

Introduction

If you want a quick start, a 17-line example script you can run is Here.

We also highly recommend learning by running the small examples Here.

This document offers an overview of the psana analysis framework for user scripts written in the Python programming language. It allows its users to benefit from a rich set of services of the core framework while retaining full control over an iteration in data sets (runs, files, etc.). This combination makes it possible for the interactive exploration and (if needed) visualization of the experimental data. Note that by the latter we always mean data files in the XTC or HDF5 formats produced by the LCLS DAQ or Data Management system. We also suggest visiting the Copy Glossary of ManualTerms which is found in the end of the document.

...

A reader of the document isn't required to be fully familiar with the batch framework. Those areas where such knowledge would be needed are expected to be covered by the document. Though, we still encourage our users to spend some time to get an overview of the Data Analysis Tools we provide at PCDS. That's because many problems in doing the data analysis can be solved by the batch version of psana in a more efficient and natural way. These two flavors of the framework are not meant to compete with each other, they are designed to complement each other to cover a broader spectrum of analysis scenarios.

Setting up the analysis environment

There are three steps which need to be performed in order to get access to this API and the data for your experiment:

...

Code Block
bgColor#F7F7ED
% ipython

Python 2.7.2 (default, Jan 14 2013, 21:09:22)
Type "copyright", "credits" or "license" for more information.

IPython 0.13.1 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: import psana

In [2]: psana.
Display all 108 possibilities? (y or n)
psana.Acqiris            psana.Gsc16ai            psana.ndarray_float32_1  psana.ndarray_int32_5    psana.ndarray_uint32_3
psana.Andor              psana.Imp                psana.ndarray_float32_2  psana.ndarray_int32_6    psana.ndarray_uint32_4
psana.Bld                psana.Ipimb              psana.ndarray_float32_3  psana.ndarray_int64_1    psana.ndarray_uint32_5
psana.BldInfo            psana.Lusi               psana.ndarray_float32_4  psana.ndarray_int64_2    psana.ndarray_uint32_6
..

The first application

Here is the psana version of the traditional "Hello World!" program:

...

More details on parameters of the get() function will be provided later in this document. The module exports many other definitions. Some of them will be introduced and explained in the rest of the document as needed.

Data set specification

The data set string encodes various parameters, some of which are needed to locate data files, while others would affect the behavior of the file reader. The general syntax of the string is:

...

The complete description of the data set string syntax and allowed parameters can be found in the specification document.

Extracting data from an event

Detector Names

In a given event, one can get a list of all detector names (and aliases) to use in the examples below with a script similar to the following:

Code Block
from psana import *
ds = DataSource('exp=cxi86715:run=99')
for evt in ds.events():
    print evt.keys()
	break # to stop the loop over events

Simple Area-Detector Data Access

There is a simple "one-line" python interface to access area-detector information (e.g. cspad, epix, pnccd etc.) including dark corrections, bad-pixel suppression, common-mode correction and application of detector metrology:

...

There is a lot of other information accessible through the Detector class (e.g. pixel locations, calibration constants, various masks).  For a more complete list, see /reg/g/psdm/sw/releases/ana-current/Detector/examples/ex_all_dets.py.  We are working on expanding this interface to work for all detector types.

More General Data Access

In this section we're going to focus on an event object to see how to get various information from it. Let's begin with an example where we're fetching and plotting an image captured at the Princeton camera (which is one of the detectors available at the XCS instrument). In this example we won't be iterating over all events. Only the first event will be considered:

...

Info
titleWhy should you care about detector/component types?

By looking at how the get() method was invoked in the example one may argue that the component type information isn't really required, and knowing the detector source alone is all one would need here. That's not quite true. A problem is that, for the some detectors (sources) there may be more than one object stored within an event per such detector. Those objects would have different types. Hence the get() method requires at least two (and in some occasions - even three) keys to be provided to tell the method which of those objects to return. More information on this subject will be given in a subsection found below.

Four forms of the get() method

The psana event contains objects which may have different origins, such as:

...

The ipython interpreter makes it easy to explore the namespace of the framework module to see what's available. This unfortunately doesn't address the question - "Now, as I have my event, how do I know what's in that event, which form of the get() method should I use, and which specific values of parameters should I put? Unless you already know the answer, proceed to the next section to find the one!

Browsing through a catalog of objects stored within an event

The following example demonstrates how to dump a catalog of event components:

...

  • the last component (the one which has type=None) reported by the keys() method should be ignored. This is an artifact of the current implementation of this API. It may go away in some future release of the software. We mention it here just to address possible confusion which users may have when seeing this output.
  • note the variations in the syntax of the components' addresses.

Working with a list of keys

For those would like to build some automation in discovering which components and of what kind exist in the event there is another option. A user can iterate over the list of key elements to examine their attributes. Each such element would encapsulate a type, a source and a key (string) of the corresponding event component. Consider the following example:

...

The type() method will return one of those Python type objects which were already mentioned in section "Four forms of the get() method". If a user is looking for a key which has a specific type then the following type comparison can be used:

...

Finally, the last method key() returns a string representing an optional third key which was already explained in section "Four forms of the get() method". In most cases the method will return an empty string.

Other operations with events

The event object also provides two operations for manipulating the contents of the event: adding more or removing existing components. Signatures of both operations are very similar to the ones of method get(). Specifically these are four forms of method remove():

...

Code Block
bgColor#F7F7ED
ds = DataSource("exp=CXI/cxitut13:run=22,23,24,25)
for evt in ds.events():
    print "run: ", evt.run()
    ..

Accessing event environment data

The event environment encapsulate a broad spectrum of data and services which have various origins. This environment is needed to evaluate or process event data in a proper context. Some of these data may have different life cycles than events. Other parts of this information (such as calibrations) may not even come directly from the input data stream (the DAQ system). The information is available through a special object which is obtained by calling the data set object's method env():

...

The same environment object can also be obtained by calling a similar method of classes Run.env() and Step.env(). The environment can be split into a number of categories which are explained in a dedicated sub-subsection below.

Job configuration information

Methods found in this category are meant to be used for information purposes. Though, one of their practical uses could be to create output (log, data, etc.) files which would have unique yet meaningful names relevant to an input data set and a job processing the data:

  • expNum(): a numeric identifier of the corresponding experiment
  • experiment(): the name of the experiment
  • jobName(): the unique name of the job (this may be used by the users code to create output log files which would have unique yet meaningful names)
  • subprocess(): the process number in a multi-process version of the framework (at the moment the method will always return 0. See details at section "Copy of ManualMPI Parallelization")
  • instrument(): the name of the instrument

...

Code Block
bgColor#F7F7ED
   framework name: psana
         job name: cxitut13:run=22
       instrument: CXI
    experiment id: 304
  experiment name: cxitut13
subprocess number: 0

Calibration Store

There are two methods in this category:

...

The second method would return an object of the generic environment container class EnvObjectStore . Specific details of this interface are beyond a scope of the present document. The Calibration Store is mainly used by special calibration modules.

Configuration Store

This Store encapsulate various detector/device configuration information which is typically (in reality - it may vary) recorded by the DAQ system at the beginning of each run.

...

The get() method of the Configuration Store will return objects describing the corresponding components of the events. The configuration objects are explained in "The psana Reference Manual". As an example, here is a direct link to the documentation of class EvrData.ConfigV7.

ControlPV

ControlPV is a configuration data which is updated on every step (steps are explained later in the document when discussing various ways of iterating over events in a data set). Like any other configuration data it is accessible through the environment object. Here is an example of getting controlPV data:

...

Code Block
bgColor#F7F7ED
[('lxt_ttc', -1.9981747581849466e-12)]
[('lxt_ttc', -1.8004943676675365e-12)]
[('lxt_ttc', -1.6001426205245978e-12)]
[('lxt_ttc', -1.3997908733800049e-12)]
[('lxt_ttc', -1.199439126235412e-12)]
[('lxt_ttc', -9.990873790924733e-13)]
[('lxt_ttc', -7.987356319478803e-13)]
[('lxt_ttc', -5.983838848049417e-13)]
[('lxt_ttc', -3.9803213766034873e-13)]
[('lxt_ttc', -2.0035174714459297e-13)]
[('lxt_ttc', 0.0)]
[('lxt_ttc', 2.0035174714459297e-13)]
[('lxt_ttc', 4.007034942875316e-13)]
[('lxt_ttc', 6.010552414321246e-13)]
[('lxt_ttc', 8.014069885767175e-13)]
..

Anchor
epics
epics

EPICS Store

All EPICS variables can be accessed through the EpicsStore object of the environment:

...

Code Block
languagepython
for i, evt in enumerate(ds.events()):
    pv = epics.getPV('LAS:FS0:ACOU:amp_rf1_17_2:rd')
    if pv.isCtrl():
        print "warning: pv is still ctrl as of event %d, may be known or new bug" % i
        continue
    val = pv.value(0)

Accessing EVR Data

EVR data from the LCLS timing system can be used to learn, for example, whether a laser is on or off on a shot-by-shot basis by accessing special "event codes".  An example is here:

Code Block
bgColor#F7F7ED
from psana import *
events = DataSource('exp=mecd6714:run=1226').events()
for evt in events:
    evr = evt.get(EvrData.DataV3, Source('DetInfo(NoDetector.0:Evr.0)'))
    for fifoEvent in evr.fifoEvents():
        print fifoEvent.eventCode()

Iterating over events, steps, runs

These instructions work for standard "sequential" mode.  In random-access "indexing" mode see indexing documentation.

...

From the performance point of view all methods are equal to each other. A choice of a particular technique depends on specific needs of a user application. Also note that intermediate objects which will be exposed during these iterations may have additional methods. More information on those can be found at the external documentation:

How To Avoid Reading Documentation

With the "ipython" interactive command, it is possible to avoid reading documentation.  Instead, one can use "tab-completion" to learn how to access the data.  First, learn the names of detectors in your dataset:

...

Code Block
In [26]: gdet = evt.get(Bld.BldDataFEEGasDetEnergyV1,Source('BldInfo(FEEGasDetEnergy)'))
In [27]: gdet?
Type:       BldDataFEEGasDetEnergyV1
String Form:<psana.Bld.BldDataFEEGasDetEnergyV1 object at 0x10acab98>
Docstring:
Four energy measurements from Front End Enclosure Gas Detector.
PV names: GDET:FEE1:241:ENRC, GDET:FEE1:242:ENRC, 
     GDET:FEE1:361:ENRC, GDET:FEE1:362:ENRC, 
     GDET:FEE1:363:ENRC, and GDET:FEE1:364:ENRC 
 *363* and *364* are duplicate measurements of *361* and *362* respectively. 
 The difference is that they cover a smaller (10%) dynamic range. 
 When the beam is weak, 361 and 362 don't have good S/N, these 2 extra PVs kick in.
Constructor Docstring:
Raises an exception
This class cannot be instantiated from Python
In [28]: gdet.f_12_ENRC?
Type:       instancemethod
String Form:<bound method BldDataFEEGasDetEnergyV1.f_12_ENRC of <psana.Bld.BldDataFEEGasDetEnergyV1 object at 0x10acab98>>
Docstring:
f_12_ENRC( (BldDataFEEGasDetEnergyV1)arg1) -> float :
    Value of GDET:FEE1:242:ENRC, in mJ.

(deprecated) Using modules written for the batch psana

One of the features (benefits) of the psana python script is that it allows one to reuse algorithms which are written as modules for the batch psana. Here is a simplified architecture of the framework and a data flow between its various components. The first diagram shows a psana python script w/o any external modules, and the second one - with 3 sample modules doing some additional data transformation/processing on the events:
 

...

The rest of this chapter will provide a brief introduction into how to turn on and use the feature in the framework.

 

Configuring psana to use external modules

External modules are activated in the framework in one of two ways. The most common is a specially prepared configuration file. One can also use Python functions to specify configuration. In both cases, this has to be done before opening a data set. Otherwise the configuration will not apply to the data source that is created. Here is an example that uses a configuration file:

...

Info
titleWhere can I find a list of existing *psana* modules?

There are two documents which you may want to explore:

Module Configuration Parameters

Module configuration parameters can be set within python using commands like:

...

Code Block
configBool
configInt
configFloat
configStr
configSrc
configListBool
configListFloat
configListStr
configListSrc

 

Anchor
cspad
cspad

Calibration modules example: re-constructing a full CSPad image

In this section we're going to explore in a little bit more details the effect of the configuration file which was used in the previous. First, let's have a look at the contents of that file:

...

That component has an additional key='reconstructed' (please, refer back to the section on the Four forms of the get() method for more information on that parameter). An object returned with that key has a type of a 2D array of 16-bit elements representing CSPad pixels. With a little bit of help from Matplotlib one can easily turn this into an image. The next example will illustrate how to show both raw (unprocessed) and reconstructed version of the CSPad image:

...

The result is shown below. The first (on the left) image represents 32 so called 2x1 CSpad elements stacked into a 2D array. The second images represents a geometrically correct (including relative alignment of the elements) frame:
 

Advanced Topics

Random Access to XTC Files ("Indexing")

Individual events can be randomly accessed with code similar to the following:

...

  • If you have a list of timestamps for some other analysis, those can be used in the run.event() method without using the run.times() method.
  • Index files are only made available by the DAQ at the end of the run, so can be used before then (this makes "realtime" FFB/shared-memory analysis impossible.
  • Indexing only currently works with XTC files (no HDF5 support)
  • Currently, indexing only has information about one run at a time (one cannot easily jump between events in different runs).

MPI Parallelization

Using the above indexing feature, it is possible to use MPI to have offline-psana analyze events in parallel (this is useful for many, but not all, algorithms) by having different cores access different events (up to "thousands" of cores).  MPI can also work for online-psana from shared memory (an example script is here that was able to process 120Hz with 7MB/event on 3 machines (8 cores per machine)). See Shared Memory Notes (limited access internal reference) when issues arise.  Online-psana from FFB can only be parallelized using MPI up to the number of DAQ "streams" (typically 6).  For an experiment that wants to use this parallelization: it's a good idea to increase the archiving rate of the slow EPICS data to the shot-rate.

...

More examples of using mpi4py can be found at http://mpi4py.scipy.org/docs/usrman/tutorial.html and a broader view of the capabilities of mpi can be found at http://www.open-mpi.org/doc/v1.8/.  Philip Hart also recommended these two links: http://www.sagemath.org/doc/numerical_sage/mpi4py.htmlhttps://computing.llnl.gov/tutorials/mpi/.

Real-time Online Plotting/Monitoring

psana-python code can be run both offline and online (from shared memory, or the Fast-FeedBack (FFB) disks).  Dan Damiani and Ankush Mitra have written software that makes online plotting available based on pyqtgraph (for display) and zeromq (for transferring display data on the network).  This is some example psana-python code that does both a one and a two dimensional plot of numpy arrays (more complex examples, including plot overlays etc. can be found in /reg/g/psdm/sw/releases/ana-current/psmon/examples/).  The last 4 lines are the only lines directly relevant to the plotting (everything else is standard psana-python):

...

'bs' means Blue Square. Another example: 'r-' which means red line. Not all matplotlib styles are supported by pyqtgraph, but many are:  http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.plot

Minimizing Display Latency

psplot has a buffer size option  (array size that holds the messages on both the display/server side).  This is called the "high water mark" in ZMQ documentation. For the display side use this option to psplot (from the output of "psplot --help"):

...

Code Block
publish.init(bufsize=2)

Calling C++ from Python using BOOST

This makes use of "ndarray converter" software written by Ankush Mitra which allows you to send numpy arrays between C++/python.  In your psana package, create a subdirectory called "pyext".  In it, include BOOST-style python code (tutorial is Here) similar to this:

...

Code Block
bgColor#F7F7ED
ds = DataSource(dsname)

for evt in ds.events():
    ...

ds = DataSource(dsname)     ## this is a fresh dataset object in which
                            ## all iterators are poised to the very first event (run, step)

Working with many events at a time

Note
titleBe aware about side effects of this technique

Python is a dynamic language which has its the "garbage collection" machinery which will decide when objects can be deleted. Objects will become eligible for the deletion only after the last reference to an object will disappear. Storing objects in a collection (or as members of other objects) may prevent this from happening. Therefore use the techniques explained in the rest of this section with extreme caution, always know what you're doing with objects, and try not to accumulate too many objects in memory without a good reason to do so. In any case, be prepared that your application may run out of memory. In an extreme case if you still choose to load all events of a dataset into an application's memory then a "rule of thumb" here would be to check if a size of your dataset doesn't exceed the total amount of memory available to your application.

...

The second thing to worry about is to make sure all relevant environment data are properly handled. In particular, the event environment objects should be obtained and stored along with each cached event at the same time the event object is retrieved from a dataset and before moving to the next event. The problem was already mentioned in this document when discussing the EPICS Store. As an illustration, let's suppose we need to compare images stored in each pair of consecutive events for all events in a run, and to do the comparison we also need the corresponding values of some PV. In that case the correct algorithm may look like this:

Code Block
bgColor#F7F7ED
ds = DataSource('exp=XCS/xcstut13:run=15')
epics = ds.env().epicsStore()

prev_evt = None
prev_pv  = None
for evt in ds.events()
    pv = epics.value('LAS:FS0:ACOU:amp_rf1_17_2:rd')
    if prev_evt  is not None:
        ...  ## compare (prev_evt,pv) vs (evt,pv)
    prev_evt = evt
    prev_pv  = pv

APPENDIX

Anchor
glossary
glossary

Glossary of Terms

Here is an explanation of terms which are used throughout the document. Some of them have a specific meaning in a context of the Framework and its API:

...