...


Introduction

This document offers a brief overview of a high-level interface to the psana analysis framework for user applications written in the Python programming language. The interface is informally known as "interactive psana". The tool is not limited to interactive analysis scenarios. It also lets users benefit from the rich set of services of the core framework while retaining full control over iteration through data sets (runs, files, etc.). This combination makes interactive exploration and (if needed) visualization of experimental data possible. Note that by the latter we always mean data files in the XTC or HDF5 formats produced by the LCLS DAQ or Data Management system. We also suggest visiting the Glossary of Terms at the end of the document.

...

A final comment before we proceed to the practical steps: readers of this document are not required to be fully familiar with the batch framework. Wherever such knowledge is needed, this document is expected to cover it. Nevertheless, we encourage users to spend some time getting an overview of the Data Analysis Tools we provide at PCDS, because many data analysis problems can be solved more efficiently and naturally with the batch version of psana. The two flavors of the framework are not meant to compete with each other; they are designed to complement each other and cover a broader spectrum of analysis scenarios.

Test data

Recognizing that some of our users may still be new to LCLS, may not (yet) be affiliated with any experiment, or may belong to an experiment that hasn't taken a single run yet, we prepared 6 pseudo-experiments which look exactly like real ones. Each of these experiments has a small collection of data files which we believe is sufficient to run the examples and to get familiar with the basic concepts of data analysis at PCDS. There is one such experiment per LCLS instrument:

...

Code Block
bgColor#F7F7ED
% ls -al /reg/d/psdm/XPP/xpptut13/

drwxr-sr-x   8 psdatmgr ps-data 4096 Jun  4 12:02 .
drwxrwsr-x  30 psdatmgr ps-data 4096 Jun  4 12:02 ..
drwxrwsr-x+  3 psdatmgr ps-data 4096 Jun  5 10:53 calib
drwxrwsr-x+  2 psdatmgr ps-data 4096 Jun  4 12:02 ftc
drwxr-sr-x   2 psdatmgr ps-data 4096 Jun  6 16:41 hdf5
drwxrwsr-x+  2 psdatmgr ps-data 4096 Jun  4 12:02 res
drwxrwsr-x+  2 psdatmgr ps-data 4096 Jun  4 12:03 scratch
drwxr-sr-x   2 psdatmgr ps-data 4096 Jun  6 16:33 xtc

Setting up the analysis environment

There are two steps which need to be performed in order to get access to this API:

...

Code Block
bgColor#F7F7ED
% ipython

Python 2.7.2 (default, Jan 14 2013, 21:09:22)
Type "copyright", "credits" or "license" for more information.

IPython 0.13.1 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: import psana

In [2]: psana.
Display all 108 possibilities? (y or n)
psana.Acqiris            psana.Gsc16ai            psana.ndarray_float32_1  psana.ndarray_int32_5    psana.ndarray_uint32_3
psana.Andor              psana.Imp                psana.ndarray_float32_2  psana.ndarray_int32_6    psana.ndarray_uint32_4
psana.Bld                psana.Ipimb              psana.ndarray_float32_3  psana.ndarray_int64_1    psana.ndarray_uint32_5
psana.BldInfo            psana.Lusi               psana.ndarray_float32_4  psana.ndarray_int64_2    psana.ndarray_uint32_6
..

The first application

Here is the psana version of the traditional "Hello World!" program:

...

More details on the parameters of the get() function will be provided later in this document. The module exports many other definitions; some of them will be introduced and explained in the rest of the document as needed.

Data set specification

The data set string encodes various parameters: some are needed to locate data files, while others affect the behavior of the file reader. The general syntax of the string is:

...

The complete description of the data set string syntax and allowed parameters can be found in the specification document.
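To make the colon-separated key=value structure of the examples in this document concrete (for instance 'exp=XCS/xcstut13:run=15' or 'exp=CXI/cxitut13:run=22,23,24,25'), here is a minimal sketch of a parser. parse_dataset_string is a hypothetical helper, not part of psana, and it handles only the exp= and comma-separated run= forms shown here; psana does its own parsing according to the specification document, which allows more parameters than this sketch understands.

```python
def parse_dataset_string(dsname):
    """Split a data set string such as 'exp=CXI/cxitut13:run=22,23' into parts.

    Hypothetical helper for illustration: it understands only the
    colon-separated key=value form used in this document's examples.
    """
    params = {}
    for part in dsname.split(":"):
        name, sep, value = part.partition("=")
        if not sep:
            raise ValueError("expected key=value, got %r" % part)
        params[name] = value

    result = {"params": params}
    if "exp" in params:
        # 'CXI/cxitut13' -> instrument 'CXI', experiment 'cxitut13'
        instrument, _, experiment = params["exp"].rpartition("/")
        result["instrument"] = instrument or None
        result["experiment"] = experiment
    if "run" in params:
        # '22,23,24,25' -> [22, 23, 24, 25]
        result["runs"] = [int(r) for r in params["run"].split(",")]
    return result


info = parse_dataset_string("exp=CXI/cxitut13:run=22,23,24,25")
# info["instrument"] == "CXI", info["experiment"] == "cxitut13", info["runs"] == [22, 23, 24, 25]
```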

Extracting data from an event

In this section we focus on the event object and how to get various information from it. Let's begin with an example which fetches and plots an image captured by the Princeton camera (one of the detectors available at the XCS instrument). In this example we won't iterate over all events; only the first event is considered:

...

Info: Why should you care about detector/component types?

Looking at how the get() method was invoked in the example, one may argue that the component type information isn't really required and that the detector source alone is all one needs. That's not quite true. The problem is that for some detectors (sources) an event may contain more than one object per detector, and those objects have different types. Hence the get() method requires at least two (and occasionally three) keys to tell it which of those objects to return. More information on this subject is given in a subsection below.

The psana event is a data container

The most accurate way to think of the psana event is as a data container storing pertinent information recorded by the LCLS DAQ system (or later produced by psana framework modules) in the context of a particular LCLS shot. Note that the framework event object has a transient state: it is up to the psana framework how to initialize this object when reading relevant data from the input files. Moreover, the event is a dynamic container whose contents may change during its lifetime within an application. More information on the event lifetime will be provided later in this document when discussing external psana modules.
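A toy model may help to see why an event behaves like a container addressed by more than just a source. The ToyEvent class below is purely illustrative and is not the psana event class: it stores objects in a dictionary keyed by a (type, source, key) triple, so two objects from the same source can coexist as long as their types (or optional string keys) differ. The put()/remove() signatures and the RawFrame/FrameConfig types are made up for this sketch; only the get(type, source) argument order mirrors the examples in this document.

```python
class ToyEvent(object):
    """A toy stand-in for an event (NOT the real psana class).

    Objects live under a (type, source, key) triple, which is why a
    source alone is not enough to pick out a single object.
    """

    def __init__(self):
        self._store = {}

    def put(self, obj, src, key=""):
        # The type part of the address is taken from the object itself
        self._store[(type(obj), src, key)] = obj

    def get(self, typ, src, key=""):
        # Returns None when nothing is stored under this address
        return self._store.get((typ, src, key))

    def remove(self, typ, src, key=""):
        # True if an object was actually removed
        return self._store.pop((typ, src, key), None) is not None

    def keys(self):
        return list(self._store)


class RawFrame(object):      # hypothetical data type
    pass


class FrameConfig(object):   # another hypothetical data type
    pass


src = "DetInfo(XcsBeamline.0:Princeton.0)"
evt = ToyEvent()
raw = RawFrame()
evt.put(raw, src)                       # (RawFrame, src, "")
evt.put(FrameConfig(), src)             # same source, different type
evt.put(RawFrame(), src, "calibrated")  # same source and type, different key
```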

...


Four forms of the get() method

The psana event contains objects which may have different origins, such as:

...

The ipython interpreter makes it easy to explore the namespace of the framework module to see what's available. Unfortunately, this doesn't answer the question: "Now that I have my event, how do I know what's in it, which form of the get() method should I use, and which specific parameter values should I pass?" Unless you already know the answer, proceed to the next section to find it!

Browsing through a catalog of objects stored within an event

The following example demonstrates how to dump a catalog of event components:

...

  • The last component (the one with type=None) reported by the keys() method should be ignored. This is an artifact of the current implementation of this API and may go away in a future release of the software. We mention it here only to avoid confusion when users see it in the output.
  • Note the variations in the syntax of the components' addresses.

Working with a list of keys

For those who would like to automate discovering which components (and of what kind) exist in the event, there is another option: iterate over the list of key elements and examine their attributes. Each such element encapsulates the type, the source and the key (string) of the corresponding event component. Consider the following example:

...

Finally, the last method, key(), returns a string representing the optional third key which was already explained in section "Four forms of the get() method". In most cases the method returns an empty string.
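The filtering described above can be sketched as a small helper. It assumes, for illustration, that each key element offers accessors named type(), src() and key(); only key() is named explicitly in this section, so treat the other two names as an assumption. FakeKey is a stand-in so that the sketch runs without psana, and the type names and addresses in the demo list are placeholders. By default the helper also drops the type=None entry mentioned in the notes on keys().

```python
class FakeKey(object):
    """Stand-in for a real event key element, for illustration only."""

    def __init__(self, typ, src, key=""):
        self._typ, self._src, self._key = typ, src, key

    def type(self):
        return self._typ

    def src(self):
        return self._src

    def key(self):
        return self._key


def select_keys(keys, src_substring=None, skip_untyped=True):
    """Return the key elements whose source address contains src_substring."""
    selected = []
    for k in keys:
        if skip_untyped and k.type() is None:
            continue  # the type=None artifact reported by keys()
        if src_substring is not None and src_substring not in str(k.src()):
            continue
        selected.append(k)
    return selected


keys = [
    FakeKey("Princeton.FrameV1", "DetInfo(XcsBeamline.0:Princeton.0)"),
    FakeKey("SomeOtherType", "DetInfo(SomeOtherDetector.0)"),  # placeholder
    FakeKey(None, "(artifact)"),                                 # the type=None entry
]
princeton_keys = select_keys(keys, "Princeton")
```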

Other operations with events

The event object also provides two operations for manipulating its contents: adding new components and removing existing ones. Signatures of both operations are very similar to those of the get() method. Specifically, these are the four forms of the remove() method:

...

Code Block
bgColor#F7F7ED
ds = DataSource("exp=CXI/cxitut13:run=22,23,24,25")
for evt in ds.events():
    print "run: ", evt.run()
    ## ... further processing of the event

Accessing event environment data

The event environment encapsulates a broad spectrum of data and services of various origins. This environment is needed to evaluate or process event data in the proper context. Some of these data may have different life cycles than events, and other parts of this information (such as calibrations) may not come directly from the input data stream (the DAQ system) at all. The information is available through a special object obtained by calling the data set object's env() method:

...

The same environment object can also be obtained by calling the similar methods Run.env() and Step.env(). The environment can be split into a number of categories, which are explained in the subsections below.

Job configuration information

Methods in this category are meant mainly for informational purposes. One practical use, however, is to create output (log, data, etc.) files with unique yet meaningful names relevant to the input data set and the job processing the data:

...

Code Block
bgColor#F7F7ED
   framework name: psana
         job name: cxitut13:run=22
       instrument: CXI
    experiment id: 304
  experiment name: cxitut13
subprocess number: 0
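As a sketch of the file-naming use mentioned above, the hypothetical helper below turns job information like the listing above into a file name. In a real job the instrument, job name and subprocess number would come from the environment object used to print that listing; here they are passed in as plain values so that the snippet runs on its own. The naming scheme is only a suggestion.

```python
import re


def make_output_name(instrument, job_name, subprocess, suffix="log"):
    """Build a file name such as 'CXI-cxitut13_run_22-sub0.log' (hypothetical helper)."""
    # ':', '=', ',' and '/' may appear in job names but are awkward in file names
    safe_job = re.sub(r"[^A-Za-z0-9_.-]+", "_", job_name)
    return "%s-%s-sub%d.%s" % (instrument, safe_job, subprocess, suffix)


print(make_output_name("CXI", "cxitut13:run=22", 0))  # CXI-cxitut13_run_22-sub0.log
```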

Calibration Store

There are two methods in this category:

...

The second method returns an object of the generic environment container class EnvObjectStore. Specific details of this interface are beyond the scope of the present document. The Calibration Store is mainly used by special calibration modules.

Configuration Store

This Store encapsulates various detector/device configuration information, which is typically (in reality it may vary) recorded by the DAQ system at the beginning of each run.

...

The get() method of the Configuration Store will return objects describing the corresponding components of the events. The configuration objects are explained in "The psana Reference Manual". As an example, here is a direct link to the documentation of class EvrData.ConfigV7.

ControlPV

ControlPV is configuration data which is updated on every step (steps are explained later in the document, when discussing various ways of iterating over events in a data set). Like any other configuration data, it is accessible through the environment object. Here is an example of getting controlPV data:

...

Code Block
bgColor#F7F7ED
[('lxt_ttc', -1.9981747581849466e-12)]
[('lxt_ttc', -1.8004943676675365e-12)]
[('lxt_ttc', -1.6001426205245978e-12)]
[('lxt_ttc', -1.3997908733800049e-12)]
[('lxt_ttc', -1.199439126235412e-12)]
[('lxt_ttc', -9.990873790924733e-13)]
[('lxt_ttc', -7.987356319478803e-13)]
[('lxt_ttc', -5.983838848049417e-13)]
[('lxt_ttc', -3.9803213766034873e-13)]
[('lxt_ttc', -2.0035174714459297e-13)]
[('lxt_ttc', 0.0)]
[('lxt_ttc', 2.0035174714459297e-13)]
[('lxt_ttc', 4.007034942875316e-13)]
[('lxt_ttc', 6.010552414321246e-13)]
[('lxt_ttc', 8.014069885767175e-13)]
..
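If each step's controlPV data arrives as a list of (name, value) pairs, as the printout above suggests, it converts naturally into a dictionary. The sketch below assumes exactly that shape, and additionally assumes (from the magnitude of the numbers) that lxt_ttc is expressed in seconds; both are assumptions, not documented facts. The hypothetical helper scan_values_ps extracts one PV per step in picoseconds, and the last line computes the scan increment between steps.

```python
def scan_values_ps(steps, name="lxt_ttc"):
    """Pick one control PV from each step and convert it to picoseconds.

    Assumes every step is a list of (name, value) pairs, like the printout
    above, and that values are in seconds (inferred from their magnitude).
    """
    return [dict(pairs)[name] * 1e12 for pairs in steps]


steps = [
    [('lxt_ttc', -1.9981747581849466e-12)],
    [('lxt_ttc', -1.8004943676675365e-12)],
    [('lxt_ttc', -1.6001426205245978e-12)],
]
values = scan_values_ps(steps)
# increments between consecutive steps, roughly 0.2 ps for these values
step_sizes = [b - a for a, b in zip(values, values[1:])]
```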


EPICS Store

All EPICS variables can be accessed through the EpicsStore object of the environment:

...

Code Block
bgColor#F7F7ED
ds = DataSource(dsname)
epics = ds.env().epicsStore()
prev_val = None
for i, evt in enumerate(ds.events()):
    val = epics.value('LAS:FS0:ACOU:amp_rf1_17_2:rd')
    if val != prev_val:
        print "%6d:" % i, val
        prev_val = val


     0: 2016
   725: 2024
   845: 2019
   966: 2014
  1932: 2021
  2053: 2020
  2174: 2016
  3381: 2020
  3502: 2024
  4830: 2018
  6279: 2019
  6400: 2018
  7728: 2023
  7849: 2019
  9177: 2016
 10626: 2021
 10747: 2022
 12075: 2016
 13524: 2020
 ..
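The change-detection loop above can be factored into a small generator which works on any sequence of values, which makes it easy to test and reuse. changes is a sketch, not a psana facility. It uses a private sentinel instead of None as the initial "previous" value, so that a first value of None would still be reported. The commented lines show how it might be fed from the EPICS store in the example above.

```python
def changes(values):
    """Yield (index, value) whenever a value differs from the one before it."""
    previous = object()  # private sentinel: compares unequal to ordinary values
    for i, value in enumerate(values):
        if value != previous:
            yield i, value
            previous = value


# With psana (not executed here), mirroring the loop above; because the
# generator expression is lazy, each value is read right after its event is fetched:
#   epics = ds.env().epicsStore()
#   pv_values = (epics.value('LAS:FS0:ACOU:amp_rf1_17_2:rd') for evt in ds.events())
#   for i, val in changes(pv_values):
#       print("%6d: %s" % (i, val))

print(list(changes([2016, 2016, 2024, 2024, 2019])))  # [(0, 2016), (2, 2024), (4, 2019)]
```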

Services

The Histogram Manager is the only public (user-level) service implemented in the current version of the framework. A reference to the manager can be obtained using:

Code Block
bgColor#F7F7ED
ds = DataSource('exp=CXI/cxitut13:run=22')
hist_manager = ds.env().hmgr()

Iterating over events, steps, runs

The previous examples have already demonstrated the basic technique for iterating over all events in a data set. Interactive psana actually offers more elaborate ways of browsing through the data:

...

From the performance point of view all of these methods are equivalent; the choice of a particular technique depends on the specific needs of a user application. Also note that the intermediate objects exposed during these iterations may have additional methods. More information on those can be found in the external documentation:

Using modules written for the batch psana

One of the benefits of interactive psana is that it allows users to reuse algorithms written as modules for batch psana. The concept of modules is explained in detail in the Psana User Manual. Below is a simplified architecture of the framework and the data flow between its components. The first diagram shows interactive psana without any external modules, and the second one shows it with 3 sample modules doing additional data transformation/processing on the events:

...

The rest of this chapter provides a brief introduction to enabling and using this feature of the framework.

Configuring psana to use external modules

External modules are activated in the framework by means of a specially prepared configuration file, which has to be given to the framework before opening a data set; otherwise the file has no effect. Here is an example:

...

Info: Where can I find a list of existing psana modules?

There are two documents which you may want to explore:


Calibration modules example: reconstructing a full CSPad image

In this section we explore in a little more detail the effect of the configuration file used in the previous section. First, let's have a look at the contents of that file:

...

The result is shown below. The first image (on the left) represents 32 so-called 2x1 CSPad elements stacked into a 2D array. The second image represents a geometrically correct frame (including the relative alignment of the elements):

Advanced techniques

Re-opening data sets, opening multiple data sets simultaneously

The underlying implementation of the psana framework creates a separate instance of the framework upon each successful call to the DataSource() function. This opens two possibilities:

...

Code Block
bgColor#F7F7ED
ds = DataSource(dsname)

for evt in ds.events():
    ...

ds = DataSource(dsname)     ## this is a fresh dataset object in which
                            ## all iterators are poised to the very first event (run, step)
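When two data sets are open at the same time, their event iterators can be advanced together. A minimal sketch follows; it assumes that pairing events by position makes sense for your data, which holds only if both data sets contain matching events in the same order. Under Python 2 (the version shown earlier in this document) the built-in zip() first builds a complete list, which would pull every event of both data sets into memory, whereas itertools.izip() is lazy; under Python 3, zip() is already lazy. dsname1 and dsname2 are placeholders.

```python
try:
    from itertools import izip as lazy_zip  # Python 2: izip() pairs items on demand
except ImportError:
    lazy_zip = zip                           # Python 3: zip() is already lazy

# Intended use (not executed here; dsname1 and dsname2 are placeholders):
#   ds1 = DataSource(dsname1)
#   ds2 = DataSource(dsname2)
#   for evt1, evt2 in lazy_zip(ds1.events(), ds2.events()):
#       ...  # compare the two events; stops at the end of the shorter data set

print(list(lazy_zip([1, 2, 3], "ab")))  # [(1, 'a'), (2, 'b')]
```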

Working with many events at a time

Note: Be aware of the side effects of this technique

Python is a dynamic language with its own garbage collection machinery, which decides when objects can be deleted. An object becomes eligible for deletion only after the last reference to it disappears. Storing objects in a collection (or as members of other objects) may prevent this from happening. Therefore, use the techniques explained in the rest of this section with extreme caution: always know what you're doing with objects, and try not to accumulate many objects in memory without a good reason. In any case, be prepared for your application to run out of memory. In the extreme case where you still choose to load all events of a dataset into memory, a rule of thumb is to check that the size of your dataset doesn't exceed the total amount of memory available to your application.

...

Code Block
bgColor#F7F7ED
ds = DataSource('exp=XCS/xcstut13:run=15')
epics = ds.env().epicsStore()

prev_evt = None
prev_pv  = None
for evt in ds.events():
    pv = epics.value('LAS:FS0:ACOU:amp_rf1_17_2:rd')
    if prev_evt is not None:
        pass  ## compare (prev_evt, prev_pv) with (evt, pv)
    prev_evt = evt
    prev_pv  = pv
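When more than one previous event is needed, a bounded window keeps memory use predictable, in line with the warning above. The sketch below uses collections.deque with maxlen, which discards the oldest item automatically, so the window itself never holds more than `size` items. It assumes, as the example above does, that event objects remain usable after the iteration has moved on; sliding_windows is a hypothetical helper, not part of psana.

```python
from collections import deque


def sliding_windows(items, size):
    """Yield tuples of the last `size` items; the window never holds more than `size`."""
    window = deque(maxlen=size)  # appending to a full deque drops the oldest item
    for item in items:
        window.append(item)
        if len(window) == size:
            yield tuple(window)


# With psana (not executed here), e.g. for three consecutive events:
#   for older, previous, current in sliding_windows(ds.events(), 3):
#       ...

print(list(sliding_windows(range(5), 3)))  # [(0, 1, 2), (1, 2, 3), (2, 3, 4)]
```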

Performance/memory considerations

Caching psana.Source()

Consider the following example:

...

Code Block
bgColor#F7F7ED
ds = DataSource('exp=XCS/xcstut13:run=15')
src = Source('DetInfo(XcsBeamline.0:Princeton.0)')

for evt in ds.events():
    frame = evt.get(Princeton.FrameV1, src)
    ## ... process the frame
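The example above avoids rebuilding the Source object by creating it once, outside the loop. When detector addresses are only known deep inside helper functions, a small memoizing cache gives the same effect. The sketch below is generic and the helper name cached_source is hypothetical: with psana one would presumably pass Source itself as the factory, since Source is constructed from an address string in the example above; here a counting stand-in factory shows that the object is built only once.

```python
_source_cache = {}


def cached_source(address, factory):
    """Return factory(address), building it only on the first call per address."""
    if address not in _source_cache:
        _source_cache[address] = factory(address)
    return _source_cache[address]


built = []


def counting_factory(address):
    """Stand-in for a real constructor such as psana's Source, for this demo only."""
    built.append(address)
    return ("source", address)


for _ in range(1000):  # e.g. once per event
    src = cached_source("DetInfo(XcsBeamline.0:Princeton.0)", counting_factory)

# built now holds a single entry: the factory ran once, not 1000 times
```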

The cost of accessing detector components

The internal implementation of psana doesn't fully construct the components of an event unless they are requested by the user's code. As a result, code like the following, which touches every component, is less efficient than code which fetches only the components an application actually needs:

...

Another reason why we don't generally recommend this technique (though we understand it may be quite handy in some cases) is that in some future version of the framework, event components may be loaded from disk into memory on demand. Touching components without a good reason may then incur not only extra CPU usage (to construct the objects) but also higher latency due to extra I/O operations.
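The effect described above can be illustrated with a toy model in which each component is built only on first request. LazyComponents is purely illustrative and says nothing about psana's actual internals; it simply counts how many components get constructed when an application fetches one component versus when it touches all of them. The component names are made up.

```python
class LazyComponents(object):
    """Toy model: each component is built only when it is first requested."""

    def __init__(self, builders):
        self._builders = builders  # name -> zero-argument function building it
        self._built = {}
        self.constructed = 0       # number of components actually built so far

    def names(self):
        return list(self._builders)

    def get(self, name):
        if name not in self._built:
            self._built[name] = self._builders[name]()
            self.constructed += 1
        return self._built[name]


def make_event():
    # Made-up components; "frame" stands in for a large detector image
    return LazyComponents({
        "frame": lambda: [0] * (1024 * 1024),
        "diode": lambda: 0.5,
        "encoder": lambda: 42,
    })


selective = make_event()
selective.get("diode")             # fetch only what the analysis needs

greedy = make_event()
for name in greedy.names():        # touch every component "just to see"
    greedy.get(name)

# selective.constructed == 1, greedy.constructed == 3
```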

XTC vs HDF5

When the same data set is available in both XTC and HDF5 formats, the choice of which one to use may be determined by differences in framework performance for these formats. Although semantically equivalent, experimental data stored in XTC and HDF5 formats have fundamentally different internal organization. These differences allow for certain optimizations, specific to each format, when reading data from files. Leaving aside the details, the differences can be summarized as:

...

These recipes won't cover all possible data access scenarios, but we hope they give the reader an idea of where to look further.

The essential functionality which is not implemented in psana


Parallel data processing

We recognize the importance of supporting some form of parallel processing in the framework, but we haven't yet settled on a specific solution. Different techniques have many pros and cons, which we're still evaluating against each other, weighing their benefits against their complications. The problem is that their effectiveness greatly depends on the class of problems a user is solving, and given the variety of data analysis approaches taken by LCLS users, it's hard to find a common solution which would suit all of them equally well. For now, we leave it up to users to find an appropriate scheme.
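As one example of a scheme a user might choose (not a framework feature), runs can be distributed over several independent jobs or processes, each of which opens its own DataSource; as described under "Re-opening data sets, opening multiple data sets simultaneously", each call to DataSource() creates a separate framework instance. The hypothetical helpers below only build the per-worker data set strings, using the comma-separated run list syntax shown earlier in this document; how the workers are launched is left open. Splitting by whole runs balances the load well only when runs have similar sizes.

```python
def split_runs(runs, n_workers):
    """Distribute run numbers round-robin over n_workers groups."""
    runs = list(runs)
    return [runs[i::n_workers] for i in range(n_workers)]


def dataset_strings(exp, runs, n_workers):
    """One data set string per non-empty group, e.g. 'exp=CXI/cxitut13:run=22,24'."""
    return ["exp=%s:run=%s" % (exp, ",".join(str(r) for r in group))
            for group in split_runs(runs, n_workers) if group]


for dsname in dataset_strings("CXI/cxitut13", [22, 23, 24, 25], 2):
    print(dsname)
# exp=CXI/cxitut13:run=22,24
# exp=CXI/cxitut13:run=23,25
```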

Direct access to events

Any form of direct access would require an index of all objects recorded within a run. The index would complement the (purely) sequential structure of the XTC files. This feature is presently under design and may appear in a later release of the software. When this happens, the API will be extended with additional methods for addressing and fetching (with very little overhead) a particular event without needing to iterate over prior events.
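To illustrate what such an index buys, here is a toy example in plain Python. It does not use the XTC format: the records are simply length-prefixed byte strings in an in-memory stream, and the helper names are hypothetical. One sequential pass records the byte offset where each record starts; after that, any record can be read with a single seek, without reading the records before it.

```python
import io
import struct


def write_records(payloads):
    """Store each payload as a 4-byte little-endian length followed by the bytes."""
    buf = io.BytesIO()
    for p in payloads:
        buf.write(struct.pack("<I", len(p)))
        buf.write(p)
    return buf


def build_index(stream):
    """One sequential pass: remember the byte offset at which each record starts."""
    offsets = []
    stream.seek(0)
    while True:
        pos = stream.tell()
        header = stream.read(4)
        if len(header) < 4:
            break  # end of stream
        (length,) = struct.unpack("<I", header)
        offsets.append(pos)
        stream.seek(length, 1)  # skip the payload (whence=1: relative to current position)
    return offsets


def read_record(stream, offsets, n):
    """Direct access: jump straight to record n without reading records 0..n-1."""
    stream.seek(offsets[n])
    (length,) = struct.unpack("<I", stream.read(4))
    return stream.read(length)


stream = write_records([b"evt0", b"event-1", b"e2"])
offsets = build_index(stream)  # [0, 8, 19]
```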

Data exchange between external psana modules and Python code

Some of our readers may not even recognize that such a problem exists, or what it means. To illustrate it, consider the following scenario:

...

Any other types are not supported.

APPENDIX


Glossary of Terms

Here is an explanation of terms which are used throughout the document. Some of them have a specific meaning in a context of the Framework and its API:

  • data set (or dataset) - a collection of files associated with a run or many runs
  • event - a collection of information associated with a particular X-ray shot at LCLS. Please note that event is also a Python object in psana.
  • environment (or event environment) - a collection of supplementary information which is needed to interpret event data in the right context. It includes: the latest state of the EPICS variables at the time when the event was recorded by the DAQ system, the DAQ configuration information, and the calibrations for the instrument's detectors.
  • run - a continuous period of time when the DAQ system was running and recording data.
  • step (or Calibration Cycle) - an interval within a run when certain experimental conditions (such as motor positions, temperatures, etc.) were stable
  • detector - a measuring device within an LCLS instrument. This is just a generalization for sensors, diodes, cameras, etc. Also referred to in this document as an event component.
  • data source - an object within the framework's event representing a particular detector
  • XTC - a raw data format for files produced by the LCLS DAQ system. The information is stored in these files sequentially, which limits how various data can be extracted from the files. The files have the extension '.xtc'.
  • HDF5 - a portable data format for files which are produced by translating the raw XTC files. The files have the extension '.h5'.

References