About this document
This document offers a brief overview of a high-level interface to the psana analysis framework for user applications written in the Python programming language. The interface is informally known as "interactive psana" (or ipsana). The tool is not limited to interactive analysis scenarios. It also lets its users benefit from the rich set of services of the core framework while retaining full control over the iteration through data sets (runs, files, etc.). This combination makes possible the interactive exploration and (if needed) visualization of experimental data. Note that by the latter we always mean data files in the XTC or HDF5 formats produced by the LCLS DAQ or Data Management system. We also suggest visiting the Glossary of Terms, which is found at the end of the document.
...
A final comment before we proceed to the practical steps: a reader of this document isn't required to be fully familiar with the batch framework. The areas where such knowledge is needed are expected to be covered by the document. Still, we encourage our users to spend some time getting an overview of the Data Analysis Tools we provide at PCDS, because many data analysis problems can be solved by the batch version of psana in a more efficient and natural way. These two flavors of the framework are not meant to compete with each other; they are designed to complement each other and cover a broader spectrum of analysis scenarios.
Test data
Recognizing that some of our users may still be new to LCLS, may not (yet) be affiliated with any experiment, or may belong to an experiment that hasn't yet taken a single run to play with, we prepared 6 pseudo-experiments which look exactly like the real ones. Each of those experiments has a small collection of data files which we believe is sufficient to run the examples and to familiarize yourself with the basic concepts of doing data analysis at PCDS. There is one such experiment per LCLS instrument:
...
```
% ls -al /reg/d/psdm/XPP/xpptut13/
drwxr-sr-x   8 psdatmgr ps-data 4096 Jun  4 12:02 .
drwxrwsr-x  30 psdatmgr ps-data 4096 Jun  4 12:02 ..
drwxrwsr-x+  3 psdatmgr ps-data 4096 Jun  5 10:53 calib
drwxrwsr-x+  2 psdatmgr ps-data 4096 Jun  4 12:02 ftc
drwxr-sr-x   2 psdatmgr ps-data 4096 Jun  6 16:41 hdf5
drwxrwsr-x+  2 psdatmgr ps-data 4096 Jun  4 12:02 res
drwxrwsr-x+  2 psdatmgr ps-data 4096 Jun  4 12:03 scratch
drwxr-sr-x   2 psdatmgr ps-data 4096 Jun  6 16:33 xtc
```
Setting up the analysis environment
There are two steps which need to be performed in order to get access to ipsana. The first step is to obtain and properly configure your UNIX account at PCDS. Specific instructions can be found in the Account Setup section of the Analysis Workbook.
...
```
% ipython
Python 2.7.2 (default, Jan 14 2013, 21:09:22)
Type "copyright", "credits" or "license" for more information.

IPython 0.13.1 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: import psana

In [2]: psana.
Display all 108 possibilities? (y or n)
psana.Acqiris  psana.Gsc16ai  psana.ndarray_float32_1  psana.ndarray_int32_5  psana.ndarray_uint32_3
psana.Andor    psana.Imp      psana.ndarray_float32_2  psana.ndarray_int32_6  psana.ndarray_uint32_4
psana.Bld      psana.Ipimb    psana.ndarray_float32_3  psana.ndarray_int64_1  psana.ndarray_uint32_5
psana.BldInfo  psana.Lusi     psana.ndarray_float32_4  psana.ndarray_int64_2  psana.ndarray_uint32_6
..
```
The first application
Here is the ipsana version of the traditional "Hello World!" program:
...
More details on parameters of the get() function will be provided later in this document. The module exports many other definitions. Some of them will be introduced and explained in the rest of the document as needed.
Data set specification
The data set string encodes various parameters. Some of them are needed to locate data files, while others affect the behavior of the file reader. The general syntax of the string is:
...
The complete description of the data set string syntax and allowed parameters can be found in the specification document.
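As a rough illustration of how such a string breaks down, here is a minimal sketch based only on the example strings used elsewhere in this document (such as 'exp=CXI/cxitut13:run=22,23,24,25'). The helpers parse_dataset_string() and run_numbers() are hypothetical and not part of psana, and treating a field without '=' as a boolean flag is an assumption. The specification document remains the authoritative description of the syntax.

```python
def parse_dataset_string(spec):
    """Split a data set string into a dict (hypothetical helper, not part of psana)."""
    params = {}
    for field in spec.split(':'):
        if not field:
            continue
        key, sep, value = field.partition('=')
        # Assumption: a field without '=' is treated as a boolean flag.
        params[key] = value if sep else True
    return params


def run_numbers(params):
    """Return the comma-separated run list as integers (other notations are not handled)."""
    return [int(r) for r in params.get('run', '').split(',') if r]


params = parse_dataset_string('exp=CXI/cxitut13:run=22,23,24,25')
print(params['exp'])
print(run_numbers(params))
```

The sketch only inspects the string. Opening the data is still done by passing the original string to DataSource().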
Extracting data from an event
In this section we focus on the event object to see how to get various information from it. Let's begin with an example where we fetch and plot an image captured by the Princeton camera (one of the detectors available at the XCS instrument). In this example we won't iterate over all events; only the first event is considered:
...
Info | ||
---|---|---|
| ||
Looking at how the get() method was invoked in the example, one may argue that the component type information isn't really required, and that knowing the detector source alone would be enough. That's not quite true. The problem is that for some detectors (sources) there may be more than one object stored within an event. Those objects have different types. Hence the get() method requires at least two (and on some occasions even three) keys to tell it which of those objects to return. More information on this subject is given in a subsection below. |
The psana event is a data container
The best way to think of the psana event is as a data container storing pertinent information recorded by the LCLS DAQ system (or later produced by the psana framework modules) in the context of a particular LCLS shot. Note that the framework event object has a transient state. It's up to the psana framework how to initialize this object when reading relevant data from the input files. Moreover, the event is a dynamic container whose contents may change during its lifetime within an application. More information on the event lifetime will be provided later in this document when discussing external psana modules.
...
Four forms of the get() method
The psana event contains objects which may have different origins, such as:
...
The ipython interpreter makes it easy to explore the namespace of the framework module to see what's available. Unfortunately, this doesn't answer the question: "Now that I have my event, how do I know what's in it, which form of the get() method should I use, and which specific parameter values should I pass?" Unless you already know the answer, proceed to the next section to find it!
Browsing through a catalog of objects stored within an event
The following example demonstrates how to dump a catalog of event components:
...
- the last component (the one which has type=None) reported by the keys() method should be ignored. This is an artifact of the current implementation of ipsana. It may go away in some future release of the software. We mention it here just to address possible confusion which users may have when seeing this output.
- note the variations in the syntax of the components' addresses.
Working with a list of keys
For those who would like to automate the discovery of which components, and of what kind, exist in the event, there is another option. A user can iterate over the list of key elements to examine their attributes. Each such element encapsulates the type, the source and the key (string) of the corresponding event component. Consider the following example:
...
Finally, the last method key() returns a string representing an optional third key which was already explained in section "Four forms of the get() method". In most cases the method will return an empty string.
Other operations with events
The event object also provides two operations for manipulating the contents of the event: adding more or removing existing components. Signatures of both operations are very similar to the ones of method get(). Specifically these are four forms of method remove():
...
```python
from psana import DataSource

ds = DataSource("exp=CXI/cxitut13:run=22,23,24,25")
for evt in ds.events():
    print "run: ", evt.run()
    # ..
```
Accessing event environment data
The event environment encapsulates a broad spectrum of data and services which have various origins. This environment is needed to evaluate or process event data in a proper context. Some of these data may have different life cycles than events. Other parts of this information (such as calibrations) may not even come directly from the input data stream (the DAQ system). The information is available through a special object which is obtained by calling the data set object's method env():
...
The same environment object can also be obtained by calling the similar methods Run.env() and Step.env(). The environment can be split into a number of categories which are explained in dedicated subsections below.
Job configuration information
Methods in this category are meant mainly for informational purposes. One practical use, though, is to create output files (log, data, etc.) with unique yet meaningful names that reflect the input data set and the job processing the data:
...
```
framework name: psana
job name: cxitut13:run=22
instrument: CXI
experiment id: 304
experiment name: cxitut13
subprocess number: 0
```
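To illustrate the file-naming use case mentioned above, here is a minimal sketch that turns a job name and a subprocess number, such as the values printed above, into a file name. The function output_file_name() and its naming pattern are hypothetical and not part of psana. In a real session the two arguments would come from the environment object.

```python
import re


def output_file_name(job_name, subprocess, suffix='.log'):
    """Build a file name from job information (hypothetical helper, not part of psana)."""
    # Replace characters that are awkward in file names, such as ':' and '='.
    safe_job = re.sub(r'[^A-Za-z0-9_.-]+', '_', job_name).strip('_')
    return '%s-p%d%s' % (safe_job, subprocess, suffix)


name = output_file_name('cxitut13:run=22', 0)
print(name)
```

Including the subprocess number keeps names distinct if several subprocesses of one job write their own files.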
Calibration Store
There are two methods in this category:
...
The second method returns an object of the generic environment container class EnvObjectStore. Specific details of this interface are beyond the scope of the present document. The Calibration Store is mainly used by special calibration modules.
Configuration Store
This store encapsulates various detector/device configuration information which is typically (though in reality it may vary) recorded by the DAQ system at the beginning of each run.
...
The get() method of the Configuration Store will return objects describing the corresponding components of the events. The configuration objects are explained in "The psana Reference Manual". As an example, here is a direct link to the documentation of class EvrData.ConfigV7.
ControlPV
ControlPV is configuration data which is updated on every step (steps are explained later in the document when discussing various ways of iterating over events in a data set). Like any other configuration data, it is accessible through the environment object. Here is an example of getting ControlPV data:
...
```
[('lxt_ttc', -1.9981747581849466e-12)]
[('lxt_ttc', -1.8004943676675365e-12)]
[('lxt_ttc', -1.6001426205245978e-12)]
[('lxt_ttc', -1.3997908733800049e-12)]
[('lxt_ttc', -1.199439126235412e-12)]
[('lxt_ttc', -9.990873790924733e-13)]
[('lxt_ttc', -7.987356319478803e-13)]
[('lxt_ttc', -5.983838848049417e-13)]
[('lxt_ttc', -3.9803213766034873e-13)]
[('lxt_ttc', -2.0035174714459297e-13)]
[('lxt_ttc', 0.0)]
[('lxt_ttc', 2.0035174714459297e-13)]
[('lxt_ttc', 4.007034942875316e-13)]
[('lxt_ttc', 6.010552414321246e-13)]
[('lxt_ttc', 8.014069885767175e-13)]
..
```
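Per-step lists of (name, value) pairs like the ones printed above are often easier to work with once regrouped by variable name. The following is a minimal sketch of that regrouping. The function scan_table() is hypothetical and not part of psana, and the input reuses the first three values from the output above.

```python
def scan_table(steps):
    """Regroup per-step (name, value) pairs into {name: [values...]} (hypothetical helper)."""
    table = {}
    for pairs in steps:
        for name, value in pairs:
            table.setdefault(name, []).append(value)
    return table


# The first three steps from the output shown above.
steps = [
    [('lxt_ttc', -1.9981747581849466e-12)],
    [('lxt_ttc', -1.8004943676675365e-12)],
    [('lxt_ttc', -1.6001426205245978e-12)],
]

table = scan_table(steps)
values = table['lxt_ttc']
# Differences between consecutive steps, i.e. the scan step size.
increments = [b - a for a, b in zip(values, values[1:])]
print(increments)
```

With the values in one list per variable, the scan axis can be handed directly to plotting or fitting code.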
EPICS Store
All EPICS variables can be accessed through the EpicsStore object of the environment:
...
```python
from psana import DataSource

ds = DataSource(dsname)
epics = ds.env().epicsStore()
prev_val = None
for i, evt in enumerate(ds.events()):
    val = epics.value('LAS:FS0:ACOU:amp_rf1_17_2:rd')
    if val != prev_val:
        print "%6d:" % i, val
        prev_val = val
```

Output:

```
     0: 2016
   725: 2024
   845: 2019
   966: 2014
  1932: 2021
  2053: 2020
  2174: 2016
  3381: 2020
  3502: 2024
  4830: 2018
  6279: 2019
  6400: 2018
  7728: 2023
  7849: 2019
  9177: 2016
 10626: 2021
 10747: 2022
 12075: 2016
 13524: 2020
..
```
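The change-detection logic used above does not depend on psana and can be factored into a small reusable generator. This is a minimal sketch. The function value_changes() is hypothetical, and a plain list of made-up readings stands in for the values an EPICS variable would take from event to event.

```python
def value_changes(values):
    """Yield (index, value) each time a value differs from the previous one."""
    previous = object()  # sentinel that never equals a real reading
    for index, value in enumerate(values):
        if value != previous:
            yield index, value
            previous = value


# Illustrative readings only, not taken from a real run.
readings = [2016, 2016, 2016, 2024, 2024, 2019]
changes = list(value_changes(readings))
print(changes)
```

In an ipsana session the same generator could be fed with a generator expression such as (epics.value(name) for evt in ds.events()), so that values are read lazily while iterating over the events.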
Services
The Histogram Manager is the only public (user-level) service which is implemented in the current version of the framework. A reference to the manager can be obtained using:

```python
from psana import DataSource

ds = DataSource('exp=CXI/cxitut13:run=22')
hist_manager = ds.env().hmgr()
```
Iterating over events, steps, runs
The previous examples have already demonstrated the most basic technique for finding all events in a data set. Interactive psana actually has more elaborate ways of browsing through the data:
...
From the performance point of view all methods are equivalent. The choice of a particular technique depends on the specific needs of a user application. Also note that the intermediate objects exposed during these iterations may have additional methods. More information on those can be found in the external documentation:
Using modules written for the batch psana
One of the benefits of interactive psana is that it allows reusing algorithms which are written as modules for the batch psana. The concept of modules is explained in detail in the Psana User Manual. Here is a simplified architecture of the framework and the data flow between its various components. The first diagram shows interactive psana used without any external modules, and the second one shows it with 3 sample modules doing some additional data transformation/processing on the events:
...
The rest of this chapter provides a brief introduction to turning on and using this feature in the framework.
Configuring psana to use external modules
External modules are activated in the framework by means of a specially prepared configuration file which has to be given to the framework before opening a data set. Otherwise the file won't have any effect. Here is an example:
...
Info | ||
---|---|---|
| ||
There are two documents which you may want to explore:
|
Calibration modules example: reconstructing a full CSPad image
In this section we explore in a little more detail the effect of the configuration file which was used in the previous section. First, let's have a look at the contents of that file:
...
The result is shown below. The first image (on the left) represents 32 so-called 2x1 CSPad elements stacked into a 2D array. The second image represents a geometrically correct frame (including the relative alignment of the elements):
Advanced techniques
Re-opening data sets, opening multiple data sets simultaneously
The underlying implementation of the psana framework creates a separate instance of the framework upon each successful call to the function DataSource(). This opens two possibilities:
...
```python
from psana import DataSource

ds = DataSource(dsname)
for evt in ds.events():
    ...

ds = DataSource(dsname)  ## this is a fresh dataset object in which
                         ## all iterators are poised to the very first event (run, step)
```
Working with many events at a time
Note | ||
---|---|---|
| ||
Python is a dynamic language with its own "garbage collection" machinery, which decides when objects can be deleted. An object becomes eligible for deletion only after the last reference to it disappears. Storing objects in a collection (or as members of other objects) may prevent this from happening. Therefore use the techniques explained in the rest of this section with extreme caution, always know what you're doing with objects, and try not to accumulate too many objects in memory without a good reason. In any case, be prepared for your application to run out of memory. If you still choose to load all events of a dataset into an application's memory, a rule of thumb is to check that the size of your dataset doesn't exceed the total amount of memory available to your application. |
...
```python
from psana import DataSource

ds = DataSource('exp=XCS/xcstut13:run=15')
epics = ds.env().epicsStore()
prev_evt = None
prev_pv = None
for evt in ds.events():
    pv = epics.value('LAS:FS0:ACOU:amp_rf1_17_2:rd')
    if prev_evt is not None:
        ...
        ## compare (prev_evt, prev_pv) vs (evt, pv)
    prev_evt = evt
    prev_pv = pv
```
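When a comparison needs more than two neighboring events, one option (plain Python, not a psana feature) is a bounded window based on collections.deque with a maxlen, which drops the oldest item automatically and so keeps the number of retained objects fixed. This is a minimal sketch. The function sliding_windows() is hypothetical, and integers stand in for event objects.

```python
from collections import deque


def sliding_windows(items, size):
    """Yield tuples of the last `size` items while keeping at most `size` of them alive."""
    window = deque(maxlen=size)  # the oldest item is discarded on overflow
    for item in items:
        window.append(item)
        if len(window) == size:
            yield tuple(window)


windows = list(sliding_windows(range(5), 3))
print(windows)
```

Note that every event held in the window stays in memory until it is pushed out, so the window size should be kept as small as the analysis allows.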
Performance/memory considerations
Caching psana.Source()
Consider the following example:
...
```python
from psana import DataSource, Source, Princeton

ds = DataSource('exp=XCS/xcstut13:run=15')
src = Source('DetInfo(XcsBeamline.0:Princeton.0)')
for evt in ds.events():
    frame = evt.get(Princeton.FrameV1, src)
    # ..
```
The cost of accessing detector components
The internal implementation of psana won't fully construct components of an event unless they are requested by a user's code. This makes the following code less efficient than fetching only those components which are actually needed by an application:
...
Another reason why we wouldn't generally recommend this technique (though we understand it may be quite handy in some cases) is that in some future version of the framework event components may be dynamically loaded from disk into memory. Therefore touching components without a good reason may incur not only more CPU usage (to construct objects) but also higher latency due to extra I/O operations.
XTC vs HDF5
In those cases when the same data set is available in both XTC and HDF5 formats, the choice of which one to use may be determined by differences in framework performance for these formats. Although semantically equal, experimental data stored in the XTC and HDF5 formats have fundamentally different internal organization. These differences allow for certain optimizations (which do not overlap between the formats) when reading data from files. Leaving aside various details, the differences can be summarized as:
...
These recipes won't cover all possible data access scenarios. But we hope they will give a reader an idea where to look further.
The essential functionality which is not implemented in psana
Parallel data processing
We recognize the importance of supporting some form of parallel processing in the framework, but we haven't yet settled on a specific solution. There are many pros and cons of different techniques which we're still evaluating against each other, weighing their benefits versus complications. The problem is that their effectiveness greatly depends on the class of problems a user is solving. Given the variety of data analysis approaches taken by LCLS users, it's hard to find a common solution which would suit all equally well. For now, we leave it up to the user to find an appropriate scheme.
Direct access to events
Any form of direct access would require an index of all objects recorded within a run. The index would complement the (purely) sequential structure of the XTC files. This feature is presently under design and may appear in a later release of the software. When this happens, the API will be extended with additional methods for addressing and fetching (with very little overhead) a particular event without needing to iterate over prior events.
Exchanging generic objects (of any types) between external modules and ipsana
Some of our readers may not even recognize that such a problem exists, or what it means. To illustrate it, consider the following scenario:
...
No other types are supported.
APPENDIX
Glossary of Terms
Here is an explanation of the terms which are used throughout the document. Some of them have a specific meaning in the context of the framework and its API:
- data set (or dataset) - a collection of files associated with a run or many runs
- event - a collection of information associated with a particular X-ray shot at LCLS. Note that an event is also a Python object in psana.
- environment (or event environment) - a collection of supplementary information which is needed to interpret event data in the right context. It includes: the latest state of the EPICS variables at a time when the event was recorded by the DAQ system, the DAQ configuration information, the calibrations for the instrument's detectors.
- run - a continuous period of time when the DAQ system was running and recording data.
- step (or Calibration Cycle) - an interval within a run during which the experimental environment (such as motor positions, temperatures, etc.) was stable
- detector - a measuring device within an LCLS instrument. This is just a generalization for sensors, diodes, cameras, etc. It may also be referred to in this document as an event component.
- data source - an object within the framework's event representing a particular detector
- XTC - a raw data format for files produced by the LCLS DAQ system. The information is stored in these files sequentially. This implies certain limitations on how data can be extracted from the files. The files have the extension '.xtc'.
- HDF5 - a portable data format for files which are produced as a result of translating the raw XTC files. The files have the extension '.h5'.
References
- Introduction into PCDS Computing
- Data Analysis Tools at PCDS (frameworks, tools, etc.)
- An overview of the psana framework
- The reference manual for the core classes
- The reference manual for the data classes
- Psana Module Catalog - comprehensive catalog of psana modules in the latest Analysis Release
- Psana Module Examples