Introduction

This document describes the C++ analysis framework for LCLS (psana) and how users can make use of its features. The psana design borrows ideas from a multitude of other frameworks such as pyana, myana, the BaBar framework, etc. Its main principles are summarized here:

...

The central part of the framework is a regular pre-built application (psana) which can dynamically load one or more user analysis modules written in C++ or Python. The core application is responsible for the following tasks:

  • loading and initializing all user modules
  • loading one of the input modules to read data from XTC or HDF5
  • calling the appropriate methods of user modules based on the data being processed
  • providing access to data as a set of C++ classes and a set of Python classes
  • providing other services, such as histogramming, to user modules

...

  • user module – an instance of a C++ or Python class which inherits from the pre-defined Module class and defines a few special methods which are called by the framework
  • event – a special object which transparently stores all event data
  • environment – a special object which stores non-event data such as configuration objects or EPICS data

...

Typically psana iterates through all transitions/events in the input files. User modules have limited control over this event loop: a module can request to skip a particular event, stop the iteration early, or abort the job, using one of the methods described below.

User Modules

A user module is an instance of a class that inherits from the psana Module class. Below we discuss this for C++. The Module class is defined in the file psana/Module.h and declares several methods which user modules can implement. These methods were already mentioned above; here is a more formal description of each method:

...

In addition to the event() method, every module class must provide a constructor which takes a string argument giving the name of the module. It also has to provide a special factory function which is used to instantiate the module from its shared library; a special macro is provided for defining this factory function.

Here is a minimal example of a module class declaration with only the event() method implemented; many non-essential details are skipped:

Code Block (Package/ExampleModule.h)

#include "psana/Module.h"

namespace Package {
class ExampleModule: public Module {
public:

  // Constructor takes module name as a parameter
  ExampleModule(const std::string& name);

  // Implementation of event() from base class
  virtual void event(Event& evt, Env& env);

};
} // namespace Package

Definition of the factory function and methods:

Code Block (Package/ExampleModule.cpp)

#include "Package/ExampleModule.h"
#include "MsgLogger/MsgLogger.h"
#include "PSEvt/EventId.h"

// define factory function
using namespace Package;
PSANA_MODULE_FACTORY(ExampleModule)

// Constructor
ExampleModule::ExampleModule(const std::string& name)
  : Module(name)
{
}

void
ExampleModule::event(Event& evt, Env& env)
{
  // get event ID
  shared_ptr<EventId> eventId = evt.get();
  if (not eventId.get()) {
    MsgLog(name(), info, "event ID not found");
  } else {
    MsgLog(name(), info, "event ID: " << *eventId);
  }
}
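The PSANA_MODULE_FACTORY macro hides how psana locates a module's factory function inside a shared library, and its expansion is not shown on this page. As a rough illustration only, the self-contained sketch below registers one factory function per class in a name-keyed table using a hypothetical SKETCH_MODULE_FACTORY macro. ModuleBase, moduleRegistry() and makeModule() are invented names, not psana API; a real plugin loader would more likely export a C-linkage symbol from the library and look it up with dlopen()/dlsym().

```cpp
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-in for a framework base class (not psana's Module).
class ModuleBase {
public:
  explicit ModuleBase(const std::string& name) : m_name(name) {}
  virtual ~ModuleBase() = default;
  const std::string& name() const { return m_name; }

private:
  std::string m_name;
};

// Factory signature: build a module instance from its configured name.
using ModuleFactory = ModuleBase* (*)(const std::string&);

// Table of factories keyed by class name, filled during static initialization.
inline std::map<std::string, ModuleFactory>& moduleRegistry() {
  static std::map<std::string, ModuleFactory> registry;
  return registry;
}

// Hypothetical macro: defines create_<Cls>() and registers it under "<Cls>".
#define SKETCH_MODULE_FACTORY(Cls)                             \
  static ModuleBase* create_##Cls(const std::string& name) {   \
    return new Cls(name);                                      \
  }                                                            \
  static const bool registered_##Cls =                         \
      (moduleRegistry()[#Cls] = &create_##Cls, true);

class ExampleModule : public ModuleBase {
public:
  explicit ExampleModule(const std::string& name) : ModuleBase(name) {}
};
SKETCH_MODULE_FACTORY(ExampleModule)

// Look up a factory by class name; returns nullptr for unknown classes.
inline std::unique_ptr<ModuleBase> makeModule(const std::string& cls,
                                              const std::string& name) {
  auto it = moduleRegistry().find(cls);
  if (it == moduleRegistry().end()) return nullptr;
  return std::unique_ptr<ModuleBase>(it->second(name));
}
```

For example, makeModule("ExampleModule", "Package.ExampleModule") returns a new ExampleModule whose name() is "Package.ExampleModule", while an unknown class name yields nullptr.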

...

The easiest way to write a new user module is to use the codegen script to generate the class from a predefined template. This command creates a new module ExampleModule in package TestPackage and copies the generated files to the corresponding directories in TestPackage:

Code Block

codegen -l psana-module TestPackage ExampleModule

...

Here is an example of code using the above functions:

Code Block

void ExampleModule::event(Event& evt, Env& env) {

  ...

  if (pixelsAboveThreshold < 1000) {
    // This event is not worth looking at, skip it
    skip();
    // I do not want to continue with this algorithm either
    return;
  }

  if (nGoodEvents > 1000) {
    // we collected enough data, can stop now and go to endJob()
    stop();
    // I do not want to continue with this algorithm either
    return;
  }

  if (temperatureKelvin < 0) {
    // data is junk, stop right here and don't call endJob()
    terminate();
    // I do not want to continue with this algorithm either
    return;
  }

}

Skipped events can be used in further analysis or saved in the "filtered" Xtc file, as explained in Package PSXtcOutput.
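To make the difference between skip(), stop() and terminate() concrete, here is a hedged sketch of an event loop seen from the framework side. It assumes that a skipped event is simply not passed to the modules that follow, that stop() ends the loop but still lets endJob() run, and that terminate() ends the job without endJob(). The Request enum, LoopResult and runLoop() are hypothetical stand-ins, not psana classes, and the transitions handled by the real loop (beginRun, endRun, etc.) are omitted.

```cpp
#include <functional>
#include <vector>

// What a module asks the framework to do after processing an event
// (hypothetical names mirroring skip(), stop() and terminate()).
enum class Request { None, Skip, Stop, Terminate };

struct LoopResult {
  int processed = 0;          // events that entered the module chain
  int skipped = 0;            // events on which some module requested skip
  bool endJobCalled = false;  // whether endJob() would have run
};

// A module's event() reduced to a callable returning its request.
using ModuleFn = std::function<Request(int eventIndex)>;

inline LoopResult runLoop(int nEvents, const std::vector<ModuleFn>& modules) {
  LoopResult result;
  for (int i = 0; i < nEvents; ++i) {
    ++result.processed;
    for (const auto& module : modules) {
      const Request req = module(i);
      if (req == Request::Skip) {  // later modules do not see this event
        ++result.skipped;
        break;
      }
      if (req == Request::Stop) {  // leave the loop, still run endJob()
        result.endJobCalled = true;
        return result;
      }
      if (req == Request::Terminate) {  // leave the loop, no endJob()
        return result;
      }
    }
  }
  result.endJobCalled = true;  // normal end of input
  return result;
}
```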

Job and Module Configuration

...

The configuration file has a simple format similar to the well-known INI file format. The file consists of sections; each section begins with a section header of the form:

Code Block

[<section-name>]

In general section names can be arbitrary strings, but in psana section names are module names, which cannot be arbitrary and should not contain spaces.

Following the section header there may be zero or more parameter lines of the form:

Code Block

<param-name> = <param-value>

The parameter name is everything between the beginning of the line and the '=' character, with leading and trailing spaces and tabs stripped. The parameter value is everything after the '=' character, also with leading and trailing spaces and tabs stripped; the value can be empty. A long parameter value can be split over multiple lines by ending each line except the last with a backslash character, e.g.:

Code Block

files = /reg/d/psdm/AMO/amo00000/xtc/e00-r0000-s00-c00.xtc \
        /reg/d/psdm/AMO/amo00000/xtc/e00-r0000-s01-c00.xtc \
        /reg/d/psdm/AMO/amo00000/xtc/e00-r0000-s02-c00.xtc
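The format rules above are simple enough to capture in a few lines. The sketch below is a hypothetical parser (parseConfig(), trimBlanks() and ConfigMap are invented names, not psana's implementation) that handles section headers, name = value lines with spaces and tabs stripped, empty values, full-line # comments as used in the examples on this page, and backslash continuation, which it joins with a single space.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

// section name -> (parameter name -> value)
using ConfigMap = std::map<std::string, std::map<std::string, std::string>>;

// Strip leading and trailing spaces and tabs.
inline std::string trimBlanks(const std::string& s) {
  const auto first = s.find_first_not_of(" \t");
  if (first == std::string::npos) return "";
  const auto last = s.find_last_not_of(" \t");
  return s.substr(first, last - first + 1);
}

inline ConfigMap parseConfig(std::istream& in) {
  ConfigMap cfg;
  std::string section, line, pending;
  while (std::getline(in, line)) {
    const std::string t = trimBlanks(line);
    if (!t.empty() && t.back() == '\\') {
      // continuation: keep the text before the backslash, read the next line
      pending += trimBlanks(t.substr(0, t.size() - 1)) + " ";
      continue;
    }
    const std::string stmt = trimBlanks(pending + t);
    pending.clear();
    if (stmt.empty() || stmt.front() == '#') continue;  // blank or comment line
    if (stmt.front() == '[' && stmt.back() == ']') {    // [section-name]
      section = trimBlanks(stmt.substr(1, stmt.size() - 2));
      continue;
    }
    const auto eq = stmt.find('=');
    if (eq == std::string::npos) continue;  // a real parser should report this
    cfg[section][trimBlanks(stmt.substr(0, eq))] = trimBlanks(stmt.substr(eq + 1));
  }
  return cfg;
}
```

Fed the three-line files example above, this sketch would store the three file names in a single value separated by single spaces.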

...

The parameters needed by the framework itself are defined in the [psana] section. Here is the list of parameters which can appear in that section:

  • modules
    list of module names to include in the analysis job. Each module name is built from a package name and a class name separated by a dot (e.g. TestPackage.ExampleModule), optionally followed by a colon and a modifier. The modifier is not needed if there is only one instance of the module in the job; if there is more than one instance, the modules need unique modifiers to distinguish the instances. If the module comes from the psana package, the package name can be omitted. Module names can also be specified on the command line with the -m option; for multiple modules use multiple -m options or comma-separated names in a single -m option.
  • input or files
    list of input data: datasets or file names to process. Input data can also be specified on the command line, which overrides anything specified in the configuration file. See section Specifying input data for more details on dataset syntax.
  • events
    maximum number of events to process in a job; can also be given on the command line with the -n or --num-events option.
  • skip-events
    number of events to skip before starting event processing; can also be given on the command line with the -s or --skip-events option.
  • instrument
    Instrument name.
  • experiment
    Experiment name. Instrument and experiment names can be specified on the command line with the -e or --experiment option; the option value has the format XPP:xpp12311 or xpp12311. By default instrument and experiment names are determined from the input file names; you can use these options to override the defaults (or when your file has non-standard naming).
  • calib-dir
    Path to the calibration directory; can also be given on the command line with the -b or --calib-dir option. The path can include the strings {instr} and {exp}, which are replaced with the instrument and experiment names respectively. The default value for the path is /reg/d/psdm/{instr}/{exp}/calib.
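The naming rules for the modules parameter and the {instr}/{exp} placeholders in calib-dir can be illustrated with two small helpers. Both are hypothetical sketches (ModuleName, parseModuleName(), replaceAll() and expandCalibDir() are not psana functions) written only from the rules stated in the list above; malformed names are not validated.

```cpp
#include <cstddef>
#include <string>

struct ModuleName {
  std::string package;    // e.g. "TestPackage"; "psana" when omitted
  std::string className;  // e.g. "ExampleModule"
  std::string modifier;   // text after ':', empty if none
};

// Split "Package.Class:modifier" following the rules of the modules parameter.
inline ModuleName parseModuleName(const std::string& full) {
  ModuleName m;
  std::string rest = full;
  const auto colon = rest.find(':');
  if (colon != std::string::npos) {
    m.modifier = rest.substr(colon + 1);
    rest = rest.substr(0, colon);
  }
  const auto dot = rest.find('.');
  if (dot == std::string::npos) {
    m.package = "psana";  // modules from the psana package may omit it
    m.className = rest;
  } else {
    m.package = rest.substr(0, dot);
    m.className = rest.substr(dot + 1);
  }
  return m;
}

// Replace every occurrence of `from` in `s` with `to`.
inline std::string replaceAll(std::string s, const std::string& from, const std::string& to) {
  if (from.empty()) return s;
  for (std::size_t pos = s.find(from); pos != std::string::npos;
       pos = s.find(from, pos + to.size())) {
    s.replace(pos, from.size(), to);
  }
  return s;
}

// Substitute the instrument and experiment names into a calib-dir template.
inline std::string expandCalibDir(const std::string& tmpl, const std::string& instr,
                                  const std::string& exp) {
  return replaceAll(replaceAll(tmpl, "{instr}", instr), "{exp}", exp);
}
```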

Here is an example of the framework configuration section:

Code Block

[psana]
# list of file names
files = /reg/d/psdm/AMO/amo00000/xtc/e00-r0000-s00-c00.xtc \
        /reg/d/psdm/AMO/amo00000/xtc/e00-r0000-s01-c00.xtc \
        /reg/d/psdm/AMO/amo00000/xtc/e00-r0000-s02-c00.xtc
# list of modules, PrintSeparator and PrintEventId are from psana package
# and do not need package name
modules = PrintSeparator PrintEventId psana_examples.DumpAcqiris

...

Parameters for user modules appear in separate sections named after the modules. For example, the module named "TestPackage.ExampleModule" reads its parameters from the section [TestPackage.ExampleModule]. If the module name includes a modifier after the colon, psana first looks for a parameter value in the corresponding section and, if it is not found there, reads it from the section without the modifier. In this way modules can share common parameters. For example, the module "TestPackage.ExampleModule:test" looks for a parameter in the [TestPackage.ExampleModule:test] section first and in the [TestPackage.ExampleModule] section after that.

To help manage configuration options, psana provides a way to select between several sets of parameters in a config file, as well as to override a default set with a few specific values. When specifying a module to load, it can be tagged as follows:

modules = TestPackage.Analysis:mode1

The modifier after the colon tells psana to first look for configuration parameters in the section [TestPackage.Analysis:mode1] and then in the section [TestPackage.Analysis]. It is also possible to load the same module several times, specifying different configuration options for each instance. Psana constructs each instance with a different name, based on the tag provided.

Here is an example of the configuration for a fictional analysis job:

Code Block

[psana]
modules = TestPackage.Analysis:mode1 TestPackage.Analysis:mode2

[TestPackage.Analysis]
# these are common parameters for all TestPackage.Analysis modules,
# but instances can override them in their own sections
calib-mode = fancy
subpixel = off
threshold = 0.001

[TestPackage.Analysis:mode1]
# parameters specific to :mode1 module
range-min = 0
range-max = 1000000

[TestPackage.Analysis:mode2]
# parameters specific to :mode2 module
range-min = 1000
range-max = 10000
subpixel = on
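The lookup order for modifiers can be sketched as a two-step search: the section with the full module name first, then the section without the modifier. The helper below, lookupParam(), is a hypothetical illustration and not psana's configuration API.

```cpp
#include <map>
#include <optional>
#include <string>

using Section = std::map<std::string, std::string>;
using ConfigMap = std::map<std::string, Section>;

// Find `param` for `module` (e.g. "TestPackage.Analysis:mode1"): first in the
// section with the full name, then, if the name has a ":modifier", in the
// section named without it.
inline std::optional<std::string> lookupParam(const ConfigMap& cfg,
                                              const std::string& module,
                                              const std::string& param) {
  auto findIn = [&](const std::string& sectionName) -> std::optional<std::string> {
    const auto sect = cfg.find(sectionName);
    if (sect == cfg.end()) return std::nullopt;
    const auto value = sect->second.find(param);
    if (value == sect->second.end()) return std::nullopt;
    return value->second;
  };
  if (auto v = findIn(module)) return v;
  const auto colon = module.find(':');
  if (colon != std::string::npos) return findIn(module.substr(0, colon));
  return std::nullopt;
}
```

Applied to the fictional job above, this search gives TestPackage.Analysis:mode1 the shared value subpixel = off, while TestPackage.Analysis:mode2 gets its own subpixel = on and still inherits calib-mode = fancy.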

...

Here is an example of the code in user module which uses these methods:

Code Block

  Source src = configStr("source", "DetInfo(:Evr)");
  int repeat = config("repeat");
  std::list<std::string> options = configList("options");
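How configStr(), config() and configList() convert the raw parameter text is not spelled out here. A plausible sketch is shown below, assuming that list values are whitespace-separated (as in the modules line of the [psana] example) and that numeric values must parse completely; splitList() and convertValue() are hypothetical helpers, not the psana Module methods.

```cpp
#include <list>
#include <sstream>
#include <stdexcept>
#include <string>

// Split a parameter value on whitespace into a list of words.
inline std::list<std::string> splitList(const std::string& value) {
  std::list<std::string> out;
  std::istringstream in(value);
  for (std::string word; in >> word;) out.push_back(word);
  return out;
}

// Convert a parameter value to T, rejecting values with trailing garbage.
template <typename T>
T convertValue(const std::string& value) {
  std::istringstream in(value);
  T result{};
  if (!(in >> result) || !(in >> std::ws).eof()) {
    throw std::invalid_argument("cannot convert parameter value '" + value + "'");
  }
  return result;
}
```

Rejecting a value like "12abc" instead of silently reading 12 is a design choice of this sketch; it makes typos in a configuration file fail loudly.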

...

Here are a few examples of using these macros:

Code Block

  MsgLog("MyModule", info, "reading pedestals from file " << fileName);
  MsgLog("MyModule", debug, "intermediate result: count=" << count << " sum=" << sum);
  MsgLogRoot(warning, "warp engine overheating");

...

The above macros are simple to use in most cases as they hide all details from the user. In more complex situations (e.g. printing array elements) there are two macros which provide access to the underlying stream object, which can be used in more interesting ways:


  • WithMsgLog(logger, level, stream) – this macro declares a stream object which can be used by the code in the compound statement that follows the macro. The lifetime of the stream is that code block: after the block is executed, the message is published and the stream disappears.

  • WithMsgLogRoot(level, stream) – a variation of the above macro which publishes the message to the root logger.

Here is an example of their use:

Code Block

  WithMsgLog("MyModule", debug, str) {
    str << "array elements:";
    for (int i = 0; i < size; ++ i) {
      str << " " << array[i];
    }
  }

...

Note: when the message level is disabled, the code in the corresponding macros is not executed at all. Do not put any expressions with side effects into messages or code blocks; these are strictly for messaging, not part of your algorithm.
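The reason the code inside these macros may not run at all is that the level check happens before the block is entered. The sketch below reproduces that behavior with a hypothetical SKETCH_WITH_LOG macro, which is not the MsgLogger implementation: an if/else guard skips the whole block when the level is below the threshold, and a scoped helper object publishes the collected text when the block ends. Level, g_threshold, g_published and ScopedMessage are invented stand-ins for the real logger configuration and output.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Severity levels, lowest first (hypothetical stand-in for the logger's levels).
enum class Level { debug, trace, info, warning, error, fatal };

inline Level g_threshold = Level::info;       // messages below this are dropped
inline std::vector<std::string> g_published;  // stands in for real log output

// Collects the text of one message and publishes it when it goes out of scope.
struct ScopedMessage {
  std::ostringstream stream;
  bool done = false;
  ~ScopedMessage() { g_published.push_back(stream.str()); }
};

// For a disabled level neither the stream is created nor the user's block run;
// otherwise the nested for-loops execute the block exactly once with `str`
// bound to the message stream, and the message is published afterwards.
#define SKETCH_WITH_LOG(level, str)                                     \
  if ((level) < g_threshold) {                                          \
  } else                                                                \
    for (ScopedMessage msg_; !msg_.done;)                               \
      for (std::ostringstream& str = msg_.stream; !msg_.done; msg_.done = true)
```

Usage mirrors the WithMsgLog example above: SKETCH_WITH_LOG(Level::debug, str) { ... } followed by a compound statement that writes to str.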

Histogramming Service

Psana includes a histogramming service which is a wrapper for the ROOT histogramming package. This service simplifies several tasks such as opening a ROOT file, saving histograms to the file, etc.

The centerpiece of the histogramming service is the histogram manager class. The histogram manager is responsible for opening the ROOT file, creating histograms, and storing histograms in the file. All these tasks are performed transparently to the user; no additional configuration of this service is needed. To create histograms, first obtain a reference to the manager instance, which is part of the standard psana environment and is accessible through a method of the environment class (env.hmgr() in the example below). Then call the factory methods of the manager class to create new histograms, which are automatically saved to a ROOT file. The manager creates a single ROOT file to store all histograms created in a single job. The name of the ROOT file is the job name with the ".root" extension added. The name of a psana job is auto-generated from the name of the first input file, but it can also be set on the command line with the -j <job-name> option.

All factory methods of the histogram manager use a special class to describe a histogram axis (or axes for 2-dim histograms). The name of the class is PSHist::Axis (in a user module the PSHist:: prefix is optional) and it contains the binning information for a single histogram axis. It can be constructed in two different ways:

  • Axis(int nbins, double amin, double amax)
    defines axis with fixed-width bins in the range from amin to amax.
  • Axis(int nbins, const double* edges)
    defines an axis with variable-width bins; the array contains the low edge of each bin plus the high edge of the last bin. The total size of the edges array must be nbins+1.
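The two Axis constructors differ only in how the bin edges are obtained. The hypothetical SketchAxis class below (not PSHist::Axis) stores nbins+1 edges in both cases and finds the bin for a value by binary search. It uses 0-based bin indices with -1 for underflow and nbins for overflow; that numbering is a choice made for this sketch and may differ from the one used by ROOT or PSHist.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

class SketchAxis {
public:
  // nbins fixed-width bins covering [amin, amax)
  SketchAxis(int nbins, double amin, double amax) {
    assert(nbins > 0 && amax > amin);
    for (int i = 0; i <= nbins; ++i) {
      m_edges.push_back(amin + (amax - amin) * i / nbins);
    }
  }

  // variable-width bins; `edges` holds nbins + 1 increasing values
  SketchAxis(int nbins, const double* edges) : m_edges(edges, edges + nbins + 1) {
    assert(nbins > 0);
  }

  int nbins() const { return static_cast<int>(m_edges.size()) - 1; }

  // Bin index in [0, nbins), -1 for underflow, nbins for overflow.
  int findBin(double x) const {
    if (x < m_edges.front()) return -1;
    if (x >= m_edges.back()) return nbins();
    // upper_bound finds the first edge greater than x; x lies in the bin before it
    const auto it = std::upper_bound(m_edges.begin(), m_edges.end(), x);
    return static_cast<int>(it - m_edges.begin()) - 1;
  }

private:
  std::vector<double> m_edges;
};
```

With the binning of the EBeamHist example below, SketchAxis(1000, 0, 50000) has bins 50 units wide, so a value of 125 falls into bin index 2.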

Here is the list of the factory methods (see also reference for more information):

  • PSHist::H1* hist1i(const std::string& name, const std::string& title, const Axis& axis)
    creates one-dimensional histogram with integer bin contents. Returns pointer to histogram object.
  • PSHist::H1* hist1d(name, title, axis)
    (argument types same as above) creates one-dimensional histogram with double (64-bit) bin contents. Returns pointer to histogram object.
  • PSHist::H1* hist1f(name, title, axis)
    creates one-dimensional histogram with float (32-bit) bin contents. Returns pointer to histogram object.
  • PSHist::H2* hist2i(name, title, xaxis, yaxis)
    creates two-dimensional histogram with integer bin contents. Returns pointer to histogram object.
  • PSHist::H2* hist2d(name, title, xaxis, yaxis)
    creates two-dimensional histogram with double (64-bit) bin contents. Returns pointer to histogram object.
  • PSHist::H2* hist2f(name, title, xaxis, yaxis)
    creates two-dimensional histogram with float (32-bit) bin contents. Returns pointer to histogram object.
  • PSHist::Profile* prof1(name, title, xaxis, const std::string& option="")
    creates a profile histogram; the option string can be empty, "s", or "i" (see the reference for their meaning). Returns pointer to histogram object.

User code should store the returned histogram pointers (as module data members) and use them later in the code; currently there is no way to retrieve a pointer to a histogram created earlier.

Here is an example of the correct use of the histogramming package (from psana_examples.EBeamHist module):

Code Block

// ==== EBeamHist.h ====
class EBeamHist: public Module {
public:
  .....
private:
  Source m_ebeamSrc;
  PSHist::H1* m_ebeamHisto;
  PSHist::H1* m_chargeHisto;
};

// ==== EBeamHist.cpp ====
EBeamHist::EBeamHist(const std::string& name)
  : Module(name)
  , m_ebeamHisto(0)
  , m_chargeHisto(0)
{
  m_ebeamSrc = configStr("eBeamSource", "BldInfo(EBeam)");
}

void EBeamHist::beginJob(Env& env)
{
  m_ebeamHisto = env.hmgr().hist1i("ebeamHisto", "ebeamL3Energy value", Axis(1000, 0, 50000));
  m_chargeHisto = env.hmgr().hist1i("echargeHisto", "ebeamCharge value", Axis(250, 0, 0.25));
}

void EBeamHist::event(Event& evt, Env& env)
{
  shared_ptr<Psana::Bld::BldDataEBeamV1> ebeam = evt.get(m_ebeamSrc);
  if (ebeam.get()) {
    m_ebeamHisto->fill(ebeam->ebeamL3Energy());
    m_chargeHisto->fill(ebeam->ebeamCharge());
  }
}

A more extensive example is available in Psana User Examples.

Writing User Modules

Here are a few simple steps and guidelines which should help users write their analysis modules.

  • Everything is done in the context of the offline analysis releases: your environment should be prepared and you should have a test release set up based on one of the recent analysis releases. Consult the Workbook, which should help you get going.
  • You need your own package, which may host several analysis modules. The package name must be unique. If the package has not been created yet, run these commands:

    Code Block
    newpkg MyPackage
    mkdir MyPackage/include MyPackage/src
    
  • Generate skeleton module class from template:

    Code Block
    codegen -l psana-module MyPackage MyModule
    

    This creates two files: MyPackage/include/MyModule.h and MyPackage/src/MyModule.cpp

  • Edit these two files: add the necessary data members and the implementation of the methods.
  • For examples of accessing different data types, see the collection of modules in the psana_examples package. The reference for all event and configuration data types is located at https://pswww.slac.stanford.edu/swdoc/releases/ana-current/psddl_psana/
  • Reference for other classes in the psana framework: Psana Reference Manual
  • Run scons to build the module library.
  • Create a psana config file if necessary.
  • Run psana, providing input data, a configuration file, etc.
  • Somebody may already have written a module which you can reuse for your analysis; check the module catalog: psana - Module Catalog

To add your own compiler or linker options to the build (such as to link to a third party library), see this section on customizing the scons build.

Running Psana

After writing and compiling the modules (or choosing standard modules), you can run the psana application with these modules. The psana application is pre-built and does not need to be recompiled. To start it, provide either a configuration file or the corresponding command-line options. Some information (e.g. user module options) cannot be specified on the command line and always requires a configuration file. Here is the list of command-line options recognized by psana:

Code Block
Usage: psana [options] [dataset ...]

  Available options:
    {-h|-?|--help    }         print help message
    {-v|--verbose    } (incr)  verbose output, multiple allowed (initial: 0)
    {-q|--quiet      } (incr)  quieter output, multiple allowed (initial: 2)
    {-b|--calib-dir  } path    calibration directory name, may include {exp} and {instr}, if left empty then do not do calibrations (default: "")
    {-c|--config     } path    configuration file, by default use psana.cfg if it exists (default: "")
    {-e|--experiment } string  experiment name, format: XPP:xpp12311 or xpp12311, by default guess it from data (default: "")
    {-j|--job-name   } string  job name, default is to generate from input file names (default: "")
    {-m|--module     } name    module name, more than one possible
    {-n|--num-events } number  maximum number of events to process, 0 means all (default: 0)
    {-s|--skip-events} number  number of events to skip (default: 0)
    {-o|--option     } string  configuration options, format: module.option[=value]

  Positional parameters:
    dataset - input dataset specification (list of file names or exp=cxi12345:run=123:...)

If both the -c and -m options are missing from the command line, psana reads the configuration file psana.cfg from the current directory. If the -c option is provided with a file name, psana reads that configuration file instead.

Modules loaded by psana can be specified in the configuration file or on the command line with the -m option. If the -m option is provided, its value overrides the module list specified in the configuration file. You can provide a comma-separated list of module names or multiple -m options on the command line; the following command lines are all equivalent:

Code Block
% psana -m ModuleA,ModuleB,ModuleC ...
% psana -m ModuleA -m ModuleB -m ModuleC ...
% psana -m ModuleA,ModuleB -m ModuleC ...
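The equivalence of these three command lines follows from a simple rule: every -m value is split on commas and the pieces are appended in order. A minimal sketch of that rule, with collectModules() as a hypothetical helper rather than psana's actual option parser:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Combine the values of repeated -m options into one ordered module list.
inline std::vector<std::string> collectModules(const std::vector<std::string>& mOptions) {
  std::vector<std::string> modules;
  for (const auto& opt : mOptions) {
    std::istringstream in(opt);
    for (std::string name; std::getline(in, name, ',');) {
      if (!name.empty()) modules.push_back(name);  // ignore empty pieces like "A,,B"
    }
  }
  return modules;
}
```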

Option -j changes the job name, which defines the name of the output histogram file. By default the job name is constructed from the name of the first input file.

Input data can also be specified in the configuration file or on the command line; command-line arguments override configuration file values. See the section Specifying input data below for a complete description of the dataset format.

Command-line options -v and -q increase or decrease the verbosity of the output generated by the messaging service. By default psana outputs messages at info and higher levels. With one -v option trace messages are printed as well, and with two or more -v options debug messages are printed too. With the -q option info messages are not printed, only warning, error, and fatal.
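The mapping from -v and -q to a printing threshold can be sketched as a single integer offset. This model is an assumption made for illustration: it reproduces the cases described above (default info, one -v adds trace, two or more -v add debug, -q hides info), but it also assumes that -v and -q simply cancel each other and that extra -q options keep raising the threshold up to fatal, which this page does not state. Level, thresholdFor() and isPrinted() are hypothetical names.

```cpp
#include <algorithm>

// Severity levels, lowest first (hypothetical stand-in).
enum class Level { debug, trace, info, warning, error, fatal };

// Lowest level that gets printed for the given counts of -v and -q.
inline Level thresholdFor(int nVerbose, int nQuiet) {
  int level = static_cast<int>(Level::info) - nVerbose + nQuiet;
  level = std::clamp(level, static_cast<int>(Level::debug), static_cast<int>(Level::fatal));
  return static_cast<Level>(level);
}

// A message is printed when its level is at or above the threshold.
inline bool isPrinted(Level message, Level threshold) { return message >= threshold; }
```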

Here are a few examples of running psana:

Code Block
% psana -m EventKeys /reg/d/psdm/...
% psana -m psana_examples.EBeamHist -j ebeam-hist-r1000 /reg/d/psdm/...
% psana -c psana_examples/data/DumpAll.cfg exp=cxi12345:run=123
% psana                  # everything will be specified in psana.cfg file

Specifying input data

Input data for psana are specified on the command line or in the configuration file using a special dataset syntax. More than one dataset can be specified, in arbitrary order; psana orders the datasets so that events from all datasets are time-ordered.
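Ordering events from several datasets by time is essentially a k-way merge of streams that are each already sorted. The sketch below shows that technique with a min-heap; SketchEvent and mergeByTime() are hypothetical, the timestamps are plain integers, and psana's actual merging of XTC streams may work differently (for example in how it treats equal timestamps, which this sketch leaves in unspecified order).

```cpp
#include <cstddef>
#include <cstdint>
#include <queue>
#include <utility>
#include <vector>

struct SketchEvent {
  std::uint64_t timestamp;  // hypothetical integer time stamp
  int source;               // index of the dataset the event came from
};

// Merge time-ordered inputs into one time-ordered sequence.
inline std::vector<SketchEvent> mergeByTime(const std::vector<std::vector<SketchEvent>>& inputs) {
  using Cursor = std::pair<std::size_t, std::size_t>;  // (input index, position)
  // "later" ordering turns std::priority_queue into a min-heap on timestamp
  auto later = [&inputs](const Cursor& a, const Cursor& b) {
    return inputs[a.first][a.second].timestamp > inputs[b.first][b.second].timestamp;
  };
  std::priority_queue<Cursor, std::vector<Cursor>, decltype(later)> heap(later);
  for (std::size_t i = 0; i < inputs.size(); ++i) {
    if (!inputs[i].empty()) heap.push({i, 0});
  }
  std::vector<SketchEvent> out;
  while (!heap.empty()) {
    const Cursor c = heap.top();
    heap.pop();
    out.push_back(inputs[c.first][c.second]);
    if (c.second + 1 < inputs[c.first].size()) heap.push({c.first, c.second + 1});
  }
  return out;
}
```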

In the simplest case a dataset is just the name of a file containing input data in either XTC or HDF5 format. File names should be given as full path names; if there is more than one stream or chunk in the XTC data, all of them must be specified.

A more advanced and recommended way is to provide input data as a special dataset string. The dataset string encodes various parameters, some of which are needed to locate data files, while others specify optional behavior such as filtering or live data reading. The general syntax of the dataset string is a list of colon-separated parameters; parameters have optional values, separated from the parameter name by an equal sign:

Code Block
param[=value][:param[=value]...]
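A hypothetical parser for this general syntax splits on colons and then on the first equal sign, so values such as dir=/reg/d/... that contain slashes stay intact (a value containing a colon would not). parseDataset() is an invented name; this sketch does not check parameter names, and a repeated parameter simply keeps its last value. The specification document mentioned below is the authority on the real rules.

```cpp
#include <map>
#include <sstream>
#include <string>

// Parse "param[=value][:param[=value]...]"; parameters without a value map to "".
inline std::map<std::string, std::string> parseDataset(const std::string& spec) {
  std::map<std::string, std::string> params;
  std::istringstream in(spec);
  for (std::string item; std::getline(in, item, ':');) {
    if (item.empty()) continue;
    const auto eq = item.find('=');
    if (eq == std::string::npos) {
      params[item] = "";  // flag-style parameter such as "live" or "h5"
    } else {
      params[item.substr(0, eq)] = item.substr(eq + 1);
    }
  }
  return params;
}
```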

These are some of the parameters supported by psana:

  • experiment name (which may optionally include the name of an instrument)

    Code Block
    exp=CXI/cxi12313
    exp=cxi12313
  • run number specification (a single run, a range of runs, a list of runs, or any combination of these)

    Code Block
    run=1
    run=10-20
    run=1,2,3,4
    run=1,20-20,31,41
  • file type; if not specified, 'xtc' is the default

    Code Block
    xtc
    h5
  • location of the files; if not specified, files are searched for in the standard location (/reg/d/psdm/...). If this parameter is given, it must be the full path name of the directory where the files are located

    Code Block
    dir=/reg/d/ffb/cxi/cxi12345/xtc
  • input stream number for XTC files; if the value is omitted, one pseudo-randomly chosen stream is used (this is useful, for example, to balance the load on the FFB storage system):

    Code Block
    one-stream=1
    one-stream
  • allow reading from live XTC files while they are still being recorded (by the DAQ or by the Data Migration service). Note that this feature is only available when running psana at PCDS; in all other cases the option is ignored:

    Code Block
    live
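Run specifications like those above expand into an explicit list of run numbers. Here is a small hypothetical expandRuns() helper, assuming comma-separated items where each item is either a single run or an inclusive first-last range; it rejects descending ranges such as 20-10, which is a choice of this sketch rather than documented psana behavior.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Expand a run specification such as "1,5,7-10" into {1, 5, 7, 8, 9, 10}.
inline std::vector<int> expandRuns(const std::string& spec) {
  std::vector<int> runs;
  std::istringstream in(spec);
  for (std::string part; std::getline(in, part, ',');) {
    const auto dash = part.find('-');
    const int first = std::stoi(part.substr(0, dash));  // whole item if no dash
    const int last = (dash == std::string::npos) ? first : std::stoi(part.substr(dash + 1));
    if (last < first) throw std::invalid_argument("descending run range: " + part);
    for (int run = first; run <= last; ++run) runs.push_back(run);
  }
  return runs;
}
```

Under these assumptions an item such as 20-20 is simply the single run 20.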

A few examples of dataset specifications:

  • To read XTC data from a specific run:

    Code Block
    exp=xpp12345:run=123
  • To read HDF5 data from several runs:

    Code Block
    exp=xpp12345:run=1,5,7-10:h5
  • To read live XTC data from a random stream in the FFB directory:

    Code Block
    exp=xpp12345:run=1123:live:one-stream:dir=/reg/d/ffb/xpp/xpp12345/xtc

The complete description of the dataset string syntax and the allowed parameters can be found in the specification document.

...

Psana Module Examples

A set of psana modules is available in the current release, as explained in Psana Module Catalog. Some of them demonstrate how data can be accessed from user module code. Other modules can be used in data analysis or event filtering. Examples of applying these modules are available in a separate document:

We continually develop algorithms for the standard set of psana modules. If you find that the algorithm you need is missing from our collection, you have two options:

...

we would be interested in hearing about it (email pcds-help@slac.stanford.edu). We are interested in implementing algorithms that are useful to our users. Of course, following this document, you can also develop a psana module that implements the algorithm yourself. One resource for sharing the module is the Users' Software Repository.