Introduction
This document describes the C++ analysis framework for LCLS (psana) and how users can make use of its features. The psana design borrows ideas from a multitude of other frameworks such as pyana, myana, the BaBar framework, etc. Its main principles are summarized here:
- support processing of both XTC and HDF5 data format
- user code should be independent of specific data format
- should be easy to use and extend for end users
- support re-use of the existing analysis code
- common simple configuration of user analysis code
This manual is accompanied by the [Psana Reference Manual] which describes interfaces of the classes available in Psana.
Framework Architecture
The central part of the framework is a regular pre-built application (psana) which can dynamically load one or more user analysis modules which are written in C++. The core application is responsible for the following tasks:
- loading and initializing all user modules
- loading one of the input modules to read data from XTC or HDF5
- calling appropriate methods of user modules based on the data being processed
- providing access to data as set of C++ classes
- providing other services such as histogramming to user modules
Other important components of the Psana architecture:
- user module – instance of the C++ class which inherits pre-defined Module class and defines few special methods which are called by framework
- event – special object which transparently stores all event data
- environment – special object which stores non-event data such as configuration objects or EPICS data
Analysis Job Life Cycle
Psana analysis job goes through cycles of state changes such as initialization, configuration, event processing, etc. calling methods of the user modules at every such change. This model follows closely the production activities in LCLS on-line system. DAQ system defines many types of transitions in its data-taking activity, most interesting are here:
- Configure – provides configuration data for complete setup
- BeginRun – start of data taking for one run
- BeginCalibCycle – start of the new scan, some configuration data may change at this point
- L1Accept – this is regular event containing event data from all detectors
- EndCalibCycle – end of single scan
- EndRun – end of data taking for one run
- Unconfigure – stop of all activity
Typically there will be more than one run taken with the same configuration, so there may be more than one BeginRun/EndRun transition for one Configure/Unconfigure, but a data file from single run should contain only one BeginRun/EndRun. Depending on a setup there could be one or more BeginCalibCycle/EndCalibCycle transitions in single run.
For each of the above transitions psana will call corresponding method in user modules notifying them of the possible change in the configuration or just providing event data. Following method names are defined in the user modules:
- beginJob() – this method is called once per analysis job when first Configure transition happens. If there is more than one Configure in single job (when processing multiple runs) this method is not called again, use beginRun() to observe configuration changes in this case. This method can access all configuration data through environment object.
- beginRun() – this method is called for every new BeginRun, so it will be called multiple times when processing multiple runs in the same job. This method can access all configuration data through environment object.
- beginCalibCycle() – this method is called for every new BeginCalibCycle, so it will be called multiple times when processing multiple runs in the same job or when single run contains multiple scans. This method can access all configuration data through environment object.
- event() – this method is called for every new L1Accept, it has access to event data through event object as well as configuration data through environment object.
- endCalibCycle() – this method is called for every new EndCalibCycle, it has access to configuration data through environment object.
- endRun() – this method is called for every new EndRun, it has access to configuration data through environment object.
- endJob() – this method is called once at the end of analysis job, it has access to configuration data through environment object.
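The mapping from transitions to callbacks can be illustrated with a small stand-alone sketch. Nothing in it is psana code: Transition, RecordingModule and Dispatcher are hypothetical names invented for this illustration, the callbacks take no Event or Env arguments, and C++11 is assumed.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the DAQ transition types listed above.
enum class Transition {
    Configure, BeginRun, BeginCalibCycle, L1Accept,
    EndCalibCycle, EndRun, Unconfigure
};

// Hypothetical module that only records which callbacks were invoked.
// Real psana callbacks also receive Event and Env arguments.
struct RecordingModule {
    std::vector<std::string> calls;
    void beginJob()        { calls.push_back("beginJob"); }
    void beginRun()        { calls.push_back("beginRun"); }
    void beginCalibCycle() { calls.push_back("beginCalibCycle"); }
    void event()           { calls.push_back("event"); }
    void endCalibCycle()   { calls.push_back("endCalibCycle"); }
    void endRun()          { calls.push_back("endRun"); }
    void endJob()          { calls.push_back("endJob"); }
};

// Maps each transition to the callback named in the list above.
class Dispatcher {
public:
    explicit Dispatcher(RecordingModule& module)
        : m_module(module), m_jobStarted(false) {}

    void handle(Transition t) {
        switch (t) {
        case Transition::Configure:
            // beginJob() fires only for the first Configure of the job.
            if (!m_jobStarted) {
                m_module.beginJob();
                m_jobStarted = true;
            }
            break;
        case Transition::BeginRun:        m_module.beginRun(); break;
        case Transition::BeginCalibCycle: m_module.beginCalibCycle(); break;
        case Transition::L1Accept:        m_module.event(); break;
        case Transition::EndCalibCycle:   m_module.endCalibCycle(); break;
        case Transition::EndRun:          m_module.endRun(); break;
        case Transition::Unconfigure:     break;  // no callback of its own
        }
    }

    // Called once when the input is exhausted.
    void finish() {
        assert(m_jobStarted && "finish() called before any Configure");
        m_module.endJob();
    }

private:
    RecordingModule& m_module;
    bool m_jobStarted;
};
```

Feeding two runs through one Dispatcher produces a single beginJob and a single endJob with two beginRun/endRun pairs in between, which is the behaviour the list above describes for multi-run jobs.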
Typically psana will iterate through all transitions/events from the input files. User modules have a limited control over this event loop, module can request to skip particular event, stop iteration early or abort job using one of the methods described below.
User Modules
User module in psana is an instance of the C++ class which inherits from the Module class (defined in file psana/Module.h) and implements several methods. These methods are already mentioned above, here is more formal description of each method:
- void beginJob(Event& evt, Env& env) – method called once at the beginning of the job. Environment object contains configuration data from the first Configure transition. Default implementation of this method does not do anything.
- void beginRun(Event& evt, Env& env) – method called at the beginning of every new run. Default implementation of this method does not do anything.
- void beginCalibCycle(Event& evt, Env& env) – method called at the beginning of every new scan. Default implementation of this method does not do anything.
- void event(Event& evt, Env& env) – method called for every regular event. Event data is accessible through the evt argument. There is no default implementation for this method and user module must provide at least this method.
- void endCalibCycle(Event& evt, Env& env) – method called at the end of every scan, can be used to process scan-level statistics collected in event(). Default implementation of this method does not do anything.
- void endRun(Event& evt, Env& env) – method called at the end of every run, can be used to process run-level statistics collected in event(). Default implementation of this method does not do anything.
- void endJob(Event& evt, Env& env) – method called once at the end of analysis job, can be used to process job-level statistics collected in event(). Default implementation of this method does not do anything.
In addition to event() method every module class must provide a constructor which takes string argument giving the name of the module. Additionally it has to provide a special factory function used to instantiate the modules from the shared libraries, there is a special macro for defining this factory function.
Here is the minimal example of the module class declaration with only the event() method implemented and many non-essential details skipped:
#include "psana/Module.h"

namespace Package {

class ExampleModule: public Module {
public:
  // Constructor takes module name as a parameter
  ExampleModule(const std::string& name);

  // Implementation of event() from base class
  virtual void event(Event& evt, Env& env);
};

} // namespace Package
Definition of the factory function and methods:
#include "Package/ExampleModule.h"
#include "MsgLogger/MsgLogger.h"
#include "PSEvt/EventId.h"

// define factory function
using namespace Package;
PSANA_MODULE_FACTORY(ExampleModule)

// Constructor
ExampleModule::ExampleModule(const std::string& name)
  : Module(name)
{
}

void
ExampleModule::event(Event& evt, Env& env)
{
  // get event ID
  shared_ptr<EventId> eventId = evt.get();
  if (not eventId.get()) {
    MsgLog(name(), info, "event ID not found");
  } else {
    MsgLog(name(), info, "event ID: " << *eventId);
  }
}
This simple example already does something useful, it retrieves and prints event ID (copied from standard PrintEventId module). Actual modules will do more complex things but this is a simple example of obtaining something from event data.
The easiest way to write new user modules is to use codegen script to generate class from predefined template. This command will create new module ExampleModule in package TestPackage and will copy generated files to the directories in TestPackage:
codegen -l psana-module TestPackage ExampleModule
Data Access in User Modules
As already mentioned above all event data is accessible to user module via Event object, and all non-event data is accessible through Env object. Previous example shows simple use case of extracting data from the event. This section gives more detailed description of the Event and Env types and their methods.
When extracting data from event or environment it is necessary to specify at least the type of the data (EventId in the above example). If there are multiple objects of the same type in the event then additional identifying information must be provided – source address and/or additional string key.
Data Source Address
Many pieces of data in the event originate from devices or processes which are parts of the LCLS DAQ. Devices in DAQ system are identified by their addresses, which are special C++ data types. There are three types of addresses defined by DAQ:
- DetInfo (class name Pds::DetInfo) – this is the most frequently used type and it defines all regular devices used in DAQ such as cameras, Acqiris, etc. Complete address specification includes 4 items:
  - Detector type, one of the Pds::DetInfo::Detector enum values.
  - Detector ID, a number, in case there is more than one detector of the same type in a system they will have different IDs.
  - Device type, one of the Pds::DetInfo::Device enum values.
  - Device ID, a number, in case there is more than one device of the same type in a system they will have different IDs.
- BldInfo (class name Pds::BldInfo) – this address type is used for Beam Line Data sources, particular source is identified by the Pds::BldInfo::Type enum value.
- ProcInfo (class name Pds::ProcInfo) – this address type is used rarely, and only for information produced by applications constituting DAQ. Sources of this type are identified by IP address of the host where application is running.
(If you look at the C++ code you'll notice that all above classes also include process ID, but it is not used by psana and can be set to 0 if needed.)
User modules should not need to use above C++ classes directly, instead psana provides facility that simplifies specification of the addresses and does not require exact addresses to be known. Class which provides support for these features is called Source (full name is PSEvt::Source). It can be constructed from one of the three above classes, but the most interesting use case is the constructor which accepts string specification of an address. The string specification accepts the following formats:
- "DetInfo(Detector.DetID:Device.DevID)" – corresponds to DetInfo address type. Detector is the detector name (one of the names of the constants in Pds::DetInfo::Detector enum). DetID is a detector ID number. Device is the device name (one of the names of the constants in Pds::DetInfo::Device enum). DevID is a device ID number. Any or all parts of the specification may be missing. If detector ID or device ID is missing then separating dot is optional. If both device and device ID are missing the separating colon is optional. Missing parts could also be replaced with wildcard '*' symbol.
- "Detector.DetID:Device.DevID" – same as the above specification, DetInfo and parentheses can be omitted.
- "Detector-DetID|Device-DevID" – same as above, this format is supported for compatibility with pyana but is deprecated.
- "BldInfo(BldType)" – corresponds to BldInfo address type. BldType is one of the names of the constants in Pds::BldInfo::Type enum (currently defined types are EBeam, PhaseCavity, FEEGasDetEnergy, Nh2Sb1Ipm01). BldType can be omitted.
- "BldType" – same as above, but you cannot omit BldType here.
- "ProcInfo(ipAddr)" – corresponds to ProcInfo address type. ipAddr is an IPv4 address in decimal dot notation (123.123.123.123). ipAddr can be omitted.
If the specification includes all pieces then specification is exact and can only match a single data source. If there are missing parts in specification then specification is a match. When requesting data from event with match specification there may be more than one source of data matching it. In this case the first matching source (in unspecified order) will be used. Inexact specification can simplify data access when exact addresses are not known in advance, but one has to be careful if there are multiple devices matching the same address.
Here are a few examples of the exact address specifications:
- "DetInfo(AmoITof.0:Acqiris.0)"
- "AmoITof.0:Acqiris.0" – same as above
- "DetInfo(SxrEndstation.0:Opal1000.0)"
- "BldInfo(FEEGasDetEnergy)"
- "FEEGasDetEnergy" – same as above
- "ProcInfo(0.0.0.0)"
Here are the examples of the address matches:
- "DetInfo(AmoITof.*:Acqiris.*)"
- "DetInfo(AmoITof:Acqiris)" – same as above
- "AmoITof:Acqiris" – same as above
- "DetInfo(AmoITof:*)"
- "DetInfo(AmoITof)" – same as above
- "AmoITof" – same as above
- "DetInfo(*:Acqiris)"
- "DetInfo(:Acqiris)" – same as above
- "*:Acqiris" – same as above
- "DetInfo(*.*:*.*)"
- "DetInfo()" – same as above
- "BldInfo()"
- "" – will match any address type
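The difference between exact specifications and matches can be made concrete with a stand-alone sketch of the DetInfo string forms. This is not the PSEvt::Source parser: DetSpec, parseDetSpec and matches are hypothetical names, only the "DetInfo(Detector.DetID:Device.DevID)" form and its short form are handled (no BldInfo, ProcInfo or the deprecated pyana format), and enum names are not validated.

```cpp
#include <cassert>
#include <string>

// Hypothetical parsed form of a DetInfo specification. An empty field
// means "any value" (the part was missing or given as '*').
struct DetSpec {
    std::string detector, detId, device, devId;
    bool isExact() const {
        return !detector.empty() && !detId.empty() &&
               !device.empty() && !devId.empty();
    }
};

// Split "Name.ID" into its two parts; missing parts and '*' become empty.
static void splitNameId(const std::string& part,
                        std::string& name, std::string& id) {
    const std::string::size_type dot = part.find('.');
    name = part.substr(0, dot);
    id = (dot == std::string::npos) ? std::string() : part.substr(dot + 1);
    if (name == "*") name.clear();
    if (id == "*") id.clear();
}

DetSpec parseDetSpec(std::string spec) {
    // Strip the optional "DetInfo(" ... ")" wrapper.
    const std::string prefix = "DetInfo(";
    if (spec.size() > prefix.size() &&
        spec.compare(0, prefix.size(), prefix) == 0 &&
        spec[spec.size() - 1] == ')') {
        spec = spec.substr(prefix.size(), spec.size() - prefix.size() - 1);
    }
    DetSpec result;
    const std::string::size_type colon = spec.find(':');
    splitNameId(spec.substr(0, colon), result.detector, result.detId);
    if (colon != std::string::npos) {
        splitNameId(spec.substr(colon + 1), result.device, result.devId);
    }
    return result;
}

static bool partMatches(const std::string& want, const std::string& have) {
    return want.empty() || want == have;
}

// True if the fully specified address satisfies the (possibly partial) spec.
bool matches(const DetSpec& spec, const DetSpec& address) {
    assert(address.isExact());
    return partMatches(spec.detector, address.detector) &&
           partMatches(spec.detId, address.detId) &&
           partMatches(spec.device, address.device) &&
           partMatches(spec.devId, address.devId);
}
```

With this sketch "AmoITof:Acqiris" and "DetInfo(:Acqiris)" both match the exact address "AmoITof.0:Acqiris.0", while "DetInfo(SxrEndstation)" does not.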
String Key
Additional key that may be provided when storing or retrieving the data from event is used to distinguish between data objects of the same type and address. As an example the raw data that come from XTC file are stored with the default empty key. User module can apply some algorithm to the data and store new version of the same data using non-default key (such as "fixed" or "calibrated").
Event Data
Event data are accessible through the Event object which is the parameter to event() method of the user module. To access the data one needs to use overloaded get() method which can take different number of arguments. There are three different method signatures:
- get(const std::string& key="", Pds::Src* foundSrc=0) – this method does not accept data source address argument. It will try to find the data object which was stored without address (such as EventId data which has no corresponding device), otherwise it will return data with any source address.
- get(const Pds::Src& source, const std::string& key="", Pds::Src* foundSrc=0) – this method takes an exact data source address in the form of Pds::Src class. This method is occasionally useful and its use is explained below.
- get(const Source& source, const std::string& key="", Pds::Src* foundSrc=0) – this method takes a data source address in the form of Source class which is explained above.
All three above methods take an optional string key which is empty by default. Additionally one can provide a pointer to Pds::Src object as the last argument and the pointed object will be filled with the exact source address of the found object.
All three methods return a special object type that is convertible to a pointer to a specific data type. Thanks to this intermediate special object type the user does not need to provide data type as an argument to get() method which simplifies user code. In fact all important work is done during the conversion of this intermediate object to final pointer, and if this conversion does not happen then get() method does not actually do anything. This implies that the code:
Pds::Src src;
evt.get("AmoITof:Acqiris", "", &src);
does not do anything at all and does not update src object. To make it useful one needs to assign the result of get() to a smart pointer:
Pds::Src src;
shared_ptr<Psana::Acqiris::DataDescV1> acqData = evt.get("AmoITof:Acqiris", "", &src);
The result of the conversion is a special smart pointer class (boost::shared_ptr) which controls the lifetime of the pointed object. The control is actually shared between Event object and user code, the pointed object will not be destroyed as long as there is at least one smart pointer for this object. User code can store shared pointer and use the object later, even across multiple events if necessary.
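The deferred lookup described above is an instance of the return-type conversion idiom. The following is a minimal stand-alone sketch of that idiom, not psana's implementation: MiniEvent and its nested Proxy are hypothetical names, the store is keyed by data type only (no source address or string key), and std::shared_ptr from C++11 stands in for boost::shared_ptr.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <typeindex>
#include <typeinfo>

// Hypothetical, heavily simplified event store keyed by data type only.
class MiniEvent {
public:
    // Intermediate object returned by get(). The lookup runs only when the
    // proxy is converted to a shared_ptr<T>, which is also where T is deduced.
    class Proxy {
    public:
        explicit Proxy(const MiniEvent& evt) : m_evt(evt) {}

        template <typename T>
        operator std::shared_ptr<T>() const {
            ++m_evt.m_lookups;
            auto it = m_evt.m_data.find(std::type_index(typeid(T)));
            if (it == m_evt.m_data.end()) return std::shared_ptr<T>();
            return std::static_pointer_cast<T>(it->second);
        }

    private:
        const MiniEvent& m_evt;
    };

    template <typename T>
    void put(const std::shared_ptr<T>& obj) {
        assert(obj && "refusing to store a null pointer");
        m_data[std::type_index(typeid(T))] = obj;
    }

    // Does no work by itself; see Proxy above.
    Proxy get() const { return Proxy(*this); }

    // Number of lookups actually performed, for demonstration only.
    int lookups() const { return m_lookups; }

private:
    std::map<std::type_index, std::shared_ptr<void> > m_data;
    mutable int m_lookups = 0;
};
```

Calling get() and discarding the result leaves lookups() at zero. Assigning the result to a std::shared_ptr<int> triggers exactly one lookup and yields a pointer that shares ownership with the store, so the object stays alive while either side still holds it.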
Configuration Data
Access to configuration data happens similarly to event data, except that configuration objects are stored inside environment object. Special configuration storage inside environment can be accessed with the env.configStore() method. Configuration storage object has only two overloaded get() methods:
- get(const Pds::Src& source) – this method takes an exact data source address in the form of Pds::Src class.
- get(const PSEvt::Source& source, Pds::Src* foundSrc=0) – this method takes a data source address in the form of Source class which can be exact address or match. Optional second argument can point to an object which will be updated with the exact address of the data if data object is found.
Here is an example of accessing configuration data:
shared_ptr<Psana::Acqiris::ConfigV1> acqConfig = env.configStore().get("AmoITof:Acqiris");
Matching Event Data to Configuration
In many cases user code is written to use source address match (or approximate address like "AmoITof.*:Acqiris.*"). When it is necessary to find matching configuration object for an event object the approximate addresses (matches) cannot be used because approximate source match can find different devices in event and configuration store. In this case one has to use exact source address (Pds::Src or fully-specified Source) for either both or one of the objects. If exact source address is used for both types of objects one has to pass this address to both get() methods:
// ==== ExampleModule.h ====
class ExampleModule: public Module {
public:
  .....
private:
  Source m_src;
};

// ==== ExampleModule.cpp ====
ExampleModule::ExampleModule(const std::string& name)
  : Module(name)
  , m_src("AmoITof.0:Acqiris.0")  // fully-specified source
{
}

// signature as given in the User Modules section
void ExampleModule::beginJob(Event& evt, Env& env)
{
  shared_ptr<Psana::Acqiris::ConfigV1> acqConfig = env.configStore().get(m_src);
  ......
}

void ExampleModule::event(Event& evt, Env& env)
{
  shared_ptr<Psana::Acqiris::DataDescV1> acqData = evt.get(m_src);
  ......
}
If exact source address is not known then one can still use matches for one get() but obtain exact address from the first get() and use it for the second one. There are two possible options here, first is to get exact address from configuration store:
// ==== ExampleModule.h ====
class ExampleModule: public Module {
public:
  .....
private:
  Source m_srcMatch;
  Pds::Src m_src;
};

// ==== ExampleModule.cpp ====
ExampleModule::ExampleModule(const std::string& name)
  : Module(name)
  , m_srcMatch("AmoITof.*:Acqiris.*")  // matching address
  , m_src()
{
}

// signature as given in the User Modules section
void ExampleModule::beginJob(Event& evt, Env& env)
{
  // use match but obtain exact address
  shared_ptr<Psana::Acqiris::ConfigV1> acqConfig = env.configStore().get(m_srcMatch, &m_src);
  ......
}

void ExampleModule::event(Event& evt, Env& env)
{
  // use exact address here
  shared_ptr<Psana::Acqiris::DataDescV1> acqData = evt.get(m_src);
  ......
}
Second option is to use match for getting event object and obtain exact address at the same time, then use exact address to get configuration object:
// ==== ExampleModule.h ====
class ExampleModule: public Module {
public:
  .....
private:
  Source m_srcMatch;
};

// ==== ExampleModule.cpp ====
ExampleModule::ExampleModule(const std::string& name)
  : Module(name)
  , m_srcMatch("AmoITof.*:Acqiris.*")  // matching address
{
}

void ExampleModule::event(Event& evt, Env& env)
{
  // use match but obtain exact address
  Pds::Src src;
  shared_ptr<Psana::Acqiris::DataDescV1> acqData = evt.get(m_srcMatch, "", &src);
  if (acqData.get()) {
    // use exact address here
    shared_ptr<Psana::Acqiris::ConfigV1> acqConfig = env.configStore().get(src);
  }
  ......
}
Latter code is less efficient because it searches for configuration object on every event which can be avoided if one uses first option.
Accessing EPICS Data
Access to EPICS data is provided through one more special object in the environment. This object can be accessed through a call to the env.epicsStore() method which returns reference to the object of PSEnv::EpicsStore class.
It is possible to obtain the full list of PV names using corresponding methods of the EpicsStore object:
const std::vector<std::string>& pvNames = env.epicsStore().pvNames();
To obtain current value of particular PV the value() method can be used, for example:
double value = env.epicsStore().value("BEAM:LCLS:ELEC:Q");
or for the array EPICS data:
double value = env.epicsStore().value("BEAM:LCLS:ELEC:Q", index);
The result returned from value() method can be converted to any numeric type or std::string. The method will throw an exception (which will terminate application if not handled) if the PV name does not exist or if conversion fails.
Status information for particular PV can be obtained with status() method:
int status, severity;
PSTime::Time time;
env.epicsStore().status("BEAM:LCLS:ELEC:Q", status, severity, time);
which returns standard EPICS codes for status and severity plus time of the most recent change of the PV status or value. Time will be set to 0 (UNIX epoch time) when its value is unknown, typically at the beginning of job and possibly for the first few events.
Updating Event Data
User modules can not only read data from event object but also add more data to it. This can be used to exchange information between modules when one module produces some data and other modules use it to calculate their results.
To add data to event one can use Event::put method which accepts smart pointer to the data object and optional source address and string key. There are two overloaded methods in this case:
- void put(const shared_ptr<T>& data, const std::string& key=std::string()) – adds object to the event without source address, can be used for generic non-device-specific data such as EventId.
- void put(const shared_ptr<T>& data, const Pds::Src& source, const std::string& key=std::string()) – adds object and specifies its source address, should be used for detector/device-specific data.
Both methods take optional string key which should be used to distinguish different "versions" of the same data such as data after calibration.
Here is an example code which adds one new object:
shared_ptr<Image> img(new Image(...));
evt.put(img, src, "filtered");
Controlling Framework from User Module
Code in user modules can control framework event loop by calling one of the three methods:
- void skip() – signal framework to skip current event and do not call other downstream modules. Note that this method does not skip code in the current module, control is returned back to the module. If you want to stop processing after this call then add a return statement.
- void stop() – signal framework to stop event loop and finish job gracefully (with calling endRun/endJob/etc.). Note that this method does not terminate processing in the current module. If you want to stop processing after this call then add a return statement.
- void terminate() – signal framework to terminate immediately. Note that this method does not terminate processing in the current module. If you want to stop processing after this call then add a return statement.
Here is an example of the code using above functions:
void ExampleModule::event(Event& evt, Env& env)
{
  ...

  if (pixelsAboveThreshold < 1000) {
    // This event is not worth looking at, skip it
    skip();
    // I do not want to continue with this algorithm either
    return;
  }

  if (nGoodEvents > 1000) {
    // we collected enough data, can stop now and go to endJob()
    stop();
    // I do not want to continue with this algorithm either
    return;
  }

  if (temperatureKelvin < 0) {
    // data is junk, stop right here and don't call endJob()
    terminate();
    // I do not want to continue with this algorithm either
    return;
  }
}
Skipped events can be used in further analysis or saved in the "filtered" Xtc file, as explained in [PackagePSXtcOutput].
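How such requests might affect the event loop can be sketched in a few lines. This is only a model of the behaviour described in this section, not framework code: Request, ModuleFn and runLoop are hypothetical names, modules return their request instead of calling skip(), stop() or terminate(), and the sketch assumes that stop and terminate also prevent downstream modules from seeing the current event, which the text above does not state.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical request a module can make for the current event.
enum class Request { Continue, Skip, Stop, Terminate };

// Hypothetical module: receives the event number, returns its request.
typedef std::function<Request(int)> ModuleFn;

struct LoopResult {
    int eventsSeen;     // events handed to at least the first module
    bool endJobCalled;  // false only after a Terminate request
};

LoopResult runLoop(const std::vector<ModuleFn>& modules, int nEvents) {
    assert(nEvents >= 0);
    LoopResult result = {0, true};
    for (int i = 0; i < nEvents; ++i) {
        ++result.eventsSeen;
        Request request = Request::Continue;
        for (const ModuleFn& module : modules) {
            request = module(i);
            // Any request other than Continue keeps downstream modules
            // from being called for this event.
            if (request != Request::Continue) break;
        }
        if (request == Request::Terminate) {
            result.endJobCalled = false;  // immediate exit, no endJob()
            break;
        }
        if (request == Request::Stop) break;  // graceful end of the loop
        // Request::Skip only affects the current event.
    }
    return result;
}
```

With a first module that skips even-numbered events and a second one that requests a stop at event 5, the second module is called for events 1, 3 and 5 only, and the loop ends gracefully after six events.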
Job and Module Configuration
Psana framework has multiple configuration parameters that can be changed via command line or special configuration file. Configuration file can also specify parameters for user modules so that modules' behavior can be changed at run time without the need to recompile the code.
If no options are specified on the command line then psana tries to read configuration file named psana.cfg from the current directory if that file exists. The location of the configuration file can be changed with the -c <path> option which should provide path of the configuration file.
Configuration File Format
Configuration file has a simple format which is similar to well-known INI file format. The file consists of the sections, each section begins with the section header in the form:
[<section-name>]
Section names can be arbitrary strings, but in psana case section names are the names of the modules which cannot be arbitrary and should not contain spaces.
Following the section header there may be zero or more parameter lines in the form
<param-name> = <param-value>
Parameter name is anything between beginning of line and '=' character with leading and trailing spaces and tabs stripped. Parameter value is anything after '=' character with leading and trailing spaces and tabs stripped, parameter value can be empty. Long parameter value can be split over multiple lines if the line ends with the backslash character, e.g.:
files = /reg/d/psdm/AMO/amo00000/xtc/e00-r0000-s00-c00.xtc \
        /reg/d/psdm/AMO/amo00000/xtc/e00-r0000-s01-c00.xtc \
        /reg/d/psdm/AMO/amo00000/xtc/e00-r0000-s02-c00.xtc
Lines starting with '#' character are considered comments and ignored.
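The format rules above are simple enough to sketch as a small parser. This is an illustration of the rules as stated in this section, not the parser used by psana: ConfigMap, trim and parseConfig are hypothetical names, continuation lines are assumed to be joined with a single space, and malformed lines are silently ignored.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// section name -> (parameter name -> parameter value)
typedef std::map<std::string, std::map<std::string, std::string> > ConfigMap;

// Strip leading and trailing spaces and tabs.
static std::string trim(const std::string& s) {
    const std::string::size_type b = s.find_first_not_of(" \t");
    if (b == std::string::npos) return std::string();
    const std::string::size_type e = s.find_last_not_of(" \t");
    return s.substr(b, e - b + 1);
}

ConfigMap parseConfig(const std::string& text) {
    ConfigMap cfg;
    std::istringstream in(text);
    std::string line, section, pending;
    while (std::getline(in, line)) {
        line = trim(line);
        // Blank lines and comments, unless we are inside a continuation.
        if (pending.empty() && (line.empty() || line[0] == '#')) continue;
        // A trailing backslash continues the value on the next line.
        if (!line.empty() && line[line.size() - 1] == '\\') {
            pending += trim(line.substr(0, line.size() - 1)) + " ";
            continue;
        }
        line = trim(pending + line);
        pending.clear();
        if (line.empty()) continue;
        if (line[0] == '[' && line[line.size() - 1] == ']') {
            section = line.substr(1, line.size() - 2);
            continue;
        }
        const std::string::size_type eq = line.find('=');
        if (eq == std::string::npos) continue;  // malformed line, ignored here
        cfg[section][trim(line.substr(0, eq))] = trim(line.substr(eq + 1));
    }
    return cfg;
}

// Usage example: a two-line "files" value and an empty parameter value.
void exampleParseConfig() {
    ConfigMap cfg = parseConfig(
        "# comment\n"
        "[psana]\n"
        "files = a.xtc \\\n"
        "        b.xtc\n"
        "modules =\n");
    assert(cfg["psana"]["files"] == "a.xtc b.xtc");
    assert(cfg["psana"]["modules"].empty());
}
```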
Parameter Types
Configuration file does not specify parameter types, all values in the file are strings. Psana framework provides conversion of these strings to several basic C++ types or sequences. Following types and conversion rules are supported by framework:
- bool – value strings "yes", "true", "on" become true, "no", "false", "off" become false. Strings which represent non-zero numbers become true, string "0" becomes false.
- char – value string must be single-character string and it will be assigned to a result.
- C++ numeric types – option value must represent valid number.
- std::string – option value will be assigned to result string without change.
- C++ sequence types (e.g. std::list<T>) – option value will be split into single words at space/tab characters, individual words will be converted to resulting type T.
When the conversion fails because of the incorrectly formatted input, the framework will throw an exception of type ExceptionCvtFail.
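The bool and sequence rules can be sketched as two small helper functions. These are illustrations only: toBool and toWords are hypothetical names, std::invalid_argument stands in for the framework's ExceptionCvtFail, and the keyword comparison is assumed to be case-sensitive, which this manual does not specify.

```cpp
#include <cassert>
#include <cstdlib>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Convert a parameter string to bool following the rules listed above.
bool toBool(const std::string& value) {
    if (value == "yes" || value == "true" || value == "on") return true;
    if (value == "no" || value == "false" || value == "off") return false;
    // Otherwise the whole string must be a number; non-zero means true.
    char* end = 0;
    const double number = std::strtod(value.c_str(), &end);
    if (value.empty() || *end != '\0') {
        // Stand-in for the framework's ExceptionCvtFail.
        throw std::invalid_argument("cannot convert to bool: " + value);
    }
    return number != 0.0;
}

// Split a parameter string into words at whitespace, as done for sequences.
std::vector<std::string> toWords(const std::string& value) {
    std::istringstream in(value);
    std::vector<std::string> words;
    std::string word;
    while (in >> word) words.push_back(word);
    return words;
}

// Usage example.
void exampleConversions() {
    assert(toBool("on"));
    assert(!toBool("0"));
    assert(toWords("PrintSeparator PrintEventId").size() == 2);
}
```

Each word produced by toWords would then be converted to the element type T with the scalar rules, so a sequence of bool could be built by applying toBool to every word.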
Psana Parameters
The parameters that are needed for the framework are defined in [psana] section. Here is the list of parameters which can appear in that section:
- modules – list of module names to include in the analysis job. Each module name is built of a package name and class name separated by dot (e.g. TestPackage.ExampleModule) optionally followed by colon and modifier. Modifier is not needed if there is only one instance of the module in the job. If there is more than one instance then modules need to include unique modifier to distinguish instances. If the module comes from psana package then package name can be omitted. Module names can also be specified on the command line with -m option, for multiple modules use multiple -m options or comma-separated names in single -m option.
- files – list of file names to process. File names can also be specified on the command line which will override anything specified in configuration file.
- events – maximum number of events to process in a job, can also be given on the command line with -n or --num-events option.
- skip-events – number of events to skip before starting event processing, can also be given on the command line with -s or --skip-events option.
- instrument – instrument name.
- experiment – experiment name. Instrument and experiment names can be specified on the command line with -e or --experiment option, option value has format XPP:xpp12311 or xpp12311. By default instrument and experiment names are determined from input file names, you can use these options to override defaults (or when your file has non-standard naming).
- calib-dir – path to the calibration directory, can also be given on the command line with -b or --calib-dir option. Path can include {instr} and {exp} strings which will be replaced with instrument and experiment names respectively. Default value for path is /reg/d/psdm/{instr}/{exp}/calib.
Here is an example of the framework configuration section:
[psana]
# list of file names
files = /reg/d/psdm/AMO/amo00000/xtc/e00-r0000-s00-c00.xtc \
        /reg/d/psdm/AMO/amo00000/xtc/e00-r0000-s01-c00.xtc \
        /reg/d/psdm/AMO/amo00000/xtc/e00-r0000-s02-c00.xtc
# list of modules, PrintSeparator and PrintEventId are from psana package
# and do not need package name
modules = PrintSeparator PrintEventId psana_examples.DumpAcqiris
User Modules Parameters
Parameters for user modules appear in the separate sections named after the modules. For example the module with name "TestPackage.ExampleModule" will read its parameters from the section [TestPackage.ExampleModule]. If the module name includes modifier after colon then it will try to find parameter value in the corresponding section first and if it does not exist there it will try to read parameter from the section which does not have modifier. In this way the modules can share common parameters. For example the module "TestPackage.ExampleModule:test" will try to read a parameter from [TestPackage.ExampleModule:test] section first and [TestPackage.ExampleModule] section after that.
Here is an example of configuration for some fictional analysis job:
[psana]
modules = TestPackage.Analysis:mode1 TestPackage.Analysis:mode2

[TestPackage.Analysis]
# these are common parameters for all TestPackage.Analysis modules,
# but instances can override them in their own sections
calib-mode = fancy
subpixel = off
threshold = 0.001

[TestPackage.Analysis:mode1]
# parameters specific to :mode1 module
range-min = 0
range-max = 1000000

[TestPackage.Analysis:mode2]
# parameters specific to :mode2 module
range-min = 1000
range-max = 10000
subpixel = on
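The two-step lookup for modules with a modifier can be sketched as follows. This is an illustration of the rule stated in this section rather than the framework's code: ConfigMap and lookupParam are hypothetical names, and std::out_of_range stands in for whatever exception the framework throws for a missing parameter.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// section name -> (parameter name -> parameter value)
typedef std::map<std::string, std::map<std::string, std::string> > ConfigMap;

// Look up a parameter for a module, trying the section with the modifier
// first and then the section named after the module without modifier.
std::string lookupParam(const ConfigMap& cfg,
                        const std::string& module,
                        const std::string& param) {
    const std::string sections[2] = {
        module,                               // e.g. "Pkg.Class:mode2"
        module.substr(0, module.find(':'))    // e.g. "Pkg.Class"
    };
    for (int i = 0; i < 2; ++i) {
        const ConfigMap::const_iterator s = cfg.find(sections[i]);
        if (s == cfg.end()) continue;
        const std::map<std::string, std::string>::const_iterator p =
            s->second.find(param);
        if (p != s->second.end()) return p->second;
    }
    throw std::out_of_range("parameter not found: " + param);
}

// Usage example based on the configuration shown above.
void exampleLookup() {
    ConfigMap cfg;
    cfg["TestPackage.Analysis"]["subpixel"] = "off";
    cfg["TestPackage.Analysis:mode2"]["subpixel"] = "on";
    assert(lookupParam(cfg, "TestPackage.Analysis:mode1", "subpixel") == "off");
    assert(lookupParam(cfg, "TestPackage.Analysis:mode2", "subpixel") == "on");
}
```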
Accessing Configuration Parameters
User module base class defines few convenience methods which simplify access to configuration parameters. Here is the list of the methods:
- std::string configStr(const std::string& param) – this method takes the name of the parameter and returns full parameter value as a string. If parameter cannot be found the exception will be thrown.
- T config(const std::string& param) – this method takes the name of the parameter and returns parameter value converted to type T. If parameter cannot be found the exception will be thrown.
- std::string configStr(const std::string& param, const std::string& def) – this method takes the name of the parameter and returns full parameter value as a string. If parameter cannot be found then the value of second argument will be returned.
- T config(const std::string& param, T def) – this method takes the name of the parameter and returns parameter value converted to type T. If parameter cannot be found then the value of second argument will be returned.
- Seq configList(const std::string& param) – this method takes the name of the parameter and returns parameter value converted to sequence. Sequence can be any of standard container types such as std::list<std::string> or std::vector<double>. If parameter cannot be found the exception will be thrown.
- std::list<T> configList(const std::string& param, const std::list<T>& def) – this method takes the name of the parameter and returns parameter value converted to std::list<T>. If parameter cannot be found then the value of second argument will be returned.
Here is an example of the code in user module which uses these methods:
Source src = configStr("source", "DetInfo(:Evr)");
int repeat = config("repeat");
std::list<std::string> options = configList("options");
Messaging Service
In many cases the user modules want to produce/print messages such as errors, warnings, or debugging information. In most cases C++ code uses standard C++ facilities such as std::cout, std::cerr, or even printf to format/print something to the terminal or log file. Psana framework provides different approach for messaging which provides better control for the output level (e.g. turning on/off debugging) and better flexibility.
Each message produced by the messaging service carries a level. The service defines several levels, listed here from lowest to highest:
- debug – lowest message level, reserved for debugging messages, normally turned off during normal running
- trace – one level higher than debug, normally turned off during normal running
- info – level for regular informational messages, normally printed but can be turned off
- warning – level for warnings which are not errors
- error – level for error messages
- fatal – level for fatal errors, after the message is published the program will terminate
The levels are ordered: enabling messages of one level also enables messages of all higher levels.
Each logging message is associated with one logger. Loggers have names which form a hierarchical structure such as "GrandParent.Parent.Child". The top-level logger has no name and is called the root logger. Loggers were introduced for flexibility: it is possible to configure individual loggers, for example to enable debug logging from one particular logger. It is good practice to use a logger name which is the same as the user module name for identification purposes.
To use the messaging service one has to include the header file "MsgLogger/MsgLogger.h", which defines a set of macros for message logging and all related classes. User code interacts with the messaging service through this set of macros:
- MsgLog(logger, level, message)
sends a message to a specific logger; takes the logger name, the logging level, and the message. The message is any construct which can appear after the stream insertion operator (e.g. cout << message).
- MsgLogRoot(level, message)
same as above but the message is sent to the root logger.
Here are a few examples of using these macros:
    MsgLog("MyModule", info, "reading pedestals from file " << fileName);
    MsgLog("MyModule", debug, "intermediate result: count=" << count << " sum=" << sum);
    MsgLogRoot(warning, "warp engine overheating");
Note: in a user module replace the "MyModule" string with a call to name(), which returns the name of the user module.
The above macros are simple to use in most cases as they hide all details from the user. For more complex situations (such as printing array elements) there are two macros which provide access to the underlying stream object, which can be used in more interesting ways:
- WithMsgLog(logger, level, stream) { ... }
this macro declares a stream object which can be used by the code in the compound statement which follows the macro. The lifetime of the stream is the code block; after the code block is executed the message is published and the stream disappears.
- WithMsgLogRoot(level, stream) { ... }
variation of the above macro which publishes the message to the root logger.
Here is an example of their use:
    WithMsgLog("MyModule", debug, str) {
      str << "array elements:";
      for (int i = 0; i < size; ++ i) {
        str << " " << array[i];
      }
    }
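For readers curious how a macro can introduce a stream whose lifetime is the following code block, here is one possible construction based on a single-iteration for loop. This is only a sketch under stated assumptions: ScopedMessage, WITH_MESSAGE and formatArray are hypothetical names, and nothing here is taken from the actual MsgLogger implementation.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: collects text and publishes it once to a target stream.
class ScopedMessage : public std::ostringstream {
public:
    ScopedMessage(bool enabled, std::ostream& out)
        : m_pending(enabled), m_out(out) {}

    // True until the message has been published (or if it was never enabled: false).
    bool pending() const { return m_pending; }

    // Writes the collected text and ends the single loop iteration.
    void publish() {
        m_out << str() << '\n';
        m_pending = false;
    }

private:
    bool m_pending;
    std::ostream& m_out;
};

// The for loop body runs once when enabled and not at all when disabled.
#define WITH_MESSAGE(enabled, out, stream) \
    for (ScopedMessage stream((enabled), (out)); stream.pending(); stream.publish())

// Usage: formats an array into one message, mirroring the example above.
inline std::string formatArray(const int* array, int size, bool enabled) {
    assert(size >= 0);
    std::ostringstream sink;   // stands in for the terminal or log file
    WITH_MESSAGE(enabled, sink, str) {
        str << "array elements:";
        for (int i = 0; i < size; ++i) {
            str << " " << array[i];
        }
    }
    return sink.str();
}
```

With this construction the block is skipped entirely when the message is disabled, which is the same behavior the note on side effects warns about.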
When the messaging service delivers (prints) a message, it adds information about the message to the message text itself. In psana it prints the level name and the logger name; for trace messages it also prints a timestamp; for debug and error messages it prints a timestamp and the location (file name and line number) where the message originated.
By default psana enables messages of the info level (and higher). To enable lower level messages one can provide the -v option to psana: one -v enables trace messages, two -v options enable debug messages. To disable info and warning messages one can provide one or two -q options. Error and fatal messages cannot be disabled.
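The effect of the -v and -q options on the enabled levels can be summarized in a few lines of code. The sketch below is not psana code: the names Level, threshold and enabled are hypothetical, it reads the text as "one -q disables info, two -q also disable warning", and it assumes that -v and -q options offset each other, which the text does not state.

```cpp
#include <cassert>

// Levels in increasing order, mirroring the list in this section.
enum Level { Debug, Trace, Info, Warning, Error, Fatal };

// Lowest level that is still printed after nVerbose -v and nQuiet -q options.
inline Level threshold(int nVerbose, int nQuiet) {
    assert(nVerbose >= 0 && nQuiet >= 0);
    int value = static_cast<int>(Info) - nVerbose + nQuiet;  // default is Info
    if (value < Debug) value = Debug;   // nothing exists below debug
    if (value > Error) value = Error;   // error and fatal cannot be disabled
    return static_cast<Level>(value);
}

// A message is printed when its level is at or above the threshold.
inline bool enabled(Level message, Level thresholdLevel) {
    return message >= thresholdLevel;
}

// Usage: "-v -v" enables debug messages, "-q" hides info messages.
inline void thresholdExample() {
    assert(enabled(Debug, threshold(2, 0)));
    assert(!enabled(Info, threshold(0, 1)));
    assert(enabled(Error, threshold(0, 5)));
}
```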
Note: when a message level is disabled the code in the corresponding macros is not executed at all. Do not put any expressions with side effects into the message or the code blocks; these are strictly for messaging, not part of your algorithm.
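The following stand-alone sketch shows why side effects inside a message are dangerous. SKETCH_LOG_DEBUG and g_debugEnabled are hypothetical stand-ins for a level-checked logging macro and are not part of MsgLogger; the only property assumed here is the one stated in the note, namely that the message expression is not evaluated when the level is disabled.

```cpp
#include <cassert>
#include <iostream>

// Hypothetical switch standing in for "debug level enabled".
static bool g_debugEnabled = false;

// Hypothetical macro: the message expression is evaluated only when enabled.
#define SKETCH_LOG_DEBUG(message) \
    do { if (g_debugEnabled) { std::cerr << message << '\n'; } } while (0)

// WRONG: the counter is incremented inside the message, so the result
// depends on whether debug messages are enabled.
inline int countEvents(int nEvents) {
    assert(nEvents >= 0);
    int count = 0;
    for (int i = 0; i < nEvents; ++i) {
        SKETCH_LOG_DEBUG("event " << ++count);
    }
    return count;
}

// RIGHT: the algorithm runs unconditionally, the message only reports it.
inline int countEventsFixed(int nEvents) {
    assert(nEvents >= 0);
    int count = 0;
    for (int i = 0; i < nEvents; ++i) {
        ++count;
        SKETCH_LOG_DEBUG("event " << count);
    }
    return count;
}
```

With debug messages disabled the first function returns 0 for any number of events, while the second always returns the number of events.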
Histogramming Service
Psana includes a histogramming service which is a wrapper for the ROOT histogramming package. This service simplifies several tasks such as opening a ROOT file, saving histograms to the file, etc.
The centerpiece of the histogramming service is the histogram manager class. The histogram manager's responsibilities are to open the ROOT file, create histograms, and store the histograms in the file. All these tasks are performed transparently to the user; there is no need for additional configuration of this service. To create histograms one first needs to obtain a reference to the manager instance, which is part of the standard psana environment and is accessible through a method of the environment class. One can then call factory methods of the manager class to create new histograms which will be automatically saved to a ROOT file. The manager creates a single ROOT file to store all histograms created in a single job. The name of the ROOT file is the job name with the ".root" extension added. The name of the psana job is auto-generated from the name of the first input file, but it can also be set on the command line with the -j <job-name> option.
All factory methods of the histogram manager use a special class to describe the histogram axis (or axes for 2-dim histograms). The name of the class is PSHist::Axis (in a user module the PSHist:: prefix is optional) and it contains binning information for a single histogram axis. It can be constructed in two different ways:
Axis(int nbins, double amin, double amax)
defines an axis with fixed-width bins in the range from amin to amax.
Axis(int nbins, const double* edges)
defines an axis with variable-width bins; the array contains the low edge of each bin plus the high edge of the last bin. The total size of the edges array must be nbins+1.
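The relation between the two constructors can be made concrete with a small stand-alone sketch. AxisSketch is a hypothetical class written for this illustration and is not PSHist::Axis; in particular the findBin method and its return value of -1 for out-of-range values are assumptions (the real histograms may handle underflow and overflow differently).

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical illustration of the two binning schemes; NOT PSHist::Axis.
class AxisSketch {
public:
    // Fixed-width bins: nbins+1 equally spaced edges from amin to amax.
    AxisSketch(int nbins, double amin, double amax) : m_edges(nbins + 1) {
        assert(nbins > 0 && amin < amax);
        for (int i = 0; i <= nbins; ++i) {
            m_edges[i] = amin + (amax - amin) * i / nbins;
        }
    }

    // Variable-width bins: the caller supplies all nbins+1 edges.
    AxisSketch(int nbins, const double* edges)
        : m_edges(edges, edges + nbins + 1) {
        assert(nbins > 0);
    }

    int nbins() const { return static_cast<int>(m_edges.size()) - 1; }

    // Index of the bin containing x, or -1 if x is outside the axis range.
    int findBin(double x) const {
        if (x < m_edges.front() || x >= m_edges.back()) return -1;
        // upper_bound finds the first edge above x; the bin starts one edge earlier.
        auto it = std::upper_bound(m_edges.begin(), m_edges.end(), x);
        return static_cast<int>(it - m_edges.begin()) - 1;
    }

private:
    std::vector<double> m_edges;
};

// Usage: both axes describe bins through the same list of edges.
inline void axisSketchExample() {
    AxisSketch fixed(4, 0.0, 100.0);                 // edges 0, 25, 50, 75, 100
    assert(fixed.findBin(30.0) == 1);
    const double edges[] = {0.0, 1.0, 10.0, 100.0};  // 3 bins need 4 edges
    AxisSketch variable(3, edges);
    assert(variable.findBin(5.0) == 1);
}
```

The fixed-width form is simply a shortcut for the variable-width form with equally spaced edges, which is why the edges array needs nbins+1 entries.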
Here is the list of the factory methods (see also the reference for more information):
PSHist::H1* hist1i(const std::string& name, const std::string& title, const Axis& axis)
creates a one-dimensional histogram with integer bin contents. Returns a pointer to the histogram object.
PSHist::H1* hist1d(name, title, axis)
(argument types same as above) creates a one-dimensional histogram with double (64-bit) bin contents. Returns a pointer to the histogram object.
PSHist::H1* hist1f(name, title, axis)
creates a one-dimensional histogram with float (32-bit) bin contents. Returns a pointer to the histogram object.
PSHist::H2* hist2i(name, title, xaxis, yaxis)
creates a two-dimensional histogram with integer bin contents. Returns a pointer to the histogram object.
PSHist::H2* hist2d(name, title, xaxis, yaxis)
creates a two-dimensional histogram with double (64-bit) bin contents. Returns a pointer to the histogram object.
PSHist::H2* hist2f(name, title, xaxis, yaxis)
creates a two-dimensional histogram with float (32-bit) bin contents. Returns a pointer to the histogram object.
PSHist::Profile* prof1(name, title, xaxis, const std::string& option="")
creates a profile histogram; the option string can be empty, "s", or "i", for their meaning see the reference. Returns a pointer to the histogram object.
User code should store the returned histogram pointers (as module data members) and use them later in the code; there is currently no way to retrieve a pointer to a histogram created earlier.
Here is an example of the correct use of the histogramming package (from the psana_examples.EBeamHist module):
    // ==== EBeamHist.h ====

    class EBeamHist: public Module {
    public:
      .....
    private:
      Source m_ebeamSrc;
      PSHist::H1* m_ebeamHisto;    // histogram pointers are kept as data members
      PSHist::H1* m_chargeHisto;
    };

    // ==== EBeamHist.cpp ====

    EBeamHist::EBeamHist(const std::string& name)
      : Module(name)
      , m_ebeamHisto(0)
      , m_chargeHisto(0)
    {
      m_ebeamSrc = configStr("eBeamSource", "BldInfo(EBeam)");
    }

    void EBeamHist::beginJob(Env& env)
    {
      // create the histograms once, at the beginning of the job
      m_ebeamHisto = env.hmgr().hist1i("ebeamHisto", "ebeamL3Energy value", Axis(1000, 0, 50000));
      m_chargeHisto = env.hmgr().hist1i("echargeHisto", "ebeamCharge value", Axis(250, 0, 0.25));
    }

    void EBeamHist::event(Event& evt, Env& env)
    {
      shared_ptr<Psana::Bld::BldDataEBeamV1> ebeam = evt.get(m_ebeamSrc);
      if (ebeam.get()) {
        // fill only when the event contains the requested data
        m_ebeamHisto->fill(ebeam->ebeamL3Energy());
        m_chargeHisto->fill(ebeam->ebeamCharge());
      }
    }
Writing User Modules
Here are a few simple steps and guidelines which should help users write their analysis modules.
- Everything is done in the context of the off-line analysis releases: your environment should be prepared and you should have a test release set up based on one of the recent analysis releases. Consult the Workbook, which should help you get going.
- You need your own package which may host several analysis modules. The package name must be unique. If the package has not been created yet run these commands:
    newpkg MyPackage
    mkdir MyPackage/include MyPackage/src
- Generate a skeleton module class from the template:
    codegen -l psana-module MyPackage MyModule
this will create two files: MyPackage/include/MyModule.h and MyPackage/src/MyModule.cpp
- Edit these two files, add necessary data members and implementation of the methods.
- For examples of accessing different data types see the collection of modules in the psana_examples package. Reference for all event and configuration data types is located at https://pswww.slac.stanford.edu/swdoc/releases/ana-current/psddl_psana/
- Reference for other classes in the psana framework: [Psana Reference Manual]
- Run scons to build the module library.
- Create a psana config file if necessary.
- Run psana providing input data, configuration file, etc.
- It is also possible that somebody wrote a module which you can reuse for your analysis, check the module catalog: [Psana Module Catalog]
Running Psana
After writing and compiling the modules (or choosing standard modules) one can run the psana application with these modules. The psana application is pre-built and does not need to be recompiled. To start the application one needs to provide either a configuration file or the corresponding command-line options. Some information (e.g. user module options) cannot be specified on the command line and always requires a configuration file. Here is the list of command-line options recognized by psana:
    Usage: psana [options] [data-file ...]
      Available options:
        {-h|-?|--help }          print help message
        {-v|--verbose }  (incr)  verbose output, multiple allowed
        {-q|--quiet }    (incr)  quieter output, multiple allowed
        {-c|--config }   path    configuration file, def: psana.cfg
        {-j|--job-name}  string  job name, def: from input files
        {-m|--module }   name    module name, more than one possible
      Positional parameters:
        data-file  file name(s) with input data
If both options -c and -m are missing from the command line then psana reads the configuration file psana.cfg from the current directory. Otherwise, if the -c option is provided with a file name, psana reads the corresponding configuration file.
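The rule above can be written down as a small decision function. This is a sketch of one reading of the text, not psana code: the function chooseConfigFile is hypothetical, and the case where only -m is given is assumed to mean that no configuration file is read, which the text implies but does not state explicitly.

```cpp
#include <cassert>
#include <string>

// configOption: file name given with -c, empty if -c is absent.
// haveModuleOption: true if at least one -m option was given.
// Returns the configuration file to read, or an empty string for "none".
inline std::string chooseConfigFile(const std::string& configOption,
                                    bool haveModuleOption) {
    if (!configOption.empty()) return configOption;   // explicit -c wins
    if (!haveModuleOption) return "psana.cfg";        // neither -c nor -m
    return std::string();   // assumption: -m alone reads no configuration file
}

// Usage: the three situations described in the paragraph above.
inline void chooseConfigFileExample() {
    assert(chooseConfigFile("", false) == "psana.cfg");
    assert(chooseConfigFile("my.cfg", false) == "my.cfg");
    assert(chooseConfigFile("", true).empty());
}
```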
Modules loaded by psana can be specified in the configuration file and on the command line with the -m option. If the -m option is provided then its value overrides the module list specified in the configuration file. One can provide a comma-separated list of module names or multiple -m options on the command line; the following command lines are all equivalent:
    % psana -m ModuleA,ModuleB,ModuleC ...
    % psana -m ModuleA -m ModuleB -m ModuleC ...
    % psana -m ModuleA,ModuleB -m ModuleC ...
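The equivalence of these command lines follows from flattening all -m values into one ordered module list. The sketch below illustrates that idea in isolation; the function moduleList is hypothetical and only assumes that each -m value is split on commas and that the order of appearance is preserved.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Takes the values of all -m options in command-line order and returns
// the flat list of module names (hypothetical helper, not psana code).
inline std::vector<std::string> moduleList(const std::vector<std::string>& mValues) {
    std::vector<std::string> modules;
    for (const std::string& value : mValues) {
        std::istringstream in(value);
        std::string name;
        while (std::getline(in, name, ',')) {   // split one value on commas
            if (!name.empty()) modules.push_back(name);
        }
    }
    return modules;
}

// Usage: the three command lines above yield the same module list.
inline void moduleListExample() {
    const std::vector<std::string> expected = {"ModuleA", "ModuleB", "ModuleC"};
    const std::vector<std::string> one = {"ModuleA,ModuleB,ModuleC"};
    const std::vector<std::string> three = {"ModuleA", "ModuleB", "ModuleC"};
    const std::vector<std::string> mixed = {"ModuleA,ModuleB", "ModuleC"};
    assert(moduleList(one) == expected);
    assert(moduleList(three) == expected);
    assert(moduleList(mixed) == expected);
}
```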
Option -j changes the job name, which in turn defines the name of the output histogram file. By default the job name is constructed from the name of the first input file.
Input data files can be specified in the configuration file or on the command line; command-line arguments override configuration file values.
Command-line options -v and -q can increase or decrease the verbosity of the output generated by the messaging service. By default psana outputs messages at info and higher levels. With one -v option trace messages will be printed also, and with two or more -v options debug messages will be printed too. With the -q option info messages will not be printed, only warning, error, and fatal.
Here are a few examples of running the psana application:
    % psana -m EventKeys /reg/d/psdm/...
    % psana -m psana_examples.EBeamHist -j ebeam-hist-r1000 /reg/d/psdm/...
    % psana -c psana_examples/data/DumpAll.cfg /reg/d/psdm/...
    % psana # everything will be specified in psana.cfg file