Introduction
This page collects information about pyana, a Python-based analysis framework for LCLS. The framework design borrows heavily from various sources, such as the Online Analysis framework (a.k.a. myana) and the BaBar framework. Its main principles are summarized here:
- oriented toward XTC processing, but could be extended to work with HDF5 data
- should be easy to use and extend for end users
- support re-use of existing analysis code
- allow parallel processing on multi-core systems
- common simple configuration of user analysis code
Framework composition
The centerpiece of the framework is a regular Python application (pyana) which can load one or more user analysis modules, which are also written in Python. The core application is responsible for the following tasks:
- loading and initializing all user modules
- reading XTC data from a list of input files
- calling appropriate user methods based on the data being processed
- providing data access to user modules
- providing other services such as histogramming to user modules
User modules
A user analysis module is a regular Python module (a Python file) which satisfies additional requirements:
- it contains a class with the same name as the module name
- the class defines a constructor method with optional arguments and five regular methods: beginjob(), beginrun(), event(), endrun(), and endjob()
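The requirements above can be sketched as a minimal user module. The method bodies, the threshold parameter, and the (evt, env) argument signature are illustrative assumptions here, not prescribed by this page; consult the framework reference for the exact method signatures.

```python
# mypackage/myana.py -- illustrative skeleton of a user analysis module.
# The (evt, env) signature and the "threshold" parameter are assumptions
# made for this sketch.

class myana(object):
    """Class name matches the module name, as required."""

    def __init__(self, threshold="0"):
        # parameters arrive from the configuration file as strings
        self.threshold = float(threshold)
        self.nevents = 0

    def beginjob(self, evt, env):
        pass                  # one-time initialization

    def beginrun(self, evt, env):
        pass                  # per-run initialization

    def event(self, evt, env):
        self.nevents += 1     # per-event processing

    def endrun(self, evt, env):
        pass                  # per-run cleanup

    def endjob(self, evt, env):
        pass                  # final summaries
```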
The application loads one or more user modules; the names of the modules to load are specified either in the job configuration file or on the command line. After loading the modules, the application creates one or more instances of the class defined in each module. More than one instance may be useful if one wants to run the same analysis with different sets of parameters in the same job. The number of instances and their parameters are determined by the job configuration file (see below).
Initialization
The user analysis class can define zero or more parameters in its constructor (__init__()
method). Parameters are initialized from the values defined in the job configuration file (see below). All parameters are passed to the Python code as strings; if the code expects a number or some other type, it is the code's responsibility to convert the string to the appropriate type. Any parameter without a default value in the constructor declaration must be present in the configuration file.
As a quick example, suppose that we have this class defined in a user module:
# user analysis class
class myana(object):
    def __init__(self, name, lower, upper, bins=100):
        self.lower = float(lower)
        self.upper = float(upper)
        self.bins = int(bins)
        ...
and this job configuration file:
[pyana]
modules = mypackage.myana mypackage.myana:wide

[mypackage.myana]
lower = 0
upper = 100
name = default

[mypackage.myana:wide]
lower = 0
upper = 1000
bins = 1000
name = wide
With this the analysis job will instantiate two analysis objects with different parameters, equivalent to this pseudo-code:
# import class myana
from mypackage.myana import myana

# create instances
instances = [
    myana(lower="0", upper="100", name="default"),
    myana(lower="0", upper="1000", bins="1000", name="wide"),
]
(The order of parameters in the constructor and in the configuration file does not matter, as all parameters are passed as keyword arguments.)
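The mapping from configuration sections to per-instance keyword dictionaries can be sketched with Python's standard configparser module. This is an illustration of the section-naming convention from the example above, not pyana's actual implementation:

```python
# Sketch: turning configuration sections into keyword dictionaries
# (illustrative, not pyana's real code).
import configparser

CONFIG = """
[pyana]
modules = mypackage.myana mypackage.myana:wide

[mypackage.myana]
lower = 0
upper = 100
name = default

[mypackage.myana:wide]
lower = 0
upper = 1000
bins = 1000
name = wide
"""

cfg = configparser.ConfigParser()
cfg.read_string(CONFIG)

# one keyword dictionary per listed module instance; values stay strings,
# so the user class must convert them itself
kwargs_per_instance = [dict(cfg.items(section))
                       for section in cfg.get("pyana", "modules").split()]
```

Each dictionary would then be passed to the user class constructor as keyword arguments, which is why the parameter order is irrelevant.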
Configuration
Multi-processing
The framework can run in single-process or multi-process mode; by default everything runs in single-process mode. In multi-process mode the analysis job spawns a number of processes, all running on the same host. In that case the framework is also responsible for distributing individual events across the processes and for collecting and merging the results of the processing at the end of the job.