...

This manual is accompanied by the Pyana Reference Manual, which describes the interface of all analysis objects accessible to the user analysis job.

...

The core application has a number of configuration options which can be set or changed from the configuration file or from the command line. If the same option appears both in the configuration file and on the command line then the command line value overrides the value in the configuration file.

...

Code Block
mypackage/src/myana.py
# user analysis class
class myana(object):
    def __init__(self, name, lower, upper, bins=100):
        self.name = name
        self.lower = float(lower)
        self.upper = float(upper)
        self.bins = int(bins)
    ...

...

The framework passes two types of data to the user analysis modules: event data and environment data. Event data is the data belonging to the current event that triggered the call to the user methods. For XTC input the event data contains the complete datagram as read from the DAQ. In a user module event data is represented by an object of type pyana.event.Event, which has an extended interface for extracting individual objects from the datagram. This interface is described in the reference guide.

Environment data includes all data which is not part of the event data. Usually environment data either stays the same for the whole job or changes at a slower rate than event data. Examples of environment data are configuration data read from XTC at the beginning of the job, EPICS data which is not updated on every event, and a few other things. Environment data is presented to user code through an object of type pyana.event.Env. Its interface is described in the reference guide.

Anchor
DataSourceAddress
DataSourceAddress

...

For some pieces of data one needs to specify a data "address" which identifies (maybe partially) the particular DAQ device which produced the data. This is needed because the instrument setup may include multiple devices producing the same data type. The DAQ defines a type which serves as the most specific device identification: xtc.DetInfo in package pypdsdata. One can pass a DetInfo instance to a method which accepts a device address to select that specific device. A DetInfo object contains four essential pieces of information:

  • detector – one of the DetInfo.Detector.* values
  • detId – ID number selecting one of multiple detectors
  • device – one of the DetInfo.Device.* values
  • devId – ID number selecting one of multiple devices

...

  • "AmoETof-0|Acqiris-0" – selects data produced by detector AmoETof, detId 0, device Acqiris, devId 0
  • "AmoETof|Acqiris" – selects data produced by detector AmoETof, any detId, device Acqiris, any devId
  • "AmoETof-*|Acqiris-*" – same as above
  • "AmoETof-0" – selects data produced by detector AmoETof, detId 0, any device, any devId
  • "|Acqiris-0" – selects data produced by any detector, any detId, device Acqiris, devId 0
  • "*-*|Acqiris-0" – same as above
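The address strings listed above follow a regular pattern, which the sketch below makes explicit. It is not Pyana's own parser: the function name parse_address and the use of None for "any" are assumptions made for illustration, and the grammar is inferred only from the examples in the list.

```python
def parse_address(address):
    """Split an address string into (detector, detId, device, devId).

    Hypothetical helper, not part of Pyana. None means "any value",
    which the examples write either as "*" or by omitting the field.
    """
    # Text before "|" describes the detector, text after it the device.
    det_part, _, dev_part = address.partition("|")

    def split(part):
        # Each half looks like "Name-Id"; both pieces are optional.
        name, _, ident = part.partition("-")
        name = None if name in ("", "*") else name
        ident = None if ident in ("", "*") else int(ident)
        return name, ident

    detector, det_id = split(det_part)
    device, dev_id = split(dev_part)
    return detector, det_id, device, dev_id


full = parse_address("AmoETof-0|Acqiris-0")
any_ids = parse_address("AmoETof|Acqiris")
device_only = parse_address("|Acqiris-0")
print(full, any_ids, device_only)
```

Under these assumptions "AmoETof|Acqiris" and "AmoETof-*|Acqiris-*" parse to the same tuple, which matches the "same as above" notes in the list.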

Anchor
ConfigurationMethods
ConfigurationMethods

Configuration Methods

As mentioned above, the class in the user module defines a number of methods. These methods are called by the Pyana framework at the appropriate moments during data analysis. Here is an explanation of when these methods are called and what arguments they accept.

  • beginjob(evt, env) – this method is called at a Configure transition. Typically this is the place to initialize various things that may depend on the data being processed. Configuration objects which are part of the Configure transition are accessed through the env object. The evt object provides an interface to the datagram data and can also be used to extract all contained data, but the preferred way to access configuration data objects is through the environment object. This method is usually called once per job, but when pyana is instructed to process multiple runs it can be called several times if more than one Configure transition happened during those runs.
  • endjob(env) – this method is called at an Unconfigure transition. Typically used to process collected statistics, close output files, etc. Like beginjob() it can be called multiple times if more than one Configure transition happens during the run range being processed.
  • beginrun(evt, env) – this method is called at a BeginRun transition. There is usually no data associated with this transition, so the evt object would be empty, but the env object contains all configuration objects. This method is called once for every run and is a good place to prepare for the processing of the next run.
  • endrun(env) – this method is called at an EndRun transition. Typically used to process statistics collected during the run.
  • begincalibcycle(evt, env) – this method is called at a BeginCalibCycle transition. This method is called once for every calibration cycle.
  • endcalibcycle(env) – this method is called at an EndCalibCycle transition. Typically used to process statistics collected during the calibration cycle.

The methods beginrun(), endrun(), begincalibcycle(), and endcalibcycle() are optional; an analysis module does not have to define them, and they are called only if defined.
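The calling sequence described above can be sketched with a small module and a stand-in driver. The driver run_fake_job is hypothetical and only imitates the framework so the example runs on its own. The nesting of calibration cycles inside runs is an assumption, and None is passed in place of real evt and env objects.

```python
class CountingModule(object):
    """Records which methods were called and counts events."""

    def __init__(self):
        self.calls = []
        self.n_events = 0

    def beginjob(self, evt, env):
        self.calls.append("beginjob")

    def beginrun(self, evt, env):
        self.calls.append("beginrun")

    def event(self, evt, env):
        self.n_events += 1

    def endrun(self, env):
        self.calls.append("endrun")

    def endjob(self, env):
        self.calls.append("endjob")


def call_if_defined(module, method_name, *args):
    # Optional methods are called only when the module defines them.
    method = getattr(module, method_name, None)
    if method is not None:
        method(*args)


def run_fake_job(module, events_per_run):
    """Hypothetical stand-in for the framework event loop."""
    evt, env = None, None  # placeholders for the real objects
    module.beginjob(evt, env)
    for n_events in events_per_run:
        call_if_defined(module, "beginrun", evt, env)
        # Assumption: one calibration cycle nested inside each run.
        call_if_defined(module, "begincalibcycle", evt, env)
        for _ in range(n_events):
            module.event(evt, env)
        call_if_defined(module, "endcalibcycle", env)
        call_if_defined(module, "endrun", env)
    module.endjob(env)


module = CountingModule()
run_fake_job(module, [3, 2])
print(module.calls, module.n_events)
```

CountingModule leaves out begincalibcycle() and endcalibcycle(), so the driver silently skips them, which mirrors the "called only if defined" rule.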

Two methods, evt.put and evt.get, allow modules to exchange data.

  • Save new object in event:
    evt.put( object, object_name ) – call this method to save any newly evaluated object in the evt store. To make the object accessible from other modules it must be associated with a unique object_name, which is a string parameter.
  • Retrieve object from event:
    object = evt.get( object_name ) – call this method to retrieve an object from the evt store.
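A minimal sketch of two modules exchanging data this way is shown below. FakeEvent is a hypothetical stand-in for the real event object so the example can run outside Pyana. The module names, the key "mypackage.peaks", and the behavior of returning None for a missing key are all assumptions.

```python
class FakeEvent(object):
    """Hypothetical stand-in exposing only put() and get()."""

    def __init__(self):
        self._store = {}

    def put(self, obj, object_name):
        self._store[object_name] = obj

    def get(self, object_name):
        # Assumption: a missing name yields None.
        return self._store.get(object_name)


class PeakFinder(object):
    """Upstream module: computes something and saves it in the event."""

    def event(self, evt, env):
        peaks = [12, 47, 103]  # placeholder for a real computation
        evt.put(peaks, "mypackage.peaks")


class PeakCounter(object):
    """Downstream module: retrieves the object saved upstream."""

    def __init__(self):
        self.n_peaks = 0

    def event(self, evt, env):
        peaks = evt.get("mypackage.peaks")
        if peaks is not None:
            self.n_peaks += len(peaks)


evt = FakeEvent()
finder, counter = PeakFinder(), PeakCounter()
# Modules see the same event object, upstream module first.
for module in (finder, counter):
    module.event(evt, None)
print(counter.n_peaks)
```

Prefixing the object name with the package name is one way to keep names unique across modules.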

Anchor
EventLoopControl
EventLoopControl

Event Loop Control

Code in user modules can control the framework event loop by returning a value other than None from the event() method (a method with no return statement is equivalent to one returning None). The following values are recognized by the framework:

  • pyana.Skip
    This will skip event() in all downstream modules
  • pyana.Stop
    This will stop event loop, all end*() methods are called as usual
  • pyana.Terminate
    This will cause immediate job termination, end*() methods are not called

The values pyana.Stop and pyana.Terminate only work in single-process mode; in multi-process mode they are ignored, and a warning message is issued if a user module tries to use them.

Here is a simplified example of how this feature is used:

Code Block

# import is necessary to use return codes
import pyana

class ExampleModule(object):
  
    def event(self, evt, env):

        ...

        if pixelsAboveThreshold < 1000:
            # This event is not worth looking at, skip it
            return pyana.Skip

        if self.nGoodEvents > 1000:
            # we collected enough data, can stop now and go to endjob()
            return pyana.Stop

        if temperatureKelvin < 0:
            # data is junk, stop right here and don't call endjob()
            return pyana.Terminate

Anchor
ExceptionHandling
ExceptionHandling

Exception Handling

Pyana does not do anything special to handle exceptions raised in user modules; the main reason is that it is not safe in general to continue after an unknown exception was raised. If user code knows which exceptions can be raised and is prepared to handle them, then the corresponding code should be added to the user module.
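As an illustration of handling a known exception inside a user module, the sketch below catches only the specific errors it expects and lets everything else propagate. DictEvent is a hypothetical stand-in for the real event object, and the stored name "raw_value" is invented for the example.

```python
class DictEvent(object):
    """Hypothetical stand-in for the event object, backed by a dict."""

    def __init__(self, **objects):
        self._objects = objects

    def get(self, object_name):
        # Assumption: a missing name yields None.
        return self._objects.get(object_name)


class SafeAverager(object):
    """Averages a per-event value, tolerating unusable values."""

    def __init__(self):
        self.total = 0.0
        self.n_good = 0
        self.n_bad = 0

    def event(self, evt, env):
        try:
            value = float(evt.get("raw_value"))
        except (TypeError, ValueError):
            # Only the expected failures are handled here: a missing
            # object (None) or text that is not a number. Any other
            # exception still propagates and terminates the job.
            self.n_bad += 1
            return
        self.total += value
        self.n_good += 1


averager = SafeAverager()
for evt in (DictEvent(raw_value="1.5"), DictEvent(), DictEvent(raw_value="junk"),
            DictEvent(raw_value=2.5)):
    averager.event(evt, None)
print(averager.total, averager.n_good, averager.n_bad)
```

Keeping the try block narrow avoids hiding unrelated bugs, which is the concern behind the framework's choice not to catch exceptions itself.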

If a user module raises an exception and does not handle it, the whole job is terminated immediately. In single-process mode the standard traceback will be printed by the interpreter, and you should clearly see the reason for and location of the exception. In multi-process mode (see #Multi-processing) the job will still fail, but the failure will look more complex. The original exception will terminate only a single worker process, and the standard traceback for that exception will be printed as usual. The termination of one worker process will cause a communication failure inside the main process, which will terminate immediately with an error message (Broken pipe). This in turn will cause exceptional failures of the other worker processes, which will print their own tracebacks. So instead of a single traceback for an exception, more than one error message will appear in the output.

Anchor
Configuration
Configuration

Configuration

An analysis job can read configuration options from the command line and/or the configuration file. The command line can be used to set options only for the pyana application itself, not for user analysis modules. Options for user modules can be set in the configuration file only.

...

Anchor
CoreOptions
CoreOptions

Core Options

...

By default the core application options are read from the [pyana] section of the configuration file. If the option -C name or --config-name=name is given on the command line then the additional section [pyana.name] is read, and values in that section override values from the [pyana] section.
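The layering of the two sections can be sketched with the standard configparser module. This only illustrates the override rule and is not how Pyana itself reads its configuration. The helper core_options, the section name "test", and the option values are assumptions.

```python
import configparser

CONFIG_TEXT = """
[pyana]
verbose = 0
num-events = 1000
num-cpu = 1

[pyana.test]
verbose = 2
num-events = 10
"""


def core_options(parser, config_name=""):
    """Return options from [pyana], overridden by [pyana.<config_name>]."""
    options = {}
    if parser.has_section("pyana"):
        options.update(parser["pyana"])
    if config_name:
        section = "pyana." + config_name
        if parser.has_section(section):
            # Values from the named section win over [pyana].
            options.update(parser[section])
    return options


parser = configparser.ConfigParser()
parser.read_string(CONFIG_TEXT)

default_options = core_options(parser)        # as if no -C option was given
test_options = core_options(parser, "test")   # as if "-C test" was given
print(default_options)
print(test_options)
```

In this sketch num-cpu keeps its [pyana] value in both cases because [pyana.test] does not mention it.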

Here is the list of all command line and configuration file options currently available:

| Short | Long | Config File | Option type | Default | Description |
|---|---|---|---|---|---|
| -v | --verbose | verbose | integer | 0 | The command line option takes no value but can be repeated multiple times; the configuration file option accepts a single integer. |
| -c file | --config=file | | path | pyana.cfg | Name of the configuration file. |
| -C name | --config-name=name | | string | | If a non-empty string is given then configuration will be read from section [pyana.name] in addition to [pyana]. |
| -l file | --file-list=file | files | path | | The list of input data files will be read from the given file, which must contain one file name per line. |
| -n number | --num-events=number | num-events | integer | 0 | Maximum number of events to process; this counter includes damaged events too. |
| -s number | --skip-events=number | skip-events | integer | 0 | Number of events to skip. |
| -j name | --job-name=name | job-name | string | | Sets the job name, which is accessible to user code via an environment method. The default name is based on the input file names. |
| -m name | --module=name | modules | string | | User analysis module(s). The command line option can be repeated several times; the configuration file option accepts a space-separated list of names. |
| -p number | --num-cpu=number | num-cpu | integer | 1 | Number of processes to run; if greater than 1 then multi-processing mode will be used. |

Anchor
UserModuleOptions
UserModuleOptions

User Module Options

...

For every user module the configuration file may contain one or more configuration sections. The section header for the user module has the format [module] or [module:name]. When defining the user modules, either with the --module command line option or the modules configuration file option, one can optionally qualify the module name with a colon followed by an arbitrary single-word string. Without this optional qualification the framework will load the user module and use the options from the [module] section to initialize the instance of the analysis class (as explained in the Initialization section). If, on the other hand, the qualified name is used, then the framework will initialize the instance with the options combined from the sections [module] and [module:name], with the latter section overriding the values from the former. One can use several qualified forms of the same module name to produce several instances of the analysis class in the same job with different options.
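The combination rule for [module] and [module:name] can be sketched with plain dictionaries. The helper merged_options, the qualifier "wide", the option values, and the way the instance name is passed to the constructor are assumptions made for illustration; the real initialization is described in the Initialization section.

```python
class myana(object):
    """Same shape as the user analysis class shown earlier."""

    def __init__(self, name, lower, upper, bins=100):
        self.name = name
        self.lower = float(lower)
        self.upper = float(upper)
        self.bins = int(bins)


def merged_options(sections, qualified_name):
    """Combine [module] with [module:name]; the qualified section wins."""
    base = qualified_name.split(":", 1)[0]
    options = dict(sections.get(base, {}))
    if qualified_name != base:
        options.update(sections.get(qualified_name, {}))
    return options


# Option values arrive as strings, as they would from a configuration file.
sections = {
    "mypackage.myana": {"lower": "0", "upper": "100", "bins": "50"},
    "mypackage.myana:wide": {"upper": "1000"},
}

# Two instances of the same class in one job, with different options.
instances = [
    myana(name, **merged_options(sections, name))
    for name in ("mypackage.myana", "mypackage.myana:wide")
]
for inst in instances:
    print(inst.name, inst.lower, inst.upper, inst.bins)
```

The qualified instance inherits lower and bins from the unqualified section and overrides only upper.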

Here is an example, almost identical to the one in the Initialization section above, which illustrates the inheritance and overriding of the user options:

...

One significant complication comes from the multi-processing capabilities of Pyana. With multi-processing enabled a job runs in many processes, with each process analyzing only a subset of the data set. At the end of the job the output files from all independent processes need to be merged into a single file. Depending on the format of the output files, merging can be very easy, very hard, or impossible. Pyana supports one simple merging mechanism, in which the files from all processes are copied into a single output file, very much like the 'cat file1 ... fileN > file' command does. The order in which files are copied is not specified, so if the order is important some additional processing may be required.

To enable the Pyana merging mechanism one needs to use a special construct when opening an output file from analysis code. Instead of the plain open(...) or file(..) functions one needs to use the env.mkfile(...) method with the same arguments. This call creates a temporary file somewhere (most likely in the /tmp directory) with a unique name. The function returns a regular Python file object which can be used with all standard tools. At the end of the job Pyana will collect the names of those temporary files and merge them into one file with the same name as was given to env.mkfile(...), deleting all temporary files. This special method is safe to use even when running in single-process mode, in which case it is equivalent to the regular open(...) function, so there is no unnecessary copy involved.
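A minimal sketch of a module that writes its output through env.mkfile(...) is shown below. FakeEnv is a hypothetical stand-in that simply calls open(...), which is the single-process behavior described above. The module name, the file name, and closing the file in endjob() are assumptions.

```python
import os
import tempfile


class FakeEnv(object):
    """Hypothetical stand-in for the environment object."""

    def mkfile(self, filename, mode="w"):
        # Single-process behavior: equivalent to a regular open().
        return open(filename, mode)


class EventLogger(object):
    """Writes one line per event to a mergeable output file."""

    def __init__(self, filename):
        self.filename = filename
        self.out = None
        self.n_events = 0

    def beginjob(self, evt, env):
        # Use env.mkfile() instead of open() so the framework can merge
        # the per-process files at the end of a multi-process job.
        self.out = env.mkfile(self.filename, "w")

    def event(self, evt, env):
        self.n_events += 1
        self.out.write("event %d\n" % self.n_events)

    def endjob(self, env):
        self.out.close()


workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "events.txt")

env = FakeEnv()
logger = EventLogger(path)
logger.beginjob(None, env)
for _ in range(3):
    logger.event(None, env)
logger.endjob(env)

with open(path) as f:
    lines = f.read().splitlines()
print(lines)
```

A line-oriented text format like this one survives concatenation in any order. In multi-process mode each process would keep its own event counter, so the numbers in the merged file would repeat.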

...

At present we use the Python interface to ROOT. The main interface for creating new histograms is a special histogram manager object which is responsible for histogram bookkeeping in Pyana jobs. This object is accessible to user code through the method env.hmgr(). The object has several methods for booking new histograms, such as h1d(...), h2i(...), etc. For a detailed description of the methods and calling conventions consult the Reference Manual. Histograms are filled through the methods of the histogram objects; the Reference Manual has links to the relevant documentation.

...

SciPy Algorithms

A few data classes, such as camera.FrameV1 and acqiris.DataDescV1, present their data as NumPy arrays. Several packages implement efficient algorithms that work with NumPy arrays. Probably one of the most widely used is SciPy, a collection of algorithms including optimization, integration, FFT, image processing, statistics, special functions, and a few more. Its rich interface and close integration with NumPy make it a good candidate for use in user analysis modules.

...

The framework handles a few data types specially. For example, EPICS data, which is a part of the event data, does not appear in every L1Accept transition, but every sub-process needs access to the current value of each EPICS variable. So the framework reads EPICS data from every event and accumulates the current state in a separate structure. This structure is made available to all sub-processes as a part of the environment, so the sub-processes need to access EPICS data through the environment and not rely on event data.

...

For other types of data it is the user's responsibility to store the data in some location and then merge it manually after the job is finished.

Anchor
WritingUserModules
WritingUserModules

Writing User Modules

The preferred way to run user analysis code is to create a separate package for each user and store all user modules in the src directory of that package. If you plan to commit your code to the repository then the package name must be unique and should probably include your user name (or experiment name). To create an empty package run this command (it assumes that the analysis environment has been set up):

...