Comment: Change links to new auto-generated reference

...

This manual is accompanied by the Pyana Reference Manual, which describes the interfaces of all analysis objects accessible to a user analysis job.

...

There are two types of data that the framework passes to user analysis modules: event data and environment data. Event data contains the data for the current event, that is, the event that triggered the call to the user methods. For XTC input the event data contains the complete datagram as read from the DAQ. In a user module, event data is represented by an object of type pyana.event.Event, which has an extended interface for extracting individual objects from the datagram. This interface is described in the reference guide.

Environment data includes all data that is not part of the event data. Usually environment data either stays the same for the whole job or changes at a slower rate than event data. Examples of environment data are configuration data read from XTC at the beginning of the job, EPICS data which is not updated on every event, and a few other things. Environment data is exposed to user code through an object of type pyana.event.Env. Its interface is described in the reference guide.


...

For some pieces of data one needs to specify a data "address" which identifies (possibly only partially) the particular DAQ device that produced the data. This is needed because the instrument setup may include multiple devices producing the same data type. The DAQ defines a type that serves as the most specific device identification: xtc.DetInfo in the package pypdsdata. One can pass a DetInfo instance to any method that accepts a device address in order to select that specific device. A DetInfo object contains four essential pieces of information:

  • detector – one of the DetInfo.Detector.* values
  • detId – ID number selecting one of multiple detectors
  • device – one of the DetInfo.Device.* values
  • devId – ID number selecting one of multiple devices
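To make the idea of a partial address concrete, here is a minimal sketch in plain Python. It does not use pypdsdata: the `Address` class and its `matches` method are hypothetical stand-ins, and the detector and device names are made-up strings rather than real DetInfo.Detector.* or DetInfo.Device.* values. A field left as None is treated as a wildcard, which is only one plausible reading of a "partial" address.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Address:
    """Hypothetical stand-in for a device address (not the pypdsdata type)."""
    detector: Optional[str] = None  # stand-in for a DetInfo.Detector.* value
    det_id: Optional[int] = None    # which of several detectors
    device: Optional[str] = None    # stand-in for a DetInfo.Device.* value
    dev_id: Optional[int] = None    # which of several devices

    def matches(self, full: "Address") -> bool:
        """True if every field set on this address equals the one in `full`."""
        pairs = [
            (self.detector, full.detector),
            (self.det_id, full.det_id),
            (self.device, full.device),
            (self.dev_id, full.dev_id),
        ]
        return all(want is None or want == have for want, have in pairs)


# Two devices of the same kind attached to different detectors (made-up names).
sources = [
    Address("DetA", 0, "Camera", 0),
    Address("DetB", 0, "Camera", 0),
]

partial = Address(device="Camera")          # fields left as None act as wildcards
specific = Address("DetB", 0, "Camera", 0)  # all four fields given

matched_partial = [s for s in sources if partial.matches(s)]
matched_specific = [s for s in sources if specific.matches(s)]
```

The partial address selects both devices, while the fully specified one selects a single device, which is why a complete address is needed when the setup contains several devices producing the same data type.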

...

  • beginjob(evt, env) – this method is called at a Configure transition. Typically this is the place to initialize things that may depend on the data being processed. Configuration objects which are part of the Configure transition are accessed through the env object. The evt object provides an interface to the datagram data and can also be used to extract all contained data, but the preferred way to access configuration data objects is through the environment object. This method is usually called once per job, but when pyana is instructed to process multiple runs it can be called several times if more than one Configure transition happened during those runs.
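Because beginjob may be called more than once, per-configuration state is best (re)initialized on every call. The sketch below illustrates this with stand-ins only: the beginjob(evt, env) signature comes from the text above, while `FakeEnv`, its `config()` accessor, and the driver loop are hypothetical and do not reflect the real pyana environment interface.

```python
class CountingModule:
    """User-module sketch; only the beginjob(evt, env) signature is from the text."""

    def __init__(self):
        self.n_configure = 0
        self.settings = None

    def beginjob(self, evt, env):
        # May run several times when multiple runs are processed, so
        # refresh configuration-dependent state on every call.
        self.n_configure += 1
        self.settings = env.config()  # hypothetical accessor, see FakeEnv


class FakeEnv:
    """Hypothetical stand-in for the environment object."""

    def __init__(self, cfg):
        self._cfg = cfg

    def config(self):
        return self._cfg


# Simulate two Configure transitions, as could happen with two runs.
module = CountingModule()
for cfg in ({"gain": 1}, {"gain": 2}):
    module.beginjob(evt=None, env=FakeEnv(cfg))
```

After the loop the module holds the settings from the most recent Configure transition, not the first one.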

...

| Short | Long | Config File | Option type | Default | Description |
|---|---|---|---|---|---|
| -v | --verbose | verbose | integer | 0 | The command line option takes no value but can be repeated multiple times; the configuration file option accepts a single integer. |
| -c file | --config=file | | path | pyana.cfg | Name of the configuration file. |
| -C name | --config-name=name | | string | | If a non-empty string is given, configuration is read from section [pyana.name] in addition to [pyana]. |
| -l file | --file-list=file | file-list | path | | The list of input data files is read from the given file, which must contain one file name per line. |
| -n number | --num-events=number | num-events | integer | 0 | Maximum number of events to process; this count includes damaged events. |
| -s number | --skip-events=number | skip-events | integer | 0 | Number of events to skip. |
| -j name | --job-name=name | job-name | string | | Sets the job name, which is accessible to user code via an environment method. The default name is based on the input file names. |
| -m name | --module=name | modules | string | | User analysis module(s). The command line option can be repeated several times; the configuration file option accepts a space-separated list of names. |
| -p number | --num-cpu=number | num-cpu | integer | 1 | Number of processes to run; if greater than 1, multi-processing mode is used. |
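The following sketch shows how the configuration file options from the table might look and how a named section could be layered on top of [pyana]. It uses Python's standard configparser rather than pyana's own reader. The module names, the "test" section name, and the rule that the named section overrides [pyana] are assumptions made for illustration; the text only says the named section is read "in addition to" [pyana].

```python
import configparser

# Example file content; option names are taken from the "Config File" column.
CFG_TEXT = """
[pyana]
verbose = 1
num-events = 1000
modules = mypkg.module_a mypkg.module_b
num-cpu = 1

[pyana.test]
num-events = 10
"""


def load_options(text, config_name=""):
    """Read [pyana], then overlay [pyana.<config_name>] if a name is given."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    options = dict(parser["pyana"])
    if config_name:
        section = "pyana." + config_name
        if parser.has_section(section):
            # Assumption: values in the named section take precedence.
            options.update(parser[section])
    return options


options = load_options(CFG_TEXT, "test")
modules = options["modules"].split()      # space-separated list of names
num_events = int(options["num-events"])   # overridden by [pyana.test]
```

With this layering, a job started with the equivalent of `-C test` would process at most 10 events while inheriting every other option from [pyana].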

...

One significant complication comes from the multi-processing capabilities of Pyana. With multi-processing enabled, the job runs in many processes, with each process analyzing only a subset of the data set. At the end of the job the output files from all independent processes need to be merged into a single file. Depending on the format of the output files, merging can be very easy, very hard, or impossible. Pyana supports one simple merging mechanism in which the files from all processes are copied into a single output file, very much like the 'cat file1 ... fileN > file' command does. The order in which files are copied is not specified, so if the order is important some additional processing may be required.

To enable the Pyana merging mechanism one needs to use a special construct when opening an output file from analysis code. Instead of the plain open(...) or file(...) functions one needs to use the env.mkfile(...) method with the same arguments. This call creates a temporary file with a unique name somewhere (most likely in the /tmp directory). The function returns a regular Python file object which can be used with all standard tools. At the end of the job Pyana collects the names of those temporary files, merges them into one file with the same name as was given to env.mkfile(...), and deletes all temporary files. This special method is safe to use even when running in single-process mode, in which case it is equivalent to a regular open(...) call, so no unnecessary copy is involved.
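The merging behaviour described above can be imitated with a few lines of standard-library Python. This is only a sketch of the idea, not Pyana's implementation: the `MergingFiles` class and its `merge` method are hypothetical, and the two "workers" run sequentially in a single process instead of as separate sub-processes.

```python
import os
import shutil
import tempfile


class MergingFiles:
    """Hypothetical helper imitating the described behaviour of env.mkfile(...)."""

    def __init__(self):
        self._parts = {}  # final file name -> list of temporary file names

    def mkfile(self, name, mode="w"):
        # Create a uniquely named temporary file and remember its final name.
        fd, tmp_name = tempfile.mkstemp(prefix="part-")
        self._parts.setdefault(name, []).append(tmp_name)
        return os.fdopen(fd, mode)

    def merge(self):
        # Concatenate the parts, much like 'cat file1 ... fileN > file'.
        for name, parts in self._parts.items():
            with open(name, "wb") as out:
                for part in parts:
                    with open(part, "rb") as src:
                        shutil.copyfileobj(src, out)
                    os.remove(part)  # temporary files are deleted after merging
        self._parts.clear()


out_dir = tempfile.mkdtemp()
target = os.path.join(out_dir, "results.txt")

files = MergingFiles()
for worker in range(2):  # stands in for two sub-processes
    with files.mkfile(target) as f:
        f.write("worker %d done\n" % worker)
files.merge()

with open(target) as f:
    merged_lines = sorted(f.read().splitlines())
```

The result is sorted before use because, as noted above, the order in which the parts are copied should not be relied upon. Plain concatenation also only works for formats where appending files yields a valid file, such as line-oriented text.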

...

At present we use the Python interface to ROOT. The main interface for creating new histograms is a special histogram manager object which is responsible for histogram bookkeeping in Pyana jobs. This object is accessible to user code through the method env.hmgr(). The object has several methods for booking new histograms, such as h1d(...), h2i(...), etc. For a detailed description of the methods and calling conventions consult the Reference Manual. Histograms are filled through the methods of the histogram objects themselves; the Reference Manual has links to the relevant documentation.

...

SciPy Algorithms

A few data classes, such as camera.FrameV1 and acqiris.DataDescV1, present their data as NumPy arrays. Several packages implement efficient algorithms that work with NumPy arrays. Probably one of the most widely used is SciPy, a collection of algorithms covering optimization, integration, FFT, image processing, statistics, special functions, and a few more areas. Its rich interface and close integration with NumPy make it a good candidate for use in user analysis modules.
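As an illustration of the kind of processing SciPy makes easy, the sketch below locates a bright spot in a two-dimensional array. The array is synthetic and merely stands in for camera data; how to obtain the NumPy array from a class such as camera.FrameV1 is covered by the Reference Manual and is not shown here. The pedestal value and spot position are made up.

```python
import numpy as np
from scipy import ndimage

# Synthetic 64x64 "frame": flat pedestal plus one bright 3x3 spot.
pedestal = 10.0
frame = np.full((64, 64), pedestal, dtype=float)
frame[20:23, 40:43] += 100.0

# Subtract the pedestal, smooth, and find the intensity-weighted centre.
signal = frame - pedestal
smoothed = ndimage.gaussian_filter(signal, sigma=1.0)
row, col = ndimage.center_of_mass(smoothed)
```

Here `row` and `col` come out at the centre of the spot (row 21, column 41), since the smoothing kernel is symmetric and the spot is far from the array edges.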

...

The framework handles a few data types specially. For example, EPICS data, which is part of the event data, does not appear in every L1Accept transition, but every sub-process needs access to the current value of each EPICS variable. So the framework reads EPICS data from every event and accumulates the current state in a separate structure. This structure is made available to all sub-processes as part of the environment, so sub-processes need to access EPICS data through the environment and must not rely on the event data.
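The accumulation described above amounts to keeping the most recent value of every variable. The sketch below shows the idea with plain dictionaries. The `EpicsStore` class, the event layout, and the variable names are hypothetical and do not correspond to the real pyana environment interface.

```python
class EpicsStore:
    """Hypothetical accumulator holding the latest value of each variable."""

    def __init__(self):
        self._latest = {}

    def update(self, event_epics):
        # Only variables present in this event are overwritten.
        self._latest.update(event_epics)

    def value(self, name, default=None):
        return self._latest.get(name, default)


# Made-up event stream: EPICS values appear only in some events.
events = [
    {"epics": {"PV:A": 1.0, "PV:B": 5.0}},
    {"epics": {}},
    {"epics": {"PV:A": 1.5}},
    {"epics": {}},
]

store = EpicsStore()
seen = []
for event in events:
    store.update(event["epics"])
    # User code asks the store, not the event, for the current value.
    seen.append((store.value("PV:A"), store.value("PV:B")))
```

Every event sees a value for both variables, even the events that carry no EPICS data at all, which is what reading from the event alone could not guarantee.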

...