
Table of Contents

Introduction

...

  • Datasets need not be aligned.  That is, the 5th image in a detector dataset may come from a different event than the 5th record in a gas detector dataset. One can match up records from different datasets by using the time datasets (a sketch of doing this with h5py follows this list).
  • One should use the _mask datasets to identify valid data. A _mask dataset record is 1 when the corresponding record of the data dataset is valid, and 0 when it is not. When the _mask record is 0, the data record will be all zeros and should not be processed. The mask is 0 when the xtc data is damaged. The type of damage can then be found in the _damage dataset. The main reason to record blanks in the data datasets when damage occurs is to keep datasets as aligned as possible.
  • The hdf5 group hierarchy has the following levels: run, calib cycle, type, source - regular event data is organized into datasets that live at the source level. Epics is special: rather than the two groups type and source, there are three groups for epics: type, source and epics pv name. Epics aliases live alongside epics pv names in this group hierarchy. Finally, configuration data (which usually arrives once) is found in subgroups of the configure groups at the top of the hdf5 hierarchy.
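
For illustration, here is a minimal h5py sketch of matching up records from two datasets via their time datasets and skipping damaged records via the _mask dataset. The file name, the two source group paths and the 'seconds'/'nanoseconds' field names are hypothetical stand-ins; check the actual paths and the time dataset's dtype with h5ls before relying on them.

Code Block
languagepython
import h5py

# Minimal alignment sketch.  Assumes each source group holds 'data', 'time'
# and '_mask' datasets and that the time dataset has 'seconds'/'nanoseconds'
# fields; the paths below are placeholders.
with h5py.File("translated.h5", "r") as f:
    det = f["/Configure:0000/Run:0000/CalibCycle:0000/SomeType::DataV1/SomeDet.0:Device.0"]
    gas = f["/Configure:0000/Run:0000/CalibCycle:0000/Bld::BldDataFEEGasDetEnergy/FEEGasDetEnergy"]

    # index the gas detector records by event id
    gas_index = {(t["seconds"], t["nanoseconds"]): i
                 for i, t in enumerate(gas["time"][:])}

    det_mask = det["_mask"][:]
    for i, t in enumerate(det["time"][:]):
        if det_mask[i] == 0:
            continue                    # damaged record, data is all zeros
        j = gas_index.get((t["seconds"], t["nanoseconds"]))
        if j is None:
            continue                    # no gas detector record for this event
        detector_record = det["data"][i]
        gas_record = gas["data"][j]
        # ... process the aligned pair ...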

...

The Translator supports split scan mode. In this mode, calib cycles are written to separate hdf5 files. A master file will have external links to these separate hdf5 files. Users need only work with the master file. The master file uses the same schema as one finds without split scan mode, with one exception discussed below (the logging of filtered events). Otherwise, little modification to user code is required when working with the master file. What is required is following external links (see below for tips on this). Not all experiments use more than one calib cycle. For experiments that use one calib cycle per run, split scan mode provides no benefit. Two reasons to use split scan mode are, first, that the resulting hdf5 file from normal translation is too large, and second, to make translation faster by translating the data in parallel. Two versions of the split scan mode have been implemented. The most recent (soon to be available in ana-0.13.2 and later) is an MPI (Message Passing Interface) based version. This is the recommended way to run split scan mode. However, the previous version is still available and is documented below.

...

The MPI version of the split scan translator is implemented using MPI. It has its own driver program, called h5-mpi-translate, which is launched using MPI.

Launching Split Scan

Here is an example command for launching the MPI-based split scan Translator:

...

For online analysis with live data, one of the impediments to keeping up with the data is the time it takes h5-mpi-translate to read through the data to find the calib cycles. Typically the data is recorded in 6 or more separate files, and each must be read through to identify the start of calib cycles. Unfortunately this read speed can be 30-40 Hz for a typical experiment - far short of the 120 Hz we'd like to obtain. A recent feature added to h5-mpi-translate takes advantage of the unique signature of each new calib cycle, combined with the regular structure of the separate data files, in order to limit the reading to just one of the files. In this way, the h5-mpi-translate master rank need only get through the data it reads/searches at 20 Hz to keep up with the data. We have had good success with this feature recently, but it is not guaranteed to work - in particular, a high level of damaged data degrades the regular structure of the DAQ files. This in turn will increase the fastindex search time to the point where it is no longer useful. fastindex is a temporary solution until a more robust way to do fast/live indexing is put in place. In the meantime, starting with analysis release ana-0.13.17, the translator supports the following options to turn on fast indexing and to control how much time is spent searching the other files:

Code Block
fast_index=1                 # do the fast indexing, by default it is off
fi_mb_half_block=12          # when fast indexing is on, use 12MB on each side, or 24MB for each block that is searched
fi_num_blocks=40             # this is half the number of 'other' blocks to try. The translator will try 1 + 2*40 = 81 blocks if this is 40 (about 1GB total search)

More information can be found in the Psana Configuration File and All the Options section below.

Non-MPI Split Scan

To run the Translator in split scan mode without MPI, three options must be set. These options do the following:

  • Tell the Translator to split (option split=SplitScan)

  • Tell the Translator how many jobs to run to parallelize the work (option jobTotal=N)

  • Tell the Translator which job it is (option jobNumber=K)

Next, one must run N different Translator jobs where the jobNumber parameter varies. An example of using two jobs for translation is:

psana -m Translator.H5Output -o Translator.H5Output.output_file=mydir/split.h5 -o Translator.H5Output.split=SplitScan -o Translator.H5Output.jobNumber=0 -o Translator.H5Output.jobTotal=2 exp=xpp123:run=10
psana -m Translator.H5Output -o Translator.H5Output.output_file=mydir/split.h5 -o Translator.H5Output.split=SplitScan -o Translator.H5Output.jobNumber=1 -o Translator.H5Output.jobTotal=2 exp=xpp123:run=10

(This mode always writes one calib cycle per file).

In this mode, each Translator job reads through the entire set of xtc files. As more and more Translator jobs are run simultaneously, the overall speed of translation diminishes while the load on the network steadily increases. It is recommended that users run no more than 5 Translator jobs. Testing has shown 5 jobs provide up to a threefold speedup in translation for the non-MPI mode. (Greater speedups have been seen with the MPI version.)

Reading While Translating

HDF5 presently has little support for reading a file that is being created, and in general does not recommend this. However, the master file is written in a way to support this as well as possible. When using the MPI split scan translator, links are not added to the master file until the calib cycle files they point to are done. Thus it is always safe to traverse those links. With the non-MPI split scan translator this is true except for the last N links, where N is the number of jobs running; these links may be written before the calib cycle files they link to are finished. To see updates in the master file, users may need to shut down programs like Matlab and h5py and restart them. It may not be sufficient to close and reopen the master file within a Python or Matlab session.

Translation differences with split scan mode

The only difference users should see is if they provide modules that use the special key 'do_not_translate' to drop events from translation. Ordinarily, as discussed below, in addition to dropping the event from translation, an event id for the dropped event is recorded in an hdf5 group such as /Configure:0000/Run:0000/Filtered:0000. These groups are not created in split scan mode (the event will still be dropped).

EndData and Split Translation

The EndData feature, discussed below, provides a way for Psana user modules to translate data during beginRun and endRun. However, in split scan mode, each calib cycle is translated by a separate translator, and each separate translator will create an independent instance of any Psana modules that have been specified in the h5-mpi-translate command line. Consequently, if a Psana module outputs summary information during endRun, it will not be summary information for the whole run, just for those calib cycles translated by the Translator that loaded it. Moreover, the master file will make a link to the first EndData within Run:0000 that it finds. That is, if there are 10 different external calib cycle files, with 10 different run-level EndData groups, there will be a link to only one of them from the master file.

When working with the master file, it is necessary to follow external links. For instance, to get a recursive listing of all the groups in the output file using h5ls, one must do

h5ls -r -E master.h5

as opposed to h5ls -r file.h5. The -E option instructs h5ls to follow external links. Similar functionality should exist in h5py, Matlab, and other software that works with HDF5.
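
For example, a minimal h5py sketch that walks the master file and descends through external links might look like the following (indexing a group member dereferences the link, so the per calib cycle files it points to must be present):

Code Block
languagepython
import h5py

def list_all(group, path=""):
    # group[name] dereferences soft and external links, so groups that live
    # in the per calib cycle files are visited as well
    for name in group:
        obj = group[name]
        print(path + "/" + name)
        if isinstance(obj, h5py.Group):
            list_all(obj, path + "/" + name)

with h5py.File("master.h5", "r") as f:
    list_all(f)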

Presentations

Attached is a link to a presentation given during the LCLS 2014 users meeting. It goes over dataset alignment as well as new features: Using_the_HDF5_Translator.pdf

New Features

With psana-translate, you can

  • filter out whole events from translation
  • filter out certain data, by data type, or by data source, or key string
  • write ndarray's that other modules add to the event store or configStore
  • write std::string's that other C++ modules add to the event store or configStore
  • include summary data for calib cycles
  • advanced: have a C++ module register a new type for translation

New Features Subject to Change

Three aspects of these new features are subject to change. These are highlighted in warning boxes below. In brief, these are

  • hdf5 group names for ndarray types
  • How event key strings are incorporated into hdf5 paths
  • How new C++ types are registered for translation.

Important Changes between o2o-translate and psana-translate

The translation that psana-translate produces is almost always backward compatible with what o2o-translate produced. The only difference likely to affect users is where the CsPad calibration constants are found; this is discussed in the The XTC-to-HDF5 Translator section below. There are also a number of minor differences which should make no difference to user code written to process o2o-translate hdf5 files. These are documented in the section Differences with o2o-translate. hdf5 files created by o2o-translate or psana-translate contain an attribute defining the schema number. Below we document important changes introduced with Schema 4, as implemented in V00-01-00 and above of psana-translate. o2o-translate implemented schema versions 1, 2 and 3. These important changes are the use of CalibStore for calibration constants, and dropping PNCCD::FullFrames from translation.

Calibrated Data

o2o-translate knows how to calibrate CsPad data. If o2o-translate was told where a calib-dir was (which it is for automatic translation) and calibration constants have been recorded in this directory (typically carried out by the calibration management tool by processing a dark run), then o2o-translate calibrates cspad data and writes the calibrated data instead of the raw xtc data. It writes the calibrated data in the same place where the raw xtc would have gone. It also writes the calibration constants used (such as pedestals and pixel status) in a special group. Finally, if the common mode calibration was done (this depends on what files are deployed to the calib-dir), which is a correction calculated for each event, the source group containing all the event data will include a common_mode dataset with the common mode values. This allows users to recover the raw data from the calibrated data.

With psana-translate, calibration is handled by external psana modules. These modules produce calibrated data, and psana-translate will find it and translate it to the hdf5 file. Understanding this flow of data is not necessary for automatic translation; however, if users want to customize calibration, some understanding of how psana modules pass data through the event store, and are configured through config files, is necessary. The calibrated data is distinguished from uncalibrated data by a key in the event store. The key defaults to the value 'calibrated', but this is configurable through the psana.cfg file, in the section for the calibration module used. psana-translate provides special treatment for the calibration key. For psana-translate, the default value for the calibration key is 'calibrated' as well, but again, this is configurable through the psana.cfg file, in the section for Translator.H5Output. If psana-translate sees data with the key 'calibrated', it defaults to translating only the data with the calibrated key and not the raw data. In the hdf5 file, one will find calibrated data where one would have otherwise found uncalibrated data. This is consistent with how o2o-translate translated calibrated cspad data. The 'calibrated' key is not present in the hdf5 path names. This is different from what one finds for keys with ndarrays; for ndarrays the key is part of the h5 path name (see below). The psana-translate option skip_calibrated can be set to true to get the uncalibrated data instead of the calibrated data.

Calibration makes use of calibration constants, such as pedestals and pixel status. A key difference between psana-translate and o2o-translate is where these calibration constants are found, and the datatypes used to store them. For psana-translate they are found in the CalibStore subgroup of the current configure group. For example, suppose we translate the first event of a run of the tutorial data and add the cspad calibration module before the psana-translate module:

psana -n 1 -m cspad_mod.CsPadCalib,Translator.H5Output -o Translator.H5Output.output_file=calib.h5 exp=xpptut13:run=71

Then we will see

Code Block
h5ls -r calib.h5 | grep -i "calibstore\|cspad"  # this command will include the following output

/Configure:0000/Run:0000/CalibCycle:0000/CsPad2x2::ElementV1/XppGon.0:Cspad2x2.0/common_mode Dataset {1/Inf}
/Configure:0000/Run:0000/CalibCycle:0000/CsPad2x2::ElementV1/XppGon.0:Cspad2x2.0/data Dataset {1/Inf}
/Configure:0000/Run:0000/CalibCycle:0000/CsPad2x2::ElementV1/XppGon.0:Cspad2x2.0/element Dataset {1/Inf}
...
/Configure:0000/Run:0000/CalibCycle:0000/CsPad2x2::ElementV1/XppGon.0:Cspad2x2.1/common_mode Dataset {1/Inf}
/Configure:0000/Run:0000/CalibCycle:0000/CsPad2x2::ElementV1/XppGon.0:Cspad2x2.1/data Dataset {1/Inf}
/Configure:0000/Run:0000/CalibCycle:0000/CsPad2x2::ElementV1/XppGon.0:Cspad2x2.1/element Dataset {1/Inf}
...
/Configure:0000/CalibStore/pdscalibdata::CsPad2x2PedestalsV1/XppGon.0:Cspad2x2.0/pedestals Dataset {185, 388, 2}
/Configure:0000/CalibStore/pdscalibdata::CsPad2x2PedestalsV1/XppGon.0:Cspad2x2.1/pedestals Dataset {185, 388, 2}
/Configure:0000/CalibStore/pdscalibdata::CsPad2x2PixelStatusV1/XppGon.0:Cspad2x2.0/status Dataset {185, 388, 2}
/Configure:0000/CalibStore/pdscalibdata::CsPadCommonModeSubV1/XppGon.0:Cspad2x2.0/data Dataset {SCALAR}
...
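
To check where the constants ended up, one could read them back with h5py; below is a minimal sketch using the paths from the listing above:

Code Block
languagepython
import h5py

# Read back the pedestals the calibration module used (path taken from the
# h5ls listing above).
with h5py.File("calib.h5", "r") as f:
    peds = f["/Configure:0000/CalibStore/pdscalibdata::CsPad2x2PedestalsV1"
             "/XppGon.0:Cspad2x2.0/pedestals"][:]
    print(peds.shape)   # the listing above shows {185, 388, 2}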

...

Since psana-translate runs as a psana module, it is possible to filter translated events through psana options and other modules. psana options allow you to start at a certain event and process a certain number of events. Moreover, a user module that is loaded before the Translator module can tell psana not to pass the current event on to any other modules; hence the Translator.H5Output module will never see the event and it will not get translated.

A downside of having modules loaded before the Translator skip events is that updates to epics pv's or configuration data will not get recorded. One may also wish to record the reason for filtering the event in the hdf5 file, as well as the event id's of the filtered events. psana-translate provides an interface for doing these things. One can also filter events by putting an object in the event store with a special key string. To use this mechanism, a module must put an object in the eventStore with a key that starts with

...

Code Block
languagecpp
titlefiltering
  virtual void event(Event& evt, Env& env) {
    boost::shared_ptr<int> dummyVariable = boost::make_shared<int>();
    evt.put(dummyVariable, "do_not_translate");
  }

Then none of the event data will get translated in any of the calib cycles.  The Translator will do the following

  • For each calib cycle, it will make a filtered group.
    • For instance, if the file has the group /Configure:0000/Run:0000/CalibCycle:0000, then it will also have the group /Configure:0000/Run:0000/Filtered:0000.
  • Within each Filtered group, it will create a time dataset that holds the event id's of the filtered events (a short sketch of reading this back with h5py follows this list).
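
For example, a minimal h5py sketch for reading back the event id's of the filtered events (the file name and the 'seconds'/'nanoseconds' field names are assumptions; check the dtype of the time dataset in your own file):

Code Block
languagepython
import h5py

# Print the event ids recorded for filtered events.
with h5py.File("translated.h5", "r") as f:
    times = f["/Configure:0000/Run:0000/Filtered:0000/time"][:]
    for t in times:
        print("%d.%09d" % (t["seconds"], t["nanoseconds"]))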

Suppose a user module has made some measurements that indicate this event should be filtered (for instance, the beam energy is wrong). These measurements can be recorded in the hdf5 file by adding data to the event store that the Translator knows how to write.  As discussed below, the Translator can write ndarrays and strings as well as simple new types that user modules register. If a user module implements event() to do the following:

Code Block
languagecpp
titleC++ do not translate example - with logging
// define user Module,

  virtual void event(Event& evt, Env& env) {
    boost::shared_ptr<std::string> message = boost::make_shared<std::string>("The beam energy is bad");        
    evt.put(message,"do_not_translate:message");
    const unsigned shape[1] = {4};
    boost::shared_ptr< ndarray<float,1> > measurements = boost::make_shared< ndarray<float,1> >(shape);
    float *data = measurements->data();
    data[0] = 0.4;  data[1] = 1.3; data[2] = 2.2; data[3] = 3.1;
    evt.put(measurements,"do_not_translate:measurements");
  }

Note the use of the key "do_not_translate:xxx". The :xxx is not necessary, but it helps to uniquely qualify the event data, and it will become part of the 'src' group name where the do_not_translate event data is written.  Since both std::string and ndarray<float,1> are types that the Translator knows how to write, it will create the following groups and datasets in the hdf5 file:

  • /Configure:0000/Run:0000/Filtered:0000/time  - this is as discussed above, the event id's for all filtered events
  • /Configure:0000/Run:0000/Filtered:0000/std::string/noSrc__message/data  - this will be a dataset of variable length strings, each entry will be the string "The beam energy is bad"
  • /Configure:0000/Run:0000/Filtered:0000/std::string/noSrc__message/time  - this will be a dataset of eventId's for the data above (there need not have been a std::string in all the filtered events).
  • /Configure:0000/Run:0000/Filtered:0000/ndarray_float32_1/noSrc__measurements/data - this will be a dataset where each entry is a 1D array of 4 floats, with the values 0.4, 1.3, 2.2, 3.1
  • /Configure:0000/Run:0000/Filtered:0000/ndarray_float32_1/noSrc__measurements/time - likewise the event ids for the ndarrays of the filtered events.

Note the src level group names: noSrc__message and noSrc__measurements. Since no source was specified in the calls to evt.put, the Translator starts the group name with the string noSrc. Two underscores, __, separate the source from the keystring.

Note the fully qualified type information in the group names of the ndarrays written. This allows translation of different ndarrays in the event store that differ only by this type information (i.e., they have the same key and source).

Warning

This example illustrates the way our current hdf5 schema, schema 4, forms hdf5 paths that involve key strings for event data: source__key where the string noSrc can be used for source. This is one aspect of the new features that is subject to change. 

It also demonstrates the hdf5 group names for ndarray's, such as ndarray_float32_1. This is subject to change.

Filtering from Python Modules

A Python module can use standard psana features to skip events as discussed above. It can also add any Python object to the event store with the key "do_not_translate". This will create the Filtered:0000/time dataset as above. However, to use the Translator filtering features that record user data, the Python module will have to add data that psana knows how to convert for C++ modules. Presently the only types that a Python module can add to the event store which will be seen by C++ modules are a number of ndarray types and str. A Python module will need to add one of these ndarray types or a str to record data with filtered events.

Below is a complete example. First we make a release environment, and create a package for our Python Module that will filter events:

newrel ana-current myrel
cd myrel
newpkg mypkg
mkdir mypkg/src
sit_setup

Suppose we want to filter based on the photon count of the calibrated cspad from the CxiDs1.0:Cspad.0 source in the tutorial data. Rather than work with the quads of the cspad, we will use an image producer module so that we can work with a 2D image. Further documentation on the various calibration modules, including image producers, can be found at psana - Module Catalog. We now add the following two files:

Code Block
languagebash
titlemyrel/trans.cfg
linenumberstrue
collapsetrue
[psana]
modules = cspad_mod.CsPadCalib \
          CSPadPixCoords.CSPadImageProducer \
          mypkg.mymod \
          Translator.H5Output 
events = 10
files = exp=cxitut13:run=1150

[CSPadPixCoords.CSPadImageProducer]
source        = DetInfo(CxiDs1.0:Cspad.0)
key           = calibrated
imgkey        = image

[Translator.H5Output]
deflate=-1
shuffle=False
overwrite=True
output_file=cxitut13-run1150-filt.h5

and the file

Code Block
languagepython
titlemyrel/mypkg/src/mymod.py
linenumberstrue
collapsetrue
import psana

class mymod(object):
    def __init__(self):
        self.threshold = self.configInt('threshold',176000000)
        self.source = self.configSrc('source','DetInfo(CxiDs1.0:Cspad.0)')

    def event(self, evt, env):
        image = evt.get(psana.ndarray_int16_2, 
                        self.source, 
                        'image')
        if image is None: return
        photonCount = image[:].sum()
        if photonCount < self.threshold:
            self.skip()

After putting these files in place, and doing

scons

we can run this example by

psana -c trans.cfg

The configuration file trans.cfg sets up a module chain with 4 modules: cspad_mod.CsPadCalib, CSPadPixCoords.CSPadImageProducer, mypkg.mymod, Translator.H5Output. The first module calibrates all cspad it finds. The second module turns a cspad from a specific source into an image - placing quads and asics in the correct position based on geometry. After these two modules, we load our own module - mypkg.mymod. Finally, the Translator module runs last.

Reading through trans.cfg you will see how the raw cspad moves through the event store. The default behavior of cspad_mod.CsPadCalib is to place calibrated cspad in the Event with the key "calibrated". The CSPadPixCoords.CSPadImageProducer module has been told to look for the "calibrated" input key, for the source DetInfo(CxiDs1.0:Cspad.0) and produce an image with the key "image". 

In mymod.py, we see a class called mymod derived from object. It is important that the class name be the same as the file name. This is part of how psana finds the class. In the event() method, the module gets data of type psana.ndarray_int16_2. Identifying the correct type to use can be a challenge. Starting with code in event() that does

print evt.keys()

shows what the keys look like. After getting a valid image, a sum and simple threshold is performed.

Note the option

events = 10

in the psana section of the config file. This means psana will process only the first 10 events of the data. This is just for testing and development. One would remove the option, or set it to 0, for a full translation. With events=10, after translation, if one does

h5ls -r cxitut13-run1150-filt.h5  | grep -i "ndarray"
/Configure:0000/Run:0000/CalibCycle:0000/ndarray_const_int16_2/CxiDs1.0:Cspad.0__image/data Dataset {5/Inf}

one sees that only 5 events were translated. The other 5 were skipped.

Another point to make about this example is that the cspad is effectively getting translated twice. The Translator is going to see the event keys:

EventKey(type=psana.CsPad.DataV2, src='DetInfo(CxiDs1.0:Cspad.0)')
EventKey(type=psana.CsPad.DataV2, src='DetInfo(CxiDs1.0:Cspad.0)', key='calibrated')
EventKey(type=psana.ndarray_int16_2, src='DetInfo(CxiDs1.0:Cspad.0)', key='image')

The Translator's default behavior is to treat the key 'calibrated' as special. Since the first two keys differ only by the keystring 'calibrated', the Translator assumes the one marked 'calibrated' should replace the first in the translation. Hence it will not translate the raw cspad; it translates only the calibrated cspad. However, the Translator does not know that the ndarray with key 'image' is a copy of the cspad. If one is only going to work with the 'image' array data and not the 'calibrated' cspad data, one could add the filtering option

Cspad=exclude

to the Translator.H5Output section of the config file. Then none of the cspad data will be translated (neither the cspad configuration object nor the event data), while the 'image' arrays will still be translated.

Filtering Types

The psana.cfg file accepts a number of parameters that will filter out sets of psana types.  For example setting

EBeam = exclude

would cause any of the types Psana::Bld::BldDataEBeamV0, Psana::Bld::BldDataEBeamV1, Psana::Bld::BldDataEBeamV2, Psana::Bld::BldDataEBeamV3 or Psana::Bld::BldDataEBeamV4 to be excluded from translation.

All types are translated by default. To exclude a few types, you can add lines like EBeam = exclude to the psana.cfg file. You can also list them with the type_filter parameter:

type_filter exclude EBeam Andor

The type_filter parameter is useful for including a few types:

type_filter include CsPad Frame

A shortcut is available to turn off translation of all the Xtc data:

type_filter exclude psana

One would use this to only translate user module data, such as ndarrays, strings and newly registered types.

The complete list of type aliases that users can use to filter is found in the default_psana.cfg file included below.

Src Filtering

Specific src's can be filtered by providing a list such as

src_filter = exclude NoDetector.0:Evr.2  CxiDs1.0:Cspad.0  CxiSc2.0:Cspad2x2.1  EBeam  FEEGasDetEnergy  CxiDg2_Pim

The syntax for a src in the filter list is what is supported by the Psana::Source class. This is a flexible syntax allowing for several ways to specify a src. It will match any detector or device number if this is not specified. See the section Psana Configuration File and all Options below for more details. If DAQ src aliases are present in the xtc file, these can be used for src filtering as well.  For example, if the alias

acq01 -> SxrEndstation.0:Acqiris.0

is present, one can do

src_filter = exclude acq01

to exclude all data from the SxrEndstation.0:Acqiris.0 src.

Writing User Data

The translator will write NDarrays, C++ std::strings, and C++ types that the user registers.  Presently, registering new types is an advanced feature that requires familiarity with hdf5 programming. To add data to the translation, one must write a Psana module that adds this data into the event or configStore before the Translator.H5Output module runs.

Event vs. ConfigStore, EndData subgroups

Most user modules will add data to the event. Such data will be written into stacked datasets, that is, a 1D dataset whose type is based on the user data. In this 1D dataset, there will be one entry for each event in which the user module added data. This stacked dataset is named "data", and alongside it will be a dataset named "time". The "time" dataset holds the event id's corresponding to the Psana events from which the user module added data.

Data added to the configStore is not written out in "stacked" datasets. It is written out in "one shot". The intention is that users may add some configuration during beginrun or begincalibcycle, as well as some summary information during endcalibcycle or endrun. It is not recommended that users add data to the configStore during regular events for the purpose of translation. During each of endcalibcycle, endrun and endjob, the Translator will check for new data in the configStore(). If it finds any, it will create a subgroup called EndData in the appropriate place. For example

Code Block
/Configure:0000/Run:0000/CalibCycle:0000/EndData             # triggered by new data in the configStore() found during endcalibcycle
/Configure:0000/Run:0000/EndData                             # triggered by new data in the configStore() found during endrun
/Configure:0000/EndData                                      # triggered by new data in the configStore() found during endjob

In this way, when a user Psana module processes endcalibcycle or endrun, it can add summary data to the configStore that will be picked up by the Translator. Psana modules could also add configuration information during beginrun or beginjob.
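
As an illustration, here is a minimal h5py sketch that looks for EndData summary groups at the configure, run and calib cycle levels of a translated file (the file name "translated.h5" is hypothetical):

Code Block
languagepython
import h5py

def report(group):
    # print the names of whatever was stored under an EndData subgroup
    if "EndData" in group:
        print(group.name + "/EndData contains: " + ", ".join(group["EndData"].keys()))

with h5py.File("translated.h5", "r") as f:
    cfg = f["/Configure:0000"]
    run = cfg["Run:0000"]
    report(cfg)                          # written during endjob
    report(run)                          # written during endrun
    for name in run:
        if name.startswith("CalibCycle"):
            report(run[name])            # written during endcalibcycle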

One limitation users may run into is overwriting keys - Psana does not allow Python modules to replace ndarrays in the configStore, as C++ modules may be relying on the data to be unchanging. So, for example, if a user module is going to create new summary information for each endcalibcycle, it must use a different key for each calib cycle.

NDArrays and Strings

ndarrays (up to dimension 4, of the standard integral types, floats and doubles) as well as std::string's that are written into the event store will be written to the hdf5 file by default.  ndarrays can be passed to the Translator by Python modules as well as C++ modules. The schema for translating is to join the source and keystring with a double underscore. For instance, given a psana user module that looks like this

Code Block
languagepython
import numpy as np
import psana

class MyModule(object):
  def __init__(self):
    self.src = psana.Source("DetInfo(XppEndstation.0:Opal1000.0)")
 
  def event(self, evt, env):
    a = np.zeros(3)
    evt.put(a, "mykeyA")
    evt.put("my string", "mykeyB")
    evt.put(a, self.src, "mykeyA")
    evt.put("my string", self.src, "mykeyB")

One would get these new groups in the HDF5 file:

Code Block
/Configure:0000/Run:0000/CalibCycle:0000/ndarray_float64_1/noSrc__mykeyA/data
/Configure:0000/Run:0000/CalibCycle:0000/ndarray_float64_1/noSrc__mykeyA/time
/Configure:0000/Run:0000/CalibCycle:0000/ndarray_float64_1/XppEndstation.0:Opal1000.0__mykeyA/data
/Configure:0000/Run:0000/CalibCycle:0000/ndarray_float64_1/XppEndstation.0:Opal1000.0__mykeyA/time
/Configure:0000/Run:0000/CalibCycle:0000/std::string/noSrc__mykeyB/data
/Configure:0000/Run:0000/CalibCycle:0000/std::string/noSrc__mykeyB/time
/Configure:0000/Run:0000/CalibCycle:0000/std::string/XppEndstation.0:Opal1000.0__mykeyB/data
/Configure:0000/Run:0000/CalibCycle:0000/std::string/XppEndstation.0:Opal1000.0__mykeyB/time

Note that data put in the event store without a src specified goes under a group name that starts with "noSrc". All data gets a "stacked" dataset named data, and a time dataset.
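
For example, reading one of these user ndarrays back with h5py could look like the following sketch (the file name is hypothetical; the group path is taken from the listing above):

Code Block
languagepython
import h5py

# Read a user ndarray and the event ids it was stored with.
with h5py.File("translated.h5", "r") as f:
    grp = f["/Configure:0000/Run:0000/CalibCycle:0000/ndarray_float64_1/noSrc__mykeyA"]
    values = grp["data"][:]   # one entry per event in which the module put data
    ids = grp["time"][:]      # matching event ids, same length as 'data'
    print(values.shape)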

These events can be filtered as well.  The example in The XTC-to-HDF5 Translator section above illustrates the group names used for ndarrays and strings. Note that the type group name for ndarrays is fully qualified by the template arguments; some examples of type names are

...

exp | size(GB) | WJobs | CC/WJ | evts/CC | nodes | time | MB/sec | mread | calib | WJtime | w0 | w1 | w2 | w3 | w4 | w5 | w6 | w7 | w8 | w9 | w10
xppi0214:run=279 | 100.0 | 28 | 1.0 | 634.1 | 10 | 9.8min | 174.1 | 66%=6.5min | 1 | 2.4min | 3/78% | 4/95% | 4/90% | 3/68% | 3/69% | 3/68% | 3/67% | 3/71% | 2/70%
xpp74813:run=69 | 1.00 | 1 | 1.0 | 160.0 | 5 | 0.7min | 25.5 | 5.8%=0.0min | 0 | 0.6min | 1/93% | 0/0% | 0/0% | 0/0%
xpp72213:run=146 | 2.03 | 31 | 1.0 | 243.0 | 5 | 0.8min | 43.9 | 77%=0.6min | 0 | 0.1min | 9/72% | 5/86% | 9/70% | 8/61%
xpp72213:run=122 | 4.00 | 41 | 1.0 | 363.0 | 5 | 1.0min | 68.3 | 91%=0.9min | 0 | 0.1min | 11/85% | 11/84% | 10/81% | 9/70%
xpp65013:run=40 | 16.02 | 68 | 1.5 | 79.6 | 5 | 2.5min | 109.4 | 66%=1.7min | 0 | 0.1min | 17/88% | 18/85% | 17/85% | 16/86%
xpp61412:run=75 | 32.19 | 138 | 1.2 | 99.2 | 5 | 4.3min | 127.8 | 73%=3.1min | 0 | 0.1min | 32/89% | 35/88% | 36/89% | 35/87%
xppa1814:run=173 | 64.04 | 10 | 1.0 | 1203.0 | 6 | 18.0min | 60.7 | 66%=11.9min | 0 | 5.4min | 3/93% | 3/78% | 2/64% | 1/35% | 1/32%
xppi0214:run=325 | 127.57 | 36 | 1.0 | 629.1 | 12 | 12.0min | 181.4 | 65%=7.8min | 1 | 2.7min | 4/74% | 2/81% | 2/74% | 3/74% | 4/85% | 4/88% | 4/68% | 4/70% | 3/78% | 3/66% | 3/49%
xpp40312:run=48 | 390.51 | 1 | 1.0 | 444427.0 | 12 | 220.0min | 30.3 | 26%=57.2min | 0 | 220.0min | 1/1e+02% | 0/0% | 0/0% | 0/0% | 0/0% | 0/0% | 0/0% | 0/0% | 0/0% | 0/0% | 0/0%
xppa4513:run=173 | 478.13 | 204 | 1.0 | 483.0 | 12 | 52.0min | 156.9 | 72%=37.4min | 1 | 2.6min | 14/98% | 15/94% | 18/93% | 19/96% | 19/92% | 19/93% | 17/93% | 22/93% | 18/90% | 25/91% | 18/90%
xppc3614:run=271 | 390.25 | 125 | 1.0 | 603.0 | 12 | 100.0min | 66.6 | 87%=87.0min | 1 | 4.4min | 17/85% | 20/75% | 17/73% | 16/49% | 9/46% | 8/35% | 8/39% | 8/42% | 8/37% | 7/37% | 7/34%

Release Notes

Please see the pages under Release Notes for changes to the software. We document a few items of particular importance for the Translator here as well.

  • ana-0.13.17, Translator Tag V00-02-15
    • deprecated feature: drop the non-MPI split scan translator
    • deprecated feature: filtered groups - the groups created in response to "do_not_translate", which kept track of which events were filtered by this mechanism, are no longer created
    • EndData added
    • Schema changed from 4 to 5
      • signals the potential presence of EndData
      • reserves the future existence of a /usr/ group under the root group
      • removal of the Filtered:0000x groups
      • external calib cycle files have the same schema as the main file