Introduction

One of the responsibilities of the offline system is to translate the data coming from the online system into the "scientific format". The scientific data format is based on HDF5 (Hierarchical Data Format), developed and supported by The HDF Group.

...

The names of the groups in HDF5 are derived from the names of the corresponding transitions and the object types. Because there may be multiple nested transitions of the same type inside one parent transition, these transitions must be distinguished and given different names. Our current approach for groups derived from transitions is to extend the transition name, such as CalibCycle, with the sequential number of the transition of that type, for example Run:0000 or CalibCycle:0004. If there is only one transition of a specific type in the XTC file, there will be an option to enable "simplified" group names, in which case group names will look like Configure, Run, etc. Here is an example of the group structure for a fictional data file containing data from several runs, each run containing a few calibration cycles:

Code Block

/Configure:0000
    |
    +- Run:0000
    |    |
    |    +- CalibCycle:0000
    |    +- CalibCycle:0001
    |
    +- Run:0001
         |
         +- CalibCycle:0000
         +- CalibCycle:0001
         +- CalibCycle:0002
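
The naming rule described above can be sketched in a few lines of Python. This is only an illustration, not the translator's code: the helper names `transition_group_name` and `calib_cycle_path` are hypothetical, and the four-digit zero padding is an assumption inferred from examples such as Run:0000 and CalibCycle:0004.

```python
def transition_group_name(transition: str, index: int, simplified: bool = False) -> str:
    """Return a group name for the index-th transition of the given type.

    Hypothetical helper: the 4-digit zero padding is inferred from the
    examples in the text (Run:0000, CalibCycle:0004).
    """
    if index < 0:
        raise ValueError("transition index must be non-negative")
    if simplified:
        # A simplified name carries no number, so it is only unambiguous
        # when there is a single transition of this type.
        if index != 0:
            raise ValueError("simplified names require a single transition")
        return transition
    return f"{transition}:{index:04d}"


def calib_cycle_path(run: int, calib_cycle: int, configure: int = 0) -> str:
    """Build the full group path of one calibration cycle (hypothetical helper)."""
    parts = [
        transition_group_name("Configure", configure),
        transition_group_name("Run", run),
        transition_group_name("CalibCycle", calib_cycle),
    ]
    return "/" + "/".join(parts)


print(calib_cycle_path(run=1, calib_cycle=2))
# prints /Configure:0000/Run:0001/CalibCycle:0002
```

The printed path corresponds to the last calibration cycle of the second run in the tree above.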

...

Here is an example showing the group structure, including data groups, for a few devices (BLD EBeam and a couple of CsPad devices). It does not show any datasets yet:

Code Block

/Configure:0000/
/Configure:0000/Bld::BldDataEBeamV3/
/Configure:0000/Bld::BldDataEBeamV3/EBeam/
/Configure:0000/CsPad::ConfigV4/
/Configure:0000/CsPad::ConfigV4/CxiDs1.0:Cspad.0/
/Configure:0000/CsPad::ConfigV4/CxiDsd.0:Cspad.0/
/Configure:0000/Run:0000/
/Configure:0000/Run:0000/CalibCycle:0000/
/Configure:0000/Run:0000/CalibCycle:0000/Bld::BldDataEBeamV3/
/Configure:0000/Run:0000/CalibCycle:0000/Bld::BldDataEBeamV3/EBeam/
/Configure:0000/Run:0000/CalibCycle:0000/CsPad::ElementV2/
/Configure:0000/Run:0000/CalibCycle:0000/CsPad::ElementV2/CxiDs1.0:Cspad.0/
/Configure:0000/Run:0000/CalibCycle:0000/CsPad::ElementV2/CxiDsd.0:Cspad.0/
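
When working with such paths in analysis code it can help to split them back into their parts. The sketch below assumes the layout visible in the listing above (transition groups first, then a data type group, then a source group); the `DataGroup` type and `split_group_path` function are hypothetical names introduced for illustration only.

```python
from typing import NamedTuple, Optional, Tuple


class DataGroup(NamedTuple):
    """Parts of a group path (hypothetical type, for illustration)."""
    transitions: Tuple[str, ...]   # e.g. ("Configure:0000", "Run:0000")
    type_name: Optional[str]       # e.g. "CsPad::ElementV2"
    source: Optional[str]          # e.g. "CxiDs1.0:Cspad.0"


# Transition names that appear in the listings of this page.
TRANSITIONS = ("Configure", "Run", "CalibCycle")


def split_group_path(path: str) -> DataGroup:
    """Split a group path into transition, type and source components.

    Assumes the layout shown in the listing: leading transition groups,
    followed by an optional type group and an optional source group.
    """
    parts = [p for p in path.split("/") if p]
    n = 0
    # Works for numbered ("Run:0000") and simplified ("Run") names alike.
    while n < len(parts) and parts[n].split(":", 1)[0] in TRANSITIONS:
        n += 1
    rest = parts[n:]
    type_name = rest[0] if len(rest) > 0 else None
    source = rest[1] if len(rest) > 1 else None
    return DataGroup(tuple(parts[:n]), type_name, source)


group = split_group_path(
    "/Configure:0000/Run:0000/CalibCycle:0000/CsPad::ElementV2/CxiDs1.0:Cspad.0/"
)
print(group.type_name, group.source)
# prints CsPad::ElementV2 CxiDs1.0:Cspad.0
```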

...

Standard XTC information (such as the data source) is not sufficient to identify individual PVs; instead, the identification information is stored inside the data itself. To simplify access to EPICS data, we store it in HDF5 by splitting the data into multiple additional groups based on PV names. The group structure for EPICS data has one additional group level below the device group; the names of these groups are the names of the PVs. Here is an example of a few EPICS groups:

Code Block

/Configure:0000/Run:0000/CalibCycle:0000/Epics::EpicsPv/EpicsArch.0:NoDevice.0/CXI:DG1:CLZ:01.RBV/
/Configure:0000/Run:0000/CalibCycle:0000/Epics::EpicsPv/EpicsArch.0:NoDevice.0/CXI:DG1:JAWS:XTRANS.C/
/Configure:0000/Run:0000/CalibCycle:0000/Epics::EpicsPv/EpicsArch.0:NoDevice.0/EVNT:SYS0:1:LCLSBEAMRATE/
/Configure:0000/Run:0000/CalibCycle:0000/Epics::EpicsPv/EpicsArch.0:NoDevice.0/GATT:FEE1:310:R_ACT/

Additionally, the DAQ defines a set of aliases for EPICS PV names, which provide easy-to-remember, meaningful names for PVs. These aliases are represented in HDF5 by symbolic links: an alias for a PV becomes a symbolic link inside the Epics group that points to the corresponding PV group name:

Code Block

"/Configure:0000/Run:0000/CalibCycle:0000/Epics::EpicsPv/EpicsArch.0:NoDevice.0/Gas detector 1 pressure" -> "VGBA:FEE1:240:P"
"/Configure:0000/Run:0000/CalibCycle:0000/Epics::EpicsPv/EpicsArch.0:NoDevice.0/KB MOTIONS" -> "CXI:KB1:MMS:05.RBV"
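
The link targets in this listing are relative names. In HDF5 a relative soft-link target is resolved against the group that contains the link, so an alias ends up pointing at a sibling PV group. A minimal sketch of that resolution, using only path arithmetic and a hypothetical `resolve_alias` helper:

```python
import posixpath


def resolve_alias(link_path: str, target: str) -> str:
    """Return the absolute path an alias link points to.

    Hypothetical helper: a relative target is resolved against the group
    holding the link; an absolute target is returned unchanged.
    """
    parent = posixpath.dirname(link_path)
    return posixpath.normpath(posixpath.join(parent, target))


epics_group = (
    "/Configure:0000/Run:0000/CalibCycle:0000"
    "/Epics::EpicsPv/EpicsArch.0:NoDevice.0"
)
print(resolve_alias(epics_group + "/KB MOTIONS", "CXI:KB1:MMS:05.RBV"))
```

Libraries that follow soft links when opening objects should make this step unnecessary in practice; the sketch only shows where the alias leads.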

...

Non-split configuration-type data is usually stored as a scalar dataset named "config". If a configuration type is split into multiple datasets, the types of the individual datasets may vary; for example, a piece of data that is an array is stored as an array dataset. Splitting can also speed up access: camera data can be split into two datasets, one containing a small piece of general information about the image and another containing the image itself, so an analysis that only needs the first dataset can read it significantly faster than if the datasets were not split. Here is an example of dataset definitions (output from h5ls) for two configuration data types, one not split and the other split:

Code Block

# non-split data type stored as scalar dataset "config"
/Configure:0000/CsPad::ConfigV5/CxiDs1.0:Cspad.0 Group
/Configure:0000/CsPad::ConfigV5/CxiDs1.0:Cspad.0/config Dataset {SCALAR}

# split data type stored as two scalar datasets and 3 array (rank=1) datasets
/Configure:0000/EvrData::ConfigV7/NoDetector.0:Evr.0 Group
/Configure:0000/EvrData::ConfigV7/NoDetector.0:Evr.0/config Dataset {SCALAR}
/Configure:0000/EvrData::ConfigV7/NoDetector.0:Evr.0/eventcodes Dataset {2}
/Configure:0000/EvrData::ConfigV7/NoDetector.0:Evr.0/output_maps Dataset {9}
/Configure:0000/EvrData::ConfigV7/NoDetector.0:Evr.0/pulses Dataset {3}
/Configure:0000/EvrData::ConfigV7/NoDetector.0:Evr.0/seq_config Dataset {SCALAR}
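
A listing like this can be checked mechanically. The sketch below parses h5ls-style lines in the format shown above and separates scalar datasets from array datasets; the function name `parse_h5ls_datasets` is hypothetical, and the parser assumes exactly the `path Dataset {shape}` layout of this listing.

```python
import re
from typing import Dict, Tuple

# Matches lines such as "/a/b/config Dataset {SCALAR}" or "/a/b/pulses Dataset {3}".
_DATASET_RE = re.compile(r"^(?P<path>\S.*?)\s+Dataset\s+\{(?P<shape>[^}]*)\}\s*$")


def parse_h5ls_datasets(text: str) -> Dict[str, Tuple[int, ...]]:
    """Map each dataset path to its shape; scalar datasets map to ()."""
    shapes: Dict[str, Tuple[int, ...]] = {}
    for line in text.splitlines():
        match = _DATASET_RE.match(line.strip())
        if match is None:
            continue  # comment lines, blank lines and Group entries
        shape = match.group("shape").strip()
        if shape == "SCALAR":
            dims: Tuple[int, ...] = ()
        else:
            # Keep only the current size of each dimension ("9310/Inf" -> 9310).
            dims = tuple(int(d.split("/")[0]) for d in shape.split(","))
        shapes[match.group("path")] = dims
    return shapes


listing = """\
/Configure:0000/EvrData::ConfigV7/NoDetector.0:Evr.0 Group
/Configure:0000/EvrData::ConfigV7/NoDetector.0:Evr.0/config Dataset {SCALAR}
/Configure:0000/EvrData::ConfigV7/NoDetector.0:Evr.0/eventcodes Dataset {2}
/Configure:0000/EvrData::ConfigV7/NoDetector.0:Evr.0/output_maps Dataset {9}
/Configure:0000/EvrData::ConfigV7/NoDetector.0:Evr.0/pulses Dataset {3}
/Configure:0000/EvrData::ConfigV7/NoDetector.0:Evr.0/seq_config Dataset {SCALAR}
"""
shapes = parse_h5ls_datasets(listing)
scalars = [p for p, s in shapes.items() if s == ()]
arrays = [p for p, s in shapes.items() if s != ()]
print(len(scalars), "scalar datasets,", len(arrays), "array datasets")
# prints 2 scalar datasets, 3 array datasets
```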

...

Here is an example of dataset definitions (output from h5ls) for event data types, one is not split, another is split:

Code Block

# non-split event type has only one "data" dataset (plus three special datasets)
/Configure:0000/Run:0000/CalibCycle:0000/EvrData::DataV3/NoDetector.0:Evr.0 Group
/Configure:0000/Run:0000/CalibCycle:0000/EvrData::DataV3/NoDetector.0:Evr.0/_damage Dataset {9310/Inf}
/Configure:0000/Run:0000/CalibCycle:0000/EvrData::DataV3/NoDetector.0:Evr.0/_mask Dataset {9310/Inf}
/Configure:0000/Run:0000/CalibCycle:0000/EvrData::DataV3/NoDetector.0:Evr.0/data Dataset {9310/Inf}
/Configure:0000/Run:0000/CalibCycle:0000/EvrData::DataV3/NoDetector.0:Evr.0/time Dataset {9310/Inf}

# split event type has "data" and "element" datasets (plus three special datasets)
/Configure:0000/Run:0000/CalibCycle:0000/CsPad::ElementV2/CxiDs1.0:Cspad.0 Group
/Configure:0000/Run:0000/CalibCycle:0000/CsPad::ElementV2/CxiDs1.0:Cspad.0/_damage Dataset {9310/Inf}
/Configure:0000/Run:0000/CalibCycle:0000/CsPad::ElementV2/CxiDs1.0:Cspad.0/_mask Dataset {9310/Inf}
/Configure:0000/Run:0000/CalibCycle:0000/CsPad::ElementV2/CxiDs1.0:Cspad.0/data Dataset {9310/Inf}
/Configure:0000/Run:0000/CalibCycle:0000/CsPad::ElementV2/CxiDs1.0:Cspad.0/element Dataset {9310/Inf}
/Configure:0000/Run:0000/CalibCycle:0000/CsPad::ElementV2/CxiDs1.0:Cspad.0/time Dataset {9310/Inf}
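
In this listing every dataset of a group reports the same current size (9310), which suggests that entry i of each dataset belongs to the same event. Assuming that holds, a quick consistency check over h5ls output might look like the sketch below; `event_counts` and `misaligned_groups` are hypothetical helper names, and only one-dimensional datasets of the form `{N/Inf}` are considered.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Matches one-dimensional extendible datasets such as ".../time Dataset {9310/Inf}".
_EVENT_RE = re.compile(
    r"^(?P<group>.+)/(?P<name>[^/\s]+)\s+Dataset\s+\{(?P<size>\d+)/Inf\}$"
)


def event_counts(h5ls_text: str) -> Dict[str, Dict[str, int]]:
    """Collect the current number of entries per dataset, keyed by group path."""
    counts: Dict[str, Dict[str, int]] = defaultdict(dict)
    for line in h5ls_text.splitlines():
        match = _EVENT_RE.match(line.strip())
        if match:
            counts[match.group("group")][match.group("name")] = int(match.group("size"))
    return dict(counts)


def misaligned_groups(counts: Dict[str, Dict[str, int]]) -> List[str]:
    """Return groups whose datasets do not all have the same number of entries."""
    return [group for group, sizes in counts.items() if len(set(sizes.values())) > 1]


listing = """\
/Configure:0000/Run:0000/CalibCycle:0000/EvrData::DataV3/NoDetector.0:Evr.0 Group
/Configure:0000/Run:0000/CalibCycle:0000/EvrData::DataV3/NoDetector.0:Evr.0/_damage Dataset {9310/Inf}
/Configure:0000/Run:0000/CalibCycle:0000/EvrData::DataV3/NoDetector.0:Evr.0/_mask Dataset {9310/Inf}
/Configure:0000/Run:0000/CalibCycle:0000/EvrData::DataV3/NoDetector.0:Evr.0/data Dataset {9310/Inf}
/Configure:0000/Run:0000/CalibCycle:0000/EvrData::DataV3/NoDetector.0:Evr.0/time Dataset {9310/Inf}
"""
counts = event_counts(listing)
print(misaligned_groups(counts))
# prints []
```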

...

There are complications in the dataset structure due to that. First, because true event-type data from BLD can appear inside the Configure transition, there will be corresponding event-type datasets (array datasets) in the groups inside the /Configure:0000 group. Here is an example of BLD EBeam datasets inside /Configure:

Code Block

/Configure:0000/Bld::BldDataEBeamV3 Group
/Configure:0000/Bld::BldDataEBeamV3/EBeam Group
/Configure:0000/Bld::BldDataEBeamV3/EBeam/_damage Dataset {3/Inf}
/Configure:0000/Bld::BldDataEBeamV3/EBeam/_mask Dataset {3/Inf}
/Configure:0000/Bld::BldDataEBeamV3/EBeam/data Dataset {3/Inf}
/Configure:0000/Bld::BldDataEBeamV3/EBeam/time Dataset {3/Inf}

...

  • BLD configuration data from the Configure transition is stored in scalar datasets instead of event-type array datasets in the /Configure:0000 group.
  • The translator now processes only one Configure transition instead of one per stream; as a result, the number of entries in event-type datasets in /Configure:0000 groups should be 1.

Schema version 4

This version was introduced around February 2014. It introduced some new features and changes discussed on The XTC-to-HDF5 Translator page. In particular:

  • CsPad calibration constants have moved
  • Some File attributes are not stored
  • A few integer types changed size
  • All EPICS PVs are stored in the source folder EpicsArch.0:NoDevice.0. Previously they could appear in several folders. Because EPICS PV names uniquely identify the data, the source information should not be needed.
  • OutOfOrder damage is by default no longer translated.
  • Creation of a Filtered:000x group under the run when users filter events using the "do_not_translate" key.
  • For split calib cycle translation, the schema for split calib files started at CalibCycle:000x rather than /Configure:0000.

Schema version 5

This version was introduced around February 2015 with Translator Tag V00-02-15 and ana-release 0.13.17.

  • Removed the Filtered:000x groups introduced in schema 4.
  • Changed the schema for split files; it now looks the same as the schema for the main/master file.
  • The translator will look for user data from Psana modules during begin/end job/run/calibcycle. Data found during the end* methods of a Psana module can trigger the creation of an EndData subgroup under the CalibCycle:000x, Run:000x or Configure:000x groups.
  • We would like to reserve the creation of new groups at the root for a simpler alternative schema that is source-based rather than type-based. Something like
    • /Configure:0000
    • /usr
    This would be in addition to the existing schema, providing a simpler hierarchy for the data.