Introduction

This document is for developing a new schema for LCLS hdf5 files, the most important feature of which is a src/type rather than type/src hierarchy. Here is some text from the parent issue (PSAS-101):

* Our hierarchy requires users to go through a potentially long list of Type names before they get to the data.
**    These typenames come from the C++ code and can be complicated.
**    Users are generally more familiar with sources, and in particular with the DAQ aliases for the sources. These are currently not in our hierarchy.
* CalibCycle is not intuitive, and is misleading for XPP because of when calibration data is created. They would like to call CalibCycles steps.

Initially, this schema could sit alongside the current schema and use soft links to the actual data. This would not break anybody's code. However, I am interested in developing something robust enough, and that people are happy enough with, that it could replace the previous schema. Hence the schema below should be readable by frameworks as well as by users browsing hdf5 files.

This document is just about changing the group hierarchy and names for data. Issues like aligning data are orthogonal to this issue and not covered here.

Current Schema

Here is an example of the current schema

*** DAQ configure
/Configure:0000         
/Configure:0000/Alias::ConfigV1/Control   
/Configure:0000/Bld::BldDataEBeamV7/EBeam 
/Configure:0000/TimeTool::ConfigV2/XppEndstation.0:Opal1000.1 
/Configure:0000/Camera::FrameFexConfigV1/XppEndstation.0:Opal1000.1 
/Configure:0000/ControlData::ConfigV3/Control 
/Configure:0000/CsPad2x2::ConfigV2/XppGon.0:Cspad2x2.0 
/Configure:0000/CsPad::ConfigV5/XppGon.0:Cspad.0 
/Configure:0000/Epics::ConfigV1/EpicsArch.0:NoDevice.0 
/Configure:0000/Epics::EpicsPv/EpicsArch.0:NoDevice.0 
/Configure:0000/Epics::EpicsPv/EpicsArch.0:NoDevice.0/Attenuator_transmission {Soft Link}
/Configure:0000/Epics::EpicsPv/EpicsArch.0:NoDevice.0/BEAM:LCLS:ELEC:Q 
/Configure:0000/EvrData::ConfigV7/NoDetector.0:Evr.0 
/Configure:0000/EvrData::ConfigV7/NoDetector.0:Evr.1 
/Configure:0000/EvrData::IOConfigV2/Control 
/Configure:0000/Ipimb::ConfigV2/NH2-SB1-IPM-01 
/Configure:0000/Ipimb::ConfigV2/XppEnds_Ipm0 
/Configure:0000/L3T::ConfigV1/Event 
/Configure:0000/Lusi::IpmFexConfigV2/NH2-SB1-IPM-01 
/Configure:0000/Lusi::IpmFexConfigV2/XppEnds_Ipm0 
/Configure:0000/Opal1k::ConfigV1/XppEndstation.0:Opal1000.1 
/Configure:0000/Partition::ConfigV1/Control 
*** Run/CalibCycle
/Configure:0000/Run:0000 
/Configure:0000/Run:0000/CalibCycle:0000 
/Configure:0000/Run:0000/CalibCycle:0000/Bld::BldDataEBeamV7/EBeam 
/Configure:0000/Run:0000/CalibCycle:0000/Camera::FrameV1/XppEndstation.0:Opal1000.1 
/Configure:0000/Run:0000/CalibCycle:0000/ControlData::ConfigV3/Control 
/Configure:0000/Run:0000/CalibCycle:0000/CsPad2x2::ElementV1/XppGon.0:Cspad2x2.0 
/Configure:0000/Run:0000/CalibCycle:0000/CsPad::ElementV2/XppGon.0:Cspad.0 
/Configure:0000/Run:0000/CalibCycle:0000/Epics::EpicsPv/EpicsArch.0:NoDevice.0/Attenuator_transmission {Soft Link}
/Configure:0000/Run:0000/CalibCycle:0000/Epics::EpicsPv/EpicsArch.0:NoDevice.0/BEAM:LCLS:ELEC:Q 
/Configure:0000/Run:0000/CalibCycle:0000/EvrData::ConfigV7/NoDetector.0:Evr.0 
/Configure:0000/Run:0000/CalibCycle:0000/EvrData::ConfigV7/NoDetector.0:Evr.1 
/Configure:0000/Run:0000/CalibCycle:0000/EvrData::DataV4/NoDetector.0:Evr.0 
/Configure:0000/Run:0000/CalibCycle:0000/Ipimb::DataV2/NH2-SB1-IPM-01 
/Configure:0000/Run:0000/CalibCycle:0000/Ipimb::DataV2/XppEnds_Ipm0 
/Configure:0000/Run:0000/CalibCycle:0000/L3T::DataV2/Event 
/Configure:0000/Run:0000/CalibCycle:0000/Lusi::IpmFexV1/NH2-SB1-IPM-01 
/Configure:0000/Run:0000/CalibCycle:0000/Lusi::IpmFexV1/XppEnds_Ipm0 
*** CalibStore
/Configure:0000/CalibStore 
/Configure:0000/CalibStore/pdscalibdata::CsPad2x2PedestalsV1/XppGon.0:Cspad2x2.0
/Configure:0000/CalibStore/pdscalibdata::CsPadPedestalsV1/XppGon.0:Cspad.0 

List of Schema Changes

Here are changes to make:

  1. We don't need to number Configure:0000 and Run:0000
    1. We only translate one run per file.
    2. Let's make a root group called Data. Main groups:

      /Data

      /Data/Run

  2. Make new group for Configure Data
    1. Now, we have

      /Configure:0000/TypeA

      /Configure:0000/TypeB

      /Configure:0000/Run:0000

      Why not put all Config data (TypeA, TypeB) in one place?

      /Data

      /Data/Config   this in turn will have TypeA and TypeB as children

      /Data/Run

  3. Put Epics in its own group and remove the Epics source name (the one containing Arch) from the schema

    1. /Data/EpicsConfig

      /Data/Config

      /Data/Run
      /Data/Run/Step:0000/Epics/pvName

  4. Invert Type/Src relationship

    1. /Data/Config/SrcA/TypeA

      /Data/Config/SrcA/TypeB

      /Data/Config/SrcB/TypeA

  5. Use DAQ aliases when possible.

    /Data/Config/alias  

    /Data/Config/alias/TypeA

    /Data/Config/alias/TypeB

    /Data/Config/SrcB/TypeA

  6. Translator option for an alias.
    Sometimes inverting type/src makes it harder to find data. An example is timetool data. It will be attached to a source like opal_1, but the user may not know this. They will be looking for timetool data, and now they have to go through all the sources, all the ipimb's, etc. The idea is to let the user specify a single group alias name for a src/type combination. So, while the hdf5 file has
    /Data/Run/Step:0000/
    /Data/Run/Step:0000/opal_1/TimeToolData
    we'll also create
    /Data/Run/Step:0000/TimeToolData       {Soft Link to}   /Data/Run/Step:0000/opal_1/TimeToolData

  7. Use type aliases in place of full C++ type names with version

    1. Basically, we will eliminate the V* and the :: from the typenames

    2. see section below for all aliases

  8. Use Step:000x rather than CalibCycle:000x

  9. Separate config and epics in steps

    1. /Data/Run/Step:0000/Config

      /Data/Run/Step:0000/Epics

  10. Psana Module Keystrings treated like types

    1. If a module does

      evt.put(myndarray,src,"mykey")

      then we translate

      /Data/Run/Step:0000/src/mykey
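
As a rough illustration of items 1, 2, 4, 5 and 8, here is a minimal Python sketch that rewrites a current-schema group path into the proposed layout. The function name new_schema_path and the daq_aliases mapping are hypothetical and not part of the Translator. Type names are passed through unchanged, and Epics, CalibStore and the per-step Config group are deliberately left out.

```python
def new_schema_path(old_path, daq_aliases=None):
    """Map a current-schema type/src group path to the proposed src/type path.

    daq_aliases is a hypothetical dict from full source name to DAQ alias.
    Only plain .../Type/Src groups are handled in this sketch.
    """
    daq_aliases = daq_aliases or {}
    parts = [p for p in old_path.split("/") if p]

    # Item 1: the numbered Configure and Run groups collapse into /Data.
    if not parts or not parts[0].startswith("Configure:"):
        raise ValueError("not a current-schema path: %r" % old_path)
    parts = parts[1:]
    prefix = ["Data", "Config"]  # item 2: configure-level data lives here
    if parts and parts[0].startswith("Run:"):
        parts = parts[1:]
        prefix = ["Data", "Run"]
        if parts and parts[0].startswith("CalibCycle:"):
            # Item 8: CalibCycle:000x becomes Step:000x.
            prefix.append("Step:" + parts[0].split(":", 1)[1])
            parts = parts[1:]

    # Epics, CalibStore and bare container groups are out of scope here.
    if (len(parts) != 2 or parts[0] == "CalibStore"
            or parts[0].startswith("Epics::")):
        raise NotImplementedError("only plain type/src groups are handled")

    type_name, src = parts
    # Items 4 and 5: source first, using the DAQ alias when one exists.
    return "/" + "/".join(prefix + [daq_aliases.get(src, src), type_name])
```

With no aliases, /Configure:0000/CsPad::ConfigV5/XppGon.0:Cspad.0 maps to /Data/Config/XppGon.0:Cspad.0/CsPad::ConfigV5. Assuming, purely for illustration, that evr0 is the DAQ alias for NoDetector.0:Evr.0, the path /Configure:0000/Run:0000/CalibCycle:0000/EvrData::DataV4/NoDetector.0:Evr.0 maps to /Data/Run/Step:0000/evr0/EvrData::DataV4.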

List of Type Aliases

Three classes of types are in use: DAQ types, CalibStore types, and user types from the event store.

DAQ

Below is a list of Type Aliases for DAQ types. For the most part, the alias is formed by removing the version and the ::. A few exceptions are marked with a -

...

Note, the shared types should not show up in the translation. Psana breaks them up.
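
The general rule (drop the version and the ::) can be sketched in a few lines of Python. This is only an illustration of the rule, not the Translator's implementation. The function name daq_type_alias and the exceptions argument are hypothetical, and the exceptions table is left empty because the actual exceptions are not reproduced here.

```python
import re


def daq_type_alias(type_name, exceptions=None):
    """Return the alias for a C++ type name such as 'CsPad::ElementV2'.

    exceptions is a hypothetical dict of hand-picked aliases that override
    the general rule.
    """
    if exceptions and type_name in exceptions:
        return exceptions[type_name]
    # Drop a trailing version suffix like V2 or V10, then drop the '::'.
    without_version = re.sub(r"V\d+$", "", type_name)
    return without_version.replace("::", "")


for name in ["CsPad::ElementV2", "Ipimb::DataV2", "Lusi::IpmFexConfigV2"]:
    print(name, "->", daq_type_alias(name))
```

Under this rule CsPad::ElementV2 becomes CsPadElement, Ipimb::DataV2 becomes IpimbData, and Lusi::IpmFexConfigV2 becomes LusiIpmFexConfig. Note that two versions of the same type map to the same alias, which is where the group name collisions discussed later come from.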

Calib Store

We also need to introduce simpler type names for the calibStore types:

...

This is not a complete list, because calib store types are not in the DDL.

User Types

This refers to types the Translator finds in the Event that other Psana modules place there. Per the schema change "Psana Module Keystrings treated like types", we will not be using a type, just the key string, so no alias is required. If for some reason a user adds an ndarray or string to the event without a keystring, then we'll use the aliases below:

ndarray       will be an alias for all of these: ndarray<T,R>, ndarray<const T,R> as well as the special vlen versions of these ndarrays that the Translator understands
string          std::string
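
A minimal sketch of that naming rule in Python, assuming the Step:000x and src/key layout from the list of schema changes. The function user_data_path is hypothetical, and the object is recognized as an ndarray only by its class name so that the sketch has no dependencies.

```python
def user_data_path(step, src, obj, key=""):
    """Pick the group path for user data placed in the event.

    The key string is used as the group name when present. Otherwise the
    fallback aliases 'string' and 'ndarray' are used.
    """
    if key:
        leaf = key
    elif isinstance(obj, str):
        leaf = "string"
    elif type(obj).__name__ == "ndarray":
        leaf = "ndarray"
    else:
        raise TypeError("no alias for %s" % type(obj).__name__)
    return "/Data/Run/Step:%04d/%s/%s" % (step, src, leaf)
```

For example, user_data_path(0, "src", data, "mykey") gives /Data/Run/Step:0000/src/mykey whatever the type of data, while a string stored without a key lands in /Data/Run/Step:0000/src/string.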

Alternatives

Here are a few alternatives I have been considering.

Remove the Types

Removing the types is complicated because there can be several types associated with one source. If one puts all the datasets associated with the different types into one group, datasets with the same name collide (standard dataset names such as 'data', 'config' or 'image'). Moreover, the different types may have different _damage or _mask datasets. More important for users, the types may have different time datasets, which affects alignment. However, another project is to align the 'DAQ readout groups', which means all types from each source will be aligned.

Smaller and fewer Type Aliases

Just use one alias for both config and data. For example:

...

The drawback is a higher risk of a name collision (see problems below). For instance if there is both config and regular event data occurring during the event, then the Translator will try to put them in the same group. When it fails, it will have to make a messy name to distinguish them.

A new Group for EventData

Since a Config group seems like a good idea, it seems natural to group the non-config data as well:

/Data/Run/Step:0000/Config/UsdUsbConfig        # the config data
/Data/Run/Step:0000/EventData/UsdUsbData     # the event data

Just starting with a Run group

Trying to have the hierarchy start here

...

The drawback is that this is not how the xtc data is formed. In xtc files, the beginRun transition is preceded by the Configure transition. Collapsing the information from both transitions into a Run group is probably reasonable, but it makes it more awkward to recover information that belonged to one xtc transition and not the other. Users generally don't care about this, but it is important when psana reads hdf5.

Problems/Issues/Surprises

Group Name Collisions

A group name collision occurs when the Translator has already made a group for one kind of data, when all of a sudden another kind of data comes along with the same name.

...

  1. Rename the first. Then the user gets the original messy names CsPad::ElementV1 and CsPadElementV2. However, if reading while writing ever works, this seems very problematic. What if you started reading from a group that then got renamed? I don't like it.
  2. Make a new name for the second, such as CsPadElement_01. This seems more reasonable.
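
The second option could be sketched as follows in Python. The function group_name_for and the assigned dict are hypothetical. The _01 suffix format follows the CsPadElement_01 example above, and the first group is never renamed.

```python
def group_name_for(full_type, alias, assigned):
    """Return the group name for full_type, adding a suffix on collision.

    assigned maps group name -> full C++ type name and is updated in place.
    """
    # Reuse the name already given to this exact type, if any.
    for name, owner in assigned.items():
        if owner == full_type:
            return name
    # Otherwise take the alias, or alias_01, alias_02, ... if it is taken.
    name, n = alias, 0
    while name in assigned:
        n += 1
        name = "%s_%02d" % (alias, n)
    assigned[name] = full_type
    return name


assigned = {}
print(group_name_for("CsPad::ElementV1", "CsPadElement", assigned))
print(group_name_for("CsPad::ElementV2", "CsPadElement", assigned))
```

The first call prints CsPadElement and the second prints CsPadElement_01. Because the mapping from group name to full type is kept, the same information is available to write into the group attributes.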

Document Attributes

With corner cases like that, users may find they need to know exactly what type they are dealing with. This will be stored in hdf5 attributes on the groups.

Programmatic Interface

The programmatic interface to the new schema is more difficult if you do not use the exact information in the attributes, that is, if you base your code only on the group names. Some issues:

...

So I think it is important to have the full source and typename available in the group attributes.
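
To show why the attributes help, here is a small Python sketch that selects groups by their full source and type rather than by group name. It works on a plain dict from group path to attributes so that it stays self-contained. With h5py the same values would be read from each group's attrs. The attribute names "src" and "type", the function find_groups, and the example paths are all assumptions made for illustration.

```python
def find_groups(group_attrs, type_name=None, src=None):
    """Return sorted group paths whose attributes match the given values.

    group_attrs maps group path -> dict of attributes. The attribute
    names 'type' and 'src' are hypothetical.
    """
    matches = []
    for path, attrs in sorted(group_attrs.items()):
        if type_name is not None and attrs.get("type") != type_name:
            continue
        if src is not None and attrs.get("src") != src:
            continue
        matches.append(path)
    return matches


# Hypothetical contents of one step, including a renamed collision group.
example = {
    "/Data/Run/Step:0000/cspad/CsPadElement":
        {"src": "XppGon.0:Cspad.0", "type": "CsPad::ElementV1"},
    "/Data/Run/Step:0000/cspad/CsPadElement_01":
        {"src": "XppGon.0:Cspad.0", "type": "CsPad::ElementV2"},
}
print(find_groups(example, type_name="CsPad::ElementV2"))
```

Code written this way keeps working whether or not a DAQ alias was used for the source, and whether or not a collision forced a suffix onto the group name.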

Could DAQ Aliases be Confusing?

For example, if there are mistakes in the aliases, or if just one of several similar sources is aliased, then a user browsing the hdf5 file would see

/Data/Config/evr0/Evr                            daq alias
/Data/Config/NoDetector.0:Evr.1/Evr      no alias

Is Src/Type Ever More Confusing than Type/Src?

If DAQ aliases are not used for all sources, there can be a number of technical looking source names that show up. For instance

NH2-SB1-IPM-01

New Schema

This is what the new schema might look like.

...