This document is for developing a new schema for LCLS hdf5 files. The most important goal is to provide a src/type rather than a type/src hierarchy. The motivating JIRA issue is PSAS-101. The schema defines how the data is laid out in the hdf5 files, and the user interface to that data. There are two main parts to the schema:

  • group hierarchy - the names the user works through to navigate to the data
  • data types in datasets - the actual datatypes used in the datasets; this includes the names for all the subfields in compound data types (that look like C structs in the data)

Presently we are only proposing changes to the group hierarchy. Issues we wish to address in the current schema:

  • Presently our hierarchy requires users to go through a potentially long list of Type names before they get to the data. These typenames come from the C++ code and can be complicated.

...

  • Users are generally more familiar with sources, and in particular the DAQ aliases for the sources; these are currently not in our hierarchy.

...

  • CalibCycle is not intuitive, and is misleading for XPP because of when calibration data is created. They would like to call CalibCycles steps.

There are several other things that seem good to do as well. These have been listed below. A few alternatives were considered that are discussed below.

Initially, this schema could sit alongside the current schema and use soft links to the actual data. This would not break anybody's code. However, two schemas that do the same thing add confusion, so I am interested in developing something robust enough, and that people are happy enough with, that we could use it to replace the current schema. The schema below should be readable by frameworks as well as by users browsing hdf5 files. This document is just about changing the group hierarchy and names for data. Issues like aligning data are orthogonal and not covered here. We do not want to make schema changes that break people's code unless necessary. If we are going to change the schema, we would like to cover all the issues we can at once. If you have any comments or suggestions, please feel free to use Confluence to add comments at the bottom of the document, add to the document, or email me at davidsch@slac.stanford.edu.

Current Schema

Here is an example of the current schema.

*** DAQ configure
/Configure:0000         
/Configure:0000/Alias::ConfigV1/Control   
/Configure:0000/Bld::BldDataEBeamV7/EBeam 
/Configure:0000/TimeTool::ConfigV2/XppEndstation.0:Opal1000.1 
/Configure:0000/Camera::FrameFexConfigV1/XppEndstation.0:Opal1000.1 
/Configure:0000/ControlData::ConfigV3/Control 
/Configure:0000/CsPad2x2::ConfigV2/XppGon.0:Cspad2x2.0 
/Configure:0000/CsPad::ConfigV5/XppGon.0:Cspad.0 
/Configure:0000/Epics::ConfigV1/EpicsArch.0:NoDevice.0 
/Configure:0000/Epics::EpicsPv/EpicsArch.0:NoDevice.0 
/Configure:0000/Epics::EpicsPv/EpicsArch.0:NoDevice.0/Attenuator_transmission {Soft Link}
/Configure:0000/Epics::EpicsPv/EpicsArch.0:NoDevice.0/BEAM:LCLS:ELEC:Q 
/Configure:0000/EvrData::ConfigV7/NoDetector.0:Evr.0 
/Configure:0000/EvrData::ConfigV7/NoDetector.0:Evr.1 
/Configure:0000/EvrData::IOConfigV2/Control 
/Configure:0000/Ipimb::ConfigV2/NH2-SB1-IPM-01 
/Configure:0000/Ipimb::ConfigV2/XppEnds_Ipm0 
/Configure:0000/L3T::ConfigV1/Event 
/Configure:0000/Lusi::IpmFexConfigV2/NH2-SB1-IPM-01 
/Configure:0000/Lusi::IpmFexConfigV2/XppEnds_Ipm0 
/Configure:0000/Opal1k::ConfigV1/XppEndstation.0:Opal1000.1 
/Configure:0000/Partition::ConfigV1/Control 
*** Run/CalibCycle
/Configure:0000/Run:0000 
/Configure:0000/Run:0000/CalibCycle:0000 
/Configure:0000/Run:0000/CalibCycle:0000/Bld::BldDataEBeamV7/EBeam 
/Configure:0000/Run:0000/CalibCycle:0000/Camera::FrameV1/XppEndstation.0:Opal1000.1 
/Configure:0000/Run:0000/CalibCycle:0000/ControlData::ConfigV3/Control 
/Configure:0000/Run:0000/CalibCycle:0000/CsPad2x2::ElementV1/XppGon.0:Cspad2x2.0 
/Configure:0000/Run:0000/CalibCycle:0000/CsPad::ElementV2/XppGon.0:Cspad.0 
/Configure:0000/Run:0000/CalibCycle:0000/Epics::EpicsPv/EpicsArch.0:NoDevice.0/Attenuator_transmission {Soft Link}
/Configure:0000/Run:0000/CalibCycle:0000/Epics::EpicsPv/EpicsArch.0:NoDevice.0/BEAM:LCLS:ELEC:Q 
/Configure:0000/Run:0000/CalibCycle:0000/EvrData::ConfigV7/NoDetector.0:Evr.0 
/Configure:0000/Run:0000/CalibCycle:0000/EvrData::ConfigV7/NoDetector.0:Evr.1 
/Configure:0000/Run:0000/CalibCycle:0000/EvrData::DataV4/NoDetector.0:Evr.0 
/Configure:0000/Run:0000/CalibCycle:0000/Ipimb::DataV2/NH2-SB1-IPM-01 
/Configure:0000/Run:0000/CalibCycle:0000/Ipimb::DataV2/XppEnds_Ipm0 
/Configure:0000/Run:0000/CalibCycle:0000/L3T::DataV2/Event 
/Configure:0000/Run:0000/CalibCycle:0000/Lusi::IpmFexV1/NH2-SB1-IPM-01 
/Configure:0000/Run:0000/CalibCycle:0000/Lusi::IpmFexV1/XppEnds_Ipm0 
*** CalibStore
/Configure:0000/CalibStore 
/Configure:0000/CalibStore/pdscalibdata::CsPad2x2PedestalsV1/XppGon.0:Cspad2x2.0
/Configure:0000/CalibStore/pdscalibdata::CsPadPedestalsV1/XppGon.0:Cspad.0 

...

Note, the shared types should not show up in the translation. Psana breaks them up and puts the sub types in the event.

Calib Store

We also need to introduce simpler type names for the calibStore types:

...

The drawback is a higher risk of a name collision (see problems below). For instance if there is both config and regular event data occurring during the event, then the Translator will try to put them in the same group. When it fails, it will have to make a messy name to distinguish them.

A new Group for EventData

Since there is a Config group, it seems natural to have a group for EventData, i.e.:

/Data/Run/Step:0000/Config/UsdUsbConfig        # the config data
/Data/Run/Step:0000/EventData/UsdUsbData     # the event data

However, some may find that this new group gets in the way.

Just starting with a Run group

Have the hierarchy start here:

/Run
/Run/Config
/Run/EpicsConfig
/Run/Step:0000
/Run/Step:0000/Config/srcAlias/UsdUsbConfig        # the config data
/Run/Step:0000/srcAlias/UsdUsbData                     # the event data

The drawback is that this is not how the xtc data is formed. In xtc files, the beginRun transition is preceded by the Configure transition. Collapsing the information from both transitions into a Run group is probably reasonable, but makes it more awkward to recover information that belonged to one xtc transition and not the other (users generally don't care about this, but a framework like psana does).

Problems/Issues/Surprises

...

Presently, there should not be a collision. If one happens, it is treated as a fatal error.

They don't happen because currently there is a near 1-1 mapping between the psana event keys, from which the Translator gets the data, and the group names. This mapping uses the distinct pairs of C++ type names and DAQ source names in the event keys.

That means one can always add a new Type without colliding with existing types, as long as the Translator uses a fully qualified C++ typename for the group name.

Likewise for DAQ sources, but we currently do simplify some of these, in particular the messy sources from each stream that have a distinct IP address in them.

An example of a collision would be

  • Daq Alias called noSrc
  • user does evt.put(myndarray,'mykey')

The Translator already uses the string noSrc for user data without a source - collision.
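
The collision above can be sketched in a few lines. This is only an illustration of the check, not the Translator's code: the class `SourceGroups` and its `claim` method are hypothetical names, and a source is modelled as a plain string (or `None` for user data without a source).

```python
class SourceGroups:
    """Tracks which source claimed each source-level group name (hypothetical helper)."""

    def __init__(self):
        self._owner = {}  # group name -> identity of the source that claimed it

    def claim(self, name, source_id):
        """Claim a group name for a source; a second, different claimant is a collision."""
        owner = self._owner.setdefault(name, source_id)
        if owner != source_id:
            # Mirrors the current policy: a collision is treated as a fatal error.
            raise RuntimeError(
                f"group {name!r} already used by {owner!r}, cannot reuse for {source_id!r}"
            )
        return name
```

With this sketch, `claim("noSrc", None)` for the `evt.put(myndarray, 'mykey')` data succeeds, and a later `claim("noSrc", <the DAQ source aliased to noSrc>)` raises.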

...

  1. Rename the first, then the user gets the original messy names CsPad::ElementV1 and CsPadElementV2. However, if reading while writing ever works, this seems very problematic. What if you started reading from a group that got renamed?
  2. Make a new name for the second, e.g. CsPadElement_01. This seems more reasonable.
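
Option 2 could be sketched like this. The `_01` suffix follows the CsPadElement_01 example above; the function name and the continuation to `_02`, `_03`, ... are my assumptions, not a settled rule.

```python
def unique_group_name(base, existing):
    """Return `base` if it is unused, otherwise base_01, base_02, ... (assumed scheme)."""
    if base not in existing:
        return base
    n = 1
    # Groups already written are never renamed; only the newcomer gets a suffix.
    while f"{base}_{n:02d}" in existing:
        n += 1
    return f"{base}_{n:02d}"
```

The point of this choice is that a reader who already opened CsPadElement never sees it change underneath them.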

...

With corner cases like that, users and frameworks may find they need to know exactly what type they are dealing with. This will be stored in hdf5 attributes on the groups (exactly how to extract this information will be documented for users and framework writers).

Programmatic Interface

The programmatic interface to the new schema is more difficult. Without using the exact information in the attributes, that is, basing your code only on the group names, there are some issues:

  1. When you read the group
    /Data/Run/Step:0000/EvrData
    you don't know if you are reading a V3 or a V4. If it is V4, there will be two datasets (data and present) but for V3 there will only be 1.
    1. In general, the full type information must be discovered by looking at the types in the datasets, as well as the number of datasets.
  2. Another place where you might like to use full names, is looping over sources by the id. Suppose the experiment has the four sources
    XppGon.0:Cspad2x2.0, XppGon.0:Cspad2x2.1, XppGon.0:Cspad2x2.2 and XppGon.0:Cspad2x2.3
    but they have been aliased to cspad2x2_front, cspad2x2_left, cspad2x2_right, cspad_2x2_back.
  3. You need to know the DAQ aliases to find the data. This may make things more difficult for a framework. Another idea is to write a separate group with the full DAQ source name, and have the DAQ alias be a soft link to this group.
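
To make the first issue concrete, here is a rough sketch of what code based only on group contents would have to do for EvrData. It encodes nothing beyond the rule stated above (V4 has the datasets data and present, V3 has a single dataset); the function name is hypothetical, and with h5py the names could come from `list(group.keys())`.

```python
def guess_evr_data_version(dataset_names):
    """Guess 'V3' or 'V4' from the dataset names found in an EvrData group.

    V4: exactly the two datasets 'data' and 'present'.
    V3: a single dataset. Anything else is unknown and returns None.
    """
    names = set(dataset_names)
    if names == {"data", "present"}:
        return "V4"
    if len(names) == 1:
        return "V3"
    return None
```

This kind of guessing is fragile, which is the argument for recording the exact type on the group.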

So I think it is important to have the full source and typename available in the group attributes.
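
As a sketch of how a framework might use such attributes, the code below loops over sources by device id even when the groups are named by alias. The attribute names `fullSource` and `fullType` are placeholders I made up (the real names are not decided here), and the functions only assume objects with a dict-like `.attrs`, which h5py groups provide.

```python
# Placeholder attribute names; the schema has not fixed these yet.
SRC_ATTR = "fullSource"
TYPE_ATTR = "fullType"


def full_identity(group):
    """Return (full source, full type) from a group's attributes; missing entries are None."""
    return group.attrs.get(SRC_ATTR), group.attrs.get(TYPE_ATTR)


def groups_for_device(groups, device):
    """Map device id -> group name for groups whose full source is '<detector>:<device>.<id>'.

    `groups` is any mapping of group name -> group (for example an h5py group).
    """
    found = {}
    for name, group in groups.items():
        src = group.attrs.get(SRC_ATTR)
        if src is None or ":" not in src:
            continue  # user data or standard names such as Control or EBeam
        dev, _, dev_id = src.split(":", 1)[1].rpartition(".")
        if dev == device and dev_id.isdigit():
            found[int(dev_id)] = name
    return dict(sorted(found.items()))
```

For the four Cspad2x2 sources above, `groups_for_device(step, "Cspad2x2")` would return the alias group names keyed 0 to 3, so the loop does not need to know the aliases in advance.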

...

/Data/Config/evr0/Evr                            daq alias
/Data/Config/NoDetector.0:Evr.1/Evr      no alias

Is Src/Type ever more confusing than

...

Type/Src?

If DAQ aliases are not used for all sources, there can be a number of technical looking source names that show up. For instance

...

New schema
NEW                                     |    WHERE IT WAS IN OLD SCHEMA OR NOTES
----------------------------------------+---------------------------------------------------------
/Data 
/Data/EpicsConfig 
/Data/EpicsConfig/BEAM:LCLS:ELEC:Q               /Configure:0000/Epics::EpicsPv/EpicsArch.0:NoDevice.0/BEAM:LCLS:ELEC:Q
/Data/EpicsConfig/Attenuator_transmission        /Configure:0000/Epics::EpicsPv/EpicsArch.0:NoDevice.0/Attenuator_transmission
/Data/Config                                     
/Data/Config/Control                             * a standard source name, not an alias *
/Data/Config/Control/AliasConfig                 /Configure:0000/Alias::ConfigV1/Control 
/Data/Config/Control/ControlDataConfig           /Configure:0000/ControlData::ConfigV3/Control 
/Data/Config/Control/EvrIOConfig                 /Configure:0000/EvrData::IOConfigV2/Control
/Data/Config/Control/PartitionConfig             /Configure:0000/Partition::ConfigV1/Control 
/Data/Config/EBeam                               * a standard source name, not an alias *
/Data/Config/EBeam/BldDataEBeam                  /Configure:0000/Bld::BldDataEBeamV7/EBeam 
/Data/Config/Opal_1                              * this is a Daq Alias for XppEndstation.0:Opal1000.1 *
/Data/Config/Opal_1/TimeToolConfig               /Configure:0000/TimeTool::ConfigV2/XppEndstation.0:Opal1000.1     
/Data/Config/Opal_1/FrameFexConfig               /Configure:0000/Camera::FrameFexConfigV1/XppEndstation.0:Opal1000.1
/Data/Config/Opal_1/Opal1kConfig                 /Configure:0000/Opal1k::ConfigV1/XppEndstation.0:Opal1000.1 
/Data/Config/cs140_0                             * also an alias *
/Data/Config/cs140_0/CsPad2x2Config              /Configure:0000/CsPad2x2::ConfigV2/XppGon.0:Cspad2x2.0 
/Data/Config/evr0                                * alias *
/Data/Config/evr0/EvrConfig                      /Configure:0000/EvrData::ConfigV7/NoDetector.0:Evr.0
/Data/Config/evr1                                * alias *
/Data/Config/evr1/EvrConfig                      /Configure:0000/EvrData::ConfigV7/NoDetector.0:Evr.1
/Data/Config/MyTimeToolConfig                    * special alias from user, soft link to Opal_1/TimeToolConfig
/Data/Config/NH2-SB1-IPM-01/IpimbConfig          /Configure:0000/Ipimb::ConfigV2/NH2-SB1-IPM-01 
/Data/Config/NH2-SB1-IPM-01/IpmFexConfig         /Configure:0000/Lusi::IpmFexConfigV2/NH2-SB1-IPM-01 
/Data/Config/XppEnds_Ipm0/IpimbConfig            /Configure:0000/Ipimb::ConfigV2/XppEnds_Ipm0 
/Data/Config/XppEnds_Ipm0/IpmFexConfig           /Configure:0000/Lusi::IpmFexConfigV2/XppEnds_Ipm0 
/Data/Config/Event/L3TConfig                     /Configure:0000/L3T::ConfigV1/Event 

/Data/Run 
/Data/Run/Config
/Data/Run/Config/noSrc/mykey                      # if a user did configStore().put(myndarray, 'mykey') during beginrun
/Data/Run/Config/Opal_1/mykey                     # likewise, if a user did configStore().put(mystring, psana.Source('Opal_1'),'mykey')

/Data/Run/EndData/noSrc/summary                   # if a user did configStore().put(myndarray, 'summary') during endrun
/Data/Run/EndData/Opal_1/summary                  # if a user did configStore().put(myndarray, 'summary') during endrun
           
# HERE IS WHERE REGULAR EVENT DATA IS 
                                             
/Data/Run/Step:0000                               /Configure:0000/Run:0000/CalibCycle:0000
/Data/Run/Step:0000/Config
/Data/Run/Step:0000/Config/noSrc
/Data/Run/Step:0000/Config/noSrc/myKeyString       # if user adding something to configStore during begincalibcycle
/Data/Run/Step:0000/Epics
/Data/Run/Step:0000/Epics/pvName
/Data/Run/Step:0000/EBeam/BldDataEBeam
/Data/Run/Step:0000/Opal_1/CameraFrame
/Data/Run/Step:0000/Opal_1/TimeToolData
/Data/Run/Step:0000/MyTimeToolData                {soft link to above}
/Data/Run/Step:0000/cs140_0
/Data/Run/Step:0000/cs140_0/CsPadElement
/Data/Run/cs140_0/radialIntegration     # a user module ndarray attached to a source
/Data/Run/Step:0000/evr0
/Data/Run/Step:0000/evr1
/Data/Run/Step:0000/evr0/EvrData
/Data/Run/Step:0000/evr1/EvrData
/Data/Run/Step:0000/noSrc/mykey                   # a user module ndarray not attached to a source
/Data/Run/Step:0000/NH2-SB1-IPM-01/IpimbData
/Data/Run/Step:0000/NH2-SB1-IPM-01/IpmFex

/Data/Run/Step:0000/EndData
/Data/Run/Step:0000/EndData/noSrc/myKeyString
/Data/Run/Step:0000/EndData/Opal_1/myKeyString

/Data/CalibStore
same as before, but invert type/source, and use DAQ aliases 
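
The inversion for the Configure entries can be sketched as a small function. The two lookup tables are illustrative only: I copied a few entries from the listing above, but in practice the aliases would come from the DAQ and the simplified type names from the Translator. The function name is hypothetical.

```python
# Illustrative entries taken from the listing above; not a complete table.
ALIASES = {
    "XppEndstation.0:Opal1000.1": "Opal_1",
    "XppGon.0:Cspad2x2.0": "cs140_0",
    "NoDetector.0:Evr.0": "evr0",
}
SIMPLE_TYPES = {
    "TimeTool::ConfigV2": "TimeToolConfig",
    "EvrData::ConfigV7": "EvrConfig",
    "Ipimb::ConfigV2": "IpimbConfig",
}


def new_config_path(old_path):
    """Turn '/Configure:0000/<Type>/<Src>' into '/Data/Config/<alias or src>/<simple type>'."""
    parts = old_path.strip("/").split("/")
    if len(parts) != 3 or not parts[0].startswith("Configure:"):
        raise ValueError(f"not a Configure type/src path: {old_path!r}")
    _, type_name, src = parts
    # Fall back to the original names when no alias or simple name is known.
    group = ALIASES.get(src, src)
    leaf = SIMPLE_TYPES.get(type_name, type_name)
    return "/".join(["", "Data", "Config", group, leaf])
```

Sources without an alias, such as NH2-SB1-IPM-01, keep their name as the group, which matches the listing.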

Feedback

Here is some feedback I have gotten.

Keep hierarchy close to EventKeys

Simplifying the hierarchy too much could be confusing; keeping closer to what one sees with psana EventKeys is helpful.

Original Sources are Useful

Just having the DAQ aliases may not be good. One could put both the DAQ alias and the native source in the name, have them side by side with one a link, or keep aliases separate from the original native source names, in different groups.

Compound Types vs. Basic Types

Use basic types in place of compound types.

Flattened Input

A tool to gather up and event-build the particular data a user is interested in. This may be a few fields from EBeam, a particular EPICS PV, or links to camera images.