This document is for developing a new schema for LCLS HDF5 files. The schema has two main parts: how the data is laid out in the HDF5 files, and the user interface to that data.
Presently, we are only proposing changes to the group hierarchy. Issues we wish to address in the current schema:
There are several other things that seem good to do as well; these are listed below. A few alternatives were also considered, and these are discussed below.
Initially, this schema could sit alongside the current schema and use soft links to the actual data. This would not break anybody's code. However, two schemas that do the same thing add confusion, so I am interested in developing something robust enough to replace the current schema. The schema below should be readable by frameworks as well as by users browsing HDF5 files. We do not want to make schema changes that break people's code unless necessary, so if we are going to change the schema, we would like to address all the issues we can at once. If you have any comments or suggestions, please feel free to add comments at the bottom of this Confluence page, add to the document, or email me at davidsch@slac.stanford.edu.
Here is an example of the current schema:
*** DAQ configure
/Configure:0000
/Configure:0000/Alias::ConfigV1/Control
/Configure:0000/Bld::BldDataEBeamV7/EBeam
/Configure:0000/TimeTool::ConfigV2/XppEndstation.0:Opal1000.1
/Configure:0000/Camera::FrameFexConfigV1/XppEndstation.0:Opal1000.1
/Configure:0000/ControlData::ConfigV3/Control
/Configure:0000/CsPad2x2::ConfigV2/XppGon.0:Cspad2x2.0
/Configure:0000/CsPad::ConfigV5/XppGon.0:Cspad.0
/Configure:0000/Epics::ConfigV1/EpicsArch.0:NoDevice.0
/Configure:0000/Epics::EpicsPv/EpicsArch.0:NoDevice.0
/Configure:0000/Epics::EpicsPv/EpicsArch.0:NoDevice.0/Attenuator_transmission {Soft Link}
/Configure:0000/Epics::EpicsPv/EpicsArch.0:NoDevice.0/BEAM:LCLS:ELEC:Q
/Configure:0000/EvrData::ConfigV7/NoDetector.0:Evr.0
/Configure:0000/EvrData::ConfigV7/NoDetector.0:Evr.1
/Configure:0000/EvrData::IOConfigV2/Control
/Configure:0000/Ipimb::ConfigV2/NH2-SB1-IPM-01
/Configure:0000/Ipimb::ConfigV2/XppEnds_Ipm0
/Configure:0000/L3T::ConfigV1/Event
/Configure:0000/Lusi::IpmFexConfigV2/NH2-SB1-IPM-01
/Configure:0000/Lusi::IpmFexConfigV2/XppEnds_Ipm0
/Configure:0000/Opal1k::ConfigV1/XppEndstation.0:Opal1000.1
/Configure:0000/Partition::ConfigV1/Control
/Configure:0000/Run:0000
/Configure:0000/Run:0000/CalibCycle:0000
/Configure:0000/Run:0000/CalibCycle:0000/Bld::BldDataEBeamV7/EBeam
/Configure:0000/Run:0000/CalibCycle:0000/Camera::FrameV1/XppEndstation.0:Opal1000.1
/Configure:0000/Run:0000/CalibCycle:0000/ControlData::ConfigV3/Control
/Configure:0000/Run:0000/CalibCycle:0000/CsPad2x2::ElementV1/XppGon.0:Cspad2x2.0
/Configure:0000/Run:0000/CalibCycle:0000/CsPad::ElementV2/XppGon.0:Cspad.0
/Configure:0000/Run:0000/CalibCycle:0000/Epics::EpicsPv/EpicsArch.0:NoDevice.0/Attenuator_transmission {Soft Link}
/Configure:0000/Run:0000/CalibCycle:0000/Epics::EpicsPv/EpicsArch.0:NoDevice.0/BEAM:LCLS:ELEC:Q
/Configure:0000/Run:0000/CalibCycle:0000/EvrData::ConfigV7/NoDetector.0:Evr.0
/Configure:0000/Run:0000/CalibCycle:0000/EvrData::ConfigV7/NoDetector.0:Evr.1
/Configure:0000/Run:0000/CalibCycle:0000/EvrData::DataV4/NoDetector.0:Evr.0
/Configure:0000/Run:0000/CalibCycle:0000/Ipimb::DataV2/NH2-SB1-IPM-01
/Configure:0000/Run:0000/CalibCycle:0000/Ipimb::DataV2/XppEnds_Ipm0
/Configure:0000/Run:0000/CalibCycle:0000/L3T::DataV2/Event
/Configure:0000/Run:0000/CalibCycle:0000/Lusi::IpmFexV1/NH2-SB1-IPM-01
/Configure:0000/Run:0000/CalibCycle:0000/Lusi::IpmFexV1/XppEnds_Ipm0
/Configure:0000/CalibStore
/Configure:0000/CalibStore/pdscalibdata::CsPad2x2PedestalsV1/XppGon.0:Cspad2x2.0
/Configure:0000/CalibStore/pdscalibdata::CsPadPedestalsV1/XppGon.0:Cspad.0
Here are the changes to make. First, put everything under a top-level /Data group:
/Data
/Data/Run
Currently, we have
/Configure:0000/TypeA
/Configure:0000/TypeB
/Configure:0000/Run:0000
Why not put all the config data (TypeA, TypeB) in one place?
/Data
/Data/Config  # this in turn will have TypeA and TypeB as children
/Data/Run
Put EPICS in its own group, and remove the EPICS source name (EpicsArch.0:NoDevice.0) from the schema
/Data/EpicsConfig
/Data/Config
/Data/Run
/Data/Run/Step:0000/Epics/pvName
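The EPICS change can be made concrete with a small path mapping. The Python sketch below is a hypothetical helper, not part of the Translator; it assumes the old layout in the example above, where each PV sits at .../Epics::EpicsPv/<EPICS source>/<pvName>, and it renames CalibCycle:NNNN to Step:NNNN as in the proposed layout.

```python
def new_epics_path(old_path: str) -> str:
    """Map an old-schema EPICS PV path to the proposed /Data layout (hypothetical helper)."""
    parts = old_path.strip("/").split("/")
    if "Epics::EpicsPv" not in parts:
        raise ValueError(f"not an EPICS PV path: {old_path}")
    idx = parts.index("Epics::EpicsPv")
    # Expect exactly Epics::EpicsPv/<EPICS source>/<pvName>; the source name is dropped.
    if len(parts) != idx + 3:
        raise ValueError(f"expected .../Epics::EpicsPv/<source>/<pvName>: {old_path}")
    pv_name = parts[idx + 2]
    calib = next((p for p in parts if p.startswith("CalibCycle:")), None)
    if calib is None:
        # PVs recorded at configure time all go into one EpicsConfig group.
        return f"/Data/EpicsConfig/{pv_name}"
    # PVs recorded within a calib cycle go into that step's Epics group.
    step = calib.split(":", 1)[1]
    return f"/Data/Run/Step:{step}/Epics/{pv_name}"


print(new_epics_path("/Configure:0000/Epics::EpicsPv/EpicsArch.0:NoDevice.0/BEAM:LCLS:ELEC:Q"))
# -> /Data/EpicsConfig/BEAM:LCLS:ELEC:Q
```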
Invert Type/Src relationship
/Data/Config/SrcA/TypeA
/Data/Config/SrcA/TypeB
/Data/Config/SrcB/TypeA
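A minimal Python sketch of the inversion, assuming the old config paths have the form /Configure:NNNN/<Type>/<Src> as in the example above. The helper name invert_type_src is hypothetical; it groups the types by source, so each source becomes one group that holds all of its types.

```python
from collections import defaultdict


def invert_type_src(old_config_paths):
    """Group /Configure:NNNN/<Type>/<Src> paths by source (hypothetical helper).

    Returns {src: [type, ...]} in input order; each pair corresponds to a
    new group /Data/Config/<src>/<type>.
    """
    by_src = defaultdict(list)
    for path in old_config_paths:
        parts = path.strip("/").split("/")
        if len(parts) != 3 or not parts[0].startswith("Configure:"):
            raise ValueError(f"unexpected config path: {path}")
        _, type_name, src = parts
        by_src[src].append(type_name)
    return dict(by_src)


old = [
    "/Configure:0000/TimeTool::ConfigV2/XppEndstation.0:Opal1000.1",
    "/Configure:0000/Opal1k::ConfigV1/XppEndstation.0:Opal1000.1",
    "/Configure:0000/Alias::ConfigV1/Control",
]
for src, types in invert_type_src(old).items():
    for t in types:
        print(f"/Data/Config/{src}/{t}")
```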
Use DAQ aliases when possible.
/Data/Config/alias
/Data/Config/alias/TypeA
/Data/Config/alias/TypeB
/Data/Config/SrcB/TypeA
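Choosing group names from DAQ aliases can be sketched as follows. This assumes the alias table has already been read into a Python dict from native source name to alias (how the Translator would obtain it, e.g. from Alias::ConfigV1, is not shown); the function name is hypothetical. Sources without an alias keep their native name, and two sources that would land on the same group name are rejected.

```python
def source_group_names(native_sources, daq_aliases):
    """Pick one group name per source: its DAQ alias if it has one, else its native name.

    Raises ValueError if two different sources would end up with the same group name.
    """
    names = {}  # native source -> group name
    owner = {}  # group name -> native source that claimed it
    for src in native_sources:
        name = daq_aliases.get(src, src)
        if name in owner and owner[name] != src:
            raise ValueError(f"{src!r} and {owner[name]!r} both map to group {name!r}")
        owner[name] = src
        names[src] = name
    return names


aliases = {"XppEndstation.0:Opal1000.1": "Opal_1", "NoDetector.0:Evr.0": "evr0"}
print(source_group_names(["XppEndstation.0:Opal1000.1", "NoDetector.0:Evr.0", "NH2-SB1-IPM-01"], aliases))
```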
Translator option for a user alias.
Sometimes inverting type/src makes it harder to find data. An example is timetool data. It will be attached to a source like opal_1, but the user may not know this. They will be looking for timetool data, and now they have to go through all the sources. The idea is to let the user specify a single-group alias name for a src/type combination. So, while the HDF5 file has
/Data/Run/Step:0000/
/Data/Run/Step:0000/opal_1/TimeToolData
we'll also create
/Data/Run/Step:0000/TimeToolData {Soft Link to} /Data/Run/Step:0000/opal_1/TimeToolData
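The shortcut amounts to a pair of paths: the new link and the group it points to. This hypothetical Python helper parses a shortcut written as 'src/Type -> Name', the notation used for the user shortcuts in the example later in this document (the actual Translator option syntax is an assumption here). With h5py, for example, the link itself could then be created with f[link_path] = h5py.SoftLink(target_path).

```python
def alias_link(parent, shortcut):
    """Turn a 'src/Type -> Name' shortcut into (link_path, target_path) under parent.

    Hypothetical helper; the '->' syntax is assumed, not an existing Translator option.
    """
    target, sep, name = shortcut.partition("->")
    target = target.strip().strip("/")
    name = name.strip()
    # The target must be exactly src/Type and the link name a single group name.
    if not sep or target.count("/") != 1 or not name or "/" in name:
        raise ValueError(f"expected 'src/Type -> Name', got {shortcut!r}")
    return f"{parent}/{name}", f"{parent}/{target}"


link, target = alias_link("/Data/Run/Step:0000", "opal_1/TimeToolData -> TimeToolData")
print(link, "->", target)
```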
Use type aliases in place of full C++ type names with version
Basically, we will eliminate the version suffix (V*) and the :: from the type names
See the section below for all the aliases.
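The default aliasing rule, plus a few of the exceptions from the alias table, can be sketched in Python. The exception dictionary and the set of dropped namespaces below are a hypothetical, partial encoding of that table, not a complete one, and type_alias is an illustrative name.

```python
import re

# Default rule: drop the trailing version (V1, V2, ...) and the "::".
_VERSION = re.compile(r"V\d+$")

# Namespaces the alias table drops entirely, e.g. Lusi::IpmFexV1 -> IpmFex.
DROPPED_NAMESPACES = {"Lusi", "pdscalibdata"}

# Hypothetical, partial list of exceptions, keyed by the version-less C++ name.
EXCEPTIONS = {
    "CsPad::Data": "CsPadElement",
    "EvrData::Config": "EvrConfig",
    "EvrData::Data": "EvrData",
    "EvrData::IOConfig": "EvrIOConfig",
    "EvrData::SrcConfig": "EvrSrcConfig",
}


def type_alias(cpp_type: str) -> str:
    """Return the short alias for a fully qualified DAQ type such as 'Ipimb::DataV2'."""
    namespace, sep, name = cpp_type.partition("::")
    if not sep or not name:
        raise ValueError(f"expected Namespace::Type, got {cpp_type!r}")
    name = _VERSION.sub("", name)
    key = f"{namespace}::{name}"
    if key in EXCEPTIONS:
        return EXCEPTIONS[key]
    # Avoid repeating the namespace: Bld::BldDataEBeamV7 -> BldDataEBeam,
    # FCCD::FccdConfigV1 -> FccdConfig.
    if namespace in DROPPED_NAMESPACES or name.lower().startswith(namespace.lower()):
        return name
    return namespace + name


print(type_alias("TimeTool::DataV2"))  # -> TimeToolData
```

Table entries such as Epix::Config100aV1 (aliased Epix100aConfig) are not covered by this sketch and would need their own exceptions.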
Use Step:000x rather than CalibCycle:000x
Separate config and EPICS data within steps
/Data/Run/Step:0000/Config
/Data/Run/Step:0000/Epics
Treat psana module key strings like types
If a module does
evt.put(myndarray,src,"mykey")
then we translate it to
/Data/Run/Step:0000/src/mykey
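A sketch of where key-string data would land, using the parent groups from the example layout later in this document. The transition labels and the function name are hypothetical; src is assumed to already be the DAQ alias (or native source name), and data without a source goes under noSrc.

```python
# Parent group per psana transition, following the example layout for the new schema.
# The transition labels are hypothetical names used only by this sketch.
PARENTS = {
    "beginRun": "/Data/Run/Config",
    "endRun": "/Data/Run/EndData",
    "beginCalibCycle": "/Data/Run/Step:{step:04d}/Config",
    "event": "/Data/Run/Step:{step:04d}",
    "endCalibCycle": "/Data/Run/Step:{step:04d}/EndData",
}


def user_data_path(transition, key, src=None, step=0):
    """Group path for data a psana module put under a key string (hypothetical helper)."""
    if not key:
        raise ValueError("this sketch only handles data put with a key string")
    # str.format ignores the unused 'step' argument for run-level parents.
    parent = PARENTS[transition].format(step=step)
    return f"{parent}/{src or 'noSrc'}/{key}"


print(user_data_path("event", "radialIntegration", src="cs140_0"))
# -> /Data/Run/Step:0000/cs140_0/radialIntegration
```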
There are three classes of types in the event store: DAQ, CalibStore, and user types.
Below is a list of type aliases for DAQ types. For the most part, the alias is the type name with the version and the :: removed; a few exceptions are marked with a - character.
  AcqirisTdcConfig        Acqiris::TdcConfigV1
  AcqirisTdcData          Acqiris::TdcDataV1
  AcqirisConfig           Acqiris::ConfigV1
  AcqirisDataDesc         Acqiris::DataDescV1
  AliasConfig             Alias::ConfigV1
  AndorConfig             Andor::ConfigV1
  AndorFrame              Andor::FrameV1
  ArraycharData           Arraychar::DataV1
  ControlDataConfig       ControlData::ConfigV{1,2,3}
  CsPadConfig             CsPad::ConfigV{1-5}
- CsPadElement            CsPad::DataV{1,2}
  CsPad2x2Config          CsPad2x2::ConfigV{1,2}
  CsPad2x2Element         CsPad2x2::ElementV1
  DiodeFexConfig          Lusi::DiodeFexConfigV{1,2}
  DiodeFex                Lusi::DiodeFexV1
- BldDataEBeam            Bld::BldDataEBeamV{0,1,2,3,4,5,6,7}
  EncoderConfig           Encoder::ConfigV{1,2}
  EncoderData             Encoder::DataV{1,2}
  EpicsConfig             Epics::ConfigV1
  EpixConfig              Epix::ConfigV1
  EpixElement             Epix::ElementV{1,2}
  Epix100aConfig          Epix::Config100aV1
  Epix10kConfig           Epix::Config10KV1
  EpixSamplerConfig       EpixSampler::ConfigV1
  EpixSamplerElement      EpixSampler::ElementV1
  EvrConfig               EvrData::ConfigV{1,2,3,4,5,6,7}
  EvrData                 EvrData::DataV{3,4}  # I don't know why we don't have DataV1 or 2 in the DDL?
  EvrIOConfig             EvrData::IOConfigV{1,2}
  EvrSrcConfig            EvrData::SrcConfigV1
- BldDataFEEGasDetEnergy  Bld::BldDataFEEGasDetEnergy, Bld::BldDataFEEGasDetEnergyV1
  FccdConfig              FCCD::FccdConfigV{1,2}
  FliConfig               Fli::ConfigV1
  FliFrame                Fli::FrameV1
  CameraFrame             Camera::FrameV1
  CameraFrameFccdConfig   Camera::FrameFccdConfigV1
  CameraFrameFexConfig    Camera::FrameFexConfigV1
  BldDataGMD              Bld::BldDataGMDV{0,1,2}
  GenericPgpConfig        GenericPgp::ConfigV1
  Gsc16aiConfig           Gsc16ai::ConfigV1
  Gsc16aiData             Gsc16ai::DataV1
  ImpConfig               Imp::ConfigV1
  ImpElement              Imp::ElementV1
  IpimbConfig             Ipimb::ConfigV{1,2}
  IpimbData               Ipimb::DataV{1,2}
  IpmFexConfig            Lusi::IpmFexConfigV{1,2}
  IpmFex                  Lusi::IpmFexV1
  L3TConfig               L3T::ConfigV1
  L3TData                 L3T::DataV1, L3T::DataV2
  OceanOpticsConfig       OceanOptics::ConfigV{1,2}
  OceanOpticsData         OceanOptics::DataV{1,2,3}
  Opal1kConfig            Opal1k::ConfigV1
  OrcaConfig              Orca::ConfigV1
  PartitionConfig         Partition::ConfigV1
  BldDataPhaseCavity      Bld::BldDataPhaseCavity
  PimImageConfig          Lusi::PimImageConfigV1
  PimaxConfig             Pimax::ConfigV1
  PimaxFrame              Pimax::FrameV1
  PrincetonConfig         Princeton::ConfigV{1,2,3,4,5}
  PrincetonFrame          Princeton::FrameV{1,2}
  PrincetonInfo           Princeton::InfoV1
  QuartzConfig            Quartz::ConfigV{1,2}
  RayonixConfig           Rayonix::ConfigV{1,2}
- BldDataAcqADC           Bld::BldDataAcqADCV1  # shared type
- BldDataIpimb            Bld::BldDataIpimbV{0,1}  # shared type
- BldDataPim              Bld::BldDataPimV1  # shared type
  BldDataSpectrometer     Bld::BldDataSpectrometerV{0,1}
  PulnixTM6740Config      Pulnix::TM6740ConfigV{1,2}
  TimeToolConfig          TimeTool::ConfigV{1,2}
  TimeToolData            TimeTool::DataV{1,2}
  TimepixConfig           Timepix::ConfigV{1,2,3}
  TimepixData             Timepix::DataV{1,2}
  CameraTwoDGaussian      Camera::TwoDGaussianV1
  UsdUsbConfig            UsdUsb::ConfigV1
  UsdUsbData              UsdUsb::DataV1
  PNCCDConfig             PNCCD::ConfigV{1,2}
- PNCCDFrames             PNCCD::FramesV1  # the DAQ sends PNCCD::FrameV1. psana intercepts this and
                                           # creates both FullFrameV1 and FramesV1 from it. We will only translate FramesV1.
                                           # note - we used to call this PNCCD::FrameV1 in the translation
Note that the shared types should not show up in the translation: psana pre-processes them and puts the sub-types in the event.
We also need to introduce simpler type names for the calibStore types:
  CsPad2x2Pedestals       pdscalibdata::CsPad2x2PedestalsV1
  CsPadPedestals          pdscalibdata::CsPadPedestalsV1
This is not a complete list; calib store types are not in the DDL.
This refers to types that other psana modules place in the event, where the Translator finds them. Per the schema change above that treats key strings like types, we will not use a type, just the key string, so no alias is required. If for some reason a user adds an ndarray or string to the event without a key string, then we'll use the aliases below:
ndarray will be an alias for all of these: ndarray<T,R> and ndarray<const T,R>, as well as the special vlen versions of these ndarrays that the Translator understands
string will be an alias for std::string
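Recognizing these user types could look like the hypothetical Python helper below. It only handles the spellings listed above; the exact spelling of the vlen ndarray variants is not given in this document, so they are not covered.

```python
import re

# ndarray<T,R> or ndarray<const T,R>, e.g. "ndarray<const unsigned short, 2>".
_NDARRAY = re.compile(r"^ndarray<.+,\s*\d+\s*>$")


def user_type_alias(cpp_type: str) -> str:
    """Alias for a user type put in the event without a key string (hypothetical helper)."""
    t = cpp_type.strip()
    if t == "std::string":
        return "string"
    if _NDARRAY.match(t):
        return "ndarray"
    raise ValueError(f"no alias defined for {cpp_type!r}")


print(user_type_alias("ndarray<const float,2>"))  # -> ndarray
```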
Here are a few alternatives I was thinking about:
Removing the types is complicated because there can be several types associated with one source. If one puts all the datasets associated with the different types into one group, the issue is name collisions between datasets with the same name (standard dataset names like 'data', 'config' or 'image'). Moreover, the different types may have different _damage or _mask datasets. More important for users, the types may have different time datasets, which affects alignment; however, another project is to align the 'DAQ readout groups', which means all types from each source will be aligned.
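The dataset-name problem can be checked mechanically. This hypothetical Python helper takes, for each type of one source, the dataset names it would write, and reports the names that would clash if everything were merged into one group. The dataset lists in the example are illustrative, not taken from real files.

```python
def dataset_collisions(datasets_by_type):
    """Return {dataset_name: [types...]} for names written by more than one type.

    datasets_by_type maps a type alias to the dataset names it would write if
    all of a source's types were merged into a single group.
    """
    first_owner = {}
    collisions = {}
    for type_name, names in datasets_by_type.items():
        for name in names:
            if name not in first_owner:
                first_owner[name] = type_name
            else:
                collisions.setdefault(name, [first_owner[name]]).append(type_name)
    return collisions


# Illustrative dataset lists for two types attached to the same camera source.
print(dataset_collisions({
    "CameraFrame": ["image", "time", "_damage"],
    "TimeToolData": ["data", "time", "_damage"],
}))
```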
Just use one alias for both config and data. For example:
UsdUsb UsdUsb::ConfigV1, UsdUsb::DataV1
The thinking is that since config vs data is in the hierarchy paths, it will be clear from context, i.e.
/Data/Run/Step:0000/srcAlias/Config/UsdUsb # the config data
/Data/Run/Step:0000/srcAlias/UsdUsb # the event data
vs
/Data/Run/Step:0000/Config/srcAlias/UsdUsbConfig # the config data
/Data/Run/Step:0000/srcAlias/UsdUsbData # the event data
The drawback is a higher risk of a name collision (see the problems below). For instance, if both config data and regular event data occur during an event, the Translator will try to put them in the same group.
Since there is a Config group, it seems like a good idea to have a group for EventData, i.e:
/Data/Run/Step:0000/Config/UsdUsbConfig # the config data
/Data/Run/Step:0000/EventData/UsdUsbData # the event data
However, some may find that this new group gets in the way.
Have the hierarchy start at /Run instead:
/Run
/Run/Config
/Run/EpicsConfig
/Run/Step:0000
/Run/Step:0000/Config/srcAlias/UsdUsbConfig # the config data
/Run/Step:0000/srcAlias/UsdUsbData # the event data
The drawback is that this is not how the xtc data is formed. In xtc files, the BeginRun transition is preceded by the Configure transition. Collapsing the information from both transitions into a Run group is probably reasonable, but it makes it more awkward to recover which information belonged to one xtc transition and not the other (users generally don't care about this, but a framework like psana does).
A group name collision occurs when the Translator has already made a group for one kind of data, and then another kind of data comes along that maps to the same group name.
Presently, there should not be a collision. If one happens, it is treated as a fatal error.
Collisions don't happen because there is currently a near one-to-one mapping from the psana event keys (where the Translator gets the data) to the group names. This mapping uses the distinct pairs of C++ type names and DAQ source names in the event keys.
That means one can always add a new type without colliding with existing types, as long as the Translator uses the fully qualified C++ type name for the group name.
Likewise for DAQ sources.
An example of a collision would be
The Translator already uses the string noSrc for user data without a source - collision.
Another example would be
All three of these want to go to /Data/Run/Step:0000/cspad_front/CsPadElement
I think, though, that collisions will be rare, and most likely something a user can fix by specifying different output keys for the psana modules they load. So the default will be a fatal error, but I'll add an option to make it non-fatal and have the Translator rename the colliding group.
For example, there is some old data where a CsPad::DataV1 was occasionally sent while almost all the data was DataV2; this was done to debug the new compression. In the non-fatal mode, the Translator will start with the type alias above, CsPadElement, based on the first data it sees.
The next data will get called CsPadElement_01. Instead, I could
With corner cases like that, users and frameworks may find they need to know exactly what type they are dealing with. This will be stored in HDF5 attributes on the groups (exactly how to extract this information will be documented for users and framework writers).
The programmatic interface to the new schema is more difficult. Without using the exact information in the attributes, that is, basing your code only on the group names, there are some issues:
So I think it is important to have the full source and type name available in the group attributes.
For example, if there are mistakes in the aliases, or if only one of several similar sources is aliased, then a user browsing the HDF5 file would see
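The fatal and non-fatal collision modes, together with recording the exact type and source per group, can be sketched in Python. GroupNamer and the attribute keys type and src are hypothetical; in a real file the dictionary kept per group would instead be written as HDF5 group attributes (with h5py, for example, via group.attrs["type"] = ...).

```python
class GroupNamer:
    """Assign type-group names under one source group (hypothetical helper).

    By default a collision is fatal; with rename_on_collision=True the later
    type gets a numbered suffix (_01, _02, ...). The exact C++ type and source
    are kept per group, standing in for the HDF5 group attributes.
    """

    def __init__(self, rename_on_collision=False):
        self.rename_on_collision = rename_on_collision
        self.attrs = {}  # group name -> {"type": ..., "src": ...}

    def name_for(self, alias, cpp_type, src):
        # Reuse the group if this exact type/source pair already has one.
        for name, attrs in self.attrs.items():
            if attrs == {"type": cpp_type, "src": src}:
                return name
        name = alias
        if name in self.attrs:
            if not self.rename_on_collision:
                raise RuntimeError(
                    f"group {alias!r} already holds {self.attrs[alias]['type']}, "
                    f"cannot also store {cpp_type}")
            n = 1
            while f"{alias}_{n:02d}" in self.attrs:
                n += 1
            name = f"{alias}_{n:02d}"
        self.attrs[name] = {"type": cpp_type, "src": src}
        return name


namer = GroupNamer(rename_on_collision=True)
for t in ["CsPad::DataV2", "CsPad::DataV1", "CsPad::DataV2"]:
    print(t, "->", namer.name_for("CsPadElement", t, "XppGon.0:Cspad.0"))
```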
/Data/Config/evr0/Evr                  # DAQ alias
/Data/Config/NoDetector.0:Evr.1/Evr    # no alias
If DAQ aliases are not used for all sources, there can be a number of technical looking source names that show up. For instance
NH2-SB1-IPM-01
This is what the new schema might look like.
Let's say the user has specified two shortcuts
Opal_1/TimeToolConfig -> MyTimeToolConfig
Opal_1/TimeToolData -> MyTimeToolData
Also assume that calibrated data is translated, and that they are translating ndarrays and strings that psana modules output during BeginRun, BeginCalibCycle, EndCalibCycle and EndRun, as well as during regular events.
NEW                                             WHERE IT WAS IN OLD SCHEMA OR NOTES
----------------------------------------------- ----------------------------------------
/Data
/Data/EpicsConfig
/Data/EpicsConfig/BEAM:LCLS:ELEC:Q              /Configure:0000/Epics::EpicsPv/EpicsArch.0:NoDevice.0/BEAM:LCLS:ELEC:Q
/Data/EpicsConfig/Attenuator_transmission       /Configure:0000/Epics::EpicsPv/EpicsArch.0:NoDevice.0/Attenuator_transmission
/Data/Config
/Data/Config/Control                            * a standard source name, not an alias *
/Data/Config/Control/AliasConfig                /Configure:0000/Alias::ConfigV1/Control
/Data/Config/Control/ControlDataConfig          /Configure:0000/ControlData::ConfigV3/Control
/Data/Config/Control/EvrIOConfig                /Configure:0000/EvrData::IOConfigV2/Control
/Data/Config/Control/PartitionConfig            /Configure:0000/Partition::ConfigV1/Control
/Data/Config/EBeam                              * a standard source name, not an alias *
/Data/Config/EBeam/BldDataEBeam                 /Configure:0000/Bld::BldDataEBeamV7/EBeam
/Data/Config/Opal_1                             * this is a DAQ alias for XppEndstation.0:Opal1000.1 *
/Data/Config/Opal_1/TimeToolConfig              /Configure:0000/TimeTool::ConfigV2/XppEndstation.0:Opal1000.1
/Data/Config/Opal_1/CameraFrameFexConfig        /Configure:0000/Camera::FrameFexConfigV1/XppEndstation.0:Opal1000.1
/Data/Config/Opal_1/Opal1kConfig                /Configure:0000/Opal1k::ConfigV1/XppEndstation.0:Opal1000.1
/Data/Config/cs140_0                            * also an alias *
/Data/Config/cs140_0/CsPad2x2Config             /Configure:0000/CsPad2x2::ConfigV2/XppGon.0:Cspad2x2.0
/Data/Config/evr0                               * alias *
/Data/Config/evr0/EvrConfig                     /Configure:0000/EvrData::ConfigV7/NoDetector.0:Evr.0
/Data/Config/evr1                               * alias *
/Data/Config/evr1/EvrConfig                     /Configure:0000/EvrData::ConfigV7/NoDetector.0:Evr.1
/Data/Config/MyTimeToolConfig                   * special alias from user, soft link to Opal_1/TimeToolConfig *
/Data/Config/NH2-SB1-IPM-01/IpimbConfig         /Configure:0000/Ipimb::ConfigV2/NH2-SB1-IPM-01
/Data/Config/NH2-SB1-IPM-01/IpmFexConfig        /Configure:0000/Lusi::IpmFexConfigV2/NH2-SB1-IPM-01
/Data/Config/XppEnds_Ipm0/IpimbConfig           /Configure:0000/Ipimb::ConfigV2/XppEnds_Ipm0
/Data/Config/XppEnds_Ipm0/IpmFexConfig          /Configure:0000/Lusi::IpmFexConfigV2/XppEnds_Ipm0
/Data/Config/Event/L3TConfig                    /Configure:0000/L3T::ConfigV1/Event
/Data/Run
/Data/Run/Config
/Data/Run/Config/noSrc/mykey                    # if a user did configStore().put(myndarray, 'mykey') during beginrun
/Data/Run/Config/Opal_1/mykey                   # likewise, if a user did configStore().put(mystring, psana.Source('Opal_1'), 'mykey')
/Data/Run/EndData/noSrc/summary                 # if a user did configStore().put(myndarray, 'summary') during endrun
/Data/Run/EndData/Opal_1/summary                # if a user did configStore().put(myndarray, psana.Source('Opal_1'), 'summary') during endrun
# HERE IS WHERE REGULAR EVENT DATA IS
/Data/Run/Step:0000                             /Configure:0000/Run:0000/CalibCycle:0000
/Data/Run/Step:0000/Config
/Data/Run/Step:0000/Config/noSrc
/Data/Run/Step:0000/Config/noSrc/myKeyString    # if a user adds something to the configStore during begincalibcycle
/Data/Run/Step:0000/Epics
/Data/Run/Step:0000/Epics/pvName
/Data/Run/Step:0000/EBeam/BldDataEBeam
/Data/Run/Step:0000/Opal_1/CameraFrame
/Data/Run/Step:0000/Opal_1/TimeToolData
/Data/Run/Step:0000/MyTimeToolData              {soft link to above}
/Data/Run/Step:0000/cs140_0
/Data/Run/Step:0000/cs140_0/CsPad2x2Element
/Data/Run/Step:0000/cs140_0/radialIntegration   # a user module ndarray attached to a source
/Data/Run/Step:0000/evr0
/Data/Run/Step:0000/evr1
/Data/Run/Step:0000/evr0/EvrData
/Data/Run/Step:0000/evr1/EvrData
/Data/Run/Step:0000/noSrc/mykey                 # a user module ndarray not attached to a source
/Data/Run/Step:0000/NH2-SB1-IPM-01/IpimbData
/Data/Run/Step:0000/NH2-SB1-IPM-01/IpmFex
/Data/Run/Step:0000/EndData
/Data/Run/Step:0000/EndData/noSrc/myKeyString
/Data/Run/Step:0000/EndData/Opal_1/myKeyString
/Data/CalibStore                                same as before, but invert type/source, and use DAQ aliases
Here is some feedback I have gotten.
Simplifying the hierarchy too much could be confusing; keeping closer to what one sees with psana EventKeys is helpful.
Just having the DAQ aliases may not be good. One could put both the DAQ alias and the native source in the name, have them side by side with one as a link, or keep aliases separate from the original native source names, in different groups.
Use basic types in place of compound types.
A tool to gather up and event-build the particular data a user is interested in. This may be a few fields from EBeam, a particular EPICS PV, or links to camera images.