
Introduction

One of the responsibilities of the offline system is to translate the data coming from the online system into the "scientific format". The basis for the scientific data format is HDF5 (Hierarchical Data Format), developed and supported by the HDF Group.

...

Actual data are stored inside data groups as a number of HDF5 datasets. For the purpose of describing the organization of the datasets we distinguish two types of data objects – configuration-type and event-type:

  • configuration-type data appears only once per group; many types of this data appear during the Configure transition and there is only a single instance of them. Some configuration types are updated on every BeginCalibCycle, but there is still just one instance per calibration cycle;
  • event-type data usually appears in L1Accept transitions; it is updated in every event or less frequently, but there may still be more than one instance of the data object per CalibCycle.

Many data types are stored as a single dataset (usually named "config" or "data", depending on the type of the data); some more complex data types can be split into multiple datasets. Splitting is done to improve the performance of data reading when only a subset (slice) of the data is needed for a particular analysis.
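
The overall layout can be explored with any HDF5 tool. Below is a minimal sketch using h5py that walks a file and prints every group and dataset; the file name is a hypothetical placeholder for a real translator output file.

    import h5py

    # Hypothetical file name; real translator files follow the
    # experiment/run naming scheme.
    with h5py.File("exp-run0123.h5", "r") as f:
        def show(name, obj):
            kind = "group" if isinstance(obj, h5py.Group) else "dataset"
            print(kind.ljust(8), name)
        # visititems() calls show() for every object below "/".
        f.visititems(show)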

...

All datasets which store event-type data are 1-dimensional arrays. Like configuration-type data, event-type data can be split and stored as multiple datasets. In addition to the regular datasets produced from event data, a few special datasets are stored for event-type data:

  • "time" – required, stores corresponding event time
  • "_damage" – optional, stores corresponding DAQ damage value
  • "_mask" – optional, stores "data is usable" flag

...

All datasets in a group corresponding to event-type data, including the datasets produced from data and the special datasets above, are 1-dimensional datasets with an identical number of entries in each dataset. All datasets in the same group are aligned, meaning that entries with the same index correspond to the same event. For example, data stored in data[100] comes from the event whose time is defined by time[100]; it has damage defined in _damage[100], and it can be used if _mask[100] is non-zero.
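
As a sketch of this alignment, the snippet below reads the entry with index 100 from each dataset of one group with h5py; the file name and group path are hypothetical placeholders.

    import h5py

    # Hypothetical path; actual type and source group names vary.
    PATH = "/Configure:0000/Run:0000/CalibCycle:0000/SomeType/SomeSrc"

    with h5py.File("exp-run0123.h5", "r") as f:
        grp = f[PATH]
        data = grp["data"]
        time = grp["time"]
        mask = grp.get("_mask")       # optional dataset, may be absent

        i = 100
        if mask is None or mask[i] != 0:   # entry i is usable
            print("time:", time[i]["seconds"], time[i]["nanoseconds"])
            print("data:", data[i])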

Note

One should be aware that datasets from different groups are not guaranteed to be aligned: they can have different sizes, and even when their sizes are equal there is no guarantee that they correspond to the same events. To match events from different groups one has to use the content of the "time" dataset.
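
A minimal sketch of such matching with h5py, assuming hypothetical group paths, builds a timestamp-to-index map for one group and looks up the events of the other:

    import h5py

    # Hypothetical group paths inside the same calibration cycle.
    PATH_A = "/Configure:0000/Run:0000/CalibCycle:0000/TypeA/SrcA"
    PATH_B = "/Configure:0000/Run:0000/CalibCycle:0000/TypeB/SrcB"

    with h5py.File("exp-run0123.h5", "r") as f:
        time_a = f[PATH_A]["time"][:]
        time_b = f[PATH_B]["time"][:]

        # Map (seconds, nanoseconds) -> row index in group B.
        b_index = {(t["seconds"], t["nanoseconds"]): i
                   for i, t in enumerate(time_b)}

        # For each event in group A, find the matching row in group B
        # (None if group B has no data for that event).
        matches = [b_index.get((t["seconds"], t["nanoseconds"]))
                   for t in time_a]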

...

Special dataset "time" contains timestamp of the event as it comes from DAQ. This information is sufficient to identify the event and match this event with the data from other groups. In most cases the data in HDF5 is stored in event time order, meaning that timestamps in time dataset should be in increasing order (making it easier to do binary search for a timestamp). Time dataset is a structure (compound data type in HDF5 speak) with these fields:

  • seconds – whole seconds part of the timestamp
  • nanoseconds – nanoseconds (range 0-999999999) part of the timestamp
  • fiducials – fiducials counter for the event
  • ticks – 119 MHz counter within the fiducial
  • vector – event counter since Configure (not necessarily in increasing order)
  • control – EVR event code

Older schemas had only the first two fields; the last four fields are defined only if the global attribute (an attribute of the "/" group) with the name ":schema:timestamp-format" exists and its value is "full".
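
Since the timestamps are normally stored in increasing order, an event can be located by binary search. A sketch with h5py and numpy, using a hypothetical file, group path, and target timestamp:

    import h5py
    import numpy as np

    # Hypothetical path to a time dataset.
    PATH = "/Configure:0000/Run:0000/CalibCycle:0000/SomeType/SomeSrc/time"

    with h5py.File("exp-run0123.h5", "r") as f:
        time = f[PATH][:]

    # Collapse (seconds, nanoseconds) into one sortable 64-bit key.
    keys = (time["seconds"].astype(np.int64) * 1_000_000_000
            + time["nanoseconds"])

    wanted = 1385000000 * 1_000_000_000 + 123456789   # hypothetical target
    i = int(np.searchsorted(keys, wanted))
    if i < len(keys) and keys[i] == wanted:
        print("event found at index", i)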

...

Optional dataset "_damage" contains information about damage recorded by DAQ for this data source. Usually when damage is non-zero the data is not usable, with the exception of BLD Ebeam data which can be used if user damage flags are set. "_mask" dataset provides exact information about when the data is usable. See also description of mask dataset for more details about schema versions. Damage dataset is a structure (compound data type in HDF5 speak) with these fields:

  • bits – packed value of the damage bits, non-user damage only
  • userBits – packed value of the damage bits, user damage only
  • DroppedContribution – value of a particular damage bit
  • OutOfOrder – value of a particular damage bit
  • OutOfSynch – value of a particular damage bit
  • UserDefined – value of a particular damage bit
  • IncompleteContribution – value of a particular damage bit
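
A short sketch of inspecting these fields for one entry with h5py (file name and group path hypothetical):

    import h5py

    # Hypothetical path to an event-type data group.
    PATH = "/Configure:0000/Run:0000/CalibCycle:0000/SomeType/SomeSrc"

    with h5py.File("exp-run0123.h5", "r") as f:
        damage = f[PATH].get("_damage")    # optional dataset
        if damage is not None:
            d = damage[100]
            # Non-zero non-user damage usually means the entry is unusable
            # (BLD Ebeam with user damage bits is the documented exception).
            print("bits:", d["bits"], "userBits:", d["userBits"])
            print("OutOfOrder:", d["OutOfOrder"],
                  "IncompleteContribution:", d["IncompleteContribution"])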

Mask dataset

Special dataset "_mask" contains calculated value based on the damage bits and type of data which has non-zero value when corresponding data are usable. If the mask value for the specific event is zero that one should not attempt to read or interpret the data from regular dataset as the data may not be consistent. Currently when mask value is zero corresponding entries in data datasets will be zero-filled but that can change in the future. Time dataset type is an integer number, so it is an integer array, one number per recorder event.

...

A small set of attributes provides information about the structure of a file; they are usually defined at the global level, on the topmost "/" group:

  • :schema:version – integer number, defines schema version of the whole HDF5 file
  • :schema:timestamp-format – string, either "full" or "short", defines the structure of "time" datasets in event-data groups
  • :schema:bld-shared-split – integer number, non-zero means that shared BLD data is split into individual components
  • :schema:bld-config-as-evt – integer number, if non-zero then BLD configuration data types are stored in event-type datasets (arrays)

Global attributes

  • UUID – string, defines unique file identifier (not used currently)
  • created – string, time when the file was created
  • experiment – string, name of the experiment
  • origin – string representing the user or process who created the file ("translator" is probably most common)
  • runNumber – integer, run number
  • runType – string, currently contains "DATA"
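
Both sets of attributes live on the "/" group and can be read through h5py's attribute interface; a sketch (file name hypothetical):

    import h5py

    with h5py.File("exp-run0123.h5", "r") as f:
        root = f["/"]
        # get() returns None when an attribute is absent,
        # e.g. for files written with an older schema.
        print("schema version:  ", root.attrs.get(":schema:version"))
        print("timestamp format:", root.attrs.get(":schema:timestamp-format"))
        print("experiment:      ", root.attrs.get("experiment"))
        print("run number:      ", root.attrs.get("runNumber"))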

Start/end times

Groups that correspond to DAQ transitions (Configure, Run, and CalibCycle) may define attributes which store start/end times of their corresponding intervals:

  • start.seconds – integer, whole seconds of the starting timestamp
  • start.nanoseconds – integer, nanoseconds part of the starting timestamp
  • end.seconds – integer, whole seconds of the end timestamp
  • end.nanoseconds – integer, nanoseconds part of the end timestamp

In some cases (when the corresponding transition is missing from a file) the timestamps may not be known; in that case the corresponding attributes will be missing.
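
A sketch of reading these timestamps while tolerating their absence (group path hypothetical):

    import h5py

    with h5py.File("exp-run0123.h5", "r") as f:
        grp = f["/Configure:0000/Run:0000/CalibCycle:0000"]
        start_sec = grp.attrs.get("start.seconds")    # None if unknown
        start_nsec = grp.attrs.get("start.nanoseconds")
        if start_sec is not None:
            print("calib cycle started at", start_sec, start_nsec)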

...

Individual data groups (groups that contain datasets) may have these attributes defined:

  • _schemaVersion – integer, schema version number used for defining datasets (their number and types)
  • _xtcSrc – 64-bit integer, representation of DAQ source (Pds::Src value) where data originated
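
These can be read in the same way as any other attributes; a short sketch with a hypothetical group path:

    import h5py

    with h5py.File("exp-run0123.h5", "r") as f:
        grp = f["/Configure:0000/Run:0000/CalibCycle:0000/SomeType/SomeSrc"]
        print("schema version:", grp.attrs.get("_schemaVersion"))
        print("DAQ source:    ", grp.attrs.get("_xtcSrc"))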

Schema versions and features

...

This is the schema that was used from the beginning of experiments until November 2012. Some features of this schema version:

  • "time" datasets had only seconds and nanoseconds fields, fiducials, ticks and other fields are not defined
  • schema attributes are not defined

Schema version 2

This version was introduced around November 2011. Changes since the previous schema version:

  • added schema attributes

Some features are dynamic and can be turned on or off independently of the schema version number. Here are a few of these features:

  • time dataset format: if the global attribute :schema:timestamp-format is present and contains "full", then the time datasets have all of the fields defined above; otherwise only the seconds and nanoseconds fields are defined (EPICS time datasets always have only seconds/nanoseconds)
  • damage/mask datasets: if these datasets are present, their content can be used to filter out unusable data; if they are missing, damaged data is simply discarded
  • shared BLD data: if the global attribute :schema:bld-shared-split is present and non-zero, shared BLD data is split into individual objects before being stored in HDF5. Recently the DAQ started splitting shared BLD before writing to XTC, so this should not matter anymore for recent data.
  • BLD configuration data: if the global attribute :schema:bld-config-as-evt is present and non-zero, configuration data from BLD sources is stored in the same way as regular event data (in rank-1 array datasets instead of scalar datasets)

Schema version 3

This version was introduced around September 2013. Changes since the previous schema version:

  • BLD configuration data from the Configure transition is stored in scalar datasets in the /Configure:0000 group instead of event-type array datasets
  • the translator now processes only one Configure transition instead of one per stream; as a result, the number of entries in event-type datasets in /Configure:0000 groups should be 1.