Introduction

There are presently two xtc to hdf5 translators, o2o-translate and psana-translate. o2o-translate is the original translator; it is being phased out and replaced by psana-translate. Translation is primarily carried out through the automatic hdf5 translation that users can start from the web portal. Documentation on o2o-translate, which discusses some of the history behind selecting hdf5 as a general-use scientific data format, can be found

...

Three aspects of these new features are subject to change. These are highlighted in warning boxes below. In brief, these are

  • The group name for NDArrays will most likely split from the one name NDArray to several that fully distinguish the type (such as ndarray_uint8_2)
  • How event key strings are incorporated into hdf5 paths
  • How new C++ types are registered for translation.

...

  • /Configure:0000/Run:0000/Filtered:0000/time  - as discussed above, the event ids for all filtered events.
  • /Configure:0000/Run:0000/Filtered:0000/std::string/noSrc__message/data  - a dataset of variable length strings; each entry will be the string "The beam energy is bad".
  • /Configure:0000/Run:0000/Filtered:0000/std::string/noSrc__message/time  - a dataset of event ids for the data above (there need not have been a std::string in all the filtered events).
  • /Configure:0000/Run:0000/Filtered:0000/ndarray_float32_1/noSrc__measurements/data - a dataset where each entry is a 1D array of 4 floats, with the values 0.4, 1.3, 2.2, 3.1.
  • /Configure:0000/Run:0000/Filtered:0000/ndarray_float32_1/noSrc__measurements/time - likewise, the event ids for the ndarrays of the filtered events.

Note the src level group names: noSrc__message and noSrc__measurements. Since no source was specified in the calls to evt.put, the Translator starts the group name with the string noSrc. Two underscores, __, separate the source from the key string.

Note the fully qualified type information in the group names of the ndarrays written. This allows translation of different ndarrays in the event store that differ only by this type information (i.e., they have the same key and source).
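The path layout described above can be sketched in a few lines of Python. This is only an illustration of the naming rule, not Translator code: the helper names src_group_name and dataset_paths are hypothetical, and the sketch assumes a source, when present, is already available as a plain string (how the Translator renders a real source is not covered here).

```python
NO_SOURCE = "noSrc"  # used in the group name when evt.put was called without a source
SEPARATOR = "__"     # two underscores separate the source from the key string


def src_group_name(key, source=None):
    """Return the src level group name, for example 'noSrc__message'."""
    return (source if source else NO_SOURCE) + SEPARATOR + key


def dataset_paths(parent, type_group, key, source=None):
    """Return the (data, time) dataset paths below a parent group.

    Hypothetical helper: it only joins the pieces named in the text.
    """
    group = "/".join([parent.rstrip("/"), type_group, src_group_name(key, source)])
    return group + "/data", group + "/time"


parent = "/Configure:0000/Run:0000/Filtered:0000"
data_path, time_path = dataset_paths(parent, "std::string", "message")
print(data_path)
print(time_path)
```

Keeping the separator and the noSrc placeholder in named constants makes the sketch easy to adjust, since this part of the schema is flagged as subject to change.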

Warning

This example illustrates the way our current hdf5 schema, schema 4, forms hdf5 paths that involve key strings for event data: source__key, where the string noSrc can be used for source. This is one aspect of the new features that is subject to change. It also illustrates the group name used for ndarrays, which is likewise subject to change (see The XTC-to-HDF5 Translator above).

Filtering from Python Modules

...

ndarrays (up to dimension 4, of the standard integral types, floats and doubles) as well as std::strings that are written into the event store will be written to the hdf5 file by default. ndarrays can be passed to the Translator by Python modules as well as C++ modules. These events can be filtered as well. The example in The XTC-to-HDF5 Translator above illustrates the group names used for ndarrays and strings. As noted in The XTC-to-HDF5 Translator, the group name for ndarrays is subject to change; see that section for more details. Note that the type group name for ndarrays is fully qualified by the template arguments. Some examples of type names are:

ndarray_int8_1       # a 1D array of 8 bit signed integers (the C type char)
ndarray_uint8_2      # a 2D array of 8 bit unsigned integers
ndarray_int32_1      # a 1D array of 32 bit signed integers (the C type int)
ndarray_uint64_3     # a 3D array of 64 bit unsigned integers
ndarray_float32_2    # a 2D array of 32 bit floats (the C type float)
ndarray_float64_1    # a 1D array of 64 bit floats (the C type double)

These names agree with what users find in the Python interface to psana.

Less common are the names used to store an ndarray of const data. An example name for such data is

ndarray_const_float32_2 
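As a rough illustration of the naming pattern above, the following Python sketch builds a type group name from an element type name and a number of dimensions. The function ndarray_type_name is hypothetical and not part of psana or the Translator, and the exact set of supported element types listed in it is an assumption based on "the standard integral types, floats and doubles".

```python
# Assumed set of element type names; the text only says "standard integral
# types, floats and doubles", so this list is a guess for illustration.
ELEMENT_TYPES = {
    "int8", "uint8", "int16", "uint16",
    "int32", "uint32", "int64", "uint64",
    "float32", "float64",
}


def ndarray_type_name(dtype_name, ndim, const=False):
    """Return a type group name such as 'ndarray_float32_2'.

    Hypothetical helper that mirrors the naming pattern shown above.
    """
    if dtype_name not in ELEMENT_TYPES:
        raise ValueError("unsupported element type: " + dtype_name)
    if not 1 <= ndim <= 4:  # the text states ndarrays up to dimension 4
        raise ValueError("dimension must be between 1 and 4")
    prefix = "ndarray_const_" if const else "ndarray_"
    return prefix + dtype_name + "_" + str(ndim)


print(ndarray_type_name("uint8", 2))
print(ndarray_type_name("float32", 2, const=True))
```

For a numpy array arr, the two arguments could be taken from arr.dtype.name and arr.ndim.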

Fixed Dimensions vs. Variable Dimensions

The Translator defaults to using a fixed set of dimensions for all the ndarrays that go into the same dataset. The first array received for a dataset determines these dimensions. For example, if from Python one did

event.put(numpy.zeros((3,4), dtype=numpy.float32), "mykey")

during the first event, but then

event.put(numpy.zeros((5,4), dtype=numpy.float32), "mykey")

during the second event, the Translator would throw an error. Both of these arrays are supposed to go into an hdf5 path that ends with

/ndarray_float32_2/noSrc__mykey

but the underlying hdf5 type for this dataset has been set to a 2D array with dimensions (3,4). At present, one can start a new dataset during the second event

event.put(numpy.zeros((5,4), dtype=numpy.float32), "mykey_larger")

to resolve this. In the near future, the Translator will support translation of ndarrays that vary only in the slow dimension to the same dataset. This feature will be activated with a modified key. One would prepend 'translate_vlen:' to the start of the keys. For example:

event.put(numpy.zeros((3,4), dtype=numpy.float32), "translate_vlen:mykey")    # event one
event.put(numpy.zeros((5,4), dtype=numpy.float32), "translate_vlen:mykey")    # event two

Both ndarrays will go to the same hdf5 path as before, /ndarray_float32_2/noSrc__mykey, but now the underlying hdf5 datatype is a vlen type of 1D arrays with dimension 4.
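The difference between the two modes can be modelled with a small Python sketch that only tracks shapes. It is not Translator code: the class ShapeRule and its put method are hypothetical, the translate_vlen: prefix is the planned feature described above and may change, and the sketch ignores the element type, which in practice also selects the type group.

```python
VLEN_PREFIX = "translate_vlen:"  # planned key prefix described above


class ShapeRule:
    """Toy model of the fixed vs. variable dimension rule (hypothetical)."""

    def __init__(self):
        self._first_shape = {}  # key without prefix -> shape of first array seen

    def put(self, shape, key):
        """Record a shape for a key and return the src level group name.

        Raises ValueError where the text says translation would fail.
        """
        shape = tuple(shape)
        vlen = key.startswith(VLEN_PREFIX)
        name = key[len(VLEN_PREFIX):] if vlen else key
        first = self._first_shape.setdefault(name, shape)
        if vlen:
            # only the slow (first) dimension may vary between events
            ok = len(shape) == len(first) and shape[1:] == first[1:]
        else:
            # every dimension is fixed by the first array of the dataset
            ok = shape == first
        if not ok:
            raise ValueError("shape %r does not fit dataset %r started with %r"
                             % (shape, name, first))
        return "noSrc__" + name


fixed = ShapeRule()
fixed.put((3, 4), "mykey")
try:
    fixed.put((5, 4), "mykey")
except ValueError as err:
    print("fixed dimensions:", err)

vlen = ShapeRule()
print(vlen.put((3, 4), "translate_vlen:mykey"))
print(vlen.put((5, 4), "translate_vlen:mykey"))
```

The sketch strips the prefix before forming the group name, matching the statement that both arrays go to the same path as before.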

Registering New Types

C++ modules can register new types. Note that this is an advanced feature that requires familiarity with HDF5 programming in C. Presently this feature is only suitable for simple types. An example is found in the file Translator/src/TestModuleNewWriter.cpp. We go through the example here. First, a module defines the data type that it wants to store. This type is a simple C struct of native types in the C language:

...