SWMR

...

Recently (Jan 29 2015) I (davidsch) took a look at the beta work on the SWMR model (single writer, multiple readers) that the hdf5 group is developing. As planned now, it has a few limitations that would necessitate changes in how the DAQ records data, as well as in the Translator. Below is an email exchange with the hdf5 group that covers this. The summary is that SWMR will only support writing to datasets; all of the groups (and the datasets themselves) have to be known and created before SWMR access begins. The hdf5 group would need more funding to support creating groups during SWMR access, as well as vlen data.
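To make the constraint concrete, here is a minimal h5py sketch of the SWMR pattern as it is exposed today; the file name, dataset path, shapes, and event counts are made up for illustration and are not how the Translator actually lays out data:

import h5py
import numpy as np

# Writer: every group and dataset must exist before SWMR mode is switched on;
# afterwards we can only append to the pre-created datasets.
with h5py.File('run.h5', 'w', libver='latest') as f:
    dset = f.create_dataset('cspad/data',
                            shape=(0, 32, 185, 388),
                            maxshape=(None, 32, 185, 388),
                            chunks=(1, 32, 185, 388),
                            dtype='i2')
    f.swmr_mode = True                 # no new groups or datasets after this point

    for i in range(5):                 # pretend these are DAQ events
        dset.resize(i + 1, axis=0)
        dset[i] = np.zeros((32, 185, 388), dtype='i2')
        dset.flush()                   # make the appended data visible to readers

# Reader (in practice a separate, concurrent process).
with h5py.File('run.h5', 'r', libver='latest', swmr=True) as f:
    d = f['cspad/data']
    d.refresh()                        # pick up the latest flushed extent
    print(d.shape)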

Another limitation, which we learned after meeting with Quincy at SLAC sometime in late 2014, is that you cannot start moving hdf5 files offsite as soon as they are written: it is only when the file is closed that the library does some final work on the file's header. This would affect our data management model if we were to write natively to hdf5.

...

This would be a big change for the Translator. It seems we would have to wait for the xtc to be written so that we know what datasets to create and how long each one is (how many events we took, etc.). So we would not be able to do 'live' translation and let users read parts of the translation (like a translated calib cycle) while the whole run is being translated (see Outdated: The XTC - to - HDF5 Translator - the MPI split scan section). The lack of compression is also a big limitation for users moving files offsite. However, I would expect faster translation.

...

The hdf5 group has announced (around Jan 2015) their plans for a virtual view layer; I believe this is coming within 1-2 years. Today we can already create a master hdf5 file that contains links to hdf5 groups or whole datasets in other hdf5 files. My understanding of the virtual view layer is that, within a dataset in the master file, we could link each dataset entry to a different dataset entry in another hdf5 file. This would allow us to translate each of the DAQ streams in parallel into separate hdf5 files, while maintaining an easy-to-navigate master file that orders the events appropriately from the different streams.
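As a sketch of what this could look like, below is how h5py later exposed the virtual dataset feature (VirtualLayout/VirtualSource). The per-stream file names, dataset path, shapes, and event counts are invented for illustration, and the mapping shown is a simple per-stream block layout rather than the per-event interleaving we would actually want:

import h5py

# Hypothetical per-stream files produced by parallel translation, each holding
# n events of the same detector dataset; assumed to already exist on disk.
streams = [('stream0.h5', 1000), ('stream1.h5', 1000), ('stream2.h5', 1000)]
n_total = sum(n for _, n in streams)

# Map slices of the master (virtual) dataset onto the per-stream datasets.
layout = h5py.VirtualLayout(shape=(n_total, 32, 185, 388), dtype='i2')
offset = 0
for fname, n in streams:
    vsource = h5py.VirtualSource(fname, 'cspad/data', shape=(n, 32, 185, 388))
    layout[offset:offset + n] = vsource
    offset += n

# The master file stores only the mapping; reads are forwarded to the stream files.
with h5py.File('master.h5', 'w', libver='latest') as f:
    f.create_virtual_dataset('cspad/data', layout, fillvalue=0)

A finer-grained mapping (one entry at a time, to put events from different streams in time order) is possible with the same interface, at the cost of a more verbose selection.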

Compression and HDF5 Translation

Presently, when we compress, we limit ourselves to the standard compression algorithms that are available with the hdf5 library. When users export their data offsite, their offsite installation of hdf5 will be able to decompress the data. Although we could use a custom compression algorithm and get much better performance, we would then be responsible for maintaining builds of the custom compression filter for all of our users' target platforms, or for a mechanism that lets users build the filter themselves.

In a recent technical talk that Quincy gave (January 2015), I asked whether they had thought about embedding the decompression filter in the hdf5 file in some way, perhaps as Java byte code. He said they had, but one of the goals of hdf5 is longevity of the data, on the order of 40 years. Including Java byte code, or even a Python script, adds a dependency on Java or Python that is not desirable. He then contrasted this with the dependency one gets by using the standard compression algorithms, such as on an external gzip library; since those algorithms are so common, that is not as problematic a dependency.
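For reference, a minimal sketch of writing a dataset with one of the standard filters through h5py; the dataset path, shape, chunking, and compression level are only illustrative, not the Translator's actual settings:

import h5py
import numpy as np

# Hypothetical detector data; shape and dtype are only for illustration.
data = (np.random.randint(0, 1 << 14, size=(2, 32, 185, 388))
        .astype(np.int16))

with h5py.File('translated.h5', 'w') as f:
    # gzip (DEFLATE) is one of the filters every stock hdf5 build ships with,
    # so an offsite hdf5 installation can decompress without extra plugins.
    f.create_dataset('cspad/data', data=data,
                     chunks=(1, 32, 185, 388),
                     compression='gzip', compression_opts=4,
                     shuffle=True)     # byte-shuffle often improves the ratio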