...

The psana DDL-based Translator can be used to write ndarrays, strings, and a few simple types that C++ modules put into the event store. These are organized in the same groups that we use to translate xtc to hdf5, and datasets with event times are written as well. To use this, create a psana config file that turns off the translation of all xtc types but allows translation of ndarrays and strings. An example cfg file is here: psana_translate_noxtc.cfg. You would just change the modules and files parameters for psana, and the output_file parameter for Translator.H5Output. In the modules list, load the modules that put ndarrays into the event store before the Translator; the Translator will pick them up and write them to the hdf5 file.
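The main thing to get right when adapting the example cfg is module order. The sketch below uses Python's standard configparser to emit a skeleton cfg with the three parameters named above (modules and files for psana, output_file for Translator.H5Output), placing the producer modules before Translator.H5Output. It is a hypothetical helper, not part of psana: the module name MyPkg.MakeNdarray and the file paths are placeholders, and it deliberately omits the options that switch off xtc translation, which you should copy from psana_translate_noxtc.cfg.

```python
import configparser
import io


def build_translate_cfg(producer_modules, xtc_files, output_file):
    """Return the text of a skeleton psana cfg (hypothetical helper).

    The xtc-filtering options are NOT included; copy them from
    psana_translate_noxtc.cfg.
    """
    cfg = configparser.ConfigParser()
    cfg.optionxform = str  # keep option names exactly as written
    # Order matters: modules that put ndarrays/strings into the event store
    # must run before Translator.H5Output so it can pick them up.
    modules = list(producer_modules) + ["Translator.H5Output"]
    cfg["psana"] = {
        "modules": " ".join(modules),  # psana lists modules space-separated
        "files": " ".join(xtc_files),
    }
    cfg["Translator.H5Output"] = {"output_file": output_file}
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    # MyPkg.MakeNdarray and the paths below are placeholders.
    print(build_translate_cfg(["MyPkg.MakeNdarray"],
                              ["/path/to/run.xtc"],
                              "ndarrays.h5"))
```

Generating the file is optional; the point is that the Translator appears last in the modules line.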

Chunks, compression, and why is my random access algorithm so slow?

By default, the Translator writes hdf5 datasets in compressed chunks. The compression (standard gzip at deflate level 1, where levels range from 0 to 9) reduces file size by about 50%. The chunk size varies with the data type. The chunk policy focuses on not having too many chunks, as we believe this degrades performance. Typically we aim for chunks of 16MB; however, for full cspad this is only 4 objects per chunk, so we default to 22 objects per chunk, or 100MB chunks. This is fine for a program that reads linearly through an hdf5 file, or a parallel program that divides the file into sections (e.g., start, middle, end), but it is not optimal for random-access analysis. If you read one cspad, you also read the other 21 in its chunk and decompress the whole 100MB chunk.
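The chunk arithmetic can be checked in a few lines. This sketch assumes a full CSPAD event is 32 sections of 185 x 388 pixels stored as 2-byte integers (an assumption, not stated above) and uses binary megabytes (MiB); it recomputes the "only 4 objects per 16MB chunk" and "22 objects, about 100MB" figures and the bytes wasted when a random read pulls in a whole chunk.

```python
import math

# Assumption: a full CSPAD event is 32 sections of 185 x 388 pixels,
# stored as 2-byte integers.
CSPAD_SHAPE = (32, 185, 388)
CSPAD_ITEMSIZE = 2
MB = 1024 * 1024  # binary megabyte (MiB)


def object_bytes(shape, itemsize):
    """Uncompressed size in bytes of one object (one event's array)."""
    return math.prod(shape) * itemsize


def chunk_bytes(objects_per_chunk, obj_bytes):
    """Uncompressed size of a chunk holding objects_per_chunk objects."""
    return objects_per_chunk * obj_bytes


def wasted_bytes_per_random_read(objects_per_chunk, obj_bytes):
    """Bytes decompressed but not wanted when one object is read from an uncached chunk."""
    return (objects_per_chunk - 1) * obj_bytes


if __name__ == "__main__":
    cspad = object_bytes(CSPAD_SHAPE, CSPAD_ITEMSIZE)
    print(f"one cspad: {cspad / MB:.2f} MiB")
    print(f"cspads per 16 MiB chunk: {16 * MB / cspad:.2f}")
    print(f"22-object chunk: {chunk_bytes(22, cspad) / MB:.1f} MiB")
    print(f"wasted per random read: "
          f"{wasted_bytes_per_random_read(22, cspad) / MB:.1f} MiB")
```

Under this assumption one cspad is roughly 4.4 MiB, so a 16 MiB target holds fewer than 4 events, while 22 events come to roughly 96 MiB, consistent with the "100MB chunks" figure.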

As for how this interacts with the filesystem cache and the hdf5 cache: the filesystem caches the raw (compressed) bytes of the hdf5 file, so if you revisit a cspad from the same chunk, the compressed data may still be in the filesystem cache, but hdf5 will have to decompress the whole chunk again. hdf5 also maintains its own chunk cache, which stores recently used chunks in uncompressed form. There is one global cache, but one can also create caches on a per-dataset basis. The Translator creates datasets with a cache that holds 3 chunks, so a 300MB cache for a cspad dataset. This information is available to any program that reads the hdf5 file, but whether or not such a program uses it is unclear. A high-level program that reads the file, like …
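To see why a 3-chunk cache serves linear reads well but does little for random access, here is a toy model. It assumes simple least-recently-used eviction and counts how many chunk decompressions a sequence of event reads triggers, using the 22 objects per chunk and 3 cached chunks described above. HDF5's real chunk cache (configured with H5Pset_chunk_cache and its rdcc_nslots, rdcc_nbytes and rdcc_w0 parameters) uses a hash table and a different preemption policy, so treat this only as an illustration of the access-pattern effect.

```python
from collections import OrderedDict


class ToyChunkCache:
    """Simplified LRU model of a per-dataset chunk cache (not HDF5's algorithm)."""

    def __init__(self, capacity_chunks, objects_per_chunk):
        self.capacity = capacity_chunks
        self.per_chunk = objects_per_chunk
        self._cache = OrderedDict()  # chunk index -> placeholder
        self.decompressions = 0

    def read(self, index):
        """Read object `index`; decompress its chunk if it is not cached."""
        chunk = index // self.per_chunk
        if chunk in self._cache:
            self._cache.move_to_end(chunk)  # mark as most recently used
        else:
            self.decompressions += 1  # whole chunk must be decompressed
            self._cache[chunk] = True
            if len(self._cache) > self.capacity:
                self._cache.popitem(last=False)  # evict least recently used
        return chunk


def count_decompressions(indices, capacity_chunks=3, objects_per_chunk=22):
    cache = ToyChunkCache(capacity_chunks, objects_per_chunk)
    for i in indices:
        cache.read(i)
    return cache.decompressions


if __name__ == "__main__":
    print("linear, 220 events:", count_decompressions(range(220)))
    print("scattered, 5 events:", count_decompressions([0, 100, 200, 300, 0]))
```

A linear pass decompresses each chunk once, while scattered reads that touch more chunks than the cache holds decompress a full chunk for nearly every event read.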

TimeTool

Here we cover topics specific to the offline TimeTool module. Depending on the experimental setup, TimeTool results can be obtained in one of two ways: directly during data acquisition, or during offline analysis with the psana module TimeTool.Analyze.

Regarding data acquisition: for data recorded prior to Oct 13, 2014, the TimeTool results were recorded only as EPICS PVs. After Oct 13, 2014, they are still recorded as EPICS PVs, but also in their own data type, TimeTool::DataV*. The version number increments as the TimeTool develops: data recorded up to around November 2014 uses DataV1, and around December 2014 this changed to DataV2. The preferred way to access the data is through the high-level DataV* objects, but users must take care to use the correct version. The EPICS PVs are still provided for backward compatibility.

Regarding offline analysis: as with the experiment data files, the TimeTool.Analyze psana module puts a TimeTool::DataV* object in the event store, with the version depending on the version of the software. Initial versions of the software put a collection of floats, or ndarrays, in the event store instead (for backward compatibility, it still puts the floats in the event store). Documentation on the psana TimeTool modules can be found in the psana - Module Catalog.
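Because the DataV* version depends on when the data was recorded (or which software version ran TimeTool.Analyze), analysis code that spans several runs may want to try the newest type first and fall back gracefully. The sketch below shows that lookup order only. It is hypothetical: plain dictionaries keyed by type-name strings stand in for the psana event store and EPICS store (real psana lookups are by type and source), and the PV name is a caller-supplied placeholder.

```python
from typing import Any, Mapping, Optional, Sequence, Tuple

# Newest first: the DataV* version increments as the TimeTool develops.
DEFAULT_TYPES = ("TimeTool::DataV2", "TimeTool::DataV1")


def find_timetool_result(
    event_store: Mapping[str, Any],
    epics_store: Mapping[str, Any],
    epics_pv: str,
    types: Sequence[str] = DEFAULT_TYPES,
) -> Tuple[Optional[str], Any]:
    """Return (where_found, value) for the first TimeTool result, else (None, None).

    event_store and epics_store are stand-in mappings, not the psana API.
    """
    # Prefer the high-level DataV* objects, newest version first.
    for type_name in types:
        if type_name in event_store:
            return type_name, event_store[type_name]
    # Fall back to the EPICS PV, the only source for data recorded
    # before Oct 13, 2014.
    if epics_pv in epics_store:
        return epics_pv, epics_store[epics_pv]
    return None, None
```

Returning where the value came from makes it easy to log which representation each run actually provided.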

...