Analysis Workbook. Data Formats

LCLS produces data in two formats. The on-line Data Acquisition system (DAQ) writes data files in XTC format, sometimes also called the raw or on-line format. These files are later translated into HDF5 format, also called the off-line or scientific format. The two formats should contain (almost) identical data but organize it differently.

The XTC format was designed specifically for on-line data production, where reliability and efficiency matter most. It is a sequential format: data are stored in the file record after record, and there is no efficient way to jump to an arbitrary record. Reading data from an XTC file is therefore efficient only when an analysis job needs to process every record in the file.

The HDF5 format, on the other hand, is an indexed format. With properly designed indices it is much more efficient when a job needs to process only a subset of the data. Data in HDF5 are stored as multi-dimensional arrays of numbers or of compound (structured) records. Like in-memory arrays, these datasets provide direct access to an arbitrary index within the array.
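
The difference between sequential and indexed access can be sketched with two toy in-memory layouts. The record formats below are hypothetical and far simpler than real XTC or HDF5 files; they only show why reaching record k in a sequential stream of variable-length records means reading every earlier header, while an array-like layout (standing in for an indexed one) lets a reader compute the offset and seek directly.

```python
import io
import struct

# Toy illustration only: these are NOT the real XTC or HDF5 layouts.
# A toy "sequential" record is a 4-byte little-endian length header
# followed by that many payload bytes, so record sizes can vary.


def write_sequential(payloads):
    """Pack variable-length payloads back to back, each with a length header."""
    buf = io.BytesIO()
    for p in payloads:
        buf.write(struct.pack("<I", len(p)))
        buf.write(p)
    return buf.getvalue()


def read_record_sequential(data, k):
    """Return record k by walking past every earlier record (k headers read)."""
    stream = io.BytesIO(data)
    for _ in range(k):
        (n,) = struct.unpack("<I", stream.read(4))
        stream.seek(n, io.SEEK_CUR)  # skip the payload, but its header had to be read
    (n,) = struct.unpack("<I", stream.read(4))
    return stream.read(n)


def read_record_indexed(data, k, record_size):
    """With fixed-size records, record k starts at k * record_size: one seek."""
    stream = io.BytesIO(data)
    stream.seek(k * record_size)
    return stream.read(record_size)
```

For example, `read_record_sequential(write_sequential([b"a", b"bcd", b"ef"]), 2)` has to step over the first two records before returning `b"ef"`, whereas `read_record_indexed(b"aaaabbbbcccc", 1, 4)` seeks straight to `b"bbbb"`.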

At SLAC the data are stored on a high-performance parallel file system that provides a uniform namespace for all data collected by different experiments. On analysis machines this file system is accessible under the directory /reg/d/psdm. Under this directory there is a separate directory for every instrument, such as "AMO" or "SXR". Inside each instrument directory there is a directory for every experiment, named after the proposal number and year, e.g. "amo02809" or "amo14410". Each experiment directory contains directories named "xtc" and "hdf5", which hold all of the experiment's XTC and HDF5 files respectively.
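
The directory layout described above can be expressed as a small path helper. This is a minimal sketch: the function name `data_dir` is hypothetical, and it simply joins the path components in the order given in the text without checking that the directories exist.

```python
from pathlib import PurePosixPath

# Root of the LCLS data file system on SLAC analysis machines (from the text).
PSDM_ROOT = PurePosixPath("/reg/d/psdm")


def data_dir(instrument: str, experiment: str, fmt: str) -> PurePosixPath:
    """Hypothetical helper: <root>/<instrument>/<experiment>/<fmt>."""
    # Only the two per-experiment data directories named in the text are accepted.
    if fmt not in ("xtc", "hdf5"):
        raise ValueError(f"unknown format directory: {fmt!r}")
    return PSDM_ROOT / instrument / experiment / fmt
```

For example, `data_dir("AMO", "amo02809", "xtc")` gives `/reg/d/psdm/AMO/amo02809/xtc`. Using `PurePosixPath` keeps the helper purely string-based, so it works on machines where /reg/d/psdm is not mounted.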

In the XTC format the data from a single run are usually split into several files. XTC file names have the format:

eEEE-rRRRR-sSS-cCC.xtc

where EEE is the experiment number as defined in the experiment registration database, RRRR is the run number, SS is the stream number and CC is the chunk number.
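
A file name in this format can be split back into its four numeric fields with a regular expression. This is a sketch under stated assumptions: the names `XtcName`, `parse_xtc_name` and `run_glob` are hypothetical, the run, stream and chunk widths follow the template above, and the experiment field is matched as one or more digits because its exact width is not fixed here.

```python
import re
from typing import NamedTuple


class XtcName(NamedTuple):
    experiment: int
    run: int
    stream: int
    chunk: int


# eEEE-rRRRR-sSS-cCC.xtc; experiment width is an assumption (one or more digits).
XTC_RE = re.compile(r"^e(\d+)-r(\d{4})-s(\d{2})-c(\d{2})\.xtc$")


def parse_xtc_name(name: str) -> XtcName:
    """Hypothetical helper: split an XTC file name into its numeric fields."""
    m = XTC_RE.match(name)
    if m is None:
        raise ValueError(f"not an XTC file name: {name!r}")
    return XtcName(*(int(g) for g in m.groups()))


def run_glob(run: int) -> str:
    """Hypothetical helper: shell-style pattern for every stream and chunk of a run."""
    return f"e*-r{run:04d}-s*-c*.xtc"
```

For example, `parse_xtc_name("e123-r0045-s01-c00.xtc")` returns `XtcName(experiment=123, run=45, stream=1, chunk=0)`. The pattern from `run_glob(45)`, `"e*-r0045-s*-c*.xtc"`, can be joined with an experiment's xtc directory and passed to `glob.glob()` to collect all files of that run.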

For HDF5, the data from one run can be stored in one or multiple files, depending on the production job configuration. When the data are stored in a single file, the file name has the format:

EXPNAME-rRRRR.h5

where EXPNAME is the experiment name and RRRR is the run number. When there are multiple files per run, the file names have the format:

EXPNAME-rRRRR-NN.h5

where NN is the sequential part number starting with 0.
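
Both HDF5 naming variants can be built and parsed with one helper pair. This is a minimal sketch: `hdf5_name` and `parse_hdf5_name` are hypothetical names, the two-digit part number follows the NN placeholder above, and `None` stands for the single-file case.

```python
import re
from typing import Optional, Tuple

# EXPNAME-rRRRR.h5 (single file) or EXPNAME-rRRRR-NN.h5 (part NN, starting from 0)
H5_RE = re.compile(r"^(?P<exp>.+?)-r(?P<run>\d{4})(?:-(?P<part>\d{2}))?\.h5$")


def hdf5_name(expname: str, run: int, part: Optional[int] = None) -> str:
    """Hypothetical helper: build the file name; part=None means one file per run."""
    if part is None:
        return f"{expname}-r{run:04d}.h5"
    return f"{expname}-r{run:04d}-{part:02d}.h5"


def parse_hdf5_name(name: str) -> Tuple[str, int, Optional[int]]:
    """Hypothetical helper: return (experiment name, run, part or None)."""
    m = H5_RE.match(name)
    if m is None:
        raise ValueError(f"not an HDF5 run file name: {name!r}")
    part = m.group("part")
    return m.group("exp"), int(m.group("run")), None if part is None else int(part)
```

For example, `hdf5_name("amo02809", 12)` gives `amo02809-r0012.h5`, and `parse_hdf5_name("amo02809-r0012-01.h5")` gives `("amo02809", 12, 1)`.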
