Draft, in work. See whiteboard notes.

TBW.

Use Cases

TBW.

  • Provide support for distributed parallel processing by coupling each large raw data file with a small metadata "manifest" file containing only offset pointers and filter criteria that allow a parallel processing node to <make XYZ decisions about how to carve up and process the data>
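
As a rough illustration of the use case above, the sketch below splits a list of event offsets into contiguous, near-equal chunks, one per processing node. The function name `carve_offsets`, the variable `manifest_offsets`, and the offset values are hypothetical and chosen only for illustration; the real manifest layout and carving policy are not defined yet.

```python
from typing import List, Sequence


def carve_offsets(offsets: Sequence[int], n_nodes: int) -> List[List[int]]:
    """Split a manifest's event offsets into contiguous, near-equal chunks."""
    if n_nodes <= 0:
        raise ValueError("n_nodes must be positive")
    base, extra = divmod(len(offsets), n_nodes)
    chunks = []
    start = 0
    for node in range(n_nodes):
        # The first `extra` nodes take one additional event each.
        size = base + (1 if node < extra else 0)
        chunks.append(list(offsets[start:start + size]))
        start += size
    return chunks


# Hypothetical manifest: byte offsets of five events in the large raw data file.
manifest_offsets = [0, 4096, 8192, 12288, 16384]
chunks = carve_offsets(manifest_offsets, 2)
```

Each node would then read only the byte ranges named in its own chunk. Contiguous chunks are assumed here because they keep each node's reads sequential; other policies (e.g. round-robin) are equally possible.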

Concepts

TBW.

  • self-describing
  • include the algorithms/version numbers (e.g. "gzip") where necessary so that the correct software can be instantiated to analyze a piece of data
  • no dependencies
  • lightweight (2500 lines of C++)
  • same data format used in-mem and on-disk
  • no serialization (copying) step when sending xtc over a network
  • can be read while data is being written
  • LCLS-II will write one or more xtc files per detector
  • det name, det type, det serial number, det "segment" (which piece of a detector this is)
  • det configuration metadata shows up at the beginning of every xtc file
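
To make the "self-describing" and "same format in-mem and on-disk" ideas above concrete, here is a minimal sketch using a toy record layout. The layout (16-byte name, 1-byte type code, 4-byte element count) and the names `HEADER`, `pack_record`, and `read_record` are assumptions made up for this example; this is not the real Xtc layout.

```python
import struct

# Hypothetical toy header, NOT the real Xtc layout:
#   16-byte name, 1-byte type code, 4-byte element count (little-endian).
HEADER = struct.Struct("<16scI")
TYPE_FORMATS = {b"f": "f", b"i": "i"}  # float32, int32 (4 bytes each)


def pack_record(name, type_code, values):
    """Build one record: a header describing the payload, then the payload."""
    header = HEADER.pack(name.encode("ascii"), type_code, len(values))
    fmt = "<%d%s" % (len(values), TYPE_FORMATS[type_code])
    return header + struct.pack(fmt, *values)


def read_record(buf):
    """Interpret a record in place, using only what its header says."""
    view = memoryview(buf)  # no copy of the underlying bytes
    raw_name, type_code, count = HEADER.unpack_from(view, 0)
    name = raw_name.rstrip(b"\0").decode("ascii")
    fmt = "<%d%s" % (count, TYPE_FORMATS[type_code])
    values = struct.unpack_from(fmt, view, HEADER.size)
    return name, values


record = pack_record("diode", b"f", [1.0, 2.5])
name, values = read_record(record)
```

The same bytes could be written to disk or sent over a network unchanged, and the reader needs no external schema because the header carries the name, type, and length.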

Xtc Detector Data Files

TBW.

[Diagram: Xtc_per_file_structure]

(Note: this is a user-centric description of the data format. It focuses on the most typically used parts of the API and doesn't cover all of the structure and metadata present in the Xtc format. For example, Xtc data records do not begin with a Names block. For more detail, see Xtc Library Reference.)

For typical user code written to parse Xtc data, each record/file effectively begins with a set of data Names,

Xtc Small Data Files

Trying to work up a name for what we labeled "small data" on the whiteboard. "Small data" didn't seem like the right description.

These files (generated automatically by the DAQ) have the same format as big-data files but include, at a minimum, the on-disk "offsets" of the associated big data. There is one small-data "event" per big-data "event"; this pairing is used for parallelization. The files can also include other small data (e.g. diode values) that can be used to "filter" events and so avoid paying the penalty for fetching the large data. The current tool is "smdwriter" (may change). If you run "pytest psana/psana/tests", the small data files will be in .tmp/smalldata/*.smd.xtc.
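
The filtering idea can be sketched as follows: read the small-data events, apply a cut on a small quantity, and seek into the big-data file only for events that pass. The record layout (`SMD_EVENT`: offset, size, one diode value), the function `fetch_filtered`, and the threshold are hypothetical and exist only for this example; they do not reflect the actual smd.xtc contents or the psana API.

```python
import io
import struct

# Hypothetical small-data event, NOT the real smd.xtc layout:
#   8-byte offset and 8-byte size of the big-data event, plus one diode value.
SMD_EVENT = struct.Struct("<QQd")


def fetch_filtered(smd_bytes, big_file, diode_threshold):
    """Yield big-data payloads only for events whose diode value passes the cut."""
    for offset, size, diode in SMD_EVENT.iter_unpack(smd_bytes):
        if diode < diode_threshold:
            continue  # rejected without touching the big-data file
        big_file.seek(offset)
        yield big_file.read(size)


# Toy data: three "big" events and their matching small-data events.
big = io.BytesIO(b"AAAA" + b"BBBBBB" + b"CC")
smd = b"".join([
    SMD_EVENT.pack(0, 4, 0.1),
    SMD_EVENT.pack(4, 6, 0.9),
    SMD_EVENT.pack(10, 2, 0.7),
])
selected = list(fetch_filtered(smd, big, diode_threshold=0.5))
```

Here the first event is dropped based on its diode value alone, so its large payload is never read.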

Redesigning Xtc for LCLS-II

(LCLS-I lessons learned)

More self-describing: arrays of different types (floats, ints) and values (floats, ints)