Draft, in work. This documentation was started by Mark Arndt, but he then left SLAC, so it is incomplete. See the whiteboard notes. See also Sioan's Google slides (in particular the one titled "The XTC2 data structure", which includes hyperlinks to GitHub code).
TBW.
Concepts and Use Cases
TBW.
- Provide support for distributed parallel processing by coupling each large raw data file with a small metadata "manifest" file containing only offset pointers and filter criteria that allow a parallel processing node to <make XYZ decisions about how to carve up and process the data>
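A minimal sketch of how a node might carve up work using only the manifest, assuming each manifest entry holds an (offset, size) pair into the big-data file plus one filter value. `ManifestEntry`, `assign_events` and the round-robin split are hypothetical illustrations, not the psana API; a real scheduler may split the work differently.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ManifestEntry:
    """One manifest (small-data) event. Hypothetical layout for illustration only."""
    offset: int          # byte offset of the matching event in the big-data file
    size: int            # size of that event in bytes
    filter_value: float  # cheap per-event quantity usable for selection


def assign_events(manifest: List[ManifestEntry], n_nodes: int, node_rank: int) -> List[ManifestEntry]:
    """Return the entries this node should read, using a simple round-robin split.

    Only the small manifest is scanned here; the big-data file is never touched.
    """
    if n_nodes <= 0 or not 0 <= node_rank < n_nodes:
        raise ValueError("need n_nodes > 0 and 0 <= node_rank < n_nodes")
    return manifest[node_rank::n_nodes]


# Usage: five events split across two nodes.
manifest = [ManifestEntry(offset=i * 1000, size=1000, filter_value=float(i)) for i in range(5)]
print([e.offset for e in assign_events(manifest, n_nodes=2, node_rank=1)])  # [1000, 3000]
```

Round-robin keeps each node's share balanced without reading any big data; a node then seeks directly to the offsets it was given.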
Concepts
TBW.
- self-describing data format
- the description includes the algorithms and version numbers (e.g. "gzip") where necessary, so that the correct software can be instantiated to analyze a piece of data
- no dependencies
- lightweight (2500 lines of C++)
- the same data format is used in memory and on disk
- no serialization (copying) step when sending xtc over a network
- can be read while data is being written
- LCLS-II will write one or more xtc files per detector
- detector name, type, serial number, and "segment" (which piece of a detector the data comes from)
- detector configuration metadata appears at the beginning of every xtc file
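To illustrate why a single in-memory/on-disk format removes the serialization step, here is a minimal sketch that walks length-prefixed records in a byte buffer and hands out payloads as `memoryview` slices rather than copies. The 12-byte header (`type_id`, `version`, `extent`) is a hypothetical layout chosen for this example and is not the real Xtc header; the type id and version stand in for the "algorithm/version" description above. Stopping at a trailing partial record is one simple way to read a file that is still being written.

```python
import struct
from typing import Iterator, Tuple

# Hypothetical 12-byte header (NOT the real Xtc layout), little-endian:
# type_id (uint32), version (uint16), reserved (uint16), payload extent in bytes (uint32).
HEADER = struct.Struct("<IHHI")


def make_record(type_id: int, version: int, payload: bytes) -> bytes:
    """Build one record: header immediately followed by its payload."""
    return HEADER.pack(type_id, version, 0, len(payload)) + payload


def iter_records(buf: bytes) -> Iterator[Tuple[int, int, memoryview]]:
    """Walk records without copying payloads.

    Each payload is a memoryview slice into the original buffer, so the same
    bytes serve as both the in-memory and the on-disk representation.
    """
    view = memoryview(buf)
    pos = 0
    while pos + HEADER.size <= len(view):
        type_id, version, _reserved, extent = HEADER.unpack_from(view, pos)
        end = pos + HEADER.size + extent
        if end > len(view):
            break  # incomplete record (file still being written): retry later
        yield type_id, version, view[pos + HEADER.size:end]
        pos = end


# Usage
data = make_record(1, 2, b"abc") + make_record(7, 1, b"hello")
for type_id, version, payload in iter_records(data):
    print(type_id, version, bytes(payload))
# prints:
# 1 2 b'abc'
# 7 1 b'hello'
```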
Xtc Detector Data (Big Data) Files
TBW.
[Gliffy diagram]
(Note: this is a user-centric description of the data format. It focuses on the most commonly used parts of the API and does not cover all of the structure and metadata present in the Xtc format. For example, Xtc data records do not actually begin with a Names block. For more detail, see the Xtc Library Reference.)
For typical user code written to parse Xtc data, each record/file effectively begins with a set of data Names,
Xtc Manifest (Small Data) Files
(We are still trying to settle on a name for what we labeled "small data" on the whiteboard; "small data" did not seem like the right description.)
These files (generated automatically by the DAQ) have the same format as the big-data files but contain, at a minimum, the on-disk "offsets" of the associated big data. There is one small-data "event" per big-data "event", and the files are used for parallelization. They can also include other small data (e.g. diode values) that can be used to "filter" events, avoiding the cost of fetching the large data. The current tool for writing them is "smdwriter" (this may change). If you run "pytest psana/psana/tests", the small-data files will be in .tmp/smalldata/*.smd.xtc.
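A minimal sketch of the filter-then-fetch pattern, assuming each small-data event stores the byte offset and size of its big-data event plus a diode value. `SmallDataEvent` and `fetch_selected` are hypothetical names; this is not how smdwriter or psana actually lays out or reads the files, and an in-memory `io.BytesIO` stands in for the big-data file.

```python
import io
from typing import BinaryIO, Iterable, List, NamedTuple


class SmallDataEvent(NamedTuple):
    """Hypothetical small-data event: pointer into the big-data file plus a diode value."""
    offset: int   # byte offset of the matching big-data event
    size: int     # byte size of the matching big-data event
    diode: float  # cheap per-event quantity usable for filtering


def fetch_selected(smd: Iterable[SmallDataEvent], bigdata: BinaryIO, min_diode: float) -> List[bytes]:
    """Read only the big-data events whose diode value passes the cut.

    Events failing the cut cost no big-data I/O at all, which is the point of
    keeping the filter values in the small-data file.
    """
    selected = []
    for evt in smd:
        if evt.diode < min_diode:
            continue  # skip without touching the big-data file
        bigdata.seek(evt.offset)
        chunk = bigdata.read(evt.size)
        if len(chunk) != evt.size:
            raise OSError(f"short read at offset {evt.offset}")
        selected.append(chunk)
    return selected


# Usage: three 4-byte "events"; only the last two pass the cut.
big = io.BytesIO(b"AAAABBBBCCCC")
smd = [SmallDataEvent(0, 4, 0.1), SmallDataEvent(4, 4, 2.5), SmallDataEvent(8, 4, 3.0)]
print(fetch_selected(smd, big, min_diode=1.0))  # [b'BBBB', b'CCCC']
```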
Redesigning Xtc for LCLS-II
(LCLS-I lessons learned)
- More self-describing: arrays of different types (floats, ints) and values (floats, ints)
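A minimal sketch of what "more self-describing" can mean in practice: each field carries a name, an element type and a shape (empty for a scalar), so a reader can decode arrays and scalar values of different types with no external schema. The descriptor tuple and the `encode`/`decode` helpers are hypothetical; the type codes are those of Python's `struct` module, and arrays are stored as flat lists.

```python
import math
import struct
from typing import Dict, List, Sequence, Tuple, Union

# Hypothetical descriptor: (name, struct type code, shape). Empty shape = scalar.
# Type codes follow Python's struct module, e.g. "d" = float64, "i" = int32.
Descriptor = Tuple[str, str, Tuple[int, ...]]
Value = Union[int, float, List[Union[int, float]]]


def _count(shape: Tuple[int, ...]) -> int:
    """Number of elements; math.prod(()) == 1, so a scalar counts as one item."""
    return math.prod(shape)


def encode(descs: Sequence[Descriptor], values: Dict[str, Value]) -> bytes:
    """Pack values back to back in descriptor order, little-endian, no padding."""
    out = bytearray()
    for name, code, shape in descs:
        v = values[name]
        items = list(v) if shape else [v]  # arrays are flat lists here
        if len(items) != _count(shape):
            raise ValueError(f"{name}: expected {_count(shape)} items, got {len(items)}")
        out += struct.pack(f"<{len(items)}{code}", *items)
    return bytes(out)


def decode(descs: Sequence[Descriptor], buf: bytes) -> Dict[str, Value]:
    """Recover named values using only the descriptors."""
    result: Dict[str, Value] = {}
    pos = 0
    for name, code, shape in descs:
        fmt = f"<{_count(shape)}{code}"
        items = struct.unpack_from(fmt, buf, pos)
        pos += struct.calcsize(fmt)
        result[name] = list(items) if shape else items[0]
    return result


# Usage: a float scalar, an int scalar and an int array in one record.
descs = [("gain", "d", ()), ("nhits", "i", ()), ("waveform", "i", (4,))]
buf = encode(descs, {"gain": 1.5, "nhits": 3, "waveform": [1, 2, 3, 4]})
print(decode(descs, buf))  # {'gain': 1.5, 'nhits': 3, 'waveform': [1, 2, 3, 4]}
```

Keeping the descriptors next to the data is what lets generic tools read a record without knowing in advance which detector produced it.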