...

MPIDataSource supports "ragged" 1D arrays (also called variable-length, or vlen, arrays).  An example would be an array of photon energies (or positions) whose length changes from shot to shot at LCLS.  If you have such an array you must start the HDF5 dataset name with the string "ragged_".
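The naming convention can be illustrated with a minimal sketch.  Here `save_event` is a hypothetical stand-in for the real small-data writer's per-event call (the actual psana API is not shown); the point is only that per-event 1D arrays of varying length use the "ragged_" prefix:

```python
import numpy as np

def save_event(**kwargs):
    """Hypothetical stand-in for the small-data writer's per-event call."""
    saved = {}
    for name, arr in kwargs.items():
        # Variable-length 1D data must use a dataset name starting "ragged_".
        if not name.startswith("ragged_"):
            raise ValueError(f"vlen 1D array {name!r} needs a 'ragged_' prefix")
        saved[name] = len(arr)
    return saved

# Photon-energy arrays whose length changes from shot to shot.
shots = [np.array([9.1, 9.3]), np.array([8.7]), np.array([9.0, 9.2, 9.4])]
lengths = [save_event(ragged_energies=e)["ragged_energies"] for e in shots]
```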

An alternative to ragged arrays is "variable" arrays.  These are not limited to 1D, but only the first dimension is variable; the other dimensions must be fixed sizes.  An HDF5 dataset name that starts with the string "var_" will generate a variable-length array.  A separate integer array whose name ends with "_len" will be generated automatically, giving the number of elements of the variable array that belong to each event.  (When reading such a file, a running count of the values from the "_len" array must be kept to locate the next event's data in this dataset.)
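The running-count bookkeeping for reading a "var_" dataset can be sketched with plain NumPy (the array contents here are made up, standing in for the two HDF5 datasets):

```python
import numpy as np

# Hypothetical contents of an HDF5 file written with a "var_" dataset:
# the flattened data for all events, plus the auto-generated "_len" array.
var_pos = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])  # all events, concatenated
var_pos_len = np.array([2, 1, 3])                   # elements per event

# Keep a running count of the "_len" values to find each event's slice.
offsets = np.concatenate(([0], np.cumsum(var_pos_len)))
events = [var_pos[offsets[i]:offsets[i + 1]] for i in range(len(var_pos_len))]
```

Each entry of `events` is then one event's variable-length data.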

Job Submission Script

The following script can be a useful pattern to follow for submitting batch jobs.  Change the script to use the appropriate directory, experiment name, and analysis Python script name.  Then make the script executable with a command like "chmod +x submit.sh" and submit a job with a command like "./submit.sh 123", where "123" is the run number to be analyzed.

...

The MPIDataSource pattern can be used to "translate" data from XTC to HDF5.  It offers the following advantages over the old translation method:

  • users can choose which data they want to store in HDF5 (e.g. raw or calibrated image data)
  • users can apply Python algorithms (e.g. saving only part of a camera image) to reduce the output data volume, which dramatically speeds up translation and potentially allows the HDF5 files to be moved to a laptop for further analysis
  • it can be run in parallel on many cores
  • datasets are guaranteed to be "aligned"

...