...

The simplest recommended way of running a parallel analysis is to use the "MPIDataSource" pattern.  This allows you to write code as if it were running on only one processor, and to store small per-event information (numbers and small arrays) as well as "end of run" summary data.  This data can optionally be saved to a small HDF5 file, which can be copied, for example, to a laptop for analysis with any software that can read that format.  The full script can be found in /reg/g/psdm/tutorials/examplePython3/mpiDataSource.py

This script can be run in real time while data is being taken, and will typically complete a few minutes after the run ends.  NOTE: when running in parallel, the standard Python "break" statement can cause hangs.  Use the "break_after" method, shown in the example below, to terminate data processing early.

Code Block
from psana import *

dsource = MPIDataSource('exp=xpptut15:run=54:smd')  # ":smd" selects small-data mode for fast parallel reading
cspaddet = Detector('cspad')  # CSPAD area detector
smldata = dsource.small_data('run54.h5',gather_interval=100)  # output HDF5 file; gather from all ranks every 100 events

dsource.break_after(3) # stop iteration after 3 events (break statements do not work reliably with MPIDataSource).
partial_run_sum = None
for nevt,evt in enumerate(dsource.events()):
    calib = cspaddet.calib(evt)
    if calib is None: 
        continue
    cspad_sum = calib.sum()      # a single number: sum over the whole detector
    cspad_roi = calib[0][0][3:5] # a small array: two pixels from the first segment
    if partial_run_sum is None:
        partial_run_sum = cspad_roi
    else:
        partial_run_sum += cspad_roi

    # save per-event data
    smldata.event(cspad_sum=cspad_sum,cspad_roi=cspad_roi)

# get (optional) "summary" data
run_sum = smldata.sum(partial_run_sum)
# save HDF5 file, including summary data
smldata.save(run_sum=run_sum)
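
Once the job completes, the resulting HDF5 file can be inspected with any HDF5-aware software.  The sketch below uses h5py and assumes the dataset names match the keyword arguments passed to smldata.event() and smldata.save() above (cspad_sum, cspad_roi, run_sum); adjust the filename and dataset names to match your own script.

Code Block
import h5py
import numpy as np

# open the file written by smldata.save() above (assumed filename and dataset layout)
with h5py.File('run54.h5', 'r') as f:
    cspad_sum = f['cspad_sum'][:]  # one entry per saved event
    cspad_roi = f['cspad_roi'][:]  # per-event ROI arrays
    run_sum   = f['run_sum'][:]    # "end of run" summary data

print('events saved:', cspad_sum.shape[0])
print('mean detector sum:', np.mean(cspad_sum))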

...