The simplest recommended way to run a parallel analysis is the "MPIDataSource" pattern.  It lets you write code as if it were running on a single processor, while saving small per-event information (numbers and arrays) as well as "end of run" summary data.  This data can optionally be saved to a small HDF5 file, which can be moved, for example, to a laptop for analysis with any software that can read that format.  The script below can be found in /reg/g/psdm/tutorials/examplePython/mpiDataSource.py

Code Block
from psana import *

# open the run in small-data mode; events are shared among the MPI cores
dsource = MPIDataSource('exp=xpptut15:run=54:smd')
cspaddet = Detector('cspad')
# small-data output file, gathered every 100 events
smldata = dsource.small_data('run54.h5',gather_interval=100)

partial_run_sum = None
for nevt,evt in enumerate(dsource.events()):
   calib = cspaddet.calib(evt)
   if calib is None: continue   # skip events with no cspad data
   cspad_sum = calib.sum()      # number
   cspad_roi = calib[0][0][3:5] # array
   if partial_run_sum is None:
      # copy, so the later += does not modify this event's array in place
      partial_run_sum = cspad_roi.copy()
   else:
      partial_run_sum += cspad_roi

   # save per-event data
   smldata.event(cspad_sum=cspad_sum,cspad_roi=cspad_roi)

   if nevt>3: break             # stop early to keep the example short

# get "summary" data
run_sum = smldata.sum(partial_run_sum)
# save HDF5 file, including summary data
smldata.save(run_sum=run_sum)
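
The partial-sum pattern in the script above can be illustrated without psana or MPI. The following is only a sketch in plain Python: the function name simulate_partial_sums is hypothetical, and the round-robin assignment of events to cores is an assumption made for illustration, not necessarily how MPIDataSource distributes events. Each simulated core accumulates its own partial sum, and a final combine step adds the partial sums element by element, which is conceptually the role smldata.sum plays in the script.

```python
# Plain-Python stand-in for the partial-sum pattern; no psana or MPI required.
def simulate_partial_sums(event_rois, n_ranks):
    """Return (per-rank partial sums, combined run sum)."""
    partial = [None] * n_ranks
    for nevt, roi in enumerate(event_rois):
        rank = nevt % n_ranks  # assumed round-robin assignment, illustration only
        if partial[rank] is None:
            partial[rank] = list(roi)  # copy so the event data is left untouched
        else:
            partial[rank] = [a + b for a, b in zip(partial[rank], roi)]
    # Combine step: element-wise sum over the ranks that saw at least one event
    contributions = [p for p in partial if p is not None]
    run_sum = [sum(vals) for vals in zip(*contributions)] if contributions else None
    return partial, run_sum


rois = [[1, 2], [10, 20], [100, 200], [1000, 2000]]
partial, run_sum = simulate_partial_sums(rois, n_ranks=2)
print(partial)  # [[101, 202], [1010, 2020]]
print(run_sum)  # [1111, 2222]
```

No single simulated core sees every event, so only the combined result is a true run total.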

Run mpiDataSource.py on 2 cores with this command:

Code Block
mpirun -n 2 python mpiDataSource.py
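
Once run54.h5 has been written, it can be inspected on any machine that has an HDF5 reader. The following is a minimal sketch using the h5py package, which is one possible choice and is not prescribed by this tutorial. The helper name list_datasets is hypothetical. The names passed to smldata.event and smldata.save (cspad_sum, cspad_roi, run_sum) presumably appear as dataset names, but the exact file layout is an assumption, so the sketch simply lists whatever datasets it finds.

```python
import h5py


def list_datasets(path):
    """Return {dataset name: shape} for every dataset in an HDF5 file."""
    shapes = {}

    def visit(name, obj):
        # visititems calls this for every group and dataset; keep datasets only
        if isinstance(obj, h5py.Dataset):
            shapes[name] = obj.shape

    with h5py.File(path, "r") as f:
        f.visititems(visit)
    return shapes

# Example use (assumes run54.h5 is in the current directory):
#   for name, shape in sorted(list_datasets("run54.h5").items()):
#       print(name, shape)
```

An individual dataset can then be read into a NumPy array with h5py.File(path, "r")[name][...].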
