MPI Parallelization

MPI is a widely used standard for large-scale parallel computing, and is supported by every major academic computing batch system. It parallelizes work across multiple nodes and provides tools for gathering the results from different CPUs together. Besides adding more CPU power to a problem, it can also be used to add:

memory (by distributing a large memory-bound problem over multiple nodes)
I/O (by allowing multiple network connections between data senders/receivers)
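
The divide-and-gather idea described above can be illustrated with a minimal single-process sketch. It does not use MPI, and it is not how MPIDataSource is implemented: the round-robin assignment of events to ranks and the helper names (events_for_rank, simulate) are assumptions made purely for illustration.

```python
def events_for_rank(rank, size, n_events):
    """Round-robin split (assumed): rank r takes events r, r+size, r+2*size, ..."""
    return list(range(rank, n_events, size))


def simulate(n_ranks, event_values):
    """Compute one partial sum per simulated rank, then combine them."""
    partial_sums = []
    for rank in range(n_ranks):
        mine = events_for_rank(rank, n_ranks, len(event_values))
        partial_sums.append(sum(event_values[i] for i in mine))
    # Stands in for the gather/reduce step that MPI would perform
    total = sum(partial_sums)
    return partial_sums, total


event_values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # hypothetical per-event numbers
partial_sums, total = simulate(2, event_values)
print(partial_sums, total)
```

Each simulated rank touches only its own share of the events, which is why memory and I/O load can be spread across nodes as well as CPU time. The combined total matches what a single process would compute over all events.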

The simplest recommended way to run a parallel analysis is the "MPIDataSource" pattern. It lets you write code as if it were running on a single processor, and store small per-event information (numbers and small arrays) as well as "end of run" summary data. This data can optionally be saved to a small HDF5 file, which can be copied, for example, to a laptop for analysis with any software that can read that format. The example script below can be found in /reg/g/psdm/tutorials/examplePython/mpiDataSource.py

from psana import *

dsource = MPIDataSource('exp=xpptut15:run=54:smd')
cspaddet = Detector('cspad')
smldata = dsource.small_data('run54.h5',gather_interval=100)

partial_run_sum = None
for nevt,evt in enumerate(dsource.events()):
   calib = cspaddet.calib(evt)
   if calib is None: continue
   cspad_sum = calib.sum()      # number
   cspad_roi = calib[0][0][3:5] # array
   if partial_run_sum is None:
      # copy, so the in-place += below cannot alter the per-event array
      partial_run_sum = cspad_roi.copy()
   else:
      partial_run_sum += cspad_roi

   # save per-event data
   smldata.event(cspad_sum=cspad_sum,cspad_roi=cspad_roi)

   if nevt>3: break

# get "summary" data
run_sum = smldata.sum(partial_run_sum)
# save HDF5 file, including summary data
smldata.save(run_sum=run_sum)

Run the script on 2 cores with this command:

mpirun -n 2 python mpiDataSource.py
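
Once the job has finished, the output file can be inspected with any HDF5 reader. The following is a minimal sketch using h5py that lists every dataset in a file. The exact layout of the file written by small_data is not described here, so the sketch builds a small stand-in file first. Its dataset names simply mirror the keyword names used in the script and are an assumption, as are the helper name list_datasets and the stand-in file name.

```python
import os
import tempfile

import h5py
import numpy as np


def list_datasets(path):
    """Return {dataset_name: shape} for every dataset in an HDF5 file."""
    found = {}

    def visit(name, obj):
        # visititems calls this for every group and dataset in the file
        if isinstance(obj, h5py.Dataset):
            found[name] = obj.shape

    with h5py.File(path, "r") as f:
        f.visititems(visit)
    return found


# Stand-in file so the sketch runs without psana; the dataset names are
# assumed, not taken from a real small_data output file.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "run54_standin.h5")
with h5py.File(path, "w") as f:
    f.create_dataset("cspad_sum", data=np.arange(5, dtype=float))
    f.create_dataset("cspad_roi", data=np.zeros((5, 2)))
    f.create_dataset("run_sum", data=np.zeros(2))

datasets = list_datasets(path)
for name, shape in sorted(datasets.items()):
    print(name, shape)
```

Pointing list_datasets at the real run54.h5 is a quick way to see what was actually saved before writing any analysis code against it.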

In addition to running offline, these parallel scripts can be run in real time while the data is being taken, and can complete within a few minutes of the end of the run. Instructions for submitting MPI psana-python batch jobs are documented separately. Note that this interface does not currently work with the shared-memory analysis mode.

It is important to emphasize that this code is optimized for producing SMALL HDF5 files. For example, it will not run quickly if you save large images for every event, and doing so may also cause the machines to run out of memory.
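
One way to keep the per-event data small is to reduce each image to a few numbers before saving it. The sketch below uses only NumPy to compare the size of a full detector image with the reduced quantities from the example script. The image shape, the random contents, and the helper name reduce_event are assumptions for illustration, not values read from a real detector.

```python
import numpy as np


def reduce_event(calib):
    """Reduce a full image to the small quantities saved per event."""
    return {
        "cspad_sum": float(calib.sum()),       # one number
        "cspad_roi": calib[0][0][3:5].copy(),  # two-element array
    }


rng = np.random.default_rng(0)
# Hypothetical calibrated image: the shape is assumed for illustration
calib = rng.random((32, 185, 388)).astype(np.float32)

reduced = reduce_event(calib)
# 8 bytes for the Python float plus the bytes held by the small array
reduced_bytes = 8 + reduced["cspad_roi"].nbytes

print("full image bytes:", calib.nbytes)
print("reduced bytes:   ", reduced_bytes)
```

Saving only the reduced dictionary for each event keeps both the gathered data and the output file small, whereas saving the full array for every event multiplies the megabyte-scale image size by the number of events.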