...
- memory (by distributing a large memory-bound problem over multiple nodes)
- I/O (by allowing multiple network connections between data senders/receivers)
The simplest recommended way to run a parallel analysis is the "MPIDataSource" pattern. It lets you write code as if it were running on a single processor, and save small per-event information (numbers and arrays) as well as "end of run" summary data. This data can optionally be saved to a small HDF5 file, which can then be moved, for example, to a laptop for analysis with any software that can read HDF5. The example script below can be found at /reg/g/psdm/tutorials/examplePython/mpiDataSource.py
```python
from psana import *

dsource = MPIDataSource('exp=xpptut15:run=54:smd')
cspaddet = Detector('cspad')
smldata = dsource.small_data('run54.h5', gather_interval=100)

partial_run_sum = None
for nevt, evt in enumerate(dsource.events()):
    calib = cspaddet.calib(evt)
    if calib is None:
        continue
    cspad_sum = calib.sum()       # number
    cspad_roi = calib[0][0][3:5]  # array
    if partial_run_sum is None:
        partial_run_sum = cspad_roi
    else:
        partial_run_sum += cspad_roi

    # save per-event data
    smldata.event(cspad_sum=cspad_sum, cspad_roi=cspad_roi)

    if nevt > 3:
        break

# get "summary" data
run_sum = smldata.sum(partial_run_sum)
# save HDF5 file, including summary data
smldata.save(run_sum=run_sum)
```
Run the script with this command:
```
mpirun -n 2 python mpiDataSource.py
```
These parallel scripts can be run in real time while the data is being taken, and can complete within a few minutes of the end of the run. psana-python supports MPI parallelization for both offline and real-time analysis. You can see how to submit MPI psana-python batch jobs in the next "building blocks" topic. Parallelization will become even more critical with the higher data rates of LCLS-II.
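To make the `partial_run_sum` and `smldata.sum(...)` steps in the example above more concrete, here is a minimal pure-Python sketch of the idea: each process accumulates a partial sum over the events it handles, and the partial sums are combined at the end. It does not use psana or MPI. The names `simulate_ranks` and `fake_events` are hypothetical, and the round-robin assignment of events to ranks is an assumption made for illustration, not a statement about how MPIDataSource schedules events internally.

```python
# Illustration only: simulates per-rank partial sums in a single process.
# No psana or MPI is involved; all names here are hypothetical.

def simulate_ranks(events, n_ranks):
    """Return (per_rank_partial_sums, combined_sum) for a list of numbers.

    Assumption: events are dealt to ranks round-robin by event index.
    """
    partial_sums = [None] * n_ranks
    for nevt, value in enumerate(events):
        rank = nevt % n_ranks  # which simulated rank handles this event
        if partial_sums[rank] is None:
            partial_sums[rank] = value
        else:
            partial_sums[rank] += value
    # Stand-in for the final reduction: add up the partial sums of
    # every rank that saw at least one event.
    combined = sum(p for p in partial_sums if p is not None)
    return partial_sums, combined


fake_events = [1.0, 2.0, 3.0, 4.0, 5.0]  # made-up per-event values
partials, total = simulate_ranks(fake_events, n_ranks=2)
print(partials)  # [9.0, 6.0]
print(total)     # 15.0
```

The combined result equals the sum a single process would have computed over all events, which is why the analysis loop can be written as if it ran on one processor.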