Software version: ps-4.3.2 (Non-blocking MPIs)

Cluster: srcf

MPI Version: openmpi 4.1.0

Data size: 6.6 GB for one set of smd and bigdata files. Note that the original data are from tmoc00118 run 463, which has 5 sets of smd and bigdata files (1 timing and 4 hsd). One of the hsd streams was duplicated to generate up to 60 sets of smd and bigdata files.

Data location: /cds/data/drpsrcf/users/monarin/tmoc00118/xtc

MPI Settings: Rank 0 (Smd0) runs on its own node with a maximum of 32 threads. The other ranks are spread out with at most 50 ranks per node. Both values (32 threads and a spreading factor of 50) were chosen empirically from test runs.
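
The placement rule above implies a simple node count. The sketch below assumes one MPI rank per core and that the non-zero ranks fill each node up to the limit before moving to the next one; `nodes_needed` is a hypothetical helper name, not part of psana2. Under these assumptions, the 18-core and 2177-core runs map to 2 and 45 nodes, which matches the node range quoted for the reading test.

```python
import math


def nodes_needed(total_ranks, ranks_per_node=50):
    """Estimate the node count for a run (hypothetical helper).

    Assumptions: one MPI rank per core, rank 0 (Smd0) has a node to
    itself, and all other ranks are packed at ranks_per_node per node.
    """
    if total_ranks < 2:
        raise ValueError("need rank 0 plus at least one other rank")
    other_ranks = total_ranks - 1
    return 1 + math.ceil(other_ranks / ranks_per_node)


print(nodes_needed(18))    # 2
print(nodes_needed(2177))  # 45
```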

Note on performance shown below: We obtained these rates on a shared network, so the values were affected by the amount of traffic during the testing session.

Reading performance

This test only captures reading performance when running psana2 to stream data down to bigdata nodes. The plot below shows strong scaling for 4 to 60 data streams using 18 to 2177 cores (2 to 45 nodes). The best performance observed when reading 60 streams is 500 kHz using 45 nodes.

Image Removed
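
For readers who want to quantify strong scaling from their own runs, the following minimal sketch computes speedup and parallel efficiency relative to the smallest run. The function name `strong_scaling` and the input numbers are hypothetical and for illustration only; they are not measurements from this page.

```python
def strong_scaling(cores, rates_khz):
    """Return (speedup, efficiency) lists relative to the first run.

    For a fixed problem size, the ratio of event rates equals the
    speedup; efficiency divides that by the increase in core count.
    """
    base_cores, base_rate = cores[0], rates_khz[0]
    speedup = [rate / base_rate for rate in rates_khz]
    efficiency = [s / (c / base_cores) for s, c in zip(speedup, cores)]
    return speedup, efficiency


# Hypothetical inputs, NOT results from the tests described here.
cores = [100, 200, 400]
rates_khz = [50.0, 90.0, 150.0]
speedup, efficiency = strong_scaling(cores, rates_khz)
for c, s, e in zip(cores, speedup, efficiency):
    print(f"{c} cores: speedup {s:.2f}, efficiency {e:.2f}")
```

An efficiency close to 1 means that adding cores increases the rate almost proportionally; lower values point to overheads such as the ones listed below.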

Potential overheads

Reading bandwidth on a single node is ~12 GB/s.
MPI intra-node communication was not done over InfiniBand (IB).

More info: Reading performance

Reading with analysis performance

In addition to the streaming-only task above, this test adds a call to the detector interface to access the data. For this test, the interface was "hsd" and the call accesses the peaks found in different channels. We performed the test for 32 and 60 streams, and the plot shows the comparison with the streaming-only task.

Image Removed
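
The comparison between streaming only and streaming with analysis can be pictured as timing the same event loop with and without a per-event call. The sketch below is a generic illustration, not the benchmark code used here: `measure_rate_khz` is a hypothetical helper, the events are any iterable, and `per_event` stands in for the detector-interface call (the psana2 calls themselves are omitted because they require access to the data).

```python
import time


def measure_rate_khz(events, per_event=None):
    """Iterate over events and return (event_count, rate_in_kHz).

    per_event is an optional callback standing in for the analysis
    step, e.g. a call into the hsd detector interface (assumption).
    """
    start = time.perf_counter()
    count = 0
    for evt in events:
        if per_event is not None:
            per_event(evt)
        count += 1
    elapsed = time.perf_counter() - start
    rate_khz = count / elapsed / 1e3 if elapsed > 0 else float("inf")
    return count, rate_khz


# Stand-in data: integers instead of real events.
n_stream, rate_stream = measure_rate_khz(range(100_000))
n_ana, rate_ana = measure_rate_khz(range(100_000), per_event=lambda evt: evt * 2)
print(f"streaming only: {rate_stream:.1f} kHz, with analysis: {rate_ana:.1f} kHz")
```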

Peak Finding performance

We obtained example code for running a peak-finding algorithm based on the feature-extracted data. The experiment used for this test is different from the one used in the prior tests. This new dataset (exp='tmolv9418',run=175) contains data with peaks, which can be tested with Xiang's peak-finding algorithm (xiangli@slac.stanford.edu). We have around 30,000 events from 15 data streams, and ~10,000 events have usable peaks.

In this test, we investigate the behavior of the algorithm with an increasing number of events. A scaling test will follow once we can obtain more events for running with a high number of nodes. We collected timing spent

Image Removed

This section summarizes the performance of psana2 MPI parallelization for different tasks.
