Software version: ps-4.3.2 (Non-blocking MPIs)
Cluster: srcf
MPI Version: openmpi 4.1.0
Data size: 6.6 GB for one set of smd and bigdata files. Note that the original data are from tmoc00018 run 463, which has 5 sets of smd and bigdata files (1 timing and 4 hsd). One of the hsd streams was duplicated to generate up to 60 sets of smd and bigdata files.
Data location: /cds/data/drpsrcf/users/monarin/tmoc00118/xtc
MPI Settings: Rank 0 (Smd0) runs on its own node with a maximum of 32 threads. The other ranks are spread across nodes with at most 50 ranks per node. Both the thread count (32) and the spreading factor (50) were chosen empirically from test runs.
Note on performance shown below: We obtained these rates on a shared network. These performance values were affected by the amount of traffic during the testing session.
Reading performance
This test captures only the reading performance when running psana2 to stream data down to the bigdata nodes. The plot below shows strong scaling on 4 to 60 streams of datasets using 18 to 2177 cores (2 to 45 nodes). The best performance observed when reading 60 streams is 500 kHz using 45 nodes.
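The node counts quoted above can be cross-checked against the placement described under MPI Settings. The following is a minimal sketch, not part of the test scripts: it assumes one MPI rank per core, and the function name nodes_needed is made up for illustration.

```python
import math


def nodes_needed(total_ranks, ranks_per_node=50):
    """Nodes required when rank 0 (Smd0) gets a node to itself.

    Assumption: one MPI rank per core, and all remaining ranks are
    packed with at most `ranks_per_node` ranks on each node.
    """
    if total_ranks < 1:
        raise ValueError("need at least one rank")
    other_ranks = total_ranks - 1
    return 1 + math.ceil(other_ranks / ranks_per_node)


# Smallest and largest core counts quoted in the text
for cores in (18, 2177):
    print(cores, "cores ->", nodes_needed(cores), "nodes")
```

Under these assumptions, 18 cores map to 2 nodes and 2177 cores map to 45 nodes, which agrees with the range given in the text.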
Potential overheads
- Reading bandwidth on a single node is ~12 GB/s.
- MPI Intra-node communication was not done over IB
More info: Reading performance
Peak Finding performance
We obtained example code for running a peak-finding algorithm based on the feature-extracted data. The experiment used for this test is different from the one used in the prior tests. This new dataset (exp='tmolv9418',run=175) contains data with peaks, which can be tested with Xiang's peak-finding algorithm (xiangli@slac.stanford.edu). We have around 30,000 events from 15 data streams, of which ~10,000 events have usable peaks.
In this test, we investigate the behavior of the algorithm as the no. of events increases. A full scaling test on a high no. of nodes is pending until we can obtain more events. We collected timing spent
The following plot shows weak scaling, where both the no. of events and the no. of cores increase (200k to 25M events and 18 to 2177 cores).
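In a weak scaling test the work per core should stay roughly constant as the core count grows. The short sketch below checks this for the two end points quoted above. It uses only the numbers in the text, and the helper name events_per_core is illustrative.

```python
def events_per_core(n_events, n_cores):
    """Average no. of events handled by each core."""
    return n_events / n_cores


# End points of the weak scaling range quoted in the text
small = events_per_core(200_000, 18)
large = events_per_core(25_000_000, 2177)

# A ratio close to 1 means the per-core workload is roughly constant
ratio = large / small
print(f"{small:.0f} vs {large:.0f} events per core (ratio {ratio:.3f})")
```

Both end points come out at roughly 11,000 events per core, within a few percent of each other.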
Peak Finding and Data Writing
This test shows how data streaming, peak finding, and data writing perform together. Data writing is done by limiting the size of the array to 100 for each event, and all events get written out (a zeroed array if there is no peak). The no. of Srv cores was increased in the same way as the no. of Eb cores.
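The fixed-size output described above can be sketched as follows. This is an illustration only, not the code used in the test: the helper name to_fixed_length, the float64 dtype, and the choice to truncate events with more than 100 values are all assumptions.

```python
import numpy as np

MAX_PEAKS = 100  # array size limit per event, as stated in the text


def to_fixed_length(peaks, size=MAX_PEAKS):
    """Return a length-`size` array for one event.

    Assumptions: extra values beyond `size` are dropped, shorter
    results are zero-padded, and an event without peaks (None or
    empty) yields an all-zero array so every event is still written.
    """
    out = np.zeros(size, dtype=np.float64)
    if peaks is None:
        return out
    peaks = np.asarray(peaks, dtype=np.float64).ravel()
    n = min(peaks.size, size)
    out[:n] = peaks[:n]
    return out


print(to_fixed_length([1.5, 2.5, 3.5])[:5])  # three peaks, then zero padding
print(to_fixed_length(None).any())           # no peaks: all zeros
```

Keeping the same shape for every event means the writer can treat the output as a regular 2D dataset (events x 100) without per-event bookkeeping.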
Spectrum analysis
We do not have fex-only data to perform a scaling test on this experiment. The plots below just show the differences between Alg. 1 and Alg. 2 (Gaussian Fitting) for spectrum analysis.
Total performance on reading, detector interface, and Alg. 1
Comparison between detector interface and Alg. 1
Comparison between detector interface and Alg. 2