Software version: ps-4.3.2 (Non-blocking MPIs)

Cluster: srcf

MPI Version: openmpi 4.1.0

Data size: 6.6 GB for one set of smd and bigdata files. Note that the original data are from tmoc00018 run 463, which has 5 sets of smd and bigdata files (1 timing and 4 hsd). One of the hsd streams was duplicated to generate up to 60 sets of smd and bigdata files.

Data location: /cds/data/drpsrcf/users/monarin/tmoc00118/xtc

MPI Settings: Rank 0 (Smd0) runs on its own node with a maximum of 32 threads. The other ranks are spread out with a maximum of 50 ranks per node. Note that the thread count (32) and the spreading factor (50) were chosen based on empirical runs.
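The rank placement described above can be expressed with standard OpenMPI MPMD syntax. This is a hypothetical launch sketch, not the actual command used: the script name, host name, and total rank count are placeholders.

```shell
# Hypothetical sketch: rank 0 (Smd0) isolated on its own node, remaining
# ranks packed at up to 50 per node. OMP_NUM_THREADS caps Smd0 at 32 threads.
export OMP_NUM_THREADS=32
mpirun -n 1 -host node001 python analysis.py : \
       -n 2176 --map-by ppr:50:node python analysis.py
```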

Note on the performance shown below: these rates were obtained on a shared network, so the values were affected by the amount of other traffic during the testing session.

Reading performance

This test captures only the reading performance when running psana2 to stream data down to bigdata nodes. The plot below shows strong scaling on 4-60 streams of datasets using 18 to 2177 cores (2-45 nodes). The best observed performance when reading 60 streams is 500 kHz using 45 nodes.

[Figure removed: strong scaling of reading rate, 4-60 streams, 18-2177 cores]

Potential overheads

  • Reading bandwidth on a single node is ~12 GB/s.
  • MPI intra-node communication was not done over IB (InfiniBand).
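A back-of-envelope check of the bandwidth ceiling implied by the first overhead above. The 12 GB/s per-node bandwidth and 6.6 GB per stream set come from this page; the helper itself is only an illustrative lower-bound estimate, assuming reads are spread evenly across nodes.

```python
def min_read_time_s(n_streams, gb_per_stream, n_nodes, gb_per_s_per_node):
    """Lower bound on wall-clock read time: total data volume divided by
    the aggregate bandwidth of all reading nodes (best possible case)."""
    total_gb = n_streams * gb_per_stream
    aggregate_bw = n_nodes * gb_per_s_per_node
    return total_gb / aggregate_bw

# 60 streams x 6.6 GB read by 45 nodes at ~12 GB/s each (figures from this page):
t = min_read_time_s(60, 6.6, 45, 12.0)
```

Any measured read time can then be compared against this bound to see how far the shared network and MPI overheads push the run from the hardware limit.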

More info: Reading performance

Peak Finding performance

We obtained example code for running a peak-finding algorithm on the feature-extracted data. The experiment used for this test is different from the one used in the prior tests. This new dataset (exp='tmolv9418', run=175) contains data with peaks, which can be tested with Xiang's peak-finding algorithm (xiangli@slac.stanford.edu). We have around 30,000 events from 15 data streams, of which ~10,000 events have usable peaks.

In this test, we investigate the behavior of the algorithm as the no. of events increases. A scaling test is underway once we can obtain more events for running with a high no. of nodes. We collected the time spent as shown below.

[Figure removed: timing measurements]

The following plot shows weak scaling as both the no. of events and the no. of cores increase (200k to 25M events and 18 to 2177 cores).

[Figure removed: weak scaling, 200k-25M events, 18-2177 cores]
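Weak-scaling efficiency of the kind plotted above can be summarized as the per-core throughput at scale relative to the per-core throughput of the smallest run. A minimal helper; the rates in the example call are hypothetical placeholders, not measured values from this page.

```python
def weak_scaling_efficiency(rate_small, cores_small, rate_large, cores_large):
    """Ratio of per-core rate at scale to per-core rate at the baseline.
    1.0 means perfect weak scaling; lower values indicate overheads grow
    with the core count."""
    return (rate_large / cores_large) / (rate_small / cores_small)

# Hypothetical rates for illustration only:
eff = weak_scaling_efficiency(10.0, 18, 1000.0, 2177)
```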

Peak Finding and Data Writing

This test shows the combined performance of data streaming, peak finding, and data writing. Data writing is done by limiting the size of the peak array to 100 per event, and all events get written out (a zeroed array when there is no peak). The no. of Srv cores was set to increase in the same way as the no. of Eb cores.

[Figure removed: peak finding and data writing performance]
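The fixed-size writing scheme described above (cap each event's peak array at 100 entries, zero-fill when there are no peaks) can be sketched as follows. The function name is illustrative, not psana2 API.

```python
import numpy as np

MAX_PEAKS = 100  # per-event cap described in the text

def to_fixed_array(peaks, max_peaks=MAX_PEAKS):
    """Pad or truncate a per-event peak list to a fixed-length float array,
    so every event writes a payload of the same size (all zeros when the
    event has no peaks)."""
    out = np.zeros(max_peaks, dtype=np.float64)
    if peaks is not None and len(peaks):
        n = min(len(peaks), max_peaks)
        out[:n] = np.asarray(peaks[:n], dtype=np.float64)
    return out
```

Writing a constant-size array per event keeps the output layout regular, which simplifies parallel writes at the cost of storing zeros for peak-free events.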

Spectrum analysis 

We do not have fex-only data to perform a scaling test on this experiment. The plots below show the differences between Alg. 1 and Alg. 2 (Gaussian fitting) for spectrum analysis.
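As a rough sketch of the Gaussian-fitting step in Alg. 2, one numpy-only approach is Caruana's method: fit a parabola to the log of the counts and read off the amplitude, mean, and width. This is an illustrative stand-in, not the actual algorithm used in the tests.

```python
import numpy as np

def fit_gaussian(x, y):
    """Estimate (amplitude, mean, sigma) of a Gaussian-shaped spectrum by
    fitting a parabola to log(y) (Caruana's method). Assumes y > 0.
    log y = b*x^2 + a*x + c  =>  sigma^2 = -1/(2b), mean = -a/(2b)."""
    b, a, c = np.polyfit(x, np.log(y), 2)
    sigma = np.sqrt(-1.0 / (2.0 * b))
    mean = -a / (2.0 * b)
    amp = np.exp(c - a**2 / (4.0 * b))
    return amp, mean, sigma
```

This works well on clean, background-free peaks; a least-squares fit in linear space is more robust when the tails are noisy, since the log transform amplifies noise in low-count bins.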

Total performance on reading, detector interface, and Alg. 1.

[Figure removed]

Comparison between detector interface and Alg.1

[Figure removed]

Comparison between detector interface and Alg.2

[Figure removed]

This section summarizes the performance of psana2 MPI parallelization for different tasks.