
We explored whether overlapping the disk-reading task with the computation task (chunk building) on Smd0 can speed up the processing rate.
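A minimal sketch of what such an overlap could look like. It assumes a background reader thread that feeds a bounded queue, which is classic double buffering. The names read_blocks, process_sync and process_overlapped are hypothetical and are not part of psana2 or dev_smd0.py. The build argument stands in for Smd0's chunk building. In CPython, file reads release the GIL during the read system call, so the next block can be fetched while the calling thread builds the current chunk.

```python
import queue
import threading
from typing import Callable, Iterable, Iterator, List

_DONE = object()  # sentinel: the reader has no more blocks


def read_blocks(path: str, block_size: int = 64 << 20) -> Iterator[bytes]:
    """Hypothetical reader: yield a file in fixed-size blocks (64 MiB by default)."""
    with open(path, "rb") as f:
        while True:
            buf = f.read(block_size)
            if not buf:
                return
            yield buf


def process_sync(blocks: Iterable[bytes],
                 build: Callable[[bytes], object]) -> List[object]:
    """Baseline: read one block, build from it, then read the next (no overlap)."""
    return [build(b) for b in blocks]


def process_overlapped(blocks: Iterable[bytes],
                       build: Callable[[bytes], object],
                       depth: int = 2) -> List[object]:
    """Read on a background thread while the calling thread builds chunks.

    depth caps how many blocks may be read ahead (2 = double buffering).
    """
    q: "queue.Queue[object]" = queue.Queue(maxsize=depth)
    errors: List[BaseException] = []

    def reader() -> None:
        try:
            for b in blocks:
                q.put(b)  # waits while the read-ahead buffer is full
        except BaseException as exc:  # forward reader errors to the caller
            errors.append(exc)
        finally:
            q.put(_DONE)

    t = threading.Thread(target=reader, daemon=True)
    t.start()
    results: List[object] = []
    while True:
        item = q.get()
        if item is _DONE:
            break
        results.append(build(item))  # runs while the reader fetches the next block
    t.join()
    if errors:
        raise errors[0]
    return results
```

For example, process_overlapped(read_blocks(path), build) reads the file at path in 64 MiB blocks while build runs on each one. Here build would be replaced by the real chunk-building function. The bounded queue keeps memory use fixed at roughly depth blocks. The overlap can only hide whichever of reading or building is shorter.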

The output below shows the performance of Smd0 (with no eventbuilder cores connected) when reading and building chunks synchronously. The argument to dev_smd0.py is the number of smd files.

(ps-4.3.2) monarin@drp-srcf-eb003 (master *) psana2 $ python dev_smd0.py 1
found EndRun
total search time: 0.041627854108810425
#Smdfiles: 1 #Events: 10000403 Elapsed Time (s): 0.31 Rate (MHz): 32.35 Bandwidth(GB/s):2.4585713863431047
(ps-4.3.2) monarin@drp-srcf-eb003 (master *) psana2 $ python dev_smd0.py 2
found EndRun
total search time: 0.07852859795093536
#Smdfiles: 2 #Events: 10000403 Elapsed Time (s): 0.59 Rate (MHz): 16.95 Bandwidth(GB/s):2.5766052518112645
(ps-4.3.2) monarin@drp-srcf-eb003 (master *) psana2 $ python dev_smd0.py 4
found EndRun
total search time: 0.1361357718706131
#Smdfiles: 4 #Events: 10000403 Elapsed Time (s): 0.81 Rate (MHz): 12.36 Bandwidth(GB/s):3.757219106209765
(ps-4.3.2) monarin@drp-srcf-eb003 (master *) psana2 $ python dev_smd0.py 8
found EndRun
total search time: 0.2327030450105667
#Smdfiles: 8 #Events: 10000403 Elapsed Time (s): 1.20 Rate (MHz): 8.36 Bandwidth(GB/s):5.085212222769722
(ps-4.3.2) monarin@drp-srcf-eb003 (master *) psana2 $ python dev_smd0.py 16
found EndRun
total search time: 0.24143944680690765
#Smdfiles: 16 #Events: 10000403 Elapsed Time (s): 1.97 Rate (MHz): 5.07 Bandwidth(GB/s):6.160083997216601
(ps-4.3.2) monarin@drp-srcf-eb003 (master *) psana2 $ python dev_smd0.py 32
found EndRun
total search time: 0.27678677439689636
#Smdfiles: 32 #Events: 10000403 Elapsed Time (s): 3.38 Rate (MHz): 2.96 Bandwidth(GB/s):7.200784223247125
(ps-4.3.2) monarin@drp-srcf-eb003 (master *) psana2 $ python dev_smd0.py 52
found EndRun
total search time: 0.3047977387905121
#Smdfiles: 52 #Events: 10000403 Elapsed Time (s): 4.81 Rate (MHz): 2.08 Bandwidth(GB/s):8.223357074851291


We compare these results with a script that performs only disk reading (no computation). Its argument is likewise the number of files.

(ps-4.3.2) monarin@drp-srcf-eb003 (master *) parallel $ python tst.py 1
#Files: 1 Elapsed(s): 0.26s. Bandwidth(GB/s):2.97
(ps-4.3.2) monarin@drp-srcf-eb003 (master *) parallel $ python tst.py 2
#Files: 2 Elapsed(s): 0.32s. Bandwidth(GB/s):4.70
(ps-4.3.2) monarin@drp-srcf-eb003 (master *) parallel $ python tst.py 4
#Files: 4 Elapsed(s): 0.40s. Bandwidth(GB/s):7.53
(ps-4.3.2) monarin@drp-srcf-eb003 (master *) parallel $ python tst.py 8
#Files: 8 Elapsed(s): 0.62s. Bandwidth(GB/s):9.88
(ps-4.3.2) monarin@drp-srcf-eb003 (master *) parallel $ python tst.py 16
#Files: 16 Elapsed(s): 1.32s. Bandwidth(GB/s):9.20
(ps-4.3.2) monarin@drp-srcf-eb003 (master *) parallel $ python tst.py 32
#Files: 32 Elapsed(s): 2.36s. Bandwidth(GB/s):10.29
(ps-4.3.2) monarin@drp-srcf-eb003 (master *) parallel $ python tst.py 52
#Files: 52 Elapsed(s): 3.64s. Bandwidth(GB/s):10.86
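A read-only benchmark of this kind can be sketched as follows. This is an assumption-laden illustration and not the actual tst.py / test_multitasking_threads.py. It assumes one thread per file, each reading sequentially in 16 MiB blocks into a reused buffer. Bandwidth is taken as total bytes divided by wall-clock time. The function names and block size are hypothetical.

```python
import sys
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

BLOCK_SIZE = 16 << 20  # 16 MiB per read call (assumed; tune for the filesystem)


def read_whole_file(path: str, block_size: int = BLOCK_SIZE) -> int:
    """Read one file sequentially into a reused buffer; return bytes read."""
    total = 0
    with open(path, "rb", buffering=0) as f:  # unbuffered: each readinto is one read()
        view = memoryview(bytearray(block_size))
        while True:
            n = f.readinto(view)
            if not n:  # 0 at end of file
                break
            total += n
    return total


def read_files_parallel(paths: List[str]) -> Tuple[int, float]:
    """Read all files concurrently, one thread per file; return (bytes, seconds)."""
    t0 = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max(1, len(paths))) as pool:
        nbytes = sum(pool.map(read_whole_file, paths))
    return nbytes, time.perf_counter() - t0


if __name__ == "__main__":
    files = sys.argv[1:]  # hypothetical CLI: pass the file paths directly
    nbytes, elapsed = read_files_parallel(files)
    print(f"#Files: {len(files)} Elapsed(s): {elapsed:.2f}s. "
          f"Bandwidth(GB/s):{nbytes / elapsed / 1e9:.2f}")
```

Threads are sufficient here because the read system calls release the GIL. If the files are already in the page cache, such a run measures memory rather than disk bandwidth.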

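Note that eliminating the computation task does not give a big improvement at high file counts. For 52 files, the elapsed time drops from 4.81 s to 3.64 s (8.22 GB/s vs 10.86 GB/s). The read-only bandwidth also levels off at roughly 9-11 GB/s from 8 files onward. This is an indication that we are already I/O limited in this case.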
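The same reasoning can be written as a quick upper-bound calculation. It assumes the synchronous Smd0 time is simply read time plus compute time back to back. It also assumes the read-only run approximates the read portion, which is only approximate because the two scripts are different programs. Under those assumptions, compute time is t_sync - t_read, and a perfect overlap finishes no sooner than max(t_read, t_compute). The numbers are the elapsed times from the two runs above. The function name overlap_bound is illustrative.

```python
# Elapsed times (s) from the two runs above: number of files -> seconds.
SYNC = {1: 0.31, 2: 0.59, 4: 0.81, 8: 1.20, 16: 1.97, 32: 3.38, 52: 4.81}
READ_ONLY = {1: 0.26, 2: 0.32, 4: 0.40, 8: 0.62, 16: 1.32, 32: 2.36, 52: 3.64}


def overlap_bound(t_sync: float, t_read: float) -> float:
    """Best-case speedup from perfectly overlapping read and compute.

    Assumes t_sync = t_read + t_compute (no overlap in the synchronous run),
    so a perfectly overlapped run takes max(t_read, t_compute).
    """
    t_compute = max(0.0, t_sync - t_read)
    return t_sync / max(t_read, t_compute)


if __name__ == "__main__":
    for n in SYNC:
        t_compute = SYNC[n] - READ_ONLY[n]
        print(f"{n:3d} files: compute ~{t_compute:.2f} s, "
              f"best-case speedup ~{overlap_bound(SYNC[n], READ_ONLY[n]):.2f}x")
```

Because max(a, b) is at least (a + b) / 2, this bound can never exceed 2x. At 52 files it is about 1.32x, since reading dominates.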

Location of test scripts:

https://github.com/monarin/psana-nersc/blob/master/psana2/dev_smd0.py

https://github.com/monarin/divelite/blob/master/cython/parallel/test_multitasking_threads.py (formerly tst.py)


