Overlapping I/O with computation

We explored whether allowing the disk-reading task to overlap with the computation task on Smd0 (chunk building) can speed up the processing rate.
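The idea of overlapping reading with chunk building can be sketched with a reader thread feeding a bounded queue. This is a minimal illustration, not the psana2 implementation: `read_chunks`, `build_chunk`, and `process_overlapped` are hypothetical names, and `build_chunk` is only a stand-in for the real chunk-building work.

```python
import io
import queue
import threading


def read_chunks(stream, chunk_size, out_queue):
    """Producer: read fixed-size chunks and pass them to the consumer."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        out_queue.put(chunk)  # blocks when the queue is full
    out_queue.put(None)  # sentinel: no more data


def build_chunk(chunk):
    """Hypothetical stand-in for chunk building; here it only counts bytes."""
    return len(chunk)


def process_overlapped(stream, chunk_size=1 << 20, depth=2):
    """Consume chunks while the reader thread fetches the next ones."""
    q = queue.Queue(maxsize=depth)  # depth=2 behaves like double buffering
    reader = threading.Thread(target=read_chunks, args=(stream, chunk_size, q))
    reader.start()
    total = 0
    while True:
        chunk = q.get()
        if chunk is None:
            break
        total += build_chunk(chunk)
    reader.join()
    return total


if __name__ == "__main__":
    # In-memory data is used so the sketch runs without any input file.
    data = io.BytesIO(b"x" * (5 * (1 << 20) + 123))
    print(process_overlapped(data))
```

The bounded queue keeps the reader at most `depth` chunks ahead of the consumer, so memory use stays fixed. Overlap only pays off if the computation takes a significant share of the total time.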

The output below shows the performance of Smd0 (no eventbuilder cores connected) when it reads and builds chunks synchronously. The argument to dev_smd0.py is the number of smd files.

(ps-4.3.2) monarin@drp-srcf-eb003 (master *) psana2 $ python dev_smd0.py 1
found EndRun
total search time: 0.041627854108810425
#Smdfiles: 1 #Events: 10000403 Elapsed Time (s): 0.31 Rate (MHz): 32.35 Bandwidth(GB/s):2.4585713863431047
(ps-4.3.2) monarin@drp-srcf-eb003 (master *) psana2 $ python dev_smd0.py 2
found EndRun
total search time: 0.07852859795093536
#Smdfiles: 2 #Events: 10000403 Elapsed Time (s): 0.59 Rate (MHz): 16.95 Bandwidth(GB/s):2.5766052518112645
(ps-4.3.2) monarin@drp-srcf-eb003 (master *) psana2 $ python dev_smd0.py 4
found EndRun
total search time: 0.1361357718706131
#Smdfiles: 4 #Events: 10000403 Elapsed Time (s): 0.81 Rate (MHz): 12.36 Bandwidth(GB/s):3.757219106209765
(ps-4.3.2) monarin@drp-srcf-eb003 (master *) psana2 $ python dev_smd0.py 8
found EndRun
total search time: 0.2327030450105667
#Smdfiles: 8 #Events: 10000403 Elapsed Time (s): 1.20 Rate (MHz): 8.36 Bandwidth(GB/s):5.085212222769722
(ps-4.3.2) monarin@drp-srcf-eb003 (master *) psana2 $ python dev_smd0.py 16
found EndRun
total search time: 0.24143944680690765
#Smdfiles: 16 #Events: 10000403 Elapsed Time (s): 1.97 Rate (MHz): 5.07 Bandwidth(GB/s):6.160083997216601
(ps-4.3.2) monarin@drp-srcf-eb003 (master *) psana2 $ python dev_smd0.py 32
found EndRun
total search time: 0.27678677439689636
#Smdfiles: 32 #Events: 10000403 Elapsed Time (s): 3.38 Rate (MHz): 2.96 Bandwidth(GB/s):7.200784223247125
(ps-4.3.2) monarin@drp-srcf-eb003 (master *) psana2 $ python dev_smd0.py 52
found EndRun
total search time: 0.3047977387905121
#Smdfiles: 52 #Events: 10000403 Elapsed Time (s): 4.81 Rate (MHz): 2.08 Bandwidth(GB/s):8.223357074851291

We compare these results with a script that only performs disk reading (no computation).

#Files: 1 Elapsed(s): 0.26s. Bandwidth(GB/s):2.97
(ps-4.3.2) monarin@drp-srcf-eb003 (master *) parallel $ python tst.py 2
#Files: 2 Elapsed(s): 0.32s. Bandwidth(GB/s):4.70
(ps-4.3.2) monarin@drp-srcf-eb003 (master *) parallel $ python tst.py 4
#Files: 4 Elapsed(s): 0.40s. Bandwidth(GB/s):7.53
(ps-4.3.2) monarin@drp-srcf-eb003 (master *) parallel $ python tst.py 8
#Files: 8 Elapsed(s): 0.62s. Bandwidth(GB/s):9.88
(ps-4.3.2) monarin@drp-srcf-eb003 (master *) parallel $ python tst.py 16
#Files: 16 Elapsed(s): 1.32s. Bandwidth(GB/s):9.20
(ps-4.3.2) monarin@drp-srcf-eb003 (master *) parallel $ python tst.py 32
#Files: 32 Elapsed(s): 2.36s. Bandwidth(GB/s):10.29
(ps-4.3.2) monarin@drp-srcf-eb003 (master *) parallel $ python tst.py 52
#Files: 52 Elapsed(s): 3.64s. Bandwidth(GB/s):10.86
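A read-only measurement of the kind shown above can be sketched as follows. This is not the linked test script: it assumes one thread per file, and `read_file` and `measure_read_bandwidth` are hypothetical names. The demo writes small temporary files so that it runs anywhere.

```python
import os
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor


def read_file(path, block_size=1 << 24):
    """Read a file to the end, discarding the data; return the bytes read."""
    n = 0
    with open(path, "rb", buffering=0) as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            n += len(block)
    return n


def measure_read_bandwidth(paths):
    """Read all files concurrently; return (total bytes, elapsed s, GB/s)."""
    t0 = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max(1, len(paths))) as pool:
        total = sum(pool.map(read_file, paths))
    elapsed = time.perf_counter() - t0
    gbps = total / elapsed / 1e9 if elapsed > 0 else float("inf")
    return total, elapsed, gbps


if __name__ == "__main__":
    # Demo on small temporary files; real tests would point at the smd files.
    with tempfile.TemporaryDirectory() as tmpdir:
        paths = []
        for i in range(4):
            path = os.path.join(tmpdir, f"file{i}.bin")
            with open(path, "wb") as f:
                f.write(os.urandom(1 << 20))
            paths.append(path)
        total, elapsed, gbps = measure_read_bandwidth(paths)
        print(f"#Files: {len(paths)} Elapsed(s): {elapsed:.2f}s. "
              f"Bandwidth(GB/s):{gbps:.2f}")
```

Files that were read recently may be served from the operating system page cache, so repeated runs can report a higher bandwidth than the disk itself delivers.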

Note that eliminating the computation task does not bring a big improvement (with 52 files, the elapsed time drops from 4.81 s to 3.64 s), which indicates that we are already I/O limited in this case.
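To put numbers on this comparison, the elapsed times from the two sets of runs above can be tabulated side by side. The sketch below assumes both scripts read the same data, which is consistent with bandwidth times elapsed time giving nearly the same volume in each pair of runs. It also treats the read-only time as a lower bound on what perfect overlap could reach. The function names are illustrative.

```python
# Number of files -> elapsed time in seconds, copied from the runs above.
smd0_elapsed = {1: 0.31, 2: 0.59, 4: 0.81, 8: 1.20, 16: 1.97, 32: 3.38, 52: 4.81}
read_only_elapsed = {1: 0.26, 2: 0.32, 4: 0.40, 8: 0.62, 16: 1.32, 32: 2.36, 52: 3.64}


def io_fraction(n_files):
    """Read-only elapsed time as a fraction of the Smd0 elapsed time."""
    return read_only_elapsed[n_files] / smd0_elapsed[n_files]


def max_overlap_speedup(n_files):
    """Upper bound on the speedup if computation were fully hidden behind I/O."""
    return smd0_elapsed[n_files] / read_only_elapsed[n_files]


if __name__ == "__main__":
    for n in sorted(smd0_elapsed):
        print(f"{n:>3} files: I/O fraction {io_fraction(n):.2f}, "
              f"max speedup {max_overlap_speedup(n):.2f}x")
```

For the 52-file run, reading alone accounts for about 0.76 of the Smd0 elapsed time, so even perfect overlap could give at most about a 1.32x speedup there.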

Location of test scripts:

https://github.com/monarin/psana-nersc/blob/master/psana2/dev_smd0.py

https://github.com/monarin/divelite/blob/master/cython/parallel/test_multitasking_threads.py (formerly tst.py)
