We tested whether letting the disk-reading task overlap with the computation task (chunk building) on Smd0 speeds up the processing rate.
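The idea of overlapping reads with computation can be illustrated with a minimal Python sketch. This is not the Smd0 or test_multitasking_threads.py implementation: the function names, the queue depth, and the CRC used as a stand-in for chunk building are all assumptions made for illustration. A reader thread fetches the next piece of the file while the main thread processes the current one.

```python
import queue
import threading
import zlib


def read_chunks(path, read_size, out_q):
    """Producer: read the file in read_size pieces and hand each one over."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(read_size)
            if not chunk:
                break
            out_q.put(chunk)  # blocks while the queue is full
    out_q.put(None)  # sentinel: no more data


def process_overlapped(path, read_size=1 << 20, depth=2):
    """Consumer: process one chunk while the reader thread fetches the next.

    depth bounds how many chunks may be waiting in memory at once
    (hypothetical parameter, not a psana setting).
    """
    q = queue.Queue(maxsize=depth)
    reader = threading.Thread(target=read_chunks, args=(path, read_size, q))
    reader.start()
    n_bytes = 0
    crc = 0
    while True:
        chunk = q.get()
        if chunk is None:
            break
        # Stand-in for chunk building: a running CRC over the data.
        crc = zlib.crc32(chunk, crc)
        n_bytes += len(chunk)
    reader.join()
    return n_bytes, crc
```

Calling process_overlapped(path) returns the number of bytes processed and the checksum. Overlap only helps when reading and computing take comparable time; if one of them dominates, the total time stays close to that of the slower task.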

Test setup

Location of test data:

smd_dir = '/cds/data/drpsrcf/users/monarin/xtcdata/10M60n/xtcdata/smalldata'

Parameters for dev_smd0.py

os.environ['PS_SMD_MAX_RETRIES'] = '0'

os.environ['PS_SMD_N_EVENTS'] = '10000'

os.environ['PS_SMD_CHUNKSIZE'] = '16777216'

os.environ['PS_SMD0_NUM_THREADS'] = '32'

Parameters for test_multitasking_threads.py

cdef uint64_t buf_size = 100000000

cdef uint64_t read_size = 5000000

Location of test scripts:

https://github.com/monarin/psana-nersc/blob/master/psana2/dev_smd0.py

https://github.com/monarin/divelite/blob/master/cython/parallel/test_multitasking_threads.py (formerly tst.py)

Smd0 performance w/o overlapping I/O

The output below shows the performance of Smd0 (no eventbuilder cores connected) when it reads and builds chunks synchronously.

(ps-4.3.2) monarin@drp-srcf-eb003 (master *) psana2 $ python dev_smd0.py 1
found EndRun
total search time: 0.041627854108810425
#Smdfiles: 1 #Events: 10000403 Elapsed Time (s): 0.31 Rate (MHz): 32.35 Bandwidth(GB/s):2.4585713863431047
(ps-4.3.2) monarin@drp-srcf-eb003 (master *) psana2 $ python dev_smd0.py 2
found EndRun
total search time: 0.07852859795093536
#Smdfiles: 2 #Events: 10000403 Elapsed Time (s): 0.59 Rate (MHz): 16.95 Bandwidth(GB/s):2.5766052518112645
(ps-4.3.2) monarin@drp-srcf-eb003 (master *) psana2 $ python dev_smd0.py 4
found EndRun
total search time: 0.1361357718706131
#Smdfiles: 4 #Events: 10000403 Elapsed Time (s): 0.81 Rate (MHz): 12.36 Bandwidth(GB/s):3.757219106209765
(ps-4.3.2) monarin@drp-srcf-eb003 (master *) psana2 $ python dev_smd0.py 8
found EndRun
total search time: 0.2327030450105667
#Smdfiles: 8 #Events: 10000403 Elapsed Time (s): 1.20 Rate (MHz): 8.36 Bandwidth(GB/s):5.085212222769722
(ps-4.3.2) monarin@drp-srcf-eb003 (master *) psana2 $ python dev_smd0.py 16
found EndRun
total search time: 0.24143944680690765
#Smdfiles: 16 #Events: 10000403 Elapsed Time (s): 1.97 Rate (MHz): 5.07 Bandwidth(GB/s):6.160083997216601
(ps-4.3.2) monarin@drp-srcf-eb003 (master *) psana2 $ python dev_smd0.py 32
found EndRun
total search time: 0.27678677439689636
#Smdfiles: 32 #Events: 10000403 Elapsed Time (s): 3.38 Rate (MHz): 2.96 Bandwidth(GB/s):7.200784223247125
(ps-4.3.2) monarin@drp-srcf-eb003 (master *) psana2 $ python dev_smd0.py 52
found EndRun
total search time: 0.3047977387905121
#Smdfiles: 52 #Events: 10000403 Elapsed Time (s): 4.81 Rate (MHz): 2.08 Bandwidth(GB/s):8.223357074851291
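To compare runs side by side, the summary lines above can be parsed into records. The sketch below is a helper written for this page, not part of dev_smd0.py; the regular expression assumes the exact summary-line format shown in the output above, and the field names are hypothetical.

```python
import re

# Matches summary lines of the form printed in the logs above.
SUMMARY_RE = re.compile(
    r"#Smdfiles:\s*(?P<n_files>\d+)\s+"
    r"#Events:\s*(?P<n_events>\d+)\s+"
    r"Elapsed Time \(s\):\s*(?P<elapsed>[\d.]+)\s+"
    r"Rate \(MHz\):\s*(?P<rate>[\d.]+)\s+"
    r"Bandwidth\(GB/s\):\s*(?P<bw>[\d.]+)"
)


def parse_summary(line):
    """Return a dict of the numbers in one summary line, or None if no match."""
    m = SUMMARY_RE.search(line)
    if m is None:
        return None
    return {
        "n_files": int(m["n_files"]),
        "n_events": int(m["n_events"]),
        "elapsed_s": float(m["elapsed"]),
        "rate_mhz": float(m["rate"]),
        "bandwidth_gbs": float(m["bw"]),
    }


def parse_log(text):
    """Collect one record per summary line found in a captured log."""
    records = []
    for line in text.splitlines():
        rec = parse_summary(line)
        if rec is not None:
            records.append(rec)
    return records
```

Passing the captured terminal output to parse_log gives one record per run, which can then be sorted by n_files or plotted. Lines that are not summary lines, such as "found EndRun", are skipped.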

I/O Performance w/o computation

We compare these results with a script that only performs disk reading (no computation).

...

Note that eliminating the computation task does not give a big improvement, which indicates that we are already I/O limited in this case.