
  • We can limit ourselves to 8000 hsd samples (~1us, for both ions and electrons)
    • development is needed to make the slow ions fit in the 1us window
  • At 1MHz, 8000 samples is 16GB/s, which is too much for one drp node with a 4-6GB/s limit (especially with 2 hsd's per node)
    • fex can reduce this by at least a factor of 8 (20 non-contiguous areas of 50 samples each, changing per event), which brings it within 4GB/s per drp node (Taran confirmed that we can do this with the 1us window; the factor of 8 applies to both electrons and ions)
    • at 100kHz, 8000 samples would work from a data-volume perspective
  • For the FZP, the 2048 samples from the piranha contain one contiguous "blob" of ~200 pixels that should be used for the outer product; its position changes on a per-event basis.
    • find the highest pixel and take a fixed-size window around it
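
The data-rate figures and the fixed-size window in the list above can be sketched as follows. This is a minimal illustration, not the production fex code: the 2 bytes per sample is an assumption (it is what makes 8000 samples at 1MHz come out to 16GB/s), and the function names and the clamping of the window at the detector edges are hypothetical choices.

```python
import numpy as np

# Assumption: 16-bit hsd samples, consistent with 16 GB/s at 1 MHz.
BYTES_PER_SAMPLE = 2


def hsd_rate_gb_per_s(n_samples, event_rate_hz, fex_reduction=1):
    """Data rate of one hsd in GB/s (1 GB = 1e9 bytes)."""
    return n_samples * BYTES_PER_SAMPLE * event_rate_hz / fex_reduction / 1e9


def peak_window(pixels, width=200):
    """Return (start, stop) of a fixed-size window centred on the highest pixel.

    Near the detector edges the window is shifted inward so that it
    always contains exactly `width` pixels.
    """
    peak = int(np.argmax(pixels))
    start = min(max(peak - width // 2, 0), len(pixels) - width)
    return start, start + width


raw_rate = hsd_rate_gb_per_s(8000, 1_000_000)                   # 16 GB/s per hsd
fex_rate = hsd_rate_gb_per_s(8000, 1_000_000, fex_reduction=8)  # 2 GB/s per hsd
node_rate = 2 * fex_rate                                        # 2 hsd's per drp node

# Hypothetical piranha line with its brightest pixel at index 1000.
line = np.zeros(2048)
line[1000] = 5.0
start, stop = peak_window(line)
blob = line[start:stop]
```

With two hsd's per node, the factor-of-8 reduction gives 4GB/s per drp node, which matches the figure quoted above. The 20 areas of 50 samples each are 1000 samples, which is where the factor of 8 comes from.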

We want these 6 outer-products:

  • electron-electron hsd-hsd outer product (symmetric, same hsd, can save a factor of 2)
  • ion-ion hsd-hsd outer product (symmetric, same hsd, can save a factor of 2)
  • electron-ion hsd-hsd outer product
  • (most important) electron-fzp outer product (fzp is piranha: 2048)
  • ion-fzp outer product (fzp is piranha: 2048)
  • (most important) fzp-fzp outer product (symmetric, can save a factor of 2)
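
As a rough sketch of the six products listed above, the snippet below builds them with `numpy.outer` on small stand-in arrays and shows how a symmetric product can be stored as its upper triangle only. The array names, the reduced sizes, and the two packing helpers are hypothetical and chosen for illustration; the real inputs are 8000 hsd samples and 2048 fzp pixels.

```python
import numpy as np

# Small stand-ins for the real sizes (8000 hsd samples, 2048 fzp pixels).
N_HSD, N_FZP = 80, 20

rng = np.random.default_rng(0)
electron = rng.random(N_HSD)  # hypothetical electron hsd waveform
ion = rng.random(N_HSD)       # hypothetical ion hsd waveform
fzp = rng.random(N_FZP)       # hypothetical piranha (fzp) line

products = {
    "electron-electron": np.outer(electron, electron),  # symmetric
    "ion-ion": np.outer(ion, ion),                      # symmetric
    "electron-ion": np.outer(electron, ion),
    "electron-fzp": np.outer(electron, fzp),
    "ion-fzp": np.outer(ion, fzp),
    "fzp-fzp": np.outer(fzp, fzp),                      # symmetric
}


def upper_triangle(matrix):
    """Flatten the upper triangle (diagonal included) of a square matrix."""
    return matrix[np.triu_indices(matrix.shape[0])]


def from_upper_triangle(values, n):
    """Rebuild the full symmetric n x n matrix from its packed upper triangle."""
    full = np.zeros((n, n), dtype=values.dtype)
    full[np.triu_indices(n)] = values
    # Mirror the strictly upper part into the lower triangle.
    return full + np.triu(full, k=1).T


packed = upper_triangle(products["fzp-fzp"])
restored = from_upper_triangle(packed, N_FZP)
```

The packed form holds n(n+1)/2 values instead of n², which is the "factor of 2" mentioned for the three symmetric products.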

Performance with Fex Data

We tested the 6 outer products outlined above on s3df, accumulating the results into the full-size matrices (3 of 8000 x 8000, 2 of 8000 x 2048, and 1 of 2048 x 2048). The performance per core is around 400 Hz. We scale this up to 1MHz with 2048 cores (18 milano nodes).
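
For a feel of the sizes involved, the sketch below computes the memory footprint of the six full-size accumulators and an idealized aggregate rate. The accumulator dtype is not stated on this page, so several candidates are evaluated; the linear-scaling formula is an assumption that ignores any cost of combining per-core results, and all names are hypothetical.

```python
import numpy as np

N_HSD, N_FZP = 8000, 2048

# 3 of 8000 x 8000, 2 of 8000 x 2048, and 1 of 2048 x 2048.
ACCUMULATOR_SHAPES = (
    [(N_HSD, N_HSD)] * 3 + [(N_HSD, N_FZP)] * 2 + [(N_FZP, N_FZP)]
)


def accumulator_elements():
    """Total number of matrix elements across the six accumulators."""
    return sum(rows * cols for rows, cols in ACCUMULATOR_SHAPES)


def accumulator_bytes(dtype):
    """Memory needed for one full set of accumulators of the given dtype."""
    return accumulator_elements() * np.dtype(dtype).itemsize


def aggregate_rate_hz(per_core_hz, n_cores):
    """Idealized aggregate rate, assuming perfect linear scaling."""
    return per_core_hz * n_cores


n_elements = accumulator_elements()  # 228,962,304 elements
sizes_gb = {
    name: accumulator_bytes(name) / 1e9
    for name in ("int16", "float32", "float64")
}
ideal_rate = aggregate_rate_hz(400, 2048)  # 819,200 Hz for exactly 400 Hz/core
per_core_for_1mhz = 1_000_000 / 2048       # about 488 Hz per core
```

Since the per-core figure is quoted as "around 400 Hz", the idealized estimate lands in the neighbourhood of 1MHz rather than exactly on it.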


The script for this performance test is test_fast_outer_filling.py; it was submitted with submit_s3df.sh.

Performance with reduced full data

Numpy Outer Products

(ps-4.5.26) monarin@drp-srcf-eb003 (master *) tmolw8819 👁)$ python test_np_outer.py 
a.shape=(2048,),b.shape=(59400,) dtype=<class 'numpy.int16'>
a,b min: 0.07 max: 0.09 avg: 0.08 rate:12.04Hz
b,b min: 2.03 max: 2.48 avg: 2.31 rate:0.43Hz
total min: 2.10 max: 2.57 avg: 2.40 rate:0.42Hz
(ps-4.5.26) monarin@drp-srcf-eb003 (master *) tmolw8819 👁)$ python test_np_outer.py 
a.shape=(2048,),b.shape=(59400,) dtype=<class 'numpy.float32'>
a,b min: 0.14 max: 0.17 avg: 0.16 rate:6.31Hz
b,b min: 3.88 max: 4.64 avg: 4.36 rate:0.23Hz
total min: 4.02 max: 4.82 avg: 4.52 rate:0.22Hz
(ps-4.5.26) monarin@drp-srcf-eb003 (master *) tmolw8819 👁)$ python test_np_outer.py 
a.shape=(2048,),b.shape=(59400,) dtype=<class 'numpy.float64'>
a,b min: 0.28 max: 0.33 avg: 0.31 rate:3.26Hz
b,b min: 7.45 max: 8.67 avg: 8.15 rate:0.12Hz
total min: 7.73 max: 9.00 avg: 8.46 rate:0.12Hz
(ps-4.5.26) monarin@drp-srcf-eb003 (master *) tmolw8819 👁)$ python test_np_outer.py 
a.shape=(2048,),b.shape=(8000,) dtype=<class 'numpy.int16'>
a,b min: 0.01 max: 0.01 avg: 0.01 rate:77.62Hz
b,b min: 0.04 max: 0.05 avg: 0.05 rate:19.64Hz
total min: 0.06 max: 0.07 avg: 0.06 rate:15.68Hz
(ps-4.5.26) monarin@drp-srcf-eb003 (master *) tmolw8819 👁)$ python test_np_outer.py 
a.shape=(2048,),b.shape=(8000,) dtype=<class 'numpy.float32'>
a,b min: 0.02 max: 0.03 avg: 0.03 rate:38.00Hz
b,b min: 0.09 max: 0.11 avg: 0.10 rate:9.87Hz
total min: 0.11 max: 0.14 avg: 0.13 rate:7.84Hz
(ps-4.5.26) monarin@drp-srcf-eb003 (master *) tmolw8819 👁)$ python test_np_outer.py 
a.shape=(2048,),b.shape=(8000,) dtype=<class 'numpy.float64'>
a,b min: 0.04 max: 0.05 avg: 0.05 rate:19.58Hz
b,b min: 0.17 max: 0.21 avg: 0.20 rate:5.00Hz
total min: 0.22 max: 0.27 avg: 0.25 rate:3.98Hz
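
The source of test_np_outer.py is not shown on this page. A minimal timing harness that would produce numbers of the same kind (min, max and average time in seconds, and rate as the inverse of the average) might look like the sketch below; the function names, the iteration count, and the random inputs are all assumptions.

```python
import time

import numpy as np


def time_outer(a, b, n_iter=5):
    """Time np.outer(a, b) n_iter times; return (min, max, avg, rate_hz)."""
    durations = []
    for _ in range(n_iter):
        t0 = time.perf_counter()
        np.outer(a, b)
        durations.append(time.perf_counter() - t0)
    avg = sum(durations) / len(durations)
    rate = 1.0 / avg if avg > 0 else float("inf")
    return min(durations), max(durations), avg, rate


def make_inputs(n_a, n_b, dtype, seed=0):
    """Random test vectors of the requested lengths and dtype."""
    rng = np.random.default_rng(seed)
    a = rng.integers(0, 100, n_a).astype(dtype)
    b = rng.integers(0, 100, n_b).astype(dtype)
    return a, b


def report(n_a, n_b, dtype, n_iter=5):
    """Print timings for the a,b and b,b outer products."""
    a, b = make_inputs(n_a, n_b, dtype)
    print(f"a.shape={a.shape},b.shape={b.shape} dtype={dtype}")
    for label, (x, y) in {"a,b": (a, b), "b,b": (b, b)}.items():
        lo, hi, avg, rate = time_outer(x, y, n_iter)
        print(f"{label} min: {lo:.2f} max: {hi:.2f} avg: {avg:.2f} rate:{rate:.2f}Hz")


# Small sizes so the sketch runs quickly; the runs above use
# a of length 2048 and b of length 8000 or 59400.
report(256, 1000, np.float32)
```

Note that `np.outer` on int16 inputs returns an int16 result, so the int16 timings above do not include any widening of the output.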
