- Can limit ourselves to 8000 hsd samples (~1us for both ions/electrons)
- will do development to make slow-ions fit in the 1us window
- At 1MHz 8000 samples is 16GB/s so too big for one drp node with 4-6GB/s limit (especially with 2 hsd's per node)
- can reduce with fex by at least a factor of 8 (20 non-contiguous areas of 50 samples each, changing per-event) to get within 4GB/s per drp node (Taran confirmed that we can do this with the 1us window, factor of 8 applies to both electrons and ions)
- at 100kHz 8000 samples would work from a data volume perspective
- For the FZP, the 2048 samples from the piranha contain one contiguous "blob" of ~200 pixels that should be used for the outer product. Its position changes on a per-event basis.
- to locate it, find the highest pixel and take a fixed-size window around it
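The data-volume arithmetic and the FZP window selection in the list above can be sketched as follows. This is only an illustration: the function names are hypothetical, 2 bytes per sample is inferred from the 16GB/s figure, and centering the window on the highest pixel (clipped at the detector edges) is an assumption about how the fixed-size window would be placed.

```python
import numpy as np

def raw_rate_gb_per_s(n_samples, event_rate_hz, bytes_per_sample=2):
    """Raw data rate of one hsd in GB/s (2 bytes/sample is inferred, not confirmed)."""
    return n_samples * bytes_per_sample * event_rate_hz / 1e9

def fex_reduction_factor(n_samples, n_areas, samples_per_area):
    """Reduction factor when only n_areas windows of samples_per_area samples are kept."""
    return n_samples / (n_areas * samples_per_area)

def peak_window(waveform, width):
    """Return (start, window) for a fixed-size window around the highest pixel.

    The window is centered on the peak where possible and shifted inward
    when the peak is too close to either edge (an assumed policy).
    """
    peak = int(np.argmax(waveform))
    start = peak - width // 2
    start = max(0, min(start, len(waveform) - width))
    return start, waveform[start:start + width]

if __name__ == "__main__":
    full = raw_rate_gb_per_s(8000, 1e6)          # full waveform at 1MHz
    factor = fex_reduction_factor(8000, 20, 50)  # 20 areas of 50 samples
    print(f"raw: {full:.1f} GB/s, fex factor: {factor:.0f}, "
          f"after fex: {full / factor:.1f} GB/s per hsd")

    fzp = np.zeros(2048)
    fzp[1000] = 5.0
    start, window = peak_window(fzp, 200)
    print(f"window starts at pixel {start}, length {len(window)}")
```

With two hsds per node, the per-hsd rate after fex doubles to the per-node figure, which is how the reduced rate compares against the drp node limit.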
We want these 6 outer products:
- electron-electron hsd-hsd outer product (symmetric, same hsd, can save a factor of 2)
- ion-ion hsd-hsd outer product (symmetric, same hsd, can save a factor of 2)
- electron-ion hsd-hsd outer product
- (most important) electron-fzp outer product (fzp is piranha: 2048)
- ion-fzp outer product (fzp is piranha: 2048)
- (most important) fzp-fzp outer product (symmetric, can save a factor of 2)
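As a minimal sketch of the six products listed above, the following uses `np.outer` on three per-event vectors. The function names and the dictionary keys are hypothetical. The "factor of 2" for the symmetric products is illustrated by computing only the upper triangle, which is one possible way to realize the saving and not necessarily the one used in practice.

```python
import numpy as np

def six_outer_products(electron, ion, fzp):
    """Compute the six per-event outer products (hypothetical helper)."""
    return {
        "electron-electron": np.outer(electron, electron),  # symmetric
        "ion-ion": np.outer(ion, ion),                      # symmetric
        "electron-ion": np.outer(electron, ion),
        "electron-fzp": np.outer(electron, fzp),
        "ion-fzp": np.outer(ion, fzp),
        "fzp-fzp": np.outer(fzp, fzp),                      # symmetric
    }

def symmetric_outer_packed(v):
    """Upper triangle (including diagonal) of outer(v, v) as a packed 1D array.

    Only n*(n+1)/2 products are computed instead of n*n.
    """
    rows, cols = np.triu_indices(len(v))
    return v[rows] * v[cols]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Small stand-in sizes; the real vectors would be 8000, 8000 and 2048 long.
    electron = rng.random(80).astype(np.float32)
    ion = rng.random(80).astype(np.float32)
    fzp = rng.random(20).astype(np.float32)
    for name, mat in six_outer_products(electron, ion, fzp).items():
        print(name, mat.shape)
    print("packed fzp-fzp length:", symmetric_outer_packed(fzp).size)
```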
Performance with Fex Data
We tested the 6 outer products outlined above on s3df, accumulating the results back into the full-size matrices (3 of 8000 x 8000, 2 of 8000 x 2048, and 1 of 2048 x 2048). The performance per core is around 400 Hz. We scale this up to 1MHz with 2048 cores (18 milano nodes).
The script for this performance test is test_fast_outer_filling.py; it was submitted with submit_s3df.sh.
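The idea of computing a small outer product on fex-reduced data and accumulating it back into a full-size matrix can be sketched as below. This is not the content of test_fast_outer_filling.py, which is not shown here. `accumulate_outer` and its arguments are hypothetical, and the sketch assumes the sample indices kept by fex are unique within each event.

```python
import numpy as np

def accumulate_outer(acc, idx_a, vals_a, idx_b, vals_b):
    """Add outer(vals_a, vals_b) into acc at rows idx_a and columns idx_b.

    idx_a and idx_b are the positions of the kept samples in the full
    waveforms. They must not contain duplicates: with fancy indexing,
    repeated indices would be written once instead of summed.
    """
    acc[np.ix_(idx_a, idx_b)] += np.outer(vals_a, vals_b)

if __name__ == "__main__":
    # Small stand-in for an 8000 x 2048 accumulator.
    acc = np.zeros((8, 6), dtype=np.float64)
    idx_a = np.array([1, 3])      # kept hsd sample positions
    vals_a = np.array([2.0, 3.0])
    idx_b = np.array([0, 5])      # kept fzp pixel positions
    vals_b = np.array([10.0, 100.0])
    accumulate_outer(acc, idx_a, vals_a, idx_b, vals_b)
    print(acc)
```

Only the rows and columns selected by fex are touched on each event, so the per-event cost depends on the reduced sizes and not on the full matrix size.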
Performance with reduced full data
Numpy Outer Products
```
(ps-4.5.26) monarin@drp-srcf-eb003 (master *) tmolw8819 👁)$ python test_np_outer.py
a.shape=(2048,),b.shape=(59400,) dtype=<class 'numpy.int16'>
a,b min: 0.07 max: 0.09 avg: 0.08 rate:12.04Hz
b,b min: 2.03 max: 2.48 avg: 2.31 rate:0.43Hz
total min: 2.10 max: 2.57 avg: 2.40 rate:0.42Hz

(ps-4.5.26) monarin@drp-srcf-eb003 (master *) tmolw8819 👁)$ python test_np_outer.py
a.shape=(2048,),b.shape=(59400,) dtype=<class 'numpy.float32'>
a,b min: 0.14 max: 0.17 avg: 0.16 rate:6.31Hz
b,b min: 3.88 max: 4.64 avg: 4.36 rate:0.23Hz
total min: 4.02 max: 4.82 avg: 4.52 rate:0.22Hz

(ps-4.5.26) monarin@drp-srcf-eb003 (master *) tmolw8819 👁)$ python test_np_outer.py
a.shape=(2048,),b.shape=(59400,) dtype=<class 'numpy.float64'>
a,b min: 0.28 max: 0.33 avg: 0.31 rate:3.26Hz
b,b min: 7.45 max: 8.67 avg: 8.15 rate:0.12Hz
total min: 7.73 max: 9.00 avg: 8.46 rate:0.12Hz

(ps-4.5.26) monarin@drp-srcf-eb003 (master *) tmolw8819 👁)$ python test_np_outer.py
a.shape=(2048,),b.shape=(8000,) dtype=<class 'numpy.int16'>
a,b min: 0.01 max: 0.01 avg: 0.01 rate:77.62Hz
b,b min: 0.04 max: 0.05 avg: 0.05 rate:19.64Hz
total min: 0.06 max: 0.07 avg: 0.06 rate:15.68Hz

(ps-4.5.26) monarin@drp-srcf-eb003 (master *) tmolw8819 👁)$ python test_np_outer.py
a.shape=(2048,),b.shape=(8000,) dtype=<class 'numpy.float32'>
a,b min: 0.02 max: 0.03 avg: 0.03 rate:38.00Hz
b,b min: 0.09 max: 0.11 avg: 0.10 rate:9.87Hz
total min: 0.11 max: 0.14 avg: 0.13 rate:7.84Hz

(ps-4.5.26) monarin@drp-srcf-eb003 (master *) tmolw8819 👁)$ python test_np_outer.py
a.shape=(2048,),b.shape=(8000,) dtype=<class 'numpy.float64'>
a,b min: 0.04 max: 0.05 avg: 0.05 rate:19.58Hz
b,b min: 0.17 max: 0.21 avg: 0.20 rate:5.00Hz
total min: 0.22 max: 0.27 avg: 0.25 rate:3.98Hz
```
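A timing loop of roughly the following shape could produce numbers like the ones above. This is a hedged sketch, not the actual test_np_outer.py, whose source is not shown on this page. `time_outer` and `outer_nbytes` are hypothetical helpers, and the sizes in the demo are kept small on purpose. The size helper also shows why the 59400-sample case is slow: the b,b result alone is about 7 GB at int16.

```python
import time
import numpy as np

def time_outer(a, b, repeats=5):
    """Time np.outer(a, b) several times; return (min, max, avg) in seconds."""
    times = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        np.outer(a, b)
        times.append(time.perf_counter() - t0)
    return min(times), max(times), sum(times) / len(times)

def outer_nbytes(n_a, n_b, dtype):
    """Size in bytes of an n_a x n_b outer-product result of the given dtype."""
    return n_a * n_b * np.dtype(dtype).itemsize

if __name__ == "__main__":
    for dtype in (np.int16, np.float32, np.float64):
        a = np.ones(256, dtype=dtype)   # stand-in for the 2048 fzp pixels
        b = np.ones(1000, dtype=dtype)  # stand-in for the hsd samples
        for label, (x, y) in (("a,b", (a, b)), ("b,b", (b, b))):
            tmin, tmax, tavg = time_outer(x, y)
            print(f"{np.dtype(dtype).name} {label} "
                  f"min: {tmin:.4f} max: {tmax:.4f} avg: {tavg:.4f} "
                  f"rate:{1.0 / tavg:.2f}Hz")
    gb = outer_nbytes(59400, 59400, np.int16) / 1e9
    print(f"b,b result for 59400 int16 samples: {gb:.2f} GB")
```

Note that `np.outer` on int16 inputs returns an int16 result, so products of large sample values can overflow. A wider dtype trades that risk for the slower rates seen above.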