...
- electron-electron hsd-hsd outer product (symmetric, same hsd, can save a factor of 2)
- ion-ion hsd-hsd outer product (symmetric, same hsd, can save a factor of 2)
- electron-ion hsd-hsd outer product
- (most important) electron-fzp outer product (fzp is piranha: 2048)
- ion-fzp outer product (fzp is piranha: 2048)
- (most important) fzp-fzp outer product (symmetric, can save a factor of 2)
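The "factor of 2" noted for the symmetric products can be illustrated with a minimal sketch that computes only the upper triangle (including the diagonal) of a vector's outer product with itself. The helper names `symmetric_outer_upper` and `unpack_symmetric` are hypothetical and not part of the tested scripts; the 200-sample `fzp` vector is random placeholder data.

```python
import numpy as np

def symmetric_outer_upper(v):
    """Return the upper-triangle entries (diagonal included) of outer(v, v)."""
    i, j = np.triu_indices(v.size)
    return v[i] * v[j], (i, j)

def unpack_symmetric(values, idx, n):
    """Rebuild the full symmetric n x n matrix from its upper triangle."""
    i, j = idx
    full = np.zeros((n, n), dtype=values.dtype)
    full[i, j] = values
    full[j, i] = values  # mirror into the lower triangle
    return full

rng = np.random.default_rng(0)
fzp = rng.random(200).astype(np.float32)  # placeholder data

packed, idx = symmetric_outer_upper(fzp)
full = unpack_symmetric(packed, idx, fzp.size)
print(packed.size, fzp.size * fzp.size)  # n*(n+1)/2 stored values vs n*n
```

The saving here is in the number of multiplications and stored values. Whether it also translates into a higher rate in NumPy is not guaranteed, since the index arrays add their own overhead; it would need to be measured on the target node.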
Performance with Fex Data
We tested the 6 outer products outlined above on drp-srcf-eb002. The best rate is 1.25 kHz for all 6 operations and about 4 kHz for the most important ones.
Code Block | ||
---|---|---|
| ||
(ps-4.5.26) monarin@drp-srcf-eb002 (master *) tmolw8819 👁)$ python test_fast_outer.py
ehsd.shape=(20, 50),fzp.shape=(200,) dtype=<class 'numpy.float32'>
Elapsed Time (s): 20 blobs 0.00077 fzp: 0.00003 total:0.00080
Rate (kHz) : 20 blobs 1.30 fzp: 33.29 total:1.25
(ps-4.5.26) monarin@drp-srcf-eb002 (master *) tmolw8819 👁)$ python test_fast_outer.py
ehsd.shape=(20, 50),fzp.shape=(200,) dtype=<class 'numpy.float32'>
Elapsed Time (s): 20 blobs 0.00022 fzp: 0.00003 total:0.00025
Rate (kHz) : 20 blobs 4.64 fzp: 33.45 total:4.07 |
Python script for the results above:
Code Block | ||||
---|---|---|---|---|
| ||||
import numpy as np
import time
import torch  # not used while ctor = np
dtype = np.float32
ctor = np  # array module used for all allocations and outer products
n_samples = 50
n_blobs = 20
n_fzp_samples = 200
# Random stand-ins for the electron hsd, ion hsd and fzp data
ehsd = ctor.random.rand(n_blobs * n_samples).reshape((n_blobs, n_samples)).astype(dtype)
ihsd = ctor.random.rand(n_blobs * n_samples).reshape((n_blobs, n_samples)).astype(dtype)
fzp = ctor.random.rand(n_fzp_samples).astype(dtype)
n_events = 10
# Per-event timings: [blob loop, fzp-fzp, total]
tt = ctor.zeros((n_events,3))
# Preallocated outputs so that allocation is not part of the timing
o_ehsd_ehsd = ctor.zeros((n_blobs, n_samples, n_samples), dtype=dtype)
o_ihsd_ihsd = ctor.zeros((n_blobs, n_samples, n_samples), dtype=dtype)
o_ehsd_ihsd = ctor.zeros((n_blobs, n_samples, n_samples), dtype=dtype)
o_ehsd_fzp = ctor.zeros((n_blobs, n_samples, n_fzp_samples), dtype=dtype)
o_ihsd_fzp = ctor.zeros((n_blobs, n_samples, n_fzp_samples), dtype=dtype)
o_fzp_fzp = ctor.zeros((n_fzp_samples, n_fzp_samples), dtype=dtype)
for i in range(n_events):
    t0 = time.monotonic()
    # Five hsd-related outer products for each blob
    for i_blob, (_ehsd, _ihsd) in enumerate(zip(ehsd, ihsd)):
        o_ehsd_ehsd[i_blob,:] = ctor.outer(_ehsd, _ehsd)
        o_ihsd_ihsd[i_blob,:] = ctor.outer(_ihsd, _ihsd)
        o_ehsd_ihsd[i_blob,:] = ctor.outer(_ehsd, _ihsd)
        o_ehsd_fzp[i_blob,:] = ctor.outer(_ehsd, fzp)
        o_ihsd_fzp[i_blob,:] = ctor.outer(_ihsd, fzp)
    t1 = time.monotonic()
    # fzp-fzp outer product, once per event
    o_fzp_fzp[:] = ctor.outer(fzp, fzp)
    t2 = time.monotonic()
    tt[i, :] = [t1-t0, t2-t1, t2-t0]
print(f'{ehsd.shape=},{fzp.shape=} {dtype=}')
mean_tt = np.mean(tt, axis=0)
print(f'Elapsed Time (s): {n_blobs} blobs {mean_tt[0]:.5f} fzp: {mean_tt[1]:.5f} total:{mean_tt[2]:.5f}')
# Events per second, converted to kHz
rate = (n_events/np.sum(tt, axis=0))*1e-3
print(f'Rate (kHz) : {n_blobs} blobs {rate[0]:.2f} fzp: {rate[1]:.2f} total:{rate[2]:.2f}')
|
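As a possible variation on the script above, the per-blob Python loop could be replaced by NumPy broadcasting, which computes all blobs in one call per product. This is only a sketch with the same array shapes and random placeholder data; it has not been timed on drp-srcf-eb002, so any speed-up is an assumption that would need to be checked.

```python
import numpy as np

dtype = np.float32
n_blobs, n_samples, n_fzp_samples = 20, 50, 200

rng = np.random.default_rng(0)
ehsd = rng.random((n_blobs, n_samples)).astype(dtype)
ihsd = rng.random((n_blobs, n_samples)).astype(dtype)
fzp = rng.random(n_fzp_samples).astype(dtype)

# Per-blob outer products: result[b, i, j] = x[b, i] * y[b, j]
o_ehsd_ehsd = ehsd[:, :, None] * ehsd[:, None, :]
o_ihsd_ihsd = ihsd[:, :, None] * ihsd[:, None, :]
o_ehsd_ihsd = ehsd[:, :, None] * ihsd[:, None, :]

# hsd-fzp products: the single fzp vector is broadcast across all blobs
o_ehsd_fzp = ehsd[:, :, None] * fzp[None, None, :]
o_ihsd_fzp = ihsd[:, :, None] * fzp[None, None, :]

# fzp-fzp is a single outer product per event
o_fzp_fzp = np.outer(fzp, fzp)

print(o_ehsd_ehsd.shape, o_ehsd_fzp.shape, o_fzp_fzp.shape)
```

Unlike the original script, this version allocates new output arrays on every call rather than writing into preallocated ones, which may matter when it is run once per event.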
Performance with reduced full data
Numpy Outer Products
Code Block | ||
---|---|---|
| ||
(ps-4.5.26) monarin@drp-srcf-eb003 (master *) tmolw8819 👁)$ python test_np_outer.py
a.shape=(2048,),b.shape=(59400,) dtype=<class 'numpy.int16'>
a,b min: 0.07 max: 0.09 avg: 0.08 rate:12.04Hz
b,b min: 2.03 max: 2.48 avg: 2.31 rate:0.43Hz
total min: 2.10 max: 2.57 avg: 2.40 rate:0.42Hz
(ps-4.5.26) monarin@drp-srcf-eb003 (master *) tmolw8819 👁)$ python test_np_outer.py
a.shape=(2048,),b.shape=(59400,) dtype=<class 'numpy.float32'>
a,b min: 0.14 max: 0.17 avg: 0.16 rate:6.31Hz
b,b min: 3.88 max: 4.64 avg: 4.36 rate:0.23Hz
total min: 4.02 max: 4.82 avg: 4.52 rate:0.22Hz
(ps-4.5.26) monarin@drp-srcf-eb003 (master *) tmolw8819 👁)$ python test_np_outer.py
a.shape=(2048,),b.shape=(59400,) dtype=<class 'numpy.float64'>
a,b min: 0.28 max: 0.33 avg: 0.31 rate:3.26Hz
b,b min: 7.45 max: 8.67 avg: 8.15 rate:0.12Hz
total min: 7.73 max: 9.00 avg: 8.46 rate:0.12Hz
(ps-4.5.26) monarin@drp-srcf-eb003 (master *) tmolw8819 👁)$ python test_np_outer.py
a.shape=(2048,),b.shape=(8000,) dtype=<class 'numpy.int16'>
a,b min: 0.01 max: 0.01 avg: 0.01 rate:77.62Hz
b,b min: 0.04 max: 0.05 avg: 0.05 rate:19.64Hz
total min: 0.06 max: 0.07 avg: 0.06 rate:15.68Hz
(ps-4.5.26) monarin@drp-srcf-eb003 (master *) tmolw8819 👁)$ python test_np_outer.py
a.shape=(2048,),b.shape=(8000,) dtype=<class 'numpy.float32'>
a,b min: 0.02 max: 0.03 avg: 0.03 rate:38.00Hz
b,b min: 0.09 max: 0.11 avg: 0.10 rate:9.87Hz
total min: 0.11 max: 0.14 avg: 0.13 rate:7.84Hz
(ps-4.5.26) monarin@drp-srcf-eb003 (master *) tmolw8819 👁)$ python test_np_outer.py
a.shape=(2048,),b.shape=(8000,) dtype=<class 'numpy.float64'>
a,b min: 0.04 max: 0.05 avg: 0.05 rate:19.58Hz
b,b min: 0.17 max: 0.21 avg: 0.20 rate:5.00Hz
total min: 0.22 max: 0.27 avg: 0.25 rate:3.98Hz |
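To put the rates above in context, the size of each outer-product result can be worked out from the shapes and dtypes in the log. The sketch below assumes that the full result array is materialized for every product, which is not confirmed here because test_np_outer.py is not shown; `outer_nbytes` is a hypothetical helper.

```python
import numpy as np

def outer_nbytes(n_a, n_b, dtype):
    """Bytes needed to hold an (n_a, n_b) outer-product result of the given dtype."""
    return n_a * n_b * np.dtype(dtype).itemsize

n_fzp = 2048  # length of a in the log above
for n_b in (59400, 8000):
    for dtype in (np.int16, np.float32, np.float64):
        ab = outer_nbytes(n_fzp, n_b, dtype) / 1e9
        bb = outer_nbytes(n_b, n_b, dtype) / 1e9
        print(f"b={n_b:5d} {np.dtype(dtype).name:8s} a,b: {ab:6.2f} GB  b,b: {bb:6.2f} GB")
```

Under that assumption, the b,b result for 59400 samples holds about 3.5 billion elements, or roughly 7 GB in int16 and 28 GB in float64, compared with 0.13 GB in int16 for 8000 samples. The drop in rate between the two sample counts is at least consistent with the result size growing quadratically.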
...