...

  • electron-electron hsd-hsd outer product (symmetric, same hsd, can save a factor of 2)
  • ion-ion hsd-hsd outer product (symmetric, same hsd, can save a factor of 2)
  • electron-ion hsd-hsd outer product
  • (most important) electron-fzp outer product (fzp is piranha: 2048)
  • ion-fzp outer product (fzp is piranha: 2048)
  • (most important) fzp-fzp outer product (symmetric, can save a factor of 2)
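
The factor-of-2 saving noted for the symmetric products can be sketched as follows. Because outer(v, v)[i, j] equals outer(v, v)[j, i], only the upper triangle (including the diagonal) has to be computed and stored. This is a minimal illustration, not the code used for the measurements on this page; the helper names `symmetric_outer_upper` and `unpack_upper` and the packed 1-D layout are assumptions made for the example.

```python
import numpy as np


def symmetric_outer_upper(v):
    """Return the upper triangle (diagonal included) of outer(v, v), packed in 1-D."""
    i, j = np.triu_indices(v.shape[0])
    return v[i] * v[j]


def unpack_upper(packed, n):
    """Rebuild the full symmetric (n, n) matrix from the packed upper triangle."""
    full = np.zeros((n, n), dtype=packed.dtype)
    i, j = np.triu_indices(n)
    full[i, j] = packed
    full[j, i] = packed  # mirror into the lower triangle
    return full


rng = np.random.default_rng(0)
fzp = rng.random(200).astype(np.float32)

packed = symmetric_outer_upper(fzp)
full = unpack_upper(packed, fzp.shape[0])
print(f'{packed.size=} vs full {fzp.size * fzp.size}')
```

The packed result holds n(n+1)/2 values instead of n², so the saving is in multiplications and storage. Whether it is also faster than `np.outer` in wall-clock time depends on the cost of the index arrays and would need to be measured.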

Performance with Fex Data

We tested the six outer products listed above on drp-srcf-eb002. The best measured rate is about 1.25 kHz when all six operations are computed (first run below) and about 4 kHz when only the most important ones are computed (second run below).

Code Block
languagebash
(ps-4.5.26) monarin@drp-srcf-eb002 (master *) tmolw8819 👁)$ python test_fast_outer.py 
ehsd.shape=(20, 50),fzp.shape=(200,) dtype=<class 'numpy.float32'>
Elapsed Time (s): 20 blobs 0.00077 fzp: 0.00003 total:0.00080
Rate (kHz)      : 20 blobs 1.30 fzp: 33.29 total:1.25
(ps-4.5.26) monarin@drp-srcf-eb002 (master *) tmolw8819 👁)$ python test_fast_outer.py 
ehsd.shape=(20, 50),fzp.shape=(200,) dtype=<class 'numpy.float32'>
Elapsed Time (s): 20 blobs 0.00022 fzp: 0.00003 total:0.00025
Rate (kHz)      : 20 blobs 4.64 fzp: 33.45 total:4.07

Python script for the results above:

Code Block
languagepy
titletest_fast_outer.py
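import numpy as np
import time
import torch  # not used while ctor = np

dtype = np.float32

# Array module used to create the arrays and compute the outer products
ctor = np

n_samples = 50
n_blobs = 20
n_fzp_samples = 200
# Random stand-ins for the electron hsd, ion hsd and fzp waveforms
ehsd = ctor.random.rand(n_blobs * n_samples).reshape((n_blobs, n_samples)).astype(dtype)
ihsd = ctor.random.rand(n_blobs * n_samples).reshape((n_blobs, n_samples)).astype(dtype)
fzp = ctor.random.rand(n_fzp_samples).astype(dtype)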

n_events = 10
# Per-event timings: [blob loop, fzp-fzp, total]
tt = ctor.zeros((n_events,3))
# Preallocated outputs, one per outer product
o_ehsd_ehsd = ctor.zeros((n_blobs, n_samples, n_samples), dtype=dtype)
o_ihsd_ihsd = ctor.zeros((n_blobs, n_samples, n_samples), dtype=dtype)
o_ehsd_ihsd = ctor.zeros((n_blobs, n_samples, n_samples), dtype=dtype)
o_ehsd_fzp = ctor.zeros((n_blobs, n_samples, n_fzp_samples), dtype=dtype)
o_ihsd_fzp = ctor.zeros((n_blobs, n_samples, n_fzp_samples), dtype=dtype)
o_fzp_fzp = ctor.zeros((n_fzp_samples, n_fzp_samples), dtype=dtype)
for i in range(n_events):
    t0 = time.monotonic()
    # Five per-blob outer products (hsd-hsd and hsd-fzp)
    for i_blob, (_ehsd, _ihsd) in enumerate(zip(ehsd, ihsd)):
        o_ehsd_ehsd[i_blob,:] = ctor.outer(_ehsd, _ehsd)
        o_ihsd_ihsd[i_blob,:] = ctor.outer(_ihsd, _ihsd)
        o_ehsd_ihsd[i_blob,:] = ctor.outer(_ehsd, _ihsd)
        o_ehsd_fzp[i_blob,:] = ctor.outer(_ehsd, fzp)
        o_ihsd_fzp[i_blob,:] = ctor.outer(_ihsd, fzp)
    t1 = time.monotonic()
    # Single fzp-fzp outer product
    o_fzp_fzp[:] = ctor.outer(fzp, fzp)
    t2 = time.monotonic()
    tt[i, :] = [t1-t0, t2-t1, t2-t0]

print(f'{ehsd.shape=},{fzp.shape=} {dtype=}')
mean_tt = np.mean(tt, axis=0)
print(f'Elapsed Time (s): {n_blobs} blobs {mean_tt[0]:.5f} fzp: {mean_tt[1]:.5f} total:{mean_tt[2]:.5f}')
rate = (n_events/np.sum(tt, axis=0))*1e-3
print(f'Rate (kHz)      : {n_blobs} blobs {rate[0]:.2f} fzp: {rate[1]:.2f} total:{rate[2]:.2f}')
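
The per-blob loop in the script above could also be written without a Python-level loop by using NumPy broadcasting, which computes the outer products for all blobs in one expression. This is a hedged sketch under the same array shapes as the script; it has not been timed on drp-srcf-eb002, so no rate is claimed for it, and the random inputs are stand-ins for real data.

```python
import numpy as np

n_blobs, n_samples, n_fzp_samples = 20, 50, 200
rng = np.random.default_rng(0)
ehsd = rng.random((n_blobs, n_samples)).astype(np.float32)
ihsd = rng.random((n_blobs, n_samples)).astype(np.float32)
fzp = rng.random(n_fzp_samples).astype(np.float32)

# (n_blobs, n_samples, 1) * (n_blobs, 1, n_samples) -> (n_blobs, n_samples, n_samples)
o_ehsd_ehsd = ehsd[:, :, None] * ehsd[:, None, :]
o_ihsd_ihsd = ihsd[:, :, None] * ihsd[:, None, :]
o_ehsd_ihsd = ehsd[:, :, None] * ihsd[:, None, :]

# (n_blobs, n_samples, 1) * (1, 1, n_fzp_samples) -> (n_blobs, n_samples, n_fzp_samples)
o_ehsd_fzp = ehsd[:, :, None] * fzp[None, None, :]
o_ihsd_fzp = ihsd[:, :, None] * fzp[None, None, :]

# fzp-fzp is a single 2-D outer product
o_fzp_fzp = fzp[:, None] * fzp[None, :]

print(o_ehsd_ehsd.shape, o_ehsd_fzp.shape, o_fzp_fzp.shape)
```

Unlike the script, this version allocates new output arrays on every call; writing into preallocated buffers with `np.multiply(..., out=...)` would be closer to what the script does.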

Performance with reduced full data

NumPy Outer Products

Code Block
languagebash
(ps-4.5.26) monarin@drp-srcf-eb003 (master *) tmolw8819 👁)$ python test_np_outer.py 
a.shape=(2048,),b.shape=(59400,) dtype=<class 'numpy.int16'>
a,b min: 0.07 max: 0.09 avg: 0.08 rate:12.04Hz
b,b min: 2.03 max: 2.48 avg: 2.31 rate:0.43Hz
total min: 2.10 max: 2.57 avg: 2.40 rate:0.42Hz
(ps-4.5.26) monarin@drp-srcf-eb003 (master *) tmolw8819 👁)$ python test_np_outer.py 
a.shape=(2048,),b.shape=(59400,) dtype=<class 'numpy.float32'>
a,b min: 0.14 max: 0.17 avg: 0.16 rate:6.31Hz
b,b min: 3.88 max: 4.64 avg: 4.36 rate:0.23Hz
total min: 4.02 max: 4.82 avg: 4.52 rate:0.22Hz
(ps-4.5.26) monarin@drp-srcf-eb003 (master *) tmolw8819 👁)$ python test_np_outer.py 
a.shape=(2048,),b.shape=(59400,) dtype=<class 'numpy.float64'>
a,b min: 0.28 max: 0.33 avg: 0.31 rate:3.26Hz
b,b min: 7.45 max: 8.67 avg: 8.15 rate:0.12Hz
total min: 7.73 max: 9.00 avg: 8.46 rate:0.12Hz
(ps-4.5.26) monarin@drp-srcf-eb003 (master *) tmolw8819 👁)$ python test_np_outer.py 
a.shape=(2048,),b.shape=(8000,) dtype=<class 'numpy.int16'>
a,b min: 0.01 max: 0.01 avg: 0.01 rate:77.62Hz
b,b min: 0.04 max: 0.05 avg: 0.05 rate:19.64Hz
total min: 0.06 max: 0.07 avg: 0.06 rate:15.68Hz
(ps-4.5.26) monarin@drp-srcf-eb003 (master *) tmolw8819 👁)$ python test_np_outer.py 
a.shape=(2048,),b.shape=(8000,) dtype=<class 'numpy.float32'>
a,b min: 0.02 max: 0.03 avg: 0.03 rate:38.00Hz
b,b min: 0.09 max: 0.11 avg: 0.10 rate:9.87Hz
total min: 0.11 max: 0.14 avg: 0.13 rate:7.84Hz
(ps-4.5.26) monarin@drp-srcf-eb003 (master *) tmolw8819 👁)$ python test_np_outer.py 
a.shape=(2048,),b.shape=(8000,) dtype=<class 'numpy.float64'>
a,b min: 0.04 max: 0.05 avg: 0.05 rate:19.58Hz
b,b min: 0.17 max: 0.21 avg: 0.20 rate:5.00Hz
total min: 0.22 max: 0.27 avg: 0.25 rate:3.98Hz

...