Results

The table shows the mean time and its statistical uncertainty, taken from histograms, for each time-point increment.

t point	time increment	point description	time for rank 0/1	time for rank 0/80	time for rank 30/80	time for rank 60/80
1	t1 - t0	det.raw	0.8±0.2 ms	4.0±0.6 ms	3.2±0.4 ms	3.5±0.8 ms
2	t2 - t1	det.pedestals	15±3 μs	36±10 μs	31±6 μs	39±17 μs
3	t3 - t2	det.gain,offset	15±2 μs	27±4 μs	26±4 μs	27±6 μs
4	...	cmpars	25±1 μs	50±7 μs	58±26 μs	71±33 μs
5		gfac	2±0 μs	6±1 μs	7±1 μs	7±2 μs
6		gr0,1,2	1.3±0.2 ms	10.5±1.1 ms	7.0±0.9 ms	9.7±1.6 ms
7		make arrf	1.76±0.05 ms	9.2±0.9 ms	6.3±0.7 ms	9.0±1.5 ms
8		subtract peds	93.7±3.1 ms	191±11 ms	181±15 ms	259±26 ms
9		eval gain factor for gain ranges	4.9±0.6 ms	20.3±1.5 ms	14.6±1.2 ms	17.3±2.0 ms
10		eval offset for gain ranges	6.2±0.4 ms	18.5±1.3 ms	18.4±1.4 ms	19.2±2.1 ms
11		subtract offset	1.0±0.2 ms	6.0±0.7 ms	5.3±0.6 ms	6.2±1.2 ms
12		get mask	3±2 μs	6±2 μs	6±2 μs	7±2 μs
13		common mode turned off	7±1 μs	15±2 μs	17±2 μs	20±3 μs
14	t14 - t13	apply gain factor and mask	4.0±0.7 ms	14.9±2.0 ms	13.9±1.6 ms	19.2±3.5 ms
99	t14 - t0	per evt time, inside det.calib	109.8±4.2 ms	276±15 ms	247±13 ms	345±29 ms
0	t0 - t0 previous evt	time between consecutive det.calib	115.4±3.9 ms	335±16 ms	307±14 ms	398±32 ms
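
A minimal sketch of how per-increment statistics like those in the table could be derived from recorded time stamps. The function name `increment_stats` and the synthetic time stamps are hypothetical. The page does not say whether the quoted uncertainty is the spread of the histogram or the error on the mean, so the sketch returns both.

```python
import numpy as np

def increment_stats(timestamps):
    """Hypothetical helper: timestamps has shape (n_events, n_points), in seconds.

    Returns, for each increment t[i+1] - t[i], the mean over events,
    the sample standard deviation, and the standard error of the mean.
    """
    t = np.asarray(timestamps, dtype=float)
    dt = np.diff(t, axis=1)               # increments per event: t1-t0, t2-t1, ...
    mean = dt.mean(axis=0)
    rms = dt.std(axis=0, ddof=1)          # spread of the increment distribution
    err = rms / np.sqrt(dt.shape[0])      # statistical error on the mean
    return mean, rms, err

# Synthetic example (not measured data): 3 events, 3 time points each.
stamps = [[0.0, 1.0, 3.0],
          [0.0, 2.0, 4.0],
          [0.0, 3.0, 5.0]]
mean, rms, err = increment_stats(stamps)
print(mean, rms, err)
```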

Summary

single-core processing is faster than per-core processing in the 80-core case by a factor of 2.5-3 FOR ALL OPERATIONS
in the 80-core case, time per core is consistent between cores
all constants are cached, and access to constants is fast, at the sub-millisecond level
common mode correction is turned off, as well as mask?
the most time-consuming operation is indexed pedestal subtraction
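
The slowdown factors can be checked directly against the table. The sketch below copies the mean times for two rows (per-event time inside det.calib and pedestal subtraction) and divides each 80-core rank time by the single-core time. The dictionary layout and the `slowdown` helper are illustrative only.

```python
# Mean times in ms, copied from the table above (uncertainties omitted).
rows = {
    "per evt time, inside det.calib": {"0/1": 109.8, "0/80": 276.0, "30/80": 247.0, "60/80": 345.0},
    "subtract peds": {"0/1": 93.7, "0/80": 191.0, "30/80": 181.0, "60/80": 259.0},
}

def slowdown(times):
    """Ratio of each 80-core rank time to the single-core (rank 0/1) time."""
    single = times["0/1"]
    return {rank: t / single for rank, t in times.items() if rank != "0/1"}

ratios = {name: slowdown(times) for name, times in rows.items()}
for name, per_rank in ratios.items():
    for rank, factor in per_rank.items():
        print(f"{name}, rank {rank}: {factor:.2f}x slower than rank 0/1")
```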

pedestal subtraction indexed by gain ranges

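    from time import time

    t07 = time()

    # gr0, gr1, gr2 select the pixels in each gain range;
    # subtract the pedestal of the matching gain range from those pixels
    arrf[gr0] -= peds[0,gr0]
    arrf[gr1] -= peds[1,gr1]
    arrf[gr2] -= peds[2,gr2]

    t08 = time()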
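
To experiment with this step in isolation, the following self-contained sketch reproduces the indexed subtraction on synthetic data. Everything outside the three subtraction lines is an assumption: the array shape, the random pedestals, the per-pixel gain-range array `grange`, and the treatment of `gr0`, `gr1`, `gr2` as boolean masks are hypothetical stand-ins for what det.calib actually uses.

```python
import numpy as np
from time import perf_counter

rng = np.random.default_rng(0)
shape = (4, 64, 64)                                   # hypothetical array shape
arrf = rng.uniform(1000, 2000, shape).astype(np.float32)
peds = rng.uniform(900, 1100, (3,) + shape).astype(np.float32)
grange = rng.integers(0, 3, shape)                    # hypothetical gain range per pixel: 0, 1 or 2

# Boolean masks selecting the pixels in each gain range (assumed form of gr0, gr1, gr2).
gr0, gr1, gr2 = (grange == 0), (grange == 1), (grange == 2)

# Reference result: pick the pedestal of each pixel's gain range in a single pass.
expected = arrf - np.take_along_axis(peds, grange[None], axis=0)[0]

t07 = perf_counter()
arrf[gr0] -= peds[0, gr0]
arrf[gr1] -= peds[1, gr1]
arrf[gr2] -= peds[2, gr2]
t08 = perf_counter()

print(f"indexed pedestal subtraction: {(t08 - t07) * 1e3:.3f} ms")
```

The `np.take_along_axis` line is included only as a cross-check of the result. Whether a single-pass form of this kind would be faster than three masked subtractions on the real detector arrays would have to be measured.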

the poor single-to-multicore scaling is not tied to any particular algorithm; it is a common problem for every algorithm