2024-02-14 test for reduced resource consumption
After Wilko's test and the meeting with AMD representatives, the working assumption is that the consumed memory is close to a threshold beyond which performance degrades. To check this, repeat the same timing tests with float64 replaced by float32, and with the 8-panel jungfrau replaced by a single panel.
Description
Use the same resource-consumption code as in Algorithm code, with minor variations: float64 is replaced by float32, and the 8-panel jungfrau is replaced by a single panel.
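To put the memory argument in numbers, the sketch below computes the size of a single per-pixel array for the three configurations. It uses the jungfrau panel shape (512, 1024) and the 3-gain-range constants shape (3, npanels, 512, 1024) that appear in the calib_jungfrau code later on this page. It counts only one array of each kind, while the real test allocates several (raw data, constants, masks, temporaries), so treat it as a rough lower bound on the working set rather than a measurement. The function name `array_mib` is illustrative only.

```python
import numpy as np

ROWS, COLS = 512, 1024  # jungfrau panel size in pixels
NGAINS = 3              # gain ranges, as in constants of shape (3, npanels, 512, 1024)


def array_mib(npanels, dtype, ngains=1):
    """Size in MiB of one array of shape (ngains, npanels, 512, 1024) with the given dtype."""
    nbytes = ngains * npanels * ROWS * COLS * np.dtype(dtype).itemsize
    return nbytes / 2**20


# Compare the three configurations used in the timing table
for npanels, dtype in ((8, np.float64), (8, np.float32), (1, np.float32)):
    print(f"{npanels}-panel {np.dtype(dtype).name}: "
          f"image {array_mib(npanels, dtype):6.1f} MiB, "
          f"3-gain constants {array_mib(npanels, dtype, NGAINS):6.1f} MiB")
```

Going from float64 to float32 halves every floating-point array, and going from 8 panels to 1 divides it by a further 8, so the single-panel float32 case is 16 times smaller per array than the 8-panel float64 case.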
Results
Summary
Per-CPU, per-event processing time (ms)
Number of CPUs | CPU IDs | float64, 8-panel | Ratio N/1 | float32, 8-panel | Ratio N/1 | float32, 1-panel | Ratio N/1 |
---|---|---|---|---|---|---|---|
1 | 5,10,64,127 | 172.3 | 1 | 172.3 | 1 | 20.8 | 1 |
8 | 0-8 | 199 | 1.15 | 189 | 1.10 | 21.4 | 1.03 |
16 | 0-15 | 257 | 1.49 | 223 | 1.29 | 21.8 | 1.05 |
32 | 32-63 | 260 | 1.51 | 229 | 1.33 | 22.8 | 1.10 |
64 | 64-127 | 493 | 2.86 | 345 | 2.00 | 27.8 | 1.34 |
120 | all but 16-23 weka | 379 | 2.19 | 303 | 1.76 | 27.3 | 1.31 |
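- The maximum time per event is observed with 64 CPUs occupied, not with 120.
- Reducing memory consumption improves scaling: with 64 CPUs the ratio drops from 2.86 (float64, 8-panel) to 2.00 (float32, 8-panel) and to 1.34 (float32, 1-panel).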
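Each "Ratio N/1" column is the time with N CPUs busy divided by the single-CPU time of the same configuration. A short sketch that recomputes the ratios from the table values and reports the worst CPU count per configuration. Rounded to two decimals it agrees with the table except for the 120-CPU float64 entry (379/172.3 rounds to 2.20, the table shows 2.19). The dictionary layout and the function name `ratios` are ours, not part of the test code.

```python
# Per-CPU, per-event times (ms) copied from the table above, keyed by number of CPUs
TIMES_MS = {
    "float64 8panel": {1: 172.3, 8: 199, 16: 257, 32: 260, 64: 493, 120: 379},
    "float32 8panel": {1: 172.3, 8: 189, 16: 223, 32: 229, 64: 345, 120: 303},
    "float32 1panel": {1: 20.8, 8: 21.4, 16: 21.8, 32: 22.8, 64: 27.8, 120: 27.3},
}


def ratios(times):
    """Ratio N/1: time per event with N CPUs busy divided by the single-CPU time."""
    t1 = times[1]
    return {n: round(t / t1, 2) for n, t in times.items()}


for config, times in TIMES_MS.items():
    r = ratios(times)
    worst = max(r, key=r.get)  # CPU count with the largest slowdown
    print(f"{config}: worst N={worst}, ratio={r[worst]:.2f}")
```

For all three configurations the worst case is N=64, consistent with the first summary bullet.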
2024-02-16 jungfrau single panel timing in real data
Description
Results
single-core timing
mpirun -n 1 python Detector/examples/test-scaling-mpi.py 2
80-core timing
mpirun -n 80 python Detector/examples/test-scaling-mpi.py 2
Summary
time per event per core
- single core: 43 ms
- 80-core: 84 ms
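Read as node throughput, these two numbers give a rough scaling estimate. A minimal sketch, assuming every core processes its own events independently (event-level parallelism, as with `mpirun -n N`), so that N cores at t_N ms per event per core deliver N/t_N events per ms. The function name and output fields are illustrative only.

```python
def scaling_summary(t1_ms, tn_ms, ncores):
    """Scaling figures from per-event, per-core times in ms.

    t1_ms  - time per event when a single core is busy
    tn_ms  - time per event per core when ncores cores run concurrently
    """
    slowdown = tn_ms / t1_ms           # per-core slowdown factor
    efficiency = t1_ms / tn_ms         # parallel efficiency, 1.0 is ideal
    speedup = ncores * efficiency      # node throughput relative to one core
    rate_hz = ncores * 1000.0 / tn_ms  # events per second for the whole node
    return {"slowdown": slowdown, "efficiency": efficiency,
            "speedup": speedup, "rate_hz": rate_hz}


print(scaling_summary(43, 84, 80))
```

With 43 ms and 84 ms this gives a slowdown of about 1.95 and a parallel efficiency of about 0.51, i.e. roughly 41 times a single core, or about 950 events/s for the node. With the 2024-02-20 numbers (43 ms and 91 ms) the efficiency drops to about 0.47.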
2024-02-20 jungfrau single panel timing in real data
Description
Use a local version of calib_jungfrau instrumented with timing points:
    import logging
    from time import time

    import numpy as np

    logger = logging.getLogger(__name__)

    # BW1, BW2, BW3, MSK, store, divide_protected, string_from_source, info_ndarr and the
    # common_mode_* functions are assumed to be provided by the module that hosts the
    # original calib_jungfrau (psana1 Detector jungfrau utilities); they are not redefined here.

    def calib_jungfrau(det, evt, cmpars=(7,3,200,10), **kwa):
        """ Returns calibrated jungfrau data and the timing points

        - gets constants
        - gets raw data
        - evaluates (code - pedestal - offset)
        - applies common mode correction if turned on
        - applies gain factor

        Parameters

        - det (psana.Detector) - Detector object
        - evt (psana.Event) - Event object
        - cmpars (tuple) - common mode parameters
            - cmpars[0] - algorithm # 7-for jungfrau
            - cmpars[1] - control bit-word 1-in rows, 2-in columns, 4-in banks
            - cmpars[2] - maximal applied correction
            - cmpars[3] - passed as npix_min to the common mode functions (10 if omitted)
        - **kwa - used here and passed to det.mask_v2 or det.mask_comb
            - nda_raw - if not None, substitutes evt.raw()
            - mbits - DEPRECATED parameter of the det.mask_comb(...)
            - mask - user defined mask passed as optional parameter

        Returns (resp, times), where times holds the 15 time stamps t00..t14 in sec,
        or None if raw data or pedestals are missing.
        """
        t00 = time()
        src = det.source  # psana.Source object (unused below)
        nda_raw = kwa.get('nda_raw', None)
        arr = det.raw(evt) if nda_raw is None else nda_raw  # shape:(<npanels>, 512, 1024) dtype:uint16
        if arr is None:
            return None

        t01 = time()
        peds = det.pedestals(evt)  # 4d pedestals shape:(3, 1, 512, 1024) dtype:float32
        if peds is None:
            return None

        t02 = time()
        gain = det.gain(evt)    # 4d gains
        offs = det.offset(evt)  # 4d offsets

        t03 = time()
        detname = string_from_source(det.source)
        cmp = det.common_mode(evt) if cmpars is None else cmpars

        t04 = time()
        if gain is None:
            gain = np.ones_like(peds)   # 4d gains
        if offs is None:
            offs = np.zeros_like(peds)  # 4d offsets

        # cache gain factors per detector name
        gfac = store.gfac.get(detname, None)  # det.name
        if gfac is None:
            gfac = divide_protected(np.ones_like(peds), gain)
            store.gfac[detname] = gfac
            store.arr1 = np.ones_like(arr, dtype=np.int8)

        t05 = time()
        # Define bool arrays of gain ranges, faster than bit operations
        gr0 = arr < BW1                   # 490 us
        gr1 = (arr >= BW1) & (arr < BW2)  # 714 us
        gr2 = arr >= BW3                  # 400 us

        t06 = time()
        # Subtract pedestals
        arrf = np.array(arr & MSK, dtype=np.float32)

        t07 = time()
        arrf[gr0] -= peds[0,gr0]
        arrf[gr1] -= peds[1,gr1]  #- arrf[gr1]
        arrf[gr2] -= peds[2,gr2]  #- arrf[gr2]

        t08 = time()
        factor = np.select((gr0, gr1, gr2), (gfac[0,:], gfac[1,:], gfac[2,:]), default=1)  # 2 ms

        t09 = time()
        offset = np.select((gr0, gr1, gr2), (offs[0,:], offs[1,:], offs[2,:]), default=0)

        t10 = time()
        arrf -= offset  # Apply offset correction, ~< 100 us

        t11 = time()
        if store.mask is None:
            store.mask = det.mask_total(evt, **kwa)
        mask = store.mask

        t12 = time()
        if cmp is not None:
            mode, cormax = int(cmp[1]), cmp[2]
            npixmin = cmp[3] if len(cmp) > 3 else 10
            if mode > 0:
                logger.debug(info_ndarr(gr0, 'gain group0'))
                logger.debug(info_ndarr(mask, 'mask'))
                t0_sec_cm = time()
                # common mode is evaluated on good pixels of gain range 0 only
                gmask = np.bitwise_and(gr0, mask) if mask is not None else gr0
                hrows = 256  # 512/2, arrf shape is (nsegs, 512, 1024)
                for s in range(arrf.shape[0]):
                    if mode & 4:  # in banks: (512/2,1024/16) = (256,64) pixels # 100 ms
                        common_mode_2d_hsplit_nbanks(arrf[s,:hrows,:], mask=gmask[s,:hrows,:], nbanks=16, cormax=cormax, npix_min=npixmin)
                        common_mode_2d_hsplit_nbanks(arrf[s,hrows:,:], mask=gmask[s,hrows:,:], nbanks=16, cormax=cormax, npix_min=npixmin)
                    if mode & 1:  # in rows per bank: 1024/16 = 64 pixels # 275 ms
                        common_mode_rows_hsplit_nbanks(arrf[s,], mask=gmask[s,], nbanks=16, cormax=cormax, npix_min=npixmin)
                    if mode & 2:  # in cols per bank: 512/2 = 256 pixels # 290 ms
                        common_mode_cols(arrf[s,:hrows,:], mask=gmask[s,:hrows,:], cormax=cormax, npix_min=npixmin)
                        common_mode_cols(arrf[s,hrows:,:], mask=gmask[s,hrows:,:], cormax=cormax, npix_min=npixmin)
                logger.debug('TIME: common-mode correction time = %.6f sec' % (time()-t0_sec_cm))

        t13 = time()
        resp = arrf * factor if mask is None else arrf * factor * mask  # gain correction

        t14 = time()
        times = np.array((t00, t01, t02, t03, t04, t05, t06, t07, t08, t09, t10, t11, t12, t13, t14), dtype=np.float64)
        return resp, times
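The function returns 15 absolute time stamps t00..t14, so consecutive differences give the time spent in each step. A sketch for turning them into per-step durations, optionally averaged over events. The stage labels are our reading of the code between consecutive time stamps, not names used by psana, and the helper names are hypothetical.

```python
import numpy as np

# Labels for the 14 intervals between the 15 time stamps t00..t14
# (our own reading of the code between consecutive timing points).
STAGES = (
    "raw data", "pedestals", "gain and offset", "common-mode pars",
    "defaults and gain-factor cache", "gain-range masks", "uint16 -> float32",
    "pedestal subtraction", "select gain factor", "select offset",
    "subtract offset", "mask", "common mode", "apply gain and mask",
)


def stage_times_ms(times):
    """Per-stage durations in ms from one event's absolute time stamps in sec."""
    times = np.asarray(times, dtype=np.float64)
    if times.shape != (len(STAGES) + 1,):
        raise ValueError(f"expected {len(STAGES) + 1} time stamps, got shape {times.shape}")
    return dict(zip(STAGES, 1000.0 * np.diff(times)))


def mean_stage_times_ms(list_of_times):
    """Per-stage durations in ms averaged over many events (one row of stamps per event)."""
    dts = 1000.0 * np.diff(np.asarray(list_of_times, dtype=np.float64), axis=1)
    return dict(zip(STAGES, dts.mean(axis=0)))
```

In an event loop one could collect `times` from each `resp, times = calib_jungfrau(det, evt)` call, skipping events where it returns None, and pass the collected list to `mean_stage_times_ms` to see which step dominates.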
Results
    single-core processing
    ======================
    mpirun -n 1 python Detector/examples/test-scaling-mpi.py 2

    figs/fig-mpi-data-1p-v2-sdfmilan216-ncores01-summary.txt:
    hostname:sdfmilan216 rank:000 cpu:000 cmt:1p-v2 proc time (sec) mean: 0.0427 +/- 0.0010 rms: 0.0011 +/- 0.0007
    hostname:sdfmilan216 rank:000 cpu:000 cmt:1p-v2 proc time (sec) mean: 0.0428 +/- 0.0011 rms: 0.0012 +/- 0.0008
    hostname:sdfmilan216 rank:000 cpu:000 cmt:1p-v2 proc time (sec) mean: 0.0428 +/- 0.0014 rms: 0.0015 +/- 0.0010
    hostname:sdfmilan216 rank:000 cpu:000 cmt:1p-v2 proc time (sec) mean: 0.0427 +/- 0.0010 rms: 0.0011 +/- 0.0007

    80-core processing
    ==================
    mpirun -n 80 python Detector/examples/test-scaling-mpi.py 2
    python Detector/examples/test-scaling-mpi.py -99

    figs/fig-mpi-data-1p-v02-sdfmilan216-ncores80-summary-ordered.txt:
    hostname:sdfmilan216 rank:000 cpu:007 cmt:1p-v2 proc time (sec) mean: 0.1041 +/- 0.0114 rms: 0.0480 +/- 0.0080
    hostname:sdfmilan216 rank:001 cpu:027 cmt:1p-v2 proc time (sec) mean: 0.0775 +/- 0.0058 rms: 0.0192 +/- 0.0041
    hostname:sdfmilan216 rank:002 cpu:042 cmt:1p-v2 proc time (sec) mean: 0.0692 +/- 0.0048 rms: 0.0123 +/- 0.0034
    hostname:sdfmilan216 rank:003 cpu:054 cmt:1p-v2 proc time (sec) mean: 0.0516 +/- 0.0020 rms: 0.0028 +/- 0.0014
    hostname:sdfmilan216 rank:004 cpu:069 cmt:1p-v2 proc time (sec) mean: 0.1066 +/- 0.0100 rms: 0.0432 +/- 0.0071
    hostname:sdfmilan216 rank:005 cpu:086 cmt:1p-v2 proc time (sec) mean: 0.0974 +/- 0.0076 rms: 0.0309 +/- 0.0054
    hostname:sdfmilan216 rank:006 cpu:100 cmt:1p-v2 proc time (sec) mean: 0.1153 +/- 0.0363 rms: 0.1002 +/- 0.0257
    hostname:sdfmilan216 rank:007 cpu:122 cmt:1p-v2 proc time (sec) mean: 0.1206 +/- 0.0205 rms: 0.0796 +/- 0.0145
    hostname:sdfmilan216 rank:008 cpu:009 cmt:1p-v2 proc time (sec) mean: 0.1003 +/- 0.0107 rms: 0.0457 +/- 0.0076
    hostname:sdfmilan216 rank:009 cpu:028 cmt:1p-v2 proc time (sec) mean: 0.0759 +/- 0.0064 rms: 0.0188 +/- 0.0045
    hostname:sdfmilan216 rank:010 cpu:041 cmt:1p-v2 proc time (sec) mean: 0.0680 +/- 0.0057 rms: 0.0137 +/- 0.0040
    hostname:sdfmilan216 rank:011 cpu:060 cmt:1p-v2 proc time (sec) mean: 0.0527 +/- 0.0020 rms: 0.0028 +/- 0.0014
    hostname:sdfmilan216 rank:012 cpu:076 cmt:1p-v2 proc time (sec) mean: 0.1057 +/- 0.0097 rms: 0.0433 +/- 0.0069
    hostname:sdfmilan216 rank:013 cpu:082 cmt:1p-v2 proc time (sec) mean: 0.0901 +/- 0.0070 rms: 0.0272 +/- 0.0050
    hostname:sdfmilan216 rank:014 cpu:106 cmt:1p-v2 proc time (sec) mean: 0.1058 +/- 0.0227 rms: 0.0702 +/- 0.0161
    hostname:sdfmilan216 rank:015 cpu:120 cmt:1p-v2 proc time (sec) mean: 0.1235 +/- 0.0215 rms: 0.0813 +/- 0.0152
    hostname:sdfmilan216 rank:016 cpu:004 cmt:1p-v2 proc time (sec) mean: 0.1057 +/- 0.0119 rms: 0.0499 +/- 0.0084
    hostname:sdfmilan216 rank:017 cpu:025 cmt:1p-v2 proc time (sec) mean: 0.0779 +/- 0.0063 rms: 0.0204 +/- 0.0045
    hostname:sdfmilan216 rank:018 cpu:039 cmt:1p-v2 proc time (sec) mean: 0.0676 +/- 0.0054 rms: 0.0126 +/- 0.0038
    hostname:sdfmilan216 rank:019 cpu:049 cmt:1p-v2 proc time (sec) mean: 0.0520 +/- 0.0019 rms: 0.0025 +/- 0.0014
    hostname:sdfmilan216 rank:020 cpu:075 cmt:1p-v2 proc time (sec) mean: 0.1052 +/- 0.0098 rms: 0.0408 +/- 0.0069
    hostname:sdfmilan216 rank:021 cpu:094 cmt:1p-v2 proc time (sec) mean: 0.0900 +/- 0.0068 rms: 0.0268 +/- 0.0048
    hostname:sdfmilan216 rank:022 cpu:108 cmt:1p-v2 proc time (sec) mean: 0.1113 +/- 0.0193 rms: 0.0703 +/- 0.0136
    hostname:sdfmilan216 rank:023 cpu:124 cmt:1p-v2 proc time (sec) mean: 0.1199 +/- 0.0197 rms: 0.0774 +/- 0.0140
    hostname:sdfmilan216 rank:024 cpu:014 cmt:1p-v2 proc time (sec) mean: 0.1060 +/- 0.0104 rms: 0.0457 +/- 0.0073
    hostname:sdfmilan216 rank:025 cpu:028 cmt:1p-v2 proc time (sec) mean: 0.0747 +/- 0.0054 rms: 0.0173 +/- 0.0038
    hostname:sdfmilan216 rank:026 cpu:036 cmt:1p-v2 proc time (sec) mean: 0.0694 +/- 0.0051 rms: 0.0135 +/- 0.0036
    hostname:sdfmilan216 rank:027 cpu:055 cmt:1p-v2 proc time (sec) mean: 0.0523 +/- 0.0020 rms: 0.0026 +/- 0.0014
    hostname:sdfmilan216 rank:028 cpu:066 cmt:1p-v2 proc time (sec) mean: 0.1039 +/- 0.0093 rms: 0.0396 +/- 0.0066
    hostname:sdfmilan216 rank:029 cpu:092 cmt:1p-v2 proc time (sec) mean: 0.0914 +/- 0.0061 rms: 0.0231 +/- 0.0043
    hostname:sdfmilan216 rank:030 cpu:097 cmt:1p-v2 proc time (sec) mean: 0.1175 +/- 0.0258 rms: 0.0805 +/- 0.0183
    hostname:sdfmilan216 rank:031 cpu:113 cmt:1p-v2 proc time (sec) mean: 0.1162 +/- 0.0171 rms: 0.0704 +/- 0.0121
    hostname:sdfmilan216 rank:032 cpu:001 cmt:1p-v2 proc time (sec) mean: 0.1041 +/- 0.0120 rms: 0.0500 +/- 0.0085
    hostname:sdfmilan216 rank:033 cpu:031 cmt:1p-v2 proc time (sec) mean: 0.0792 +/- 0.0065 rms: 0.0212 +/- 0.0046
    hostname:sdfmilan216 rank:034 cpu:038 cmt:1p-v2 proc time (sec) mean: 0.0692 +/- 0.0052 rms: 0.0132 +/- 0.0037
    hostname:sdfmilan216 rank:035 cpu:050 cmt:1p-v2 proc time (sec) mean: 0.0527 +/- 0.0021 rms: 0.0031 +/- 0.0015
    hostname:sdfmilan216 rank:036 cpu:072 cmt:1p-v2 proc time (sec) mean: 0.1025 +/- 0.0100 rms: 0.0420 +/- 0.0070
    hostname:sdfmilan216 rank:037 cpu:095 cmt:1p-v2 proc time (sec) mean: 0.0945 +/- 0.0067 rms: 0.0272 +/- 0.0047
    hostname:sdfmilan216 rank:038 cpu:110 cmt:1p-v2 proc time (sec) mean: 0.1074 +/- 0.0191 rms: 0.0660 +/- 0.0135
    hostname:sdfmilan216 rank:039 cpu:119 cmt:1p-v2 proc time (sec) mean: 0.1095 +/- 0.0190 rms: 0.0714 +/- 0.0134
    hostname:sdfmilan216 rank:040 cpu:011 cmt:1p-v2 proc time (sec) mean: 0.1043 +/- 0.0102 rms: 0.0453 +/- 0.0072
    hostname:sdfmilan216 rank:041 cpu:029 cmt:1p-v2 proc time (sec) mean: 0.0768 +/- 0.0063 rms: 0.0199 +/- 0.0044
    hostname:sdfmilan216 rank:042 cpu:037 cmt:1p-v2 proc time (sec) mean: 0.0683 +/- 0.0050 rms: 0.0119 +/- 0.0035
    hostname:sdfmilan216 rank:043 cpu:058 cmt:1p-v2 proc time (sec) mean: 0.0523 +/- 0.0021 rms: 0.0029 +/- 0.0015
    hostname:sdfmilan216 rank:044 cpu:070 cmt:1p-v2 proc time (sec) mean: 0.1074 +/- 0.0117 rms: 0.0471 +/- 0.0083
    hostname:sdfmilan216 rank:045 cpu:087 cmt:1p-v2 proc time (sec) mean: 0.0907 +/- 0.0068 rms: 0.0261 +/- 0.0048
    hostname:sdfmilan216 rank:046 cpu:107 cmt:1p-v2 proc time (sec) mean: 0.1116 +/- 0.0225 rms: 0.0752 +/- 0.0159
    hostname:sdfmilan216 rank:047 cpu:125 cmt:1p-v2 proc time (sec) mean: 0.1165 +/- 0.0220 rms: 0.0799 +/- 0.0155
    hostname:sdfmilan216 rank:048 cpu:013 cmt:1p-v2 proc time (sec) mean: 0.1087 +/- 0.0104 rms: 0.0462 +/- 0.0073
    hostname:sdfmilan216 rank:049 cpu:030 cmt:1p-v2 proc time (sec) mean: 0.0776 +/- 0.0063 rms: 0.0199 +/- 0.0045
    hostname:sdfmilan216 rank:050 cpu:035 cmt:1p-v2 proc time (sec) mean: 0.0680 +/- 0.0049 rms: 0.0119 +/- 0.0035
    hostname:sdfmilan216 rank:051 cpu:061 cmt:1p-v2 proc time (sec) mean: 0.0526 +/- 0.0020 rms: 0.0028 +/- 0.0014
    hostname:sdfmilan216 rank:052 cpu:074 cmt:1p-v2 proc time (sec) mean: 0.1010 +/- 0.0103 rms: 0.0405 +/- 0.0073
    hostname:sdfmilan216 rank:053 cpu:089 cmt:1p-v2 proc time (sec) mean: 0.0953 +/- 0.0070 rms: 0.0288 +/- 0.0049
    hostname:sdfmilan216 rank:054 cpu:104 cmt:1p-v2 proc time (sec) mean: 0.1162 +/- 0.0385 rms: 0.1044 +/- 0.0272
    hostname:sdfmilan216 rank:055 cpu:121 cmt:1p-v2 proc time (sec) mean: 0.1134 +/- 0.0249 rms: 0.0804 +/- 0.0176
    hostname:sdfmilan216 rank:056 cpu:005 cmt:1p-v2 proc time (sec) mean: 0.1029 +/- 0.0114 rms: 0.0490 +/- 0.0081
    hostname:sdfmilan216 rank:057 cpu:028 cmt:1p-v2 proc time (sec) mean: 0.0747 +/- 0.0059 rms: 0.0176 +/- 0.0042
    hostname:sdfmilan216 rank:058 cpu:039 cmt:1p-v2 proc time (sec) mean: 0.0679 +/- 0.0051 rms: 0.0124 +/- 0.0036
    hostname:sdfmilan216 rank:059 cpu:052 cmt:1p-v2 proc time (sec) mean: 0.0520 +/- 0.0020 rms: 0.0027 +/- 0.0014
    hostname:sdfmilan216 rank:060 cpu:072 cmt:1p-v2 proc time (sec) mean: 0.1065 +/- 0.0113 rms: 0.0461 +/- 0.0080
    hostname:sdfmilan216 rank:061 cpu:081 cmt:1p-v2 proc time (sec) mean: 0.0913 +/- 0.0071 rms: 0.0281 +/- 0.0050
    hostname:sdfmilan216 rank:062 cpu:105 cmt:1p-v2 proc time (sec) mean: 0.1148 +/- 0.0327 rms: 0.0935 +/- 0.0231
    hostname:sdfmilan216 rank:063 cpu:116 cmt:1p-v2 proc time (sec) mean: 0.1075 +/- 0.0161 rms: 0.0647 +/- 0.0114
    hostname:sdfmilan216 rank:064 cpu:006 cmt:1p-v2 proc time (sec) mean: 0.1062 +/- 0.0110 rms: 0.0480 +/- 0.0078
    hostname:sdfmilan216 rank:065 cpu:046 cmt:1p-v2 proc time (sec) mean: 0.0681 +/- 0.0052 rms: 0.0125 +/- 0.0037
    hostname:sdfmilan216 rank:066 cpu:040 cmt:1p-v2 proc time (sec) mean: 0.0692 +/- 0.0055 rms: 0.0143 +/- 0.0039
    hostname:sdfmilan216 rank:067 cpu:063 cmt:1p-v2 proc time (sec) mean: 0.0525 +/- 0.0021 rms: 0.0030 +/- 0.0015
    hostname:sdfmilan216 rank:068 cpu:071 cmt:1p-v2 proc time (sec) mean: 0.1087 +/- 0.0116 rms: 0.0486 +/- 0.0082
    hostname:sdfmilan216 rank:069 cpu:090 cmt:1p-v2 proc time (sec) mean: 0.0975 +/- 0.0068 rms: 0.0286 +/- 0.0048
    hostname:sdfmilan216 rank:070 cpu:102 cmt:1p-v2 proc time (sec) mean: 0.1169 +/- 0.0312 rms: 0.0952 +/- 0.0221
    hostname:sdfmilan216 rank:071 cpu:118 cmt:1p-v2 proc time (sec) mean: 0.1113 +/- 0.0191 rms: 0.0697 +/- 0.0135
    hostname:sdfmilan216 rank:072 cpu:008 cmt:1p-v2 proc time (sec) mean: 0.1047 +/- 0.0105 rms: 0.0455 +/- 0.0074
    hostname:sdfmilan216 rank:073 cpu:040 cmt:1p-v2 proc time (sec) mean: 0.0684 +/- 0.0053 rms: 0.0131 +/- 0.0038
    hostname:sdfmilan216 rank:074 cpu:043 cmt:1p-v2 proc time (sec) mean: 0.0684 +/- 0.0051 rms: 0.0122 +/- 0.0036
    hostname:sdfmilan216 rank:075 cpu:050 cmt:1p-v2 proc time (sec) mean: 0.0531 +/- 0.0020 rms: 0.0029 +/- 0.0014
    hostname:sdfmilan216 rank:076 cpu:077 cmt:1p-v2 proc time (sec) mean: 0.1007 +/- 0.0091 rms: 0.0381 +/- 0.0065
    hostname:sdfmilan216 rank:077 cpu:083 cmt:1p-v2 proc time (sec) mean: 0.0909 +/- 0.0078 rms: 0.0273 +/- 0.0055
    hostname:sdfmilan216 rank:078 cpu:105 cmt:1p-v2 proc time (sec) mean: 0.1118 +/- 0.0241 rms: 0.0763 +/- 0.0170
    hostname:sdfmilan216 rank:079 cpu:114 cmt:1p-v2 proc time (sec) mean: 0.1148 +/- 0.0229 rms: 0.0815 +/- 0.0162
    mean time (sec): 0.0908
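The per-rank summary records follow a fixed pattern, so they can be parsed for further analysis. A sketch using a regular expression derived from the records above; the helper names are hypothetical and the pattern assumes exactly the record format shown. It also lists CPU numbers reported by more than one rank; in the 80-core run, for example, cpu:028 appears for ranks 009, 025 and 057.

```python
import re
import statistics
from collections import defaultdict

# One per-rank record, e.g.
# "hostname:sdfmilan216 rank:003 cpu:054 cmt:1p-v2 proc time (sec) mean: 0.0516 +/- 0.0020 rms: 0.0028 +/- 0.0014"
RECORD = re.compile(
    r"hostname:(?P<host>\S+)\s+rank:(?P<rank>\d+)\s+cpu:(?P<cpu>\d+)\s+cmt:(?P<cmt>\S+)\s+"
    r"proc time \(sec\)\s+mean:\s*(?P<mean>[\d.]+)\s*\+/-\s*(?P<mean_err>[\d.]+)\s+"
    r"rms:\s*(?P<rms>[\d.]+)\s*\+/-\s*(?P<rms_err>[\d.]+)"
)


def parse_summary(text):
    """Return per-rank records; works on one record per line or on a flattened dump."""
    recs = []
    for m in RECORD.finditer(text):
        d = m.groupdict()
        recs.append({
            "host": d["host"], "cmt": d["cmt"],
            "rank": int(d["rank"]), "cpu": int(d["cpu"]),
            "mean": float(d["mean"]), "mean_err": float(d["mean_err"]),
            "rms": float(d["rms"]), "rms_err": float(d["rms_err"]),
        })
    return recs


def summarize(recs):
    """Unweighted average of per-rank mean times and the slowest rank's record."""
    return statistics.fmean(r["mean"] for r in recs), max(recs, key=lambda r: r["mean"])


def shared_cpus(recs):
    """CPU numbers reported by more than one rank, mapped to the list of those ranks."""
    by_cpu = defaultdict(list)
    for r in recs:
        by_cpu[r["cpu"]].append(r["rank"])
    return {cpu: ranks for cpu, ranks in by_cpu.items() if len(ranks) > 1}
```

For example, `parse_summary(open("figs/fig-mpi-data-1p-v02-sdfmilan216-ncores80-summary-ordered.txt").read())` would load the 80-core records. Whether the script's own "mean time (sec)" line is computed as the same unweighted average is not stated here.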
Summary
det.calib for jungfrau already uses float32.
time per event per core
- single core: 43 ms
- 80-core: 91 ms
References
- Scaling behavior of psana1 - Part 1 - det.calib method in multicore processing with mpi
- Scaling behavior of psana1 - Part 2 - test with command perf stat