2024-08-30 timing of the calib components in the event loop without mpi
Dataset and Detector
ds = DataSource(exp='uedcom103',run=812) # dark run
orun = next(ds.runs())
det = orun.Detector('epixquad')
Script or test
/lcls2/psana/psana/detector/test-scaling-mpi-epix10ka.py
Code of the det.raw.calib method with removed common mode correction
def calib_epix10ka_any_local_v2(det_raw, evt, **kwa):
    """ v2: get rid of common mode correction
    """
    t0 = time()
    nda_raw = kwa.get('nda_raw', None)
    raw = det_raw.raw(evt) if nda_raw is None else nda_raw # shape:(352, 384) or supposed to be later (<nsegs>, 352, 384) dtype:uint16
    if ue.cond_msg(raw is None, msg='raw is None'): return None
    t1 = time()
    gmaps = ue.gain_maps_epix10ka_any(det_raw, evt) #tuple: 7 x shape:(4, 352, 384)
    if ue.cond_msg(gmaps is None, msg='gmaps is None'): return None
    t2 = time()
    store = det_raw._store_ = ue.Storage(det_raw, **kwa) if det_raw._store_ is None else det_raw._store_ #perpix=True
    store.counter += 1
    if store.counter < 1: ue.print_gmaps_info(gmaps)
    t3 = time()
    factor = ue.event_constants_for_gmaps(gmaps, store.gfac, default=1) # 3d gain factors
    pedest = ue.event_constants_for_gmaps(gmaps, store.peds, default=0) # 3d pedestals
    t4 = time()
    arrf = np.array(raw & det_raw._data_bit_mask, dtype=np.float32)
    t5 = time()
    if pedest is not None: arrf -= pedest
    logger.debug(ue.info_ndarr(arrf, 'arrf:'))
    if ue.cond_msg(factor is None, msg='factor is None - substitute with 1', output_meth=logger.warning): factor = 1
    t6 = time()
    mask = store.mask
    res = arrf * factor if mask is None else arrf * factor * mask # gain correction
    t7 = time()
    return res, (t0, t1, t2, t3, t4, t5, t6, t7)
Time intervals meaning (columns t00 to t07 in the results; t00 is the total, each of the others is the time elapsed since the previous time point)
- t0 - total time consumed by the calib method
- t1 - access raw = det_raw.raw
- t2 - access gain_maps
- t3 - access cached store of calibration constants
- t4 - evaluate peds and gfac - combined constants for gain range
- t5 - make re-writable raw array, arrf, for data bits only, truncate the gain bits
- t6 - subtract pedestals
- t7 - evaluate arrf * factor * mask
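A minimal sketch of how the tuple of time points returned by the calib method can be turned into the per-step intervals listed above and then reduced to a median over events. The helper names intervals_ms and median_per_column are hypothetical, and the time points are synthetic; how the test script actually does this is an assumption.

```python
from statistics import median

def intervals_ms(tpoints):
    """Convert time points (t0, ..., tN) in seconds to [total, dt1, ..., dtN] in ms."""
    total = tpoints[-1] - tpoints[0]
    steps = [b - a for a, b in zip(tpoints, tpoints[1:])]
    return [1e3 * v for v in [total] + steps]

def median_per_column(rows):
    """Median of each column over a list of equally long rows (one row per event)."""
    return [median(col) for col in zip(*rows)]

# Synthetic time points for two events (seconds), for illustration only
events = [
    (0.0, 0.001, 0.003, 0.003, 0.005, 0.0055, 0.006, 0.007),
    (0.0, 0.002, 0.004, 0.004, 0.006, 0.0065, 0.007, 0.008),
]
rows = [intervals_ms(tp) for tp in events]
medians = median_per_column(rows)
print("medi: " + " ".join(f"{v:7.4f}" for v in medians))
```

The first column is the total over the method, the remaining seven are the step-by-step differences, so their sum should be close to the total for a single event.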
Test and results
ps-4.6.3 [dubrovin@sdfiana004:~/LCLS/con-lcls2/lcls2/psana/psana/detector]$ srun --partition milano --account lcls:prjdat21 -n 1 --time=05:00:00 --exclusive --pty /bin/bash
> sdfmilan063
OTHER WINDOW:
kinit
ssh -Y sdfmilan063
cd ~/LCLS/con-lcls2/lcls2
. setup_env.sh
./psana/psana/detector/testman/test-scaling-mpi-epix10ka.py 2
dt, ms: t00 t01 t02 t03 t04 t05 t06 t07
on sdfiana004 - shared resource
medi: 7.2465 0.1516 1.8420 0.0021 2.6863 0.5474 0.9129 1.1203
on reserved sdfmilan063 a few attempts of the event loop over 100 events
medi: 9.9332 0.2401 3.5620 0.0043 3.8865 0.4408 0.7961 1.0107
medi: 8.5163 0.2205 2.3572 0.0031 3.2568 0.5431 0.6912 1.0154
medi: 7.2405 0.1922 2.2745 0.0021 2.7044 0.4904 0.2604 1.0061
medi: 5.3618 0.1471 1.7295 0.0024 1.7509 0.2248 0.7367 0.6764
medi: 6.5324 0.1631 2.0473 0.0019 2.2929 0.4990 0.2739 1.0476
medi: 4.9076 0.0868 1.5934 0.0026 1.8897 0.1476 0.5717 0.7081
2024-09-09 test with mpi
det.raw.calib code with timing points
def calib_epix10ka_any_local_v2(det_raw, evt, **kwa):
    """ v2: add time points, get rid of common mode correction
    """
    t0 = time()
    nda_raw = kwa.get('nda_raw', None)
    raw = det_raw.raw(evt) if nda_raw is None else nda_raw # shape:(352, 384) or supposed to be later (<nsegs>, 352, 384) dtype:uint16
    if ue.cond_msg(raw is None, msg='raw is None'): return None
    t1 = time()
    gmaps = ue.gain_maps_epix10ka_any(det_raw, evt) #tuple: 7 x shape:(4, 352, 384)
    if ue.cond_msg(gmaps is None, msg='gmaps is None'): return None
    t2 = time()
    store = det_raw._store_ = ue.Storage(det_raw, **kwa) if det_raw._store_ is None else det_raw._store_ #perpix=True
    store.counter += 1
    if store.counter < 1: ue.print_gmaps_info(gmaps)
    t3 = time()
    factor = ue.event_constants_for_gmaps(gmaps, store.gfac, default=1) # 3d gain factors
    pedest = ue.event_constants_for_gmaps(gmaps, store.peds, default=0) # 3d pedestals
    t4 = time()
    arrf = np.array(raw & det_raw._data_bit_mask, dtype=np.float32)
    t5 = time()
    if pedest is not None: arrf -= pedest
    logger.debug(ue.info_ndarr(arrf, 'arrf:'))
    if ue.cond_msg(factor is None, msg='factor is None - substitute with 1', output_meth=logger.warning): factor = 1
    if store.cmpars is not None:
        ue.common_mode_epix_multigain_apply(arrf, gmaps, store)
    t6 = time()
    mask = store.mask
    res = arrf * factor if mask is None else arrf * factor * mask # gain correction
    t7 = time()
    return res, (t0, t1, t2, t3, t4, t5, t6, t7)
Timing results
ps-4.6.3 [dubrovin@sdfmilan202:~/LCLS/con-lcls2]$ python ./lcls2/psana/psana/detector/testman/test-scaling-mpi-epix10ka.py 99
dt, ms: t00 t01 t02 t03 t04 t05 t06 t07
medi: 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 rank:000/032 cpu:010 number of recs: 0
medi: 4.4625 0.0668 1.4832 0.0014 1.8053 0.1662 0.3390 0.5541 rank:016/032 cpu:011 number of recs: 100
medi: 4.4079 0.0591 1.3950 0.0019 1.7035 0.1221 0.3853 0.5271 rank:008/032 cpu:005 number of recs: 100
medi: 4.4863 0.0591 1.3795 0.0012 1.8027 0.1166 0.5839 0.5794 rank:012/032 cpu:078 number of recs: 100
medi: 4.8709 0.0613 1.4193 0.0012 1.7776 0.1583 0.6139 0.8502 rank:004/032 cpu:077 number of recs: 100
medi: 6.5660 0.1509 1.8942 0.0019 2.3808 0.4027 0.6335 0.8013 rank:014/032 cpu:107 number of recs: 100
medi: 5.0323 0.0982 1.5402 0.0012 1.9429 0.2332 0.3085 0.6614 rank:006/032 cpu:106 number of recs: 100
medi: 4.1854 0.0622 1.4117 0.0012 1.7064 0.1297 0.2422 0.4416 rank:002/032 cpu:047 number of recs: 100
medi: 4.7133 0.0758 1.4617 0.0014 1.8322 0.1779 0.5119 0.5944 rank:010/032 cpu:033 number of recs: 100
medi: 6.8538 0.1929 2.1393 0.0019 2.4624 0.4187 0.5145 0.8738 rank:030/032 cpu:097 number of recs: 100
medi: 4.6625 0.0696 1.4958 0.0012 1.8172 0.1550 0.2692 0.6146 rank:028/032 cpu:066 number of recs: 100
medi: 4.5612 0.0603 1.4145 0.0012 1.7626 0.1309 0.6027 0.6244 rank:026/032 cpu:032 number of recs: 100
medi: 4.3330 0.0608 1.3919 0.0010 1.7300 0.1273 0.2801 0.3810 rank:018/032 cpu:043 number of recs: 100
medi: 4.6442 0.0663 1.4257 0.0010 1.8663 0.1822 0.6874 0.4866 rank:020/032 cpu:067 number of recs: 100
medi: 6.8028 0.1123 2.1322 0.0019 2.4273 0.4113 0.6311 0.6907 rank:022/032 cpu:103 number of recs: 100
medi: 4.3499 0.0603 1.3831 0.0012 1.6901 0.1256 0.2396 0.7706 rank:024/032 cpu:002 number of recs: 100
medi: 7.5989 0.2463 2.1796 0.0019 2.6369 0.5052 0.5240 0.9267 rank:029/032 cpu:081 number of recs: 100
medi: 7.8244 0.2604 2.3451 0.0019 2.7165 0.6130 0.6328 1.0064 rank:005/032 cpu:080 number of recs: 100
medi: 6.8302 0.2003 2.0401 0.0014 2.4984 0.4439 0.7031 0.9379 rank:021/032 cpu:093 number of recs: 100
medi: 6.0575 0.1566 1.8210 0.0014 2.1310 0.3321 0.5412 0.7870 rank:013/032 cpu:088 number of recs: 100
medi: 4.4155 0.0696 1.4629 0.0012 1.7750 0.1688 0.2556 0.4897 rank:007/032 cpu:126 number of recs: 100
medi: 7.9112 0.2367 2.3787 0.0024 2.7568 0.5982 0.5360 1.0090 rank:031/032 cpu:121 number of recs: 100
medi: 10.4554 0.3047 3.3541 0.0033 3.3829 0.7384 0.8595 1.2519 rank:023/032 cpu:119 number of recs: 100
medi: 7.7000 0.1955 2.4045 0.0029 2.7111 0.3808 0.8230 0.8404 rank:015/032 cpu:114 number of recs: 100
medi: 4.7333 0.0815 1.4558 0.0014 1.7860 0.1791 0.6092 0.6125 rank:003/032 cpu:049 number of recs: 100
medi: 5.2340 0.0913 1.5881 0.0017 1.9169 0.2320 0.4444 0.7405 rank:027/032 cpu:050 number of recs: 100
medi: 5.0166 0.0982 1.5976 0.0017 1.9584 0.2248 0.2692 0.6938 rank:009/032 cpu:026 number of recs: 100
medi: 6.7837 0.1671 2.0921 0.0026 2.3327 0.3541 0.7384 0.8953 rank:017/032 cpu:024 number of recs: 100
medi: 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 rank:001/032 cpu:027 number of recs: 0
medi: 6.4714 0.1876 2.0485 0.0021 2.2919 0.3479 0.6726 0.9446 rank:025/032 cpu:025 number of recs: 100
medi: 5.2359 0.0956 1.5531 0.0014 2.0502 0.2713 0.4284 0.7312 rank:019/032 cpu:059 number of recs: 100
medi: 5.0220 0.0882 1.4799 0.0014 1.8868 0.1721 0.6988 0.7000 rank:011/032 cpu:057 number of recs: 100
summary-uedcom103-r0095-ncpu-032.txt
mean: 5.7407 0.1245 1.7723 0.0016 2.1180 0.2873 0.5193 0.7339 for 30 fully loaded cpus
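The mean row above appears to be the average of the per-rank medians over the ranks that processed 100 records (averaging the t00 column over those 30 ranks reproduces 5.7407). A minimal sketch of that reduction, applied to three lines copied from the log; the function names are hypothetical and the parsing assumes the exact line layout shown above.

```python
def parse_medi_line(line):
    """Return (list of 8 medians in ms, number of records) for one 'medi:' log line."""
    tokens = line.split()
    values = [float(t) for t in tokens[1:9]]
    nrecs = int(tokens[-1])
    return values, nrecs

def mean_over_loaded_ranks(lines, nrecs_full=100):
    """Average each column over the ranks that processed nrecs_full records."""
    rows = [vals for vals, n in map(parse_medi_line, lines) if n == nrecs_full]
    means = [sum(col) / len(rows) for col in zip(*rows)]
    return means, len(rows)

# Three lines taken from the 32-rank log above
log = [
    "medi: 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 rank:000/032 cpu:010 number of recs: 0",
    "medi: 4.4625 0.0668 1.4832 0.0014 1.8053 0.1662 0.3390 0.5541 rank:016/032 cpu:011 number of recs: 100",
    "medi: 4.4079 0.0591 1.3950 0.0019 1.7035 0.1221 0.3853 0.5271 rank:008/032 cpu:005 number of recs: 100",
]
means, nranks = mean_over_loaded_ranks(log)
print("mean: " + " ".join(f"{v:.4f}" for v in means) + f" for {nranks} fully loaded cpus")
```

Ranks with zero or fewer than 100 records are skipped, which matches the "fully loaded cpus" wording of the summary.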
dt, ms: t00 t01 t02 t03 t04 t05 t06 t07
medi: 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 rank:000/080 cpu:012 number of recs: 0
medi: 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 rank:035/080 cpu:055 number of recs: 25
medi: 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 rank:037/080 cpu:083 number of recs: 25
medi: 4.7834 0.0699 1.6897 0.0012 2.0449 0.1559 0.2739 0.5205 rank:036/080 cpu:067 number of recs: 100
medi: 5.5213 0.0713 1.7049 0.0014 2.1768 0.1652 0.4768 0.8631 rank:032/080 cpu:011 number of recs: 100
medi: 5.3682 0.0687 1.5824 0.0014 2.0261 0.1438 0.5341 0.9081 rank:034/080 cpu:042 number of recs: 100
medi: 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 rank:033/080 cpu:027 number of recs: 29
medi: 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 rank:025/080 cpu:028 number of recs: 25
medi: 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 rank:027/080 cpu:051 number of recs: 33
medi: 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 rank:023/080 cpu:112 number of recs: 25
medi: 5.4362 0.0811 1.7524 0.0014 2.1808 0.1893 0.2961 0.7646 rank:018/080 cpu:036 number of recs: 100
medi: 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 rank:031/080 cpu:119 number of recs: 25
medi: 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 rank:029/080 cpu:082 number of recs: 25
medi: 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 rank:021/080 cpu:094 number of recs: 29
medi: 5.7065 0.1163 1.7970 0.0014 2.1470 0.2491 0.5081 0.8142 rank:028/080 cpu:078 number of recs: 100
medi: 6.2273 0.1183 1.9257 0.0021 2.2767 0.2582 0.5827 0.8440 rank:020/080 cpu:076 number of recs: 100
medi: 5.7538 0.0975 1.7393 0.0017 2.2135 0.1929 0.5908 0.7372 rank:016/080 cpu:015 number of recs: 100
medi: 5.3303 0.0727 1.7507 0.0014 2.1241 0.1690 0.2680 0.7360 rank:024/080 cpu:003 number of recs: 100
medi: 5.2130 0.0725 1.5800 0.0014 1.9674 0.1569 0.8097 0.7577 rank:022/080 cpu:098 number of recs: 100
medi: 4.9729 0.0732 1.6198 0.0014 2.0058 0.1867 0.3698 0.7305 rank:030/080 cpu:106 number of recs: 100
medi: 5.8093 0.1056 1.7817 0.0014 2.2006 0.2313 0.7620 0.8721 rank:026/080 cpu:032 number of recs: 100
medi: 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 rank:019/080 cpu:052 number of recs: 25
medi: 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 rank:017/080 cpu:026 number of recs: 25
medi: 5.1904 0.0730 1.5664 0.0014 2.0349 0.1826 0.2835 0.6614 rank:014/080 cpu:104 number of recs: 100
medi: 4.7851 0.0718 1.6029 0.0014 1.9958 0.1776 0.2861 0.5546 rank:012/080 cpu:075 number of recs: 100
medi: 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 rank:015/080 cpu:121 number of recs: 33
medi: 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 rank:013/080 cpu:093 number of recs: 25
medi: 5.0147 0.0770 1.6847 0.0014 2.0275 0.1750 0.2654 0.7322 rank:010/080 cpu:046 number of recs: 100
medi: 5.0604 0.0727 1.6952 0.0014 2.1152 0.1628 0.2606 0.8254 rank:008/080 cpu:007 number of recs: 100
medi: 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 rank:009/080 cpu:025 number of recs: 28
medi: 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 rank:011/080 cpu:060 number of recs: 24
medi: 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 rank:007/080 cpu:125 number of recs: 24
medi: 5.1908 0.0730 1.5776 0.0014 2.0099 0.1688 0.3877 0.7532 rank:004/080 cpu:068 number of recs: 100
medi: 5.1894 0.0710 1.5886 0.0014 1.9858 0.1545 0.7951 0.6561 rank:006/080 cpu:103 number of recs: 100
medi: 4.8039 0.0663 1.5576 0.0014 1.9674 0.1457 0.2797 0.6392 rank:002/080 cpu:035 number of recs: 100
medi: 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 rank:005/080 cpu:092 number of recs: 25
medi: 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 rank:003/080 cpu:056 number of recs: 31
medi: 5.2354 0.0727 1.6475 0.0014 2.0664 0.1667 0.2975 0.7975 rank:072/080 cpu:013 number of recs: 100
medi: 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 rank:069/080 cpu:091 number of recs: 21
medi: 5.4092 0.1032 1.8005 0.0017 2.1825 0.2339 0.3221 0.4630 rank:064/080 cpu:002 number of recs: 100
medi: 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 rank:067/080 cpu:049 number of recs: 16
medi: 4.5877 0.0713 1.5218 0.0012 1.9431 0.1781 0.2527 0.5710 rank:068/080 cpu:072 number of recs: 100
medi: 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 rank:065/080 cpu:041 number of recs: 17
medi: 5.4433 0.0823 1.6727 0.0017 2.0676 0.1843 0.3526 0.8087 rank:066/080 cpu:047 number of recs: 100
medi: 5.1517 0.0894 1.6503 0.0014 2.1155 0.2186 0.3095 0.4370 rank:076/080 cpu:064 number of recs: 100
medi: 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 rank:075/080 cpu:058 number of recs: 21
medi: 4.8540 0.0846 1.6973 0.0014 2.1009 0.1535 0.2766 0.4494 rank:074/080 cpu:045 number of recs: 100
medi: 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 rank:073/080 cpu:033 number of recs: 16
medi: 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 rank:077/080 cpu:080 number of recs: 17
medi: 4.6051 0.0715 1.5781 0.0014 1.9629 0.1628 0.2522 0.5059 rank:070/080 cpu:111 number of recs: 100
medi: 5.1448 0.0832 1.6077 0.0014 2.0604 0.1988 0.3479 0.7272 rank:078/080 cpu:107 number of recs: 100
medi: 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 rank:079/080 cpu:115 number of recs: 16
medi: 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 rank:071/080 cpu:114 number of recs: 17
medi: 5.2035 0.0718 1.6663 0.0019 2.0545 0.1619 0.2637 0.7360 rank:048/080 cpu:001 number of recs: 100
medi: 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 rank:051/080 cpu:059 number of recs: 20
medi: 5.4414 0.0696 1.5652 0.0019 2.2063 0.1512 0.5698 0.7472 rank:050/080 cpu:043 number of recs: 100
medi: 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 rank:053/080 cpu:089 number of recs: 16
medi: 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 rank:061/080 cpu:084 number of recs: 16
medi: 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 rank:059/080 cpu:050 number of recs: 17
medi: 5.1453 0.0739 1.6716 0.0014 2.1503 0.1819 0.3159 0.5887 rank:058/080 cpu:039 number of recs: 100
medi: 5.2912 0.0749 1.8258 0.0014 1.9712 0.1643 0.3657 0.7515 rank:060/080 cpu:070 number of recs: 100
medi: 4.7836 0.0708 1.6456 0.0017 1.9751 0.1626 0.2484 0.5286 rank:052/080 cpu:073 number of recs: 100
medi: 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 rank:063/080 cpu:122 number of recs: 21
medi: 5.2607 0.0708 1.8601 0.0012 2.0831 0.1698 0.3712 0.7606 rank:062/080 cpu:100 number of recs: 100
medi: 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 rank:055/080 cpu:113 number of recs: 16
medi: 5.0969 0.0715 1.6377 0.0014 2.1002 0.1585 0.2685 0.6599 rank:054/080 cpu:108 number of recs: 100
medi: 5.5714 0.0970 1.6961 0.0014 2.1813 0.1996 0.5479 0.7553 rank:056/080 cpu:009 number of recs: 100
medi: 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 rank:057/080 cpu:031 number of recs: 21
medi: 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 rank:045/080 cpu:086 number of recs: 20
medi: 5.3375 0.0761 1.6117 0.0014 2.0356 0.1752 0.4885 0.7393 rank:044/080 cpu:066 number of recs: 100
medi: 4.9605 0.0796 1.5984 0.0014 2.1014 0.1886 0.2751 0.6235 rank:046/080 cpu:096 number of recs: 100
medi: 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 rank:047/080 cpu:127 number of recs: 17
medi: 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 rank:049/080 cpu:029 number of recs: 16
medi: 5.2097 0.0713 1.6658 0.0017 2.0304 0.1612 0.2677 0.6661 rank:040/080 cpu:005 number of recs: 100
medi: 5.1026 0.0684 1.6205 0.0012 2.1272 0.1595 0.2685 0.7036 rank:042/080 cpu:034 number of recs: 100
medi: 5.0023 0.0713 1.5657 0.0017 1.9486 0.1631 0.3657 0.7005 rank:038/080 cpu:097 number of recs: 100
medi: 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 rank:043/080 cpu:061 number of recs: 17
medi: 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 rank:039/080 cpu:124 number of recs: 20
medi: 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 rank:041/080 cpu:024 number of recs: 16
medi: 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 rank:001/080 cpu:030 number of recs: 0
summary-uedcom103-r0095-ncpu-080.txt
mean: 5.2101 0.0789 1.6667 0.0015 2.0760 0.1784 0.3861 0.6946 for 39 fully loaded cpus
Without mpi
dt, ms: t00 t01 t02 t03 t04 t05 t06 t07
medi: 4.2410 0.0598 1.4005 0.0014 1.6875 0.1295 0.2534 0.5906 rank:000/001 cpu:120 number of recs: 100
Ratio
dt, ms: t00 t01 t02 t03 t04 t05 t06 t07
01: 4.2410 0.0598 1.4005 0.0014 1.6875 0.1295 0.2534 0.5906
32: 5.7407 0.1245 1.7723 0.0016 2.1180 0.2873 0.5193 0.7339 for 30 fully loaded cpus
80: 5.2101 0.0789 1.6667 0.0015 2.0760 0.1784 0.3861 0.6946 for 39 fully loaded cpus
r32/1 1.35
r80/1 1.23
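The two ratios are the t00 means of the MPI runs divided by the single-process t00. A short sketch that recomputes them from the rows above and also gives the ratio for every column (the per-column ratios are an addition for illustration; only the t00 ratios are quoted in the notes):

```python
# Rows copied from the table above, ms
single = [4.2410, 0.0598, 1.4005, 0.0014, 1.6875, 0.1295, 0.2534, 0.5906]
mean32 = [5.7407, 0.1245, 1.7723, 0.0016, 2.1180, 0.2873, 0.5193, 0.7339]
mean80 = [5.2101, 0.0789, 1.6667, 0.0015, 2.0760, 0.1784, 0.3861, 0.6946]

ratios32 = [m / s for m, s in zip(mean32, single)]
ratios80 = [m / s for m, s in zip(mean80, single)]

print("r32/1 " + " ".join(f"{r:.2f}" for r in ratios32))
print("r80/1 " + " ".join(f"{r:.2f}" for r in ratios80))
```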
2024-09-23 simulation
PAY ATTENTION TO RESULT UNITS: μs OR ms !
Node reservation
ps-4.6.3 [dubrovin@sdfiana003:~/LCLS/con-lcls2/lcls2]$ srun --partition milano --account lcls:prjdat21 -n 1 --time=05:00:00 --exclusive --pty /bin/bash
srun: job 56010892 queued and waiting for resources
srun: job 56010892 has been allocated resources
ps-4.6.3 [dubrovin@sdfmilan090:~/LCLS/con-lcls2/lcls2]$
Simulation of numpy arrays in the loop
import numpy as np
from time import time
import psana.pyalgos.generic.NDArrGenerators as ag
M14 = 0x3fff # 14-bit data mask, (1<<14)-1, assumed the same as M14 in the C++ tests
DTYPE_RAWD = np.uint16
DTYPE_PEDS = np.float32
DTYPE_GAIN = np.float32
DTYPE_MASK = np.uint8
DTYPE_REST = np.float32
sh = (16, 352, 384)
for i in range(nloops):
    t0 = time()
    mask = ag.random_0or1(shape=sh, p1=0.90, dtype=DTYPE_MASK)
    peds = ag.random_standard(shape=sh, mu=1000, sigma=100, dtype=DTYPE_PEDS)
    gain = ag.random_standard(shape=sh, mu=5, sigma=1, dtype=DTYPE_GAIN)
    raw = ag.random_standard(shape=sh, mu=1000, sigma=100, dtype=DTYPE_RAWD)
    t1 = time()
    arrf = np.array(raw & M14, dtype=DTYPE_REST)
    t2 = time()
Discrete numpy operations
    if CALIBMET == SIM0:
        arrf -= peds
        t3 = time()
        arrf *= gain
        t4 = time()
        arrf *= mask
        t5 = time()
        arrf = np.array(raw & M14, dtype=DTYPE_REST)
        t6 = time()
        arrf = (arrf - peds) * gain
        t7 = time()
        arrf = np.array(raw & M14, dtype=DTYPE_REST)
        t8 = time()
        arrf = (arrf - peds) * gain * mask
        t9 = time()
        times = t0, t1, t2, t3, t4, t5, t6, t7, t8, t9
dt, ms: t00 t01 t02 t03 t04 t05 t06 t07 t08 t09
medi: 225.9027 217.7382 0.6297 0.5398 0.5188 0.4407 0.6311 2.2424 0.5478 2.2854
medi: 225.5189 215.9863 0.6931 0.5149 0.5026 0.4370 0.6458 2.3782 0.5354 2.2491
medi: 228.7627 219.0838 0.6614 0.4798 0.5418 0.4429 0.6727 2.3458 0.5847 2.3065
dt | operation | 1-time, μs | 2 | 3 |
---|
dt1 | simulation of 4 arrays, shape = (16, 352, 384) | 217,738 | 215,986 | 219,083 |
dt2 | arrf = np.array(raw & M14, dtype=DTYPE_REST) | 630 | 693 | 661 |
dt3 | arrf -= peds | 540 | 515 | 480 |
dt4 | arrf *= gain | 519 | 503 | 542 |
dt5 | arrf *= mask | 441 | 437 | 443 |
dt6 | arrf = np.array(raw & M14, dtype=DTYPE_REST) | 631 | 646 | 673 |
dt7 | arrf = (arrf - peds) * gain | 2242 | 2378 | 2346 |
dt8 | arrf = np.array(raw & M14, dtype=DTYPE_REST) | 548 | 535 | 585 |
dt9 | arrf = (arrf - peds) * gain * mask | 2285 | 2249 | 2306 |
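In the table above the one-line expressions (dt7, dt9) take about 2.2 to 2.4 ms, while the three in-place steps (dt3 to dt5) add up to about 1.5 ms. A plausible reason, not measured here, is that the one-line form allocates temporary arrays for the intermediate results. The sketch below only checks that both forms give the same result and times them once; it uses numpy's own random generator instead of NDArrGenerators, and the function names are hypothetical.

```python
import numpy as np
from time import perf_counter

M14 = 0x3fff  # 14-bit data mask
sh = (16, 352, 384)
rng = np.random.default_rng(0)
raw = rng.integers(0, 1 << 16, size=sh, dtype=np.uint16)
peds = rng.normal(1000, 100, size=sh).astype(np.float32)
gain = rng.normal(5, 1, size=sh).astype(np.float32)
mask = (rng.random(sh) < 0.9).astype(np.uint8)

def calib_inplace(raw, peds, gain, mask):
    """Three in-place steps on a single float32 buffer."""
    arrf = np.array(raw & M14, dtype=np.float32)
    arrf -= peds
    arrf *= gain
    arrf *= mask
    return arrf

def calib_expression(raw, peds, gain, mask):
    """One-line expression; intermediate results are new arrays."""
    arrf = np.array(raw & M14, dtype=np.float32)
    return (arrf - peds) * gain * mask

for func in (calib_inplace, calib_expression):
    t0 = perf_counter()
    res = func(raw, peds, gain, mask)
    print(f"{func.__name__}: {1e3 * (perf_counter() - t0):.3f} ms")

res_inplace = calib_inplace(raw, peds, gain, mask)
res_expr = calib_expression(raw, peds, gain, mask)
print("same result:", np.allclose(res_inplace, res_expr))
```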
    elif CALIBMET == SIM1:
        arrf = (arrf - peds)*gain
        t3 = time()
        arrf = np.select((mask>0,), (arrf,), default=0) #.astype(DTYPE_REST))
        t4 = time()
        times = t0, t1, t2, t3, t4
dt, ms: t00 t01 t02 t03 t04
medi: 224.5922 216.8022 0.7602 2.4498 4.7780
medi: 230.6868 222.4416 0.8460 2.6546 4.8482
medi: 226.6681 217.6425 0.8794 2.9192 4.8187
dt | operation | 1-time, μs | 2 | 3 |
---|
dt1 | simulation of 4 arrays, shape = (16, 352, 384) | 216,802 | 222,441 | 217,642 |
dt2 | arrf = np.array(raw & M14, dtype=DTYPE_REST) | 760 | 846 | 879 |
dt3 | arrf = (arrf - peds)*gain | 2450 | 2655 | 2919 |
dt4 | arrf = np.select((mask>0,), (arrf,), default=0) | 4778 | 4848 | 4819 |
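np.select costs about 4.8 ms here, roughly ten times the in-place multiplication by the mask in the SIM0 table. For a mask that contains only 0 and 1 and finite pixel values, the three ways of masking sketched below give the same array (with NaN or inf pixels the product differs, since NaN * 0 is NaN). The small array shape is chosen for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
sh = (4, 8, 8)  # small illustrative shape, not the detector shape
arrf = rng.normal(10, 3, size=sh).astype(np.float32)
mask = (rng.random(sh) < 0.9).astype(np.uint8)

by_select = np.select([mask > 0], [arrf], default=0)   # as in SIM1
by_where = np.where(mask > 0, arrf, np.float32(0))     # single-condition equivalent
by_product = arrf * mask                               # as in SIM0

print("select == where  :", np.array_equal(by_select, by_where))
print("select == product:", np.array_equal(by_select, by_product))
```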
Numpy ufunc operations
    elif CALIBMET == SIM2:
        np.subtract(arrf, peds, out=arrf)
        t3 = time()
        np.multiply(arrf, gain, out=arrf)
        t4 = time()
        np.multiply(arrf, mask, out=arrf)
        t5 = time()
        times = t0, t1, t2, t3, t4, t5
dt, ms: t00 t01 t02 t03 t04 t05
medi: 220.4628 216.7808 0.6970 1.1839 0.4658 0.4884
medi: 219.9044 216.8930 0.6951 0.8079 0.5212 0.4835
medi: 221.0090 217.4455 0.7130 1.0222 0.4929 0.4852
dt | operation | 1-time, μs | 2 | 3 |
---|
dt3 | np.subtract(arrf, peds, out=arrf) | 1184 | 808 | 1022 |
dt4 | np.multiply(arrf, gain, out=arrf) | 466 | 521 | 493 |
dt5 | np.multiply(arrf, mask, out=arrf) | 488 | 484 | 485 |
Numpy vectorization
def myfunc(a, p, g):
    return (a - p) * g
uf = np.frompyfunc(myfunc, 3, 1)
vf = np.vectorize(myfunc)
    elif CALIBMET == SIM3:
        arrf = vf(arrf.ravel(), peds.ravel(), gain.ravel())
    elif CALIBMET == SIM4:
        arrf = uf(arrf.ravel(), peds.ravel(), gain.ravel())
ps-4.6.3 [dubrovin@sdfmilan090:~/LCLS/con-lcls2]$ lcls2/psana/psana/detector/testman/test-scaling-mpi-epix10ka.py 83
dt, ms: t00 t01 t02 t03
medi: 649.1894 217.1867 1.0426 428.8200
medi: 641.9865 217.1310 1.1019 419.4318
medi: 650.3067 216.7441 1.0673 431.5038
ps-4.6.3 [dubrovin@sdfmilan090:~/LCLS/con-lcls2]$ lcls2/psana/psana/detector/testman/test-scaling-mpi-epix10ka.py 84
dt, ms: t00 t01 t02 t03
medi: 506.2655 216.1238 18.0372 273.3681
medi: 505.9680 215.6790 18.2364 271.6496
medi: 505.0484 214.7954 18.1587 272.0717
dt | operation | 1-time, ms | 2 | 3 |
---|
dt3(83) | arrf = vf(arrf.ravel(), peds.ravel(), gain.ravel()) | 429 | 419 | 432 |
dt3(84) | arrf = uf(arrf.ravel(), peds.ravel(), gain.ravel()) | 273 | 272 | 272 |
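Both np.vectorize and np.frompyfunc call the Python function once per element, which is consistent with the hundreds of milliseconds above. One more detail worth keeping in mind: a ufunc made by np.frompyfunc returns an array of dtype object, so the result needs an explicit conversion before further numeric use. A minimal sketch on a small synthetic array (array size and random inputs are illustrative assumptions):

```python
import numpy as np

def myfunc(a, p, g):
    return (a - p) * g

vf = np.vectorize(myfunc)
uf = np.frompyfunc(myfunc, 3, 1)

rng = np.random.default_rng(2)
n = 1000  # small size, for illustration only
arrf = rng.normal(1000, 100, n).astype(np.float32)
peds = rng.normal(1000, 100, n).astype(np.float32)
gain = rng.normal(5, 1, n).astype(np.float32)

ref = (arrf - peds) * gain        # plain numpy expression
res_vf = vf(arrf, peds, gain)     # element-wise Python calls
res_uf = uf(arrf, peds, gain)     # element-wise Python calls, object output

print("frompyfunc output dtype:", res_uf.dtype)
res_uf32 = res_uf.astype(np.float32)  # convert back to a numeric array
print("vectorize matches :", np.allclose(res_vf, ref))
print("frompyfunc matches:", np.allclose(res_uf32, ref))
```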
Cythonized/pythonized C++
In C++:
=======
void calib_std(const fraw_t *raw, const peds_t *peds, const gain_t *gain, const mask_t *mask, const size_t& size, fraw_t *out)
{
  for (size_t i=0; i<size; ++i) {
    out[i] = mask[i]>0 ? (raw[i] - peds[i])*gain[i] : 0;
  }
}
In python:
==========
    elif CALIBMET == SIM5:
        ud.calib_std(arrf, peds, gain, mask, arrf)
dt, ms: t00 t01 t02 t03
medi: 221.0429 216.8013 0.6830 3.3973
medi: 232.0428 224.8044 1.6861 3.5982
medi: 222.6292 218.0824 0.6918 3.4066
dt | operation | 1-time, ms | 2 | 3 |
---|
dt3(85) | ud.calib_std(arrf, peds, gain, mask, arrf) | 3.4 | 3.6 | 3.4 |
2024-09-30 simulation c++ vs cython vs numpy
Test description
Chris suggested a test of how much time calib-like code consumes in C++
Modifications:
- Fix types in malloc
- Types brought to consistency in all tests:
- uint16_t* raw
- uint8_t* mask
- Add/use M14 in C++: *raw & M14
- 16*352*352 →16*352*384
- #define NEVENT 500
// g++ -O3 -o test_cpo -g test_cpo.cc
#define EVENTS 500
#define SIZE 16*352*384
#define M14 0x3fff // 16383 or (1<<14)-1 - 14-bit mask
#include <stdint.h>
#include <stdlib.h>
#include <chrono>
#include <iostream>
#include <cstdint> // uint8_t
void calibrate(uint16_t* raw, uint8_t* mask, float* gain, float* ped, float* result) {
  uint16_t* end = raw+SIZE;
  while (raw<end) {
    *result = ((*raw & M14) - *ped)*(*gain)*(*mask);
    raw++; ped++; gain++; mask++; result++;
  }
}
int main() {
  uint16_t* raw = (uint16_t*)malloc(EVENTS*SIZE*sizeof(uint16_t));
  uint8_t* mask = (uint8_t*)malloc(SIZE*sizeof(uint8_t));
  float* result = (float*)malloc(SIZE*sizeof(float));
  float* ped = (float*)malloc(SIZE*sizeof(float));
  float* gain = (float*)malloc(SIZE*sizeof(float));
  for (int i=0; i<EVENTS*SIZE; i++) {
    raw[i]=1234;
  }
  for (int i=0; i<SIZE; i++) {
    mask[i]=1;
    //result[i]=0.0;
    ped[i]=1233.1;
    gain[i]=1.234;
  }
  std::chrono::steady_clock::time_point begin = std::chrono::steady_clock::now();
  for (int i=0; i<EVENTS; i++) {
    calibrate(raw+i*SIZE, mask, gain, ped, result);
  }
  std::chrono::steady_clock::time_point end = std::chrono::steady_clock::now();
  std::cout << "Time per event = " << (std::chrono::duration_cast<std::chrono::microseconds>(end - begin).count())/EVENTS << "[us]" << std::endl;
}
My test differs slightly
- use random numbers to fill out arrays for data and constants
// g++ -O3 -o test_calib_sim test_calib_sim.cc
// srun --partition milano --account lcls:prjdat21 -n 1 --time=05:00:00 --exclusive --pty /bin/bash
// normal_distribution
//#include <iostream>
#include <string>
#include <random>
#include <chrono>// time
#include <iomanip>
#include <iostream>
#define time_t std::chrono::steady_clock::time_point
#define time_now std::chrono::steady_clock::now
#define duration_us std::chrono::duration_cast<std::chrono::microseconds>
...
#include <stdint.h>
#include <stdlib.h>
#include <cstdint> // uint8_t
#define PSIZE 2162688 // 100000 // 2162688 = 16*352*384
#define EVENTS 500
#define M14 0x3fff // 16383 or (1<<14)-1 - 14-bit mask
//#define RAWD_T float
#define RAWD_T uint16_t
#define MASK_T uint8_t
#define GAIN_T float
#define PEDS_T float
#define REST_T float
void calib(RAWD_T* raw, MASK_T* mask, GAIN_T* gain, PEDS_T* ped, REST_T* res) {
  RAWD_T* end = raw+PSIZE;
  while (raw<end) {
    *res = ((*raw & M14) - *ped)*(*gain)*(*mask);
    raw++; ped++; gain++; mask++; res++;
  }
}
void test_calib_simulation()
{
  //constants
  //RAWD_T rawd[EVENTS][PSIZE];
  time_t t0 = time_now();
  RAWD_T* rawd = (RAWD_T*)malloc(EVENTS*PSIZE*sizeof(RAWD_T));
  MASK_T* mask = (MASK_T*)malloc(PSIZE*sizeof(MASK_T));
  REST_T* rest = (REST_T*)malloc(PSIZE*sizeof(REST_T));
  PEDS_T* peds = (PEDS_T*)malloc(PSIZE*sizeof(PEDS_T));
  GAIN_T* gain = (GAIN_T*)malloc(PSIZE*sizeof(GAIN_T));
  std::cout << "time for malloc: " << duration_us(time_now() - t0).count() << " us" << std::endl;
  t0 = time_now();
  standard_normal_array<RAWD_T>(1000., 10., PSIZE*EVENTS, rawd);
  standard_normal_array<PEDS_T>(1000., 10., PSIZE, peds);
  standard_normal_array<GAIN_T>(20., 1., PSIZE, gain);
  random_array_0or1<MASK_T>(0.9, PSIZE, mask);
  std::cout << "time for random data and constants: " << duration_us(time_now() - t0).count() << " us" << std::endl;
  //std::cout << "\nrawd: "; for (int i=0; i<10; i++){std::cout << rawd[0][i] << " ";}
  std::cout << "\nrawd: "; for (int i=0; i<10; i++){std::cout << rawd[i] << " ";}
  std::cout << "\npeds: "; for (int i=0; i<10; i++){std::cout << peds[i] << " ";}
  std::cout << "\ngain: "; for (int i=0; i<10; i++){std::cout << gain[i] << " ";}
  std::cout << "\nmask: "; for (int i=0; i<10; i++){std::cout << unsigned(mask[i]) << " ";}
  std::cout << std::endl;
  std::cout << "events: " << std::to_string(EVENTS) << " panel size:" << std::to_string(PSIZE) << std::endl;
  t0 = time_now();
  for (int i=0; i<EVENTS; i++){
    calib(rawd+i*PSIZE, mask, gain, peds, rest);
  }
  std::cout << "time per event: " << duration_us(time_now() - t0).count()/EVENTS << " us" << std::endl;
}
1. There is a difference between memory allocation
- RAWD_T rawd[EVENTS][PSIZE];
- RAWD_T* rawd = (RAWD_T*)malloc(EVENTS*PSIZE*sizeof(RAWD_T));
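2. In Chris' example, increasing #define NEVENT from 1000 to 2000 causes the warning below; the reported value is consistent with the int product NEVENT*SHAPE overflowing before it is multiplied by sizeof: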
test_cpo.cc:23:36: warning: argument 1 value ‘18446744073049473024’ exceeds maximum object size 9223372036854775807 [-Walloc-size-larger-than=]
uint16_t* raw = (uint16_t*)malloc(NEVENT*SHAPE*sizeof(uint16_t));
~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
3. Chris uses uint16_t for raw; in real data processing we have to use float32_t,
that is why the number of events in my test is reduced by half:
#define EVENTS 500
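The number in the compiler warning of item 2 can be reproduced with integer arithmetic. The sketch below assumes SHAPE was still 16*352*352 at that time (the value before the modifications listed above) and that the signed overflow wraps around as on a typical two's-complement machine; formally, signed overflow is undefined behaviour in C++. The helper to_int32 is hypothetical.

```python
def to_int32(x):
    """Wrap an integer to the signed 32-bit range (typical two's-complement behaviour)."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x >= (1 << 31) else x

NEVENT = 2000
SHAPE = 16 * 352 * 352   # assumed: panel size before it was changed to 16*352*384
SIZEOF_UINT16 = 2

product = NEVENT * SHAPE                 # 3964928000, does not fit in a 32-bit int
wrapped = to_int32(product)              # negative value after wrap-around
malloc_arg = (wrapped * SIZEOF_UINT16) % (1 << 64)  # conversion to 64-bit size_t

print("int product wraps to:", wrapped)
print("malloc argument     :", malloc_arg)
print("NEVENT=1000 fits    :", to_int32(1000 * SHAPE) == 1000 * SHAPE)
```

With EVENTS 500 and the panel size 16*352*384 the product is 1081344000, which still fits in a 32-bit int.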
Test results
ps-4.6.3 [dubrovin@sdfiana003:~/LCLS/con-lcls2/lcls2]$ srun --partition milano --account lcls:prjdat21 -n 1 --time=05:00:00 --exclusive --pty /bin/bash
In other window:
ps-4.6.3 [dubrovin@sdfiana003:~/LCLS/con-lcls2/lcls2/psana/psana/pycalgos]$ ssh -Y sdfmilan108
cd ~/LCLS/con-lcls2/lcls2/psana/psana/pycalgos/
Tests
- ./test_cpo
- ./test_calib_sim
- ../detector/testman/test-scaling-mpi-epix10ka.py 80 # t9 stands for operation under numpy arrays: ((raw & M14) - peds) * gain * mask
- ../detector/testman/test-scaling-mpi-epix10ka.py 85 # t3 stands for call of cythonized/pythonized C++ calib_std(raw, peds, gain, mask, databits, out)
ps-4.6.3 [dubrovin@sdfmilan108:~/LCLS/con-lcls2/lcls2/psana/psana/pycalgos]$ ./test_calib_sim
time for malloc: 39 us
time for random data and constants: 52810299 us
rawd: 1007 990 1003 1008 1007 996 1000 1010 994 996
peds: 991.406 1004.71 1008.58 989.066 1012.95 981.918 989.906 996.518 1020.47 1007.59
gain: 19.7885 20.2578 19.5383 21.1444 20.1666 20.0037 18.3353 20.5157 19.5657 20.3046
mask: 1 1 1 1 1 1 0 1 1 1
events: 500 panel size:2162688
time per event: 623 us
ps-4.6.3 [dubrovin@sdfmilan108:~/LCLS/con-lcls2/lcls2/psana/psana/pycalgos]$ ../detector/testman/test-scaling-mpi-epix10ka.py 80
t09 for numpy arrays: (arrf - peds) * gain * mask
If all arrays are generated in advance, before the event loop
medi: 3.1800 0.0005 0.0002 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 3.1785
medi: 2.9902 0.0005 0.0002 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 2.9888
medi: 3.8408 0.0005 0.0002 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 3.8399
per event times:
493 1.9538 0.0005 0.0002 0.0000 0.0000 0.0002 0.0000 0.0000 0.0000 1.9529
494 3.8667 0.0005 0.0002 0.0000 0.0000 0.0002 0.0000 0.0000 0.0000 3.8657
495 1.9505 0.0005 0.0005 0.0000 0.0000 0.0002 0.0000 0.0000 0.0000 1.9493
496 3.8567 0.0005 0.0005 0.0000 0.0000 0.0000 0.0002 0.0000 0.0000 3.8555
497 1.9560 0.0007 0.0002 0.0000 0.0000 0.0000 0.0002 0.0000 0.0000 1.9548
498 3.8526 0.0005 0.0000 0.0002 0.0000 0.0000 0.0000 0.0000 0.0002 3.8517
499 1.9510 0.0005 0.0002 0.0000 0.0002 0.0000 0.0000 0.0000 0.0000 1.9500
ps-4.6.3 [dubrovin@sdfmilan108:~/LCLS/con-lcls2/lcls2/psana/psana/pycalgos]$ ../detector/testman/test-scaling-mpi-epix10ka.py 85
t03 for cythonized/pythonized C++ calib_std(raw, peds, gain, mask, out)
UtilsDetector.hh:
=================
typedef uint16_t rawd_t;
typedef float peds_t;
typedef float gain_t;
typedef float out_t;
typedef uint8_t mask_t;
UtilsDetector.cc:
=================
time_t calib_std(const rawd_t *raw, const peds_t *peds, const gain_t *gain, const mask_t *mask, const size_t& size, const rawd_t databits, out_t *out)
{
  const rawd_t *r = raw;
  const peds_t *p = peds;
  const gain_t *g = gain;
  const mask_t *m = mask;
  out_t *o = out;
  //const rawd_t* end = raw+size;
  time_point_t t0 = time_now();
  while (r<raw+size) {
    *o++ = ((*r++ & databits) - *p++)*(*g++)*(*m++);
    //r++; p++; g++; m++; o++;
  }
  return duration_us(time_now() - t0).count();
}
Per-event time, μs, consumed in C++, cython, and python, respectively:
dt_us_cpp: 1079.0, dt_us_cy: 1094.6, dt_us_py: 1100.3
dt_us_cpp: 904.0, dt_us_cy: 918.6, dt_us_py: 923.6
dt_us_cpp: 1091.0, dt_us_cy: 1107.2, dt_us_py: 1112.5
If all arrays are generated in advance, before the event loop
medi: 0.6473 0.0002 0.0002 0.6466
medi: 0.6249 0.0002 0.0002 0.6244
medi: 0.6549 0.0005 0.0002 0.6542
Tests on sdfmilan108:
test # | test description | 1 try, time per event, μs | 2 try | 3 try | comment |
---|
1 | ./test_cpo | 611 | 620 | 613 | entire loop in C++ |
2 | ./test_calib_sim | 623 | 666 | 604 | the same as #1, but with random numbers |
3/80 | numpy: arrf = ((raw & M14) - peds) * gain * mask | 3176 | 2989 | 3839 | random arrays generated before the event loop |
4/85 | ud.calib_std(raw, peds, gain, mask, M14, arrf) | 646 | 624 | 654 | measurement shows that 98% of this time is in C++ |
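The 98% in the comment column follows from the dt_us_cpp and dt_us_py values printed above. A short check, which also gives the numpy-to-C++ time ratio from the first-try column of the table:

```python
# (dt_us_cpp, dt_us_cy, dt_us_py) copied from the three printed lines above
samples = [
    (1079.0, 1094.6, 1100.3),
    (904.0, 918.6, 923.6),
    (1091.0, 1107.2, 1112.5),
]
fractions = [100 * cpp / py for cpp, cy, py in samples]
for f in fractions:
    print(f"time spent inside C++: {f:.1f}% of the python-level call")

# First-try per-event times from the table, us
t_numpy = 3176
t_cython = 646
speedup = t_numpy / t_cython
print(f"numpy expression / cythonized call: {speedup:.1f}")
```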
2024-10-03 tests with mpi
test in c++ without MPI
g++ -O3 -o test_calib_sim test_calib_sim.cc
ps-4.6.3 [dubrovin@sdfmilan021:~/LCLS/con-lcls2/lcls2/psana/psana/pycalgos]$ ./test_calib_sim
argc:1 argv[0]:./test_calib_sim
time for malloc: 15 us
test_calib_simulation: time for random data and constants: 53411244 us
rawd: 999 994 986 998 1004 1005 1005 998 983 1016
peds: 1006.53 991.995 991.25 1010.08 1029.81 1008.97 1004.68 1004.46 1006.12 1009.98
gain: 21.8439 18.5639 20.2683 19.0008 20.4222 19.3143 18.923 20.3292 21.6366 19.7765
mask: 1 1 0 1 1 1 1 0 1 1
events: 500 panel size:2162688
time per event: 694 us
time per event: 694 us
adding mpi
#include <mpi.h>
int rank=0;
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
std::cout << "rank: " << rank << std::endl;
...
MPI_Finalize();
ps-4.6.3 [dubrovin@sdfmilan021:~/LCLS/con-lcls2/lcls2/psana/psana/pycalgos]$ mpic++ -O3 -o test_calib_sim test_calib_sim.cc
ps-4.6.3 [dubrovin@sdfmilan021:~/LCLS/con-lcls2/lcls2/psana/psana/pycalgos]$ ./test_calib_sim 2
argc:2 argv[0]:./test_calib_sim
rank: 0
test_calib_simulation_mpi time for malloc: 20 us
time for random data and constants: 32500309 us
rawd: 1011 995 997 994 1004 996 1003 989 1022 985
peds: 1005.32 997.622 1005.59 1002.57 1001.05 1004.12 1006.53 989.399 1004.91 990.733
gain: 19.7067 21.0016 18.7361 19.4826 19.644 19.9969 20.044 19.4837 19.7477 23.48
mask: 1 1 1 1 1 1 1 1 1 1
events: 500 panel size:2162688
rank: 0 time per event: 1229 us
rank: 0 time per event: 1229 us
ps-4.6.3 [dubrovin@sdfmilan021:~/LCLS/con-lcls2/lcls2/psana/psana/pycalgos]$ mpirun -n 4 test_calib_sim 2
argc:2 argv[0]:test_calib_sim
argc:2 argv[0]:test_calib_sim
argc:2 argv[0]:test_calib_sim
argc:2 argv[0]:test_calib_sim
rank: 0
test_calib_simulation_mpi time for malloc: 22 us
rank: 1
test_calib_simulation_mpi time for malloc: 22 us
rank: 2
test_calib_simulation_mpi time for malloc: 22 us
rank: 3
test_calib_simulation_mpi time for malloc: 24 us
time for random data and constants: 32006325 us
rawd: 1013 994 994 988 1011 987 1003 994 1004 997
peds: 996.304 1012.87 993.041 992.972 1017.44 1013.74 1017.16 997.31 1020.95 988.039
gain: 18.7863 19.9596 21.8928 16.6827 18.23 21.1276 20.088 20.9273 19.5633 22.1232
mask: 1 1 1 1 0 1 1 1 1 1
events: 500 panel size:2162688
rank: 1 time per event: 587 us
time for random data and constants: 32378824 us
rawd: 999 996 1002 997 998 1025 988 1021 1004 1004
peds: 1008.2 1010.15 999.783 999.987 1013.67 1012.62 982.788 1003.99 1006.88 1003.11
gain: 21.5295 19.4924 19.7031 21.7397 19.3176 20.5381 19.1976 18.9456 19.6323 16.7032
mask: 1 1 1 1 1 1 1 1 1 1
events: 500 panel size:2162688
time for random data and constants: 32410477 us
rawd: 996 1026 980 1002 1013 985 996 995 990 995
peds: 1017.01 998.566 1006.81 1020.24 1006.75 1010.22 1002.24 1006.99 1021.96 998.486
gain: 19.7089 18.5243 19.0248 19.4491 21.5207 19.8621 17.9884 20.2176 20.1161 19.549
mask: 1 1 1 1 1 1 1 1 1 1
events: 500 panel size:2162688
time for random data and constants: 32421671 us
rawd: 986 996 1016 993 1000 1008 1021 1006 993 1009
peds: 995.158 993.944 1001.34 1019.85 1002.45 1014.37 1000.41 1012.82 1018.91 1007.91
gain: 18.1761 21.0744 19.895 20.0998 21.2834 19.5609 19.8797 20.4085 19.8719 19.3744
mask: 1 1 1 1 1 1 1 1 1 1
events: 500 panel size:2162688
rank: 0 time per event: 1260 us
rank: 3 time per event: 1309 us
rank: 2 time per event: 1295 us
rank: 0 time per event: 1260 us
2024-10-23 tests with mpi
Code
#define PSIZE 16*352*384 // // 100000 // 2162688 = 16*352*384
#define M14 0x3fff // 16383 or (1<<14)-1 - 14-bit mask
#define RAWD_T uint16_t
#define MASK_T uint8_t
#define GAIN_T float
#define PEDS_T float
#define REST_T float
void calib(RAWD_T* raw, MASK_T* mask, GAIN_T* gain, PEDS_T* ped, REST_T* res) {
RAWD_T* end = raw+PSIZE;
while (raw<end) {
*res = ((*raw & M14) - *ped)*(*gain)*(*mask);
raw++; ped++; gain++; mask++; res++;
}
}
#define EVENTS 500
void test_calib_simulation_mpi(int argc, char* argv[])
{
int icpu = sched_getcpu();
std::stringstream sscpu; sscpu << "cpu-" << std::setfill('0') << std::setw(3) << std::right << icpu;
std::string scpu = sscpu.str();
double times_s[EVENTS];
double durats_us[EVENTS];
time_t t0 = time_now();
RAWD_T* rawd = (RAWD_T*)malloc(EVENTS*PSIZE*sizeof(RAWD_T));
MASK_T* mask = (MASK_T*)malloc(PSIZE*sizeof(MASK_T));
REST_T* rest = (REST_T*)malloc(PSIZE*sizeof(REST_T));
PEDS_T* peds = (PEDS_T*)malloc(PSIZE*sizeof(PEDS_T));
GAIN_T* gain = (GAIN_T*)malloc(PSIZE*sizeof(GAIN_T));
std::cout << scpu
<< " test_calib_simulation_mpi time for malloc: "
<< duration_us(time_now() - t0).count() << " us" << std::endl;
t0 = time_now();
standard_normal_array<RAWD_T>(1000., 10., PSIZE*EVENTS, rawd);
standard_normal_array<PEDS_T>(1000., 10., PSIZE, peds);
standard_normal_array<GAIN_T>(20., 1., PSIZE, gain);
random_array_0or1<MASK_T>(0.9, PSIZE, mask);
if (icpu == 0){
std::cout << scpu << " time for random data and constants: "
<< duration_us(time_now() - t0).count() << " us";
std::cout << "\n rawd: "; for (int i=0; i<10; i++){std::cout << rawd[i] << " ";}
std::cout << "\n peds: "; for (int i=0; i<10; i++){std::cout << peds[i] << " ";}
std::cout << "\n gain: "; for (int i=0; i<10; i++){std::cout << gain[i] << " ";}
std::cout << "\n mask: "; for (int i=0; i<10; i++){std::cout << unsigned(mask[i]) << " ";}
std::cout << scpu << " events: " << std::to_string(EVENTS)
<< " panel size:" << std::to_string(PSIZE) << std::endl;
}
struct timespec tbeg, tcur;
int status = clock_gettime(CLOCK_REALTIME, &tbeg);
time_t tt0 = time_now();
for (int i=0; i<EVENTS; i++){
status = clock_gettime(CLOCK_REALTIME, &tcur);
t0 = time_now();
calib(rawd+i*PSIZE, mask, gain, peds, rest);
durats_us[i] = duration_us(time_now() - t0).count();
times_s[i] = time_sec(tcur) - TOFSET;
}
std::cout << scpu << " time per event: " << duration_us(time_now() - tt0).count()/EVENTS << " us" << std::endl;
//std::stringstream fname; fname << "results-" << scpu << "-v80.txt";
std::string version = (argc>2)? argv[2] : "vXX";
std::stringstream fname; fname << "results-" << scpu << '-' << version << ".txt";
std::cout << "save file: " << fname.str() << std::endl;
std::ofstream ofile;
ofile.open(fname.str());
for (int i=0; i<EVENTS; i++){
ofile << std::setw(3) << std::right << i;
ofile << std::fixed
<< std::setprecision(6);
ofile << " t,s:" << std::setw(14) << times_s[i];
ofile << " dt,us: " << std::setprecision(0) << std::setw(10) << durats_us[i] << std::endl;
}
ofile << '\n' << scpu << " time per event: " << duration_us(time_now() - tt0).count()/EVENTS << " us" << std::endl;
ofile << "begin event loop time_since_epoch, sec: "
<< std::setw(14) << std::setprecision(3)
<< time_sec(tbeg)
<< " offset: " << TOFSET << std::endl;
ofile.close();
}
g++ -O3 -o test_calib_sim test_calib_sim.cc
srun --partition milano --account lcls:prjdat21 -n 128 --time=05:00:00 --exclusive --pty /bin/bash
mpirun -n 4 test_calib_sim 2 v04
mpirun -n 8 test_calib_sim 2 v08
mpirun -n 16 test_calib_sim 2 v16
mpirun -n 32 test_calib_sim 2 v32
mpirun -n 64 test_calib_sim 2 v64
mpirun -n 96 test_calib_sim 2 v96
mpirun -n 80 test_calib_sim 2 v80
Results
ps-4.6.3 [dubrovin@sdfmilan026:~/LCLS/con-lcls2/2024-10-23-test-calib-mpi]$ mpirun -n 4 ../lcls2/psana/psana/pycalgos/test_calib_sim 2 v04
cpu-048 test_calib_simulation_mpi time for malloc: 18 us
cpu-000 time for random data and constants: 53567294 us
rawd: 1003 1008 996 1011 997 1000 1013 997 1005 980
peds: 993.482 1003.47 991.595 1008.37 1000.43 1019.13 1012.39 1009.26 991.405 1006.49
gain: 20.3113 19.8855 21.0679 21.0502 19.5256 20.4852 19.504 21.0347 17.7494 19.3758
mask: 1 1 1 1 1 1 1 1 1 1 cpu-000 events: 500 panel size:2162688
cpu-000 time per event, us: 710 us
save file: results-cpu-000-v04.txt
cpu-032 time per event, us: 845 us
save file: results-cpu-032-v04.txt
cpu-025 time per event, us: 675 us
save file: results-cpu-025-v04.txt
cpu-048 time per event, us: 1080 us
save file: results-cpu-048-v04.txt
ps-4.6.3 [dubrovin@sdfmilan026:~/LCLS/con-lcls2/2024-10-23-test-calib-mpi]$ tail -20 results-cpu-000-v04.txt
483 t,s: 24885.980090 dt,us: 719
484 t,s: 24885.980809 dt,us: 713
485 t,s: 24885.981523 dt,us: 709
486 t,s: 24885.982233 dt,us: 717
487 t,s: 24885.982950 dt,us: 715
488 t,s: 24885.983666 dt,us: 718
489 t,s: 24885.984385 dt,us: 706
490 t,s: 24885.985092 dt,us: 713
491 t,s: 24885.985806 dt,us: 729
492 t,s: 24885.986535 dt,us: 720
493 t,s: 24885.987255 dt,us: 715
494 t,s: 24885.987971 dt,us: 708
495 t,s: 24885.988679 dt,us: 707
496 t,s: 24885.989387 dt,us: 691
497 t,s: 24885.990079 dt,us: 700
498 t,s: 24885.990779 dt,us: 704
499 t,s: 24885.991484 dt,us: 692
cpu-000 time per event, us: 710 us
begin event loop time_since_epoch, sec: 1729724885.637 offset: 1729700000
ps-4.6.3 [dubrovin@sdfmilan026:~/LCLS/con-lcls2/2024-10-23-test-calib-mpi]$ mpirun -n 8 ../lcls2/psana/psana/pycalgos/test_calib_sim 2 v08
cpu-112 test_calib_simulation_mpi time for malloc: 22 us
cpu-000 time for random data and constants: 53597532 us
rawd: 974 996 1001 1006 993 1005 1008 1002 1010 1006
peds: 987.812 1005.22 984.196 1008.19 993.455 987.517 1007.64 1020.04 1015.84 986.13
gain: 21.2333 20.5797 20.0341 21.4536 20.5128 18.5795 21.5463 20.1028 20.5724 19.9281
mask: 1 1 1 1 1 1 1 1 0 1 cpu-000 events: 500 panel size:2162688
cpu-025 time per event, us: 666 us
save file: results-cpu-025-v08.txt
cpu-096 time per event, us: 692 us
save file: results-cpu-096-v08.txt
cpu-080 time per event, us: 685 us
save file: results-cpu-080-v08.txt
cpu-112 time per event, us: 703 us
save file: results-cpu-112-v08.txt
cpu-032 time per event, us: 865 us
save file: results-cpu-032-v08.txt
cpu-000 time per event, us: 746 us
save file: results-cpu-000-v08.txt
cpu-079 time per event, us: 681 us
save file: results-cpu-079-v08.txt
cpu-048 time per event, us: 1124 us
save file: results-cpu-048-v08.txt
ps-4.6.3 [dubrovin@sdfmilan026:~/LCLS/con-lcls2/2024-10-23-test-calib-mpi]$ mpirun -n 96 ../lcls2/psana/psana/pycalgos/test_calib_sim 2 v96
argc:3 argv[0]:../lcls2/psana/psana/pycalgos/test_calib_sim
cpu-097 test_calib_simulation_mpi time for malloc: 34 us
argc:3 argv[0]:../lcls2/psana/psana/pycalgos/test_calib_sim
cpu-124 test_calib_simulation_mpi time for malloc: 18 us
cpu-000 time for random data and constants: 60955768 us
rawd: 1007 993 983 988 1004 988 1017 991 993 1016
peds: 1003.6 1005.69 991.574 989.046 1004.69 1001.21 1001.52 998.964 1010.3 1013.41
gain: 19.2027 18.2575 20.0576 21.4993 19.7058 18.8875 20.3433 18.5005 18.1918 19.6954
mask: 1 1 1 1 0 1 0 1 1 1 cpu-000 events: 500 panel size:2162688
cpu-108 time per event, us: 12330 us
save file: results-cpu-108-v96.txt
cpu-064 time per event, us: 12940 us
save file: results-cpu-064-v96.txt
cpu-071 time per event, us: 12898 us
save file: results-cpu-071-v96.txt
cpu-074 time per event, us: 14708 us
save file: results-cpu-074-v96.txt
cpu-066 time per event, us: 14627 us
save file: results-cpu-066-v96.txt
cpu-096 time per event, us: 14484 us
save file: results-cpu-096-v96.txt
cpu-079 time per event, us: 14358 us
save file: results-cpu-079-v96.txt
cpu-065 time per event, us: 15051 us
save file: results-cpu-065-v96.txt
cpu-076 time per event, us: 15297 us
save file: results-cpu-076-v96.txt
cpu-068 time per event, us: 15289 us
save file: results-cpu-068-v96.txt
cpu-067 time per event, us: 15317 us
save file: results-cpu-067-v96.txt
cpu-110 time per event, us: 15419 us
save file: results-cpu-110-v96.txt
cpu-072 time per event, us: 16174 us
save file: results-cpu-072-v96.txt
cpu-106 time per event, us: 16031 us
save file: results-cpu-106-v96.txt
cpu-101 time per event, us: 16071 us
save file: results-cpu-101-v96.txt
cpu-111 time per event, us: 16039 us
save file: results-cpu-111-v96.txt
cpu-107 time per event, us: 16107 us
save file: results-cpu-107-v96.txt
cpu-077 time per event, us: 16490 us
save file: results-cpu-077-v96.txt
cpu-075 time per event, us: 16322 us
save file: results-cpu-075-v96.txt
cpu-102 time per event, us: 16870 us
save file: results-cpu-102-v96.txt
cpu-097 time per event, us: 16886 us
save file: results-cpu-097-v96.txt
cpu-098 time per event, us: 17044 us
save file: results-cpu-098-v96.txt
cpu-103 time per event, us: 17060 us
save file: results-cpu-103-v96.txt
cpu-099 time per event, us: 17098 us
save file: results-cpu-099-v96.txt
cpu-087 time per event, us: 16991 us
save file: results-cpu-087-v96.txt
cpu-089 time per event, us: 17257 us
save file: results-cpu-089-v96.txt
cpu-094 time per event, us: 19119 us
save file: results-cpu-094-v96.txt
cpu-086 time per event, us: 19207 us
save file: results-cpu-086-v96.txt
cpu-091 time per event, us: 19111 us
save file: results-cpu-091-v96.txt
cpu-080 time per event, us: 19416 us
save file: results-cpu-080-v96.txt
cpu-090 time per event, us: 19075 us
save file: results-cpu-090-v96.txt
cpu-088 time per event, us: 19535 us
save file: results-cpu-088-v96.txt
cpu-083 time per event, us: 19476 us
save file: results-cpu-083-v96.txt
cpu-118 time per event, us: 19930 us
save file: results-cpu-118-v96.txt
cpu-113 time per event, us: 20147 us
save file: results-cpu-113-v96.txt
cpu-112 time per event, us: 20181 us
save file: results-cpu-112-v96.txt
cpu-082 time per event, us: 19911 us
save file: results-cpu-082-v96.txt
cpu-116 time per event, us: 19649 us
save file: results-cpu-116-v96.txt
cpu-081 time per event, us: 19212 us
save file: results-cpu-081-v96.txt
cpu-121 time per event, us: 20267 us
save file: results-cpu-121-v96.txt
cpu-095 time per event, us: 19337 us
save file: results-cpu-095-v96.txt
cpu-125 time per event, us: 20579 us
save file: results-cpu-125-v96.txt
cpu-126 time per event, us: 20600 us
save file: results-cpu-126-v96.txt
cpu-127 time per event, us: 20653 us
save file: results-cpu-127-v96.txt
cpu-124 time per event, us: 20103 us
save file: results-cpu-124-v96.txt
cpu-117 time per event, us: 20687 us
save file: results-cpu-117-v96.txt
cpu-115 time per event, us: 20693 us
save file: results-cpu-115-v96.txt
cpu-122 time per event, us: 20619 us
save file: results-cpu-122-v96.txt
cpu-000 time per event, us: 15578 us
save file: results-cpu-000-v96.txt
cpu-014 time per event, us: 16207 us
save file: results-cpu-014-v96.txt
cpu-048 time per event, us: 19661 us
save file: results-cpu-048-v96.txt
cpu-060 time per event, us: 19429 us
save file: results-cpu-060-v96.txt
cpu-049 time per event, us: 20020 us
save file: results-cpu-049-v96.txt
cpu-030 time per event, us: 19916 us
save file: results-cpu-030-v96.txt
cpu-009 time per event, us: 19936 us
save file: results-cpu-009-v96.txt
cpu-026 time per event, us: 19812 us
save file: results-cpu-026-v96.txt
cpu-008 time per event, us: 20140 us
save file: results-cpu-008-v96.txt
cpu-025 time per event, us: 20272 us
save file: results-cpu-025-v96.txt
cpu-028 time per event, us: 20241 us
save file: results-cpu-028-v96.txt
cpu-013 time per event, us: 20422 us
save file: results-cpu-013-v96.txt
cpu-031 time per event, us: 20138 us
save file: results-cpu-031-v96.txt
cpu-058 time per event, us: 20777 us
save file: results-cpu-058-v96.txt
cpu-027 time per event, us: 20380 us
save file: results-cpu-027-v96.txt
cpu-004 time per event, us: 20405 us
save file: results-cpu-004-v96.txt
cpu-029 time per event, us: 20460 us
save file: results-cpu-029-v96.txt
cpu-011 time per event, us: 20628 us
save file: results-cpu-011-v96.txt
cpu-015 time per event, us: 20969 us
save file: results-cpu-015-v96.txt
cpu-024 time per event, us: 19811 us
save file: results-cpu-024-v96.txt
cpu-003 time per event, us: 21001 us
save file: results-cpu-003-v96.txt
cpu-002 time per event, us: 21133 us
save file: results-cpu-002-v96.txt
cpu-007 time per event, us: 21245 us
save file: results-cpu-007-v96.txt
cpu-001 time per event, us: 21316 us
save file: results-cpu-001-v96.txt
cpu-037 time per event, us: 20374 us
save file: results-cpu-037-v96.txt
cpu-054 time per event, us: 21694 us
save file: results-cpu-054-v96.txt
cpu-050 time per event, us: 21793 us
save file: results-cpu-050-v96.txt
cpu-034 time per event, us: 22151 us
save file: results-cpu-034-v96.txt
cpu-061 time per event, us: 22809 us
save file: results-cpu-061-v96.txt
cpu-059 time per event, us: 22978 us
save file: results-cpu-059-v96.txt
cpu-056 time per event, us: 23415 us
save file: results-cpu-056-v96.txt
cpu-057 time per event, us: 23422 us
save file: results-cpu-057-v96.txt
cpu-055 time per event, us: 23598 us
save file: results-cpu-055-v96.txt
cpu-052 time per event, us: 23598 us
save file: results-cpu-052-v96.txt
cpu-047 time per event, us: 25059 us
save file: results-cpu-047-v96.txt
cpu-032 time per event, us: 24526 us
save file: results-cpu-032-v96.txt
cpu-042 time per event, us: 25774 us
save file: results-cpu-042-v96.txt
cpu-038 time per event, us: 25807 us
save file: results-cpu-038-v96.txt
cpu-043 time per event, us: 25756 us
save file: results-cpu-043-v96.txt
cpu-033 time per event, us: 25837 us
save file: results-cpu-033-v96.txt
cpu-040 time per event, us: 25531 us
save file: results-cpu-040-v96.txt
cpu-036 time per event, us: 25541 us
save file: results-cpu-036-v96.txt
cpu-046 time per event, us: 25871 us
save file: results-cpu-046-v96.txt
cpu-041 time per event, us: 24943 us
save file: results-cpu-041-v96.txt
cpu-045 time per event, us: 26104 us
save file: results-cpu-045-v96.txt
cpu-039 time per event, us: 25723 us
save file: results-cpu-039-v96.txt
cpu-044 time per event, us: 25928 us
save file: results-cpu-044-v96.txt
cpu-035 time per event, us: 25595 us
save file: results-cpu-035-v96.txt
ps-4.6.3 [dubrovin@sdfmilan026:~/LCLS/con-lcls2/2024-10-23-test-calib-mpi]$ tail -20 results-cpu-035-v96.txt
483 t,s: 25419.392541 dt,us: 6775
484 t,s: 25419.399317 dt,us: 6086
485 t,s: 25419.405404 dt,us: 6654
486 t,s: 25419.412059 dt,us: 4871
487 t,s: 25419.416931 dt,us: 4087
488 t,s: 25419.421018 dt,us: 3582
489 t,s: 25419.424601 dt,us: 3766
490 t,s: 25419.428367 dt,us: 3624
491 t,s: 25419.431992 dt,us: 3544
492 t,s: 25419.435537 dt,us: 2762
493 t,s: 25419.438299 dt,us: 2159
494 t,s: 25419.440459 dt,us: 2348
495 t,s: 25419.442807 dt,us: 1916
496 t,s: 25419.444725 dt,us: 1201
497 t,s: 25419.445926 dt,us: 1234
498 t,s: 25419.447161 dt,us: 2814
499 t,s: 25419.449976 dt,us: 1534
cpu-035 time per event, us: 25595 us
begin event loop time_since_epoch, sec: 1729725406.654 offset: 1729700000
Start-stop time
Plots show start(blue)-stop(red) time along x axis vs cpu index along y axis for mpirun with 4, 8, 16, 32, 64, 80, and 96 cpus.
Each cpu job generates its own random arrays for constants and data for 500 events and processes them with the calib method.
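The start and stop times in such plots can be derived from the per-cpu results files. The sketch below assumes the line layout seen in the `tail` listings above (`<event> t,s: <start> dt,us: <duration>`); the function name `parse_results` and the regular expression are illustrative and are not taken from test_calib_sim_proc.py.

```python
import re
from statistics import median

# Matches lines such as "499 t,s: 24885.991484 dt,us: 692"
LINE_RE = re.compile(r"^\s*(\d+)\s+t,s:\s*([\d.]+)\s+dt,us:\s*([\d.]+)")


def parse_results(text):
    """Return a list of (event, start_s, stop_s, dt_us) tuples from results-file text."""
    rows = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m is None:
            continue  # skip the summary lines at the end of the file
        evt, t_s, dt_us = int(m.group(1)), float(m.group(2)), float(m.group(3))
        rows.append((evt, t_s, t_s + dt_us * 1e-6, dt_us))
    return rows


# Three event lines and one summary line copied from results-cpu-000-v04.txt
sample = """\
497 t,s: 24885.990079 dt,us: 700
498 t,s: 24885.990779 dt,us: 704
499 t,s: 24885.991484 dt,us: 692
cpu-000 time per event, us: 710 us
"""

rows = parse_results(sample)
loop_start = rows[0][1]   # start of the first parsed event
loop_stop = rows[-1][2]   # stop of the last parsed event
med_dt = median(r[3] for r in rows)
print(loop_start, loop_stop, med_dt)
```

One (start, stop) pair per cpu, taken from the first and last event lines of each file, is enough to draw one horizontal segment per cpu.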
Results
v04 tmed_sel: 701 us
v08 tmed_sel: 684 us
v16 tmed_sel: 1106 us
v32 tmed_sel: 5488 us
v64 tmed_sel: 28244 us
v80 tmed_sel: 13018 us
v96 tmed_sel: 26146 us
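To put these numbers in perspective, the sketch below converts them into a slowdown relative to the 4-rank run and into an aggregate event rate summed over ranks. It assumes that tmed_sel is a typical per-event time for one rank and that all ranks run concurrently; both are assumptions, and the helper name `aggregate_rate_hz` is hypothetical.

```python
# tmed_sel values (us) copied from the list above, keyed by the number of ranks
tmed_us = {4: 701, 8: 684, 16: 1106, 32: 5488, 64: 28244, 80: 13018, 96: 26146}


def aggregate_rate_hz(n_ranks, t_event_us):
    """Events per second summed over ranks, assuming all ranks run concurrently."""
    return n_ranks * 1e6 / t_event_us


base = tmed_us[4]
summary = {n: (t / base, aggregate_rate_hz(n, t)) for n, t in tmed_us.items()}

for n, (slowdown, rate) in summary.items():
    print(f"n={n:3d}  slowdown vs n=4: {slowdown:6.2f}  aggregate rate: {rate:8.0f} events/s")

# Rank count with the highest aggregate rate under the assumptions above
best_n = max(summary, key=lambda n: summary[n][1])
print("highest aggregate rate at n =", best_n)
```

With these inputs the aggregate rate stops growing well before all cores are in use, which is another way to read the per-event times listed above.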
2024-10-29 tests with mpi
Code difference since 2024-10-23
- use an array of struct pixstr (pixs) to keep the constants of each pixel close together in memory
- besides the loop over EVENTS=100, add an outer loop over NLOOPS=100 to increase the run time on each cpu
- reduce the number of instructions used for time measurement: only the whole for loop over events is timed
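As a rough check of what the struct layout means for memory traffic, the sketch below mirrors the pixstr struct (mask, ped, gain, rest, as defined in the code that follows) with `ctypes` and compares the bytes touched per event for the two layouts. The class name `PixStr` is illustrative, and `ctypes.sizeof` reports the native alignment of the machine it runs on, so the padded size is an assumption about typical 64-bit platforms. The sketch counts bytes only and does not by itself explain the measured timing difference.

```python
import ctypes


class PixStr(ctypes.Structure):
    # Same field order and types as the C++ pixstr struct in this section
    _fields_ = [
        ("mask", ctypes.c_uint8),
        ("ped", ctypes.c_float),
        ("gain", ctypes.c_float),
        ("rest", ctypes.c_float),
    ]


PSIZE = 16 * 352 * 384  # pixels per event, as in the C++ test

struct_bytes = ctypes.sizeof(PixStr)      # includes alignment padding after mask
soa_bytes_per_pixel = 2 + 1 + 4 + 4 + 4   # raw + mask + ped + gain + result arrays
aos_bytes_per_pixel = 2 + struct_bytes    # raw + one pixstr element

print("sizeof(pixstr):", struct_bytes)
print("bytes touched per event, separate arrays:", soa_bytes_per_pixel * PSIZE)
print("bytes touched per event, struct array:   ", aos_bytes_per_pixel * PSIZE)
```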
// g++ -O3 -o test_cpo -g test_cpo.cc
// ../lcls2/psana/psana/pycalgos/test_cpo
// mpirun -n 4 ../lcls2/psana/psana/pycalgos/test_cpo
// mpirun -n 64 ../lcls2/psana/psana/pycalgos/test_cpo
#define NLOOPS 100
#define EVENTS 100
#define SIZE 16*352*384
#define M14 0x3fff // 16383 or (1<<14)-1 - 14-bit mask
#include <stdint.h>
#include <stdlib.h>
#include <chrono>
#include <iostream>
#include <cstdint> // uint8_t
void calibrate(uint16_t* raw, uint8_t* mask, float* gain, float* ped, float* result) {
uint16_t* end = raw+SIZE;
while (raw<end) {
*result = ((*raw & M14) - *ped)*(*gain)*(*mask);
raw++; ped++; gain++; mask++; result++;
}
}
int main() {
uint16_t* raw = (uint16_t*)malloc(EVENTS*SIZE*sizeof(uint16_t));
uint8_t* mask = (uint8_t*)malloc(SIZE*sizeof(uint8_t));
float* result = (float*)malloc(SIZE*sizeof(float));
float* ped = (float*)malloc(SIZE*sizeof(float));
float* gain = (float*)malloc(SIZE*sizeof(float));
for (int i=0; i<EVENTS*SIZE; i++) {
raw[i]=1234;
}
for (int i=0; i<SIZE; i++) {
mask[i]=1;
//result[i]=0.0;
ped[i]=1233.1;
gain[i]=1.234;
}
std::chrono::steady_clock::time_point begin = std::chrono::steady_clock::now();
for (int n=0; n<NLOOPS; n++){
for (int i=0; i<EVENTS; i++){
calibrate(raw+i*SIZE, mask, gain, ped, result);
}
}
std::chrono::steady_clock::time_point end = std::chrono::steady_clock::now();
std::cout << "NLOOPS: " << NLOOPS << " EVENTS: " << EVENTS << std::endl;
std::cout << "Time per event = " << (std::chrono::duration_cast<std::chrono::microseconds>(end - begin).count())/EVENTS/NLOOPS << "[us]" << std::endl;
}
#define RAWD_T uint16_t
#define MASK_T uint8_t
#define GAIN_T float
#define PEDS_T float
#define REST_T float
struct pixstr {
MASK_T mask;
PEDS_T ped;
GAIN_T gain;
REST_T rest;
};
void calib_v0(RAWD_T* raw, MASK_T* mask, GAIN_T* gain, PEDS_T* ped, REST_T* res) {
RAWD_T* end = raw+PSIZE;
while (raw<end) {
*res = ((*raw & M14) - *ped)*(*gain)*(*mask);
raw++; ped++; gain++; mask++; res++;
}
}
void calib(RAWD_T* raw, pixstr* pu) {
RAWD_T* end = raw+PSIZE;
while (raw<end) {
(*pu).rest = ((*raw & M14) - (*pu).ped)*((*pu).gain)*((*pu).mask);
raw++; pu++;
}
}
void calib(RAWD_T* raw, pixstr* pu) {
RAWD_T* end = raw+PSIZE;
while (raw<end) {
pu->rest = ((*raw & M14) - (pu->ped)) * (pu->gain) * (pu->mask);
raw++; pu++;
}
}
#define PSIZE 16*352*384 // // 100000 // 2162688 = 16*352*384
#define NLOOPS 100
#define EVENTS 100
void test_calib_simulation_mpi(int argc, char* argv[])
{
int icpu = sched_getcpu();
std::stringstream sscpu; sscpu << "cpu-" << std::setfill('0') << std::setw(3) << std::right << icpu;
std::string scpu = sscpu.str();
time_t t0 = time_now();
RAWD_T* rawd = (RAWD_T*)malloc(EVENTS*PSIZE*sizeof(RAWD_T));
MASK_T* mask = (MASK_T*)malloc(PSIZE*sizeof(MASK_T));
REST_T* rest = (REST_T*)malloc(PSIZE*sizeof(REST_T));
PEDS_T* peds = (PEDS_T*)malloc(PSIZE*sizeof(PEDS_T));
GAIN_T* gain = (GAIN_T*)malloc(PSIZE*sizeof(GAIN_T));
pixstr* pixs = (pixstr*)malloc(PSIZE*sizeof(pixstr));
std::cout << scpu
<< " test_calib_simulation_mpi time for malloc: "
<< duration_us(time_now() - t0).count() << " us" << std::endl;
t0 = time_now();
standard_normal_array<RAWD_T>(1000., 10., PSIZE*EVENTS, rawd);
standard_normal_array<PEDS_T>(1000., 10., PSIZE, peds);
standard_normal_array<GAIN_T>(20., 1., PSIZE, gain);
random_array_0or1<MASK_T>(0.9, PSIZE, mask);
for (int i=0; i<PSIZE; i++){
pixs[i].mask = mask[i];
pixs[i].ped = peds[i];
pixs[i].gain = gain[i];
pixs[i].rest = rest[i];
}
if (icpu == 0){
std::cout << scpu << " time for random data and constants: "
<< duration_us(time_now() - t0).count() << " us";
std::cout << "\n rawd: "; for (int i=0; i<10; i++){std::cout << rawd[i] << ' ';}
std::cout << "\n peds: "; for (int i=0; i<10; i++){std::cout << peds[i] << ' ';}
std::cout << "\n gain: "; for (int i=0; i<10; i++){std::cout << gain[i] << ' ';}
std::cout << "\n mask: "; for (int i=0; i<10; i++){std::cout << unsigned(mask[i]) << ' ';}
std::cout << "\n " << scpu << " events: " << std::to_string(EVENTS)
<< " panel size:" << std::to_string(PSIZE) << std::endl;
}
double times_s[NLOOPS];
double durats_us[NLOOPS];
struct timespec tbeg, tcur;
int status = clock_gettime(CLOCK_REALTIME, &tbeg);
time_t tt0 = time_now();
for (int n=0; n<NLOOPS; n++){
status = clock_gettime(CLOCK_REALTIME, &tcur);
t0 = time_now();
for (int i=0; i<EVENTS; i++){
//calib_v0(rawd+i*PSIZE, mask, gain, peds, rest);
calib(rawd+i*PSIZE, pixs);
}
durats_us[n] = duration_us(time_now() - t0).count() / EVENTS;
times_s[n] = time_sec(tcur) - TOFSET;
}
int time_per_event_us = duration_us(time_now() - tt0).count() / EVENTS / NLOOPS;
std::cout << scpu << " NLOOPS: " << NLOOPS << " EVENTS: " << EVENTS << std::endl;
std::cout << scpu << " time per event: " << time_per_event_us << " us" << std::endl;
//std::stringstream fname; fname << "results-" << scpu << "-v80.txt";
std::string version = (argc>2)? argv[2] : "vXX";
std::stringstream fname; fname << "results-" << scpu << '-' << version << ".txt";
std::cout << "save file: " << fname.str() << std::endl;
std::ofstream ofile;
ofile.open(fname.str());
for (int i=0; i<NLOOPS; i++){
ofile << std::setw(3) << std::right << i;
ofile << std::fixed
<< std::setprecision(6);
ofile << " t,s:" << std::setw(14) << times_s[i];
ofile << " dt,us: " << std::setprecision(0) << std::setw(10) << durats_us[i] << std::endl;
}
ofile << '\n' << scpu << " time per event: " << time_per_event_us << " us" << std::endl;
ofile << "begin event loop time_since_epoch, sec: "
<< std::setw(14) << std::setprecision(3)
<< time_sec(tbeg)
<< " offset: " << TOFSET << std::endl;
ofile.close();
}
Start-stop time
calib_v0 - WITHOUT struct
calib - WITH struct
Results
WITHOUT struct
ps-4.6.3 [dubrovin@sdfiana004:~/LCLS/con-lcls2/2024-10-29-test-calib-mpi]$ ../lcls2/psana/psana/pycalgos/test_calib_sim_proc.py 0
[655. 662. 738. 684.]
vers-v04 tmed_sel: 673.000 us
[686. 660. 691. 698. 683. 691. 662. 651.]
vers-v08 tmed_sel: 684.500 us
[ 902. 867. 1270.5 1270. 887.5 901. 891. 894. 701. 714.5
713. 696. 707. 741. 719. 679. ]
vers-v16 tmed_sel: 804.000 us
[2147. 1774. 1667. 2138. 2761.5 2882. 3215.5 3049. 4044.5 3792.5
3293.5 3722.5 3536.5 2921.5 4321.5 2914. 3710. 3673. 3462. 3847.5
3201. 3455.5 3699. 3333.5 5042.5 1712. 3093.5 1417. 2212. 2989.5
2979. 2367.5]
vers-v32 tmed_sel: 3147.250 us
[6093.5 5976. 3803. 5847. 3830. 5775.5 6039. 5991. 9122.5 9182.5
8612.5 9522.5 9444. 9625. 8601.5 9349. 5564.5 4952.5 4452. 4910.
4798.5 5558.5 5180. 4985.5 4278.5 6675.5 4751.5 6734. 6782.5 4611.5
6559. 3879. 5412. 4424. 5790.5 7023. 5237. 5033.5 4383.5 4814.5
6688.5 3735. 6699. 3070. 3628.5 4030. 6482. 6734.5 5849. 6543.5
6387. 5711. 6133. 6411.5 6160. 5054. 5919.5 5927. 5932. 5731.
5743.5 5835. 5358. 5631.5]
vers-v64 tmed_sel: 5812.750 us
[ 9114.5 8477. 9651.5 9360. 9396. 9147. 8780. 8705.5 9522.
10188.5 10050.5 9569.5 9256. 9469. 9528. 9410. 9777.5 9642.
12993.5 15383. 13701.5 13568. 14442. 14046. 13572. 14404. 12163.5
14579. 12906.5 12162.5 14432.5 13915.5 12282. 14164. 13221.5 14284.
13257. 14616.5 8478.5 13870.5 9972. 10462.5 9499.5 11857.5 12074.5
9767.5 12130. 10941. 9896. 12578.5 8909. 10594.5 11424. 11449.5
6321.5 11541. 10492.5 5870. 11554. 8126.5 9913.5 9754. 12638.5
14634.5 13241.5 8302. 7723.5 13560. 10189. 14148. 10893.5 3873.5
9580. 6394.5 10004.5 11660.5 13060.5 5162. 11885. 5410.5]
vers-v80 tmed_sel: 10543.500 us
[16550.5 12927. 11448.5 11659.5 10667.5 10602.5 16205.5 11793.5 10502.5
11115.5 10150.5 9693. 19591. 19411. 18271.5 19618. 17706. 18302.5
20111. 18211. 17372.5 17029.5 16757. 17514.5 16426. 16444.5 16154.5
16320. 16239. 17003.5 16431. 17180. 17514. 15808.5 15936.5 16020.5
11964.5 12068. 11431. 11541.5 11257.5 10805. 11759. 10783.5 11748.5
11533. 10559.5 11501.5 8935. 12518.5 12722.5 12491. 12445. 12700.
13166.5 12859.5 12760. 12943. 13120. 9103. 15802. 16433. 14994.5
15505.5 16394. 16506. 15839.5 16057. 15575.5 15631.5 15916.5 16261.5
14161.5 15411.5 15484. 13775. 11333. 14175.5 15346.5 14014.5 10677.
13059.5 14200.5 13001. 10216.5 10119. 10138. 9900.5 9263.5 9298.
10481.5 11008. 10017.5 10170. 9302.5 9964.5]
vers-v96 tmed_sel: 13089.750 us
WITH struct
ps-4.6.3 [dubrovin@sdfiana004:~/LCLS/con-lcls2/2024-10-29-test-calib-mpi]$ ../lcls2/psana/psana/pycalgos/test_calib_sim_proc.py 10
[2170. 2146. 2129.5 2138.5]
vers-s04 tmed_sel: 2142.250 us
[2173. 2180. 2130. 2145. 2171.5 2169. 2142.5 2145. ]
vers-s08 tmed_sel: 2157.000 us
[2288. 2262. 3026. 3052. 2269. 2259. 2251. 2254. 2295. 2253.
2254.5 2268. 2245.5 2260. 2265. 2267. ]
vers-s16 tmed_sel: 2263.500 us
[4780. 4796.5 4774. 4776.5 6258. 6284.5 6300. 6322. 4774. 4765.5
4777. 4805. 4820. 4807. 4802. 4819. 4825.5 4812. 4836. 4848.5
4802. 4818.5 4824.5 4847.5 4801. 4804. 4831.5 4839.5 4817. 4864.
4801. 4853. ]
vers-s32 tmed_sel: 4817.750 us
[ 8306.5 13282. 13358. 13412. 8195.5 13425.5 13383. 13496.5 14567.
14525. 14469.5 14516. 14545.5 14480.5 14576. 14583. 24729.5 24777.5
24789. 24797. 24728.5 24867.5 24867.5 24870.5 13074. 25406. 25570.
13590. 13116.5 20931.5 11182.5 21530. 9274.5 12976. 13194. 13424.
9441. 13224.5 13195. 13220. 12862. 10347.5 12876.5 13279. 10044.
13051. 13128. 13150.5 13344. 8406. 13375. 13474. 13356.5 13399.5
8500. 13495.5 13071.5 9189.5 13307.5 13299. 9350. 12196.5 13278.5
13198. ]
vers-s64 tmed_sel: 13350.250 us
[25933. 25913. 25879. 25939.5 25960. 25881. 25949.5 26157. 25972.5
26011.5 36925.5 37028. 37053.5 37542. 36622.5 36928.5 37571.5 37509.5
40781. 40651.5 41444.5 41419.5 41470. 41375. 41499. 41538. 41598.
41509.5 40990. 41614. 18383.5 35410. 36254.5 36407.5 36343.5 28216.5
20603.5 35878.5 36097. 35472.5 19491.5 34830. 34842. 17331.5 35183.
35044.5 17942.5 35293. 20891.5 17194.5 16066. 11550.5 16953.5 17074.5
17227. 10389. 17858. 17958.5 18111.5 17971.5 17158. 9726. 17269.
17258. 17291.5 17841. 17830. 9633. 17984.5 17945. 17858. 25352.
24370.5 21753. 25108.5 24745.5 27712. 25151.5 27495.5 27630. ]
vers-s80 tmed_sel: 25954.750 us
[ 46949.5 47818. 45682. 14672.5 4791. 10277. 44990. 46062.5
16817.5 47038. 9210.5 4779.5 44849.5 45122. 44938. 45048.
22662.5 44665.5 44882.5 24278.5 73613. 60302.5 30383.5 33713.5
27477.5 30422. 32844.5 32522. 54694. 37028. 27972. 34785.5
35219.5 36218.5 36685. 70136. 35885.5 33949.5 15809.5 35992.
36394.5 2899.5 3518.5 29984. 36764. 3561.5 36717.5 36960.
114616.5 116601. 117007.5 116855. 116884. 116603. 116627. 116781.5
116992.5 116968.5 117192. 117397. 64190. 63651. 64902. 64559.
64347.5 65072.5 71813. 64748. 64059.5 64350.5 64884. 64547.5
31874. 31778.5 31823. 31835. 31880.5 31893. 31790. 31925.5
31878. 31894.5 31958. 31980. 68739. 68802. 68627.5 68811.
68832. 68890. 69332. 69280.5 69016. 69531. 65626.5 68219. ]
vers-s96 tmed_sel: 44964.000 us
# of CPUs | WITHOUT struct, μs, max over cpus | WITH struct, μs, max over cpus | ← the same, but (*pu).ped (pu->ped), on sdfmilan122, min-max over cpus | test-scaling-mpi-epix10ka.py 50: arrf = ((rawa[i,:] & M14) - peds) * gain * mask, min-max over cpus | Old test w/o nloops: test-scaling-mpi-epix10ka.py 85, pythonized/cythonized/C++ calib | Old test w/o nloops: test-scaling-mpi-epix10ka.py 81: arrf = ((raw & M14) - peds)*gain; arrf = np.select((mask>0,), (arrf,), default=0) |
---|
test_cpo | 780, 991, 711 | sdfmilan011 | 704, 600, 602 on sdfmilan122 | sdfmilan122 | sdfmilan011 | sdfmilan011 |
mpirun -n 64 test_cpo | 3179, 3325, 3402,... 8819, 8828, 8839 | | | | | |
1 NO MPI - test_calib_sim 2 v01 | 689, 686, 681 <- in 3 jobs -> | 2194, 2207, 2216 | 2116, 2122, 2115 | 3310, 3422 | 638, 645, 620 )* | 7380 |
4 | 673 | 2142 | 2083-2114 | 2488-3524 | 650 )** | 8219 |
8 | 684 | 2157 | 2075-2130 | 2847-3688 | 645 | 8257 |
16 | 804 | 2264 | 2165-2966 | 5220-9106 | 630 | 9221 |
32 | 3147 | 4818 | 4565-6233 | 12598-27739 | 4035 | 11892 |
64 | 5812 | 13350 | 6742-22157 | 10900-28674 | 8039 | 21043, 32261 |
80 | 10543 | 25955 | 9933-49393 | 12038-75235 | 9598 | 42770, 52406 |
96 | 13090 | 44964 | 14471-73196 | 18341-33499 | 10471 | 36673, 26515 |
ud.calib_std
)* ps-4.6.3 [dubrovin@sdfmilan011:~/LCLS/con-lcls2/2024-10-29-test-calib-mpi]$ ../lcls2/psana/psana/detector/testman/test-scaling-mpi-epix10ka.py 85
)** ps-4.6.3 [dubrovin@sdfmilan011:~/LCLS/con-lcls2/2024-10-29-test-calib-mpi]$ mpirun -n 4 ../lcls2/psana/psana/detector/testman/test-scaling-mpi-epix10ka.py 85
def calib_std(raw, peds, gain, mask, databits, out):
"""assume that all numpy arrays have the same shape"""
t0_sec = time()
dt_us_cpp = udext.cy_calib_std(raw.ravel(), peds.ravel(), gain.ravel(), mask.ravel(), raw.size, databits, out.ravel())
return dt_us_cpp, (time()-t0_sec)*1e6
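The two values returned by calib_std (time measured inside the C++ call and total time of the Python wrapper, both in μs) are enough to estimate which share of the call is spent in C++. A minimal sketch follows; the helper name `cpp_fraction` is hypothetical, and the sample inputs are the dt_us_cpp and dt_us_py values printed earlier on this page, under the assumption that dt_us_py corresponds to the wrapper total.

```python
def cpp_fraction(dt_us_cpp, dt_us_total):
    """Fraction of the wrapper's wall time spent inside the C++ call."""
    if dt_us_total <= 0:
        raise ValueError("total time must be positive")
    return dt_us_cpp / dt_us_total


# Sample values: dt_us_cpp = 1091.0, dt_us_py = 1112.5 (from the timing printout above)
frac = cpp_fraction(1091.0, 1112.5)
print(f"{100 * frac:.1f}% of the time is in C++")
```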
Summary
- time per event is consistent between test_cpo and test_calib_sim for single-core processing, ~0.7 ms
- scaling is poor for test_cpo between 1 cpu and mpirun -n 64: 0.7 ms → about 10 × 0.7 ms
- using the struct decreases calib performance by a factor of ~3
- scalability in mpirun -n ## is poor both WITH and WITHOUT struct
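The ratios behind these statements can be recomputed from the table above. The sketch below uses the WITHOUT struct and WITH struct columns for 4 to 96 ranks and, as the single-process reference, the first of the three no-MPI WITHOUT struct values (689 μs); that choice of reference and the variable names are assumptions made for illustration.

```python
# Per-event times (us) copied from the table, keyed by the number of ranks
without_us = {4: 673, 8: 684, 16: 804, 32: 3147, 64: 5812, 80: 10543, 96: 13090}
with_us = {4: 2142, 8: 2157, 16: 2264, 32: 4818, 64: 13350, 80: 25955, 96: 44964}
single_us = 689  # first of the three single-process (no MPI) runs, WITHOUT struct

# Cost of the struct layout: WITH / WITHOUT at the same number of ranks
struct_penalty = {n: with_us[n] / without_us[n] for n in without_us}

# Scaling of the WITHOUT struct variant relative to a single process
slowdown = {n: without_us[n] / single_us for n in without_us}

for n in without_us:
    print(f"n={n:3d}  struct penalty: x{struct_penalty[n]:.2f}  "
          f"slowdown vs single process: x{slowdown[n]:.2f}")
```

At low rank counts the struct penalty is close to the factor of 3 quoted above, and the per-event time without the struct stays near the single-process value only up to about 16 ranks.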
References