2024-08-30 timing of the calib components in the event loop without mpi

Dataset and Detector

ds = DataSource(exp='uedcom103', run=812) # dark run

orun = next(ds.runs()) # assumed: take the first run from the DataSource

det = orun.Detector('epixquad')

Script or test

/lcls2/psana/psana/detector/testman/test-scaling-mpi-epix10ka.py

Code of the det.raw.calib method with removed common mode correction
def calib_epix10ka_any_local_v2(det_raw, evt, **kwa):
    """ v2: get rid of common mode correction
    """
    t0 = time()
    nda_raw = kwa.get('nda_raw', None)
    raw = det_raw.raw(evt) if nda_raw is None else nda_raw # shape: (352, 384), supposed to become (<nsegs>, 352, 384); dtype: uint16
    if ue.cond_msg(raw is None, msg='raw is None'): return None

    t1 = time()
    gmaps = ue.gain_maps_epix10ka_any(det_raw, evt) #tuple: 7 x shape:(4, 352, 384)
    if ue.cond_msg(gmaps is None, msg='gmaps is None'): return None

    t2 = time()
    store = det_raw._store_ = ue.Storage(det_raw, **kwa) if det_raw._store_ is None else det_raw._store_  #perpix=True
    store.counter += 1
    if store.counter < 1: ue.print_gmaps_info(gmaps)

    t3 = time()
    factor = ue.event_constants_for_gmaps(gmaps, store.gfac, default=1)  # 3d gain factors
    pedest = ue.event_constants_for_gmaps(gmaps, store.peds, default=0)  # 3d pedestals

    t4 = time()
    arrf = np.array(raw & det_raw._data_bit_mask, dtype=np.float32)

    t5 = time()
    if pedest is not None: arrf -= pedest

    logger.debug(ue.info_ndarr(arrf, 'arrf:'))
    if ue.cond_msg(factor is None, msg='factor is None - substitute with 1', output_meth=logger.warning): factor = 1

    t6 = time()
    mask = store.mask
    res = arrf * factor if mask is None else arrf * factor * mask # gain correction

    t7 = time()
    return res, (t0, t1, t2, t3, t4, t5, t6, t7)

Time intervals meaning

  • t0 - total time consumed by the calib method
  • t1 - access raw = det_raw.raw(evt)
  • t2 - access gain maps
  • t3 - access cached store of calibration constants
  • t4 - evaluate peds and gfac - constants combined for the event gain ranges
  • t5 - make a writable array arrf from raw, keeping the data bits only (gain bits truncated)
  • t6 - subtract pedestals
  • t7 - evaluate arrf * factor * mask
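
A minimal sketch (hypothetical, not the actual test script) of an event loop that collects the returned (t0, ..., t7) tuples and reduces them to the per-interval medians printed in the tables below; recs and the printout format are illustrative:

import numpy as np

recs = []  # orun, det, and calib_epix10ka_any_local_v2 as defined above (assumed)
for nevt, evt in enumerate(orun.events()):
    if nevt == 100: break
    r = calib_epix10ka_any_local_v2(det.raw, evt)
    if r is None: continue
    res, times = r
    recs.append(times)

t = np.array(recs)                   # shape: (nevents, 8)
dt = np.empty_like(t)
dt[:, 0] = t[:, 7] - t[:, 0]         # t00 - total time of the calib call
dt[:, 1:] = t[:, 1:] - t[:, :-1]     # t01..t07 - per-interval deltas
print('medi: ' + ' '.join('%9.4f' % v for v in 1000 * np.median(dt, axis=0)))  # ms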

Test and results

ps-4.6.3 [dubrovin@sdfiana004:~/LCLS/con-lcls2/lcls2/psana/psana/detector]$ srun --partition milano --account lcls:prjdat21 -n 1 --time=05:00:00 --exclusive --pty /bin/bash
> sdfmilan063

OTHER WINDOW:

kinit

ssh -Y sdfmilan063

cd ~/LCLS/con-lcls2/lcls2

. setup_env.sh

./psana/psana/detector/testman/test-scaling-mpi-epix10ka.py 2


dt, ms:  t00       t01       t02       t03       t04       t05       t06       t07

on sdfiana004 - shared resource

medi:    7.2465    0.1516    1.8420    0.0021    2.6863    0.5474    0.9129    1.1203

on reserved sdfmilan063 - several repeats of the event loop over 100 events
medi:    9.9332    0.2401    3.5620    0.0043    3.8865    0.4408    0.7961    1.0107  

medi:    8.5163    0.2205    2.3572    0.0031    3.2568    0.5431    0.6912    1.0154

medi:    7.2405    0.1922    2.2745    0.0021    2.7044    0.4904    0.2604    1.0061

medi:    5.3618    0.1471    1.7295    0.0024    1.7509    0.2248    0.7367    0.6764

medi:    6.5324    0.1631    2.0473    0.0019    2.2929    0.4990    0.2739    1.0476

medi:    4.9076    0.0868    1.5934    0.0026    1.8897    0.1476    0.5717    0.7081

2024-09-09 test with mpi

det.raw.calib code with timing points
def calib_epix10ka_any_local_v2(det_raw, evt, **kwa):
    """ v2: add time points, get rid of common mode correction
    """
    t0 = time()
    nda_raw = kwa.get('nda_raw', None)
    raw = det_raw.raw(evt) if nda_raw is None else nda_raw # shape: (352, 384), supposed to become (<nsegs>, 352, 384); dtype: uint16
    if ue.cond_msg(raw is None, msg='raw is None'): return None

    t1 = time()
    gmaps = ue.gain_maps_epix10ka_any(det_raw, evt) #tuple: 7 x shape:(4, 352, 384)
    if ue.cond_msg(gmaps is None, msg='gmaps is None'): return None

    t2 = time()
    store = det_raw._store_ = ue.Storage(det_raw, **kwa) if det_raw._store_ is None else det_raw._store_  #perpix=True
    store.counter += 1
    if store.counter < 1: ue.print_gmaps_info(gmaps)

    t3 = time()
    factor = ue.event_constants_for_gmaps(gmaps, store.gfac, default=1)  # 3d gain factors
    pedest = ue.event_constants_for_gmaps(gmaps, store.peds, default=0)  # 3d pedestals

    t4 = time()
    arrf = np.array(raw & det_raw._data_bit_mask, dtype=np.float32)

    t5 = time()
    if pedest is not None: arrf -= pedest

    logger.debug(ue.info_ndarr(arrf, 'arrf:'))
    if ue.cond_msg(factor is None, msg='factor is None - substitute with 1', output_meth=logger.warning): factor = 1

    if store.cmpars is not None:  # common mode correction, applied only if cmpars is set; its time is counted in interval t6
        ue.common_mode_epix_multigain_apply(arrf, gmaps, store)

    t6 = time()
    mask = store.mask
    res = arrf * factor if mask is None else arrf * factor * mask # gain correction

    t7 = time()
    return res, (t0, t1, t2, t3, t4, t5, t6, t7)

Timing results

32 and 80 cpus in mpi
ps-4.6.3 [dubrovin@sdfmilan202:~/LCLS/con-lcls2]$ python ./lcls2/psana/psana/detector/testman/test-scaling-mpi-epix10ka.py 99                                                                   

dt, ms:  t00       t01       t02       t03       t04       t05       t06       t07
medi:    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000 rank:000/032 cpu:010 number of recs: 0
medi:    4.4625    0.0668    1.4832    0.0014    1.8053    0.1662    0.3390    0.5541 rank:016/032 cpu:011 number of recs: 100
medi:    4.4079    0.0591    1.3950    0.0019    1.7035    0.1221    0.3853    0.5271 rank:008/032 cpu:005 number of recs: 100
medi:    4.4863    0.0591    1.3795    0.0012    1.8027    0.1166    0.5839    0.5794 rank:012/032 cpu:078 number of recs: 100
medi:    4.8709    0.0613    1.4193    0.0012    1.7776    0.1583    0.6139    0.8502 rank:004/032 cpu:077 number of recs: 100
medi:    6.5660    0.1509    1.8942    0.0019    2.3808    0.4027    0.6335    0.8013 rank:014/032 cpu:107 number of recs: 100
medi:    5.0323    0.0982    1.5402    0.0012    1.9429    0.2332    0.3085    0.6614 rank:006/032 cpu:106 number of recs: 100
medi:    4.1854    0.0622    1.4117    0.0012    1.7064    0.1297    0.2422    0.4416 rank:002/032 cpu:047 number of recs: 100
medi:    4.7133    0.0758    1.4617    0.0014    1.8322    0.1779    0.5119    0.5944 rank:010/032 cpu:033 number of recs: 100
medi:    6.8538    0.1929    2.1393    0.0019    2.4624    0.4187    0.5145    0.8738 rank:030/032 cpu:097 number of recs: 100
medi:    4.6625    0.0696    1.4958    0.0012    1.8172    0.1550    0.2692    0.6146 rank:028/032 cpu:066 number of recs: 100
medi:    4.5612    0.0603    1.4145    0.0012    1.7626    0.1309    0.6027    0.6244 rank:026/032 cpu:032 number of recs: 100
medi:    4.3330    0.0608    1.3919    0.0010    1.7300    0.1273    0.2801    0.3810 rank:018/032 cpu:043 number of recs: 100
medi:    4.6442    0.0663    1.4257    0.0010    1.8663    0.1822    0.6874    0.4866 rank:020/032 cpu:067 number of recs: 100
medi:    6.8028    0.1123    2.1322    0.0019    2.4273    0.4113    0.6311    0.6907 rank:022/032 cpu:103 number of recs: 100
medi:    4.3499    0.0603    1.3831    0.0012    1.6901    0.1256    0.2396    0.7706 rank:024/032 cpu:002 number of recs: 100
medi:    7.5989    0.2463    2.1796    0.0019    2.6369    0.5052    0.5240    0.9267 rank:029/032 cpu:081 number of recs: 100
medi:    7.8244    0.2604    2.3451    0.0019    2.7165    0.6130    0.6328    1.0064 rank:005/032 cpu:080 number of recs: 100
medi:    6.8302    0.2003    2.0401    0.0014    2.4984    0.4439    0.7031    0.9379 rank:021/032 cpu:093 number of recs: 100
medi:    6.0575    0.1566    1.8210    0.0014    2.1310    0.3321    0.5412    0.7870 rank:013/032 cpu:088 number of recs: 100
medi:    4.4155    0.0696    1.4629    0.0012    1.7750    0.1688    0.2556    0.4897 rank:007/032 cpu:126 number of recs: 100
medi:    7.9112    0.2367    2.3787    0.0024    2.7568    0.5982    0.5360    1.0090 rank:031/032 cpu:121 number of recs: 100
medi:   10.4554    0.3047    3.3541    0.0033    3.3829    0.7384    0.8595    1.2519 rank:023/032 cpu:119 number of recs: 100
medi:    7.7000    0.1955    2.4045    0.0029    2.7111    0.3808    0.8230    0.8404 rank:015/032 cpu:114 number of recs: 100
medi:    4.7333    0.0815    1.4558    0.0014    1.7860    0.1791    0.6092    0.6125 rank:003/032 cpu:049 number of recs: 100
medi:    5.2340    0.0913    1.5881    0.0017    1.9169    0.2320    0.4444    0.7405 rank:027/032 cpu:050 number of recs: 100
medi:    5.0166    0.0982    1.5976    0.0017    1.9584    0.2248    0.2692    0.6938 rank:009/032 cpu:026 number of recs: 100
medi:    6.7837    0.1671    2.0921    0.0026    2.3327    0.3541    0.7384    0.8953 rank:017/032 cpu:024 number of recs: 100
medi:    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000 rank:001/032 cpu:027 number of recs: 0
medi:    6.4714    0.1876    2.0485    0.0021    2.2919    0.3479    0.6726    0.9446 rank:025/032 cpu:025 number of recs: 100
medi:    5.2359    0.0956    1.5531    0.0014    2.0502    0.2713    0.4284    0.7312 rank:019/032 cpu:059 number of recs: 100
medi:    5.0220    0.0882    1.4799    0.0014    1.8868    0.1721    0.6988    0.7000 rank:011/032 cpu:057 number of recs: 100

summary-uedcom103-r0095-ncpu-032.txt
mean:    5.7407    0.1245    1.7723    0.0016    2.1180    0.2873    0.5193    0.7339 for 30 fully loaded cpus



dt, ms:  t00       t01       t02       t03       t04       t05       t06       t07                                                                                                              
medi:    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000 rank:000/080 cpu:012 number of recs: 0                                                                    
medi:    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000 rank:035/080 cpu:055 number of recs: 25                                                                   
medi:    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000 rank:037/080 cpu:083 number of recs: 25                                                                   
medi:    4.7834    0.0699    1.6897    0.0012    2.0449    0.1559    0.2739    0.5205 rank:036/080 cpu:067 number of recs: 100                                                                  
medi:    5.5213    0.0713    1.7049    0.0014    2.1768    0.1652    0.4768    0.8631 rank:032/080 cpu:011 number of recs: 100                                                                  
medi:    5.3682    0.0687    1.5824    0.0014    2.0261    0.1438    0.5341    0.9081 rank:034/080 cpu:042 number of recs: 100                                                                  
medi:    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000 rank:033/080 cpu:027 number of recs: 29                                                                   
medi:    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000 rank:025/080 cpu:028 number of recs: 25                                                                   
medi:    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000 rank:027/080 cpu:051 number of recs: 33                                                                   
medi:    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000 rank:023/080 cpu:112 number of recs: 25                                                                   
medi:    5.4362    0.0811    1.7524    0.0014    2.1808    0.1893    0.2961    0.7646 rank:018/080 cpu:036 number of recs: 100                                                                  
medi:    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000 rank:031/080 cpu:119 number of recs: 25                                                                   
medi:    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000 rank:029/080 cpu:082 number of recs: 25                                                                   
medi:    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000 rank:021/080 cpu:094 number of recs: 29                                                                   
medi:    5.7065    0.1163    1.7970    0.0014    2.1470    0.2491    0.5081    0.8142 rank:028/080 cpu:078 number of recs: 100                                                                  
medi:    6.2273    0.1183    1.9257    0.0021    2.2767    0.2582    0.5827    0.8440 rank:020/080 cpu:076 number of recs: 100                                                                  
medi:    5.7538    0.0975    1.7393    0.0017    2.2135    0.1929    0.5908    0.7372 rank:016/080 cpu:015 number of recs: 100                                                                  
medi:    5.3303    0.0727    1.7507    0.0014    2.1241    0.1690    0.2680    0.7360 rank:024/080 cpu:003 number of recs: 100                                                                  
medi:    5.2130    0.0725    1.5800    0.0014    1.9674    0.1569    0.8097    0.7577 rank:022/080 cpu:098 number of recs: 100                                                                  
medi:    4.9729    0.0732    1.6198    0.0014    2.0058    0.1867    0.3698    0.7305 rank:030/080 cpu:106 number of recs: 100                                                                  
medi:    5.8093    0.1056    1.7817    0.0014    2.2006    0.2313    0.7620    0.8721 rank:026/080 cpu:032 number of recs: 100                                                                  
medi:    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000 rank:019/080 cpu:052 number of recs: 25                                                                   
medi:    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000 rank:017/080 cpu:026 number of recs: 25                                                                   
medi:    5.1904    0.0730    1.5664    0.0014    2.0349    0.1826    0.2835    0.6614 rank:014/080 cpu:104 number of recs: 100                                                                  
medi:    4.7851    0.0718    1.6029    0.0014    1.9958    0.1776    0.2861    0.5546 rank:012/080 cpu:075 number of recs: 100                                                                  
medi:    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000 rank:015/080 cpu:121 number of recs: 33                                                                   
medi:    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000 rank:013/080 cpu:093 number of recs: 25                                                                   
medi:    5.0147    0.0770    1.6847    0.0014    2.0275    0.1750    0.2654    0.7322 rank:010/080 cpu:046 number of recs: 100                                                                  
medi:    5.0604    0.0727    1.6952    0.0014    2.1152    0.1628    0.2606    0.8254 rank:008/080 cpu:007 number of recs: 100                                                                  
medi:    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000 rank:009/080 cpu:025 number of recs: 28                                                                   
medi:    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000 rank:011/080 cpu:060 number of recs: 24                                                                   
medi:    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000 rank:007/080 cpu:125 number of recs: 24                                                                   
medi:    5.1908    0.0730    1.5776    0.0014    2.0099    0.1688    0.3877    0.7532 rank:004/080 cpu:068 number of recs: 100                                                                  
medi:    5.1894    0.0710    1.5886    0.0014    1.9858    0.1545    0.7951    0.6561 rank:006/080 cpu:103 number of recs: 100                                                                  
medi:    4.8039    0.0663    1.5576    0.0014    1.9674    0.1457    0.2797    0.6392 rank:002/080 cpu:035 number of recs: 100
medi:    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000 rank:005/080 cpu:092 number of recs: 25
medi:    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000 rank:003/080 cpu:056 number of recs: 31
medi:    5.2354    0.0727    1.6475    0.0014    2.0664    0.1667    0.2975    0.7975 rank:072/080 cpu:013 number of recs: 100
medi:    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000 rank:069/080 cpu:091 number of recs: 21
medi:    5.4092    0.1032    1.8005    0.0017    2.1825    0.2339    0.3221    0.4630 rank:064/080 cpu:002 number of recs: 100
medi:    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000 rank:067/080 cpu:049 number of recs: 16
medi:    4.5877    0.0713    1.5218    0.0012    1.9431    0.1781    0.2527    0.5710 rank:068/080 cpu:072 number of recs: 100
medi:    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000 rank:065/080 cpu:041 number of recs: 17
medi:    5.4433    0.0823    1.6727    0.0017    2.0676    0.1843    0.3526    0.8087 rank:066/080 cpu:047 number of recs: 100
medi:    5.1517    0.0894    1.6503    0.0014    2.1155    0.2186    0.3095    0.4370 rank:076/080 cpu:064 number of recs: 100
medi:    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000 rank:075/080 cpu:058 number of recs: 21
medi:    4.8540    0.0846    1.6973    0.0014    2.1009    0.1535    0.2766    0.4494 rank:074/080 cpu:045 number of recs: 100
medi:    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000 rank:073/080 cpu:033 number of recs: 16
medi:    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000 rank:077/080 cpu:080 number of recs: 17
medi:    4.6051    0.0715    1.5781    0.0014    1.9629    0.1628    0.2522    0.5059 rank:070/080 cpu:111 number of recs: 100
medi:    5.1448    0.0832    1.6077    0.0014    2.0604    0.1988    0.3479    0.7272 rank:078/080 cpu:107 number of recs: 100
medi:    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000 rank:079/080 cpu:115 number of recs: 16
medi:    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000 rank:071/080 cpu:114 number of recs: 17
medi:    5.2035    0.0718    1.6663    0.0019    2.0545    0.1619    0.2637    0.7360 rank:048/080 cpu:001 number of recs: 100
medi:    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000 rank:051/080 cpu:059 number of recs: 20
medi:    5.4414    0.0696    1.5652    0.0019    2.2063    0.1512    0.5698    0.7472 rank:050/080 cpu:043 number of recs: 100
medi:    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000 rank:053/080 cpu:089 number of recs: 16
medi:    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000 rank:061/080 cpu:084 number of recs: 16
medi:    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000 rank:059/080 cpu:050 number of recs: 17
medi:    5.1453    0.0739    1.6716    0.0014    2.1503    0.1819    0.3159    0.5887 rank:058/080 cpu:039 number of recs: 100
medi:    5.2912    0.0749    1.8258    0.0014    1.9712    0.1643    0.3657    0.7515 rank:060/080 cpu:070 number of recs: 100
medi:    4.7836    0.0708    1.6456    0.0017    1.9751    0.1626    0.2484    0.5286 rank:052/080 cpu:073 number of recs: 100
medi:    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000 rank:063/080 cpu:122 number of recs: 21
medi:    5.2607    0.0708    1.8601    0.0012    2.0831    0.1698    0.3712    0.7606 rank:062/080 cpu:100 number of recs: 100
medi:    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000 rank:055/080 cpu:113 number of recs: 16
medi:    5.0969    0.0715    1.6377    0.0014    2.1002    0.1585    0.2685    0.6599 rank:054/080 cpu:108 number of recs: 100
medi:    5.5714    0.0970    1.6961    0.0014    2.1813    0.1996    0.5479    0.7553 rank:056/080 cpu:009 number of recs: 100
medi:    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000 rank:057/080 cpu:031 number of recs: 21
medi:    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000 rank:045/080 cpu:086 number of recs: 20
medi:    5.3375    0.0761    1.6117    0.0014    2.0356    0.1752    0.4885    0.7393 rank:044/080 cpu:066 number of recs: 100
medi:    4.9605    0.0796    1.5984    0.0014    2.1014    0.1886    0.2751    0.6235 rank:046/080 cpu:096 number of recs: 100
medi:    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000 rank:047/080 cpu:127 number of recs: 17
medi:    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000 rank:049/080 cpu:029 number of recs: 16
medi:    5.2097    0.0713    1.6658    0.0017    2.0304    0.1612    0.2677    0.6661 rank:040/080 cpu:005 number of recs: 100
medi:    5.1026    0.0684    1.6205    0.0012    2.1272    0.1595    0.2685    0.7036 rank:042/080 cpu:034 number of recs: 100
medi:    5.0023    0.0713    1.5657    0.0017    1.9486    0.1631    0.3657    0.7005 rank:038/080 cpu:097 number of recs: 100
medi:    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000 rank:043/080 cpu:061 number of recs: 17
medi:    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000 rank:039/080 cpu:124 number of recs: 20
medi:    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000 rank:041/080 cpu:024 number of recs: 16
medi:    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000 rank:001/080 cpu:030 number of recs: 0

summary-uedcom103-r0095-ncpu-080.txt
mean:    5.2101    0.0789    1.6667    0.0015    2.0760    0.1784    0.3861    0.6946 for 39 fully loaded cpus

Without mpi

dt, ms:  t00       t01       t02       t03       t04       t05       t06       t07
medi:    4.2410    0.0598    1.4005    0.0014    1.6875    0.1295    0.2534    0.5906 rank:000/001 cpu:120 number of recs: 100

Ratio

dt, ms:  t00       t01       t02       t03       t04       t05       t06       t07
01:    4.2410    0.0598    1.4005    0.0014    1.6875    0.1295    0.2534    0.5906
32:    5.7407    0.1245    1.7723    0.0016    2.1180    0.2873    0.5193    0.7339 for 30 fully loaded cpus
80:    5.2101    0.0789    1.6667    0.0015    2.0760    0.1784    0.3861    0.6946 for 39 fully loaded cpus 
r32/1 = 5.7407/4.2410 = 1.35
r80/1 = 5.2101/4.2410 = 1.23

2024-09-23 simulation

PAY ATTENTION TO RESULT UNITS: μs OR ms!

Node reservation

ps-4.6.3 [dubrovin@sdfiana003:~/LCLS/con-lcls2/lcls2]$ srun --partition milano --account lcls:prjdat21 -n 1 --time=05:00:00 --exclusive --pty /bin/bash
srun: job 56010892 queued and waiting for resources
srun: job 56010892 has been allocated resources
ps-4.6.3 [dubrovin@sdfmilan090:~/LCLS/con-lcls2/lcls2]$

Simulation of numpy arrays in the loop

simulation of numpy arrays for epix10ka
    import numpy as np
    from time import time
    import psana.pyalgos.generic.NDArrGenerators as ag
    M14 = 0x3fff  # 16383 or (1<<14)-1 - 14-bit data mask
    DTYPE_RAWD = np.uint16
    DTYPE_PEDS = np.float32
    DTYPE_GAIN = np.float32
    DTYPE_MASK = np.uint8
    DTYPE_REST = np.float32
    sh = (16, 352, 384)
    nloops = 500  # number of simulated events, as in the results below

    for i in range(nloops):
        t0 = time()
        mask = ag.random_0or1(shape=sh, p1=0.90, dtype=DTYPE_MASK)
        peds = ag.random_standard(shape=sh, mu=1000, sigma=100, dtype=DTYPE_PEDS)
        gain = ag.random_standard(shape=sh, mu=5, sigma=1,      dtype=DTYPE_GAIN)
        raw  = ag.random_standard(shape=sh, mu=1000, sigma=100, dtype=DTYPE_RAWD)
        t1 = time()
        arrf = np.array(raw & M14, dtype=DTYPE_REST)
        t2 = time()

Discrete numpy operations

code for time profiling of discrete numpy operations
        if CALIBMET == SIM0:
            arrf -= peds
            t3 = time()
            arrf *= gain
            t4 = time()
            arrf *= mask
            t5 = time()
            arrf = np.array(raw & M14, dtype=DTYPE_REST)
            t6 = time()
            arrf = (arrf - peds) * gain
            t7 = time()
            arrf = np.array(raw & M14, dtype=DTYPE_REST)
            t8 = time()
            arrf = (arrf - peds) * gain * mask
            t9 = time()
            times = t0, t1, t2, t3, t4, t5, t6, t7, t8, t9

dt, ms:  t00       t01       t02       t03       t04       t05       t06       t07       t08       t09

medi:  225.9027  217.7382    0.6297    0.5398    0.5188    0.4407    0.6311    2.2424    0.5478    2.2854

medi:  225.5189  215.9863    0.6931    0.5149    0.5026    0.4370    0.6458    2.3782    0.5354    2.2491

medi:  228.7627  219.0838    0.6614    0.4798    0.5418    0.4429    0.6727    2.3458    0.5847    2.3065

dt   operation                                        1st try, μs  2nd try  3rd try
dt1  simulation of 4 arrays, shape = (16, 352, 384)       217,738  215,986  219,083
dt2  arrf = np.array(raw & M14, dtype=DTYPE_REST)             630      693      661
dt3  arrf -= peds                                             540      515      480
dt4  arrf *= gain                                             519      503      542
dt5  arrf *= mask                                             441      437      443
dt6  arrf = np.array(raw & M14, dtype=DTYPE_REST)             631      646      673
dt7  arrf = (arrf - peds) * gain                             2242     2378     2346
dt8  arrf = np.array(raw & M14, dtype=DTYPE_REST)             548      535      585
dt9  arrf = (arrf - peds) * gain * mask                      2285     2249     2306

Using select for mask

numpy select
        elif CALIBMET == SIM1:
            arrf = (arrf - peds)*gain
            t3 = time()
            arrf = np.select((mask>0,), (arrf,), default=0) #.astype(DTYPE_REST))
            t4 = time()
            times = t0, t1, t2, t3, t4

dt, ms:  t00       t01       t02       t03       t04

medi:  224.5922  216.8022    0.7602    2.4498    4.7780

medi:  230.6868  222.4416    0.8460    2.6546    4.8482

medi:  226.6681  217.6425    0.8794    2.9192    4.8187

dt   operation                                         1st try, μs  2nd try  3rd try
dt1  simulation of 4 arrays, shape = (16, 352, 384)        216,802  222,441  217,642
dt2  arrf = np.array(raw & M14, dtype=DTYPE_REST)              760      846      879
dt3  arrf = (arrf - peds)*gain                                2450     2655     2919
dt4  arrf = np.select((mask>0,), (arrf,), default=0)          4778     4848     4819
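
A hypothetical micro-benchmark (not part of the test script) comparing three equivalent ways to zero out masked pixels. np.select builds its full condition/choice machinery on every call, which is consistent with it being the slowest variant measured above:

import numpy as np
from time import time

sh = (16, 352, 384)
arrf = np.random.standard_normal(sh).astype(np.float32)
mask = (np.random.random(sh) < 0.9).astype(np.uint8)

for label, op in (('arrf * mask', lambda: arrf * mask),
                  ('np.where   ', lambda: np.where(mask > 0, arrf, np.float32(0))),
                  ('np.select  ', lambda: np.select((mask > 0,), (arrf,), default=0))):
    t0 = time()
    _ = op()
    print('%s dt: %.3f ms' % (label, 1000 * (time() - t0)))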

Numpy ufunc operations

code for numpy ufunc operations
        elif CALIBMET == SIM2:
            np.subtract(arrf, peds, out=arrf)
            t3 = time()
            np.multiply(arrf, gain, out=arrf)
            t4 = time()
            np.multiply(arrf, mask, out=arrf)
            t5 = time()
            times = t0, t1, t2, t3, t4, t5

dt, ms:  t00       t01       t02       t03       t04       t05

medi:  220.4628  216.7808    0.6970    1.1839    0.4658    0.4884

medi:  219.9044  216.8930    0.6951    0.8079    0.5212    0.4835

medi:  221.0090  217.4455    0.7130    1.0222    0.4929    0.4852

dt   operation                            1st try, μs  2nd try  3rd try
dt3  np.subtract(arrf, peds, out=arrf)           1184      808     1022
dt4  np.multiply(arrf, gain, out=arrf)            466      521      493
dt5  np.multiply(arrf, mask, out=arrf)            488      484      485
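
For comparison with the C++ tests below, a sketch (random stand-ins for raw, peds, gain, mask; not the production code) of the whole calibration chained through in-place ufuncs with a single preallocated float32 buffer, so that no per-event temporaries are created except the uint16 result of raw & M14:

import numpy as np

sh = (16, 352, 384)
M14 = 0x3fff  # 14-bit data mask
raw  = np.random.randint(0, 1 << 16, sh, dtype=np.uint16)
peds = (1000 + 100 * np.random.standard_normal(sh)).astype(np.float32)
gain = (5 + np.random.standard_normal(sh)).astype(np.float32)
mask = (np.random.random(sh) < 0.9).astype(np.uint8)

out = np.empty(sh, dtype=np.float32)  # allocated once, before the event loop
np.copyto(out, raw & M14)             # keep data bits, cast uint16 -> float32
np.subtract(out, peds, out=out)       # pedestals, in place
np.multiply(out, gain, out=out)       # gain factors, in place
np.multiply(out, mask, out=out)       # mask, in place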

Numpy vectorization

numpy vectorization code
def myfunc(a, p, g):
    return (a - p) * g

uf = np.frompyfunc(myfunc, 3, 1)  # ufunc-like wrapper; calls myfunc per element in python, returns object dtype

vf = np.vectorize(myfunc)  # convenience wrapper; also loops over elements in python


        elif CALIBMET == SIM3:
            arrf = vf(arrf.ravel(), peds.ravel(), gain.ravel())

        elif CALIBMET == SIM4:
            arrf = uf(arrf.ravel(), peds.ravel(), gain.ravel())

ps-4.6.3 [dubrovin@sdfmilan090:~/LCLS/con-lcls2]$  lcls2/psana/psana/detector/testman/test-scaling-mpi-epix10ka.py 83

dt, ms:  t00       t01       t02       t03

medi:  649.1894  217.1867    1.0426  428.8200

medi:  641.9865  217.1310    1.1019  419.4318

medi:  650.3067  216.7441    1.0673  431.5038

ps-4.6.3 [dubrovin@sdfmilan090:~/LCLS/con-lcls2]$  lcls2/psana/psana/detector/testman/test-scaling-mpi-epix10ka.py 84

dt, ms:  t00       t01       t02       t03

medi:  506.2655  216.1238   18.0372  273.3681

medi:  505.9680  215.6790   18.2364  271.6496

medi:  505.0484  214.7954   18.1587  272.0717

dt       operation                                            1st try, ms  2nd try  3rd try
dt3(83)  arrf = vf(arrf.ravel(), peds.ravel(), gain.ravel())          429      419      432
dt3(84)  arrf = uf(arrf.ravel(), peds.ravel(), gain.ravel())          273      272      272

Cythonized/pythonized C++

code for Cythonized/pythonized C++
In C++:
=======
// fraw_t, peds_t, gain_t, mask_t are typedefs for the raw/constants types (cf. UtilsDetector.hh below)
void calib_std(const fraw_t *raw, const peds_t *peds, const gain_t *gain, const mask_t *mask, const size_t& size, fraw_t *out)
{
  for (size_t i=0; i<size; ++i) {
    out[i] = mask[i]>0 ? (raw[i] - peds[i])*gain[i] : 0;
  }
}

In python:
==========
        elif CALIBMET == SIM5:
            ud.calib_std(arrf, peds, gain, mask, arrf)

dt, ms:  t00       t01       t02       t03

medi:  221.0429  216.8013    0.6830    3.3973

medi:  232.0428  224.8044    1.6861    3.5982

medi:  222.6292  218.0824    0.6918    3.4066

dt       operation                                   1st try, ms  2nd try  3rd try
dt3(85)  ud.calib_std(arrf, peds, gain, mask, arrf)          3.4      3.6      3.4


2024-09-30 simulation c++ vs cython vs numpy

Test description

Chris suggested a test of how much time calib-like code consumes in C++.

Modifications:

  • Fix types in malloc
  • Types brought to consistency in all tests:
    • uint16_t* raw
    • uint8_t* mask
  • Add/use M14 in C++: *raw & M14
  • 16*352*352 → 16*352*384
  • #define NEVENT 500
C++ code example from Chris
// g++ -O3 -o test_cpo -g test_cpo.cc

#define EVENTS 500
#define SIZE 16*352*384
#define M14 0x3fff  // 16383 or (1<<14)-1 - 14-bit mask

#include <stdint.h>
#include <stdlib.h>
#include <chrono>
#include <iostream>
#include <cstdint>  // uint8_t

void calibrate(uint16_t* raw, uint8_t* mask, float* gain, float* ped, float* result) {
  uint16_t* end = raw+SIZE;
  while (raw<end) {
    *result = ((*raw & M14) - *ped)*(*gain)*(*mask);
    raw++; ped++; gain++; mask++; result++;
  }
}

int main() {

  uint16_t* raw = (uint16_t*)malloc(EVENTS*SIZE*sizeof(uint16_t));
  uint8_t* mask = (uint8_t*)malloc(SIZE*sizeof(uint8_t));
  float* result = (float*)malloc(SIZE*sizeof(float));
  float* ped = (float*)malloc(SIZE*sizeof(float));
  float* gain = (float*)malloc(SIZE*sizeof(float));

  for (int i=0; i<EVENTS*SIZE; i++) {
    raw[i]=1234;
  }

  for (int i=0; i<SIZE; i++) {
    mask[i]=1;
    //result[i]=0.0;
    ped[i]=1233.1;
    gain[i]=1.234;
  }

  std::chrono::steady_clock::time_point begin = std::chrono::steady_clock::now();

  for (int i=0; i<EVENTS; i++) {
    calibrate(raw+i*SIZE, mask, gain, ped, result);
  }

  std::chrono::steady_clock::time_point end = std::chrono::steady_clock::now();

  std::cout << "Time per event = " << (std::chrono::duration_cast<std::chrono::microseconds>(end - begin).count())/EVENTS << "[us]" << std::endl;
}

My test differs slightly

  • use random numbers to fill the arrays for data and constants
C++ example with correct data types and random generators
// g++ -O3 -o test_calib_sim test_calib_sim.cc

// srun --partition milano --account lcls:prjdat21 -n 1 --time=05:00:00 --exclusive --pty /bin/bash
// normal_distribution
//#include <iostream>
#include <string>
#include <random>
#include <chrono>//  time
#include <iomanip>
#include <iostream>

#define time_t std::chrono::steady_clock::time_point
#define time_now std::chrono::steady_clock::now
#define duration_us std::chrono::duration_cast<std::chrono::microseconds>

...
#include <stdint.h>
#include <stdlib.h>
#include <cstdint>  // uint8_t
#define PSIZE 2162688 // 16*352*384
#define EVENTS 500
#define M14 0x3fff  // 16383 or (1<<14)-1 - 14-bit mask

//#define RAWD_T float
#define RAWD_T uint16_t
#define MASK_T uint8_t
#define GAIN_T float
#define PEDS_T float
#define REST_T float

void calib(RAWD_T* raw, MASK_T* mask, GAIN_T* gain, PEDS_T* ped, REST_T* res) {
  RAWD_T* end = raw+PSIZE;
  while (raw<end) {
    *res = ((*raw & M14) - *ped)*(*gain)*(*mask);
     raw++; ped++; gain++; mask++; res++;
  }
}

void test_calib_simulation()
{
  //constants
  //RAWD_T rawd[EVENTS][PSIZE];

  time_t t0 = time_now();

  RAWD_T* rawd = (RAWD_T*)malloc(EVENTS*PSIZE*sizeof(RAWD_T));
  MASK_T* mask = (MASK_T*)malloc(PSIZE*sizeof(MASK_T));
  REST_T* rest = (REST_T*)malloc(PSIZE*sizeof(REST_T));
  PEDS_T* peds = (PEDS_T*)malloc(PSIZE*sizeof(PEDS_T));
  GAIN_T* gain = (GAIN_T*)malloc(PSIZE*sizeof(GAIN_T));

  std::cout << "time for malloc: " << duration_us(time_now() - t0).count() << " us" << std::endl;

  t0 = time_now();
  standard_normal_array<RAWD_T>(1000., 10., PSIZE*EVENTS, rawd);
  standard_normal_array<PEDS_T>(1000., 10., PSIZE, peds);
  standard_normal_array<GAIN_T>(20., 1., PSIZE, gain);
  random_array_0or1<MASK_T>(0.9, PSIZE, mask);

  std::cout << "time for random data and constants: " << duration_us(time_now() - t0).count() << " us" << std::endl;

  //std::cout << "\nrawd: "; for (int i=0; i<10; i++){std::cout << rawd[0][i] << " ";}
  std::cout << "\nrawd: "; for (int i=0; i<10; i++){std::cout << rawd[i] << " ";}
  std::cout << "\npeds: "; for (int i=0; i<10; i++){std::cout << peds[i] << " ";}
  std::cout << "\ngain: "; for (int i=0; i<10; i++){std::cout << gain[i] << " ";}
  std::cout << "\nmask: "; for (int i=0; i<10; i++){std::cout << unsigned(mask[i]) << " ";}
  std::cout << std::endl;

  std::cout << "events: " << std::to_string(EVENTS) << " panel size:" << std::to_string(PSIZE) << std::endl;
  t0 = time_now();
  for (int i=0; i<EVENTS; i++){
     calib(rawd+i*PSIZE, mask, gain, peds, rest);
  }
  std::cout << "time per event: " << duration_us(time_now() - t0).count()/EVENTS << " us" << std::endl;
}

Remarks to discuss

1. There is a difference between the two memory allocations

    • RAWD_T rawd[EVENTS][PSIZE]; - automatic (stack) array, far too large for the stack at this size
    • RAWD_T* rawd = (RAWD_T*)malloc(EVENTS*PSIZE*sizeof(RAWD_T)); - heap allocation

2. In Chris' example, increasing #define NEVENT 1000 to 2000 causes

test_cpo.cc:23:36: warning: argument 1 value ‘18446744073049473024’ exceeds maximum object size 9223372036854775807 [-Walloc-size-larger-than=]
   uint16_t* raw = (uint16_t*)malloc(NEVENT*SHAPE*sizeof(uint16_t));
                              ~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

most likely because NEVENT*SHAPE is evaluated in 32-bit int arithmetic and overflows; the wrapped negative value is sign-extended to the huge size_t seen in the warning. Computing the size as (size_t)NEVENT*SHAPE*sizeof(uint16_t) avoids the overflow.

3. Chris uses uint16_t for raw; in real data processing we have to use 32-bit floats, which doubles the memory footprint - that is why in my test the number of events is decreased to a half:

#define EVENTS 500

Test results

ps-4.6.3 [dubrovin@sdfiana003:~/LCLS/con-lcls2/lcls2]$ srun --partition milano --account lcls:prjdat21 -n 1 --time=05:00:00 --exclusive --pty /bin/bash

In other window:

ps-4.6.3 [dubrovin@sdfiana003:~/LCLS/con-lcls2/lcls2/psana/psana/pycalgos]$ ssh -Y sdfmilan108

cd ~/LCLS/con-lcls2/lcls2/psana/psana/pycalgos/

Tests

  1. ./test_cpo
  2. ./test_calib_sim
  3. ../detector/testman/test-scaling-mpi-epix10ka.py 80 # t9 stands for the numpy-array operation ((raw & M14) - peds) * gain * mask
  4. ../detector/testman/test-scaling-mpi-epix10ka.py 85 # t3 stands for the call of the cythonized/pythonized C++ calib_std(raw, peds, gain, mask, databits, out)


ps-4.6.3 [dubrovin@sdfmilan108:~/LCLS/con-lcls2/lcls2/psana/psana/pycalgos]$ ./test_calib_sim
time for malloc: 39 us
time for random data and constants: 52810299 us

rawd: 1007 990 1003 1008 1007 996 1000 1010 994 996
peds: 991.406 1004.71 1008.58 989.066 1012.95 981.918 989.906 996.518 1020.47 1007.59
gain: 19.7885 20.2578 19.5383 21.1444 20.1666 20.0037 18.3353 20.5157 19.5657 20.3046
mask: 1 1 1 1 1 1 0 1 1 1
events: 500 panel size:2162688
time per event: 623 us


ps-4.6.3 [dubrovin@sdfmilan108:~/LCLS/con-lcls2/lcls2/psana/psana/pycalgos]$ ../detector/testman/test-scaling-mpi-epix10ka.py 80

t09 for numpy arrays: (arrf - peds) * gain * mask

If all arrays are generated in advance, before the event loop

medi:    3.1800    0.0005    0.0002    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    3.1785

medi:    2.9902    0.0005    0.0002    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    2.9888

medi:    3.8408    0.0005    0.0002    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    3.8399


per event times:

493      1.9538    0.0005    0.0002    0.0000    0.0000    0.0002    0.0000    0.0000    0.0000    1.9529
494      3.8667    0.0005    0.0002    0.0000    0.0000    0.0002    0.0000    0.0000    0.0000    3.8657
495      1.9505    0.0005    0.0005    0.0000    0.0000    0.0002    0.0000    0.0000    0.0000    1.9493
496      3.8567    0.0005    0.0005    0.0000    0.0000    0.0000    0.0002    0.0000    0.0000    3.8555
497      1.9560    0.0007    0.0002    0.0000    0.0000    0.0000    0.0002    0.0000    0.0000    1.9548
498      3.8526    0.0005    0.0000    0.0002    0.0000    0.0000    0.0000    0.0000    0.0002    3.8517
499      1.9510    0.0005    0.0002    0.0000    0.0002    0.0000    0.0000    0.0000    0.0000    1.9500


ps-4.6.3 [dubrovin@sdfmilan108:~/LCLS/con-lcls2/lcls2/psana/psana/pycalgos]$ ../detector/testman/test-scaling-mpi-epix10ka.py 85

t03  for cythonized/pythonized C++ calib_std(raw, peds, gain, mask, out)

c++ code of calib_std
UtilsDetector.hh:
=================
typedef uint16_t rawd_t;
typedef float    peds_t;
typedef float    gain_t;
typedef float    out_t;
typedef uint8_t  mask_t;

UtilsDetector.cc:
=================
time_t calib_std(const rawd_t *raw, const peds_t *peds, const gain_t *gain, const mask_t *mask, const size_t& size, const rawd_t databits, out_t *out)
{
  const rawd_t *r = raw;
  const peds_t *p = peds;
  const gain_t *g = gain;
  const mask_t *m = mask;
  out_t  *o = out;
  //const rawd_t* end = raw+size;
  time_point_t t0 = time_now();
  while (r<raw+size) {
    *o++ = ((*r++ & databits) - *p++)*(*g++)*(*m++);
    //r++; p++; g++; m++; o++;
  }
  return duration_us(time_now() - t0).count();
}


Per-event time, μs, consumed in C++, cython, and python, respectively:

dt_us_cpp: 1079.0, dt_us_cy: 1094.6, dt_us_py: 1100.3

dt_us_cpp: 904.0, dt_us_cy: 918.6, dt_us_py: 923.6

dt_us_cpp: 1091.0, dt_us_cy: 1107.2, dt_us_py: 1112.5


If all arrays are generated in advance, before the event loop

medi:    0.6473    0.0002    0.0002    0.6466

medi:    0.6249    0.0002    0.0002    0.6244

medi:    0.6549    0.0005    0.0002    0.6542

Tests on sdfmilan108:

test   test description                                   1st try, time per event, μs  2nd try  3rd try  comment
1      ./test_cpo                                                                 611      620      613  entire loop in C++
2      ./test_calib_sim                                                           623      666      604  the same as #1, but with random numbers
3/80   numpy: arrf = ((raw & M14) - peds) * gain * mask                          3176     2989     3839  random arrays generated before the event loop
4/85   ud.calib_std(raw, peds, gain, mask, M14, arrf)                             646      624      654  measurement shows that 98% of this time is in C++

2024-10-03 tests with mpi

test in c++ without MPI

NO MPI: g++ and ./test_calib_sim
g++ -O3 -o test_calib_sim test_calib_sim.cc

ps-4.6.3 [dubrovin@sdfmilan021:~/LCLS/con-lcls2/lcls2/psana/psana/pycalgos]$ ./test_calib_sim
argc:1 argv[0]:./test_calib_sim
time for malloc: 15 us
test_calib_simulation: time for random data and constants: 53411244 us

rawd: 999 994 986 998 1004 1005 1005 998 983 1016
peds: 1006.53 991.995 991.25 1010.08 1029.81 1008.97 1004.68 1004.46 1006.12 1009.98
gain: 21.8439 18.5639 20.2683 19.0008 20.4222 19.3143 18.923 20.3292 21.6366 19.7765
mask: 1 1 0 1 1 1 1 0 1 1
events: 500 panel size:2162688
time per event: 694 us

adding mpi

code for mpi in c++
#include <mpi.h>  
int rank=0;
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
std::cout << "rank: " << rank << std::endl;
...
MPI_Finalize();
mpic++ and ./test_calib_sim 2
ps-4.6.3 [dubrovin@sdfmilan021:~/LCLS/con-lcls2/lcls2/psana/psana/pycalgos]$ mpic++ -O3 -o test_calib_sim test_calib_sim.cc
ps-4.6.3 [dubrovin@sdfmilan021:~/LCLS/con-lcls2/lcls2/psana/psana/pycalgos]$ ./test_calib_sim 2
argc:2 argv[0]:./test_calib_sim
rank: 0
test_calib_simulation_mpi time for malloc: 20 us
time for random data and constants: 32500309 us

rawd: 1011 995 997 994 1004 996 1003 989 1022 985
peds: 1005.32 997.622 1005.59 1002.57 1001.05 1004.12 1006.53 989.399 1004.91 990.733
gain: 19.7067 21.0016 18.7361 19.4826 19.644 19.9969 20.044 19.4837 19.7477 23.48
mask: 1 1 1 1 1 1 1 1 1 1
events: 500 panel size:2162688
rank: 0 time per event: 1229 us

mpirun -n 4 test_calib_sim 2
ps-4.6.3 [dubrovin@sdfmilan021:~/LCLS/con-lcls2/lcls2/psana/psana/pycalgos]$ mpirun -n 4 test_calib_sim 2
argc:2 argv[0]:test_calib_sim
argc:2 argv[0]:test_calib_sim
argc:2 argv[0]:test_calib_sim
argc:2 argv[0]:test_calib_sim
rank: 0
test_calib_simulation_mpi time for malloc: 22 us
rank: 1
test_calib_simulation_mpi time for malloc: 22 us
rank: 2
test_calib_simulation_mpi time for malloc: 22 us
rank: 3
test_calib_simulation_mpi time for malloc: 24 us
time for random data and constants: 32006325 us

rawd: 1013 994 994 988 1011 987 1003 994 1004 997
peds: 996.304 1012.87 993.041 992.972 1017.44 1013.74 1017.16 997.31 1020.95 988.039
gain: 18.7863 19.9596 21.8928 16.6827 18.23 21.1276 20.088 20.9273 19.5633 22.1232
mask: 1 1 1 1 0 1 1 1 1 1
events: 500 panel size:2162688
rank: 1 time per event: 587 us
time for random data and constants: 32378824 us

rawd: 999 996 1002 997 998 1025 988 1021 1004 1004
peds: 1008.2 1010.15 999.783 999.987 1013.67 1012.62 982.788 1003.99 1006.88 1003.11
gain: 21.5295 19.4924 19.7031 21.7397 19.3176 20.5381 19.1976 18.9456 19.6323 16.7032
mask: 1 1 1 1 1 1 1 1 1 1
events: 500 panel size:2162688
time for random data and constants: 32410477 us

rawd: 996 1026 980 1002 1013 985 996 995 990 995
peds: 1017.01 998.566 1006.81 1020.24 1006.75 1010.22 1002.24 1006.99 1021.96 998.486
gain: 19.7089 18.5243 19.0248 19.4491 21.5207 19.8621 17.9884 20.2176 20.1161 19.549
mask: 1 1 1 1 1 1 1 1 1 1
events: 500 panel size:2162688
time for random data and constants: 32421671 us

rawd: 986 996 1016 993 1000 1008 1021 1006 993 1009
peds: 995.158 993.944 1001.34 1019.85 1002.45 1014.37 1000.41 1012.82 1018.91 1007.91
gain: 18.1761 21.0744 19.895 20.0998 21.2834 19.5609 19.8797 20.4085 19.8719 19.3744
mask: 1 1 1 1 1 1 1 1 1 1
events: 500 panel size:2162688
rank: 0 time per event: 1260 us
rank: 3 time per event: 1309 us
rank: 2 time per event: 1295 us

2024-10-23 tests with mpi

Code

method calib
#define PSIZE 16*352*384 // // 100000 // 2162688 = 16*352*384
#define M14 0x3fff  // 16383 or (1<<14)-1 - 14-bit mask

#define RAWD_T uint16_t
#define MASK_T uint8_t
#define GAIN_T float
#define PEDS_T float
#define REST_T float

void calib(RAWD_T* raw, MASK_T* mask, GAIN_T* gain, PEDS_T* ped, REST_T* res) {
  RAWD_T* end = raw+PSIZE;
  while (raw<end) {
    *res = ((*raw & M14) - *ped)*(*gain)*(*mask);
     raw++; ped++; gain++; mask++; res++;
  }
}
method test_calib_simulation_mpi
#define EVENTS 500

void test_calib_simulation_mpi(int argc, char* argv[])
{
  int icpu = sched_getcpu();
  std::stringstream sscpu; sscpu << "cpu-" << std::setfill('0') << std::setw(3) << std::right << icpu;
  std::string scpu = sscpu.str();

  double times_s[EVENTS];
  double durats_us[EVENTS];
  time_t t0 = time_now();

  RAWD_T* rawd = (RAWD_T*)malloc(EVENTS*PSIZE*sizeof(RAWD_T));
  MASK_T* mask = (MASK_T*)malloc(PSIZE*sizeof(MASK_T));
  REST_T* rest = (REST_T*)malloc(PSIZE*sizeof(REST_T));
  PEDS_T* peds = (PEDS_T*)malloc(PSIZE*sizeof(PEDS_T));
  GAIN_T* gain = (GAIN_T*)malloc(PSIZE*sizeof(GAIN_T));

  std::cout << scpu
	    << " test_calib_simulation_mpi time for malloc: "
	    << duration_us(time_now() - t0).count() << " us" << std::endl;

  t0 = time_now();
  standard_normal_array<RAWD_T>(1000., 10., PSIZE*EVENTS, rawd);
  standard_normal_array<PEDS_T>(1000., 10., PSIZE, peds);
  standard_normal_array<GAIN_T>(20., 1., PSIZE, gain);
  random_array_0or1<MASK_T>(0.9, PSIZE, mask);

  if (icpu == 0){
    std::cout << scpu << " time for random data and constants: "
	      << duration_us(time_now() - t0).count() << " us";
    std::cout << "\n  rawd: "; for (int i=0; i<10; i++){std::cout << rawd[i] << " ";}
    std::cout << "\n  peds: "; for (int i=0; i<10; i++){std::cout << peds[i] << " ";}
    std::cout << "\n  gain: "; for (int i=0; i<10; i++){std::cout << gain[i] << " ";}
    std::cout << "\n  mask: "; for (int i=0; i<10; i++){std::cout << unsigned(mask[i]) << " ";}
    std::cout << scpu << " events: " << std::to_string(EVENTS)
	      << " panel size:" << std::to_string(PSIZE) << std::endl;
  }

  struct timespec tbeg, tcur;
  int status = clock_gettime(CLOCK_REALTIME, &tbeg);
  time_t tt0 = time_now();

  for (int i=0; i<EVENTS; i++){
     status = clock_gettime(CLOCK_REALTIME, &tcur);
     t0 = time_now();
     calib(rawd+i*PSIZE, mask, gain, peds, rest);
     durats_us[i] = duration_us(time_now() - t0).count();
     times_s[i] = time_sec(tcur) - TOFSET;
  }

  std::cout << scpu << " time per event: " << duration_us(time_now() - tt0).count()/EVENTS << " us" << std::endl;

  //std::stringstream fname; fname << "results-" << scpu << "-v80.txt";
  std::string version = (argc>2)? argv[2] : "vXX";
  std::stringstream fname; fname << "results-" << scpu << '-' << version << ".txt";
  std::cout << "save file: " <<  fname.str() << std::endl;

  std::ofstream ofile;
  ofile.open(fname.str());

  for (int i=0; i<EVENTS; i++){
     ofile << std::setw(3) << std::right << i;
     ofile << std::fixed
               << std::setprecision(6);
     ofile << " t,s:" << std::setw(14) << times_s[i];
     ofile << " dt,us: " << std::setprecision(0) << std::setw(10) << durats_us[i] << std::endl;
  }
  ofile << '\n' << scpu << " time per event: " << duration_us(time_now() - tt0).count()/EVENTS << " us" << std::endl;
  ofile << "begin event loop time_since_epoch, sec: "
	<< std::setw(14) << std::setprecision(3)
	<< time_sec(tbeg)
        << " offset: " << TOFSET << std::endl;
  ofile.close();
}
Compile, reserve node, run with mpi
g++ -O3 -o test_calib_sim test_calib_sim.cc

srun --partition milano --account lcls:prjdat21 -n 128 --time=05:00:00 --exclusive --pty /bin/bash

mpirun -n  4 test_calib_sim 2 v04
mpirun -n  8 test_calib_sim 2 v08
mpirun -n 16 test_calib_sim 2 v16
mpirun -n 32 test_calib_sim 2 v32
mpirun -n 64 test_calib_sim 2 v64
mpirun -n 96 test_calib_sim 2 v96
mpirun -n 80 test_calib_sim 2 v80

Results

ps-4.6.3 [dubrovin@sdfmilan026:~/LCLS/con-lcls2/2024-10-23-test-calib-mpi]$ mpirun -n  4 ../lcls2/psana/psana/pycalgos/test_calib_sim 2 v04

cpu-048 test_calib_simulation_mpi time for malloc: 18 us
cpu-000 time for random data and constants: 53567294 us
  rawd: 1003 1008 996 1011 997 1000 1013 997 1005 980 
  peds: 993.482 1003.47 991.595 1008.37 1000.43 1019.13 1012.39 1009.26 991.405 1006.49 
  gain: 20.3113 19.8855 21.0679 21.0502 19.5256 20.4852 19.504 21.0347 17.7494 19.3758 
  mask: 1 1 1 1 1 1 1 1 1 1 cpu-000 events: 500 panel size:2162688
cpu-000 time per event, us: 710 us
save file: results-cpu-000-v04.txt
cpu-032 time per event, us: 845 us
save file: results-cpu-032-v04.txt
cpu-025 time per event, us: 675 us
save file: results-cpu-025-v04.txt
cpu-048 time per event, us: 1080 us
save file: results-cpu-048-v04.txt

file with times per event
ps-4.6.3 [dubrovin@sdfmilan026:~/LCLS/con-lcls2/2024-10-23-test-calib-mpi]$ tail -20 results-cpu-000-v04.txt
483 t,s:  24885.980090 dt,us:        719
484 t,s:  24885.980809 dt,us:        713
485 t,s:  24885.981523 dt,us:        709
486 t,s:  24885.982233 dt,us:        717
487 t,s:  24885.982950 dt,us:        715
488 t,s:  24885.983666 dt,us:        718
489 t,s:  24885.984385 dt,us:        706
490 t,s:  24885.985092 dt,us:        713
491 t,s:  24885.985806 dt,us:        729
492 t,s:  24885.986535 dt,us:        720
493 t,s:  24885.987255 dt,us:        715
494 t,s:  24885.987971 dt,us:        708
495 t,s:  24885.988679 dt,us:        707
496 t,s:  24885.989387 dt,us:        691
497 t,s:  24885.990079 dt,us:        700
498 t,s:  24885.990779 dt,us:        704
499 t,s:  24885.991484 dt,us:        692

cpu-000 time per event, us: 710 us
begin event loop time_since_epoch, sec: 1729724885.637 offset: 1729700000


ps-4.6.3 [dubrovin@sdfmilan026:~/LCLS/con-lcls2/2024-10-23-test-calib-mpi]$ mpirun -n  8 ../lcls2/psana/psana/pycalgos/test_calib_sim 2 v08

cpu-112 test_calib_simulation_mpi time for malloc: 22 us
cpu-000 time for random data and constants: 53597532 us
  rawd: 974 996 1001 1006 993 1005 1008 1002 1010 1006 
  peds: 987.812 1005.22 984.196 1008.19 993.455 987.517 1007.64 1020.04 1015.84 986.13 
  gain: 21.2333 20.5797 20.0341 21.4536 20.5128 18.5795 21.5463 20.1028 20.5724 19.9281 
  mask: 1 1 1 1 1 1 1 1 0 1 cpu-000 events: 500 panel size:2162688
cpu-025 time per event, us: 666 us
save file: results-cpu-025-v08.txt
cpu-096 time per event, us: 692 us
save file: results-cpu-096-v08.txt
cpu-080 time per event, us: 685 us
save file: results-cpu-080-v08.txt
cpu-112 time per event, us: 703 us
save file: results-cpu-112-v08.txt
cpu-032 time per event, us: 865 us
save file: results-cpu-032-v08.txt
cpu-000 time per event, us: 746 us
save file: results-cpu-000-v08.txt
cpu-079 time per event, us: 681 us
save file: results-cpu-079-v08.txt
cpu-048 time per event, us: 1124 us
save file: results-cpu-048-v08.txt


ps-4.6.3 [dubrovin@sdfmilan026:~/LCLS/con-lcls2/2024-10-23-test-calib-mpi]$ mpirun -n 96 ../lcls2/psana/psana/pycalgos/test_calib_sim 2 v96

argc:3 argv[0]:../lcls2/psana/psana/pycalgos/test_calib_sim
cpu-097 test_calib_simulation_mpi time for malloc: 34 us
argc:3 argv[0]:../lcls2/psana/psana/pycalgos/test_calib_sim
cpu-124 test_calib_simulation_mpi time for malloc: 18 us
cpu-000 time for random data and constants: 60955768 us
  rawd: 1007 993 983 988 1004 988 1017 991 993 1016 
  peds: 1003.6 1005.69 991.574 989.046 1004.69 1001.21 1001.52 998.964 1010.3 1013.41 
  gain: 19.2027 18.2575 20.0576 21.4993 19.7058 18.8875 20.3433 18.5005 18.1918 19.6954 
  mask: 1 1 1 1 0 1 0 1 1 1 cpu-000 events: 500 panel size:2162688
cpu-108 time per event, us: 12330 us
save file: results-cpu-108-v96.txt
cpu-064 time per event, us: 12940 us
save file: results-cpu-064-v96.txt
cpu-071 time per event, us: 12898 us
save file: results-cpu-071-v96.txt
cpu-074 time per event, us: 14708 us
save file: results-cpu-074-v96.txt
cpu-066 time per event, us: 14627 us
save file: results-cpu-066-v96.txt
cpu-096 time per event, us: 14484 us
save file: results-cpu-096-v96.txt
cpu-079 time per event, us: 14358 us
save file: results-cpu-079-v96.txt
cpu-065 time per event, us: 15051 us
save file: results-cpu-065-v96.txt
cpu-076 time per event, us: 15297 us
save file: results-cpu-076-v96.txt
cpu-068 time per event, us: 15289 us
save file: results-cpu-068-v96.txt
cpu-067 time per event, us: 15317 us
save file: results-cpu-067-v96.txt
cpu-110 time per event, us: 15419 us
save file: results-cpu-110-v96.txt
cpu-072 time per event, us: 16174 us
save file: results-cpu-072-v96.txt
cpu-106 time per event, us: 16031 us
save file: results-cpu-106-v96.txt
cpu-101 time per event, us: 16071 us
save file: results-cpu-101-v96.txt
cpu-111 time per event, us: 16039 us
save file: results-cpu-111-v96.txt
cpu-107 time per event, us: 16107 us
save file: results-cpu-107-v96.txt
cpu-077 time per event, us: 16490 us
save file: results-cpu-077-v96.txt
cpu-075 time per event, us: 16322 us
save file: results-cpu-075-v96.txt
cpu-102 time per event, us: 16870 us
save file: results-cpu-102-v96.txt
cpu-097 time per event, us: 16886 us
save file: results-cpu-097-v96.txt
cpu-098 time per event, us: 17044 us
save file: results-cpu-098-v96.txt
cpu-103 time per event, us: 17060 us
save file: results-cpu-103-v96.txt
cpu-099 time per event, us: 17098 us
save file: results-cpu-099-v96.txt
cpu-087 time per event, us: 16991 us
save file: results-cpu-087-v96.txt
cpu-089 time per event, us: 17257 us
save file: results-cpu-089-v96.txt
cpu-094 time per event, us: 19119 us
save file: results-cpu-094-v96.txt
cpu-086 time per event, us: 19207 us
save file: results-cpu-086-v96.txt
cpu-091 time per event, us: 19111 us
save file: results-cpu-091-v96.txt
cpu-080 time per event, us: 19416 us
save file: results-cpu-080-v96.txt
cpu-090 time per event, us: 19075 us
save file: results-cpu-090-v96.txt
cpu-088 time per event, us: 19535 us
save file: results-cpu-088-v96.txt
cpu-083 time per event, us: 19476 us
save file: results-cpu-083-v96.txt
cpu-118 time per event, us: 19930 us
save file: results-cpu-118-v96.txt
cpu-113 time per event, us: 20147 us
save file: results-cpu-113-v96.txt
cpu-112 time per event, us: 20181 us
save file: results-cpu-112-v96.txt
cpu-082 time per event, us: 19911 us
save file: results-cpu-082-v96.txt
cpu-116 time per event, us: 19649 us
save file: results-cpu-116-v96.txt
cpu-081 time per event, us: 19212 us
save file: results-cpu-081-v96.txt
cpu-121 time per event, us: 20267 us
save file: results-cpu-121-v96.txt
cpu-095 time per event, us: 19337 us
save file: results-cpu-095-v96.txt
cpu-125 time per event, us: 20579 us
save file: results-cpu-125-v96.txt
cpu-126 time per event, us: 20600 us
save file: results-cpu-126-v96.txt
cpu-127 time per event, us: 20653 us
save file: results-cpu-127-v96.txt
cpu-124 time per event, us: 20103 us
save file: results-cpu-124-v96.txt
cpu-117 time per event, us: 20687 us
save file: results-cpu-117-v96.txt
cpu-115 time per event, us: 20693 us
save file: results-cpu-115-v96.txt
cpu-122 time per event, us: 20619 us
save file: results-cpu-122-v96.txt
cpu-000 time per event, us: 15578 us
save file: results-cpu-000-v96.txt
cpu-014 time per event, us: 16207 us
save file: results-cpu-014-v96.txt
cpu-048 time per event, us: 19661 us
save file: results-cpu-048-v96.txt
cpu-060 time per event, us: 19429 us
save file: results-cpu-060-v96.txt
cpu-049 time per event, us: 20020 us
save file: results-cpu-049-v96.txt
cpu-030 time per event, us: 19916 us
save file: results-cpu-030-v96.txt
cpu-009 time per event, us: 19936 us
save file: results-cpu-009-v96.txt
cpu-026 time per event, us: 19812 us
save file: results-cpu-026-v96.txt
cpu-008 time per event, us: 20140 us
save file: results-cpu-008-v96.txt
cpu-025 time per event, us: 20272 us
save file: results-cpu-025-v96.txt
cpu-028 time per event, us: 20241 us
save file: results-cpu-028-v96.txt
cpu-013 time per event, us: 20422 us
save file: results-cpu-013-v96.txt
cpu-031 time per event, us: 20138 us
save file: results-cpu-031-v96.txt
cpu-058 time per event, us: 20777 us
save file: results-cpu-058-v96.txt
cpu-027 time per event, us: 20380 us
save file: results-cpu-027-v96.txt
cpu-004 time per event, us: 20405 us
save file: results-cpu-004-v96.txt
cpu-029 time per event, us: 20460 us
save file: results-cpu-029-v96.txt
cpu-011 time per event, us: 20628 us
save file: results-cpu-011-v96.txt
cpu-015 time per event, us: 20969 us
save file: results-cpu-015-v96.txt
cpu-024 time per event, us: 19811 us
save file: results-cpu-024-v96.txt
cpu-003 time per event, us: 21001 us
save file: results-cpu-003-v96.txt
cpu-002 time per event, us: 21133 us
save file: results-cpu-002-v96.txt
cpu-007 time per event, us: 21245 us
save file: results-cpu-007-v96.txt
cpu-001 time per event, us: 21316 us
save file: results-cpu-001-v96.txt
cpu-037 time per event, us: 20374 us
save file: results-cpu-037-v96.txt
cpu-054 time per event, us: 21694 us
save file: results-cpu-054-v96.txt
cpu-050 time per event, us: 21793 us
save file: results-cpu-050-v96.txt
cpu-034 time per event, us: 22151 us
save file: results-cpu-034-v96.txt
cpu-061 time per event, us: 22809 us
save file: results-cpu-061-v96.txt
cpu-059 time per event, us: 22978 us
save file: results-cpu-059-v96.txt
cpu-056 time per event, us: 23415 us
save file: results-cpu-056-v96.txt
cpu-057 time per event, us: 23422 us
save file: results-cpu-057-v96.txt
cpu-055 time per event, us: 23598 us
save file: results-cpu-055-v96.txt
cpu-052 time per event, us: 23598 us
save file: results-cpu-052-v96.txt
cpu-047 time per event, us: 25059 us
save file: results-cpu-047-v96.txt
cpu-032 time per event, us: 24526 us
save file: results-cpu-032-v96.txt
cpu-042 time per event, us: 25774 us
save file: results-cpu-042-v96.txt
cpu-038 time per event, us: 25807 us
save file: results-cpu-038-v96.txt
cpu-043 time per event, us: 25756 us
save file: results-cpu-043-v96.txt
cpu-033 time per event, us: 25837 us
save file: results-cpu-033-v96.txt
cpu-040 time per event, us: 25531 us
save file: results-cpu-040-v96.txt
cpu-036 time per event, us: 25541 us
save file: results-cpu-036-v96.txt
cpu-046 time per event, us: 25871 us
save file: results-cpu-046-v96.txt
cpu-041 time per event, us: 24943 us
save file: results-cpu-041-v96.txt
cpu-045 time per event, us: 26104 us
save file: results-cpu-045-v96.txt
cpu-039 time per event, us: 25723 us
save file: results-cpu-039-v96.txt
cpu-044 time per event, us: 25928 us
save file: results-cpu-044-v96.txt
cpu-035 time per event, us: 25595 us
save file: results-cpu-035-v96.txt

file with times per event
ps-4.6.3 [dubrovin@sdfmilan026:~/LCLS/con-lcls2/2024-10-23-test-calib-mpi]$ tail -20 results-cpu-035-v96.txt
483 t,s:  25419.392541 dt,us:       6775
484 t,s:  25419.399317 dt,us:       6086
485 t,s:  25419.405404 dt,us:       6654
486 t,s:  25419.412059 dt,us:       4871
487 t,s:  25419.416931 dt,us:       4087
488 t,s:  25419.421018 dt,us:       3582
489 t,s:  25419.424601 dt,us:       3766
490 t,s:  25419.428367 dt,us:       3624
491 t,s:  25419.431992 dt,us:       3544
492 t,s:  25419.435537 dt,us:       2762
493 t,s:  25419.438299 dt,us:       2159
494 t,s:  25419.440459 dt,us:       2348
495 t,s:  25419.442807 dt,us:       1916
496 t,s:  25419.444725 dt,us:       1201
497 t,s:  25419.445926 dt,us:       1234
498 t,s:  25419.447161 dt,us:       2814
499 t,s:  25419.449976 dt,us:       1534

cpu-035 time per event, us: 25595 us
begin event loop time_since_epoch, sec: 1729725406.654 offset: 1729700000

Start-stop time

Plots show start (blue) and stop (red) times along the x axis vs cpu index along the y axis for mpirun with 4, 8, 16, 32, 64, 80, and 96 cpus.

Each cpu job generates its own random arrays of constants and data for 500 events and processes them with the calib method.

Results


v04 tmed_sel: 701 us

v08 tmed_sel: 684 us

v16 tmed_sel: 1106 us

v32 tmed_sel: 5488 us

v64 tmed_sel: 28244 us

v80 tmed_sel: 13018 us

v96 tmed_sel: 26146 us


2024-10-29 tests with mpi

Code differences since 2024-10-23

  • use struct pixs - to keep the per-pixel constants close together in memory
  • besides the loop over EVENTS=100, add an outer loop over NLOOPS=100 - to increase the running time on each cpu
  • reduce the number of instructions inside the timed region - time the calib loop only
test_cpo.cc
// g++ -O3 -o test_cpo -g test_cpo.cc
// ../lcls2/psana/psana/pycalgos/test_cpo
// mpirun -n  4 ../lcls2/psana/psana/pycalgos/test_cpo
// mpirun -n  64 ../lcls2/psana/psana/pycalgos/test_cpo

#define NLOOPS 100
#define EVENTS 100
#define SIZE (16*352*384)
#define M14 0x3fff  // 16383 = (1<<14)-1, 14-bit data mask

#include <stdlib.h>   // malloc
#include <chrono>     // steady_clock timing
#include <iostream>
#include <cstdint>    // uint16_t, uint8_t

// per-pixel calibration: keep the 14 data bits, subtract pedestal, apply gain and mask
void calibrate(uint16_t* raw, uint8_t* mask, float* gain, float* ped, float* result) {
  uint16_t* end = raw+SIZE;
  while (raw<end) {
    *result = ((*raw & M14) - *ped)*(*gain)*(*mask);
    raw++; ped++; gain++; mask++; result++;
  }
}

int main() {

  uint16_t* raw = (uint16_t*)malloc(EVENTS*SIZE*sizeof(uint16_t));
  uint8_t* mask = (uint8_t*)malloc(SIZE*sizeof(uint8_t));
  float* result = (float*)malloc(SIZE*sizeof(float));
  float* ped = (float*)malloc(SIZE*sizeof(float));
  float* gain = (float*)malloc(SIZE*sizeof(float));

  for (int i=0; i<EVENTS*SIZE; i++) {
    raw[i]=1234;
  }

  for (int i=0; i<SIZE; i++) {
    mask[i]=1;
    //result[i]=0.0;
    ped[i]=1233.1;
    gain[i]=1.234;
  }

  std::chrono::steady_clock::time_point begin = std::chrono::steady_clock::now();

  // timed region: NLOOPS passes over EVENTS simulated events
  for (int n=0; n<NLOOPS; n++){
    for (int i=0; i<EVENTS; i++){
      calibrate(raw+i*SIZE, mask, gain, ped, result);
    }
  }
  std::chrono::steady_clock::time_point end = std::chrono::steady_clock::now();

  std::cout << "NLOOPS: " << NLOOPS << " EVENTS: " << EVENTS << std::endl;
  std::cout << "Time per event = " << (std::chrono::duration_cast<std::chrono::microseconds>(end - begin).count())/EVENTS/NLOOPS << "[us]" << std::endl;
}
calib, calib_v0
#define RAWD_T uint16_t
#define MASK_T uint8_t
#define GAIN_T float
#define PEDS_T float
#define REST_T float

// per-pixel constants combined in one struct (array-of-structures layout)
struct pixstr {
  MASK_T mask;
  PEDS_T ped;
  GAIN_T gain;
  REST_T rest;
};

// calib_v0: constants in separate (planar) arrays
void calib_v0(RAWD_T* raw, MASK_T* mask, GAIN_T* gain, PEDS_T* ped, REST_T* res) {
  RAWD_T* end = raw+PSIZE;
  while (raw<end) {
    *res = ((*raw & M14) - *ped)*(*gain)*(*mask);
    raw++; ped++; gain++; mask++; res++;
  }
}

// calib: per-pixel struct, (*pu).member spelling
void calib(RAWD_T* raw, pixstr* pu) {
  RAWD_T* end = raw+PSIZE;
  while (raw<end) {
    (*pu).rest = ((*raw & M14) - (*pu).ped)*((*pu).gain)*((*pu).mask);
    raw++; pu++;
  }
}

// the same calib, pu->member spelling (only one of the two can be compiled at a time)
void calib(RAWD_T* raw, pixstr* pu) {
  RAWD_T* end = raw+PSIZE;
  while (raw<end) {
    pu->rest = ((*raw & M14) - (pu->ped)) * (pu->gain) * (pu->mask);
    raw++; pu++;
  }
}
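Since the two calib spellings are semantically identical, any timing difference between them reflects codegen or measurement noise rather than the source syntax. The stand-alone check below is not part of the original tests (the file name and the small PSIZE are illustrative); it verifies that the struct-based calib reproduces calib_v0 exactly and makes the padded struct size explicit: sizeof(pixstr) is 16 B (1 B mask + 3 B padding + 4 B each for ped, gain, rest), vs 13 B per pixel for the planar arrays, so the struct version inherently moves more memory per pixel.

check_calib_equiv.cc (sketch)
// g++ -O3 -o check_calib_equiv check_calib_equiv.cc
#include <cstdint>
#include <cstdio>

#define PSIZE 1024   // small stand-in panel size for a quick check
#define M14 0x3fff

#define RAWD_T uint16_t
#define MASK_T uint8_t
#define GAIN_T float
#define PEDS_T float
#define REST_T float

struct pixstr {
  MASK_T mask;
  PEDS_T ped;
  GAIN_T gain;
  REST_T rest;
};
// 1 B mask is padded to the 4 B alignment of float: 16 B per pixel in the struct
static_assert(sizeof(pixstr) == 16, "expected padded 16 B struct");

void calib_v0(RAWD_T* raw, MASK_T* mask, GAIN_T* gain, PEDS_T* ped, REST_T* res) {
  RAWD_T* end = raw+PSIZE;
  while (raw<end) {*res = ((*raw & M14) - *ped)*(*gain)*(*mask); raw++; ped++; gain++; mask++; res++;}
}

void calib(RAWD_T* raw, pixstr* pu) {
  RAWD_T* end = raw+PSIZE;
  while (raw<end) {pu->rest = ((*raw & M14) - pu->ped)*(pu->gain)*(pu->mask); raw++; pu++;}
}

int main() {
  static RAWD_T raw[PSIZE];
  static MASK_T mask[PSIZE]; static PEDS_T peds[PSIZE];
  static GAIN_T gain[PSIZE]; static REST_T rest[PSIZE];
  static pixstr pixs[PSIZE];
  for (int i=0; i<PSIZE; i++) {
    raw[i] = (RAWD_T)(1000 + i%100);
    mask[i] = (i%10) ? 1 : 0;  peds[i] = 1000.f;  gain[i] = 20.f;
    pixs[i].mask = mask[i]; pixs[i].ped = peds[i]; pixs[i].gain = gain[i];
  }
  calib_v0(raw, mask, gain, peds, rest);
  calib(raw, pixs);
  int ndiff = 0;
  for (int i=0; i<PSIZE; i++) if (rest[i] != pixs[i].rest) ndiff++;
  printf("sizeof(pixstr): %zu B, ndiff: %d\n", sizeof(pixstr), ndiff);
  return ndiff ? 1 : 0;
}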

test_calib_simulation_mpi
#define PSIZE (16*352*384) // 2162688; alternative small size for quick tests: 100000
#define NLOOPS 100
#define EVENTS 100

void test_calib_simulation_mpi(int argc, char* argv[])
{
  int icpu = sched_getcpu();
  std::stringstream sscpu; sscpu << "cpu-" << std::setfill('0') << std::setw(3) << std::right << icpu;
  std::string scpu = sscpu.str();

  time_t t0 = time_now();

  RAWD_T* rawd = (RAWD_T*)malloc(EVENTS*PSIZE*sizeof(RAWD_T));
  MASK_T* mask = (MASK_T*)malloc(PSIZE*sizeof(MASK_T));
  REST_T* rest = (REST_T*)malloc(PSIZE*sizeof(REST_T));
  PEDS_T* peds = (PEDS_T*)malloc(PSIZE*sizeof(PEDS_T));
  GAIN_T* gain = (GAIN_T*)malloc(PSIZE*sizeof(GAIN_T));
  pixstr* pixs = (pixstr*)malloc(PSIZE*sizeof(pixstr));

  std::cout << scpu
	    << " test_calib_simulation_mpi time for malloc: "
	    << duration_us(time_now() - t0).count() << " us" << std::endl;

  t0 = time_now();
  standard_normal_array<RAWD_T>(1000., 10., PSIZE*EVENTS, rawd);
  standard_normal_array<PEDS_T>(1000., 10., PSIZE, peds);
  standard_normal_array<GAIN_T>(20., 1., PSIZE, gain);
  random_array_0or1<MASK_T>(0.9, PSIZE, mask);

  for (int i=0; i<PSIZE; i++){
    pixs[i].mask = mask[i];
    pixs[i].ped  = peds[i];
    pixs[i].gain = gain[i];
    pixs[i].rest = rest[i];
  }

  if (icpu == 0){
    std::cout << scpu << " time for random data and constants: "
	      << duration_us(time_now() - t0).count() << " us";
    std::cout << "\n  rawd: "; for (int i=0; i<10; i++){std::cout << rawd[i] << ' ';}
    std::cout << "\n  peds: "; for (int i=0; i<10; i++){std::cout << peds[i] << ' ';}
    std::cout << "\n  gain: "; for (int i=0; i<10; i++){std::cout << gain[i] << ' ';}
    std::cout << "\n  mask: "; for (int i=0; i<10; i++){std::cout << unsigned(mask[i]) << ' ';}
    std::cout << "\n  " << scpu << " events: " << std::to_string(EVENTS)
	      << " panel size:" << std::to_string(PSIZE) << std::endl;
  }

  double times_s[NLOOPS];
  double durats_us[NLOOPS];

  struct timespec tbeg, tcur;
  int status = clock_gettime(CLOCK_REALTIME, &tbeg);
  time_t tt0 = time_now();

  for (int n=0; n<NLOOPS; n++){
    status = clock_gettime(CLOCK_REALTIME, &tcur);
    t0 = time_now();

    for (int i=0; i<EVENTS; i++){
      //calib_v0(rawd+i*PSIZE, mask, gain, peds, rest);
      calib(rawd+i*PSIZE, pixs);
    }

    durats_us[n] = duration_us(time_now() - t0).count() / EVENTS;
    times_s[n] = time_sec(tcur) - TOFSET;
  }

  int time_per_event_us = duration_us(time_now() - tt0).count() / EVENTS / NLOOPS;

  std::cout << scpu << " NLOOPS: " << NLOOPS << " EVENTS: " << EVENTS << std::endl;
  std::cout << scpu << " time per event: " << time_per_event_us << " us" << std::endl;

  std::string version = (argc>2)? argv[2] : "vXX";
  std::stringstream fname; fname << "results-" << scpu << '-' << version << ".txt";
  std::cout << "save file: " <<  fname.str() << std::endl;

  std::ofstream ofile;
  ofile.open(fname.str());

  for (int i=0; i<NLOOPS; i++){
     ofile << std::setw(3) << std::right << i;
     ofile << std::fixed
               << std::setprecision(6);
     ofile << " t,s:" << std::setw(14) << times_s[i];
     ofile << " dt,us: " << std::setprecision(0) << std::setw(10) << durats_us[i] << std::endl;
  }
  ofile << '\n' << scpu << " time per event: " << time_per_event_us << " us" << std::endl;
  ofile << "begin event loop time_since_epoch, sec: "
	<< std::setw(14) << std::setprecision(3)
	<< time_sec(tbeg)
        << " offset: " << TOFSET << std::endl;
  ofile.close();
} 
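The listing above relies on helpers defined elsewhere in the test source: time_now, duration_us, time_sec, TOFSET, standard_normal_array and random_array_0or1 (plus the usual <sstream>, <iomanip>, <fstream>, <sched.h> includes). Below is a minimal sketch of plausible definitions, assuming std::chrono- and <random>-based implementations; the actual helpers in the repository may differ. TOFSET = 1729700000 is taken from the "offset:" printout in the results above.

helpers (sketch)
#include <chrono>
#include <random>
#include <time.h>    // struct timespec

#define TOFSET 1729700000  // epoch offset, value from the "offset:" printout above

// NOTE: the listing stores time_now() in a time_t variable, suggesting a
// project-local alias; here the chrono time_point is used directly
inline std::chrono::steady_clock::time_point time_now() {
  return std::chrono::steady_clock::now();
}

// duration between two time points, convertible to microseconds via .count()
inline std::chrono::microseconds duration_us(std::chrono::steady_clock::duration d) {
  return std::chrono::duration_cast<std::chrono::microseconds>(d);
}

// seconds since epoch as a double, from a clock_gettime timespec
inline double time_sec(const struct timespec& ts) {
  return ts.tv_sec + 1e-9*ts.tv_nsec;
}

// fill arr with values drawn from N(mean, sigma)
template <typename T>
void standard_normal_array(double mean, double sigma, size_t size, T* arr) {
  static std::mt19937 gen(12345);
  std::normal_distribution<double> dist(mean, sigma);
  for (size_t i=0; i<size; i++) arr[i] = (T)dist(gen);
}

// fill arr with 1 with probability p, otherwise 0
template <typename T>
void random_array_0or1(double p, size_t size, T* arr) {
  static std::mt19937 gen(54321);
  std::bernoulli_distribution dist(p);
  for (size_t i=0; i<size; i++) arr[i] = dist(gen) ? T(1) : T(0);
}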


Start-stop time

calib_v0 - WITHOUT struct

calib - WITH struct

Results

Results: per-CPU medians and the overall median (tmed_sel)
WITHOUT struct

ps-4.6.3 [dubrovin@sdfiana004:~/LCLS/con-lcls2/2024-10-29-test-calib-mpi]$ ../lcls2/psana/psana/pycalgos/test_calib_sim_proc.py 0
[655. 662. 738. 684.]
vers-v04 tmed_sel: 673.000 us

[686. 660. 691. 698. 683. 691. 662. 651.]
vers-v08 tmed_sel: 684.500 us

[ 902.   867.  1270.5 1270.   887.5  901.   891.   894.   701.   714.5
  713.   696.   707.   741.   719.   679. ]
vers-v16 tmed_sel: 804.000 us

[2147.  1774.  1667.  2138.  2761.5 2882.  3215.5 3049.  4044.5 3792.5
 3293.5 3722.5 3536.5 2921.5 4321.5 2914.  3710.  3673.  3462.  3847.5
 3201.  3455.5 3699.  3333.5 5042.5 1712.  3093.5 1417.  2212.  2989.5
 2979.  2367.5]
vers-v32 tmed_sel: 3147.250 us

[6093.5 5976.  3803.  5847.  3830.  5775.5 6039.  5991.  9122.5 9182.5
 8612.5 9522.5 9444.  9625.  8601.5 9349.  5564.5 4952.5 4452.  4910.
 4798.5 5558.5 5180.  4985.5 4278.5 6675.5 4751.5 6734.  6782.5 4611.5
 6559.  3879.  5412.  4424.  5790.5 7023.  5237.  5033.5 4383.5 4814.5
 6688.5 3735.  6699.  3070.  3628.5 4030.  6482.  6734.5 5849.  6543.5
 6387.  5711.  6133.  6411.5 6160.  5054.  5919.5 5927.  5932.  5731.
 5743.5 5835.  5358.  5631.5]
vers-v64 tmed_sel: 5812.750 us

[ 9114.5  8477.   9651.5  9360.   9396.   9147.   8780.   8705.5  9522.
 10188.5 10050.5  9569.5  9256.   9469.   9528.   9410.   9777.5  9642.
 12993.5 15383.  13701.5 13568.  14442.  14046.  13572.  14404.  12163.5
 14579.  12906.5 12162.5 14432.5 13915.5 12282.  14164.  13221.5 14284.
 13257.  14616.5  8478.5 13870.5  9972.  10462.5  9499.5 11857.5 12074.5
  9767.5 12130.  10941.   9896.  12578.5  8909.  10594.5 11424.  11449.5
  6321.5 11541.  10492.5  5870.  11554.   8126.5  9913.5  9754.  12638.5
 14634.5 13241.5  8302.   7723.5 13560.  10189.  14148.  10893.5  3873.5
  9580.   6394.5 10004.5 11660.5 13060.5  5162.  11885.   5410.5]
vers-v80 tmed_sel: 10543.500 us

[16550.5 12927.  11448.5 11659.5 10667.5 10602.5 16205.5 11793.5 10502.5
 11115.5 10150.5  9693.  19591.  19411.  18271.5 19618.  17706.  18302.5
 20111.  18211.  17372.5 17029.5 16757.  17514.5 16426.  16444.5 16154.5
 16320.  16239.  17003.5 16431.  17180.  17514.  15808.5 15936.5 16020.5
 11964.5 12068.  11431.  11541.5 11257.5 10805.  11759.  10783.5 11748.5
 11533.  10559.5 11501.5  8935.  12518.5 12722.5 12491.  12445.  12700.
 13166.5 12859.5 12760.  12943.  13120.   9103.  15802.  16433.  14994.5
 15505.5 16394.  16506.  15839.5 16057.  15575.5 15631.5 15916.5 16261.5
 14161.5 15411.5 15484.  13775.  11333.  14175.5 15346.5 14014.5 10677.
 13059.5 14200.5 13001.  10216.5 10119.  10138.   9900.5  9263.5  9298.
 10481.5 11008.  10017.5 10170.   9302.5  9964.5]
vers-v96 tmed_sel: 13089.750 us


WITH struct

ps-4.6.3 [dubrovin@sdfiana004:~/LCLS/con-lcls2/2024-10-29-test-calib-mpi]$ ../lcls2/psana/psana/pycalgos/test_calib_sim_proc.py 10
[2170.  2146.  2129.5 2138.5]
vers-s04 tmed_sel: 2142.250 us

[2173.  2180.  2130.  2145.  2171.5 2169.  2142.5 2145. ]
vers-s08 tmed_sel: 2157.000 us

[2288.  2262.  3026.  3052.  2269.  2259.  2251.  2254.  2295.  2253.
 2254.5 2268.  2245.5 2260.  2265.  2267. ]
vers-s16 tmed_sel: 2263.500 us

[4780.  4796.5 4774.  4776.5 6258.  6284.5 6300.  6322.  4774.  4765.5
 4777.  4805.  4820.  4807.  4802.  4819.  4825.5 4812.  4836.  4848.5
 4802.  4818.5 4824.5 4847.5 4801.  4804.  4831.5 4839.5 4817.  4864.
 4801.  4853. ]
vers-s32 tmed_sel: 4817.750 us

[ 8306.5 13282.  13358.  13412.   8195.5 13425.5 13383.  13496.5 14567.
 14525.  14469.5 14516.  14545.5 14480.5 14576.  14583.  24729.5 24777.5
 24789.  24797.  24728.5 24867.5 24867.5 24870.5 13074.  25406.  25570.
 13590.  13116.5 20931.5 11182.5 21530.   9274.5 12976.  13194.  13424.
  9441.  13224.5 13195.  13220.  12862.  10347.5 12876.5 13279.  10044.
 13051.  13128.  13150.5 13344.   8406.  13375.  13474.  13356.5 13399.5
  8500.  13495.5 13071.5  9189.5 13307.5 13299.   9350.  12196.5 13278.5
 13198. ]
vers-s64 tmed_sel: 13350.250 us


[25933.  25913.  25879.  25939.5 25960.  25881.  25949.5 26157.  25972.5
 26011.5 36925.5 37028.  37053.5 37542.  36622.5 36928.5 37571.5 37509.5
 40781.  40651.5 41444.5 41419.5 41470.  41375.  41499.  41538.  41598.
 41509.5 40990.  41614.  18383.5 35410.  36254.5 36407.5 36343.5 28216.5
 20603.5 35878.5 36097.  35472.5 19491.5 34830.  34842.  17331.5 35183.
 35044.5 17942.5 35293.  20891.5 17194.5 16066.  11550.5 16953.5 17074.5
 17227.  10389.  17858.  17958.5 18111.5 17971.5 17158.   9726.  17269.
 17258.  17291.5 17841.  17830.   9633.  17984.5 17945.  17858.  25352.
 24370.5 21753.  25108.5 24745.5 27712.  25151.5 27495.5 27630. ]
vers-s80 tmed_sel: 25954.750 us

[ 46949.5  47818.   45682.   14672.5   4791.   10277.   44990.   46062.5
  16817.5  47038.    9210.5   4779.5  44849.5  45122.   44938.   45048.
  22662.5  44665.5  44882.5  24278.5  73613.   60302.5  30383.5  33713.5
  27477.5  30422.   32844.5  32522.   54694.   37028.   27972.   34785.5
  35219.5  36218.5  36685.   70136.   35885.5  33949.5  15809.5  35992.
  36394.5   2899.5   3518.5  29984.   36764.    3561.5  36717.5  36960.
 114616.5 116601.  117007.5 116855.  116884.  116603.  116627.  116781.5
 116992.5 116968.5 117192.  117397.   64190.   63651.   64902.   64559.
  64347.5  65072.5  71813.   64748.   64059.5  64350.5  64884.   64547.5
  31874.   31778.5  31823.   31835.   31880.5  31893.   31790.   31925.5
  31878.   31894.5  31958.   31980.   68739.   68802.   68627.5  68811.
  68832.   68890.   69332.   69280.5  69016.   69531.   65626.5  68219. ]
vers-s96 tmed_sel: 44964.000 us


median time per event, μs, on sdfmilan011

Column legend:
  (A) WITHOUT struct, μs, max over cpus
  (B) WITH struct, μs, max over cpus - the same calib, but with (*pu).ped access
  (C) the same, but with pu->ped access, on sdfmilan122, min-max over cpus
  (D) test-scaling-mpi-epix10ka.py 50: arrf = ((rawa[i,:] & M14) - peds) * gain * mask, min-max over cpus
  (E) old test w/o nloops: test-scaling-mpi-epix10ka.py 85, pythonized/cythonized/c++ calib (ud.calib_std)
  (F) old test w/o nloops: test-scaling-mpi-epix10ka.py 81: arrf = ((raw & M14) - peds)*gain; arrf = np.select((mask>0,), (arrf,), default=0)

Single-core reference, test_cpo: 780, 991, 711 on sdfmilan011; 704, 600, 602 on sdfmilan122
mpirun -n 64 test_cpo: 3179, 3325, 3402, ... 8819, 8828, 8839

# of CPUs |    (A) |    (B) | (C)         | (D)         | (E)              | (F)
  1 )***  | 689, 686, 681 | 2194, 2207, 2216 | 2116, 2122, 2115 | 3310, 3422 | 638, 645, 620 )* | 7380
  4       |    673 |   2142 | 2083-2114   | 2488-3524   | 650 )**          | 8219
  8       |    684 |   2157 | 2075-2130   | 2847-3688   | 645              | 8257
 16       |    804 |   2264 | 2165-2966   | 5220-9106   | 630              | 9221
 32       |   3147 |   4818 | 4565-6233   | 12598-27739 | 4035             | 11892
 64       |   5812 |  13350 | 6742-22157  | 10900-28674 | 8039             | 21043, 32261
 80       |  10543 |  25955 | 9933-49393  | 12038-75235 | 9598             | 42770, 52406
 96       |  13090 |  44964 | 14471-73196 | 18341-33499 | 10471            | 36673, 26515

)*** 1 cpu: NO MPI, test_calib_sim 2 v01; triples of values = three separate jobs


)* ps-4.6.3 [dubrovin@sdfmilan011:~/LCLS/con-lcls2/2024-10-29-test-calib-mpi]$ ../lcls2/psana/psana/detector/testman/test-scaling-mpi-epix10ka.py 85

)** ps-4.6.3 [dubrovin@sdfmilan011:~/LCLS/con-lcls2/2024-10-29-test-calib-mpi]$ mpirun -n 4 ../lcls2/psana/psana/detector/testman/test-scaling-mpi-epix10ka.py 85



cythonized calib method
from time import time
# udext: the cython extension module providing cy_calib_std (imported in the test source)

def calib_std(raw, peds, gain, mask, databits, out):
    """Assumes that all numpy arrays have the same shape.
    Returns (dt measured inside the c++ code, dt measured around the call), both in us."""
    t0_sec = time()
    dt_us_cpp = udext.cy_calib_std(raw.ravel(), peds.ravel(), gain.ravel(), mask.ravel(), raw.size, databits, out.ravel())
    return dt_us_cpp, (time()-t0_sec)*1e6
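For reference, the C++ kernel behind cy_calib_std presumably performs the same per-pixel arithmetic as calib_v0, with the data-bit mask passed as a parameter and the internal time returned in microseconds (as the two return values of calib_std suggest). A minimal sketch under these assumptions - the actual signature in pycalgos may differ:

cy_calib_std kernel (sketch)
#include <cstdint>
#include <cstddef>
#include <chrono>

// assumed kernel: keep data bits, subtract pedestal, apply gain and mask;
// returns the time spent in the loop, us
double calib_std_kernel(const uint16_t* raw, const float* peds, const float* gain,
                        const uint8_t* mask, size_t size, uint16_t databits, float* out) {
  auto t0 = std::chrono::steady_clock::now();
  for (size_t i=0; i<size; i++)
    out[i] = ((raw[i] & databits) - peds[i]) * gain[i] * mask[i];
  auto t1 = std::chrono::steady_clock::now();
  return std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count();
}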

Summary

  • timings per event are consistent between test_cpo and test_calib_sim for single-core processing: ~0.7 ms
  • scaling of test_cpo between 1 cpu and mpirun -n 64 is poor: 0.7 ms → ~10 x 0.7 ms per event (see the bandwidth estimate after this list)
  • using the per-pixel struct decreases calib performance by ~3x
  • scalability under mpirun -n ## is poor both WITH and WITHOUT the struct
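A rough memory-traffic estimate (an assumption, not a measurement) is consistent with the poor scaling: one event touches about 2162688 pixels x (2 B raw + 4 B peds + 4 B gain + 1 B mask + 4 B result) ≈ 32 MB, so ~0.7 ms/event corresponds to ~46 GB/s per rank. 64 ranks would then demand ~3 TB/s, an order of magnitude more than the few hundred GB/s a dual-socket milano node can sustain, which by itself predicts roughly the observed ~10x slowdown.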
