...
2024-02-06 Test of milano216 host with perf
...
Description
Using the command:
perf stat -e cache-references,cache-misses,cycles,instructions,branches,branch-misses,faults,migrations,page-faults,L1-dcache-load-misses,L1-icache-load-misses python test-scaling-subproc.py <parameter>
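To collect the same counters from a script rather than reading the printed table, perf stat can switch to field-separated output with `-x` and write its counters to a file with `-o` instead of stderr. The sketch below is minimal and rests on one assumption: the CSV line holds the counter value in its first field and the event name in its third. The helper names (`perf_stat_cmd`, `parse_perf_csv`, `run_perf`) are hypothetical and are not part of the test scripts.

```python
import os
import subprocess
import tempfile

# Event list copied from the perf stat command above.
EVENTS = ["cache-references", "cache-misses", "cycles", "instructions",
          "branches", "branch-misses", "faults", "migrations", "page-faults",
          "L1-dcache-load-misses", "L1-icache-load-misses"]


def perf_stat_cmd(target, outfile, events=EVENTS):
    """perf stat command line: comma-separated fields (-x,) written to outfile (-o)."""
    return ["perf", "stat", "-x,", "-o", outfile, "-e", ",".join(events)] + list(target)


def parse_perf_csv(text):
    """Map event name -> count from `perf stat -x,` output.

    Assumes the counter value is the first field and the event name the third;
    other fields (run time, % of time counted, derived metrics) are ignored.
    A modifier such as ':u' is stripped from the event name, and events perf
    could not count (e.g. '<not supported>') are stored as None.
    """
    counts = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # blank lines and the '# started on ...' header written with -o
        fields = line.split(",")
        if len(fields) < 3:
            continue
        event = fields[2].split(":")[0]
        try:
            counts[event] = float(fields[0])
        except ValueError:
            counts[event] = None
    return counts


def run_perf(target):
    """Run target (a list of argv strings) under perf stat and return the counts."""
    with tempfile.TemporaryDirectory() as tmp:
        out = os.path.join(tmp, "perf.csv")
        subprocess.run(perf_stat_cmd(target, out), check=True)
        with open(out) as f:
            return parse_perf_csv(f.read())
```

For example, `run_perf(["python", "test-scaling-subproc.py", "<parameter>"])`. Here `<parameter>` is still the placeholder from the command above. Writing to a separate file keeps the program's own stderr from mixing with the counter lines.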
...
2024-02-07 Test of milano216 host with the perf command
...
Description
Running perf with mpirun on a single CPU and on 80 CPUs (mpirun -n 1 and mpirun -n 80):
...
perf stat -e cache-references,cache-misses,cycles,instructions,branches,branch-misses,faults,migrations,page-faults,L1-dcache-load-misses,L1-icache-load-misses,dTLB-load-misses,iTLB-load-misses mpirun -n 80 python Detector/examples/test-scaling-mpi.py
Code Block
import numpy as np
from time import time


def random_standard(shape=(40,60), mu=200, sigma=25, dtype=np.float64):
    """Normally distributed random array with mean mu and standard deviation sigma."""
    a = mu + sigma*np.random.standard_normal(shape)
    return np.require(a, dtype)


def random_arrays(sh2d=(8*512,1024), dtype=np.float64):
    """Return a 2-d array a and a 3-d stack b of three planes of the same 2-d shape."""
    sh3d = (3,) + sh2d
    return random_standard(shape=sh2d, mu=10, sigma=2, dtype=dtype),\
           random_standard(shape=sh3d, mu=20, sigma=3, dtype=dtype)


def time_consuming_algorithm():
    """Subtract one plane of b from each pixel of a, chosen by the value of a.

    Only the three masked subtractions are timed; array generation and
    mask computation happen before t0_sec.
    """
    a, b = random_arrays()
    gr1 = a>=11
    gr2 = (a>9) & (a<11)
    gr3 = a<=9
    t0_sec = time()
    a[gr1] -= b[0, gr1]
    a[gr2] -= b[1, gr2]
    a[gr3] -= b[2, gr3]
    return time() - t0_sec
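The masks gr1, gr2 and gr3 (a >= 11, 9 < a < 11, a <= 9) cover every pixel exactly once, so the timed section subtracts exactly one plane of b from each pixel of a. The minimal sketch below checks this on small arrays. It also compares the result with a hypothetical `np.select` formulation of the same operation, which is offered only for comparison and is not what the test script uses. The function names, array shape and seed are illustrative assumptions.

```python
import numpy as np


def masked_subtract(a, b):
    """Same three masked subtractions as time_consuming_algorithm(), on a copy of a."""
    a = a.copy()
    gr1 = a >= 11
    gr2 = (a > 9) & (a < 11)
    gr3 = a <= 9
    a[gr1] -= b[0, gr1]
    a[gr2] -= b[1, gr2]
    a[gr3] -= b[2, gr3]
    return a


def select_subtract(a, b):
    """Hypothetical alternative: pick the b plane per pixel with np.select, subtract once."""
    conds = [a >= 11, (a > 9) & (a < 11), a <= 9]
    return a - np.select(conds, [b[0], b[1], b[2]])


def masks_partition(a):
    """True if every pixel of a falls into exactly one of the three groups."""
    n = (a >= 11).astype(int) + ((a > 9) & (a < 11)) + (a <= 9)
    return bool(np.all(n == 1))


rng = np.random.default_rng(0)
a = 10 + 2 * rng.standard_normal((64, 32))     # same mu/sigma as random_arrays(), smaller shape
b = 20 + 3 * rng.standard_normal((3, 64, 32))
print(masks_partition(a), np.array_equal(masked_subtract(a, b), select_subtract(a, b)))
# prints: True True
```

Both versions perform the same float64 subtraction for each pixel, so the results agree bit for bit. The sketch makes no claim about which one is faster.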
Code Block
import socket
from time import time

import numpy as np
import psutil
from mpi4py import MPI

# time_consuming_algorithm() is defined in the previous block (same script).

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()
hostname = socket.gethostname()  # the original calls a get_hostname() helper that is not shown
cpu_num = psutil.Process().cpu_num()  # CPU this rank is currently running on
print('rank:%02d cpu_num:%03d size:%02d' % (rank, cpu_num, size))

ranks = (0, 10, 20, 30, 40, 50, 60, 70)
SAVE_FIGS = True
SHOW_FIGS = False

nevents = 100
# Per-event times in seconds; each rank fills only its own column arrts[:, rank].
arrts = np.zeros((nevents, size), dtype=np.float64)
for nevt in range(nevents):
    dt_sec = time_consuming_algorithm()
    arrts[nevt, rank] = dt_sec  # dt_sec = time()-t0_sec
    cpu_num = psutil.Process().cpu_num()
    if 16 <= cpu_num <= 23:
        print('rank:%02d cpu_num:%03d nevt:%03d time:%.6f CPU_NUM IN WEKA RANGE [16,23]' % (rank, cpu_num, nevt, dt_sec))
    if nevt % 10 > 0:
        continue
    print('rank:%02d cpu_num:%03d nevt:%03d time:%.6f' % (rank, cpu_num, nevt, dt_sec))

# ...
# some graphics for array arrts
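The graphics code for arrts is elided above. As a text-only alternative, the minimal sketch below summarizes an array of the same (nevents, nranks) shape per rank. It assumes the per-rank columns have already been brought together on one rank. In the script each rank fills only its own column and leaves the rest at zero, so a sum reduction over ranks with mpi4py would be one way to combine them; that step is not shown here and may differ from what the elided code does. The helper names and the 1.5× threshold are hypothetical.

```python
import numpy as np


def per_rank_summary(arrts):
    """Median and worst per-event time for each rank, plus median relative to the fastest rank.

    arrts has shape (nevents, nranks) with times in seconds.
    """
    med = np.median(arrts, axis=0)
    worst = arrts.max(axis=0)
    return med, worst, med / med.min()


def slow_ranks(arrts, factor=1.5):
    """Indices of ranks whose median time exceeds factor x the median over all ranks."""
    med = np.median(arrts, axis=0)
    return np.flatnonzero(med > factor * np.median(med))


# Synthetic example (not measured data): 4 ranks, rank 2 twice as slow as the others.
arrts = np.full((100, 4), 0.1)
arrts[:, 2] = 0.2
med, worst, rel = per_rank_summary(arrts)
print(rel, slow_ranks(arrts))
```

Medians are used so that occasional slow events, such as those printed for the [16,23] CPU range, do not dominate the per-rank value. The per-rank maximum keeps those outliers visible.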
Results
Code Block
ana-4.0.59-py3 [dubrovin@sdfmilan216:~/LCLS/con-py3]$ perf stat -e cache-references,cache-misses,cycles,instructions,branches,branch-misses,faults,migrations,page-faults,L1-dcache-load-misses,L1-icache-load-misses,dTLB-load-misses,iTLB-load-misses mpirun -n 1 python Detector/examples/test-scaling-mpi.py
...

 Performance counter stats for 'mpirun -n 1 python Detector/examples/test-scaling-mpi.py':

     4,448,830,552      cache-references:u                                      (50.00%)
        90,374,312      cache-misses:u          #    2.031 % of all cache refs  (50.00%)
   222,814,516,280      cycles:u                                                (50.02%)
   426,700,282,993      instructions:u          #    1.92  insn per cycle       (50.01%)
    58,876,394,584      branches:u                                              (50.01%)
     2,343,687,188      branch-misses:u         #    3.98% of all branches      (50.01%)
           635,183      faults:u
                 0      migrations:u
           635,183      page-faults:u
     2,158,358,417      L1-dcache-load-misses:u                                 (50.00%)
         5,694,036      L1-icache-load-misses:u                                 (49.99%)
         4,282,821      dTLB-load-misses:u                                      (49.99%)
           890,671      iTLB-load-misses:u                                      (50.00%)

      73.297275789 seconds time elapsed

      69.795728000 seconds user
       2.318007000 seconds sys

ana-4.0.59-py3 [dubrovin@sdfmilan216:~/LCLS/con-py3]$ perf stat -e cache-references,cache-misses,cycles,instructions,branches,branch-misses,faults,migrations,page-faults,L1-dcache-load-misses,L1-icache-load-misses,dTLB-load-misses,iTLB-load-misses mpirun -n 80 python Detector/examples/test-scaling-mpi.py
...

 Performance counter stats for 'mpirun -n 80 python Detector/examples/test-scaling-mpi.py':

   349,526,509,383      cache-references:u                                      (50.01%)
     5,932,480,814      cache-misses:u          #    1.697 % of all cache refs  (50.00%)
18,768,444,974,036      cycles:u                                                (50.00%)
33,983,153,714,284      instructions:u          #    1.81  insn per cycle       (49.99%)
 4,684,730,635,234      branches:u                                              (49.99%)
   186,649,297,019      branch-misses:u         #    3.98% of all branches      (50.00%)
        52,121,421      faults:u
                 0      migrations:u
        52,121,421      page-faults:u
   171,500,392,922      L1-dcache-load-misses:u                                 (50.00%)
       267,672,856      L1-icache-load-misses:u                                 (50.00%)
       339,145,247      dTLB-load-misses:u                                      (50.01%)
        69,780,394      iTLB-load-misses:u                                      (50.01%)

      92.952500273 seconds time elapsed

    6501.353593000 seconds user
     410.844719000 seconds sys
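The two runs are easier to compare after normalizing a few counters. The minimal sketch below recomputes some ratios from the numbers printed above. Three choices are illustrative assumptions: which ratios to compute, normalizing per rank, and reading t1/t80 as a weak-scaling efficiency (every rank processes the same 100 events). The names `RUNS` and `derived()` are hypothetical. The hardware counters were multiplexed, running about 50% of the time, and scaled by perf, so every ratio is approximate.

```python
# Counters and times copied from the two perf stat runs above.
RUNS = {
    1: dict(cycles=222_814_516_280, instructions=426_700_282_993,
            cache_refs=4_448_830_552, cache_misses=90_374_312,
            page_faults=635_183,
            elapsed=73.297275789, user=69.795728, sys=2.318007),
    80: dict(cycles=18_768_444_974_036, instructions=33_983_153_714_284,
             cache_refs=349_526_509_383, cache_misses=5_932_480_814,
             page_faults=52_121_421,
             elapsed=92.952500273, user=6501.353593, sys=410.844719),
}


def derived(r, nranks):
    """A few ratios from one perf stat run with nranks MPI processes."""
    return {
        "ipc": r["instructions"] / r["cycles"],
        "cache_miss_pct": 100.0 * r["cache_misses"] / r["cache_refs"],
        "busy_cpus": (r["user"] + r["sys"]) / r["elapsed"],  # average number of CPUs kept busy
        "instructions_per_rank": r["instructions"] / nranks,
        "cycles_per_rank": r["cycles"] / nranks,
        "page_faults_per_rank": r["page_faults"] / nranks,
    }


one, many = derived(RUNS[1], 1), derived(RUNS[80], 80)
# Every rank processes the same nevents = 100 events, so with perfect (weak)
# scaling the 80-rank run would take as long as the 1-rank run.
efficiency = RUNS[1]["elapsed"] / RUNS[80]["elapsed"]

for key in one:
    print(f"{key:22s} n=1: {one[key]:12.4g}  n=80: {many[key]:12.4g}  ratio: {many[key] / one[key]:.3f}")
print(f"weak-scaling efficiency (t1/t80): {efficiency:.3f}")
```

With these numbers the IPC is about 1.92 vs 1.81, the average number of busy CPUs is about 0.98 vs 74.4, and t1/t80 is about 0.79. The counters cover the whole mpirun command, which includes interpreter start-up, random array generation and the timed subtraction. These are therefore whole-run figures, not figures for the timed section alone.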
...