previous page: Scaling behavior of psana1 det.calib method in multicore processing with mpi
2024-02-06 Test of milano host with command perf at heavy loading
Test description
    1 CPU
    ======
    Performance counter stats for 'python test-scaling-subproc.py -1':
      4,522,410,200  cache-references:u  (62.49%)
      112,207,635  cache-misses:u  # 2.481 % of all cache refs  (62.51%)
      224,402,878,245  cycles:u  (62.51%)
      428,582,543,872  instructions:u  # 1.91 insn per cycle  (62.51%)
      59,430,436,824  branches:u  (62.50%)
      2,353,206,592  branch-misses:u  # 3.96% of all branches  (62.50%)
      657,277  faults:u
      0  migrations:u
      657,277  page-faults:u
      2,169,783,808  L1-dcache-load-misses:u  (62.50%)
      7,173,374  L1-icache-load-misses:u  (62.50%)
      70.762930452 seconds time elapsed
      66.918003000 seconds user
      2.380196000 seconds sys

    8 CPU
    ======
    Performance counter stats for 'python test-scaling-subproc.py -2':
      35,293,654,947  cache-references:u  (62.50%)
      675,772,563  cache-misses:u  # 1.915 % of all cache refs  (62.50%)
      1,863,835,416,629  cycles:u  (62.50%)
      3,408,694,078,315  instructions:u  # 1.83 insn per cycle  (62.50%)
      470,729,321,611  branches:u  (62.50%)
      18,710,029,709  branch-misses:u  # 3.97% of all branches  (62.50%)
      4,759,204  faults:u
      0  migrations:u
      4,759,204  page-faults:u
      17,164,781,068  L1-dcache-load-misses:u  (62.50%)
      42,407,266  L1-icache-load-misses:u  (62.50%)
      82.107165073 seconds time elapsed
      600.726489000 seconds user
      28.169314000 seconds sys

    16 CPU
    ======
    Performance counter stats for 'python test-scaling-subproc.py -8':
      71,125,012,043  cache-references:u  (62.50%)
      2,509,743,885  cache-misses:u  # 3.529 % of all cache refs  (62.50%)
      4,256,512,072,612  cycles:u  (62.50%)
      6,815,210,853,848  instructions:u  # 1.60 insn per cycle  (62.50%)
      940,797,592,651  branches:u  (62.50%)
      37,401,077,277  branch-misses:u  # 3.98% of all branches  (62.50%)
      9,874,603  faults:u
      0  migrations:u
      9,874,603  page-faults:u
      34,764,585,133  L1-dcache-load-misses:u  (62.50%)
      82,908,203  L1-icache-load-misses:u  (62.50%)
      98.180409648 seconds time elapsed
      1370.175346000 seconds user
      121.864448000 seconds sys

    32 CPU
    ======
    Performance counter stats for 'python test-scaling-subproc.py -13':
      140,229,421,945  cache-references:u  (62.50%)
      5,022,345,750  cache-misses:u  # 3.582 % of all cache refs  (62.50%)
      8,558,410,936,114  cycles:u  (62.50%)
      13,628,360,184,584  instructions:u  # 1.59 insn per cycle  (62.50%)
      1,881,291,550,548  branches:u  (62.50%)
      74,783,808,615  branch-misses:u  # 3.98% of all branches  (62.50%)
      19,579,143  faults:u
      0  migrations:u
      19,579,143  page-faults:u
      68,615,480,748  L1-dcache-load-misses:u  (62.50%)
      163,094,161  L1-icache-load-misses:u  (62.50%)
      99.279801084 seconds time elapsed
      2763.979749000 seconds user
      246.852789000 seconds sys

    56 CPU
    ======
    Performance counter stats for 'python test-scaling-subproc.py -16':
      245,664,589,385  cache-references:u  (62.50%)
      5,986,128,102  cache-misses:u  # 2.437 % of all cache refs  (62.50%)
      13,462,198,820,573  cycles:u  (62.50%)
      23,847,765,747,744  instructions:u  # 1.77 insn per cycle  (62.50%)
      3,290,927,488,525  branches:u  (62.50%)
      130,897,170,304  branch-misses:u  # 3.98% of all branches  (62.50%)
      35,494,247  faults:u
      0  migrations:u
      35,494,247  page-faults:u
      119,933,873,577  L1-dcache-load-misses:u  (62.50%)
      288,403,921  L1-icache-load-misses:u  (62.50%)
      108.453630713 seconds time elapsed
      5381.177612000 seconds user
      333.903330000 seconds sys
    ana-4.0.59-py3 [dubrovin@sdfmilan216:~/LCLS/con-py3]$

    64 CPU
    ======
    Performance counter stats for 'python test-scaling-subproc.py -17':
      281,639,175,978  cache-references:u  (62.50%)
      8,968,404,974  cache-misses:u  # 3.184 % of all cache refs  (62.50%)
      16,140,364,752,053  cycles:u  (62.50%)
      27,256,133,511,829  instructions:u  # 1.69 insn per cycle  (62.50%)
      3,761,710,111,186  branches:u  (62.50%)
      149,569,155,086  branch-misses:u  # 3.98% of all branches  (62.50%)
      39,148,442  faults:u
      0  migrations:u
      39,148,442  page-faults:u
      137,584,278,754  L1-dcache-load-misses:u  (62.50%)
      330,750,296  L1-icache-load-misses:u  (62.50%)
      120.688547006 seconds time elapsed
      6274.688233000 seconds user
      484.406164000 seconds sys

    120 CPU
    =======
    Performance counter stats for 'python test-scaling-subproc.py -18':
      532,229,037,371  cache-references:u  (62.50%)
      14,227,944,434  cache-misses:u  # 2.673 % of all cache refs  (62.50%)
      29,404,359,241,173  cycles:u  (62.50%)
      51,095,884,028,391  instructions:u  # 1.74 insn per cycle  (62.50%)
      7,053,547,766,317  branches:u  (62.50%)
      280,479,284,507  branch-misses:u  # 3.98% of all branches  (62.50%)
      73,250,012  faults:u
      0  migrations:u
      73,250,012  page-faults:u
      260,078,672,869  L1-dcache-load-misses:u  (62.50%)
      618,858,635  L1-icache-load-misses:u  (62.50%)
      119.736692035 seconds time elapsed
      11628.275939000 seconds user
      843.423292000 seconds sys
Summary
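number of CPU | cache-references | cache-misses | cycles | instructions | branches | branch-misses | faults | page-faults | L1-dcache-load-misses | L1-icache-load-misses | cmt |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | 4,522,410,200 | 112,207,635 | 224,402,878,245 | 428,582,543,872 | 59,430,436,824 | 2,353,206,592 | 657,277 | 657,277 | 2,169,783,808 | 7,173,374 | |
8 | 35,293,654,947 | 675,772,563 | 1,863,835,416,629 | 3,408,694,078,315 | 470,729,321,611 | 18,710,029,709 | 4,759,204 | 4,759,204 | 17,164,781,068 | 42,407,266 | |
16 | 71,125,012,043 | 2,509,743,885 | 4,256,512,072,612 | 6,815,210,853,848 | 940,797,592,651 | 37,401,077,277 | 9,874,603 | 9,874,603 | 34,764,585,133 | 82,908,203 | |
32 | 140,229,421,945 | 5,022,345,750 | 8,558,410,936,114 | 13,628,360,184,584 | 1,881,291,550,548 | 74,783,808,615 | 19,579,143 | 19,579,143 | 68,615,480,748 | 163,094,161 | |
56 | 245,664,589,385 | 5,986,128,102 | 13,462,198,820,573 | 23,847,765,747,744 | 3,290,927,488,525 | 130,897,170,304 | 35,494,247 | 35,494,247 | 119,933,873,577 | 288,403,921 | |
64 | 281,639,175,978 | 8,968,404,974 | 16,140,364,752,053 | 27,256,133,511,829 | 3,761,710,111,186 | 149,569,155,086 | 39,148,442 | 39,148,442 | 137,584,278,754 | 330,750,296 | |
120 | 532,229,037,371 | 14,227,944,434 | 29,404,359,241,173 | 51,095,884,028,391 | 7,053,547,766,317 | 280,479,284,507 | 73,250,012 | 73,250,012 | 260,078,672,869 | 618,858,635 | |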
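The ratios that perf prints next to the raw counters (instructions per cycle, cache-miss percentage, branch-miss percentage) can be recomputed from the counter totals, which makes it easier to compare rows. The following is a minimal sketch, not part of the original test script; the function name `derived_metrics` and the dict layout are assumptions made for illustration, and the numbers are copied from the 1 CPU and 120 CPU runs.

```python
def derived_metrics(counters):
    """Recompute the ratios perf prints next to the raw counters.

    `counters` is a hypothetical dict keyed by perf event name
    (without the ':u' suffix); values are the raw totals.
    """
    return {
        "insn_per_cycle": counters["instructions"] / counters["cycles"],
        "cache_miss_pct": 100.0 * counters["cache-misses"] / counters["cache-references"],
        "branch_miss_pct": 100.0 * counters["branch-misses"] / counters["branches"],
    }


# Counter totals copied from the perf output for the 1 CPU and 120 CPU runs.
RUNS = {
    1: {
        "cache-references": 4_522_410_200,
        "cache-misses": 112_207_635,
        "cycles": 224_402_878_245,
        "instructions": 428_582_543_872,
        "branches": 59_430_436_824,
        "branch-misses": 2_353_206_592,
    },
    120: {
        "cache-references": 532_229_037_371,
        "cache-misses": 14_227_944_434,
        "cycles": 29_404_359_241_173,
        "instructions": 51_095_884_028_391,
        "branches": 7_053_547_766_317,
        "branch-misses": 280_479_284_507,
    },
}

if __name__ == "__main__":
    for ncpu, counters in RUNS.items():
        m = derived_metrics(counters)
        print(f"{ncpu:>4} CPU: IPC={m['insn_per_cycle']:.2f} "
              f"cache-miss={m['cache_miss_pct']:.3f}% "
              f"branch-miss={m['branch_miss_pct']:.2f}%")
```

For these two runs the recomputed values should agree with the figures perf reports: 1.91 and 1.74 instructions per cycle, and 2.481 % and 2.673 % cache misses.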
The perf command is run with a 5 s accumulation time in a subprocess, one invocation after another in a loop; the reported parameters are parsed into a dict, and each dict is appended to a list. In total, 100 iterations are executed over about 500 s. After iterations 10 and 50 (that is, twice), the host is loaded with a "simulated time-consuming job" that occupies a varying number of CPUs and runs for about 2 min, or roughly 20 iterations. When the loop ends, plots and a table of the parameters versus time are generated.
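The sampling loop described here can be sketched as follows. This is a hedged illustration rather than the actual test script: the function names `parse_perf_stat` and `sample_loop` are hypothetical, and the exact perf invocation (`perf stat -a sleep 5`, system-wide counting for 5 s) is an assumption, since the page does not give the command line. perf writes its statistics to stderr, so that is the stream being parsed.

```python
import re
import subprocess

# "4,522,410,200  cache-references:u  (62.49%)" -> count and event name
COUNTER_RE = re.compile(r"^\s*([\d,]+)\s+([A-Za-z][\w.:-]*)")
# "70.762930452 seconds time elapsed" -> value and label
TIME_RE = re.compile(r"^\s*([\d.]+)\s+seconds\s+(time elapsed|user|sys)")


def parse_perf_stat(text):
    """Parse the text printed by `perf stat` into a dict of name -> number."""
    record = {}
    for line in text.splitlines():
        m = COUNTER_RE.match(line)
        if m:
            record[m.group(2)] = int(m.group(1).replace(",", ""))
            continue
        m = TIME_RE.match(line)
        if m:
            record[m.group(2)] = float(m.group(1))
    return record


def sample_loop(n_loops=100, interval_sec=5):
    """Run perf once per iteration and collect one dict per sample.

    Assumed command: system-wide counting while `sleep` runs for
    `interval_sec` seconds. Requires perf to be installed.
    """
    records = []
    for _ in range(n_loops):
        proc = subprocess.run(
            ["perf", "stat", "-a", "sleep", str(interval_sec)],
            capture_output=True, text=True,
        )
        records.append(parse_perf_stat(proc.stderr))  # perf reports on stderr
    return records


# A fragment of the 1 CPU output from this page, used to exercise the parser.
SAMPLE = """
 Performance counter stats for 'python test-scaling-subproc.py -1':
   4,522,410,200  cache-references:u  (62.49%)
   112,207,635  cache-misses:u  # 2.481 % of all cache refs  (62.51%)
   224,402,878,245  cycles:u  (62.51%)
   657,277  faults:u
   70.762930452 seconds time elapsed
   66.918003000 seconds user
"""

if __name__ == "__main__":
    print(parse_perf_stat(SAMPLE))
```

Each element of the list returned by `sample_loop` corresponds to one 5 s sample, so the list index multiplied by the interval gives the time axis for the plots.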
Results
References