...

Below is the Smd0 performance with blocking MPI calls, for comparison with the improvements shown in the following sections:


TASK             eb=1                 eb=2                 eb=4                 eb=8                 eb=16
                 total(ms)  #occurs   total(ms)  #occurs   total(ms)  #occurs   total(ms)  #occurs   total(ms)  #occurs
SMD0GOTCHUNK       2028.15     1086     2009.68     1086     1981.50     1086     1992.47     1086     1995.75     1086
SMD0GOTEB           575.47     1087       45.39     1087       45.85     1087       45.11     1087       46.17     1087
SMD0GOTREPACK       264.10     1087      298.13     1087      272.72     1087      298.47     1087      279.94     1087
SMD0DONEWITHEB     3535.77     1087     5023.36     1087     4849.88     1087     4972.46     1087     4895.14     1087
SMD0GOTSTEPHIST      64.02     1087       60.18     1087       63.05     1087       64.49     1087       69.19     1087
SMD0GOTSTEP          85.66     1087       84.72     1087       86.01     1087       83.84     1087       86.55     1087
total:             6553.16              7521.47              7299.01              7456.84              7372.74
rate (MHz)            1.53                 1.33                 1.37                 1.34                 1.36

Overlapping with Send

By replacing Send with Isend, we allow Smd0 to move on as soon as it has initiated a send to an eventbuilder core, instead of blocking until the send completes. With this overlap, the total wall time with 16 eventbuilder cores improves from 7.4 to 4.4 seconds (the rate rises from 1.36 to 2.28 MHz).
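
To make the overlap concrete, here is a minimal cost model in Python. It is a sketch under stated assumptions, not the psana code. It assumes the following:

  • Each chunk costs a fixed compute time on Smd0 and a fixed transfer time to an eventbuilder.
  • A posted transfer progresses in the background.
  • Smd0 keeps at most one send in flight, so it waits on the previous request before posting the next one. In mpi4py this would be, for example, `comm.Isend` followed later by `Request.Wait`.

The function name `smd0_wall_time` and its parameters are hypothetical.

```python
from typing import Sequence


def smd0_wall_time(compute: Sequence[float], send: Sequence[float], overlap: bool) -> float:
    """Hypothetical cost model of Smd0's wall time over a series of chunks.

    compute[i]: time Smd0 spends preparing chunk i.
    send[i]:    time to transfer chunk i to an eventbuilder core.
    overlap:    False models a blocking Send; True models an Isend whose
                transfer runs in the background, with at most one send in flight.
    """
    t = 0.0             # Smd0's own clock
    pending_done = 0.0  # completion time of the outstanding Isend, if any
    for c, s in zip(compute, send):
        t += c  # prepare the next chunk
        if overlap:
            # Assumption: wait for the previous Isend before reusing the send
            # buffer, then post the new Isend, which returns immediately.
            t = max(t, pending_done)
            pending_done = t + s
        else:
            # Blocking Send: Smd0 cannot continue until the transfer finishes.
            t += s
    # Final wait so that the last overlapped send has also completed.
    return max(t, pending_done)


# Four chunks with 3 ms of compute and 5 ms of transfer each.
print(smd0_wall_time([3] * 4, [5] * 4, overlap=False))  # 32.0
print(smd0_wall_time([3] * 4, [5] * 4, overlap=True))   # 23.0
```

With constant per-chunk times c and s over n chunks, the model gives n*(c + s) for blocking sends and c + s + (n - 1)*max(c, s) with overlap. The gain is therefore largest when compute and transfer times are comparable. The model ignores network contention and MPI progress details.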


TASK             eb=1                 eb=2                 eb=4                 eb=8                 eb=16
                 total(ms)  #occurs   total(ms)  #occurs   total(ms)  #occurs   total(ms)  #occurs   total(ms)  #occurs
SMD0GOTCHUNK       1995.07     1086     1993.69     1086     1975.40     1086     1964.60     1086     1983.13     1086
SMD0GOTEB          5841.78     1087     2779.48     1087     1964.36     1087     1916.66     1087     1832.13     1087
SMD0GOTREPACK       297.86     1087      258.33     1087      295.82     1087      306.37     1087      345.58     1087
SMD0DONEWITHEB       57.76     1087       61.78     1087       62.37     1087       61.95     1087       60.76     1087
SMD0GOTSTEPHIST      78.34     1087       85.66     1087       87.96     1087       80.87     1087       81.09     1087
SMD0GOTSTEP          86.38     1087       89.50     1087       91.87     1087       89.96     1087       88.43     1087
total:             8357.18              5268.44              4477.77              4420.42              4391.13
rate (MHz)            1.20                 1.90                 2.23                 2.26                 2.28
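
The wall-time improvement can be checked directly from the total rows of the blocking and Isend tables. This short Python calculation uses values copied from those tables; the variable names are only illustrative. It computes the ratio of blocking to Isend wall time for each eventbuilder count:

```python
# total(ms) rows copied from the blocking and Isend tables, keyed by eventbuilder count.
blocking = {1: 6553.16, 2: 7521.47, 4: 7299.01, 8: 7456.84, 16: 7372.74}
isend = {1: 8357.18, 2: 5268.44, 4: 4477.77, 8: 4420.42, 16: 4391.13}

# A speedup above 1 means the Isend version finished sooner.
speedup = {eb: round(blocking[eb] / isend[eb], 2) for eb in blocking}
print(speedup)  # {1: 0.78, 2: 1.43, 4: 1.63, 8: 1.69, 16: 1.68}
```

The Isend version is about 1.7x faster with 8 or 16 eventbuilder cores. It is slower with a single eventbuilder core (8.4 s versus 6.6 s).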

Conclusions / Known Issues

We gain some performance by overlapping Send with other computation. However, this code (with Isend/Irecv) crashes on the current real experiment data (tmoc00118, run=463). We need to investigate this issue before continuing this work.

In addition to overlapping Send, we can also perform computational tasks while Smd0 waits for an eventbuilder core to come back (Irecv).
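
A minimal sketch of this idea follows, assuming mpi4py-style requests. A request returned by `comm.irecv` offers `test()`, which returns a `(flag, message)` tuple without blocking, and `wait()`, which blocks and returns the message. Smd0 polls the request and runs one pending work item between polls.

`FakeRequest`, `recv_with_overlap`, and the work items are hypothetical stand-ins so that the example runs without MPI. The message value stands in for whatever an eventbuilder sends back when it is ready for more data.

```python
from collections import deque
from typing import Any, Callable, Deque, List, Tuple


class FakeRequest:
    """Hypothetical stand-in for an mpi4py irecv request.

    It reports completion only after `ready_after` unsuccessful polls.
    """

    def __init__(self, message: Any, ready_after: int) -> None:
        self.message = message
        self.ready_after = ready_after
        self.polls = 0

    def test(self) -> Tuple[bool, Any]:
        self.polls += 1
        if self.polls > self.ready_after:
            return True, self.message
        return False, None

    def wait(self) -> Any:
        return self.message


def recv_with_overlap(request, work: Deque[Callable[[], Any]]) -> Tuple[Any, List[Any]]:
    """Run queued work items until `request` completes; return (message, results)."""
    results: List[Any] = []
    while True:
        done, message = request.test()  # non-blocking check
        if done:
            return message, results
        if not work:
            # Nothing left to overlap, so fall back to a blocking wait.
            return request.wait(), results
        results.append(work.popleft()())  # do one unit of useful work


# Hypothetical work: repacking the next chunks while waiting for an eventbuilder.
jobs = deque([lambda: "repack chunk 1", lambda: "repack chunk 2", lambda: "repack chunk 3"])
message, finished = recv_with_overlap(FakeRequest(message=7, ready_after=2), jobs)
print(message, finished)  # 7 ['repack chunk 1', 'repack chunk 2']
```

Smd0 only notices an eventbuilder's request between work items, so each item should be short. Otherwise, the waiting eventbuilder core sits idle longer than it would with a blocking receive.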