...
Below is the blocking-MPI performance, shown as a baseline for the improvements in the following sections:
| TASK | eb=1 total (ms) | eb=1 #occurs | eb=2 total (ms) | eb=2 #occurs | eb=4 total (ms) | eb=4 #occurs | eb=8 total (ms) | eb=8 #occurs | eb=16 total (ms) | eb=16 #occurs |
|---|---|---|---|---|---|---|---|---|---|---|
| SMD0GOTCHUNK | 2028.15 | 1086 | 2009.68 | 1086 | 1981.5 | 1086 | 1992.47 | 1086 | 1995.75 | 1086 |
| SMD0GOTEB | 575.47 | 1087 | 45.39 | 1087 | 45.85 | 1087 | 45.11 | 1087 | 46.17 | 1087 |
| SMD0GOTREPACK | 264.1 | 1087 | 298.13 | 1087 | 272.72 | 1087 | 298.47 | 1087 | 279.94 | 1087 |
| SMD0DONEWITHEB | 3535.77 | 1087 | 5023.36 | 1087 | 4849.88 | 1087 | 4972.46 | 1087 | 4895.14 | 1087 |
| SMD0GOTSTEPHIST | 64.02 | 1087 | 60.18 | 1087 | 63.05 | 1087 | 64.49 | 1087 | 69.19 | 1087 |
| SMD0GOTSTEP | 85.66 | 1087 | 84.72 | 1087 | 86.01 | 1087 | 83.84 | 1087 | 86.55 | 1087 |
| total | 6553.16 | | 7521.47 | | 7299.01 | | 7456.84 | | 7372.74 | |
| rate (MHz) | 1.53 | | 1.33 | | 1.37 | | 1.34 | | 1.36 | |
Overlapping with Send
By replacing Send with Isend, we allow Smd0 to move on after initiating a send to an eventbuilder core. With this overlap, the total wall time improves from 7.4 s to 4.4 s with 16 eventbuilder cores.
[The per-task timing table for the Isend version is garbled in the source; the per-task totals for eb=1..16 are not recoverable.]

Conclusions/ Known Issues
We gain some performance by overlapping Send with other computation tasks. However, the Isend/Irecv version of the code crashes on current real experiment data (tmoc00118, run=463). This issue needs to be investigated before continuing the work.
In addition to overlapping the send, we can also perform computational tasks while Smd0 waits for an eventbuilder core to report back (Irecv).