
Below is the Smd0 performance with blocking MPI communication. It serves as the baseline for the improvements described in the following sections:


Total time per task in ms with blocking MPI (eb = number of eventbuilder cores; #occurs is identical for every eb value):

TASK              #occurs     eb=1     eb=2     eb=4     eb=8    eb=16
SMD0GOTCHUNK         1086  2028.15  2009.68  1981.50  1992.47  1995.75
SMD0GOTEB            1087   575.47    45.39    45.85    45.11    46.17
SMD0GOTREPACK        1087   264.10   298.13   272.72   298.47   279.94
SMD0DONEWITHEB       1087  3535.77  5023.36  4849.88  4972.46  4895.14
SMD0GOTSTEPHIST      1087    64.02    60.18    63.05    64.49    69.19
SMD0GOTSTEP          1087    85.66    84.72    86.01    83.84    86.55
total                       6553.16  7521.47  7299.01  7456.84  7372.74
rate (MHz)                     1.53     1.33     1.37     1.34     1.36
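The rate row appears to be the number of processed events divided by the total time. The event count is not given on this page; the sketch below assumes 10,000,000 events, a value inferred only because it reproduces every reported rate. Both the formula and the event count are assumptions, not documented definitions.

```python
# Totals (ms) and reported rates (MHz) copied from the blocking-MPI table above.
totals_ms = {1: 6553.16, 2: 7521.47, 4: 7299.01, 8: 7456.84, 16: 7372.74}
reported_mhz = {1: 1.53, 2: 1.33, 4: 1.37, 8: 1.34, 16: 1.36}

# ASSUMPTION: rate = events / total time. The event count is not stated on
# this page; 10,000,000 is inferred because it reproduces the reported rates.
N_EVENTS = 10_000_000

def rate_mhz(n_events: int, total_ms: float) -> float:
    """Convert an event count and a total time in ms to a rate in MHz."""
    events_per_second = n_events / (total_ms / 1000.0)
    return events_per_second / 1e6

for eb, t in totals_ms.items():
    print(f"eb={eb:2d}: {rate_mhz(N_EVENTS, t):.2f} MHz (reported {reported_mhz[eb]:.2f})")
```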

Overlapping with Send

By replacing Send with Isend, we allow Smd0 to move on as soon as it has initiated a send to an eventbuilder core, instead of blocking until the send buffer can be reused. With this overlap, the total wall time with 16 eventbuilder cores improves from 7.4 to 4.0 seconds.
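A minimal sketch of the overlap pattern, written with Python threads so it runs without MPI. It is a model, not the Smd0 code: `fake_send`, `smd0_loop`, and the round-robin choice of eventbuilder are hypothetical stand-ins. `pool.submit(...)` plays the role of Isend (start the transfer and return at once), and `Future.result()` plays the role of waiting on the request. The one rule the sketch encodes is the standard MPI requirement that a buffer handed to a non-blocking send must not be modified or reused until that send has completed.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor

def fake_send(dest: int, payload: bytes, log: list) -> None:
    """Hypothetical stand-in for transferring one chunk to eventbuilder `dest`."""
    time.sleep(0.01)             # pretend the transfer takes a while
    log.append((dest, payload))  # record delivery (list.append is atomic in CPython)

def smd0_loop(chunks: list, n_eb: int) -> list:
    """Send chunks to eventbuilders without waiting for each send to finish."""
    log: list = []
    pending: dict[int, Future] = {}  # one outstanding "request" per eventbuilder
    with ThreadPoolExecutor(max_workers=n_eb) as pool:
        for i, chunk in enumerate(chunks):
            # Hypothetical round-robin; the real code presumably sends to
            # whichever eventbuilder has come back asking for data.
            dest = i % n_eb
            # Like waiting on the old request: the previous send to this
            # eventbuilder must complete before its buffer slot is reused.
            if dest in pending:
                pending[dest].result()
            # Like Isend: start the transfer and continue immediately.
            pending[dest] = pool.submit(fake_send, dest, chunk, log)
            # ... Smd0 can read and repack the next chunk here while the send runs ...
        for req in pending.values():  # like a final Waitall before shutting down
            req.result()
    return log

log = smd0_loop([b"chunk0", b"chunk1", b"chunk2", b"chunk3"], n_eb=2)
print(sorted(log))  # [(0, b'chunk0'), (0, b'chunk2'), (1, b'chunk1'), (1, b'chunk3')]
```

Waiting only on the request for the eventbuilder about to be reused (rather than after every send) is what lets the transfers overlap with Smd0's own work.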


Total time per task in ms with Isend (eb = number of eventbuilder cores; #occurs is identical for every eb value):

TASK              #occurs     eb=1     eb=2     eb=4     eb=8    eb=16
SMD0GOTCHUNK         1086  1964.37  2035.83  2015.74  1992.94  2004.79
SMD0GOTEB            1087  5695.49  2800.00  1748.11  1676.01  1619.14
SMD0GOTREPACK        1087   244.85   212.70   235.50   198.57   186.95
SMD0DONEWITHEB       1087    48.90    50.04    52.74    53.23    51.64
SMD0GOTSTEPHIST      1087    76.27    79.68    83.65    83.27    82.98
SMD0GOTSTEP          1087    87.37    86.90    90.34    92.26    88.62
total                       8117.26  5265.15  4226.07  4096.28  4034.12
rate (MHz)                     1.23     1.90     2.37     2.44     2.48
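To see where the savings come from, the sketch below compares the blocking and Isend tables column by column. It sums the two tasks whose names refer to the eventbuilder (SMD0GOTEB and SMD0DONEWITHEB) and computes the speedup of the total. Grouping these two tasks is our reading of the task names, not something defined on this page.

```python
# Per-task totals (ms) copied from the blocking and Isend tables.
EB = [1, 2, 4, 8, 16]
blocking = {
    "SMD0GOTEB":      [575.47, 45.39, 45.85, 45.11, 46.17],
    "SMD0DONEWITHEB": [3535.77, 5023.36, 4849.88, 4972.46, 4895.14],
    "total":          [6553.16, 7521.47, 7299.01, 7456.84, 7372.74],
}
isend = {
    "SMD0GOTEB":      [5695.49, 2800.0, 1748.11, 1676.01, 1619.14],
    "SMD0DONEWITHEB": [48.9, 50.04, 52.74, 53.23, 51.64],
    "total":          [8117.26, 5265.15, 4226.07, 4096.28, 4034.12],
}

def eb_tasks_ms(table: dict, i: int) -> float:
    """Combined time of the two eventbuilder-related tasks in column i."""
    return table["SMD0GOTEB"][i] + table["SMD0DONEWITHEB"][i]

for i, eb in enumerate(EB):
    speedup = blocking["total"][i] / isend["total"][i]
    print(f"eb={eb:2d}: EB tasks {eb_tasks_ms(blocking, i):8.2f} -> "
          f"{eb_tasks_ms(isend, i):8.2f} ms, speedup {speedup:.2f}x")
```

With these numbers, the combined time of the two eventbuilder tasks falls from about 4.9 s to about 1.7 s at eb=16, which accounts for most of the 3.3 s reduction in the total. At eb=1 it grows instead, and the Isend version is slower (speedup about 0.81).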

Conclusions / Known Issues

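We gain performance by overlapping Send with other computation: with 2 to 16 eventbuilder cores, the rate increases from 1.33-1.37 MHz with blocking Send to 1.90-2.48 MHz with Isend. With a single eventbuilder core, however, the Isend version is slower (1.23 MHz versus 1.53 MHz). In addition, the Isend/Irecv version of the code crashes on the current real experiment data (tmoc00118, run=463). We need to investigate this issue before continuing this work.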

In addition to overlapping Send, Smd0 could also perform computational tasks while it waits for an eventbuilder core to come back (Irecv). This should be explored once the crash described above is resolved.
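The Irecv idea can be modeled the same way: post the receive, do useful work between completion checks, and block only when there is nothing left to overlap. In MPI terms this corresponds to MPI_Irecv followed by repeated MPI_Test calls and a final MPI_Wait. The sketch is a thread-based model with hypothetical names (`eventbuilder_returns`, `wait_with_overlap`, the fixed eventbuilder rank 3, and the placeholder computation); it is not the Smd0 implementation.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def eventbuilder_returns(eb_rank: int, delay: float) -> int:
    """Hypothetical stand-in for an eventbuilder reporting back after `delay` seconds."""
    time.sleep(delay)
    return eb_rank

def wait_with_overlap(work_items, delay: float = 0.05):
    """Overlap local work with waiting for an eventbuilder (thread-based model)."""
    todo = list(work_items)  # copy so the caller's sequence is left untouched
    done = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Like posting MPI_Irecv: start waiting without blocking the main loop.
        request = pool.submit(eventbuilder_returns, 3, delay)
        # Like polling with MPI_Test: one unit of work between completion checks.
        while todo and not request.done():
            item = todo.pop(0)
            done.append(item * 2)  # placeholder for real Smd0 work
        # Like MPI_Wait: nothing left to overlap, so block until the reply arrives.
        eb_rank = request.result()
    return eb_rank, done, todo

rank, done, left = wait_with_overlap(range(10))
print(f"eventbuilder {rank} returned; {len(done)} items processed while waiting")
```

How many items get processed before the reply arrives depends on timing; the point of the pattern is that the wait no longer idles Smd0 as long as work is available.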