OpenMPI hangs on large message

This happens with Open MPI 4.1.1. To reproduce the problem, run the script below with:

mpirun -n 2 python test_largemsg.py

Contents of test_largemsg.py:

from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

n = 20000  # number of C ints (dtype='i') to broadcast

if rank == 0:
    # The root rank fills the buffer with 0..n-1
    data = np.arange(n, dtype='i')
else:
    # All other ranks allocate an uninitialized receive buffer
    data = np.empty(n, dtype='i')

# Broadcast the buffer from rank 0 to every other rank
comm.Bcast(data, root=0)

print(f'rank={rank} data[-1]={data[-1]}')
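For a sense of what "large" means here, the payload size of the broadcast in the script above can be estimated as follows. This is a minimal sketch: the helper name `payload_bytes` is hypothetical, and it assumes that NumPy's `dtype='i'` matches the native C `int`, whose size the standard `struct` module reports.

```python
import struct


def payload_bytes(n, typecode="i"):
    """Size in bytes of n elements of a native C type (hypothetical helper)."""
    # struct.calcsize("i") is the size of a native C int on this platform
    return n * struct.calcsize(typecode)


if __name__ == "__main__":
    n = 20000  # same element count as in test_largemsg.py
    nbytes = payload_bytes(n)
    print(f"{n} elements -> {nbytes} bytes ({nbytes / 1024:.1f} KiB)")
```

On platforms where a C `int` is 4 bytes, this comes to 80,000 bytes per broadcast.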

Solution: exclude Open MPI's TCP BTL (byte transfer layer) component with the following command:

mpirun -n 2 --mca btl ^tcp python test_largemsg.py
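The same workaround can be driven from a small Python launcher, sketched below. The function name `mpirun_without_tcp` and its parameters are hypothetical. The `use_env` branch relies on Open MPI reading MCA parameters from `OMPI_MCA_<name>` environment variables, which should be equivalent to the `--mca` flag; it has not been verified against this particular hang.

```python
import os
import subprocess


def mpirun_without_tcp(script, nprocs=2, use_env=False):
    """Build (cmd, env) to launch `script` with the TCP BTL excluded.

    Hypothetical helper for illustration only.
    """
    env = dict(os.environ)
    cmd = ["mpirun", "-n", str(nprocs)]
    if use_env:
        # Open MPI picks up MCA parameters from OMPI_MCA_<name> variables
        env["OMPI_MCA_btl"] = "^tcp"
    else:
        # Same flag as the command line shown above
        cmd += ["--mca", "btl", "^tcp"]
    cmd += ["python", script]
    return cmd, env


if __name__ == "__main__":
    cmd, env = mpirun_without_tcp("test_largemsg.py")
    print(" ".join(cmd))
    # On a machine with Open MPI installed, launch with:
    # subprocess.run(cmd, env=env, check=True)
```

Passing a `timeout=` argument to `subprocess.run` turns a hang into a `subprocess.TimeoutExpired` exception, which can help when checking whether the workaround takes effect.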

Note: The hang does NOT occur when the job is launched with srun.