Open MPI hangs on large messages
This happens with Open MPI 4.1.1. To reproduce the problem, run the script below with:
mpirun -n 2 python test_largemsg.py
cat test_largemsg.py
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

n = 20000
if rank == 0:
    data = np.arange(n, dtype='i')
else:
    data = np.empty(n, dtype='i')

comm.Bcast(data, root=0)
print(f'rank={rank} data[-1]={data[-1]}')
Solution: disable Open MPI's TCP BTL (byte transfer layer) component with the following command:
mpirun -n 2 --mca btl ^tcp python test_largemsg.py
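If the job is started from a Python wrapper rather than typed by hand, the same `--mca btl ^tcp` setting can be assembled programmatically. The sketch below is only an illustration: the helper names `build_mpirun_command` and `mca_environment` are hypothetical and not part of Open MPI or mpi4py. It also shows Open MPI's `OMPI_MCA_<param>` environment-variable form of an MCA parameter; that this behaves the same as the command-line flag on this system is an assumption and has not been verified here.

```python
import os


def build_mpirun_command(script, nprocs=2, exclude_btl=("tcp",)):
    """Return the mpirun argument list, excluding the given BTL components.

    Hypothetical helper: it only builds the command, it does not run it.
    """
    cmd = ["mpirun", "-n", str(nprocs)]
    if exclude_btl:
        # "^" tells Open MPI to exclude the listed components.
        cmd += ["--mca", "btl", "^" + ",".join(exclude_btl)]
    cmd += ["python", script]
    return cmd


def mca_environment(exclude_btl=("tcp",), base_env=None):
    """Return a copy of the environment with the btl MCA parameter set.

    Open MPI reads MCA parameters from variables named OMPI_MCA_<param>.
    Assumption: this is equivalent to passing --mca btl on the command line.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["OMPI_MCA_btl"] = "^" + ",".join(exclude_btl)
    return env


if __name__ == "__main__":
    # Print the command instead of running it, so this works without MPI.
    print(" ".join(build_mpirun_command("test_largemsg.py")))
```

The returned list can be passed to `subprocess.run`, optionally with `env=mca_environment()`, on a machine where `mpirun` is available.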
Note: This is NOT an issue when the job is launched with srun.