
We may not need to use openmpi with Infiniband if we can get similar performance running psana2 with Ethernet for MPI communication. These connections are needed only for transferring the small data (11 GB for this test) from Smd0 to the EventBuilder and BigData nodes. Here we show the performance of reading 123 GB in 16 files using 7 drp nodes (113 cores: 1 Smd0 / 12 EventBuilders / 100 BigData cores).

 

Conclusion:

Using OpenMPI with Infiniband: Rate 39.5 kHz (Total Time: 253 s)

Using MPICH from conda on Ethernet: Rate 39.7 kHz (Total Time: 252 s)
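
As a rough cross-check of the two results, the event count and read throughput implied by the quoted numbers can be worked out as in the sketch below. It assumes that the rate is total events divided by total time and that the full 123 GB is read in each run; the function and variable names are illustrative only and are not part of psana2.

```python
# Figures quoted in this page; names are illustrative, not from psana2.
DATA_GB = 123.0  # total data read in the test

RESULTS = {
    "OpenMPI with Infiniband": {"rate_khz": 39.5, "time_s": 253.0},
    "MPICH on Ethernet": {"rate_khz": 39.7, "time_s": 252.0},
}


def implied_events(rate_khz, time_s):
    """Number of events, assuming rate = events / total time."""
    return rate_khz * 1e3 * time_s


def throughput_gb_s(data_gb, time_s):
    """Average read throughput in GB/s over the whole run."""
    return data_gb / time_s


for name, r in RESULTS.items():
    events = implied_events(r["rate_khz"], r["time_s"])
    gbps = throughput_gb_s(DATA_GB, r["time_s"])
    print(f"{name}: ~{events / 1e6:.2f} M events, {gbps:.3f} GB/s")

# Relative difference in rate between the two setups.
ib = RESULTS["OpenMPI with Infiniband"]["rate_khz"]
eth = RESULTS["MPICH on Ethernet"]["rate_khz"]
rel_diff = (eth - ib) / ib
print(f"relative rate difference: {rel_diff * 100:.1f}%")
```

Under these assumptions both runs correspond to roughly 10 million events at just under 0.5 GB/s, and the two rates differ by about half a percent.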

 

Note 1: below are plots from Grafana showing incoming/outgoing traffic.

OpenMPI with Infiniband: the outgoing traffic peaks show Smd0 sending chunks of data to EventBuilders over Infiniband.

MPICH on Ethernet: no noticeable peaks

 

To run the test:

OpenMPI with Infiniband: 

Clone psana environment then remove mpi4py, mpich, and mpi.

Build openmpi on the drp nodes (drp-tst-dev011 was used for this test). No special flags are needed; just use --prefix to choose where the build is installed.

Run conda build for mpi4py (see the recipe in relmanage/recipe), pointing the build.sh script to the openmpi build.

Install this new mpi4py to the cloned conda env.
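
Before running the full test it can be useful to confirm which MPI library the installed mpi4py is actually linked against. The following is a minimal sketch, not part of the original procedure: the script and the helper name mpi_flavor are hypothetical, and the substrings it looks for ("Open MPI", "MPICH") are an assumption about what the library version string contains.

```python
def mpi_flavor(library_version):
    """Classify an MPI library version string.

    Assumes Open MPI builds mention "Open MPI" and MPICH builds
    mention "MPICH" in the string; anything else is "unknown".
    """
    text = library_version.lower()
    if "open mpi" in text:
        return "openmpi"
    if "mpich" in text:
        return "mpich"
    return "unknown"


def main():
    try:
        from mpi4py import MPI
    except ImportError:
        print("mpi4py is not installed in this environment")
        return
    comm = MPI.COMM_WORLD
    version = MPI.Get_library_version()
    # Each rank reports where it runs and which MPI it was built against.
    print(
        f"rank {comm.Get_rank()}/{comm.Get_size()} "
        f"on {MPI.Get_processor_name()}: {mpi_flavor(version)}"
    )


if __name__ == "__main__":
    main()
```

Launching this with the mpirun from the openmpi build should report openmpi on every rank; if it reports mpich, the environment is still picking up the conda mpi4py.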

Run it with, for example:

~/tmp/4.0.0-rhel7/bin/mpirun --hostfile openmpi_hosts --mca btl_openib_allow_ib 1 run_slac.sh

where ~/tmp/4.0.0-rhel7/ is the --prefix used to build openmpi.

(ps-1.2.2-openmpi) monarin@drp-tst-dev011 (master *) psana2 $ cat openmpi_hosts

drp-tst-dev011 slots=1

drp-tst-dev012 slots=12

drp-tst-dev013 slots=20

drp-tst-dev014 slots=20

drp-tst-dev015 slots=20

drp-tst-dev016 slots=20

drp-tst-dev017 slots=20
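
The slot counts in this hostfile should add up to the 113 cores quoted at the top of the page. The sketch below checks that and splits the total into roles. The helper names are hypothetical, and the split assumes one rank for Smd0, PS_SMD_NODES ranks for EventBuilders, and the remainder for BigData, as the numbers on this page suggest.

```python
OPENMPI_HOSTS = """\
drp-tst-dev011 slots=1
drp-tst-dev012 slots=12
drp-tst-dev013 slots=20
drp-tst-dev014 slots=20
drp-tst-dev015 slots=20
drp-tst-dev016 slots=20
drp-tst-dev017 slots=20
"""


def parse_openmpi_hostfile(text):
    """Return (host, slots) pairs from 'host slots=N' lines."""
    hosts = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        fields = line.split()
        slots = 1  # this sketch's own fallback, not necessarily Open MPI's
        for field in fields[1:]:
            if field.startswith("slots="):
                slots = int(field.split("=", 1)[1])
        hosts.append((fields[0], slots))
    return hosts


def role_counts(total_ranks, n_eventbuilders):
    """Split ranks into Smd0 / EventBuilder / BigData counts (assumed layout)."""
    n_bigdata = total_ranks - 1 - n_eventbuilders
    if n_bigdata < 1:
        raise ValueError("not enough ranks for at least one BigData core")
    return {"smd0": 1, "eventbuilders": n_eventbuilders, "bigdata": n_bigdata}


hosts = parse_openmpi_hostfile(OPENMPI_HOSTS)
total = sum(slots for _, slots in hosts)
print(total, role_counts(total, n_eventbuilders=12))
```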

(ps-1.2.2-openmpi) monarin@drp-tst-dev011 (master *) psana2 $ cat run_slac.sh 

#!/bin/bash

export PS_SMD_NODES=12

source $HOME/lcls2/setup_env.sh

conda activate ps-1.2.2-openmpi

python dev_bd.py


MPICH on Ethernet:

(ps-2.1.2) monarin@drp-tst-dev011 (master *) psana2 $ /reg/g/psdm/sw/conda2/inst/envs/ps-2.1.2/bin/mpirun -f mpich_hosts ./run_slac.sh

(ps-2.1.2) monarin@drp-tst-dev011 (master *) psana2 $ cat mpich_hosts

drp-tst-dev011:1

drp-tst-dev012:12

drp-tst-dev013:20

drp-tst-dev014:20

drp-tst-dev015:20

drp-tst-dev016:20

drp-tst-dev017:20
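
The two hostfiles carry the same information in different syntax: "host slots=N" for openmpi and "host:N" for the MPICH launcher. To keep them in sync, one could be generated from the other. A minimal sketch follows; the function name openmpi_to_mpich is hypothetical, and it handles only the simple line formats shown on this page.

```python
def openmpi_to_mpich(text):
    """Convert 'host slots=N' lines to 'host:N' lines.

    Blank lines are skipped; a host without slots= is emitted as-is.
    """
    out = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        host = fields[0]
        slots = [f.split("=", 1)[1] for f in fields[1:] if f.startswith("slots=")]
        out.append(f"{host}:{slots[0]}" if slots else host)
    return "\n".join(out)


example = "drp-tst-dev011 slots=1\n\ndrp-tst-dev012 slots=12\n"
print(openmpi_to_mpich(example))
```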

(ps-1.2.2-openmpi) monarin@drp-tst-dev011 (master *) psana2 $ cat run_slac.sh 

#!/bin/bash

export PS_SMD_NODES=12

source $HOME/lcls2/setup_env.sh

python dev_bd.py


 
