
setup_hosts_openmpi.sh (bash)
############################################################
# First node must be exclusive to smd0
# * For openmpi, slots=1 must be assigned to the first node.
############################################################

# Get the list of hosts by expanding the shorthand node list into a
# line-by-line node list
host_list=$(scontrol show hostnames "$SLURM_JOB_NODELIST")
hosts=($host_list)

# Write the host file, restricting the first node to a single slot (rank 0, smd0)
host_file="slurm_host_${SLURM_JOB_ID}"
for i in "${!hosts[@]}"; do
    if [[ "$i" == "0" ]]; then
        # First node: overwrite any existing file and limit it to one slot
        echo "${hosts[$i]} slots=1" > "$host_file"
    else
        echo "${hosts[$i]}" >> "$host_file"
    fi
done

# Export the host file for mpirun
export PS_HOST_FILE="$host_file"

# Calculate the number of ranks available in the job: one rank on the
# first node (smd0) plus one rank per core on each remaining node
export PS_N_RANKS=$(( SLURM_CPUS_ON_NODE * ( SLURM_JOB_NUM_NODES - 1 ) + 1 ))
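
As a worked example of the rank calculation above, the following sketch plugs in hypothetical values (3 nodes with 4 cores each). These numbers are illustrative assumptions only; in a real job Slurm sets both variables. The first node contributes one rank for smd0, and each remaining node contributes one rank per core.

```bash
# Hypothetical values for illustration only; Slurm sets these in a real job.
SLURM_JOB_NUM_NODES=3
SLURM_CPUS_ON_NODE=4

# One rank on the first node (smd0) plus all cores on each remaining node:
# 4 * (3 - 1) + 1 = 9
PS_N_RANKS=$(( SLURM_CPUS_ON_NODE * ( SLURM_JOB_NUM_NODES - 1 ) + 1 ))
echo "PS_N_RANKS=${PS_N_RANKS}"   # prints PS_N_RANKS=9
```

The exported variables would then typically be passed to Open MPI, for example as mpirun -np $PS_N_RANKS --hostfile $PS_HOST_FILE python my_analysis.py, where my_analysis.py is a placeholder script name.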

Performance Tuning Tips

To improve performance when running large jobs, consider the following options. The optimal settings are not straightforward to determine for an arbitrary analysis job, so some experimentation with your application is required.

  • Increase the environment variable PS_SMD_NODES above its default of 1. For many analyses, a value of 1/16 of the number of big data (BD) cores has worked well.
  • If you are writing a large amount of HDF5 data, increase the environment variable PS_SRV_NODES so that more cores write HDF5 files. It is difficult to give guidance on the number here, since it depends on the application.
  • Increase the environment variable PS_SMD_N_EVENTS to raise the number of events sent in each "batch" when transmitting data from SMD0 cores through to BD cores.
  • When setting up the smalldata, raise the number of events sent in each "batch" when transmitting data from BD cores to SRV cores by setting the batch_size kwarg in the DataSource.smalldata() call.
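
The environment-variable options above could be set in a job script along these lines. This is a minimal sketch: the BD core count and the PS_SRV_NODES and PS_SMD_N_EVENTS values are placeholder assumptions rather than recommendations, and only the 1/16 ratio comes from the guidance above.

```bash
# Hypothetical number of big data (BD) cores in the job; adjust for your allocation.
n_bd_cores=240

# Rule of thumb from the tips above: about 1/16 of the BD cores,
# but never below the default of 1.
smd_nodes=$(( n_bd_cores / 16 ))
if (( smd_nodes < 1 )); then
    smd_nodes=1
fi
export PS_SMD_NODES=$smd_nodes

# Placeholder values (assumptions); tune these for your application.
export PS_SRV_NODES=4
export PS_SMD_N_EVENTS=5000

echo "PS_SMD_NODES=${PS_SMD_NODES} PS_SRV_NODES=${PS_SRV_NODES} PS_SMD_N_EVENTS=${PS_SMD_N_EVENTS}"
```

The batch_size option is set inside the Python analysis script rather than through the environment, so it is not covered by this sketch.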