
setup_hosts_openmpi.sh (bash)
############################################################
# First node must be exclusive to smd0
# * For openmpi, slots=1 must be assigned to the first node.
############################################################

# Get the list of hosts by expanding the shorthand node list into
# a line-by-line node list
host_list=$(scontrol show hostnames $SLURM_JOB_NODELIST)
hosts=($host_list)

# Write out to host file by putting rank 0 on the first node
host_file="slurm_host_${SLURM_JOB_ID}"
for i in "${!hosts[@]}"; do
    if [[ "$i" == "0" ]]; then
        echo "${hosts[$i]} slots=1" > "$host_file"
    else
        echo "${hosts[$i]}" >> "$host_file"
    fi
done

# Export hostfile for mpirun  
export PS_HOST_FILE=$host_file

# Calculate no. of ranks available in the job
export PS_N_RANKS=$(( SLURM_CPUS_ON_NODE * ( SLURM_JOB_NUM_NODES - 1 ) + 1 ))
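The rank calculation above can be checked with a small worked example. The node and core counts below are illustrative assumptions, not values from a real job:

```shell
# Worked example of the PS_N_RANKS formula with illustrative values:
# one rank on the first node (reserved for smd0) plus all cores on
# each of the remaining nodes.
SLURM_CPUS_ON_NODE=16
SLURM_JOB_NUM_NODES=3
PS_N_RANKS=$(( SLURM_CPUS_ON_NODE * ( SLURM_JOB_NUM_NODES - 1 ) + 1 ))
echo "$PS_N_RANKS"   # 16 * (3 - 1) + 1 = 33
```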

Performance Tuning Tips

(See MPITaskStructureToSupportScaling to get a sense of where the parameters described here are used.)

To get improved performance when running large jobs, consider the following options.  It is not straightforward to set these parameters optimally for an arbitrary analysis job, so some study is required for each specific application.  In some cases we can offer guidelines.

  • Understand the CPU usage of your big-data ("BD") processing loop to make sure the bottleneck isn't user code.  This can typically be done by running on 1 core.
  • Increase the environment variable PS_SMD_NODES beyond its default of 1.  For many analyses, a value around 1/16 of the number of big-data cores has worked well.  This variable, together with the PS_SRV_NODES variable described next, determines how many cores are used for each task in psana (see MPITaskStructureToSupportScaling).
  • If you're writing a large amount of hdf5 data, increase the environment variable PS_SRV_NODES to dedicate more cores to writing hdf5 files.  It is difficult to give guidance on the number here since it depends on the application.
  • Set the environment variable PS_SMD_N_EVENTS to a larger value to increase the number of events that get sent in a "batch" when transmitting data from SMD0 through to the BD cores.
  • When setting up smalldata, increase the number of events that get sent in a "batch" when transmitting data from the BD cores to the SRV cores by setting the batch_size kwarg in the DataSource.smalldata() call.
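As a rough sketch, the environment-variable tunables above might be set together in a Slurm job script like this.  The core count and values below are illustrative assumptions only, not recommendations; the right values depend on your application:

```shell
# Illustrative job-script fragment (all numbers are assumptions, tune per application)
BD_CORES=128                               # big-data cores available to the job

# Heuristic from above: roughly 1/16 of the BD cores
export PS_SMD_NODES=$(( BD_CORES / 16 ))   # 128 / 16 = 8

# Increase only if writing a large amount of hdf5 output
export PS_SRV_NODES=2

# Larger batches when sending events from SMD0 through to the BD cores
export PS_SMD_N_EVENTS=1000

echo "PS_SMD_NODES=$PS_SMD_NODES PS_SRV_NODES=$PS_SRV_NODES PS_SMD_N_EVENTS=$PS_SMD_N_EVENTS"
```

The batch_size kwarg to DataSource.smalldata() is set in the analysis code itself rather than in the job script, so it is not shown here.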

psana also has some grafana monitoring built in that, with expert help, can be used to identify bottlenecks in an analysis.  Contact pcds-ana-l@slac.stanford.edu for guidance.