```bash
############################################################
# First node must be exclusive to smd0
# * For openmpi, slots=1 must be assigned to the first node.
############################################################

# Get the list of hosts by expanding the shorthand node list
# into a line-by-line node list
host_list=$(scontrol show hostnames $SLURM_JOB_NODELIST)
hosts=($host_list)

# Write out the host file, putting rank 0 on the first node
host_file="slurm_host_${SLURM_JOB_ID}"
for i in "${!hosts[@]}"; do
    if [[ "$i" == "0" ]]; then
        echo "${hosts[$i]} slots=1" > $host_file
    else
        echo "${hosts[$i]}" >> $host_file
    fi
done

# Export hostfile for mpirun
export PS_HOST_FILE=$host_file

# Calculate no. of ranks available in the job:
# all cores on the remaining nodes, plus the single smd0 slot
export PS_N_RANKS=$(( SLURM_CPUS_ON_NODE * ( SLURM_JOB_NUM_NODES - 1 ) + 1 ))
```
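As a concrete check of the rank arithmetic above: with, say, 4 nodes of 32 cores each (illustrative values, not site defaults), one core on the first node is reserved for smd0 and the remaining nodes contribute all of their cores:

```shell
# Example values (hypothetical): 4 nodes, 32 CPUs per node
SLURM_CPUS_ON_NODE=32
SLURM_JOB_NUM_NODES=4

# Same formula as the job script: all cores on the other
# (NUM_NODES - 1) nodes, plus the single smd0 slot on node 0
PS_N_RANKS=$(( SLURM_CPUS_ON_NODE * ( SLURM_JOB_NUM_NODES - 1 ) + 1 ))
echo $PS_N_RANKS   # 97
```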
Performance Tuning Tips
To get improved performance when running large jobs, consider the following options. It is not straightforward to set these optimally for an arbitrary analysis job, so some study of your application is required.
- Increase the environment variable PS_SMD_NODES above its default of 1. For many analyses, a value of 1/16 of the number of big data (BD) cores has worked well.
- If you are writing a large amount of HDF5 data, increase the environment variable PS_SRV_NODES so that more cores write HDF5 files. It is difficult to give guidance on the number here, since it depends on the application.
- Set the environment variable PS_SMD_N_EVENTS larger to increase the number of events sent in a "batch" when transmitting data from SMD0 through to the BD cores.
- When setting up the smalldata, increase the number of events sent in a "batch" when transmitting data from BD cores to SRV cores by setting the batch_size kwarg in the DataSource.smalldata() call.
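Taken together, the environment-variable tips above might look like this in a job script. The numbers here are illustrative assumptions for a hypothetical job with 512 big data cores, not recommendations; the 1/16 rule of thumb gives the PS_SMD_NODES value:

```shell
# Hypothetical job with 512 big data (BD) cores
BD_CORES=512

# Rule of thumb from above: ~1/16 of the BD cores
export PS_SMD_NODES=$(( BD_CORES / 16 ))   # 32

# More cores writing HDF5 files (application-dependent; 4 is a guess)
export PS_SRV_NODES=4

# Larger batches from SMD0 to the BD cores (illustrative value)
export PS_SMD_N_EVENTS=10000
```

The batch_size kwarg to DataSource.smalldata() is set in the analysis script itself rather than in the environment, so it is not shown here.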