SLURM is the new job scheduling system for the LCLS batch compute systems; it is replacing the current LSF system. Generic documentation about SLURM can be found in this Quick Start User Guide. Even shorter documentation, some of it specific to psana, can be found on this page.
Some quick guides showing equivalent commands in LSF and SLURM:
The partition/queue information can be provided by the sinfo command.
From the psana pool:
    psanagpu104:~$ sinfo
    PARTITION   AVAIL  TIMELIMIT   NODES  STATE   NODELIST
    psanagpuq*  up     10-00:00:0      1  drain*  psanagpu118
    psanagpuq*  up     10-00:00:0      2  down*   psanagpu[115-116]
    psanagpuq*  up     10-00:00:0      1  drain   psanagpu117
    psanagpuq*  up     10-00:00:0      2  idle    psanagpu[113-114]
    psanaq      up     10-00:00:0      1  drain*  psana1509
    psanaq      up     10-00:00:0      4  down*   psana[1503,1519,1604-1605]
    psanaq      up     10-00:00:0      6  mix     psana[1502,1504-1506,1520,1602]
    psanaq      up     10-00:00:0      1  alloc   psana1501
    psanaq      up     10-00:00:0     27  idle    psana[1507-1508,1510-1518,1601,1606-1620]
    psanaq      up     10-00:00:0      1  down    psana1603

The * following a partition name marks the default partition (queue), which here is psanagpuq. A * following a node state (for example down*) means the node is not responding.
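To see at a glance how many nodes are free, the sinfo listing can be summarized per partition. The following is a minimal sketch, not a site-provided tool: the function name nodes_in_state is hypothetical, and it assumes the default sinfo column layout shown above (NODES in column 4, STATE in column 5).

```shell
#!/bin/bash
# Sum the NODES column of sinfo output per partition for one node state.
# Reads sinfo's default output on stdin. The function name is hypothetical.
nodes_in_state() {
    local state="$1"
    awk -v want="$state" '
        NR > 1 && $5 == want {       # skip the header; column 5 is STATE
            sub(/\*$/, "", $1)       # drop the default-partition marker
            count[$1] += $4          # column 4 is NODES
        }
        END { for (p in count) print p, count[p] }
    ' | sort
}

# Demo on three rows copied from the listing above.
# On the cluster one would instead run: sinfo | nodes_in_state idle
nodes_in_state idle <<'EOF'
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
psanagpuq* up 10-00:00:0 2 idle psanagpu[113-114]
psanaq up 10-00:00:0 6 mix psana[1502,1504-1506,1520,1602]
psanaq up 10-00:00:0 27 idle psana[1507-1508,1510-1518,1601,1606-1620]
EOF
# prints:
# psanagpuq 2
# psanaq 27
```

The comparison is an exact match on the state, so nodes reported as idle* (idle but not responding) are deliberately not counted.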
The GPU resources (Gres) configured on a node can be listed with scontrol:
    scontrol show node psanagpu116 -d | grep Gres=gpu
From the psana pool:
Gres=gpu:1080ti:1(S:0)
There are two ways to submit a job on the cluster. The main way is to use the sbatch command, which queues a job for later execution; the other is to submit an interactive job via srun.
The following is a simple submission script for a parallel psana batch job run with MPI. It can be submitted with the command "sbatch submit.sh". The commands in the script file will be run on the first available compute nodes that fit the requested resources. Two concepts matter here: "nodes" and "tasks per node". A "node" is a physical computer (with its own host name, for example), and each node typically has multiple CPU cores (see this page for specific numbers: Batch Nodes And Queues). Typically the tasks-per-node parameter is set to use all the cores on each node.
    > cat submit.sh
    #!/bin/bash
    #SBATCH --partition=psanaq
    #SBATCH --nodes=2
    #SBATCH --ntasks-per-node=3
    #SBATCH --output=%j.log

    # "-u" flushes print statements which can otherwise be hidden if mpi hangs
    # "-m mpi4py.run" allows mpi to exit if one rank has an exception
    mpirun python -u -m mpi4py.run /reg/g/psdm/tutorials/examplePython/mpiDataSource.py
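The relationship between the two parameters is simple arithmetic: the total number of MPI ranks is the number of nodes times the tasks per node. The sketch below illustrates this; the helper total_tasks is hypothetical, while SLURM_JOB_NUM_NODES and SLURM_NTASKS_PER_NODE are environment variables that SLURM sets inside a job (the latter only when --ntasks-per-node is given).

```shell
#!/bin/bash
# Total MPI ranks = nodes x tasks per node. Hypothetical helper for illustration.
total_tasks() {
    local nodes="$1" tasks_per_node="$2"
    echo $(( nodes * tasks_per_node ))
}

# submit.sh above requests 2 nodes with 3 tasks on each:
total_tasks 2 3    # prints 6

# Inside a running job the same numbers are available from the environment.
# Outside a job these variables are not set, so fall back to "unset".
echo "nodes=${SLURM_JOB_NUM_NODES:-unset} tasks_per_node=${SLURM_NTASKS_PER_NODE:-unset}"
```

Adding the echo line to a submission script is a quick way to confirm in the log file that the job received the layout that was requested.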
The same kind of submission can also be made directly from the command line, without a script file, using the "--wrap" option for sbatch:
sbatch -p psanaq --nodes 2 --ntasks-per-node 3 --wrap="mpirun python mpi_simpletest.py"
This script shows some additional features controllable via SLURM:
    > cat tst_script
    #!/bin/bash
    #
    #SBATCH --job-name=<name>                          # Job name for allocation
    #SBATCH --output=%j.log                            # File to which STDOUT will be written, %j inserts jobid
    #SBATCH --error=%j.err                             # File to which STDERR will be written, %j inserts jobid
    #SBATCH --partition=psanagpuq                      # Partition/Queue to submit job
    #SBATCH --gres=gpu:1080ti:1                        # Number of GPUs
    #SBATCH --ntasks=8                                 # Total number of tasks
    #SBATCH --ntasks-per-node=4                        # Number of tasks per node
    #SBATCH --mail-user='username'@slac.stanford.edu   # Receive e-mail from slurm
    #SBATCH --mail-type=ALL                            # Type of e-mail from slurm; other options include BEGIN, END, FAIL.
    #
    srun -l hostname
    srun python ExampleMultipleChaperones.py

    > sbatch tst_script
    Submitted batch job 187
Unlike sbatch, the srun command does not return immediately; it waits for the job to complete. The srun command can also be used to get control of a node to run interactively, which is useful for data exploration and software development.
The following are a few examples:
    > srun -N2 -n4 hello.mpi
    Process 0 on psanagpu110 out of 1
    Process 0 on psanagpu110 out of 1
    Process 0 on psanagpu113 out of 1
    Process 0 on psanagpu113 out of 1
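For the interactive use described above, srun can start a shell on a compute node through its --pty option. The sketch below only builds and prints such a command so it can be inspected before running; the helper name interactive_cmd and the choice of one node with one task are assumptions for illustration, not a site recommendation.

```shell
#!/bin/bash
# Print an srun command that would open an interactive shell on one node.
# The helper name is hypothetical; the partition defaults to psanaq.
interactive_cmd() {
    local partition="${1:-psanaq}"
    echo "srun --partition=${partition} --nodes=1 --ntasks=1 --pty /bin/bash"
}

# Show the command for the GPU partition without running it.
interactive_cmd psanagpuq
# prints: srun --partition=psanagpuq --nodes=1 --ntasks=1 --pty /bin/bash
```

Running the printed command on a login node would block until a node is allocated and then give a shell on that node; exiting the shell ends the job and releases the node.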
To check which jobs exist on the system, use the squeue command:
    psanagpu104:~$ squeue -u khegazy
    JOBID   PARTITION  NAME  USER     ST  TIME        NODES  NODELIST(REASON)
    466440  psanaq     wrap  khegazy  R   1-17:51:12      1  psana1502
    466423  psanaq     wrap  khegazy  R   1-17:53:31      1  psana1506
    466420  psanaq     wrap  khegazy  R   1-17:53:34      1  psana1602
    466421  psanaq     wrap  khegazy  R   1-17:53:34      1  psana1504
The ST (job state) field shows that jobid 466440 is currently running (R). Another common state is "pending" (PD).
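When many jobs are queued, a count per state is often easier to read than the full listing. The following is a minimal sketch under the assumption that squeue prints its default columns (ST in column 5) and that job names contain no spaces; the function name count_job_states is hypothetical.

```shell
#!/bin/bash
# Count jobs per state from squeue's default output read on stdin.
# Assumes ST is column 5 and job names have no spaces. Hypothetical helper.
count_job_states() {
    awk 'NR > 1 { n[$5]++ } END { for (s in n) print s, n[s] }' | sort
}

# Demo on the rows from the listing above.
# On the cluster one would instead run: squeue -u "$USER" | count_job_states
count_job_states <<'EOF'
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
466440 psanaq wrap khegazy R 1-17:51:12 1 psana1502
466423 psanaq wrap khegazy R 1-17:53:31 1 psana1506
466420 psanaq wrap khegazy R 1-17:53:34 1 psana1602
466421 psanaq wrap khegazy R 1-17:53:34 1 psana1504
EOF
# prints: R 4
```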
To get information about the status of finished jobs, use the sacct command:
    psanagpu104:~$ sacct -u khegazy
           JobID    JobName  Partition    Account  AllocCPUS      State ExitCode
    ------------ ---------- ---------- ---------- ---------- ---------- --------
    466416             wrap     psanaq                    10    RUNNING      0:0
    466416.batch      batch                               10    RUNNING      0:0
    466418             wrap     psanaq                    10    RUNNING      0:0
    466418.batch      batch                               10    RUNNING      0:0
    466420             wrap     psanaq                    10    RUNNING      0:0
    466420.batch      batch                               10    RUNNING      0:0
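The default sacct table is awkward to parse because some columns (Partition, Account) can be empty. A more script-friendly form is sacct's pipe-delimited output (-P, with -n to drop the header). The sketch below filters such output by job state; the function name jobs_in_state is hypothetical, and the demo input is a hand-rewritten excerpt of the listing above in the assumed JobID|State|ExitCode format.

```shell
#!/bin/bash
# Print "JobID ExitCode" for every job step in a given state.
# Expects pipe-delimited lines "JobID|State|ExitCode" on stdin, as produced by:
#   sacct -n -P --format=JobID,State,ExitCode
# The function name is hypothetical.
jobs_in_state() {
    local want="$1" jobid state exitcode
    while IFS='|' read -r jobid state exitcode; do
        if [ "$state" = "$want" ]; then
            echo "$jobid $exitcode"
        fi
    done
}

# Demo on two entries from the listing above, rewritten in pipe-delimited form.
jobs_in_state RUNNING <<'EOF'
466416|RUNNING|0:0
466416.batch|RUNNING|0:0
EOF
# prints:
# 466416 0:0
# 466416.batch 0:0
```

On the cluster, a call such as sacct -n -P --format=JobID,State,ExitCode -u "$USER" | jobs_in_state FAILED would list failed job steps together with their exit codes.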