...
SLURM is the new job scheduling system for the LCLS batch compute systems; it is replacing the current LSF system. Generic documentation about SLURM can be found in this Quick Start User Guide. Even shorter documentation, some of it specific to psana, can be found on this page.
Some quick guides showing equivalent commands in LSF and SLURM:
The partition/queue information can be provided by the sinfo command.
LCLS users typically use the "milano" queue at S3DF:
Code Block
[cpo@sdfiana002 ~]$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
roma*        up 10-00:00:0      1   comp sdfrome004
roma*        up 10-00:00:0     16  drng@ sdfrome[006-018,041-043]
roma*        up 10-00:00:0      1  down$ sdfrome003
roma*        up 10-00:00:0      1 drain$ sdfrome037
roma*        up 10-00:00:0      1 drain* sdfrome005
roma*        up 10-00:00:0     21    mix sdfrome[019-036,038-040]
milano       up 10-00:00:0      1  inval sdfmilan221
milano       up 10-00:00:0     14  drng@ sdfmilan[036-038,120-121,126,129,204-205,212,229-232]
milano       up 10-00:00:0      4 drain$ sdfmilan[009,041,049,112]
milano       up 10-00:00:0      1  drain sdfmilan032
milano       up 10-00:00:0     12   resv sdfmilan[001-005,029-030,052,057,117-119]
milano       up 10-00:00:0    102    mix sdfmilan[006-008,010-019,021-028,031,033-035,039-040,042-048,050-051,053-056,058-072,101-111,113-116,122-125,127-128,130-131,201-203,206-211,213-220,222-228]
milano       up 10-00:00:0      1   idle sdfmilan020
ampere       up 10-00:00:0      1  drng@ sdfampere010
ampere       up 10-00:00:0      1   drng sdfampere011
ampere       up 10-00:00:0      3  drain sdfampere[005,008,023]
ampere       up 10-00:00:0     18    mix sdfampere[001-004,006-007,009,012-022]
[cpo@sdfiana002 ~]$
The "*" following the roma queue name indicates that it is a default queue for submission.
The following is a simple submission script for a parallel psana batch job run with mpi. It can be submitted with the command "sbatch submit.sh". The commands specified in the script file will be run on the first available compute nodes that fit the resources requested. There are two ideas: "nodes" and "tasks per node". A "node" is a physical computer box (with a host-name, for example), but each box/node typically has multiple cpu-cores (see this page for specific numbers: Batch Nodes And Queues). Typically the tasks-per-node parameter is set to utilize all the cores on each node.
Code Block
> cat submit.sh
#!/bin/bash
#SBATCH --partition=milano
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=120
#SBATCH --output=%j.log
# "-u" flushes print statements which can otherwise be hidden if mpi hangs
# "-m mpi4py.run" allows mpi to exit if one rank has an exception
mpirun python -u -m mpi4py.run my_psana_script.py
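One way to submit and then follow the job from the shell is sketched below (the "--parsable" flag simply makes sbatch print the bare job id; the log name follows the --output=%j.log pattern in the script above):
Code Block
jobid=$(sbatch --parsable submit.sh)   # submit and capture the job id
squeue -j $jobid                       # is the job pending or running?
tail -f ${jobid}.log                   # follow the output written via --output=%j.log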
One can also run this same job from the command line using the "--wrap" option for sbatch:
Code Block
sbatch -p milano --nodes 2 --ntasks-per-node 3 --wrap="mpirun python -u -m mpi4py.run my_psana_script.py"
This script shows some additional features controllable via SLURM:
Code Block
> cat tst_script
#!/bin/bash
#
#SBATCH --job-name=<name> # Job name for allocation
#SBATCH --output=%j.log # File to which STDOUT will be written, %j inserts jobid
#SBATCH --error=%j.err # File to which STDERR will be written, %j inserts jobid
#SBATCH --partition=psanagpuq # Partition/Queue to submit job
#SBATCH --gres=gpu:1080ti:1 # Number of GPUs
#SBATCH --ntasks=8             # Total number of tasks
#SBATCH --ntasks-per-node=4 # Number of tasks per node
#SBATCH --mail-user='username'@slac.stanford.edu # Receive e-mail from slurm
#SBATCH --mail-type=ALL        # Type of e-mail from slurm; other options include BEGIN, END, FAIL
#
srun -l hostname
srun python ExampleMultipleChaperones.py
> sbatch tst_script
Submitted batch job 187
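Once the job has finished, the files named by the %j patterns can be inspected; a minimal sketch using the job id 187 printed above:
Code Block
cat 187.log   # STDOUT captured via --output=%j.log
cat 187.err   # STDERR captured via --error=%j.err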
Unlike sbatch, the srun command does not return immediately; it waits for the job to complete. The srun command can also be used to get control of a node and run interactively, which can be useful for data exploration and software development.
The following are a few examples:
Code Block
> srun -N2 -n4 hello.mpi
Process 0 on psanagpu110 out of 1
Process 0 on psanagpu110 out of 1
Process 0 on psanagpu113 out of 1
Process 0 on psanagpu113 out of 1
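To get an interactive shell on a compute node, something along these lines can be used (a hedged sketch; the partition, task count, and time limit are placeholder choices to adapt):
Code Block
# request one task and start an interactive shell on the allocated node
srun --partition=milano --ntasks=1 --time=2:00:00 --pty /bin/bash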
srun
In principle the slurm "srun" command can also be used to launch parallel jobs; however, the current S3DF "srun" version only supports the older "pmi2" protocol, which is incompatible with the mpi packages from conda that LCLS uses, which use the newer "pmix" protocol. srun should be avoided for parallel jobs at S3DF (see the output of "srun --mpi=list").
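To see which MPI plugin types the installed srun supports, run the command below (the output will vary with the Slurm version at the site):
Code Block
srun --mpi=list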
Monitoring/Status
To check on jobs that exist on the system, use the squeue command:
Code Block
[cpo@sdfiana002 ~]$ squeue -u ytl
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
30703603 ampere,ro out ytl PD 0:00 1 (launch failed requeued held)
30703602 ampere,ro out ytl PD 0:00 1 (launch failed requeued held)
30701730 ampere,ro out ytl PD 0:00 1 (launch failed requeued held)
30700739 ampere,ro out ytl PD 0:00 1 (launch failed requeued held)
30700738 ampere,ro out ytl PD 0:00 1 (launch failed requeued held)
30699545 ampere,ro out ytl PD 0:00 1 (launch failed requeued held)
30704838 milano out ytl CG 4:07 1 sdfmilan221
[cpo@sdfiana002 ~]$
The ST (job state) field shows that most jobs are pending (PD) and one is completing (CG); a running job would show R.
Get information about the status of finished jobs with the sacct command:
Code Block
[cpo@sdfiana002 ~]$ sacct -u ytl
JobID           JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
30682524            out     milano shared:de+         10  PREEMPTED      0:0
30682524.ba+      batch            shared:de+         10  CANCELLED     0:15
30682524.ex+     extern            shared:de+         10  COMPLETED      0:0
30682525            out     milano shared:de+         10  PREEMPTED      0:0
30682525.ba+      batch            shared:de+         10  CANCELLED     0:15
30682525.ex+     extern            shared:de+         10  COMPLETED      0:0
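sacct also accepts selection and formatting options; for example (a sketch, with the user name and date as placeholders to adapt):
Code Block
# jobs since a given date, with a custom set of columns
sacct -u <username> --starttime=2024-07-01 --format=JobID,JobName,Partition,State,Elapsed,ExitCode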
scontrol is used to view or modify Slurm configuration including: job, job step, node, partition, reservation, and overall system configuration. Most of the commands can only be executed by user root or an Administrator.
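A few scontrol invocations are available to ordinary users as well; for example (a sketch using the job id from the squeue output above; hold/release apply only to your own pending jobs):
Code Block
scontrol show job 30704838        # detailed view of one job
scontrol show partition milano    # partition limits and member nodes
scontrol hold 30704838            # prevent a pending job from starting
scontrol release 30704838         # allow it to be scheduled again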
...