
Slurm is a batch scheduler that enables users (you!) to submit long (or even short) compute 'jobs' to our compute clusters. It queues up jobs so that the (limited) compute resources available are shared fairly among all users. This page describes basic usage of Slurm at SLAC and provides some simple examples of how to request common resources.

 

Slurm is currently being tested and is scheduled for deployment on the SLAC Scientific Shared Data Facility. Please report any suggestions and issues to unix-admin@slac.stanford.edu. Note that whilst we strive to keep the information on these pages up-to-date, there may be inconsistencies and/or incorrect information contained within.

You will need to request access to Slurm whilst we work out more automated ways of adding users to the system. We will need to know which Slurm Account to 'bill' you against (don't worry, there is no $ charge for usage; it's purely for accounting and reporting). This Account will most likely be the immediate group/team that you work with. Please send your unix username and your group/team name to unix-admin@slac.stanford.edu.

We are also testing delegated administration, whereby your group/team's Slurm Administrator can add users to their Accounts themselves. If you wish to represent your group/team to do this, please contact us!

 

Why should I use Batch?

Whilst your desktop or laptop computer has a fast processor and quick local access to data stored on its hard disk/SSD, you may want to run very large compute tasks that require many CPUs or GPUs, a lot of memory, or a lot of data. The compute servers that are part of the Batch system allow you to do this. They typically also have very fast access to centralised storage, have (some) common software already preinstalled, and enable you to run long tasks without impacting your local desktop/laptop resources.

 

Why should I use Slurm?

Historically, we have always used IBM's LSF as our Batch scheduler software. However, with new hardware such as GPUs, we have found the user experience and the administrative accounting features of LSF to be lacking. Slurm is also commonly used across academic and laboratory environments, and we hope that this commonality will make usage easier for you and administration simpler for us.

 

What should I know about using Batch?

The first thing to note is that you should probably be comfortable in a Unix 'command line' environment. LINKS?

When you submit a compute task to the batch system, it is called a Job. Each Job must be charged to an Account. You may also select which pool of servers the Job runs on - this is known as a Partition.

You should also acquaint yourself with slurm Accounts and Partitions.

 

What is a Slurm Account?

As the number of servers and GPUs in our environment is limited (but not small), we need to keep account of who uses what. In addition, as groups/teams can purchase their own servers to be added to the SDF, we must provide a method by which designated users can have priority access to the servers that were purchased for them. A Slurm Account is basically what you charge your Jobs against.
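
Once you have been added, one way to check which Accounts (and Partitions) you are associated with is the sacctmgr command; this sketch assumes sacctmgr is available on the node you are logged in to:

module load slurm
sacctmgr show associations user=$USER format=account,partition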

 

What is a Slurm Partition?

A Partition is a logical grouping of compute servers. These may be servers of a similar technical specification (eg Cascade Lake CPUs, Tesla GPUs etc), or grouped by ownership - eg the SUNCAT group may have purchased a number of servers, which we then put into their own Partition.

You can view the active Partitions on SDF with

$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
gpu*         up 5-00:00:00      3  idle* cryoem-gpu03,hep-gpu01,nu-gpu02
gpu*         up 5-00:00:00     21  down* cryoem-gpu[02,04-09,11-15],ml-gpu[02-10]
gpu*         up 5-00:00:00      7   idle cryoem-gpu[01,10,50],ml-gpu[01,11],nu-gpu[01,03]
neutrino     up   infinite      1  idle* nu-gpu02
neutrino     up   infinite      2   idle nu-gpu[01,03]
cryoem       up   infinite      1  idle* cryoem-gpu03
cryoem       up   infinite     12  down* cryoem-gpu[02,04-09,11-15]
cryoem       up   infinite      3   idle cryoem-gpu[01,10,50]

 

 

What is a Slurm Allocation?

In order to provide appropriate access for users to the hardware, an Allocation is created that defines what User can run on what Partition and charge against what Account (there's a bit more in the backend to this).

 

 

How do I use Slurm?

As we consolidate the design for Slurm, you will need to access a node that has Slurm installed; currently we recommend logging in to ocio-gpu01.slac.stanford.edu.

ssh ocio-gpu01.slac.stanford.edu

Then you will need to use modules to add the slurm binaries into your path environment:

module load slurm

 

How can I get an Interactive Terminal?

 

Use the srun command:

module load slurm
srun -A myaccount -p mypartition1 -n 1 --pty /bin/bash

This will execute /bin/bash on a (scheduled) server in the Partition mypartition1 and charge the usage against the Account myaccount. It requests a single task (one CPU) and launches a pseudo terminal (pty) in which bash runs.

Note that when you 'exit' the interactive session, it will relinquish the resources for someone else to use. This also means that if your terminal is disconnected (you turn your laptop off, lose network etc), then the Job will also terminate (similar to ssh).

 

How do I submit a Batch Job?

Use the sbatch command.
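
A minimal batch script might look like the sketch below; the Account, Partition, resource values and file names here are placeholders to adapt to your own Job:

#!/bin/bash
#SBATCH --account=myaccount        # the Account to charge the Job against
#SBATCH --partition=mypartition1   # the Partition to run the Job on
#SBATCH --job-name=my-test-job     # a name to identify the Job in the queue
#SBATCH --ntasks=1                 # number of tasks (CPUs) to allocate
#SBATCH --mem=4g                   # memory required for the Job
#SBATCH --time=01:00:00            # maximum wall time (hh:mm:ss)
#SBATCH --output=%x-%j.out         # write output to <job-name>-<jobid>.out

# the commands to run on the allocated server
hostname
echo "my compute task goes here"

Save this as (say) my-job.sh, submit it with sbatch my-job.sh, and check its state with squeue -u $USER.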

 

How can I request GPUs?

# request single gpu
srun -A myaccount -p mypartition1[,mypartition2] -n 1 --gpus 1 --pty /bin/bash
 
# request a gtx 1080 gpu
srun -A myaccount -p mypartition1[,mypartition2] -n 1 --gpus geforce_gtx_1080_ti:1 --pty /bin/bash
 
# request a gtx 2080 gpu
srun -A myaccount -p mypartition1[,mypartition2] -n 1 --gpus geforce_rtx_2080_ti:1 --pty /bin/bash

# request a v100 gpu
srun -A myaccount -p mypartition1[,mypartition2] -n 1 --gpus v100:1 --pty /bin/bash
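
The same GPU requests can also be made from a batch script submitted with sbatch; a minimal sketch (again with placeholder Account and Partition names) is:

#!/bin/bash
#SBATCH --account=myaccount
#SBATCH --partition=mypartition1
#SBATCH --ntasks=1
#SBATCH --gpus=v100:1              # or eg geforce_rtx_2080_ti:1, as listed by sinfo below

# show the GPU(s) that were allocated to the Job
nvidia-smi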

 

How can I see what GPUs are available?

$ sinfo -o "%12P %5D %14F %7z %7m %10d %11l %42G %38N %f"
PARTITION    NODES NODES(A/I/O/T) S:C:T   MEMORY  TMP_DISK   TIMELIMIT   GRES                                       NODELIST                               AVAIL_FEATURES
gpu*         1     0/1/0/1        2:12:2  257330  0          5-00:00:00  gpu:geforce_gtx_1080_ti:8                  hep-gpu01                              CPU_GEN:HSW,CPU_SKU:E5-2670v3,CPU_FRQ:2.30GHz,GPU_GEN:PSC,GPU_SKU:GTX1080TI,GPU_MEM:11GB,GPU_CC:6.1
gpu*         1     0/1/0/1        2:8:2   191567  0          5-00:00:00  gpu:v100:4                                 nu-gpu02                               CPU_GEN:SKX,CPU_SKU:4110,CPU_FRQ:2.10GHz,GPU_GEN:VLT,GPU_SKU:V100,GPU_MEM:32GB,GPU_CC:7.0
gpu*         8     0/1/7/8        2:12:2  257336  0          5-00:00:00  gpu:geforce_gtx_1080_ti:10                 cryoem-gpu[02-09]                      CPU_GEN:HSW,CPU_SKU:E5-2670v3,CPU_FRQ:2.30GHz,GPU_GEN:PSC,GPU_SKU:GTX1080TI,GPU_MEM:11GB,GPU_CC:6.1
gpu*         14    0/0/14/14      2:12:2  191552  0          5-00:00:00  gpu:geforce_rtx_2080_ti:10                 cryoem-gpu[11-15],ml-gpu[02-10]        CPU_GEN:SKX,CPU_SKU:5118,CPU_FRQ:2.30GHz,GPU_GEN:TUR,GPU_SKU:GTX2080TI,GPU_MEM:11GB,GPU_CC:7.5
gpu*         1     0/1/0/1        2:12:2  257336  0          5-00:00:00  gpu:geforce_gtx_1080_ti:10(S:0)            cryoem-gpu01                           CPU_GEN:HSW,CPU_SKU:E5-2670v3,CPU_FRQ:2.30GHz,GPU_GEN:PSC,GPU_SKU:GTX1080TI,GPU_MEM:11GB,GPU_CC:6.1
gpu*         3     0/3/0/3        2:12:2  191552  0          5-00:00:00  gpu:geforce_rtx_2080_ti:10(S:0)            cryoem-gpu10,ml-gpu[01,11]             CPU_GEN:SKX,CPU_SKU:5118,CPU_FRQ:2.30GHz,GPU_GEN:TUR,GPU_SKU:GTX2080TI,GPU_MEM:11GB,GPU_CC:7.5
gpu*         3     0/3/0/3        2:8:2   191567  0          5-00:00:00  gpu:v100:4(S:0-1)                          cryoem-gpu50,nu-gpu[01,03]             CPU_GEN:SKX,CPU_SKU:4110,CPU_FRQ:2.10GHz,GPU_GEN:VLT,GPU_SKU:V100,GPU_MEM:32GB,GPU_CC:7.0
neutrino     1     0/1/0/1        2:8:2   191567  0          infinite    gpu:v100:4                                 nu-gpu02                               CPU_GEN:SKX,CPU_SKU:4110,CPU_FRQ:2.10GHz,GPU_GEN:VLT,GPU_SKU:V100,GPU_MEM:32GB,GPU_CC:7.0
neutrino     2     0/2/0/2        2:8:2   191567  0          infinite    gpu:v100:4(S:0-1)                          nu-gpu[01,03]                          CPU_GEN:SKX,CPU_SKU:4110,CPU_FRQ:2.10GHz,GPU_GEN:VLT,GPU_SKU:V100,GPU_MEM:32GB,GPU_CC:7.0
cryoem       8     0/1/7/8        2:12:2  257336  0          infinite    gpu:geforce_gtx_1080_ti:10                 cryoem-gpu[02-09]                      CPU_GEN:HSW,CPU_SKU:E5-2670v3,CPU_FRQ:2.30GHz,GPU_GEN:PSC,GPU_SKU:GTX1080TI,GPU_MEM:11GB,GPU_CC:6.1
cryoem       5     0/0/5/5        2:12:2  191552  0          infinite    gpu:geforce_rtx_2080_ti:10                 cryoem-gpu[11-15]                      CPU_GEN:SKX,CPU_SKU:5118,CPU_FRQ:2.30GHz,GPU_GEN:TUR,GPU_SKU:GTX2080TI,GPU_MEM:11GB,GPU_CC:7.5
cryoem       1     0/1/0/1        2:12:2  257336  0          infinite    gpu:geforce_gtx_1080_ti:10(S:0)            cryoem-gpu01                           CPU_GEN:HSW,CPU_SKU:E5-2670v3,CPU_FRQ:2.30GHz,GPU_GEN:PSC,GPU_SKU:GTX1080TI,GPU_MEM:11GB,GPU_CC:6.1
cryoem       1     0/1/0/1        2:12:2  191552  0          infinite    gpu:geforce_rtx_2080_ti:10(S:0)            cryoem-gpu10                           CPU_GEN:SKX,CPU_SKU:5118,CPU_FRQ:2.30GHz,GPU_GEN:TUR,GPU_SKU:GTX2080TI,GPU_MEM:11GB,GPU_CC:7.5
cryoem       1     0/1/0/1        2:8:2   191567  0          infinite    gpu:v100:4(S:0-1)                          cryoem-gpu50                           CPU_GEN:SKX,CPU_SKU:4110,CPU_FRQ:2.10GHz,GPU_GEN:VLT,GPU_SKU:V100,GPU_MEM:32GB,GPU_CC:7.0

 

 

What Accounts are there?

The full list of Accounts is still being finalised; those defined so far are:

Account Name    Description                    Contact
cryoem          CryoEM Group                   Yee
neutrino        Neutrino Group                 Kazu
cryoem-daq
ml              Machine Learning Initiative    Daniel

 

 

What Partitions are there?

The full list of Partitions is still being finalised; those defined so far are:

 

Partition Name    Purpose                  Contact
gpu               General GPU resources    Yee / Daniel
cryoem            CryoEM GPU servers       Yee
neutrino          Neutrino GPU servers     Kazu

 

 

Help! My Job takes a long time before it starts!

This is often due to limited resources. The simplest remedy is to request fewer CPUs (eg a smaller -n/--ntasks or -c/--cpus-per-task) or less memory (--mem) for your Job. However, this will also likely increase the time the Job needs to complete. Note that perfect scaling is often very difficult to achieve (ie 16 CPUs will rarely run twice as fast as 8 CPUs), so it may be beneficial to submit many smaller Jobs where possible - see the sketch below.
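
As an illustrative sketch of splitting work into many smaller Jobs, a job array submits the same script several times with a different index; the script name and the way it uses the index here are hypothetical:

# submit 10 copies of my-task.sh, indexed 1 to 10, as separate Jobs
sbatch --array=1-10 -A myaccount -p mypartition1 my-task.sh

# inside my-task.sh, use the index to select which chunk of work to process
echo "processing chunk ${SLURM_ARRAY_TASK_ID}"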

The more expensive option is to purchase additional hardware for SDF and have it added to your group/team's Partition.

 

How can I restrict/constrain which servers my Job runs on?

You can use Slurm Constraints. We tag every server with Features that identify its specific hardware - for example the kind of CPU or the kind of GPU it contains.

You can view a server's Features using

$ module load slurm
$ scontrol show node ml-gpu01
NodeName=ml-gpu01 Arch=x86_64 CoresPerSocket=12
   CPUAlloc=0 CPUTot=48 CPULoad=1.41
   AvailableFeatures=CPU_GEN:SKX,CPU_SKU:5118,CPU_FRQ:2.30GHz,GPU_GEN:TUR,GPU_SKU:GTX2080TI,GPU_MEM:11GB,GPU_CC:7.5
   ActiveFeatures=CPU_GEN:SKX,CPU_SKU:5118,CPU_FRQ:2.30GHz,GPU_GEN:TUR,GPU_SKU:GTX2080TI,GPU_MEM:11GB,GPU_CC:7.5
   Gres=gpu:geforce_rtx_2080_ti:10(S:0)
   NodeAddr=ml-gpu01 NodeHostName=ml-gpu01 Version=19.05.2
   OS=Linux 3.10.0-1062.4.1.el7.x86_64 #1 SMP Fri Oct 18 17:15:30 UTC 2019
   RealMemory=191552 AllocMem=0 FreeMem=182473 Sockets=2 Boards=1
   State=IDLE ThreadsPerCore=2 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
   Partitions=gpu
   BootTime=2019-11-12T11:18:04 SlurmdStartTime=2019-12-06T16:42:16
   CfgTRES=cpu=48,mem=191552M,billing=48,gres/gpu=10
   AllocTRES=
   CapWatts=n/a
   CurrentWatts=0 AveWatts=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
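
For example, a minimal sketch of requesting a server with a specific Feature (the Account here is a placeholder; the Feature string is taken from the scontrol output above):

srun -A myaccount -p gpu -n 1 --gpus 1 --constraint="GPU_SKU:GTX2080TI" --pty /bin/bash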

We are openly investigating additional Features to add - possibly GPU_DRV, OS_VER and OS_TYPE. Comments and suggestions are welcome.

 

 

 

 
