...

Note

Slurm is currently being tested and is scheduled for deployment on the SLAC Scientific Shared Data Facility (SDF). We welcome suggestions and issue reports at unix-admin@slac.stanford.edu. Note that whilst we strive to keep the information on these pages up to date, they may contain inconsistencies and/or incorrect information.

Note

By default, all users, when they first use Slurm, will have access to the shared Account on the shared Partition with the scavenger QoS.

If you belong to a group that has contributed hardware to the SDF, you will be eligible to use different Accounts and Partitions:

  • You will need to request access to Slurm whilst we work out more automated ways of adding users to the system. We will need to know which Slurm Account to 'bill' you against (don't worry, there will be no $ charge for usage; it's purely for accounting and reporting). This Account will most likely be the immediate group/team that you work with. Please send your unix username and your group/team name to unix-admin@slac.stanford.edu.
  • We are also testing the ability for your group/team Slurm Administrator to add users to their Accounts (delegated administration). If you wish to represent your group/team in doing this, please contact us!
Note

We do NOT, and WILL NOT, support AFS tokens with Slurm. Your jobs will fail if they try to write anywhere under /afs (including your current home ~ directories). We shall be deploying new storage in the near future, with dedicated home and data directories. In the meantime, we recommend using GPFS space if your group currently has any.

 

Why should I use Batch?

Whilst your desktop or laptop computer has a fast processor and quick local access to data stored on its hard disk/SSD, you may want to run very large compute tasks that require many CPUs, GPUs, a lot of memory, or a lot of data. The compute servers that are part of the Batch system allow you to do this. Our servers also typically have very fast access to centralised storage, have (some) common software preinstalled, and let you run long tasks without tying up your local desktop/laptop resources.

...

 

 

How do I use Slurm?

Note

We are still testing the best way to deploy Slurm at SLAC and, as such, some of the examples and instructions that follow may be subject to change. If you have any opinions or suggestions, we would love to hear from you.

Slurm is currently installed on a limited number of hosts, so you will need to log in to a node that has it available; we currently recommend ocio-gpu01.slac.stanford.edu. Log in using ssh from a terminal:

ssh ocio-gpu01.slac.stanford.edu

Then, to make the Slurm binaries available, use modules to add them to your path environment:

 
module load slurm
 


Common commands are:

  
srun: request a quick job to be run, eg an interactive terminal
sbatch: submit a batch job to run
squeue: show jobs
scancel: cancel a job
scontrol show job: show job details
sstat: show job usage details
sacctmgr: manage Associations
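For example, once a job is running, the monitoring commands above can be combined like this (the job ID 12345 is hypothetical; substitute your own from the squeue output):

```shell
module load slurm
squeue -u $USER                              # list only your own jobs
scontrol show job 12345                      # full details for one job
sstat -j 12345 --format=JobID,MaxRSS,AveCPU  # live usage of a running job
```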

 

How can I get an Interactive Terminal?

 

Warning

We are experiencing problems with interactive jobs on the shared partition: srun will claim that it is waiting for resources, but the allocation in fact fails immediately. We have yet to see the same issue with other partitions. Your batch jobs should continue to function correctly, however.


Use the srun command:

Code Block
module load slurm
srun -A shared -p shared -n 1 --pty /bin/bash

This will execute /bin/bash on a (scheduled) server in the shared Partition, charged against the shared Account. It requests a single CPU and launches a pseudo-terminal (pty) in which bash runs. You may be provided with different Accounts and Partitions and should use them when possible.

Note that when you 'exit' the interactive session, it will relinquish the resources for someone else to use. This also means that if your terminal is disconnected (you turn your laptop off, lose network, etc.), the Job will also terminate (similar to ssh).
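If you need more than a single CPU for an interactive session, the usual resource options can be passed to srun; a sketch (the CPU, memory, and time values here are only examples, not recommendations):

```shell
module load slurm
# 4 CPUs, 8GB of memory, and up to 2 hours, in an interactive bash session
srun -A shared -p shared -n 1 --cpus-per-task=4 --mem=8g --time=2:00:00 --pty /bin/bash
```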

 

How do I submit a Batch Job?

Warning

We do NOT support AFS as part of the Slurm deployment. We shall be migrating home directories and group directories over to our new storage appliances as part of the SDF deployment. If you wish to access your AFS files, please copy them over to the new storage.

 

Use the sbatch command. Create a job submission script (a plain text file) called script.sh:

Code Block
#!/bin/bash

#SBATCH --account=shared
#SBATCH --partition=shared
#SBATCH --qos=scavenger
#
#SBATCH --job-name=test
#SBATCH --output=output-%j.txt
#SBATCH --error=output-%j.txt
#
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=12
#SBATCH --mem-per-cpu=1g
#
#SBATCH --time=10:00
#
#SBATCH --gpus 1

<commands here>

In the above example, we submit a job named 'test' and output both stdout and stderr to the same file (%j will be replaced with the Job ID). We request a single Task (think of it as an MPI rank) and that single task will request 12 CPUs; each of which will be allocated 1GB of RAM - so a total of 12GB. By default, the --ntasks will be equivalent to the number of nodes (servers) asked for. In order to aid scheduling (and potentially prioritising the Job), we limit the length of the Job to 10 minutes.

We also request a single GPU with the Job. This will be exposed to the job via CUDA_VISIBLE_DEVICES. To request specific GPU models, see below.
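For example, a minimal batch script can confirm which GPU it was allocated (the echo and nvidia-smi lines are purely illustrative):

```shell
#!/bin/bash
#SBATCH --account=shared
#SBATCH --partition=shared
#SBATCH --gpus 1

# Slurm exposes the allocated GPU index(es) to the job via this variable
echo "CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES}"
nvidia-smi -L   # list the GPU(s) visible to the job
```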

You will need an Account (see below). All SLAC users have access to the "shared" Partition with a quality of service of "scavenger". This is so that stakeholders of machines in the SDF get priority access to their resources, whilst any user can use all resources as long as the 'owners' of the hardware aren't using them. As such, owners (or stakeholders) will have QoS "normal" access to their Partitions (whose hosts are also within the shared Partition).

Then, in order to submit the job: 

Code Block
module load slurm
sbatch script.sh

You can then monitor your job's progress with:

Code Block
squeue
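squeue on its own lists every job on the system; to narrow it down (assuming $USER holds your username):

```shell
squeue -u $USER             # only your jobs
squeue -u $USER -t RUNNING  # only your jobs that are currently running
```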

And you can cancel the job with

Code Block
scancel <jobid>

 

How can I request GPUs?

You can use the --gpus option to request GPUs for your jobs. Using a plain number will request that many GPUs of any type (what you get depends upon your Account/Association and what is available when you ask). You can also specify the type of GPU by prefixing the count with the model name, eg:

Code Block
# request single gpu
srun -A shared -p shared -n 1 --gpus 1 --pty /bin/bash
 
# request a gtx 1080 gpu
srun -A shared -p shared -n 1 --gpus geforce_gtx_1080_ti:1 --pty /bin/bash
 
# request a gtx 2080 gpu
srun -A shared -p shared -n 1 --gpus geforce_rtx_2080_ti:1 --pty /bin/bash

# request a v100 gpu
srun -A shared -p shared -n 1 --gpus v100:1 --pty /bin/bash
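The same --gpus syntax works in batch scripts; a sketch requesting two RTX 2080 Ti cards (the count of two is just an example):

```shell
#!/bin/bash
#SBATCH --account=shared
#SBATCH --partition=shared
#SBATCH --gpus geforce_rtx_2080_ti:2

<commands here>
```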

 

How can I see what GPUs are available?

Code Block
# sinfo -o "%12P %5D %14F %7z %7m %10d %11l %42G %38N %f"
PARTITION    NODES NODES(A/I/O/T) S:C:T   MEMORY  TMP_DISK   TIMELIMIT   GRES                                       NODELIST                               AVAIL_FEATURES
shared*      1     0/1/0/1        2:8:2   191567  0          7-00:00:00  gpu:v100:4                                 nu-gpu02                               CPU_GEN:SKX,CPU_SKU:4110,CPU_FRQ:2.10GHz,GPU_GEN:VLT,GPU_SKU:V100,GPU_MEM:32GB,GPU_CC:7.0
shared*      8     0/1/7/8        2:12:2  257336  0          7-00:00:00  gpu:geforce_gtx_1080_ti:10                 cryoem-gpu[02-09]                      CPU_GEN:HSW,CPU_SKU:E5-2670v3,CPU_FRQ:2.30GHz,GPU_GEN:PSC,GPU_SKU:GTX1080TI,GPU_MEM:11GB,GPU_CC:6.1
shared*      14    0/0/14/14      2:12:2  191552  0          7-00:00:00  gpu:geforce_rtx_2080_ti:10                 cryoem-gpu[11-15],ml-gpu[02-10]        CPU_GEN:SKX,CPU_SKU:5118,CPU_FRQ:2.30GHz,GPU_GEN:TUR,GPU_SKU:RTX2080TI,GPU_MEM:11GB,GPU_CC:7.5
ml           9     0/0/9/9        2:12:2  191552  0          infinite    gpu:geforce_rtx_2080_ti:10                 ml-gpu[02-10]                          CPU_GEN:SKX,CPU_SKU:5118,CPU_FRQ:2.30GHz,GPU_GEN:TUR,GPU_SKU:RTX2080TI,GPU_MEM:11GB,GPU_CC:7.5
neutrino     1     0/1/0/1        2:8:2   191567  0          infinite    gpu:v100:4                                 nu-gpu02                               CPU_GEN:SKX,CPU_SKU:4110,CPU_FRQ:2.10GHz,GPU_GEN:VLT,GPU_SKU:V100,GPU_MEM:32GB,GPU_CC:7.0
cryoem       8     0/1/7/8        2:12:2  257336  0          infinite    gpu:geforce_gtx_1080_ti:10                 cryoem-gpu[02-09]                      CPU_GEN:HSW,CPU_SKU:E5-2670v3,CPU_FRQ:2.30GHz,GPU_GEN:PSC,GPU_SKU:GTX1080TI,GPU_MEM:11GB,GPU_CC:6.1
cryoem       5     0/0/5/5        2:12:2  191552  0          infinite    gpu:geforce_rtx_2080_ti:10                 cryoem-gpu[11-15]                      CPU_GEN:SKX,CPU_SKU:5118,CPU_FRQ:2.30GHz,GPU_GEN:TUR,GPU_SKU:RTX2080TI,GPU_MEM:11GB,GPU_CC:7.5

 

 

 

 


 

How can I request a GPU with certain features and/or memory?

TBA... something about using Constraints. Maybe get the gres for gpu memory working.

What Accounts are there?

Accounts allow us to track, monitor, and report on usage of SDF resources. As such, users who are members of stakeholder groups of SDF hardware should use the relevant Account to charge their Jobs against. We do not currently associate any monetary value with Accounts, but we do require all Jobs to be charged against an Account.
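To check which Accounts your user is associated with, the sacctmgr command (listed in the common commands above) can be queried; a sketch:

```shell
module load slurm
sacctmgr show associations user=$USER format=Account,Partition,QOS
```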

 

Account Name   Description                    Contact
shared         Everyone                       Yee
cryoem         CryoEM Group                   Yee
neutrino       Neutrino Group                 Kazu
cryoem-daq     CryoEM data acquisition        Yee
ml             Machine Learning Initiative    Daniel
suncat         SUNCAT Group                   Johannes
hps            HPS Group                      Omar
atlas          ATLAS Group                    Yee/Wei
LCLS           LCLS Group                     Wilko

 

What Partitions are there?

Partitions define a grouping of machines. In our case, the groupings refer to science and engineering groups who have purchased servers for the SDF. We do this so that members (or associates) of those groups can have priority access to their hardware. Whilst we give everyone access to all hardware, users who belong to groups without a stake in the SDF will, by default, have lower priority access to stakeholders' resources.

Partition Name   Purpose                                                               Contact
shared           General resources; contains all shareable resources, including GPUs   Yee
ml               Machine Learning Initiative GPU servers                               Daniel / Yee
cryoem           CryoEM GPU servers                                                    Yee
neutrino         Neutrino GPU servers                                                  Kazu
suncat           SUNCAT AMD Rome servers                                               Johannes
hps              HPS AMD Rome servers                                                  Omar
fermi            Fermi (LAT) AMD Rome servers                                          Richard
atlas            ATLAS GPU servers                                                     Yee / Wei
lcls             LCLS AMD Rome servers                                                 Wilko

 

Help! My Job takes a long time before it starts!

This is often due to limited resources. The simplest fix is to request fewer CPUs (--cpus-per-task) or less memory (--mem-per-cpu) for your Job. However, this will also likely increase the time the Job needs to complete. Note that perfect scaling is rarely achieved (using 16 CPUs seldom runs twice as fast as using 8), so it may be beneficial to submit many smaller Jobs where possible. You can also set the --time option to declare the maximum time your job will run, so that the scheduler can fit it in sooner.
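You can also ask the scheduler for its current estimate of when your pending jobs will start:

```shell
squeue --start -u $USER   # expected start times for your pending jobs
```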

The more expensive option is to contribute more hardware to the SDF and have it added to your group/team's Partition.

You can also make use of the scavenger QoS so that your job may run on any available resources at SLAC. This, however, has the disadvantage that should the owners of the hardware your job runs on require their resources, your job will be terminated (preempted), possibly before it has completed.

 

What is QoS?

A Quality of Service (QoS) for a job defines restrictions on how the job is run. Depending on its Allocation, a job may preempt, or be preempted by, other jobs with a 'higher' QoS. We define two levels of QoS:

scavenger: everyone has access to all resources, but jobs run at the lowest priority and will be terminated if another job with a higher priority needs the resources.

normal: the standard QoS for owners of hardware; jobs will (attempt to) run until completion and will not be preempted. normal jobs will therefore preempt scavenger jobs.

The scavenger QoS is useful if you have jobs that can be resumed (checkpointed) and there are spare resources (ie owners are not using all of their hardware).
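Since scavenger jobs can be preempted at any time, it can help to make them requeueable, so that Slurm automatically restarts them when resources free up; a sketch (this assumes your application can resume from its own checkpoint files):

```shell
#!/bin/bash
#SBATCH --account=shared
#SBATCH --partition=shared
#SBATCH --qos=scavenger
#SBATCH --requeue            # allow Slurm to requeue this job if it is preempted
#SBATCH --open-mode=append   # append to, rather than truncate, the output file on restart

<commands here>
```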

You may submit to multiple Partitions with the same QoS level:

Code Block
#!/bin/bash
#SBATCH --account=cryoem
#SBATCH --partition=cryoem,shared
#SBATCH --qos=scavenger

In the above example, a cryoem user charges against their Account cryoem; she is willing to run the job wherever resources are available (the use of the cryoem Partition is somewhat moot, as the cryoem nodes are a subset of the shared Partition anyway).

Is it possible to define multiple combinations, ie cryoem with normal plus shared with scavenger?

How can I restrict/constrain which servers my Job runs on?

...

We are openly investigating additional Features to add. Comments and suggestions welcome.

Documentation PENDING.

Possibly add: GPU_DRV, OS_VER, OS_TYPE

...