...
How do I use Slurm?
Note |
---|
We are still testing the best way to deploy Slurm at SLAC, so some of the examples and instructions that follow may change. If you have any opinions or suggestions, we would love to hear from you. Slurm is currently installed on a limited number of hosts. We recommend logging on using ssh via a terminal: ssh ocio-gpu01.slac.stanford.edu To make the Slurm binaries available, you will need to use modules to add them to your path environment: module load slurm We will likely have the above command run automatically in future, so it may not be necessary later. |
Common commands are:
srun | request a quick job to be run - eg an interactive terminal |
sbatch | submit a batch job to run |
squeue | show jobs |
scancel | cancel a job |
scontrol show job | show job details |
sstat | show job usage details |
sacctmgr | manage Associations |
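As a quick illustration of the monitoring commands above, a typical sequence might look like the following (replace `<jobid>` with a real job ID from the squeue output):

```shell
# list your own pending and running jobs
squeue -u $USER

# show full details for one of them
scontrol show job <jobid>

# cancel it if it is no longer needed
scancel <jobid>
```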
How can I get an Interactive Terminal?
...
Code Block |
---|
module load slurm
srun -A myaccount -p mypartition1 -n 1 --pty /bin/bash |
This will execute /bin/bash on a (scheduled) server in the Partition mypartition1 and charge the usage against the Account myaccount. It requests a single CPU and launches a pseudo terminal (pty) in which bash will run. You may be provided different Accounts and Partitions and should use them where possible.
Note that when you 'exit' the interactive session, it will relinquish the resources for someone else to use. This also means that if your terminal is disconnected (you turn your laptop off, lose network connectivity, etc.), the Job will also terminate (similar to ssh).
...
Warning |
---|
If your interactive request doesn't immediately find resources, it will currently not actually return you a pty - even though the job does run. This results in what looks like a hanging process. We are investigating (perhaps running salloc first would help). |
How do I submit a Batch Job?
...
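A minimal batch script can be sketched as follows. This is an illustrative example, not a site-provided template; the Account and Partition names (myaccount, mypartition1) and the script name are placeholders you should replace with your own:

```shell
#!/bin/bash
# Minimal example batch script (hypothetical names: myaccount, mypartition1).
# The #SBATCH lines are directives read by sbatch, not executed commands.
#SBATCH --account=myaccount
#SBATCH --partition=mypartition1
#SBATCH --ntasks=1
#SBATCH --time=00:10:00
#SBATCH --output=slurm-%j.out

# the job's actual work goes here
echo "Running on $(hostname)"
```

Save this as, say, myjob.sh and submit it with `sbatch myjob.sh`; squeue will then show it in the queue, and its output will land in slurm-<jobid>.out.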
To cancel a submitted job, pass its job ID to scancel:
Code Block |
---|
scancel <jobid> |
...
How can I request GPUs?
You can use the --gpus option to specify GPUs for your jobs. Passing a plain number requests that many GPUs of any type that is available (what you get depends upon your Account/Association and what is available when you request it). You can also specify the type of GPU by prefixing the count with the model name, eg:
Code Block |
---|
# request a single gpu
srun -A myaccount -p mypartition1[,mypartition2] -n 1 --gpus 1 --pty /bin/bash

# request a gtx 1080 ti gpu
srun -A myaccount -p mypartition1[,mypartition2] -n 1 --gpus geforce_gtx_1080_ti:1 --pty /bin/bash

# request an rtx 2080 ti gpu
srun -A myaccount -p mypartition1[,mypartition2] -n 1 --gpus geforce_rtx_2080_ti:1 --pty /bin/bash

# request a v100 gpu
srun -A myaccount -p mypartition1[,mypartition2] -n 1 --gpus v100:1 --pty /bin/bash |
How can I see what GPUs are available?
Code Block |
---|
# sinfo -o "%12P %5D %14F %7z %7m %10d %11l %42G %38N %f"
PARTITION NODES NODES(A/I/O/T) S:C:T MEMORY TMP_DISK TIMELIMIT GRES NODELIST AVAIL_FEATURES
shared* 1 0/1/0/1 2:8:2 191567 0 7-00:00:00 gpu:v100:4 nu-gpu02 CPU_GEN:SKX,CPU_SKU:4110,CPU_FRQ:2.10GHz,GPU_GEN:VLT,GPU_SKU:V100,GPU_MEM:32GB,GPU_CC:7.0
shared* 8 0/1/7/8 2:12:2 257336 0 7-00:00:00 gpu:geforce_gtx_1080_ti:10 cryoem-gpu[02-09] CPU_GEN:HSW,CPU_SKU:E5-2670v3,CPU_FRQ:2.30GHz,GPU_GEN:PSC,GPU_SKU:GTX1080TI,GPU_MEM:11GB,GPU_CC:6.1
shared* 14 0/0/14/14 2:12:2 191552 0 7-00:00:00 gpu:geforce_rtx_2080_ti:10 cryoem-gpu[11-15],ml-gpu[02-10] CPU_GEN:SKX,CPU_SKU:5118,CPU_FRQ:2.30GHz,GPU_GEN:TUR,GPU_SKU:RTX2080TI,GPU_MEM:11GB,GPU_CC:7.5
shared* 1 0/1/0/1 2:12:2 257336 0 7-00:00:00 gpu:geforce_gtx_1080_ti:10(S:0) cryoem-gpu01 CPU_GEN:HSW,CPU_SKU:E5-2670v3,CPU_FRQ:2.30GHz,GPU_GEN:PSC,GPU_SKU:GTX1080TI,GPU_MEM:11GB,GPU_CC:6.1
shared* 3 0/3/0/3 2:12:2 191552 0 7-00:00:00 gpu:geforce_rtx_2080_ti:10(S:0) cryoem-gpu10,ml-gpu[01,11] CPU_GEN:SKX,CPU_SKU:5118,CPU_FRQ:2.30GHz,GPU_GEN:TUR,GPU_SKU:RTX2080TI,GPU_MEM:11GB,GPU_CC:7.5
shared* 3 0/3/0/3 2:8:2 191567 0 7-00:00:00 gpu:v100:4(S:0-1) cryoem-gpu50,nu-gpu[01,03] CPU_GEN:SKX,CPU_SKU:4110,CPU_FRQ:2.10GHz,GPU_GEN:VLT,GPU_SKU:V100,GPU_MEM:32GB,GPU_CC:7.0
shared* 1 0/1/0/1 2:12:2 257330 0 7-00:00:00 gpu:geforce_gtx_1080_ti:8(S:0),gpu:titan_x hep-gpu01 CPU_GEN:HSW,CPU_SKU:E5-2670v3,CPU_FRQ:2.30GHz,GPU_GEN:PSC,GPU_SKU:GTX1080TI,GPU_MEM:11GB,GPU_CC:6.1
ml 9 0/0/9/9 2:12:2 191552 0 infinite gpu:geforce_rtx_2080_ti:10 ml-gpu[02-10] CPU_GEN:SKX,CPU_SKU:5118,CPU_FRQ:2.30GHz,GPU_GEN:TUR,GPU_SKU:RTX2080TI,GPU_MEM:11GB,GPU_CC:7.5
ml 2 0/2/0/2 2:12:2 191552 0 infinite gpu:geforce_rtx_2080_ti:10(S:0) ml-gpu[01,11] CPU_GEN:SKX,CPU_SKU:5118,CPU_FRQ:2.30GHz,GPU_GEN:TUR,GPU_SKU:RTX2080TI,GPU_MEM:11GB,GPU_CC:7.5
neutrino 1 0/1/0/1 2:8:2 191567 0 infinite gpu:v100:4 nu-gpu02 CPU_GEN:SKX,CPU_SKU:4110,CPU_FRQ:2.10GHz,GPU_GEN:VLT,GPU_SKU:V100,GPU_MEM:32GB,GPU_CC:7.0
neutrino 2 0/2/0/2 2:8:2 191567 0 infinite gpu:v100:4(S:0-1) nu-gpu[01,03] CPU_GEN:SKX,CPU_SKU:4110,CPU_FRQ:2.10GHz,GPU_GEN:VLT,GPU_SKU:V100,GPU_MEM:32GB,GPU_CC:7.0
cryoem 8 0/1/7/8 2:12:2 257336 0 infinite gpu:geforce_gtx_1080_ti:10 cryoem-gpu[02-09] CPU_GEN:HSW,CPU_SKU:E5-2670v3,CPU_FRQ:2.30GHz,GPU_GEN:PSC,GPU_SKU:GTX1080TI,GPU_MEM:11GB,GPU_CC:6.1
cryoem 5 0/0/5/5 2:12:2 191552 0 infinite gpu:geforce_rtx_2080_ti:10 cryoem-gpu[11-15] CPU_GEN:SKX,CPU_SKU:5118,CPU_FRQ:2.30GHz,GPU_GEN:TUR,GPU_SKU:RTX2080TI,GPU_MEM:11GB,GPU_CC:7.5
cryoem 1 0/1/0/1 2:12:2 257336 0 infinite gpu:geforce_gtx_1080_ti:10(S:0) cryoem-gpu01 CPU_GEN:HSW,CPU_SKU:E5-2670v3,CPU_FRQ:2.30GHz,GPU_GEN:PSC,GPU_SKU:GTX1080TI,GPU_MEM:11GB,GPU_CC:6.1
cryoem 1 0/1/0/1 2:12:2 191552 0 infinite gpu:geforce_rtx_2080_ti:10(S:0) cryoem-gpu10 CPU_GEN:SKX,CPU_SKU:5118,CPU_FRQ:2.30GHz,GPU_GEN:TUR,GPU_SKU:RTX2080TI,GPU_MEM:11GB,GPU_CC:7.5
cryoem 1 0/1/0/1 2:8:2 191567 0 infinite gpu:v100:4(S:0-1) cryoem-gpu50 CPU_GEN:SKX,CPU_SKU:4110,CPU_FRQ:2.10GHz,GPU_GEN:VLT,GPU_SKU:V100,GPU_MEM:32GB,GPU_CC:7.0 |
How can I request a GPU with certain features and/or memory?
TBA... something about using Constraints. Maybe get the gres for gpu memory working.
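In the meantime, one possible approach is to use --constraint with the node features reported by sinfo above (the AVAIL_FEATURES column, eg GPU_MEM:32GB, GPU_GEN:TUR). This is a sketch only: the feature names are taken from that sinfo listing, and myaccount/mypartition1 are placeholders:

```shell
# request any gpu on a node advertising the GPU_MEM:32GB feature
# (assumes the feature names shown by sinfo; account/partition are placeholders)
srun -A myaccount -p mypartition1 -n 1 --gpus 1 --constraint="GPU_MEM:32GB" --pty /bin/bash
```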
What Accounts are there?
Accounts are used to allow us to track, monitor and report on usage of SDF resources. As such, users who are members of stakeholder groups for SDF hardware should use the relevant Account to charge their jobs against. We do not currently associate any monetary value with Accounts, but we do require all Jobs to be charged against an Account.
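To see which Accounts you can charge against, sacctmgr can list your Associations; for example (the format fields shown here are just a suggestion):

```shell
# list the accounts and partitions your user is associated with
sacctmgr show associations user=$USER format=Account,Partition,QOS
```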
...