Partitions

Partition is the Slurm term for a queue. Select the partition that matches your job's requirements:

...

The * following a partition name marks the default partition (queue), which is anagpu.
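
As a rough illustration of the * convention, the sketch below picks the default partition out of a list of partition names. It assumes the names arrive one per line in the form printed by sinfo -h -o "%P" (no header, partition name only). The function name default_partition is hypothetical and is not part of Slurm.

```shell
# Sketch: print the default partition name.
# default_partition is a hypothetical helper, not a Slurm command.
# Input (stdin): partition names, one per line, where the default
# partition carries a trailing "*", as in the output of
#   sinfo -h -o "%P"
default_partition() {
    # Keep only lines that end in "*", strip the marker,
    # and collapse any repeated names.
    sed -n 's/\*$//p' | sort -u
}
```

On a login node this would typically be called as sinfo -h -o "%P" | default_partition, which on this cluster should print anagpu.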

Job Submission

There are two ways to submit a job on the cluster. The main way is the sbatch command, which submits a batch script and takes full advantage of the available computing power. The other is to run an interactive job with srun.

Sbatch

The commands in the script file are run on the first available compute node that satisfies the requested resources.

...

Code Block
title: sbatch
[omarq@psslurm conda 06:10:08]cat tst_script 
#!/bin/bash
#
#SBATCH --job-name='name' # Job name for allocation
#SBATCH --output='filename' # File to which STDOUT will be written, %j inserts jobid
#SBATCH --error='filename' # File to which STDERR will be written, %j inserts jobid
#SBATCH --partition=anagpu # Partition/Queue to submit job
#SBATCH --gres=gpu:1080ti:1 # Number of GPUs
#SBATCH --ntasks=8  # Total number of tasks
#SBATCH --ntasks-per-node=4 # Number of tasks per node
#SBATCH --mail-user='username'@slac.stanford.edu # Receive e-mail from slurm
#SBATCH --mail-type=ALL # Type of e-mail from slurm; other options include BEGIN, END, FAIL
#
srun -l hostname
srun python ExampleMultipleChaperones.py


[omarq@psslurm conda 06:10:11]sbatch tst_script 
Submitted batch job 187
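
The job ID printed by sbatch is what the monitoring commands take as input, so it is often worth capturing it in a script. The sketch below extracts the number from a line such as "Submitted batch job 187". The function name job_id_from_sbatch is hypothetical. It assumes sbatch prints its usual one-line confirmation on standard output.

```shell
# Sketch: extract the numeric job ID from sbatch's confirmation line.
# job_id_from_sbatch is a hypothetical helper, not a Slurm command.
# Input (stdin): the text printed by sbatch, e.g.
#   Submitted batch job 187
job_id_from_sbatch() {
    # Print only the digits that follow "Submitted batch job ".
    sed -n 's/^Submitted batch job \([0-9][0-9]*\).*$/\1/p'
}
```

A typical call would be jobid=$(sbatch tst_script | job_id_from_sbatch). As an alternative, sbatch --parsable prints the job ID without the surrounding text, which avoids the parsing step.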

Srun

The srun command allocates compute resources and runs a job on them interactively. Interactive jobs can be useful for data exploration and software development.

...

Code Block
language: bash
title: srun
psslurm conda 07:29:39 srun -N2 -n4 hello.mpi 
Process 0 on psanagpu110 out of 1
Process 0 on psanagpu110 out of 1
Process 0 on psanagpu113 out of 1
Process 0 on psanagpu113 out of 1
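
For an interactive shell rather than a single command, srun is commonly combined with the --pty option. The sketch below only prints such a command line so that it can be reviewed before it is run. The function name interactive_cmd is hypothetical, and the defaults (the anagpu partition and one GPU) are assumptions based on this page, not a site recommendation.

```shell
# Sketch: build an srun command line for an interactive shell.
# interactive_cmd is a hypothetical helper; it prints the command
# instead of running it.
interactive_cmd() {
    local partition="${1:-anagpu}"   # assumed default: partition named on this page
    local ngpus="${2:-1}"            # assumed default: one GPU
    printf 'srun --partition=%s --gres=gpu:%s --pty /bin/bash\n' \
        "$partition" "$ngpus"
}
```

Running interactive_cmd with no arguments prints srun --partition=anagpu --gres=gpu:1 --pty /bin/bash, which can then be pasted at the prompt.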

Monitoring

To check which jobs exist on the system, use the squeue command:

...
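
When a script needs to block until a job has left the queue, squeue can be polled. The sketch below is one way to do that, assuming that squeue -h -j <jobid> prints nothing once the job is no longer pending or running. The function name wait_for_job and the 30-second default interval are hypothetical choices.

```shell
# Sketch: wait until a job no longer appears in squeue.
# wait_for_job is a hypothetical helper, not a Slurm command.
# Usage: wait_for_job <jobid> [poll interval in seconds]
wait_for_job() {
    local jobid="$1"
    local interval="${2:-30}"   # assumed default polling interval
    # -h suppresses the header line; -j restricts output to one job.
    # Errors for already-purged job IDs are discarded, so empty
    # output is treated as "job has left the queue".
    while [ -n "$(squeue -h -j "$jobid" 2>/dev/null)" ]; do
        sleep "$interval"
    done
}
```

Keep the polling interval generous: frequent squeue calls add load to the Slurm controller.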