...
Partition | Nodes | Data | Cores per Node | Mem per Node (GB) | GPU Type | Time Limit (hr) | Priority | Comments | Location
---|---|---|---|---|---|---|---|---|---
anagpu | 16 | ALL | 12 (psanagpu101-106), 16 (psanagpu107-116) | 128 | NVIDIA GTX 1080Ti | 48 | | | B054 (SRCF)
*anabatch | | | | | | | Default | |
anagpu: This 16-node partition is for individuals wishing to use GPU resources.
...
The * following the partition name indicates the default partition (queue).
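To see the partitions and which one carries the default marker, run sinfo. The following is a minimal sketch using standard SLURM flags; the actual output will reflect the live system:

```
# List all partitions; the default partition name ends with *
sinfo

# Node-oriented view of the anagpu partition only
sinfo -N -p anagpu
```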
There are two ways to submit a job on the cluster. The main way is to use the sbatch command, which takes full advantage of the computing power; the other is to submit an interactive job (see the example at the end of this section).
The commands specified in the script file will be run on the first available compute node that fits the requested resources.
The following is a sample submission script (tst_script):
```
[omarq@psslurm conda 06:10:08]cat tst_script
#!/bin/bash
#
#SBATCH --job-name="name"          # Job name for allocation
#SBATCH --output="filename"        # File to which STDOUT will be written, %j inserts jobid
#SBATCH --error="filename"         # File to which STDERR will be written, %j inserts jobid
#SBATCH --partition=anagpu # Partition/Queue to submit job
#SBATCH --gres=gpu:1080ti:1 # Number of GPUs
#SBATCH --nodes=1 # number of nodes.
#SBATCH --mail-user='username'@slac.stanford.edu # Receive e-mail from slurm
#SBATCH --mail-type=ALL            # Type of e-mail from slurm; other options include BEGIN, END, FAIL.
#
srun -l hostname
srun python ExampleMultipleChaperones.py
[omarq@psslurm conda 06:10:11]sbatch tst_script
Submitted batch job 200
```
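The resource directives can be tuned per job. The following variations are a sketch based on the limits in the partition table above; the GPU type string (1080ti) is copied from the sample script, and the exact values accepted on this cluster should be confirmed locally:

```
#SBATCH --gres=gpu:1080ti:2    # Request two GPUs instead of one
#SBATCH --ntasks=4             # Launch four tasks (processes)
#SBATCH --cpus-per-task=2      # CPU cores allocated per task
#SBATCH --time=48:00:00        # Wall-clock limit; 48 hr matches the partition maximum
```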
Prior to submitting a batch job, it is recommended to check the jobs that already exist on the system by using the squeue command:
```
psslurm ~ # squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
187 anagpu ExampleM omarq R 0:04 1 psanagpu110
```
The ST (job state) field shows that jobid 187 is currently running (R).
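For the interactive route mentioned earlier, a common pattern is to request an allocation and open a shell directly on a compute node with srun. This is a minimal sketch assuming the same partition and GPU request as the batch example:

```
# Start an interactive shell on a GPU node (exit the shell to release the allocation)
srun --partition=anagpu --gres=gpu:1080ti:1 --pty bash

# A queued or running job can be cancelled by jobid, e.g. job 187 from the squeue output
scancel 187
```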