Partitions

Partition is the term Slurm uses for queues.  Depending on your job requirements, select the appropriate partition:

Partition  | Nodes | Data | Cores per Node          | Mem per Node (GB) | GPU Type          | Time Limit (hr) | Priority | Comments | Location
anagpu     | 16    | ALL  | 12 (psanagpu101 - 106)  | 128               | NVIDIA GTX 1080Ti | 48              |          |          | B054 (SRCF)
           |       |      | 16 (psanagpu107 - 116)  |                   |                   |                 |          |          |
*anabatch  |       |      |                         |                   |                   |                 | Default  |          |

anagpu: This 16-node partition is for individuals wishing to use GPU resources.
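
To see a partition's configured limits (time limit, default flag, node list), you can also query the Slurm controller directly. A minimal sketch, assuming the anagpu partition from the table above:

scontrol
# Show the full configuration of the anagpu partition
scontrol show partition anagpu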

This information can be displayed with the sinfo command:

sinfo
psslurm ~ # sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST 
anagpu* up infinite 1 drain psanagpu107 
anagpu* up infinite 6 idle psanagpu[101-104,110,113]

The * following the partition name indicates that anagpu is the default partition (queue).
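
For a per-node view of a partition (node state, CPU count, and memory), sinfo can be restricted to a single partition and run in node-oriented long format. A minimal sketch, assuming the anagpu partition above:

sinfo
# List every node in the anagpu partition with its state, CPUs, and memory
sinfo -p anagpu -N -l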

Job Submission

There are two ways to submit a job on the cluster. The main way is with the sbatch command, which takes full advantage of the cluster's computing power; the other is to submit an interactive job.
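
An interactive job can be requested with srun, which allocates the resources and opens a shell on the allocated compute node. A minimal sketch, assuming the anagpu partition and the GPU type used in the batch script below:

srun
# Request one node and one GPU interactively and start a shell on it
srun --partition=anagpu --gres=gpu:1080ti:1 --nodes=1 --pty /bin/bash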

Sbatch

The commands specified in the script file will be run on the first available compute node that satisfies the requested resources.

The following is a sample submission script (tst_script):

sbatch
[omarq@psslurm conda 06:10:08]cat tst_script 
#!/bin/bash
#
#SBATCH --job-name='name'                          # Job name for the allocation
#SBATCH --output='filename'                        # File to which STDOUT will be written; %j inserts the job ID
#SBATCH --error='filename'                         # File to which STDERR will be written; %j inserts the job ID
#SBATCH --partition=anagpu                         # Partition/queue to submit the job to
#SBATCH --gres=gpu:1080ti:1                        # Type and number of GPUs requested
#SBATCH --nodes=1                                  # Number of nodes
#SBATCH --mail-user='username'@slac.stanford.edu   # Receive e-mail from Slurm
#SBATCH --mail-type=ALL                            # When to e-mail; other options include BEGIN, END, FAIL
#
srun -l hostname
srun python ExampleMultipleChaperones.py


[omarq@psslurm conda 06:10:11]sbatch tst_script 
Submitted batch job 200
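
The script comments note that %j expands to the job ID; a common pattern is to embed it in the output and error file names so every run writes to its own files. A minimal sketch, using a hypothetical job name myjob:

sbatch
#SBATCH --job-name=myjob
#SBATCH --output=myjob-%j.out   # STDOUT, e.g. myjob-200.out for job 200
#SBATCH --error=myjob-%j.err    # STDERR, e.g. myjob-200.err for job 200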




Prior to submitting a batch job, it is recommended to check which jobs already exist on the system by using the squeue command:

squeue
psslurm ~ # squeue 
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON) 
               187    anagpu ExampleM    omarq  R       0:04      1 psanagpu110 

The ST (job state) field shows that jobid 187 is currently running (R).
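
squeue can also be limited to your own jobs, and a specific job can be inspected in detail by its job ID. A minimal sketch, assuming jobid 187 from the output above:

squeue
# Show only your own jobs
squeue -u $USER
# Show the full record for a specific job
scontrol show job 187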
