...

Info
This page contains information in no particular order. If a lot more information is added, one should think about organizing it.

SLURM auto-completion tool

For anyone who uses Slurm often, the following utility is really helpful: https://github.com/SchedMD/slurm/tree/master/contribs/slurm_completion_help

sinfo

See what jobs are in the queue

Code Block
squeue
squeue -u <username>
squeue --reservation <reservation_name>
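
When the queue is long, a per-state summary is often easier to read than the full listing. The following is a minimal sketch, assuming bash; the helper name count_job_states is hypothetical, while -h (no header) and -o "%T" (job state) are standard squeue options.

```bash
# Hypothetical helper: count jobs per state.
# Reads one job state per line on stdin and prints "<count> <state>" lines.
count_job_states() {
    sort | uniq -c | awk '{print $1, $2}'
}

# Usage (on a host where Slurm is available):
#   squeue -h -u <username> -o "%T" | count_job_states
```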

Detailed information about a specific running or recent job

Code Block
scontrol show jobid -dd <jobID>

(scontrol only knows about jobs that the controller still holds in memory, so jobs that completed more than a few minutes ago are no longer shown. Use sacct for those.)
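
The scontrol output is a list of Key=Value pairs, so a single field can be pulled out with standard text tools. This is a rough sketch, assuming bash; job_field is a hypothetical helper name, and it does not handle values that contain spaces (for example a job name with blanks).

```bash
# Hypothetical helper: print the value of one Key=Value field from
# "scontrol show job" output read on stdin (first match only).
job_field() {
    local key="$1"
    # Put every whitespace-separated Key=Value pair on its own line,
    # then print everything after the first "=" for the requested key.
    tr -s ' \t' '\n\n' |
        awk -F= -v k="$key" '$1 == k { sub(/^[^=]*=/, ""); print; exit }'
}

# Usage (on a host where Slurm is available):
#   scontrol show jobid -dd <jobID> | job_field JobState
```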


For detailed reporting and statistics on jobs, use sacct. For example, to list all jobs of user <USER> from a given start time onward (here 2024-06-15):

Code Block
export FMT="reservation,jobid,jobname,User,reqcpus,ntasks,reqmem,averss,maxrss,elapsed,state%20,exitcode,Submit,Start,End,Account%17,Partition,AveCpu,NodeList%30"
sacct --format="${FMT}" --units=M -u <USER> --starttime 2024-06-15

or for specific job(s) and/or account(s) using additional format options

Code Block
sacct -a -j <JOBID> -A <ACCOUNT> -o JobID,JobName,Partition,Account%18,AllocCPUS,Nodelist%24,NNodes,start,elapsed,workdir%60,submitline%160

or for specific user and account with some additional format options to compare runtimes ("elapsed") and resources between similar jobs

Code Block
sacct -a -u <USER> -A <ACCOUNT> -o JobID,JobName,Partition,Account%18,AllocCPUS,Nodelist%24,NNodes,AveRSS,MaxRSS,AveDiskRead,start,elapsed,submitline%160
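
To compare runtimes between similar jobs, it can help to sort them by elapsed time. The sketch below assumes bash and uses the ElapsedRaw field (elapsed time in seconds) together with the standard sacct options -X (allocations only, no job steps), -n (no header) and -P (pipe-delimited output). The helper name slowest_jobs is hypothetical, and job names containing "|" would confuse it.

```bash
# Hypothetical helper: sort "JobID|JobName|ElapsedRaw" lines read on stdin
# by elapsed seconds, longest first, and print the top N (default 5).
slowest_jobs() {
    local n="${1:-5}"
    sort -t'|' -k3,3nr | head -n "$n"
}

# Usage (on a host where Slurm is available):
#   sacct -X -n -P -u <USER> -A <ACCOUNT> -o JobID,JobName,ElapsedRaw | slowest_jobs 3
```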

Show all format options with `-e`

Code Block
$ sacct -e
Account             AdminComment        AllocCPUS           AllocNodes         
AllocTRES           AssocID             AveCPU              AveCPUFreq         
AveDiskRead         AveDiskWrite        AvePages            AveRSS             
AveVMSize           BlockID             Cluster             Comment            
Constraints         Container           ConsumedEnergy      ConsumedEnergyRaw  
CPUTime             CPUTimeRAW          DBIndex             DerivedExitCode    
Elapsed             ElapsedRaw          Eligible            End                
ExitCode            Flags               GID                 Group              
JobID               JobIDRaw            JobName             Layout             
MaxDiskRead         MaxDiskReadNode     MaxDiskReadTask     MaxDiskWrite       
MaxDiskWriteNode    MaxDiskWriteTask    MaxPages            MaxPagesNode       
MaxPagesTask        MaxRSS              MaxRSSNode          MaxRSSTask         
MaxVMSize           MaxVMSizeNode       MaxVMSizeTask       McsLabel           
MinCPU              MinCPUNode          MinCPUTask          NCPUS              
NNodes              NodeList            NTasks              Priority           
Partition           QOS                 QOSRAW              Reason             
ReqCPUFreq          ReqCPUFreqMin       ReqCPUFreqMax       ReqCPUFreqGov      
ReqCPUS             ReqMem              ReqNodes            ReqTRES            
Reservation         ReservationId       Reserved            ResvCPU            
ResvCPURAW          Start               State               Submit             
SubmitLine          Suspended           SystemCPU           SystemComment      
Timelimit           TimelimitRaw        TotalCPU            TRESUsageInAve     
TRESUsageInMax      TRESUsageInMaxNode  TRESUsageInMaxTask  TRESUsageInMin     
TRESUsageInMinNode  TRESUsageInMinTask  TRESUsageInTot      TRESUsageOutAve    
TRESUsageOutMax     TRESUsageOutMaxNode TRESUsageOutMaxTask TRESUsageOutMin    
TRESUsageOutMinNode TRESUsageOutMinTask TRESUsageOutTot     UID                
User                UserCPU             WCKey               WCKeyID            
WorkDir            


Get information about current reservations

Code Block
scontrol show res

User and experiment accounts' associations

Code Block
sacctmgr show associations users=espov format=cluster,account%25,partition # list the accounts the user belongs to; %25 widens the column so that the full account name is displayed
sacctmgr list associations -p account=lcls:xpp1234 format=user,account%25,partition # list the associations of account lcls:xpp1234

The "format" argument can be modified to see more details. Remove it to see all fields (the output can be messy).
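
For scripting, the parsable output of sacctmgr is easier to process than the padded table. A minimal sketch, assuming bash: with -p each field is terminated by "|", and -n suppresses the header. The helper name unique_first_column is hypothetical.

```bash
# Hypothetical helper: print the unique, sorted values of the first column
# of "sacctmgr -n -p" output read on stdin.
unique_first_column() {
    cut -d'|' -f1 | sort -u
}

# Usage (on a host where Slurm is available):
#   sacctmgr -n -p show associations users=<username> format=account | unique_first_column
```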

Partition and node information

 sinfo is used to view partition and node information for a system running Slurm. 

...

sinfo -o "%n %C"  -n sdfmilan[021-022,040,202-204,210-213,226,232]
HOSTNAMES CPUS(A/I/O/T)
sdfmilan021 120/8/0/128
sdfmilan022 45/83/0/128
sdfmilan040 8/120/0/128
sdfmilan202 116/12/0/128
sdfmilan203 120/8/0/128
sdfmilan204 120/8/0/128
sdfmilan210 120/8/0/128
sdfmilan211 113/15/0/128
sdfmilan212 105/23/0/128
sdfmilan213 104/24/0/128
sdfmilan226 9/119/0/128
sdfmilan232 7/121/0/128
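
In this output the %C column lists CPU counts as allocated/idle/other/total, so the second number is what is currently free on each node. The following sketch, assuming bash and awk, filters that output for nodes with a minimum number of idle CPUs. The helper name nodes_with_idle_cpus is hypothetical, and -h is the standard sinfo option to drop the header line.

```bash
# Hypothetical helper: from 'sinfo -h -o "%n %C"' output read on stdin, print
# the hosts that have at least MIN idle CPUs (default 1) with their idle count.
nodes_with_idle_cpus() {
    local min="${1:-1}"
    awk -v min="$min" '{
        # cpu[1]=allocated, cpu[2]=idle, cpu[3]=other, cpu[4]=total
        split($2, cpu, "/")
        if (cpu[2] + 0 >= min + 0) print $1, cpu[2]
    }'
}

# Usage (on a host where Slurm is available):
#   sinfo -h -o "%n %C" -n <nodelist> | nodes_with_idle_cpus 64
```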

scontrol, sacctmgr

scontrol is used to view or modify Slurm configuration including: job, job step, node, partition, reservation, and overall system configuration. Most of the commands can only be executed by user root or an Administrator.

  • Detail job information: scontrol show jobid -dd <jobID>
  • Show reservation: scontrol show res

sacctmgr is used to deal with accounts, associations and users (the format argument can be modified at will; remove it to see all fields):

...

Priorities

Show priorities for an account: sacctmgr list associations -p accounts=<accounts>

...