...
This page contains information in no particular order. If much more information is added, it should be reorganized.
For anyone who uses the Slurm tools often, the Bash completion helper at https://github.com/SchedMD/slurm/tree/master/contribs/slurm_completion_help is really helpful.
```shell
squeue                                    # all jobs in the queue
squeue -u <username>                      # jobs of one user
squeue --reservation <reservation_name>   # jobs in one reservation
```
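To see why your own jobs are still waiting, `squeue` can be restricted to pending jobs and given a custom output format. A minimal sketch; the function name `pending_jobs` is hypothetical and the column widths are arbitrary:

```shell
# Hypothetical helper: list your pending jobs and the reason Slurm reports
# for not starting them yet.
# Format fields: %i = job ID, %P = partition, %j = job name,
#                %T = state (long form), %r = reason for the current state
pending_jobs() {
    squeue -u "${USER:-$(id -un)}" -t PENDING -o "%.18i %.9P %.30j %.10T %r"
}
```

Add it to `~/.bashrc` and run `pending_jobs`.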
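```shell
scontrol show jobid -dd <jobID>    # detailed information about a pending or running job
```
(`scontrol` only shows jobs that are still held in the controller's memory; completed jobs are purged after `MinJobAge` seconds, 300 by default, so use `sacct` for older jobs.)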
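Because `scontrol` forgets finished jobs, a small wrapper can fall back to `sacct`. A sketch that assumes `scontrol` exits with a non-zero status once the job ID is no longer known (it typically prints an "Invalid job id specified" error in that case); `job_info` is a hypothetical name and the `sacct` columns are just one reasonable selection:

```shell
# Hypothetical helper: show details for a job ID, using scontrol while the
# controller still knows the job and sacct (accounting database) afterwards.
job_info() {
    local jobid="$1"
    if [ -z "$jobid" ]; then
        echo "usage: job_info <jobID>" >&2
        return 2
    fi
    # Assumption: scontrol returns non-zero for jobs already purged.
    if ! scontrol show jobid -dd "$jobid" 2>/dev/null; then
        sacct -j "$jobid" -o JobID,JobName%30,State,ExitCode,Start,End,Elapsed,NodeList%30
    fi
}
```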
For detailed reports and statistics on jobs, use `sacct`. For example, to get all jobs of user <USER> since a given start time (here 2024-06-15):
```shell
export FMT="reservation,jobid,jobname,User,reqcpus,ntasks,reqmem,averss,maxrss,elapsed,state%20,exitcode,Submit,Start,End,Account%17,Partition,AveCpu,NodeList%30"
sacct --format="${FMT}" --units=M -u <USER> --starttime 2024-06-15
```
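Instead of hard-coding the start date, it can be computed. A minimal sketch that runs a similar report for the last N days (default 7); it assumes GNU `date` (for `-d "N days ago"`), and the name `recent_jobs` and its column selection are illustrative only:

```shell
# Hypothetical helper: sacct report for one user's jobs over the last N days.
# Usage: recent_jobs [user] [days]
recent_jobs() {
    local user="${1:-${USER:-$(id -un)}}"
    local days="${2:-7}"
    local start
    # Assumes GNU date; prints e.g. 2024-06-15
    start="$(date -d "${days} days ago" +%F)" || return 1
    local fmt="jobid,jobname,user,reqcpus,reqmem,maxrss,elapsed,state%20,exitcode,start,end,partition,nodelist%30"
    sacct --format="$fmt" --units=M -u "$user" --starttime "$start"
}
```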
Or, for specific job(s) and/or account(s), with additional format options:
```shell
sacct -a -j <JOBID> -A <ACCOUNT> -o JobID,JobName,Partition,Account%18,AllocCPUS,Nodelist%24,NNodes,start,elapsed,workdir%60,submitline%160
```
Or, for a specific user and account, with some additional format options to compare runtimes (`Elapsed`) and resources between similar jobs:
```shell
sacct -a -u <USER> -A <ACCOUNT> -o JobID,JobName,Partition,Account%18,AllocCPUS,Nodelist%24,NNodes,AveRSS,MaxRSS,AveDiskRead,start,elapsed,submitline%160
```
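To rank jobs by runtime, the parsable output of `sacct` can be sorted on `ElapsedRaw` (elapsed time in seconds). A sketch; `longest_jobs` is a hypothetical name. `-X` keeps one line per job allocation, so per-step columns such as `MaxRSS` are left out here because they are typically empty on allocation lines:

```shell
# Hypothetical helper: a user's jobs in one account, longest-running first.
# -X: allocations only (no step lines), -n: no header, -P: "|"-separated.
longest_jobs() {
    local user="$1" account="$2"
    sacct -a -X -n -P -u "$user" -A "$account" \
          -o JobID,JobName,Elapsed,ElapsedRaw \
        | sort -t '|' -k4,4nr   # sort numerically on ElapsedRaw, descending
}
```

Usage: `longest_jobs <USER> <ACCOUNT> | head`.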
Show all available format fields with `-e`:
```console
$ sacct -e
Account AdminComment AllocCPUS AllocNodes
AllocTRES AssocID AveCPU AveCPUFreq
AveDiskRead AveDiskWrite AvePages AveRSS
AveVMSize BlockID Cluster Comment
Constraints Container ConsumedEnergy ConsumedEnergyRaw
CPUTime CPUTimeRAW DBIndex DerivedExitCode
Elapsed ElapsedRaw Eligible End
ExitCode Flags GID Group
JobID JobIDRaw JobName Layout
MaxDiskRead MaxDiskReadNode MaxDiskReadTask MaxDiskWrite
MaxDiskWriteNode MaxDiskWriteTask MaxPages MaxPagesNode
MaxPagesTask MaxRSS MaxRSSNode MaxRSSTask
MaxVMSize MaxVMSizeNode MaxVMSizeTask McsLabel
MinCPU MinCPUNode MinCPUTask NCPUS
NNodes NodeList NTasks Priority
Partition QOS QOSRAW Reason
ReqCPUFreq ReqCPUFreqMin ReqCPUFreqMax ReqCPUFreqGov
ReqCPUS ReqMem ReqNodes ReqTRES
Reservation ReservationId Reserved ResvCPU
ResvCPURAW Start State Submit
SubmitLine Suspended SystemCPU SystemComment
Timelimit TimelimitRaw TotalCPU TRESUsageInAve
TRESUsageInMax TRESUsageInMaxNode TRESUsageInMaxTask TRESUsageInMin
TRESUsageInMinNode TRESUsageInMinTask TRESUsageInTot TRESUsageOutAve
TRESUsageOutMax TRESUsageOutMaxNode TRESUsageOutMaxTask TRESUsageOutMin
TRESUsageOutMinNode TRESUsageOutMinTask TRESUsageOutTot UID
User UserCPU WCKey WCKeyID
WorkDir
```
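The list is long; to find a field by name, the output of `sacct -e` can be split into one name per line and searched case-insensitively. A sketch with a hypothetical helper name:

```shell
# Hypothetical helper: search the field names printed by "sacct -e".
# Example: sacct_fields rss   -> AveRSS, MaxRSS, MaxRSSNode, ...
sacct_fields() {
    # Turn runs of spaces/tabs into single newlines, then filter.
    sacct -e | tr -s ' \t' '\n\n' | grep -i -e "$1"
}
```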
```shell
scontrol show res    # show all reservations
```
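`scontrol -o` (`--oneliner`) prints each record on a single line, which makes the output easy to filter. A sketch that prints only the reservation names; `reservation_names` is a hypothetical name:

```shell
# Hypothetical helper: print the name of every current reservation.
reservation_names() {
    scontrol -o show res | grep -o 'ReservationName=[^ ]*' | cut -d= -f2
}
```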
```shell
# List the accounts that user espov belongs to. %25 widens the column so that the full account name is displayed.
sacctmgr show associations users=espov format=cluster,account%25,partition
# List the users associated with account lcls:xpp1234.
sacctmgr list associations -p account=lcls:xpp1234 format=user,account%25,partition
```
Modify the `format` argument to see more details, or remove it to see all columns (the output can be messy).
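For scripting, `sacctmgr` can print parsable output without a header (`-n -P`). A minimal sketch listing the distinct accounts of a user; an account may appear once per partition or cluster association, hence `sort -u`. The name `user_accounts` is hypothetical:

```shell
# Hypothetical helper: print each account a user belongs to, once per line.
# -n: no header, -P: "|"-delimited output without padding or truncation.
user_accounts() {
    sacctmgr -n -P show associations users="${1:-${USER:-$(id -un)}}" format=account | sort -u
}
```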
sinfo is used to view partition and node information for a system running Slurm.
...
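As an example of `sinfo` filtering and output formatting, the idle nodes of one partition can be listed one per line. A sketch; `idle_nodes` is a hypothetical name:

```shell
# Hypothetical helper: list the idle nodes of a partition.
# -h: no header, -N: one line per node, -t idle: idle nodes only,
# -o "%N": print only the node name.
idle_nodes() {
    sinfo -h -N -p "$1" -t idle -o "%N"
}
```

Usage: `idle_nodes <partition>`.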
scontrol is used to view or modify Slurm configuration, including jobs, job steps, nodes, partitions, reservations, and the overall system configuration. Most of the modifying commands can only be executed by user root or an administrator; the `show` commands below can be run by regular users.
```shell
scontrol show jobid -dd <jobID>
scontrol show res
```
sacctmgr is used to manage accounts, associations, and users (the `format` argument can be modified at will; remove it to see all columns):
...
Show priorities for an account: sacctmgr list associations -p accounts=<accounts>
...