...
A partition and Slurm account should be specified when submitting jobs. The Slurm account has the form lcls:<experiment-name>, e.g. lcls:xpp123456. The account is used to track resource usage per experiment.
    % sbatch -p milano --account lcls:xpp1234 ........
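Putting these options together, a minimal submission script might look like the sketch below; the job name, task count, and log file name are placeholders, not site requirements:

```shell
#!/bin/bash
#SBATCH --partition=milano
#SBATCH --account=lcls:xpp1234    # replace with your experiment's account
#SBATCH --job-name=my-analysis    # placeholder job name
#SBATCH --ntasks=8                # placeholder task count
#SBATCH --output=%j.log           # log file named after the Slurm job ID

echo "running on $(hostname)"     # placeholder payload
```

Submit it with `sbatch myscript.sh`; options given on the sbatch command line override the #SBATCH lines inside the script.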
Number of cores
Warning: Some cores of a milano batch node are used exclusively for file IO (WekaFS). Therefore, although a milano node has 128 cores, only 120 can be used.
Environment Variables: sbatch options can also be set via environment variables, which is useful when running a program that calls sbatch but does not allow options to be set on the command line, e.g.:
    % SLURM_ACCOUNT=lcls:experiment executable-to-run [args]

or

    % export SLURM_ACCOUNT=lcls:experiment
    % executable-to-run [args]
The environment variables are SBATCH_MEM_PER_NODE (--mem), SLURM_ACCOUNT (--account) and SBATCH_TIMELIMIT (--time). The order of precedence is: command line, then environment, then options within the sbatch script.
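As a sketch of the environment-variable route (the account value is a placeholder):

```shell
# Set sbatch options via the environment instead of the command line.
export SLURM_ACCOUNT=lcls:xpp1234     # equivalent to --account
export SBATCH_TIMELIMIT=01:00:00      # equivalent to --time
export SBATCH_MEM_PER_NODE=4G         # equivalent to --mem

# Any sbatch call made by a wrapper program from this shell now
# inherits these settings, unless the same option is also given on
# the command line, which takes precedence.
```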
Currently we reserve nodes for experiments that need real-time processing. These are example parameters that should be added to a Slurm submission script:
    #SBATCH --reservation=lcls:onshift
    #SBATCH --account=lcls:tmoc00221
You must be a member of the experiment's Slurm account, which you can check with a command like this:
    sacctmgr list associations -p account=lcls:tmoc00221
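The -p flag makes sacctmgr print pipe-delimited rows, which are easy to check in a script. The sketch below assumes the default column order Cluster|Account|User|Partition; the sample row is illustrative, not real output:

```shell
# One pipe-delimited association row; with -p, sacctmgr prints
# Cluster|Account|User|Partition|... by default (assumed here).
sample='s3df|lcls:tmoc00221|someuser|milano|'

account=$(echo "$sample" | cut -d'|' -f2)   # second field: account
user=$(echo "$sample" | cut -d'|' -f3)      # third field: user

echo "$user belongs to $account"
```

If your username appears in one of the rows for the account, you are a member.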
and your experiment must have been added to the reservation permissions list. This can be checked with this command:
    (ps-4.6.1) scontrol show res lcls:onshift
    ReservationName=lcls:onshift StartTime=2023-08-02T10:39:06 EndTime=2023-12-31T00:00:00 Duration=150-14:20:54
       Nodes=sdfmilan[001,014,022,033,047,059,062,101,127,221-222,226] NodeCnt=12 CoreCnt=1536 Features=(null) PartitionName=milano Flags=IGNORE_JOBS
       TRES=cpu=1536
       Users=(null) Groups=(null) Accounts=lcls:tmoc00221,lcls:xppl1001021,lcls:cxil1022721,lcls:mecl1011021,lcls:xcsl1004621,lcls:mfxx1004121,lcls:mfxp1001121 Licenses=(null) State=ACTIVE BurstBuffer=(null) Watts=n/a
       MaxStartDelay=(null)
    (ps-4.6.1)
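To check the permissions list programmatically, the Accounts= field of the scontrol output can be split on commas. The sketch below copies (and truncates) the Accounts field from real output like the above:

```shell
# Accounts field as printed by scontrol (truncated for brevity).
accounts='Accounts=lcls:tmoc00221,lcls:xppl1001021'

# Strip the "Accounts=" prefix, split on commas, look for an exact match.
if echo "${accounts#Accounts=}" | tr ',' '\n' | grep -qx 'lcls:tmoc00221'; then
    onres=yes
else
    onres=no
fi
echo "$onres"
```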
You can use the reservation only if your experiment is on-shift or off-shift. If you think the Slurm settings are incorrect for your experiment, email pcds-datamgt-l@slac.stanford.edu.
For running MPI jobs on the S3DF Slurm cluster, mpirun (or related tools) should be used. Using srun to run mpi4py will fail because it requires PMIx, which is not supported by the installed Slurm version. Example psana MPI submission scripts are here: Submitting SLURM Batch Jobs
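A submission script for an mpi4py job might therefore look like this sketch; analysis.py and the rank count are placeholders:

```shell
#!/bin/bash
#SBATCH --partition=milano
#SBATCH --account=lcls:xpp1234    # placeholder experiment account
#SBATCH --ntasks=16               # placeholder MPI rank count

# Use mpirun, not srun: srun would need PMIx support, which the
# installed Slurm version does not provide.
mpirun python analysis.py         # analysis.py is a hypothetical mpi4py script
```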
...