...

  • A partition and a Slurm account should be specified when submitting jobs. The Slurm account is lcls:<experiment-name>, e.g. lcls:xpp123456. The account is used to keep track of resource usage per experiment.

    Code Block
    % sbatch -p milano --account lcls:xpp1234 ........
  • Submitting jobs under the lcls account only allows preemptable jobs and requires specifying --qos preemptable. The lcls account is set with --account lcls or --account lcls:default (the lcls name is automatically translated to lcls:default by Slurm, and the s-commands will show the default one).
  • In the S3DF, memory is limited to 4GB/core by default. Usually that is not an issue, as processing jobs use many cores (e.g. a job with 64 cores would request 256GB of memory).
  • The memory limit is enforced; a job that exceeds it will fail with an OUT_OF_MEMORY status.
  • Memory can be increased with the --mem sbatch option (e.g. --mem 16G; the default unit is megabytes). See the example script after this list.
  • The default total run time is 1 day; the --time option allows it to be increased or decreased.
  • Number of cores

    Warning

    Some cores of a milano batch node are used exclusively for file IO (WekaFS). Therefore, although a milano node has 128 cores, only 120 can be used.
    Submitting a task with --nodes 1 --ntasks-per-node=128 would fail with: Requested node configuration is not available

  • Environment Variables: sbatch options can also be set via environment variables, which is useful if a program is executed that calls sbatch and doesn't allow setting options on the command line, e.g.:

    Code Block
    % SLURM_ACCOUNT=lcls:experiment  executable-to-run [args]
    or
    % export SLURM_ACCOUNT=lcls:experiment
    % executable-to-run [args]

    The environment variables are SBATCH_MEM_PER_NODE (--mem), SLURM_ACCOUNT (--account) and SBATCH_TIMELIMIT (--time). Options are picked up in this order of precedence: command line, then environment variables, then directives within the sbatch script.
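
A full submission script combining the options above might look like the following minimal sketch. The experiment account lcls:xpp123456, the resource numbers, and executable-to-run are placeholders; adjust them to your job.

Code Block
#!/bin/bash
#SBATCH --partition=milano
#SBATCH --account=lcls:xpp123456   # placeholder experiment account
#SBATCH --nodes=1
#SBATCH --ntasks=16                # at most 120 tasks fit on a 128-core milano node
#SBATCH --mem=128G                 # per node; the 4GB/core default would give 16 x 4GB = 64GB
#SBATCH --time=12:00:00            # default total run time is 1 day

executable-to-run [args]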

Using A Reservation For A Running Experiment

Currently we are reserving nodes for experiments that need real-time processing. This is an example of the parameters that should be added to a Slurm submission script:

Code Block
#SBATCH --reservation=lcls:onshift
#SBATCH --account=lcls:tmoc00221
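
For context, a complete submission script using the reservation could look roughly like the sketch below; only the --reservation and --account lines come from the example above, the remaining resource requests and executable-to-run are placeholders.

Code Block
#!/bin/bash
#SBATCH --partition=milano
#SBATCH --reservation=lcls:onshift
#SBATCH --account=lcls:tmoc00221   # must be one of the accounts listed in the reservation
#SBATCH --ntasks=16
#SBATCH --time=04:00:00

executable-to-run [args]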

You must be a member of the experiment's Slurm account, which you can check with a command like this:

Code Block
sacctmgr list associations -p account=lcls:tmoc00221

and your experiment must have been added to the reservation permissions list.  This can be checked with this command:

Code Block
(ps-4.6.1) scontrol show res lcls:onshift
ReservationName=lcls:onshift StartTime=2023-08-02T10:39:06 EndTime=2023-12-31T00:00:00 Duration=150-14:20:54
   Nodes=sdfmilan[001,014,022,033,047,059,062,101,127,221-222,226] NodeCnt=12 CoreCnt=1536 Features=(null) PartitionName=milano Flags=IGNORE_JOBS
   TRES=cpu=1536
   Users=(null) Groups=(null) Accounts=lcls:tmoc00221,lcls:xppl1001021,lcls:cxil1022721,lcls:mecl1011021,lcls:xcsl1004621,lcls:mfxx1004121,lcls:mfxp1001121 Licenses=(null) State=ACTIVE BurstBuffer=(null) Watts=n/a
   MaxStartDelay=(null)
(ps-4.6.1) 

You can use the reservation only if your experiment is on-shift or off-shift. If you think the Slurm settings are incorrect for your experiment, email pcds-datamgt-l@slac.stanford.edu.

MPI and Slurm

For running MPI jobs on the S3DF Slurm cluster, mpirun (or a related tool) should be used. Using srun to run mpi4py will fail, as it requires PMIx, which is not supported by the installed Slurm version. Example psana MPI submission scripts are here: Submitting SLURM Batch Jobs
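
As a rough sketch (not taken from the linked examples), an MPI job could be submitted like this; the experiment account, task count, and script name my_psana_script.py are placeholders, and the key point is that the Python MPI program is launched with mpirun inside the allocation rather than with srun.

Code Block
#!/bin/bash
#SBATCH --partition=milano
#SBATCH --account=lcls:xpp123456   # placeholder experiment account
#SBATCH --nodes=1
#SBATCH --ntasks=64

# Launch the mpi4py-based analysis with mpirun (srun would fail without PMIx support).
mpirun -np $SLURM_NTASKS python my_psana_script.py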

...