Notes from mtg with Thorsten, Murali, Wilko, Yee, Silke, Valerio on Jan 19, 2024

Yee's two points:
(1) need to limit the number of non-preemptable jobs, in particular the
    single-core jobs which can scatter across nodes
(2) how do we preempt the jobs and limit the resources used by preemptable
    jobs

Yee's proposal:
(1) limit every facility's use of non-preemptable jobs to what
    it has purchased (e.g. LCLS is limited to 88 nodes of
    non-preemptable jobs, or half of its 176 nodes)
    - coact can help with this
    - is this a hard limit, or does it have a long time-constant?
      o it is a hard limit per repo (enforced by slurm), but not at the
        multiple-repo level: the multi-repo total is enforced by coact
        rather than slurm, so there is a long time-constant before we
        notice we have crossed the 88-node threshold (see the usage-check
        sketch after this list)
(2) LCLS defines the order of preemption of experiment repos
(add-on) could extend this to support cross-facility preemption
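
A minimal sketch of how the coact-side multi-repo check might work, assuming
squeue is the source of truth; the repo account names, the set of QOS counted
as non-preemptable, and the polling cadence are illustrative assumptions, not
agreed values.

```python
#!/usr/bin/env python3
"""Coact-side sketch: watch the *summed* non-preemptable usage of all LCLS
repos against the facility purchase.  Slurm enforces the per-repo hard limit;
this periodic check only covers the multi-repo total, hence the long
time-constant."""
import subprocess
import time

LCLS_REPO_ACCOUNTS = ["lcls-exp1", "lcls-exp2", "lcls-exp3"]  # hypothetical repo accounts
NON_PREEMPTABLE_QOS = {"normal", "off-shift", "on-shift"}     # QOS tiers from these notes
FACILITY_NODE_LIMIT = 88                                      # half of the 176-node purchase
POLL_SECONDS = 600                                            # the "long time-constant"


def nodes_in_use(account: str) -> int:
    """Nodes held by running non-preemptable jobs in one repo, via squeue.
    Note: single-core jobs sharing a node each count as a full node here,
    which overstates usage (Yee's point about scattered single-core jobs);
    a real check would sum allocated cores/TRES instead."""
    out = subprocess.run(
        ["squeue", "--noheader", "--account", account,
         "--states", "RUNNING", "--format", "%D %q"],
        capture_output=True, text=True, check=True,
    ).stdout
    total = 0
    for line in out.splitlines():
        nodes, qos = line.split()
        if qos in NON_PREEMPTABLE_QOS:
            total += int(nodes)
    return total


def check_once() -> None:
    used = sum(nodes_in_use(acct) for acct in LCLS_REPO_ACCOUNTS)
    if used > FACILITY_NODE_LIMIT:
        # coact would throttle or alert here; printing is just a placeholder
        print(f"LCLS non-preemptable usage: {used} nodes > limit of {FACILITY_NODE_LIMIT}")


if __name__ == "__main__":
    while True:
        check_once()
        time.sleep(POLL_SECONDS)
```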

concrete example of Yee's proposal:
4 expts: exp1 (on-shift), exp2 (off-shift), exp3 (normal), exp4 (normal)
- set exp1 to have high-priority for queue-placement
  o implemented by setting QOS=on-shift for exp1
- have a sequence of job-QOS's: preemptable, normal, off-shift, on-shift
  (see the QOS/allocation sketch after this list)
  o could have "normal" jobs be preemptable, although that could create
    issues with sharing with others like Rubin
  o reuse Murali's stuff to automatically get on-shift/off-shift settings
    - uses whether the experiment is running within its URAWI calendar
      start/end times
    - can this handle last-minute changes?  could add some buffer at
      the edges or manually override (sub-czars could do this?)
  o setting "on-shift" setting for a repo is a permission, the
    on-shift expts could specify on-shift or normal QOS in
    job submission script
    - there can be a default QOS
    - if a job is submitted with a non-permissible QOS, it will
      currently fail (Silke would like it to switch to a lower QOS
      automatically; see the QOS-fallback sketch after this list)
- we will try suspend-based preemption within the milan partition
  o memory is a worrisome issue (suspended jobs still hold their RAM),
    but nodes will get larger SSDs in 3 months (several TB)
    - expect 2-10 GB/s, maybe limited by the kernel, so writing out 512 GB
      would take roughly 1.5 min (about 1-4 min over that range); it
      happens in parallel across nodes, so hopefully OK
- each experiment repo would set an allocation (a hard limit!) that
  would limit the number of cores (enables multiple on-shift expts)
- beamline staff could tweak repo allocations?  may need sub-czar
  (operator) privileges
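
A hedged sketch of what the slurm side of this concrete example could look
like: the four QOS tiers with suspend preemption, plus per-repo core caps and
permitted-QOS lists via sacctmgr. The QOS names follow the notes; all
priorities, core numbers, account names (exp1..exp4), and the
PreemptType/PreemptMode choices are illustrative assumptions, not decisions
from the meeting.

```python
#!/usr/bin/env python3
"""Sketch of the slurm-side setup (QOS tiers + per-repo allocations).
Every number and account name is an assumption; slurm.conf would also need
QOS preemption enabled (e.g. PreemptType=preempt/qos and
PreemptMode=SUSPEND,GANG) for suspend preemption to work."""
import subprocess


def sacctmgr(*args: str) -> None:
    # -i answers "yes" to sacctmgr's confirmation prompt
    subprocess.run(["sacctmgr", "-i", *args], check=True)


# QOS tiers, lowest to highest priority.  Only these LCLS QOS names appear in
# the Preempt lists, so jobs in other facilities' QOS (e.g. Rubin's) are never
# suspended by LCLS on-shift work.
for qos in ("preemptable", "normal", "off-shift", "on-shift"):
    sacctmgr("add", "qos", qos)
sacctmgr("modify", "qos", "where", "name=preemptable", "set", "Priority=10")
sacctmgr("modify", "qos", "where", "name=normal", "set", "Priority=100")
sacctmgr("modify", "qos", "where", "name=off-shift", "set",
         "Priority=500", "Preempt=preemptable", "PreemptMode=suspend")
sacctmgr("modify", "qos", "where", "name=on-shift", "set",
         "Priority=1000", "Preempt=preemptable,normal", "PreemptMode=suspend")

# Per-experiment repo: hard core allocation (GrpTRES) plus the QOS the repo is
# *permitted* to use, with a default.  exp1 is the on-shift experiment; on-shift
# is a permission, so its jobs opt in with --qos=on-shift.
sacctmgr("modify", "account", "where", "name=exp1", "set", "GrpTRES=cpu=2048",
         "QOS=preemptable,normal,on-shift", "DefaultQOS=normal")
sacctmgr("modify", "account", "where", "name=exp2", "set", "GrpTRES=cpu=1024",
         "QOS=preemptable,normal,off-shift", "DefaultQOS=off-shift")
for repo in ("exp3", "exp4"):
    sacctmgr("modify", "account", "where", "name=" + repo, "set", "GrpTRES=cpu=512",
             "QOS=preemptable,normal", "DefaultQOS=normal")
```

A job would then opt in with something like sbatch --account=exp1
--qos=on-shift job.sh; asking for a QOS that is not in the repo's list is
rejected at submission, which is the failure mode Silke would like softened
(next sketch).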
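
Silke's request (fall back to a lower QOS instead of failing) is not something
slurm does on its own, so one option is a thin submission wrapper. The script
below is purely hypothetical: it reads the repo's permitted QOS list from
sacctmgr and downgrades the requested QOS to the highest permitted tier below
it before calling sbatch.

```python
#!/usr/bin/env python3
"""Hypothetical sbatch wrapper approximating the requested behaviour: if the
requested QOS is not permitted for the repo's account, drop to the highest
permitted QOS below it instead of letting the job fail."""
import subprocess
import sys

# Priority order from these notes, lowest to highest.
QOS_ORDER = ["preemptable", "normal", "off-shift", "on-shift"]


def allowed_qos(account: str) -> set[str]:
    """QOS names the account's associations may use, as reported by sacctmgr."""
    out = subprocess.run(
        ["sacctmgr", "--noheader", "--parsable2", "show", "assoc",
         f"account={account}", "format=qos"],
        capture_output=True, text=True, check=True,
    ).stdout
    qos: set[str] = set()
    for line in out.splitlines():
        qos.update(q for q in line.strip().split(",") if q)
    return qos


def pick_qos(requested: str, account: str) -> str:
    """Requested QOS if permitted, otherwise the highest permitted QOS below it."""
    permitted = allowed_qos(account)
    if requested in permitted or requested not in QOS_ORDER:
        return requested          # unknown QOS: pass through and let slurm decide
    for candidate in reversed(QOS_ORDER[: QOS_ORDER.index(requested)]):
        if candidate in permitted:
            return candidate
    return "preemptable"          # last resort; this policy was not decided in the meeting


if __name__ == "__main__":
    # Hypothetical usage: submit_with_fallback.py ACCOUNT QOS SCRIPT [extra sbatch args...]
    account, requested, script, *extra = sys.argv[1:]
    subprocess.run(["sbatch", f"--account={account}",
                    f"--qos={pick_qos(requested, account)}", *extra, script],
                   check=True)
```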


Preemption of non-LCLS jobs (e.g. Rubin):

- coact will be set up so that on-shift LCLS jobs can preempt normal LCLS
  jobs, but not normal Rubin jobs
  o what if the Milan queue is filled up with Rubin jobs?
    - the coact limits will prevent this: currently Rubin could use up 72
      Milans (or 144 half-milans) out of 160