concrete example of yee's proposal:
4 expts: exp1 (on-shift), exp2 (off-shift), exp3 (normal), exp4 (normal)
- set exp1 to have high-priority for queue-placement
  o implemented by setting QOS=on-shift for exp1
- have a sequence of job-QOS's: preemptable, normal, off-shift, on-shift
  o could have "normal" jobs be preemptable, although could create
    issues with sharing with others like rubin
  o reuse murali's stuff to automatically get on-shift/off-shift settings
    - uses the experiment run start/end times from the URAWI calendar
    - can this handle last-minute changes?  could add some buffer at
      the edges or manually override (sub-czars could do this?)
  o setting "on-shift" setting for a repo is a permission, the
    on-shift expts could specify on-shift or normal QOS in
    job submission script
    - there can be a default QOS
    - if a job is set to a non-permissible QOS, the job will
      currently fail (Silke would like it to switch to a lower QOS
      automatically)
- we will try suspend preemption within the milan partition
  o ****** memory ****** is a worrisome issue, but nodes will get larger
    SSDs in 3 months (several TB)
    - expect 2-10GB/s, maybe limited by kernel, so 512GB would take
      roughly 1-4min (1.5min at ~6GB/s) but happens in parallel, so
      hopefully OK
- each experiment repo would set an allocation (a hard limit!) that
  would limit the number of cores (enables multiple on-shift expts)
- beamline staff could tweak repo allocations?  may need sub-czar
  (operator) privileges
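
The QOS sequence and the permission check above can be sketched roughly as follows. This is only an illustration of the idea, not how coact or the scheduler implements it: the QOS names come from the notes, while the function `resolve_qos`, its `permitted` and `fallback` parameters, and the exception types are hypothetical. With `fallback=False` it mimics the current behaviour (the job fails); with `fallback=True` it mimics the behaviour Silke asked for (drop to the highest permitted lower QOS).

```python
# Hypothetical sketch: QOS names are from the notes, everything else is assumed.
QOS_ORDER = ["preemptable", "normal", "off-shift", "on-shift"]  # lowest to highest


def resolve_qos(requested, permitted, fallback=True):
    """Return the QOS a job would end up running with.

    requested: QOS named in the job submission script
    permitted: set of QOS names the experiment repo is allowed to use
    fallback:  False -> fail on a non-permissible QOS (current behaviour)
               True  -> switch to the highest permitted lower QOS
    """
    if requested not in QOS_ORDER:
        raise ValueError(f"unknown QOS: {requested}")
    if requested in permitted:
        return requested
    if not fallback:
        raise PermissionError(f"QOS {requested} is not permitted for this repo")
    # Walk down from just below the requested level.
    rank = QOS_ORDER.index(requested)
    for qos in reversed(QOS_ORDER[:rank]):
        if qos in permitted:
            return qos
    raise PermissionError("no permitted QOS below the requested level")


# exp3 is a "normal" experiment that asks for on-shift by mistake.
print(resolve_qos("on-shift", {"preemptable", "normal"}))
```

A default QOS per repo could be layered on top by passing it as `requested` whenever the submission script names none.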


Preemption of non-LCLS jobs: (e.g. Rubin)

  • coact will be set up so that on-shift LCLS jobs can preempt normal LCLS jobs, but not normal Rubin jobs.
    • what if Milan queue is filled up with Rubin jobs?
      • the coact limits will prevent this: currently they could use up 72 Milans (or 144 half-milans) out of 160.
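
As a rough illustration of the rule and the limit above, the sketch below encodes only what these notes state: on-shift LCLS jobs may preempt normal LCLS jobs, nothing may preempt Rubin jobs, and 72 of the 160 Milans is the current cap. It assumes "they" in the last bullet refers to Rubin. The function `can_preempt`, the `(facility, qos)` tuples, and the constant names are hypothetical, and any combination the notes do not mention is treated as "no preemption".

```python
# Hypothetical sketch: numbers and the rule are from the notes, names are assumed.
MILAN_NODES = 160       # total Milan nodes
RUBIN_MILAN_LIMIT = 72  # current coact cap (144 half-Milans), assumed to apply to Rubin


def can_preempt(preemptor, victim):
    """Each argument is a (facility, qos) tuple, e.g. ("lcls", "on-shift")."""
    p_facility, p_qos = preemptor
    v_facility, v_qos = victim
    return (
        p_facility == "lcls"
        and p_qos == "on-shift"
        and v_facility == "lcls"
        and v_qos == "normal"
    )


# Milans that remain outside the cap even when it is fully used.
milans_left = MILAN_NODES - RUBIN_MILAN_LIMIT

print(can_preempt(("lcls", "on-shift"), ("lcls", "normal")))
print(can_preempt(("lcls", "on-shift"), ("rubin", "normal")))
print(milans_left)
```

Under these assumptions at least 88 Milans stay outside the cap, which is why a queue full of Rubin jobs should not block on-shift LCLS work.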