...
concrete example of yee's proposal:
4 expts: exp1 (on-shift), exp2 (off-shift), exp3 (normal), exp4 (normal)
- set exp1 to have high-priority for queue-placement
o implemented by setting QOS=on-shift for exp1
- have a sequence of job-QOS's: preemptable, normal, off-shift, on-shift
o could make "normal" jobs preemptable, although this could create
issues when sharing with others like Rubin
o reuse murali's stuff to automatically get on-shift/off-shift settings
- uses the experiment's start/end times from the URAWI calendar
- can this handle last-minute changes? could add some buffer at
the edges or manually override (sub-czars could do this?)
o the "on-shift" setting for a repo is a permission: the
on-shift expts could specify either on-shift or normal QOS in
the job submission script
- there can be a default QOS
- if a job requests a QOS it is not permitted to use, the job
currently fails (Silke would like it to switch to a lower QOS
automatically)
- we will try suspend-mode preemption within the milan partition
o memory is a worrisome issue, but nodes will get larger
SSDs in 3 months (several TB)
- expect 2-10GB/s, maybe limited by kernel, so 512GB would take roughly
1-4min (about 1.5min at ~6GB/s), but happens in parallel, so hopefully OK
- each experiment repo would set an allocation (a hard limit!) that
would limit the number of cores (enables multiple on-shift expts)
- beamline staff could tweak repo allocations? may need sub-czar
(operator) privileges
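
A minimal Python sketch of two points from the list above: the automatic fallback to a lower QOS that Silke asked for, and the suspend-time estimate. This is not coact or Slurm code. The function names (resolve_qos, suspend_seconds) and the way permissions are passed in are hypothetical, and the estimate assumes a suspended job's memory is written out to local SSD at a steady rate.

```python
# Hypothetical sketch: names here are illustrative, not coact or Slurm APIs.

# QOS levels from lowest to highest, in the order given in the notes.
QOS_ORDER = ["preemptable", "normal", "off-shift", "on-shift"]


def resolve_qos(requested, permitted, default="normal"):
    """Return the requested QOS if permitted, else the next lower permitted one."""
    if requested is None:
        requested = default  # "there can be a default QOS"
    if requested not in QOS_ORDER:
        raise ValueError(f"unknown QOS: {requested}")
    rank = QOS_ORDER.index(requested)
    # Walk downward from the requested level until a permitted QOS is found.
    for qos in reversed(QOS_ORDER[: rank + 1]):
        if qos in permitted:
            return qos
    raise PermissionError("no permitted QOS at or below the requested level")


def suspend_seconds(memory_gb, rate_gb_per_s):
    """Seconds to write memory_gb to SSD at a sustained rate (assumed constant)."""
    return memory_gb / rate_gb_per_s
```

With these definitions, a repo without the on-shift permission that requests on-shift would land on normal instead of failing, and 512GB takes 256s at 2GB/s or 51.2s at 10GB/s.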
Preemption of non-LCLS jobs: (e.g. Rubin)
- coact will be set up so that on-shift LCLS jobs can preempt normal LCLS jobs, but not normal Rubin jobs.
- what if Milan queue is filled up with Rubin jobs?
- the coact limits will prevent this: currently Rubin can use at most 72 Milans (or 144 half-Milans) out of 160.
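
The preemption rule and the Rubin cap can be sketched the same way. Again this is only an illustration under stated assumptions, not how coact implements it: the job dictionaries, the "facility" field, and the function names are hypothetical, and letting on-shift jobs also preempt "preemptable" LCLS jobs is an assumption the notes do not state.

```python
# Hypothetical sketch of the rule in the notes; not coact's actual logic.

MILAN_NODES = 160     # total Milan nodes, from the notes
RUBIN_NODE_CAP = 72   # current coact limit for Rubin, from the notes


def can_preempt(preemptor, victim):
    """On-shift LCLS jobs may preempt lower-QOS LCLS jobs, never Rubin jobs."""
    return (
        preemptor["facility"] == "lcls"
        and preemptor["qos"] == "on-shift"
        and victim["facility"] == "lcls"
        # "normal" is from the notes; "preemptable" is an assumption.
        and victim["qos"] in ("normal", "preemptable")
    )


def nodes_outside_rubin_cap():
    """Milan nodes that Rubin jobs cannot occupy under the current cap."""
    return MILAN_NODES - RUBIN_NODE_CAP
```

Under the current cap, nodes_outside_rubin_cap() gives 88, so Rubin jobs alone cannot fill the Milan partition.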