
Below are the wishlist items and questions posed by the main users and administrators of the SLAC batch system.

What kind of support (transition/long-term) do we get for the quoted price?
How much downtime do we incur if we need to "restart the system"?
What sort of activities require such a "restart" (e.g. creating a new queue)?
Do large numbers of short jobs (about 1 minute) cause problems?
Please provide additional information about support for virtualization, and any plans for future enhancements to that support.
What would be involved in adding support for Mac OS X or Windows? Would that be considered normal "support"?
Where are the bottlenecks in the system likely to occur? For example, can a user stress the system by repeatedly and frequently querying job stats?

Automatic job preemption/suspend/resume?
Support for multiple levels of job preemption (e.g. a 3-queue hierarchy)?
Job environment propagation (including limits like "stacksize")?
Subgroup-specific priority calculation (queue-specific priority formula)?
Capability to delegate subgroup administration privileges (adjust job priorities, suspend, resume, kill) to subgroup administrators?
Cross-queue fairshare (with CPU-speed weighting)?
Advance CPU reservations for MPI?
GPU support?
Ability to submit jobs to hosts where we don't have accounts/home-directories?
Avoid bad behavior when an MPI head node reboots: slave node processes get "forgotten"?
How well does the system scale?
- With respect to the number of cores, queues, and queued and running jobs?
