...
Advice on running large-memory jobs in batch:
For large-memory jobs (e.g. if your job fails with an error such as "slurmstepd: error: Detected 1 oom-kill event(s) in StepId=53170609.batch. Some of your processes may have been killed by the cgroup out-of-memory handler"), request more memory for the job:
You can request up to 480G of memory on a single milano node. Requesting that much is equivalent to asking for exclusive use of the node, so the more memory you request, the longer the job may wait to be scheduled.
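As a rough illustration, the memory request is made with the standard sbatch option --mem in the batch script header. The Python sketch below generates such a script. It assumes the partition is named "milano" and that the 480G per-node limit above applies; the helper name batch_script and the command my_analysis.py are hypothetical, so adjust them to your own site and job.

```python
# Hypothetical helper (not an official tool): builds a minimal Slurm batch
# script that requests a given amount of memory on a single node.
# Assumptions: the partition is called "milano" and the per-node limit is 480G.

MAX_MILANO_MEM_GB = 480  # per-node limit stated in this guide


def batch_script(mem_gb: int, command: str, partition: str = "milano") -> str:
    """Return the text of a batch script requesting mem_gb gigabytes."""
    if not 0 < mem_gb <= MAX_MILANO_MEM_GB:
        raise ValueError(
            f"mem_gb must be between 1 and {MAX_MILANO_MEM_GB}, got {mem_gb}"
        )
    lines = [
        "#!/bin/bash",
        f"#SBATCH --partition={partition}",  # assumed partition name
        "#SBATCH --nodes=1",                 # keep the job on a single node
        f"#SBATCH --mem={mem_gb}G",          # --mem is memory per node
        command,
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Request only what the job needs: a smaller request usually schedules sooner.
    print(batch_script(200, "python my_analysis.py"), end="")
```

You could save the printed text to a file such as job.sh and submit it with "sbatch job.sh". Requesting less than the full 480G leaves room for other jobs on the node, which is why a smaller request is likely to start sooner.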
...