...
- job management: support restart; scancel with a signal? Support restarting an individual process; show the complete list of completed jobs. Do daq processes have states?
- check for existing user/platform. Note: the --dependency flag can be used to check for a unique jobname, but the jobs are still queued. It is better to exit if the same jobname is found. See the unique format here: ls ~tmoopr/.psdaq/. Show details of conflicting jobs. The job comment (currently unique) is used to check for existing jobs for ALL users.
- procstat-like unbuffered output style. Maybe see https://portal.supercomputing.wales/index.php/index/slurm/interactive-use-job-arrays/x11-gui-forwarding/. Note: you can X11-forward using, for example:
```bash
salloc -n1 --x11
srun -n1 --x11 xterm -hold -e "python test_run.py"
```
The --x11 option of srun also works with sbatch when $DISPLAY is exported correctly. See lcls2/psdaq/psdaq/slurm for how it is implemented.
- check if slurm avoids weka cores. It looks like slurm tries to avoid weka cores automatically.
- Multi-threaded processes. This worked in the past, but it is not working now, possibly because of recent changes.
- How to identify the resources each drp process needs (bandwidth, memory, physical cores).
- Documentation/How to
- Testing goals: TMO, RIX, other long-lived processes, high rate (71 kHz)
- slurm.conf: configless setup still not complete, set MaxTime=UNLIMITED, no limit on memory, no hyper-threading, make sure slurm version is consistent
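The existing-job check from the list above (exit when a job with the same comment is already queued, and show details of the conflicting jobs) could be sketched as below. This is only a rough sketch, not how lcls2/psdaq/psdaq/slurm does it: the function names, the "|" separator, and the example comment strings are hypothetical, and the real unique comment format is not reproduced here. It assumes squeue lists jobs for all users and that neither job names nor user names contain "|".

```python
import subprocess

# Assumed squeue output format: job id | job name | user | comment.
SQUEUE_FORMAT = "%i|%j|%u|%k"


def query_squeue():
    """Return raw squeue output (no header) for all visible jobs."""
    result = subprocess.run(
        ["squeue", "-h", "-o", SQUEUE_FORMAT],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_squeue(text):
    """Parse output produced with SQUEUE_FORMAT into a list of dicts."""
    jobs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split at most 3 times so a "|" inside the comment is preserved.
        parts = line.split("|", 3)
        if len(parts) != 4:
            continue  # skip lines that do not match the assumed format
        job_id, name, user, comment = parts
        jobs.append({"id": job_id, "name": name, "user": user, "comment": comment})
    return jobs


def find_conflicts(jobs, comment):
    """Return every job (from any user) whose comment equals `comment`."""
    return [job for job in jobs if job["comment"] == comment]


def describe(conflicts):
    """Format conflicting jobs for an error message."""
    return "\n".join(
        f"job {j['id']} ({j['name']}) owned by {j['user']}" for j in conflicts
    )
```

A caller would run `find_conflicts(parse_squeue(query_squeue()), my_comment)` before submitting, and exit with `describe(...)` if the result is non-empty instead of letting the new job queue.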
Note on multi-user access:
...