...

  1. Job management: support restart and scancel with a signal (?); support restarting an individual process; show all completed jobs. Do DAQ processes have states?
  2. Check for an existing user/platform. Note: the --dependency flag can be used to check for a unique job name, but the conflicting jobs are still queued. It is better to exit if the same job name is found. See the unique format here: ls ~tmoopr/.psdaq/. Show details of the conflicting jobs. The job comment (currently unique) is used to check for existing jobs across ALL users.
  3. procstat-like unbuffered output style. Maybe see https://portal.supercomputing.wales/index.php/index/slurm/interactive-use-job-arrays/x11-gui-forwarding/. Note: you can X11-forward using, for example:

    salloc -n1 --x11 srun -n1 --x11 xterm -hold -e "python test_run.py"

    The --x11 option of srun also works with sbatch when $DISPLAY is exported correctly. See lcls2/psdaq/psdaq/slurm for how it is implemented.

  4. Check whether Slurm avoids the Weka cores. It looks like Slurm tries to avoid them automatically.
  5. Multi-threaded processes. This worked in the past, but it is no longer working, possibly because of recent changes.
  6. How to identify what counts as a resource for drp (bandwidth, memory, physical cores for each process).
  7. Documentation/How-to
  8. Testing goals: TMO, RIX, other long-lived processes, high rate (71 kHz)
  9. slurm.conf: configless setup is still incomplete; set MaxTime=UNLIMITED; no limit on memory; no hyper-threading; make sure the Slurm version is consistent
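
The conflict check in item 2 could look roughly like the sketch below. It is not the code in lcls2/psdaq/psdaq/slurm; it only illustrates the idea of listing jobs for all users with squeue and exiting when the job comment is already in use. The helper names (parse_squeue, find_conflicts, check_or_exit) and the comment value "tmo:0" are made up for illustration, and the real comment format may differ. It also assumes that job names and comments do not contain the "|" separator.

```python
import subprocess
from typing import List, NamedTuple

# squeue format: job id, user, job name, comment, state (one job per line).
SQUEUE_FORMAT = "%i|%u|%j|%k|%T"


class Job(NamedTuple):
    job_id: str
    user: str
    name: str
    comment: str
    state: str


def parse_squeue(text: str) -> List[Job]:
    """Parse squeue output produced with SQUEUE_FORMAT; skip malformed lines."""
    jobs = []
    for line in text.splitlines():
        fields = line.strip().split("|")
        if len(fields) != 5:
            continue
        jobs.append(Job(*fields))
    return jobs


def find_conflicts(jobs: List[Job], comment: str) -> List[Job]:
    """Return jobs (from any user) whose comment matches the given one."""
    return [job for job in jobs if job.comment == comment]


def query_squeue() -> str:
    """Ask Slurm for queued and running jobs, without a header line."""
    result = subprocess.run(
        ["squeue", "-h", "-o", SQUEUE_FORMAT],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def check_or_exit(comment: str) -> None:
    """Exit with details instead of queueing behind a conflicting job."""
    conflicts = find_conflicts(parse_squeue(query_squeue()), comment)
    if conflicts:
        for job in conflicts:
            print(f"conflict: job {job.job_id} ({job.name}) "
                  f"user={job.user} state={job.state}")
        raise SystemExit(1)
```

Calling check_or_exit("tmo:0") before submitting would print the conflicting jobs and exit with status 1, rather than leaving the new jobs queued. Keeping the parsing separate from the squeue call makes it possible to test the check without a running Slurm cluster.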

Note on multi-user access:

...