...

I thought the first thing to try would be to figure out how to launch a process that brings up a GUI, e.g., groupca or xpmpva, or maybe even start simpler with xeyes or xclock. The main idea was to test telling Slurm that the process you want to run is an X11 application, which the docs say it supports; see the sketch below.
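
A minimal first test might look like this (a sketch, assuming Slurm's native X11 forwarding has been enabled with PrologFlags=x11 in slurm.conf; the GUI commands are just the examples from above):

Code Block
languagebash
# Native X11 forwarding requires PrologFlags=x11 in slurm.conf
srun --x11 xclock     # simplest possible X11 test
srun --x11 xeyes
srun --x11 groupca    # a real DAQ GUI, once the basics work
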
The next thing might be to try to bring up the DAQ with Slurm, which means thinking about what the Slurm description file would look like. Can we use something like the .cnf? Can we automatically convert the .cnfs to whatever Slurm requires? Or do we need to start from scratch? For this step I’m thinking we would still have to specify everything, like the node each process runs on; a rough sketch follows below.
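
As a rough illustration only (the node names and process names are hypothetical placeholders, and a converted .cnf might look quite different), a hand-written batch description that pins each process to a node could resemble:

Code Block
languagebash
#!/bin/bash
#SBATCH --job-name=daq-test
#SBATCH --nodelist=drp-node-01,drp-node-02   # hypothetical hosts
#SBATCH --ntasks=2
# One job step per DAQ process, pinned to its node as a .cnf does today
srun -N1 -n1 -w drp-node-01 control_proc &   # hypothetical process names
srun -N1 -n1 -w drp-node-02 drp_proc &
wait
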
The last thing I looked into a little bit was the idea of defining resources to Slurm. For this I thought I’d need a setup to try things out on, which resulted in Jira ECS-4017 (I don’t think anything was done, though). Chris Ford was also working on this project, and he suggested setting up a virtual machine with a private Slurm installation I could tinker with (I haven’t figured out how to do that yet). Anyway, the idea is that, based on what each DRP needs (e.g., detector type, KCU firmware type, a GPU, X11, etc.), resources would be defined to Slurm so that when you launch a DAQ, it allocates nodes according to the resources needed and starts the processes on them; a sketch follows below. Perhaps at some point in the future we could even have it modify the connections in the BOS to connect a detector to an available host that has the right KCU firmware, thus making RIX hosts available to TMO and vice versa.
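
In Slurm terms these resources map onto node Features and job Constraints (see the Slurm Feature (Constraint) section below). A sketch with made-up node and feature names (GPUs would additionally need a gres.conf entry):

Code Block
languagebash
# slurm.conf: tag each node with arbitrary feature strings
NodeName=drp-node-01 Feature=kcu_fw_epix,x11 Gres=gpu:1
NodeName=drp-node-02 Feature=kcu_fw_wave8

# At launch time, request a node with the right characteristics
srun --constraint=kcu_fw_epix --gres=gpu:1 drp_proc
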
I think that’s about as far as I got. Let me know if you have questions. I have to take Rachel to a doctor’s appointment at 1, so I think I’ll be out until 3 or so. We can talk later, if you prefer. I’ll take a look at the link as soon as I can. Feel free to add the above to it if you think it would be helpful.

Configless Setup

To share the slurm.conf file with all compute nodes, add the following parameter to slurm.conf on the control node:

Code Block
languagebash
titleslurm.conf
SlurmctldParameters=enable_configless

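Then apply the change on the control node so that slurmctld re-reads its configuration:

Code Block
languagebash
scontrol reconfig
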
On each compute node, point slurmd at the control server by adding the following option in sysconfig:

Code Block
languagebash
title/etc/sysconfig/slurmd
SLURMD_OPTIONS="--conf-server psslurm-drp"

Restart slurmd on each compute node:

Code Block
languagebash
systemctl restart slurmd
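
As a quick sanity check (one way among several; the grep pattern is just an example), confirm the node registered with the controller and that configless mode is active:

Code Block
languagebash
sinfo -N -l                                       # the node should appear in a valid state
scontrol show config | grep SlurmctldParameters   # should include enable_configless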

Slurm Feature (Constraint)

...