...
- make the collection software long-lived and multi-platform
- cpo currently leaning this way: "fewer moving parts"
- suggested naming: "collection"
- add a layer on top of control.py (merged collection/control code)
...
- resources managed: node/kcu-variant/pgplane/cores
- a service: two cnf files can name the same resources, but a cnf would be prevented from starting if another cnf already has them allocated.
- make sure the right process (e.g. "opal") runs on a node with the right firmware
- allocation request is made before the processes are launched
- maybe a new "resource check" transition in the state machine before processes are launched?
- implementation possibility: parse the cnf file to determine node/kcu-variant/pgplane/cores
- cnf: "cmp001: drp -l 0x5 /dev/datadev_0 -D opal" would be translated to the resource request (need to add "cores")
- could tweak the cnf syntax to make this easier, but cpo believes we should leave procmgr roughly as-is
- procmgr is a static configuration: if we use it, the resource manager would essentially be checking for conflicts rather than doing dynamic allocation
- which resources are
- allocated
- don't want to start the process until we know there is a node/pgplane for it: an ordering problem.
- deallocated
- crashed
- allocated
- killPartition
- try to keep the idea of "resource" generic. today: node/kcu/pgplane but could add more resources in the future (ram, network I/O). like slurm.
- there would be limits on node resources (e.g. 2 kcu's, 8 pgplanes, 64 cores) that would be enforced by the resource manager.
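
A rough sketch of the conflict-checking idea above, in Python, not a proposed implementation. It parses the shorthand cnf line from these notes into a resource request and refuses the request if it overlaps an existing allocation or exceeds the per-node limits. Everything named here (ResourceRequest, ResourceManager, parse_cnf_line, ResourceConflict, NODE_LIMITS) is hypothetical. It also assumes that "-l" is a pgplane bitmask, that the "/dev/..." token identifies the kcu, and that "cores" is supplied separately since the cnf does not carry it today.

```python
import shlex
from dataclasses import dataclass

# Hypothetical per-node limits, using the example figures from the notes.
NODE_LIMITS = {"kcu": 2, "pgplane": 8, "cores": 64}


class ResourceConflict(Exception):
    """Raised when a request cannot be granted."""


@dataclass(frozen=True)
class ResourceRequest:
    owner: str         # cnf file making the request
    node: str          # e.g. "cmp001"
    kcu: str           # device path, e.g. "/dev/datadev_0"
    lanes: frozenset   # pgplanes selected by the -l bitmask (assumption)
    variant: str       # value of -D, e.g. "opal"
    cores: int         # not in the cnf today; passed in separately


def parse_cnf_line(line, owner, cores=1):
    """Translate 'node: cmd ...' shorthand into a ResourceRequest."""
    node, _, cmd = line.partition(":")
    args = shlex.split(cmd)
    mask = int(args[args.index("-l") + 1], 0)
    variant = args[args.index("-D") + 1]
    kcu = next(a for a in args if a.startswith("/dev/"))
    lanes = frozenset(i for i in range(mask.bit_length()) if mask >> i & 1)
    return ResourceRequest(owner, node.strip(), kcu, lanes, variant, cores)


class ResourceManager:
    """Tracks granted requests; checks conflicts, does no scheduling."""

    def __init__(self, limits=NODE_LIMITS):
        self.limits = limits
        self.allocated = []

    def allocate(self, req):
        on_node = [r for r in self.allocated if r.node == req.node]
        # Same kcu and overlapping pgplanes: already taken by another cnf.
        for r in on_node:
            if r.kcu == req.kcu and r.lanes & req.lanes:
                raise ResourceConflict(
                    f"{req.node} {req.kcu} lanes {sorted(r.lanes & req.lanes)} "
                    f"already allocated to {r.owner}")
        # Per-node totals if this request were granted.
        used = {
            "kcu": len({r.kcu for r in on_node} | {req.kcu}),
            "pgplane": sum(len(r.lanes) for r in on_node) + len(req.lanes),
            "cores": sum(r.cores for r in on_node) + req.cores,
        }
        for name, value in used.items():
            if value > self.limits[name]:
                raise ResourceConflict(
                    f"{req.node}: {name} limit {self.limits[name]} exceeded")
        self.allocated.append(req)

    def release(self, owner):
        """Deallocate everything held by one cnf (e.g. on killPartition)."""
        self.allocated = [r for r in self.allocated if r.owner != owner]
```

Under these assumptions the allocation request would be made before any process is launched, for example from a "resource check" transition, and release() would be called when the partition is killed. The resource names are kept in a plain dict so that further resources (ram, network I/O) could be added without changing the checking loop. Checking that a node's firmware matches the requested variant would need a firmware inventory, which this sketch does not model.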
...