Red Hat Discussion

Resource Management Guide (RHEL 7)

SLAC Discussion

To support a shared DRP, we want one process that understands all the resources in use and prevents conflicts.  Two possibilities:

  • make the collection software long-lived and multi-platform
    • cpo currently leaning this way: "fewer moving parts"
    • suggested naming: "collection"
  • add a layer on top of control.py (merged collection/control code)

...

  • resources managed:  node/kcu-variant/pgplane/cores
  • a service: two cnf files can request the same resources, but a cnf would be prevented from starting if another cnf already has those resources allocated.
  • make sure the right process (e.g. "opal") runs on a node with the right firmware
  • allocation request is made before the processes are launched
    • maybe a new "resource check" transition in the state machine before processes are launched?
    • implementation possibility: parse the cnf file to determine node/kcu-variant/pgplane/cores (see the sketch after this list)
      • cnf: "cmp001: drp -l 0x5 /dev/datadev_0 -D opal" would be translated into a resource request (we would need to add "cores")
      • we could tweak the cnf syntax to make this easier, but cpo believes we should leave procmgr roughly as-is
      • procmgr is a static configuration; if we use it, the resource manager would essentially be checking for conflicts rather than doing dynamic allocation
  • which resources are
    •  allocated
      • don't want to start the process until we know there is a node/pgplane for it: an ordering problem.
    • deallocated
    • crashed
  • killPartition
  • try to keep the idea of a "resource" generic.  today: node/kcu/pgplane, but we could add more resources in the future (RAM, network I/O), like Slurm.
  • there would be limits on node resources, e.g. 2 KCUs, 8 pgplanes, 64 cores, that would be enforced by the resource manager.
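A minimal sketch of the "parse the cnf file" idea above, in Python: translate one procmgr-style drp line into a resource request.  The ResourceRequest fields, the parse_drp_entry helper, and the limits table are illustrative assumptions, not an agreed format; today's cnf has no "cores" field, so it is defaulted here, and the kcu-variant is not in the command line, so it would presumably come from a node database rather than this parse.

    import re
    from dataclasses import dataclass

    # Hypothetical per-node limits mentioned above (to be enforced by the resource manager).
    NODE_LIMITS = {"kcus": 2, "pgplanes": 8, "cores": 64}

    @dataclass
    class ResourceRequest:
        # Fields are a guess at what the resource manager would track.
        node: str
        detector: str        # from the drp -D argument
        pgp_lanes: list      # lane numbers decoded from the -l bitmask
        cores: int           # not in today's cnf syntax; would have to be added

    def parse_drp_entry(node, cmd, cores=1):
        """Translate e.g. 'drp -l 0x5 /dev/datadev_0 -D opal' into a ResourceRequest."""
        m = re.search(r"-l\s+(0x[0-9a-fA-F]+|\d+)", cmd)
        mask = int(m.group(1), 0) if m else 0
        lanes = [bit for bit in range(NODE_LIMITS["pgplanes"]) if mask & (1 << bit)]
        d = re.search(r"-D\s+(\S+)", cmd)
        return ResourceRequest(node=node, detector=d.group(1) if d else "unknown",
                               pgp_lanes=lanes, cores=cores)

    # The cnf line from the discussion above:
    req = parse_drp_entry("cmp001", "drp -l 0x5 /dev/datadev_0 -D opal")
    print(req)   # ResourceRequest(node='cmp001', detector='opal', pgp_lanes=[0, 2], cores=1)

A resource check transition would then compare the accumulated requests against what is already allocated (and against NODE_LIMITS) before procmgr launches anything.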

...

What Resources are Managed/Shared

Resource Dependency Graph

Introductory slides available here:  ppt  pdf

Dynamic Resource Allocation Discussion

Aug. 17, 2021 with caf, valmar, cpo, weaver, snelson

Simple-minded picture:

  • (straw man big-idea) cnf-authors (pcds-poc's and data-systems) write a .cnf file but leave node-name and pgp-lane(s) as "templated" parameters
    • should we have a lookup that takes a cnf-author-supplied trigger rate and detector type (opal, epix) and computes a number of lanes?  i.e. number-of-pgp-lanes not user-supplied, but computed (the alternative is a "sanity check" on trigger rate and number of lanes/nodes).  in principle the resources needed depend on computing/bandwidth, which makes it more complex.
    • for simplicity: we could have detector types which have trigger-rate/computing-resources built into the detector-type name, e.g. epix10k2M_120Hz, epix10k2M_20kHz, epix10k2M_120Hz_high_computing_load.  to start, only a "shorthand" for allocating resource: wouldn't enforce consistency with timing-system-configured-trigger-rate, but in principle could do that in the future (could use ts_120Hz category as a first attempt to enforce consistency).
    • for ami/shmem the cnf-author would specify number of nodes, since there aren't specific patterns like there are with detectors
      • this number could be determined by the cnf-author with a "show resources" command
      • we should provide guidance for what ami resources are needed for a particular analysis (this is complex since scientists can do anything)
  • some node-names could be hard-coded (e.g. control.py)
    • consider this for user-shmem, so it doesn't move too often.  could have two different timescales for filling in cnf templated node-names: a "slow" timescale for user-shmem nodes (e.g. once per experiment) and a "fast" timescale for detector nodes (e.g. every daq restart)
    • could be useful for debugging hardware-specific problems
  • need a mechanism to indicate if this cnf line is opal/control/ami/etc.
    • some .cnf files have loops and associated lists of nodes (e.g. for ami), which adds complexity.  maybe we could understand the dictionary that these loops generate
    • some chance that with resource management we could eliminate the loops for the templated-cnf (the "loops" would be generated by resource manager)
  • how do scientists know where their shmem is running?
    • we can provide a tool that tells the scientists where it is running, but we can't change it very often
    • maybe we need a mechanism to "pin" some resources (these shouldn't be templated)
  • consider jinja for handling templated parameters?  (see the sketch after this list)
  • dynamic allocation would require control of the BOS 
    • for the camlink-converter the BOS needs to manage both the timing/data fibers
    • all other detectors just need the data fibers
    • need to fix any serious firmware glitches that would happen when fibers are moved (hard)
    • api's: REST, T1 or TL1, SNMP
  • feels like resources would be managed via a database
    • for each node the database would have a "detector type" and the number of lanes that are free
  • question: with low-rate detectors we could squeeze many into one node (e.g. 120Hz epics/timing-system/pvadetectors)
    • (heuristic: not precise) perhaps cnf-authors (e.g. pcds-poc, data-systems) would indicate "exclusive access", "as many as you like", or "somewhere in the middle".  Maybe just a number-of-lanes to reserve?  this is error prone, e.g. if the scientists change the trigger rate
    • to help avoid resource over-utilization would be good if power-on defaults (e.g. for trigger rates) are conservative
    • can we do a sanity check?
  • currently the .cnf files are wasteful of resources: people leave detectors in the .cnf that they do not use.  make it easier to comment out items in the cnf?  a big change ("selecting detectors before start of cnf") but maybe still worth doing.
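A small sketch of two of the ideas above, under stated assumptions: a detector-type shorthand table that maps names like epix10k2M_120Hz to a lane count (the numbers are placeholders, not measured requirements), and jinja2 rendering of a templated cnf line whose node name and lane mask the resource manager fills in.  The template syntax, variable names, and lane counts are illustrative, not an agreed cnf format.

    from jinja2 import Template

    # Hypothetical detector-type shorthand table (lane counts are placeholders).
    LANES_BY_DETECTOR_TYPE = {
        "opal_120Hz":       1,
        "epix10k2M_120Hz":  4,
        "epix10k2M_20kHz":  8,
    }

    # A templated cnf entry: node and lane mask are left for the resource manager to fill in.
    templated_cnf_line = Template(
        "{{ node }}: drp -l {{ lanes }} /dev/datadev_0 -D {{ det }}"
    )

    def lane_mask(n_lanes):
        """Lowest n_lanes lanes as a hex bitmask, e.g. 4 lanes -> '0xf'."""
        return hex((1 << n_lanes) - 1)

    det_type = "epix10k2M_120Hz"
    line = templated_cnf_line.render(
        node="cmp001",                                       # chosen by the resource manager
        lanes=lane_mask(LANES_BY_DETECTOR_TYPE[det_type]),
        det="epix10ka",                                      # drp detector name (assumed)
    )
    print(line)   # cmp001: drp -l 0xf /dev/datadev_0 -D epix10ka

If the resource manager generated one such line per detector, the loops in today's cnf files might not be needed in the templated version, as suggested above.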

Possible Workflow

  • at the beginning of the shift the expt does "resource_manager_alloc tmo_template.cnf > tmo.cnf" (a rough sketch of the alloc/dealloc bookkeeping follows this list)
    • this would ideally move all the BOS fibers
  • at the end of the shift: "resource_manager_dealloc tmo.cnf".  show who is responsible for conflicts.
    • provide a "resource_manager_kill" command so that someone can seize control if they need it.
    • provide a "resource_manager_list" command to show available/allocated resources