Red Hat Discussion

Resource Management Guide (RHEL 7)

SLAC Discussion

To support a shared DRP, we want one process that understands all the resources in use and prevents conflicts.  Two possibilities:

  • make the collection software long-lived and multi-platform
    • cpo currently leaning this way: "fewer moving parts"
    • suggested naming: "collection"
  • add a layer on top of control.py (merged collection/control code)

...

  • resources managed:  node/kcu-variant/pgplane/cores
  • a service: two cnf files can request the same resources, but a cnf would be prevented from starting if another cnf already has those resources allocated.
  • make sure the right process (e.g. "opal") runs on a node with the right firmware
  • allocation request is made before the processes are launched
    • maybe a new "resource check" transition in the state machine before processes are launched?
    • implementation possibility: parse the cnf file to determine node/kcu-variant/pgplane/cores (see the sketch after this list)
      • cnf: "cmp001: drp -l 0x5 /dev/datadev_0 -D opal" would be translated into a resource request (we would need to add "cores")
      • we could tweak the cnf syntax to make this easier, but cpo believes we should leave procmgr roughly as-is
      • procmgr is a static configuration; if we use it, the resource manager would essentially be checking for conflicts rather than doing dynamic allocation
  • which resources are
    •  allocated
      • don't want to start the process until we know there is a node/pgplane for it: an ordering problem.
    • deallocated
    • crashed
  • killPartition
  • try to keep the idea of a "resource" generic.  today: node/kcu/pgplane, but we could add more resources in the future (RAM, network I/O), like Slurm.
  • there would be limits on node resources, e.g. 2 KCUs, 8 pgplanes, 64 cores, that would be enforced by the resource manager.
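A minimal sketch of the "parse the cnf file" idea above, in Python: translate one procmgr-style drp line into a resource request.  The ResourceRequest fields, the parse_drp_entry helper, and the limits table are illustrative assumptions, not an agreed format; today's cnf has no "cores" field, so it is defaulted here, and the kcu-variant is not in the command line, so it would presumably come from a node database rather than this parse.

    import re
    from dataclasses import dataclass

    # Hypothetical per-node limits mentioned above (to be enforced by the resource manager).
    NODE_LIMITS = {"kcus": 2, "pgplanes": 8, "cores": 64}

    @dataclass
    class ResourceRequest:
        # Fields are a guess at what the resource manager would track.
        node: str
        detector: str        # from the drp -D argument
        pgp_lanes: list      # lane numbers decoded from the -l bitmask
        cores: int           # not in today's cnf syntax; would have to be added

    def parse_drp_entry(node, cmd, cores=1):
        """Translate e.g. 'drp -l 0x5 /dev/datadev_0 -D opal' into a ResourceRequest."""
        m = re.search(r"-l\s+(0x[0-9a-fA-F]+|\d+)", cmd)
        mask = int(m.group(1), 0) if m else 0
        lanes = [bit for bit in range(NODE_LIMITS["pgplanes"]) if mask & (1 << bit)]
        d = re.search(r"-D\s+(\S+)", cmd)
        return ResourceRequest(node=node, detector=d.group(1) if d else "unknown",
                               pgp_lanes=lanes, cores=cores)

    # The cnf line from the discussion above:
    req = parse_drp_entry("cmp001", "drp -l 0x5 /dev/datadev_0 -D opal")
    print(req)   # ResourceRequest(node='cmp001', detector='opal', pgp_lanes=[0, 2], cores=1)

A resource check transition would then compare the accumulated requests against what is already allocated (and against NODE_LIMITS) before procmgr launches anything.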

...

What Resources are Managed/Shared

Resource Dependency Graph

Introductory slides available here:  ppt  pdf

Dynamic Resource Allocation Discussion

Aug. 17, 2021 with caf, valmar, cpo, weaver, snelson

Simple-minded picture:

  • (straw man big-idea) cnf-authors (pcds-poc's and data-systems) write a .cnf file but leave node-name and pgp-lane(s) as "templated" parameters
    • should we have a lookup that takes a cnf-author-supplied trigger rate and detector type (opal, epix) and computes a number of lanes?  i.e. number-of-pgp-lanes not user-supplied, but computed (the alternative is a "sanity check" on trigger rate and number of lanes/nodes).  in principle the resources needed depend on computing/bandwidth, which makes it more complex.
    • for simplicity: we could have detector types which have trigger-rate/computing-resources built into the detector-type name, e.g. epix10k2M_120Hz, epix10k2M_20kHz, epix10k2M_120Hz_high_computing_load.  to start, only a "shorthand" for allocating resource: wouldn't enforce consistency with timing-system-configured-trigger-rate, but in principle could do that in the future (could use ts_120Hz category as a first attempt to enforce consistency).
    • for ami/shmem the cnf-author would specify number of nodes, since there aren't specific patterns like there are with detectors
      • this number could be determined by the cnf-author with a "show resources" command
      • we should provide guidance for what ami resources are needed for a particular analysis (this is complex since scientists can do anything)
  • some node-names could be hard-coded (e.g. control.py)
    • consider this for user-shmem, so it doesn't move too often.  could have two different timescales for filling in cnf templated node-names: a "slow" timescale for user-shmem nodes (e.g. once per experiment) and a "fast" timescale for detector nodes (e.g. every daq restart)
    • could be useful for debugging hardware-specific problems
  • need a mechanism to indicate if this cnf line is opal/control/ami/etc.
    • some .cnf files have loops and associated lists of nodes (e.g. for ami), which adds complexity.  maybe we could understand the dictionary that these loops generate
    • some chance that with resource management we could eliminate the loops for the templated-cnf (the "loops" would be generated by resource manager)
  • how do scientists know where their shmem is running?
    • we can provide a tool that tells the scientists where it is running, but we can't change it very often
    • maybe we need a mechanism to "pin" some resources (these shouldn't be templated)
  • consider jinja for handling templated parameters?  (see the sketch after this list)
  • dynamic allocation would require control of the BOS 
    • for the camlink-converter the BOS needs to manage both the timing/data fibers
    • all other detectors just need the data fibers
    • need to fix any serious firmware glitches that would happen when fibers are moved (hard)
    • api's: REST, T1 or TL1, SNMP
  • feels like resources would be managed via a database
    • for each node the database would have a "detector type" and the number of lanes that are free
  • question: with low-rate detectors we could squeeze many into one node (e.g. 120Hz epics/timing-system/pvadetectors)
    • (heuristic: not precise) perhaps cnf-authors (e.g. pcds-poc, data-systems) would indicate "exclusive access", "as many as you like", or "somewhere in the middle".  Maybe just a number-of-lanes to reserve?  this is error prone, e.g. if the scientists change the trigger rate
    • to help avoid resource over-utilization would be good if power-on defaults (e.g. for trigger rates) are conservative
    • can we do a sanity check?
  • currently the .cnf files are wasteful of resources: people leave detectors in the .cnf that they do not use.  make it easier to comment out items in the cnf?  a big change ("selecting detectors before start of cnf") but maybe still worth doing.
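A small sketch of two of the ideas above, under stated assumptions: a detector-type shorthand table that maps names like epix10k2M_120Hz to a lane count (the numbers are placeholders, not measured requirements), and jinja2 rendering of a templated cnf line whose node name and lane mask the resource manager fills in.  The template syntax, variable names, and lane counts are illustrative, not an agreed cnf format.

    from jinja2 import Template

    # Hypothetical detector-type shorthand table (lane counts are placeholders).
    LANES_BY_DETECTOR_TYPE = {
        "opal_120Hz":       1,
        "epix10k2M_120Hz":  4,
        "epix10k2M_20kHz":  8,
    }

    # A templated cnf entry: node and lane mask are left for the resource manager to fill in.
    templated_cnf_line = Template(
        "{{ node }}: drp -l {{ lanes }} /dev/datadev_0 -D {{ det }}"
    )

    def lane_mask(n_lanes):
        """Lowest n_lanes lanes as a hex bitmask, e.g. 4 lanes -> '0xf'."""
        return hex((1 << n_lanes) - 1)

    det_type = "epix10k2M_120Hz"
    line = templated_cnf_line.render(
        node="cmp001",                                       # chosen by the resource manager
        lanes=lane_mask(LANES_BY_DETECTOR_TYPE[det_type]),
        det="epix10ka",                                      # drp detector name (assumed)
    )
    print(line)   # cmp001: drp -l 0xf /dev/datadev_0 -D epix10ka

If the resource manager generated one such line per detector, the loops in today's cnf files might not be needed in the templated version, as suggested above.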

Possible Workflow

  • at the beginning of the shift the expt does "resource_manager_alloc tmo_template.cnf > tmo.cnf" (a rough sketch of the alloc/dealloc bookkeeping follows this list)
    • this would ideally move all the BOS fibers
  • at the end of the shift: "resource_manager_dealloc tmo.cnf".  show who is responsible for conflicts.
    • provide a "resource_manager_kill" command so that someone can seize control if they need it.
    • provide a "resource_manager_list" command to show available/allocated resources