...

If an HSD pair shows significant deadtime when running at high rate and messages like 'kernel:NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [swapper/0:0]' appear in dmesg, the interrupts for both datadev devices may be handled by a single core, usually CPU0.  To avoid this, the interrupt handling can be moved to two different cores, e.g. CPU4 and CPU5.  First, disable the irqbalance service or tell it to avoid the datadev's IRQ(s):

sudo systemctl stop irqbalance

or, edit /etc/sysconfig/irqbalance and add the datadev IRQs to IRQBALANCE_ARGS:

IRQBALANCE_ARGS="--banirq=369 --banirq=370"

Restart the irqbalance service in the latter case.  Then set the /proc/irq/<datadev_N IRQ>/smp_affinity_list values, e.g.:

sudo sh -c "echo 4 > /proc/irq/369/smp_affinity_list"
sudo sh -c "echo 5 > /proc/irq/370/smp_affinity_list"

These values will stick so long as the datadev driver is not reloaded and irqbalance is not restarted or misconfigured.  We'll need to find a way to do this as part of the datadev driver service (tdetsim.service or kcu.service) startup.  As far as I can tell, nothing sets the datadev's IRQ numbers to any particular value, so we need to consider the possibility that they differ from one system (or driver?) restart to the next.
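
Since the IRQ numbers may change between restarts, one option is to look them up by name rather than hard-coding 369 and 370. The following is a minimal sketch of that idea, not a tested part of the service startup. It assumes the datadev interrupts appear in /proc/interrupts with "datadev" somewhere on their line; check the actual name on the node before relying on it. The function names find_irqs and pin_irqs are made up for this example.

```shell
#!/bin/sh
# Sketch: pin each matching IRQ to its own core, looking the IRQ numbers up
# by name instead of hard-coding them.
# ASSUMPTION: the driver's interrupts show up in /proc/interrupts on lines
# containing the pattern given (e.g. "datadev").

# Print the IRQ numbers of the lines in an interrupts table ($1) that match
# the pattern ($2).  Only numbered IRQ lines (e.g. " 369:") are considered.
find_irqs() {
    awk -v pat="$2" '$1 ~ /^[0-9]+:$/ && $0 ~ pat { sub(/:$/, "", $1); print $1 }' "$1"
}

# Pin the matching IRQs to consecutive cores, starting at core $3.
# $4 is the IRQ directory; it defaults to /proc/irq and exists as a
# parameter only so the function can be tried against a scratch directory.
pin_irqs() {
    interrupts=$1
    pattern=$2
    core=$3
    irq_dir=${4:-/proc/irq}
    for irq in $(find_irqs "$interrupts" "$pattern"); do
        echo "$core" > "$irq_dir/$irq/smp_affinity_list" || return 1
        core=$((core + 1))
    done
}
```

If the functions were saved in a file (say pin_datadev_irqs.sh, a hypothetical name), running as root something like ". ./pin_datadev_irqs.sh; pin_irqs /proc/interrupts datadev 4" would put the first datadev IRQ on CPU4 and the second on CPU5.  irqbalance would still need to be stopped or told to ban those IRQs, as described above.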

Cable Swaps

HSD cables can be plugged into the wrong place (e.g. "pairs" can be swapped).  They must match the mapping documentation Matt has placed at the bottom of hsd.cnf.  That mapping is reflected in the lines in hsd.cnf that start up processes; keeping the two consistent is a manual process.  Matt has the usual "remote link id" pattern that can be used to check this, by using "kcuStatus" on the KCU end and "hsdpva" on the other end, e.g.

...

BLD Data Formats

(from Matt Weaver)

See slides 4 and 9 of https://docs.google.com/presentation/d/1EwJTx_L5JNZF0mURIhxflpmzJ8p5ey6uIXqwth3gPBc/edit#slide=id.g47d26c62e9_0_29.  Kukhee has a collection of documentation here (https://confluence.slac.stanford.edu/pages/viewpage.action?pageId=285114775).  There are also EPICS PVA records from the server that provide "static" information about the packet contents, such as the field names and data types: https://github.com/slac-lcls/lcls2/blob/a9886a2768fb83028317ba854664bd424d94e386/psdaq/psdaq/app/hpsBldServer.cc#L2-L11

hpsBldServer.cc represents exactly what the IOC needs to do - receive the UDP unicasts from the ATCA board and transmit the data as UDP multicast.