...

Note that the datadev driver can handle multiple buffers (on the order of 1000) per interrupt, so at high trigger rates the IRQ rate tends to fall well below the rate set by the IRQ HOLD / cfgIrqHold value. At a 1 MHz trigger rate, the IRQ rate is a few hundred Hz.
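One way to confirm this on a running node is to sample /proc/interrupts twice and divide the change in the interrupt counts by the interval. A rough sketch (the IRQ numbers 369 and 370 and the 10 s interval are only examples; use the values reported in dmesg for the node in question):

grep -E '^ *(369|370):' /proc/interrupts
sleep 10
grep -E '^ *(369|370):' /proc/interrupts
# (difference in counts) / 10 = IRQ rate in Hz for each device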

'BUG: Soft lockup' messages - Interrupt affinity

If significant deadtime is coming from a node hosting multiple DRPs at a high trigger rate, and messages like "kernel: NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [swapper/0:0]" appear in dmesg, it may be that the interrupts for both datadev devices are being handled by one core, usually CPU0. To avoid this, the interrupt handling can be moved to two different cores, e.g. CPUs 4 and 5 (to avoid the Weka cores). First disable the irqbalance service, then set the IRQ affinity values (here for datadev IRQs 369 and 370, as found from dmesg):

...

ExecStartPost=/usr/bin/sh -c "/usr/bin/echo 4 > /proc/irq/369/smp_affinity_list"
ExecStartPost=/usr/bin/sh -c "/usr/bin/echo 5 > /proc/irq/370/smp_affinity_list"
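The same settings can also be applied by hand for a quick test, without editing the service (run as root; again, the IRQ numbers are the ones from this example and should be taken from dmesg on the node in question):

systemctl stop irqbalance
systemctl disable irqbalance
echo 4 > /proc/irq/369/smp_affinity_list
echo 5 > /proc/irq/370/smp_affinity_list
# Verify the affinity took effect:
cat /proc/irq/369/smp_affinity_list /proc/irq/370/smp_affinity_list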

Note that some KCU firmware versions provide only one datadev device; in that case the associated IRQ is 368.

It's unclear to me how the IRQ numbers are assigned. Since nothing appears to set the datadev IRQ numbers to any particular values (as far as I can tell), we may need to consider the possibility that they change from one system (or driver?) restart to the next, or differ from node to node. However, I've seen no evidence that this is the case.
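If the numbers do turn out to vary, the affinity could be set by looking the IRQs up at service start instead of hard-coding them. A hedged sketch, assuming the datadev IRQs can be identified by name in /proc/interrupts (adjust the "datadev" pattern to whatever name the driver actually registers, and the starting CPU to taste):

# Assign CPUs 4, 5, ... to the datadev IRQs in the order they appear.
cpu=4
for irq in $(awk -F: '/datadev/ {gsub(/ /,"",$1); print $1}' /proc/interrupts); do
    echo $cpu > /proc/irq/$irq/smp_affinity_list
    cpu=$((cpu + 1))
done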

...