...

Thanks for looking into this, Chris.  The only thing I’ve been able to come up with on a quick look is that the time may be going into setting up the memory region the DRP uses to share events with the MEBs.  I think this is mostly libfabric manipulating the MMU for this already-allocated space (the pebble).  Is the event buffer size particularly large compared to the other DRPs?  I’m guessing that since it’s an Andor, maybe 8 MB?  Not sure what ‘vls’ does to it.  If the trigger rate for this DRP isn’t high (10 Hz?), we could maybe speed this step up by lowering the number of DMA buffers so that fewer MMU entries are needed.

Performance Issues

  • Ric does "taskset -c 4-63" for daq executables to avoid the cores where weka processes are running
  • Ric also has lines like 'ExecStartPost=/usr/bin/sh -c "/usr/bin/echo 4 > /proc/irq/369/smp_affinity_list"' in tdetsim.service to set IRQ affinities.  This directs high-rate KCU interrupts to non-OS CPUs and avoids "soft lockup" issues.  The IRQ numbers (369 in this example) depend on the precise Linux version.
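
Because the IRQ number changes between Linux versions, it may help to look it up rather than hard-code it.  The following is a minimal sketch, not what tdetsim.service actually does: it parses text in the format of /proc/interrupts and builds the echo command shown above.  The function names and the device name "example_kcu" are hypothetical; the real name the KCU driver registers under would have to be read from /proc/interrupts on the host.

```python
def find_irqs(interrupts_text, name_fragment):
    """Return IRQ numbers whose /proc/interrupts line mentions name_fragment."""
    irqs = []
    for line in interrupts_text.splitlines():
        head, sep, rest = line.partition(":")
        head = head.strip()
        # Numbered IRQ lines look like "369:  <per-CPU counts>  <chip>  <device>";
        # named rows such as "NMI:" are skipped by the isdigit() check.
        if sep and head.isdigit() and name_fragment in rest:
            irqs.append(int(head))
    return irqs


def affinity_command(irq, cpu):
    """Build the shell command that pins one IRQ to one CPU."""
    return f"/usr/bin/echo {cpu} > /proc/irq/{irq}/smp_affinity_list"


# Made-up sample in the /proc/interrupts layout; "example_kcu" is a
# placeholder device name, not the real driver name.
SAMPLE = """\
            CPU0       CPU1
 24:          0          0   PCI-MSI 0-edge      example_a
369:    1234567          0   PCI-MSI 1-edge      example_kcu
NMI:          3          2   Non-maskable interrupts
"""

if __name__ == "__main__":
    for irq in find_irqs(SAMPLE, "example_kcu"):
        print(affinity_command(irq, 4))
```

On a real node the first argument would be the contents of /proc/interrupts, and writing to smp_affinity_list requires root.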

Deadtime Issues

Default plan: high-rate detectors get L0Delay 100, low-rate detectors get L0Delay 0.

...

Roughly, if a DRP chain is stalled for some reason, the DMA buffers will be consumed at the trigger rate.  So in the above example, the HSDs will start backpressuring into firmware after roughly 1 second given a trigger rate of 1 MHz.  For a given trigger rate, there is no clear benefit to having one DRP have more or fewer DMA buffers than another.  The first one to run out of buffers will cause backpressure and ultimately inhibit triggers, leaving additional buffers on other DRPs inaccessible.  Thus, I suggest making the number of DMA buffers (cfgRxCount + 4) the same for each DRP in a given readout group and roughly keeping the cfgRxCounts in the same ratio as the trigger rates of the groups (while still following the 2**N - 4 rule above).
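
The arithmetic behind this suggestion can be sketched as follows.  This is only an illustration: the function names are hypothetical, and the buffer count of 2**20 is an assumed value chosen because it gives roughly 1 second at 1 MHz, not a number taken from the example above.

```python
def seconds_until_backpressure(n_dma_buffers, trigger_rate_hz):
    """Time for a stalled DRP chain to use up its DMA buffers at the trigger rate."""
    return n_dma_buffers / trigger_rate_hz


def cfg_rx_count(n):
    """cfgRxCount that satisfies the rule cfgRxCount + 4 == 2**n."""
    return 2**n - 4


def pick_exponent(trigger_rate_hz, target_seconds):
    """Smallest n such that 2**n buffers last at least target_seconds when stalled."""
    needed = trigger_rate_hz * target_seconds
    n = 3  # 2**3 - 4 = 4 is the smallest positive cfgRxCount under the rule
    while 2**n < needed:
        n += 1
    return n


if __name__ == "__main__":
    # Assumed rates for two readout groups; the same target time for both keeps
    # the buffer counts roughly in the ratio of the trigger rates.
    for rate_hz in (1e6, 10.0):
        n = pick_exponent(rate_hz, 1.0)
        print(rate_hz, n, cfg_rx_count(n), seconds_until_backpressure(2**n, rate_hz))
```

Using one target stall time for every readout group is just one way to keep the cfgRxCounts proportional to the trigger rates; rounding up to a power of two means the ratio is only approximate.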

taskset

Ric does "taskset -c 4-63" for daq executables to avoid the cores where weka processes are running

Interrupt Coalescing

We think this can help with errors like:

...