...

  • "clearreadout" and "clear"
  • reset the Rx and Tx links (in that order!) for the XPM front panel. Note: we have learned that an RxLink reset can cause link CRC errors (see below), and a TxLink reset is needed afterwards to fix them, so the order is important. The TxLink reset causes the link to retrain using K characters.
  • look for deadtime
  • check that the "partition" window (with the trigger-enable checkbox) is talking to the right XPM: look at the XPM window label, which is something like DAQ:LAB2:XPM:N, where N is the XPM number.  A symptom of this number being incorrect is that L0InpRate/L0AccRate remain at zero when triggers are enabled.  This number is a unique identifier within a hierarchy of XPMs.
  • XPM is not configured to forward triggers ("LinkEnable" for that link on the XPM GUI)
  • L0Delay set to 99
  • DST Select (in PART window) set to "DontCare" (could be Dontcare/Internal)
  • check processes in lab3-base.cnf are running
  • run psdaq/build/psdaq/pgp/kcu1500/app/kcuStatus and kcuDmaStatus. In kcuDmaStatus, "blockspause" and "blocksfree" determine whether or not deadtime is set: if blocksfree drops below blockspause, deadtime is asserted. In the hsd window, "pgp last rx opcode" 0 means no backpressure and 1 means backpressure. Watch for a nonzero locPause, which causes deadtime.
  • check for multiple drp executables
  • clearReadout broadcasts a message to the receiving KCUs telling them to reset their timing-header FIFOs.
  • if running "drp" executable, check that lane mask is correct
  • if events are showing up "sporadically" look for CRC errors from "kcuSim -s -d /dev/datadev_0".  We have seen this caused by doing an XPM RxLink reset without a later TxLink reset.
  • for the pgp driver this parameter needs to be increased in /etc/sysctl.conf:

    [root@drp-neh-cmp005 cpo]# grep vm /etc/sysctl.conf 
    vm.max_map_count=1000000
    [root@drp-neh-cmp005 cpo]# 
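Two of the checks in the list above are simple enough to script. The following is only a minimal sketch: the function names are hypothetical, the blocksfree/blockspause numbers are expected to be read off kcuDmaStatus by hand (no output parsing is attempted), and it assumes that the 1000000 shown in the listing above is the value vm.max_map_count should be at or above.

```python
from pathlib import Path

# Assumption: the value shown in the /etc/sysctl.conf listing is the minimum wanted.
REQUIRED_MAX_MAP_COUNT = 1_000_000


def deadtime_expected(blocksfree: int, blockspause: int) -> bool:
    """Rule from the kcuDmaStatus item: deadtime is asserted when
    blocksfree drops below blockspause."""
    return blocksfree < blockspause


def max_map_count_ok(current: int, required: int = REQUIRED_MAX_MAP_COUNT) -> bool:
    """True if the running kernel value is at least the required value."""
    return current >= required


def read_max_map_count(path: str = "/proc/sys/vm/max_map_count") -> int:
    """Read the live kernel setting (Linux only)."""
    return int(Path(path).read_text().strip())


if __name__ == "__main__":
    # Hypothetical numbers copied by hand from kcuDmaStatus.
    print("deadtime expected:", deadtime_expected(blocksfree=100, blockspause=128))
    proc_file = Path("/proc/sys/vm/max_map_count")
    if proc_file.exists():
        print("vm.max_map_count ok:", max_map_count_ok(read_max_map_count()))
```

Note that /etc/sysctl.conf only takes effect at boot or after "sysctl -p", so the sketch reads the live value from /proc rather than the config file.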
    
    

Connect Timeout Issues

On May 20th, 2022 we found that the RIX connect time was close to the timeout limit, and that this was caused by the PVA "andor vls" detector.  Ric wrote about this:

Thanks for looking into this, Chris.  The only thing I’ve been able to come up with on a quick look is that the time is maybe going into setting up the memory region for the DRP to share the event with the MEBs.  I think this is mostly libfabric manipulating the MMU for this already allocated space (the pebble).  Is the event buffer size maybe particularly large compared to the other DRPs?  I’m guessing that since it’s an Andor, maybe 8 MB?  Not sure what ‘vls’ does to it.  If the trigger rate for this DRP isn’t high (10 Hz?), we could maybe speed this step up by lowering the number of DMA buffers so that fewer MMU entries are needed.
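To get a feel for the scaling Ric describes, here is a rough back-of-the-envelope sketch. Everything in it is an assumption for illustration: 4 KiB pages, one MMU entry per page, a registered region of roughly (number of DMA buffers) x (event buffer size), Ric's guessed 8 MB buffer size, and made-up buffer counts. It is not a measurement of the actual DRP.

```python
def pages_to_map(n_buffers: int, buffer_bytes: int, page_bytes: int = 4096) -> int:
    """Estimate how many pages (and so MMU entries, under the one-entry-per-page
    assumption) are needed to cover n_buffers buffers of buffer_bytes each."""
    pages_per_buffer = -(-buffer_bytes // page_bytes)  # ceiling division
    return n_buffers * pages_per_buffer


if __name__ == "__main__":
    buffer_bytes = 8 * 1024 * 1024  # Ric's guess of ~8 MB per event buffer
    for n_buffers in (1024, 256):   # hypothetical buffer counts
        print(n_buffers, "buffers ->", pages_to_map(n_buffers, buffer_bytes), "pages")
```

Under these assumptions the entry count is linear in the number of buffers, so cutting the buffer count by a factor of four cuts the entries to set up by the same factor.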

Timing System kcu1500 (or "sim")

...