Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

A useful diagram of the pieces of a DRP node that should be monitored via prometheus is here:  https://docs.google.com/presentation/d/1LvvUsV4A1F-7cao4t6mzr8z2qFFZEIwLKR4pXazU3HE/edit?usp=sharing

Updating TPR Firmware

See: SLAC EVR and TPR

Code Block
sudo /reg/g/pcds/package/slac_evr_upgrade/UPGRADE.sh /dev/tpra /reg/g/pcds/package/slaclab/evr-card-g2/images/latest-tpr

Generic

  • "clearreadout" and "clear"
  • need to reset the Rx/Tx link (in that order!) for XPM front-panel (note: have learned that RxLink reset can cause link CRC errors (see below) and have to do a TxLink reset to fix.  so order is important). The TxLink reset causes the link to retrain using K characters
  • look for deadtime
  • check that the "partition" window (with the trigger-enable checkbox) is talking to the right XPM: look in the XPM window label, which is something like DAQ:LAB2:XPM:N, where N is the XPM number.  A symptom of this number being incorrect is that the L0InpRate/L0AccRate remain at zeros when triggers are enabled.  This number is a unique identifier within a hierarchy of XPMs.
  • XPM is not configured to forward triggers ("LinkEnable" for that link on the XPM GUI)
  • L0Delay set to 99
  • DST Select (in PART window) set to "DontCare" (could be Dontcare/Internal)
  • check processes in lab3-base.cnf are running
  • run psdaq/build/psdaq/pgp/kcu1500/app/kcuStatus and kcuDmaStatus in kcuDmaStatus "blockspause" and "blocksfree" determine whether or not deadtime is set. if blocksfree drops below blockspause then it will assert deadtime. in hsd window "pgp last rx opcode" 0 means no backpressure, 1 means backpressure. Watch for locPause non zero which causes deadtime.
  • check for multiple drp executables
  • clearReadout broadcasts a message to receiving  kcu's telling them to reset timing-header FIFOs.
  • if running "drp" executable, check that lane mask is correct
  • if events are showing up "sporadically" look for CRC errors from "kcuSim -s -d /dev/datadev_0".  We have seen this caused by doing an XPM RxLink reset without a later TxLink reset.
  • >for the pgp driver this parameter needs to be increased in /etc/sysctl.conf:

    Code Block
    [root@drp-neh-cmp005 cpo]# grep vm /etc/sysctl.conf 
    vm.max_map_count=1000000
    [root@drp-neh-cmp005 cpo]# 
    
    

...