Matt's document showing the location of each hsd in the tmo chassis: https://docs.google.com/document/d/1SzPwrJsoJR0brlQG-mCNILPFh8njXYHGrQz7tl39Thw/edit?usp=sharing
Supermicro manual for hsd chassis: https://www.supermicro.com/manuals/superserver/4U/MNL-2107.pdf
Supermicro document discussing pcie root complexes for some different systems: https://www.supermicro.com/products/system/4U/4029/PCIe-Root-Architecture.cfm
General Debugging
- look at configured parameters using (for example) "hsdpva DAQ:LAB2:HSD:DEV06_3D:A"
- for kcu firmware that is built to use both QSFP links, the naming of the QSFPs is swapped, i.e. the QSFP that is normally called /dev/datadev_0 is now called /dev/datadev_1
- HSD is not configured to do anything (Check the HSD config tab for no channels enabled)
- if hsd timing frames are not being received at 929kHz (status here), click TxLink Reset in the XPM window. When this is the problem, the received rate is typically ~20kHz.
- The HSD readoutGroup number does not match platform number in .cnf file (Check the HSD "Config" tab)
- also check that HEADERCNTL0 is incrementing in "Timing" tab of HSD cfg window.
- in the hsd Timing tab, timpausecnt is the number of clock ticks (156.25MHz clock) during which we are dead. The dead-time fraction is timpausecnt/156.25e6
- in the hsd expert window, "full threshold(events)" sets the threshold for hsd deadtime
- in the hsd Buffer tab, "fex free events" and "raw free events" are the current numbers of free events.
- in the hsd status window, "write fifo count" is the number of timing headers waiting for HSD data to be associated with them.
- "readcntsum" on the hsd Timing tab goes up when we send a transition or L1Accepts. "trigcntsum" counts L1Accepts only.
- "txcntsum" on the PGP tab goes up when we send a transition or L1Accepts.
- check kcuStatus for a non-zero "locPause" (a low-level pgp FIFO being full). If this happens: configure hsd, clear readout, reboot the drp node with the KCU
- if links aren't locking in hsdpva, use "kcuStatus" to check that the tx/rx clock frequencies are 156MHz. If they are not (we have seen lower rates like 135MHz), a node power cycle (to reload the KCU FPGA) can fix this. Matt writes: "kcuStatus should have an option to reset the clock to its factory input before attempting to program it to the standard input." There appears to be a "kcuStatus -R" option which, according to "kcuStatus -h", should reset the clock to 156MHz, but cpo tried this twice and the clock still seemed to be stuck at 131MHz.
- If the drp doesn't complete rollcall and the log file shows messages about PADDR_U being zero, restarting the corresponding hsdioc process may help.
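The dead-time and timing-rate checks above can be sketched in a few lines of Python. This is only an illustration: the function names deadtime_fraction and timing_rate_ok are made up for this sketch and are not part of hsdpva or kcuStatus, the 1% tolerance is an assumed value, and the dead-time formula assumes timpausecnt is the count accumulated over one second (as the division by 156.25e6 implies). Values are typed in by hand from the GUI; nothing here reads the hardware.

```python
# Constants taken from the notes above.
HSD_CLOCK_HZ = 156.25e6          # clock that timpausecnt counts ticks of
NOMINAL_TIMING_RATE_HZ = 929e3   # expected hsd timing frame rate


def deadtime_fraction(timpausecnt, clock_hz=HSD_CLOCK_HZ):
    """Dead-time fraction, ASSUMING timpausecnt is a count over one second."""
    return timpausecnt / clock_hz


def timing_rate_ok(rate_hz, nominal_hz=NOMINAL_TIMING_RATE_HZ, tolerance=0.01):
    """True if rate_hz is within a fractional tolerance of the nominal rate.

    The 1% default tolerance is an assumption made for this sketch.
    """
    return abs(rate_hz - nominal_hz) <= tolerance * nominal_hz


if __name__ == "__main__":
    # Example values typed in by hand from the hsd Timing tab.
    print(f"dead-time fraction: {deadtime_fraction(15_625_000):.3f}")
    print(f"929 kHz ok: {timing_rate_ok(929e3)}")
    print(f"20 kHz ok:  {timing_rate_ok(20e3)}")
```

A received rate near 20kHz fails the check, which matches the symptom described above where a TxLink Reset in the XPM window is needed.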