...

eye-scans for all transceivers
work on high-speed-digitizer timing robustness using teststand
- occasional need to restart hsdioc process
- kcu1500 can lose link and hsd loses/regains power, and can only be recovered by power cycling cmp node
check wave8 timing robustness
(almost done) program hsd firmware over pcie?
manufacture new xpm boards (4 for txi)
- do we need another xpm/crate order for mfx? (separate from LCLS-II-HE?). go from mfx hutch back to 208 or the mezzanine?
reproduce/fix timing nodes assigning wrong timestamp to configure transition by 1 or 2 buckets
- matt thinks this is on the receiver side: some fifos that carry daq data separate from timing data. matt thinks perhaps we have to connect the resets to those fifos.
- have seen this in hsd/wave8. both were problematic after a power outage, see: /cds/home/opr/tmoopr/2024/03/04_17:11:56_drp-srcf-cmp030:teb0.log (and Riccardo saw it in his tests, below)
(perhaps done by fixing reset logic?) reproduce/fix link-lock failure on timing system KCUs
make pyxpm processes robust to timing outages?
(done) ensure that Matt's latest xpm firmware fixes the xpm link-glitch storms
(perhaps done by fixing reset logic?) reproduce/fix TxLinkReset workaround
(perhaps done by fixing reset logic?) reproduce/fix xpmmini-to-lcls2timing workaround
(done, fixed with equalizer 0x3 setting) check/fix loopback fiber problem in production xpms in room 208
after Julian's fixes in late 2023, on April 7 we had a failure where the cmp002 kcu wouldn't lock to its timing link. power cycling "fixed" the problem.
also saw two incidents in April 2024 where "cat /proc/datadev_0" showed all 1's (0xffffffff) everywhere as well as nonsensical string values. Likely triggered by timing outages?
(also after Julian's fixes in late 2023) a tdet kcu1500 on drp-srcf-cmp010 showed a failure mode where its timestamps (pulse-ids) were off by one pulse-id (the "bucket jumping" problem that Riccardo reproduced on the teststand). teb log file showing the cmp010 problem: /cds/home/opr/rixopr/scripts/logfiles/2024/04/08_11:58:28_drp-srcf-cmp013:teb0.log. Power cycling "fixed" the problem. Split event partial-output from that log (the timestamps of the two Andors on cmp010 were incorrect, since all other detectors showed 0x8ff3 at the end). A similar failure on drp-srcf-cmp025 can be seen here: /cds/home/opr/rixopr/scripts/logfiles/2024/04/13_12:43:08_drp-srcf-cmp013:teb0.log. There was a timing outage two days previously, I believe.
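The two failure signatures in the last notes (pulse-ids off by a bucket relative to the other detectors, and "cat /proc/datadev_0" showing 0xffffffff everywhere) could be checked automatically. The following is only a minimal sketch: the function names, the detector names, and the Andor pulse-id values are hypothetical, and it assumes the pulse-ids have already been extracted from a teb log and that the /proc/datadev_0 text contains 0x-prefixed hex values. The real log and /proc formats are not reproduced here.

```python
import re
from collections import Counter


def find_bucket_jumpers(pulse_ids):
    """Return {detector: offset} for detectors whose pulse-id differs
    from the most common pulse-id among all contributors.

    pulse_ids: dict mapping a (hypothetical) detector name to the
    pulse-id it reported for one split event.
    """
    if not pulse_ids:
        return {}
    # Treat the most common value as the reference timestamp.
    majority, _ = Counter(pulse_ids.values()).most_common(1)[0]
    return {
        name: pid - majority
        for name, pid in pulse_ids.items()
        if pid != majority
    }


def fraction_all_ones(text):
    """Return the fraction of 0x-prefixed hex values in text that are
    exactly 0xffffffff (assumed signature of an unreadable card)."""
    tokens = re.findall(r"0x[0-9a-fA-F]+", text)
    if not tokens:
        return 0.0
    bad = sum(1 for t in tokens if t.lower() == "0xffffffff")
    return bad / len(tokens)


# Hypothetical split event: only the 0x8ff3 ending comes from the notes
# above; the names and the Andor values are made up for illustration.
example_event = {
    "andor_0": 0x8FF4,
    "andor_1": 0x8FF4,
    "timing_0": 0x8FF3,
    "hsd_0": 0x8FF3,
    "wave8_0": 0x8FF3,
}

jumpers = find_bucket_jumpers(example_event)
```

With the made-up values above, `jumpers` flags both Andors with an offset of +1. A majority vote is one possible reference choice; it would mislead if more than half of the contributors jumped buckets at once.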

...
