  • eye-scans for all transceivers
  • work on high-speed-digitizer timing robustness using teststand
    • occasional need to restart hsdioc process
    • kcu1500 can lose link and hsd loses/regains power, and can only be recovered by power cycling cmp node
  • check wave8 timing robustness
  • (almost done) program hsd firmware over pcie?
  • manufacture new xpm boards (4 for txi)
    • do we need another xpm/crate order for mfx? (separate from LCLS-II-HE?).  go from mfx hutch back to 208 or the mezzanine?
  • reproduce/fix timing nodes assigning wrong timestamp to configure transition by 1 or 2 buckets
    • Matt thinks this is on the receiver side, in the fifos that carry daq data separately from timing data; perhaps we have to connect the resets to those fifos.
    • have seen this in hsd/wave8.  both were problematic after a power outage, see: /cds/home/opr/tmoopr/2024/03/04_17:11:56_drp-srcf-cmp030:teb0.log (and Riccardo saw it in his tests, below)
  • (perhaps done by fixing reset logic?) reproduce/fix link-lock failure on timing system KCUs
  • make pyxpm processes robust to timing outages?
  • (done) ensure that Matt's latest xpm firmware fixes the xpm link-glitch storms
  • (perhaps done by fixing reset logic?) reproduce/fix TxLinkReset workaround
  • (perhaps done by fixing reset logic?) reproduce/fix xpmmini-to-lcls2timing workaround
  • (done, fixed with equalizer 0x3 setting) check/fix loopback fiber problem in production xpms in room 208
  • after Julian's fixes in late 2023, on April 7 we had a failure where the cmp002 kcu wouldn't lock to its timing link.  Power cycling "fixed" the problem.
  • (also after Julian's fixes in late 2023) a tdet kcu1500 on drp-srcf-cmp010 showed a failure mode where its timestamps were off by one clock tick; see the teb log file: /cds/home/opr/rixopr/scripts/logfiles/2024/04/08_11:58:28_drp-srcf-cmp013:teb0.log.  Power cycling "fixed" the problem.  A similar failure on drp-srcf-cmp025 can be seen here: /cds/home/opr/rixopr/scripts/logfiles/2024/04/13_12:43:08_drp-srcf-cmp013:teb0.log.  There was a timing outage two days previously, I believe.  Partial split-event output from the cmp010 log (the timestamps of the two Andors on cmp010 were incorrect, since all other detectors showed 0x8ff3 at the end):
rix-teb[2111]: <W> Fixup Configure, 008a4a15bf8ff2, size 0, source 0 (andor_norm_0)
rix-teb[2111]: <W> Fixup Configure, 008a4a15bf8ff2, size 0, source 1 (andor_dir_0)
rix-teb[2111]: <W> Fixup Configure, 008a4a15bf8ff3, size 0, source 2 (manta_0)
rix-teb[2111]: <W> Fixup Configure, 008a4a15bf8ff3, size 0, source 3 (mono_encoder_0)

...