action | result | remedy | result
Remove XPM10 timing fiber in the back while DAQ running | *** XpmDetector: timing link ID is ffffffff = 4294967295; Timing 1 shuts down | TxLinkReset of cmp015 in XPM11 | DAQ recovers
Repeat XPM10 timing fiber removal | DAQ cannot disable | --- | DAQ recovers by itself at restart
Repeat XPM10 timing fiber removal | --- | --- | no issue
Repeat XPM10 timing fiber removal | DAQ cannot disable | --- | DAQ recovers by itself at restart
Remove XPM10 timing fiber in the back while DAQ stopped | --- | --- | DAQ starts with no issue
Repeat XPM10 timing fiber removal while DAQ stopped | --- | --- | DAQ starts with no issue
Remove transceiver from XPM10 in the back (DAQ stopped) | --- | --- | DAQ starts with no issue
Remove transceiver from XPM10 in the back (DAQ started) | --- | --- | DAQ starts with no issue
(cont.) | Timing 1 shuts down by itself | TxLinkReset on XPM10 for XPM11 | DAQ recovers
Remove fiber on XPM10 to XPM11 | --- | --- | DAQ starts with no issue
Remove transceiver on XPM10 to XPM11 | --- | --- | DAQ starts with no issue
Remove fiber on XPM11 AMC0 port 0 | --- | --- | DAQ starts with no issue
Remove transceiver on XPM11 AMC0 port 0 | --- | --- | DAQ starts with no issue
(cont.) | opal disappears from the list of detectors | restart DAQ | DAQ starts with no issue
Power cycle xpm10 via switch only AMC0 | XPM 11 loses timing node; opal not in the list of detectors | restart pyxpm 10 and 11; power cycle xpm 11 with handles; fru-deactivate xpm11 (3 times); fru-deactivate xpm10; restart pyxpm 11 | DAQ restarts but opal shuts down
(cont.) | opal still shut down | devGui xpmmini timing v2; TxLinkReset; opal still not back (BadDetector Paddr); xpmpva died (xpm11) | to no avail
(cont.) | | stop pyxpm 10 and 11; fru-deactivate 10 and 11; start pyxpm 10 and 11 | DAQ starts with no issue


Conclusion


It appears that pulling the timing fiber can disturb the system, but the disturbances do not reproduce on every attempt.
XPM power spikes can put the DAQ into a state similar to the XPM glitch, but only if the pyxpm processes are running. To be repeated.
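
To put a rough number on "not every attempt", the timing-fiber rows of the table can be tallied. This is only a sketch: the outcome labels are shorthand for the table's result columns, it assumes the three "repeat" rows without a stated DAQ state were done with the DAQ running, and a handful of trials is far too few for real statistics.

```python
# Outcomes of the XPM10 timing-fiber pulls recorded in the table,
# grouped by DAQ state at the time of the pull.
# Assumption: the unlabeled "repeat" rows were taken with the DAQ running.
trials = {
    "DAQ running": ["timing 1 shut down", "cannot disable", "no issue", "cannot disable"],
    "DAQ stopped": ["no issue", "no issue"],
}


def disturbance_rate(outcomes):
    """Fraction of trials whose outcome was anything other than 'no issue'."""
    disturbed = sum(1 for outcome in outcomes if outcome != "no issue")
    return disturbed / len(outcomes)


for state, outcomes in trials.items():
    print(f"{state}: {disturbance_rate(outcomes):.0%} of {len(outcomes)} pulls disturbed")
# DAQ running: 75% of 4 pulls disturbed
# DAQ stopped: 0% of 2 pulls disturbed
```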

Upgrading the XPM firmware from 3.5.4 to 3.6.0 seems to have mitigated all the issues above. The bucket issue has become more prominent, probably because the other issues no longer occur. It appears when power cycling xpm11.

Brainstorming Session

Nov. 16, 23 with mona, dan, weaver, caf, claus, melchior, cpo

proposal:

- move ric/mona/christos to xpm10 (for the future)
- give riccardo the whole system for the day and he messes with xpm10
- add startupMode=1 kwarg to opal

new xpm firmware (leaving xpm10 alone, no xpmmini->lcls2 hack):
riccardo can't reproduce the errors, except for bucket skipping
(txlinkreset fixed it for matt, but not riccardo and ric)

old xpm firmware (also messing with xpm10 with xpmmini->lcls2): riccardo could reproduce
xpm link glitch and txlinkreset (once) and (likely) xpmmini issue

theories:
- maybe ConfigLclsTimingV2 isn't reliable (should perhaps poll
  on something like rxid!=0xffffffff) 
- either new xpm firmware makes things better
- or we need to mess with xpm10 to reproduce problems
- or we're unlucky and can't reproduce (or we're not doing the right
  things to reproduce)
- might need a minimum length of time to tickle the issues (matt says
  try 30 minutes to 1 hour)
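
The first theory suggests polling until the link ID is valid rather than trusting a single configuration step. A minimal sketch of that idea follows. It is not the actual ConfigLclsTimingV2 code: `read_rx_id` is a hypothetical callable standing in for whatever register read the real system provides, and the timeout and interval values are placeholders.

```python
import time

INVALID_LINK_ID = 0xFFFFFFFF  # all-ones value seen in the log (4294967295)


def wait_for_valid_link_id(read_rx_id, timeout_s=5.0, interval_s=0.1,
                           sleep=time.sleep, clock=time.monotonic):
    """Poll read_rx_id() until it returns something other than 0xffffffff.

    read_rx_id is a hypothetical callable that returns the current 32-bit
    link ID. Returns the first valid ID, or None if the timeout expires.
    """
    deadline = clock() + timeout_s
    while True:
        rx_id = read_rx_id() & 0xFFFFFFFF  # keep only the low 32 bits
        if rx_id != INVALID_LINK_ID:
            return rx_id
        if clock() >= deadline:
            return None  # caller decides whether to retry, reset, or abort
        sleep(interval_s)
```

Returning `None` on timeout, instead of raising, leaves the recovery choice (for example a TxLinkReset) to the caller; whether that fits the real configuration flow is an open question.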

matt has an idea for bucket-jumps.  could direct julian.