Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Code Block
Group: 4  Master: 1  RateSel: 0  DestSel: 8000  Ena: 1
Group: 3  Master: 1  RateSel: 3  DestSel: 8000  Ena: 0
Group: 7  Master: 1  RateSel: 4  DestSel: 8000  Ena: 0
Traceback (most recent call last):
  File "/reg/neh/home4/cpo/git/lcls2/psdaq/psdaq/pyxpm/pvstats.py", line 457, in update
    self._links[i].update()
  File "/reg/neh/home4/cpo/git/lcls2/psdaq/psdaq/pyxpm/pvstats.py", line 136, in update
    updatePv(self._pv_remoteLinkId,self._app.remId.get())
  File "/reg/neh/home4/cpo/git/lcls2/psdaq/psdaq/pyxpm/pvstats.py", line 114, in updatePv
    pv.post(value)
  File "/reg/g/psdm/sw/conda2/inst/envs/ps-3.1.11/lib/python3.7/site-packages/p4p/server/raw.py", line 160, in post
    _SharedPV.post(self, self._wrap(value))
RuntimeError: bad_weak_ptr
Caught exception... retrying.


UNSOLVED (1MHz): running 5 hsd nodes at 1mhz saw this on dev010 and node became unresponsive (disable timed out?).  removed dev010 but saw disable timeout on dev019:

...

problem goes away if we reduce the trigger rate to 71kHz.  even "ls" on the drp node will hang until the trigger rate is reduced to 71kHz.  "ls" hanging is reproducible even when running only 1 of the two hsd's on a drp.  pgp driver and firmware haven't changed since February.  caused by conda compiler work? no. can see problem with pgpread with both old/new compilers.  maybe it's having 2 hsd's on a drp (interrupts may still be firing even when we don't run the second hsd drp executable).  Matt has been unable to reproduce with fakecam


UNSOLVED (1MHz): with both the fake cam and hsd Matt saw that if he ran at 1MHz then disabled for a few seconds then reenabled that all the buffers were stuck in software and system would hang.


UNSOLVEDhsd configure times out the first time, but works second time.  Matt says that he seen the transition not received at all by the hsd.  Phase 1 completes in Digitizer.cc but phase2 maybe doesn't start, consistent with Matt's observation?  log file in /reg/neh/home/cpo/2020/06/09_14:38:23_drp-tst-dev021:tmohsd_0.log.  Could it be the clear readout?  But we have a 1s delay in control.py.

...