...

PROPOSED SOLUTION: Running 1 hsd (datadev_0), after a while the deadtime from that hsd goes to 100%, but timepausecnt is zero for both hsd’s.  I think it's caused by Ben's headercntof latching to 1 on the “A” hsd, even though msgdelay is set to 99.  Do we need to increase msgdelay?  Matt says msgdelay is max 100 (corresponding to 1us).  This could be caused by SlowUpdates ignoring dead time if we get long dead time due to the soft-lockup issue above (now solved by Matt, we believe), so it could be the same issue.  Matt will have SlowUpdates pay attention to dead time to avoid this.


UNSOLVED: Sept. 11 '20:  The system (4 DRPs on dev004 and 2 HSDs on dev008) ran normally several times this morning.  When a new run was started, phase 2 of Configure wasn't received by the 2 HSDs.  Instead, their DRPs both died with 'Jump in complete l1Count'.  Each DRP reported a different PulseID and TimeStamp, but in both cases the event was a SlowUpdate which should have been a Configure.  hsdpva shows only the A side having a headercntof of 1.  For both sides, msgdelayset is 98, msgdelayget is 0, headerfifow is 0, headerfifor is 16.  Starting a new run proceeded correctly, with Configure properly received.

Not Critical


UNSOLVED (any rate): With both the fake cam and hsd, Matt saw that if he ran at 1MHz, then disabled for a few seconds, then reenabled, all the buffers were stuck in software and the system would hang.  Reproducible 1 out of 3 attempts.  Learned that Pause/Resume does not see this problem - must just disable triggers.  When this happens, the monitoring shows that the DRP does not process all of the L1s.  It's like a batch gets stuck.  After that, nothing gets processed (neither transitions nor more L1s).

Found that SlowUpdates prevent the system from hanging.  Ric suggested that the TEB's in-progress batch doesn't get released because the interval between the last L1 and the next datagram is greater than the size of the DRP's batch ringbuffer, which can accommodate 4 seconds of running.  Thus, the ringbuffer is not empty, and the head would have to pass the tail in order to allocate another batch.  This doesn't happen between transitions, because a transition flushes any in-progress batch.
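
Ric's explanation can be illustrated with a toy model.  This is only a sketch, not the DRP or TEB code: the BatchRing class, the slot count of 4, and the rule that a batch is identified by its time slot are all assumptions made for illustration.  Only the 4-second span of the ringbuffer and the fact that a transition flushes the in-progress batch come from the notes above.

```python
RING_SPAN_S = 4.0                 # from the notes: ringbuffer covers 4 s of running
N_SLOTS = 4                       # hypothetical slot count, for illustration only
BATCH_S = RING_SPAN_S / N_SLOTS   # hypothetical duration covered by one batch


class BatchRing:
    """Toy model of a batch ringbuffer with one unreleased in-progress batch."""

    def __init__(self):
        self.in_progress = None   # batch number of the unreleased batch, if any

    def post(self, t, transition=False):
        """Handle a datagram at time t (seconds); return False if it would hang."""
        if transition:
            # Per the notes, a transition flushes any in-progress batch.
            self.in_progress = None
            return True
        batch = int(t // BATCH_S)
        if self.in_progress is not None and batch - self.in_progress >= N_SLOTS:
            # The head would have to pass the tail to allocate this batch.
            return False
        # Assumed normal case: the older batch is released, the new one starts.
        self.in_progress = batch
        return True


def simulate(events):
    """Run (time, is_transition) events through a fresh ring; list the outcomes."""
    ring = BatchRing()
    return [ring.post(t, tr) for t, tr in events]


# Triggers disabled for about 5 s with no transitions: the next L1 cannot be placed.
stuck = simulate([(0.0, False), (0.5, False), (5.0, False)])

# Same gap, but a SlowUpdate (a transition) arrives in between and flushes the batch.
ok = simulate([(0.0, False), (0.5, False), (1.5, True), (5.0, False)])

print(stuck)
print(ok)
```

In this model a gap shorter than the ringbuffer span never blocks, which would fit with the hang only showing up after triggers were disabled for a few seconds.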

...