
After two tmo.cnf runs consisting of pvcam, epics, bld, ts, 3 fakecams, and 10 hsds, each lasting more than a few minutes, none of the 10 hsds responded to Disable.  The teb log files (/reg/neh/home/claus/2020/07/24_17:12:50_drp-tst-dev016:teb0.log, /reg/neh/home/claus/2020/07/24_19:01:01_drp-tst-dev016:teb0.log) show that two L1Accepts and the Disable timed out because all HSD contributions were missing.  The HSDs were being triggered at 360 Hz, which matches the time difference between the L1Accepts.  On another run attempt lasting no more than a minute or so, the Disable (and subsequent transitions) proceeded correctly.

BLD

In recent tmo.cnf runs with the BLD, the BLD has consistently marked every event it handles with MissedData damage.  Adding a print to Pgp::next() produces lines like:

Missed BLD: PID 61939cc1184cc327, TH PID 000011aaf02c72, ts 398227b806e0c6f6
tst-drp_bld[200900]: <D> PGPReader  lane 0  size 432  hdr 0c000011aaf02c72.398227b806e0c6f6.a3110008

The first pulse ID value shown is received by Bld::next() from the multicast socket.  It looks nothing like the TimingHeader pulse ID.  On an earlier try, I got:

Missed BLD: PID 3980f65135993731, TH PID 000000c3effd0f
tst-drp_bld[93557]: <D> PGPReader  lane 0  size 432  hdr 0c000000c3effd0f.3980f65433113b5a.9f0f0008

In this case, the leading hex digits of the pulse ID value received from the multicast socket (3980f65) matched the leading digits of the upper 32 bits of the TimingHeader timestamp.
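The comparison above can be reproduced with a short script. This is a minimal sketch, not part of the DAQ code: the helper names `split_hdr` and `common_hex_prefix` are made up for illustration, and the assumption that the low 56 bits of the first hdr word carry the TimingHeader pulse ID is inferred only from the two log excerpts shown here.

```python
def common_hex_prefix(a: str, b: str) -> str:
    """Return the leading hex digits shared by two hex strings."""
    n = 0
    for x, y in zip(a.lower(), b.lower()):
        if x != y:
            break
        n += 1
    return a[:n].lower()


def split_hdr(hdr: str):
    """Split the 'hdr' field of a PGPReader debug line into its three words.

    Assumption (inferred from the log lines above): the low 56 bits of the
    first word are the TimingHeader pulse ID, and the second word is the
    timestamp.
    """
    w0, w1, w2 = hdr.split(".")
    pulse_id = int(w0, 16) & ((1 << 56) - 1)
    return pulse_id, int(w1, 16), int(w2, 16)


# Values copied from the second log excerpt.
th_pid, th_ts, _ = split_hdr("0c000000c3effd0f.3980f65433113b5a.9f0f0008")
bld_pid = "3980f65135993731"

prefix = common_hex_prefix(bld_pid, f"{th_ts:016x}")
print(f"TH PID {th_pid:014x}, shared prefix with timestamp: {prefix!r}")
```

For the second excerpt the shared prefix is `3980f65` (7 hex digits, 28 bits). For the first excerpt the same comparison finds no common leading digits.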

Miscellaneous

On one attempt to record a run with tmo.cnf, the control_gui reported bld failing to respond to BeginRun.  The teb log file (/reg/neh/home/claus/2020/07/24_19:15:28_drp-tst-dev016:teb0.log) shows the BeginRun event to be split.  All contributions except the bld's arrived in the teb within the 5 second event build timeout period.  Later (not clear how much later) the bld contribution arrived, starting a new event, for which none of the other contributions showed up within the 5 second timeout period (since they had already arrived for the previous event).  Because the pulse ID of this event was the same as that of the previous event (i.e., didn't advance), the teb asserted.
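The split-event sequence described above can be illustrated with a toy model. This is only a sketch under simplifying assumptions and is not the teb's actual implementation: the function names, the arrival times, and the pulse ID value 100 are hypothetical, and timeouts are checked only when a new contribution arrives.

```python
TIMEOUT = 5.0  # seconds; the event build timeout quoted above


def build(arrivals, contributors, timeout=TIMEOUT):
    """Toy event builder (not the teb code).

    arrivals: list of (time, source, pulse_id), sorted by time.
    Returns (pulse_id, sources, complete) tuples in flush order.
    """
    expected = set(contributors)
    flushed = []
    pending = {}  # pulse_id -> (time of first arrival, set of sources)

    def flush(pid):
        _, srcs = pending.pop(pid)
        flushed.append((pid, sorted(srcs), srcs == expected))

    for t, src, pid in arrivals:
        # Flush any pending event that has waited longer than the timeout.
        for stale in [p for p, (t0, _) in pending.items() if t - t0 > timeout]:
            flush(stale)
        # A contribution whose event was already flushed starts a new event.
        if pid not in pending:
            pending[pid] = (t, set())
        pending[pid][1].add(src)
        if pending[pid][1] == expected:
            flush(pid)

    for pid in list(pending):
        flush(pid)  # whatever is left eventually times out
    return flushed


def pulse_ids_advance(flushed):
    """True if each flushed event has a larger pulse ID than the one before."""
    return all(b[0] > a[0] for a, b in zip(flushed, flushed[1:]))


srcs = ["ts", "hsd", "bld"]
# Hypothetical arrivals: bld shows up 6 s after the first contribution.
late = [(0.0, "ts", 100), (0.1, "hsd", 100), (6.0, "bld", 100)]
events = build(late, srcs)
print(events, pulse_ids_advance(events))
```

With the late bld arrival, the model flushes two incomplete events carrying the same pulse ID, so `pulse_ids_advance` returns False, which corresponds to the condition on which the teb asserted.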