...

perhaps related to https://stackoverflow.com/questions/23178606/debugging-python-fatal-error-gc-object-already-tracked?

 


(diagnostic) SOLVED: Message sent to mikhail/caf: One small issue I noticed is that the progress bar got “stuck” at 11% when the configure was timing out (it was timing out for reasons that aren’t relevant to this message).  You can see the logfile here:
/reg/neh/home/cpo/2020/06/16_15:49:13_daq-tst-dev03.pcdsn:control_gui.log

At 17:34:33 there is a line like:

2020-06-16 17:34:33,789 tst-CGWMain[14913]: <D> received progress msg: {'transition': 'configure', 'elapsed': 17, 'total': 150}

(note that 17/150 = 11%, the progress that I saw in control_gui).  Then the GUI didn’t log any further progress messages until the transition timed out at 17:36:46 (and correspondingly the progress bar didn’t update during that period).

Not critical, but would be nice to understand why the progress bar stopped updating.  Any thoughts on this?
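
For reference, a minimal sketch of how a percentage like this could be derived from the progress messages (the function and message handling here are assumptions based on the log line above, not the actual control_gui code); if no further progress messages arrive, the displayed value simply stays at the last one computed:

Code Block
# Minimal sketch (message format assumed from the log line above; not the
# actual control_gui implementation).  The bar only moves when a new
# progress message arrives, so a stall in messages freezes it at the last
# computed percentage (e.g. 17/150 -> 11%).
def progress_percent(msg):
    """Return an integer percentage from a progress message dict."""
    return int(100 * msg['elapsed'] / msg['total'])

msg = {'transition': 'configure', 'elapsed': 17, 'total': 150}
print(progress_percent(msg))   # -> 11, the value the GUI was stuck at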

...

Matt explained that the KCU's Tx link was down while its Rx link was up, so transitions could be received, but the XPM's deadtime signal was latched asserted because the KCU to XPM link was down.  He further traced the KCU's Tx link being down to "the reset that gets asserted when loading the datadev driver that causes the timing link to go into this unrecoverable state."  He contacted Ben for a proper fix.

AMI

Got the following when running ami.cnf against a running tmo.cnf system that includes HSDs:

Code Block
Traceback (most recent call last):
 File "/reg/neh/home/claus/lclsii/daq/test/lcls2-200602/install/bin/ami-worker", line 11, in <module>
   load_entry_point('ami', 'console_scripts', 'ami-worker')()
 File "/reg/neh/home5/claus/lclsii/daq/ami/ami/worker.py", line 387, in main
   flags)
 File "/reg/neh/home5/claus/lclsii/daq/ami/ami/worker.py", line 253, in run_worker
   return worker.run()
 File "/reg/neh/home5/claus/lclsii/daq/ami/ami/worker.py", line 159, in run
   for msg in self.src.events():
 File "/reg/neh/home5/claus/lclsii/daq/ami/ami/data.py", line 626, in events
   self._update(run)
 File "/reg/neh/home5/claus/lclsii/daq/ami/ami/data.py", line 795, in _update
   self._update_group(detname, det_xface_name, det_attr_list, is_env_det)
 File "/reg/neh/home5/claus/lclsii/daq/ami/ami/data.py", line 738, in _update_group
   group_types[attr] = self.data_types[attr_name]
KeyError: 'tmohsd:raw:peaks'

Message sent to Seshu and CPO.
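
The failing line in data.py looks up each attribute name in a data_types dict; the following is a simplified reproduction of that pattern (the dict contents and loop are illustrative, not the actual AMI source), showing how an attribute such as 'tmohsd:raw:peaks' that was never registered in data_types produces exactly this KeyError:

Code Block
# Simplified reproduction of the failing pattern in ami/data.py:_update_group
# (names and contents are illustrative, not the actual AMI code).
data_types = {
    'tmohsd:raw:waveforms': 'ndarray',   # attributes registered for the detector
    'tmohsd:raw:times': 'ndarray',
}

group_types = {}
for attr_name in ['tmohsd:raw:waveforms', 'tmohsd:raw:peaks']:
    # Raises KeyError: 'tmohsd:raw:peaks' because that attribute was never
    # added to data_types when the detector group was built.
    group_types[attr_name] = data_types[attr_name]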

HSD

After two tmo.cnf runs consisting of pvcam, epics, bld, ts, 3 fakecams, and 10 hsds, each lasting more than a few minutes, none of the 10 hsds responded to Disable.  The teb log files (/reg/neh/home/claus/2020/07/24_17:12:50_drp-tst-dev016:teb0.log, /reg/neh/home/claus/2020/07/24_19:01:01_drp-tst-dev016:teb0.log) show that two L1Accepts and the Disable timed out due to missing all HSD contributions.  The HSDs were being triggered at 360 Hz, which matches the time difference between the L1Accepts.  On another run attempt lasting no more than a minute or so, the Disable (and subsequent transitions) proceeded correctly.
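
The 360 Hz statement is easy to sanity-check: consecutive L1Accepts at that rate should be about 1/360 s (roughly 2.8 ms) apart.  A small sketch of that arithmetic (the timestamps below are placeholders, not values from the teb logs):

Code Block
# Check that two L1Accept timestamps are consistent with 360 Hz triggering
# (the timestamps below are placeholders, not taken from the teb logs).
rate_hz = 360
expected_spacing_s = 1.0 / rate_hz            # ~0.00278 s between L1Accepts

t0, t1 = 1000.000000, 1000.002778             # hypothetical consecutive L1Accept times (s)
print(abs((t1 - t0) - expected_spacing_s) < 1e-4)   # True -> spacing matches 360 Hz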

Miscellaneous

On one attempt to record a run with tmo.cnf, the control_gui reported bld failing to respond to BeginRun.  The teb log file (/reg/neh/home/claus/2020/07/24_19:15:28_drp-tst-dev016:teb0.log) shows that the BeginRun event was split.  All contributors except bld arrived in the teb within the 5 second event build timeout period.  Later (not clear how much later) the bld contribution arrived, starting a new event, for which none of the other contributions showed up within the 5 second timeout period (since they had already arrived for the previous event).  Because the pulse ID of this event was the same as that of the previous event (i.e., it didn't advance), the teb asserted.
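
A minimal sketch of the split-event scenario described above (simplified logic with made-up names; not the actual teb implementation): an event is flushed after the build timeout even if contributions are missing, the late contribution then opens a second event carrying the same pulse ID, and that trips the check that pulse IDs must advance:

Code Block
# Simplified model of the split-event scenario (not the actual teb code).
BUILD_TIMEOUT_S = 5.0

class EventBuilder:
    def __init__(self, expected):
        self.expected = set(expected)     # contributor names expected per event
        self.last_pulse_id = -1

    def build(self, pulse_id, arrivals):
        """arrivals maps contributor -> arrival time (s) after the event opened."""
        if pulse_id <= self.last_pulse_id:
            raise AssertionError('pulse ID did not advance: %d' % pulse_id)
        self.last_pulse_id = pulse_id
        present = {c for c, t in arrivals.items() if t <= BUILD_TIMEOUT_S}
        return self.expected - present    # non-empty -> event flushed incomplete

eb = EventBuilder(['drp0', 'drp1', 'bld'])
# BeginRun: bld arrives after the 5 s timeout, so the event is flushed
# incomplete, missing only 'bld'.
print(eb.build(100, {'drp0': 0.1, 'drp1': 0.2, 'bld': 8.0}))
# The late bld contribution starts a second event with the same pulse ID,
# and the builder asserts because the pulse ID did not advance.
try:
    eb.build(100, {'bld': 0.0})
except AssertionError as exc:
    print('teb would assert:', exc)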