Deadtime Behavior

From a Slack thread with Matt on Jan. 31, 2024:

Matt, does the 0x80 in this line in control.py mean that SlowUpdate will not be sent if any readout group has deadtime?

            # Force SlowUpdate to respect deadtime                                     
            if not self.pva.pv_put(pv, (0x80 | ControlDef.transitionId['SlowUpdate'])):

Matt writes: The coupled deadtime idea didn't work.  So, this should make the transition respect deadtime per readout group (some might get it and some might not).

Mona writes about psana2: the eventbuilder requires that all non-L1 transitions be complete (i.e. all streams need to see SlowUpdate).  This suggests to cpo that we need to eliminate the 0x80 bit and set xpm-pause_threshold low enough to guarantee that at least one SlowUpdate can always be received by all detectors.  Matt agrees with this, but points out that the system must drain that SlowUpdate before another is sent down.
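
A minimal sketch of the bit arithmetic behind the quoted control.py line: the value written to the PV is the transition ID with 0x80 OR'ed in. Only the 0x80 mask comes from the thread; the numeric SlowUpdate ID and the helper names below are hypothetical placeholders, not the contents of control.py.

```python
# Bit from the quoted control.py line: asks the transition to respect deadtime.
RESPECT_DEADTIME_BIT = 0x80

# Hypothetical value for illustration only; the real one comes from
# ControlDef.transitionId['SlowUpdate'].
SLOW_UPDATE_ID = 10


def encode_transition(transition_id: int, respect_deadtime: bool) -> int:
    """Return the value to write, optionally with the deadtime bit set."""
    if not 0 <= transition_id < RESPECT_DEADTIME_BIT:
        raise ValueError("transition id must fit below the 0x80 bit")
    return transition_id | (RESPECT_DEADTIME_BIT if respect_deadtime else 0)


def decode_transition(value: int) -> tuple:
    """Split a written value back into (transition_id, respect_deadtime)."""
    return value & 0x7F, bool(value & RESPECT_DEADTIME_BIT)
```

Under these assumptions, eliminating the 0x80 bit as proposed above amounts to calling `encode_transition(SLOW_UPDATE_ID, False)`.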

Brainstorming

Issue: SlowUpdate currently preempts L1Accept, so integrating detectors miss their normalization data

Requirement:

  • slowupdate data should be in time order
  • not event-built

Possible solutions:

  • do nothing: accept missing normalization data for integrating detectors
    • they could turn off SlowUpdate
  • (too hard) have L1Accept preempt SlowUpdate
    • doesn't work at 1MHz
    • some groups get SlowUpdate and others get L1Accept? want to veto the slow update for all readout groups.  Matt says this is difficult because of the usual L0Delay issue.
  • (maybe doable) eliminate SlowUpdate and add a bit to L1Accept saying "attach slowupdate data" (epics data is broadcast by psana and acts as a "barrier")
    • this perhaps feels most elegant
      • psana L1 data is unicast, SlowUpdate is broadcast
      • payloads would be identifiable somehow, as separate xtc's
        • example "broadcast" payloads: timetool background, epics data
    • biggest change
    • maybe also need a SlowUpdate without L1Accept?
    • possible implementation: new transition SlowUpdateAndL1 (in addition to SlowUpdate, L1 on their own)
      • feels like meb and psana need to split it out to a SlowUpdate and L1Accept
    • bit could be in the env or it could be a new transitionId?
    • implementation possibility:
      • we request SlowUpdate from firmware, if firmware detects no L1Accept generate SlowUpdate, if there is L1Accept firmware generates SlowUpdateAndL1
      • some readout groups will get SlowUpdate, some will get SlowUpdateAndL1 (because of the L0Delay problem): is that a problem for the event builders?
        • Ric thinks it's a problem for online eb, but maybe manageable
  • (too hard) timing system could delay SlowUpdate to the next available bucket
    • need to have all readout groups coordinate to find a commonly available bucket.  this is difficult because of L0Delay
    • this doesn't work at 1MHz (but neither do integrating detectors)
  • (ric felt might be hard) could segment levels inject broadcast data without the timing system?
    • manufacture a timestamp
      • messy to have two timestamp generators
      • use last l1accept timestamp plus 1
        • can't use pulse id plus 1 because of the 1MHz limit
      • could we use the control-level collection mechanism to distribute a common timestamp? (separate physical thread)
        • cpo worries that this is tough to do in full-flight when l1accepts are going
    • do we need to event-build the broadcast data?
      • we think no
      • psana eb, teb, meb
      • would want to treat it like an L1Accept
      • could there be, for example, a need to event build slowupdate data from a multi-segment epix detector?  like a timetool background?
        • background data would be staggered
        • could have an algorithm compute_bkgd=timestamp%40==0: all segments agree on when to produce new background
    • maybe give it a different transition ID: SlowL1Accept
    • Ric feels like we perhaps couldn't reuse the existing MEB slowupdate buffers for this: might be a lot of work
  • imperfect possibility: they can detect missing events (either due to dead time or slowupdate) if they run patterns with fixed numbers of l1accepts per integrating detector image.  Maybe it's low enough that it's OK?
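
The last ("imperfect") possibility can be sketched as a simple counting check. This assumes a hypothetical, simplified view of the data as a time-ordered list of markers, where "L1" is a fast-detector L1Accept and "IMG" is an integrating detector readout; the function name and the marker strings are made up for illustration.

```python
from typing import Iterable, List, Tuple


def find_incomplete_images(events: Iterable[str],
                           expected_l1_per_image: int) -> List[Tuple[int, int]]:
    """Return (image_index, l1_count) for every integrating-detector image
    whose number of preceding L1Accepts differs from the expected count."""
    incomplete = []
    l1_count = 0
    image_index = 0
    for kind in events:
        if kind == "L1":
            l1_count += 1
        elif kind == "IMG":
            if l1_count != expected_l1_per_image:
                incomplete.append((image_index, l1_count))
            image_index += 1
            l1_count = 0  # start counting for the next image
    return incomplete
```

A count below the expected value flags an image whose normalization data is incomplete, whether the L1Accepts were lost to deadtime or to a SlowUpdate.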

Another idea for dealing with the L1/SlowUpdate collisions

(from ric)

  • Assume we'll never have a detector that adds payload to SlowUpdate timing stream data
    • Payload is added at the segment level
  • Let the L1Accepts have a SlowUpdate modifier bit
    • Could maybe be a different transition ID, but I think that means more conditionals and will be messier
  • If this bit is set:
    • The DRPs handle the L1A as usual
    • The DRPs look up a transition buffer using the L1A's pebble index and fill it in with the SlowUpdate data
      • The SlowUpdate EbDgram has the same pulseId, etc. as the L1A but its timestamp will be the L1A's incremented by 1 ns for psana EB
    • EbReceiver checks this bit and if set:
      • Writes out the L1Accept
      • Looks up the SlowUpdate dgram in the transition pool using the L1A's pebble index and writes it out (both to the file and the meb's)
    • psana sees two independent events, an L1Accept and a SlowUpdate, each with a separate timestamp, and handles them as currently
    • Only the L1A with the SU modifier bit set is forwarded to the TEB
      • TEB doesn't really do anything with transitions (e.g. the teb configuration is written out by the timing system we believe) 
      • Is it okay that the TEB doesn't get SlowUpdate trigger input data? (cpo thinks it's OK)
        • Perhaps such input data could possibly be included in the associated L1's payload (for both directions: drp to teb and vice-versa)
      • A plus of this idea: at 1 MHz there is possibly no room in the batch for a separate SlowUpdate, and this scheme doesn't need one
  • If the bit is not set:
    • The DRPs handle the L1As as usual
  • If there is no collision, SlowUpdate is generated and handled as usual
    • Since these can occur in slow RoGs when the fast RoG gets L1SU, the slow RoGs need to increment the timestamp (the +1ns) so that psana can event build
    • For all SlowUpdates, OK to unconditionally increment the timestamp by 1 ns?
      • Feels a bit kludgy
      • Maybe store the transition ID and/or group in the lower timestamp bits?
  • The TEB would need to be changed so that slow RoG SlowUpdates are built with L1As having the SlowUpdate bit set
    • Probably not a big modification as the pulse ID build happens naturally and only consistency checks need to be adjusted
  • The MEB changes might be minimal?
    • The EB is shared between TEB and MEB
    • Unlike for the TEB, L1As and SUs would need to be built separately by the MEBs, despite sometimes having the same pulseId
    • Maybe avoid the conflict by vetoing the monitor trigger request in the TEB if the L1A has the modifier bit set, so the MEB doesn't get two events (SU, L1A) with the same pulse id?
      • The MEB would then receive both L1As and SUs, none with the same pulseId
    • In any case buffer handling for the 3 cases should not be an issue
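
A rough sketch of the modifier-bit flow described above, under stated assumptions: the `Dgram` class, the bit position, and the dict used as a transition pool are all hypothetical stand-ins, not the real DRP/EbReceiver types. It only illustrates the pebble-index lookup and the +1 ns timestamp.

```python
from dataclasses import dataclass
from typing import Dict, List

SU_MODIFIER_BIT = 0x1  # hypothetical bit position


@dataclass(frozen=True)
class Dgram:
    service: str        # "L1Accept" or "SlowUpdate"
    pulse_id: int
    timestamp_ns: int
    flags: int = 0


def drp_handle_l1a(l1a: Dgram, pebble_index: int,
                   transition_pool: Dict[int, Dgram]) -> None:
    """If the modifier bit is set, fill a transition buffer (keyed by the
    L1A's pebble index) with a SlowUpdate: same pulseId, timestamp + 1 ns."""
    if l1a.flags & SU_MODIFIER_BIT:
        transition_pool[pebble_index] = Dgram(
            "SlowUpdate", l1a.pulse_id, l1a.timestamp_ns + 1)


def eb_receiver_write(l1a: Dgram, pebble_index: int,
                      transition_pool: Dict[int, Dgram]) -> List[Dgram]:
    """Return the dgrams to write out, in time order: the L1Accept first,
    then the SlowUpdate looked up by pebble index if the bit is set."""
    out = [l1a]
    if l1a.flags & SU_MODIFIER_BIT:
        out.append(transition_pool.pop(pebble_index))
    return out
```

Downstream, psana would then see two independent dgrams with distinct timestamps, as in the bullet above.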

Another "locally generated" slowupdate idea

(we think this may not work because events could get out of time order in shared memory)

meb tells teb, teb tells drp which meb and which buffer to use (for L1)

for SU (and other transitions) meb tells drp which transition-buffer to use
   - msg is sent as soon as meb is done with the previous SU or
     any other transition (i.e. when previous transition copied to shmem)
effectively, the drp has a list of free transition buffers
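
The "list of free transition buffers" could look roughly like the following. This is only a sketch: the class and method names are hypothetical, and the real meb-to-drp message format is not specified here.

```python
from collections import deque
from typing import Optional


class FreeTransitionBuffers:
    """DRP-side list of transition buffers the MEB has declared free."""

    def __init__(self) -> None:
        self._free = deque()

    def on_meb_message(self, buffer_index: int) -> None:
        # MEB sends this once the previous transition was copied to shmem.
        self._free.append(buffer_index)

    def acquire(self) -> Optional[int]:
        # Oldest free buffer first; None means no buffer is available yet.
        return self._free.popleft() if self._free else None
```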

could epicsarch EbReceiver make up a slowupdate locally using
(timestamp=last_L1_time+1ns) (on a timer) and push it in time order to
a free meb buffer?

  - need to tell worker to poll all the variables 
  - could the epicsarch worker attach the payload to a high-rate l1accept
    at some slow rate (allows the slowupdate to flow from worker to ebreceiver)
  - EbReceiver would translate the low-rate L1Accept into a slowupdate transition
    (using the +1ns hack) and save SlowUpdate to xtc
  - could keep the L1Accept (a little ugly) or truncate it in epicsarch.
    If the L1Accept is selected for monitoring, the MEB needs the L1Accept header at least to complete the event-build!
    Mona doesn't need the L1Accept, cpo thinks.
  - SlowUpdates won't necessarily be accompanied by their "parent" L1Accepts in the MEB
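
As a sketch of the "attach to a high-rate L1Accept at some slow rate" idea: the class below decides, per L1Accept, whether a locally generated SlowUpdate should follow it, using the timestamp=last_L1_time+1ns hack. The class name and the period parameter are hypothetical; it uses L1Accept timestamps rather than a wall-clock timer to keep the example deterministic.

```python
from typing import Optional


class LocalSlowUpdateScheduler:
    """Emit at most one SlowUpdate per period, riding on an L1Accept."""

    def __init__(self, period_ns: int) -> None:
        self.period_ns = period_ns
        self._last_su_ns: Optional[int] = None

    def on_l1accept(self, l1_timestamp_ns: int) -> Optional[int]:
        """Return the SlowUpdate timestamp to emit after this L1Accept,
        or None if the period has not yet elapsed."""
        if (self._last_su_ns is not None
                and l1_timestamp_ns - self._last_su_ns < self.period_ns):
            return None
        self._last_su_ns = l1_timestamp_ns + 1  # the +1 ns hack
        return self._last_su_ns
```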

two SU broadcasts:
- EbReceiver going to multiple meb's
- psana's SMD0

what if there are two generators of slowupdate?  What if they're generated on the same timestamp?

like a "locally generated occurrence" (babar synchrequest from 1 ROM)

downsides:
- a multisegment detector couldn't write out slowupdate all on the same timestamp

- non-backward-compatible change: old data has event-built slowupdate, new data
  doesn't (could use a different transitionid?)

- lose ability to do global operations like mpi-reduce e.g. in shared memory

- can things get out of time order in the meb if locally-generated slowupdate is not event-built?

  • Ric: Yes, I think they can.  It leads to the same problem that was dealt with by inventing the common readout group.
    • The EBs are often empty
    • When a new contribution arrives, the only time reference available is that of the most recently built event or previous contribution.  For the case of an empty EB, the code can thus recognize if the new contribution starts a new event or is out of time order.
    • For the case of the proposed SlowUpdate, it may make sense to think of it as belonging to its own readout group (perhaps a fictitious one).  The single contribution can then be determined to be a complete event and is flushed to the TransitionCache.
      • Another possibility might be to have SlowUpdates bypass the EB and go directly to the TransitionCache, but that would take some redesign of the system.
    • Various delays in the system (e.g., network and scheduling, etc.) can conspire to cause the first contribution of an event older than the SlowUpdate to arrive in the MEB after the SlowUpdate has been flushed.  All contributions from a given source arrive in the EBs in time order, but nothing coordinates their arriving in time order relative to other sources.  Thus, contributions from different readout groups could arrive in any order were it not for our requirement that the common readout group triggers together with any other readout group.
  • Since SlowUpdate data is imprecisely timestamped (PV read times may differ from the parent L1Accept timestamp by random amounts), is it a significant problem if the SlowUpdates appear in MEB shmem out of order with respect to the L1Accepts?
    • This might affect reproducibility between online monitoring and offline psana
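
Ric's description of the empty-EB case can be illustrated with a small check. This is a simplified sketch with hypothetical names, not the actual EB code: the only time reference is the timestamp of the most recently built (flushed) event.

```python
from typing import Optional


def classify_first_contribution(contrib_ts: int,
                                last_built_ts: Optional[int]) -> str:
    """Classify a contribution arriving at an empty event builder.

    last_built_ts is the timestamp of the most recently built event,
    or None if nothing has been built yet.
    """
    if last_built_ts is None or contrib_ts > last_built_ts:
        return "starts_new_event"
    return "out_of_time_order"
```

For example, if a locally generated SlowUpdate with timestamp 101 has already been flushed and the first contribution of an L1Accept with timestamp 100 arrives afterwards, the check reports it as out of time order, which is the failure mode described above.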

changes required:
- remove slowupdate generation from control.py
- change epicsarch to do the L1/SU generation
- mona changes the psana broadcast of SU in some backward-compatible way
- ric intuition: drp executable significant changes? EbReceiver?
