Deadtime Behavior

From a Slack thread with Matt on Jan. 31, 2024:

Matt, does the 0x80 in this line in control.py mean that SlowUpdate will not be sent if any readout group has deadtime?

            # Force SlowUpdate to respect deadtime                                     
            if not self.pva.pv_put(pv, (0x80 | ControlDef.transitionId['SlowUpdate'])):

Matt writes: The coupled deadtime idea didn't work.  So, this should make the transition respect deadtime per readout group (some might get it and some might not).
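A minimal sketch of the bit arithmetic in that line, for readers unfamiliar with the encoding. The table `TRANSITION_ID`, its placeholder value, and the name `RESPECT_DEADTIME` are assumptions made for illustration; the source only shows the literal 0x80 OR'ed into `ControlDef.transitionId['SlowUpdate']`. The decode helper further assumes the transition ID fits in the low 7 bits.

```python
# Hypothetical stand-in for ControlDef.transitionId; the value is a placeholder.
TRANSITION_ID = {"SlowUpdate": 10}

# Assumed name for the 0x80 flag seen in control.py.
RESPECT_DEADTIME = 0x80


def encode_transition(name, respect_deadtime):
    """Return the value to write to the PV for the named transition."""
    value = TRANSITION_ID[name]
    if respect_deadtime:
        value |= RESPECT_DEADTIME
    return value


def decode_transition(value):
    """Split a PV value into (transition id, respect-deadtime flag)."""
    return value & 0x7F, bool(value & RESPECT_DEADTIME)
```

Dropping the 0x80 bit, as discussed in the next paragraph, would correspond to calling `encode_transition("SlowUpdate", False)` in this sketch.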

Mona writes about psana2: the event builder requires every non-L1 transition to be complete (i.e. all streams need to see SlowUpdate).  This suggests to cpo that we need to eliminate the 0x80 bit and set xpm-pause_threshold low enough to guarantee that at least one SlowUpdate can always be received by all detectors.  Matt agrees with this, but points out that the system must drain that SlowUpdate before another is sent down.

Brainstorming

Issue: SlowUpdate currently preempts L1Accept, so integrating detectors miss their normalization data

...

  • Assume we'll never have a detector that adds payload to SlowUpdate timing stream data
    • Payload is added at the segment level
  • Let the L1Accepts have a SlowUpdate modifier bit
    • Could maybe be a different transition ID, but I think that means more conditionals and will be messier
  • If this bit is set:
    • The DRPs handle the L1A as usual
    • The DRPs look up a transition buffer using the L1A's pebble index and fill it in with the SlowUpdate data
      • The SlowUpdate EbDgram has the same pulseId, etc. as the L1A but its timestamp will be the L1A's incremented by 1 ns for psana EB
    • EbReceiver checks this bit and if set:
      • Writes out the L1Accept
      • Looks up the SlowUpdate dgram in the transition pool using the L1A's pebble index and writes it out (both to the file and to the MEBs)
    • psana sees two independent events, an L1Accept and a SlowUpdate, each with a separate timestamp, and handles them as currently
    • Only the L1A with the SU modifier bit set is forwarded to the TEB
      • TEB doesn't really do anything with transitions (e.g. we believe the TEB configuration is written out by the timing system)
      • Is it okay that the TEB doesn't get SlowUpdate trigger input data? (cpo thinks it's OK)
        • Perhaps such input data could be included in the associated L1A's payload (for both directions: drp to teb and vice-versa)
      • A plus of this idea: at 1 MHz there may be no room in the batch for a separate SlowUpdate
  • If the bit is not set:
    • The DRPs handle the L1As as usual
  • If there is no collision, SlowUpdate is generated and handled as usual
    • Since these can occur in slow RoGs when the fast RoG gets an L1A with the SU bit set (L1SU), the slow RoGs need to increment the timestamp (the +1ns) so that psana can event build
    • For all SlowUpdates, OK to unconditionally increment the timestamp by 1 ns?
      • Feels a bit kludgy
      • Maybe store the transition ID and/or group in the lower timestamp bits?
  • The TEB would need to be changed so that slow RoG SlowUpdates are built with L1As having the SlowUpdate bit set
    • Probably not a big modification as the pulse ID build happens naturally and only consistency checks need to be adjusted
  • The MEB changes might be minimal?
    • The EB is shared between TEB and MEB
    • Unlike for the TEB, L1As and SUs would need to be built separately by the MEBs, despite sometimes having the same pulseId
    • Maybe avoid the conflict by vetoing the monitor trigger request in the TEB if the L1A has the modifier bit set, so the MEB doesn't get two events (SU, L1A) with the same pulse id?
      • The MEB would then receive both L1As and SUs, none with the same pulseId
    • In any case buffer handling for the 3 cases should not be an issue
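The modifier-bit flow above can be sketched as follows. This is only an illustration under stated assumptions: the `Dgram` record, the bit position `SU_MODIFIER`, the dict used as the transition pool, and the split of the timestamp into seconds and nanoseconds are all hypothetical, not the DRP's actual types.

```python
from dataclasses import dataclass, replace

NS_PER_SEC = 1_000_000_000
SU_MODIFIER = 0x1  # hypothetical bit position for the SlowUpdate modifier


@dataclass(frozen=True)
class Dgram:
    """Hypothetical stand-in for an EbDgram header."""
    service: str        # "L1Accept" or "SlowUpdate"
    pulse_id: int
    sec: int
    nsec: int
    flags: int = 0
    pebble_index: int = 0


def plus_one_ns(sec, nsec):
    """Increment a (sec, nsec) timestamp by 1 ns, carrying into seconds."""
    nsec += 1
    if nsec >= NS_PER_SEC:
        return sec + 1, 0
    return sec, nsec


def make_slow_update(l1a):
    """DRP side: build the SlowUpdate that shadows a flagged L1Accept.

    Same pulseId as the L1A, timestamp incremented by 1 ns for the psana EB.
    """
    sec, nsec = plus_one_ns(l1a.sec, l1a.nsec)
    return replace(l1a, service="SlowUpdate", sec=sec, nsec=nsec, flags=0)


def eb_receiver_outputs(l1a, transition_pool):
    """EbReceiver side: the L1Accept first, then its SlowUpdate if flagged."""
    out = [l1a]
    if l1a.flags & SU_MODIFIER:
        out.append(transition_pool[l1a.pebble_index])
    return out
```

In this sketch the DRP would store `transition_pool[l1a.pebble_index] = make_slow_update(l1a)` before the EbReceiver runs; psana then sees two dgrams with distinct timestamps.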

Another "locally generated" slowupdate idea

...

(we think this may not work because events could get out of time order in shared memory)

meb tells teb, teb tells drp which L1 meb and which buffer to use (for L1)

for SU (and other transitions) meb tells drp which transition-buffer to use
   - msg is sent as soon as meb is done with the previous SU or
     any other transition (i.e. when previous transition copied to shmem)
effectively, the drp has a list of free transition buffers
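The free-list bookkeeping described above could look roughly like this. The class and method names are hypothetical, and the real message exchange between the meb and the drp is not modeled.

```python
from collections import deque


class TransitionBufferList:
    """Hypothetical DRP-side list of free MEB transition buffers."""

    def __init__(self):
        self._free = deque()

    def release(self, index):
        """Called when the meb reports it is done with a transition buffer."""
        self._free.append(index)

    def acquire(self):
        """Return the next free buffer index, or None if none is available."""
        return self._free.popleft() if self._free else None
```

Returning `None` when the list is empty leaves the policy for that case (wait, drop, or back-pressure) to the caller, since the notes do not specify it.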

could epicsarch EbReceiver make up a slowupdate locally using
(timestamp=last_L1_time+1ns) (on a timer) and push it in time order to
a free meb buffer?

  - need to tell worker to poll all the variables 
  - could the epicsarch worker attach the payload to a high-rate l1accept
    at some slow rate (allows the slowupdate to flow from worker to ebreceiver)
  - EbReceiver would translate the low-rate L1Accept into a slowupdate transition
    (using the +1ns hack) and save SlowUpdate to xtc
  - could keep the L1Accept (a little ugly); could truncate it in epicsarch.
    If the L1Accept is selected for monitoring, the MEB needs at least the L1Accept header to complete the event-build!
    Mona doesn't need the L1Accept, cpo thinks.
  - SlowUpdates won't necessarily be accompanied by their "parent" L1Accepts in the MEB
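A rough sketch of the "attach the payload to a high-rate L1Accept at some slow rate" idea. It assumes, purely for illustration, that timestamps are plain integer nanoseconds and that the worker picks the first L1Accept at or after each period boundary; both function names are hypothetical.

```python
def select_carriers(l1_times_ns, period_ns):
    """Pick the L1Accept timestamps that would carry SlowUpdate payload.

    The first L1Accept is always chosen; after that, the next one chosen is
    the first whose timestamp is at least period_ns past the previous carrier.
    """
    carriers = []
    next_due = None
    for t in l1_times_ns:
        if next_due is None or t >= next_due:
            carriers.append(t)
            next_due = t + period_ns
    return carriers


def slow_update_time(l1_time_ns):
    """EbReceiver side: timestamp of the derived SlowUpdate (the +1ns hack)."""
    return l1_time_ns + 1
```

Because each SlowUpdate timestamp is derived from its carrier L1Accept, a single source emits them in time order; ordering across sources is a separate question, raised below.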

two SU broadcasts:
- EbReceiver going to multiple meb's
- psana's SMD0

what if there are two generators of slowupdate?  What if they're generated on the same timestamp?

like a "locally generated occurrence" (babar synchrequest from 1 ROM)

downsides:
- a multisegment detector couldn't write out slowupdate all on the same timestamp

- non-backward-compatible change: old data has event-built slowupdate, new data
  doesn't (could use a different transitionid?)

- lose ability to do global operations like mpi-reduce e.g. in shared memory

- can things get out of time order in the meb if locally-generated slowupdate is not event-built?

  • Ric: Yes, I think they can.  It leads to the same problem that was dealt with by inventing the common readout group.
    • The EBs are often empty
    • When a new contribution arrives, the only time reference available is that of the most recently built event or previous contribution.  For the case of an empty EB, the code can thus recognize if the new contribution starts a new event or is out of time order.
    • For the case of the proposed SlowUpdate, it may make sense to think of it as belonging to its own readout group (perhaps a fictitious one).  The single contribution can then be determined to be a complete event and is flushed to the TransitionCache.
      • Another possibility might be to have SlowUpdates bypass the EB and go directly to the TransitionCache, but that would take some redesign of the system.
    • Various delays in the system (e.g., network and scheduling) can conspire to cause the first contribution of an event older than the SlowUpdate to arrive in the MEB after the SlowUpdate has been flushed.  All contributions from a given source arrive in the EBs in time order, but nothing coordinates their arriving in time order relative to other sources.  Thus, contributions from different readout groups could arrive in any order were it not for our requirement that the common readout group triggers together with any other readout group.
  • Since SlowUpdate data is imprecisely timestamped (PV read times may differ from the parent L1Accept timestamp by random amounts), is it a significant problem if the SlowUpdates appear in MEB shmem out of order with respect to the L1Accepts?
    • This might affect reproducibility between online monitoring and offline psana

changes required:
- remove slowupdate generation from control.py
- change epicsarch to do the L1/SU generation
- mona changes the psana broadcast of SU in some sort of backward-compatible way
- ric intuition: drp executable significant changes? EbReceiver?