Low-Rate Detector Handling

1) could use python-meb-decision with "heuristics" to ensure
   shmem sees the desired events (including those from low-rate
   detectors; see the sketch after this list), OR
2) Ric can reserve existing buffer indices for the 8 readout groups
   so that the low-rate detectors are guaranteed to be seen: easier
   for seeing low-rate detectors, but not as general as (1) for making
   monitoring decisions, OR
3) The mechanism of (2) could also be used to allow python-meb-decision
   to direct which events go to which meb (if there are multiple mebs)
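
A minimal sketch of option (1), assuming a hypothetical python-meb-decision
callback: the function name, its arguments, and the group numbering are
illustrative, not a real interface.

  # Hypothetical heuristic: always forward events carrying a low-rate
  # detector contribution so shmem is guaranteed to see them; forward
  # high-rate events only on a best-effort basis.
  LOW_RATE_GROUPS = {6, 7}  # assumed readout groups of the low-rate detectors

  def monitor_decision(event_groups, free_buffers):
      """Return True if this event should be sent to the meb/shmem."""
      if event_groups & LOW_RATE_GROUPS:  # a low-rate detector contributed
          return True
      return free_buffers > 0             # best effort for everything else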

Python Trigger/Monitor Decision

python-teb-decision:
- Ric has already done this with a C++ .so.  This discussion is about
  whether or not we can do it in python as well for lower-rate
  experiments (it lets us reuse Mikhail's det.calib())
- could use the existing dlopen .so mechanism to start psana-python
  o to cpo: this feels like an extra layer of complexity (easier
    to use Valerio's mechanism ("fork") to start up python?)
  o Valerio and Ric feel that the uniformity provided by the .so
    approach is a significant benefit (configurable).  maybe the
    "fork" is in the .so?
- reuse Valerio's stuff, but maybe simpler since we don't need XTC?
  o Valerio is at the standalone-prototype stage
- three pieces:
  (1) trigger-primitive data production
  (2) trigger-decision logic
  (3) reception of the trigger decision
      o we think there was an additional step that had to happen in
        EbReceiver.  cpo should try to remember what it was
  o persist the trigger-decision data (e.g. was this a prescaled
    event) to XTC (do this on the timing drp)
  o produce the trigger-primitive data (e.g. number of photons) which
    will get sent to the teb (does this happen in Valerio's drp script,
    or do we have a separate script?)
  o trigger-primitive data production (e.g. number of photons) may often
    need psana-python (Mikhail's corrections)
  o trigger-decision logic does NOT need psana-python.  we feel it would
    be difficult to run psana-python in the teb (the datagrams don't
    flow there in the same way).
- data format between the drp and teb could be "simple", e.g. pickle.
  it doesn't have to be XTC, since it's not persisted.  but if Valerio's
  drp script also produces trigger-primitive data, how would Valerio put
  a pickle into the pebble?  a separate pebble buffer?  (see the sketch
  below)
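
To make the three-piece split and the "simple" drp->teb format concrete,
a hedged sketch follows.  The function names and the contents of the
primitives dict are assumptions for illustration; pickle stands in for
whatever user-defined format we end up choosing.

  import pickle

  # (1) runs in the drp reduction script; may need psana-python, e.g.
  #     a calibrated image from Mikhail's det.calib() (assumed here to
  #     be a numpy array of corrected pixel values)
  def make_trigger_primitives(calibrated):
      primitives = {"nphotons": int(calibrated.sum())}
      return pickle.dumps(primitives)  # small blob shipped to the teb

  # (2) runs in the teb; plain python, no psana-python required
  def trigger_decision(blobs):  # one blob per contributing drp
      total = sum(pickle.loads(b)["nphotons"] for b in blobs)
      return {"record": total > 100, "monitor": True}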

NOTE: the epics DRP produces nothing on L1 (only SlowUpdate).  We have
discussed eliminating the writing of epics L1 XTC to file.  The offline
EB is OK with that, but the online EB needs the L1s.  Ideally we need
logic for this; for now we can live with empty L1 XTC.
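
If we do add that logic, a hedged sketch of what it might look like (the
routing function and datagram attributes are invented for illustration):

  # Hypothetical routing for the epics DRP: keep feeding (empty) L1
  # datagrams to the online EB, which needs them, but skip the file writer.
  def route_epics_dgram(dgram, send_to_online_eb, write_to_file):
      send_to_online_eb(dgram)          # the online EB needs every L1
      if dgram.service != "L1Accept":   # SlowUpdates etc. still go to disk
          write_to_file(dgram)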

NOTE: ric says we must ensure that the pebble has enough space reserved
for trigger-primitive data.

key decisions:
- for low-rate detectors, do we use the python-meb-decision heuristics
  approach, the ric-reserved-buffer approach, or both?
  o ric: see whether the heuristics approach works well enough, but
    the reserved-buffer approach may not be too difficult either
  o we'll learn more about both approaches and then decide
- does the drp-reduction script also compute the trigger-primitives?
  o yes (cpo, ric)
- what data format do we use for drp->teb communication?
  o flexible user-defined, e.g. pickle (cpo, ric)
    errors could be handled, e.g. a missing key in the pickled dict;
    a more robust mechanism would be versioning, but that is also
    error-prone (see the sketch after this list)
  o could use xtc for the drp production of trigger-primitive data if
    we so choose (the teb may not receive xtc; ric may "preprocess" the
    xtc for sending)
- how to reuse Valerio's python on the teb (not psana-python)
  o related to the startup-decision ("fork" or .so).  maybe the "fork" is
    in the .so?
  o the teb has a fixed set of reusable buffers (like the pebble) so
    ric/valerio's shmem mechanism should work
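
A hedged sketch of the versioning idea for the user-defined pickle format
(the key names and version scheme below are assumptions):

  import pickle

  FORMAT_VERSION = 1  # bump whenever the primitives dict changes shape

  def encode(primitives):
      return pickle.dumps({"version": FORMAT_VERSION, "data": primitives})

  def decode(blob):
      msg = pickle.loads(blob)
      if msg.get("version") != FORMAT_VERSION:
          # still error-prone, as noted above, but it fails loudly and early
          raise ValueError("trigger-primitive version mismatch: %s"
                           % msg.get("version"))
      return msg["data"]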

Multiple Readout Group Handling

Ric raised an issue that is significantly increasing the complexity of the online event builders; the problems may not be solvable with the current flexible readout group structure and dead time behavior:

  • the problem: "disjoint" sets of readout groups (e.g. detector A has data on shots 1,3,5,6 while detector B has data on shots 2,4) effectively create 2 separate data acquisition systems: e.g. if detector B gets slow, events can end up out of order (and perhaps there are other complexities that I don't understand)
  • the problem affects both the teb/meb and the code in the drp that joins teb results with the event data.  this additional complexity could potentially be avoided by the proposed changes below.
  • The following changes are proposed to address the problem:
    • all detector readout groups must occur together with another readout group (the "parent" or "common" group) which is always sent to the teb.  this eliminates the possibility of disjoint readout groups.  It is an error if the TEB finds an input event without the common group bit set
    • Events on which the common readout group didn't trigger won't be sent to the TEB.  We've gone with the second of 2 possibilities:
      • In one scenario we count when this occurs and discard the event, releasing its resources
      • In another, we bypass the TEB and queue the event to the back-end of the DRP (EbReceiver) in time order with events (possibly in batches) processed by the TEB.  Since the back-end event builder won't find a result to match up with such an input event (the common readout group bit isn't set in the input event's env), the input event is handled as if the TEB had produced a result indicating the event is to be recorded to disk and not monitored.  Because the MEB buffer-addressing information normally received from the TEB is missing for these events, monitoring them is not currently an option.
    • currently dead time from one readout group will not affect another readout group, which could create the possibility of disjoint readout groups.  Matt found that changing this in the timing system firmware would be resource intensive; consequently, the software must be made robust against events not having the common readout group bit set by having them bypass the TEB (sketched after this list)
    • it's only important that the teb receives events with that common group bit set and produces a result for them ... there is no requirement that the detector providing the heartbeat needs to be recorded to disk
    • a natural candidate to provide the common group would be the timing system
    • in the case where the common group runs at 1MHz, detectors can be triggered on any pulse.  For cases where it is <1MHz, timing system configuration software (or configuration storage software) should validate that all readout groups overlap with the common group.  If a brute-force (check-every-shot) algorithm is used to validate, Matt points out that there is a 1MHz repeating period in LCLS-II, so only about 1 million pulses need to be checked (see the sketch after this list).
  • Testing: the most general case to be tested would be to have one or two detectors running at the common group rate with 0, 1 or 2 groups running at lower rates (one of them at 1Hz)
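
Two hedged fragments to make the proposal concrete; the env bit layout, the
callbacks, and the exact period are assumptions for illustration only.

  COMMON_GROUP = 0  # assumed: the common readout group occupies bit 0 of env

  def route_input_event(event, send_to_teb, queue_to_ebreceiver):
      """DRP front end: events without the common group bit bypass the TEB."""
      if event.env & (1 << COMMON_GROUP):
          send_to_teb(event)  # normal path: the TEB will produce a result
      else:
          # no TEB result (and no MEB buffer info) will exist for this event:
          # record it to disk in time order, with monitoring disabled
          queue_to_ebreceiver(event, record=True, monitor=False)

  PERIOD = 1_000_000  # ~1MHz repeating LCLS-II pattern (per Matt)

  def groups_overlap_common(group_fires, common_fires):
      """Brute-force config-time check: every pulse that fires any readout
      group must also fire the common group."""
      return all(common_fires(p) for p in range(PERIOD) if group_fires(p))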