Initial Discussion

want:
- give the drp a psana-python script
- drive that psana-python script by calling psana_set_dgram(Dgram*) (would replace the file reading)

...

- need to make sure json2xtc can separate the Names (configure-only)
  from Shapes/Data (l1accept)
- cpo worries about performance of the extra json step
- cpo votes to try the dgramcreate approach (with 75% confidence level)

Where Python Runs

  1. drp fex (mona/valerio/ric psana python)
  2. producing the trigger data (on the drp) (Ric, custom simple data format?)
    1. should this be part of the drp fex or a separate subprocess/call?
    2. we can do this with either a python "call" or a subprocess, but "call" can be dangerous because the multi-threaded DRP can get stuck on the GIL: don't do the "call".
  3. analyzing the trigger data (on the teb) (Ric, custom simple data format?)
  4. python in EbReceiver that uses trigger data results to modify the dgram (e.g. ROI) (mona/valerio/ric psana python) 
    1. use the prescale to record the raw/fex? (but isn't this done by drp fex?)

(1) and (2) could be put into the same psana-python process. (4) is a separate python process.
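
A minimal sketch of the "subprocess" option, to show why it sidesteps the GIL concern: the python code runs in its own interpreter, so the multi-threaded caller never holds that interpreter's GIL. The worker source and the function name run_in_subprocess are hypothetical stand-ins, not the real drp-python script. The sketch also spawns one process per call for brevity, whereas a long-lived process fed through shmem would be the realistic setup.

```python
import subprocess
import sys

# Hypothetical stand-in for a user script: reads bytes on stdin and
# prints the number of bytes it received.
WORKER_SOURCE = (
    "import sys; data = sys.stdin.buffer.read(); "
    "sys.stdout.write(str(len(data)))"
)


def run_in_subprocess(payload: bytes) -> int:
    """Run the worker in a separate interpreter and return its result."""
    result = subprocess.run(
        [sys.executable, "-c", WORKER_SOURCE],
        input=payload,          # bytes sent to the worker's stdin
        capture_output=True,    # collect stdout/stderr of the worker
        check=True,             # raise if the worker exits non-zero
    )
    return int(result.stdout.decode())
```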

Some Technical Details

  • shmem ownership/cleanup
    • valerio uses sysv ipc instead of posix_ipc (because the conda version of posix_ipc claims to not support message queues, but a pip-installed version does: this should be a solvable problem ... feels like it's not built optimally?) 
    • unlike our shmem these are not physical files in /dev/shm
    • for sysv we control the naming of the numeric keys (which are the equivalent of the filenames) so we can avoid permissions issues that way.  currently the numeric key is formed from the thread number and partition number.  Ric suggests perhaps adding the primary XPM number (cpo points out this is indirect, somehow ideally would use "username").  maybe not such a big issue because the username is controlled by the platform number and procmgr.conf.
  • pebble size (both for transition buffers and L1 buffers: the maximum of these two is used for the drp-python shmem)
    • in Ric's new mode the pebble size is determined from the .cnf, or defaults to the .service value if not specified (it used to always come from the .service).  but drp-python can return a dgram larger than the pebble size.  what do we do?
    • we will manage the two bufend's in the drp-python and we will crash if that gets exceeded
    • if we have low-rate large events, could be better to assert damage rather than crash
      • Ric says maybe this is the job of the fex?  cpo says it might be better if we could solve it in one place for all possible fex's.  since truncating the data corrupts it then we have to mark the xtc as corrupt/not-iterable.
      • A downside of not crashing:  people won't realize there's a problem
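
A rough sketch of the points above, under stated assumptions: the key packing (threads_per_partition, the +1 offset) and all function names are hypothetical, not the scheme valerio actually uses. It shows a numeric sysv key formed from partition and thread number, the shmem size taken as the maximum of the two buffer sizes, and the crash-versus-damage choice when a returned dgram exceeds the buffer.

```python
from typing import Tuple


def sysv_key(partition: int, thread: int, threads_per_partition: int = 64) -> int:
    """Pack (partition, thread) into one numeric key (hypothetical layout)."""
    if partition < 0:
        raise ValueError("partition must be non-negative")
    if not 0 <= thread < threads_per_partition:
        raise ValueError("thread number out of range")
    # +1 keeps the key away from 0, which is IPC_PRIVATE in sysv ipc.
    return partition * threads_per_partition + thread + 1


def shmem_size(transition_buf_size: int, l1_buf_size: int) -> int:
    """The drp-python shmem must hold the larger of the two buffer types."""
    return max(transition_buf_size, l1_buf_size)


def check_dgram(dgram_size: int, buf_size: int, crash: bool = True) -> Tuple[int, bool]:
    """Return (bytes that fit, damaged flag) for a dgram returned by drp-python."""
    if dgram_size <= buf_size:
        return dgram_size, False
    if crash:
        raise RuntimeError(
            f"dgram of {dgram_size} bytes exceeds buffer of {buf_size} bytes"
        )
    # Truncation corrupts the payload, so the caller must mark it damaged.
    return buf_size, True
```

With crash=False the caller gets a truncated size plus a damage flag, which matches the low-rate large-event case; the default keeps the crash behavior so that the problem is not silently missed.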

Calibration Constants Broadcast


serial number: 1.3.5.7  (segments 1,3 run in one drp process, 5,7 run on another)

configure or connect:
- we do the socket setup
beginrun:
- we do the broadcast

three options:

1) one pub per drp process
   o disadvantage: more database fetches (10 or 20 simultaneous database fetches)
   o the identity of pub will be determined by threadnumber==0.  valerio says
     this is available in python
2) one pub per drp node (with multiple drp processes per node)
   o feels a little messy
3) one pub per detector (multiple detectors)
   o requires either that we fetch the constants for the whole detector using
     a subset of the serial number (5,7).  Mikhail says in this case we don't
     get the (1,3) constants
   o or that we exchange serial numbers using the collection mechanism (so
     everyone would know 1,3,5,7)
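
For option (3), a small sketch of what "everyone would know 1,3,5,7" could look like once each drp process has reported its own segments. The function name merge_segments and the dot-joined format (copied from the example serial number above) are assumptions; how the lists travel through the collection mechanism is not shown.

```python
from typing import Iterable, List


def merge_segments(per_process: Iterable[List[int]]) -> str:
    """Combine the segment lists of all drp processes into one serial number."""
    # A set removes duplicates; sorting makes the result order-independent.
    merged = sorted({seg for segs in per_process for seg in segs})
    return ".".join(str(seg) for seg in merged)
```

For example, merge_segments([[5, 7], [1, 3]]) gives "1.3.5.7" regardless of which process reports first.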

- ***NEW*** leaning towards (1) since it works now.
- to get a unique port for the pub/sub (allows multiple drp processes on same node)
  two options:
  o use connect_json (heavyweight answer), or
  o use base_port_number+lowest_segment_number.  use zmq ipc's so we don't see
    broadcasts from other nodes (cpo votes for this option).  in this case
    it's a filename not a port number. instead of lowest_segment_number use the
    detector name + segment (e.g. atmopal_0) as the unique ipc name.
- socket setup on configure
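
A hedged sketch of the zmq ipc option, assuming pyzmq for the sockets. The "drp_calib_" prefix, the /tmp directory and the function names are hypothetical; only the detector name + segment naming and the threadnumber==0 publisher rule come from the notes above.

```python
def ipc_endpoint(detname: str, segment: int, base_dir: str = "/tmp") -> str:
    """Build a per-detector-segment ipc endpoint, e.g. for atmopal_0."""
    # An ipc endpoint is a filename, so it is only visible on the local node.
    return f"ipc://{base_dir}/drp_calib_{detname}_{segment}"


def is_publisher(thread_number: int) -> bool:
    """One pub per drp process: thread 0 publishes, the others subscribe."""
    return thread_number == 0


def make_socket(endpoint: str, publisher: bool):
    """Create the PUB or SUB socket for the constants broadcast."""
    import zmq  # pyzmq; imported here so the helpers above work without it

    ctx = zmq.Context.instance()
    if publisher:
        sock = ctx.socket(zmq.PUB)
        sock.bind(endpoint)
    else:
        sock = ctx.socket(zmq.SUB)
        sock.setsockopt(zmq.SUBSCRIBE, b"")  # receive every message
        sock.connect(endpoint)
    return sock
```

In this sketch make_socket would be called once at socket setup, and the publisher would send the constants on beginrun.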

configure:
- chris ford points out: configure is already slow and gets redone more often,
  so connect would be better

creation of python process should be on connect or earlier (e.g. startup)?
- do the socket setup here

python "user startup" (determination of which drp-python script the user has chosen):
- ideally should happen on configure since it is a user "configuration"