Initial Discussion
want:
- give the drp a psana-python script
- drive that psana-python script by calling psana_set_dgram(Dgram*) (would replace the file reading)
ds = DataSource(dgramDsource=True)
myrun = next(ds.runs())
for event in myrun.events():
pass
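The intended user-facing loop above can be mocked end-to-end without psana; this sketch uses hypothetical stand-in classes (MockDataSource/MockRun are not real psana names) just to show the iteration contract the DgramDataSource would have to satisfy.

```python
# Minimal sketch of the intended API, with psana mocked by plain classes.
# MockDataSource stands in for DataSource(dgramDsource=True).

class MockRun:
    """Stand-in for a psana Run: yields dgrams as 'events'."""
    def __init__(self, dgrams):
        self._dgrams = dgrams

    def events(self):
        # In the real design this would block until the C++ side
        # delivers a dgram via psana_set_dgram(Dgram*).
        yield from self._dgrams


class MockDataSource:
    """Stand-in for the proposed dgram-driven DataSource."""
    def __init__(self, dgrams):
        self._run = MockRun(dgrams)

    def runs(self):
        yield self._run


ds = MockDataSource(dgrams=[b"dgram0", b"dgram1"])
myrun = next(ds.runs())
events = [evt for evt in myrun.events()]
```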
xtcreader C++ calls: psana_set_dgram(Dgram*) (goes into dgram.cc?)
idea:
- multiprocess drp (not multithreaded) (Valerio); talk with Elliott (like Legion)

o have standalone multi-threaded or multi-process C++ code run python scripts
- (Mona) use xtcreader to represent one of those processes for development
modified to call a python script, perhaps like:
https://github.com/slac-lcls/lcls2/blob/c4fa38db1799b5c2acf6e4908daf50403c1bf616/psdaq/drp/BEBDetector.cc#L80
- xtcreader C++ calls psana-python script
o it makes a new DataSource (DgramDataSource?)
o as drp C++ receives a new dgram, it passes it to the DgramDataSource (instead of reading from file)
ds = DataSource(dgramDsource=True)
C++ call: psana_set_dgram(Dgram*) (would replace the file reading)
the above Dgram is passed (somehow) to psana/src/dgram.cc (which currently does the file reading), which creates the Python "Dgram"
maybe use the "buffer"/view interface?
two options:
(1) as in shmem, copy every dgram so that Python reference counting works in the standard way; we could do the same thing here. This decouples psana memory management from drp memory management
(2) don't copy the dgram: more efficient, but then we can't delete the dgram in the normal way, and we can't save information from old events
my inclination is to do (1)
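The copy vs. no-copy trade-off can be illustrated with Python's buffer protocol (the "buffer"/view interface mentioned above); here a bytearray stands in for a Dgram buffer owned by the C++ drp.

```python
# Sketch of option (1) copy vs. option (2) no-copy.
# The bytearray simulates a Dgram buffer owned by the C++ drp.

drp_buffer = bytearray(b"raw-dgram-payload")

# Option (1): copy. Python owns the copy, normal reference counting
# applies, and the drp may recycle its buffer at any time.
copied = bytes(drp_buffer)

# Option (2): no copy. A memoryview aliases the drp's memory: cheaper,
# but invalid once the drp reuses the buffer, so we cannot keep data
# from old events.
view = memoryview(drp_buffer)

# The drp reusing the buffer is visible through the view,
# but not through the copy:
drp_buffer[0:3] = b"new"
```

This is why option (1) decouples psana memory management from drp memory management: the copy survives no matter what the drp does with its buffer.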
A potential issue (lower priority): this method of running psana (like shmem) has no scalable way of loading the calibration constants: each core will access the database independently. Ideally we would fix this.
Second Discussion
Oct 14, 2021 with Mikhail, Mona, Valerio
Executive Summary: we now prefer Option 3 below, which feels like the simplest approach, although not the highest performance.
Current picture:
kcu1500 -> PGPReader -> N*Worker -> Ric's TEB -> Collector
(TEB/MEB stuff on another node)
Ideally, replace Worker threads with Worker processes
BUT, Ric's (complicated) InfiniBand stuff is all multithreaded
Option 1: (the original idea discussed above)
One process for C++ and Python: Mona calls Py_CallScript("dummy.py"),
which returns when it hits the event iterator; the iterator is awoken by
psana_set_dgram(dg)
Advantage:
- only 1 process
- no new DRP
Disadvantage:
- weird
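Option 1's control flow (a blocked event iterator that psana_set_dgram wakes up) can be sketched in pure Python, with the C++ side simulated by a thread; psana_set_dgram here is an illustrative Python stand-in for the proposed C-level call.

```python
# Sketch of Option 1: the script blocks in its event iterator and is
# awoken each time the "C++ side" (here: a thread) delivers a dgram.

import queue
import threading

_dgram_queue = queue.Queue()
_SENTINEL = object()  # end-of-data marker

def psana_set_dgram(dgram):
    """Stand-in for the C++ call: hand one dgram to psana."""
    _dgram_queue.put(dgram)

def events():
    """Event iterator: blocks until psana_set_dgram() wakes it up."""
    while True:
        dgram = _dgram_queue.get()
        if dgram is _SENTINEL:
            return
        yield dgram

def cpp_side():
    # Simulated drp: deliver three dgrams, then signal end-of-data.
    for i in range(3):
        psana_set_dgram({"seq": i})
    _dgram_queue.put(_SENTINEL)

producer = threading.Thread(target=cpp_side)
producer.start()
seen = [evt["seq"] for evt in events()]
producer.join()
```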
Option 2:
Python is low performance. Maybe replace all of it with multiprocess zmq?
Disadvantages:
- two independent DRPs to maintain
- have to rewrite Ric's TEB/MEB (easier with ZMQ)
o ZMQ has some uncontrollable aspects (e.g. hidden threads)
- also have to rewrite the shmem server
- a lot of work
Advantage:
- looks more like shmem
- standard multiprocess python
Option 3:
kcu1500 -> PGPReader -> N*Worker -> Ric's TEB -> Collector
(TEB/MEB stuff on another node)
each Worker has a bidirectional pipe to a separate psana process
(behaves like shmem again) where dgrams go back-and-forth: psana is
given a "raw" dgram, psana returns a "fex" dgram, e.g. with list-of-photons
Worker <---> psana
Could use zmq? bidirectional pipe? or something similar.
Advantage:
- only have to support 1 DRP
Disadvantage:
- 2 processes with communication (also have that in option 2)
work to be done:
- valerio: drp communicate with multiple python processes
- mona:
o routines to modify dgrams from python
- look at dgramCreate.pyx
https://github.com/slac-lcls/lcls2/blob/master/psana/psana/peakFinder/dgramCreate.pyx
also the test test_py2xtc.py
goal: receive the raw dgram, and return a fex dgram with the raw data
(almost always) removed and put into fex data (how do we do this return?)
o think about calib-fetch scaling (lower-priority)
- ideally independent of MPI. A zmq approach? How do we determine the "supervisor"?
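One MPI-free way to determine a "supervisor" among N identical processes: whoever first binds a well-known local TCP port wins, and the rest connect to it as clients. The port number below is an arbitrary illustrative choice, not anything from the drp.

```python
# Sketch of supervisor election by port binding: the first process to
# bind the agreed-upon port becomes the supervisor; later contenders
# get EADDRINUSE and become clients.

import socket

SUPERVISOR_PORT = 38473  # illustrative, agreed upon out of band

def try_become_supervisor(port=SUPERVISOR_PORT):
    """Return a bound listening socket if we won, else None."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind(("127.0.0.1", port))
        sock.listen()
        return sock          # we are the supervisor
    except OSError:
        sock.close()
        return None          # someone else got there first

first = try_become_supervisor()    # wins the election
second = try_become_supervisor()   # port already taken: loses
```

The winner could then fetch the calibration constants from the database once and serve them to the clients, addressing the calib-fetch scaling concern above.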