The XPP large detector is 4 Mpx at 25 kHz. From Vincent: two multimode MPO-24 fiber bundles per megapixel (8 bundles total) go to the MM-to-SM conversion box. This means 8x24 = 192 fibers (96 pairs), which fills 2 full SM-MM conversion boxes. (Aside: the SparkPix scales similarly with the number of pixels, so it only needs 24 fibers, i.e. 1 MPO-24.)
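A back-of-envelope check of the fiber counts above; the per-box pair capacity is inferred from "96 pairs = 2 full boxes" and is an assumption, not a spec:

```python
# Hypothetical sanity check of the fiber arithmetic quoted above.
MPX = 4                  # XPP detector size in megapixels
BUNDLES_PER_MPX = 2      # multimode MPO-24 bundles per megapixel (from Vincent)
FIBERS_PER_BUNDLE = 24

bundles = MPX * BUNDLES_PER_MPX        # 8 bundles total
fibers = bundles * FIBERS_PER_BUNDLE   # 8 x 24 = 192 fibers
pairs = fibers // 2                    # 96 pairs

PAIRS_PER_BOX = 48   # assumed capacity, inferred from 96 pairs filling 2 boxes
boxes = pairs // PAIRS_PER_BOX         # 2 full SM-MM conversion boxes
```

The same arithmetic gives the SparkPix aside: it scales with pixel count, so a much smaller detector lands at a single MPO-24 bundle.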
For TXI (2 Mpx at 5 kHz), based on the epixhremu study from ric/stefano/valerio/mikhail/mona, there is a good chance we can process it on 20 CPU nodes (each with ~50 cores). The XPP detector has 10x that data volume, and scaling to 200 nodes feels impractical.
Proposed goal: we should target GPUs for this to reduce the node count (ideally to 24 nodes, each taking 1 LR4 multi-color fiber). That is ~8 GB/s into each node, which is difficult with the KCU1500 in our existing nodes, but will hopefully be doable in time for XPP.
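The throughput arithmetic behind the 10x and ~8 GB/s figures, assuming 2 bytes per pixel (an assumption; the actual on-wire encoding may differ):

```python
# Rough bandwidth estimate; BYTES_PER_PIXEL = 2 is an assumed raw encoding.
BYTES_PER_PIXEL = 2
NODES = 24

xpp_rate = 4e6 * 25e3 * BYTES_PER_PIXEL   # 4 Mpx x 25 kHz -> 200 GB/s aggregate
txi_rate = 2e6 * 5e3 * BYTES_PER_PIXEL    # 2 Mpx x 5 kHz  ->  20 GB/s aggregate

ratio = xpp_rate / txi_rate               # 10x the TXI data volume
per_node_gbs = xpp_rate / NODES / 1e9     # ~8.3 GB/s into each of 24 nodes
```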
If 1 LR4 fiber per node isn't doable (i.e. we need more nodes), we should still use LR4 to reduce the fiber count in the BOS, and use an SM-MM box in the SRCF to split each LR4 out into multiple PLR4 fibers.
To-do items:
- test common-mode speed on GPU (Seshu has demonstrated that Mikhail's calibration formula without common-mode runs fast on a GPU)
- meet with XPP scientists to understand which data-reduction algorithms their hutch needs
- if libsz is one of those algorithms, understand its performance on GPU
- benchmark the other data-reduction algorithms on GPU
- consider multiple options for algorithm implementation: CuPy, cuNumeric, custom CUDA kernels
- talk to TID engineers about DMA'ing the KCU1500 data directly to the GPU
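To make the first to-do concrete, here is a minimal sketch of pedestal/gain calibration with an optional per-row common-mode subtraction. The function name, the per-row-median choice of common mode, and the exact formula are illustrative assumptions, not the actual psana calibration code; with CuPy the same array code runs on a GPU by substituting `cupy` for `numpy`:

```python
import numpy as np


def calibrate(raw, pedestal, gain, apply_common_mode=True):
    """Hypothetical calibration sketch: pedestal-subtract, gain-correct,
    and optionally remove a per-row common mode (the row median).
    The real detector calibration differs in detail; this is only meant
    to show the extra work common-mode adds on top of the basic formula.
    """
    # Basic calibration: (raw - pedestal) * gain, in float32.
    img = (raw.astype(np.float32) - pedestal) * gain
    if apply_common_mode:
        # Per-row median as the common-mode estimate; a median is the
        # step whose GPU cost we still need to measure.
        cm = np.median(img, axis=1, keepdims=True)
        img -= cm
    return img
```

The point of benchmarking this on GPU is that the median (or any sort-based common-mode estimator) is much more expensive than the elementwise subtract-and-scale that Seshu already showed to be fast.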