
Task Overview

Goals:

  • 35 kHz, 16 Mpx, 1 TB/s MFX area detectors
  • 50 GB/s into each GPU (SZ/LC compression and calibration, or calibration + ML); avoid the CPU where possible; 5 GB/s output
    • pipeline as much as possible; hopefully there is enough buffering to absorb the software-trigger latency (Ric's TEB) (see the stream sketch after this list)

Approach:

  • use NVIDIA/CUDA for now
    • in the future, consider Kokkos/HIP, etc.
  • Python will be less useful
    • still useful for managing the slow transitions (right now C++ handles the 35 kHz trigger data)
    • not as useful if we can keep the L1Accept path entirely on the GPU
  • NVLink would be nice, but the protocol is likely not open to us
  • hopefully the GPU can launch its own kernels (see the dynamic-parallelism sketch after this list)
  • the CPU would be used for:
    • handling the transitions (Configure, BeginRun, etc.)
    • transmitting/receiving monitoring and trigger info
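
On the "GPU launching its own kernels" point: CUDA dynamic parallelism already allows a kernel to launch another kernel from device code. A minimal sketch of the mechanism only, not of any planned drp-gpu design; the trigger-acceptance logic shown is an assumption for illustration. Build with nvcc -rdc=true (relocatable device code) so the device runtime is linked.

// Minimal sketch of CUDA dynamic parallelism: a parent kernel decides
// on-device whether to launch a child kernel, so the CPU never re-enters
// the per-event path.  Compile with: nvcc -rdc=true dp.cu
#include <cuda_runtime.h>
#include <cstdio>

__global__ void child(int evt)
{
    if (threadIdx.x == 0) printf("child processing event %d\n", evt);
}

__global__ void parent(int evt, int accepted)
{
    // e.g. only spend GPU time on events the software trigger accepted
    if (threadIdx.x == 0 && accepted)
        child<<<1, 32>>>(evt);           // device-side launch
}

int main()
{
    parent<<<1, 1>>>(0, 1);
    cudaDeviceSynchronize();             // waits for parent and child grids
    return 0;
}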

Algorithms:

  • SZ3/LC compression
  • ML algorithms (peak finding, triggering on the TEB?)
  • detector calibration (Gabriel?) (a per-pixel sketch follows this list)
  • validation of reduction algorithms (Stefano)
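
For context on what the calibration stage might look like per pixel, a minimal sketch assuming a simple pedestal-subtraction plus gain-correction model; the kernel, constants, and test values are placeholders, not the actual calibration pipeline.

// Minimal sketch of per-pixel detector calibration on the GPU before
// compression: pedestal subtraction and gain correction (illustrative only).
#include <cuda_runtime.h>
#include <vector>
#include <cstdio>

__global__ void calibrate(const unsigned short* __restrict__ raw,
                          const float* __restrict__ pedestal,
                          const float* __restrict__ gain,
                          float* __restrict__ out, int nPix)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < nPix)
        out[i] = ((float)raw[i] - pedestal[i]) * gain[i];  // per-pixel correction
}

int main()
{
    const int nPix = 1 << 20;                          // small test frame
    std::vector<unsigned short> h_raw(nPix, 1000);
    std::vector<float> h_ped(nPix, 990.f), h_gain(nPix, 1.5f);

    unsigned short* d_raw;  float *d_ped, *d_gain, *d_out;
    cudaMalloc(&d_raw, nPix * sizeof(unsigned short));
    cudaMalloc(&d_ped, nPix * sizeof(float));
    cudaMalloc(&d_gain, nPix * sizeof(float));
    cudaMalloc(&d_out, nPix * sizeof(float));
    cudaMemcpy(d_raw, h_raw.data(), nPix * sizeof(unsigned short), cudaMemcpyHostToDevice);
    cudaMemcpy(d_ped, h_ped.data(), nPix * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_gain, h_gain.data(), nPix * sizeof(float), cudaMemcpyHostToDevice);

    calibrate<<<(nPix + 255) / 256, 256>>>(d_raw, d_ped, d_gain, d_out, nPix);

    float first;
    cudaMemcpy(&first, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    printf("calibrated[0] = %.1f\n", first);           // (1000 - 990) * 1.5 = 15.0
    cudaFree(d_raw); cudaFree(d_ped); cudaFree(d_gain); cudaFree(d_out);
    return 0;
}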

Infrastructure:

  • worry about PCIe performance (PCIe 7)
  • use drp-srcf-gpu[001-004]; CUDA 12
  • datadev driver (TID)
  • GPU-based file writing with GPUDirect (Gabriel?) (see the cuFile sketch after this list)
    • generating correct XTC headers
  • drp-gpu executable (Ric?)
    • need to solve the trigger problem
    • need to solve the monitoring problem
  • move the build to Spack
  • test that GPU-compressed data can be CPU-decompressed by psana (especially SZ3)
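
A minimal sketch of GPU-based file writing with GPUDirect Storage (the cuFile API), which is presumably what the "gpu-direct" file writing above would use: a device buffer is written straight to disk without staging through host memory. The path and payload size are placeholders, and the real writer would also have to emit correct XTC headers. Build with: nvcc gds_write.cu -lcufile

// Minimal GPUDirect Storage sketch: write a GPU buffer directly to a file.
#include <cuda_runtime.h>
#include <cufile.h>
#include <fcntl.h>
#include <unistd.h>
#include <cstring>
#include <cstdio>

int main()
{
    const size_t bytes = 16 << 20;                     // 16 MB example payload
    void* dbuf = nullptr;
    cudaMalloc(&dbuf, bytes);
    cudaMemset(dbuf, 0xab, bytes);

    int fd = open("/tmp/gds_test.xtc2", O_CREAT | O_WRONLY | O_DIRECT, 0664);
    if (fd < 0) { perror("open"); return 1; }

    cuFileDriverOpen();                                // initialize GDS

    CUfileDescr_t descr;
    memset(&descr, 0, sizeof(descr));
    descr.handle.fd = fd;
    descr.type = CU_FILE_HANDLE_TYPE_OPAQUE_FD;
    CUfileHandle_t fh;
    cuFileHandleRegister(&fh, &descr);

    cuFileBufRegister(dbuf, bytes, 0);                 // register the GPU buffer
    ssize_t n = cuFileWrite(fh, dbuf, bytes, 0 /*file offset*/, 0 /*buf offset*/);
    printf("wrote %zd bytes via GPUDirect Storage\n", n);

    cuFileBufDeregister(dbuf);
    cuFileHandleDeregister(fh);
    cuFileDriverClose();
    close(fd);
    cudaFree(dbuf);
    return 0;
}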

Driver

Meeting with Larry, Ryan, Matt, Ric, cpo on Feb. 9, 2024

Purpose: address concerns about LCLS-II-HE large-area detectors.

  • run a different driver (same GitHub repo as datadev): requires "Quadro" capability; the RTX A5000 (in drp-srcf-cmpNNN) seems to have it
  • needs an additional, slightly different DMA engine
  • a write of a "metadata" word triggers the GPU (see the mailbox sketch below)
    • two physical addresses: one is the destination for the data, the other is a mailbox address that the GPU waits on
  • the TID driver uses GPUDirect RDMA to move data directly into GPU memory over PCIe
  • does a CPU need to be involved in the path? Unclear; maybe a little.
  • multiple GPUs per card might need firmware development; currently the mapping is 1-to-1. Matt points out that if a GPU supports ~30 GB/s of SZ compression, that roughly matches what one KCU1500/C1100 can deliver, so maybe 1-to-1 is what we have to use anyway.
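
To make the mailbox idea concrete, a minimal sketch of a persistent kernel spinning on the metadata word. The addresses, the 0xbeef value, and the host-side simulation of the FPGA write are assumptions for illustration, not the datadev/TID interface; in the real system both addresses would be GPU memory exposed to the card via GPUDirect RDMA.

// Minimal sketch: a GPU kernel waits on a "metadata" mailbox word and starts
// processing the DMA'd data without CPU involvement.
#include <cuda_runtime.h>
#include <cstdio>

__global__ void wait_and_process(volatile unsigned int* mailbox,
                                 const unsigned short* data, int nPix)
{
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        while (*mailbox == 0) { }          // spin until the metadata word arrives
        printf("metadata word = 0x%x, first pixel = %u\n",
               *mailbox, (unsigned)data[0]);
        *mailbox = 0;                      // re-arm for the next event
    }
}

int main()
{
    const int nPix = 1024;
    unsigned short* d_data;
    cudaMalloc(&d_data, nPix * sizeof(unsigned short));
    cudaMemset(d_data, 0, nPix * sizeof(unsigned short));

    // Mailbox in mapped pinned host memory here only so the host can play the
    // role of the FPGA; the real mailbox would be a GPU address.
    unsigned int *h_mailbox, *d_mailbox;
    cudaHostAlloc((void**)&h_mailbox, sizeof(unsigned int), cudaHostAllocMapped);
    *h_mailbox = 0;
    cudaHostGetDevicePointer((void**)&d_mailbox, (void*)h_mailbox, 0);

    wait_and_process<<<1, 32>>>(d_mailbox, d_data, nPix);

    // "FPGA" has DMA'd the data; now it writes the metadata word.
    *h_mailbox = 0xbeef;

    cudaDeviceSynchronize();
    cudaFreeHost(h_mailbox);
    cudaFree(d_data);
    return 0;
}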

  • have to disable something in the Linux kernel (IOMMU / PCIe peer-to-peer related) that otherwise blocks the direct PCIe transfers

  • include the path to the NVIDIA drivers in the Makefile

https://docs.nvidia.com/cuda/gpudirect-rdma/index.html

http://mug.mvapich.cse.ohio-state.edu/static/media/mug/presentations/2016/MUG16_GPU_tutorial_V5.pdf

  • Ryan/Larry will resurrect the driver
  • cpo will try to install a KCU in our GPU nodes
  • cpo will get an account number

  • research NVLink: direct to GPU, avoids PCIe; is the protocol still closed to us?

Jana writes that Ryan/Larry can charge 19330-44523 for a few months.

Nvidia Meeting

Diagram of the system from a conversation with Quincey Koziol and Rebanta Mitra on March 20, 2024.

