Task Overview

Goals:

  • 35 kHz, 16 Mpx, 1 TB/s MFX area detectors
  • 50 GB/s into each GPU (SZ/LC compression and calibration, or calibration + ML), avoiding the CPU where possible; 5 GB/s output
    • "pipeline" as much as possible, with hopefully enough buffering to absorb the software-trigger latency (Ric's TEB); see the sketch after this list

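A minimal sketch of the intended per-GPU flow, assuming frames arrive in pinned host buffers (e.g. filled by the datadev DMA engine) and a placeholder reduce_frame kernel stands in for the real calibration + SZ/LC steps; two CUDA streams double-buffer the upload, on-GPU reduction, and the small result download. Buffer sizes and kernel names are assumptions, not the real DRP interfaces.

```cuda
// Double-buffered per-GPU pipeline sketch (not the real DRP code).
#include <cuda_runtime.h>
#include <cstdint>
#include <cstring>
#include <cstdio>

constexpr size_t NPIX      = 16ull * 1024 * 1024;      // ~16 Mpx frame (assumed)
constexpr size_t RAW_BYTES = NPIX * sizeof(uint16_t);  // ~32 MB raw
constexpr size_t OUT_BYTES = NPIX / 10;                // ~10x reduction target (assumed)
constexpr int    NBUF      = 2;                        // double buffering

// Placeholder for the real calibration + compression kernels.
__global__ void reduce_frame(const uint16_t* raw, uint8_t* out, size_t nout)
{
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < nout) out[i] = static_cast<uint8_t>(raw[i * 10] >> 8);
}

int main()
{
    uint16_t*    d_raw[NBUF];
    uint8_t*     d_out[NBUF];
    uint8_t*     h_out[NBUF];
    uint16_t*    h_raw;                 // stand-in for the datadev DMA frame buffer
    cudaStream_t stream[NBUF];

    cudaMallocHost(&h_raw, RAW_BYTES);  // pinned, so async copies can overlap kernels
    memset(h_raw, 0, RAW_BYTES);
    for (int b = 0; b < NBUF; ++b) {
        cudaMalloc(&d_raw[b], RAW_BYTES);
        cudaMalloc(&d_out[b], OUT_BYTES);
        cudaMallocHost(&h_out[b], OUT_BYTES);
        cudaStreamCreate(&stream[b]);
    }

    for (int frame = 0; frame < 100; ++frame) {
        int b = frame % NBUF;              // ping-pong between buffer pairs
        cudaStreamSynchronize(stream[b]);  // wait until buffer b is free again
        cudaMemcpyAsync(d_raw[b], h_raw, RAW_BYTES, cudaMemcpyHostToDevice, stream[b]);
        reduce_frame<<<(OUT_BYTES + 255) / 256, 256, 0, stream[b]>>>(d_raw[b], d_out[b], OUT_BYTES);
        cudaMemcpyAsync(h_out[b], d_out[b], OUT_BYTES, cudaMemcpyDeviceToHost, stream[b]);
    }
    for (int b = 0; b < NBUF; ++b) cudaStreamSynchronize(stream[b]);
    printf("processed 100 frames\n");
    return 0;
}
```
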
Approach:

  • use NVIDIA/CUDA for now
    • in the future, consider Kokkos/HIP, etc.
  • Python will be less useful
    • still useful for managing the slow transitions (right now C++ handles the 35 kHz trigger data)
    • not as useful if we can keep the L1Accept path entirely on the GPU
  • NVLink would be nice, but the protocol likely isn't open to us
  • hopefully the GPU can launch its own kernels (see the dynamic-parallelism sketch after this list)
  • the CPU would be used for:
    • handling the transitions (Configure, BeginRun, etc.)
    • monitoring and trigger-info transmit/receive
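
As a concreteness check for the "GPU can launch its own kernels" item, here is a minimal CUDA dynamic-parallelism sketch: a single device-side orchestrator thread launches the per-event kernels without returning to the CPU. The kernel names and work are placeholders; this needs -rdc=true, and CUDA 12 uses the CDP2 semantics.

```cuda
// Device-side kernel launching sketch (CUDA dynamic parallelism).
// Build with:  nvcc -rdc=true -lcudadevrt cdp_sketch.cu
#include <cuda_runtime.h>
#include <cstdio>

__global__ void calibrate(float* frame, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) frame[i] -= 100.0f;               // stand-in for pedestal subtraction
}

__global__ void compress(const float* frame, int n)
{
    (void)frame; (void)n;                        // stand-in for SZ/LC compression
}

// One device thread launches the pipeline steps; launches into the same
// device-side stream by the same thread execute in order.
__global__ void orchestrate(float* frame, int n)
{
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        calibrate<<<(n + 255) / 256, 256>>>(frame, n);
        compress <<<(n + 255) / 256, 256>>>(frame, n);
    }
}

int main()
{
    const int n = 1 << 20;
    float* d_frame;
    cudaMalloc(&d_frame, n * sizeof(float));
    cudaMemset(d_frame, 0, n * sizeof(float));
    orchestrate<<<1, 1>>>(d_frame, n);
    cudaDeviceSynchronize();
    printf("device-side launches: %s\n", cudaGetErrorString(cudaGetLastError()));
    cudaFree(d_frame);
    return 0;
}
```

Whether dynamic parallelism, CUDA graphs with device-side launch, or a persistent kernel is the right mechanism here is still an open question.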

Algorithms:

  • SZ3/LC compression
  • ML algorithms (peak finding, triggering on the TEB?)
  • detector calibration (Gabriel?); see the calibration kernel sketch after this list
  • validation of the reduction algorithms (Stefano)
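
A minimal per-pixel calibration sketch (pedestal subtraction and gain correction), the kind of step that would run on the GPU ahead of compression. The per-pixel constant layout is an assumption, not the actual detector calibration scheme.

```cuda
#include <cuda_runtime.h>
#include <cstdint>

// Pedestal subtraction and gain correction, one thread per pixel.
__global__ void calibrate(const uint16_t* __restrict__ raw,
                          const float*    __restrict__ pedestal,
                          const float*    __restrict__ gain,
                          float*          __restrict__ calib,
                          size_t npix)
{
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < npix)
        calib[i] = (static_cast<float>(raw[i]) - pedestal[i]) * gain[i];
}

// Host-side launch helper; all pointers are device pointers.
void launch_calibrate(const uint16_t* raw, const float* pedestal,
                      const float* gain, float* calib,
                      size_t npix, cudaStream_t stream)
{
    const int block = 256;
    const int grid  = static_cast<int>((npix + block - 1) / block);
    calibrate<<<grid, block, 0, stream>>>(raw, pedestal, gain, calib, npix);
}
```
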

Infrastructure:

  • worry about PCIe performance (PCIe 7)
  • use drp-srcf-gpu[001-004], CUDA 12
  • datadev driver (TID)
  • GPU-based file writing with GPUDirect (Gabriel?); see the cuFile sketch after this list
    • generating correct XTC headers
  • drp-gpu executable (Ric?)
    • need to solve the trigger problem
    • need to solve the monitoring problem
  • move to Spack
  • test that GPU-compressed data can be decompressed on the CPU by psana (especially for SZ3)
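
For the GPUDirect file-writing item, a minimal cuFile (GPUDirect Storage) write sketch: a device-resident buffer (e.g. compressed events whose XTC headers were already formed on the GPU) is written to disk without a bounce through CPU memory. The output path is a placeholder and error handling is abbreviated.

```cuda
// GPUDirect Storage write sketch.  Build with:  nvcc gds_write.cu -lcufile
#include <cufile.h>
#include <cuda_runtime.h>
#include <fcntl.h>
#include <unistd.h>
#include <cstring>
#include <cstdio>

int main()
{
    const size_t nbytes = 64 << 20;               // 64 MB example buffer
    void* d_buf = nullptr;
    cudaMalloc(&d_buf, nbytes);
    cudaMemset(d_buf, 0, nbytes);                 // pretend this holds reduced events

    cuFileDriverOpen();                           // initialize the GDS driver

    int fd = open("/path/to/output.xtc2",         // placeholder path on a GDS-capable FS
                  O_CREAT | O_WRONLY | O_DIRECT, 0644);
    if (fd < 0) { perror("open"); return 1; }

    CUfileDescr_t descr;
    CUfileHandle_t handle;
    memset(&descr, 0, sizeof(descr));
    descr.handle.fd = fd;
    descr.type = CU_FILE_HANDLE_TYPE_OPAQUE_FD;
    CUfileError_t st = cuFileHandleRegister(&handle, &descr);
    if (st.err != CU_FILE_SUCCESS) { fprintf(stderr, "handle register failed\n"); return 1; }

    cuFileBufRegister(d_buf, nbytes, 0);          // optional; avoids per-I/O registration

    ssize_t written = cuFileWrite(handle, d_buf, nbytes, /*file_offset=*/0, /*devPtr_offset=*/0);
    printf("cuFileWrite returned %zd bytes\n", written);

    cuFileBufDeregister(d_buf);
    cuFileHandleDeregister(handle);
    close(fd);
    cuFileDriverClose();
    cudaFree(d_buf);
    return 0;
}
```
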

Driver

Meeting with Larry, Ryan, Matt, Ric, cpo on Feb. 9, 2024

...

Diagram of the system, from a conversation with Quincey Koziol and Rebanta Mitra on March 20, 2024

Task Overview

Goals:

  • 35 kHz
  • 1 TB/s area detectors
  • 50 GB/s per GPU, avoiding the CPU where possible

Algorithms:

  • SZ/LC compression
  • ML algorithms (peak finding, triggering on the TEB?)
  • detector calibration
  • validation of reduction (Stefano)

Infrastructure:

Update about putting SLAC's FPGAs on the NVLink bus, from someone at NVIDIA who's close to the NVLink work:

  • It's not possible today.
  • Or better said, it would be very hard today. :-)
  • It could be possible to connect them to the Arm C2C link, which speaks the standard Arm CHI protocol.
  • NVLink is a multiplanar network. You would need to connect all of the FPGAs to all 18 planes of the network because the GPU does an address-based spray across the planes.

In that direction, here's info about NVLink-C2C, which I believe is what he was referring to: https://www.nvidia.com/en-us/data-center/nvlink-c2c/ . This quote from that page seems relevant:

"Supports Arm's AMBA CHI (Coherent Hub Interface) or Compute Express Link (CXL) industry standard protocols for interoperability between devices."

GPU Direct Storage

a.k.a. GDS. Supported by Weka; the GDS config-file changes are described in the troubleshooting guide: https://docs.nvidia.com/gpudirect-storage/troubleshooting-guide/index.html#gds-config-file-changes
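
A quick probe for whether the cuFile driver initializes on a given node, which can be handy before digging into cufile.json or Weka mount configuration via the troubleshooting guide above; this only checks driver initialization, not end-to-end GDS throughput.

```cuda
// cuFile/GDS availability probe.  Build with:  nvcc gds_probe.cu -lcufile
#include <cufile.h>
#include <cstdio>

int main()
{
    CUfileError_t st = cuFileDriverOpen();
    if (st.err == CU_FILE_SUCCESS) {
        printf("cuFile driver opened: GDS should be usable on this node\n");
        cuFileDriverClose();
        return 0;
    }
    printf("cuFileDriverOpen failed (err=%d); check cufile.json and the GDS install\n",
           (int)st.err);
    return 1;
}
```
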

datadev Driver Meeting With TID

On May 1, 2024

Slides from Ryan and Jeremy: https://docs.google.com/presentation/d/1yJ-WIs73lon9LzBYxIhKKyNvoYWAq8OUNDiN_TOD2Pw/edit?usp=sharing

Useful docs and links