Debugging session with Jeremy, Ryan, Gabriel, CPO and Ric
  - test_dma.cu was failing on rdsrv403 because the frame was too big for the buffer
    - Change PrbsTx.PacketLength from 0xfff to 0xff
    - test_dma then works on rdsrv403
    - Ryan found this from a value of 0x4 in either AxiGpuAsyncCore.WriteAxiErrorVal or ReadAxiErrorVal
  - We modified test_dma.cu to write 0xdeadbeef to the AxiPcieCore.AxiVersion.scratchpad register
    - We saw 0xdeadbeef appear in the GUI
    - So GPU-to-KCU writes seem to be working
      - Some worry remains: we don't know whether the path to the KCU goes through the CPU
  - We modified test_dma.cu to replace the spin on the handshake location with a getchar() so we can dump the data after triggering the read
    - The data is all zeros rather than junk or random data
    - This explains why the spin never returns
    - AxiGpuAsyncCore.ReadAxiErrorVal shows an error value of 0x3 after attempting to transfer one frame
    - PrbsTx.Busy is 0
    - (Didn't catch the meaning of the error code other than that the transfer failed)
  - Jeremy determined that gpu001 uses the closed-source NVIDIA driver rather than the open-source one installed on rdsrv403
    - He'll set us up with the open-source driver
  - Ryan points out that the two GPU cards are different
    - CPO will lend them gpu002's A5000 to try in rdsrv403
    - rdsrv403 appears to have only one root complex, with a number of hubs and bridges, unlike our nodes
  - If the problem is the root complex, it's not clear that we can rearrange the cards in our nodes onto the same root complex, given slot and space constraints
    - CPO suggests moving to the HSD box in Lab 3 in that case because it has a much larger PCIe bus
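The handshake finding above (the dumped data is all zeros, so the spin never returns) suggests bounding the wait instead of spinning forever. Below is a minimal Python sketch of that polling logic, assuming the handshake is a 32-bit little-endian word at a known offset in the receive buffer. `wait_for_handshake`, `is_all_zero`, `timeout_s` and `poll_s` are hypothetical names; the real test_dma.cu spins on GPU memory in CUDA, and this only models the control flow.

```python
import struct
import time


def wait_for_handshake(buf, offset=0, timeout_s=1.0, poll_s=0.001):
    """Poll a 32-bit little-endian handshake word until it becomes non-zero.

    Hypothetical helper: returns the word's value, or None if the timeout
    expires, so the caller can dump the buffer instead of hanging.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        (word,) = struct.unpack_from("<I", buf, offset)
        if word != 0:
            return word
        time.sleep(poll_s)
    return None


def is_all_zero(buf):
    """True if every byte in the buffer is zero (the symptom seen in the dump)."""
    return not any(buf)
```

Returning None on timeout lets the test report a failure and inspect the buffer, which is what the getchar() change was achieving by hand.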

June 24, 2024

  • two separate free-lists: one for CPU, one for GPU
  • LCLS would like a GPU-enabled TDet firmware (the timing system triggers a fake payload of user-defined size)
    • generates fake payloads
    • Ryan said that TID needs to come up with a block diagram for this
  • For a real detector: LCLS would like multiple detector lanes (8 for the KCU) built by the Batching EB in the kcu1500 firmware
    • this is separate from the BEB on the FEB, which joins timing to data
    • a "partial" event builder (detectors still split)
    • this is currently done for epixHR (we think the old pre-existing timing stream is disabled in the KCU BEB)
    • Mudit could modify the existing epixHR kcu1500 firmware, eliminating the timing system and expanding to 8 data lanes
    • could we eliminate the front-end BEB to avoid a level of tail iteration? Ryan thinks we might even be able to avoid the KCU tail iteration somehow
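The two separate free-lists noted at the top of the June 24 list could look something like the minimal Python sketch below. It assumes buffers are identified by integer indices and that a request for one kind never falls back to the other; `DualFreeList` and its methods are hypothetical, not an existing API.

```python
from collections import deque


class DualFreeList:
    """Hypothetical buffer pool with one free-list for CPU and one for GPU buffers."""

    def __init__(self, cpu_buffers, gpu_buffers):
        self._free = {"cpu": deque(cpu_buffers), "gpu": deque(gpu_buffers)}

    def acquire(self, kind):
        """Take a free buffer of the given kind ("cpu" or "gpu"), or None if that list is empty."""
        free = self._free[kind]
        return free.popleft() if free else None

    def release(self, kind, buf):
        """Return a buffer to the free-list it came from."""
        self._free[kind].append(buf)

    def available(self, kind):
        """Number of free buffers of the given kind."""
        return len(self._free[kind])
```

Keeping the lists separate means running out of GPU buffers shows up as an empty GPU list rather than silently consuming CPU buffers.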

Example of batching:

FE segment a batch: t1,d1a,d1b event-built into "f1a"
FE segment b batch: t1,d1c,d1d event-built into "f1b"

KCU batch: f1a,f1b (static size for HR, UHR, but variable for sparkPIX)

We will always get a packet from sparkPIX, even for an empty payload
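The batching example above can be modeled as a two-level build. Below is a minimal Python sketch, assuming each FE segment combines one timing record with its data parts into a frame and the KCU then batches the FE frames for the same event. `Frame`, `build_fe_frame` and `build_kcu_batch` are hypothetical names, and frame sizes (static for HR/UHR, variable for sparkPIX) are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    """Hypothetical model of an event-built frame: a name plus the parts it holds."""
    name: str
    parts: list = field(default_factory=list)


def build_fe_frame(name, timing, data):
    """FE segment: event-build one timing record with its data into a frame."""
    return Frame(name, [timing, *data])


def build_kcu_batch(frames):
    """KCU: batch the FE frames for the same event into one batch."""
    return Frame("kcu", list(frames))


# The example from the notes: two FE segments, then one KCU batch.
f1a = build_fe_frame("f1a", "t1", ["d1a", "d1b"])
f1b = build_fe_frame("f1b", "t1", ["d1c", "d1d"])
batch = build_kcu_batch([f1a, f1b])
```

In this model, an FE frame built from an empty data list still contains its timing record, which is one way to represent sparkPIX always sending a packet even when the payload is empty.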