June 17, 2024

With Ryan, Larry, Mudit, Ric, Gabriel, cpo

...

Debugging session with Jeremy, Ryan, Gabriel, CPO and Ric
  - The issue with test_dma.cu failing on rdsrv403 is that the frame is too big for the buffer
    - Change PrbsTx.PacketLength to 0xff from 0xfff
    - test_dma then works on rdsrv403
    - Ryan found this because either AxiGpuAsyncCore.WriteAxiErrorVal or AxiGpuAsyncCore.ReadAxiErrorVal held the value 0x4
  - We modified test_dma.cu to write the AxiPcieCore.AxiVersion.scratchpad register with 0xdeadbeef
    - We saw 0xdeadbeef appear in the GUI
    - So GPU to KCU writes seem to be working
      - There is some worry that the path to the KCU might actually be going through the CPU
  - We modified test_dma.cu to replace the spin on the handshake location with a getchar() so we can dump after we trigger the read
    - We see that all the data is zeros rather than junk or random data
    - This explains why the spin never returns
    - AxiGpuAsyncCore.ReadAxiErrorVal shows an error value of 0x3 after attempting to transfer one frame
    - PrbsTx.Busy is 0
    - (Didn't catch the meaning of the error code other than that the transfer failed)
  - Jeremy determined that on gpu001 we're using the closed source nvidia driver rather than the open source one installed on rdsrv403
    - He'll set us up with the open source driver
  - Ryan points out that the two GPU cards are different
    - CPO will lend them gpu002's A5000 to try in rdsrv403
    - rdsrv403 looks to have only 1 root complex with a number of hubs and bridges, different from our nodes'
  - If the problem is the root complex, it's not clear that we can rearrange the cards in our nodes to be on the same root complex due to slots and space constraints
    - CPO suggests moving to the HSD box in Lab 3 in that case because it has a much larger PCIe bus

June 24, 2024

We learned that there will be two separate free-lists: one for CPU, one for GPU

Two firmware requests:

  • LCLS would like a GPU-enabled TDet firmware (timing system triggers a fake-payload of user-defined size)
    • generates fake payloads 
    • Ryan said that TID needs to come up with a block diagram for this
  • For real detector: LCLS would like multiple detector lanes (8 for the kcu) built by the Batching EB in the kcu1500 firmware
    • this is separate from the BEB on the FEB, which joins timing to data
    • a "partial" event builder (detector data is still split across KCUs)
    • this is currently done for epixHR (we think the old pre-existing timing-stream is disabled in the KCU BEB)
    • Mudit could modify the existing epixHR kcu1500 firmware, eliminating the timing system and expanding to 8 data lanes
    • could we eliminate the front-end BEB to avoid a level of tail-iteration?  Ryan thinks we might even be able to avoid the kcu tail-iteration somehow

There are two different uses of the word "batching": batching-event-builder (which cpo thinks of as being just an event builder) and batching in the sense of giving the KCU one "wakeup" call for multiple events (a "batch").

Example of two-level batching: event builder (BEB) in the FEB and KCU:

FEB "a" event-build for event 1 with two detector segments a,b and a timestamp t1: t1,d1a,d1b event-built into "f1a"
FEB "b" event-build for event 1 with two detector segments c,d and a timestamp t1: t1,d1c,d1d event-built into "f1b"

KCU batch event-build for event 1: f1a,f1b (static size for HR, UHR, but variable for sparkPIX).  Note that this is a partial event-build because other detector segments will be connected to other KCUs.  Existing LCLS software does that high-level event-build.

Note: we will always get a packet from sparkPIX, even with an empty payload

July 1, 2024

  • we think the GPU can write to the KCU, but Jeremy will try to confirm that this is really the case.  we think KCU-to-GPU is broken.
  • Mudit will work on TDet firmware
  • Jeremy and TID will use rdsrv419
  • LCLS should find another machine where we can control the root-complex topology better.  lab3 daq-tst-dev06?
    • could be that the CPU datadev driver is incompatible with the GPU datadev driver
    • should probably find a different machine.  a fee-alcove machine?
  • chris and gabriel on vacation July 6-20
  • Jeremy unavailable July 8-10