June 17, 2024

With Ryan, Larry, Mudit, Ric, Gabriel, cpo

...

Example of two-level batching event builder (BEB) in the FEB and KCU:

FEB segment a, batch "a": event-build for event 1 with two detector segments a, b and a timestamp t1: t1, d1a, d1b event-built into "f1a"
FEB segment b, batch "b": event-build for event 1 with two detector segments c, d and a timestamp t1: t1, d1c, d1d event-built into "f1b"

KCU batch event-build for event 1: f1a,f1b (static size for HR, UHR, but variable for sparkPIX).  Note that this is a partial event-build because other detector segments will be connected to other KCUs.  Existing LCLS software does that high-level event-build.

Note: we will always get a packet from sparkPIX, even when the payload is empty
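The two-level example above can be sketched in Python as follows. This is only an illustration of the data flow, not the firmware or the LCLS software: the class and function names (`FebBatch`, `KcuBatch`, `feb_event_build`, `kcu_event_build`) are made up for this sketch, and timestamps and detector data are represented by plain strings.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class FebBatch:
    """First-level (FEB) batch: one timestamp plus its detector segments."""
    name: str
    timestamp: str
    segments: Tuple[str, ...]


@dataclass(frozen=True)
class KcuBatch:
    """Second-level (KCU) batch: FEB batches that share one timestamp."""
    timestamp: str
    feb_batches: Tuple[FebBatch, ...]


def feb_event_build(name: str, timestamp: str, segments: Sequence[str]) -> FebBatch:
    # Hypothetical helper: bundle the detector segments seen by one FEB.
    return FebBatch(name, timestamp, tuple(segments))


def kcu_event_build(feb_batches: Sequence[FebBatch]) -> KcuBatch:
    # Hypothetical helper: partial event-build over the FEBs on one KCU.
    # All FEB batches must carry the same timestamp to belong to one event.
    if not feb_batches:
        raise ValueError("no FEB batches to build")
    timestamps = {b.timestamp for b in feb_batches}
    if len(timestamps) != 1:
        raise ValueError(f"timestamp mismatch: {sorted(timestamps)}")
    return KcuBatch(timestamps.pop(), tuple(feb_batches))


# Event 1 from the notes: t1,d1a,d1b -> f1a and t1,d1c,d1d -> f1b
f1a = feb_event_build("f1a", "t1", ["d1a", "d1b"])
f1b = feb_event_build("f1b", "t1", ["d1c", "d1d"])
k1 = kcu_event_build([f1a, f1b])
```

The result `k1` is still a partial event: segments connected to other KCUs are combined later by the existing high-level event builder.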

July 1, 2024

  • we think gpu can write to kcu, but Jeremy will try to confirm that this is really the case.  we think kcu to gpu is broken.
  • Mudit will work on TDet firmware
  • Jeremy and TID will use rdsrv419
  • LCLS should find another machine where we can control the root-complex topology better.  lab3 daq-tst-dev06?
    • could be that the CPU datadev driver is incompatible with the GPU datadev driver
    • should probably find a different machine.  a fee-alcove machine?
  • chris and gabriel on vacation July 6-20
  • Jeremy unavailable July 8-10

July 8, 2024

  • Jeremy and CPO are on vacation
  • Chris and I have set up daq-tst-dev06 in Lab 3 for testing with a GPU & KCU
    • The KCU appears as /dev/datagpu_1
    • After some debugging to get the software to recognize /dev/datagpu_1 (as opposed to _0) the interCardGui comes up and shows sensible values
    • test_dma also runs, but the AxiVersion.ScratchPad register does not go from 0x0 to 0xdeadbeef
      • Neither does test_dma see any DMAs
  • Larry urges moving axi-pcie-devel forward to v4.1.0 rather than working with v4.0.0
    • Mudit has created v4.2.0 (CPU/GPU DMA switch?) but it has not been tested with hardware yet
    • It is unclear whether Jeremy has gotten a baseline working system yet
  • Ryan agreed that my hacks of test_dma.cu and _Root.py to target /dev/datagpu_1 should be sufficient
  • Ryan would like us all to work on one machine and get confidence in it before we branch out to different machines
  • I suggested that we could pull out the current /dev/datagpu_0 from dev06 so that the KCU of interest becomes _0
    • Ryan & Larry suggested pulling the other ones out as well
  • Ryan suggests setting iommu=no in the BIOS as well as on the command line
    • I later found that there is no iommu parameter in dev06's BIOS
  • There's no experience with multiple KCU cards and the datagpu driver - it is untested
    • Larry is aware of some issue with the usual datadev driver and multiple KCUs
  • Work on rdsrv419
    • A GPU is installed
    • CUDA is installed
    • A KCU is installed and has PCIe slot 03:00

July 15, 2024

  • CPO and Ryan on vacation
  • Let's hold off on contacting NVIDIA about the GPUDirect issues until Chris gets back
  • Ric to look into the BIOS settings again to see if we can glean something

  • Jeremy is finding that when he executes KCU register reads with various methods, he always gets back the Version value from register offset 0

    • So far, he's baffled
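One way to make the symptom above concrete is a small check that reads several offsets and reports which ones return the same value as offset 0. The sketch below is hardware-independent: `read_reg` is a stand-in for whatever read method is under test (it is not an existing driver or library call), and the offsets and values in the fake devices are invented for the demonstration.

```python
from typing import Callable, Iterable, List


def find_aliased_offsets(read_reg: Callable[[int], int],
                         offsets: Iterable[int]) -> List[int]:
    """Return the non-zero offsets whose read value equals the value at offset 0.

    A register that legitimately holds the same value as offset 0 would be
    reported too, so treat the result as a hint rather than proof.
    """
    base = read_reg(0)
    return [off for off in offsets if off != 0 and read_reg(off) == base]


# Fake devices for illustration only (made-up offsets and contents).
healthy_regs = {0x000: 0x04000000, 0x004: 0x00000000, 0x008: 0x12345678}


def healthy_read(offset: int) -> int:
    # Returns a distinct value per offset, as a working readout would.
    return healthy_regs[offset]


def broken_read(offset: int) -> int:
    # Mimics the reported symptom: every read returns the offset-0 value.
    return healthy_regs[0x000]


offsets = sorted(healthy_regs)
aliased_healthy = find_aliased_offsets(healthy_read, offsets)
aliased_broken = find_aliased_offsets(broken_read, offsets)
```

If every tested offset is reported as aliased, with more than one read method, the address is probably being dropped somewhere along the path rather than in any single software layer.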