The axi-pcie-devel package comes with a test application to verify KCU/GPU interaction. It consists of the interCardGui application and the test_dma program. These are used with the InterCardTest firmware, which should be loaded on the KCU. The interCardGui brings up a devGui-like GUI that gives access to the registers exposed by the InterCardTest firmware on the KCU.
Running the test
After building and installing the datagpu driver using comp_and_load_drivers.sh (in the aes-stream-drivers package), go to the gpu directory in an axi-pcie-devel check-out to launch the interCardGui:
    cd ~/git/axi-pcie-devel/software/gpu/
    ../scripts/interCardGui
At one point, launching it from another directory did not work; this may since have been fixed. The GUI connects to the /dev/datagpu_0 device by default, but another may be selected using the --dev option. The following GUI should come up:
Depending on the state of the system, it may be desirable to reset the firmware state by opening the AxiPcieCore.AxiVersion blocks and clicking on UserRst to clear out any previous state:
From another prompt, start the test_dma program:
    cd ~/git/axi-pcie-devel/software/gpu/
    sudo ./bin/test_dma
This program also interacts with /dev/datagpu_0 by default. Another device may be selected using the -d option. The program will print some initialization output and then pause:
    (rogue_v6.1.3) claus@drp-srcf-gpu001:gpu$ sudo ./bin/test_dma
    [sudo] password for claus:
    Total devices 1
    Selected device: NVIDIA RTX A5000
    Global memory: 24026 MB
    64-bit Memory Address support
    Setting write pointer: 0x7f9885600000 - 65536
    Setting read pointer: 0x7f9885610000 - 65536
    Done with pointers
    Mapping FPGA registers
    swFpgaRegs = 0x7f98a36bd000
    Enabling IO memory for FPGA registers
    Mapping write start register
    Mapping read start register
    Mapped FPGA registers
    Create stream write memory
    Trigger write
    Wait memory value
At this point it is useful to click Read All at the bottom of the interCardGui to verify that the various registers look reasonable. The dmesg program also shows some output from the datagpu driver that might be of interest.
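As a rough aid for this check, the sketch below lists the datagpu device nodes and filters the kernel log for driver messages. The function names are hypothetical, and matching on the string "datagpu" is an assumption about how the driver tags its messages; adjust the pattern if the driver logs under a different prefix.

```shell
# List datagpu device nodes (assumes the /dev/datagpu_N naming used above).
# Returns non-zero when no node is present.
list_datagpu_devices() {
    found=1
    for dev in /dev/datagpu_*; do
        # An unmatched glob is left as a literal string, so check existence.
        if [ -e "$dev" ]; then
            echo "$dev"
            found=0
        fi
    done
    return "$found"
}

# Keep only log lines that mention the driver (assumed tag: "datagpu").
filter_datagpu() {
    grep -i 'datagpu' || true
}

list_datagpu_devices || echo "no /dev/datagpu_* nodes found; is the driver loaded?"

# Typical use (dmesg may need sudo where kernel.dmesg_restrict is set):
#   sudo dmesg | filter_datagpu | tail -n 20
```

If no device node is listed, the driver is probably not loaded, and neither interCardGui nor test_dma will be able to open the device.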
To trigger a DMA sequence, click on the OneShot Exec button in the PrbsTx block of the interCardGui. This should result in additional printout from test_dma:
    Done waiting
    Stream Sync
    Context sync
    data: 0 0x 2000000 - 0x 1
    data: 1 0x 2000 - 0x 0
    data: 2 0x 0 - 0x 0
    data: 3 0x 0 - 0x 0
    data: 4 0x 0 - 0x 0
    data: 5 0x 0 - 0x 0
    data: 6 0x 0 - 0x 0
    data: 7 0x 0 - 0x 0
    data: 8 0x 1 - 0x ff
    data: 9 0x 0 - 0x 0
    ...
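Since most of the data rows are zero, it can be convenient to show only the rows where something was written. The sketch below is a plain text filter over saved test_dma output; it assumes the "data: N 0x A - 0x B" layout shown above and makes no claim about what the two columns mean. The function name and the log file name are hypothetical.

```shell
# Print only "data:" rows where either hex column is non-zero.
# Fields: $1="data:", $2=index, $4=first value, $7=second value.
nonzero_data() {
    awk '$1 == "data:" && ($4 != "0" || $7 != "0")'
}

# Typical use, after saving the program output to a file:
#   sudo ./bin/test_dma | tee test_dma.log
#   nonzero_data < test_dma.log
```

For the printout above, this would keep rows 0, 1 and 8 and drop the all-zero rows.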
Click on Read All in interCardGui again to update the register values. The AxiGpuAsyncCore block shows some statistics that may be of interest. Here it indicates that both the write and read DMAs completed successfully:
Problems seen
When we started working with the InterCardTest firmware, we found that the DMAs worked on some machines but never occurred on others. The machines that worked are the ones used by the TID development group (rdsrv419, rdsrv415, etc.), which run Ubuntu 22.04. On the LCLS nodes drp-srcf-gpu001 and daq-tst-dev06, the test failed. These hosts run RHEL7 with kernel 3.10.0-1160.
Further, when we modified the test_dma.cu source code to write to the AxiPcieCore.AxiVersion.ScratchPad register, the write succeeded on drp-srcf-gpu001 but failed on daq-tst-dev06.
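When comparing working and failing hosts, it helps to record the same few facts on each. The sketch below prints the host name, OS release, kernel version and, if nvidia-smi is installed, the GPU model and NVIDIA driver version. The function name is hypothetical, and nothing here implies that any of these differences is the cause of the failures.

```shell
# One-line host description plus GPU/driver info, for side-by-side comparison.
host_summary() {
    os="unknown"
    if [ -r /etc/os-release ]; then
        # Source in a subshell so the variables do not leak into this shell.
        os=$(. /etc/os-release && echo "${PRETTY_NAME:-unknown}")
    fi
    echo "$(uname -n): $os, kernel $(uname -r)"
    # nvidia-smi is only present where the NVIDIA driver is installed.
    if command -v nvidia-smi >/dev/null 2>&1; then
        nvidia-smi --query-gpu=name,driver_version --format=csv,noheader || true
    fi
}

host_summary
```

Running this on each host (for example one of the rdsrv machines and drp-srcf-gpu001) gives a compact record to attach to a problem report.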