...
From a meeting with Dan Damiani, Jana, Valerio, cpo on Jan 19, 2023. A Zoom recording is here: https://pswww.slac.stanford.edu/swdoc/tutorials/jungfrau.mp4
KCU1500
For the 16M integration into the LCLS-II DAQ. See the discussion with Larry here: https://slac.slack.com/archives/C5SEZCQD6/p1709091378312859
Repo with the KCU1500 firmware: https://github.com/slaclab/lcls2-udp-pcie-apps/tree/main
Pictures
A 0.5M module in the detector lab
...
- the numbers shown for each 0.5M module above are the serial numbers
- read out on two nodes (daq-cxi-jungfrau01 and 02) because there is too much data for one
- 40Gb NIC in each machine, set up as four 10Gb interfaces. The MPO cables are broken out into LC connectors very near the detector.
- 4 segment-level processes (2 per node, one per quad) to allow more CPU parallelization
- the "CxiDs1/0/Jungfrau/0" -S 0,2,8 flags in the .cnf give the "parent" detector id and the modules handled by this segment (-S 0,2 means modules 0,1 and -S 2,2 means modules 2,3)
- the segments are intercepted in the DSS nodes, which put the 4 pieces together into one CxiDs1/0/Jungfrau/0 detector in the final .cnf
- pdsapp/tools/JungfrauSegBuilder.cc does this. It is included in Recorder.cc and acts on both configure and l1accept. FrameCacheIter holds the pieces before they are memcopied onto the end
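The segment assembly described above can be pictured with the small Python model below. It is not the pdsapp C++ code. It assumes that `-S start,count,total` means "this segment carries `count` modules starting at module `start`". That reading matches the examples above, but treating the third value as the total module count is an assumption. The names `parse_segment_flag` and `SegmentBuilder` are hypothetical. Pieces are cached per event, in the spirit of FrameCacheIter, and joined in module order once every module has arrived.

```python
from __future__ import annotations

from dataclasses import dataclass, field


def parse_segment_flag(flag: str) -> tuple[int, list[int]]:
    """Parse a '-S start,count,total' value (third field assumed = total modules).

    Returns (total_modules, module indices carried by this segment).
    """
    start, count, total = (int(x) for x in flag.split(","))
    if start + count > total:
        raise ValueError(f"segment {flag!r} exceeds {total} modules")
    return total, list(range(start, start + count))


@dataclass
class SegmentBuilder:
    """Caches per-event pieces until all modules are present, then joins them."""
    total_modules: int
    # event id -> {first module index -> (module indices, payload bytes)}
    cache: dict = field(default_factory=dict)

    def add(self, event_id: int, modules: list[int], payload: bytes):
        pieces = self.cache.setdefault(event_id, {})
        pieces[modules[0]] = (modules, payload)
        have = sum(len(mods) for mods, _ in pieces.values())
        if have < self.total_modules:
            return None  # still waiting for other segments of this event
        del self.cache[event_id]
        # Append the pieces one after another, ordered by first module index.
        return b"".join(payload for _, (_, payload) in sorted(pieces.items()))


segments = [parse_segment_flag(f) for f in ("0,2,8", "2,2,8", "4,2,8", "6,2,8")]
builder = SegmentBuilder(total_modules=8)
# Feed the four segments of event 1 out of order; only the last add completes it.
for _, mods in reversed(segments):
    frame = builder.add(1, mods, bytes(mods))
print(frame)  # b'\x00\x01\x02\x03\x04\x05\x06\x07'
```

In this model a segment's position in the output comes from its `-S` start index, not from arrival order, so the DSS side does not need the pieces to arrive in any particular sequence.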
Design of KCU1500 Firmware for LCLS2
A conversation with Larry Ruckman on Slack on Feb. 29, 2024
Link is here: https://slac.slack.com/archives/C5SEZCQD6/p1709091378312859
After talking with @ddamiani I think it would be most useful to have a batching-event-builder to join together the udp packets coming in on the various lanes. We also need the firmware timestamping done in the kcu1500, since we can’t do it at the camera.
...
1. 1GbE or 10GbE for each fiber optic lane?
2. Only 1 UDP port per fiber optic lane or multiple UDP ports? If multiple, how do you want to address potential
3. Will the KCU1500 be a UDP server or UDP client?
4. Does this KCU1500 send fiber triggers?
5. Does the KCU1500 need to do bi-directional communication to configure sensor(s)? Or is it only a "listener" of streaming data? If not, how is the configuration done?
6. From the statement "udp packets coming in on the various lanes", are we only batching 1 optic lane w/ event building (1 event builder per KCU1500 fiber optic data lane), or do we need to batch all UDP lanes (up to 6 on the KCU1500) into a single event builder (1 event builder per KCU1500)?
7. To confirm: point-to-point and no ETH switch between the KCU1500 and sensor(s)?
8. Is there only 1 UDP frame per DAQ trigger per fiber optic lane?
9. Default IP/MAC addresses and default UDP port that you want the KCU1500 to use for receiving data?
10. What's the name of the sensor generating the data? I want to match the Github repo name with it.
11. What's the max. number of UDP lanes that this KCU1500 needs to support?
12. LCLS-I timing only, LCLS-II only, or both? If LCLS-I timing only (max. 120 Hz triggering), why not do this in software with a COTS NIC card in the same PC as the TPR?
Those are all good questions @ruckman. I will talk with @ddamiani and get back to you with answers today.
1. Each lane is 10GbE.
2. One port per lane.
3. Just receives packets.
4. No, the detector is triggered by TTL from a TPR.
5. The KCU is only a listener. The configuration of the detector is done over a separate 1GbE copper interface.
7. No switch in between.
8. 128 frames per DAQ trigger. These are the packets that need to be collected together to make up the detector data.
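To make "collect 128 packets per trigger" concrete, the Python model below accumulates the UDP payloads from one lane and emits a single frame once 128 have arrived. This is a sketch under several assumptions: packets arrive complete, in a fixed order, and without loss, and a 0.5M module is 1024×512 pixels at 2 bytes/pixel, which would give 8192-byte payloads. The class name `LaneGatherer` and the payload-size check are hypothetical, not the firmware design.

```python
PACKETS_PER_TRIGGER = 128
# Assumed module geometry: 1024 x 512 pixels, 2 bytes per pixel.
FRAME_BYTES = 1024 * 512 * 2
PAYLOAD_BYTES = FRAME_BYTES // PACKETS_PER_TRIGGER  # 8192 under these assumptions


class LaneGatherer:
    """Collects the UDP payloads from one lane into one frame per trigger."""

    def __init__(self, packets_per_trigger=PACKETS_PER_TRIGGER,
                 payload_bytes=PAYLOAD_BYTES):
        self.packets_per_trigger = packets_per_trigger
        self.payload_bytes = payload_bytes
        self.pending = []

    def push(self, payload: bytes):
        """Add one packet payload; return the assembled frame when complete."""
        if len(payload) != self.payload_bytes:
            raise ValueError(f"unexpected payload size {len(payload)}")
        self.pending.append(payload)
        if len(self.pending) < self.packets_per_trigger:
            return None
        frame, self.pending = b"".join(self.pending), []
        return frame


gatherer = LaneGatherer()
results = (gatherer.push(bytes(PAYLOAD_BYTES)) for _ in range(256))
frames = [f for f in results if f is not None]
print(len(frames), len(frames[0]))  # 2 1048576
```

Counting packets like this relies on a fixed packet count and no loss. A real pre-processor would also have to decide what to do with a short or dropped frame.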
Chris, each lane is a separate module, so each lane can be treated more or less separately. What building in the KCU do we need to do across lanes?
Thank you, Dan, that’s very useful. Given that, it feels like we need a batching-event-builder on the kcu1500 that event-builds the udp packets AND a timing packet (we will plug Matt’s timing fiber into the KCU). The batching event-builder in this case is unusual: we need 128 udp packets per trigger. Maybe we’ll need to discuss what is best for that? 128 could be “hardcoded”, I think.
9. Dan said that you can set the MAC address and IP address in the kcu1500 to whatever you want. He can program the camera to send to “anything”. He can also program the camera-side MAC/IP to anything that would help you.
10. “Jungfrau”
11. We think we would like to have 7 UDP lanes and 1 timing lane.
12. Only LCLS-II timing.
I think that answers all your (very useful) questions. Let us know if more questions arise.
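The batching event builder described in this exchange combines one timing packet with the per-lane frames for each trigger. One way to picture it is a FIFO per input (the timing lane plus each of the 7 UDP lanes), with an event emitted only when every FIFO has an entry at its head. In that arrangement the n-th timing message is paired with the n-th frame from every lane. This is a minimal Python model that assumes nothing is dropped and each trigger produces exactly one entry per input. The name `EventBuilder` and the dictionary output are hypothetical, and the real firmware batcher may match on other criteria than arrival order.

```python
from collections import deque


class EventBuilder:
    """Pairs one timing message with one frame from each data lane, in arrival order."""

    def __init__(self, n_lanes: int):
        self.timing = deque()
        self.lanes = [deque() for _ in range(n_lanes)]

    def push_timing(self, msg):
        self.timing.append(msg)
        return self._drain()

    def push_frame(self, lane: int, frame):
        self.lanes[lane].append(frame)
        return self._drain()

    def _drain(self):
        """Emit every event for which all inputs have data waiting."""
        events = []
        while self.timing and all(self.lanes):  # an empty deque is falsy
            events.append({
                "timing": self.timing.popleft(),
                "frames": [q.popleft() for q in self.lanes],
            })
        return events


eb = EventBuilder(n_lanes=7)
for lane in range(7):
    eb.push_frame(lane, f"frame-{lane}")
events = eb.push_timing("pulse-0")
print(len(events), events[0]["frames"][6])  # 1 frame-6
```

Each lane's input here is assumed to be an already-assembled frame, which matches the idea of a separate pre-processor per lane feeding a simpler data-plus-timing batcher.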
13) The bandwidth of the KCU1500 is ~48Gb/s for moving data on the PCIe bus. If you have 7 UDP lanes into a single KCU1500, that is potentially more bandwidth than the PCIe bus can move. What's your mitigation strategy?
14) How do you plan to assert backpressure from the KCU1500 to the TPR for stopping DAQ triggers?
15) I don't think the FW event batcher will make timing if we have 128 different UDP frames routed to it. Can I use a different batcher to pre-process the 128 UDP frames into a "single" frame that feeds into the batcher that combines the data and timing?
15) Yes, that should be fine.
Hi Larry,
13) For the foreseeable future the detector trigger rate will be 120Hz. So 7 lanes give 0.5Mpixel*2bytes/pixel*120Hz*7=840MB/s, which should be good. Some day in the distant future, when the trigger rate increases, the traffic will be spread out over more UDP fibers and KCU cards. Note that the camera currently has 32 UDP fibers, so there would be 4 nodes with 7 fibers and 1 node with 4 fibers.
14) The TPR will subscribe to a DAQ readout group. The timing link on the KCU card will assert backpressure to the XPM generating the readout group triggers, causing the jungfrau triggers to stop when we cross the usual buffer “high water mark” in the KCU.
15) I agree with Dan that your idea is a good one: pre-processing the 128 frames into one feels like a reasonable solution.
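The bandwidth numbers in question 13 and in this reply, and the high-water-mark idea in 14, can be checked with a short calculation. The Python sketch below does four things:

- It compares the worst-case line rate of 7 × 10 Gb/s lanes with the ~48 Gb/s PCIe figure.
- It reproduces the 840 MB/s estimate at 120 Hz, using 0.5e6 pixels per module as in the reply.
- It splits 32 fibers into nodes of at most 7.
- It models backpressure as a single occupancy threshold.

The helper names, the threshold value, and the buffer model are assumptions for illustration, not the KCU1500 firmware.

```python
import math

PCIE_GBPS = 48      # approximate KCU1500 PCIe throughput quoted above
LANE_GBPS = 10      # each UDP lane is 10GbE
LANES_PER_KCU = 7


def data_rate_bytes_per_s(pixels, bytes_per_pixel, rate_hz, modules):
    """Detector payload rate, ignoring packet headers."""
    return pixels * bytes_per_pixel * rate_hz * modules


def split_fibers(total_fibers, per_node):
    """Return the number of fibers handled by each node."""
    nodes = math.ceil(total_fibers / per_node)
    return [min(per_node, total_fibers - i * per_node) for i in range(nodes)]


class HighWaterMark:
    """Asserts backpressure while buffer occupancy is at or above a threshold."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.occupancy = 0

    def update(self, added=0, removed=0):
        self.occupancy += added - removed
        return self.occupancy >= self.threshold  # True -> XPM should pause triggers


worst_case_gbps = LANE_GBPS * LANES_PER_KCU
print(worst_case_gbps > PCIE_GBPS)                 # True (70 > 48)
rate = data_rate_bytes_per_s(0.5e6, 2, 120, LANES_PER_KCU)
print(rate / 1e6, rate * 8 / 1e9)                  # 840.0 6.72
print(split_fibers(32, LANES_PER_KCU))             # [7, 7, 7, 7, 4]
hwm = HighWaterMark(threshold=3)
print([hwm.update(added=1) for _ in range(4)])     # [False, False, True, True]
```

The calculation shows why both sides are right. The raw line rate of 7 lanes (70 Gb/s) exceeds the PCIe budget, but the actual payload at 120 Hz is about 6.7 Gb/s, far below it. The threshold model then only has to cover bursts.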
One other thought: @ddamiani points out that the 128 UDP packets show up in a fixed but unnatural order. I can think of three options to get the data in a natural order:
(1) have a programmable register that allows us to specify the desired fixed packet order out of the pre-processor
(2) have the firmware “spy” on the UDP packet content to determine the order. The order is encoded in a header, but this feels to me like it would be more awkward than (1) for firmware
(3) have software do the sorting
I would (perhaps selfishly) vote for (1). What do you think?
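Option (1) amounts to a lookup table. A programmable register array gives, for each arrival slot, the position that packet should occupy in the output frame. The Python model below assumes the arrival order is the same for every trigger. The name `reorder_frame` and the example permutation (a simple reversal) are hypothetical, since the actual Jungfrau packet order is not given here.

```python
def reorder_frame(packets, order_register):
    """Place packets into natural order using a programmable order table.

    order_register[i] is the output slot for the i-th packet to arrive.
    """
    n = len(order_register)
    if sorted(order_register) != list(range(n)):
        raise ValueError("order register must be a permutation of 0..n-1")
    if len(packets) != n:
        raise ValueError(f"expected {n} packets, got {len(packets)}")
    out = [None] * n
    for arrival_slot, packet in enumerate(packets):
        out[order_register[arrival_slot]] = packet
    return b"".join(out)


# Hypothetical example: packets arrive in the reverse of their natural order.
N = 128
register = [N - 1 - i for i in range(N)]
arrived = [bytes([N - 1 - i]) for i in range(N)]   # packet k carries byte k
print(reorder_frame(arrived, register) == bytes(range(N)))  # True
```

In this model the table is checked once and each packet then costs one table lookup and one indexed write. That is the main appeal of (1) over parsing header fields per packet as in (2).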