Bring up environment variables
The prototype is located in a chamber with external cooling.
Carrier 1 contains a single functioning ASIC. Once the setup is brought up, the current consumption reads 954 mA. When that ASIC is enabled, the current consumption rises to 1.435 A.
The temperature of the ASIC, measured during operation with a 100 Hz trigger, was 27 °C.
Carrier 2 has all 4 ASICs functional. Once the setup is brought up with this carrier, the current consumption also reads 954 mA. Once the carrier is powered up, the current reaches 2.164 A, and once the ASICs are configured, it reaches 2.68 A. The temperature seems to reach 54 °C.
The laser images were generated for all 4 ASICs as follows
First laser light ePixHrM platform
The status of the component tests is shown in the following table
Module | Description | Simulation | Test in hardware |
---|---|---|---|
RegisterControlDualClock | Tested AXI lite reads and writes, and waveform generation | Tested | Tested |
TrigControlAxi | Software and hardware trigger | Tested | Tested at 250 MHz |
AXiStreamRepeater | |||
DigitalAsicStreamAxiV2 | Generated data and sent to software | Tested | |
AxiStreamBatcherEventBuilder | Generated data and sent to software | Tested | |
AxiLiteSaciMaster | Read all values from ASIC | Tested | |
AppClk | All logic using clocks seem to work | OK | OK |
AppDeser | |||
PwrCtrl | Enable and disable power | Tested | Partially tested |
DAC - Max5443 | Changes output in software and probed | Tested 0x0/0xffff → 0/3V | |
DAC - DacWaveformGenAxi | apply and measure on board | Tested 0x0/0xfffff → 0/2.5V | |
Slow ADCs | Work in progress. P&CB ADC work, but digital board ADC not responding. | ||
Fast ADCs | |||
Oscope | |||
TimingRx | |||
Chip scope pro | Trigger on read of AXI lite | JTAG and virtual cable tested | |
SI5345 Jitter cleaner | Wrote new csv files and testpoint | Tested | |
PROM | Wrote to PROM | Tested | |
JTAG | write bitstreams | - | tested |
Ref Clock | AXI clock working | tested | |
ASIC lanes | ASIC U3 (2) is outputting data on lane 1. Debugging. | ||
Serial number | Carrier serial number still does not work | ||
DAC measurements at C559 on digital board
Applied digital value | Measured value (V) |
---|---|
0x1ffff | 0.32 |
0x2ffff | 0.477 |
0x3ffff | 0.633 |
0x4ffff | 0.789 |
0x5ffff | 0.946 |
0x6ffff | 1.102 |
0x7ffff | 1.259 |
0x8ffff | 1.415 |
0x9ffff | 1.572 |
0xaffff | 1.728 |
0xbffff | 1.884 |
0xcffff | 2.041 |
0xdffff | 2.197 |
0xeffff | 2.352 |
0xfffff | 2.497 |
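As a quick check of the table above, the measured values can be compared with an ideal linear transfer function. This is a minimal sketch that assumes the DAC probed at C559 is the 20-bit one whose range is listed as 0x0/0xfffff → 0/2.5V in the status table; the names `ideal_volts` and `MEASURED` are only illustrative.

```python
# Measured DAC output at C559, copied from the table above: (code, volts).
MEASURED = [
    (0x1FFFF, 0.32), (0x2FFFF, 0.477), (0x3FFFF, 0.633), (0x4FFFF, 0.789),
    (0x5FFFF, 0.946), (0x6FFFF, 1.102), (0x7FFFF, 1.259), (0x8FFFF, 1.415),
    (0x9FFFF, 1.572), (0xAFFFF, 1.728), (0xBFFFF, 1.884), (0xCFFFF, 2.041),
    (0xDFFFF, 2.197), (0xEFFFF, 2.352), (0xFFFFF, 2.497),
]

# Assumption: full-scale code 0xFFFFF corresponds to 2.5 V and code 0 to 0 V.
FULL_SCALE_CODE = 0xFFFFF
FULL_SCALE_VOLTS = 2.5


def ideal_volts(code):
    """Ideal output of a linear DAC for the given code."""
    return code / FULL_SCALE_CODE * FULL_SCALE_VOLTS


# Deviation of each measurement from the ideal line, in volts.
deviations = [(code, volts - ideal_volts(code)) for code, volts in MEASURED]
worst_code, worst_dev = max(deviations, key=lambda item: abs(item[1]))

for code, dev in deviations:
    print(f"0x{code:05x}: {dev * 1000:+.1f} mV")
print(f"largest deviation at 0x{worst_code:05x}")
```

Under these assumptions every point stays within about 10 mV of the ideal line.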
SN testing
Probed R7 on the carrier board and verified that the voltage on the DS2411 is as expected (2.5 V). I was also able to observe the input signal coming from the FPGA. It seems the FPGA is either not interpreting the response correctly or not reading it: the ILA shows that the signal inside the FPGA is dead.
After swapping the pin assignments of serialNumber[2] and serialNumber[0], the digital serial number stopped working too, which means that the problem is somewhere outside of the FPGA.
Slow ADC testing
The digital board ADC is not responding. The first time the FPGA tries to interact with the ADC, I can see some signals on the digital output and on the CLK input of the ADC. After that, the FPGA waits indefinitely for a reply from the ADC. Both ADC voltages were checked and verified to be as expected.
I would ask Lupe to solder more wires for testing, then probe the ADC and make sure the other pins are as expected.
Increased the digital and analog voltages from 2.5 V to 2.65 V. Characterizing the lanes again. In the meantime, the serial number does not seem to work either. Something seems to be wrong on the path from the FPGA to the carrier board.
ASIC | Functioning lanes (automatic calibration) |
---|---|
0 | All (locked 0xffffff) |
1 | 2 lanes unlocked (0x20001) |
2 | 5 lanes unlocked (0xd90000) |
3 | 2 lanes unlocked (0x010100) |
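The masks in the table above can be decoded into lane numbers. This is a minimal sketch assuming that bit n corresponds to lane n and that, for ASICs 1 to 3, a set bit marks an unlocked lane (which matches the lane counts in the table); `lanes_from_mask` is a hypothetical helper, not part of the existing software.

```python
def lanes_from_mask(mask, n_lanes=24):
    """Return the lane indices whose bit is set in a 24-bit lane mask."""
    return [lane for lane in range(n_lanes) if (mask >> lane) & 1]


# Masks reported for ASICs 1 to 3 in the table above.
UNLOCKED_MASKS = {1: 0x20001, 2: 0xD90000, 3: 0x010100}

for asic, mask in UNLOCKED_MASKS.items():
    lanes = lanes_from_mask(mask)
    print(f"ASIC {asic}: {len(lanes)} lanes flagged: {lanes}")
```

For ASIC 0 the value 0xffffff has all 24 bits set, which the table reports as all lanes locked, so the meaning of a set bit differs between that row and the others.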
Carrier 3 has all 4 ASICs functional. Once the setup is brought up with this carrier, the current consumption reads 965 mA. Once the carrier is powered up, the current reaches 2.29 A, and once the ASICs are configured, it reaches 2.808 A. The temperature seems to reach 35.7 °C. When the system is triggered, the total current consumption goes above 3 A (3.030 A).
Set EQUALIZATION to EQ_LEVEL1 on all data lanes from the ASICs in an attempt to improve lane locking, but no difference was observed.
set_property -dict {IOSTANDARD LVDS DIFF_TERM_ADV TERM_100 DQS_BIAS TRUE EQUALIZATION EQ_LEVEL1} [get_ports {asicDataP[*][*]}]
set_property -dict {IOSTANDARD LVDS DIFF_TERM_ADV TERM_100 DQS_BIAS TRUE EQUALIZATION EQ_LEVEL1} [get_ports {asicDataN[*][*]}]
U1 (ASIC0) All lanes active
U2 (ASIC1) 1 lane not locked
U3 (ASIC2) 4 lanes not locked
U4 (ASIC3) all lanes locked
After fixing the descrambling by inverting the reading order inside the ADC (top to bottom, right to left)
U1 (ASIC0)
U2 (ASIC1)
U3 (ASIC2)
U4 (ASIC3)
Occasionally, interface errors happen and nothing can be written or read. The reason is not clear.
1689356297.917865:pyrogue.axi.AxiStreamDma: AxiStreamDma::acceptReq: Timeout waiting for outbound buffer after 1.0 seconds! May be caused by outbound back pressure.
1689356298.918923:pyrogue.axi.AxiStreamDma: AxiStreamDma::acceptReq: Timeout waiting for outbound buffer after 1.0 seconds! May be caused by outbound back pressure.
1689356299.919966:pyrogue.axi.AxiStreamDma: AxiStreamDma::acceptReq: Timeout waiting for outbound buffer after 1.0 seconds! May be caused by outbound back pressure.
1689356300.921032:pyrogue.axi.AxiStreamDma: AxiStreamDma::acceptReq: Timeout waiting for outbound buffer after 1.0 seconds! May be caused by outbound back pressure.
1689356301.922090:pyrogue.axi.AxiStreamDma: AxiStreamDma::acceptReq: Timeout waiting for outbound buffer after 1.0 seconds! May be caused by outbound back pressure.
1689356302.923134:pyrogue.axi.AxiStreamDma: AxiStreamDma::acceptReq: Timeout waiting for outbound buffer after 1.0 seconds! May be caused by outbound back pressure.
1689356303.924196:pyrogue.axi.AxiStreamDma: AxiStreamDma::acceptReq: Timeout waiting for outbound buffer after 1.0 seconds! May be caused by outbound back pressure.
1689356304.925257:pyrogue.axi.AxiStreamDma: AxiStreamDma::acceptReq: Timeout waiting for outbound buffer after 1.0 seconds! May be caused by outbound back pressure.
1689356305.926327:pyrogue.axi.AxiStreamDma: AxiStreamDma::acceptReq: Timeout waiting for outbound buffer after 1.0 seconds! May be caused by outbound back pressure.
1689356306.927388:pyrogue.axi.AxiStreamDma: AxiStreamDma::acceptReq: Timeout waiting for outbound buffer after 1.0 seconds! May be caused by outbound back pressure.
1689356307.928423:pyrogue.axi.AxiStreamDma: AxiStreamDma::acceptReq: Timeout waiting for outbound buffer after 1.0 seconds! May be caused by outbound back pressure.
1689356308.929486:pyrogue.axi.AxiStreamDma: AxiStreamDma::acceptReq: Timeout waiting for outbound buffer after 1.0 seconds! May be caused by outbound back pressure.
1689356309.929565:pyrogue.axi.AxiStreamDma: AxiStreamDma::acceptReq: Timeout waiting for outbound buffer after 1.0 seconds! May be caused by outbound back pressure.
1689356310.930659:pyrogue.axi.AxiStreamDma: AxiStreamDma::acceptReq: Timeout waiting for outbound buffer after 1.0 seconds! May be caused by outbound back pressure.
Ben believes that the blowoff is creating corrupted frames, and that the software unbatcher then segfaults while trying to parse them. He needs a dump to debug.
This issue happens when I send a lot of frames and the software cannot handle them. I configured the hardware to transmit 15000 frames at 5000 frames/second. It seems the software cannot keep up.
1690836274.390990:pyrogue.batcher.CoreV1: Not enough space (131648) for frame (147504)
1690836274.390995:pyrogue.batcher.CoreV1: Not enough space (146144) for frame (147504)
1690836274.390998:pyrogue.batcher.CoreV1: Not enough space (146144) for frame (147504)
1690836274.391000:pyrogue.batcher.CoreV1: Not enough space (146144) for frame (147504)
1690836275.476448:pyrogue.batcher.CoreV1: Not enough space (23264) for frame (1240877856)
1690836275.476462:pyrogue.batcher.CoreV1: Not enough space (23264) for frame (1240877856)
1690836275.476466:pyrogue.batcher.CoreV1: Not enough space (23264) for frame (1240877856)
1690836275.476469:pyrogue.batcher.CoreV1: Not enough space (134240) for frame (147504)
1690836275.476473:pyrogue.batcher.CoreV1: Not enough space (134240) for frame (147504)
1690836275.476475:pyrogue.batcher.CoreV1: Not enough space (134240) for frame (147504)
Ryan: That error means that there is an error parsing the frame: the parser needs to process 1240877856 bytes but only has 23264 remaining in the current frame.
Ben: A FIFO is overflowing and truncating frames, corrupting them.
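To separate truncated frames from corrupted size fields, the batcher messages above can be parsed and sorted by requested frame size. This is a rough sketch that only assumes the message format shown in the log; the helper names and the choice of 147504 bytes as the nominal frame size (the value that recurs in the log) are illustrative.

```python
import re

# Frame size that recurs in the log above; treated here as the nominal size.
NOMINAL_FRAME_BYTES = 147504

# Matches one "Not enough space (X) for frame (Y)" message from the log.
PATTERN = re.compile(
    r"(?P<time>\d+\.\d+):pyrogue\.batcher\.CoreV1: "
    r"Not enough space \((?P<space>\d+)\) for frame \((?P<frame>\d+)\)"
)


def parse_batcher_errors(text):
    """Return (timestamp, space_left, frame_size) for each message in text."""
    return [
        (float(m["time"]), int(m["space"]), int(m["frame"]))
        for m in PATTERN.finditer(text)
    ]


def implausible_sizes(entries, nominal=NOMINAL_FRAME_BYTES):
    """Entries whose requested frame size exceeds the nominal frame size."""
    return [entry for entry in entries if entry[2] > nominal]


sample = (
    "1690836274.390990:pyrogue.batcher.CoreV1: Not enough space (131648) for frame (147504) "
    "1690836275.476448:pyrogue.batcher.CoreV1: Not enough space (23264) for frame (1240877856)"
)
entries = parse_batcher_errors(sample)
print(implausible_sizes(entries))
```

Entries with the nominal size but too little space point at truncation, while a size such as 1240877856 points at a corrupted header being read as a length.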
10000 frames @ 1000 FPS caused corruption as well.
Started the investigation at DigitalAsicStreamAxiV2.vhd. Surprisingly, corruption also seems to happen at a much lower rate: 2 frames @ 10 Hz. The overflow counters start counting as early as the arrival of the second frame. It is not yet clear how the second image is arriving at the server.
For each ASIC there are 24 FIFOs (1 per lane), each 512 deep and 19 bits wide, running at 42 MHz (the ASICs read out at 168 MHz). Each lane carries 3072 pixels. Data is read from these FIFOs, a header is appended, and the result is fed into a single FIFO that is 8192 deep and 48 bytes wide, written at 42 MHz and read at 156.25 MHz. An AxiStreamResizer then reduces the width to 16 bytes. These 16 bytes are combined with timing and sent to the core, which also runs at 156.25 MHz. Back pressure is applied at several stages and propagates all the way back to the dual-clock FIFO, whose full flag is used to increment the overflow counter.
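The numbers in this description can be turned into a rough peak-rate budget. The sketch below assumes one word moves per clock cycle at each stage and one pixel per lane-FIFO word; neither assumption is confirmed by the firmware, and all names are illustrative.

```python
# Figures taken from the description above.
LANES_PER_ASIC = 24
PIXELS_PER_LANE = 3072
LANE_FIFO_DEPTH = 512            # words per lane FIFO
LANE_CLK_HZ = 42_000_000         # lane FIFO clock, write clock of the deep FIFO
WIDE_WORD_BYTES = 48             # width of the single deep FIFO
CORE_CLK_HZ = 156_250_000        # read clock of the deep FIFO, core clock
RESIZED_WORD_BYTES = 16          # width after the AxiStreamResizer

pixels_per_frame = LANES_PER_ASIC * PIXELS_PER_LANE

# Assumption: one pixel per FIFO word, so a lane FIFO holds this fraction
# of the pixels a lane delivers for one frame.
lane_fifo_fraction = LANE_FIFO_DEPTH / PIXELS_PER_LANE

# Assumption: one word per clock cycle, so these are peak rates in bytes/s.
deep_fifo_write_rate = WIDE_WORD_BYTES * LANE_CLK_HZ
deep_fifo_read_rate = WIDE_WORD_BYTES * CORE_CLK_HZ
resized_stream_rate = RESIZED_WORD_BYTES * CORE_CLK_HZ


def pixel_rate(frames_per_second):
    """Pixels per second that one ASIC produces at the given frame rate."""
    return pixels_per_frame * frames_per_second


print(f"pixels per frame per ASIC: {pixels_per_frame}")
print(f"lane FIFO holds {lane_fifo_fraction:.3f} of a lane's pixels")
print(f"deep FIFO write peak: {deep_fifo_write_rate / 1e6:.0f} MB/s")
print(f"deep FIFO read peak:  {deep_fifo_read_rate / 1e6:.0f} MB/s")
print(f"resized stream peak:  {resized_stream_rate / 1e6:.0f} MB/s")
print(f"pixel rate at 5000 FPS: {pixel_rate(5000) / 1e6:.2f} Mpixel/s")
```

Under these assumptions the 16-byte stream at 156.25 MHz can carry more than the 48-byte stream written at 42 MHz, so a sustained overflow would have to come from back pressure further downstream rather than from the resizer itself.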
Got to the bottom of this. The number of triggers sent to the ASICs was higher than the number sent to the hardware. As a result, the FIFOs were storing data while the logic was not ready. Fixing the order in software (StrtAutoTrigger function) fixed the issue. Acquisitions of 5000 frames at up to 5000 FPS were done with ASIC3 with no overflow detected.
Next, testing storing data, and all ASICs together.
Continue: When storing data is enabled, data overflow is observed in the FIFOs of the DigAsicStrmRegisters. The back pressure seems to propagate all the way to the beginning (which is reasonable and good), meaning that all the FIFOs are used to their maximum potential. Around 70000 frames can be written to disk before an overflow is detected.
The origin of the horizontal lines denoted in this figure cannot be explained. The first line seems to be flipped with the last.
A series of tests were done to identify if these lines come from the same image or from the previous acquisition.
I performed the following acquisition in sequence
The charge injection image is as follows
The subtraction of 1 and 3 is the following
Bottom line: there does not seem to be any artifact propagating from one image to another, so we will solve the issue by exchanging these lines in the descrambler.
After investigation, seems like
See image below after organizing the lines
This shift does not happen when lanes are disabled and the enumerate feature is activated, meaning that the bug is somewhere before the enumerate assignment and is not in the descrambler.
Next, an end-to-end simulation was done by injecting a fixed pattern in each lane and picking it up in software. The horizontal line shift is evident, but that is because the patch is applied. See images below
Row 142 is the row before the color shifts, and it should be row 143. The same holds for rows 95, 47 and 191. Debugging in simulation.
Once the patch is removed, the images come out without a shift. This rules out the firmware and the descrambler, so the bug has to be in the ASIC.
After discussing with Lorenzo and Dionisio, we did some tests to inject patterns before the 8b10Encoder by setting the ro_mode_i register to 0x1 (vertical strips) and 0x3 (ramp), without the workaround.
The workaround above did not seem reasonable, as the rows are not from the same bank (lane) and the swap had to be within the bank (lane), so our next guess is the following
Shifting all rows by 1 downwards, then setting
In other words, bank down rotation. The final image with a cross laser is as follows
Applying the workaround will corrupt the image coming from any source at or after the 8b10b encoder. Here is an example of setting the ASIC register ro_mode_i to 0x3. The rows that stand out are 96 and 144, which come from 144 and 191 respectively.
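The bank down rotation mentioned above can be illustrated with a small sketch. It assumes banks of 48 consecutive rows, a guess based on the boundary rows 47, 95, 143 and 191 mentioned earlier, and it assumes the last row of each bank wraps around to the top of that bank. The actual row mapping in the descrambler may differ, and `rotate_banks_down` is a hypothetical helper.

```python
# Assumption: each bank covers 48 consecutive image rows.
BANK_ROWS = 48


def rotate_banks_down(rows, bank_rows=BANK_ROWS):
    """Shift every row down by one within its bank; the last row of a bank
    wraps around to become the first row of the same bank."""
    if len(rows) % bank_rows != 0:
        raise ValueError("row count must be a multiple of the bank height")
    rotated = []
    for start in range(0, len(rows), bank_rows):
        bank = rows[start:start + bank_rows]
        rotated.extend(bank[-1:] + bank[:-1])
    return rotated


# Use row indices as stand-ins for image rows to see where each row lands.
source_rows = list(range(192))
rotated_rows = rotate_banks_down(source_rows)
print(rotated_rows[:3], rotated_rows[46:50])
```

Each entry of `rotated_rows` is the index of the source row that ends up at that position, which makes it easy to compare against the rows that stand out in a test pattern.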
The l2si-xpm server configuration is here. Some extra tips:
XPM/EVR: Triggering either follows the XPM Partition/ReadoutGroup selection from the timing input (XPM source), or it is derived separately from the timing input using EVR-type event logic (FixedRate + Destination), which appears in the EvrV2ChannelReg/EvrV2TriggerReg modules.
You can enable loopback in the xpm-server by choosing the link number, then setting the loopback register to 1.
You can test timing by looping back both sides and checking whether the link locks.
Two issues are observed with the LCLS-II timing integration.
For 1, for some reason, the transceiver is not locking. For 2, the state machine that sets RxLinkUp does not seem to function correctly. With a bad link, the transceiver seems to struggle to lock, and the signals do not seem to be set in the expected sequence. However, when an ILA is synthesized to monitor these signals, locking happens every time, and relatively fast.
Memory Type | Total Used | Available | Util% | Inferred% |
---|---|---|---|---|
URAM | 0 | 128 | 0.00 | 0.00 |
BlockRAM | 500.5 | 984 | 50.86 | 100.00 |
RAMB36E2 | 444 | | | 100.00 |
RAMB18E2 | 113 | | | 100.00 |
LUTMs as Distributed RAM | 7868 | 161280 | 4.88 | 100.00 |
LUTMs as RAM32X1D | 704 | | | 100.00 |
LUTMs as RAM32M16 | 4112 | | | 100.00 |
LUTMs as RAM32M | 380 | | | 100.00 |
LUTMs as RAM256X1D | 2672 | | | 100.00 |
URAM usage is 0. URAM size is 288Kb
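For a rough sense of how much buffering the unused URAM could offer, the capacity can be worked out from the report above. This sketch takes 288 Kb as 288 × 1024 bits and uses the 147504-byte frame size from the batcher log as the frame size; treating that value as the size of one stored frame is an assumption.

```python
# Figures from the utilization report and the note above.
URAM_BLOCKS = 128
URAM_BLOCK_KBIT = 288
BRAM_USED = 500.5
BRAM_AVAILABLE = 984

# Assumption: one stored frame is 147504 bytes, the size seen in the batcher log.
FRAME_BYTES = 147504

# 1 Kb = 1024 bits, 8 bits per byte.
uram_bytes = URAM_BLOCKS * URAM_BLOCK_KBIT * 1024 // 8
frames_in_uram = uram_bytes // FRAME_BYTES
bram_util_percent = BRAM_USED / BRAM_AVAILABLE * 100

print(f"total URAM: {uram_bytes} bytes ({uram_bytes / 2**20:.1f} MiB)")
print(f"whole frames that fit in URAM: {frames_in_uram}")
print(f"BlockRAM utilization: {bram_util_percent:.2f} %")
```

The BlockRAM total is consistent with the report: 444 RAMB36E2 plus 113 RAMB18E2 counted as half blocks gives 500.5.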
ASIC 2 lanes 23, 22 and 16 always seem to be dead. Here is an acquisition from ChipScope. From the looks of it, lanes 23 and 22 look abnormal, while lane 16 seems to be sending data. Next: analyse the data.
Talked to Dionisio; the conclusions are
Observations
digital | carrier | image | notes |
---|---|---|---|
C00-02 | Copper strong back John doe (used in ASC) | ASIC 2 lanes 22 and 16 working | |
C00-02 | RX000 | ASIC 2 lanes 23, 22, 21, 16, 13, 9, 5, 1 not working | |
C00-01 | RX000 | ASIC 2 lanes 23, 22 and 16 not working | |
C00-03 | RX001 | ASIC 2 lanes 22, 21, 20, 19, 16, 13, 5, 1 not working | |
C00-01 | Copper strong back John doe (used in ASC) | No thermal pad. no screws. | |
C00-01 | Copper strong back John doe (used in ASC) | No thermal pad. With carrier cover and with screws. | |
C00-01 | RX005 | with thermal pad and applying pressure on screws | |
C00-02 | RX005 | With thermal pad. Same lanes dead. | |
We suspect that the digital board has a problem, most probably the AC coupling capacitors. Next step: test with the prototype in ASC and one of our carriers to see if the lane failure problem persists.
Add this submodule to both projects https://github.com/slaclab/AsicRegMapping
version stuff and reproduce timing lock success at least 3 times: bitstream 6bbaaf4
Make power cable (Julian) - on it
Test all ASICs together at 5000 FPS
Port ADC stuff from TXI
Img descrambling in firmware
Connect transceivers to MM-SM converter
Fix timing: What goes on when RxLinkUp is up?
Make script for Lorenzo
Send digital board to increase analog voltage to 2V0
Fix scrambling: replace first and last lines in each horizontal line and test
High speed acquisition seems to cause corruption. Fifo in firmware seems to get full. See what is going on.
Investigate delays
Fixed descrambling algorithm
Make a Jupyter notebook for Lorenzo
Charge injection seems messed up for columns
Test MM to SM converter boxes