Carrier Tests

Bring up environment variables

The prototype is located in a chamber with external cooling.

Carrier 1

Carrier 1 has a single functioning ASIC. Once the setup is brought up, the current consumption is 954 mA. When that ASIC is enabled, the current consumption rises to 1.435 A.

The temperature of the ASIC was measured during operation with a 100 Hz trigger and was 27 C under these conditions.

Carrier 2

Carrier 2 has all 4 ASICs functional. Once the setup is brought up with this carrier, the current consumption is also 954 mA. Once the carrier is powered up, the consumed current reaches 2.164 A, and once the ASICs are configured, it reaches 2.68 A. The temperature seems to reach 54 degrees C.

Carrier 2 laser tests

The laser images were generated for all 4 ASICs as follows

First laser light ePixHrM platform

Carrier 2 - ASIC0 (U1) - all lanes locked

Carrier 2 - ASIC3 (U4) - one lane not locked

Carrier 2 - ASIC1 (U2) - 2 lanes unlocked. For some reason, the laser light is very faint. Temperature reached 64 degrees.

Carrier 2 - ASIC2 (U3) - 7 lanes unlocked

Digital board 001

The status of the component tests is shown in the following table

Module	Description	Simulation	Test in hardware
RegisterControlDualClock	Tested AXI lite reads and writes, and waveform generation	Tested	Tested
TrigControlAxi	Software and hardware trigger	Tested	Tested at 250 MHz
AXiStreamRepeater
DigitalAsicStreamAxiV2	Generated data and sent to software	Tested
AxiStreamBatcherEventBuilder	Generated data and sent to software	Tested
AxiLiteSaciMaster	Read all values from ASIC		Tested
AppClk	All logic using clocks seem to work	OK	OK
AppDeser
PwrCtrl	Enable and disable power	Tested	Partially tested
DAC - Max5443	Changes output in software and probed		Tested 0x0/0xffff → 0/3V
DAC - DacWaveformGenAxi	apply and measure on board		Tested 0x0/0xfffff → 0/2.5V
Slow ADCs			Work in progress. P&CB ADC work, but digital board ADC not responding.
Fast ADCs
Oscope
TimingRx
Chip scope pro	Trigger on read of AXI lite		JTAG and virtual cable tested
SI5345 Jitter cleaner	Wrote new csv files and testpoint		Tested
PROM	Wrote to PROM		Tested
JTAG	write bitstreams	-	tested
Ref Clock	AXI clock working		tested
ASIC lanes			ASIC U3 (2) is outputting data on lane 1. Debugging.
Serial number			Carrier serial number still does not work

DAC measurements at C559 on digital board

Applied digital value	Measured value (V)
0x1ffff	0.32
0x2ffff	0.477
0x3ffff	0.633
0x4ffff	0.789
0x5ffff	0.946
0x6ffff	1.102
0x7ffff	1.259
0x8ffff	1.415
0x9ffff	1.572
0xaffff	1.728
0xbffff	1.884
0xcffff	2.041
0xdffff	2.197
0xeffff	2.352
0xfffff	2.497
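
The measurements above can be compared with an ideal linear transfer function. The following is a minimal sketch, assuming a 20-bit code range and a 2.5 V full scale (both inferred from the 0x0/0xfffff → 0/2.5V entry in the status table); the function names are illustrative and not part of the project software.

```python
# Assumed ideal DAC: 20-bit code, 2.5 V full scale (inferred from the status table).
FULL_SCALE_V = 2.5
MAX_CODE = 0xFFFFF

# (applied code, measured voltage in V), copied from the table above.
MEASURED = [
    (0x1FFFF, 0.32), (0x2FFFF, 0.477), (0x3FFFF, 0.633), (0x4FFFF, 0.789),
    (0x5FFFF, 0.946), (0x6FFFF, 1.102), (0x7FFFF, 1.259), (0x8FFFF, 1.415),
    (0x9FFFF, 1.572), (0xAFFFF, 1.728), (0xBFFFF, 1.884), (0xCFFFF, 2.041),
    (0xDFFFF, 2.197), (0xEFFFF, 2.352), (0xFFFFF, 2.497),
]


def ideal_voltage(code):
    """Ideal output for a code, assuming a perfectly linear DAC."""
    return FULL_SCALE_V * code / MAX_CODE


def deviations_mv(points):
    """Return (code, measured - ideal) pairs with the difference in mV."""
    return [(code, round((volts - ideal_voltage(code)) * 1000, 1))
            for code, volts in points]


if __name__ == "__main__":
    for code, dev in deviations_mv(MEASURED):
        print(f"0x{code:05x}: ideal {ideal_voltage(code):.3f} V, deviation {dev:+.1f} mV")
```

A roughly constant deviation across the range would point to an offset rather than a gain error; the sketch only tabulates the numbers and draws no conclusion about the cause.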

SN testing

Probed R7 on the carrier board. I was able to verify that the voltage on the DS2411 is as expected (2.5 V), and I was also able to observe the input signal coming from the FPGA. It seems that the FPGA is either not interpreting the result correctly or not reading it. The ILA shows that the signal in the FPGA is dead.

After switching the pin assignment of serialNumber[2] and serialNumber[0], the digital serial number died. This means that the problem is somewhere outside of the FPGA.

Slow ADC testing

The digital board ADC is not responding. The first time the FPGA tries to interact with the ADC, I can see some signals on the digital output and on the CLK input of the ADC. After that, the FPGA waits indefinitely for a reply from the ADC. Both ADC supply voltages were checked and are as expected.

I would ask Lupe to solder more wires for testing, and I would probe the ADC to make sure the other pins are as expected.

Digital board C00-02

Increased digital and analog voltage to 2.65 V instead of 2.5 V. Characterizing lanes again. For the time being, the serial number does not work on this board either. Something seems to be wrong on the path from the FPGA to the carrier board.

ASIC	Functioning lanes (automatic calibration)
0	All (locked 0xffffff)
1	2 lanes unlocked (0x20001)
2	5 lanes unlocked (0xd90000)
3	2 lanes unlocked (0x010100)

Slow ADC testing

Work in progress. P&CB ADC work, but digital board ADC not responding.

Carrier 3 testing w/ digital board 002

Carrier 3 has all 4 ASICs functional. Once the setup is brought up with this carrier, the current consumption is 965 mA. Once the carrier is powered up, the consumed current reaches 2.29 A, and once the ASICs are configured, it reaches 2.808 A. The temperature seems to reach 35.7 C. When the system is triggered, the total current consumption goes above 3 A (3.030 A).

Carrier 3 laser tests automatic lock

ASIC0 (U1) two lanes disabled (0x1002)

ASIC1 (U2) two lanes disabled (0x100001)

ASIC2 (U3) 4 lanes disabled (0xc90000)

ASIC3 (U4) All lanes active
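
The hexadecimal masks in the list above can be decoded into lane numbers. This is a small sketch, assuming that bit n of the mask corresponds to lane n (consistent with 0xc90000 being reported elsewhere on this page as lanes 23, 22, 19 and 16) and that each ASIC has 24 lanes; the helper name is hypothetical.

```python
NUM_LANES = 24  # assumed: 24 lanes per ASIC, one bit per lane


def lanes_in_mask(mask, num_lanes=NUM_LANES):
    """Return the lane numbers whose bit is set in mask (bit 0 = lane 0)."""
    return [lane for lane in range(num_lanes) if (mask >> lane) & 1]


# Disable masks reported for Carrier 3 with automatic lock.
DISABLED = {
    "ASIC0 (U1)": 0x1002,
    "ASIC1 (U2)": 0x100001,
    "ASIC2 (U3)": 0xC90000,
}

for name, mask in DISABLED.items():
    print(f"{name}: mask 0x{mask:06x} -> lanes {lanes_in_mask(mask)}")
```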

ASIC3 DAC test

ASIC3 Charge injection columns 50 to 100

Carrier 3 lane delay eye plots

ASIC0 (all lanes recovered)

After testing with EQ_LEVEL0 discovered:

Lane 1: the lowest error rate is about 1 per second. No delay value achieves zero errors while no trigger is provided. The eye diagram also seems to change when the ASIC is power cycled. Setting the delay to 0 seems to get the lane to lock permanently after a while. However, at high rates (5000 FPS) lane 1 does seem to cause some timeouts, so it will be disabled.
Lane 8: although it is locked and never counts an error, it occasionally times out in DigAsicStrmRegisters0. This lane is disabled in the yml, although some tests were run with it enabled.
The frames of the first second are always lost:
- 2000 frames @ 1000 FPS : 1001 frames arrive
- 1000 frames @ 1000 FPS : 1 frame
- 5000 frames @ 5000 FPS : 0 frames
- 8000 frames @ 2000 FPS : 6000 frames
- 8000 frames @ 1000 FPS : 6991 frames
- 15000 frames @ 5000 FPS : 10483 frames (5784 at the writer; the buffer size is not enough. After talking to Ryan and Ben, it seems that corruption is happening.)
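
The "first second is lost" observation can be checked by converting each shortfall into seconds at the requested rate. The sketch below uses only the numbers from the list above; the variable and function names are illustrative.

```python
# (requested frames, frames per second, frames that arrived), from the list above.
RUNS = [
    (2000, 1000, 1001),
    (1000, 1000, 1),
    (5000, 5000, 0),
    (8000, 2000, 6000),
    (8000, 1000, 6991),
    (15000, 5000, 10483),
]


def lost_seconds(requested, fps, arrived):
    """Duration of acquisition that the missing frames correspond to."""
    return (requested - arrived) / fps


for requested, fps, arrived in RUNS:
    secs = lost_seconds(requested, fps, arrived)
    print(f"{requested} frames @ {fps} FPS: {requested - arrived} lost = {secs:.3f} s")
```

Every run loses the equivalent of roughly one second of triggers, independent of the frame rate, which is what suggests a fixed start-up window rather than a rate-dependent drop.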

Occasionally lanes timed out. The timeout info for a 15000-frame acquisition is:

      DigAsicStrmRegisters0:
        enable: True
        FrameCount: 10483
        FrameSize: 3071
        FrameMaxSize: 3071
        FrameMinSize: 3071
        asicDataReq: 3071
        DisableLane: 0x100
        EnumerateDisLane: 0xffffff
        TimeoutCntLane[0]: 3
        TimeoutCntLane[1]: 98
        TimeoutCntLane[2]: 3
        TimeoutCntLane[3]: 3
        TimeoutCntLane[4]: 0
        TimeoutCntLane[5]: 2
        TimeoutCntLane[6]: 2
        TimeoutCntLane[7]: 2
        TimeoutCntLane[8]: 72
        TimeoutCntLane[9]: 3
        TimeoutCntLane[10]: 3
        TimeoutCntLane[11]: 4
        TimeoutCntLane[12]: 2
        TimeoutCntLane[13]: 3
        TimeoutCntLane[14]: 3
        TimeoutCntLane[15]: 2
        TimeoutCntLane[16]: 25
        TimeoutCntLane[17]: 3
        TimeoutCntLane[18]: 2
        TimeoutCntLane[19]: 3
        TimeoutCntLane[20]: 2
        TimeoutCntLane[21]: 2
        TimeoutCntLane[22]: 3
        TimeoutCntLane[23]: 3
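
To pick out the lanes that stand apart in a dump like the one above, the counters can be parsed and compared against the median. This is a sketch only: the regular expression matches the `TimeoutCntLane[n]: value` lines as printed here, and the "5 times the median" threshold is an arbitrary assumption, not a project criterion.

```python
import re
from statistics import median

LINE_RE = re.compile(r"TimeoutCntLane\[(\d+)\]:\s*(\d+)")

# Counter values for lanes 0..23, copied from the dump above.
COUNTS_15000 = [3, 98, 3, 3, 0, 2, 2, 2, 72, 3, 3, 4,
                2, 3, 3, 2, 25, 3, 2, 3, 2, 2, 3, 3]


def parse_timeouts(dump_text):
    """Extract {lane: timeout count} from a register dump."""
    return {int(lane): int(count) for lane, count in LINE_RE.findall(dump_text)}


def outlier_lanes(counts, factor=5):
    """Lanes whose count exceeds factor times the median count (assumed threshold)."""
    limit = factor * median(counts.values())
    return sorted(lane for lane, count in counts.items() if count > limit)


# Rebuild the dump text so the parser is exercised on the same format.
DUMP = "\n".join(f"        TimeoutCntLane[{i}]: {v}" for i, v in enumerate(COUNTS_15000))

if __name__ == "__main__":
    print(outlier_lanes(parse_timeouts(DUMP)))
```

With the values above this flags lanes 1, 8 and 16, while the remaining lanes share a small common count.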

Testing with EQ_LEVEL1:

Lane 1 still times out and has errors detected.
Lane 8 seems to have improved slightly. No timeouts were observed during the tests. Increasing the delay to 400 seems to have resolved its problems.

ASIC1 (1 lane not recovered - lane 0)

After testing with EQ_LEVEL0 discovered:

Lane 0 dead
The frames of the first second are always lost
- 5000 frames @ 1000 FPS : 4000 frames arrive

EQ_LEVEL1:

Lane 0 still dead

ASIC2 (4 lanes not recovered - lanes 23, 22, 19, 16)

After testing with EQ_LEVEL0 discovered:

Lanes 23, 22, 19, 16 are dead. No delay value seems to bring them back to life.
5000 frames @ 1000 FPS : 3997 frames arrive
Timeouts are observed occasionally on some lanes

      DigAsicStrmRegisters2:
        enable: True
        FrameCount: 8430
        FrameSize: 3071
        FrameMaxSize: 3071
        FrameMinSize: 3071
        asicDataReq: 3071
        DisableLane: 0xc90000
        EnumerateDisLane: 0xffffff
        TimeoutCntLane[0]: 3
        TimeoutCntLane[1]: 3
        TimeoutCntLane[2]: 3
        TimeoutCntLane[3]: 3
        TimeoutCntLane[4]: 3
        TimeoutCntLane[5]: 3
        TimeoutCntLane[6]: 3
        TimeoutCntLane[7]: 3
        TimeoutCntLane[8]: 3
        TimeoutCntLane[9]: 3
        TimeoutCntLane[10]: 3
        TimeoutCntLane[11]: 3
        TimeoutCntLane[12]: 3
        TimeoutCntLane[13]: 3
        TimeoutCntLane[14]: 3
        TimeoutCntLane[15]: 3
        TimeoutCntLane[16]: 0
        TimeoutCntLane[17]: 3
        TimeoutCntLane[18]: 3
        TimeoutCntLane[19]: 0
        TimeoutCntLane[20]: 3
        TimeoutCntLane[21]: 3
        TimeoutCntLane[22]: 0
        TimeoutCntLane[23]: 0

ASIC3 (all lanes ok)

After testing with EQ_LEVEL0 discovered:

Manual delay used, and all lanes seem ok
4000 frames @ 1000 FPS : 3000 frames arrive

Changed the EQUALIZATION setting on all ASIC data lanes to EQ_LEVEL1 in an attempt to improve lane locking, but no difference was observed.

set_property -dict {IOSTANDARD LVDS DIFF_TERM_ADV TERM_100 DQS_BIAS TRUE EQUALIZATION EQ_LEVEL1} [get_ports {asicDataP[*][*]}]

set_property -dict {IOSTANDARD LVDS DIFF_TERM_ADV TERM_100 DQS_BIAS TRUE EQUALIZATION EQ_LEVEL1} [get_ports {asicDataN[*][*]}]

Laser images with the new descrambling

U1 (ASIC0) All lanes active

U2 (ASIC1) 1 lane not locked

U3 (ASIC2) 4 lanes not locked

U4 (ASIC3) all lanes locked

After fixing the descrambling by inverting the read order inside the ADC (top to bottom, right to left)

U1 (ASIC0)

U2 (ASIC1)

U3 (ASIC2)

U4 (ASIC3)

Rogue bugs

Issue 1

Occasionally interface errors happen, and the reason is not clear. When they do, nothing can be written or read.

1689356297.917865:pyrogue.axi.AxiStreamDma: AxiStreamDma::acceptReq: Timeout waiting for outbound buffer after 1.0 seconds! May be caused by outbound back pressure.
1689356298.918923:pyrogue.axi.AxiStreamDma: AxiStreamDma::acceptReq: Timeout waiting for outbound buffer after 1.0 seconds! May be caused by outbound back pressure.
1689356299.919966:pyrogue.axi.AxiStreamDma: AxiStreamDma::acceptReq: Timeout waiting for outbound buffer after 1.0 seconds! May be caused by outbound back pressure.
1689356300.921032:pyrogue.axi.AxiStreamDma: AxiStreamDma::acceptReq: Timeout waiting for outbound buffer after 1.0 seconds! May be caused by outbound back pressure.
1689356301.922090:pyrogue.axi.AxiStreamDma: AxiStreamDma::acceptReq: Timeout waiting for outbound buffer after 1.0 seconds! May be caused by outbound back pressure.
1689356302.923134:pyrogue.axi.AxiStreamDma: AxiStreamDma::acceptReq: Timeout waiting for outbound buffer after 1.0 seconds! May be caused by outbound back pressure.
1689356303.924196:pyrogue.axi.AxiStreamDma: AxiStreamDma::acceptReq: Timeout waiting for outbound buffer after 1.0 seconds! May be caused by outbound back pressure.
1689356304.925257:pyrogue.axi.AxiStreamDma: AxiStreamDma::acceptReq: Timeout waiting for outbound buffer after 1.0 seconds! May be caused by outbound back pressure.
1689356305.926327:pyrogue.axi.AxiStreamDma: AxiStreamDma::acceptReq: Timeout waiting for outbound buffer after 1.0 seconds! May be caused by outbound back pressure.
1689356306.927388:pyrogue.axi.AxiStreamDma: AxiStreamDma::acceptReq: Timeout waiting for outbound buffer after 1.0 seconds! May be caused by outbound back pressure.
1689356307.928423:pyrogue.axi.AxiStreamDma: AxiStreamDma::acceptReq: Timeout waiting for outbound buffer after 1.0 seconds! May be caused by outbound back pressure.
1689356308.929486:pyrogue.axi.AxiStreamDma: AxiStreamDma::acceptReq: Timeout waiting for outbound buffer after 1.0 seconds! May be caused by outbound back pressure.
1689356309.929565:pyrogue.axi.AxiStreamDma: AxiStreamDma::acceptReq: Timeout waiting for outbound buffer after 1.0 seconds! May be caused by outbound back pressure.
1689356310.930659:pyrogue.axi.AxiStreamDma: AxiStreamDma::acceptReq: Timeout waiting for outbound buffer after 1.0 seconds! May be caused by outbound back pressure.

Ben believes that the blowoff is creating corrupted frames that the software unbatcher then segfaults while trying to parse. He needs a dump to debug.

Issue 2

This issue happens when I send a lot of frames and the software cannot handle them. I configured the hardware to transmit 15000 frames at 5000 frames/second, and it seems the software cannot keep up.

1690836274.390990:pyrogue.batcher.CoreV1: Not enough space (131648) for frame (147504)
1690836274.390995:pyrogue.batcher.CoreV1: Not enough space (146144) for frame (147504)
1690836274.390998:pyrogue.batcher.CoreV1: Not enough space (146144) for frame (147504)
1690836274.391000:pyrogue.batcher.CoreV1: Not enough space (146144) for frame (147504)
1690836275.476448:pyrogue.batcher.CoreV1: Not enough space (23264) for frame (1240877856)
1690836275.476462:pyrogue.batcher.CoreV1: Not enough space (23264) for frame (1240877856)
1690836275.476466:pyrogue.batcher.CoreV1: Not enough space (23264) for frame (1240877856)
1690836275.476469:pyrogue.batcher.CoreV1: Not enough space (134240) for frame (147504)
1690836275.476473:pyrogue.batcher.CoreV1: Not enough space (134240) for frame (147504)
1690836275.476475:pyrogue.batcher.CoreV1: Not enough space (134240) for frame (147504)

Ryan: That error means that there is an error parsing the frame, meaning that it needs to process 1240877856 bytes but only has 23264 remaining in the current frame.

Ben: A FIFO is overflowing and truncating frames, corrupting them.
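
Log lines like the ones above can be scanned for frame sizes that do not match the normal size, which makes the corrupted entries easy to spot. The following is a sketch; the expected size of 147504 bytes is taken from the uncorrupted lines of this log, and the function name is hypothetical (it is not part of pyrogue).

```python
import re

LOG_RE = re.compile(r"Not enough space \((\d+)\) for frame \((\d+)\)")
EXPECTED_FRAME_BYTES = 147504  # size reported by the uncorrupted log lines above

SAMPLE = [
    "1690836274.390990:pyrogue.batcher.CoreV1: Not enough space (131648) for frame (147504)",
    "1690836275.476448:pyrogue.batcher.CoreV1: Not enough space (23264) for frame (1240877856)",
    "1690836275.476469:pyrogue.batcher.CoreV1: Not enough space (134240) for frame (147504)",
]


def suspicious_sizes(log_lines, expected=EXPECTED_FRAME_BYTES):
    """Return (space left, frame size) for entries whose size differs from expected."""
    bad = []
    for line in log_lines:
        match = LOG_RE.search(line)
        if match and int(match.group(2)) != expected:
            bad.append((int(match.group(1)), int(match.group(2))))
    return bad


if __name__ == "__main__":
    for space, size in suspicious_sizes(SAMPLE):
        print(f"implausible frame size {size} bytes with {space} bytes remaining")
```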

10000 frames @ 1000 FPS caused corruption as well.

Started the investigation at DigitalAsicStreamAxiV2.vhd. Surprisingly, corruption seems to happen even at a much lower rate (2 frames @ 10 Hz). The overflow counters start counting as early as the arrival of the second frame. It is not yet clear how the second image is arriving at the server.

For each ASIC there are 24 FIFOs (one per lane), each 512 deep and 19 bits wide, running at 42 MHz (the ASICs read out at 168 MHz). Each lane carries 3072 pixels. Data is read from these FIFOs, a header is appended, and the result is fed into a single FIFO that is 8192 deep and 48 bytes wide, written at 42 MHz and read at 156.25 MHz. An AxiStreamResizer then reduces the width to 16 bytes. These 16 bytes are combined with timing and sent to the core, which also runs at 156.25 MHz. Back pressure is applied at several stages and propagates all the way to the dual-clock FIFO, whose full flag is used to increment the overflow counter.
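
As a rough sanity check on the data path described above, the peak rate into the wide FIFO can be compared with the rate available after the resizer. This is a back-of-the-envelope sketch that assumes one word is transferred per clock cycle on each side and ignores the header, the timing words and any sharing between ASICs; the frame size is the 147504 bytes seen in the batcher log.

```python
WRITE_CLK_HZ = 42e6       # write side of the wide FIFO
READ_CLK_HZ = 156.25e6    # read side and downstream clock
WIDE_WORD_BYTES = 48      # width of the wide FIFO
NARROW_WORD_BYTES = 16    # width after the AxiStreamResizer
FRAME_BYTES = 147504      # frame size reported in the batcher log
FRAME_RATE_HZ = 5000      # highest frame rate tested


def rate_mb_s(word_bytes, clk_hz):
    """Peak throughput in MB/s, assuming one word per clock cycle."""
    return word_bytes * clk_hz / 1e6


fill_mb_s = rate_mb_s(WIDE_WORD_BYTES, WRITE_CLK_HZ)
drain_mb_s = rate_mb_s(NARROW_WORD_BYTES, READ_CLK_HZ)
frame_mb_s = FRAME_BYTES * FRAME_RATE_HZ / 1e6

print(f"peak fill rate : {fill_mb_s:.1f} MB/s")
print(f"peak drain rate: {drain_mb_s:.1f} MB/s")
print(f"average at {FRAME_RATE_HZ} FPS: {frame_mb_s:.2f} MB/s per ASIC")
```

Under these assumptions the drain side is faster than the fill side, so a sustained overflow would have to come from back pressure further downstream or from data arriving when the logic is not ready, rather than from the FIFO widths and clocks themselves.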

Overflow detected

Got to the bottom of this. The number of triggers sent to the ASICs was higher than the number sent to the hardware. In other words, the ASICs were sending data while the logic was not yet ready, so the FIFOs were filling up. Fixing the order in software (StrtAutoTrigger function) fixed the issue. Acquisitions of 5000 frames at up to 5000 FPS were done with ASIC3 with no overflow detected.

Next, testing storing data, and all ASICs together.

Continue: When storing data is enabled, data overflow is observed in the FIFOs of the DigAsicStrmRegisters. The back pressure seems to propagate all the way to the beginning (which is reasonable and good), meaning that all the FIFOs are used to their maximum capacity. Around 70000 frames can be written to disk before an overflow is detected.

Mysterious horizontal flipped lines

The origin of the horizontal lines denoted in this figure cannot be explained. The first line seems to be flipped with the last.

A series of tests were done to identify if these lines come from the same image or from the previous acquisition.

I performed the following acquisitions in sequence:

I acquired an image without any special setting
Enabled charge injection on columns 80 to 100, and acquired an image
I acquired another image with no special setting.

The charge injection image is as follows

The subtraction of images 1 and 3 is the following

Bottom line: there does not seem to be any artifact propagating from one image to another, so we will solve the issue by exchanging these lines in the descrambler.

After investigation, it seems that:

Data of row 47 comes in row 95
Data of row 95 comes in row 143
Data of row 143 comes in row 191
Data of row 191 comes in row 47
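
The row mapping listed above can be undone in software by copying each affected row from the position where its data actually arrives. This is an illustrative sketch using plain Python lists of rows, not the descrambler code itself; it assumes an image with at least 192 rows.

```python
# destination row -> row in the raw image where that row's data arrives
SOURCE_ROW = {47: 95, 95: 143, 143: 191, 191: 47}


def reorder_rows(image):
    """Return a copy of image with the four affected rows put back in place."""
    fixed = list(image)
    for dest, src in SOURCE_ROW.items():
        fixed[dest] = image[src]
    return fixed


# Example: label each row by the row it truly belongs to, then scramble it
# the way the list above describes and check that reorder_rows restores it.
true_image = [[row] for row in range(192)]
raw_image = list(true_image)
for dest, src in SOURCE_ROW.items():
    raw_image[src] = true_image[dest]

restored = reorder_rows(raw_image)
print(restored == true_image)
```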

See image below after organizing the lines

This shift does not happen when lanes are disabled and the enumerate feature is activated, meaning that the bug is somewhere before the enumerate assignment and is not in the descrambler.

Next, an end-to-end simulation was done by injecting a fixed pattern in each lane and picking it up in software. The horizontal line shift is evident, but that is because the patch is applied. See images below

Row 142 is the row before the color shifts, and it should be row 143. The same holds for rows 95, 47 and 191. Debugging in simulation.

Once the patch is removed, the images come out without a shift. This rules out the firmware and the descrambler as the source of the bug, so it has to be in the ASIC.

After discussing with Lorenzo and Dionisio, we did some tests to inject patterns before the 8b10Encoder by setting the ro_mode_i register to 0x1 (vertical stripes) and 0x3 (ramp), without the workaround.

The workaround above did not seem reasonable, as the rows are not from the same bank (lane) and the shift had to be within the bank (lane), so our next guess is the following

Shifting all rows down by 1, then setting

row 47 to 0
row 95 to 48
row 143 to 96
row 191 to 144

In other words, bank down rotation. The final image with a cross laser is as follows

Applying the workaround will corrupt the image coming from any source on or after the 8b10b encoder. Here is an example with the ASIC register ro_mode_i set to 0x3. The rows that stand out are 96 and 144, which come from 144 and 191 respectively.
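
The bank down rotation described above can be sketched as follows. This is one reading of the description, stated as an assumption: the image is split into banks of 48 rows (0-47, 48-95, 96-143, 144-191), every row moves down by one within its bank, and the last row of each bank wraps to the first row of the same bank. The function name is hypothetical.

```python
BANK_ROWS = 48  # assumed bank height, inferred from rows 47, 95, 143, 191


def rotate_banks_down(image, bank_rows=BANK_ROWS):
    """Rotate the rows of each bank down by one; the last row wraps to the top."""
    if len(image) % bank_rows:
        raise ValueError("number of rows must be a multiple of bank_rows")
    out = [None] * len(image)
    for row, data in enumerate(image):
        bank_start = (row // bank_rows) * bank_rows
        out[bank_start + (row - bank_start + 1) % bank_rows] = data
    return out


# Label each row with its original index to see where it ends up.
rotated = rotate_banks_down(list(range(192)))
print(rotated[0], rotated[48], rotated[96], rotated[144])
```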

Locking to LCLS-II timing

The l2si-xpm server configuration is here. Some extra tips:

XPM/EVR: This means your triggering either follows the timing input XPM Partition/ReadoutGroup selection (XPM source), or is decided separately from the timing input, using EVR-type event logic (FixedRate + Destination), which appears in the EvrV2ChannelReg/EvrV2TriggerReg modules.

You can enable loopback in the xpm-server by choosing the link number, then setting the loopback register to 1.

You can test timing by looping back both sides and checking whether the link locks.

Two issues are observed with the LCLS-II timing integration.

1. Timing does not lock at all. Decoder and disparity errors count up indefinitely.
2. Decoder and disparity errors stop counting, sof and eof start counting, but RxLinkUp never goes up.

For issue 1, for some reason the transceiver is not locking. For issue 2, the state machine that sets RxLinkUp does not seem to function correctly. With a bad link, the transceiver seems to struggle to lock, and the signals do not seem to be set in the expected sequence. However, if an ILA is synthesized to monitor these signals, locking happens every time, and relatively fast.

RAM usage

+--------------------------+------------+-----------+-------+-----------+
| Memory Type              | Total Used | Available | Util% | Inferred% |
+--------------------------+------------+-----------+-------+-----------+
| URAM                     |          0 |       128 |  0.00 |      0.00 |
| BlockRAM                 |      500.5 |       984 | 50.86 |    100.00 |
|  RAMB36E2                |        444 |           |       |    100.00 |
|  RAMB18E2                |        113 |           |       |    100.00 |
| LUTMs as Distributed RAM |       7868 |    161280 |  4.88 |    100.00 |
|  LUTMs as RAM32X1D       |        704 |           |       |    100.00 |
|  LUTMs as RAM32M16       |       4112 |           |       |    100.00 |
|  LUTMs as RAM32M         |        380 |           |       |    100.00 |
|  LUTMs as RAM256X1D      |       2672 |           |       |    100.00 |
+--------------------------+------------+-----------+-------+-----------+

URAM usage is 0. URAM size is 288Kb
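
The totals in the report above can be reproduced from the per-primitive rows: each RAMB18E2 counts as half a block RAM tile, and the distributed RAM total is the sum of the LUTM rows. A short check, using only the numbers from the table:

```python
RAMB36 = 444
RAMB18 = 113
BRAM_AVAILABLE = 984

LUTM_ROWS = {"RAM32X1D": 704, "RAM32M16": 4112, "RAM32M": 380, "RAM256X1D": 2672}
LUTM_AVAILABLE = 161280

# One RAMB36 is a full tile, one RAMB18 is half a tile.
bram_tiles = RAMB36 + RAMB18 / 2
bram_util = round(100 * bram_tiles / BRAM_AVAILABLE, 2)

lutm_total = sum(LUTM_ROWS.values())
lutm_util = round(100 * lutm_total / LUTM_AVAILABLE, 2)

print(f"BlockRAM: {bram_tiles} tiles, {bram_util}%")
print(f"LUTMs as distributed RAM: {lutm_total}, {lutm_util}%")
```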

Debugging failing lanes

ASIC 2 lanes 23, 22 and 16 seem to be always dead. Here is an acquisition from ChipScope. From the looks of it, lanes 23 and 22 seem out of the norm, while lane 16 seems to be sending data. Next step: analyze the data.

Talked to Dionisio. The conclusions are:

If it were the ASIC, we would see the issue in all ASICs
If it were the carrier board, since the schematic is hierarchical, we would see the issue in all ASICs
Since all carriers were tested with only one digital board, the defect could be anywhere on the digital board.
We plan to try continuity tests from the connector to the AC coupler capacitor on the digital board.

Observations

digital	carrier	notes
C00-02	Copper strong back John doe (used in ASC)	ASIC 2 lanes 22 and 16 working
C00-02	RX000	ASIC 2 lanes 23, 22, 21, 16, 13, 9, 5, 1 not working
C00-01	RX000	ASIC 2 lanes 23, 22 and 16 not working
C00-03	RX001	ASIC 2 lane 22, 21, 20, 19, 16, 13, 5, 1 not working
C00-01	Copper strong back John doe (used in ASC)	No thermal pad. no screws.
C00-01	Copper strong back John doe (used in ASC)	No thermal pad. With carrier cover and with screws.
C00-01	RX005	with thermal pad and applying pressure on screws
C00-02	RX005	With thermal pad. Same lanes dead.

We suspect that the digital board has a problem, most probably the AC coupling capacitors. Next, test with the prototype in ASC and one of our carriers to see if the lane failure problem persists.

List of tasks

~~Add this submodule to both projects https://github.com/slaclab/AsicRegMapping~~

~~version stuff and reproduce timing lock success at least 3 times: bitstream 6bbaaf4~~

~~Make power cable (Julian) - on it~~

~~Test all ASICs together at 5000 FPS~~

~~Port ADC stuff from TXI~~

Img descrambling in firmware

~~Connect transceivers to MM-SM converter~~

~~Fix timing: What goes on when RxLinkUp is up?~~

~~Make script for Lorenzo~~

~~Send digital board to increase analog voltage to 2V0~~

~~Fix scrambling: replace first and last lines in each horizontal line~~ and test

~~High speed acquisition seems to cause corruption. Fifo in firmware seems to get full. See what is going on.~~

~~Investigate delays~~

~~Fixed descrambling algorithm~~

~~Make a jupyter for Lorenzo~~

~~Charge injection seems messed up for columns~~

~~Test MM to SM converter boxes~~
