Observed: some of the XPM links do not come up, or they show a large number of errors.
Theory: the high-speed repeaters on the AMC card have bad settings. These repeaters repeat (clean up) the signal in both directions between the SFP and the FPGA.
Proposal: scan the repeater equalizer settings to find the best configuration, judged on the data received from the detector.
Script: https://github.com/slac-lcls/lcls2/blob/master/psdaq/psdaq/pyxpm/pyxpm_hsrepeater.py
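The scan procedure can be sketched as follows. This is a minimal illustration, not the code of pyxpm_hsrepeater.py. The hooks `set_eq` and `read_status` are hypothetical stand-ins for whatever programs the repeater and reads the link's ready flag and counters. The selection rule, "keep the first value in scan order that is ready with zero errors", is an assumption. It does match the link 11 scan below, which configured 0x01.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# Equalizer values in the order the scans in this report visit them.
EQ_VALUES = [0x00, 0x01, 0x02, 0x03, 0x07, 0x15, 0x0B, 0x0F,
             0x55, 0x1F, 0x2F, 0x3F, 0xAA, 0x7F, 0xBF, 0xFF]

@dataclass
class LinkStatus:
    ready: bool
    rx_count: int   # frames received during the dwell time
    err_count: int  # receive errors during the dwell time

def scan_link(link: int,
              set_eq: Callable[[int], None],
              read_status: Callable[[], LinkStatus],
              dwell_s: float = 0.0) -> Optional[int]:
    """Try every equalizer value, log each result, then keep the first clean one."""
    results: Dict[int, LinkStatus] = {}
    for eq in EQ_VALUES:
        set_eq(eq)            # hypothetical hook that programs the repeater
        time.sleep(dwell_s)   # real hardware needs a non-zero settle/count time
        st = read_status()    # hypothetical hook: ready flag + counters
        results[eq] = st
        state = "Ready" if st.ready else "Not ready"
        print(f"Link[{link}] status (eq={eq:02X}): {state} "
              f"(Rec: {st.rx_count} - Err: {st.err_count})")
    for eq in EQ_VALUES:
        st = results[eq]
        if st.ready and st.err_count == 0:
            set_eq(eq)
            print(f"[Configured] Set eq = 0x{eq:02X}")
            return eq
    return None  # no clean value: the caller decides what to program next

class FakeLink:
    """Simulated link that is clean only for a made-up set of values."""
    def __init__(self, good):
        self.good, self.eq = set(good), None
    def set_eq(self, eq):
        self.eq = eq
    def status(self):
        ok = self.eq in self.good
        return LinkStatus(ok, 1000 if ok else 0, 0 if ok else 25)

fake = FakeLink({0x01, 0x02, 0x03})
best = scan_link(11, fake.set_eq, fake.status)
```

Scanning every value before choosing, rather than stopping at the first good one, produces the full per-value report seen in the logs.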
Test results
To validate the theory, the script was tested on XPM10, XPM6 and XPM5. The first goal was to fix the non-working links on XPM6 (AMC1 ports 3, 4 and 5).
Scan of the equalizer settings for XPM6 / link 11:
- Link[11] status (eq=00): Ready (Rec: 242973 - Err: 38)
- Link[11] status (eq=01): Ready (Rec: 20635028 - Err: 0)
- Link[11] status (eq=02): Ready (Rec: 20635028 - Err: 0)
- Link[11] status (eq=03): Ready (Rec: 20635028 - Err: 0)
- Link[11] status (eq=07): Ready (Rec: 20635029 - Err: 0)
- Link[11] status (eq=15): Not ready (Rec: 0 - Err: 4294966931)
- Link[11] status (eq=0B): Ready (Rec: 20635029 - Err: 0)
- Link[11] status (eq=0F): Ready (Rec: 20635028 - Err: 0)
- Link[11] status (eq=55): Not ready (Rec: 0 - Err: 17872)
- Link[11] status (eq=1F): Not ready (Rec: 0 - Err: 7686)
- Link[11] status (eq=2F): Not ready (Rec: 0 - Err: 4294964401)
- Link[11] status (eq=3F): Not ready (Rec: 0 - Err: 4294958545)
- Link[11] status (eq=AA): Not ready (Rec: 0 - Err: 13246)
- Link[11] status (eq=7F): Not ready (Rec: 0 - Err: 4294940893)
- Link[11] status (eq=BF): Not ready (Rec: 0 - Err: 4294950259)
- Link[11] status (eq=FF): Not ready (Rec: 0 - Err: 4294922138)
- [Configured] Set eq = 0x01
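To compare scans, it helps to turn the status lines above into structured records. The sketch below assumes only the line format shown in the log. The names `parse_scan` and `clean_settings` are illustrative and do not come from the script. "Clean" is taken to mean Ready with zero errors.

```python
import re

# Matches entries such as "Link[11] status (eq=00): Ready (Rec: 242973 - Err: 38)".
ENTRY_RE = re.compile(
    r"Link\[(\d+)\] status \(eq=([0-9A-Fa-f]{2})\): (Ready|Not ready) "
    r"\(Rec: (\d+) - Err: (\d+)\)"
)

def parse_scan(text):
    """Return (link, eq, ready, rec, err) tuples for every entry found in text."""
    return [
        (int(link), int(eq, 16), state == "Ready", int(rec), int(err))
        for link, eq, state, rec, err in ENTRY_RE.findall(text)
    ]

def clean_settings(entries):
    """Equalizer values for which the link was ready and counted no errors."""
    return [eq for _, eq, ready, _, err in entries if ready and err == 0]

# Scan output for XPM6 / link 11, copied from the log above.
log = """
Link[11] status (eq=00): Ready (Rec: 242973 - Err: 38)
Link[11] status (eq=01): Ready (Rec: 20635028 - Err: 0)
Link[11] status (eq=02): Ready (Rec: 20635028 - Err: 0)
Link[11] status (eq=03): Ready (Rec: 20635028 - Err: 0)
Link[11] status (eq=07): Ready (Rec: 20635029 - Err: 0)
Link[11] status (eq=15): Not ready (Rec: 0 - Err: 4294966931)
Link[11] status (eq=0B): Ready (Rec: 20635029 - Err: 0)
Link[11] status (eq=0F): Ready (Rec: 20635028 - Err: 0)
Link[11] status (eq=55): Not ready (Rec: 0 - Err: 17872)
Link[11] status (eq=1F): Not ready (Rec: 0 - Err: 7686)
Link[11] status (eq=2F): Not ready (Rec: 0 - Err: 4294964401)
Link[11] status (eq=3F): Not ready (Rec: 0 - Err: 4294958545)
Link[11] status (eq=AA): Not ready (Rec: 0 - Err: 13246)
Link[11] status (eq=7F): Not ready (Rec: 0 - Err: 4294940893)
Link[11] status (eq=BF): Not ready (Rec: 0 - Err: 4294950259)
Link[11] status (eq=FF): Not ready (Rec: 0 - Err: 4294922138)
"""

entries = parse_scan(log)
print([f"0x{eq:02X}" for eq in clean_settings(entries)])
# ['0x01', '0x02', '0x03', '0x07', '0x0B', '0x0F']
```

The first clean value in scan order, 0x01, is the one the script configured for this link. Whether the script actually uses that rule is an assumption.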
The scan reports errors for the equalizer value 0x2F (the default). This is a good sign, because it supports the theory. With other values, data started to arrive and the link came up. Additional tests were therefore run on other links and XPMs to find a common working point (✔️: the link worked with that setting):
Equalizer | XPM6 link 12 | XPM6 link 11 | XPM6 link 10 | XPM5 link 13 | XPM5 link 12 | XPM6 link 13 | XPM10 link 7 |
---|---|---|---|---|---|---|---|
0x00 | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
0x01 | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | |
0x02 | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
0x03 | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
0x07 | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
0x15 | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | |
0x0B | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
0x0F | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
0x55 | ✔️ | | | | | | |
0x1F | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | | |
0x2F (default) | ✔️ | ✔️ | ✔️ | ✔️ | | | |
0x3F | ✔️ | ✔️ | ✔️ | | | | |
0xAA | ✔️ | | | | | | |
0x7F | | | | | | | |
0xBF | | | | | | | |
0xFF | | | | | | | |
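A "common working point" is a setting that works on every tested link, which is the intersection of the per-link sets. The sketch below transcribes the ✔️ cells of the table by hand. It assumes each mark sits in the column where it appears.

```python
# Settings marked ✔️ for each link in the table above (transcribed by hand).
WORKING = {
    "XPM6 link 12": {0x00, 0x01, 0x02, 0x03, 0x07, 0x15, 0x0B, 0x0F,
                     0x55, 0x1F, 0x2F, 0x3F, 0xAA},
    "XPM6 link 11": {0x00, 0x01, 0x02, 0x03, 0x07, 0x15, 0x0B, 0x0F, 0x1F, 0x2F, 0x3F},
    "XPM6 link 10": {0x00, 0x01, 0x02, 0x03, 0x07, 0x15, 0x0B, 0x0F, 0x1F, 0x2F, 0x3F},
    "XPM5 link 13": {0x00, 0x01, 0x02, 0x03, 0x07, 0x15, 0x0B, 0x0F, 0x1F, 0x2F},
    "XPM5 link 12": {0x00, 0x01, 0x02, 0x03, 0x07, 0x15, 0x0B, 0x0F, 0x1F},
    "XPM6 link 13": {0x00, 0x01, 0x02, 0x03, 0x07, 0x15, 0x0B, 0x0F},
    "XPM10 link 7": {0x00, 0x02, 0x03, 0x07, 0x0B, 0x0F},
}

# A common working point must appear in every link's set.
common = sorted(set.intersection(*WORKING.values()))
print([f"0x{eq:02X}" for eq in common])
# ['0x00', '0x02', '0x03', '0x07', '0x0B', '0x0F']

# How many of the seven links work with the default value 0x2F?
default_ok = sum(0x2F in s for s in WORKING.values())
print(default_ok)  # 4
```

Only low values survive the intersection. The default 0x2F works on just four of the seven links.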
Low values appear to be the best settings, even though this does not give a 100% guarantee. Setting an initial value of 0x03, with the option to run a scan if a link has issues, seems a good compromise.
Additional tests were carried out to check that the configuration is consistent across scans and after a hard reboot. The scan was run three times in a row for XPM6, link 13:
Run 1 errors:
- Link[13] status (eq=55): Not ready (Rec: 4294966397 - Err: 18172)
- Link[13] status (eq=AA): Ready (Rec: 4294945582 - Err: 45710)
- Link[13] status (eq=7F): Not ready (Rec: 4294967293 - Err: 9120)
- Link[13] status (eq=BF): Not ready (Rec: 0 - Err: 4294966592)
- Link[13] status (eq=FF): Not ready (Rec: 0 - Err: 25655)
Run 2 errors:
- Link[13] status (eq=55): Ready (Rec: 4294966484 - Err: 13768)
- Link[13] status (eq=AA): Ready (Rec: 4294966334 - Err: 4294947967)
- Link[13] status (eq=7F): Ready (Rec: 4294963255 - Err: 8614)
- Link[13] status (eq=BF): Not ready (Rec: 0 - Err: 4294963111)
- Link[13] status (eq=FF): Not ready (Rec: 0 - Err: 24541)
Run 3 errors:
- Link[13] status (eq=55): Ready (Rec: 1756 - Err: 16360)
- Link[13] status (eq=AA): Not ready (Rec: 4294966637 - Err: 46671)
- Link[13] status (eq=7F): Ready (Rec: 2806 - Err: 10299)
- Link[13] status (eq=BF): Not ready (Rec: 0 - Err: 2857)
- Link[13] status (eq=FF): Not ready (Rec: 0 - Err: 23747)
The results show consistent behavior across scans: the same settings (0x55, 0xAA, 0x7F, 0xBF and 0xFF) report errors in all three runs. The final test looked at XPM10 / link 7 before and after a hard reset:
Before:
- Link[7] status (eq=01): Ready (Rec: 4294371331 - Err: 65)
- Link[7] status (eq=7F): Ready (Rec: 4294964175 - Err: 6412)
- Link[7] status (eq=BF): Not ready (Rec: 0 - Err: 25808)
- Link[7] status (eq=FF): Not ready (Rec: 0 - Err: 53525)
After:
- Link[7] status (eq=7F): Not ready (Rec: 27406 - Err: 4294944911)
- Link[7] status (eq=BF): Not ready (Rec: 4294967292 - Err: 4469)
- Link[7] status (eq=FF): Not ready (Rec: 0 - Err: 14706)
Before the reboot, setting 0x01 showed a small number of errors (65). After the reboot, these errors disappeared. Given how small this number is, the difference is negligible and can be ignored.
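Many counters in the logs above (for example Rec: 4294966397 or Err: 4294944911) sit just below 2^32 = 4294967296. Our reading, which is an assumption and is not documented by the script, is that these are 32-bit unsigned values that wrapped around. In other words, they are small negative numbers, such as a difference between two counter reads where the counter was reset in between. They should not be read as billions of frames or errors. A two's-complement reinterpretation makes this visible:

```python
def as_signed32(value: int) -> int:
    """Reinterpret a 32-bit unsigned counter value as a two's-complement integer."""
    value &= 0xFFFFFFFF                      # keep only the low 32 bits
    return value - (1 << 32) if value & 0x80000000 else value

# A few raw values copied from the logs above.
for raw in (4294966397, 4294944911, 4294371331, 18172):
    print(f"{raw:>10} -> {as_signed32(raw)}")
```

Values below 2^31 pass through unchanged. Values with the top bit set become negative.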
Summary
Setting a good value for the high-speed repeater equalizer fixes the issue. It is, however, difficult to find a single value that guarantees a working link on every port. According to the collected statistics, the default value (0x2F) appears to be one of the worst possible choices. It is therefore recommended to initialize the repeaters with 0x03 and to run the scan script to optimize the setting whenever a link has issues.
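The recommended procedure can be sketched as follows. Program 0x03, and only if the link is not clean, fall back to a scan. `set_eq` and `read_status` are hypothetical hooks, not functions from pyxpm_hsrepeater.py. "Clean" is assumed to mean ready with zero errors. Restoring 0x03 when nothing works is a design choice of this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

INITIAL_EQ = 0x03  # starting value recommended in this report

@dataclass
class LinkStatus:
    ready: bool
    err_count: int

def is_clean(status: LinkStatus) -> bool:
    """A link is considered good when it is ready and counts no errors."""
    return status.ready and status.err_count == 0

def configure_link(set_eq: Callable[[int], None],
                   read_status: Callable[[], LinkStatus],
                   scan_values: Sequence[int]) -> Optional[int]:
    """Program INITIAL_EQ; only if the link is not clean, scan for a better value."""
    set_eq(INITIAL_EQ)
    if is_clean(read_status()):
        return INITIAL_EQ
    for eq in scan_values:
        set_eq(eq)
        if is_clean(read_status()):
            return eq          # first clean value in scan order is kept
    set_eq(INITIAL_EQ)         # nothing worked: go back to the starting value
    return None

# Usage with a simulated link (made-up behaviour, no real hardware involved).
class FakeLink:
    def __init__(self, good):
        self.good, self.eq = set(good), None
    def set_eq(self, eq):
        self.eq = eq
    def status(self):
        ok = self.eq in self.good
        return LinkStatus(ready=ok, err_count=0 if ok else 1)

SCAN = [0x00, 0x01, 0x02, 0x03, 0x07, 0x15, 0x0B, 0x0F,
        0x55, 0x1F, 0x2F, 0x3F, 0xAA, 0x7F, 0xBF, 0xFF]

a = FakeLink({0x03, 0x07})
print(hex(configure_link(a.set_eq, a.status, SCAN)))  # 0x3: no scan needed
b = FakeLink({0x0B, 0x0F})
print(hex(configure_link(b.set_eq, b.status, SCAN)))  # 0xb: found by the scan
```

Links that work at 0x03 never pay the cost of a scan. Only problematic links go through the full sweep.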
ToDo list
Additional tests that could give a better picture of the link quality issue:
- Try plugging a couple of different detectors (with different fiber lengths) into a few of the XPM6 ports, to get a feeling for how much the equalizer settings vary. This would address my question of whether a small piece of dust in the wave8 fiber path could be affecting the settings.