Symptom | Solution | ToDo |
- PRL loop Tx/Rx PVs are < 5000
| - Reboot master IOC.
- For L0-L1 loop, master IOC is sioc-sys0-ms02, slave IOC is sioc-sys0-ms05.
- For L2 loop, master IOC is sioc-sys0-ms03, slave IOC is sioc-sys0-ms06.
Reboot either from lclshome (SC) -> Network/Global, or with iocConsole.sh: Code Block |
---|
[ laci@cpu-sys0-sp01]$ iocConsole.sh -r sioc-sys0-ms02 |
| Driver has problems reconnecting after a disconnect.
Investigate whether an asynRecord reconnect can restore the connection so that we don't need to reboot the IOC.
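The loop-to-IOC mapping above can be kept in a small shell helper so that the IOC name is looked up rather than retyped. This is a minimal sketch: the function names `prl_loop_iocs` and `print_master_reboot_cmd` and the loop labels `L0-L1`/`L2` used as arguments are hypothetical. Only the IOC names come from the table, and the helper prints the `iocConsole.sh -r` command instead of running it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a PRL loop label to "<master IOC> <slave IOC>".
# IOC names come from the troubleshooting table; the labels are assumptions.
prl_loop_iocs() {
  case "$1" in
    L0-L1) echo "sioc-sys0-ms02 sioc-sys0-ms05" ;;
    L2)    echo "sioc-sys0-ms03 sioc-sys0-ms06" ;;
    *)     echo "unknown PRL loop: $1" >&2; return 1 ;;
  esac
}

# Print (do not run) the reboot command for the master IOC of a loop.
print_master_reboot_cmd() {
  local iocs master
  iocs=$(prl_loop_iocs "$1") || return 1
  master=${iocs%% *}   # first word is the master IOC
  echo "iocConsole.sh -r $master"
}
```

For example, `print_master_reboot_cmd L2` prints `iocConsole.sh -r sioc-sys0-ms03`, which can then be checked and run by hand.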
---|
- MO locker matlab script is dead
- Master Oscillator (MO) frequency offset is > 1 Hz
- Matlab script LED (ALRM:SYS0:PRL_HRTBT:ALHBERR) is purple or red
- MO DAC voltage is zero
| - Restart matlab script
- In the "Matlab script utils" box on the Ms/PRL main display, push "stop"
- Wait 10 seconds
- In the "Matlab script utils" box on the Ms/PRL main display, push "restart"
| Migrate matlab script functionality to EPICS |
- SIM master-slave connection is broken
- PRL loop Tx/Rx PVs are < 5000
| - Reboot master and/or slave IOC.
- For L0-L1 loop, master IOC is sioc-sys0-ms02, slave IOC is sioc-sys0-ms05.
- For L2 loop, master IOC is sioc-sys0-ms03, slave IOC is sioc-sys0-ms06.
Reboot either from lclshome (SC) -> Network/Global, or with iocConsole.sh: Code Block |
---|
[ laci@cpu-sys0-sp01]$ iocConsole.sh -r sioc-sys0-ms02 |
| Driver has problems reconnecting after a disconnect. Investigate whether an asynRecord reconnect can restore the connection so that we don't need to reboot the IOC.
|
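Both this row and the first one key on the PRL loop Tx/Rx PVs dropping below 5000. A minimal sketch of that check, assuming the two readbacks have already been fetched (for example with `caget -t`); the PV names are not given here, so the hypothetical function `prl_link_low` simply takes the two values as arguments. It uses awk for the comparison so that non-integer readbacks also work.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) if either the PRL Tx or Rx value is
# below the 5000 threshold from the troubleshooting table (symptom present).
PRL_TXRX_MIN=5000

prl_link_low() {
  local tx=$1 rx=$2
  # awk compares as numbers, so floating-point readbacks are fine.
  # exit 0 means "at least one value is low".
  awk -v tx="$tx" -v rx="$rx" -v min="$PRL_TXRX_MIN" \
    'BEGIN { exit !((tx + 0) < min || (rx + 0) < min) }'
}
```

For example, `prl_link_low "$TX" "$RX" && echo "PRL Tx/Rx low: reboot master and/or slave IOC"`, where `TX` and `RX` hold the fetched readbacks.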
- Master source IOCs are not running
| - Ensure cpu-sys0-sp01 is up
- Check IOC status
- From lclshome, open the Networking/Global display and check the status of sioc-sys0-ms0[1..6]
- Log on to cpu-sys0-sp01 and check that sioc-sys0-ms0[1..6] are running:
Code Block |
---|
| [softegr@lcls-srv01 ~]$ ssh laci@cpu-sys0-sp01
Entering commonSetup.sh
...
[ laci@cpu-sys0-sp01]$ screen -ls
There are screens on:
20633.sioc-sys0-ms01 (Detached)
8234.sioc-sys0-ms02 (Detached)
8113.sioc-sys0-ms03 (Detached)
7956.sioc-sys0-ms04 (Detached)
7865.sioc-sys0-ms05 (Detached)
7786.sioc-sys0-ms06 (Detached)
7710.sioc-sys0-ms07 (Detached)
7634.sioc-sys0-ms08 (Detached)
8978.sioc-sys0-ts01 (Detached)
8434.sioc-gunb-ts02 (Detached)
|
|
|
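Rather than reading the `screen -ls` listing by eye, the check for sioc-sys0-ms0[1..6] can be scripted. A minimal sketch that reads `screen -ls` output on stdin and prints any of the six master source IOC sessions that are missing; the function name `missing_master_iocs` is hypothetical, and it assumes the session naming shown above (`<pid>.sioc-sys0-msNN`). A present session only means a screen with that name exists; it does not prove the IOC inside it is healthy.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list master source IOC screen sessions (ms01..ms06)
# that are absent from `screen -ls` output supplied on stdin.
missing_master_iocs() {
  local listing n name
  listing=$(cat)
  for n in 1 2 3 4 5 6; do
    name="sioc-sys0-ms0$n"
    # Sessions look like "20633.sioc-sys0-ms01"; require ".<name>" followed
    # by whitespace or end of line so ms01 does not match e.g. ms010.
    if ! grep -Eq '\.'"$name"'([[:space:]]|$)' <<<"$listing"; then
      echo "$name"
    fi
  done
}
```

For example, `screen -ls | missing_master_iocs` on cpu-sys0-sp01 prints nothing when all six sessions exist.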
- IOC connections to SIMs are broken
- SIMs won't ping (IP addresses here)
| - Check the network interfaces on cpu-sys0-sp01: Code Block |
---|
[ laci@cpu-sys0-sp01]$ ifconfig
eth0 Link encap:Ethernet HWaddr 74:FE:48:28:BA:5F
inet addr:192.168.1.1 Bcast:192.168.1.255 Mask:255.255.255.0
inet6 addr: fe80::76fe:48ff:fe28:ba5f/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:743826 errors:0 dropped:0 overruns:0 frame:0
TX packets:867746 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:64437267 (61.4 MiB) TX bytes:63883718 (60.9 MiB)
Memory:fbc60000-fbc7ffff
eth4 Link encap:Ethernet HWaddr 74:FE:48:28:BA:64
inet addr:172.27.128.47 Bcast:172.27.131.255 Mask:255.255.252.0
inet6 addr: fe80::76fe:48ff:fe28:ba64/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:31052445 errors:0 dropped:0 overruns:0 frame:0
TX packets:12823233 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:11697758920 (10.8 GiB) TX bytes:1440944247 (1.3 GiB)
Memory:fbb00000-fbb7ffff
eth5 Link encap:Ethernet HWaddr 74:FE:48:28:BA:5D
inet addr:10.0.1.1 Bcast:10.0.1.255 Mask:255.255.255.0
inet6 addr: fe80::76fe:48ff:fe28:ba5d/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:27621 errors:0 dropped:0 overruns:0 frame:0
TX packets:65 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:3419307 (3.2 MiB) TX bytes:4502 (4.3 KiB)
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:65536 Metric:1
RX packets:9782170 errors:0 dropped:0 overruns:0 frame:0
TX packets:9782170 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:445710120 (425.0 MiB) TX bytes:445710120 (425.0 MiB) |
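The `ifconfig` output above mainly answers which IPv4 address sits on which interface (here 192.168.1.1 on eth0, the interface the DHCP server is started on, and 10.0.1.1 on eth5). A minimal sketch that extracts those pairs from the older net-tools format shown above (`inet addr:...`, with continuation lines indented); newer `ifconfig` releases and `ip addr` use a different layout that this sketch does not handle. The function name `ifconfig_ipv4` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<interface> <IPv4 address>" pairs from
# net-tools style `ifconfig` output on stdin.
ifconfig_ipv4() {
  awk '
    /^[^[:space:]]/ { iface = $1 }   # unindented line starts a new interface
    /inet addr:/ {                   # IPv4 line; "inet6 addr:" does not match
      for (i = 1; i <= NF; i++)
        if ($i ~ /^addr:/) { sub(/^addr:/, "", $i); print iface, $i }
    }'
}
```

For example, `ifconfig | ifconfig_ipv4` gives one line per configured IPv4 interface, including `lo 127.0.0.1`.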
- Ensure the DHCP server is running
Code Block |
---|
[ laci@cpu-sys0-sp01]$ ps aux | grep dhcpd
root 7484 0.0 0.0 9336 6976 ? Ss Aug30 0:03
/usr/local/lcls/epics/iocTop/SharedPlatform/R1.0.22/bin/linuxRT-x86_64/dhcpd
-4 -cf /usr/local/lcls/epics/iocCommon/cpu-sys0-sp01/iocSpecificRelease/cpuBoot/lcls/cpu-sys0-sp01/dhcpd.conf
-lf /data/cpu-sys0-sp01/dhcpd/dhcpd.leases -pf
/data/cpu-sys0-sp01/dhcpd/dhcpd.pid eth0 |
- Check for DHCP requests (there should be one for each DHCP client)
Code Block |
---|
[ laci@cpu-sys0-sp01]$ grep -i dhcpreq /var/log/messages
Aug 31 11:25:13 buildroot local7.info dhcpd: DHCPREQUEST for 192.168.1.16 (0.0.0.0) from 08:00:56:00:46:86 via eth0
Aug 31 11:25:33 buildroot local7.info dhcpd: DHCPREQUEST for 192.168.1.19 (0.0.0.0) from 08:00:56:00:46:15 via eth0
Aug 31 11:25:43 buildroot local7.info dhcpd: DHCPREQUEST for 192.168.1.20 (0.0.0.0) from 08:00:56:00:46:16 via eth0
Aug 31 11:25:43 buildroot local7.info dhcpd: DHCPREQUEST for 192.168.1.28 (0.0.0.0) from 08:00:56:00:49:55 via eth0
Aug 31 11:30:13 buildroot local7.info dhcpd: DHCPREQUEST for 192.168.1.16 (0.0.0.0) from 08:00:56:00:46:86 via eth0
Aug 31 11:30:33 buildroot local7.info dhcpd: DHCPREQUEST for 192.168.1.19 (0.0.0.0) from 08:00:56:00:46:15 via eth0
Aug 31 11:30:43 buildroot local7.info dhcpd: DHCPREQUEST for 192.168.1.20 (0.0.0.0) from 08:00:56:00:46:16 via eth0
... |
or watch the DHCP traffic directly with tcpdump (as root): Code Block |
---|
[ root@cpu-sys0-sp01]$ tcpdump -i eth0 port 67 or port 68 -e -n -vv
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
11:50:13.340432 08:00:56:00:46:86 > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800), length 302:
(tos 0x0, ttl 32, id 4916, offset 0, flags [DF], proto UDP (17), length 288)
0.0.0.0.68 > 255.255.255.255.67: [udp sum ok] BOOTP/DHCP, Request from 08:00:56:00:46:86,
length 260, xid 0x76d, Flags [none] (0x0800)
Client-Ethernet-Address 08:00:56:00:46:86
Vendor-rfc1048 Extensions
Magic Cookie 0x63825363
DHCP-Message Option 53, length 1: Request
Requested-IP Option 50, length 4: 192.168.1.16
Server-ID Option 54, length 4: 0.0.0.0
11:50:13.340601 74:fe:48:28:ba:5f > 08:00:56:00:46:86, ethertype IPv4 (0x0800), length 342:
(tos 0x10, ttl 128, id 0, offset 0, flags [none], proto UDP (17), length 328)
192.168.1.1.67 > 192.168.1.16.68: [udp sum ok] BOOTP/DHCP, Reply, length 300, xid 0x76d, Flags [none] (0x0800)
Your-IP 192.168.1.16
Client-Ethernet-Address 08:00:56:00:46:86 |
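Because each DHCP client repeats its request periodically, the log is easier to compare against the expected SIMs once it is reduced to one entry per client. A minimal sketch that reads syslog lines on stdin and prints each distinct `<MAC> <requested IP>` pair in first-seen order; the function name `dhcp_clients` is hypothetical, and it assumes the `DHCPREQUEST for <ip> (...) from <mac> via <iface>` wording shown in the log excerpt above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: one "<mac> <requested-ip>" line per distinct client, from
# dhcpd syslog lines on stdin such as
#   "... dhcpd: DHCPREQUEST for 192.168.1.16 (0.0.0.0) from 08:00:56:00:46:86 via eth0"
dhcp_clients() {
  awk '
    /DHCPREQUEST for / {
      ip = ""; mac = ""
      for (i = 1; i < NF; i++) {
        if ($i == "for")  ip  = $(i + 1)
        if ($i == "from") mac = $(i + 1)
      }
      if (mac != "" && !seen[mac]++) print mac, ip   # first request per MAC
    }'
}
```

For example, `grep -i dhcpreq /var/log/messages | dhcp_clients | wc -l` gives the number of distinct clients that have asked for an address.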
- Check that the SIMs respond to ping, e.g.: Code Block |
---|
[ laci@cpu-sys0-sp01]$ ping 10.0.1.102
PING 10.0.1.102 (10.0.1.102): 56 data bytes
64 bytes from 10.0.1.102: seq=0 ttl=32 time=0.067 ms
64 bytes from 10.0.1.102: seq=1 ttl=32 time=0.072 ms
64 bytes from 10.0.1.102: seq=2 ttl=32 time=0.040 ms
... |
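When several SIMs need checking, the ping test above can be run over all of them in one go. A minimal sketch, assuming the BusyBox/iputils-style `ping -c COUNT -W SECONDS` options (one probe, one-second timeout); the function name `ping_sims` is hypothetical, and the SIM addresses must be supplied by the caller since they are not listed on this page (10.0.1.102 is the only one that appears above).

```shell
#!/usr/bin/env bash
# Hypothetical helper: ping each address once and report which ones answer.
# Returns non-zero if any address is unreachable.
ping_sims() {
  local ip rc=0
  for ip in "$@"; do
    if ping -c 1 -W 1 "$ip" >/dev/null 2>&1; then
      echo "OK   $ip"
    else
      echo "FAIL $ip"
      rc=1
    fi
  done
  return "$rc"
}
```

For example, `ping_sims 10.0.1.102 <other SIM addresses>` prints one OK/FAIL line per address.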
- Reboot the ATCA switch. Warning |
---|
This will bring down the local network. Only do this if you know what you're doing. |
Code Block |
---|
lcls-srv01> source /usr/local/lcls/package/IPMC/env.sh
lcls-srv01> fru_deactivate shm-sys0-sp01-1/1
lcls-srv01> fru_activate shm-sys0-sp01-1/1 |
- Reset LAN switch ports
- Check ports first. Problematic SFP+ ports might have a "NO LINK/ *SYNC(LCLFAULT)" status, like ports 21 and 22 below:
Code Block |
---|
ssh softegr@lcls-srv01
...
[softegr@lcls-srv01 ~]$ iocConsole cswh-sys0-sp01-1
: ssh -x -t -l laci lcls-daemon1 bash -l -c " pyiocscreen.py -t HIOC cswh-sys0-sp01-1 ts-li02-nw01 2016 "
Starting up HIOC cswh-sys0-sp01-1
...
Trying 172.27.132.127...
Connected to ts-li02-nw01.
Escape character is '^]'.
!!!!! Welcome to SLAC !!!!!!!
edit this message (/etc/issue) to remove this message
cswh-sys0-sp01-1 login:
Password:
[root@cswh-sys0-sp01-1 ~]$ axel_l1stat all
(FABRIC 1 ) Port 6: LINK/ *ALIGN/ *SYNC3/ *SYNC2/ *SYNC1/ *SYNC0
(FABRIC 2 ) Port 7: NO LINK/NO ALIGN/NO SYNC3/NO SYNC2/NO SYNC1/NO SYNC0(LCLFAULT)
(FABRIC 3 ) Port 8: NO LINK/NO ALIGN/NO SYNC3/NO SYNC2/NO SYNC1/NO SYNC0(LCLFAULT)
(FABRIC 4 ) Port 9: NO LINK/NO ALIGN/NO SYNC3/NO SYNC2/NO SYNC1/NO SYNC0(LCLFAULT)
(FABRIC 5 ) Port 10: NO LINK/NO ALIGN/NO SYNC3/NO SYNC2/NO SYNC1/NO SYNC0(LCLFAULT)
(FABRIC 6 ) Port 11: NO LINK/NO ALIGN/NO SYNC3/NO SYNC2/NO SYNC1/NO SYNC0(LCLFAULT)
(FABRIC 7 ) Port 12: NO LINK/NO ALIGN/NO SYNC3/NO SYNC2/NO SYNC1/NO SYNC0(LCLFAULT)
(FABRIC 8 ) Port 13: NO LINK/NO ALIGN/NO SYNC3/NO SYNC2/NO SYNC1/NO SYNC0(LCLFAULT)
(FABRIC 9 ) Port 14: NO LINK/NO ALIGN/NO SYNC3/NO SYNC2/NO SYNC1/NO SYNC0(LCLFAULT)
(FABRIC 10 ) Port 15: NO LINK/NO ALIGN/NO SYNC3/NO SYNC2/NO SYNC1/NO SYNC0(LCLFAULT)
(FABRIC 11 ) Port 16: NO LINK/NO ALIGN/NO SYNC3/NO SYNC2/NO SYNC1/NO SYNC0(LCLFAULT)
(FABRIC 12 ) Port 17: NO LINK/NO ALIGN/NO SYNC3/NO SYNC2/NO SYNC1/NO SYNC0(LCLFAULT)
(FABRIC 13 ) Port 18: NO LINK/NO ALIGN/NO SYNC3/NO SYNC2/NO SYNC1/NO SYNC0(LCLFAULT)
(UPDATE ) Port 19: NO LINK/NO ALIGN/NO SYNC3/NO SYNC2/NO SYNC1/NO SYNC0(LCLFAULT)
(SFP+ 0 ) Port 20: LINK/ *SYNC
(SFP+ 1 ) Port 0: NO LINK/ NO SYNC(LCLFAULT)
(SFP+ 2 ) Port 21: NO LINK/ *SYNC(LCLFAULT)
(SFP+ 3 ) Port 1: LINK/ *SYNC
(SFP+ 4 ) Port 22: NO LINK/ *SYNC(LCLFAULT)
(SFP+ 5 ) Port 2: LINK/ *SYNC
(SFP+ 6 ) Port 23: LINK/ *SYNC
(SFP+ 7 ) Port 3: LINK/ *SYNC
(RTM 0 ) Port 24: NO LINK/ NO SYNC(LCLFAULT)
(RTM 1 ) Port 4: NO LINK/ NO SYNC(LCLFAULT)
(RTM 2 ) Port 25: NO LINK/ NO SYNC(LCLFAULT)
(RTM 3 ) Port 5: NO LINK/ NO SYNC(LCLFAULT)
(SWTOSW ) Port 26: LINK
(MANAGE ) Port 27: LINK
|
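Spotting suspect ports in the `axel_l1stat all` listing can also be scripted. A minimal sketch based only on the example above: it flags an SFP+ port that reports `NO LINK` but not `NO SYNC` (ports 21 and 22 before the reset) and skips ports showing `NO LINK/ NO SYNC`, which the example does not treat as problematic. That rule is a heuristic inferred from this one listing, not a documented meaning of the status flags, and the function name `suspect_sfp_ports` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the port numbers of SFP+ ports in `axel_l1stat all`
# output (stdin) that report NO LINK while still showing SYNC.
# Heuristic taken from the example listing; not an official interpretation.
suspect_sfp_ports() {
  awk '
    $1 == "(SFP+" && /NO LINK/ && !/NO SYNC/ {
      for (i = 1; i < NF; i++)
        if ($i == "Port") { p = $(i + 1); sub(/:$/, "", p); print p }
    }'
}
```

For example, `axel_l1stat all | suspect_sfp_ports` on the switch (assuming awk is available there) would list 21 and 22 for the first listing above, and still 22 after port 21 is reset.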
- Reset the problematic port (port 21 in this example). Note |
---|
Only reset a port if you are sure it is necessary. |
Code Block |
---|
[root@cswh-sys0-sp01-1 ~]$ axel_sfp_port 21 1g
Setting switch port 21 to 1000Base-X...
(SFP+ 2 ) Port 21 vsc7224 Retimer on bus direct chip 1 channel 0
[root@cswh-sys0-sp01-1 ~]$ axel_l1stat all
(FABRIC 1 ) Port 6: LINK/ ALIGN/ SYNC3/ SYNC2/ SYNC1/ SYNC0
(FABRIC 2 ) Port 7: NO LINK/NO ALIGN/NO SYNC3/NO SYNC2/NO SYNC1/NO SYNC0(LCLFAULT)
(FABRIC 3 ) Port 8: NO LINK/NO ALIGN/NO SYNC3/NO SYNC2/NO SYNC1/NO SYNC0(LCLFAULT)
(FABRIC 4 ) Port 9: NO LINK/NO ALIGN/NO SYNC3/NO SYNC2/NO SYNC1/NO SYNC0(LCLFAULT)
(FABRIC 5 ) Port 10: NO LINK/NO ALIGN/NO SYNC3/NO SYNC2/NO SYNC1/NO SYNC0(LCLFAULT)
(FABRIC 6 ) Port 11: NO LINK/NO ALIGN/NO SYNC3/NO SYNC2/NO SYNC1/NO SYNC0(LCLFAULT)
(FABRIC 7 ) Port 12: NO LINK/NO ALIGN/NO SYNC3/NO SYNC2/NO SYNC1/NO SYNC0(LCLFAULT)
(FABRIC 8 ) Port 13: NO LINK/NO ALIGN/NO SYNC3/NO SYNC2/NO SYNC1/NO SYNC0(LCLFAULT)
(FABRIC 9 ) Port 14: NO LINK/NO ALIGN/NO SYNC3/NO SYNC2/NO SYNC1/NO SYNC0(LCLFAULT)
(FABRIC 10 ) Port 15: NO LINK/NO ALIGN/NO SYNC3/NO SYNC2/NO SYNC1/NO SYNC0(LCLFAULT)
(FABRIC 11 ) Port 16: NO LINK/NO ALIGN/NO SYNC3/NO SYNC2/NO SYNC1/NO SYNC0(LCLFAULT)
(FABRIC 12 ) Port 17: NO LINK/NO ALIGN/NO SYNC3/NO SYNC2/NO SYNC1/NO SYNC0(LCLFAULT)
(FABRIC 13 ) Port 18: NO LINK/NO ALIGN/NO SYNC3/NO SYNC2/NO SYNC1/NO SYNC0(LCLFAULT)
(UPDATE ) Port 19: NO LINK/NO ALIGN/NO SYNC3/NO SYNC2/NO SYNC1/NO SYNC0(LCLFAULT)
(SFP+ 0 ) Port 20: LINK/ SYNC
(SFP+ 1 ) Port 0: NO LINK/ NO SYNC(LCLFAULT)
(SFP+ 2 ) Port 21: LINK/ *SYNC
(SFP+ 3 ) Port 1: LINK/ SYNC
(SFP+ 4 ) Port 22: NO LINK/ SYNC(LCLFAULT)
(SFP+ 5 ) Port 2: LINK/ SYNC
(SFP+ 6 ) Port 23: LINK/ SYNC
(SFP+ 7 ) Port 3: LINK/ SYNC
(RTM 0 ) Port 24: NO LINK/ NO SYNC(LCLFAULT)
(RTM 1 ) Port 4: NO LINK/ NO SYNC(LCLFAULT)
(RTM 2 ) Port 25: NO LINK/ NO SYNC(LCLFAULT)
(RTM 3 ) Port 5: NO LINK/ NO SYNC(LCLFAULT)
(SWTOSW ) Port 26: LINK
(MANAGE ) Port 27: LINK |
|
|
| - PRL amp is probably faulted; as a result, the PRL loop may not be locked
- Also as a result, attenuator is likely at full attenuation (31.75 dB)
| - Reset amp fault, then ramp attenuator to target setpoint
- Open PRL display (RF/Global --> Phase Reference Line)
- Find faulted amp (should be a MAJOR alarm)
- Open amp display
- Click "Amp Reset". Fault should clear.
- Click "Ramp to target setpoint". Attenuator should ramp down to target setpoint, and PRL loop should then lock.
|
|
- PRL loop "ADC Ampl" is OFF
- As a result, the loop is in a MINOR alarm state
- ADC1 Amplitude readback may be < 1 volt
| - Check amp attenuator setpoints (particularly VCO amps)
- If any are not at target setpoints, click "Ramp to target setpoint". Attenuator should ramp down to target setpoint, and ADC Ampl should read ON.
|
|
- PRL amp "Not at target" alarm
| - Ramp attenuator to target setpoint
- Open PRL display (RF/Global --> Phase Reference Line)
- Open amp display
- Click "Ramp to target setpoint". Attenuator should ramp to target setpoint.
- If ramp fails, it may be necessary to switch off ramp checking
- Open the "Expert" screen, and disable "Ramp Check"
- Click "Ramp to target setpoint" again.
|
|