Symptom | Solution | ToDo
  • MO locker matlab script is dead
    • Master Oscillator (MO) frequency is > 1 Hz
    • Matlab script LED (ALRM:SYS0:PRL_HRTBT:ALHBERR) is purple or red
    • MO DAC voltage is zero
  • Restart matlab script
    • In the "Matlab script utils" box on the Ms/PRL main display, push "stop"
    • Wait 10 seconds
    • In the "Matlab script utils" box on the Ms/PRL main display, push "restart"

Migrate matlab script functionality to EPICS
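
The heartbeat symptom above could be checked from a shell roughly as follows. This is a minimal sketch, not the deployed tooling. It assumes the EPICS `caget` client is on the path, that the `.SEVR` field of `ALRM:SYS0:PRL_HRTBT:ALHBERR` reflects the LED state, and that red and purple on the display correspond to the MAJOR and INVALID severities. The function name `mo_locker_looks_dead` is hypothetical.

```bash
# Decide whether an alarm severity string suggests the MO locker script is dead.
# Assumption: red LED = MAJOR severity, purple LED = INVALID severity.
mo_locker_looks_dead() {
    case "$1" in
        MAJOR|INVALID) return 0 ;;
        *)             return 1 ;;
    esac
}

# Example use on a host with EPICS client tools (not executed here):
#   sevr=$(caget -t ALRM:SYS0:PRL_HRTBT:ALHBERR.SEVR)
#   mo_locker_looks_dead "$sevr" && echo "Restart the matlab script from the Ms/PRL display"
```

The function only classifies the severity; the restart itself still goes through the display buttons described above.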

  • SIM master-slave connection is broken
    • PRL loop Tx/Rx PVs are < 5000
  • Reboot master and/or slave IOC.
    • For L0-L1 loop, master IOC is sioc-sys0-ms02, slave IOC is sioc-sys0-ms05.
    • For L2 loop, master IOC is sioc-sys0-ms03, slave IOC is sioc-sys0-ms06.

Reboot either through lclshome (SC) -> Network/Global, or with iocConsole.sh:

Code Block
[ laci@cpu-sys0-sp01]$ iocConsole.sh -r sioc-sys0-ms02

Driver has problems reconnecting after a disconnect.

Investigate whether an asynRecord reconnect can restore the connection so that we don't need to reboot the IOC.
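
The loop-to-IOC mapping and the Tx/Rx threshold above can be captured in a few helper functions. This is a sketch under stated assumptions: the function names are hypothetical, the Tx/Rx value is assumed to be a plain number already read from the PV, and the reboot commands are only printed, never executed.

```bash
# Print "master slave" IOC names for a PRL loop, using the mapping listed above.
prl_loop_iocs() {
    case "$1" in
        L0-L1) echo "sioc-sys0-ms02 sioc-sys0-ms05" ;;
        L2)    echo "sioc-sys0-ms03 sioc-sys0-ms06" ;;
        *)     echo "unknown loop: $1" >&2; return 1 ;;
    esac
}

# Succeed if a Tx/Rx reading is below the 5000 threshold from the symptom list.
# awk is used so that non-integer readings compare correctly.
sim_link_looks_broken() {
    awk -v v="$1" 'BEGIN { exit !(v < 5000) }'
}

# Print (do not run) the iocConsole.sh reboot command for each IOC of a loop.
print_reboot_cmds() {
    iocs=$(prl_loop_iocs "$1") || return 1
    for ioc in $iocs; do
        echo "iocConsole.sh -r $ioc"
    done
}
```

For example, `print_reboot_cmds L2` lists the two commands for the L2 loop so that an operator can decide whether to reboot the master, the slave, or both.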


  • Master source IOCs are not running
  • Ensure cpu-sys0-sp01 is up
  • Check IOC status
    • From lclshome, open the Networking/Global display and check the status of sioc-sys0-ms0[1..6]
    • Log on to cpu and check that sioc-sys0-ms0[1..6] are running:
      • Code Block
        [softegr@lcls-srv01 ~]$ ssh laci@cpu-sys0-sp01
        Entering commonSetup.sh
        ...
        [ laci@cpu-sys0-sp01]$ screen -ls
        There are screens on:
        	20633.sioc-sys0-ms01 (Detached)
        	8234.sioc-sys0-ms02	(Detached)
        	8113.sioc-sys0-ms03	(Detached)
        	7956.sioc-sys0-ms04	(Detached)
        	7865.sioc-sys0-ms05	(Detached)
        	7786.sioc-sys0-ms06	(Detached)
        	7710.sioc-sys0-ms07	(Detached)
        	7634.sioc-sys0-ms08	(Detached)
        	8978.sioc-sys0-ts01	(Detached)
        	8434.sioc-gunb-ts02	(Detached)
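
Rather than reading the session list by eye, the `screen -ls` output can be filtered for missing IOCs. The following is a minimal sketch; the function name `missing_ms_iocs` is hypothetical, and it assumes each IOC runs in a screen session named after the IOC, as in the listing above.

```bash
# Read `screen -ls` output on stdin and print every IOC in
# sioc-sys0-ms01..ms06 that does not appear as a screen session.
missing_ms_iocs() {
    sessions=$(cat)
    for n in 1 2 3 4 5 6; do
        ioc="sioc-sys0-ms0$n"
        # Session lines look like "<pid>.<name><whitespace>(Detached)".
        printf '%s\n' "$sessions" | grep -q "\.$ioc[[:space:]]" || echo "$ioc"
    done
}

# Example use on cpu-sys0-sp01 (not executed here):
#   screen -ls | missing_ms_iocs
```

No output means all six sessions are present.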
        
        



  • IOC connections to SIMs are broken
    • SIMs won't ping (IP addresses here)
  • Check network interfaces
Code Block
[ laci@cpu-sys0-sp01]$ ifconfig
eth0      Link encap:Ethernet  HWaddr 74:FE:48:28:BA:5F  
          inet addr:192.168.1.1  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::76fe:48ff:fe28:ba5f/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:743826 errors:0 dropped:0 overruns:0 frame:0
          TX packets:867746 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:64437267 (61.4 MiB)  TX bytes:63883718 (60.9 MiB)
          Memory:fbc60000-fbc7ffff 

eth4      Link encap:Ethernet  HWaddr 74:FE:48:28:BA:64  
          inet addr:172.27.128.47  Bcast:172.27.131.255  Mask:255.255.252.0
          inet6 addr: fe80::76fe:48ff:fe28:ba64/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:31052445 errors:0 dropped:0 overruns:0 frame:0
          TX packets:12823233 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:11697758920 (10.8 GiB)  TX bytes:1440944247 (1.3 GiB)
          Memory:fbb00000-fbb7ffff 

eth5      Link encap:Ethernet  HWaddr 74:FE:48:28:BA:5D  
          inet addr:10.0.1.1  Bcast:10.0.1.255  Mask:255.255.255.0
          inet6 addr: fe80::76fe:48ff:fe28:ba5d/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:27621 errors:0 dropped:0 overruns:0 frame:0
          TX packets:65 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:3419307 (3.2 MiB)  TX bytes:4502 (4.3 KiB)

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:9782170 errors:0 dropped:0 overruns:0 frame:0
          TX packets:9782170 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:445710120 (425.0 MiB)  TX bytes:445710120 (425.0 MiB)
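
To compare the interface addresses against the values shown above without scanning the full listing, the output can be reduced to one line per interface. This is a sketch that assumes the legacy net-tools `ifconfig` format shown here (`inet addr:` lines); the function name `iface_addrs` is hypothetical.

```bash
# Read net-tools `ifconfig` output on stdin and print "<iface> <IPv4>" per interface.
iface_addrs() {
    awk '
        /^[a-zA-Z]/ { iface = $1 }        # an unindented line starts a new interface
        /inet addr:/ {
            split($2, parts, ":")         # $2 looks like "addr:192.168.1.1"
            print iface, parts[2]
        }
    '
}

# Example use on cpu-sys0-sp01 (not executed here):
#   ifconfig | iface_addrs
```

With the listing above as input, this prints the addresses of eth0, eth4, eth5, and lo, one per line.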


  • Ensure the DHCP server is running
Code Block
[ laci@cpu-sys0-sp01]$ ps aux | grep dhcpd
root 7484 0.0 0.0 9336 6976 ? Ss Aug30 0:03
/usr/local/lcls/epics/iocTop/SharedPlatform/R1.0.22/bin/linuxRT-x86_64/dhcpd
-4 -cf /usr/local/lcls/epics/iocCommon/cpu-sys0-sp01/iocSpecificRelease/cpuBoot/lcls/cpu-sys0-sp01/dhcpd.conf
-lf /data/cpu-sys0-sp01/dhcpd/dhcpd.leases -pf
/data/cpu-sys0-sp01/dhcpd/dhcpd.pid eth0

  • Check for DHCP requests (there should be one for each DHCP client)
Code Block
[ laci@cpu-sys0-sp01]$ grep -i dhcpreq /var/log/messages
Aug 31 11:25:13 buildroot local7.info dhcpd: DHCPREQUEST for 192.168.1.16 (0.0.0.0) from 08:00:56:00:46:86 via eth0
Aug 31 11:25:33 buildroot local7.info dhcpd: DHCPREQUEST for 192.168.1.19 (0.0.0.0) from 08:00:56:00:46:15 via eth0
Aug 31 11:25:43 buildroot local7.info dhcpd: DHCPREQUEST for 192.168.1.20 (0.0.0.0) from 08:00:56:00:46:16 via eth0
Aug 31 11:25:43 buildroot local7.info dhcpd: DHCPREQUEST for 192.168.1.28 (0.0.0.0) from 08:00:56:00:49:55 via eth0
Aug 31 11:30:13 buildroot local7.info dhcpd: DHCPREQUEST for 192.168.1.16 (0.0.0.0) from 08:00:56:00:46:86 via eth0
Aug 31 11:30:33 buildroot local7.info dhcpd: DHCPREQUEST for 192.168.1.19 (0.0.0.0) from 08:00:56:00:46:15 via eth0
Aug 31 11:30:43 buildroot local7.info dhcpd: DHCPREQUEST for 192.168.1.20 (0.0.0.0) from 08:00:56:00:46:16 via eth0
...

or

Code Block
[ root@cpu-sys0-sp01]$ tcpdump -i eth0 port 67 or port 68 -e -n -vv
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
11:50:13.340432 08:00:56:00:46:86 > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800), length 302: 
(tos 0x0, ttl 32, id 4916, offset 0, flags [DF], proto UDP (17), length 288)
    0.0.0.0.68 > 255.255.255.255.67: [udp sum ok] BOOTP/DHCP, Request from 08:00:56:00:46:86, 
length 260, xid 0x76d, Flags [none] (0x0800)
 Client-Ethernet-Address 08:00:56:00:46:86
 Vendor-rfc1048 Extensions
   Magic Cookie 0x63825363
   DHCP-Message Option 53, length 1: Request
   Requested-IP Option 50, length 4: 192.168.1.16
   Server-ID Option 54, length 4: 0.0.0.0
11:50:13.340601 74:fe:48:28:ba:5f > 08:00:56:00:46:86, ethertype IPv4 (0x0800), length 342: 
(tos 0x10, ttl 128, id 0, offset 0, flags [none], proto UDP (17), length 328)
    192.168.1.1.67 > 192.168.1.16.68: [udp sum ok] BOOTP/DHCP, Reply, length 300, xid 0x76d, Flags [none] (0x0800)
 Your-IP 192.168.1.16
 Client-Ethernet-Address 08:00:56:00:46:86
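
Because the log repeats each client every few minutes, it can help to collapse the DHCPREQUEST lines to one entry per client before counting them. The following is a minimal sketch; the function name `dhcp_clients` is hypothetical, and it assumes the syslog line format shown in the `/var/log/messages` excerpt above.

```bash
# Read syslog lines on stdin and print "<mac> <requested-ip>" once per distinct
# client MAC that sent a DHCPREQUEST.
dhcp_clients() {
    awk '/DHCPREQUEST for/ {
        for (i = 1; i <= NF; i++) {
            if ($i == "for")  ip  = $(i + 1)   # address the client asked for
            if ($i == "from") mac = $(i + 1)   # client hardware address
        }
        if (!(mac in seen)) { seen[mac] = 1; print mac, ip }
    }'
}

# Example use on cpu-sys0-sp01 (not executed here):
#   dhcp_clients < /var/log/messages
```

The number of lines printed can then be compared with the number of DHCP clients expected on the network.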


  • Check blade
Code Block
[ laci@cpu-sys0-sp01]$ ping 10.0.1.102
PING 10.0.1.102 (10.0.1.102): 56 data bytes
64 bytes from 10.0.1.102: seq=0 ttl=32 time=0.067 ms
64 bytes from 10.0.1.102: seq=1 ttl=32 time=0.072 ms
64 bytes from 10.0.1.102: seq=2 ttl=32 time=0.040 ms
...
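
When several addresses need checking, the ping test can be wrapped in a small loop. This is a sketch only: the function names are hypothetical, and it relies on `ping -c` returning a non-zero exit status when no reply is received, which holds for the common Linux implementations but is an assumption for this host.

```bash
# Succeed if HOST answers at least one of three echo requests.
host_answers_ping() {
    ping -c 3 "$1" >/dev/null 2>&1
}

# Print "<host> up" or "<host> DOWN" for each address given.
report_reachability() {
    for host in "$@"; do
        if host_answers_ping "$host"; then
            echo "$host up"
        else
            echo "$host DOWN"
        fi
    done
}

# Example use on cpu-sys0-sp01 (not executed here):
#   report_reachability 10.0.1.102
```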


  • Reboot ATCA switch.  

    Warning

    This will bring down the local network.  Only do this if you know what you're doing.

Code Block
lcls-srv01> source /usr/local/lcls/package/IPMC/env.sh
lcls-srv01> fru_deactivate shm-sys0-sp01-1/1
lcls-srv01> fru_activate shm-sys0-sp01-1/1


  • Reset LAN switch ports
    • Check ports first.  Problematic SFP+ ports might have a "NO LINK/   *SYNC(LCLFAULT)" status, like ports 21 and 22 below:
Code Block
ssh softegr@lcls-srv01
...

[softegr@lcls-srv01 ~]$ iocConsole cswh-sys0-sp01-1
 : ssh -x -t -l laci lcls-daemon1 bash -l -c " pyiocscreen.py -t HIOC cswh-sys0-sp01-1 ts-li02-nw01 2016 "
Starting up HIOC cswh-sys0-sp01-1
...
Trying 172.27.132.127...
Connected to ts-li02-nw01.
Escape character is '^]'.

!!!!! Welcome to SLAC !!!!!!!

edit this message (/etc/issue) to remove this message

cswh-sys0-sp01-1 login:
Password:  

[root@cswh-sys0-sp01-1 ~]$ axel_l1stat all
(FABRIC 1   ) Port  6:    LINK/  *ALIGN/  *SYNC3/  *SYNC2/  *SYNC1/  *SYNC0
(FABRIC 2   ) Port  7: NO LINK/NO ALIGN/NO SYNC3/NO SYNC2/NO SYNC1/NO SYNC0(LCLFAULT)
(FABRIC 3   ) Port  8: NO LINK/NO ALIGN/NO SYNC3/NO SYNC2/NO SYNC1/NO SYNC0(LCLFAULT)
(FABRIC 4   ) Port  9: NO LINK/NO ALIGN/NO SYNC3/NO SYNC2/NO SYNC1/NO SYNC0(LCLFAULT)
(FABRIC 5   ) Port 10: NO LINK/NO ALIGN/NO SYNC3/NO SYNC2/NO SYNC1/NO SYNC0(LCLFAULT)
(FABRIC 6   ) Port 11: NO LINK/NO ALIGN/NO SYNC3/NO SYNC2/NO SYNC1/NO SYNC0(LCLFAULT)
(FABRIC 7   ) Port 12: NO LINK/NO ALIGN/NO SYNC3/NO SYNC2/NO SYNC1/NO SYNC0(LCLFAULT)
(FABRIC 8   ) Port 13: NO LINK/NO ALIGN/NO SYNC3/NO SYNC2/NO SYNC1/NO SYNC0(LCLFAULT)
(FABRIC 9   ) Port 14: NO LINK/NO ALIGN/NO SYNC3/NO SYNC2/NO SYNC1/NO SYNC0(LCLFAULT)
(FABRIC 10  ) Port 15: NO LINK/NO ALIGN/NO SYNC3/NO SYNC2/NO SYNC1/NO SYNC0(LCLFAULT)
(FABRIC 11  ) Port 16: NO LINK/NO ALIGN/NO SYNC3/NO SYNC2/NO SYNC1/NO SYNC0(LCLFAULT)
(FABRIC 12  ) Port 17: NO LINK/NO ALIGN/NO SYNC3/NO SYNC2/NO SYNC1/NO SYNC0(LCLFAULT)
(FABRIC 13  ) Port 18: NO LINK/NO ALIGN/NO SYNC3/NO SYNC2/NO SYNC1/NO SYNC0(LCLFAULT)
(UPDATE     ) Port 19: NO LINK/NO ALIGN/NO SYNC3/NO SYNC2/NO SYNC1/NO SYNC0(LCLFAULT)
(SFP+ 0     ) Port 20:    LINK/   *SYNC
(SFP+ 1     ) Port  0: NO LINK/ NO SYNC(LCLFAULT)
(SFP+ 2     ) Port 21: NO LINK/   *SYNC(LCLFAULT)
(SFP+ 3     ) Port  1:    LINK/   *SYNC
(SFP+ 4     ) Port 22: NO LINK/   *SYNC(LCLFAULT)
(SFP+ 5     ) Port  2:    LINK/   *SYNC
(SFP+ 6     ) Port 23:    LINK/   *SYNC
(SFP+ 7     ) Port  3:    LINK/   *SYNC
(RTM 0      ) Port 24: NO LINK/ NO SYNC(LCLFAULT)
(RTM 1      ) Port  4: NO LINK/ NO SYNC(LCLFAULT)
(RTM 2      ) Port 25: NO LINK/ NO SYNC(LCLFAULT)
(RTM 3      ) Port  5: NO LINK/ NO SYNC(LCLFAULT)
(SWTOSW     ) Port 26:    LINK
(MANAGE     ) Port 27:    LINK

    • Reset port.  

      Note

      Only reset a port if you are sure it is necessary.

Code Block
[root@cswh-sys0-sp01-1 ~]$ axel_sfp_port 21 1g
Setting switch port 21 to 1000Base-X...
(SFP+ 2     ) Port 21 vsc7224 Retimer on bus direct chip 1 channel 0

[root@cswh-sys0-sp01-1 ~]$ axel_l1stat all    
(FABRIC 1   ) Port  6:    LINK/   ALIGN/   SYNC3/   SYNC2/   SYNC1/   SYNC0
(FABRIC 2   ) Port  7: NO LINK/NO ALIGN/NO SYNC3/NO SYNC2/NO SYNC1/NO SYNC0(LCLFAULT)
(FABRIC 3   ) Port  8: NO LINK/NO ALIGN/NO SYNC3/NO SYNC2/NO SYNC1/NO SYNC0(LCLFAULT)
(FABRIC 4   ) Port  9: NO LINK/NO ALIGN/NO SYNC3/NO SYNC2/NO SYNC1/NO SYNC0(LCLFAULT)
(FABRIC 5   ) Port 10: NO LINK/NO ALIGN/NO SYNC3/NO SYNC2/NO SYNC1/NO SYNC0(LCLFAULT)
(FABRIC 6   ) Port 11: NO LINK/NO ALIGN/NO SYNC3/NO SYNC2/NO SYNC1/NO SYNC0(LCLFAULT)
(FABRIC 7   ) Port 12: NO LINK/NO ALIGN/NO SYNC3/NO SYNC2/NO SYNC1/NO SYNC0(LCLFAULT)
(FABRIC 8   ) Port 13: NO LINK/NO ALIGN/NO SYNC3/NO SYNC2/NO SYNC1/NO SYNC0(LCLFAULT)
(FABRIC 9   ) Port 14: NO LINK/NO ALIGN/NO SYNC3/NO SYNC2/NO SYNC1/NO SYNC0(LCLFAULT)
(FABRIC 10  ) Port 15: NO LINK/NO ALIGN/NO SYNC3/NO SYNC2/NO SYNC1/NO SYNC0(LCLFAULT)
(FABRIC 11  ) Port 16: NO LINK/NO ALIGN/NO SYNC3/NO SYNC2/NO SYNC1/NO SYNC0(LCLFAULT)
(FABRIC 12  ) Port 17: NO LINK/NO ALIGN/NO SYNC3/NO SYNC2/NO SYNC1/NO SYNC0(LCLFAULT)
(FABRIC 13  ) Port 18: NO LINK/NO ALIGN/NO SYNC3/NO SYNC2/NO SYNC1/NO SYNC0(LCLFAULT)
(UPDATE     ) Port 19: NO LINK/NO ALIGN/NO SYNC3/NO SYNC2/NO SYNC1/NO SYNC0(LCLFAULT)
(SFP+ 0     ) Port 20:    LINK/    SYNC
(SFP+ 1     ) Port  0: NO LINK/ NO SYNC(LCLFAULT)
(SFP+ 2     ) Port 21:    LINK/   *SYNC
(SFP+ 3     ) Port  1:    LINK/    SYNC
(SFP+ 4     ) Port 22: NO LINK/    SYNC(LCLFAULT)
(SFP+ 5     ) Port  2:    LINK/    SYNC
(SFP+ 6     ) Port 23:    LINK/    SYNC
(SFP+ 7     ) Port  3:    LINK/    SYNC
(RTM 0      ) Port 24: NO LINK/ NO SYNC(LCLFAULT)
(RTM 1      ) Port  4: NO LINK/ NO SYNC(LCLFAULT)
(RTM 2      ) Port 25: NO LINK/ NO SYNC(LCLFAULT)
(RTM 3      ) Port  5: NO LINK/ NO SYNC(LCLFAULT)
(SWTOSW     ) Port 26:    LINK
(MANAGE     ) Port 27:    LINK
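
The port check above can be narrowed to just the suspect SFP+ ports by filtering the `axel_l1stat all` output. This is a minimal sketch; the function name `suspect_sfp_ports` is hypothetical. It treats any SFP+ line with "NO LINK" together with a SYNC field (with or without the asterisk) and LCLFAULT as suspect. Allowing the asterisk to be absent is an assumption based on the second listing, where port 22 appears without it.

```bash
# Read `axel_l1stat all` output on stdin and print the switch port number of
# every SFP+ port that reports NO LINK while still showing SYNC (LCLFAULT).
suspect_sfp_ports() {
    sed -nE 's#^\(SFP\+ +[0-9]+ *\) Port +([0-9]+): NO LINK/ +\*?SYNC\(LCLFAULT\).*#\1#p'
}

# Example use on the switch (not executed here):
#   axel_l1stat all | suspect_sfp_ports
```

The function only lists candidates; whether to reset a port with `axel_sfp_port` remains a manual decision, per the note above.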

  • PRL amp faulted
    • As a result, the loop may not be locked
    • The attenuator is also likely at full attenuation (31.75 dB)
  • Reset amp fault, then ramp attenuator to target setpoint
    1. Open PRL display (RF/Global --> Phase Reference Line)
    2. Find faulted amp (should be a MAJOR alarm)
    3. Open amp display
    4. Click "Amp Reset".  Fault should clear.
    5. Click "Ramp to target setpoint".   Attenuator should ramp down to target setpoint, and PRL loop should then lock.

  • PRL loop "ADC Ampl" is OFF
    • As a result, the loop is in MINOR alarm state
    • ADC1 Amplitude readback may be < 1 volt
  • Check amp attenuator setpoints (particularly VCO amps)
    • If any are not at target setpoints, click "Ramp to target setpoint".   Attenuator should ramp down to target setpoint, and ADC Ampl should read ON.

  • PRL amp "Not at target" alarm
  • Ramp attenuator to target setpoint
    1. Open PRL display (RF/Global --> Phase Reference Line)
    2. Open amp display
    3. Click "Ramp to target setpoint".   Attenuator should ramp to target setpoint.
    4. If the ramp fails, it may be necessary to switch off ramp checking:
      1. Open the "Expert" screen, and disable "Ramp Check"
      2. Try step 3 again.
