Some missing information:

1. Does anything need to be done with respect to the Photon Group besides informing them?

2. Which LLRF screens should have snapshots taken (Kukhee, Ops)?

3. Should some verification of message logging be done? If so, what?

4. Which MATLAB processes run 24/7?

5. What does 'check for Aida recovery' consist of?

... 

NFS server change: IOC checkout plan
====================================

Before Jingchen starts (begin this a couple of hours beforehand):
-----------------------------------------------------------------
1. Take snapshots of all laser motor positions and the timing setup.

2. MPS:

a. Record which versions of MPS logic are loaded and which one is running. Go to lclshome > Global/MPS > Logic Info... Take a screenshot.

b. Record the MPS bypass status. Go to lclshome > Global/MPS > MPS GUI. Bypassed faults are listed at the bottom. Take a screenshot.

3. Timing:

a. Take a snapshot of lclshome > Global/Event

b. Take a snapshot of lclshome > Global/Event > Events...

c. Go to one or two Event main displays and verify that the Fiducial Rate is 360 Hz. For example: lclshome > Event/IN20 and Event/LI21

4. Take snapshots of the important RF displays (Kukhee and/or Mike Stanek to indicate which ones).

5. Coordinate with the Laser team: make sure Sasha has turned off the laser.

6. Coordinate with the Photon Group: is any IOC preparation needed?

7. Email the controls department: all IOC engineers must check their autosave and ChannelWatcher files to make sure they are good.

8. Make a backup of the ioc/data directory (tar it?).
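If the ioc/data backup is done with tar, it can be wrapped so the archive is timestamped and checked for readability before the servers go down. This is a minimal sketch meant to be sourced into a shell; the function name and both directory arguments are hypothetical, since the plan does not give the full path to ioc/data or say where backups should be kept.

```bash
# Sketch (hypothetical helper): archive a directory into a timestamped tarball.
# Source this file, then call: backup_ioc_data <source dir> <backup dir>
# Both paths are placeholders; the real ioc/data location is not given here.

backup_ioc_data() {
    local src_dir=$1 dest_dir=$2 stamp archive

    if [ ! -d "$src_dir" ]; then
        echo "backup_ioc_data: no such directory: $src_dir" >&2
        return 1
    fi
    mkdir -p "$dest_dir" || return 1

    stamp=$(date +%Y%m%d-%H%M%S)
    archive="$dest_dir/$(basename "$src_dir")-$stamp.tar.gz"

    # -C switches to the parent directory so the archive stores relative paths.
    tar -czf "$archive" -C "$(dirname "$src_dir")" "$(basename "$src_dir")" || return 1

    # Listing the archive confirms it can be read back before the reboot.
    tar -tzf "$archive" > /dev/null || return 1
    echo "$archive"
}
```

The function prints the archive path on success, so the path can be pasted into the checkout log.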

Immediately before the servers reboot:
--------------------------------------
1. Turn off all MATLAB feedbacks.

2. Turn off all 24/7 physics MATLAB applications (on lclshome, go to Matlab GUIs... | GOTO Watcher...; these are the ones I know about; turn them off).

When the servers are back up, but before hard IOC reboots:
----------------------------------------------------------

1. Check that terminal server communications have recovered, especially for vacuum and the MCC knobs (these had trouble in the past). Reset ports and reboot terminal servers if needed.

2. Check that VME crate communications have recovered (these had trouble in the past). Power-cycle crates if needed.

3. Run Judy's script to restart screen processes and reset the iocConsole terminal server ports for all IOCs.

4. Judy: reboot the alarm soft IOCs.

5. Bob: check for Aida recovery.

6. Check that all soft IOCs and ChannelWatchers have started up properly

7. Check message logging?
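Step 6 could be made less ad hoc by keeping a list with one PV per soft IOC (for example a heartbeat or uptime PV) and checking that each one answers over Channel Access with caget. This is a sketch under assumptions: the list file, its format, and the function name are hypothetical, and a PV answering only shows that the IOC is up, not that its autosave restore or ChannelWatcher state is correct.

```bash
# Sketch (hypothetical helper): report which PVs in a list do not answer caget.
# Assumed list format: one PV name per line; blank lines and '#' lines skipped.
# The PV names themselves must come from the IOC engineers.
# CAGET can be overridden for a dry run.

CAGET=${CAGET:-caget}

check_pvs_respond() {
    local list_file=$1 pv failed=0
    while IFS= read -r pv || [ -n "$pv" ]; do
        case $pv in ''|'#'*) continue ;; esac
        # -w sets the Channel Access timeout in seconds; stdin is detached so
        # the command cannot consume the list file.
        if "$CAGET" -w 2 "$pv" > /dev/null 2>&1 < /dev/null; then
            echo "OK: $pv"
        else
            echo "NO REPLY: $pv"
            failed=$((failed + 1))
        fi
    done < "$list_file"
    echo "$failed PV(s) did not respond"
    [ "$failed" -eq 0 ]
}
```

The function returns non-zero if any PV is silent, so the NO REPLY lines give the list of IOCs to look at first.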

After hard IOC reboots:
-----------------------
1. The critical IOCs/subsystems listed below must each go through a checkout procedure.

2. Restart the 24/7 MATLAB apps, or inform Ops that they can restart them as needed.

Coordination and checkout for critical systems
----------------------------------------------

Timing: Mike Stanek, Kukhee

  • Check the EVG diagnostics displays and make sure the event code rates are as expected. Check that the EVR fiducial rate is 360 Hz;
    if it is not, there may be a problem with TRIG:LI20:404. Compare the displays with the saved snapshots.
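The fiducial-rate check can also be scripted with caget, the EPICS Channel Access command-line client, by reading the rate and comparing it with 360 Hz. This is a sketch: the function name is hypothetical, the PV name must be taken from the Fiducial Rate field on the Event/IN20 or Event/LI21 display (it is not given here), and the 0.5 Hz tolerance is an assumption.

```bash
# Sketch (hypothetical helper): check a fiducial-rate PV against 360 Hz.
# Usage: check_fiducial_rate <fiducial-rate PV>
# Returns 0 if the rate is within tolerance, 1 if wrong, 2 if no response.
# CAGET can be overridden for a dry run.

CAGET=${CAGET:-caget}
EXPECTED_RATE=360
RATE_TOLERANCE=0.5   # assumed tolerance in Hz

check_fiducial_rate() {
    local pv=$1 value
    # -t prints only the value; -w sets the Channel Access timeout in seconds.
    if ! value=$("$CAGET" -t -w 2 "$pv"); then
        echo "$pv: no response (IOC down or PV not found)" >&2
        return 2
    fi
    # awk does the floating-point comparison; a non-numeric value reads as 0.
    if awk -v v="$value" -v e="$EXPECTED_RATE" -v t="$RATE_TOLERANCE" \
        'BEGIN { d = v - e; if (d < 0) d = -d; exit !(d <= t) }'; then
        echo "$pv: OK ($value)"
    else
        echo "$pv: unexpected rate $value (expected $EXPECTED_RATE)" >&2
        return 1
    fi
}
```

Because the return codes distinguish a wrong rate from a silent IOC, the function can be looped over several EVRs and the results sorted into the two cases.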

RF: Ron Akre, Mike Stanek, Kukhee

  • Check the readback on the RF Reference PADs in IN20, and run the Resync GUI. Compare the displays with the saved snapshots.
  • Last time, some of the PAD and PAC eiocs did not properly start their second NIC. This must be checked explicitly, because the
    failure produces no error messages. At the eioc shell, run:

lanIpBscDumpIfStats(lanIpIf,-1)

  • Check that the send/receive packet counters are increasing. If they are not, reboot the eioc again.
  • Note: last time, some eiocs showed RPCIO errors, which were related to high CPU load and did not indicate a problem with NFS access.
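The decision in the packet-counter check only needs two readings of the same counter, taken a few seconds apart. The sketch below assumes the two values have already been copied by hand from two runs of lanIpBscDumpIfStats; it does not parse that command's output, whose format is not described here, and the function name is hypothetical.

```bash
# Sketch (hypothetical helper): decide whether a NIC packet counter advanced.
# Usage: counter_advanced <first reading> <second reading>
# Returns 0 if advancing, 1 if stuck, 2 if a reading is not an integer.
# A counter that wraps between readings would be reported as stuck, so take
# the two readings close together.

counter_advanced() {
    local first=$1 second=$2 v
    for v in "$first" "$second"; do
        case $v in
            ''|*[!0-9]*)
                echo "counter_advanced: readings must be non-negative integers" >&2
                return 2 ;;
        esac
    done
    if [ "$second" -gt "$first" ]; then
        echo "counter advancing ($first -> $second)"
    else
        echo "counter stuck at $first: reboot this eioc again" >&2
        return 1
    fi
}
```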

Laser: Sasha, Qaio, Matt

  • Compare the laser motor positions and timing setup with the saved snapshots, and verify that everything has recovered before
    turning the laser back on.

PPS: Enzo, Kristina, Arturo

Vacuum: Matt

BCS: Brian Bennet, Arturo

MPS: Matt

  • Open the Recovery information for MP, found on the ioc-bsy0-mp01 Network IOC EDM display, and follow the directions under
    'After Booting Link Processor':

a. Recover bypasses: go to lclshome > Global/MPS > Bypass Recover. Normally you choose the most recent file. If you choose
option 0, no bypasses are recovered, but the bypass recover fault is cleared.

b. Compare the MPS logic versions with the saved snapshots. If no logic is running, you may need to unload and reload the
algorithm that was running before the reboot, then start it. This is done with caputs:

caput IOC:BSY0:MP01:ALGUNLOAD "name"
caput IOC:BSY0:MP01:ALGLOAD "name"
caput IOC:BSY0:MP01:ALGRUN "name"

(where "name" is the Name (first column) of the Running Logic on the MPS Logic Status display)

c. Unlatch faults: go to lclshome > Global/MPS > Unlatch All.
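The three caputs in step b can be wrapped in a small function so they always go out in the same order and the sequence stops at the first failure. This is a sketch; the function name is hypothetical, and because caput's exit status only reflects Channel Access errors, not whether the MPS accepted the algorithm, the comparison against the saved logic snapshot is still needed afterwards.

```bash
# Sketch (hypothetical helper): unload, load, and run an MPS logic algorithm.
# Usage: reload_mps_algorithm "name"
# "name" is the logic name recorded before the reboot.
# CAPUT can be overridden for a dry run (e.g. CAPUT=echo).

CAPUT=${CAPUT:-caput}
MPS_IOC_PREFIX=IOC:BSY0:MP01

reload_mps_algorithm() {
    local name=$1 action
    if [ -z "$name" ]; then
        echo "usage: reload_mps_algorithm <logic name>" >&2
        return 2
    fi
    # Same order as the manual procedure: unload, load, then run.
    for action in ALGUNLOAD ALGLOAD ALGRUN; do
        if ! "$CAPUT" "$MPS_IOC_PREFIX:$action" "$name"; then
            echo "caput to $MPS_IOC_PREFIX:$action failed; stopping" >&2
            return 1
        fi
    done
}
```

Setting CAPUT=echo before calling the function prints the three commands instead of sending them, which is a cheap way to double-check the name first.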

LU: Sonya

  • Compare the current and saved autosave files and verify that there are no unexpected differences. Check several LI21-LI30
    lclshome displays (for example Temperature, PPS, Event > PDUs...). Change a PV value and verify that the message appears in cmlog.
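The autosave comparison amounts to a diff that ignores lines which are expected to change between saves. This sketch assumes that lines starting with '#' are header or comment lines (such as a generation timestamp) and can be ignored; check that assumption against the real .sav files before trusting an empty diff. The function name is hypothetical.

```bash
# Sketch (hypothetical helper): diff a saved and a current autosave file.
# Usage: compare_autosave <saved .sav> <current .sav>
# Assumption: '#' lines are headers/comments and are excluded from the diff.
# Returns 0 if identical, 1 if different, 2 on error (as diff does).

compare_autosave() {
    local saved=$1 current=$2 f tmp_saved tmp_current rc
    for f in "$saved" "$current"; do
        if [ ! -r "$f" ]; then
            echo "compare_autosave: cannot read $f" >&2
            return 2
        fi
    done
    tmp_saved=$(mktemp) && tmp_current=$(mktemp) || return 2
    grep -v '^#' "$saved" > "$tmp_saved"
    grep -v '^#' "$current" > "$tmp_current"
    echo "Comparing $saved (saved) with $current (current):"
    diff "$tmp_saved" "$tmp_current"
    rc=$?
    rm -f "$tmp_saved" "$tmp_current"
    return "$rc"
}
```

Lines marked "<" come from the saved file and lines marked ">" from the current one, so each changed PV shows its old and new value side by side.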