...
- we are trying to collect information about upcoming Fermi computing outages (disks, Oracle, network) to improve planning
- when planning an outage, please send an email to datalist and write the description here (including requested duration and preferred timeframe)
- we will try to combine outages as much as possible, in order to maximize uptime for time-critical services (FASTCopy, pipeline, etc.)
- once the plan is finalized, don't forget to send a message to glast-outage and the collaboration (if applicable)
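As an illustration of combining outages, the sketch below merges overlapping requested windows into the smallest set of combined windows. It uses the time slots listed under the May 10 2012 entry on this page as sample input; the function and helper names (`merge_windows`, `at`) are hypothetical and not part of any Fermi tool.

```python
from datetime import datetime


def merge_windows(windows):
    """Merge overlapping (start, end) datetime pairs into combined windows."""
    merged = []
    for start, end in sorted(windows):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous window: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def at(hour, minute=0):
    # Hypothetical helper: a time of day on May 10, 2012.
    return datetime(2012, 5, 10, hour, minute)


# Requested windows as listed under the May 10 2012 entry.
requests = [
    (at(10), at(12, 30)),  # Oracle quarterly update
    (at(10), at(12, 30)),  # xroot server reboot
    (at(10), at(12, 30)),  # user disk (wain006) reboot
    (at(9), at(15)),       # xroot file server move
    (at(9), at(15)),       # NFS file server move
]

combined = merge_windows(requests)
for start, end in combined:
    print(f"{start:%H:%M} - {end:%H:%M}")
```

With these inputs all five requests collapse into a single 9am-3pm window, so time-critical services see one interruption rather than several.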
Upcoming outage requests
- Outage of mysql-node03 to move to HA rack.
Feb 03, 2014 - Oracle and OS patching (GHOST vulnerability patches)
- Outage of FASTCopy starting at 9:00am, reboot of FASTCopy machines
- Oracle OS reboot and patching starting at 10:00am
- Reboot Fermi linux xrootd servers and fermilnx machines
Feb 11, 2014 - Oracle and OS patching; also retirement of various glastlnx machines
- 10am - duration is likely several hours
- This outage affects all NFS servers (wains), including user disk as well as xroot servers.
- Expect interruptions in all Fermi services as they are moved from old glastlnx -> new fermilnx machines
Dec 11, 2013 - Oracle server battery replacement
- 10am - glast-oracle03 to have battery replaced in storage array. Expected outage duration: 30m
Dec 4, 2013 - OS Patching and re-IP'ing
- 10am - all Fermi wain-class servers will be rebooted for OS patching.
- 10am - glast-oracle03/04 will be rebooted for OS patching.
- At the same time, 16 wains will have new IP addresses assigned in anticipation of retiring old network switches and reconfiguring the network in January 2014.
- Three wains will be physically relocated to consolidate rack space
Host | Switch | Service (xrootd if not specified) | Physical move |
---|---|---|---|
wain006 | RTR-FARM08 | NFS | |
wain017 | RTR-FARM01 | NFS | |
wain018 | RTR-FARM01 | NFS | |
wain019 | RTR-FARM01 | | |
wain020 | RTR-FARM01 | | |
wain021 | RTR-FARM01 | | |
wain025 | RTR-FARM08 | NFS | |
wain026 | RTR-FARM08 | NFS | |
wain032 | RTR-FARM08 | NFS | |
wain033 | RTR-FARM08 | | |
wain034 | RTR-FARM08 | | |
wain035 | RTR-FARM08 | | yes |
wain036 | RTR-FARM08 | | yes |
wain037 | RTR-FARM08 | | yes |
wain038 | RTR-FARM08 | | |
wain039 | RTR-FARM08 | | |
Oct 2, 2013 - ISOC logging gateways to be shut down
Until now, the gateway daemons that allow entries to be made in the ISOC event log (the one displayed by the LogWatcher web app) have run on glastlnx06 and glastlnx11. These daemons are contacted by software outside of the usual ISOC distribution.
glastlnx06 and 11 will shortly be decommissioned. fermilnx01 and 02 are now running the gateway daemons, and I've prepared versions of the logging software that use them:
- GPLtools - Version GPLtools-02-00-02 in /afs/slac/g/glast/ground/PipelineConfig/GPLtools. This provides a Python version of the logging software. The only changed file is python/PipelineNetloggerConfig.py.
- org-glast-isoc-common - version 1.3 in the Fermi Maven repository. This provides the Java and Jython versions of the logging software.
I intend to shut down the logging gateways on glastlnx06 and 11 on Wednesday, Oct 2.
Sep 9-10, 2013 - ISOC realtime displays
- There will be occasional interruptions in service as the ISOC realtime support daemons are moved from glastlnx06,11 to fermilnx01 and 02.
Aug 13, 2013 - Quarterly Oracle security patching
- 10:00-12:00 GLASTP (glast-oracle03, 04)
- 11:00-12:00 reboot of the Wains
May 2013 - Quarterly Oracle security patching
- Wed May 01 10:00-11:00: GLASTDEV, GLASTSTG (glast-oracle02)
- Wed May 08 10:00-14:00: GLASTP (glast-oracle03, 04)
- Wed May 08 11:00-14:00: reboot of the wains (xrootd & NFS servers) for OS patches
Feb 13, 2013 - Oracle password change
2 PM. Semi-annual password change for Fermi accounts:
Oracle Instance | Oracle Account | Password Expires |
---|---|---|
GLASTDEV | GLAST_ISOC | 14-FEB-2013 |
GLASTDEV | ISOC_NIGHTLY | 14-FEB-2013 |
GLASTDEV | ISOC_TEST | 14-FEB-2013 |
GLASTP | GLAST_CAL | 14-FEB-2013 |
GLASTP | GLAST_ISOC | 14-FEB-2013 |
GLASTP | ISOC_FLIGHT | 14-FEB-2013 |
- There should be no actual interruption of service unless something goes wrong.
Jan-Feb 2013 - Quarterly Oracle security patching
- Thu Jan 31 10:00-11:00: GLASTDEV, GLASTSTG (glast-oracle02)
- Mon Feb 4 10:00-14:00: GLASTP (glast-oracle03, 04)
Nov 18, 2012 - Oracle firmware upgrade
- 10-11 AM: Oracle firmware upgrade on glast-oracle03
Oct 25, 2012
...
- NFS Server upgrade
- Duration: approx. 2+ hours (to be coordinated with the HA rack movements; see below)
- NFS server change from old sulky machine(s) to wain031, affecting ISOC Ops: /u23, /u28, /u41, /u42. The selected method of making this cut-over will determine the nature and length of the outage. More details and discussion on the datalist email list. Refs:
...
- Move glast-oracle03,04 and glast-win01,2, and possibly mysql01,2, to the HA rack
- Oracle quarterly security patching
Sep 4, 2012
- 10:00am-12:30pm: Oracle patching.
- 10-10:30 AM: replacing a fan on sulky 33.
...
- 10am - 11:30am: migrating calib* and mood* databases from glastlnx01/02 to mysql-node03
May 10 2012
...
- [10am-12:30pm] Oracle quarterly update. This will affect pipeline, data catalog, flight operations and any other databases on the main Fermi Oracle server.
- [10am-12:30pm] xroot server reboot for OS upgrade. This will affect all 36 of the wain (Solaris) xroot servers.
- [10am-12:30pm] Fermi USER DISK (wain006) reboot for OS upgrade.
- [9am-3pm] xroot file server move. This will affect only two xroot servers: wain070 and wain071.
- [9am-3pm] NFS file server move. This will affect the following servers, which will be unplugged and physically moved to new rack space in building 50: sulky33, sulky34, sulky35, sulky36