
The first Wednesday of each month is designated as a maintenance reboot window for some Linux servers. These servers are either rebooted automatically by taylor or chef, or rebooted manually by a unix-admin team member.

This monthly reboot schedule allows updated kernels, glibc, and other shared libraries to be patched and activated on Linux servers. SLAC cyber security and DOE require periodic reboots to activate security patches. On the first Wednesday of the month, Linux server reboots are staggered between 2 AM and 7 AM Pacific Time for automatic reboots by taylor and chef, and between 7 AM and 11:30 AM for manual reboots by a unix-admin team member.
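The following is a minimal sketch of how a reboot date and a staggered start time could be derived for each host. It assumes the 2 AM - 7 AM automatic window and spreads hosts by hashing the hostname; the function names and the hashing scheme are hypothetical and are not how taylor or chef actually schedule reboots.

```python
import hashlib
from datetime import date, datetime, time, timedelta


def first_wednesday(year: int, month: int) -> date:
    """Return the first Wednesday of the given month."""
    first = date(year, month, 1)
    # date.weekday() numbers Monday as 0, so Wednesday is 2.
    return first + timedelta(days=(2 - first.weekday()) % 7)


def reboot_slot(hostname: str, day: date,
                start: time = time(2, 0), end: time = time(7, 0)) -> datetime:
    """Map a hostname to a stable reboot time in [start, end) on `day`.

    Hashing the hostname spreads hosts across the window and gives each
    host the same slot every month (hypothetical scheme).
    """
    window_start = datetime.combine(day, start)
    window_end = datetime.combine(day, end)
    window_minutes = int((window_end - window_start).total_seconds() // 60)
    # sha256 is used only because it is stable across runs and machines;
    # Python's built-in hash() of a str is salted per process.
    digest = hashlib.sha256(hostname.encode("utf-8")).digest()
    offset = int.from_bytes(digest[:4], "big") % window_minutes
    return window_start + timedelta(minutes=offset)
```

For example, `first_wednesday(2020, 3)` returns `date(2020, 3, 4)`, the March 4th, 2020 window listed on this page, and `reboot_slot("cups02", date(2020, 3, 4))` always yields the same time between 02:00 and 07:00 for that host.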

Date (and comp-out link): Wednesday, March 4th, 2020

Time: 1 AM - 11:30 AM

RHEL 6 kernel release: 2.6.32-754.27.1

CentOS 7 kernel release: 3.10.0-1062.12.1

Server hostnames:

Chef nodes:

chef-automate2
chef-build01
cups02
ftp1
ganglia04
ksa-c7a
mgmt-centos7
nagios04
novel02
nx3
samba03

Taylor nodes:

cdlogin1
cdlogin2
cdlogin3
mgmt-authproxy01
mgmt-rhel01
mgmt-rhel02
ns-test
ssrl-vip1
ssrl-vip2
tftp
tftp-rhel6
version01
vip1

There is currently no documented and announced standard maintenance window for patching and rebooting centrally managed Unix hosts.

The classes of machines described below need to be clearly identified within the configuration management tools (taylor, chef).

Trivial patching (e.g., RPM updates of userland software) can be done outside these outage windows.

The outage windows are needed to activate new kernels, glibc, X11, and other software that requires a reboot to take effect.
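After a reboot window, one way to confirm that a new kernel was actually activated is to compare the running kernel release (what `uname -r` reports) with the release expected for that window, such as 2.6.32-754.27.1 for RHEL 6 or 3.10.0-1062.12.1 for CentOS 7 in March 2020. The sketch below is a hypothetical helper, not an existing SLAC tool; it only checks the kernel, not glibc or other libraries still mapped by running processes.

```python
import os
import re
from typing import Optional, Tuple


def release_tuple(release: str) -> Tuple[int, ...]:
    """Return the leading numeric fields of a kernel release string.

    '3.10.0-1062.12.1.el7.x86_64' -> (3, 10, 0, 1062, 12, 1);
    parsing stops at the first non-numeric field ('el7').
    """
    numbers = []
    for field in re.split(r"[.-]", release):
        if not field.isdigit():
            break
        numbers.append(int(field))
    return tuple(numbers)


def running_kernel_is_current(expected: str, running: Optional[str] = None) -> bool:
    """True if the running kernel is at least `expected`.

    Only meaningful within one kernel line (e.g. 3.10.0-* on CentOS 7).
    """
    if running is None:
        running = os.uname().release  # e.g. '3.10.0-1062.12.1.el7.x86_64'
    return release_tuple(running) >= release_tuple(expected)
```

Comparing numeric tuples avoids the string-ordering trap where "1062.9.1" would sort after "1062.12.1".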

Using the scheduled outage window is not mandatory; it is an announced outage that can be used when necessary.

Linux Desktops

Taylored RHEL 5, RHEL 6
Cheffed CentOS 7, Ubuntu
Clarify the definition of a desktop (does not include kiosk machines)
  1. Single-user personal productivity workstation
Follow a schedule similar to that of Windows desktops
Communication via the unixusers-l@slac.stanford.edu mailing list in addition to comp-out
  1. depending on the type of outage and who it affects

Linux Storage Servers

Schedule determined by SCS Storage Team
Recommended: one outage window per quarter for patching and rebooting

Linux Infrastructure Servers (non-storage)

Examples include:
1. samba
2. ftp
3. web
4. chef
5. mail servers - quarterly
  1. Uy Chu said: you can group them as mailgate10/15 and mailgate11/16 and reboot one pair at a time
    1. they should be redundant, so if they are working properly as a pair, the reboots should not affect any mail
Follow a schedule similar to that of Windows servers
Communication via the unixusers-l@slac.stanford.edu mailing list

ERP Linux servers

Schedule determined by ERP computing owners (e.g., Monica, Ram)
Recommended: one outage window per quarter for patching and rebooting

Linux Interactive Servers

Reboot monthly via a root cron job installed by taylor or chef. (This is already done for iris and flora, although those hosts are on a weekly reboot schedule.)

Examples include:
1. rhel6-64 login pool
2. centos7 login pool
3. FastX login pool (this needs to be announced)
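The page does not say which day the monthly root cron job fires. If it is meant to coincide with the first-Wednesday window, note that cron cannot express "first Wednesday" directly: when both the day-of-month and day-of-week fields are restricted (for example `1-7` and `3`), cron runs the job when either field matches. A common workaround is to schedule the job every Wednesday and have it exit unless the date falls on day 1-7. The script below is a minimal sketch of that guard; its path, name, and `--reboot` flag are hypothetical, and it only prints the command unless `--reboot` is passed.

```python
#!/usr/bin/env python3
"""Hypothetical guard for a monthly reboot cron job.

Example root crontab entry (an assumption, not the deployed job):
    30 3 * * 3  /usr/local/sbin/first-wednesday-reboot --reboot
cron starts the script every Wednesday at 03:30; the script only
reboots when that Wednesday is the first one of the month.
"""
import subprocess
import sys
from datetime import date
from typing import Optional


def is_first_wednesday(day: date) -> bool:
    # The first Wednesday of any month always falls on day 1-7.
    return day.weekday() == 2 and day.day <= 7


def main(today: Optional[date] = None, dry_run: bool = True) -> int:
    today = today or date.today()
    if not is_first_wednesday(today):
        return 0
    cmd = ["/sbin/shutdown", "-r", "now", "Monthly maintenance reboot"]
    if dry_run:
        print("would run:", " ".join(cmd))
        return 0
    return subprocess.call(cmd)


if __name__ == "__main__":
    status = main(dry_run="--reboot" not in sys.argv)
    if status:
        sys.exit(status)
```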

Batch Compute Farms

Schedule determined by SCS HPC Team
Recommended: one outage window per quarter for patching and rebooting
Could be managed via the compute scheduler so that the outage is non-disruptive and transparent to users
  1. However, it might be simpler for SCS staff to have a maintenance window to perform mass reboots
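If the farm is patched through the compute scheduler rather than in a mass-reboot window, the usual pattern is a rolling reboot: drain a small batch of nodes, let running jobs finish, reboot, and return the batch to service before starting the next. The sketch below shows only the batching logic; `drain`, `wait_until_idle`, `reboot`, and `resume` are hypothetical hooks that would wrap the site's actual scheduler and host tooling, and the `max_down_fraction` parameter is an illustrative assumption.

```python
from typing import Callable, Iterator, List, Sequence

Hook = Callable[[str], None]


def reboot_batches(nodes: Sequence[str], max_down_fraction: float) -> Iterator[List[str]]:
    """Split nodes into batches so that at most max_down_fraction of the
    farm is out of service at any one time (at least one node per batch)."""
    if not 0 < max_down_fraction <= 1:
        raise ValueError("max_down_fraction must be in (0, 1]")
    size = max(1, int(len(nodes) * max_down_fraction))
    for i in range(0, len(nodes), size):
        yield list(nodes[i:i + size])


def rolling_reboot(nodes: Sequence[str], max_down_fraction: float,
                   drain: Hook, wait_until_idle: Hook,
                   reboot: Hook, resume: Hook) -> None:
    """Drain, reboot and resume one batch at a time.

    reboot() is assumed to block until the node is back up.
    """
    for batch in reboot_batches(nodes, max_down_fraction):
        for node in batch:
            drain(node)            # stop the scheduler placing new jobs here
        for node in batch:
            wait_until_idle(node)  # let already-running jobs finish
            reboot(node)
        for node in batch:
            resume(node)           # return the node to service
```

With 10 nodes and `max_down_fraction=0.25`, the farm is rebooted in five batches of two, so at least 80% of the nodes stay available throughout.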

Non-OCIO Linux Servers (non-storage)

...

Fermi
BaBar
SUNCAT
EED
etc.

...
