There is currently no documented and announced standard maintenance window for patching and rebooting centrally managed Unix hosts.

The classes of machines described below need to be clearly identified within the configuration management tools (taylor, chef).

Trivial patching (e.g., RPM updates of userland software) can be done outside these outage windows.

The outage windows are needed to activate updates to kernels, glibc, X11, and related software that require a reboot to take effect.

A scheduled outage window does not have to be used; it is an announced outage that is available if necessary.
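The split between trivial patching and updates that wait for an outage window could be sketched roughly as follows. This is only an illustration: the function names and the RPM-style package-name prefixes are assumptions, not an agreed list, and Ubuntu hosts would need different names.

```python
# Hypothetical sketch: the prefixes below are illustrative assumptions
# (RPM-style names), not an agreed list of reboot-requiring packages.
REBOOT_REQUIRED_PREFIXES = ("kernel", "glibc", "xorg-x11")


def needs_outage_window(package_name: str) -> bool:
    """Return True if this update should wait for an announced outage window."""
    # str.startswith accepts a tuple and matches any of its prefixes.
    return package_name.startswith(REBOOT_REQUIRED_PREFIXES)


def split_updates(package_names):
    """Split pending updates into (trivial, deferred) lists, preserving order."""
    trivial, deferred = [], []
    for name in package_names:
        if needs_outage_window(name):
            deferred.append(name)
        else:
            trivial.append(name)
    return trivial, deferred
```

For example, `split_updates(["vim-enhanced", "kernel", "emacs"])` would place `kernel` in the deferred list and leave the other two as trivial updates.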

Linux Desktops

  1. Taylored RHEL 5, RHEL 6
  2. Cheffed CentOS 7, Ubuntu
  3. Clarify definition of a desktop (does not include kiosk machines)
    1. Single user personal productivity workstation
  4. Follow similar schedule as Windows desktops
  5. Communication via unixusers-l@slac.stanford.edu mailing list in addition to comp-out
    1. depending on the type of outage and who it affects

Linux Storage Servers

  1. Schedule determined by SCS Storage Team
  2. Recommended: one outage window per quarter for patching and rebooting

Linux Infrastructure Servers (non-storage)

  1. Examples include:
    1. samba
    2. ftp
    3. web
    4. chef
    5. mail servers - quarterly
      1. Uy Chu said: you can group them as mailgate10/15 and mailgate11/16 and reboot those two pairings one at a time
        1. They should be redundant, so a reboot should not affect any mail if they are working properly as a pair
  2. Follow similar schedule as Windows servers
  3. Communication via unixusers-l@slac.stanford.edu mailing list

ERP Linux servers

  1. Schedule determined by ERP computing owners (e.g., Monica, Ram)
  2. Recommended: one outage window per quarter for patching and rebooting

Linux Interactive Servers

  1. Examples include:
    1. rhel6-64 login pool
    2. centos7 login pool
    3. FastX login pool

Batch Compute Farms

  1. Schedule determined by SCS HPC Team
  2. Recommended: one outage window per quarter for patching and rebooting
  3. Could be managed via the compute scheduler so that the outage is non-disruptive and transparent
    1. However, it might be simpler for SCS staff to have a maintenance window in which to perform mass reboots

Non-OCIO Linux Servers (non-storage)

  1. Examples include:
    1. Fermi
    2. BaBar
    3. SUNCAT
    4. EED
    5. etc.
  2. A science computing coordinator/contact person needs to be identified
  3. Schedule needs to be negotiated between OCIO and the computing contact
  4. Recommended: one outage window per quarter for patching and rebooting
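One possible way to record the machine classes above so that tooling can identify them is a small table like the sketch below. The `HostClass` type, its field names, and the encoding of "quarterly" as four windows per year are assumptions made for illustration; `None` marks a cadence this page does not state.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HostClass:
    """Hypothetical record for one class of host described on this page."""
    name: str
    schedule_owner: str
    windows_per_year: Optional[int]  # None = cadence not stated on this page


HOST_CLASSES = [
    HostClass("Linux Desktops", "follows Windows desktop schedule", None),
    HostClass("Linux Storage Servers", "SCS Storage Team", 4),
    HostClass("Linux Infrastructure Servers (non-storage)",
              "follows Windows server schedule", None),
    HostClass("ERP Linux servers", "ERP computing owners", 4),
    HostClass("Linux Interactive Servers", "not stated", None),
    HostClass("Batch Compute Farms", "SCS HPC Team", 4),
    HostClass("Non-OCIO Linux Servers (non-storage)",
              "negotiated between OCIO and the computing contact", 4),
]


def quarterly_classes(classes):
    """Return the names of classes with a recommended quarterly window."""
    return [c.name for c in classes if c.windows_per_year == 4]
```

Calling `quarterly_classes(HOST_CLASSES)` lists the classes for which a quarterly window is recommended, which may be a convenient starting point for announcing outages.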
