Nodes | Services | Start Time | Expected End Time | Actual End Time | Reason | Comments |
---|---|---|---|---|---|---|
psexport01 | | Thursday, May 30th, 2013 16:30hrs | Friday May 31st, 2013 12:00hrs | | Unplanned power outage at SLAC | |
ana01, ana02 | /reg/d/ana01, /reg/d/ana02 filesystems | Thursday, May 30th, 2013 16:30hrs | Friday May 31st, 2013 14:00hrs | | Unplanned power outage at SLAC | |
pssrv100 (psnfs) | NFS mountpoint for PCDS diskless nodes | Tuesday, Mar 26th, 2013 | Tuesday, Mar 26th, 2013 | Tuesday, Mar 26th, 2013 | | |
pssrv100 (psnfs) | NFS mountpoint for PCDS diskless nodes | Monday, Jan 7th, 2013 (1030 hrs) | Monday, Jan 7th, 2013 | Wednesday, Jan 9th, 2013 | RAID controller malfunctioned upon power restoral after planned power outage in B950 203A | pssrv101 (old data) was used to bring up the FEE nodes for part of the outage. pssrv100 was restored to operation after a new RAID controller was delivered and installed. |
ana01 | /reg/d/ana01 filesystem | Tuesday, Dec 18th 2012 | unknown | Partial (98%) restoral Monday Dec 24th (0800 hrs) | Controller failed causing corrupted parity data | Parity errors fixed and new controller installed. 2 OSTs (LUNs) needed fsck'ing. One took a few hours, the other took 10 days. |
psanaoss21* | /reg/d/ana12 filesystem | Monday, Oct 8th, 2012 (1700 hrs) | Monday, Oct 8th, 2012 (1900 hrs) | Monday, Oct 8th, 2012 (1900 hrs) | Hardware upgrades | |
psanaoss2** | /reg/d/ana11 and /reg/d/ana12 filesystems | Thursday, Sep 27, 2012 (1700 hrs) | Friday, Sep 28, 2012 (0100 hrs) | Friday, Sep 28, 2012 (0400 hrs) | Hardware upgrades | |
Sitewide outage. All Linux servers at NEH, FEH, XRT, FEE. | All computing services at LCLS. | Wednesday August 15, 2012 | August 17, 2012 1:00 PM | | SLAC sitewide power outage on August 16. | Expect logins to all machines to be unavailable between 8/15 and 8/17, even if some of the servers are powered up before the expected end time. Maintenance will be performed on various servers during these 2 days. |
All machines in XPP hutch and control room will be inaccessible. | XPP | Monday April 9, 2012 11:15AM | Monday April 9, 2012 11:45AM | Monday April 9, 2012 11:30AM | Electrical Work at XPP Hutch | Completed |
| ana01/ana02 file systems | Wed Mar 28th, 2012 9am | Wed Mar 28th, 2012 1pm | Wed Mar 28th, 2012 4pm | Upgrade to IB | Completed |
psananeh | NEH storage and processing | | Tue Dec 27, 2011 4pm | Mon Dec 26, 2011 1pm | | Completed. Chilled water was restored on Friday. |
psana batch nodes | All science data is currently unavailable. psananeh and psanafeh are up for Matlab use, but there is no access to data on the Lustre file system. | Saturday Oct 1, 2011 6am | | | Lustre file system remains down after the unplanned power outage on Saturday. | The system administrators are working to bring it back. |
| All LCLS computing services | Monday | | | Electrical work at NEH server room and FEH. | pslogin is up. NFS server, LDAP, DNS, pswww are up. |
psana, | All Science data, All user home directories, all DAQ cache nodes. All online services. | Wed | Wed | Wed | Upgrade of Lustre hardware. | |
psana | Science data access | Tue | | Tue | NEH power outage | B950 and several other buildings experienced a short power glitch, but the Lustre file servers did not survive the interruption and are still being brought up. |
psana | Science data access | Thu | Thu | | Lustre failover testing. | |
NEH online nodes | NEH DAQ, outside ssh access | Thu | Thu | | Server room upgrade, ana02 memory upgrade | Completed |
psana | Science data access | Thu | Thu | Thu | Lustre maintenance | Completed. Upgraded memory on psanaoss101-104, and replaced 10Gbit cards with 1 port SMCs. 717W power supplies are in place on psanaoss103-104 now. |
psana | Science data access | Thu | Thu | Thu | Lustre maintenance | Completed |
All | All | Fri | Sun | Sun | NEH power outage | Completed |
psana | Science data access | Thu | Thu | Thu | Lustre maintenance | Completed |
psana | Science data access | Fri | Mon | | NEH cooling outage | Completed |
psana | Science data access | Thu | Thu | | Enabling HA for Lustre system | Completed |
All | All | Sat | Sat | Mon | NEH power cut | Completed |
psana | Science data access | Thu | Thu | | Lustre testing | Completed |
All | All | Wed | Wed | | NEH power cut | This power cut was NOT planned |
All | All | Sat | Sat | Mon | NEH power cut | Completed |