All times are Pacific time. Red entries are active. Most recent entry first.

...

Nodes

...

Services

...

Start Time

...

Expected End Time

...

Actual End Time

...

Reason

...

Comments

...

All machines in XPP hutch and control room will be inaccessible.

...

XPP

...

Monday April 9, 2012 11:15AM

...

Monday April 9, 2012 11:45AM

...

 

...

Electrical Work at XPP Hutch

Add planned outages or maintenance activities using this project: https://jira.slac.stanford.edu/projects/CDSO/

Planned or ongoing

Jira
server: SLAC National Accelerator Laboratory
columnIds: priority,summary,customfield_11121,customfield_11122,description,assignee
columns: priority,summary,Target start,Target end,description,assignee
maximumIssues: 20
jqlQuery: resolution = Unresolved and project = "LCLS CDS Outages"
serverId: 1b8dc293-975d-3f2d-b988-18fd9aec1546

Completed

Jira
server: SLAC National Accelerator Laboratory
columnIds: summary,description,resolutiondate
columns: summary,description,resolutiondate
maximumIssues: 20
jqlQuery: status=done and project = "LCLS CDS Outages"
serverId: 1b8dc293-975d-3f2d-b988-18fd9aec1546
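
The two Jira listings above are driven by JQL queries, a column list, and a 20-issue cap. As a rough sketch only, the same queries could be expressed as search URLs against the standard Jira REST search endpoint. The endpoint path on this host, the `build_search_url` helper, and the constant names are assumptions made for illustration; they are not taken from this page.

```python
from urllib.parse import urlencode

# Assumption: the Jira host from the project link exposes the standard
# REST search endpoint at this path. Not confirmed by this page.
BASE_URL = "https://jira.slac.stanford.edu/rest/api/2/search"

# JQL strings copied from the two Jira listings.
PLANNED_JQL = 'resolution = Unresolved and project = "LCLS CDS Outages"'
COMPLETED_JQL = 'status=done and project = "LCLS CDS Outages"'


def build_search_url(jql, fields, max_results=20):
    """Hypothetical helper: build a search URL for a JQL query.

    `fields` mirrors the columnIds of a listing and `max_results`
    mirrors its maximumIssues setting.
    """
    params = {
        "jql": jql,
        "fields": ",".join(fields),
        "maxResults": max_results,
    }
    return f"{BASE_URL}?{urlencode(params)}"


planned_url = build_search_url(
    PLANNED_JQL,
    ["priority", "summary", "customfield_11121", "customfield_11122",
     "description", "assignee"],
)
completed_url = build_search_url(
    COMPLETED_JQL,
    ["summary", "description", "resolutiondate"],
)

print(planned_url)
print(completed_url)
```

The sketch only builds the URLs and sends no request. Fetching results would additionally need whatever authentication the Jira server requires.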

...

 

...

ana01/ana02 file systems

...

Wed Mar 28th, 2012 9am

...

Wed Mar 28th, 2012 1pm

...

Wed Mar 28th, 2012 4pm

...

Upgrade to IB

...

Completed

...

psananeh
lclsq
ana01
ana02

...

NEH storage and processing

...

Tue Dec 27, 2011 4pm

...

Mon Dec 26, 2011 1pm

...

Completed. Chilled water was restored on Friday.

...

psana batch nodes

...

All science data is currently unavailable. psananeh and psanafeh are up for Matlab use, but there is no access to data on the Lustre file system.

...

Saturday Oct 1, 2011 6am

...

 

...

 

...

Lustre file system remains down after the unplanned power outage on Saturday.

...

The system administrators are working to bring it back up.

...

 

...

All LCLS computing services

...

Monday Nov 14, 2011 7am

...

 

...

 

...

Electrical work at NEH server room and FEH.

...

pslogin is up. NFS server, LDAP, DNS, pswww are up.
The daq nodes will not come up until after 4PM.
Lustre will not come up until after about 4:30PM.
Batch nodes (psana11*, psana12*) and psana01* will not be up until Lustre is up.

...

psana,
NEH Online Nodes,
psimport,
psexport,
pslogin,
psdev,
psanasrv100,
psanasrv101,
psanasrv102

...

All science data, all user home directories, all DAQ cache nodes, and all online services.

...

Wed Sep 28, 2011 10am

...

Wed Sep 28, 2011 6pm

...

Wed Sep 28, 2011 6pm

...

Upgrade of Lustre hardware.
Installation of taylor on several offline systems. Update of kernel on Online nodes.

...

 

...

psana

...

Science data access

...

Tue Sep 20, 2011 11:15am

...

 

...

Tue Sep 20, 2011 6:15pm

...

NEH power outage

...

B950 and several other buildings experienced a short power glitch. The Lustre file servers did not survive the interruption and are still being brought up.

...

psana

...

Science data access

...

Thu Jun 2, 2011 1pm

...

Thu Jun 2, 2011 5pm

...

 

...

Lustre failover testing.

...

 

...

NEH online nodes
ana02
psexport, psimport

...

NEH DAQ, outside ssh access

...

Thu May 25, 2011 noon

...

Thu May 25, 2011 7pm

...

 

...

Server room upgrade, ana02 memory upgrade

...

Completed

...

psana

...

Science data access

...

Thu May 12, 2011 1pm

...

Thu May 12, 2011 6pm

...

Thu May 12, 2011 6:30pm

...

Lustre maintenance

...

Completed. Upgraded memory on psanaoss101-104 and replaced the 10Gbit cards with 1-port SMCs. 717W power supplies are now in place on psanaoss103-104.

...

psana

...

Science data access

...

Thu May 5, 2011 1pm

...

Thu May 5, 2011 5pm

...

Thu May 5, 2011 5pm

...

Lustre maintenance

...

Completed

...

All

...

All

...

Fri Apr 29, 2011 6:30pm

...

Sun May 1, 2011 11pm

...

Sun May 1, 2011 9pm

...

NEH power outage

...

Completed

...

psana

...

Science data access

...

Thu Apr 28, 2011 2pm

...

Thu Apr 28, 2011 6pm

...

Thu Apr 28, 2011 3pm

...

Lustre maintenance
pssrv100 NFS volume reconstruction.

...

Completed
Lustre maintenance postponed.
RAID reconstruction on pssrv100 will take 2-3 days. The controller has not released the new volume size, so we will have to perform the file system resize on another day.

...

psana

...

Science data access

...

Fri Apr 1, 2011 6pm

...

Mon Apr 4, 2011 10am

...

 

...

NEH cooling outage

...

Completed

...

psana

...

Science data access

...

Thu Mar 31, 11am

...

Thu Mar 31, 5pm

...

 

...

Enabling HA for Lustre system

...

Completed

...

All

...

All

...

Sat Mar 26, 2011 7am

...

Sat Mar 26, 2011 7pm

...

Mon Mar 28, 2011 1pm

...

NEH power cut

...

Completed

...

psana

...

Science data access

...

Thu Mar 24, 2011 11am

...

Thu Mar 24, 2011 5pm

...

 

...

Lustre testing

...

Completed

...

All

...

All

...

Wed Mar 23, 2011 10am

...

Wed Mar 23, 2011 3pm

...

 

...

NEH power cut

...

This power cut was NOT planned

...

All

...

All

...

Sat Mar 19, 2011 7am

...

Sat Mar 19, 2011 7pm

...

Mon Mar 21, 2011 10am

...

NEH power cut

...