All times are Pacific Time (PST/PDT). Red entries are active. Most recent entry first.

| Nodes | Services | Start_Time | Expected_End_Time | Actual_End_Time | Reason | Comments |
| psananeh, lclsq, ana01, ana02 | NEH storage and processing | Wed, Dec 21, 2011, 6am | Tue, Dec 27, 2011, 4pm | Mon, Dec 26, 2011, 1pm | Chilled water outage | Completed. Chilled water was restored on Friday. |
| psana batch nodes | All Science data is currently unavailable. Psananeh psanafeh is up for Matlab use, but there is no access to data on the Lustre file system. | Saturday, Oct 1, 6am |  |  | Lustre file system remains down after the unplanned power outage on Saturday. | The system administrators are working to bring it back. |
|  | All LCLS computing services | Monday, Nov 14, 7am |  |  | Electrical work at NEH server room and FEH. | pslogin is up. NFS server, LDAP, DNS, pswww are up. The DAQ nodes will not come up until after 4PM. Lustre will not come up until after about 4:30PM. Batch nodes (psana11*, psana12*) and psana01* will not be up until Lustre is up. |
| psana, NEH Online Nodes, psimport, psexport, pslogin, psdev, psanasrv100, psanasrv101, psanasrv102 | All Science data, all user home directories, all DAQ cache nodes, all online services. | Wed, Sep 28, 10am | Wed, Sep 28, 6pm | Wed, Sep 28, 6pm | Upgrade of Lustre hardware. Installation of taylor on several offline systems. Update of kernel on Online nodes. |  |
| psana | Science data access | Tue, Sep 20, 11:15am |  | Tue, Sep 20, 6:15pm | NEH power outage | B950 and several other buildings experienced a short power glitch, but the Lustre file servers did not survive the interruption and are still being brought up. |
| psana | Science data access | Thu, Jun 2, 1pm | Thu, Jun 2, 5pm |  | Lustre failover testing. |  |
| NEH online nodes, ana02, psexport, psimport | NEH DAQ, outside ssh access | Thu, May 25th, noon | Thu, May 25th, 7pm |  | Server room upgrade, ana02 memory upgrade | Completed |
| psana | Science data access | Thu, May 12, 1pm | Thu, May 12th, 6pm | Thu, May 12th, 6:30pm | Lustre maintenance | Completed. Upgraded memory on psanaoss101-104, and replaced 10Gbit cards with 1 port SMCs. 717W power supplies are in place on psanaoss103-104 now. |
| psana | Science data access | Thu, May 5th, 1pm | Thu, May 5th, 5pm | Thu, May 5th, 5pm | Lustre maintenance | Completed |
| All | All | Fri, Apr 29, 6:30pm | Sun, May 1st, 11pm | Sun, May 1st, 9pm | NEH power outage | Completed |
| psana | Science data access | Thu, Apr 28, 2pm | Thu, Apr 28, 6pm | Thu, Apr 28, 3pm | Lustre maintenance. pssrv100 NFS volume reconstruction. | Completed. Lustre maintenance postponed. RAID reconstruction on pssrv100 will take 2-3 days. The new volume size is not released by the controller, so we will have to perform the file system resize on another day. |
| psana | Science data access | Fri, Apr 1st, 6pm | Mon, Apr 4th, 10am |  | NEH cooling outage | Completed |
| psana | Science data access | Thu, Mar 31st, 11am | Thu, Mar 31st, 5pm |  | Enabling HA for Lustre system | Completed |
| All | All | Sat, Mar 26th, 7am | Sat, Mar 26th, 7pm | Mon, Mar 28th, 1pm | NEH power cut | Completed |
| psana | Science data access | Thu, Mar 24, 11am | Thu, Mar 24, 5pm |  | Lustre testing | Completed |
| All | All | Wed, Mar 23rd, 10am | Wed, Mar 23rd, 3pm |  | NEH power cut | This power cut was NOT planned. |

Info

Add planned outages or maintenance activities using this project: https://jira.slac.stanford.edu/projects/CDSO/

Excerpt

Planned or ongoing

Jira
server: SLAC National Accelerator Laboratory
columnIds: priority,summary,customfield_11121,customfield_11122,description,assignee
columns: priority,summary,Target start,Target end,description,assignee
maximumIssues: 20
jqlQuery: resolution = Unresolved and project = "LCLS CDS Outages"
serverId: 1b8dc293-975d-3f2d-b988-18fd9aec1546

Completed

Jira
server: SLAC National Accelerator Laboratory
columnIds: summary,description,resolutiondate
columns: summary,description,resolutiondate
maximumIssues: 20
jqlQuery: status=done and project = "LCLS CDS Outages"
serverId: 1b8dc293-975d-3f2d-b988-18fd9aec1546

| All | All | Sat, Mar 19th, 7am | Sat, Mar 19th, 7pm | Mon, Mar 21st, 10am | NEH power cut |

...