...

Activity: IPv6
  Start date: 2013-03-06
  End date: 2013-09-30
  Status: in process
  Further information: https://portal.slac.stanford.edu/info/ITPO/IPv6_phase1/SitePages/Home.aspx

Activity: LSF upgrade
  Start date: 2013-03-01
  End date: (not set)
  Status: testing in progress
  Further information: http://www.slac.stanford.edu/comp/unix/news/2013-04-09-LSF9.1.html
  LSF9 migration for Fermi: https://www-rt.slac.stanford.edu/rt3//Ticket/Display.html?id=455633

Activity: Cyber Safety planning and reviews
  Start date: 2013-04-22
  End date: 2013-08-31
  Status: in process
  Further information: https://slacspace.slac.stanford.edu/Operations/SCCS/Ops/Shared%20Documents/Forms/AllItems.aspx

Activity: Cyber Safety sudo_all
  Start date: 2013-03-15
  End date: 2013-06-07
  Status: sudo all completed; workgroup and user sudo 75% complete
  Further information: Proposal for Tracking sudo all privs; Status of conversion to new Sudo Process

Activity: PCDS - task list
  Start date: 2013-05-16
  End date: (not set)
  Status: in process
  Further information: PCDS Task List (original request); https://www-rt.slac.stanford.edu/rt3/Ticket/Display.html?id=455960 (tracking ticket)

Activity: LCLS Unix account password process
  Start date: 2013-07 (proposed start date)
  End date: (not set)
  Status: waiting on resources
  Further information: LCLS Unix account password process; https://www-rt.slac.stanford.edu/rt3/Ticket/Display.html?id=447547

Activity: New hardware planning and acquisitions
  Start date: (not set)
  End date: (not set)
  Status: ongoing
  Further information: New Hardware

Accomplishments

...

Clusters and High Performance Computing

...

2013/06/14: The Scientific Computing Services storage team contacted NERSC and Vanderbilt University to gather information about their General Parallel File System (GPFS) deployments.  This allows us to learn from their experiences as we look at beginning our own deployment for SLAC scientific customers.

2013/06/07: Following an unexpected power outage on Thursday, May 30th, Scientific Computing Services restored services within 4 hours of the return of power and chilled water to Building 50.  SCS also responded to the failure of a controller in the PCDS/LCLS Lustre storage system, returning it to service by Friday evening.  The restoration of services enabled the Scientific Computing community to continue with their experiments and programs.

2013/06/07: Scientific Computing Services worked with Datacenter Technical Coordinators to modernize the server management infrastructure in Building 50. New server installations no longer require obsolete serial communications hardware. This will reduce cost overheads and shorten the amount of time required for initial system setup and deployment.

2013/05/24: LCLS users reported that they were unable to access various files stored on a 1PB Lustre filesystem. Scientific Computing Services diagnosed the problem and ran utilities to repair file system inconsistencies, restoring access to users' files.

2013/05/10: The new PPA bullet cluster (~2900 cores) is now available to all SLAC Unix users via the production batch system.  This introduced the capability of selecting a newer release of the RedHat operating system.  Scientific Computing Services worked with key customer groups including Fermi, KIPAC and EXO in order to minimize disruption to their production environments and ensure the cluster will support parallel and single-core jobs.

...

2013/05/03: Scientific Computing Services is working with IBM to give a presentation on the new features of LSF 9.1 to scientific computing customers.   Along with the presentation, SCS staff will provide an overview of our use of MPI applications in our cluster environment.   This interaction will improve understanding between IBM and SLAC regarding our use of LSF and clarify which features would be most valuable in this software product.

2013/04/19: Scientific Computing Services responded quickly to the April 9 power fluctuation and temporary chilled water loss that impacted services for research computing.   In addition, SCS revised the documentation and processes surrounding emergency response to such an event.   This enhances our ability to provide continuity of services for the lab.

2013/04/05: Scientific Computing Services has completed the initial tuning of the PPA cluster hardware for parallel computation.  Test runs included 256-core and 1024-core jobs using OpenMPI on the 40Gb/sec Infiniband network.   All 2900 compute cores will be made available to the general queues in addition to a high priority MPI queue.   This tuning has improved the overall performance of the cluster for scientific computing and research.
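To make the batch setup above concrete, here is a minimal sketch of an LSF job script for a 256-core OpenMPI run of the kind described. The queue name, core layout, and application path are illustrative assumptions, not SLAC's actual configuration.

```shell
#!/bin/bash
# Hypothetical LSF job script for a 256-core OpenMPI job.
# Queue name ("mpiq"), slots-per-host, and binary name are assumptions.
#BSUB -q mpiq              # high-priority MPI queue (name assumed)
#BSUB -n 256               # request 256 cores
#BSUB -R "span[ptile=16]"  # 16 slots per host (assumed node size)
#BSUB -o mpi_%J.out        # stdout per job ID
#BSUB -e mpi_%J.err        # stderr per job ID

# Launch under OpenMPI; LSF supplies the allocated host list to mpirun.
mpirun -np 256 ./my_mpi_app
```

In practice the script would be submitted with `bsub < jobscript.lsf`, letting LSF place the ranks across the Infiniband-connected nodes.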

...

Storage

...

2013/06/14: The Scientific Computing Services storage team contacted NERSC and Vanderbilt University to gather information about their General Parallel File System (GPFS) deployments.  This allows us to learn from their experiences as we look at beginning our own deployment for SLAC scientific customers.

2013/06/07: Following an unexpected power outage on Thursday, May 30th, Scientific Computing Services restored services within 4 hours of the return of power and chilled water to Building 50.  SCS also responded to the failure of a controller in the PCDS/LCLS Lustre storage system, returning it to service by Friday evening.  The restoration of services enabled the Scientific Computing community to continue with their experiments and programs.

2013/05/24: LCLS users reported that they were unable to access various files stored on a 1PB Lustre filesystem. Scientific Computing Services diagnosed the problem and ran utilities to repair file system inconsistencies, restoring access to users' files.

2013/03/29: Scientific Computing Services has 3560 machines under configuration management, an increase of 3.2% over the previous month.   This increase is primarily in batch systems which provide additional support for scientific computing at the lab.

2013/03/29: Scientific Computing Services has added 200 tapes to our tape libraries, providing more than a petabyte of tape storage for our LCLS customers.
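As a back-of-the-envelope check on the figure above: 200 cartridges reaching roughly a petabyte implies about 5 TB per cartridge. The per-cartridge capacity below is an assumption for illustration; the report does not state the media type.

```python
# Rough capacity check for the 200 added tapes.
# TB_PER_TAPE is an assumed cartridge capacity, not from the report.
TAPES_ADDED = 200
TB_PER_TAPE = 5.0              # assumed terabytes per cartridge

total_tb = TAPES_ADDED * TB_PER_TAPE
total_pb = total_tb / 1000.0   # decimal petabytes

print(f"{total_tb:.0f} TB ~= {total_pb:.2f} PB")
```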

...

2013/04/12: Scientific Computing Services developed the automated tools for reviewing accounts with elevated privileges.   A process was established for handling this review at regular intervals.   In response to a DOE finding, 170 tickets were created to review and approve privileged accounts.   This supports the Cyber Safety program at SLAC and meets the DOE deadline for this security requirement.
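The review process described above can be sketched in a few lines: flag any privileged account whose last review is older than the review interval, and open a ticket for each. The record format, interval, and sample data are illustrative assumptions, not the actual SCS tooling.

```python
from datetime import date, timedelta

# Hypothetical sketch of a periodic privileged-account review.
# Interval and account records are assumptions for illustration.
REVIEW_INTERVAL = timedelta(days=90)   # assumed quarterly cycle

accounts = [
    {"user": "alice", "privilege": "sudo_all",       "last_review": date(2013, 1, 5)},
    {"user": "bob",   "privilege": "workgroup_sudo", "last_review": date(2013, 3, 20)},
]

def accounts_due_for_review(accounts, today):
    """Return accounts whose last review is older than REVIEW_INTERVAL."""
    return [a for a in accounts if today - a["last_review"] > REVIEW_INTERVAL]

due = accounts_due_for_review(accounts, today=date(2013, 4, 12))
for a in due:
    # In the real process, each flagged account would become a review ticket.
    print(f"review needed: {a['user']} ({a['privilege']})")
```

Run at regular intervals, a tool like this yields exactly the kind of per-account ticket queue the finding called for.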

...

Infrastructure Services

...

2013/06/07: Scientific Computing Services worked with Datacenter Technical Coordinators to modernize the server management infrastructure in Building 50. New server installations no longer require obsolete serial communications hardware. This will reduce cost overheads and shorten the amount of time required for initial system setup and deployment.

2013/04/19: Scientific Computing Services responded quickly to the April 9 power fluctuation and temporary chilled water loss that impacted services for research computing.   In addition, SCS revised the documentation and processes surrounding emergency response to such an event.   This enhances our ability to provide continuity of services for the lab.

2013/03/29: Scientific Computing Services has 3560 machines under configuration management, an increase of 3.2% over the previous month.   This increase is primarily in batch systems which provide additional support for scientific computing at the lab.

The SCSPub space will be used as a repository for information from Scientific Computing Services that can be shared with others at SLAC. This includes notes, agendas, working papers, proposals, and similar material the group wants to share.  In addition, for documents that need version control or check-out/check-in control, there is a document library on the SCS SharePoint site.