Blog

A member of Scientific Computing Services has completed work to integrate WebAuth with Windows Desktop Single Sign-On for the Drupal project. This feature will be deployed on September 9th, enabling a properly configured browser to use the desktop's Kerberos credentials to access WebAuth-protected pages without requiring the user to retype a username and password at the WebAuth login screen.
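
For anyone who wants to verify the new single sign-on path from a script rather than a browser, a minimal sketch along the following lines may help. It assumes the Python requests and requests-kerberos packages and an existing Kerberos ticket, and the URL shown is a placeholder rather than a real SLAC page.

```python
# Minimal SPNEGO/Negotiate check against a WebAuth-protected page.
# Assumes a valid Kerberos ticket (from kinit or a Windows desktop login)
# and the requests + requests-kerberos packages; the URL is a placeholder.
import requests
from requests_kerberos import HTTPKerberosAuth, OPTIONAL

URL = "https://www.example.org/protected/"  # hypothetical WebAuth-protected page

# OPTIONAL mutual authentication lets the request proceed even when the
# server does not complete mutual authentication on every response.
response = requests.get(URL, auth=HTTPKerberosAuth(mutual_authentication=OPTIONAL))

print(response.status_code)
# If the final page is the protected content rather than the WebAuth login
# form, the browser-equivalent SSO negotiation succeeded without a password.
```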

Scientific Computing Services recently modified the LSF batch configuration to improve the scheduling of parallel MPI jobs that may request all of the CPU cores on one or more hosts. This had been an issue on the PPA-funded "bullet" cluster, which provides compute cycles for both single-slot and parallel MPI jobs. SCS is also working with the batch system vendor (IBM) to leverage features that may improve the batch MPI service for scientific computing customers.
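
For illustration, a parallel MPI job of the kind affected might be submitted as in the sketch below; the queue name, slot counts, and application are hypothetical rather than the actual bullet-cluster settings.

```python
# Illustrative submission of a parallel MPI job that fills whole hosts.
# The queue name, slot counts, script path, and binary are hypothetical,
# not the actual bullet-cluster configuration.
import subprocess

cmd = [
    "bsub",
    "-q", "bulletmpi",           # hypothetical queue name
    "-n", "32",                  # total MPI slots requested
    "-R", "span[ptile=16]",      # pack 16 slots per host, i.e. two full hosts
    "-x",                        # exclusive use of each allocated host
    "-o", "mpi_job.%J.out",      # %J expands to the LSF job ID
    "mpirun", "./my_mpi_app",    # hypothetical MPI launcher and binary
]

subprocess.run(cmd, check=True)
```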

Several system administrators from Scientific Computing Services attended an Intel presentation on the Lustre parallel file system. The discussion of the types of applications suited to Lustre and of best practices for storage hardware configuration was valuable for our support of the more than 3 petabytes of SLAC scientific data stored on Lustre servers.

Hardware for the SCS GPFS development file system has arrived in Building 50. Installation is complete, and the system is ready for GPFS development and test work.

The 330 TB Fermi xrootd server (FERMI-XRD012) has been installed and is in production.

New KIPAC servers were ordered to host interactive login VMs. Installation is complete, and the VMs are in production.

KI-NFS05 is now in production.

Installation is complete. (Jul 17)

The new database server (EXODB01) has been installed and is in production. The compute server is online and hosting EXO VMs, and the storage server (EXOSERV05) is in production.

Scientific Computing Services has upgraded two clusters in the batch general fairshare queues to RHEL6-64. Outbound TCP connections from these two clusters (dole and kiso) have also been enabled. This allows ATLAS and other experiments to run computational jobs on their required operating system and permits those jobs to access large volumes of data outside of SLAC.

Scientific Computing Services successfully migrated its High Performance Storage System (HPSS) databases from raw disk partitions to file system partitions. Future HPSS software upgrades will make file system partitions mandatory for database storage, so this migration keeps SLAC ahead of that planned change.

Scientific Computing Services completed the migration to LSF (Load Sharing Facility) 9.1, the latest version of the batch job management system. The upgrade was done with assistance from our science customers and from Neal Adams in the Platform group within the Computing Division. This release of LSF provides many new features of interest to the scientific computing community.

Scientific Computing Services recently installed a new GPU compute server for SSRL. The system includes an NVIDIA 'Kepler' GPU with 2496 cores and the CUDA programming environment. This hardware configuration could become the standard for a larger GPU cluster, which would address needs expressed by other customers. The batch compute system's migration to LSF 9.1 will also provide better integration and support for GPUs for the scientific community.
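
For users getting oriented in the CUDA environment, a short sketch such as the one below can report the device's properties; it assumes the numba Python package is available, which is not necessarily part of the server's configured software stack. A Kepler-class GPU provides 192 CUDA cores per streaming multiprocessor, so 13 multiprocessors correspond to the 2496 cores mentioned above.

```python
# Query the GPU from Python and estimate the CUDA core count.
# Assumes the numba package is installed; this is an illustration,
# not a description of the SSRL server's actual software environment.
from numba import cuda

dev = cuda.get_current_device()
sms = dev.MULTIPROCESSOR_COUNT
cc = dev.compute_capability          # e.g. (3, 5) for Kepler

# Kepler (compute capability 3.x) packs 192 CUDA cores into each
# streaming multiprocessor, so 13 SMs -> 13 * 192 = 2496 cores.
cores_per_sm = 192 if cc[0] == 3 else None

print("Device:", dev.name)
print("Streaming multiprocessors:", sms)
if cores_per_sm:
    print("Estimated CUDA cores:", sms * cores_per_sm)
```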

On July 19, 2013, at about 8:26 AM, Computing Division staff detected a loss of chilled water to Building 50. Scientific Computing Services staff responded quickly, powering down about 700 batch servers at around 8:45 AM as temperatures rose in the machine room. Services were restored by about 11:15 AM. Shutting down the servers prevented damage that might otherwise have occurred to components inside the systems.

Scientific Computing Services upgraded the batch RTM (Real Time Monitoring) utility to the latest version. The upgraded RTM works with the current production version of LSF (Load Sharing Facility) and will continue to work when LSF is upgraded to version 9.1. RTM provides scientific computing customers with a visual representation of the state of the batch queues.

Scientific Computing Services staff completed the relocation of more than 25 machines to create more contiguous open rack space. This involved coordination with Networking, Data Center Operations, and our scientific customers to move file, database, and infrastructure servers. The result is more capacity in Building 50 for new systems arriving before the new Stanford Research Computing Facility (SRCF) opens in January 2014.

Scientific Computing Services has implemented a deployment pipeline for efficient hardware installations. This includes a standard for hardware acquisitions, IPMI for remote server management, and serial-over-LAN for console access and logging. SCS trained Computing Division technical coordinator staff to handle the initial server BIOS setup and console configuration, streamlining the installation process and speeding up the deployment of new systems for Scientific Computing.
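
As a rough illustration of the remote-management step, the sketch below polls power status over IPMI for a batch of newly racked servers; the hostnames and credentials are placeholders, and it is not a description of the actual SCS tooling.

```python
# Illustrative IPMI power-status check for newly racked servers.
# BMC hostnames and credentials are placeholders; the actual SCS tooling
# and naming scheme are not described in this post.
import subprocess

BMC_HOSTS = ["node01-ipmi.example.org", "node02-ipmi.example.org"]
IPMI_USER = "admin"       # placeholder credential
IPMI_PASS = "changeme"    # placeholder credential

for host in BMC_HOSTS:
    result = subprocess.run(
        ["ipmitool", "-I", "lanplus", "-H", host,
         "-U", IPMI_USER, "-P", IPMI_PASS,
         "chassis", "power", "status"],
        capture_output=True, text=True,
    )
    print(host, result.stdout.strip() or result.stderr.strip())

# Serial-over-LAN console access uses the same interface, for example:
#   ipmitool -I lanplus -H <bmc-host> -U <user> -P <pass> sol activate
```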

Scientific Computing Services worked with the PPA directorate and with Facilities to negotiate a date for decommissioning the "black box" batch systems. These machines were purchased in 2007 and are housed in specially equipped shipping containers outside of Building 50. The chiller that cools these systems failed after the May 30 power outage. Shutting the systems down on July 15 will lower our overall power consumption and save the lab $13,600 in repair costs that would otherwise have been spent on an aging outdoor computing facility.