Blog from November, 2013

SCS bullet points for week ending 2013/11/22

In response to MPI problems on the bullet cluster, Scientific Computing Services staff completed work with customer groups and IBM/Platform to address priority and resource allocation issues. Since the configuration was modified, 'bulletmpi' jobs typically account for more than 50% of the bullet cluster load, and users have confirmed that their jobs now obtain the resources they request. Long-term goals for the batch service may include live job migration using VMs and Linux cgroups for restricting CPU and memory usage. These changes are designed to provide optimal use of the bullet cluster for scientific computing customers at the Lab.
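As a rough illustration of the cgroups idea (cgroup v1, which is what Linux distributions shipped in 2013), per-job limits are applied by writing to the cgroup filesystem. The sketch below is hypothetical: the group name, limit values, PID, and mount point (/sys/fs/cgroup here; /cgroup on some systems) are examples only, and in practice the batch system would manage these on the job's behalf.

    # Create a group for a batch job and cap it at 4 GB of memory
    mkdir /sys/fs/cgroup/memory/bulletjob
    echo 4G > /sys/fs/cgroup/memory/bulletjob/memory.limit_in_bytes
    # Give the same job a reduced share of CPU time
    mkdir /sys/fs/cgroup/cpu/bulletjob
    echo 512 > /sys/fs/cgroup/cpu/bulletjob/cpu.shares
    # Move an already-running process (PID 12345) into the group
    echo 12345 > /sys/fs/cgroup/memory/bulletjob/tasks
    echo 12345 > /sys/fs/cgroup/cpu/bulletjob/tasks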

Scientific Computing Services worked with the Networking team to deploy link aggregation for the Unix infrastructure servers housed in the Building 50 High Availability (HA) rack. The HA rack was installed in late 2012 to provide generator-backed power for critical infrastructure services. To provide network redundancy, the servers had to be reconfigured so that they can use an alternate route in the event of a switch failure. Over the course of many weeks, more than 70 Unix servers were reconfigured with minimal impact to the SLAC computing community. Completing this task further improves the reliability of computing services for the Lab.
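On Linux servers this kind of redundancy is typically provided by the kernel bonding driver, which presents two physical interfaces (cabled to different switches) as one logical interface. The fragment below is only an illustration for a Red Hat-style system; the device names, address, and bonding mode are hypothetical and would differ in the actual deployment.

    # /etc/sysconfig/network-scripts/ifcfg-bond0 (illustrative)
    DEVICE=bond0
    IPADDR=192.0.2.10
    NETMASK=255.255.255.0
    ONBOOT=yes
    BOOTPROTO=none
    BONDING_OPTS="mode=active-backup miimon=100"

    # /etc/sysconfig/network-scripts/ifcfg-eth0 (one of the member interfaces)
    DEVICE=eth0
    MASTER=bond0
    SLAVE=yes
    ONBOOT=yes
    BOOTPROTO=none

With mode=active-backup, traffic fails over to the second interface if the link to the primary switch goes down; aggregating bandwidth across both links would instead use an LACP (802.3ad) mode configured on both the servers and the switches.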

SCS bullet points for week ending 2013/11/15

Scientific Computing Services has implemented DNS split zone views to improve SLAC's security by not exposing DNS data for machines on non-Internet-routable addresses. This also conforms to best-practice recommendations for DNS configuration.
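For readers unfamiliar with split zones, name servers such as BIND implement them with view clauses that serve different zone contents depending on who is asking. The snippet below is purely illustrative: the client ranges and zone file names are hypothetical, not SLAC's actual configuration.

    // named.conf sketch (hypothetical)
    view "internal" {
        match-clients { 134.79.0.0/16; localhost; };  // on-site clients see all records
        zone "slac.stanford.edu" {
            type master;
            file "zones/slac.internal.db";
        };
    };
    view "external" {
        match-clients { any; };                       // everyone else sees only routable hosts
        zone "slac.stanford.edu" {
            type master;
            file "zones/slac.external.db";
        };
    };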

James Williams and several members of Scientific Computing Services attended a meeting with Rosio Alvarez, Adam Stone, and Gary Jung from LBNL to compare scientific computing at the two labs. The discussion, which covered computing technologies, a comparison of services, and cost-recovery models, will be valuable in formulating directions for scientific computing services at SLAC.

Scientific Computing Services updated the Hierarchical Storage Interface (HSI) to the current release (4.0.1.3.p1). HSI is a command line interface for admins and users of the High Performance Storage System (HPSS). It offers a familiar Unix-style command environment for working with HPSS. This software provides SLAC with efficient access to database information and the ability to script data migrations.
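Because HSI is driven from the shell, routine transfers are easy to script. A minimal sketch, with entirely hypothetical local and HPSS paths, might look like this:

    # Copy a local file into HPSS, then list it (paths are hypothetical)
    hsi "put /nfs/slac/g/example/data.tar : /hpss/g/example/data.tar"
    hsi "ls -l /hpss/g/example"
    # Retrieve the file back to local disk
    hsi "get /nfs/slac/g/example/data.tar : /hpss/g/example/data.tar"

In HSI's put/get syntax, the local file name appears to the left of the ':' and the HPSS path to the right.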

Added 41 blade nodes (656 CPU cores) and additional InfiniBand network hardware. Installation is complete and the new nodes are in production.

Ordered a storage server configured with 60 x 4 TB drives. The storage is now in production (/nfs/slac/g/ki/ki23).

Wei Yang (Scientific Computing Services), with support from Andrew Hanushevsky and Richard Mount (Scientific Computing Applications), published a paper at CHEP2013 on "Using Solid State Disk Array as a Cache for LHC ATLAS Data Analysis". The paper describes the cache architecture and its positive impact on ATLAS data analysis, which in turn improves utilization of SLAC's batch system.

Scientific Computing Services updated the webauth login pages in support of the Drupal project. The new, improved version provides a consistent login interface across SLAC web services.