SCS bullet points for week ending 2013/11/22

In response to MPI problems on the bullet cluster, Scientific Computing Services staff completed work with customer groups and IBM/Platform to address priority and resource allocation issues.   Since the configuration was modified, 'bulletmpi' jobs have typically accounted for more than 50% of the bullet cluster load, and users have confirmed that their jobs now obtain the resources they request.   Long-term goals for the batch service may include live job migration using VMs and the use of Linux cgroups to restrict CPU and memory usage.   These changes are designed to provide optimal use of the bullet cluster for scientific computing customers at the Lab.

Scientific Computing Services worked with the Networking team to deploy link aggregation for the Unix infrastructure servers housed in the Building 50 High Availability (HA) rack.   The HA rack was installed in late 2012 to provide generator-backed power for critical infrastructure services.   To provide network redundancy, the servers had to be reconfigured so that they can use an alternate route in the event of a switch failure.   Over the course of many weeks, more than 70 Unix servers were reconfigured with minimal impact on the SLAC computing community.   Completion of this task further improves the reliability of computing services for the Lab.