...

Chart: LSF Job totals (bar chart; data from the CSV table at http://www.slac.stanford.edu/~systems/metrics/lsf/lsf_running.csv)

Recent Accomplishments

Scientific Computing Services worked with Fermi, Atlas, and BaBar to reallocate 10,000 shares from each group, temporarily providing the Theory group with a total of 30,000 shares. A special queue has been set up with parameters that allow Theory to use these shares more intensively than the regular queues would permit. This will help the Theory group prepare for the Snowmass meeting at the end of the month.

Scientific Computing Services has been working with Networking to test link aggregation for critical servers. Testing is complete, and over the next few months SCS will roll out link aggregation to other critical servers. Because link aggregation combines multiple network connections, a server remains reachable if one of those connections fails; this provides network redundancy and increases the availability of critical services for the lab.

Issues

Scientific Computing Services is still dealing with the aftermath of the May 30 power outage. Approximately 20 batch machines remain down because of hardware problems that developed as a result of the sudden loss of power.

Scientific Computing Services continues to work on LCLS/PCDS storage problems following the May 30 power outage. A hardware RAID controller failed and may be responsible for corrupting one of the 1 PB Lustre file systems. Repair work is underway, and the file system is currently offline.

Clusters and High Performance Computing

...