...

A Python script has been developed to do the modeling. We use the "CPU factor" as the computing unit, to account for the differing performance of the various node types in the farm.
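As a rough illustration of how a CPU factor can put different node types on a common scale, the sketch below weights each core by a per-core factor and sums the result. The node labels, host counts, and factor values are hypothetical, chosen only for illustration; they are not the real farm inventory, and the actual modeling script may be organized differently.

```python
from dataclasses import dataclass


@dataclass
class NodeType:
    name: str            # hypothetical label, not a real farm node class
    hosts: int
    cores_per_host: int
    cpu_factor: float    # assumed meaning: allocation units per core


def farm_capacity(node_types):
    """Total capacity in allocation units, weighting each core by its CPU factor."""
    return sum(n.hosts * n.cores_per_host * n.cpu_factor for n in node_types)


# Hypothetical inventory, for illustration only.
farm = [
    NodeType("older", hosts=100, cores_per_host=8, cpu_factor=10.0),
    NodeType("newer", hosts=50, cores_per_host=16, cpu_factor=14.0),
]

total_units = farm_capacity(farm)
total_cores = sum(n.hosts * n.cores_per_host for n in farm)
average_units_per_core = total_units / total_cores

print(total_units, total_cores, average_units_per_core)
```

Dividing total units by total cores gives the farm-wide average units per core, which is the kind of figure used when converting a core count into allocation units.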


Needs Estimation

PPA projects were polled for their projected needs over the next few years; the results are recapped here. Fermi intends to use its full allocation for the next 2 years, and that allocation is sufficient. BABAR should start ramping down, but has not yet projected that ramp. KIPAC sees a need for about 1600 cores for its users, including MPI and DES. All other PPA projects are small in comparison to these two, perhaps totaling 500 cores. In terms of allocation units, taking Fermi and BABAR at their current levels and an average core at about 13 units, the current needs estimate is:

...

Year    #hosts    #cores    SLAC-units    To Buy    Buy "bullets"
2013    729       7164      90698         18.5k     1.3k
2014    652       6568      83744         7k        0.5k
2015    575       5968      76736         7k        0.5k
2016    499       5376      69789         7k        0.5k
2017    420       4792      63039         7k        0.5k
2018    337       3944      52183         11k       0.8k
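The "To Buy" figures from 2014 onward appear to track the year-over-year drop in installed SLAC-units as old hardware retires. The sketch below checks that reading against the table values; it is an interpretation of the numbers, not the modeling script itself, and the function name is made up for this example.

```python
# Installed capacity in SLAC-units per year, copied from the table above.
capacity = {
    2013: 90698,
    2014: 83744,
    2015: 76736,
    2016: 69789,
    2017: 63039,
    2018: 52183,
}


def yearly_loss(capacity_by_year):
    """Capacity lost in each year relative to the previous year."""
    years = sorted(capacity_by_year)
    return {
        year: capacity_by_year[prev] - capacity_by_year[year]
        for prev, year in zip(years, years[1:])
    }


loss = yearly_loss(capacity)
# Round to the nearest thousand units, as the table does.
loss_in_k = {year: round(units / 1000) for year, units in loss.items()}

for year in sorted(loss):
    print(year, loss[year], f"{loss_in_k[year]}k")
```

Under this reading, the 2013 entry is different in kind: its 18.5k is the headroom needed for peak usage rather than a replacement for retired capacity.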

Summary

Projections for PPA's cycle needs over the next few years are flat, averaging 90k allocation units. We expect the current capacity to be saturated for at least two years because of Fermi's reprocessing and simulation needs for Pass8. It is still too early for LSST to need serious cycles. It would be prudent to have some headroom, so we recommend a 20% increase for peak bursts, bringing the need to 109k allocation units.
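The headroom arithmetic can be sketched as follows. It assumes the average need equals the 2013 installed capacity from the table (90698 SLAC-units), which the text rounds to 90k; the variable names are illustrative.

```python
# Assumption: average need equals the 2013 installed capacity from the table.
average_need_units = 90698
headroom_fraction = 0.20  # recommended margin for peak bursts

peak_need_units = average_need_units * (1 + headroom_fraction)
extra_units = peak_need_units - average_need_units

print(f"peak need: {peak_need_units / 1000:.0f}k units")
print(f"extra over average: {extra_units / 1000:.1f}k units")
```

The peak need rounds to 109k units. The extra capacity comes out near 18k units, in line with the 18.5k quoted for covering peak usage.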

The installed hardware is already old, except for the 2012-2013 purchase of "bullet" nodes, the first installment of the PPA common cluster purchases. The current capacity just matches the average need; an additional 18.5k units are needed to cover peak usage. Depending on the retirement model, we need either to replace the old hardware within two years at 25k units per year or, if retiring via DNR, to buy 7k units per year after this year.

Buying the 18.5k units needed corresponds to about 1.3k bullet nodes. Currently 256 nodes cost $100k, so this would take about five such purchases and could cost $500k.
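A minimal sketch of the cost estimate, using only the figures quoted above. Linear scaling of the price is an assumption, as is the alternative of buying only whole 256-node batches; real quotes may differ.

```python
import math

nodes_needed = 1300    # about 1.3k bullet nodes, as quoted
batch_size = 256       # nodes per quoted purchase
batch_cost = 100_000   # dollars per 256 nodes, as quoted

# Assumption 1: price scales linearly with the number of nodes.
cost_linear = nodes_needed / batch_size * batch_cost

# Assumption 2: nodes can only be bought in whole batches of 256.
batches = math.ceil(nodes_needed / batch_size)
cost_whole_batches = batches * batch_cost

print(f"linear estimate: ${cost_linear:,.0f}")
print(f"whole batches:   {batches} batches, ${cost_whole_batches:,.0f}")
```

Linear scaling gives roughly $508k, consistent with the $500k figure; rounding up to whole batches would raise it to $600k.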

We had planned to intersperse storage purchases with compute node purchases. The new cluster architecture relies on Lustre as a shared file system and as scratch space for batch. Such an upgrade was anticipated: the existing 170 TB can be doubled by adding trays to the existing servers for about $60k.