This proposal is to expand the bullet cluster with combined funds from PPA, ATLAS, and Theory. This would double our existing parallel file system size (173->346TB) and add either 1649 or 1904 cores, depending on which option we choose. The first option is to provision InfiniBand (IB) in all nodes and add IB switches to allow future expansion of the IB network. Because of the IB network topology, allowing future expansion implies a jump in the number of core switches from 4 to 8. The second option would split the cluster into IB and non-IB parts, with the ATLAS nodes being non-IB. Note that the pricing below is based on several different quotes that would have to be refreshed. Hence the pricing is to be verified, but should be very close to actual. The details are:

Option 1: Expand to 18 fully populated chassis with all-IB and future expansion capability (revised for increased IB cost (+6k/chassis))
  • 6 full chassis @97.227k => 583.4k (quote 661664648)
  • 7 blades w/IB added to existing empty slots @5.018k => 35.1k (quote 661664256)
  • 4 IB switches with cables => 50.2k (includes active fiber IB cables for Lustre switch) (quote 662008993)
  • 2 60x2TB disk trays with controllers @30.3k => 60.6k (quote 661663331)

Total is $729.3k for 1648 cores and storage expansion.
Gross bullet cluster core count would then be 2960+1648=4608 (all IB)
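
The totals above can be re-derived from the line items. The following is a minimal sketch, not part of any quote: the prices are the ones listed above, while 16 blades per chassis and 16 cores per blade are assumptions inferred from the core counts in this proposal (for example 60 blades for 960 cores). The function and constant names are illustrative only.

```python
# Option 1 line items in thousands of dollars (k$), copied from the list above.
CHASSIS_PRICE_K = 97.227      # one full chassis with IB
BLADE_IB_PRICE_K = 5.018      # one blade w/IB for an existing empty slot
IB_SWITCHES_K = 50.2          # 4 IB switches with cables
DISK_TRAYS_K = 2 * 30.3       # 2 60x2TB disk trays with controllers

# Assumptions inferred from the core counts in the proposal, not from the quotes.
BLADES_PER_CHASSIS = 16
CORES_PER_BLADE = 16
EXISTING_CORES = 2960         # current bullet cluster core count


def option1_totals(n_chassis=6, n_extra_blades=7):
    """Return (cost in k$, added cores, gross core count) for Option 1."""
    cost_k = (n_chassis * CHASSIS_PRICE_K
              + n_extra_blades * BLADE_IB_PRICE_K
              + IB_SWITCHES_K
              + DISK_TRAYS_K)
    blades = n_chassis * BLADES_PER_CHASSIS + n_extra_blades
    added_cores = blades * CORES_PER_BLADE
    return cost_k, added_cores, EXISTING_CORES + added_cores


if __name__ == "__main__":
    cost_k, added, gross = option1_totals()
    print(f"cost: {cost_k:.1f}k, added cores: {added}, gross cores: {gross}")
```

Calling `option1_totals(n_extra_blades=5)` gives the variant with only 5 blades added to the empty slots, which is useful for checking how much the last two blades change the total.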

Costs by group are:

  • ATLAS: 60 blades @4.9375k => 296.25k                    (960c) (Based on quote 659024769 for a non-IB full chassis)
  • Theory: 97.227k + 5*5.018k => 122.32k                     (336c)
  • PPA: 310.73k                                                           (352c)
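
The split above can be checked with a short calculation. This sketch assumes that the PPA share is simply the remainder of the $729.3k total after the ATLAS and Theory contributions, which is an inference from the arithmetic rather than something the quotes state. It also assumes 16 cores per blade and 16 blades per chassis; all names are illustrative.

```python
# All amounts in thousands of dollars (k$), taken from the figures in this proposal.
TOTAL_K = 729.3               # Option 1 total
TOTAL_NEW_CORES = 1648        # cores added by Option 1
CORES_PER_BLADE = 16          # assumption inferred from 60 blades <-> 960 cores
BLADES_PER_CHASSIS = 16       # assumption inferred from 336 cores for Theory


def cost_split():
    """Return {group: (cost in k$, cores)} with PPA taken as the remainder."""
    atlas_cost = 60 * 4.9375                      # 60 blades at the non-IB price
    atlas_cores = 60 * CORES_PER_BLADE
    theory_cost = 97.227 + 5 * 5.018              # one full chassis plus 5 blades
    theory_cores = (BLADES_PER_CHASSIS + 5) * CORES_PER_BLADE
    ppa_cost = TOTAL_K - atlas_cost - theory_cost # assumption: PPA covers the rest
    ppa_cores = TOTAL_NEW_CORES - atlas_cores - theory_cores
    return {
        "ATLAS": (atlas_cost, atlas_cores),
        "Theory": (theory_cost, theory_cores),
        "PPA": (ppa_cost, ppa_cores),
    }


if __name__ == "__main__":
    for group, (cost_k, cores) in cost_split().items():
        print(f"{group}: {cost_k:.2f}k ({cores}c)")
```

Because the PPA figure is a remainder, it absorbs the storage expansion and the IB switches as well as the difference between the IB and non-IB blade prices for the ATLAS blades.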

Notes:

  • PPA cost/core is not meaningful because it includes the storage expansion and subsidizes the IB infrastructure for the ATLAS blades.
  • For a few dollars more (+10k), 2 more blades fill out the remaining 2 empty slots, so PPA --> $310k. It makes sense to have full chassis given that the infrastructure is already there.

The benefit here is that we have a uniform cluster.
