...

This proposal is to expand the bullet cluster with combined funds from PPA, ATLAS, and Theory.  This would double our existing parallel file system size (173 -> 346 TB) and add either 1648 or 1904 cores, depending on which option we choose.  The first option is to provision InfiniBand (IB) in all nodes and add IB switches to allow future expansion of the IB network.  Because of the IB network topology, allowing for future expansion implies a jump in the number of core switches from 4 to 8.  The second option would split the cluster into IB and non-IB parts, with the ATLAS nodes being non-IB.  Note that the pricing below is based on several different quotes that would have to be refreshed, so it is approximate and hopefully not low-balled.  The details are:

...

Option 1: Expand to 18 fully populated chassis with all-IB and future expansion capability (revised for increased IB cost (+6k/chassis))

...

  • 6 full chassis @97.227k => 583.4k
  • 7 blades w/IB added to existing empty slots @4.817k => 33.72k
  • 4 IB switches with cables @10.5k => 42k
  • 2 60x2TB disk trays with controllers @34k => 68k

Total is $727k for 1648 cores and storage expansion.
Gross bullet cluster core count would then be 4608 (all IB)
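
As a rough cross-check of the Option 1 arithmetic, the short sketch below sums the revised line items and recomputes the core counts. The per-item prices come from the list above; the figures of 16 blades per chassis and 16 cores per blade are assumptions inferred from the quoted numbers (960 cores for 60 blades, 4608 cores for 18 chassis), not from a vendor spec.

```python
# Option 1 line items in thousands of dollars, taken from the list above.
line_items_k = {
    "6 full chassis": 6 * 97.227,
    "7 blades w/IB": 7 * 4.817,
    "4 IB switches with cables": 4 * 10.5,
    "2 60x2TB disk trays": 2 * 34.0,
}
total_k = sum(line_items_k.values())

# Assumed hardware layout, inferred from the quoted core counts.
BLADES_PER_CHASSIS = 16
CORES_PER_BLADE = 16

# New blades: 6 full chassis plus 7 blades in existing empty slots.
new_blades = 6 * BLADES_PER_CHASSIS + 7
new_cores = new_blades * CORES_PER_BLADE

# Gross count once all 18 chassis are fully populated.
gross_cores = 18 * BLADES_PER_CHASSIS * CORES_PER_BLADE

for name, cost_k in line_items_k.items():
    print(f"{name:28s} {cost_k:8.2f}k")
print(f"{'total':28s} {total_k:8.2f}k")
print(f"new cores: {new_cores}, gross cores: {gross_cores}")
```

With these inputs the total comes to roughly 727k, with 1648 new cores and 4608 cores gross, all subject to the quotes being refreshed.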

...

  • ATLAS: 60 blades @4.9375k => 296.25k (960c)
  • Theory: 97k + 7*4.817k => 130.72k (368c)
  • PPA: 300.1k (320c)

Note that the PPA cost per core is high because it includes the storage expansion and IB infrastructure.
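
To make the cost-per-core comparison concrete, here is a minimal sketch that divides each group's revised share by its core count. The dollar figures and core counts are the ones listed above; the variable names are purely illustrative.

```python
# Revised cost share (thousands of dollars) and core count per group,
# taken from the breakdown above.
shares = {
    "ATLAS": (296.25, 960),
    "Theory": (130.72, 368),
    "PPA": (300.1, 320),
}

# Cost per core in dollars for each group.
cost_per_core = {
    group: 1000.0 * cost_k / cores
    for group, (cost_k, cores) in shares.items()
}

total_cost_k = sum(cost_k for cost_k, _ in shares.values())
total_cores = sum(cores for _, cores in shares.values())

for group, dollars in cost_per_core.items():
    print(f"{group:8s} ${dollars:7.2f} per core")
print(f"combined: {total_cost_k:.2f}k for {total_cores} cores")
```

This gives roughly $309 per core for ATLAS, $355 for Theory, and $938 for PPA. The PPA figure is inflated because its share carries the storage trays and IB switches as well as blades.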

The benefit here is that we have a uniform cluster.

Option 2: Expand to 15 full IB chassis and 4 non-IB chassis

...

We could get new GPU servers (KIPAC's are old!) which are equivalent to bullet blades with ~5000 GPU cores for ~10k each.  So we could top off to 300k if we got 3 of these.  We (PPA) do need to replace our existing GPU "system" that is hosted by KIPAC.  A good case for adding some of these is presented here by Debbie Bard.

...