Introduction

PPA has started buying hardware for the common good across the directorate. The initiative began in 2012, and the first purchase, 185 "Bullet" nodes, followed in early 2013. These nodes are InfiniBand-connected, with Lustre storage. Historically the cluster was provisioned largely for BABAR, with other experiments riding on its coattails. Currently there are three projects of comparable batch allocation size: BABAR, ATLAS and Fermi. BABAR stopped taking data in 2009 and its usage is presumed to tail off; Fermi is in routine operations, with modest near-real-time needs and a 1.5-2 year program of intensive work around its "Pass8" reconstruction revamp; ATLAS operates a Tier2 center at SLAC, which amounts to a contractual agreement to provide a certain level of cycles continuously. LSST is expected to start increasing its needs at some point, but at this time (8 years from first light) those needs are still unspecified.

The modeling has three components:

  • inventory of existing hardware
  • model for retirement vs time
  • model for project needs vs time
Purchase Record of Existing Hardware

Purchase Year   Node type   Node Count   Cores per node   CPU factor
2006            yili        156          4                8.46
2007            bali        252          4                10.0
2007            boer        135          4                10.0
2008            fell        164+179      8                11.0
2009            hequ        192          8                14.6
2009            orange      96           8                10.0
2010            dole        38           12               15.6
2011            kiso        68           24               12.2
2013            bullet      185          16               14.0

Of these, ATLAS owns 78 boers, 40 fells, 40 hequs, 38 doles and 68 kisos. Note also that as of 2013-07-15 the Black Boxes were retired, taking with them all the balis and boers and all but 25 of the yilis.

Snapshot of Inventory for Modeling - ATLAS hardware removed

Purchase Year   Node type   Node Count   Cores per node   CPU factor
2006            yili        25           4                8.46
2008            fell        277          8                11.0
2009            hequ        192          8                14.6
2009            orange      96           8                10.0
2013            bullet      185          16               14.0
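
As a rough illustration of how the snapshot above could feed the model, the sketch below totals the cores and a factor-weighted capacity. It assumes that the CPU factor is a per-core relative performance weight, so that a node type contributes node count x cores per node x CPU factor; the page does not state this. The function names are illustrative only.

```python
# Snapshot inventory (ATLAS hardware removed), copied from the table above.
# Each row: (purchase_year, node_type, node_count, cores_per_node, cpu_factor)
SNAPSHOT = [
    (2006, "yili", 25, 4, 8.46),
    (2008, "fell", 277, 8, 11.0),
    (2009, "hequ", 192, 8, 14.6),
    (2009, "orange", 96, 8, 10.0),
    (2013, "bullet", 185, 16, 14.0),
]


def total_cores(inventory):
    """Sum of node_count * cores_per_node over all node types."""
    return sum(count * cores for _, _, count, cores, _ in inventory)


def weighted_capacity(inventory):
    """Factor-weighted capacity.

    ASSUMPTION: the CPU factor is a per-core weight, so each node type
    contributes node_count * cores_per_node * cpu_factor.
    """
    return sum(count * cores * factor for _, _, count, cores, factor in inventory)


print("total cores:", total_cores(SNAPSHOT))
print("weighted capacity:", round(weighted_capacity(SNAPSHOT), 1))
```

Under that assumption the snapshot holds 7580 cores, of which the 2013 bullet purchase accounts for 2960.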

Retirement Models

Two models have been considered: a strict age cut (e.g., all machines older than 5 years are retired) and a do-not-resuscitate model ("DNR": machines out of Maintech support are left to die). The age cut presumably allows better planning of the physical layout of the data center, whereas the DNR model would leave holes by happenstance. On the other hand, the DNR model keeps useful hardware in place with minimal effort, but it assumes that floor space and power are not factors in the cost.

In practice, we may adopt a hybrid of the two, especially since a strict age cutoff would cause sudden drops in capacity, given our acquisition history.
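
The sudden drops that a strict age cut would produce can be seen by applying the cut to the snapshot inventory year by year. The sketch below is a minimal illustration, not the actual model. It assumes that age is the calendar-year difference from the purchase year, that a machine is retired once that difference exceeds 5 years, and that no new hardware is purchased. The function name `surviving_cores` is hypothetical.

```python
# Snapshot inventory (ATLAS hardware removed), copied from this page.
# Each row: (purchase_year, node_type, node_count, cores_per_node)
SNAPSHOT = [
    (2006, "yili", 25, 4),
    (2008, "fell", 277, 8),
    (2009, "hequ", 192, 8),
    (2009, "orange", 96, 8),
    (2013, "bullet", 185, 16),
]

MAX_AGE_YEARS = 5  # the example cut from the text: older than 5 years is retired


def surviving_cores(inventory, year, max_age=MAX_AGE_YEARS):
    """Cores still in service in `year` under a strict age cut.

    ASSUMPTION: age is the calendar-year difference (year - purchase_year),
    and a machine is retired once that difference exceeds max_age.
    No future purchases are modeled.
    """
    return sum(
        count * cores
        for purchase_year, _, count, cores in inventory
        if purchase_year <= year and year - purchase_year <= max_age
    )


for year in range(2013, 2020):
    print(year, surviving_cores(SNAPSHOT, year))
```

Under these assumptions the core count steps from 7480 in 2013 to 5264 in 2014 and 2960 in 2015, because each purchase batch leaves at once. A DNR variant would additionally need a failure rate for out-of-support machines, which is not given here.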
