First Edition: 6 Dec 2019
Version 1.1 (10:50 PT 6 Dec 2019)
Note |
---|
On 8 Dec 2019 this outage was postponed until July 2020 |
...
[Tentative proposal] Not many details are currently known, but this power outage will affect substations #7 (next to bldg 50) and #8 (on the 4th floor of bldg 50). All of bldg 50 will be without normal power. The facilities (F&O) group plans to do its maintenance during the 4-day period starting 26 Dec 2019; however, the outage will start earlier because of the lack of staff during the holiday shutdown. At a minimum, all H.A. (High Availability) and experiment-critical equipment is expected to remain powered throughout the 16+ days of the holiday shutdown. This page captures what Fermi will need in order to keep a minimal data-processing effort running during the outage.
Note |
---|
The LAT collaboration's ability to perform general science analysis at SLAC will be seriously hindered by this outage because much of the batch farm will be unavailable. |
Date | Time | Equipment * | Action |
---|---|---|---|
A day or two prior to 20 Dec 2019 | TBA | Test of power source switching (i.e., normal line power to generator) | |
Fri 20 Dec 2019 | TBA | Switch to generator power (this could happen earlier). This will require a several-hour outage. | |
Mon 6 Jan 2020 | TBA | Return to normal power. This will require a several-hour outage. | |
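As a quick cross-check of the "16+ days" figure, the generator-power window in the schedule (switchover Fri 20 Dec 2019, return to normal power Mon 6 Jan 2020) can be computed with Python's standard `datetime` module. This sketch assumes both switchovers happen on the listed dates; the exact times are still TBA.

```python
from datetime import date

# Dates taken from the schedule table; the switchover times are still TBA.
switch_to_generator = date(2019, 12, 20)
return_to_normal = date(2020, 1, 6)

# Whole days between the two switchovers.
outage_days = (return_to_normal - switch_to_generator).days
print(outage_days)  # 17
```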
...
...
...
...
Category† | server | VM/service | function |
---|---|---|---|
XC | fermi-gpfs01 fermi-gpfs02 fermi-gpfs05 fermi-gpfs06 fermi-gpfs07 fermi-gpfs08 | xrootd | xrootd server and storage |
XC/HA | fermi-vmclust01/02/03/04 | fermilnx-v02 | xrootd redirector |
XC/HA | fermi-vmclust01/02/03/04 | fermilnx-v12 | xrootd redirector |
XC | fermi-gpfs03 fermi-gpfs04 | GPFS | Fermi NFS/GPFS storage |
XC | fermi-cnfs01 fermi-cnfs02 | GPFS/NFS bridge | Fermi NFS storage access |
HA | staas-gpfs50 staas-gpfs51 | | Critical ISOC NFS storage |
HA | fermilnx01 | | LAT config, fastcopy and real-time telemetry |
HA | fermilnx02 | | LAT config, fastcopy and real-time telemetry |
XC/HA | fermi-vmclust01/02/03/04 | fermilnx-v03 | archiver |
HA | fermi-oracle03 | | Oracle primary |
XC | fermi-oracle04 | | Oracle secondary |
HA | mysql05 mysql06 | mysql-node03 | calibration DB, etc. |
XC | 400 cores (50 "hequ" equivalents) | | batch hosts for LISOC; queues={express,short,medium,long,glastdataq}, users={glast,lsstsim,lsstprod,glastmc,glastraw} |
XC | 200 cores (25 "hequ" equivalents) | | batch hosts for Science Pipelines |
XC/HA | fermi-vmclust01/02/03/04 | fermilnx-v07/tomcat01 | Commons, Group manager |
XC/HA | fermi-vmclust01/02/03/04 | fermilnx-v16/tomcat06 | rm2 |
XC/HA | fermi-vmclust01/02/03/04 | fermilnx-v05/tomcat08 | dataCatalog |
XC/HA | fermi-vmclust01/02/03/04 | fermilnx-v17/tomcat09 | Pipeline-II |
XC/HA | fermi-vmclust01/02/03/04 | fermilnx-v15/pipeline-mail01 | Pipeline-II email server |
XC/HA | fermi-vmclust01/02/03/04 | fermilnx-v18/tomcat10 | FCWebView, ISOCLogging, MPWebView, TelemetryMonitor, TelemetryTableWebUI |
XC/HA | fermi-vmclust01/02/03/04 | fermilnx-v10/tomcat11 | DataProcessing |
XC/HA | fermi-vmclust01/02/03/04 | fermilnx-v11/tomcat12 | TelemetryTrending |
NC | (non-Fermi server) | astore-new (HPSS) | FastCopy data archive. **We have been granted a temporary quota increase of 1 TB on /nfs/farm/g/glast/u23, which has allowed this item to become "NC".** |
HA | (non-Fermi server) | trscron | tokenized cron |
HA | (non-Fermi server) | lnxcron | cron |
XC | (non-Fermi server) | (farm manager, etc.) | LSF management |
HA | yfs01/NN (non-Fermi) | | basically all of AFS |
HA | (non-Fermi server) | JIRA | issue tracking (HA as of 10/20/2017) |
XC | rhel6-64 | | public login nodes (a small number is needed for interactive access) |
† Equipment categories
Category | Machine status |
---|---|
NC | non-critical for the entire 16-day shutdown period |
XC | experiment-critical but not in the H.A. rack; only a few short outages are acceptable |
HA | high-availability (continuous operation) |
...
Machine type | Total | Notes |
---|---|---|
GPFS servers | 8 | |
NFS/GPFS bridge | 2 | |
VMware hypervisors | 2 | Not needed if all Fermi services can be moved to the two H.A. hypervisors |
batch nodes ("hequ" equivalents) | 75 | Depending on which batch nodes are selected, some may already be on H.A. power |
Oracle servers | 1 | There is a rumor that this machine may already be on H.A. power (to be confirmed) |
Public login nodes | N | "N" is a small integer |
TOTAL | 88 + N | |
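The TOTAL row can be cross-checked with a short Python sketch. It assumes one "hequ" equivalent is 8 cores (implied by 400 cores for 50 hequ and 200 cores for 25 hequ in the inventory table) and treats the number of public login nodes, N, as a parameter. The variable and function names are illustrative only.

```python
# Assumption: one "hequ" equivalent = 8 cores, derived from the inventory
# table (400 cores = 50 hequ, 200 cores = 25 hequ).
CORES_PER_HEQU = 400 // 50  # 8

# Batch cores requested per purpose, from the inventory table.
batch_cores = {"LISOC": 400, "Science Pipelines": 200}
batch_hequ = sum(cores // CORES_PER_HEQU for cores in batch_cores.values())

# Fixed machine counts from the summary table.
machines = {
    "GPFS servers": 8,
    "NFS/GPFS bridge": 2,
    "VMware hypervisors": 2,
    "batch nodes (hequ equivalents)": batch_hequ,
    "Oracle servers": 1,
}

def total_machines(n_login_nodes: int) -> int:
    """Fixed machine count plus the N public login nodes (hypothetical helper)."""
    return sum(machines.values()) + n_login_nodes

print(batch_hequ)         # 75
print(total_machines(0))  # 88
```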
Note |
---|
HPSS is NOT required by Fermi. |
...