First Edition: 6 Dec 2019
Version 1.1 (10:50 PT 6 Dec 2019)
Note |
---|
On 8 Dec 2019 this outage was postponed until July 2020 |
...
Note |
---|
The LAT collaboration's ability to perform general science analysis at SLAC will be seriously hindered by this outage because much of the batch farm will be unavailable. |
Date | Time | Equipment * | Action | |
---|---|---|---|---|
A day or two prior to 20 Dec 2019 | TBA | Test of power source switching (i.e., normal line power to generator) | ||
Fri 20 Dec 2019 | TBA | Switch to generator power (this could happen earlier). This will require a several-hour outage | ||
Mon 6 Jan 2020 | TBA | Return to normal power. This will require a several-hour outage |
...
Category† | server | VM/service | function |
---|---|---|---|
XC | fermi-gpfs01 fermi-gpfs02 fermi-gpfs05 fermi-gpfs06 fermi-gpfs07 fermi-gpfs08 | xrootd | xrootd server and storage |
XC/HA | fermi-vmclust01/02/03/04 | fermilnx-v02 | xrootd redirector |
XC/HA | fermi-vmclust01/02/03/04 | fermilnx-v12 | xrootd redirector |
XC | fermi-gpfs03 fermi-gpfs04 | GPFS | Fermi NFS/GPFS storage |
XC | fermi-cnfs01 fermi-cnfs02 | GPFS/NFS bridge | Fermi NFS storage access |
HA | staas-gpfs50 staas-gpfs51 | Critical ISOC NFS storage | |
HA | fermilnx01 | LAT config, fastcopy and real-time telemetry | |
HA | fermilnx02 | LAT config, fastcopy and real-time telemetry | |
XC/HA | fermi-vmclust01/02/03/04 | fermilnx-v03 | archiver |
HA | fermi-oracle03 | oracle primary | |
XC | fermi-oracle04 | oracle secondary | |
HA | mysql05 mysql06 | mysql-node03 | calibration, etc. DB |
XC | 400 cores | (50 "hequ" equivalents) batch hosts for LISOC queues={express,short,medium,long,glastdataq} users={glast,lsstsim,lsstprod,glastmc,glastraw} | |
XC | 200 cores | (25 "hequ" equivalents) batch hosts for Science Pipelines | |
XC/HA | fermi-vmclust01/02/03/04 | fermilnx-v07/tomcat01 | Commons, Group manager |
XC/HA | fermi-vmclust01/02/03/04 | fermilnx-v16/tomcat06 | rm2 |
XC/HA | fermi-vmclust01/02/03/04 | fermilnx-v05/tomcat08 | dataCatalog |
XC/HA | fermi-vmclust01/02/03/04 | fermilnx-v17/tomcat09 | Pipeline-II |
XC/HA | fermi-vmclust01/02/03/04 | fermilnx-v15/pipeline-mail01 | Pipeline-II email server |
XC/HA | fermi-vmclust01/02/03/04 | fermilnx-v18/tomcat10 | FCWebView, ISOCLogging, MPWebView, TelemetryMonitor, TelemetryTableWebUI |
XC/HA | fermi-vmclust01/02/03/04 | fermilnx-v10/tomcat11 | DataProcessing |
XC/HA | fermi-vmclust01/02/03/04 | fermilnx-v11/tomcat12 | TelemetryTrending |
NC | (non-Fermi server) | astore-new (HPSS) | FastCopy data archive **We have been granted a temporary quota increase of 1 TB on /nfs/farm/g/glast/u23, which has allowed this item to become "NC"** |
HA | (non-Fermi server) | trscron | tokenized cron |
HA | (non-Fermi server) | lnxcron | cron |
XC | (non-Fermi server) | (farm manager, etc.) | LSF management |
HA | yfs01/NN (non-Fermi) | basically all of AFS | |
HA | (non-Fermi server) | JIRA | issue tracking (HA as of 10/20/2017) |
XC | rhel6-64 | public login nodes (a small number is needed for interactive access) |
† Equipment categories
Category | Machine status |
---|---|
NC | non-critical for entire 16-day shutdown period |
XC | experiment critical but not in H.A. rack; only a few short outages are acceptable |
HA | high-availability (continuous operation) |
...
Machine Type | Total | Notes |
---|---|---|
GPFS servers | 8 | |
NFS/GPFS bridge | 2 | |
VMware hypervisors | 2 | Not needed if all Fermi services can be moved to the two H.A. hypervisors |
batch nodes ("hequ" equivalents) | 75 | Depending on which batch nodes are selected, some may already be in H.A. power |
Oracle servers | 1 | It is rumored that this machine may already be on H.A. power (to be confirmed) |
Public login nodes | N | (where "N" is a small integer) |
TOTAL | 88+N | |
Note that HPSS is NOT required by Fermi.
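As a cross-check of the totals above, the following is a minimal sketch that recomputes the machine count from the per-type figures. The ratio of 8 cores per "hequ" equivalent is not stated explicitly on this page; it is inferred from the "400 cores" / "50 hequ equivalents" row. The variable and function names are illustrative only.

```python
# Cores per "hequ" equivalent, inferred (assumption) from the table row
# "400 cores | (50 "hequ" equivalents)".
CORES_PER_HEQU = 400 // 50  # = 8

# Batch core requirements taken from the server table.
batch_cores = {
    "LISOC queues": 400,
    "Science Pipelines": 200,
}

# Convert cores to "hequ"-equivalent batch nodes.
batch_nodes = sum(cores // CORES_PER_HEQU for cores in batch_cores.values())

# Per-type machine counts from the summary table, excluding login nodes.
machines = {
    "GPFS servers": 8,
    "NFS/GPFS bridge": 2,
    "VMware hypervisors": 2,
    'batch nodes ("hequ" equivalents)': batch_nodes,
    "Oracle servers": 1,
}

fixed_total = sum(machines.values())


def total_machines(n_login_nodes: int) -> int:
    """Return the total machine count for a given number N of public login nodes."""
    return fixed_total + n_login_nodes


if __name__ == "__main__":
    print(f"batch nodes: {batch_nodes}")
    print(f"total (excluding login nodes): {fixed_total}")
```

With these inputs the batch-node count comes to 75 and the fixed total to 88, matching the "88+N" entry; `total_machines(N)` adds whatever small number of public login nodes is eventually chosen.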
...