Notes on VMware Cluster
The VMware cluster, referred to as fermi-vmclust, consists of 4 hypervisors. Two of the machines are on HA, and two are not. A VM may migrate between these two hypervisors on demand: the machines have 128 GB of memory, and two hypervisors are enough to run all the VMs without oversubscribing memory.
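
The memory claim above can be checked with a small placement sketch. This is only an illustration: it assumes the 128 GB figure is per hypervisor, and the VM names and sizes in `example_vms` are hypothetical placeholders, not the real allocations on fermi-vmclust. It uses a first-fit-decreasing heuristic, so a `None` result means the heuristic found no placement, not that none exists.

```python
HYPERVISOR_MEMORY_GB = 128  # assumed to be per hypervisor (from the notes above)
HYPERVISOR_COUNT = 2        # the two hypervisors that VMs may migrate between


def fits_without_oversubscription(vm_memory_gb,
                                  hypervisors=HYPERVISOR_COUNT,
                                  capacity_gb=HYPERVISOR_MEMORY_GB):
    """Try to place every VM so that no hypervisor exceeds its memory.

    vm_memory_gb maps VM name -> memory in GB.
    Returns a dict of VM name -> hypervisor index, or None if the
    first-fit-decreasing heuristic cannot place all VMs.
    """
    free = [capacity_gb] * hypervisors
    placement = {}
    # Place the largest VMs first; this tends to pack more tightly.
    for name, mem in sorted(vm_memory_gb.items(),
                            key=lambda item: item[1], reverse=True):
        for index in range(hypervisors):
            if free[index] >= mem:
                free[index] -= mem
                placement[name] = index
                break
        else:
            return None  # no hypervisor has room for this VM
    return placement


# Hypothetical sizes, for illustration only.
example_vms = {"vm-a": 64, "vm-b": 48, "vm-c": 32, "vm-d": 16}
placement = fits_without_oversubscription(example_vms)
print(placement if placement is not None else "would oversubscribe memory")
```

Replacing `example_vms` with the actual per-VM memory reservations would show whether the two-hypervisor claim still holds after a VM is added or resized.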
...
Note: OCIO should be contacted about setting up occasional VM snapshots, so that a VM may always be brought up even if a hypervisor dies.
Hosts and Services
Host/Service | Category† | Physical Server(s) | Function | |
---|---|---|---|---|
xrootd | XC | fermi-gpfs01 fermi-gpfs02 fermi-gpfs05 fermi-gpfs06 fermi-gpfs07 fermi-gpfs08 | xrootd server and storage | |
GPFS | XC | fermi-gpfs03 fermi-gpfs04 | Fermi NFS/GPFS storage | |
GPFS/NFS bridge | XC | fermi-cnfs01 fermi-cnfs02 | Fermi NFS storage access | |
NFS (ISOC) | HA | staas-gpfs50 staas-gpfs51 | Critical ISOC NFS storage | |
Oracle | HA | fermi-oracle03 | Oracle (primary) | |
Oracle | XC | fermi-oracle04 | Oracle (failover) | |
mysql-node03 | HA | mysql05 mysql06 | calibration, etc. DB | |
fermilnx01 | HA | fermilnx01 | LAT config, fastcopy and real-time telemetry | |
fermilnx02 | HA | fermilnx02 | LAT config, fastcopy and real-time telemetry | |
fermilnx-v02 | XC/HA | fermi-vmclust | xrootd redirector | |
fermilnx-v03 | XC/HA | fermi-vmclust | archiver | |
fermilnx-v04 | | | DataCatalog Crawler (Prod) | |
fermilnx-v05/tomcat08 | XC/HA | fermi-vmclust | DataCatalog Web | |
fermilnx-v06 | | fermi-vmclust | Xroot proxy server | |
fermilnx-v07/tomcat01 | XC/HA | fermi-vmclust | Commons, Group manager | |
fermilnx-v08/tomcat02 glast-jobcontrol01 | | fermi-vmclust | LSF Job Control Daemons (notably glast, glastraw) | |
Note: No fermilnx-v09 | ||||
fermilnx-v10/tomcat11 | XC/HA | fermi-vmclust | DataProcessing | |
fermilnx-v11/tomcat12 | XC/HA | fermi-vmclust | TelemetryTrending | |
fermilnx-v12 | XC/HA | fermi-vmclust | xrootd redirector | |
fermilnx-v13/tomcat05 | | | Pipeline-II (Prod) | |
fermilnx-v14/centaurusa | | fermi-vmclust | Requires user login. Used as a Fermi CVS server and as a Subversion server for a variety of user groups. The svn functionality should move elsewhere. | |
fermilnx-v15/pipeline-mail01 | XC/HA | fermi-vmclust | Pipeline-II email server (james) | |
fermilnx-v16/tomcat06 | XC/HA | fermi-vmclust | rm2 | |
fermilnx-v17/tomcat09 | XC/HA | fermi-vmclust | Pipeline-II (Web) | |
fermilnx-v18/tomcat10 | XC/HA | fermi-vmclust | FCWebView, ISOCLogging, MPWebView, TelemetryMonitor, TelemetryTableWebUI | |
fermilnx-v19/tomcat04 | | fermi-vmclust | elog | |
fermilnx7-v01 | | fermi-vmclust | centos7 | Docker installed - must be added to proper group. |
fermilnx7-v02 | | fermi-vmclust | centos7 | Docker installed - must be added to proper group. |
fermilnx7-v03 | | fermi-vmclust | centos7 | Docker installed - must be added to proper group. |
fermi-ci-test01 | | fermi-vmclust | centos7 | Docker is installed. Runs a Jenkins worker. Jenkins can dispatch GlastRelease jobs to this node. |
astore-new (HPSS) | NC | (non-Fermi server) | FastCopy data archive **We have been granted a temporary quota increase of 1 TB on /nfs/farm/g/glast/u23, which has allowed this item to become "NC"** | |
trscron | HA | (non-Fermi server) | tokenized cron | |
lnxcron | HA | (non-Fermi server) | cron | |
(farm manager, etc.) | XC | (non-Fermi server) | LSF management | |
| | HA | yfs01/NN (non-Fermi) | basically all of AFS | |
JIRA | HA | (non-Fermi server) | issue tracking (HA as of 10/20/2017) | |
| | XC | rhel6-64 | public login nodes (a small number is needed for interactive access) | |
...
Category | Machine status |
---|---|
NC | non-critical for entire 16-day shutdown period |
XC | experiment critical but not in H.A. rack, only a few, short outages acceptable |
HA | high-availability (continuous operation) |
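
The category codes can be used to filter the host list when planning an outage. The sketch below is a minimal illustration, not an existing tool: `HOSTS` contains only a few rows copied from the Hosts and Services table, and treating a combined label such as `XC/HA` as carrying both codes is an assumption about how that label should be read.

```python
# Category legend, copied from the table above.
CATEGORY_MEANING = {
    "NC": "non-critical for entire 16-day shutdown period",
    "XC": "experiment critical but not in H.A. rack, only a few, short outages acceptable",
    "HA": "high-availability (continuous operation)",
}

# A small sample of rows from the Hosts and Services table (host -> category label).
HOSTS = {
    "fermilnx01": "HA",
    "fermi-oracle04": "XC",
    "fermilnx-v02": "XC/HA",
    "astore-new (HPSS)": "NC",
}


def categories(label):
    """Split a label such as 'XC/HA' into its individual codes."""
    return [code for code in label.split("/") if code]


def hosts_in(category):
    """Return the sorted host names whose label includes the given code."""
    if category not in CATEGORY_MEANING:
        raise KeyError(f"unknown category: {category}")
    return sorted(host for host, label in HOSTS.items()
                  if category in categories(label))


for code, meaning in CATEGORY_MEANING.items():
    print(f"{code} ({meaning}): {hosts_in(code)}")
```

Extending `HOSTS` with the remaining rows gives a quick way to list everything that must stay up continuously (`hosts_in("HA")`) versus what can tolerate the full shutdown (`hosts_in("NC")`).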
Table of LISOC Tasks and Services
Function/Service | Sub-Functions | Needed Servers | Needed Databases | Needed File Systems | Other Needs | Needed During Shutdown? | Available During Shutdown? |
---|---|---|---|---|---|---|---|
Mission Planning, LAT Configurations | FastCopy | fermilnx01 and | TCDB | AFS | Fermi LAT Portal: Timeline Webview; Confluence, JIRA, Mission Planning s/w, FastCopy Monitoring Sharepoint (reference for PROCs and Narrative Procedures for commanding in case of anomalies) | yes | |
Real Time Telemetry Monitoring | | fermilnx01 and fermilnx02 | | | spread; Fermi LAT Portal: Real Time Telemetry, Telemetry Monitor | during anomalies | |
Logging | | fermilnx01 and fermilnx02 | TCDB | | Fermi LAT Portal: Log Watcher | yes | |
Trending | | | TCDB | | Fermi LAT Portal: Telemetry Trending | yes | |
L0 File Ingest and Archive | FastCopy | | | L0 Archive | | yes | |
Data Gap Checking and Reporting | FastCopy | fermilnx01 and fermilnx02 | | L0 Archive | | yes, continuously | |
L1 processing | pipeline | SLAC Farm | Data Catalog | | Fermi LAT Portal: Pipeline, Data Processing | yes | |
L1 Data Quality Monitoring | | | | | Fermi LAT Portal, Telemetry Trending | | |
L1 delivery | FastCopy | fermilnx01 and fermilnx02 | Data Catalog | | | yes | |
L2 processing (ASP) and Delivery | FastCopy | fermilnx01 and fermilnx02 | Data Catalog | | Fermi LAT Portal: Pipeline, Data Processing | daily, weekly | |
...