...
The VMware cluster is managed by the OCIO Platforms team, and they can migrate VMs on demand.
fermi-vmclust01 and fermi-vmclust02 are not HA.
fermi-vmclust03 and fermi-vmclust04 are HA.
Note: OCIO should be contacted about setting up occasional VM snapshots, so that a VM may always be brought up even if a hypervisor dies.
...
Function/Service | Sub-Functions | Needed Servers | Needed Databases | Needed File Systems | Other Needs | Needed During Shutdown? | Available During Shutdown? |
---|---|---|---|---|---|---|---|
Mission Planning, LAT Configurations | FastCopy | fermilnx01 and | TCDB | AFS | Fermi LAT Portal: Timeline Webview; Confluence, JIRA, Mission Planning s/w, FastCopy Monitoring Sharepoint (reference for PROCs and Narrative Procedures for commanding in case of anomalies) | yes | |
Real Time Telemetry Monitoring | | fermilnx01 and fermilnx02 | | | spread; Fermi LAT Portal: Real Time Telemetry, Telemetry Monitor | during anomalies | |
Logging | | fermilnx01 and fermilnx02 | TCDB | | Fermi LAT Portal: Log Watcher | yes | |
Trending | | | TCDB | | Fermi LAT Portal: Telemetry Trending | yes | |
L0 File Ingest and Archive | FastCopy | | | L0 Archive | | yes | |
Data Gap Checking and Reporting | FastCopy | fermilnx01 and fermilnx02 | | L0 Archive | | yes, continuously | |
L1 processing | pipeline | SLAC Farm | Data Catalog | | Fermi LAT Portal: Pipeline, Data Processing | yes | |
L1 Data Quality Monitoring | | | | | Fermi LAT Portal, Telemetry Trending | | |
L1 delivery | FastCopy | fermilnx01 and fermilnx02 | Data Catalog | | | yes | |
L2 processing (ASP) and Delivery | FastCopy | fermilnx01 and fermilnx02 | Data Catalog | | Fermi LAT Portal: Pipeline, Data Processing | daily, weekly | |
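
The dependency information in the table above can also be queried programmatically. The following is a minimal sketch, not an existing ISOC tool: the `Function` class, the `FUNCTIONS` list, and `affected_functions` are hypothetical names, and only a few rows are transcribed. It conservatively flags a function as affected if any one of its listed resources is down; whether fermilnx01 and fermilnx02 can cover for each other is not stated on this page, so that is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Function:
    """One row of the Function/Service table (hypothetical structure)."""
    name: str
    servers: frozenset = frozenset()
    databases: frozenset = frozenset()


# A few rows transcribed from the table; the assignment of each
# resource to "servers" or "databases" is an assumption of this sketch.
FUNCTIONS = [
    Function("Logging", frozenset({"fermilnx01", "fermilnx02"}), frozenset({"TCDB"})),
    Function("Trending", frozenset(), frozenset({"TCDB"})),
    Function("L1 processing", frozenset({"SLAC Farm"}), frozenset({"Data Catalog"})),
    Function("L1 delivery", frozenset({"fermilnx01", "fermilnx02"}), frozenset({"Data Catalog"})),
]


def affected_functions(unavailable):
    """Return the names of functions that list any unavailable resource."""
    down = set(unavailable)
    return [f.name for f in FUNCTIONS if down & (f.servers | f.databases)]
```

For example, `affected_functions({"TCDB"})` lists the functions that would be touched by a TCDB outage, which is a quick way to cross-check the "Needed During Shutdown?" column.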
Hosts and Services
Host/VM or Service | Category† | Physical Server(s) | Function |
---|---|---|---|
xrootd | XC | fermi-gpfs01 fermi-gpfs02 fermi-gpfs05 fermi-gpfs06 fermi-gpfs07 fermi-gpfs08 | xrootd server and storage |
GPFS | XC | fermi-gpfs03 fermi-gpfs04 | Fermi NFS/GPFS storage |
GPFS/NFS bridge | XC | fermi-cnfs01 fermi-cnfs02 | Fermi NFS storage access |
NFS (ISOC) | HA | staas-gpfs50 staas-gpfs51 | Critical ISOC NFS storage |
Oracle | HA | fermi-oracle03 | Oracle (primary) |
Oracle | XC | fermi-oracle04 | Oracle (failover) |
mysql-node03 | HA | mysql05 mysql06 | calibration, etc. DB |
fermilnx01 | HA | fermilnx01 | LAT config, fastcopy and real-time telemetry |
fermilnx02 | HA | fermilnx02 | LAT config, fastcopy and real-time telemetry |
fermilnx-v02 | XC/HA | fermi-vmclust | xrootd redirector |
fermilnx-v03 | XC/HA | fermi-vmclust | archiver |
fermilnx-v04 | | | DataCatalog Crawler (Prod) |
fermilnx-v05/tomcat08 | XC/HA | fermi-vmclust | DataCatalog Web |
fermilnx-v06 | | fermi-vmclust | Xroot proxy server |
fermilnx-v07/tomcat01 | XC/HA | fermi-vmclust | Commons, Group manager |
fermilnx-v08/tomcat02 glast-jobcontrol01 | | fermi-vmclust | LSF Job Control Daemons (Notably glast, glastraw) |
Note: No fermilnx-v09 | | | |
fermilnx-v10/tomcat11 | XC/HA | fermi-vmclust | DataProcessing |
| | XC | 400 cores | (50 "hequ" equivalents) batch hosts for LISOC queues={express,short,medium,long,glastdataq} users={glast,lsstsim,lsstprod,glastmc,glastraw} |
| | XC | 200 cores | (25 "hequ" equivalents) batch hosts for Science Pipelines |
fermilnx-v11/tomcat12 | XC/HA | fermi-vmclust | TelemetryTrending |
fermilnx-v12 | XC/HA | fermi-vmclust | xrootd redirector |
fermilnx-v13/tomcat05 | | | Pipeline-II (Prod) |
fermilnx-v14/centaurusa | | fermi-vmclust | This machine requires user login. This machine is used as a Fermi CVS server, and a subversion server for a variety of user groups. svn functionality should move elsewhere |
fermilnx-v15/pipeline-mail01 | XC/HA | fermi-vmclust | Pipeline-II email server (james) |
fermilnx-v16/tomcat06 | XC/HA | fermi-vmclust | rm2 |
fermilnx-v17/tomcat09 | XC/HA | fermi-vmclust | Pipeline-II (Web) |
fermilnx-v18/tomcat10 | XC/HA | fermi-vmclust | FCWebView, ISOCLogging, MPWebView, TelemetryMonitor, TelemetryTableWebUI |
fermilnx-v19/tomcat04 | | fermi-vmclust | elog |
fermilnx7-v01 | | fermi-vmclust | centos7; Docker installed - must be added to proper group. |
fermilnx7-v02 | | fermi-vmclust | centos7; Docker installed - must be added to proper group. |
fermilnx7-v03 | | fermi-vmclust | centos7; Docker installed - must be added to proper group. |
fermi-ci-test01 | | fermi-vmclust | centos7; Docker is installed. Runs a Jenkins worker. Jenkins can dispatch GlastRelease jobs to this node. |
astore-new (HPSS) | NC | (non-Fermi server) | FastCopy data archive. **We have been granted a temporary quota increase of 1 TB on /nfs/farm/g/glast/u23, which has allowed this item to become "NC"** |
trscron | HA | (non-Fermi server) | tokenized cron |
lnxcron | HA | (non-Fermi server) | cron |
(farm manager, etc.) | XC | (non-Fermi server) | LSF management |
| | HA | yfs01/NN (non-Fermi) | basically all of AFS |
JIRA | HA | (non-Fermi server) | issue tracking (HA as of 10/20/2017) |
| | XC | rhel6-64 | public login nodes (a small number is needed for interactive access) |
...
† Equipment categories
Category | Machine status |
---|---|
NC | non-critical for entire 16-day shutdown period |
XC | experiment-critical but not in the H.A. rack; only a few short outages are acceptable |
HA | high-availability (continuous operation) |
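
The three categories lend themselves to a simple lookup when planning which machines may be powered off. The sketch below is illustrative only: the `Category` enum, the `HOSTS` mapping, and the two helper functions are hypothetical names, and the mapping holds just a handful of entries copied from the Hosts and Services table rather than the full inventory.

```python
from enum import Enum


class Category(Enum):
    """Equipment categories as defined in the table above."""
    NC = "non-critical for the entire shutdown period"
    XC = "experiment critical, only a few short outages acceptable"
    HA = "high-availability (continuous operation)"


# Sample entries only; categories are taken from the Hosts and Services table.
HOSTS = {
    "fermi-oracle03": Category.HA,
    "fermi-oracle04": Category.XC,
    "fermilnx01": Category.HA,
    "astore-new (HPSS)": Category.NC,
}


def hosts_in(category):
    """Return the sorted names of hosts assigned to the given category."""
    return sorted(name for name, cat in HOSTS.items() if cat is category)


def can_stay_off_for_shutdown(host):
    """Only NC equipment may remain off for the whole shutdown."""
    return HOSTS[host] is Category.NC
```

Calling `hosts_in(Category.HA)` gives the machines that need continuous operation, while `can_stay_off_for_shutdown` answers the question for a single host.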
...