...

The VMware cluster is managed by the OCIO Platforms team, and they can migrate VMs on demand.

fermi-vmclust01 and fermi-vmclust02 are not HA.

fermi-vmclust03 and fermi-vmclust04 are HA.

Note: OCIO should be contacted about setting up occasional VM snapshots, so that a VM may always be brought up even if a hypervisor dies.

...

| Function/Service | Sub-Functions | Needed Servers | Needed Databases | Needed File Systems | Other Needs | Needed During Shutdown? | Available During Shutdown? |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Mission Planning, LAT Configurations | FastCopy | fermilnx01 and fermilnx02 | TCDB | AFS | Fermi LAT Portal: Timeline Webview; Confluence, JIRA, Mission Planning s/w, FastCopy Monitoring; Sharepoint (reference for PROCs and Narrative Procedures for commanding in case of anomalies) | yes |  |
| Real Time Telemetry Monitoring |  | fermilnx01 and fermilnx02 |  |  | spread; Fermi LAT Portal: Real Time Telemetry, Telemetry Monitor | during anomalies |  |
| Logging |  | fermilnx01 and fermilnx02 | TCDB |  | Fermi LAT Portal: Log Watcher | yes |  |
| Trending |  |  | TCDB |  | Fermi LAT Portal: Telemetry Trending | yes |  |
| L0 File Ingest and Archive | FastCopy |  |  | L0 Archive |  | yes |  |
| Data Gap Checking and Reporting | FastCopy | fermilnx01 and fermilnx02 |  | L0 Archive |  | yes, continuously |  |
| L1 processing | pipeline | SLAC Farm | Data Catalog |  | Fermi LAT Portal: Pipeline, Data Processing | yes |  |
| L1 Data Quality Monitoring |  |  |  |  | Fermi LAT Portal, Telemetry Trending |  |  |
| L1 delivery | FastCopy | fermilnx01 and fermilnx02 | Data Catalog |  |  | yes |  |
| L2 processing (ASP) and Delivery | FastCopy | fermilnx01 and fermilnx02 | Data Catalog |  | Fermi LAT Portal: Pipeline, Data Processing | daily, weekly |  |

Hosts and Services

| Host/VM or Service | Category† | Physical Server(s) | Function |
| --- | --- | --- | --- |
| xrootd | XC | fermi-gpfs01, fermi-gpfs02, fermi-gpfs05, fermi-gpfs06, fermi-gpfs07, fermi-gpfs08 | xrootd server and storage |
| GPFS | XC | fermi-gpfs03, fermi-gpfs04 | Fermi NFS/GPFS storage |
| GPFS/NFS bridge | XC | fermi-cnfs01, fermi-cnfs02 | Fermi NFS storage access |
| NFS (ISOC) | HA | staas-gpfs50, staas-gpfs51 | Critical ISOC NFS storage |
| Oracle | HA | fermi-oracle03 | Oracle (primary) |
| Oracle | XC | fermi-oracle04 | Oracle (failover) |
| mysql-node03 | HA | mysql05, mysql06 | calibration, etc. DB |
| fermilnx01 | HA | fermilnx01 | LAT config, fastcopy and real-time telemetry |
| fermilnx02 | HA | fermilnx02 | LAT config, fastcopy and real-time telemetry |
| fermilnx-v02 | XC/HA | fermi-vmclust | xrootd redirector |
| fermilnx-v03 | XC/HA | fermi-vmclust | archiver |
| fermilnx-v04 |  |  | DataCatalog Crawler (Prod) |
| fermilnx-v05/tomcat08 | XC/HA | fermi-vmclust | DataCatalog Web |
| fermilnx-v06 |  | fermi-vmclust | Xroot proxy server |
| fermilnx-v07/tomcat01 | XC/HA | fermi-vmclust | Commons, Group manager |
| fermilnx-v08/tomcat02, glast-jobcontrol01 |  | fermi-vmclust | LSF Job Control Daemons (Notably glast, glastraw) |
| Note: No fermilnx-v09 |  |  |  |
| fermilnx-v10/tomcat11 | XC/HA | fermi-vmclust[1] | DataProcessing |
|  | XC | 400 cores (50 "hequ" equivalents) | batch hosts for LISOC; queues={express,short,medium,long,glastdataq}; users={glast,lsstsim,lsstprod,glastmc,glastraw} |
|  | XC | 200 cores (25 "hequ" equivalents) | batch hosts for Science Pipelines |
| fermilnx-v11/tomcat12 | XC/HA | fermi-vmclust | TelemetryTrending |
| fermilnx-v12 | XC/HA | fermi-vmclust | xrootd redirector |
| fermilnx-v13/tomcat05 |  |  | Pipeline-II (Prod) |
| fermilnx-v14/centaurusa |  | fermi-vmclust | dataCatalog; This machine requires user login. This machine is used as a Fermi CVS server, and a subversion server for a variety of user groups. The svn functionality should move elsewhere. |
| fermilnx-v15/pipeline-mail01 | XC/HA | fermi-vmclust | Pipeline-II email server (james) |
| fermilnx-v16/tomcat06 | XC/HA | fermi-vmclust | rm2 |
| fermilnx-v17/tomcat09 | XC/HA | fermi-vmclust | Pipeline-II (Web) |
| fermilnx-v18/tomcat10 | XC/HA | fermi-vmclust | FCWebView, ISOCLogging, MPWebView, TelemetryMonitor, TelemetryTableWebUI |
| fermilnx-v19/tomcat04 |  | fermi-vmclust | elog |
| fermilnx7-v01 |  | fermi-vmclust | centos7; Docker installed - must be added to proper group. |
| fermilnx7-v02 |  | fermi-vmclust | centos7; Docker installed - must be added to proper group. |
| fermilnx7-v03 |  | fermi-vmclust | centos7; Docker installed - must be added to proper group. |
| fermi-ci-test01 |  | fermi-vmclust | centos7; Docker is installed. Runs a Jenkins worker. Jenkins can dispatch GlastRelease jobs to this node. |
| astore-new (HPSS) | NC | (non-Fermi server) | FastCopy data archive. **We have been granted a temporary quota increase of 1 TB on /nfs/farm/g/glast/u23, which has allowed this item to become "NC"** |
| trscron | HA | (non-Fermi server) | tokenized cron |
| lnxcron | HA | (non-Fermi server) | cron |
| (farm manager, etc.) | XC | (non-Fermi server) | LSF management |
|  | HA | yfs01/NN (non-Fermi) | basically all of AFS |
| JIRA | HA | (non-Fermi server) | issue tracking (HA as of 10/20/2017) |
|  | XC | rhel6-64 | public login nodes (a small number is needed for interactive access) |

...



† Equipment categories

| Category | Machine status |
| --- | --- |
| NC | non-critical for entire 16-day shutdown period |
| XC | experiment critical but not in H.A. rack, only a few, short outages acceptable |
| HA | high-availability (continuous operation) |
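
The category legend lends itself to a small lookup when planning which machines may be powered off. The following is only a sketch under stated assumptions: the names `Category`, `HOSTS`, `hosts_in` and `must_stay_up` are hypothetical, and `HOSTS` holds just a handful of physical servers copied from the Hosts and Services table, not the full inventory.

```python
from enum import Enum


class Category(Enum):
    """Equipment categories, worded as in the legend."""
    NC = "non-critical for entire 16-day shutdown period"
    XC = "experiment critical but not in H.A. rack"
    HA = "high-availability (continuous operation)"


# Illustrative subset only (assumption: a real list would cover every row
# of the Hosts and Services table).
HOSTS = {
    "fermi-oracle03": Category.HA,
    "fermi-oracle04": Category.XC,
    "fermilnx01": Category.HA,
    "fermilnx02": Category.HA,
    "astore-new (HPSS)": Category.NC,
}


def hosts_in(category):
    """Return the sorted names of hosts assigned to the given category."""
    return sorted(name for name, cat in HOSTS.items() if cat is category)


def must_stay_up(name):
    """True for XC and HA hosts, which cannot be off for the whole shutdown."""
    return HOSTS[name] is not Category.NC
```

For example, `hosts_in(Category.HA)` lists the hosts that need continuous operation, and `must_stay_up("astore-new (HPSS)")` is false because that item is categorized NC.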

...