History

This page is a partial rewrite of the following pages, intended to make it easier to understand which hosts are running and where they run.

December 2019 Central Computing Outage (Fermi)

Server and applications migration from glastlnx to fermilnx boxes

Notes on VMWare Cluster

The VMWare cluster, referred to as fermi-vmclust, consists of 4 hypervisors. Two of the hypervisors are HA, and two are not. A VM may migrate between hypervisors on demand: the machines have 128 GB of memory, and two hypervisors are enough to run all the VMs without oversubscribing memory.

The VMWare cluster is managed by the OCIO Platforms team, which can migrate VMs on demand.

fermi-vmclust01 and fermi-vmclust02 are not HA.

fermi-vmclust03 and fermi-vmclust04 are HA.

Note: OCIO should be contacted about setting up occasional VM snapshots, so that a VM may always be brought up even if a hypervisor dies.
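
The capacity claim above (two hypervisors are enough to hold every VM without oversubscribing memory) can be checked with a short script. The following is a minimal sketch, assuming the 128 GB figure is per hypervisor. The VM names are taken from this page, but the per-VM memory sizes are invented placeholders and would need to be replaced with real values from the OCIO Platforms team. It uses a first-fit-decreasing heuristic, so a `None` result means this heuristic found no placement, not that none exists.

```python
from typing import Dict, Optional

HYPERVISOR_MEMORY_GB = 128  # assumption: the 128 GB quoted above is per hypervisor


def place_vms(vm_memory_gb: Dict[str, int],
              hypervisor_count: int,
              capacity_gb: int = HYPERVISOR_MEMORY_GB) -> Optional[Dict[str, int]]:
    """Assign each VM to a hypervisor index without exceeding memory.

    First-fit decreasing: the largest VMs are placed first, each on the first
    hypervisor with enough free memory. Returns None if some VM does not fit.
    """
    free = [capacity_gb] * hypervisor_count
    placement: Dict[str, int] = {}
    ordered = sorted(vm_memory_gb.items(), key=lambda kv: kv[1], reverse=True)
    for name, mem in ordered:
        for index, available in enumerate(free):
            if mem <= available:
                free[index] -= mem
                placement[name] = index
                break
        else:
            return None  # no hypervisor has room for this VM
    return placement


# Hypothetical memory sizes in GB; NOT taken from the real cluster.
example_vms = {"fermilnx01": 32, "fermilnx02": 32,
               "fermilnx-v02": 16, "fermilnx-v03": 16}

if __name__ == "__main__":
    print(place_vms(example_vms, hypervisor_count=2))
```

Running the same check with `hypervisor_count=1` is a quick way to see how much headroom is left if one hypervisor of a pair dies.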

...

Hosts and Services

| VM or Service | Category† | Physical Server(s) | OS | Function |
| --- | --- | --- | --- | --- |
| xrootd | XC | fermi-gpfs01, fermi-gpfs02, fermi-gpfs05, fermi-gpfs06, fermi-gpfs07, fermi-gpfs08 | | xrootd server and storage |
| GPFS | XC | fermi-gpfs03, fermi-gpfs04 | | Fermi NFS/GPFS storage |
| GPFS/NFS bridge | XC | fermi-cnfs01, fermi-cnfs02 | | Fermi NFS storage access |
| NFS (ISOC) | HA | staas-gpfs50, staas-gpfs51 | | Critical ISOC NFS storage |
| Oracle | HA | fermi-oracle03 | | Oracle (primary) |
| Oracle | XC | fermi-oracle04 | | Oracle (failover) |
| Oracle | | sca-oracle02 | | Oracle (dev server) |
| mysql-node03 | HA | mysql05, mysql06 | | calibration, etc. DB |
| fermilnx01 | HA | fermi-vmclust | | LAT config, fastcopy and real-time telemetry |
| fermilnx02 | HA | fermi-vmclust | | LAT config, fastcopy and real-time telemetry |
| fermilnx-v02 | XC/HA | fermi-vmclust [1] | | xrootd redirector |
| fermilnx-v03 | XC/HA | fermi-vmclust | | archiver |
| fermilnx-v04 | | | | DataCatalog Crawler (Prod) |
| fermilnx-v05/tomcat08 | XC/HA | fermi-vmclust | | DataCatalog Web |
| fermilnx-v06 | XC | fermi-vmclust | | Xroot proxy server |
| fermilnx-v07/tomcat01 | XC/HA | fermi-vmclust | | Commons, Group manager |
| fermilnx-v08/tomcat02, glast-jobcontrol01 | | fermi-vmclust | | LSF Job Control Daemons (notably glast, glastraw). Note: No fermilnx-v09 |
| fermilnx-v10/tomcat11 | XC/HA | fermi-vmclust | | DataProcessing |
| fermilnx-v11/tomcat12 | XC/HA | fermi-vmclust | | TelemetryTrending |
| fermilnx-v12 | XC/HA | fermi-vmclust | | xrootd redirector |
| fermilnx-v13/tomcat05 | | | | Pipeline-II (Prod) |
| fermilnx-v14/centaurusa | | fermi-vmclust | | This machine requires user login. It is used as a Fermi CVS server and as a subversion server for a variety of user groups. svn functionality should move elsewhere. |
| fermilnx-v15/pipeline-mail01 | XC/HA | fermi-vmclust | | Pipeline-II email server (james) |
| fermilnx-v16/tomcat06 | XC/HA | fermi-vmclust | | rm2 |
| fermilnx-v17/tomcat09 | XC/HA | fermi-vmclust | | Pipeline-II (Web) |
| fermilnx-v18/tomcat10 | XC/HA | fermi-vmclust | | FCWebView, ISOCLogging, MPWebView, TelemetryMonitor, TelemetryTableWebUI |
| fermilnx-v19/tomcat04 | | fermi-vmclust | | elog |
| batch hosts for LISOC | XC | 400 cores (50 "hequ" equivalents) | | queues={express,short,medium,long,glastdataq}; users={glast,lsstsim,lsstprod,glastmc,glastraw} |
| batch hosts for Science Pipelines | XC | 200 cores (25 "hequ" equivalents) | | |

| fermilnx7-v01 | | fermi-vmclust | centos7 | Docker installed - must be added to proper group. |
| fermilnx7-v02 | | fermi-vmclust | centos7 | Docker installed - must be added to proper group. |
| fermilnx7-v03 | | fermi-vmclust | centos7 | Docker installed - must be added to proper group. |
| fermilnx-v22 | | fermi-vmclust | centos7 | SDF pipeline daemon |
| fermi-ci-test01 | | fermi-vmclust | centos7 | Docker is installed. Runs a Jenkins worker. Jenkins can dispatch GlastRelease jobs to this node. Note: This host should likely be renamed and/or removed. If so, the Jenkins worker should run on a fermilnx7 host. |
| sca-resty01 | | fermi-vmclust | centos7 | Shared nginx server. See also: Server Locations and Functions; SCA NGINX Configuration; https://github.com/slaclab/sca-resty (requires authorization) |
| sca-nginx01/02 | | fermilnx12/13-vmm | | Shared nginx server. This passes ALL traffic through to sca-resty01. See also: SCA NGINX Configuration |
| astore-new (HPSS) | NC | (non-Fermi server) | | FastCopy data archive. **We have been granted a temporary quota increase of 1 TB on /nfs/farm/g/glast/u23, which has allowed this item to become "NC"** |
| trscron | HA | (non-Fermi server) | | tokenized cron |
| lnxcron | HA | (non-Fermi server) | | cron |
| (farm manager, etc.) | XC | (non-Fermi server) | | LSF management |
| AFS | HA | yfs01/NN (non-Fermi) | | basically all of AFS |
| JIRA | HA | (non-Fermi server) | | issue tracking (HA as of 10/20/2017) |
| | XC | rhel6-64 | | public login nodes (a small number is needed for interactive access) |

[1] fermi-vmclust is the VMWare cluster.



† Equipment categories

| Category | Machine status |
| --- | --- |
| NC | non-critical for entire 16-day shutdown period |
| XC | experiment critical but not in H.A. rack; only a few short outages acceptable |
| HA | high-availability (continuous operation) |
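
When the host table is processed by a script (for example, to list what may be powered off during a shutdown), the category codes above can be modeled as an enumeration. This is only an illustrative sketch: the function names `parse_categories` and `tolerates_long_outage` are hypothetical, and treating an empty category cell as "do not power off" is an assumption, not a policy stated on this page.

```python
from enum import Enum
from typing import List


class Category(Enum):
    """Equipment categories, with descriptions copied from the table above."""
    NC = "non-critical for entire 16-day shutdown period"
    XC = "experiment critical but not in H.A. rack; only a few short outages acceptable"
    HA = "high-availability (continuous operation)"


def parse_categories(cell: str) -> List[Category]:
    """Split a table cell such as 'XC/HA' into Category members."""
    return [Category[part.strip()] for part in cell.split("/") if part.strip()]


def tolerates_long_outage(cell: str) -> bool:
    """True only if every listed category is NC.

    Assumption: a host with no category listed is treated conservatively,
    so an empty cell returns False.
    """
    categories = parse_categories(cell)
    return bool(categories) and all(c is Category.NC for c in categories)


if __name__ == "__main__":
    for cell in ("NC", "XC/HA", "HA", ""):
        print(repr(cell), tolerates_long_outage(cell))
```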


Table of LISOC Tasks and Services

| Function/Service | Sub-Functions | Needed Servers | Needed Databases | Needed File Systems | Other Needs | Needed During Shutdown? | Available During Shutdown? |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Mission Planning, LAT Configurations | FastCopy | fermilnx01 and fermilnx02 | TCDB | AFS | Fermi LAT Portal: Timeline Webview; Confluence, JIRA, Mission Planning s/w, FastCopy Monitoring; Sharepoint (reference for PROCs and Narrative Procedures for commanding in case of anomalies) | yes | |
| Real Time Telemetry Monitoring | | fermilnx01 and fermilnx02 | | | spread; Fermi LAT Portal: Real Time Telemetry, Telemetry Monitor | during anomalies | |
| Logging | | fermilnx01 and fermilnx02 | TCDB | | Fermi LAT Portal: Log Watcher | yes | |
| Trending | | | TCDB | | Fermi LAT Portal: Telemetry Trending | yes | |
| L0 File Ingest and Archive | FastCopy | | L0 Archive | | | yes | |
| Data Gap Checking and Reporting | FastCopy | fermilnx01 and fermilnx02 | L0 Archive | | | yes, continuously | |
| L1 processing | pipeline | SLAC Farm | Data Catalog | | Fermi LAT Portal: Pipeline, Data Processing | yes | |
| L1 Data Quality Monitoring | | | | | Fermi LAT Portal, Telemetry Trending | | |
| L1 delivery | FastCopy | fermilnx01 and fermilnx02 | Data Catalog | | | yes | |
| L2 processing (ASP) and Delivery | FastCopy | fermilnx01 and fermilnx02 | Data Catalog | | Fermi LAT Portal: Pipeline, Data Processing | daily, weekly | |
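
The task table above is easiest to use for outage planning when it can be queried by server. The sketch below transcribes only a few of its rows (function, needed servers, and the "Needed During Shutdown?" value) into a plain list; the structure and the helper name `tasks_depending_on` are hypothetical and shown only to illustrate the lookup.

```python
from typing import Dict, List

# A few rows transcribed from the table above (not the full table).
TASKS: List[Dict] = [
    {"function": "Mission Planning, LAT Configurations",
     "servers": ["fermilnx01", "fermilnx02"], "needed_during_shutdown": "yes"},
    {"function": "Real Time Telemetry Monitoring",
     "servers": ["fermilnx01", "fermilnx02"], "needed_during_shutdown": "during anomalies"},
    {"function": "Logging",
     "servers": ["fermilnx01", "fermilnx02"], "needed_during_shutdown": "yes"},
    {"function": "Trending",
     "servers": [], "needed_during_shutdown": "yes"},
    {"function": "L1 processing",
     "servers": ["SLAC Farm"], "needed_during_shutdown": "yes"},
]


def tasks_depending_on(server: str, tasks: List[Dict] = TASKS) -> List[str]:
    """Return the functions that list `server` among their needed servers."""
    return [task["function"] for task in tasks if server in task["servers"]]


if __name__ == "__main__":
    # Which LISOC functions are affected if fermilnx01 is unavailable?
    for name in tasks_depending_on("fermilnx01"):
        print(name)
```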