...

| Function/Service | Sub-Functions | Needed Servers | Needed Databases | Needed File Systems | Other Needs | Needed During Shutdown? | Available During Shutdown? |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Mission Planning, LAT Configurations | FastCopy | fermilnx01 and fermilnx02 | TCDB | AFS | Fermi LAT Portal: Timeline Webview; Confluence, JIRA, Mission Planning s/w, FastCopy Monitoring; Sharepoint (reference for PROCs and Narrative Procedures for commanding in case of anomalies) | yes | |
| Real Time Telemetry Monitoring | | fermilnx01 and fermilnx02 | | | spread; Fermi LAT Portal: Real Time Telemetry, Telemetry Monitor | during anomalies | |
| Logging | | fermilnx01 and fermilnx02 | TCDB | | Fermi LAT Portal: Log Watcher | yes | |
| Trending | | | TCDB | | Fermi LAT Portal: Telemetry Trending | yes | |
| L0 File Ingest and Archive | FastCopy | | L0 Archive | | | yes | |
| Data Gap Checking and Reporting | FastCopy | fermilnx01 and fermilnx02 | L0 Archive | | | yes, continuously | |
| L1 processing | pipeline | SLAC Farm | Data Catalog | | Fermi LAT Portal: Pipeline, Data Processing | yes | |
| L1 Data Quality Monitoring | | | | | Fermi LAT Portal, Telemetry Trending | | |
| L1 delivery | FastCopy | fermilnx01 and fermilnx02 | Data Catalog | | | yes | |
| L2 processing (ASP) and Delivery | FastCopy | fermilnx01 and fermilnx02 | Data Catalog | | Fermi LAT Portal: Pipeline, Data Processing | daily, weekly | |
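As a rough aid for reading the table above, the sketch below records a few rows as a dependency map and answers "which functions are affected if this resource goes down?". The names `NEEDS` and `functions_affected_by` are hypothetical, only a subset of rows is transcribed, and the column assignment of each value is an assumption based on the table layout.

```python
# Illustrative subset of the function/needs table.
# Column assignments are assumptions, not an authoritative inventory.
NEEDS = {
    "Logging": {"servers": {"fermilnx01", "fermilnx02"}, "databases": {"TCDB"}},
    "Trending": {"servers": set(), "databases": {"TCDB"}},
    "L1 processing": {"servers": {"SLAC Farm"}, "databases": {"Data Catalog"}},
    "L1 delivery": {"servers": {"fermilnx01", "fermilnx02"}, "databases": {"Data Catalog"}},
}


def functions_affected_by(resource):
    """Return the sorted names of functions that list `resource`
    as a needed server or a needed database."""
    return sorted(
        name
        for name, needs in NEEDS.items()
        if resource in needs["servers"] | needs["databases"]
    )


if __name__ == "__main__":
    for resource in ("TCDB", "fermilnx01", "Data Catalog"):
        print(resource, "->", functions_affected_by(resource))
```

Extending `NEEDS` with the remaining rows (and with file systems or portal applications) would let the same lookup cover the whole table.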


The following table of servers must remain powered up and operational for Fermi Level 1

...

and critical Science Pipelines to function.

Fermi has requested that all VMs be relocated (at least temporarily) to the two H.A. hypervisor machines, so some of the tasks listed below are no longer relevant.

  •  Confirm current H.A. rack occupants (spreadsheet from Christian Pama).
    Old (2017) spreadsheet here
  •  Confirm the VM-master for a given VM (thanks Shirley!).  Use the 'node' command, e.g., $ node -whereis fermilnx-v12 (obsolete)
  •  Confirm the tomcat <-> service associations.  Table here.
  •  Confirm the tomcat-VM associations in this table. Use the 'node' command, e.g., $ node -whereis glast-tomcat01
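
The 'node' lookups in the checklist above could be batched with a small helper. This is a minimal sketch that only builds the command lines; it assumes the site-specific 'node' command accepts '-whereis <host>' exactly as in the examples, and it makes no assumption about the command's output format. The names `HOSTS`, `whereis_command`, and `render` are hypothetical.

```python
import shlex

# Hosts taken from the two examples in the checklist; extend as needed.
HOSTS = ["fermilnx-v12", "glast-tomcat01"]


def whereis_command(host):
    """Build the argument list for one lookup, mirroring
    '$ node -whereis <host>' from the checklist."""
    return ["node", "-whereis", host]


def render(hosts):
    """Return one shell-safe command line per host, for review or pasting."""
    return [shlex.join(whereis_command(h)) for h in hosts]


if __name__ == "__main__":
    for line in render(HOSTS):
        print(line)
```

Printing the commands rather than executing them keeps the sketch safe to run anywhere; on a machine where 'node' is installed, each argument list from `whereis_command` could be passed to `subprocess.run` instead.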

...

| Category† | server | VM/service | function |
| --- | --- | --- | --- |
| XC | fermi-gpfs01, fermi-gpfs02, fermi-gpfs05, fermi-gpfs06, fermi-gpfs07, fermi-gpfs08 | | xrootd server and storage |
| XC/HA | fermi-vmclust01/02/03/04 | fermilnx-v02 | xrootd redirector |
| XC/HA | fermi-vmclust01/02/03/04 | fermilnx-v12 | xrootd redirector |
| XC | fermi-gpfs03, fermi-gpfs04 | GPFS | Fermi NFS/GPFS storage |
| XC | fermi-cnfs01, fermi-cnfs02 | GPFS/NFS bridge | Fermi NFS storage access |
| HA | staas-gpfs50, staas-gpfs51 | | Critical ISOC NFS storage |
| HA | fermilnx01 | | LAT config, fastcopy and real-time telemetry |
| HA | fermilnx02 | | LAT config, fastcopy and real-time telemetry |
| XC/HA | fermi-vmclust01/02/03/04 | fermilnx-v03 | archiver |
| HA | fermi-oracle03 | | oracle primary |
| XC | fermi-oracle04 | | oracle secondary |
| HA | mysql05, mysql06 | mysql-node03 | calibration, etc. DB |
| XC | 400 cores (25 "hequ" equivalents) | | batch hosts for LISOC; queues={express,short,medium,long,glastdataq}; users={glast,lsstsim,lsstprod,glastmc,glastraw} |
| XC | 200 cores (12.5 "hequ" equivalents) | | batch hosts for Science Pipelines |
| XC/HA | fermi-vmclust01/02/03/04 | fermilnx-v07/tomcat01 | Commons, Group manager |
| XC/HA | fermi-vmclust01/02/03/04 | fermilnx-v16/tomcat06 | rm2 |
| XC/HA | fermi-vmclust01/02/03/04 | fermilnx-v05/tomcat08 | dataCatalog |
| XC/HA | fermi-vmclust01/02/03/04 | fermilnx-v17/tomcat09 | Pipeline-II |
| XC/HA | fermi-vmclust01/02/03/04 | fermilnx-v15/pipeline-mail01 | Pipeline-II email server |
| XC/HA | fermi-vmclust01/02/03/04 | fermilnx-v18/tomcat10 | FCWebView, ISOCLogging, MPWebView, TelemetryMonitor, TelemetryTableWebUI |
| XC/HA | fermi-vmclust01/02/03/04 | fermilnx-v10/tomcat11 | DataProcessing |
| XC/HA | fermi-vmclust01/02/03/04 | fermilnx-v11/tomcat12 | TelemetryTrending |
| NC | (non-Fermi server) | astore-new (HPSS) | FastCopy data archive. **We have arranged a temporary quota increase of 1 TB on /nfs/farm/g/glast/u23, which has allowed this item to become "NC"** |
| HA | (non-Fermi server) | trscron | tokenized cron |
| HA | (non-Fermi server) | lnxcron | cron |
| XC | (non-Fermi server) | (farm manager, etc.) | LSF management |
| HA | yfs01/NN (non-Fermi) | | basically all of AFS |
| HA | (non-Fermi server) | JIRA | issue tracking (HA as of 10/20/2017) |
    

...

| Category | Machine status |
| --- | --- |
| NC | non-critical for entire 16-day shutdown period |
| XC | experiment critical but not in H.A. rack, only a few, short outages acceptable |
| HA | high-availability (continuous operation) |
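
The category codes in this legend lend themselves to a simple filter over the server list. The sketch below is illustrative only: `SERVERS` holds a small sample of entries, the function name `hosts_in` is hypothetical, and treating a combined code such as "XC/HA" as carrying both codes is an assumption, since the page does not define it. It is not a computation of the official machine totals.

```python
# Sample (category, server) pairs; not the complete list.
SERVERS = [
    ("XC", "fermi-gpfs03"),
    ("XC", "fermi-gpfs04"),
    ("HA", "fermilnx01"),
    ("HA", "fermilnx02"),
    ("HA", "fermi-oracle03"),
    ("XC", "fermi-oracle04"),
    ("XC/HA", "fermi-vmclust01/02/03/04"),
    ("NC", "astore-new (HPSS)"),
]


def hosts_in(code, exact=False):
    """Return servers tagged with `code`, in list order.

    With exact=False, a combined tag such as 'XC/HA' matches both 'XC'
    and 'HA' (an assumption). With exact=True, only entries whose tag
    equals `code` are returned.
    """
    result = []
    for category, server in SERVERS:
        codes = category.split("/")
        if (exact and category == code) or (not exact and code in codes):
            result.append(server)
    return result


if __name__ == "__main__":
    print("XC only:", hosts_in("XC", exact=True))
    print("Any HA:", hosts_in("HA"))
```

Filtering with `exact=True` on "XC" gives the entries that are experiment critical but sit outside the H.A. rack, which is the group most relevant when planning emergency power.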

 

Total non-HA machines to receive emergency power:



The services for L1:

oracle

...