On February 14, 2012, the Fermi Nagios notification system was migrated to the SCS Nagios system. Access privileges are required to use the commands in this document.
This documentation is intended to introduce someone new to Nagios to working within SCS Nagios. It is only a starting guide. It is not exhaustive, and it focuses on the Nagios features available to Nagios administrators for Fermi. It outlines the general procedure for completing Nagios-related tasks, such as acknowledging problems, scheduling downtime, and rescheduling checks.
Nagios has extensive documentation available online (we are using Nagios 3.2.3): Nagios Core Version 3.x Documentation, and Nagios Documentation (Library, Guides and Links).
...
Nagios is a network and system monitoring application. Essentially, Nagios watches specified hosts and services, then alerts you when something goes wrong. SLAC Nagios currently runs Nagios Core 3.2.3.
...
The Fermi Gamma-ray Space Telescope mission relies heavily on computers and network services for tasks such as Science Analysis Software development, ground processing and data handling, and data storage. The Fermi team depends on machines and services at SLAC, Goddard Space Flight Center, and other participating institutions, so team members need to know immediately whenever equipment, services, or hosts have problems. Nagios 3.x is the open-source host and service monitoring software we use to monitor these hosts and services and to alert team members about specified host and service events.
...
...
remctl (the client) and remctld (the server) implement a client/server protocol for running single commands on a remote host using Kerberos v5 authentication and returning the output. They use a very simple GSS-API-authenticated network protocol, combined with server-side ACL support and a server configuration file that maps remctl commands to programs that should be run when that command is called by an authorized user.
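Because remctl authenticates with Kerberos v5, a command fails if you do not hold a valid ticket. The following is a minimal sketch of a wrapper that checks for a ticket before calling remctl. It assumes MIT Kerberos, whose `klist -s` prints nothing and exits non-zero when no valid credentials cache is found; the function name remctl_with_ticket is hypothetical.

```shell
# Hypothetical helper: run remctl only if a Kerberos ticket is present.
# Assumes MIT Kerberos: 'klist -s' prints nothing and exits non-zero
# when there is no valid (unexpired) credentials cache.
remctl_with_ticket() {
    if ! klist -s; then
        echo "No valid Kerberos ticket found; run kinit first." >&2
        return 1
    fi
    # Pass every argument through unchanged to the remctl client.
    remctl "$@"
}
```

For example, `remctl_with_ticket -p 4373 nagios02 nagios help` behaves like the plain remctl call when you already have a ticket, and prints a reminder to run kinit when you do not.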
...
As a Nagios administrator for Fermi, you have three basic commands you can send to Nagios using remctl (the port is currently 4373).
The commands are ack, downtime, and schedule; help and man are also available, as shown in the examples.
Basic formats for host or service commands (fill in the uppercase words with the desired command, host, service, and comment; HOURS applies to downtime and MINUTES to schedule. You may shorten nagios02.slac.stanford.edu to nagios02) |
---|
remctl -p 4373 nagios02.slac.stanford.edu nagios COMMAND host HOSTNAME COMMENT |
remctl -p 4373 nagios02.slac.stanford.edu nagios COMMAND host HOSTNAME HOURS COMMENT |
remctl -p 4373 nagios02.slac.stanford.edu nagios COMMAND host HOSTNAME MINUTES COMMENT |
remctl -p 4373 nagios02.slac.stanford.edu nagios COMMAND service HOSTNAME SERVICENAME COMMENT |
remctl -p 4373 nagios02.slac.stanford.edu nagios COMMAND service HOSTNAME SERVICENAME HOURS COMMENT |
remctl -p 4373 nagios02.slac.stanford.edu nagios COMMAND service HOSTNAME SERVICENAME MINUTES COMMENT |
You can also issue the remctl command without the port number, since 4373 is the default remctl port |
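The formats above always share the same prefix, so they can be assembled by a small function. This is a sketch under assumptions: the name nagios_cmd and the variables NAGIOS_SERVER, NAGIOS_PORT, and DRY_RUN are hypothetical, and the function only checks that COMMAND is ack, downtime, or schedule and that the target is host or service. The server itself still validates everything else.

```shell
# Hypothetical helper that assembles the remctl arguments for the
# 'nagios' command in the formats listed above, then runs them.
# Set DRY_RUN=1 to print the command line instead of running it.
NAGIOS_SERVER="${NAGIOS_SERVER:-nagios02.slac.stanford.edu}"
NAGIOS_PORT="${NAGIOS_PORT:-4373}"

nagios_cmd() {
    if [ $# -lt 4 ]; then
        echo "usage: nagios_cmd COMMAND host|service HOSTNAME [SERVICENAME] [HOURS|MINUTES] COMMENT" >&2
        return 2
    fi
    action=$1
    target=$2
    case $action in
        ack|downtime|schedule) ;;
        *) echo "unknown command: $action" >&2; return 2 ;;
    esac
    case $target in
        host|service) ;;
        *) echo "target must be 'host' or 'service'" >&2; return 2 ;;
    esac
    shift 2
    # Remaining arguments: HOSTNAME [SERVICENAME] [HOURS|MINUTES] COMMENT
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo remctl -p "$NAGIOS_PORT" "$NAGIOS_SERVER" nagios "$action" "$target" "$@"
    else
        remctl -p "$NAGIOS_PORT" "$NAGIOS_SERVER" nagios "$action" "$target" "$@"
    fi
}
```

For example, `nagios_cmd downtime host wain007 5 'down for 5 hours'` runs the same command as the corresponding example. The dry-run output is for reading only, because echo drops the quotes around a multi-word comment.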
Examples |
---|
remctl -p 4373 nagios02 nagios ack host wain007 'there is a problem, we are working on it' |
remctl -p 4373 nagios02 nagios ack service wain007 xroot-wain007 'ack - RT 9999' |
remctl -p 4373 nagios02.slac.stanford.edu nagios downtime host wain007 5 'down for 5 hours' |
remctl -p 4373 nagios02.slac.stanford.edu nagios downtime service sulky46 u02-diskspace 5 'service down for 5 hrs' |
remctl -p 4373 nagios02.slac.stanford.edu nagios schedule host sulky46 5 'run checks in 5 mins' |
remctl -p 4373 nagios02.slac.stanford.edu nagios schedule service sulky46 u02-diskspace 5 'check disk u02 in 5 mins' |
remctl -p 4373 nagios02 nagios help |
remctl -p 4373 nagios02 nagios man |
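When several hosts go down together, for example during a maintenance window, the downtime example can be repeated in a loop. This is a minimal sketch; the function name downtime_hosts is hypothetical, and it simply issues one "downtime host" command per host, continuing past failures and reporting them in its exit status.

```shell
# Hypothetical helper: schedule the same downtime for several hosts,
# using the 'downtime host HOSTNAME HOURS COMMENT' form shown above.
# Usage: downtime_hosts HOURS COMMENT HOST [HOST ...]
downtime_hosts() {
    if [ $# -lt 3 ]; then
        echo "usage: downtime_hosts HOURS COMMENT HOST [HOST ...]" >&2
        return 2
    fi
    hours=$1
    comment=$2
    shift 2
    rc=0
    for h in "$@"; do
        # Keep going if one host fails, but remember the failure.
        remctl -p 4373 nagios02.slac.stanford.edu nagios downtime host "$h" "$hours" "$comment" || rc=1
    done
    return $rc
}
```

For example, `downtime_hosts 5 'down for 5 hours' sulky46 wain007` is equivalent to running the host downtime example once for each of the two hosts.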
...
All the configuration files for Scientific Computing reside under /etc/nagios/sca/ on nagios02.slac.stanford.edu. Within this directory are a number of subdirectories on a per-monitoring-type basis. These are:
Each of these directories contains at least two files: host.cfg (listing hosts and host groups) and service.cfg (listing services and service groups). Inheritance is used heavily to avoid repeating the same service checks or host directives for each individual host. Every host file first inherits a generic template of basic services (e.g., ping and ssh), and the specific service directives are then added on top of it. The service definitions can be found under /etc/nagios/conf.d/sca/fermi/www
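In Nagios object configuration, a definition inherits from a template through its use directive. To see which template each definition inherits, you can search the configuration tree for those directives. This is a sketch assuming GNU or BSD grep (for -H); the function name show_templates is hypothetical, and the default directory is the /etc/nagios/sca/ path given above.

```shell
# Hypothetical helper: print the 'use' (template) directive of every
# definition in each host.cfg and service.cfg under a directory.
# Output lines look like FILE:LINE:CONTENT.
show_templates() {
    dir=${1:-/etc/nagios/sca}
    find "$dir" \( -name host.cfg -o -name service.cfg \) -type f \
        -exec grep -Hn '^[[:space:]]*use[[:space:]]' {} +
}
```

Running `show_templates` with no argument searches /etc/nagios/sca; pass another directory, such as /etc/nagios/conf.d/sca/fermi/www, to search there instead.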
To modify the sidebar, you must check out the code from SVN. The command is:
svn checkout file://localhost/afs/slac.stanford.edu/g/scs/svn/systems/nagios-fermi-web
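After the first checkout, you only need to update the existing working copy. This is a minimal sketch that does whichever is appropriate; the function name sidebar_checkout and the default working-copy directory nagios-fermi-web are assumptions, while the repository URL is the one in the command above.

```shell
# Hypothetical helper: check out the sidebar code the first time and
# update the existing working copy on later runs.
SIDEBAR_REPO=file://localhost/afs/slac.stanford.edu/g/scs/svn/systems/nagios-fermi-web

sidebar_checkout() {
    workdir=${1:-nagios-fermi-web}
    if [ -d "$workdir/.svn" ]; then
        # Existing working copy: pull in changes from the repository.
        svn update "$workdir"
    else
        svn checkout "$SIDEBAR_REPO" "$workdir"
    fi
}
```

After editing, running `svn status` and `svn diff` inside the working copy shows what you have changed.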
The sidebar is a temporary solution that frees SCS to work on GroundWork.