The IEPM-BW system is controlled by various daemons, which are started and restarted by cron jobs. In addition, there are 'servers' and cleanup scripts, which are also started and run from the 'iepm' user's crontab. Once a day, around midnight, the servers and daemons are shut down by 'kill-all-servers' and restarted by 'restart-all-servers'.

Servers

The servers run on each monitoring host and respond to the probes sent from the other monitoring hosts. Each monitoring host has a customization directory called 'config' that must be tailored to that host, although in practice its contents are often identical across all monitoring hosts. The file 'config/servers.alive' currently contains the following lines:

bw-iperf-server,bin,-s -p 5000 -w 20M
thrulayd,bin,
pathload_snd,bin, -q -i
map-updated,,
pathchirp_snd,bin,
owampd,bin, -c /afs/slac/package/netmon/bandwidth-tests/v3src/config -Z >& /tmp/owampd.log &

Each line has three comma-separated fields: the name of the server, the directory it is in, and the parameters it should be started with. The directory and parameter fields may be empty.
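As an illustration of the file format, here is a minimal Python sketch that parses 'servers.alive' lines and builds a start command for each server. It assumes the directory field is relative to the installation root, and the names `ServerEntry`, `parse_servers_alive` and `start_command` are hypothetical, not part of the IEPM-BW scripts. Because the owampd entry contains csh-style redirection (`>& /tmp/owampd.log &`), the parameters are kept as one string to hand to a shell rather than split into separate arguments.

```python
from typing import Iterable, List, NamedTuple


class ServerEntry(NamedTuple):
    name: str       # server program name, e.g. "thrulayd"
    directory: str  # directory holding the program (assumed relative), may be ""
    params: str     # parameter string passed on the command line, may be ""


def parse_servers_alive(lines: Iterable[str]) -> List[ServerEntry]:
    """Parse 'name,directory,params' lines, skipping blank lines."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split on the first two commas only, so the parameter field stays whole;
        # pad with empty strings in case trailing fields are missing.
        name, directory, params = (line.split(",", 2) + ["", ""])[:3]
        entries.append(ServerEntry(name.strip(), directory.strip(), params.strip()))
    return entries


def start_command(entry: ServerEntry) -> str:
    """Build the shell command string used to start one server."""
    program = f"{entry.directory}/{entry.name}" if entry.directory else entry.name
    return f"{program} {entry.params}".rstrip()
```

For example, the line `pathload_snd,bin, -q -i` yields the command `bin/pathload_snd -q -i`, and `map-updated,,` yields just `map-updated`.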

The servers are kept alive via the script 'keep-servers-alive'.
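The source does not describe how 'keep-servers-alive' decides that a server has died. One plausible approach, sketched here purely as an assumption, is to look for each server name in the command lines of the running processes and restart any that are absent. `running_command_lines` and `missing_servers` are hypothetical helpers; the `ps -e -o args=` invocation uses standard POSIX `ps` options.

```python
import subprocess
from typing import Iterable, List


def running_command_lines() -> List[str]:
    """Return the argument string of every process (POSIX ps, no header line)."""
    result = subprocess.run(
        ["ps", "-e", "-o", "args="], capture_output=True, text=True, check=True
    )
    return result.stdout.splitlines()


def missing_servers(server_names: Iterable[str], command_lines: Iterable[str]) -> List[str]:
    """Return the servers whose name does not appear in any process command line."""
    commands = list(command_lines)
    return [name for name in server_names if not any(name in cmd for cmd in commands)]
```

A caller would then restart each name returned by `missing_servers(names, running_command_lines())` using its entry from 'servers.alive'.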

Daemons

The daemons control the scheduling of probes, the probing itself, and the loading of the probe results into the database. The daemons share a keep-alive directory, '/home/iepm/mysql/keepalives', in the MySQL data area. Each daemon touches its own keep-alive file every time it cycles through its loop. The script 'keep-em-alive' periodically checks the time stamps on the daemons' keep-alive files and restarts any daemon that is not running.
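The keep-alive mechanism can be sketched as follows: a daemon touches its file on each cycle, and a checker flags files whose modification time is too old. The maximum age is an assumed parameter (the source does not give one), and `touch_keepalive`, `keepalive_mtimes` and `stale_daemons` are hypothetical names rather than the actual 'keep-em-alive' code.

```python
import os
import pathlib
import time
from typing import Dict, List, Optional


def touch_keepalive(path: str) -> None:
    """Called by a daemon once per cycle: create the file or update its mtime."""
    pathlib.Path(path).touch(exist_ok=True)


def keepalive_mtimes(directory: str) -> Dict[str, float]:
    """Map each '*.alive' file in the keep-alive directory to its modification time."""
    return {
        entry.name: entry.stat().st_mtime
        for entry in os.scandir(directory)
        if entry.is_file() and entry.name.endswith(".alive")
    }


def stale_daemons(mtimes: Dict[str, float], max_age_s: float,
                  now: Optional[float] = None) -> List[str]:
    """Return keep-alive files not touched within max_age_s seconds (assumed threshold)."""
    if now is None:
        now = time.time()
    return sorted(name for name, mtime in mtimes.items() if now - mtime > max_age_s)
```

The checker would call `stale_daemons(keepalive_mtimes("/home/iepm/mysql/keepalives"), max_age_s=...)` and restart the daemon behind each file it returns.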

The current daemons, listed by the names of their keep-alive files, are:

bw-synchd.alive
load-datad.alive
load-scheduled.alive
owpingd.alive
pathchirpd.alive
pingd.alive
traced.alive

load-scheduled

'load-scheduled' periodically reads the TOOLSPECS table and looks for probes that are due to be run. Each probe has a 'lastrunepoch' field that is updated whenever that probe is run. 'load-scheduled' adds the last-run time to the 'runinterval' field to decide whether it is time to run another instance of the probe. If it is, it adds the command information to the SCHEDULE table.
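The scheduling rule amounts to a simple comparison: a probe is due once its last-run time plus its interval has been reached. This sketch assumes 'lastrunepoch' and 'runinterval' are both in seconds and models TOOLSPECS rows as dictionaries; the 'command' key and the function names are hypothetical, and no database access is shown.

```python
import time
from typing import Any, Dict, Iterable, List, Optional


def is_due(lastrunepoch: int, runinterval: int, now: int) -> bool:
    """A probe is due once runinterval seconds have passed since its last run."""
    return lastrunepoch + runinterval <= now


def due_probes(toolspecs: Iterable[Dict[str, Any]],
               now: Optional[int] = None) -> List[Dict[str, Any]]:
    """Return the TOOLSPECS rows whose next run time has arrived."""
    if now is None:
        now = int(time.time())
    return [
        row for row in toolspecs
        if is_due(row["lastrunepoch"], row["runinterval"], now)
    ]
```

Each returned row's command information would then be inserted into SCHEDULE. Note that 'lastrunepoch' itself is not changed here, since the text says it is updated when the probe actually runs.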

load-datad

'load-datad' loops over all the 'load-test-data' scripts and calls them sequentially to load the data into the database. All probe results are written to files in the '/home/iepm/mysql/data' directory, each named 'scheduleid.probetype'.
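A sketch of decoding the 'scheduleid.probetype' naming convention, assuming the schedule ID is a decimal integer and everything after the first dot is the probe type. Grouping the files by probe type is one way a loader could hand each batch to the matching 'load-test-data' script; `parse_result_name` and `group_by_probetype` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def parse_result_name(filename: str) -> Tuple[int, str]:
    """Split 'scheduleid.probetype' into (scheduleid, probetype)."""
    scheduleid, sep, probetype = filename.partition(".")
    if not sep or not scheduleid.isdigit() or not probetype:
        raise ValueError(f"not a scheduleid.probetype name: {filename!r}")
    return int(scheduleid), probetype


def group_by_probetype(filenames: Iterable[str]) -> Dict[str, List[int]]:
    """Collect schedule IDs per probe type, ignoring files that do not match."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for name in filenames:
        try:
            scheduleid, probetype = parse_result_name(name)
        except ValueError:
            continue  # skip unrelated files in the data directory
        groups[probetype].append(scheduleid)
    return {ptype: sorted(ids) for ptype, ids in groups.items()}
```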

bw-synchd

'bw-synchd' is the daemon that runs all probes which MUST be run sequentially rather than concurrently. It reads all scheduled 'background-syn' probes and loops through them one at a time, running each probe command. The output from each probe is stored in the '/home/iepm/mysql/data' directory.
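The key property of 'bw-synchd' is that each probe finishes before the next one starts. The sketch below shows that pattern with `subprocess.run`, writing each probe's output to a file named 'scheduleid.probetype' in the data directory. The job dictionaries and `run_sequentially` are hypothetical stand-ins for the SCHEDULE rows and the daemon's real loop.

```python
import os
import subprocess
import sys
import tempfile
from typing import Any, Dict, Iterable, List


def run_sequentially(jobs: Iterable[Dict[str, Any]], data_dir: str) -> List[str]:
    """Run each probe to completion, one after another; return the output paths."""
    written = []
    for job in jobs:
        path = os.path.join(data_dir, f"{job['scheduleid']}.{job['probetype']}")
        with open(path, "w") as out:
            # subprocess.run blocks until the probe exits, so probes never overlap.
            subprocess.run(job["argv"], stdout=out, stderr=subprocess.STDOUT, check=False)
        written.append(path)
    return written


if __name__ == "__main__":
    # Demo with a stand-in "probe" that just prints a line.
    demo_jobs = [{"scheduleid": 1, "probetype": "demo",
                  "argv": [sys.executable, "-c", "print('probe output')"]}]
    with tempfile.TemporaryDirectory() as tmp:
        for p in run_sequentially(demo_jobs, tmp):
            with open(p) as f:
                print(os.path.basename(p), f.read().strip())
```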

owpingd, pathchirpd, pingd, and traced

These daemons perform the probe type that matches their name. They are all low-bandwidth probes which can be run concurrently (testtype 'background'). Each daemon fetches from the SCHEDULE table the scheduled probes it is responsible for and executes them. The data goes into the '/home/iepm/mysql/data' directory.
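By contrast with 'bw-synchd', these daemons handle probes that may run concurrently. The sketch assumes that each daemon's name is its tool name plus a trailing 'd' (for example, 'pingd' handles 'ping') and that SCHEDULE rows carry 'testtype' and 'tool' fields; both are assumptions, as are the helper names. Running the selected jobs in a thread pool is one illustrative choice, not necessarily what the daemons do.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List


def jobs_for_daemon(schedule_rows: List[Dict[str, Any]], daemon_name: str) -> List[Dict[str, Any]]:
    """Select the 'background' SCHEDULE rows whose tool matches this daemon."""
    # Hypothetical naming rule: strip one trailing 'd' ("pingd" -> "ping").
    tool = daemon_name[:-1] if daemon_name.endswith("d") else daemon_name
    return [
        row for row in schedule_rows
        if row.get("testtype") == "background" and row.get("tool") == tool
    ]


def run_concurrently(jobs: List[Dict[str, Any]],
                     run_one: Callable[[Dict[str, Any]], Any],
                     max_workers: int = 4) -> List[Any]:
    """Run each job with run_one in a thread pool; results keep the jobs' order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_one, jobs))
```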

Cleanup Scripts

bw-cleanup

From experience I have found that various probes and programs hang for one reason or another. 'bw-cleanup' is called from the crontab to look for a list of known troublesome processes and to kill any of them that have been running longer than a specified time. One example is 'gnuplot': Gnuplot 3.7 had bugs that caused it to loop and hang. There was no way to work around these problems, so I created 'bw-cleanup' and its control file '/home/iepm/v3src/config/cleanup-list' to handle these circumstances.

Currently '/home/iepm/v3src/config/cleanup-list' contains:

post-test,120
gnuplot,5
triganal,120
runperiod,120
pathload_rcv,3
pathchirp_rcv,3
/bin/ping,3
/bin/owping,3
/bin/tlaytcp,3
/bin/bw-iperf-client,3
whois,3
asn.pl,3

The first column is the unique name or character string that identifies the process. The second column is the number of minutes after the start of the process at which to kill it. Note that the 'iperf' server is named 'bw-iperf-server' so that one can easily tell the server apart from the client ('bw-iperf-client'): the client hangs, but the server does not. Note also '/bin/ping': ping can and will hang due to some network problems.
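The control-file logic can be sketched as below: parse each 'name,minutes' line, then flag any process whose command line contains the name and whose elapsed time exceeds the limit. Substring matching and the `(pid, elapsed_minutes, cmdline)` process tuples are assumptions about how 'bw-cleanup' works, `parse_cleanup_list` and `processes_to_kill` are hypothetical names, and the sketch only returns PIDs rather than sending signals.

```python
from typing import Dict, Iterable, List, Tuple


def parse_cleanup_list(lines: Iterable[str]) -> Dict[str, int]:
    """Map each process pattern to its maximum lifetime in minutes."""
    limits: Dict[str, int] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # The minutes value is after the last comma.
        pattern, _, minutes = line.rpartition(",")
        limits[pattern] = int(minutes)
    return limits


def processes_to_kill(processes: Iterable[Tuple[int, float, str]],
                      limits: Dict[str, int]) -> List[int]:
    """Return PIDs of processes that match a pattern and have outlived its limit."""
    victims = []
    for pid, elapsed_minutes, cmdline in processes:
        for pattern, max_minutes in limits.items():
            if pattern in cmdline and elapsed_minutes > max_minutes:
                victims.append(pid)
                break  # one match is enough; do not list a PID twice
    return victims
```

With the 'gnuplot,5' entry, for instance, a gnuplot process that has been running for 6 minutes would be flagged, while one running for 4 minutes would be left alone.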
