The IEPM-BW system architecture is based on a MySQL database on each individual monitoring host, various daemons and servers, and a crontab.
There are three file 'systems' associated with an IEPM-BW monitoring system: the source code, the report storage, and the IEPM database, which is a MySQL database. Each monitoring host has all three; the target hosts have only the target system source code.
The IEPM database stores the monitoring system's configuration and its measurement data. There are also two short configuration files which reside in /etc: /etc/iepm.cnf, which provides information absolutely required by the system, and /etc/my.cnf, which is used by MySQL to define the IEPM database.
The MySQL database contains all the configuration information for the probes that are to be made (node information, probe types, probe options, probe timing, etc.) and the specifications for the various analyses and graphs that are to be generated. It also contains tables where the probe data is stored and some tables to save the results of the various analyses.
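Because all of this configuration lives in MySQL, any script on the monitoring host can reach it with an ordinary database connection. The following is a minimal sketch (Python with the MySQLdb module; the database name 'iepm' and the use of the /etc/my.cnf defaults file are assumptions for illustration, not taken from the IEPM-BW code) that connects and lists the tables described below:

import MySQLdb  # assumes the MySQLdb (mysqlclient) module is installed

# Connect using the defaults file described above; the database
# name 'iepm' is an assumption for illustration.
conn = MySQLdb.connect(read_default_file="/etc/my.cnf", db="iepm")
cur = conn.cursor()

# List the configuration, data storage, and analysis tables.
cur.execute("SHOW TABLES")
for (table,) in cur.fetchall():
    print(table)

conn.close()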
The IEPM database has the following tables:

Configuration tables:

TABLE NAME   UNIQUE INDEX    CONTAINS
NODES        nodeID          Details on all target and monitoring host nodes
MONHOSTS     monhostID       Additional details for the monitoring hosts
NODESPECS    nodespecsID     Architecture information for each target and monitoring host
PLOTSPECS    plotspecID      Specifications for various types of plots
SCHEDULE     scheduleID      An entry for each scheduled probe run
TOOLSPECS    toospecsID      Specifications for each probe for each target host
Data storage tables:

TABLE NAME   UNIQUE INDEX    CONTAINS
ASN          asnID           IP address ASN information
ASNINFO      asninfoID       ASN details
BWDATA       bwdataID        Bandwidth probe results
OWAMPDATA    owampdataID     Owamp probe results
PINGDATA     pingdataID      Ping probe results
ROUTEDATA    routedataID     Traceroute results
ROUTENO      routenoID       Unique route number for traceroutes
Analysis results tables:

TABLE NAME   UNIQUE INDEX    CONTAINS
ALERTS       commentsID      Details on alerts detected by the analysis
GRAPHS       graphID         Graphs from the analysis
NEWALERTS    alertID         Details on alerts in another format
COMMENTS     commentsID      Comment storage
Optional analysis tables (experimental, currently NOT used):

TABLE NAME   UNIQUE INDEX    CONTAINS
HWALERTS                     Holt-Winters alerts
HWGRAPHS                     Holt-Winters graphs
REGDATA      regdataID       Regularized data for Holt-Winters
IEPM-BW Daemons, Servers, and Cleanup Scripts
Servers
The servers are the programs run on each monitoring host to respond to probes from the other monitoring hosts. Each monitoring host has a customization directory called 'config' which must be customized for that host, although very often its contents are the same across all monitoring hosts. The server list lives in the file 'config/servers.alive', which currently contains the following lines:
bw-iperf-server,bin,-s -p 5000 -w 20M
thrulayd,bin,
pathload_snd,bin, -q -i
map-updated,,
pathchirp_snd,bin,
owampd,bin, -c /afs/slac/package/netmon/bandwidth-tests/v3src/config -Z >& /tmp/owampd.log &
The first column is the name of the server, the second column is the directory it is in, and the third column is the list of parameters that it should be started with.
The servers are kept alive via the script 'keep-servers-alive'.
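As a rough illustration of how the three-column format can be consumed, the following is a minimal sketch (hypothetical Python, not the actual 'keep-servers-alive' script; the path to servers.alive is an assumption) that parses each line into a (name, directory, parameters) entry:

# Minimal sketch: parse the name,directory,parameters lines of servers.alive.
servers_alive = "/home/iepm/v3src/config/servers.alive"  # assumed location

with open(servers_alive) as f:
    for line in f:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split into at most three fields so any commas inside the
        # parameter list are left alone.
        name, directory, params = (line.split(",", 2) + ["", ""])[:3]
        print("server %-20s dir %-5s params %s" % (name, directory, params))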
Daemons
The daemons control the scheduling of probes, the probing itself, and the loading of the probe results into the database. The daemons share a directory, '/home/iepm/mysql/keepalives', in the MySQL data area, and each touches its own keep-alive file every time it cycles through its work. The script 'keep-em-alive' periodically checks the time stamps on these keep-alive files and restarts any daemon that is no longer running.
The current daemons are:
bw-synchd.alive load-datad.alive load-scheduled.alive owpingd.alive pathchirpd.alive pingd.alive traced.alive
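A minimal sketch of the keep-alive check (hypothetical Python, not the actual 'keep-em-alive' script; the staleness threshold and the restart step are assumptions) might look like this:

import os, time

KEEPALIVE_DIR = "/home/iepm/mysql/keepalives"
MAX_AGE = 30 * 60   # assumed staleness threshold: 30 minutes

now = time.time()
for fname in os.listdir(KEEPALIVE_DIR):
    if not fname.endswith(".alive"):
        continue
    path = os.path.join(KEEPALIVE_DIR, fname)
    age = now - os.path.getmtime(path)   # seconds since the daemon last touched it
    if age > MAX_AGE:
        daemon = fname[:-len(".alive")]  # e.g. 'pingd.alive' -> 'pingd'
        print("daemon %s looks dead (last touch %d s ago), restarting" % (daemon, age))
        # the real script restarts the daemon here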
load-scheduled
'load-scheduled' periodically reads the TOOLSPECS table and looks for probes that are due to be run. Each probe has a 'lastrunepoch' field that is updated when that probe is run. 'load-scheduled' adds the 'lastrunepoch' and 'runinterval' fields together to tell whether it is time to run another instance of the probe; if it is, it adds the command information to the SCHEDULE table.
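In SQL terms the check amounts to lastrunepoch + runinterval <= now. A minimal sketch of that logic (Python with MySQLdb; the column names other than 'lastrunepoch' and 'runinterval', and the SCHEDULE insert, are assumptions for illustration, not the real schema) is:

import time
import MySQLdb

conn = MySQLdb.connect(read_default_file="/etc/my.cnf", db="iepm")  # db name assumed
cur = conn.cursor()
now = int(time.time())

# Find probes whose last run plus their run interval has already passed.
# 'toolspecsID' is an assumed column name for illustration.
cur.execute("SELECT toolspecsID FROM TOOLSPECS "
            "WHERE lastrunepoch + runinterval <= %s", (now,))

for (toolspec_id,) in cur.fetchall():
    # The real daemon copies the command information for this probe into
    # the SCHEDULE table; a placeholder insert is shown here.
    cur.execute("INSERT INTO SCHEDULE (toolspecsID, addedepoch) VALUES (%s, %s)",
                (toolspec_id, now))

conn.commit()
conn.close()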
load-datad
'load-datad' loops over all the 'load-test-data' scripts and calls them sequentially to load the data into the database. All the results from the probes go into files in the '/home/iepm/mysql/data' directory, named 'scheduleid.probetype'.
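A minimal sketch of how those result files can be walked and grouped by probe type (hypothetical Python; the actual loading is done by the 'load-test-data' scripts) is:

import os

DATA_DIR = "/home/iepm/mysql/data"

# Group the result files by probe type; each file is named scheduleid.probetype.
by_type = {}
for fname in os.listdir(DATA_DIR):
    if "." not in fname:
        continue
    scheduleid, probetype = fname.split(".", 1)
    by_type.setdefault(probetype, []).append(os.path.join(DATA_DIR, fname))

for probetype, files in by_type.items():
    # the real daemon hands each group to the matching load-test-data script
    print("would load %d %s result file(s)" % (len(files), probetype))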
bw-synchd
'bw-synchd' is the daemon which runs all probes that MUST be run sequentially and not concurrently. It reads all scheduled 'background-syn' probes and loops through them one at a time, running each probe command. The output from each probe is stored in the '/home/iepm/mysql/data' directory.
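The essential point is that each probe command finishes before the next one starts. A minimal sketch of that pattern (hypothetical Python; the command list and output naming are placeholders, not taken from bw-synchd) is:

import subprocess

DATA_DIR = "/home/iepm/mysql/data"

# Placeholder list of (scheduleid, probetype, command) entries; the real
# daemon fetches these from the SCHEDULE table.
scheduled = [
    ("12345", "iperf", ["bin/bw-iperf-client", "-p", "5000", "somehost.example.org"]),
]

for scheduleid, probetype, command in scheduled:
    outfile = "%s/%s.%s" % (DATA_DIR, scheduleid, probetype)
    with open(outfile, "w") as out:
        # run() blocks until the probe exits, so probes never overlap
        subprocess.run(command, stdout=out, stderr=subprocess.STDOUT)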
owpingd, pathchirpd, pingd, and traced
These daemons perform the probe type that matches their name. These are all low-bandwidth probes which can be run concurrently (testtype 'background'). Each daemon fetches from the SCHEDULE table the scheduled probes it is responsible for and executes them. The data goes into the '/home/iepm/mysql/data' directory.
Cleanup Scripts
bw-cleanup
From experience I have found that there are various probes and programs which hang for one reason or another. bw-cleanup is called from the crontab to look at a list of known troublesome processes, and if they are around longer than a specified time, it kills them. One example in here is 'gnuplot'. Gnuplot 3.7 had some bugs in it that caused it to loop and hang. There was no way to work around these problems, so I created 'bw-cleanup' and its control file '/home/iepm/v3src/config/cleanup-list' to handle these circumstances.
Currently '/home/iepm/v3src/config/cleanup-list' contains:
post-test,120
gnuplot,5
triganal,120
runperiod,120
pathload_rcv,3
pathchirp_rcv,3
/bin/ping,3
/bin/owping,3
/bin/tlaytcp,3
/bin/bw-iperf-client,3
whois,3
asn.pl,3
The first column is the unique name or character string which identifies the process. The second column is the number of minutes after the start of the process at which to kill it. Note that the 'iperf' server is named 'bw-iperf-server' so that the client can easily be told apart from the server; the client ('bw-iperf-client') hangs, but the server does not. Also note '/bin/ping': ping can and will hang due to some network problems.
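A minimal sketch of this kind of cleanup (hypothetical Python, not the actual bw-cleanup script; it relies on the Linux procps 'ps -eo pid,etimes,args' output to get each process's elapsed run time in seconds) is:

import os, signal, subprocess

CLEANUP_LIST = "/home/iepm/v3src/config/cleanup-list"

# Read the name,minutes pairs from the control file.
limits = []
with open(CLEANUP_LIST) as f:
    for line in f:
        line = line.strip()
        if line and "," in line:
            name, minutes = line.rsplit(",", 1)
            limits.append((name, int(minutes) * 60))

# List all processes with their elapsed run time in seconds.
ps = subprocess.run(["ps", "-eo", "pid,etimes,args"],
                    capture_output=True, text=True).stdout

for line in ps.splitlines()[1:]:
    pid, etimes, args = line.strip().split(None, 2)
    for name, max_seconds in limits:
        if name in args and int(etimes) > max_seconds:
            print("killing %s (pid %s), running for %s s" % (name, pid, etimes))
            os.kill(int(pid), signal.SIGKILL)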
The Crontab
Once the system has been fully configured and tested, it is time to start it up. To do this, simply load the crontab under the 'iepm' account. The following is the crontab template which is used for most systems.
#kill all servers to clean up any hung ones and to rotate the logs
5 0 * * * /home/iepm/v3src/kill-all-servers >> /home/iepm/mysql/logs/kill-all-servers 2>&1
# restart the servers
10 0 * * * /home/iepm/v3src/restart-all-servers >> /home/iepm/mysql/logs/restart-all-servers.today 2>&1
# copy and date the logs for the day - after shutting down servers and before restarting them
6 0 * * * /home/iepm/v3src/copylogs /home/iepm/mysql/logs > /tmp/iepmbw-logcopy 2>&1
#
#
# back up the data base
15 0 * * * /home/iepm/v3src/backup-iepm-mysql-database /home/iepm/public_html/mysql-backup >> /home/iepm/mysql/logs/backup-iepm-mysql-database.today 2>&1
#
# rotate and date the backups
0 3 * * * /home/iepm/v3src/copylogs /home/iepm/public_html/mysql-backup >> /home/iepm/mysql/logs/mysql-backup.today 2>&1
#
#
# run keepalive check - keeps the daemons alive by checking /home/iepm/mysql/keepalives
5,15,25,35,45,55 * * * * /home/iepm/v3src/keep-em-alive >> /home/iepm/mysql/logs/keep-em-alive.today 2>&1
#
# run keep server alive check
1,11,21,31,41,51 * * * * /home/iepm/v3src/keep-servers-alive >> /home/iepm/mysql/logs/keep-servers-alive.today 2>&1
#
# cleanup hung clients and other processes
3,13,23,33,43,53 * * * * /home/iepm/v3src/bw-cleanup >> /home/iepm/mysql/logs/bw-cleanup.today 2>&1
#
#
# run the analyses
23 1,3,5,7,9,11,13,15,17,19,21,23 * * * /home/iepm/v3src/post-test-processing-script -g 1 >> /home/iepm/mysql/logs/post-test-processing-script.today 2>&1
#
# run the overnight analysis
15 3 * * * /home/iepm/v3src/overnight-processing-script > /home/iepm/mysql/logs/overnight-processing-script.today 2>&1
#
# run the trace analysis
10 * * * * /home/iepm/v3src/traceanal/traceanal -d today -i 0 >> /home/iepm/mysql/logs/traceanal.today 2>&1
#
#Run the bandwidth change analysis code
5 0,2,4,6,8,10,12,14,16,18,20,22 * * * /home/iepm/v3src/alerts/analyze-for-alerts -t "iperf,pathchirp,pathload,miperf,tlaytcp" -p "iperf,pathchirp,thrumin,miperf,tlaytcp" >> /home/iepm/mysql/logs/analyze-for-alerts.today 2>&1
#
# run historical alerts web page
15 0,2,4,6,8,10,12,14,16,18,20,22 * * * /home/iepm/v3src/report-alerts >> /home/iepm/mysql/logs/report-alerts.today 2>&1
#
Note that each crontab entry has a log file that its results are written to. If you need to put in debug statements, make sure they write to STDERR and they will end up in the log file. The log file for 'today' is always 'scriptname.today'. These are rotated daily at 00:06 by the 'copylogs' script. Note that the output of 'copylogs' itself is written to /tmp so that it does not try to rotate its own log.
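Since each crontab entry redirects both stdout and stderr ('2>&1') into its log, a debug print to STDERR from any of the scripts ends up in the corresponding 'scriptname.today' file. A short illustration (shown in Python; the IEPM-BW scripts themselves are not necessarily Python) is:

import sys

# Anything written to stderr is captured by the crontab's '2>&1' redirection
# and lands in the script's scriptname.today log file.
print("DEBUG: about to schedule probes", file=sys.stderr)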