
The IEPM-BW system architecture is based on a MySQL database on each individual monitoring host, various daemons and servers, and a crontab.

There are three file 'systems' associated with an IEPM-BW monitoring system: the source code, the report storage, and the IEPM database, which is a MySQL database. Each monitoring host has all three. The target hosts have only the target-system source code.

The IEPM database stores the monitoring system's configuration and data. There are also two short configuration files which reside in /etc/: /etc/iepm.cnf, which provides information absolutely required by the system, and /etc/my.cnf, which is used by MySQL to define the IEPM database.

The MySQL database contains all the configuration information for the probes that are to be made (node information, probe types, probe options, probe timing, etc.) and the specifications for the various analyses and graphs that are to be generated. It also contains tables where the probe data is stored and some tables to save the results of the various analyses.

The IEPM data base has the following tables: 

CONFIGURATION TABLES

  TABLE NAME  UNIQUE INDEX  CONTAINS
  ----------  ------------  -----------------------------------------------------------
  NODES       nodeID        Details on all target and monitoring host nodes
  MONHOSTS    monhostID     Additional details for the monitoring hosts
  NODESPECS   nodespecsID   Architecture information for each target and monitoring host
  PLOTSPECS   plotspecID    Specifications for various types of plots
  SCHEDULE    scheduleID    An entry for each scheduled probe run
  TOOLSPECS   toolspecsID   Specifications for each probe for each target host

DATA STORAGE TABLES

  TABLE NAME  UNIQUE INDEX  CONTAINS
  ----------  ------------  -------------------------------------
  ASN         asnID         IP address ASN information
  ASNINFO     asninfoID     ASN details
  BWDATA      bwdataID      Bandwidth probe results
  OWAMPDATA   owampdataID   Owamp probe results
  PINGDATA    pingdataID    Ping probe results
  ROUTEDATA   routedataID   Traceroute results
  ROUTENO     routenoID     Unique route number for traceroutes

ANALYSIS RESULTS TABLES

  TABLE NAME  UNIQUE INDEX  CONTAINS
  ----------  ------------  ------------------------------------------
  ALERTS      commentsID    Details on alerts detected by the analysis
  GRAPHS      graphID       Graphs from the analysis
  NEWALERTS   alertID       Details on alerts in another format
  COMMENTS    commentsID    Comment storage

OPTIONAL ANALYSIS TABLES (experimental, currently NOT used)

  TABLE NAME  UNIQUE INDEX  CONTAINS
  ----------  ------------  ---------------------------------
  HWALERTS    -             Holt-Winters alerts
  HWGRAPHS    -             Holt-Winters graphs
  REGDATA     regdataID     Regularized data for Holt-Winters

IEPM-BW Daemons, Servers, and Cleanup Scripts

The IEPM-BW system is controlled by various daemons which are started and restarted by cron jobs. In addition there are 'servers' and cleanup scripts, which are also started and run from the 'iepm' user crontab. Once a day, around midnight, the servers and daemons are shut down by 'kill-all-servers' and restarted by 'restart-all-servers'.

Servers

The servers run on the monitoring host to respond to probes from the other monitoring hosts. Each monitoring host has a customization directory called 'config' which must be customized for that host; very often its contents are the same across all monitoring hosts. Currently the file 'config/servers.alive' contains the following lines:

bw-iperf-server,bin,-s -p 5000 -w 20M
thrulayd,bin,
pathload_snd,bin, -q -i
map-updated,,
pathchirp_snd,bin,
owampd,bin, -c /afs/slac/package/netmon/bandwidth-tests/v3src/config -Z >& /tmp/owampd.log &

The first column is the name of the server, the second column is the directory it is in, and the third column is the list of parameters that it should be started with.
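As a hedged sketch (the real 'keep-servers-alive' script is not reproduced here), the three comma-separated fields could be parsed like this; the echo stands in for the check-and-restart step:

```shell
# Hypothetical sketch of parsing config/servers.alive (name,directory,arguments).
# The echo stands in for the real "check the process table and restart if
# missing" logic, e.g.: pgrep -x "$name" >/dev/null || "$dir/$name" $args &
n=0
while IFS=',' read -r name dir args; do
    [ -z "$name" ] && continue          # skip blank lines
    n=$((n + 1))
    echo "server=$name dir=${dir:-.} args=${args:-none}"
done <<'EOF'
bw-iperf-server,bin,-s -p 5000 -w 20M
thrulayd,bin,
EOF
```

Because the third field is everything after the second comma, embedded spaces in the argument list survive the parse.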

The servers are kept alive via the script 'keep-servers-alive'.

Daemons

The daemons control the scheduling of probes, the probing itself, and the loading of the probe results into the database. The daemons have a directory in the MySQL data directory, '/home/iepm/mysql/keepalives'. Each daemon touches its respective keep-alive file every time it cycles. The script 'keep-em-alive' periodically checks the time stamps on the daemon keep-alive files and restarts any daemons that are not running.

The current daemons are:

bw-synchd.alive
load-datad.alive
load-scheduled.alive
owpingd.alive
pathchirpd.alive
pingd.alive
traced.alive
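The staleness check behind 'keep-em-alive' can be sketched roughly as follows; the temporary directory stands in for /home/iepm/mysql/keepalives, and the 15-minute threshold is an assumed value (the real threshold is whatever 'keep-em-alive' uses):

```shell
# Rough sketch of a keep-em-alive style staleness check.  The temp directory
# stands in for /home/iepm/mysql/keepalives and 15 minutes is an assumed
# threshold; GNU 'touch -d' and 'find -mmin' are used for illustration.
KEEPDIR=$(mktemp -d)
touch "$KEEPDIR/pingd.alive"                        # fresh: daemon is cycling
touch -d '30 minutes ago' "$KEEPDIR/traced.alive"   # stale: daemon is wedged
stale=$(find "$KEEPDIR" -name '*.alive' -mmin +15)
for f in $stale; do
    echo "would restart: $(basename "$f" .alive)"
    # a real script would restart the daemon here, e.g. /home/iepm/v3src/traced &
done
```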

load-scheduled

'load-scheduled' periodically reads the TOOLSPECS table and looks for probes that are due to be run. Each probe has a 'lastrunepoch' field that is updated when that probe is run. 'load-scheduled' adds the 'lastrunepoch' and 'runinterval' fields together to tell whether it is time to run another instance of the probe. If it is, it adds the command information to the SCHEDULE table.
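The due-probe test described above is just epoch arithmetic; this sketch assumes the comparison takes that form (the field names come from the TOOLSPECS description, but 'is_due' itself is a hypothetical helper, not the real code):

```shell
# Sketch of the scheduling test: a probe is due when its last run epoch plus
# its run interval has passed.  'is_due' is a hypothetical helper; the
# lastrunepoch/runinterval names are from the TOOLSPECS description.
is_due() {    # is_due <lastrunepoch> <runinterval> <nowepoch>
    [ $(( $1 + $2 )) -le "$3" ]
}
now=1000000
is_due 996400 3600 "$now" && echo "due: schedule another run"
is_due 999000 3600 "$now" || echo "not yet: ran only 1000 s ago"
```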

load-datad

'load-datad' loops over all the 'load-test-data' scripts and calls them sequentially to load the data into the database. All the results from the probes go into files in '/home/iepm/mysql/data', named 'scheduleid.probetype'.

bw-synchd

'bw-synchd' is the daemon which runs all probes that MUST be run sequentially rather than concurrently. It reads all scheduled 'background-syn' probes and loops through them one at a time, running each probe command. The output from the probe is stored in the '/home/iepm/mysql/data' directory.
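A minimal sketch of the sequential pattern (the schedule IDs and probe commands are placeholders; the 'scheduleid.probetype' file naming follows the description above):

```shell
# Minimal sketch of bw-synchd-style sequential execution: run each scheduled
# probe one at a time, writing output to a file named scheduleid.probetype.
# DATADIR stands in for /home/iepm/mysql/data; the probe commands are fakes.
DATADIR=$(mktemp -d)
run_probe() {    # run_probe <scheduleid> <probetype> <command...>
    sid=$1; ptype=$2; shift 2
    "$@" > "$DATADIR/$sid.$ptype"    # no '&': wait for each probe to finish
}
run_probe 101 iperf   echo "fake iperf result"
run_probe 102 tlaytcp echo "fake tlaytcp result"
ls "$DATADIR"
```

Running each command in the foreground, with no '&', is what guarantees the probes never overlap.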

owpingd, pathchirpd, pingd, and traced

These daemons perform the probe type that matches their name. They all run low-bandwidth probes which can be executed concurrently (testtype 'background'). Each daemon fetches from the SCHEDULE table the scheduled probes it is responsible for and executes them. The data goes into the '/home/iepm/mysql/data' directory.

Cleanup Scripts

bw-cleanup

From experience I have found that various probes and programs hang for one reason or another. 'bw-cleanup' is called from the crontab to look at a list of known troublesome processes and, if they have been around longer than a specified time, kill them. One example here is 'gnuplot': gnuplot 3.7 had bugs that caused it to loop and hang. There was no way to work around these problems, so I created 'bw-cleanup' and its control file '/home/iepm/v3src/config/cleanup-list' to handle these circumstances.

Currently '/home/iepm/v3src/config/cleanup-list'  contains:

post-test,120
gnuplot,5
triganal,120
runperiod,120
pathload_rcv,3
pathchirp_rcv,3
/bin/ping,3
/bin/owping,3
/bin/tlaytcp,3
/bin/bw-iperf-client,3
whois,3
asn.pl,3

The first column is the unique name or character string which identifies the process. The second column is the number of minutes after the start of the process at which to kill it. Note that the 'iperf' server is named 'bw-iperf-server' so that one can easily tell the client apart from the server; the client ('bw-iperf-client') hangs, but the server does not. Note also '/bin/ping': ping can and will hang due to some network problems.
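A rough sketch of the kind of pass 'bw-cleanup' makes over this list (the real script is not reproduced; 'ps -o etimes' is a procps feature used here for illustration, and the echo stands in for the actual kill):

```shell
# Rough sketch of a bw-cleanup style pass over cleanup-list (name,minutes).
# 'etimes' (elapsed seconds since process start) comes from procps ps; the
# real bw-cleanup is not reproduced, and the echo stands in for the kill.
check_process() {    # check_process <name> <limit_minutes>
    limit=$(( $2 * 60 ))
    for pid in $(pgrep -f "$1"); do
        age=$(ps -o etimes= -p "$pid" | tr -d ' ')
        if [ -n "$age" ] && [ "$age" -gt "$limit" ]; then
            echo "would kill $1 (pid $pid, ${age}s old)"
        fi
    done
}
while IFS=',' read -r name minutes; do
    check_process "$name" "$minutes"
done <<'EOF'
gnuplot,5
whois,3
EOF
```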

The Crontab

Once the system has been fully configured and tested, it is time to start it up. To do this, simply load the crontab under the 'iepm' account. The following is the crontab template which is used for most systems.

#kill all servers to clean up any hung ones and to rotate the logs
5 0 * * * /home/iepm/v3src/kill-all-servers >> /home/iepm/mysql/logs/kill-all-servers 2>&1
# restart the servers
10 0 * * * /home/iepm/v3src/restart-all-servers >> /home/iepm/mysql/logs/restart-all-servers.today 2>&1
# copy and date the logs for the day - after shutting down servers and before restarting them
6 0 * * * /home/iepm/v3src/copylogs /home/iepm/mysql/logs > /tmp/iepmbw-logcopy 2>&1
#
#
# back up the data base
15 0 * * * /home/iepm/v3src/backup-iepm-mysql-database /home/iepm/public_html/mysql-backup >> /home/iepm/mysql/logs/backup-iepm-mysql-database.today 2>&1
#
# rotate and date the backups
0 3 * * * /home/iepm/v3src/copylogs /home/iepm/public_html/mysql-backup >> /home/iepm/mysql/logs/mysql-backup.today 2>&1
#
#
# run keepalive check - keeps the daemons alive by checking /home/iepm/mysql/keepalives
5,15,25,35,45,55 * * * * /home/iepm/v3src/keep-em-alive >> /home/iepm/mysql/logs/keep-em-alive.today 2>&1
#
# run keep server alive check
1,11,21,31,41,51 * * * * /home/iepm/v3src/keep-servers-alive >> /home/iepm/mysql/logs/keep-servers-alive.today 2>&1
#
# cleanup hung clients and other processes
3,13,23,33,43,53 * * * * /home/iepm/v3src/bw-cleanup >> /home/iepm/mysql/logs/bw-cleanup.today 2>&1
#
#
# run the analyses
23 1,3,5,7,9,11,13,15,17,19,21,23 * * * /home/iepm/v3src/post-test-processing-script -g 1 >> /home/iepm/mysql/logs/post-test-processing-script.today 2>&1
#
# run the overnight analysis
15 3 * * * /home/iepm/v3src/overnight-processing-script > /home/iepm/mysql/logs/overnight-processing-script.today 2>&1
#
# run the trace analysis
10 * * * * /home/iepm/v3src/traceanal/traceanal -d today -i 0 >> /home/iepm/mysql/logs/traceanal.today 2>&1
#
#Run the bandwidth change analysis code
5 0,2,4,6,8,10,12,14,16,18,20,22 * * * /home/iepm/v3src/alerts/analyze-for-alerts -t "iperf,pathchirp,pathload,miperf,tlaytcp" -p "iperf,pathchirp,thrumin,miperf,tlaytcp" >> /home/iepm/mysql/logs/analyze-for-alerts.today 2>&1
#
# run historical alerts web page
15 0,2,4,6,8,10,12,14,16,18,20,22 * * * /home/iepm/v3src/report-alerts >> /home/iepm/mysql/logs/report-alerts.today 2>&1
#

Note that each crontab entry has a log file that its results are written to. If you need to put in debug statements, make sure they write to STDERR so they end up in the log file. The log file for 'today' is always 'scriptname.today'. These are rotated daily at 00:06 by the 'copylogs' script. Note that 'copylogs' writes its own log to /tmp so it does not try to rotate itself.

Since everything is started up by the crontab, the system automatically restarts itself when the host reboots. You must make sure that the 'http' and 'MySQL' daemons are autostarted by the operating system upon reboot.
