
After many tests we concluded that the PlanetLab landmarks, and to a lesser extent the PingER landmarks, are unpredictable in terms of availability. If a landmark is unavailable, the TULIP thread querying it has to time out, which can dramatically extend the measurement duration. Ideally, ping requests should not be sent to such landmarks, i.e. they need to be removed from the list of active landmarks. This page describes how we accomplish this.

For every access by TULIP, the response from each landmark is logged. We analyze these logs and generate a list of nodes with their success and failure percentages and the reasons for their failures. These percentages are generated by /afs/slac/package/pinger/tulip/tulip-log-analyze.pl and the results can be seen here. The idea is that landmarks with a low success rate should be removed from the active list of landmarks. This is done by disabling the entry in the tulip database.
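
The per-landmark percentages described above can be illustrated with a minimal Python sketch. It is not the logic of tulip-log-analyze.pl: the record layout (landmark, success flag, failure reason), the host names and the failure reasons are all assumptions made for illustration, since the TULIP log format is not described here.

```python
from collections import Counter, defaultdict

def summarize(records):
    """Return {landmark: (success_percent, Counter of failure reasons)}."""
    totals = Counter()
    successes = Counter()
    reasons = defaultdict(Counter)
    for landmark, ok, reason in records:
        totals[landmark] += 1
        if ok:
            successes[landmark] += 1
        else:
            reasons[landmark][reason] += 1
    return {
        lm: (100.0 * successes[lm] / totals[lm], reasons[lm])
        for lm in totals
    }

# Hypothetical records: (landmark, succeeded, failure reason)
sample = [
    ("landmark-a.example.org", True, None),
    ("landmark-a.example.org", False, "timeout"),
    ("landmark-b.example.org", False, "timeout"),
    ("landmark-b.example.org", False, "connection refused"),
]
summary = summarize(sample)
for lm, (pct, why) in sorted(summary.items()):
    print(f"{lm}: {pct:.0f}% success, failures: {dict(why)}")
```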

There are two ways to disable landmarks that are not responding:

1) Manual Process

2) Automated Process (updated by a trscrontab job running on pinger@pinger.slac.stanford.edu)

 Manually disabling hosts

This is accomplished via the tulip database. In the tulip database, the table landmarks has a field "enabled" that determines whether a landmark is added to sites.xml (the list of active landmark sites). This XML file is later used by the reflector to query the active landmarks for results. sites.xml is generated by a trscron job, so if we set enabled to '0' for a landmark, it no longer appears in sites.xml after the next run of that job. We can also update sites.xml manually. The process is described below.

  • Log in to the tulip database (username and password available in escrow -c iepm iepmacct)
  • Switch to the tulip database with the command
 mysql> use tulip;
  • Now update the value of enabled using the following SQL command. In this example we use ipv4Addr = 141.22.213.35. ipv4Addr is generally the primary key, but hostName can also be used as an identifier to disable landmarks
 update landmarks set enabled = '0' where ipv4Addr = '141.22.213.35';
  • Regenerate sites.xml so that it reflects the updated landmarks, using the following command
create_sites-xml.pl > /afs/slac/www/comp/net/wan-mon/tulip/sites.xml
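
The manual steps above can be sketched as follows. To keep the example self-contained it uses Python's built-in sqlite3 module with an in-memory database as a stand-in for the MySQL tulip database. The table name landmarks and the fields ipv4Addr, hostName and enabled come from the text above, while the column types, the second IP address and the host names are assumptions, not the real schema or data.

```python
import sqlite3

def disable_landmark(conn, ipv4_addr):
    """Set enabled = '0' for one landmark; return the number of rows changed."""
    cur = conn.execute(
        "UPDATE landmarks SET enabled = '0' WHERE ipv4Addr = ?", (ipv4_addr,)
    )
    conn.commit()
    return cur.rowcount

def enabled_hosts(conn):
    """Host names of landmarks that would still be listed as active."""
    rows = conn.execute(
        "SELECT hostName FROM landmarks WHERE enabled = '1' ORDER BY hostName"
    )
    return [r[0] for r in rows]

# Stand-in database with made-up host names (not the real tulip schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE landmarks (ipv4Addr TEXT PRIMARY KEY, hostName TEXT, enabled TEXT)"
)
conn.executemany(
    "INSERT INTO landmarks VALUES (?, ?, ?)",
    [("141.22.213.35", "host-a.example.org", "1"),
     ("192.0.2.10", "host-b.example.org", "1")],
)
changed = disable_landmark(conn, "141.22.213.35")
print(changed, enabled_hosts(conn))
```

Passing the address as a bound parameter rather than pasting it into the SQL string avoids quoting mistakes when the same update is scripted for many landmarks.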

Automated Process (updated by trscronjob)

reflector.cgi

This is run twice nightly by the trscrontab on pinger@pinger.slac.stanford.edu to ping the target www.slac.stanford.edu. reflector.cgi has to run on www-wanmon since that is where the PlanetLab cookie is kept. Since one cannot remotely run a trscron job on www-wanmon, a script (reflector.pl) executes reflector.cgi twice via a wget command: the first time with all enabled landmarks (ability=1, the default) and the second time with the disabled landmarks (ability=0). Running it regularly ensures the tulip log files are current and therefore the analysis is also current.
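
As a rough illustration of the two requests, the Python sketch below only builds the two URLs and does not fetch them. The base URL and the ability values come from this page. The name of the query parameter carrying the target ("target") is an assumption and may differ from what reflector.cgi actually expects.

```python
from urllib.parse import urlencode

REFLECTOR = "http://www-wanmon.slac.stanford.edu/cgi-wrap/reflector.cgi"

def reflector_urls(target="www.slac.stanford.edu"):
    """Return the two URLs: enabled landmarks first, then disabled ones."""
    # "target" as a parameter name is an assumption, not taken from reflector.cgi.
    return [
        f"{REFLECTOR}?{urlencode({'target': target, 'ability': ability})}"
        for ability in (1, 0)
    ]

urls = reflector_urls()
for url in urls:
    print(url)
```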

tulip-tuning.pl

Using the results from the Tulip log analysis script for the last 3 days, we can identify the hosts and their success percentages. We opted to disable all hosts with a success rate below 20% and to enable those with a success rate above 20%. The tulip-tuning.pl script is in /afs/slac.stanford.edu/package/pinger/tulip and performs these functions. It uses the LWP package to call reflector.cgi?function=log&days=3 to access the Tulip log analysis data for the last 3 days, downloads the analyzed tulip log file by requesting it from the reflector using http://www-wanmon.slac.stanford.edu/cgi-wrap/reflector.cgi?function=analyze&days=3, saves it in a file, and then parses the output. With option ability=1 it finds the faulty landmarks (i.e. those below a 20% success rate by default) and updates the database to disable them; with option ability=0 it finds the landmarks with a success rate greater than 20% and re-enables them in the database.

tulip-tuning.pl is run twice nightly (see the trscrontab): once to disable non-working landmarks and once to enable landmarks that are now working again. The runs take place before sites.xml and sites-disabled.xml are created, and specify that the tulip log analysis for the last 3 days be used.
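
The threshold rule can be sketched in Python as below. This is a simplified illustration, not the code of tulip-tuning.pl: the function name, the dictionary input and the host names are hypothetical, and only the 20% default and the meaning of ability=1 and ability=0 are taken from the description above.

```python
def select_landmarks(success, ability, threshold=20.0):
    """Pick the landmarks whose state should change.

    success: {host: success percentage over the analysis window}
    ability=1: analysis of enabled landmarks, return those to disable.
    ability=0: analysis of disabled landmarks, return those to re-enable.
    """
    if ability == 1:
        return sorted(h for h, pct in success.items() if pct < threshold)
    if ability == 0:
        return sorted(h for h, pct in success.items() if pct > threshold)
    raise ValueError("ability must be 0 or 1")

# Hypothetical success percentages for the last 3 days.
enabled_stats = {"host-a.example.org": 95.0, "host-b.example.org": 5.0}
disabled_stats = {"host-c.example.org": 60.0, "host-d.example.org": 0.0}

to_disable = select_landmarks(enabled_stats, ability=1)
to_enable = select_landmarks(disabled_stats, ability=0)
print("disable:", to_disable)
print("enable:", to_enable)
```

A landmark at exactly 20% is left unchanged by either run in this sketch, because the text specifies only "less than" and "greater than" 20%.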

The output generated by tulip-tuning.pl is placed at /afs/slac.stanford.edu/package/pinger/tulip/tuning_log. This log file contains a block of log entries for each run. Blocks accumulate until a month has passed, after which the first (oldest) entry is truncated. Each block starts with a Unix timestamp enclosed in hyphens, indicating when tulip-tuning.pl ran, and ends with _END_.
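
A log with this block structure could be read with a short Python sketch like the one below. The exact header form (a run of hyphens, the timestamp, another run of hyphens on one line), the 30-day reading of "a month" and the sample log lines are assumptions. Only the timestamp-in-hyphens start marker and the _END_ terminator come from the description above.

```python
import re

# Assumed header form, e.g. -----1700000000-----
HEADER = re.compile(r"^-+(\d+)-+$")

def parse_blocks(text):
    """Return a list of (timestamp, lines) for each complete block."""
    blocks, stamp, lines = [], None, []
    for line in text.splitlines():
        m = HEADER.match(line.strip())
        if m:
            stamp, lines = int(m.group(1)), []
        elif line.strip() == "_END_" and stamp is not None:
            blocks.append((stamp, lines))
            stamp = None
        elif stamp is not None:
            lines.append(line)
    return blocks

def drop_older_than(blocks, now, max_age_days=30):
    """Keep only blocks whose timestamp is within max_age_days of now."""
    cutoff = now - max_age_days * 86400
    return [b for b in blocks if b[0] >= cutoff]

# Made-up log content for illustration only.
sample = """-----1700000000-----
disabled host-a.example.org
_END_
-----1703000000-----
enabled host-b.example.org
_END_
"""
blocks = parse_blocks(sample)
recent = drop_older_than(blocks, now=1703000000)
print(len(blocks), len(recent))
```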
