After many tests we concluded that the PlanetLab landmarks, and to a lesser extent the PingER landmarks, are unpredictable in terms of availability. The TULIP Landmarks Map identifies the landmarks that are currently in use and those that have been disabled because they are not responding. If a landmark is unavailable, the TULIP thread has to wait for a timeout, which can dramatically extend the measurement duration and puts unnecessary traffic on the Internet. Ideally, therefore, ping requests should not be sent to such landmarks; that is, they need to be removed from the list of active landmarks. This page describes how we accomplish this.

For every access by TULIP, the response from each landmark is logged. We analyze this log and generate a list of nodes with their success and failure percentages and the reasons for their failures. These percentages are generated by /afs/slac/package/pinger/tulip/tulip-log-analyze.pl and the results can be seen here. The idea is that landmarks with a low success rate should be removed from the active list of landmarks. This is done by disabling the entry in the tulip database.

There are two ways to disable landmarks that are not responding:

1) Manual Process

2) Automated Process (updated by trscrontab job running in pinger@pinger.slac.stanford.edu)

Manually disabling hosts

This is accomplished via the tulip database. In the tulip database, the landmarks table has a column "enabled" that determines whether a landmark is added to sites.xml (the list of active landmark sites). This XML file is later used by the reflector to query the active landmarks for results. sites.xml is generated by a trscronjob, so if we change the value of enabled to '0' for a landmark, that landmark will no longer appear in sites.xml after the trscronjob next runs. We can also update sites.xml manually. The process is described below.

  • Log in to the tulip database (the username and password are available via escrow -c iepm iepmacct)
  • Select the tulip database with the command
 mysql> use tulip;
  • Update the value of enabled using the following SQL command. In this instance we are using ipv4Addr = 141.22.213.35; generally ipv4Addr is the primary key, but we can also use hostName as an identifier to disable landmarks
 update landmarks set enabled = '0' where ipv4Addr = '141.22.213.35';
  • Regenerate sites.xml so that it reflects the updated landmarks, using the following command
create_sites-xml.pl > /afs/slac/www/comp/net/wan-mon/tulip/sites.xml
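
The effect of the enabled flag can be illustrated with a small self-contained sketch. It uses Python's built-in sqlite3 module with an in-memory table as a stand-in for the MySQL tulip database. Only the landmarks table name and the enabled, ipv4Addr and hostName columns come from the text above; the column types, the sample host names, the second sample address and the function names are assumptions made for illustration, and this is not how create_sites-xml.pl is actually implemented.

```python
import sqlite3

def make_demo_db():
    """Create an in-memory stand-in for the landmarks table (schema is assumed)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE landmarks (ipv4Addr TEXT PRIMARY KEY, hostName TEXT, enabled TEXT)"
    )
    conn.executemany(
        "INSERT INTO landmarks VALUES (?, ?, ?)",
        [
            ("141.22.213.35", "landmark-a.example.org", "1"),  # hypothetical host name
            ("192.0.2.10", "landmark-b.example.org", "1"),     # made-up sample row
        ],
    )
    return conn

def disable_landmark(conn, ipv4_addr):
    """Set enabled = '0' for one landmark; returns the number of rows changed."""
    cur = conn.execute(
        "UPDATE landmarks SET enabled = '0' WHERE ipv4Addr = ?", (ipv4_addr,)
    )
    conn.commit()
    return cur.rowcount

def active_landmarks(conn):
    """Return the addresses that would still be listed as active."""
    rows = conn.execute(
        "SELECT ipv4Addr FROM landmarks WHERE enabled = '1' ORDER BY ipv4Addr"
    )
    return [row[0] for row in rows]

if __name__ == "__main__":
    conn = make_demo_db()
    print(disable_landmark(conn, "141.22.213.35"))
    print(active_landmarks(conn))
```

Checking the number of rows changed is a cheap way to catch a mistyped address before sites.xml is regenerated.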

Automated Process (updated by trscronjob)

reflector.cgi

This is run twice nightly by the trscrontab on pinger@pinger.slac.stanford.edu to ping the target www.slac.stanford.edu. The first run uses the enabled landmarks (ability=1, the default), which it obtains from http://www.slac.stanford.edu/comp/net/wan-mon/tulip/sites.xml; the second run uses the disabled landmarks (ability=0), which it obtains from http://www.slac.stanford.edu/comp/net/wan-mon/tulip/sites-disabled.xml. reflector.cgi has to run on www-wanmon since that is where the PlanetLab cookie is kept. Since one cannot remotely run a trscronjob on www-wanmon, a script (reflector.pl) executes reflector.cgi twice, with ability=1 and then ability=0, via a wget command. Running it regularly ensures that the tulip log files, and therefore the analysis, are current.
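
As a rough illustration of the two nightly requests, the sketch below only builds the two URLs, in the order described (ability=1, then ability=0). It does not reproduce reflector.pl, which issues the requests with wget. The base URL is the reflector.cgi address that appears elsewhere on this page; the function name and the "target" query parameter in the demo are hypothetical, since the real parameter names other than ability are not given here.

```python
from urllib.parse import urlencode

# Base URL as it appears elsewhere on this page for reflector.cgi.
REFLECTOR_URL = "http://www-wanmon.slac.stanford.edu/cgi-wrap/reflector.cgi"

def reflector_urls(extra_params=None):
    """Build the two request URLs: enabled landmarks first, then disabled ones."""
    urls = []
    for ability in (1, 0):
        params = dict(extra_params or {})
        params["ability"] = ability
        urls.append(REFLECTOR_URL + "?" + urlencode(params))
    return urls

if __name__ == "__main__":
    # "target" is a hypothetical parameter name, used only for illustration.
    for url in reflector_urls({"target": "www.slac.stanford.edu"}):
        print(url)
```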

tulip-tuning.pl

Using the results from the Tulip log analysis script for the last 3 days, we can identify the hosts and their success percentages. We opted to disable all hosts with a success rate below 20%, and to enable those with a success rate above 20%. The tulip-tuning.pl script is in /afs/slac.stanford.edu/package/pinger/tulip and performs these functions. It uses the LWP package to download the analyzed Tulip log data for the last 3 days from the reflector at http://www-wanmon.slac.stanford.edu/cgi-wrap/reflector.cgi?function=analyze&days=3, saves it in a file, and then parses the output. With option ability=1 it finds the faulty landmarks (i.e. the enabled ones with a success rate below 20%, the default threshold) and updates the database to disable them; with option ability=0 it finds the disabled landmarks with a success rate greater than 20% and re-enables them in the database. tulip-tuning.pl is run twice nightly (see the trscrontab): once to disable non-working landmarks, and once to enable landmarks that are now working again. It is run before sites.xml and sites-disabled.xml are created by create_sites-xml.pl (documented at http://www-dev.slac.stanford.edu/cgi-wrap/scriptdoc.pl?name=create_sites-xml.pl).
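
The threshold rule can be sketched as a small pure function. This is an illustration of the decision described above, not the actual Perl code in tulip-tuning.pl; the function name and the dictionary input are assumptions, and the real script works on the parsed analysis output and the tulip database.

```python
DEFAULT_THRESHOLD = 20.0  # percent, the default cut-off described above

def landmarks_to_change(success_rates, ability, threshold=DEFAULT_THRESHOLD):
    """Pick the landmarks whose enabled flag should be flipped.

    success_rates maps a landmark name to its success percentage, for the set
    of landmarks matching `ability` (1 = currently enabled, 0 = currently disabled).
    """
    if ability == 1:
        # Enabled landmarks that are failing: candidates to disable.
        return sorted(h for h, pct in success_rates.items() if pct < threshold)
    if ability == 0:
        # Disabled landmarks that have recovered: candidates to re-enable.
        return sorted(h for h, pct in success_rates.items() if pct > threshold)
    raise ValueError("ability must be 0 or 1")

if __name__ == "__main__":
    rates = {"host-a": 5.0, "host-b": 20.0, "host-c": 80.0}  # made-up values
    print(landmarks_to_change(rates, ability=1))
    print(landmarks_to_change(rates, ability=0))
```

In this sketch a landmark at exactly 20% is left unchanged in both directions, which matches the "below" and "greater than" wording used above.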

The output generated by tulip-tuning.pl is placed at /afs/slac.stanford.edu/package/pinger/tulip/tuning_log. This log file contains a block of log entries for each run. Blocks are kept until a month has passed, at which point the oldest block is truncated. Each block starts with a Unix timestamp embedded in hyphens, indicating when tulip-tuning.pl ran, and ends with _END_.
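
A reader of tuning_log could split it into blocks along these lines. The exact header layout is not specified on this page, so the sketch assumes a line consisting of hyphens, the digits of the timestamp, and more hyphens (for example -----1230000000-----). It also approximates "a month" as 30 days. Both are assumptions, as are the function names.

```python
import re

HEADER = re.compile(r"^-+(\d+)-+$")  # assumed header shape: hyphens, digits, hyphens
MONTH_SECONDS = 30 * 24 * 3600       # "a month" approximated as 30 days (assumption)

def parse_blocks(text):
    """Split log text into (timestamp, lines) blocks ending at _END_."""
    blocks, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        match = HEADER.match(line)
        if match:
            # A header line opens a new block.
            current = (int(match.group(1)), [])
        elif line == "_END_":
            # The end marker closes the open block, if any.
            if current is not None:
                blocks.append(current)
            current = None
        elif current is not None and line:
            current[1].append(line)
    return blocks

def recent_blocks(blocks, now):
    """Keep only the blocks whose timestamp is within a month before `now`."""
    return [block for block in blocks if now - block[0] <= MONTH_SECONDS]
```

For example, recent_blocks(parse_blocks(text), now) with now set to the current Unix time would return only the runs from roughly the last month.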
