After many tests we concluded that the Planet Lab landmarks, and to a lesser extent the PingER landmarks, are unpredictable in terms of availability. If a landmark is not available, the TULIP thread has to time out, and this can dramatically extend the measurement duration. Ideally, ping requests should not be sent to such landmarks, i.e. they need to be removed from the list of active landmarks. The purpose of this web page is to describe how we accomplish this.

For every access by TULIP the response for each landmark is logged. We analyze this logging information and generate a list of nodes with corresponding success or failure percentages and the reasons for their failures. These percentages are generated by /afs/slac/package/netmon/tulip/tulip-log-analyze.pl and the results can be seen here. The idea is that landmarks with a low success rate should be removed from the active list of landmarks. This is done by disabling the entry in the tulip database.
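The per-landmark percentages described above can be sketched as follows. This is a minimal illustration in Python, not the logic of tulip-log-analyze.pl: the record layout `(landmark, succeeded, reason)`, the function name `summarize`, and the sample host names are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def summarize(records):
    """Aggregate log records into per-landmark success statistics.

    `records` is an iterable of (landmark, succeeded, reason) tuples.
    This layout is hypothetical; the real TULIP log format may differ.
    """
    totals = Counter()
    successes = Counter()
    reasons = defaultdict(Counter)
    for landmark, succeeded, reason in records:
        totals[landmark] += 1
        if succeeded:
            successes[landmark] += 1
        else:
            reasons[landmark][reason] += 1
    return {
        lm: {
            "success_pct": 100.0 * successes[lm] / totals[lm],
            "failure_reasons": reasons[lm],
        }
        for lm in totals
    }


# Hypothetical sample data, for illustration only.
sample_log = [
    ("node1.example.org", True, None),
    ("node1.example.org", False, "timeout"),
    ("node2.example.org", False, "timeout"),
    ("node2.example.org", False, "connection refused"),
]

summary = summarize(sample_log)
for lm, stats in sorted(summary.items()):
    print(lm, stats["success_pct"], dict(stats["failure_reasons"]))
```

Keeping the failure reasons next to the percentage makes it easier to tell a landmark that is permanently gone from one that is only intermittently timing out.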

There are two ways in which we can disable the landmarks which are not responding. 

...

2) Automated Process (updated by a trscrontab job running in pinger@pinger.slac.stanford.edu)

 Manual disable of hosts

This is accomplished via the tulip database. In the tulip database, the table landmarks has a parameter "enabled" which is used to decide which landmarks are added to sites.xml (the list of active landmark sites). This XML file is later used by the reflector to query the active landmarks for results. sites.xml is generated by a trscronjob, so if we change the value of enabled to '0' the landmark automatically disappears from sites.xml the next time the trscronjob runs. We can also update sites.xml manually. The process is discussed below.
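As a rough illustration of the effect of the "enabled" flag, the sketch below uses an in-memory SQLite database in Python. Only the table name landmarks and the column enabled come from this page; the `hostname` column, the helper functions, and the use of SQLite in place of the real tulip database are assumptions.

```python
import sqlite3

# In-memory stand-in for the tulip database (assumption: the real
# database engine and the full schema are not shown on this page).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE landmarks ("
    "hostname TEXT PRIMARY KEY, "
    "enabled INTEGER NOT NULL DEFAULT 1)"
)
conn.executemany(
    "INSERT INTO landmarks (hostname, enabled) VALUES (?, ?)",
    [("a.example.org", 1), ("b.example.org", 1)],
)


def disable_landmark(conn, hostname):
    """Set enabled = 0 for one landmark; return the number of rows changed."""
    cur = conn.execute(
        "UPDATE landmarks SET enabled = 0 WHERE hostname = ?", (hostname,)
    )
    conn.commit()
    return cur.rowcount


def active_landmarks(conn):
    """Return the hostnames that would still be listed as active."""
    rows = conn.execute(
        "SELECT hostname FROM landmarks WHERE enabled = 1 ORDER BY hostname"
    )
    return [row[0] for row in rows]


disable_landmark(conn, "b.example.org")
print(active_landmarks(conn))
```

Because the landmark is flagged rather than deleted, re-enabling it later is a single update back to 1.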

...

  • Update sites.xml so that it reflects the updated landmarks, using the following command:
 create_sites-xml.pl > /afs/slac/www/comp/net/wan-mon/tulip/sites.xml

Automated Process (updated by trscronjob)

  tulip-tuning.pl

Using the results from the analysis script we can identify the hosts and their success percentages. We opted to disable all hosts with a success rate below 20%. The above-mentioned script is in $tulipdir and it performs the listed functions. It uses the LWP package to access the web page, download the file and then parse the output to get the faulty landmarks.
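The 20% cut-off can be sketched as below. This is a Python illustration under stated assumptions, not the code of tulip-tuning.pl (which is Perl and uses LWP): the report format of one "hostname percentage" pair per line, the function name `faulty_landmarks`, and the sample hosts are all hypothetical. Fetching the report over HTTP is left out so the example stays self-contained.

```python
SUCCESS_THRESHOLD = 20.0  # landmarks below this success percentage are disabled


def faulty_landmarks(report_text, threshold=SUCCESS_THRESHOLD):
    """Return hosts whose success percentage is strictly below `threshold`.

    Assumes one "hostname percentage" pair per line (hypothetical format);
    lines that do not match this shape are skipped.
    """
    faulty = []
    for line in report_text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue
        host, pct = fields
        try:
            value = float(pct.rstrip("%"))
        except ValueError:
            continue
        if value < threshold:
            faulty.append(host)
    return faulty


# Hypothetical report contents, for illustration only.
report = """\
host1.example.org 95.0%
host2.example.org 12.5%
host3.example.org 20.0%
host4.example.org 0%
"""

print(faulty_landmarks(report))
```

The comparison is strict, so a landmark at exactly 20% stays enabled, matching the "less than 20%" rule stated above.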

...

After solving the clean-up process, we needed to address another question: would any of those hosts come back, and if so, how would we know? Should we disable them forever, or should we build a mechanism to bring them back? To solve this problem we devised a straightforward mechanism: a notification process that helps us identify the disabled landmarks which are up again.

...