The main purpose of the TULIP Central Reflector is to proxy TULIP queries to PlanetLab's Scriptroute service. It may also be extended to issue all queries; this decision will depend on speed of execution and security, among other things. The PlanetLab Scriptroute service provides a cookie that is valid for a single IP address only, so all requests will be issued from the Central Reflector and the responses sent back to the TULIP JNLP client. A map of the PlanetLab servers is available.
The TULIP Central Reflector will be a CGI script (reflector.cgi) deployed at SLAC. The TULIP client issues a single request; the Reflector then probes all the landmarks in that region [1] and returns the results to the TULIP client. Probing the target site from more vantage points may give us a better estimate of its location.
After discussing with Yee and Booker, it became clear that forks may be too complicated: the version of Perl at SLAC does not support threading, and the security people will not allow forks running inside a CGI script. So I had to come up with an alternative. The solution was to use asynchronous I/O: a batch of requests can be sent to the landmarks without waiting for the responses. The LWP::Parallel library provides all of this functionality. It is not currently installed, so I am using a local copy in my home directory; ultimately this module has to be installed on the production server.
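As a minimal sketch of the asynchronous approach, requests can be registered with LWP::Parallel::UserAgent and dispatched together; the landmark URLs and tuning values below are illustrative, not the reflector's actual configuration:

```perl
#!/usr/bin/perl -w
use strict;
use LWP::Parallel::UserAgent;
use HTTP::Request;

my $pua = LWP::Parallel::UserAgent->new();
$pua->max_hosts(20);   # landmarks probed simultaneously
$pua->max_req(5);      # parallel requests per landmark
$pua->timeout(2);      # per-request timeout in seconds

# Illustrative landmark URLs; the real reflector builds this list per region.
my @landmarks = ('http://128.6.192.158', 'http://141.149.218.208');

# Queue every request without waiting for any response.
for my $landmark (@landmarks) {
    my $req = HTTP::Request->new(GET => $landmark);
    if (my $err = $pua->register($req)) {
        warn $err->error_as_HTML;   # this request could not be queued
    }
}

# Dispatch all queued requests and block until each one has
# completed or timed out.
my $entries = $pua->wait();
for my $entry (values %$entries) {
    my $res = $entry->response;
    print $res->request->url, ' => ', $res->code, "\n";
}
```

LWP::Parallel also supports callbacks (e.g. on_failure, on_return) in a UserAgent subclass, which is the style the EventHandler.pm messages below suggest.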
I have implemented most of the functionality, and the script is running fine. I will have to take measures to make the script more secure, so that it cannot be used as a platform to launch DDoS attacks, by limiting the number of concurrent reflector.cgi processes to 10. The script also produces customized messages (such as "request timed out" or "connection failed") so that the TULIP client can differentiate between the various kinds of error conditions. In addition, there is a blacklisting mechanism so that particular IP addresses can be blocked.
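As a sketch of the blacklisting idea, an IP lookup against a flat file could look like the following; the file format and subroutine name are assumptions for illustration, not the reflector's actual code:

```perl
use strict;
use warnings;

# Hypothetical check: the blacklist is assumed to be a flat file with
# one dotted-quad address per line. Returns 1 if $ip is blocked, 0 if not.
sub is_blacklisted {
    my ($ip, $file) = @_;
    open my $fh, '<', $file or die "cannot open $file: $!";
    while (my $line = <$fh>) {
        chomp $line;
        return 1 if $line eq $ip;
    }
    close $fh or die "cannot close $file: $!";
    return 0;
}
```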
There are two scripts: reflector.cgi and EventHandler.pm. Both run with taint checking (-T) and warnings (-w), use strict, use the three-argument version of open, and every open and close has a die or its equivalent. EventHandler.pm is called by reflector.cgi. The scripts are deployed in /afs/slac.stanford.edu/g/www/cgi-wrap-bin/net/shahryar/smokeping/.
The reflector is called by a URL of the form:
http://www-wanmon.slac.stanford.edu/cgi-wrap/reflector.cgi?region=northamerica&target=134.79.16.9
The responses appear as:
Landmark=http://128.6.192.158, Client=134.79.117.29, failed to connect response code 500
Landmark=http://141.149.218.208, Client=134.79.117.29, 10 packets transmitted, 0 received, 100% packet loss, rtt min/avg/max = 0/0/0
Landmark=http://128.193.33.7, Client=134.79.117.29, 10 packets transmitted, 10 received, 0% packet loss, rtt min/avg/max = 29.178/29.2495/29.316
Failed to connect to http://129.22.150.90 response code 500
PlanetLab Server Error: ERROR: you're (134.79.18.134) already running a measurement on socket 14. http://128.83.122.179
10 packets transmitted, 0 received, 100% packet loss, time 0 ms rtt min/avg/max = 0/0/0 http://141.149.218.208
Can't resolve DNS: submitted:6:in `ip_dst=': unable to resolve $target: running in a chroot without dns support (RuntimeError)
PlanetLab Server Error: submitted:9: warning: didn't see packet 5 leave: pcap overloaded or server bound to incorrect interface?
To 134.79.16.9 timed out
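A response parser in the spirit of EventHandler::parseData could extract the loss and RTT figures from a landmark's reply with a single regular expression. This sketch (and its field names) is an assumption for illustration, not the module's actual code:

```perl
use strict;
use warnings;

# Hypothetical parser for a landmark's ping summary; returns a hashref
# of the extracted fields, or undef when the text is not a ping summary
# (so the caller can report a distinct error condition instead).
sub parse_ping {
    my ($text) = @_;
    if ($text =~ m{(\d+) packets transmitted, (\d+) received, (\d+)% packet loss,.*?rtt min/avg/max = ([\d.]+)/([\d.]+)/([\d.]+)}) {
        return { sent => $1, received => $2, loss => $3,
                 min  => $4, avg => $5, max => $6 };
    }
    return undef;
}
```

The non-greedy `.*?` allows for variants such as "100% packet loss, time 0 ms rtt min/avg/max = 0/0/0" seen in the examples above.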
In addition to the normal web server (Apache) logging, we use Log4perl for logging. The following types of error messages can be found in the log file.
2007/09/03 20:02:25 ERROR> EventHandler.pm:70 EventHandler::on_failure - Landmark=http://128.6.192.158, Client=134.79.117.29, failed to connect response code 500
2007/09/03 20:02:34 ERROR> EventHandler.pm:142 EventHandler::parseData - Landmark=http://129.22.150.90, Client=134.79.117.29, 10 packets transmitted, 0 received, 100% packet loss, rtt min/avg/max = 0/0/0:
2007/09/03 20:09:09 ERROR> EventHandler.pm:115 EventHandler::parseData - Landmark=http://128.143.137.250, Client=134.79.117.29, request timed out: To 134.79.16.9 timed out
2007/09/03 20:02:58 ERROR> EventHandler.pm:125 EventHandler::parseData - Landmark=http://128.4.36.11, Client=134.79.117.29, PlanetLab Server Error: submitted:9: warning: didn't see packet 5 leave: pcap overloaded or server bound to incorrect interface?
There is a script at ~cottrell/bin/tulip-log-analyze.pl to aggregate the errors. Typical output appears as:
28cottrell@wanmon:~>bin/tulip-log-analyze.pl
tulip-log-analyze.pl: Mon Sep 3 22:16:47 2007=1188883007
100%_loss,128.111.52.61=1 100%_loss,128.227.56.81=4 100%_loss,129.22.150.90=4 100%_loss,129.24.211.25=5 100%_loss,131.247.2.242=1 100%_loss,141.149.218.208=3 100%_loss,152.14.92.58=1 100%_loss,155.225.2.72=4 100%_loss,205.189.33.178=1 100%_loss,206.207.248.34=2 100%_loss,208.117.131.115=2 100%_loss,63.64.153.84=1 100%_loss,65.241.38.58=3 100%_loss,75.130.96.12=4
failed_to_connect,128.192.101.217=5 failed_to_connect,128.238.88.64=5 failed_to_connect,128.6.192.158=5 failed_to_connect,129.105.44.252=5 failed_to_connect,141.149.218.208=2 failed_to_connect,143.215.129.115=5 failed_to_connect,169.229.50.16=5 failed_to_connect,192.197.121.3=5 failed_to_connect,208.216.119.19=2 failed_to_connect,216.165.109.81=1
not_sent,128.4.36.11=1
timeout,128.143.137.250=1 timeout,128.4.36.11=1 timeout,129.130.252.138=1 timeout,155.225.2.72=1 timeout,64.151.112.20=1 timeout,65.241.38.58=1
Mon Sep 3 22:16:47 2007 tulip-log-analyze.pl: took 0 seconds
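The aggregation amounts to counting (error type, landmark) pairs in the Log4perl output. A simplified sketch follows, with classification rules inferred from the sample messages above; the real tulip-log-analyze.pl may classify differently:

```perl
use strict;
use warnings;

# Hypothetical aggregation: count (error type, landmark IP) pairs
# from Log4perl ERROR lines, keyed as "type,ip" like the sample output.
sub aggregate_errors {
    my (@lines) = @_;
    my %count;
    for my $line (@lines) {
        next unless $line =~ /ERROR>/;
        my ($ip) = $line =~ m{Landmark=http://([\d.]+)};
        next unless defined $ip;
        my $type = $line =~ /failed to connect/      ? 'failed_to_connect'
                 : $line =~ /100% packet loss/       ? '100%_loss'
                 : $line =~ /timed out/              ? 'timeout'
                 : $line =~ /PlanetLab Server Error/ ? 'server_error'
                 :                                     'other';
        $count{"$type,$ip"}++;
    }
    return \%count;
}
```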
As we review the logs we will determine whether probing from some landmarks is reliable enough to warrant their use.
Some spot measurements of performance, using 10 pings per target and 86 PlanetLab landmarks for region=northamerica, show how the duration varies with the number of landmarks accessed simultaneously, the number of parallel requests per landmark, and the timeout for each request (n.b. there is a timeout of 100 seconds on the complete process, and the default values are in boldface in the table below):
| Simultaneous landmarks | Parallel requests / landmark | Request timeout (secs) | Duration (secs) |
|---|---|---|---|
| **20** | **5** | **2** | 50 |
| **20** | **5** | 10 | 60 |
| 10 | **5** | **2** | 88 |
| 40 | **5** | **2** | 34 |
| **20** | 10 | **2** | 50 |
traceroute.pl: This script was written with special security considerations, so it will help in implementing reflector.cgi.
topology.pm: This is a multi-threaded module written by Yee, so it will help in understanding the threading issues in Perl, which are a bit complex.