...

The main purpose of the TULIP Central Reflector is to proxy TULIP queries to the landmarks in PlanetLab's Scriptroute service and to the perfSONAR and PingER reverse ping servers. It may also be extended to issue all queries; this decision will depend on speed of execution and security, among other things. The PlanetLab Scriptroute service provides a cookie which works for a single IP address only, so all the requests will be issued from the Central Reflector and the responses sent back to the TULIP JNLP Client.

Implementation 

The TULIP Central Reflector will be a CGI script (reflector.cgi) deployed at SLAC. The TULIP client will issue a single request, and the Reflector will probe all the landmarks in that region [1] and return the results to the TULIP client. Probing the target site from more vantage points may give us a better estimate of its location.

Requirements

  • Should it fetch sites.txt or have a local copy of sites.txt? What changes should be made to sites.txt?
    • A new parameter should be added to sites.txt to indicate tier0 or tier1. Also, the region of tier1 sites needs to be specified in sites.txt.
  • A separate thread should be used for each landmark, and semaphores should be used for locking so that data from different threads does not inter-mix (see the sketch after this list).
  • There should be a limit on the number of threads that can be launched at a time (say 10).
  • Should there be extra logging on the reflector, or can we rely on the standard web logs, which log each query including the time stamp and client name? What else is logged depends on whether the request is a GET or a POST.
  • Where are the results parsed? They could be parsed in the reflector or in the Java client. Parsing in the client distributes the parsing load, reduces the load on the reflector, and simplifies the CGI script.
  • What should happen if a landmark responds with bad data? (Should the reflector process the error or send the raw data back?) Since there will be some anomalies, the reflector will probably need to return the full response and in any case inform the user, so initially the client will process the response and spot errors. Also, if the client parses the result it will probably be able to spot problems easily.
  • Special consideration must be given to security, as the script ultimately has to be deployed at SLAC (Perl taint option, warning option, special open method, etc.).
  • Need to agree on a common format for the exchange of data.
  • Needs a blacklisting mechanism for malicious hosts.
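
A minimal sketch of the thread-per-landmark approach discussed above, assuming a hypothetical probe_landmark() helper and a list of landmark URLs; a counting semaphore caps the number of simultaneous probes at 10, and a locked shared hash keeps data from different threads from inter-mixing:

Code Block

use strict;
use warnings;
use threads;
use threads::shared;
use Thread::Semaphore;

my $max_threads = 10;                              # cap on simultaneous probes
my $slots       = Thread::Semaphore->new($max_threads);
my %results :shared;                               # landmark URL -> raw response

# Hypothetical helper: issue the actual query to one landmark.
sub probe_landmark {
    my ($url) = @_;
    # ... send the request and collect the response ...
    return "response from $url";
}

my @landmarks = @ARGV;                             # e.g. landmark URLs
my @workers;
for my $url (@landmarks) {
    $slots->down();                                # wait for a free slot
    push @workers, threads->create(sub {
        my $data = probe_landmark($url);
        { lock(%results); $results{$url} = $data; }   # keep threads from inter-mixing data
        $slots->up();                              # release the slot
    });
}
$_->join() for @workers;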

...

There are two scripts: reflector.cgi and EventHandler.pm. Both use taint checking (-T), warnings (-w), use strict, and the three-argument version of open, and all opens and closes have a die or its equivalent. EventHandler.pm is called by reflector.cgi. The CGI scripts are deployed in the path /afs/slac.stanford.edu/g/www/cgi-wrap-bin/net/shahryar/smokeping/.
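
For illustration, those conventions look roughly like the following (a sketch only, not the actual reflector.cgi code; the file path is just an example):

Code Block

#!/usr/bin/perl -wT
# -T turns on taint checking, -w turns on warnings
use strict;

# Three-argument open, with die on failure; closes are checked the same way.
open(my $fh, '<', '/path/to/sites.txt')
    or die "Cannot open sites.txt: $!";
while (my $line = <$fh>) {
    # ... process each landmark entry ...
}
close($fh) or die "Cannot close sites.txt: $!";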

Invocation

The reflector script is called by a URL of the form:

Code Block
http://www-wanmon.slac.stanford.edu/cgi-wrap/reflector.cgi?region=North%20America&target=134.79.16.9&tier=all&type=PlanetLab&ability=1

Leaving out the region will assume all regions, leaving out the tier will assume all tiers, and leaving out the type will assume all landmark types (e.g. PlanetLab, PingER and SLAC perfSONAR). If the region is included then only landmarks in that region will be used. If the tier is specified then only that tier's landmarks will be used. If the type is specified then only landmarks of that type will be used (e.g. type=PingER,perfSONAR will use only the PingER and perfSONAR landmark types). Any or all of tier, region and type may be specified as "all".
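
For example (illustrative invocations only; the target address is just an example, and the query can equally be issued from a browser):

Code Block

# North American landmarks of all types and tiers
wget -q -O - 'http://www-wanmon.slac.stanford.edu/cgi-wrap/reflector.cgi?region=North%20America&target=134.79.16.9&tier=all&type=all'

# Only PingER and perfSONAR landmarks, in any region
wget -q -O - 'http://www-wanmon.slac.stanford.edu/cgi-wrap/reflector.cgi?region=all&target=134.79.16.9&tier=all&type=PingER,perfSONAR'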

...

The sites xml files are created by a crontab running as pinger@pinger.slac.stanford.edu. The script that writes sites.xml is create_sites-xml.pl. There is also a debug parameter; use debug=1 or greater if you want debugging output.

The function parameter enables one to specify pinging (function=ping, the default), printing out usage information (function=help), printing out the log file (function=log), printing out an HTML table of the landmarks in the TULIP database (function=landmarks), and analyzing the log file (function=analyze). The function analyze has a sub-option ability: ability=0 analyzes the log entries for disabled landmarks, and ability=1 (the default) analyzes log entries excluding those for disabled landmarks. The function landmarks has an additional parameter, out, which toggles the output between HTML (out=html, the default) and CSV (out=csv). Using this link, one can easily look at the landmarks present in the database, along with their most important properties, such as tracerouteURL, pingURL, longitude, latitude, city, country and hostname.
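
For example, the following illustrative URLs exercise these options (same host and path as above):

Code Block

# HTML table of the landmarks in the TULIP database (default output)
http://www-wanmon.slac.stanford.edu/cgi-wrap/reflector.cgi?function=landmarks&out=html

# The same table as CSV
http://www-wanmon.slac.stanford.edu/cgi-wrap/reflector.cgi?function=landmarks&out=csv

# Analyze the log, excluding entries for disabled landmarks (the default)
http://www-wanmon.slac.stanford.edu/cgi-wrap/reflector.cgi?function=analyze&ability=1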

...

There are about 60 SLAC/Looking Glass landmarks and about 156 PlanetLab landmarks. We filter the landmarks nightly using tulip-tuning.pl to disable non-responding or poorly responding landmarks, and re-enable them when they are working well. The PlanetLab landmarks send 10 pings very quickly, whereas the SLAC/Looking Glass landmarks send five 56-byte pings with one second between them; they also wait for a deadline of 30 seconds for the pings to be replied to.
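
For reference, the SLAC/Looking Glass probing described above corresponds roughly to a standard Linux ping invocation of the following form (illustrative only; the command is actually issued on the landmark itself):

Code Block

ping -c 5 -s 56 -i 1 -w 30 <target>   # five 56-byte pings, 1 s apart, 30 s deadline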

...

The first three responses are from PlanetLab landmarks and the last one is from a SLAC type landmark.

Errors Reported by PlanetLab

N.b. a negative loss is reported if the target provides duplicate responses (amplification), such as

Code Block

3 packets transmitted, 5 packets received, 1.67 times amplification
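
The negative loss falls straight out of the usual loss formula when more packets come back than were sent; a quick sketch of the arithmetic (hypothetical code, not taken from the reflector):

Code Block

my ($sent, $received) = (3, 5);
my $loss          = 100 * ($sent - $received) / $sent;  # -66.7%, i.e. a negative loss
my $amplification = $received / $sent;                  # 1.67 times amplification
printf "loss=%.1f%% amplification=%.2f\n", $loss, $amplification;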

PlanetLab Landmarks

To access the PlanetLab landmarks one needs a cookie that is associated with a subnet (in our case 134.79/16). In addition one needs a ruby script that is sent to the PlanetLab landmark to execute. These are put together by reflector.cgi to create a URL in hex form.
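
A minimal sketch of that last step, assuming the ruby measurement script is already held in a variable (the cookie handling and the exact URL layout used by reflector.cgi are omitted): pack/unpack turns the script into the hex string embedded in the URL.

Code Block

use strict;
use warnings;

my $landmark    = '129.22.150.90';                             # example landmark address
my $ruby_script = '# ... ruby measurement code sent to the landmark ...';
my $hex         = unpack('H*', $ruby_script);                  # hex-encode the script
my $url         = "http://$landmark/$hex";                     # illustrative URL layout only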

Errors Reported by PlanetLab

Code Block

Failed to connect to http://129.22.150.90 response code 500
ERROR: you're (134.79.18.134) already running a measurement on socket 14. [http://128.83.122.179]
10 packets transmitted, 0 received, 100% packet loss, time 0 ms rtt min/avg/max = 0/0/0 [http://141.149.218.208]
Can't resolve DNS: submitted:6:in `ip_dst=': unable to resolve $target: running in a chroot without dns support (RuntimeError)
submitted:9: warning: didn't see packet 5 leave: pcap overloaded or server bound to incorrect interface?
To 134.79.16.9 timed out
Error connecting: Connection refused
ERROR: you need a valid scriptroute authentication cookie to use this server, or the cookie you used does not match your client IP 134.79.18.163; go to [http://www.scriptroute.org/cookies.html] to get one.
ERROR: you're (134.79.18.134) already running a measurement on socket 10.
PlanetLab Server Error: Received: IP (tos 0xc0, ttl 253, id 51592, offset 0, flags [none], length: 56)
192.70.187.218 > 198.82.160.220: icmp 36: time exceeded in-transit
Error connecting: No buffer space available
submitted:9:in `send_train': scriptrouted error: unable to send to 137.138.137.177: No buffer space available (ScriptrouteError)
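
As a rough illustration of how such responses might be screened before further processing (a sketch only; the actual parsing may live in the client or the reflector), a few regular expressions catch the most common failure patterns:

Code Block

use strict;
use warnings;

sub classify_response {
    my ($text) = @_;
    return 'missing or mismatched cookie' if $text =~ /valid scriptroute authentication cookie/;
    return 'measurement already running'  if $text =~ /already running a measurement on socket/;
    return 'dns failure'                  if $text =~ /unable to resolve/;
    return 'no buffer space'              if $text =~ /No buffer space available/;
    return 'connect failure'              if $text =~ /Failed to connect|Connection refused/;
    return 'total loss or timeout'        if $text =~ /100% packet loss|timed out/;
    return 'ok';
}

print classify_response('Error connecting: Connection refused'), "\n";   # -> connect failure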

Parsing the SLAC Landmarks

...

Some spot measurements of performance indicate that, for 10 pings per target and 86 PlanetLab landmarks with region=northamerica, as we vary the number of landmarks accessed simultaneously, the number of parallel requests per landmark, and the timeout for each request, the durations are as follows (n.b. there is a timeout of 100 seconds on the complete process, and the default values are shown in boldface in the table below):

Simultaneous landmarks | Parallel requests / landmark | Request timeout (secs) | Duration (secs)
20 | 5 | 2 | 50
20 | 5 | 10 | 60
10 | 5 | 2 | 88
40 | 5 | 2 | 34
20 | 10 | 2 | 50

Performance Version=3.0

A problem was found with the reflector: the request timeout was fixed at 5 seconds and the number of parallel requests was fixed at 10, and modifying the parameters in the code did not change anything. This caused two problems: 1) 5 seconds was too short for many of the landmarks, causing a large number of timeouts, and 2) the script was not fast enough. The timeout was changed to 10 seconds and the number of parallel requests per landmark was increased to 80. At the time of testing there were a total of 200 active landmarks, of which 91 were in North America, 15 in South Asia and 49 in Europe.

Region | Tier | Time (seconds)
All | 0 | 10
North America | 1 | 23
Europe | 1 | 35
South Asia | 1 | 17
All | All | 65


Testing

It can be tested by entering the URL from a web browser or with wget, e.g.

...

We have also considered whether the knowledge that a machine, and possibly its usual owner, can be accurately located may raise privacy concerns. This may require us to add some fuzz to the results; so far this has not been done.

Sample Scripts

traceroute.pl: This script has been written with special security considerations, so it will help in implementing reflector.cgi.

...