...

In addition to the normal web server (Apache) logging, we use Log4perl for logging. The configuration file is very simple. The log file (at /scratch/tulip_log on wanmon.slac.stanford.edu, and also available here) contains the following types of error messages, together with time-stamped records of all requests, the requesting host, and the target.
2007/09/03 20:02:25 ERROR> EventHandler.pm:70 EventHandler::on_failure - Landmark=http://128.6.192.158,\
Client=134.79.117.29, failed to connect response code 500<BR>
2007/09/03 20:02:34 ERROR> EventHandler.pm:142 EventHandler::parseData - Landmark=http://129.22.150.90,\
Client=134.79.117.29, 10 packets transmitted, 0 received, 100% packet loss, rtt min/avg/max = 0/0/0:
2007/09/03 20:09:09 ERROR> EventHandler.pm:115 EventHandler::parseData - Landmark=http://128.143.137.250,\
Client=134.79.117.29, request timed out: To 134.79.16.9 timed out
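
The records above follow a regular layout (timestamp, level, source file and line, subroutine, then Landmark, Client and a free-text message), so they can be split into fields with a single regular expression. The following is a minimal Python sketch of that idea, not the code used by the TULIP scripts. The names `RECORD_RE` and `parse_record` are hypothetical, and the sketch assumes that a trailing backslash at the end of a line marks a display continuation of one record.

```python
import re
from datetime import datetime

# Layout inferred from the sample records; field names are illustrative only.
RECORD_RE = re.compile(
    r"^(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+)> "
    r"(?P<file>[\w.]+):(?P<line>\d+) "
    r"(?P<sub>[\w:]+) - "
    r"Landmark=(?P<landmark>[^,]+),\s*"
    r"Client=(?P<client>[^,]+),\s*"
    r"(?P<message>.*)$"
)


def parse_record(text):
    """Parse one log record into a dict, or return None if it does not match."""
    # Assumption: a backslash followed by a newline is only a wrapped line.
    joined = text.replace("\\\n", "").strip()
    match = RECORD_RE.match(joined)
    if match is None:
        return None
    fields = match.groupdict()
    fields["ts"] = datetime.strptime(fields["ts"], "%Y/%m/%d %H:%M:%S")
    fields["line"] = int(fields["line"])
    return fields


if __name__ == "__main__":
    sample = (
        "2007/09/03 20:09:09 ERROR> EventHandler.pm:115 "
        "EventHandler::parseData - Landmark=http://128.143.137.250,\\\n"
        "Client=134.79.117.29, request timed out: To 134.79.16.9 timed out"
    )
    record = parse_record(sample)
    print(record["landmark"], record["client"], record["message"])
```

Lines that do not match the layout return None rather than raising an error, so a caller can count and inspect them separately.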

...

28cottrell@wanmon:~>bin/tulip-log-analyze.pl
===============Failure types by landmark =======================
           Landmark, Success,  100%_loss,connect_fail, not_sent, timeout, refused,  in_use,  no_name, transit_exc, Totals,
    143.225.229.236, 100.0%,  0.0%, 0.0%, 0.0%, 0.0%, 0.0%, 0.0%, 0.0%,        0.0%           1,
      149.48.230.20, 40.0%,  37.8%, 2.2%, 0.0%, 11.1%, 0.0%, 0.0%,        8.9%,        0.0%          45,...
itchy.cs.uga.edu_PL, 0.0%,  15.4%, 84.6%, 0.0%, 0.0%, 0.0%, 0.0%,        0.0%,        0.0%          26,
           Landmark, Success,  100%_loss,connect_fail, not_sent, timeout, refused, in_use,  no_name, transit_exc,  Totals,
             Totals,  2258, 422, 457, 11, 401, 0, 52, 287, 0, 3888
Wed Oct  3 14:58:38 2007 tulip-log-analyze.pl: took 38 seconds to analyze 4378 records for 323 requests.
Successful hosts=111, Failing hosts=108, PlanetLabs=128(100% success=26), SLACs=16(100% success=10)

As we review the logs, we will determine whether probing from some landmarks is reliable enough to warrant their use. The log is also analyzed to understand usage, look for abusers, etc.
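
To illustrate how per-landmark reliability could be judged from such records, the sketch below tallies outcomes by landmark and converts them to percentages, in the spirit of the tulip-log-analyze.pl report. It is an assumption-laden Python illustration, not the Perl script itself. The function names are hypothetical, only the three failure messages shown in the sample log are mapped to categories (anything else is counted as "unclassified"), and a message of None is assumed to stand for a successful request.

```python
from collections import Counter, defaultdict

# Substring -> category mapping, taken only from the three sample messages.
# The real analyzer recognizes more categories (not_sent, refused, in_use, ...).
PATTERNS = [
    ("100%_loss", "100% packet loss"),
    ("connect_fail", "failed to connect"),
    ("timeout", "timed out"),
]


def classify(message):
    """Return a failure category for a log message (hypothetical helper)."""
    for category, needle in PATTERNS:
        if needle in message:
            return category
    return "unclassified"


def outcome_rates(records):
    """Compute per-landmark outcome percentages.

    records: iterable of (landmark, message) pairs, where message is None
    for a successful request (an assumption of this sketch).
    Returns {landmark: {category: percent}} and {landmark: total requests}.
    """
    counts = defaultdict(Counter)
    for landmark, message in records:
        category = "Success" if message is None else classify(message)
        counts[landmark][category] += 1

    rates, totals = {}, {}
    for landmark, tally in counts.items():
        total = sum(tally.values())
        totals[landmark] = total
        rates[landmark] = {cat: 100.0 * n / total for cat, n in tally.items()}
    return rates, totals


if __name__ == "__main__":
    demo = [
        ("landmark-a", None),
        ("landmark-a", "failed to connect response code 500"),
        ("landmark-b", "request timed out: To 134.79.16.9 timed out"),
    ]
    rates, totals = outcome_rates(demo)
    for landmark in sorted(rates):
        print(landmark, totals[landmark], rates[landmark])
```

A landmark could then be flagged for review when its "Success" percentage falls below a threshold chosen by the operator; no particular threshold is implied by the source.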

Landmark Failures

The typical failure mechanisms for the target www.cern.ch, for requests made with timeouts of 2 and 10 seconds in the evening (PDT) of September 8th 2007, are shown in the table below. The multiple numbers in each cell are for different requests. Increasing the timeout from 2 to 10 seconds provides little, if any, help, so we use a timeout of 2 seconds.

...