PingER Daily Management

With the amount of data gathered worldwide and the constant flux in Internet hosts (hosts move, are replaced or removed, and links change in performance and routing), particularly for hosts in developing regions, it is critical to validate the gathered data and the configuration data on a regular basis. The daily workflow is described elsewhere. This document describes the daily validity checks.

Validating Meta Database

Many automatic validity checks and reports are made on PingER's meta database NODEDETAILS that describes the various hosts.
These checks are made on a daily basis and include validating that:

  • the latitudes and longitudes are not missing and are in a valid range;
  • the IP address for a node is in the valid range;
  • two or more hosts do not share the same IP address;
  • all nodes being pinged appear in the database, including hosts chosen by individual monitoring sites.
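
The checks listed above can be sketched roughly as follows. This is a minimal illustration, not the actual PingER code: the record layout (dictionaries with the keys `host`, `ip`, `lat` and `lon`) and the function names are assumptions made for the example, and the real NODEDETAILS schema may differ.

```python
import ipaddress
from collections import defaultdict


def validate_nodes(nodes):
    """Return a list of (host, problem) tuples for a list of node records.

    Each record is assumed (hypothetically) to be a dict with the keys
    'host', 'ip', 'lat' and 'lon'.
    """
    problems = []
    hosts_by_ip = defaultdict(list)
    for node in nodes:
        host = node["host"]
        lat, lon = node.get("lat"), node.get("lon")
        if lat is None or lon is None:
            problems.append((host, "missing latitude/longitude"))
        elif not (-90 <= lat <= 90 and -180 <= lon <= 180):
            problems.append((host, "latitude/longitude out of range"))
        try:
            # ip_address() raises ValueError for malformed addresses
            ip = ipaddress.ip_address(node.get("ip") or "")
            hosts_by_ip[str(ip)].append(host)
        except ValueError:
            problems.append((host, "invalid IP address"))
    # Report every host that shares an address with another host
    for ip, hosts in hosts_by_ip.items():
        if len(hosts) > 1:
            for host in hosts:
                problems.append((host, f"IP address {ip} shared with other hosts"))
    return problems


def missing_from_database(pinged_hosts, nodes):
    """Return the pinged hosts that have no record in the node list."""
    known = {node["host"] for node in nodes}
    return sorted(set(pinged_hosts) - known)
```

The addresses used to try this out should come from the documentation ranges (e.g. 192.0.2.0/24) rather than from real hosts.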

Validating Hosts

Daily checks are made on the validity of the remote hosts by pinging each in turn. This is performed by the ping-beacons.pl script that is run from a trscronjob on pinger@pinger.slac.stanford.edu. We check and report:

  • if the host is not resolvable by the name servers;
  • if the IP address extracted from the ping result does not match that in the NODEDETAILS database then we track the changes and check:
    • if the IP address change is small (e.g. only in the last field) and the change in min-RTT is also small, this usually indicates a host name being assigned to multiple hosts. Typical examples are clusters of hosts, and we identify such hosts. Currently about 4% of the remote hosts fall in this category;
    • if the change in min-RTT is large then we investigate further to see whether the host has moved. This is typically associated with a large address change. For example, the address of the host with the name 24-7online.co.za (in South Africa) recently changed from 196.3.165.25 to 78.31.108.62. Further investigation of the PingER archived data showed that on November 23, 2008 the min-RTT from SLAC abruptly changed from 330 ms to 144 ms. Geo IP Tools now shows it is in Reading, England. On average we see about 1 remote host per week making such changes. Typically it is a web server seeking better response time by using a proxy with a good Ethernet connection.
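
The triage described above could be expressed along these lines. It is only a sketch: the function name, the 20 ms threshold and the use of a shared /24 prefix as the meaning of a "small" address change are assumptions chosen for illustration, not values taken from ping-beacons.pl.

```python
import ipaddress


def classify_address_change(old_ip, new_ip, old_min_rtt_ms, new_min_rtt_ms,
                            rtt_threshold_ms=20.0):
    """Suggest how to follow up on an address change for a remote host.

    rtt_threshold_ms is a hypothetical cut-off between a "small" and a
    "large" change in min-RTT.
    """
    old = ipaddress.ip_address(old_ip)
    new = ipaddress.ip_address(new_ip)
    if old == new:
        return "unchanged"
    # "Small" address change: IPv4 addresses differing only in the last field
    small_ip_change = (old.version == 4 and new.version == 4
                       and old.packed[:3] == new.packed[:3])
    rtt_shift_ms = abs(new_min_rtt_ms - old_min_rtt_ms)
    if rtt_shift_ms > rtt_threshold_ms:
        return "possible move"      # investigate whether the host has moved
    if small_ip_change:
        return "likely cluster"     # one name assigned to multiple hosts
    return "review"                 # large address change, small RTT change


# The 24-7online.co.za example from the text: 330 ms dropped to 144 ms
print(classify_address_change("196.3.165.25", "78.31.108.62", 330.0, 144.0))
```

A fixed threshold is the simplest choice; a relative threshold (a fraction of the old min-RTT) may suit hosts with very different baseline RTTs better.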

Discovering and Replacing Faulty Hosts

Besides looking for and following up on address changes, we make daily measurements via the downsites.pl script to create a sortable, color-coded table of the Monitoring/Remote host pairs for which pings get no response. We categorize the reasons for no response as follows:

  • Remote hosts for which the DNS lookup fails;
  • Remote hosts that do respond to queries to one of a set of well-known ports (80,7,53,23,25,21,37,79 [Reference]). Such hosts are probably blocking pings.
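
A probe of the well-known ports listed above might look like the following sketch. It assumes TCP connections and a 2 second timeout, neither of which is stated in this document, and the function name is hypothetical; downsites.pl may work differently.

```python
import socket

# Port list taken from the text above
WELL_KNOWN_PORTS = (80, 7, 53, 23, 25, 21, 37, 79)


def first_open_port(host, ports=WELL_KNOWN_PORTS, timeout=2.0):
    """Return the first port that accepts a TCP connection, or None.

    A host that does not answer pings but accepts a connection on one of
    these ports is probably blocking pings rather than being down.
    """
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return port
        except OSError:
            # Covers refused connections, timeouts and DNS lookup failures
            continue
    return None
```

Note that a DNS lookup failure is swallowed here along with other errors; to report it as its own category, as the first bullet does, the name would need to be resolved separately before probing.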

In addition we note the following in the sortable table:

  • hosts that respond when pinged by IP address but not by name;
  • whether a host was down for at least a day in the last week:
    • if so, the number of days it was down is reported, as well as the latest date it was found up and whether it is responding now;
    • if the host was down for the entire week then we search back to see when it was last up, and report how long it has been down;
    • how many consecutive days a host has not responded to a ping.
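
The downtime figures noted above can be derived from a per-day record of whether a host responded. The sketch below assumes a dictionary mapping each date to True (responded) or False (no response); that layout and the function name are illustrative assumptions, not the format downsites.pl actually uses.

```python
from datetime import date, timedelta


def summarize_downtime(history, today):
    """Summarize recent downtime for one host.

    history maps a date to True (host responded) or False (no response).
    Days with no entry are treated as unknown and are not counted as down.
    """
    last_week = [today - timedelta(days=n) for n in range(7)]
    days_down = sum(1 for day in last_week if history.get(day) is False)

    # Latest date on which the host was found up, searching the whole history
    up_dates = [day for day, up in history.items() if up and day <= today]
    last_up = max(up_dates) if up_dates else None

    # Count back from today until the first day that is not a known failure
    consecutive_down = 0
    day = today
    while history.get(day) is False:
        consecutive_down += 1
        day -= timedelta(days=1)

    return {
        "days_down_last_week": days_down,
        "last_up": last_up,
        "responding_now": history.get(today) is True,
        "consecutive_days_down": consecutive_down,
    }
```

Because `last_up` is taken from the whole history rather than from the last week only, the same function covers the case where a host has been down for the entire week.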

We also create tables of min-RTT from monitoring hosts to remote hosts sorted by monitor host followed by remote host's region. This enables us to quickly discover hosts in a region with anomalous min-RTTs. Typically these are hosts with a TLD of a developing country in the region but where the host is actually in a developed country.

Remote hosts that are now deemed invalid are Disabled in NODEDETAILS. Examples are hosts that have moved and no longer represent a region, hosts that do not respond, and names to which multiple hosts in different locations respond (e.g. root name servers and distributed servers such as email servers). However, the data is not removed from the archive. To accommodate invalid data from such hosts, the analysis keeps a list of filters that remove invalid data between host pairs for selected periods. The newly developed PingER metrics motion chart tool also enables us to quickly spot hosts that have anomalous PingER metrics (e.g. min-RTT) and that lie outside the chart area occupied by similar hosts for some period of time.
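
One way to picture such a filter list is as a set of (monitor, remote, start, end) entries applied to the archived measurements at analysis time. The sketch below is an assumption about how this could be structured. The tuple layout, the "*" wildcard for "any monitor", the field names and the host names are all hypothetical and are not taken from the PingER analysis code.

```python
from datetime import date

# Hypothetical filter entries: (monitor, remote, first bad day, last bad day).
# "*" as the monitor means the filter applies to every monitoring host.
FILTERS = [
    ("*", "moved-host.example.org", date(2008, 11, 23), date(2008, 12, 31)),
    ("monitor.example.net", "flaky.example.org", date(2008, 10, 1), date(2008, 10, 7)),
]


def is_filtered(monitor, remote, day, filters=FILTERS):
    """Return True if a measurement for this pair and day should be dropped."""
    for f_monitor, f_remote, start, end in filters:
        if f_remote != remote:
            continue
        if f_monitor not in ("*", monitor):
            continue
        if start <= day <= end:
            return True
    return False


def apply_filters(measurements, filters=FILTERS):
    """Keep only measurements that no filter marks as invalid.

    Each measurement is assumed to be a dict with the keys
    'monitor', 'remote' and 'day'.
    """
    return [m for m in measurements
            if not is_filtered(m["monitor"], m["remote"], m["day"], filters)]
```

Filtering at analysis time leaves the archive untouched, so a filter can be corrected or withdrawn later without any loss of data.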

Finding New Hosts to Monitor

For cases where the removed remote host is important (e.g. it is one of two hosts representing an entire country) and needs replacing with another host, we have developed a HostSearcher tool. It first interrogates Google for up to 1000 unique hosts in a selected country (using the Top Level Domain (TLD) feature). The tool then pings each host 10 times (by default) to ensure it responds and saves the min-RTT. Finally it checks the filtered hosts with Geo IP Tool to obtain the TLD, latitude/longitude, country and city. The final list of hosts can be further filtered by whether they are really in the country, their loss rate, their min-RTT, the diversity of their locations within the country, etc. Though not always accurate, this has been a great aid on numerous occasions. Unfortunately, for a few developing countries (e.g. Chad, Libya) even this method fails to produce suitable hosts to monitor.
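
The final filtering step could be sketched as below. This is not the HostSearcher code: the candidate record layout (`host`, `geo_country`, `loss_pct`, `min_rtt_ms`), the default thresholds and the optional min-RTT bounds are assumptions chosen to illustrate the idea.

```python
def filter_candidates(candidates, country, max_loss_pct=10.0,
                      min_rtt_ms=None, max_rtt_ms=None):
    """Return candidate hosts that look usable for the given country.

    Each candidate is assumed to be a dict with the hypothetical keys
    'host', 'geo_country', 'loss_pct' and 'min_rtt_ms'.
    """
    kept = []
    for cand in candidates:
        if cand["geo_country"] != country:
            continue  # geolocation says the host is not really in the country
        if cand["loss_pct"] > max_loss_pct:
            continue  # too lossy to be a useful target
        if min_rtt_ms is not None and cand["min_rtt_ms"] < min_rtt_ms:
            continue  # implausibly low min-RTT for a host in this region
        if max_rtt_ms is not None and cand["min_rtt_ms"] > max_rtt_ms:
            continue
        kept.append(cand)
    # Prefer low-loss hosts, then low min-RTT
    return sorted(kept, key=lambda c: (c["loss_pct"], c["min_rtt_ms"]))
```

Selecting hosts in diverse locations within the country is left out of this sketch; it would need the latitude/longitude or city of each candidate.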

Validating Gathering of Data

After the script getdata.pl is run from a trscrontab on pinger@pinger.slac.stanford.edu to gather data from the monitoring hosts, the data is inspected for invalid data such as missing tokens, inability to send 10 packets etc. In addition a table is constructed showing the state (no response from the monitor, no data from monitor, partial data from the monitor, success) of gathering the data for each monitor node. Emails are sent to the administrator indicating which monitors were not successful. The typical follow up after a few days is to email the contact(s) at the monitor node to request help in fixing the problem.
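
The four gathering states named above could be assigned as in the following sketch. The inputs (whether the monitor responded, and how many records were received versus expected) and the function names are assumptions for illustration; getdata.pl may decide the state differently.

```python
def gather_state(responded, records_received, records_expected):
    """Classify the result of gathering data from one monitor node."""
    if not responded:
        return "no response from the monitor"
    if records_received == 0:
        return "no data from monitor"
    if records_received < records_expected:
        return "partial data from the monitor"
    return "success"


def monitors_to_report(results):
    """Return the monitors to include in the email to the administrator.

    results maps a monitor name to a tuple
    (responded, records_received, records_expected).
    """
    return sorted(name for name, result in results.items()
                  if gather_state(*result) != "success")
```

The list returned by `monitors_to_report` corresponds to the monitors that the administrator email flags as unsuccessful.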
