...

Given the amount of data gathered worldwide and the constant flux in Internet hosts (hosts move, get replaced or are removed; links change in performance and routing; pings get blocked or rate limited), particularly for hosts in developing regions, it is critical to validate the gathered data and the configuration data on a regular basis. The daily workflow is described elsewhere. This document describes the daily validity checking.

...

Finding New Hosts to Monitor

For cases where the removed remote host is important (e.g. it is one of two hosts representing an entire country) and needs replacing with another host, we have developed a HostSearcher tool. It first interrogates Google for up to 1000 unique hosts in a selected country (using the Top Level Domain (TLD) feature). The tool then pings each host 10 times (by default) to ensure it responds, and saves the min-RTT. Finally it checks the filtered hosts with Geo IP Tool to obtain the TLD, latitude/longitude, country and city. The final list of hosts can be further filtered by whether they are really in the country, their loss rate, their min-RTT, the diversity of their locations within the country etc. Though not always accurate, this has been a great aid on numerous occasions. Unfortunately, for a few developing countries (e.g. Chad, Libya) even this method fails to produce suitable hosts to monitor.
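The final filtering step might look roughly like the sketch below. It is not the HostSearcher code. It only illustrates filtering candidates by country, loss rate and min-RTT, and assumes the ping results and the geolocation country code have already been collected. The names Candidate and filter_candidates and the threshold defaults are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    """One candidate host (hypothetical structure, not HostSearcher's)."""
    host: str
    country: str           # country code reported by the geolocation lookup
    sent: int              # number of pings sent (10 by default in the text)
    rtts_ms: List[float]   # RTTs of the replies actually received

    @property
    def loss_rate(self) -> float:
        # Fraction of pings that got no reply; treat "nothing sent" as total loss.
        if self.sent <= 0:
            return 1.0
        return (self.sent - len(self.rtts_ms)) / self.sent

    @property
    def min_rtt(self) -> Optional[float]:
        # Minimum RTT over the received replies, or None if the host never replied.
        return min(self.rtts_ms) if self.rtts_ms else None


def filter_candidates(
    candidates: List[Candidate],
    country: str,
    max_loss: float = 0.2,                   # assumed threshold, not from the source
    max_min_rtt_ms: Optional[float] = None,  # optional cap on min-RTT
) -> List[Candidate]:
    """Keep responsive hosts located in `country`, best (lowest loss, then RTT) first."""
    kept = []
    for c in candidates:
        if c.min_rtt is None:        # never responded to ping
            continue
        if c.country != country:     # geolocation says it is not really in the country
            continue
        if c.loss_rate > max_loss:
            continue
        if max_min_rtt_ms is not None and c.min_rtt > max_min_rtt_ms:
            continue
        kept.append(c)
    return sorted(kept, key=lambda c: (c.loss_rate, c.min_rtt))
```

Choosing hosts with diverse locations within the country is left out here, since it depends on the latitude/longitude data and on a judgment about what counts as diverse.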

...

After the script getdata.pl is run from a trscrontab on pinger@pinger.slac.stanford.edu to gather data from the monitoring hosts, the data is inspected by checkdata_gif.pl for invalid data such as missing tokens, inability to send 10 packets etc. In addition, a table is constructed showing the state of gathering the data for each monitoring node (no response from the monitor, no data from the monitor, partial data from the monitor, success). Besides showing the gathering status going back many months, the table also provides easy links to dynamically test the monitoring host for its ping reachability and its response to the web gather request. Emails are sent daily to the central administrators indicating which monitoring hosts were not successful. The typical follow-up after a few days is to email the contact(s) at the monitoring node to request help in fixing the problem. At any given time we are unable to gather data from about 10% of the monitoring nodes.
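The four gathering states and the daily list of unsuccessful monitoring hosts can be illustrated with a minimal sketch. This is not the logic of checkdata_gif.pl, which is a Perl script whose internals are not described here. The inputs (whether the monitor responded, how many valid records arrived and how many were expected) and all names are assumptions made for illustration.

```python
from enum import Enum
from typing import Dict, List


class GatherState(Enum):
    # The four states named in the text above.
    NO_RESPONSE = "no response from the monitor"
    NO_DATA = "no data from the monitor"
    PARTIAL = "partial data from the monitor"
    SUCCESS = "success"


def classify(responded: bool, valid_records: int, expected_records: int) -> GatherState:
    """Map one day's gather result for one monitor to a state (assumed criteria)."""
    if not responded:
        return GatherState.NO_RESPONSE
    if valid_records == 0:
        return GatherState.NO_DATA
    if valid_records < expected_records:
        return GatherState.PARTIAL
    return GatherState.SUCCESS


def unsuccessful(states: Dict[str, GatherState]) -> List[str]:
    """Monitoring hosts to list in the daily email: everything that is not SUCCESS."""
    return sorted(h for h, s in states.items() if s is not GatherState.SUCCESS)


def failure_fraction(states: Dict[str, GatherState]) -> float:
    """Fraction of monitoring hosts from which data could not be fully gathered."""
    return len(unsuccessful(states)) / len(states) if states else 0.0
```

Here a record counts as valid only if it has passed whatever checks are applied (for example no missing tokens), so partial data covers both missing and rejected records.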