...

  1. The monitoring host collects data on its local disk
    • The data for most PingER v2 sites is stored in /usr/local/share/pinger/data
    • For the SLAC monitoring host, the data is stored on NFS in /nfs/slac/g/net/pinger/pinger_mon_data. This is the "raw" data just showing the ping times from SLAC to the rest of the world. It is retrieved in the same way as from any other monitoring host, via http://www.slac.stanford.edu/cgi-wrap/ping_data.pl.
  2. A trscron job run as the pinger user on the pinger host runs getdata.pl, which contacts a web server at each monitoring site and requests that site's data via the ping_data.pl script (see the fetch sketch after this list).
    • This data is stored on NFS in /nfs/slac/g/net/pinger/pingerdata/hep/data/<hostname>/
  3. A trscron job run as the pinger user on the pinger host runs checkdata_gif.pl, which validates that the data has been collected and sends an email reporting on missing data. This is scheduled for ~2 hours after getdata.pl is run. checkdata_gif.pl uses checkdata.pl, which looks at each of the data files for the month in the NFS path mentioned above (see the validation sketch after this list).
  4. Additional trscron jobs submit LSF batch jobs for the analyze_* scripts. Each script is run four times: for 100- and 1000-byte pings, and for by-site and by-node aggregation. A report is created for each metric, e.g. packet_loss, average_rtt, throughput, etc. These files are created in /nfs/slac/g/net/pinger/pingerreports/<metric name>/ and are named like <metric name>-<packet size>-by-<site|node>-<time period>.txt.gz (see the path-naming sketch after this list).
    1. analyze-hourly.pl runs first. It takes the data gathered from the monitoring sites in /nfs/slac/g/net/pinger/pingerdata/hep/data and creates a report for a whole day with one data point for each hour (the average of the two half-hourly readings). The time period for the file name is yyyy-mm-dd. The output file is of the form: /nfs/slac/g/net/pinger/pingerreports/hep/minimum_rtt/minimum_rtt-100-by-node-2006-09-28.txt.gz
    2. The remaining scripts all depend on the results of analyze-hourly.pl and can be run any time after it has completed. They are scheduled as additional trscron jobs. They all return the average of the hourly results for the specified period.
      1. analyze-daily.pl (http://www-dev.slac.stanford.edu/cgi-wrap/scriptdoc.pl?name=analyze-daily.pl) by default creates a report covering a whole month with one data point for each day. The time period for the file name is yyyy-mm. Output files are of the form: /nfs/slac/g/net/pinger/pingerreports/hep/data/<metric>/<metric>-<size>-by-<site|node>-YYYY-MM.txt.gz
      2. analyze-daily.pl with the --date 60days option creates a report covering the last 60 days with a data point for each day. The time period for the file name is 60days.
      3. analyze-daily.pl with the --date 120days option creates a report covering the last 120 days with a data point for each day. The time period for the file name is 120days.
      4. analyze-monthly.pl creates a report covering the last 24 months with one data point for each month. The time period is not included in the file name, and the hyphen preceding it is dropped, i.e. <metric name>-<packet size>-by-<site|node>.txt.gz. The output file is of the form: /nfs/slac/g/net/pinger/pingerreports/hep/packet_loss/packet_loss-100-by-node.txt.gz
    3. Once per month, on the first day of the month, a set of analysis scripts that work on longer periods is run.
      1. analyze-allmonths.pl creates a report covering all months for which there is data with one data point for each month. The time period used in the report file name is allmonths.
      2. analyze-allyears.pl creates a report covering all years for which there is data with one data point for each year. The time period used in the report file name is allyears. The output file is of the form: /nfs/slac/g/net/pinger/pingerreports/hep/average_rtt/average_rtt-100-by-node-2010-02.txt.gz
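
As a rough illustration of step 2, here is a minimal Perl sketch of the fetch loop. It is not the real getdata.pl: the host list, the CGI path on remote sites, and the output file name are all assumptions made for illustration; only the SLAC URL and the NFS data root come from the text above.

    #!/usr/bin/env perl
    # Hedged sketch of the getdata.pl fetch step -- not the real script.
    use strict;
    use warnings;
    use LWP::Simple qw(getstore);
    use HTTP::Status qw(is_success);
    use File::Path qw(make_path);

    my $data_root = '/nfs/slac/g/net/pinger/pingerdata/hep/data';
    my @hosts = ('www.slac.stanford.edu');    # hypothetical list of monitoring hosts

    for my $host (@hosts) {
        my $url = "http://$host/cgi-wrap/ping_data.pl";    # CGI path assumed for non-SLAC sites
        my $dir = "$data_root/$host";
        make_path($dir);
        my $rc = getstore($url, "$dir/ping_data.txt");     # output file name is hypothetical
        warn "failed to fetch $url (HTTP $rc)\n" unless is_success($rc);
    }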
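
In the same spirit, step 3's validation can be sketched as a scan over the per-host data directories. This is an illustrative guess at what checkdata.pl checks, assuming a hypothetical one-file-per-month naming; the real script's file layout may differ.

    use strict;
    use warnings;
    use POSIX qw(strftime);

    my $data_root = '/nfs/slac/g/net/pinger/pingerdata/hep/data';
    my $month     = strftime('%Y-%m', localtime);    # current month, e.g. 2006-09

    my @missing;
    opendir(my $dh, $data_root) or die "cannot open $data_root: $!";
    for my $host (grep { !/^\./ && -d "$data_root/$_" } readdir $dh) {
        my $file = "$data_root/$host/$month.txt";    # hypothetical monthly file name
        push @missing, $host unless -s $file;        # flag missing or empty files
    }
    closedir $dh;

    # checkdata_gif.pl would email this report; the sketch just prints it.
    print "hosts missing data for $month:\n", map { "  $_\n" } @missing if @missing;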
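
Finally, the report-naming convention of step 4 can be captured in a small helper. This is a sketch of the convention as described above, not code from the analyze scripts; it assumes the pingerreports/hep/<metric>/ layout shown in the examples.

    use strict;
    use warnings;

    # <metric>-<size>-by-<site|node>[-<period>].txt.gz; the period is omitted
    # for analyze-monthly.pl, which also drops the preceding hyphen.
    sub report_path {
        my ($metric, $size, $agg, $period) = @_;
        my $name = "$metric-$size-by-$agg";
        $name .= "-$period" if defined $period;
        return "/nfs/slac/g/net/pinger/pingerreports/hep/$metric/$name.txt.gz";
    }

    print report_path('minimum_rtt', 100, 'node', '2006-09-28'), "\n";
    # -> .../hep/minimum_rtt/minimum_rtt-100-by-node-2006-09-28.txt.gz
    print report_path('packet_loss', 100, 'node'), "\n";
    # -> .../hep/packet_loss/packet_loss-100-by-node.txt.gz (analyze-monthly form)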

The exact trscrontab file used by the pinger user is available (http://www-iepm.slac.stanford.edu/pinger/crontab-slaconly.txt).
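
For orientation, entries in that file follow ordinary crontab syntax; the lines below are a hypothetical sketch of the scheduling described in steps 2 and 3, with times and script paths invented rather than taken from the actual file.

    # Hypothetical trscrontab-style entries: fetch at 01:00,
    # then check the collected data ~2 hours later.
    0 1 * * * /path/to/getdata.pl
    0 3 * * * /path/to/checkdata_gif.pl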

Some information on what to do when (re-)processing missing data is available as well (http://www-iepm.slac.stanford.edu/pinger/tools/restoredata.html).