Background
This page describes the proceeduring in converting over 10 years of PingER data into a format which is more suitable for analysis and presentation using the perfSONAR PingER MA formats.
PingER Data File Format
All of the old pinger data is stored under:
pinger:/nfs/slac/g/net/pinger/pingerdata/
Each file contains the same data output format; each line represents one probe (of pings) with data stored in the following format:
www.nbi.dk 130.225.212.55 lns62.lns.cornell.edu 128.84.47.62 100 1041466552 10 10 143 144 148 0 1 2 3 4 5 6 7 8 9 143 144 143 143 143 143 148 145 144 143
Field |
Data (above) |
Description |
---|---|---|
0 |
www.nbi.dk |
dns name of source of ping |
1 |
130.225.212.55 |
ip addresss of source of ping |
2 |
lns62.lns.cornell.edu |
destination dns name of ping |
3 |
128.84.47.62 |
destination ip address of ping |
4 |
100 |
ICMP packet size of ping |
5 |
1041466552 |
Epoch time stamp of test (start or end?) |
6 |
10 |
Number of pings sent |
7 |
10 |
Number of pings recieved |
8 |
143 |
Minimum Rtt of all pings |
9 |
144 |
Mean Rtt of all pings |
10 |
148 |
Maximum Rtt of all pings |
11-20 |
0 1 2 3 4 5 6 7 8 9 |
Sequence number in the recieving order of pings |
21-30 |
143 144 143 143 143 143 148 145 144 143 |
Individual ping singleton values |
The complexity in the data format lies in the sequence number and ping singleton values which must be derived from the number of pings recieved.
As some of the PingER files were scattered, they were unified into the following location (on pinger.slac.stanford.edu):
/nfs/slac/g/net/pinger/pingerdata/hep/data/<YYYY>/<source>/ping-<YYYY>-<MM>-<DD>.txt.gz
SQL Storage Format
Three tables are required for the storage of the data into SQL tables.
Table holds information about a single ping node.
CREATE TABLE host ( ip_name varchar(52) NOT NULL, ip_number varchar(64) NOT NULL, comments text, PRIMARY KEY (ip_name, ip_number));
The metadata table holds constants of the test settings, such as the source and destination
CREATE TABLE metaData ( metaID BIGINT NOT NULL AUTO_INCREMENT, ip_name_src varchar(52), ip_name_dst varchar(52), packetSize smallint NOT NULL, count smallint NOT NULL, protocol varchar(10) NOT NULL, period smallint, deadline smallint, ttl smallint, INDEX (ip_name_src, ip_name_dst, packetSize, count), FOREIGN KEY (ip_name_src) references ipaddr_intervals (ip_name), FOREIGN KEY (ip_name_dst) references ipaddr_intervals (ip_name), PRIMARY KEY (metaID));
# pinger data table, some fields have names differnt from XML schema since there where # inherited from the current pinger data table # its named data_yyyyMM to separate from old format - pairs_yyyyMM CREATE TABLE data_200707 ( metaID BIGINT NOT NULL, minRtt float, meanRtt float, medianRtt float, maxRtt float, timestamp bigint(12) NOT NULL, minIpd float, meanIpd float, maxIpd float, duplicates tinyint(1), outOfOrder tinyint(1), clp float, iqrIpd float, lossPercent float, INDEX (meanRtt, medianRtt, lossPercent, meanIpd, clp), FOREIGN KEY (metaID) references metaData (metaID), PRIMARY KEY (metaID, timestamp));
Creation
A database was set up on a host for testing and initial loading purposed. all relevant permmisisons were granted to allow the pinger.slac.stanford.edu host write access to this database.
mysql> create database pinger; Query OK, 1 row affected (0.00 sec)
mysql -D pinger -p pinger < /u/sf/ytl/Work/perfSONAR/perfSONAR-PS/trunk/perfSONAR-PS/MA/PingER/create_pingerMA_MySQL.sql
Script
A script, pinger-ma_conversion.pl
was created (in PingER svn bin
) which reads in the raw output from these log files and inserts them into a database.
$ perl pinger-ma_conversion.pl --help Parses PingER flat file data into the specified databaseUsage: cat <PingER data file> | pinger-ma_conversion.pl [--db Options: --host=s hostname of database location --port=s port number for database --db=s database name --user=s username for database --password=s password for database --verbose provide statistical data of inserts --help this help message
zcat /nfs/slac/g/net/pinger/pingerdata/1997/ping-1997-07.txt.gz | perl pinger-ma_conversion.pl --host=<hostname> --port=<port> --user=<dbuser> --password=<dbpassword> --db=<dbname>
Conversion
The log configuratoin file was changed to the following for the conversion process:
log4perl.logger = INFO, A1 log4perl.appender.A1=Log::Dispatch::File log4perl.appender.A1.filename=/tmp/pinger-ma_conversion.log log4perl.appender.A1.mode=append log4perl.appender.A1.layout=Log::Log4perl::Layout::PatternLayout log4perl.appender.A1.layout.ConversionPattern=%d %m%n
A file containing a list of all of the txt.gz
files was created and a simple bash command was created to run through the list:
for f in `cat processing_list`; do echo "Analysing $f"; echo $f >> processed_data_files; zcat $f | perl pinger-ma_conversion.pl --host=<host> --port=<port> --user=<user> --password=<password> --db=<dbname>; done