Analysis and Clustering of PingER Network Data. The first 6 sentences describing the origin of PingER are excellent. Then the inaccuracies creep in. The data was (and still is) stored as space-separated flat files with a sophisticated method to enable fast searching for data. There was a Pakistani attempt to use a relational database, but that was unsuccessful (too slow) and did not scale well. N.B. I am not sure whether ref [2] has much to do with RDF, and it has nothing to do with PingER. RDF never went into production. It did not replace the flat files. The RDF triples are also not the format currently in use. The RDF approach only worked on a subset of the data and was an attempted proof of concept. The objective reported in the paper is accurate.

The PingER repository is closer to two decades old (1998-2016). Why the paper reports what we had in 1999 rather than, say, December 2014 is not understood. At the bottom of the page, under D. Clustering, there is something missing.

Under Data Cleaning you should also estimate the jitter, e.g. the standard deviation of the RTTs for the N received packets, or the Inter-Packet Delay Variation (the latter is better).
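As a minimal sketch of the two jitter estimates I have in mind: the function names, the input layout (a list of RTTs in milliseconds for the received packets of one measurement, in send order) and the choice of the mean absolute difference as the summary statistic are my assumptions for illustration, not something taken from the paper or from the PingER code. Other summaries of the consecutive differences (for example an inter-quartile range) would serve equally well.

```python
import statistics

def rtt_stddev(rtts_ms):
    """Sample standard deviation of the RTTs of the received packets.

    Returns None when fewer than two packets were received.
    """
    if len(rtts_ms) < 2:
        return None
    return statistics.stdev(rtts_ms)

def ipdv_mean_abs(rtts_ms):
    """Inter-packet delay variation, summarised as the mean absolute
    difference between the RTTs of consecutive received packets.

    Assumption: rtts_ms holds only received packets, in send order.
    Returns None when fewer than two packets were received.
    """
    if len(rtts_ms) < 2:
        return None
    # Difference between each RTT and the one before it.
    deltas = [later - earlier for earlier, later in zip(rtts_ms, rtts_ms[1:])]
    return statistics.fmean([abs(d) for d in deltas])

# Hypothetical RTTs (ms) for one ping set with a single delayed packet.
sample_rtts = [100.0, 102.0, 101.0, 140.0, 103.0]
print(rtt_stddev(sample_rtts))
print(ipdv_mean_abs(sample_rtts))
```

If packets are lost within a set, the differences should strictly be taken only between packets with adjacent sequence numbers; the sketch above ignores that and simply uses whatever was received.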

In D. Clustering, item 1:

The data obtained from SLAC for all .pk: was that limited to cases where both the monitoring and the remote sites were in .pk, or was the monitoring site in .pk and the remote sites all the remotes those monitors monitored?

When you show Fig. 7 for all monitoring sites, I was wondering why the monitoring site does not show up. There is data, see for example

The clustering results are interesting, especially since they identify outliers.

There are some typos:

C Data Mining Techniques


A Experimental Setup

B. data Cleaning