SLAC's involvement in improving perfSONAR network monitoring is three-fold:

PingER Services

Ping provides a simple yet effective metric for network performance. It can provide both latency and loss measurements between two sites. When tests are performed in a mesh between numerous sites, much information about the network can be gleaned from the cross-correlation and analysis of such data. More importantly, it typically requires very little network support and is easily deployable.
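As a minimal sketch of how loss and latency figures fall out of raw ping results, one test between two sites could be summarized as follows. This is not the PingER implementation; the function name `summarize_pings` and the input format (a list of round-trip times in milliseconds, with `None` for a lost probe) are assumptions made for illustration.

```python
from statistics import mean


def summarize_pings(rtts_ms):
    """Summarize one ping test.

    rtts_ms: list of round-trip times in milliseconds; None marks a lost probe.
    (Hypothetical input format, chosen for this example only.)
    """
    sent = len(rtts_ms)
    received = [r for r in rtts_ms if r is not None]
    # Loss is the share of probes that never came back.
    loss_pct = 100.0 * (sent - len(received)) / sent if sent else 0.0
    if not received:
        # No replies at all: latency statistics are undefined.
        return {"sent": sent, "loss_pct": loss_pct,
                "min_ms": None, "avg_ms": None, "max_ms": None}
    return {
        "sent": sent,
        "loss_pct": loss_pct,
        "min_ms": min(received),
        "avg_ms": mean(received),
        "max_ms": max(received),
    }


if __name__ == "__main__":
    print(summarize_pings([10.0, 12.0, None, 14.0]))
```

Running the same summary for every pair of sites in a mesh yields the per-path loss and latency values that can then be compared across paths.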

With this in mind, an effort was undertaken to upgrade the PingER 1 tool set to support the perfSONAR framework. This involved not only re-architecting PingER to better support metrics such as jitter, but also providing dynamic discovery of PingER sites and converting SQL database data for representation in perfSONAR XML.

PingER for perfSONAR is now a standard part of the perfSONAR-PS distribution, with easily installable packages available as tar.gz archives, RPMs and CPAN modules 2. The introduction of a 'Live CD' 3 that incorporates both the perfSONAR and PingER toolsets has also brought a level of ease of deployment, and thus outreach, that was not previously possible.

Topology

As perfSONAR aims to provide network monitoring on a global scale, it was felt that a visual presentation of the distribution and inter-relation of the various perfSONAR deployments around the world would provide impact and momentum for the future growth and deployment of perfSONAR.

Owing to the open, community-driven nature of perfSONAR and its use of XML for communication, a Google Maps 'mashup' 4 was created that provides interactive examination of all perfSONAR services. Using the Global Lookup Service to provide a hierarchical depiction of the numerous perfSONAR services deployed worldwide, this mashup provides an intuitive and visually appealing method by which one can explore and display Topology, PingER, BWCTL, and OWAMP performance metrics. It has been used very successfully in several public demonstrations of perfSONAR.

This very simple yet effective tool provided a platform from which various parts of the perfSONAR framework could be thoroughly tested and vetted. This included validation of data collection, registration of services with the Lookup Services, consistency of data representation, and the ability to interrogate service metadata and performance data.

A start was made on a 3D display of a rotating globe that plots the perfSONAR nodes and offers drill-down access to more data. The project was delayed by the need to set up a PingER MA at SLAC and to develop a 3D mathematical model to draw lines and rotate them with the globe.
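To illustrate the kind of 3D model involved, the sketch below places a node on a unit sphere, rotates it about the globe's polar axis, and interpolates a great-circle arc between two nodes. It is an assumption-laden illustration, not the model developed for the project: the function names are hypothetical, the globe is treated as a perfect unit sphere, and antipodal endpoints are rejected because their arc is not unique.

```python
from math import radians, sin, cos, acos


def to_xyz(lat_deg, lon_deg):
    """Convert latitude/longitude in degrees to a point on the unit sphere."""
    lat, lon = radians(lat_deg), radians(lon_deg)
    return (cos(lat) * cos(lon), cos(lat) * sin(lon), sin(lat))


def rotate_about_axis(p, angle_deg):
    """Rotate point p about the polar (z) axis by angle_deg degrees."""
    a = radians(angle_deg)
    x, y, z = p
    return (x * cos(a) - y * sin(a), x * sin(a) + y * cos(a), z)


def great_circle_arc(p, q, steps=16):
    """Return steps+1 points along the great-circle arc from p to q."""
    dot = max(-1.0, min(1.0, sum(a * b for a, b in zip(p, q))))
    omega = acos(dot)  # angle between the two unit vectors
    if omega < 1e-12:
        return [p] * (steps + 1)  # endpoints coincide
    if sin(omega) < 1e-12:
        raise ValueError("antipodal points: arc is not unique")
    points = []
    for i in range(steps + 1):
        t = i / steps
        # Spherical linear interpolation keeps every point on the sphere.
        w1 = sin((1 - t) * omega) / sin(omega)
        w2 = sin(t * omega) / sin(omega)
        points.append(tuple(w1 * a + w2 * b for a, b in zip(p, q)))
    return points
```

Rotating the globe then amounts to applying `rotate_about_axis` with the same angle to every node and every arc point before projecting to the screen, so the lines stay attached to their endpoints.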

Diagnosis

One of the more important applications of perfSONAR will be the automatic analysis and diagnosis of performance changes. Given the vast volume of distributed (yet federated) data that will be available from a global deployment, it becomes important that, for example, a network engineer is alerted to problems on the network before users complain of network issues.

A step towards this is the low-level analysis of performance data for anomaly detection. Applied to, say, throughput between two sites, measurements can be fitted against various distributions and outliers flagged as potential performance alerts. We have prepared a paper that is being submitted to IMC 09, with application to generic data as a proof of concept. The first draft of the technical report is available online 5. Note that the paper submitted to IMC proposes an approach with significant revisions to the methods suggested in the first draft, which was compiled in July '08.
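The general idea of fitting a distribution and flagging outliers can be sketched as below. This is deliberately the simplest possible instance (a normal distribution fitted to a baseline window, with a z-score threshold) and is not the decision-theoretic method of the technical report or the IMC submission; the function name, the threshold of 3.0, and the sample throughput values are assumptions for illustration.

```python
from statistics import mean, stdev


def flag_outliers(baseline, new_values, z_threshold=3.0):
    """Return the new measurements that deviate strongly from the baseline.

    baseline: at least two past measurements, e.g. throughput in Mbit/s.
    new_values: fresh measurements to check against the fitted baseline.
    z_threshold: assumed cut-off, in standard deviations.
    """
    mu = mean(baseline)
    sigma = stdev(baseline)  # sample standard deviation
    if sigma == 0:
        # A constant baseline: any different value counts as a deviation.
        return [v for v in new_values if v != mu]
    return [v for v in new_values if abs(v - mu) / sigma > z_threshold]


if __name__ == "__main__":
    history = [100, 102, 98, 101, 99]   # made-up throughput samples
    print(flag_outliers(history, [100, 60, 103]))
```

In practice the choice of distribution and threshold would depend on the metric, since throughput and latency data are often far from normally distributed.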

In addition, collecting and cross-correlating the various network metrics that perfSONAR provides, in order to identify a performance 'culprit', can offer a systematic way of wading through the vast amounts of data. We first experimented with Multi-route Event Detection using PCA 6; however, the results were disappointing. We then looked at mining the detected events for correlations, in order to reduce false positives and negatives and provide improved diagnostics 7. Though it had some successes, the work should be classified as only a preliminary investigation of the overall problem.
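As a small illustration of what cross-correlating two metrics means in practice, the sketch below computes the Pearson correlation coefficient between two equally sampled series. It is not the PCA or event-mining approach referenced above; the function name and the example series are assumptions for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equally long metric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no correlation information
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    latency_ms = [20, 22, 35, 60, 21]       # made-up samples
    throughput_mbps = [95, 93, 70, 40, 94]  # made-up samples
    print(pearson(latency_ms, throughput_mbps))
```

A strongly negative value for a latency/throughput pair on the same path, for example, would suggest that the two changes share a cause, which is the kind of hint a diagnosis tool could use to narrow down a 'culprit'.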

1 The PingER Project: Active Internet Performance Monitoring for the HENP Community, IEEE Communications Magazine on Network Traffic Measurements and Experiments.
2 Internet2 Software
3 Performance Node: Live CD
4 Google Maps Mashup http://psvis0.internet2.edu:8008/
5 Technical Report: A Decision-Theoretic Approach to Detect Anomalies in Internet Paths by Fida Hussain, Umar Kalim, Noman Latif, Syed Ali Khayam. A revised approach has been proposed and is being submitted to IMC 09.
6 Multi-route Event Detection Using PCA, Adnan Iqbal, unpublished, available at http://confluence.slac.stanford.edu/display/IEPM/Multi-route+Event+Detection+Using+PCA.
7 Event Diagnosis, Adnan Iqbal, accepted for ICIMP 2007 (San Jose) but withdrawn; see also http://confluence.slac.stanford.edu/display/IEPM/Event+Diagnosis and the presentations referenced therein.
