Minutes for SLAC SEECS Meeting February 1st 2012

General

Ghulam has an SID and an account GNCHISTI@SLAC.STANFORD.EDU

An email has been sent to the contacts of the monitoring hosts telling them about PingER Explorer, the ICFA report, motion charts, the traceroute archive, the PingER video and the PingER Map.

IPv6 - Anjum and Ghulam (this has been de-prioritized until the new PingER database is working)

The IPv6 machine is working fine. Ghulam installed pinger2 on it and tried to collect data, but it was unable to resolve the IPv6 address. It seems we need to make some changes to the whole architecture to make it work with IPv6. Les looked at pinger2.pl: it verifies that the address is an IPv4 address of 4 octets. He suggested using valid_ip.pl to verify both IPv4 and IPv6 addresses. Ghulam, are there other parts of pinger2.pl that need modifying? It can already access ping6. In addition, one will need a copy of pinger.xml with IPv6 hosts and their addresses.
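
To illustrate the kind of check suggested above, here is a minimal sketch in Python (not the Perl of pinger2.pl, and not the contents of valid_ip.pl, which are not shown in these minutes). It accepts both IPv4 and IPv6 literals using the standard ipaddress module. The function name ip_version and the sample addresses (taken from the documentation ranges) are assumptions made for illustration.

```python
import ipaddress


def ip_version(address):
    """Return 4 or 6 for a valid IP literal, or None if it is not one."""
    try:
        return ipaddress.ip_address(address).version
    except ValueError:
        # Not a well-formed IPv4 or IPv6 literal (e.g. a hostname).
        return None


if __name__ == "__main__":
    # Hypothetical sample inputs, not real PingER hosts.
    for candidate in ("192.0.2.10", "2001:db8::1", "256.1.1.1", "host.example.org"):
        print(candidate, ip_version(candidate))
```

A check like this replaces a test that only accepts four dotted octets, so the same code path can handle entries of either address family.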

A possible project would be to make traceroute.pl work on a dual-stack IPv6 host (say, to traceroute to ipv6.google.com). This will need a look at gethostbyname etc. Is there any interest?
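
As a rough illustration of what replacing gethostbyname involves, the sketch below uses Python's socket.getaddrinfo, which returns both IPv4 and IPv6 addresses (Python's socket.gethostbyname does not support IPv6). This is a Python sketch, not a change to traceroute.pl; the function name resolve_by_family is an assumption.

```python
import socket


def resolve_by_family(host):
    """Group the addresses that getaddrinfo returns for host by family."""
    families = {socket.AF_INET: "ipv4", socket.AF_INET6: "ipv6"}
    result = {"ipv4": [], "ipv6": []}
    # SOCK_DGRAM limits the results to one entry per address.
    for family, _type, _proto, _canon, sockaddr in socket.getaddrinfo(
            host, None, type=socket.SOCK_DGRAM):
        key = families.get(family)
        if key is not None and sockaddr[0] not in result[key]:
            result[key].append(sockaddr[0])
    return result


if __name__ == "__main__":
    # A numeric literal needs no DNS lookup, so this runs offline.
    print(resolve_by_family("127.0.0.1"))
```

On a dual-stack host, calling resolve_by_family("ipv6.google.com") would be expected to fill the "ipv6" list, after which the caller can choose the matching traceroute variant for each family.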

Another is to make pingtable.pl and getdata.pl IPv6 capable; again this could use valid_ip.pl. Since Ghulam is working on these, it would be good for him to add this.

pinger2.pl

There is a bug in pinger2.pl that results in the <BeaconList> being empty. Les has made a fix that may help. Sadia has created a new tar file /afs/slac/g/www/www-iepm/pinger/tools/pingER-2.0.3.tar.gz. It has been sent to Joun, who is currently installing pinger2.pl on various machines. He said that it is working fine right now but that he needs more days to be sure. Is it working? Can we update the iepm.slac.stanford.edu/pinger/tools/pingER-2.0.2.tar.gz file and the documentation?

Joun says he has installed pinger2.pl on more than 15 nodes, a few of which are PoP nodes. He has monitored the effect of pinger2.pl on these nodes and stated:

 "Pinger2.pl is working fine till 31 Jan. As the month changes, BeaconList is empty in some nodes, some are working fine and in 4 or 5 nodes pinger2 script was not run automatically. Once  I run script manually after that it runs automatically and working fine."

ICFA Report - Les and Amber

It was released to Harvey on Jan 20th. The draft is at: http://www.slac.stanford.edu/xorg/icfa/icfa-net-paper-jan12/ICFA-report-12/ICFA-report-2012-final2.docx. The master copy will be at http://www.slac.stanford.edu/xorg/icfa/icfa-net-paper-jan12/report-jan12.docx

HEC Report - Anjum and Amber

The report has been sent to Anjum. It is also placed in Case Studies at https://confluence.slac.stanford.edu/display/IEPM/PERN+Six+Monthly+Report+%28June+2011-+November+2011%29

Pakistani Hosts

  1. There is a discrepancy between the PERN monitors reported by Joun as not working and those SLAC is able to gather data from (reported in checkdata here). It was reported a couple of weeks ago. Kashif and Joun are looking at it. For example, we are unable to gather data from pinger.pern.edu.pk; there is something strange with wget not working (see here). This may be tied to some security improvements installed on Dec 14th. Les confirmed there are still problems affecting 4 PERN PoP nodes and 12 other Pakistani monitors. See "Hosts pingable from some regions but not others" for more details.
    1. Currently pinger.pern.edu.pk is not pingable so we are unable to test further. Progress
    2. All nodes deployed at PERN PoPs are being upgraded to enhance security. Anjum had discussed with Umar as to what security features should be implemented on these nodes. 
  2. In addition some PERN PoP monitoring hosts (pingerisl-fjwu.pern.edu.pk, pingerisl-qau.pern.edu.pk, nuisb.seecs.edu.pk, nukhimain.seecs.edu.pk and pinger.pern.edu.pk) are only pingable from Pakistan and Jordan. This may or may not be related. This needs to be resolved. Using reflector.pl to ping nukhimain.seecs.edu.pk and also www.cern.ch, the number of landmarks able to ping nukhimain was 26, while for CERN it was 106. It appears only landmarks in Pakistan, Algeria, India, Brazil, and Russia can ping nukhimain. Kashif and Joun are looking at it. Progress
  3. Joun reports that pingermtn.pern.edu.pk is taken down each day due to lack of power just when we go to gather data from it. We have a backup gather at 9:00am each day to get around this. Is this likely to get fixed or do we just accept that this is the way the world is? Anjum
  4. Kashif reports we need a system for Air University because they have a shortage of systems. Anjum
    The status of hosts as of 30th Jan 2012 is available here.

Responsible person: Joun Muhammad

HEC is sending out letters to the contact persons who are non-cooperative, after which the nodes should be more reliable. The nodes should look much more stable in 2-3 weeks.

Joun is looking at archiving the reports somewhere we can get at them if we need to mine them. Progress Anjum

PingER Archive Site - Ghulam

Ghulam has rebuilt the database.

After all the discussion, it was decided to try the database schema on one previous month of data to see what further steps are required for sharding and managing the data. As PerfSONAR has some fields which are not measured or needed by PingER, they can be left NULL; the PerfSONAR libraries will accept a NULL. The fields which are required by PingER but not applicable in PerfSONAR will be added to the tables; the PerfSONAR libraries will ignore these newly added columns.

Current schema: see here

There will be three tables (just like the SEECS schema):

  1. host table: will have the PerfSONAR host table fields and the location_location fields (only those required by PingER) plus the fields required by PingER.
  2. data table: will be the same as the newly proposed one. However, packets received/sent can be added to the data table if they are used in the PingER packet loss calculation or anywhere else. (Sadia believes there might be some PerfSONAR library which deals with this. She will check and tell Ghulam if one exists.)
  3. metadata table: add the "by-class" field as required by PingER.

2. Raw data for one month will be dumped into the database and aggregation will be done only for monthly data (which means one extra table for the selected month for the time being).
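
The three-table idea and the one-month aggregation can be sketched as follows, using Python's built-in sqlite3 module purely for illustration. Every column name here (unused_ps, country, by_class, min_rtt, pkts_sent, pkts_rcvd and so on) is a hypothetical stand-in, not the actual PerfSONAR or PingER schema, and the loss formula simply assumes loss = (sent - received) / sent.

```python
import sqlite3

SCHEMA = """
CREATE TABLE host (
    host_id   INTEGER PRIMARY KEY,
    hostname  TEXT NOT NULL,
    unused_ps TEXT,     -- stand-in for a PerfSONAR-only field, left NULL
    country   TEXT      -- stand-in for a field added for PingER
);
CREATE TABLE metadata (
    meta_id  INTEGER PRIMARY KEY,
    src_id   INTEGER REFERENCES host(host_id),
    dst_id   INTEGER REFERENCES host(host_id),
    by_class TEXT       -- stand-in for the "by-class" field
);
CREATE TABLE data (
    meta_id   INTEGER REFERENCES metadata(meta_id),
    ts        TEXT,     -- ISO 8601 timestamp
    min_rtt   REAL,
    pkts_sent INTEGER,
    pkts_rcvd INTEGER
);
"""


def create_schema(conn):
    """Create the three hypothetical tables on an open connection."""
    conn.executescript(SCHEMA)


def monthly_summary(conn, month):
    """Aggregate raw rows of one month (YYYY-MM) per measurement pair.

    Returns rows of (meta_id, minimum RTT, packet loss in percent).
    """
    return conn.execute(
        """SELECT meta_id,
                  MIN(min_rtt),
                  100.0 * (SUM(pkts_sent) - SUM(pkts_rcvd)) / SUM(pkts_sent)
           FROM data
           WHERE substr(ts, 1, 7) = ?
           GROUP BY meta_id
           ORDER BY meta_id""",
        (month,),
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    conn.execute("INSERT INTO host VALUES (1, 'a.example.org', NULL, 'PK')")
    conn.execute("INSERT INTO host VALUES (2, 'b.example.org', NULL, 'US')")
    conn.execute("INSERT INTO metadata VALUES (1, 1, 2, NULL)")
    conn.execute("INSERT INTO data VALUES (1, '2012-01-05T00:00:00', 250.0, 10, 8)")
    print(monthly_summary(conn, "2012-01"))
```

Timing a query like monthly_summary against a full month of raw rows is one way to measure the performance that the future sharding decisions depend on.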

Ghulam: 

  1. Build the database based on the new schema (send it to Sadia as well so the same database can be built at SLAC)
  2. Modify getdata.pl (fine if it is without parallel loops or threads as long as it takes less than 5-6 hours)
  3. Run and test getdata.pl
  4. Test it with queries like those issued from the pingtable.pl HTML page and measure the performance

Sadia 

  1. Modify getdata.pl to shift data from flat files to database

Future concerns (will be considered once the performance of the above monthly aggregated data is observed):

1. How to store raw data for one year

2. How it should be sharded

3. For how long data should be in the database

Sadia: Adding max RTT and alpha to pingtable.pl and the analyze scripts

  • We need the alpha for identifying strange Pakistani routes. This will be done before we move to the new schema; it cannot wait any longer. In the meantime Amber is looking at using the PingER map with coloring of links by min RTT to spot anomalies by eye. Sadia has modified analyze hourly and analyze daily for MOS and alpha. Now she has to run them for all days back to 1998.
  • Analysis scripts to add Mean Opinion Score and alpha: some things need to be correctly configured. It has been deployed at http://pinger.seecs.edu.pk/cgi-bin/pingtable.pl for testing. Ghulam: there was a problem with the alpha value in pingtable.pl. For some links alpha had a value of 200. As we know, alpha can have a maximum value of 2, so there must be something wrong in the calculation.

TULIP - Bilal

The following table lists the targets in Europe which are not plotted on maps. For example, the first target can be explored here. This can be compared to a target which can be plotted on the map. Bilal looked into it and found that the nodes are plotted using other GeoIP and IP tracking tools.

Country Name | IP Address    | Progress
Austria      | 62.218.39.47  | 1/10/2012: Fails due to an unreachable host in Germany 212.201.44.81
Austria      | 212.33.36.188 | 1/10/2012: Fails due to an unreachable host in Germany 212.201.44.81
Italy        | 193.206.84.12 | 1/10/2012: Can be plotted successfully
Ukraine      | 193.29.220.3  | 1/10/2012: Can be plotted successfully

Amber looked at the host 212.201.44.81 (in Germany) in the PingER and TULIP databases to find out why this host is unreachable. We do not have this host in the PingER database, and in the TULIP database the only information we have about it is its IP address. Amber and Les decided to delete this host from the TULIP database. Bilal has to rerun the script to find out whether the Austrian targets get plotted or not.

Deleting the unreachable host is a temporary solution. We still need to find a permanent solution for plotting a landmark one of whose nearest hosts is not working; TULIP should consider the next nearest landmark to plot it.

One option is to change the Apollonius code, and the other is to give up on Apollonius, as we have already proved that Apollonius is useless.

Bilal: Remove Apollonius

CBG TULIP Integration -- FYP (Bilal)
  • Bilal did some stress testing. There are 331 landmarks, while the targets will be the ones generated by Sadia. He will compare the results with the 4-month-old results for 59 hosts.
  • From the latest results it is apparent that if the landmark is also the target then we can get 0 error. Bilal has modified the tests to filter out such cases. Les has sent him the URL to the Landmarks file so he knows to filter out measurements from a landmark to itself. He will rerun the tests for N. America, Europe, S. Asia, E. Asia and Australia and will send the new results before Sunday. 
  • Stress testing results for Europe are now complete and available at: https://confluence.slac.stanford.edu/display/IEPM/Tulip+CBG+Stress+testing+for+Europe
  • It is seen that Asia performs better than Europe.
  • Bilal will be testing CBG on North America this week.
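
The filtering of landmark-to-itself measurements mentioned above could look like the following Python sketch. The Measurement record, its field names and the example hostnames are all hypothetical; the real format of the Landmarks file and of the test results is not described in these minutes.

```python
from typing import List, NamedTuple


class Measurement(NamedTuple):
    """One landmark-to-target measurement (hypothetical record layout)."""
    landmark: str
    target: str
    min_rtt_ms: float


def drop_self_measurements(measurements: List[Measurement]) -> List[Measurement]:
    """Keep only rows whose landmark differs from the target.

    Hostnames are compared case-insensitively, since DNS names are
    not case sensitive.
    """
    return [m for m in measurements
            if m.landmark.lower() != m.target.lower()]


if __name__ == "__main__":
    rows = [
        Measurement("lm1.example.org", "lm1.example.org", 0.1),
        Measurement("lm1.example.org", "target.example.net", 42.0),
    ]
    print(drop_self_measurements(rows))
```

Dropping these rows before computing errors avoids the trivially perfect results seen when a landmark is also the target.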

PerfSONAR (Pakistan)

  • PerfSONAR at SEECS: PerfSONAR throughput and latency nodes are now up and running at SEECS. Hostnames and corresponding IP addresses are: throughput measurement node: http://psbw.seecs.edu.pk/ (http://115.186.132.154/toolkit/)
  • Select options under "Service Graphs" to view throughput or latency graphs. Added 5 Stratum 1 NTP servers to cater for clock delay and everything seems to work fine.
  • There are some interesting one-way latency graphs at 115.186.132.155 (SEECS PerfSONAR latency node). Dst to Src (e.g. MIT to SEECS) latency is less than Src to Dst (e.g. SEECS to MIT) latency. This might uncover some trends in outbound network traffic from Pakistan.
  • Bilal and Ghulam will have a meeting with Zafar to learn about PerfSONAR so they can maintain it in future. Update?

Possible projects

  • There could be a paper on PingER if we could just find the right conference. MCN, ICC and Globecom do cover network monitoring topics. We can talk about geolocation experiences. For example, within Pakistan it works fine; however, as we go across regions or continents it gets worse. We could publish some statistics on that, for example. We are not yet ready for a TULIP paper.
  • See [https://confluence.slac.stanford.edu/display/IEPM/Future+Projects].
  • Extend the NODEDETAILS database to allow an entry for whether the host is currently pingable.
  • Extend Checkdata to provide emails automatically, see [https://confluence.slac.stanford.edu/display/IEPM/Extend+checkdata+to+make+it+more+useful]. Many of the ideas in the script node-contacts.pl are a step in this direction.
  • Improve the PingER2 installation procedures to make them more robust. This might be something for the person(s) in Pakistan who are responsible for installing PingER2 at the Pakistani monitoring sites. They have probably found where the failures occur. Also look at the FAQ and at ping_data.pl, which has been improved to assist in debugging; could it be further improved (e.g. provide access to the httpd.conf file so one can see if it is properly configured)? There are 2 students working on the PingER archive. Is this something they could work on?
  • [Fix PingER archiving/analysis package to be IPv6 conformant|IEPM:Make PingER IPV6 compliant]. The plan is to build a proposal for an IPv6 testbed and try various transition techniques. A proposal has been prepared and submitted to PTA. Adnan is a co-PI. It is being evaluated today. A small testbed has been established at SEECS and the plan is to shift some of the network to IPv6. Bilal is one of 3 students involved with PingER, and they will be involved with IPv6. They are porting the PingER archive site to use a database. They have redeveloped the archive site using Umar's documentation and have set up a small test archive site that covers gathering, archiving and analysis. They will design a new database. They will also try porting PingER to IPv6.
  • Look at RRD event detection based on thresholds and how to extend it, maybe by adding the plateau algorithm. Umar's algorithm did not work in a predictable manner.
  • Provide near realtime plots of current pinger data using getdata_all.pl/wget. It will work as a CGI script with a form to select the host, the ping size, and the time frame to plot. It will use wget or getdata_all.pl to get the relevant data and possibly RRD/smokeping to display the data. 

Future meeting time - Les

  1. Next meeting on Wednesday 8th February, 2012 at 8:00 pm in US and Thursday 9th February, 2012 at 9:00am in Pakistan.