Time & date 

Wednesday Sep 17th, 2014 9:00pm Pacific Daylight Time, Thursday Sep 18th, 2014 9:00am Pakistan time, Thursday Sep 18th, 2014 12:00 noon Malaysian time, Thursday Sep 18th, 2014 1:00am Rio Standard Time.

Attendees

Invitees:

Anjum+, Hassaan Khaliq, Kashif, Raja, Johari+, Nara, Adnan+, Abdullah, Badrul, Ridzuan+, Ibrahim, Hanan, Saqib+, Adib+, Les+, Renan, Bebo+

+ Confirmed attendance

- Responded but unable to attend:

Actual attendees:

Kashif, Raja, Johari, Adnan, Les, Bebo

Administration

  • At the last meeting the connectivity was dreadful: there was a lot of noise and we had to restart several times. Maybe we should try Google Hangouts. Does anyone have an account or experience with it?

  • Anjum reports (6/23/2014) that "the proposal for the conference has been submitted for approval and PingER has been added to the agenda. Travel expenses for Les and Bebo have also been included in the conference proposal. We are awaiting approval of the proposal." 8/16/2014: The faculty management at UM has just changed and many matters required urgent attention. Abdullah will be able to update us soon, once he gets a chance to see the Vice Chancellor. Once approval is given, the conference venue can be UM or UUM.

  • Given the uncertainty and other responsibilities, Bebo is unable to commit to attending the conference, though he could present by video. Right now Les is in the same position.

  • Anjum suggested putting together a paper on the metrics provided by PingER for SIGMETRICS. The due date is in November. Does someone want to take the lead? Anjum?

Renan

Following the last meeting, Les made the PingER analyzed data for 100-byte pings, going back to 1998, available via anonymous FTP.

Renan is working on method 2b of "Approaches to handle PingER current data" (see below):

  • Distributed and parallel approach – Utilization of a data warehouse on top of a distributed file system to provide low-latency responses to complex queries (like the ones we were not able to do in my previous work). Additionally, investigating how Scientific Workflow Management Systems may help in the ETL process of transforming PingER data so it can easily be stored in the data warehouse.

Renan's update as of 9/14/2014:

I am using a Parallel Scientific Workflow Management System called Chiron (http://chironengine.sourceforge.net/index.php/home) to process the PingER big data that you recently made fully available through FTP. That saved me a lot of time: it is much, much faster than the approach of retrieving data from Pingtable through HTTP GETs. Specifically, I am dealing with hourly measurements.

I am happy to say that I was finally able to process the whole hourly data set and convert it into a suitable CSV format, following a multidimensional model. The way the CSV files are structured will help us run complex OLAP queries (http://en.wikipedia.org/wiki/Online_analytical_processing).

Additionally, I was able to load part of the processed CSV data into a shared-nothing cluster on which I have built a distributed file system. So far I could only load the 2012 and 2013 hourly data, due to lack of space on the cluster's hard disks; it is not a very powerful cluster. I am considering whether to get more space or find a better environment to run the experiments. Finally, I am using a structured data warehouse on top of the distributed file system on this cluster. The data warehouse technology is called Cloudera Impala (http://www.cloudera.com/content/cloudera/en/products-and-services/cdh/impala.html). As a result, I am able to perform complex OLAP queries on PingER hourly data in less than 15 s, which is a great time given the size of the data, bearing in mind that only 2012 and 2013 are loaded so far. An example of a complex OLAP query that took 12 s is: Show me all HOURLY Throughput and Packet Loss measurements, from all nodes in the US to all nodes in Brazil, which were measured from Feb 2013 to Sep 2013, and aggregate the results by month.
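To make the example query concrete, here is a minimal plain-Python sketch of the same roll-up: filter hourly rows by source/destination country and date range, then aggregate by month. In Impala this would be a single SQL GROUP BY. The sketch only illustrates the operation. The CSV column names (src_country, dst_country, hour, throughput, packet_loss), the country codes, the choice of averaging as the aggregate, and the sample rows are all hypothetical. They are not Renan's actual multidimensional schema or real PingER data.

```python
"""Sketch of a monthly OLAP-style roll-up over hourly measurements.

Assumptions (hypothetical): each CSV row has src_country, dst_country,
hour (ISO 8601), throughput and packet_loss; aggregation = monthly mean.
"""
import csv
import io
from collections import defaultdict
from datetime import datetime


def monthly_rollup(rows, src="US", dst="BR", start="2013-02", end="2013-09"):
    """Average hourly throughput and packet loss per month for src -> dst."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # month -> [thr_sum, loss_sum, n]
    for row in rows:
        if row["src_country"] != src or row["dst_country"] != dst:
            continue  # slice: keep only the requested country pair
        month = datetime.fromisoformat(row["hour"]).strftime("%Y-%m")
        if not (start <= month <= end):  # "YYYY-MM" strings sort chronologically
            continue
        acc = sums[month]
        acc[0] += float(row["throughput"])
        acc[1] += float(row["packet_loss"])
        acc[2] += 1
    # roll-up: one averaged record per month, in chronological order
    return {
        m: {"throughput": t / n, "packet_loss": loss / n}
        for m, (t, loss, n) in sorted(sums.items())
    }


SAMPLE = """src_country,dst_country,hour,throughput,packet_loss
US,BR,2013-02-01T00:00:00,800,0.5
US,BR,2013-02-01T01:00:00,600,1.5
US,BR,2013-03-10T12:00:00,900,0.0
US,AR,2013-02-01T00:00:00,100,9.0
US,BR,2013-10-01T00:00:00,50,5.0
"""

if __name__ == "__main__":
    for month, stats in monthly_rollup(csv.DictReader(io.StringIO(SAMPLE))).items():
        print(month, stats)
```

With the made-up sample, the US to Argentina row and the October row are filtered out, and February's two hourly rows collapse into one monthly average.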

I may also have some updates on item 1 (Quantitative analysis on PingER data: they want to know how PingER has grown from 1998 until today and how it might grow in the coming years; by doing this, we may focus on more suitable technologies that deal with scenarios whose profile is similar to PingER's). Since I needed to process PingER hourly data from 1998 until recently, I am able to report the actual decompressed size of each of the files compressed in those tarballs. I can also report for which combinations of parameters there were no files. I will try to write a report about it.
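One way to obtain per-file decompressed sizes without unpacking everything to disk is to stream each tarball member through Python's standard `tarfile` and `gzip` modules and count the bytes. This is a minimal sketch. It assumes that members whose names end in `.gz` are gzip-compressed files inside the tar. That is a guess about the archive layout, not a documented fact. The archive file name in the usage comment is hypothetical.

```python
import gzip
import io
import sys
import tarfile


def _count_bytes(stream, chunk_size=1 << 16):
    """Read a stream to the end and return how many bytes it produced."""
    total = 0
    while True:
        block = stream.read(chunk_size)
        if not block:
            return total
        total += len(block)


def decompressed_sizes(fileobj):
    """Map each regular file in a tarball to its decompressed size in bytes.

    Accepts a binary file object or raw bytes. Members ending in .gz
    (assumed gzip-compressed) are streamed through gzip so the
    uncompressed size is counted; other members are counted as stored.
    """
    if isinstance(fileobj, (bytes, bytearray)):
        fileobj = io.BytesIO(fileobj)
    sizes = {}
    with tarfile.open(fileobj=fileobj, mode="r:*") as tf:
        for member in tf.getmembers():
            if not member.isfile():
                continue  # skip directories, links, etc.
            raw = tf.extractfile(member)
            if member.name.endswith(".gz"):
                sizes[member.name] = _count_bytes(gzip.GzipFile(fileobj=raw, mode="rb"))
            else:
                sizes[member.name] = _count_bytes(raw)
    return sizes


if __name__ == "__main__":
    # Usage: python tar_sizes.py some_pinger_archive.tar  (name is hypothetical)
    with open(sys.argv[1], "rb") as f:
        for name, size in sorted(decompressed_sizes(f).items()):
            print(f"{size:>12}  {name}")
```

Streaming in fixed-size chunks keeps memory use flat regardless of member size. Summing the returned values gives a per-tarball total for the growth analysis.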

That's all I have for now. Next, I need to find a way to load more data (I need either more disk space or a better cluster) and execute more complex queries.

It is not yet certain whether I can open the cluster so that others can execute queries on it. I believe it is possible, but more research is needed on this.

UM

The ping server at http://pinger.fsktm.um.edu.my/cgi-bin/traceroute.pl?target=www.slac.stanford.edu&function=ping returns "ping server busy at the moment. Please try again later." Someone with access to the web servers should look at that (e.g. review the web logs). This was fixed by installing an updated traceroute.pl with better diagnostics, though we don't understand why.

Last week Dr. Badrul provided Ibrahim with a cloud computing platform with four configured cluster nodes, each running Linux. We will use this cloud environment to run the experiment. The first step is to conduct a performance evaluation of key-value stores for RDF data; then, based on the results, we will propose our model for PingER data.

Badrul (6/23/2014) is still waiting to hear from his student (Abdulrahim Haroun Ali, who is out of the country) about the paper on anomalies in PingER measurements and will give an update once the paper is ready. For the moment the paper is not ready. No update 8/20/2014.

Regarding the MYREN cloud service registration, Ridzuan has not received any feedback although several attempts have been made to contact them. Lately MYREN has been on a road tour to introduce its services, and Ridzuan expects that their cloud service is still at a preliminary stage. However, our faculty has managed to set up its own cloud service and Ridzuan has been given some access. It is currently at the setup stage, and he will try to load PingER data into the Hadoop environment. If storage is sufficient, he will try to download all the compressed analyzed data as a secondary backup. He will let Les know whether this is successful.

Anjum points out: "Since I have set up the cloud, I can tell in advance that the storage is not sufficient to download the entire PingER data. Every virtual machine is given 20 GB of hard disk space by default. If you plan to do big data experiments, let Dr. Badrul and Dr. Abdullah know and I can allocate dedicated resources for the purpose. We have more than 1.5 terabytes of configured storage and close to 5 terabytes of non-configured storage that can be configured if required."

UNIMAS

Pinger2 (Raspberry Pi) needed its clock reset; it was 1.5 months off, so no useful data was gathered. It has been running successfully since Sept 2nd. See ePingER project Malaysia for recent plots of data from pinger and pinger2.unimas.my (Raspberry Pi).

Traceroute server: Status unsolved. The problem is the same on Pinger2. Johari talked to the network administrator at the centre about this issue, who suggested talking to the security manager to check whether the firewall is blocking the ICMP packets from the traceroute command (on the to-do list). No progress 9/17/2014.

Custom ISO: The student can get as far as the boot screen but is unable to get to the desktop. They have started work on it, but the student is still unable to boot the ISO (9/17/2014).

Research: One student is currently doing a Master by Research on the PingER project. Progress is a bit slow since the student lacks the technical and programming skills to implement a potential solution. UNIMAS will also supervise another Advanced Project (Master by coursework) starting this September 2014, planning to investigate whether the two PingER monitoring hosts at UNIMAS (pinger and pinger2) show any differences in data collection.
They looked at the potential projects and selected two. They are putting together a framework for anomaly detection and are interested to hear of any more projects.

UTM

After revision, the FRGS proposal was submitted to RMC. It is under review; feedback is expected at the end of September.

According to Professor Francis Lee, SingAREN SLIX core router is a key node for international research and education networks – including APAN, GLORIAD, Internet2, and TEIN – and peers directly with Australia’s AARNet and Japan’s NII and NICT networks. The first 100 GbE international connection is likely to be made within the next year as a result of a US funding call for a 100 Gbps research network link to Asia.

Saqib has run mtr for more than 3 days from UM to UNIMAS, UTM to UM, and UTM to UNIMAS. From UM to UNIMAS, the best RTT for hops 6 to 9 drops to ~2 ms, compared with ~49 ms in the worst case. However, at hop 10 it stayed at ~42 ms, as shown in the attached figure (UM-UNIMAS). Thus it appears that the MYREN guy is right that there is significant congestion at hop 6, and possibly at hops 7-9, most of the time. Below are the UM-UNIMAS mtr results.

It sounds like the MYREN guy is right that there is significant congestion, at least at hop 6. I wonder if we can get a better handle on this by monitoring hop 6 from UM and seeing the time periods when the congestion occurs. Someone would need to add something like the following to the <HostList> in pinger.xml:

<Host>
  <Alarm>
    <TimeOfFirstFailure>1410091899</TimeOfFirstFailure>
  </Alarm>
  <DnsLastChecked>1410982759</DnsLastChecked>
  <IP>203.80.23.73</IP>
  <Name>te-0-3-0-0.drc96.jaring.my</Name>
</Host>

It is possible that this router does not respond to pings, which would make monitoring it useless, so check that first. From SLAC I notice:

249cottrell@pinger:~$ping 203.80.23.73

PING 203.80.23.73 (203.80.23.73) 56(84) bytes of data.
^C
--- 203.80.23.73 ping statistics ---
55 packets transmitted, 0 received, 100% packet loss, time 54850ms

However:

250cottrell@pinger:~$ping te-0-3-0-0.drc96.jaring.my
PING te-0-3-0-0.drc96.jaring.my (61.6.51.2) 56(84) bytes of data.
64 bytes from te-0-3-0-0.drc96.jaring.my (61.6.51.2): icmp_seq=1 ttl=242 time=192 ms
64 bytes from te-0-3-0-0.drc96.jaring.my (61.6.51.2): icmp_seq=2 ttl=242 time=194 ms
64 bytes from te-0-3-0-0.drc96.jaring.my (61.6.51.2): icmp_seq=3 ttl=242 time=193 ms
^C
--- te-0-3-0-0.drc96.jaring.my ping statistics ---
4 packets transmitted, 3 received, 25% packet loss, time 3193ms
rtt min/avg/max/mdev = 192.982/193.542/194.490/0.763 ms

That is, the name resolves to a different IP address (61.6.51.2) from the one in the <IP> element above (203.80.23.73).
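Before adding a host like this to pinger.xml, the two checks done by hand above can be automated: does the name resolve to the configured <IP>, and does the host answer pings at all? The following is a minimal sketch that parses saved Linux (iputils-style) ping output of the kind shown above. The function name and regular expressions are hypothetical helpers, not part of the PingER tools, and they handle IPv4 output only.

```python
"""Sketch: vet a candidate pinger.xml host from saved ping output.

Extracts the address the name resolved to (from the "PING name (ip)"
header) and the packet-loss summary, then compares the resolved address
with the <IP> configured for the host. IPv4 / iputils format assumed.
"""
import re

HEADER = re.compile(r"^PING (\S+) \(([\d.]+)\)", re.MULTILINE)
SUMMARY = re.compile(
    r"(\d+) packets transmitted, (\d+) received, .*?([\d.]+)% packet loss"
)


def check_ping_output(text, configured_ip):
    """Return the resolved IP, whether it matches, the loss %, and reachability."""
    header = HEADER.search(text)
    summary = SUMMARY.search(text)
    if header is None or summary is None:
        raise ValueError("unrecognised ping output")
    resolved_ip = header.group(2)
    sent, received = int(summary.group(1)), int(summary.group(2))
    return {
        "resolved_ip": resolved_ip,
        "ip_matches": resolved_ip == configured_ip,
        "sent": sent,
        "loss_pct": float(summary.group(3)),
        "responds": received > 0,
    }
```

Applied to the two runs above, the check would flag the numeric address as unreachable (100% loss) and the name as reachable but resolving to 61.6.51.2 rather than the configured 203.80.23.73.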

UUM

The UUM monitor is running and data is being gathered. Raja will add the ping landmark. Les has sent information on IPv6 and on porting the archive to Adib at UUM. Adib is interested in porting PingER to IPv6.

The monitor had been unpingable since Sep 3rd; Adib was notified on 9/12/2014 and responded on 9/13/2014: "Will check and update you." It has been running successfully since 15th September.

NUST

Raja will be joining the PingER project at NUST as a researcher. It will take a couple of weeks to get the paperwork completed.

We are looking at making VTrace IPv6 compatible. We need a machine that has access to IPv6. I am looking to see if we can provide a machine at SLAC.

Raja is also looking at fixing an application that provides a map of the world with each country shaded according to its performance on the selected metric. He will add the extra metrics and the current year.

Installation is in progress at the Bahawalpur site. Completing the install needs approval from the head.

Kashif fixed:

  • pinger.kohat.edu.pk

The following Pakistani hosts are having problems:

  • www.upesh.edu.pk is pingable but unable to gather data from it

  • pingerkhi-ouk.pern.edu.pk is pingable but unable to gather data from it

Kashif is now at UNSW Australia for a six-month research visit. Following this, he will be rejoining his parent organisation, which also means he will most likely not be available for PingER management. At present, Zeeshan is looking after the tasks that were originally performed by Kashif. Zeeshan has never been part of the meetings; however, he has been associated with the project for a very long time. This is a temporary arrangement and we need to find a permanent resource for the purpose.

PingER at SLAC

The full set of data by host for 100-byte pings of PingER pingtable.pl analyzed data since 1998 can be found at ftp://ftp.slac.stanford.edu/users/cottrell/*.tar. There are about 20 GBytes of data in 100,000 files. For information on how it is formatted, how to retrieve it, etc., see Archiving PingER data by tar for retrieval by anonymous ftp. It will remain there until the end of September if you wish to take a copy.

Added Beacons in 14 African countries to facilitate a case study, e.g. compare connectivity within Africa from Burkina Faso, Algeria and South Africa.

South African monitoring site fixed.

Working to restore Bolivian site (email reminder sent 9/12/2014)

The CDACMumbai host is down (mail requests to them are being automatically rejected).

Added pinger.uum.edu.my to monitors.

Old Items

Linked Open Data

Renan finished the new pingerlod web site. It should now be much easier to modify the info texts, since Renan moved them into a separate file. The new version has been loaded on the server and some text added to describe how to use the map. However, there is a bug that prevents the map from executing; Renan reports that the bugs should be easy to fix. He has talked to his professor, who suggested trying RDF Owlink, which should give faster responses to queries. Renan will research this. It will probably mean reloading the PingER data, which is a lot of work, but hopefully it will improve performance. Before the rebuild he will make the fixes and provide a new WAR for us to load on pingerlod.slac.stanford.edu. He is also working on his thesis and on documentation: he has finished the ontology and has a nice interactive tool for visualizing it. Since the ontology is the core of the data model of our semantic solution, this will be very helpful for anyone who uses our system, whether a developer of the system or a possible user. Bebo pointed out that to get publicity and for people to know about the data, we will need to add pingerlod to lod.org.

Things he will soon do regarding documentation:

  1. A task/process flow describing all the Java classes involved in those batch jobs;
  2. A Javadoc <http://www.oracle.com/technetwork/java/javase/documentation/index-jsp-135444.html> which will explain all classes and how they are used.

For the Linked Open Data / RDF work, which is still pre-alpha, you can go to http://pingerlod.slac.stanford.edu. As can be seen, this page is not ready for prime time. However, the demos work as long as one carefully selects what to look at:

  • Click on Visualizations, there are two choices:
    • Multiple Network Metrics: Click on the image to get a form. Choose Node pinger.slac.stanford.edu pinging to www.ihep.ac.cn, time parameters yearly, 2006 to 2012, metrics Throughput, Average RTT and Packet loss, and display format Plot graph, then click on submit. In a few seconds a time series graph should come up. Mouse over it to see the values at each x value (year).
    • A mashup of network metrics x university metrics: Click on the image to get another form. Choose pinging from pinger.slac.stanford.edu, School metric number of students, time metric years 2006 to 2012, display format Plot graph, and click on submit. There is a longer wait: after about 35 seconds a Google map should show up. Click on "Click for help." The area of the dots = number of students, the darkness of the dots = throughput (lighter is better), and the inscribing circle color gives the university type (public, private, etc.). Click on a circle for information on the university etc.
  • Renan will be working on providing documentation on the programs, in particular the install guide for the repository and web site etc. This will assist the person who takes this over. 

Renan is using OWLIM as the RDF repository, currently an evaluation version. He looked into the price of OWLIM (the excellent RDF database management system he told us about): it would cost a minimum of 1200 EUR (~1620 USD at Google's exchange rate on the day he checked) for a one-time perpetual license. That seems too expensive; no wonder it is so good. However, he has heard about a free alternative, though he is not sure how good it would be for our PingER data. He will try it out and evaluate it. He will also get a new evaluation of the free OWLIM Lite.

He has also made some modifications to the ontology of the project (under the supervision of his professor in Rio), so he will have to modify the code to load the data accordingly.

Maria and Renan are making progress on several approaches to dealing with PingER data, making it easier to analyze and integrate. In particular they have been busy studying and evaluating alternatives, analyzing results from the latest benchmarks on NoSQL (including RDF and graph-based storage) database management, distributed processing and mediated solutions over relational databases, and also other experiments with multidimensional analyses on Linked Data. The new students involved now have a better understanding of the scenario and have been interacting with Renan regularly.

They have separated the tasks into two:

  1. Quantitative analysis on PingER data
    1. They want to know how PingER has grown from 1998 until today and how it might grow in the coming years. By doing this, we may focus on more suitable technologies that deal with scenarios whose profile is similar to PingER's.
      1. Two students are working on this.
  2. Approaches to handle PingER current data
    1. Conventional approach – Utilization of Cassandra as back-end database to provide easy crossing of parameters to get PingER data.
      1. One student is working on this.
    2. Distributed and parallel approach – Utilization of a data warehouse on top of a distributed file system to provide low-latency responses to complex queries (like the ones we were not able to do in my previous work). Additionally, investigating how Scientific Workflow Management Systems may help in the ETL process of transforming PingER data so it can easily be stored in the data warehouse.
      1. Renan is working on this.
    3. Pure RDF approach – Good ways of modeling and natively storing RDF data.
      1. Maria-Luiza is working on this.
    4. NoSQL approaches – How other NoSQL DBMS may be adequate for PingER multidimensional data.
      1. Two students are evaluating existing NoSQL solutions for multidimensional scenarios (such as PingER)
    5. Key-Value storages for PingER data in RDF
      1. This is Ibrahim’s work.

In the end, they want to compare all these approaches.

NUST

At the Connect Asia Pacific Summit in Bangkok in January, Shahryar saw the project "Mapping the pan Asia Pacific information Superhighway and closing gaps in infrastructure connectivity" and found it very much related to the work of the PingER project. He therefore emailed a UN agency about a possible collaboration on the PingER project. He has heard nothing, so he will write a detailed proposal and then contact them again. No update 2/5/2014, 3/5/2014.

Tulip
Follow up from workshop
  • Hossein Javedani of UTM is interested in anomalous event detection with PingER data. Information on this is available at https://confluence.slac.stanford.edu/display/IEPM/Event+Detection. We have sent him a couple of papers and information on how to access the PingER data. Hossein and Badrul have been put in contact. Is there an update, Badrul?

The next step in funding is to go for bigger research funding, such as LRGS or eScience. Such proposals must lead to publications in high-quality journals, and they will need an infrastructure such as the one we are building. We can use one dedicated session of the upcoming workshop to brainstorm and come up with such a proposal. We need to do some groundwork before that as well. Johari will take the lead in putting together half-page descriptions of the potential research projects.

  1. We need to identify a few key areas of research related to the PingER Malaysia Initiative, which can be shared/publicized through the website. These might include using the infrastructure and data for anomaly detection, correlation of performance across multiple routes, and geolocation. The future projects Les listed in Confluence at https://confluence.slac.stanford.edu/display/IEPM/Future+Projects can also be a good start, as can Bebo's suggestion.
  2. We need to synchronize and share research proposals so as not to duplicate research work. How should we share them? Perhaps not through the public website; alternatively, we could create a members-only section of the website to share sensitive material such as research proposals.

Anjum suggested that Saqib, Badrul and Johari put together a paper on user experiences of using the Internet in Malaysia as seen from Malaysian universities, covering round trip time, losses, jitter, reliability, and routing/peering (in particular anomalies), and the impact on VoIP, throughput, etc. It would be good to engage someone from MYREN.

Ibrahim

Ibrahim Abaker is planning to work on a topic initially entitled "leveraging pingER big data with a modified pingtable for event-correlation and clustering". Ibrahim has a proposal, see https://confluence.slac.stanford.edu/download/attachments/17162/leveraging+pingER+big+data+with+a+modified+pingtable+for+event-correlation+and+clustering.docx. Ibrahim reports 7/15/2014: "I have spent the last few months trying to understand the concept of big data storage and its retrieval, as well as the traditional approach to storing RDF data. I have integrated a single Hadoop cluster in our cloud, but for this project we need multiple clusters, which I have already discussed with Dr. Badrul, and he will provide me with large storage for the experiment." No update 8/20/2014.

"I have come up with an initial proposed solution model. This model consists of several parts. The upper part of the Figure below shows the data source, in which PingER data will be converted into RDF format. Then the data pre-processor will convert RDF/XML into the N-Triples serialization format using the N-Triples converter module. This N-Triples file of an RDF graph will be used as input, and its triples will be stored as key-value pairs using MapReduce jobs."
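The last step of the proposed model, turning an N-Triples file into key-value pairs with MapReduce, can be illustrated in miniature. The following is a minimal in-process sketch of the map, shuffle and reduce phases (Hadoop would run them distributed). Keying each record by subject is only one possible layout, and it is an assumption, not Ibrahim's actual design. Other stores key by predicate or object, or keep several indexes. The example URIs are hypothetical, and the regular expression covers simple N-Triples lines (IRIs, blank nodes, literals) rather than the full W3C grammar.

```python
"""Sketch: N-Triples -> key-value pairs via map / shuffle / reduce.

Keying by subject is an assumed layout, not the proposal's design.
"""
import re
from collections import defaultdict
from itertools import chain

TRIPLE = re.compile(r"^(<[^>]*>|_:\S+)\s+(<[^>]*>)\s+(.+?)\s*\.\s*$")


def map_line(line):
    """Map phase: one N-Triples line -> zero or one (key, value) pair."""
    line = line.strip()
    if not line or line.startswith("#"):
        return []  # skip blank lines and comments
    m = TRIPLE.match(line)
    if m is None:
        raise ValueError(f"not a simple N-Triples line: {line!r}")
    subject, predicate, obj = m.groups()
    return [(subject, (predicate, obj))]


def shuffle(pairs):
    """Shuffle phase: group values by key (Hadoop does this between phases)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_group(key, values):
    """Reduce phase: one stored record per subject, predicates sorted."""
    return key, sorted(values)


def to_key_value(lines):
    """Run the three phases over an iterable of N-Triples lines."""
    pairs = chain.from_iterable(map_line(ln) for ln in lines)
    return dict(reduce_group(k, v) for k, v in shuffle(pairs).items())
```

Grouping all of a subject's predicate/object pairs under one key makes "everything about this node" a single lookup. Queries that start from a predicate or object would need additional indexes.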

Potential projects

See list of Projects

Future meeting - Les

Next meeting: Wednesday October 8th, 2014 9:00pm Pacific Daylight Time, Thursday October 9th, 2014 9:00am Pakistan time, Thursday October 9th, 2014 12:00 noon Malaysian time, Thursday October 9th, 2014 1:00am Rio Standard Time.
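The local times for each site can be derived from the Pacific-time anchor instead of being worked out by hand. In September and October 2014 the US Pacific zone was still on daylight time (UTC-7). This is a minimal sketch using Python's standard `zoneinfo` module. It assumes Python 3.9+ with an IANA time-zone database available (on some platforms this requires the `tzdata` package). The IANA zone names are the standard identifiers closest to the locations named in these minutes. The conversion applies each zone's historical rules rather than fixed offsets.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# IANA zones assumed to correspond to the places named in the minutes.
ZONES = {
    "Pacific": "America/Los_Angeles",
    "Pakistan": "Asia/Karachi",
    "Malaysia": "Asia/Kuala_Lumpur",
    "Rio": "America/Sao_Paulo",
}


def local_times(year, month, day, hour, minute=0, origin="Pacific"):
    """Convert one meeting instant, given in the origin zone, to every zone."""
    start = datetime(year, month, day, hour, minute, tzinfo=ZoneInfo(ZONES[origin]))
    return {place: start.astimezone(ZoneInfo(zone)) for place, zone in ZONES.items()}


if __name__ == "__main__":
    for place, when in local_times(2014, 10, 8, 21).items():
        print(f"{place:9} {when:%a %b %d %Y %I:%M%p}")
```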

Coordinates of team members:

See: http://pinger.unimas.my/pinger/contact.php
