Wednesday Aug 20th, 2014 9:00pm Pacific Standard Time, Thursday Aug 21st, 2014 9:00am Pakistan time, Thursday Aug 21st, 2014 12:00 noon Malaysian time, Thursday Aug 21st, 2014 1:00am Rio Standard Time.

Attendees

Invitees:

Anjum-, Hassaan Khaliq, Kashif+, Raja+,  Johari+, Nara, Adnan+, Abdullah, Badrul, Ridzuan, Ibrahim+, Hanan, Saqib+, Adib-, Les+, Renan, Bebo+

+ Confirmed attendance

- Responded but unable to attend

Actual attendees:

Kashif, Raja, Johari, Adnan, Les, Bebo

Administration

  • The connectivity was dreadful: there was a lot of noise and the call had to be restarted several times. Maybe we should try Google Hangouts.

  • Anjum reports (6/23/2014) that "the proposal for the conference has been submitted for approval and PingER has been added to the agenda. Travel expenses for Les and Bebo have also been included in the conference proposal. We are awaiting the proposal approval." 8/16/2014: The faculty management at UM has just changed and many matters required urgent attention. Abdullah will be able to update us once he gets a chance to see the Vice Chancellor. Once approval is given, the venue for the conference can be at UM or UUM.

    As discussed earlier, the only twist here is that PingER will be seen as a case study for big data. This is good in the sense that people interested in doing research in the domain of big data can deploy PingER monitoring nodes at their respective universities/organizations and, in return, play around with the data. We agreed that it looked like the 25th would be a good day for the PingER workshop. Les should be able to make it from Burkina Faso, and Bebo should be able to get back to the US for Thanksgiving. There would be back-to-back presentations: Les on how PingER gathers and archives data, what data there is, the data types, how to access it, etc., followed by Bebo on Google tools for big data.

  • Anjum suggested putting together a paper on metrics provided by PingER for SIGMETRICS. The due date is in November.

Renan

Luiza has proposed three approaches to provide big data analysis/mining of PingER multidimensional data:

  1. Conventional. Use of the Pentaho environment to handle big multidimensional data, which enables enhanced user interfaces.
  2. Linked Data. Benchmarking of more sophisticated triple stores than the one we use today for PingER LOD (Sesame). Preferably, we should analyze parallel and distributed solutions. CumulusRDF is an example.
  3. Use of Greenplum (http://en.wikipedia.org/wiki/Greenplum). This is a high-performance database from EMC with many features such as caching. It is partly from the EMC acquisition of Pivotal. There is also a DBMS called Grindplan that explores lots of features using Pivotal.
    1. Renan is investigating an alternative to Hadoop, which uses a Scientific Workflow Management System and the Map/Reduce paradigm to help with both querying and provenance of the Linked Data (RDF).
    2. Ibrahim is investigating an approach that uses Hadoop Map/Reduce with a key/value store holding PingER data in RDF.

Following the last meeting Les made examples of PingER data available via FTP. There are two types:

  1. Raw data as gathered daily from all the monitoring hosts. This data is measured at 30-minute intervals and is quite dirty.
  2. Analyzed data by metric. This has been cleaned up. Les recommends UFRJ use the cleaned-up data.

The instructions for the data were also sent, as well as size estimates and information on how PingER data has been used.

...

Maria and Renan are advancing on several approaches to deal with PingER data, making it easier to analyze and integrate. In particular they have been busy studying and evaluating alternatives, analyzing results from the latest benchmarks on NoSQL database management (including RDF and graph-based stores), distributed processing and mediated solutions over relational databases, and also other experiments with multidimensional analyses on Linked Data. The new students involved now understand the scenario better and have been interacting with Renan regularly.

They have separated the tasks into two:

  1. Quantitative analysis on PingER data
    1. They want to know how PingER has grown from 1998 until today and how it might grow in the coming years. By doing this, we may focus on more suitable technologies that deal with scenarios with a profile similar to PingER's.
      1. Two students are working on this.
  2. Approaches to handle PingER current data
    1. Conventional approach – Utilization of Cassandra as back-end database to provide easy crossing of parameters to get PingER data.
      1. One student is working on this.
    2. Distributed and parallel approach – Use of a data warehouse on top of a distributed file system to provide low-latency responses to complex queries (like the ones that could not be run in Renan's previous work). Additionally, how Scientific Workflow Management Systems may help in the ETL process of transforming PingER data so it can easily be stored in the data warehouse.
      1. Renan is working on this.
    3. Pure RDF approach – Good ways of modeling and natively storing RDF data.
      1. Maria-Luiza is working on this.
    4. NoSQL approaches – How other NoSQL DBMS may be adequate for PingER multidimensional data.
      1. Two students are evaluating existing NoSQL solutions for multidimensional scenarios (such as PingER)
    5. Key-Value storages for PingER data in RDF
      1. This is Ibrahim’s work.

In the end, they want to compare all these approaches.

UM

The ping server at http://pinger.fsktm.um.edu.my/cgi-bin/traceroute.pl?target=www.slac.stanford.edu&function=ping responds with "ping server busy at the moment. Please try again later." Someone with access to the web servers should look at that (e.g. review the web logs). Maybe it is being hit with a lot of requests simultaneously. If they are coming from SLAC we may want to look at reflector.cgi.

Badrul (6/23/2014) is still waiting to hear from his student (Abdulrahim Haroun Ali, who is out of the country) about the paper on anomalies in PingER measurements, and will provide an update once the paper is ready. For the moment the paper is not ready. No update 8/20/2014.

Ridzuan has put together a rough proposal to use Hadoop to store and make available PingER data. He registered for the MYREN cloud services last month, but has still not received approval to use them. He will follow up again with them. For the Hadoop implementation, he is considering the Hortonworks Data Platform (HDP2); however, there are some problems with the latest installation because UM has adopted IPv6. Most of the HDP2 repositories reside on IPv4 servers, which makes it difficult to install correctly on the UM server. He is trying to use another platform or find a way to solve this installation problem. No update 8/20/2014.

Ibrahim Abaker is planning to work on a topic initially entitled "leveraging pingER big data with a modified pingtable for event-correlation and clustering". Ibrahim has a proposal, see https://confluence.slac.stanford.edu/download/attachments/17162/leveraging+pingER+big+data+with+a+modified+pingtable+for+event-correlation+and+clustering.docx. Ibrahim reports 7/15/2014: "I have spent the last few months trying to understand the concept of big data storage and its retrieval as well as the traditional approach of storing RDF data. I have integrated a single Hadoop cluster in our cloud, but for this project we need multiple clusters, which I have already discussed with Dr. Badrul and he will provide me with big storage for the experiment." No update 8/20/2014.

"I have come up with an initial proposed solution model. This model consists of several parts. The upper part of the Figure below shows the data source, in which PingER data will be converted into RDF format. Then the data pre-processor will take care of converting RDF/XML into the N-Triples serialization format using the N-Triples converter module. This N-Triples file of an RDF graph will be the input, and the triples will be stored as key-value pairs using MapReduce jobs."
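The last step of the model quoted above (N-Triples in, key-value pairs out) can be illustrated with a minimal sketch. It is not Ibrahim's implementation: plain Python functions stand in for the Hadoop MapReduce jobs, keying by subject is an assumption, the regular expression handles only simple triples whose subject and predicate are IRIs, and the example.org IRIs are hypothetical.

```python
import re
from collections import defaultdict

# Simplified N-Triples pattern: <subject> <predicate> object .
# Assumption: subject and predicate are IRIs (no blank nodes). This is
# enough for illustration but is not a full N-Triples parser.
TRIPLE_RE = re.compile(r'^\s*(<[^>]*>)\s+(<[^>]*>)\s+(.+?)\s*\.\s*$')


def map_triple(line):
    """Map step: emit one (key, value) pair per triple, keyed by subject."""
    match = TRIPLE_RE.match(line)
    if match is None:
        return []  # skip comments, blank lines and unparseable lines
    subject, predicate, obj = match.groups()
    return [(subject, (predicate, obj))]


def reduce_by_key(pairs):
    """Reduce step: group all (predicate, object) values under their key."""
    store = defaultdict(list)
    for key, value in pairs:
        store[key].append(value)
    return dict(store)


def build_store(lines):
    """Run the map step over every line, then the reduce step."""
    pairs = []
    for line in lines:
        pairs.extend(map_triple(line))
    return reduce_by_key(pairs)


# Hypothetical sample triples; the IRIs are placeholders, not PingER LOD.
SAMPLE = [
    '<http://example.org/m/1> <http://example.org/p/metric> "average_rtt" .',
    '<http://example.org/m/1> <http://example.org/p/value> "215.9" .',
    '# a comment line',
    '<http://example.org/m/2> <http://example.org/p/metric> "packet_loss" .',
]

if __name__ == "__main__":
    for key, values in build_store(SAMPLE).items():
        print(key, values)
```

Keying by subject keeps all statements about one measurement together. Other layouts, such as keying by predicate or by subject plus predicate, would suit different query patterns.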

Les forwarded the information from Ibrahim to Renan by email following the meeting.

...

Pinger 2 (Raspberry Pi) is at the data centre: Johari went to the data centre (8/19/2014) with Adnan. They managed to troubleshoot and set up the pinger2 Raspberry Pi unit at the data centre. It seems to be running and collecting data. The host is pingable and one can ping from it. However, the ping_data.pl gatherer is unable to find cgi-lib. Working with the ping server, making PingER measurements and gathering data are all successful. A next step will be to see if it is reliable and if there are significant differences between it and the PingER host at UNIMAS.

The tool to enable synchronizing Malaysian sites: added Saqib's request to sort the sites by country. Another page has also been added to view statistics of sites by country. Troubleshooting is complete and the issues with the form when inserting and updating records are solved. The new page is available from the following page (two links at the top of the table): http://pinger.unimas.my/pinger/sites.php

Traceroute server: Status unsolved. The problem is the same on Pinger2. Johari talked to the network administrator at the centre about this issue, and he suggested talking to the security manager to check whether the firewall is blocking the ICMP packets from the traceroute command (to-do list).

...

Saqib has talked to MYREN. They say the routers at hops 5 and 6 in the traceroute from UM to UNIMAS are both at UM and the long delay between them is due to congestion. Les is skeptical since hops 6-12 have similar RTTs and hop 12 is near Kuching. Les suggested Saqib run mtr for a day or more from UM to UNIMAS to see if there is any day/night variation in RTT. If the min RTT gets down to <2 ms, then the MYREN guy is right. If it is ~50 ms and persists for several days, then it really does not look like congestion (which should vary between day and night as the number of users changes). In that case it appears that hop 6 is physically close to hop 12, since they have the same min RTT. Taken together with the email from UM, Les would be very suspicious of the MYREN statement.
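The reasoning above can be sketched as a small check on RTT samples collected over a day or more. This is only an illustration: the `(hour_of_day, rtt_ms)` sample format, the function names and the 08:00-20:00 definition of "day" are assumptions, and the sample values are hypothetical rather than measured. Only the 2 ms threshold comes from the discussion.

```python
def min_rtt_by_period(samples, day_start=8, day_end=20):
    """Return (min day RTT, min night RTT) in ms; None if a period is empty.

    samples: iterable of (hour_of_day, rtt_ms) pairs.
    The day window is an assumption made for this sketch.
    """
    day = [rtt for hour, rtt in samples if day_start <= hour < day_end]
    night = [rtt for hour, rtt in samples if not (day_start <= hour < day_end)]
    return (min(day) if day else None, min(night) if night else None)


def assess_congestion_claim(samples, near_threshold_ms=2.0):
    """Apply the min-RTT test to the hop in question.

    Queueing delay from congestion comes and goes, so the minimum RTT over a
    long run approaches the propagation delay. A minimum that never falls
    below the threshold points to distance rather than congestion.
    """
    rtts = [rtt for _, rtt in samples]
    if not rtts:
        raise ValueError("no RTT samples")
    if min(rtts) < near_threshold_ms:
        return "consistent with congestion"
    return "not explained by congestion"


if __name__ == "__main__":
    # Hypothetical samples: (hour of day, RTT in ms to the hop in question).
    samples = [(2, 50.1), (14, 53.4), (23, 49.8)]
    print(min_rtt_by_period(samples))
    print(assess_congestion_claim(samples))
```

Comparing the day and night minima also shows whether there is any diurnal variation at all, which is the other signal the mtr run was meant to reveal.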

Saqib met with MYREN who have made many topology changes. Saqib will also incorporate these into the Malaysian case study. He is seeing anomalously long delays between mainland Malaysia and Sarawak. It does not appear to be due to congestion. We need to understand the routing and which undersea cables are being used. Saqib will send more details after the meeting. He will also contact MYREN.

Saqib has submitted his proposal to FRGS. He received a request for revision. Saqib has sent a copy to Anjum and Les. Anjum is going to help edit the proposal.

Saqib also reports that there is a change this month in the traceroute from SLAC to UNIMAS. Previously, it went via TEIN3. Further, it did not touch the MYREN network. However, the traceroutes from SLAC to UTM and UM remain the same. Johari is going to look at it.

Is there an update?

Saqib has updated and re-submitted his proposal to FRGS. Saqib sent a copy to Anjum and Les. Anjum is going to help edit the proposal.

Les did a binary search (using http://www-wanmon.slac.stanford.edu/cgi-wrap/traceroutearchive.cgi?from=www-wanmon.slac.stanford.edu&to=pinger.unimas.my&date1=2014_06_17&date2=2014_06_18&date3=2014_06_19). It appears the change happened on Wed 2014_06_18 (there is actually no traceroute measured that day). Les has no idea of the significance; Saqib may need to check with his contacts in Malaysia. Traceroute from UM to SLAC:

...

9 58.26.240.62 (58.26.240.62) [AS4788] 215.915 ms

BTW you can examine the geography of the above routes by cutting and pasting them into http://www-wanmon.slac.stanford.edu/cgi-wrap/reflector.cgi?function=vtrace
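The binary search over archived traceroutes mentioned earlier can be sketched as follows. This is an illustrative sketch only: `route_on` is a hypothetical lookup standing in for a query to the traceroute archive, the dates and route labels are made up, and it assumes the route changed once and stayed changed within the date range.

```python
def first_changed_date(dates, route_on):
    """Return the first date whose route differs from the route on dates[0].

    dates: chronologically sorted list of date labels.
    route_on: callable mapping a date label to a comparable route value
              (hypothetical stand-in for an archive lookup).
    Returns None if the route on the last date matches the first.
    """
    old_route = route_on(dates[0])
    if route_on(dates[-1]) == old_route:
        return None
    lo, hi = 0, len(dates) - 1  # invariant: lo has the old route, hi does not
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if route_on(dates[mid]) == old_route:
            lo = mid
        else:
            hi = mid
    return dates[hi]


if __name__ == "__main__":
    # Hypothetical archive: route labels are placeholders, not real paths.
    dates = ["2014_06_%02d" % day for day in range(1, 31)]
    archive = {d: ("route-A" if d < "2014_06_12" else "route-B") for d in dates}
    print(first_changed_date(dates, archive.__getitem__))
```

With 30 days of archive this needs about five lookups instead of thirty. Days with no measurement would need extra handling, for example falling back to the nearest day that has data.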

UUM

Regarding the monitoring host at UUM, Adib has assigned one student to prepare the configuration/installation plan, including how to secure their host from attack. He has a public IP address. He needs the DNS registration by Sunday 25th May or Monday. He is in the last stage of working with the Computer Center. Adib requested Johari to share the UNIMAS settings so that it is easier for the student to follow. No update 6/5/2014, 6/25/2014.

Adib reports they completed the installation of the PingER server at UUM. However, the second-hand (old) machine is not working properly: it suddenly shuts down or restarts.

NUST

Installation is in progress for the Bahawalpur site. The install is complete but needs approval from the head; hopefully it will be up on Monday.

The following are now up and running:

...

  • pinger.kohat.edu.pk: still trying to find a motherboard for the Dell Optiplex 760. The system is old and the motherboard is hard to find; they hope to solve this soon. Currently the name is not resolving.

  • The host name pinger.nwfpuet.edu.pk is also working now.

  • There was no data this month from pingerfsbd.pern.edu.pk and the host was unpingable from SLAC; it is working now.

  • sau.seecs.edu.pk was ungatherable (and unpingable) since Aug 7th, and prior to that it was unreliable; it is working now.

  • www.upesh.edu.pk: the problem has been resolved by replacing files and giving full rights. Ping to is OK and data is also being collected now; however, ping from is giving a permission denied error.

...

Anjum suggested Saqib, Badrul and Johari put together a paper on user experiences with using the Internet in Malaysia as seen from Malaysian universities. It would cover round trip time, losses, jitter, reliability, routing/peering (in particular anomalies), and the impact on VoIP, throughput, etc. It would be good to engage someone from MYREN.

Potential projects

See list of Projects

Future meeting - Les

Next meeting Wednesday September 17th, 2014 9:00pm Pacific Standard Time, Thursday September 18th, 2014 9:00am Pakistan time, Thursday September 18th, 2014 noon Malaysian time, Thursday September 18th, 2014 01:00am Rio Standard Time.

...