Time & date 

Wednesday Jan 7th, 2015 8:00pm Pacific Standard Time, Thursday Jan 8th, 2015 9:00am Pakistan time, Thursday Jan 8th, 2015 12:00 noon Malaysian time, Thursday Jan 8th, 2015 2:00am Rio Standard Time.

Attendees

Invitees:

Anjum, Hassaan Khaliq, Kashif, Raja,  Samad Riaz+, Johari+, Nara, Adnan, Abdullah, Badrul, Ridzuan, Ibrahim+, Hanan, Saqib+, Adib-, Les+, Renan-, Bebo+

+ Confirmed attendance

- Responded but unable to attend

Actual attendees:

Johari, Samad, Les, Bebo. Saqib and Ibrahim had network problems

Administration

  • Following the workshop Johari contacted a MYREN technical contact who seemed very interested, and there has been an exchange of emails. PingER monitors have been installed and are working at MYREN hosts in Cyberjaya and at UNIMAS. The traceroute servers also work. They plan to add an extra 10 monitors.
  • Anjum and Raja have been working on a paper on Geolocation as developed for TULIP. Using an exponential relation between the Directivity (Alpha) and RTT, the accuracy for Pakistan is ~18 km. Raja now needs to run the analysis for Europe and the US. Meanwhile Raja has started a job and has less time to work on this, so it has stalled. Les contacted Raja, and Raja agrees it is important to finish the measurements and the paper and will endeavor to do so.
  • Anjum's contract in Malaysia ends in 20 days. It may be extended or he may pursue a Senior Lectureship elsewhere. He is awaiting an update.
  • Bebo will be in Kuching for CITA 2015 (see http://www.cita.my/, an international conference on transforming Big Data into Knowledge, 4th - 6th August 2015, sponsored by UNIMAS and including workshops), which precedes the Rain Forest Music Festival. Is there interest in co-locating a PingER workshop? Also, Ridzuan, Ibrahim or Renan have interest in submitting a full paper by March 2nd 2015. Johari will suggest to the conference committee that there be a workshop/tutorial session about the PingER project. We need a specific topic for the workshop, and it should be in line with the conference theme, which is big data.

UUM

There have been several extended periods where we could not gather data from pinger.uum.edu.my; see the plot below, where black indicates no data gathered. Adib fixes these outages when he is notified. It would be good to automate these recoveries. Email has been sent to Adib asking if this can be done.

Renan

No update 1/7/2015.

Maria Luiza Campos of UFRJ reports that there are people at UFRJ taking care of the PingER data analytics project at this moment. Maria is now in LOA Trento Italy for a 1 year post doc. Adriana Vivacqua is now in charge of this subproject at CRDB at UFRJ. There are 11 people on the team: 4 professors, 1 post doc, 1 doctoral student, 2 masters students and 3 undergraduates. There is a document in which they describe the project proposal in more detail.

Renan should continue their work as soon as Cristiane Ceia begins her BSc dissertation thesis, in which Renan will be her co-advisor (together with Luiza).

UM

Ibrahim has set up distributed Hadoop clusters. He has 2TB of disk space. Les has provided information on getting a subset of PingER data by anonymous ftp via ftp://ftp.slac.stanford.edu/users/cottrell. It was put there last September. Information on how the data was put together is at https://confluence.slac.stanford.edu/display/IEPM/Archiving+PingER+data+by+tar+for+retrieval+by+anonymous+ftp. There is information on formatting etc. at http://www-iepm.slac.stanford.edu/pinger/tools/retrievedata.html and some on the dataflows at https://confluence.slac.stanford.edu/display/IEPM/PingER+data+flow+at+SLAC. Renan at UFRJ has successfully used this data; he has also characterized the data in terms of bytes/metric per year etc.

Les has requested Renan to provide an estimate of how RDF bloats the data. Renan/Christiane are looking at this. Renan pointed out: "Finding RDF data size in bytes is not simple because it depends on which Triple Store will be used and how each triple is physically stored. One may store triples as plain texts, other may do as compressed data in specific formats, which would be much smaller. Once we have the number of PingER triples and how much the used Triple Store needs (in bytes) to store a known number of general triples, we may estimate PingER RDF data size."
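
The estimate Renan describes is a simple scaling: measure how many bytes the chosen Triple Store needs for a known number of triples, then multiply by the number of PingER triples. A minimal sketch follows; the function name and all numbers are hypothetical placeholders, not measured values.

```python
def estimate_rdf_bytes(n_pinger_triples, sample_triples, sample_bytes):
    """Scale the store footprint of a known sample up to the PingER triple count.

    sample_triples / sample_bytes: a measured load of general triples into the
    chosen Triple Store (hypothetical inputs; measure these for the real store).
    """
    if sample_triples <= 0:
        raise ValueError("sample_triples must be positive")
    bytes_per_triple = sample_bytes / sample_triples
    return n_pinger_triples * bytes_per_triple


if __name__ == "__main__":
    # Placeholder numbers only, to show the arithmetic.
    estimate = estimate_rdf_bytes(
        n_pinger_triples=1_000_000_000,
        sample_triples=1_000_000,
        sample_bytes=150_000_000,
    )
    print(f"estimated size: {estimate / 1e9:.1f} GB")
```

The result is only as good as the sample: a store that compresses repeated URIs will need fewer bytes per triple on PingER-like data than on a general sample.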

Ibrahim has started downloading all zip files to the local machines. Last week he downloaded 2 GB of weather data to test his cluster nodes. He wrote a simple Java program (Map, Reduce) to find the average and it worked fine.
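
For readers unfamiliar with the Map/Reduce pattern used in that test, here is a minimal single-machine sketch of computing an average per key. It is written in Python rather than Java, it is not Ibrahim's program, and the "station,temperature" record format is an assumption made for illustration.

```python
from collections import defaultdict
from functools import reduce


def map_record(line):
    """Map step: turn a 'station,temperature' line into (key, (sum, count))."""
    station, temp = line.strip().split(",")
    return station, (float(temp), 1)


def combine(a, b):
    """Reduce step: add partial sums and partial counts."""
    return a[0] + b[0], a[1] + b[1]


def average_by_key(lines):
    """Group mapped pairs by key (the shuffle), reduce them, then divide."""
    grouped = defaultdict(list)
    for key, pair in map(map_record, lines):
        grouped[key].append(pair)
    averages = {}
    for key, pairs in grouped.items():
        total, count = reduce(combine, pairs)
        averages[key] = total / count
    return averages


if __name__ == "__main__":
    print(average_by_key(["kuching,30.5", "kuching,31.5", "bario,22.0"]))
```

Emitting (sum, count) pairs rather than raw averages is what lets the reduce step run in any order and on partial results, which is the property a Hadoop job relies on.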

UNIMAS

The two major issues with the Raspberry Pi would be:

  • are the results statistically the same as for the other monitor at UNIMAS (e.g. use the Kolmogorov-Smirnov test)? There is an Advanced Project (Master by coursework student) working on the statistics of the data from the Raspberry Pi and the production PingER monitor at UNIMAS to see how much they differ.
  • is it reliable/robust, and is it clear what to do to debug problems remotely (e.g. if it is at Bario)? Looking at the monitoring data, I have been unable to collect any from it since Oct 20th (it is pingable, and port 80 responds, however the remote traceroute and ping_data.pl are not working), which does not sound promising. We will need to evaluate the robustness of the unit by simulating various events such as power failure, hard and cold reboot, etc. Johari will need access to the computer center to verify it comes up correctly after reboot etc.
  • Johari will go to the computer center the coming weekend and look at improving the auto re-start.
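
As an illustration of the Kolmogorov-Smirnov comparison mentioned in the first item, the sketch below computes the two-sample KS statistic D and an asymptotic p-value in plain Python. The input lists would be RTT samples from the two monitors; the values in the usage example are made up, and the p-value uses the large-sample Kolmogorov series, so it is only approximate for small samples.

```python
import math


def ks_two_sample(x, y):
    """Return (D, p): max distance between the two empirical CDFs and an
    asymptotic p-value for the hypothesis that both samples share a distribution."""
    if not x or not y:
        raise ValueError("both samples must be non-empty")
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        v = min(xs[i], ys[j])
        # Advance both CDFs past the current value (this also handles ties).
        while i < n and xs[i] <= v:
            i += 1
        while j < m and ys[j] <= v:
            j += 1
        d = max(d, abs(i / n - j / m))
    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.2:
        # The series converges slowly here and the p-value is 1 to many digits.
        return d, 1.0
    p = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                  for k in range(1, 101))
    return d, min(max(p, 0.0), 1.0)


if __name__ == "__main__":
    # Made-up RTT samples in ms, for illustration only.
    production = [50.1, 52.3, 49.8, 55.0, 51.2, 53.4]
    raspberry = [54.0, 56.2, 53.1, 58.9, 55.5, 57.3]
    print(ks_two_sample(production, raspberry))
```

A small p-value would suggest the two monitors see different RTT distributions; a large one only means the test found no evidence of a difference.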

If/when it works it would be instructive to look at the data from pinger and raspberry pi to Malaysia, since the distances are shorter and the differences may show up better. For Sep-Oct 2014 when there was data measured from both Oct-Nov the averages for 20 paths were 52 ± 21 ms (from pinger.unimas.my to 20 other Malaysian hosts) and 56 ± 21 ms for raspberry pi to 20 other Malaysian hosts.
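
A rough check of whether the two averages quoted above differ can be made from the summary numbers alone. The sketch below computes Welch's t statistic, assuming (this is not stated in the notes) that the 21 ms figures are standard deviations across the 20 paths and that the two sets of paths can be treated as independent samples.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic for the difference between two sample means."""
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean2 - mean1) / standard_error


if __name__ == "__main__":
    # pinger.unimas.my vs raspberry pi, using the figures from the notes.
    t = welch_t(52, 21, 20, 56, 21, 20)
    print(f"t = {t:.2f}")
```

Under these assumptions t is about 0.6, well below the usual thresholds for a significant difference. If the 20 destination hosts are the same for both monitors, a paired comparison per destination would be more sensitive than this unpaired one.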

The traceroute problem may be the same as for UTM (see below). Johari will request unblocking of the appropriate UDP ports.

Custom ISO: He can get as far as the boot screen, but is unable to get to the desktop. It is on hold as of 1/7/2015 awaiting a student with the appropriate skills/background.

They are also looking at anomaly detection: see http://slac.stanford.edu/pubs/slacpubs/13250/slac-pub-13399.pdf or http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.363.1087 for comparisons of some techniques, and http://people.cs.missouri.edu/~calyamp/publications/ontimedetect_mascots10.pdf. Next they will look at performance among correlated routes. There are quite a lot of papers in this area, so a literature search is highly recommended.

UTM

After revision the FRGS proposal was submitted to RMC. It was not accepted. We need to update it again in order to fulfill the requirements of the grant. Is there an update?

Saqib has updated the case study, which is available in Google Drive as a "Shared-PingER" document for review at https://drive.google.com/folderview?id=0B-NEKleLll79ZFNmUnhiVGJ0Nmc&usp=sharing_eid (thanks to Bebo, who will notify all of how to access it). It still needs some updates from UNIMAS (on Raspberry Pi), UM (on big data) and UUM.

The traceroute problem regarding maximum reachable hops (i.e. 11 hops) may be because the Unix/Linux/OSX traceroute uses UDP to send the requests. The first request is sent to a particular port (33434), with a TTL to tell it how many hops to go. The TTL starts at 1 and is incremented as it tries the next hop; the port is also incremented (up to 33465). It looks like the first few UDP ports are enabled and the rest are blocked. The Windows traceroute uses ICMP to send the probes.
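
The relation between blocked ports and the last reachable hop can be sketched as follows. This is a simplified model of the description above (one destination port per hop, starting at 33434); real traceroute implementations may send several probes per hop, so the exact port-to-hop mapping can differ. The function names and the `highest_open_port` value are hypothetical.

```python
BASE_PORT = 33434  # first destination port, as given in the notes


def probe_schedule(max_hops, base_port=BASE_PORT):
    """List (ttl, port) pairs: the TTL starts at 1 and the port rises with it."""
    return [(ttl, base_port + ttl - 1) for ttl in range(1, max_hops + 1)]


def last_reachable_hop(max_hops, highest_open_port, base_port=BASE_PORT):
    """Highest TTL whose probe port is still at or below the last open port."""
    reachable = [ttl for ttl, port in probe_schedule(max_hops, base_port)
                 if port <= highest_open_port]
    return reachable[-1] if reachable else 0


if __name__ == "__main__":
    # If a firewall only passed UDP ports up to 33444 (a hypothetical value),
    # this model would stop at hop 11, matching the observed symptom.
    print(last_reachable_hop(max_hops=30, highest_open_port=33444))
```

If this model matches the observation, asking for the whole UDP range used by traceroute to be unblocked, rather than only the first few ports, should restore the deeper hops.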

NUST

We are unable to resolve the name of the host: host pinger.uob.edu.pk

We are unable to get data from the following hosts for a long time

  • buitms.seecs.edu.pk #The person does not wish to continue working with us. Should we disable gathering data from this host?
  • nukhimain.seecs.edu.pk #Unable to gather data since 20th November, 2014
  • pinger.nca.edu.pk #This node was working following the previous meeting but we have been unable to gather data since 18th Dec.
  • pinger.uettaxila.edu.pk #The node needs a fresh installation but Samad has had difficulty visiting the site, which is why it has not been working for many days. Last gathered data 17 September.
  • pingerisl-fjwu.pern.edu.pk #There is a problem in the link to that node, which is why it has not been working for a long time. I am in contact with the person concerned and hope it will be up in a week. Last gathered data 27th October 2014.
  • pingerlhr.pern.edu.pk. Last gathered data October 13,2014
  • sau.seecs.edu.pk. Unable to gather data since 4th December 2014

The following are pingable but there is no data:

  • ns3.pieas.edu.pk
  • www.upesh.edu.pk #After the last meeting Samad indicated it needs installation and would be up in a day; the person concerned was to re-install PingER that day as committed. However, with the recent attack on a school in Peshawar, several universities are closed for security reasons.

Pinger at SLAC

Added hosts over Xmas, now monitor 171 countries that contain 99.14% of the world's population. Monitor all countries with > 1M people except Central African Republic, Chad, Guinea-Bissau, North Korea.

Putting together annual report for PingER.

Looking at whether the incidence of Duplicate pings is reducing with time.

Bebo came up with a self funded researcher who is interested in communications. Les reviewed the information and sent information on a possible project to the researcher: I include the information below in case there is a self funded researcher/graduate student from Pakistan or Malaysia who is interested.

Dear Yu Su,

 I would like to describe what we plan to do starting from Feb. 2015.

 We have applied to the US Department of Energy (DoE) for a 9 month grant to further the design and implementation of an ultra-high performance parallel data transfer software for both LAN and WAN. Currently this effort will use the SLAC 10/100Gbit/sec network and the 100Gbps connections to our providers and the Internet. At present there are no plans to extend this to mobile networks. This will change in the future as mobile/wireless connections exceed Gbit/sec.

 The software leverages high-performance backend storage (e.g. NVMe SSD devices aggregated using a parallel file system, e.g. IBM GPFS, Intel Lustre, Fraunhofer BeeGFS), cluster compute (provided by a cluster of commodity servers), and a multiple TCP stream approach to attain excellent out-of-box bandwidth utilization with minimum tuning over multiple network connections (e.g. four 10Gbps NICs per server). It also aims to achieve such good utilization regardless of data set type: large files, mixed-sized files, or lots of small files (LOSF). Encryption for data in transit is provided by SSL with optional perfect forward secrecy (PFS). The data transfers are managed using a US patent-pending protocol.

 An earlier version of the software was demonstrated at Super Computing 2014 (SC14) this past November.

 We have 40Gbps and 10Gbps connected hosts at 2 data centers and 100Gbps between the sites.

 We will start out with local transfers and later extend to long distance (trans-continental, inter-continental) links

 Should the aforementioned grant be awarded, we will initially focus on the following four technical objectives:

 1. Issue resolution, design refining and implementation tuning.

 2. Implement missing features.

 3. More extensive validation over 100Gbps networks.

 4. More extensive storage scale-out validation.

Once the above objectives are accomplished, we will enhance the software for long distance testing over transit links.

The work will involve writing scripts to automate measurements, statistical analysis of measured data, and sophisticated configuration/tuning of TCP stacks, file systems, and network connections to achieve high performance. We are also looking at funding a graduate student (possibly from Pakistan). English is probably a second language for this student, so you will need a good command of the English language.

Please let us know if the above scope fits with your interest and goals.

Next meeting

Next meeting:  Wednesday Feb 4th 2015 8:00pm Pacific Standard Time, Thursday Feb 5th  2015  9:00am Pakistan time, Thursday Feb 5th 2015 noon Malaysian time, Thursday  Feb 5th, 2015 02:00am Rio Standard Time.  

Old Items

Linked Open Data

Renan  finished the new pingerlod web site. The new thing is that it should be much easier now to modify the info texts. What Renan did was to put the texts into a separate file. The new version has been loaded on the server and some text added to describe how to use the map. However there is a bug that prevents it from executing the map. Renan reports that the bugs should be easy to fix. He has talked to his professor who suggested trying RDF Owlink, it should have faster responses to queries. Renan will research this.  It will probably mean reloading the PingER data so is a lot of work, hopefully this will improve performance. Before the rebuild he will make the fixes and provide a new WAR for us to load on pingerlod.slac.stanford.edu. He is also working on documentation (he has finished the ontology and has a nice interactive tool for visualizing it, since the ontology is the core of the data model of our semantic solution, this will be very helpful for anyone who uses our system, both a developer of the system and a possible user) and his thesis. Bebo pointed out that to get publicity and for people to know about the data, we will need to add pingerlod to lod.org.

Things he will soon do regarding documentation:

  1. A task/process flow writing all java classes involved on all those batch jobs;
  2. A Javadoc <http://www.oracle.com/technetwork/java/javase/documentation/index-jsp-135444.html> which will explain all classes and how they are used.

For the Linked Open Data / RDF, which is in pre-alpha days, you can go to http://pingerlod.slac.stanford.edu. As can be seen, this page is not ready for prime time. However, the demos work as long as one carefully selects what to look at:

  • Click on Visualizations, there are two choices:
    • Multiple Network Metrics: Click on the image, which gives a form. Choose Node pinger.slac.stanford.edu pinging to www.ihep.ac.cn, time parameters yearly, 2006 2012, metrics throughput, Average RTT, Packet loss, and display format Plot graph, then click on submit. In a few seconds a time series graph should come up. Mouse over to see details of values at each x value (year).
    • A mashup of network metrics x university metrics Click on image: gives another form, pinging from pinger.slac.stanford.edu, School metric number of students, time metric years 2006 2012, display format plot graph, click on submit. Longer wait, after about 35 seconds a google map should show up. Click on "Click for help." Area of dots = number of students, darkness of dots = throughput (lighter is better), inscribing circle color gives university type (public, private etc.) Click on circle for information on university etc.
  • Renan will be working on providing documentation on the programs, in particular the install guide for the repository and web site etc. This will assist the person who takes this over. 

Renan is using OWLIM as RDF Repository. He is using an evaluation version right now. Renan looked into the price for OWLIM (that excellent RDF Database Management System he told us about). It would cost 1200EUR minimum  (~ 1620 USD, according to Google's rate for today) for a one time eternal license. It seems too expensive. No wonder it is so good. Anyhow, he heard about a different free alternative. Just not sure how good it would be for our PingER data. He will try it out and evaluate. He will also get a new evaluation of the free OWLIM lite.  

He has also made some modifications on the ontology of the project (under supervision of his professor in Rio) hence he  will have to modify the code to load the data accordingly.

Maria and Renan are advancing in some approaches to deal with PingER data, making it easier to be analyzed and integrated. In particular they have been busy studying and evaluating alternatives, analyzing results from the latest benchmarks on NoSQL (including RDF and graph based storages) database management, distributed processing and mediated  solutions over relational databases, and also other experiments with multidimensional analyses on Linked Data.  The new students involved are now understanding better the scenario and they have been interacting with Renan regularly. 

They have separated the tasks into 2: 

  1. Quantitative analysis on PingER data
    1. They want to know how PingER has grown, since 1998 until today and how it might be in the next years. By doing this, we may focus on more suitable technologies that deal with scenarios that have a similar profile with PingER.
      1. Two students are working on this.
  2. Approaches to handle PingER current data
    1. Conventional approach – Utilization of Cassandra as back-end database to provide easy crossing of parameters to get PingER data.
      1. One student is working on this.
    2. Distributed and parallel approach – Utilization of a data warehouse on top of a distributed file system to provide low latency response to complex queries (like the ones we were not able to do on my previous work). Additionally, how Scientific Workflow Management Systems may help in the ETL process of transforming PingER so it can easily be stored on the data warehouse.
      1. Renan is working on this.
    3. Pure RDF approach – Good ways of modeling and natively storing RDF data.
      1. Maria-Luiza is working on this.
    4. NoSQL approaches – How other NoSQL DBMS may be adequate for PingER multidimensional data.
      1. Two students are evaluating existing NoSQL solutions for multidimensional scenarios (such as PingER)
    5. Key-Value storages for PingER data in RDF
      1. This is Ibrahim’s work.

In the end, they want to compare all these approaches.

NUST

At the Connect Asia Pacific Summit in Bangkok in  January and seeing the  project "Mapping the pan Asia Pacific information Superhighway and closing gaps in infrastructure  connectivity" Shahryar found that very much related to the work in the PingER project. So Shahryar sent email to a UN agency for a possible collaboration with them on PingER project. He has heard nothing so he will write a detailed proposal and then should contact them again. No update 2/5/2014, 3/5/2014.

Tulip
Follow up from workshop
  • Hossein Javedani of UTM is interested in anomalous event detection with PingER data. Information on this is available at https://confluence.slac.stanford.edu/display/IEPM/Event+Detection. We have sent him a couple of papers and how to access the PingER data. Hossein and Badrul have been put in contact. Is there an update Badrul?

The Next step in funding is to go for bigger research funding, such as LRGS or eScience. Such proposals must lead to publications in high quality journals. They will need an infrastructure such as the one we are building. We can use the upcoming workshop (1 specific session) to brainstorm and come up with such proposal. We need to do some groundwork before that as well. Johari will take the lead in putting together 1/2 page descriptions of the potential research projects. 

  1. Need to identify a few key areas of research related to the PingER Malaysia Initiative, and this can be shared/publicized through the website. These might include using the infrastructure and data for: anomaly detection; correlation of performance across multiple routes; and GeoLocation. Future projects as Les listed in Confluence here https://confluence.slac.stanford.edu/display/IEPM/Future+Projects can also be a good start, as can Bebo's suggestion.
  2. Need to synchronize and share research proposals so as not to duplicate research work. How to share? Maybe not through the website, or maybe we can create a members-only section of the website to share sensitive data such as research proposals?

Anjum suggested Saqib,  Badrul and Johari put together a paper on user experiences with using the Internet in Malaysia as seen from Malaysian universities. In particular round trip time, losses, jitter, reliability, routing/peering, in particular anomalies, and the impact on VoIP, throughput etc.  It would be good to engage someone from MYREN.

Ibrahim

Ibrahim Abaker  is planning to work on a topic initially entitled " leveraging pingER big data with a modified pingtable for event-correlation and clustering".  Ibrahim has a proposal, see https://confluence.slac.stanford.edu/download/attachments/17162/leveraging+pingER+big+data+with+a+modified+pingtable+for+event-correlation+and+clustering.docx. Ibrahim reports 7/15/2014 "I have spent the last few months trying to understand the concept of big data storage and its retrieval as well as the traditional approach of storing RDF data. I have integrated a single hadoop cluster in our cloud. but for this project we need multiple clusters, which I have already discussed with Dr. Badrul and he will provide me with big storage for the experiment." No Update 8/20/2014.

"I have come up with initial proposed solution model. This model consists of several parts. The upper parts of the Figure below shows the data source, in which PingER data will be convert into RDF format. Then the data pre-processor will take care of converting RDF/XML into N-triples serialization formats using N-triples convertor module. This N-triple file of an RDF graph will be as an input and stores the triples in storage as a key value pair using MapReduce jobs"

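The last stage of the model quoted above (N-Triples in, key-value pairs out) can be sketched on a single machine as below. This is an illustration, not Ibrahim's design: keying by subject is an assumption, the parser only handles simple one-line triples, and the example URIs are made up.

```python
from collections import defaultdict


def parse_ntriple(line):
    """Split a simple N-Triples line into (subject, predicate, object)."""
    body = line.strip()
    if body.endswith("."):
        body = body[:-1].rstrip()
    subject, predicate, obj = body.split(None, 2)
    return subject, predicate, obj


def map_phase(lines):
    """Map: emit (subject, (predicate, object)), skipping blanks and comments."""
    for line in lines:
        if line.strip() and not line.lstrip().startswith("#"):
            subject, predicate, obj = parse_ntriple(line)
            yield subject, (predicate, obj)


def reduce_phase(pairs):
    """Reduce: collect every (predicate, object) pair under its subject key."""
    store = defaultdict(list)
    for key, value in pairs:
        store[key].append(value)
    return dict(store)


if __name__ == "__main__":
    triples = [
        '<http://example.org/node1> <http://example.org/avgRTT> "52" .',
        '<http://example.org/node1> <http://example.org/loss> "0.1" .',
    ]
    for key, values in reduce_phase(map_phase(triples)).items():
        print(key, values)
```

In a Hadoop job the grouping done in `reduce_phase` would be performed by the shuffle between mappers and reducers, and the resulting dictionary would be written to the key-value store instead of being held in memory.
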
Potential projects

See list of Projects

Coordinates of team members:

See: http://pinger.unimas.my/pinger/contact.php
