Time & date
Wednesday July 16th, 2014 9:00pm Pacific Daylight Time, Thursday July 17th, 2014 9:00am Pakistan time, Thursday July 17th, 2014 12:00 noon Malaysian time, Thursday July 17th, 2014 1:00am Rio Standard Time.
Attendees
Invitees:
Anjum, Hassaan Khaliq, Kashif+, Raja+, Johari, Nara, Adnan+, Abdullah, Badrul, Ridzuan, Ibrahim+, Hanan, Saqib, Adib, Les+, Renan, Bebo+
+ Confirmed attendance
- Responded but Unable to attend:
Actual attendees:
Administration
Anjum reports (6/23/2014) that "the proposal for conference has been submitted for approval and Pinger has been added in the agenda. Travel expenses for Les and Bebo have also been included in the conference proposal. We are awaiting the proposal approval. If the proposal gets approved, we can then decide on whether to actually conduct the Pinger Workshop at UM or at another place. I am saying this because I believe a stand alone pinger workshop will be more preferable." The dean has approved the proposal; next it goes to the chancellor. We hope to have a decision in a couple of weeks. Once approval is given, the venue for the conference can be UM or UUM.
As discussed earlier, the only twist here is that PingER will be seen as a case study for big data. This is good in the sense that people interested in big data research can deploy PingER monitoring nodes at their respective universities/organizations and, in return, work with the data. We agreed that the 25th looked like a good day for the PingER workshop. Les should be able to make it from Burkina Faso, and Bebo should be able to get back to the US for Thanksgiving. There would be back-to-back presentations: Les on how PingER gathers and archives data, what data there is, the data types, how to access it etc., followed by Bebo on Google tools for big data.
Anjum suggested putting together a paper on the metrics provided by PingER for SIGMETRICS. The due date is in November.
Renan
Les met with Renan and his supervisor (Maria Luiza Campos). The minutes are at: https://confluence.slac.stanford.edu/display/IEPM/20140703+Meeting+between+UFRJ+and+SLAC.
Luiza has set up a small project in the UFRJ Reference center to provide big data analysis/mining of PingER multidimensional data.
Luiza has proposed three approaches to provide big data analysis/mining of PingER multidimensional data:
- Conventional. Use of the Pentaho environment to handle big multidimensional data, which enables the use of enhanced user interfaces.
- Linked Data. Benchmarking of more sophisticated triple stores than the one we use today at PingER LOD (Sesame). Preferably, we should analyze parallel and distributed solutions; CumulusRDF is an example. Within this approach, Renan is investigating an alternative to Hadoop that uses a Scientific Workflow Management System and the Map/Reduce paradigm to help with both querying and provenance of the Linked Data (RDF) data, and Ibrahim is investigating an approach that uses Hadoop Map/Reduce with a key/value store holding PingER data in RDF.
- Greenplum (http://en.wikipedia.org/wiki/Greenplum). This is a high-performance database from EMC with many features such as caching. It came to EMC through the acquisition of Greenplum and is now part of Pivotal. There is also a DBMS called Grindplan that explores lots of features using Pivotal.
Les will make examples of PingER data available via FTP. There are two types:
- Raw data as gathered daily from all the monitoring hosts. This data is measured at 30-minute intervals and is quite dirty.
- Analyzed data by metric. This has been cleaned up. Les recommends that UFRJ use the cleaned-up data.
The instructions for the data will also be sent to Luiza. Also see PingER data flow at SLAC.
Les will also send Luiza information on how PingER data has been used.
UM
Badrul (6/23/2014) is still waiting to hear from his student (Abdulrahim Haroun Ali, who is out of the country) about the paper on anomalies in PingER measurements, and will send an update once the paper is ready. For the moment the paper is not ready.
Ridzuan has put together a rough proposal to use Hadoop to store and make available PingER data. He registered for the MYREN cloud services last month, but has not yet received approval to use them. He will follow up with them again. For the Hadoop implementation, he is considering the Hortonworks Data Platform (HDP2). However, there are problems with the latest installation because UM has adopted IPv6: most of the HDP2 repositories reside on IPv4 servers, which makes it difficult to install correctly on the UM server. He is trying another platform or looking for a way to solve this installation problem.
Ibrahim Abaker is planning to work on a topic initially entitled "leveraging pingER big data with a modified pingtable for event-correlation and clustering". Ibrahim has a proposal, see https://confluence.slac.stanford.edu/download/attachments/17162/leveraging+pingER+big+data+with+a+modified+pingtable+for+event-correlation+and+clustering.docx. Ibrahim reported that in the last few months he has tried to come up with a scenario to pre-process RDF data in N-Triples format and store it in a key-value store on the Hadoop platform. He installed Hadoop, but on a single node, not using CMC, which is the cloud service provided by UM.
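The pre-processing step Ibrahim describes (N-Triples in, key-value records out) can be sketched in a few lines. This is only an illustration under stated assumptions, not Ibrahim's code: the example.org URIs, the choice of the subject as key, and the function names map_triple, reduce_by_key and build_store are all hypothetical, and the regular expression handles only simple N-Triples lines (no blank nodes). A real Hadoop job would run the map and reduce steps on the cluster rather than in one process.

```python
import re
from collections import defaultdict

# One simple N-Triples statement: <subject> <predicate> object .
# The object is an IRI in angle brackets or a quoted literal; a datatype
# or language tag after the literal is matched but ignored.
TRIPLE_RE = re.compile(
    r'^\s*<([^>]*)>\s+<([^>]*)>\s+(?:<([^>]*)>|"((?:[^"\\]|\\.)*)")[^\s]*\s*\.\s*$'
)


def map_triple(line):
    """Map step: turn one N-Triples line into (key, value), or None."""
    match = TRIPLE_RE.match(line)
    if match is None:
        return None  # blank line, comment, or syntax this sketch skips
    subject, predicate, iri_obj, literal_obj = match.groups()
    obj = iri_obj if iri_obj is not None else literal_obj
    return subject, (predicate, obj)


def reduce_by_key(pairs):
    """Reduce step: group all (predicate, object) values under their subject."""
    store = defaultdict(dict)
    for subject, (predicate, obj) in pairs:
        store[subject][predicate] = obj
    return dict(store)


def build_store(lines):
    """Run map then reduce over an iterable of N-Triples lines."""
    pairs = (map_triple(line) for line in lines)
    return reduce_by_key(pair for pair in pairs if pair is not None)


# Hypothetical sample data; real PingER LOD URIs will differ.
SAMPLE = [
    '<http://example.org/m/1> <http://example.org/p/metric> "average_rtt" .',
    '<http://example.org/m/1> <http://example.org/p/value> '
    '"182.4"^^<http://www.w3.org/2001/XMLSchema#double> .',
    '<http://example.org/m/1> <http://example.org/p/source> <http://example.org/node/a> .',
    '# a comment line',
]

store = build_store(SAMPLE)
for predicate, obj in sorted(store["http://example.org/m/1"].items()):
    print(predicate, "->", obj)
```

Keying on the subject keeps all properties of one measurement together in a single record; keying on (subject, predicate) instead would suit a store that is queried one property at a time.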
UNIMAS
Johari will try to attend this Skype meeting and will also bring Dr. Adnan Shahid Khan, who recently joined UNIMAS. Adnan is coming up to speed. He has been added to http://pinger.unimas.my/pinger/contact.php and Adib has been requested to add him to pinger-my.
The Raspberry Pi is at the data centre and has a public IP address. It was working last week, until the UPS failed over the weekend. It did not reboot itself. Johari will look at the problem.
The tool to enable synchronizing Malaysian monitors is completed. It has been tested by Saqib. Saqib requested sorting of the HostList by country; Johari will add this.
The traceroute server at http://pinger2.unimas.my/cgi-bin/traceroute.pl has the same problem as before. They know (sort of) what the problem is but have not had the chance to rectify it (a NAT address mapping needs to be added). There was no progress 12/4/2013, 1/8/2014, 1/22/2014, 2/5/2014, 3/26/2014, 4/8/2014, 4/23/2014, 6/4/2014. Now that the historical traceroutes are working for UM (see below) there is an extra incentive to get the reverse traceroute working at UTM and UNIMAS.
Johari has a research student who finalized a proposal in order to officially apply for his masters. He will start in February. He is currently working on threshold/anomaly detection, and will extend this to correlating performance over multiple routes. He will share the proposal with Les and others in April. No progress 6/4/2014.
UTM
No updates regarding the traceroute problem at UTM. However, Saqib thinks the problem is still in the CICT firewall or router, as the TCP traceroute command works fine from UTM.
Saqib met with MYREN who have made many topology changes. Saqib will also incorporate these into the Malaysian case study. He is seeing anomalously long delays between mainland Malaysia and Sarawak. It does not appear to be due to congestion. We need to understand the routing and which undersea cables are being used. Saqib will send more details after the meeting. He will also contact MYREN.
Saqib's proposal is almost ready; however, we do not see a funding agency to submit it to. The next round of the FRGS may be the next opportunity.
UUM
Regarding the monitoring host in UUM, Adib has assigned one student to prepare the configuration/installation plan, including how to secure their host from attack. He has a public IP address. He needs to complete the DNS registration by Sunday 25th May or Monday. He is in the last stage of working with the Computer Center. Adib requested Johari to share the UNIMAS settings so it is easier for the student to follow. No update 6/5/2014, 6/25/2014.
UUM PingER is almost ready. Adib has a public IP address together with a DNS name. Once this is settled, traceroute.pl will follow. This will increase the number of landmarks in mainland Malaysia by 50% and improve geolocation. Adib plans to get to this next week when he returns from vacation.
NUST
The Bahawalpur site for a new PingER monitoring node: Kashif once again contacted the Director IT, who this time replied positively. He is waiting for the new machine, and hopefully the PingER node will be installed next week.
Kashif reports the following are now fixed.
- airuniversity.seecs.edu.pk
- pinger.nca.edu.pk
- buitms.seecs.edu.pk
The latter two report a problem with accessing the httpd file. However, this does not interfere with their correct working. The issue is quite strange. For some reason apache can't access the /etc/httpd/ directory when ping_data.pl is called from the web but from the terminal it works just fine (both as root and apache).
There are also several sites that seem to have power problems and are often not available at the normal early morning (Pacific time) gathering time. We have added extra gathering times for a couple of sites.
Kashif is working on the rest of the nodes.
- duhs.seecs.edu.pk, Kashif recommends dropping it as they do not seem interested. This has been done.
- pinger.kohat.edu.pk, system issue, we are sending a new motherboard this week.
- pingerkhi-uok.pern.edu.pk, SFP connector problem, we are sending a new one this week.
- sau.seecs.edu.pk, system issue, a new system is ready and will be sent this week.
- uaf.seecs.edu.pk, drop (the IP is blocked frequently, so it is currently not a good PingER host). Anjum noted that if we do not have another host in Faisalabad then this host may be very important. Since Kashif is from Faisalabad, Anjum suggested we try a bit more if there is no other host. Looking at the map (see http://www-wanmon.slac.stanford.edu/wan-mon/viper/pinger-coverage-gmap.html), it appears PK.PERNFSBDPOP.EDU.N1 is nearby. Thus UAF has been dropped.
Raja
Raja has added an optional feature to exclude water areas from the acceptable area. This reduces the error (proportional to area), but sometimes leads to a less accurate centroid. Currently it is only available for N. America. He will update the documentation with examples. The number of working, usable landmarks is now up to 340.
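The trade-off Raja describes can be illustrated with a toy example. This is only a sketch under stated assumptions, not Raja's implementation: the acceptable area is modelled as a hypothetical 4x4 grid of unit cells on a flat plane, the "water" rule (x >= 2) is invented, and the names centroid, exclude_water and is_water are made up for the illustration.

```python
def centroid(cells):
    """Mean position of the cell centres (flat-grid approximation)."""
    count = len(cells)
    return (sum(x for x, _ in cells) / count, sum(y for _, y in cells) / count)


def exclude_water(cells, is_water):
    """Keep only the cells that the is_water predicate marks as land."""
    return [cell for cell in cells if not is_water(cell)]


def is_water(cell):
    """Hypothetical coastline: everything with x >= 2 is water."""
    return cell[0] >= 2


# Hypothetical acceptable area: 16 unit cells, so the area equals the cell count.
cells = [(x, y) for x in range(4) for y in range(4)]
land = exclude_water(cells, is_water)

print("area before:", len(cells), "centroid:", centroid(cells))
print("area after: ", len(land), "centroid:", centroid(land))
```

In this toy case the area (and hence the reported error) halves from 16 to 8 cells, while the centroid moves from (1.5, 1.5) to (0.5, 1.5). For a target close to the coast, for example at (1.5, 1.5), the land-only centroid is therefore further from the target even though the reported error is smaller.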
PingER at SLAC
Les requested an update from Yahoo about TULIP's geolocation. They answered "We are very much interested in getting IP triangulation at internet scale, we will have internal sync-up on how we can leverage this initiative if there is rate limit and get back. Regarding opening up yahoo sites for deploying ping server requires some more time to discuss this with relevant stake holders with in yahoo." No word, sent a reminder 5/19/2014. No response 6/4/2014, 6/25/2014.
Les sent email to Google as follows: "I would like to bring to your attention that we have developed a geolocation tool using delay based (using RTTs from known ping server landmarks) distance estimates to triangulate the location of an IP host target. The tool is accessible at: http://www-wanmon.slac.stanford.edu/cgi-wrap/reflex.cgi. We have identified that the accuracy of the geolocation is directly related to the landmark density (e.g. # of landmarks/ million sq km). The higher the density the smaller the error and the fall off is exponential. We currently have over 1000 registered landmarks, of which at any given time ~300 are working. The tool not only finds the location of the target, it also gives an estimated error. To the best of our knowledge it is the only freely available delay based measurement geolocation service publicly available today. A drawback (compared to database methods such as those based on GeoMind) is the time taken to make the measurements. We have worked on this from many directions including parallelization of the ping requests, caching, tiering to get the rough location (i.e. region of the world) then zooming in using all landmarks in the region. We are putting together a publication on this." Les sent an update to his contact at Google 6/23/2014, stressing the applicability to traceroute visualization. No response 7/13/2014.
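The delay-based approach described in the email (RTTs from landmarks turned into distance estimates, then combined to locate the target) can be sketched as follows. This is a minimal illustration, not the TULIP code: the conversion factor of 100 km per millisecond of RTT is an assumption (roughly two thirds of the speed of light in fibre, halved for the round trip), the landmark coordinates are hypothetical, the RTTs are synthesized from a known target so the result can be checked, and a coarse 1-degree grid search stands in for whatever solver, tiering and error estimate the real tool uses.

```python
import math

EARTH_RADIUS_KM = 6371.0
KM_PER_MS_RTT = 100.0  # assumption: ~2/3 c in fibre, halved for the round trip


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))


def rtt_to_km(rtt_ms):
    """Convert a round-trip time into an estimated distance."""
    return rtt_ms * KM_PER_MS_RTT


def locate(measurements):
    """Grid search for the point whose landmark distances best fit the RTTs.

    measurements: list of ((lat, lon), rtt_ms), one entry per landmark.
    Returns the best 1-degree grid point as (lat, lon).
    """
    best_point, best_error = None, float("inf")
    for lat in range(-90, 91):
        for lon in range(-180, 180):
            point = (float(lat), float(lon))
            error = sum((haversine_km(point, landmark) - rtt_to_km(rtt)) ** 2
                        for landmark, rtt in measurements)
            if error < best_error:
                best_point, best_error = point, error
    return best_point


# Hypothetical landmarks and target; RTTs are synthesized, not measured.
LANDMARKS = [(1.5, 110.3), (6.4, 100.4), (37.4, -122.2)]
TARGET = (3.0, 102.0)
measurements = [(lm, haversine_km(TARGET, lm) / KM_PER_MS_RTT) for lm in LANDMARKS]

estimate = locate(measurements)
print("estimated location:", estimate)
```

With real measurements the RTT-derived distances overestimate the true distances (queuing, indirect routes), so the fit is never exact. This is one reason the landmark density matters, and why a tiered search (region first, then all landmarks in that region) saves measurement time.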
Old Items
Linked Open Data
Renan finished the new pingerlod web site. The main change is that the info texts are now much easier to modify, because Renan moved them into a separate file. The new version has been loaded on the server and some text added to describe how to use the map. However, there is a bug that prevents it from executing the map. Renan reports that the bug should be easy to fix. He has talked to his professor, who suggested trying RDF Owlink, which should respond faster to queries. Renan will research this. It will probably mean reloading the PingER data, which is a lot of work, but hopefully it will improve performance. Before the rebuild he will make the fixes and provide a new WAR for us to load on pingerlod.slac.stanford.edu. He is also working on his thesis and on documentation. He has finished the ontology and has a nice interactive tool for visualizing it. Since the ontology is the core of the data model of our semantic solution, this will be very helpful for anyone who uses our system, whether as a developer or as a user. Bebo pointed out that to get publicity and for people to know about the data, we will need to add pingerlod to lod.org.
Things he will soon do regarding documentation:
- A task/process flow describing all the Java classes involved in the batch jobs;
- A Javadoc <http://www.oracle.com/technetwork/java/javase/documentation/index-jsp-135444.html> which will explain all classes and how they are used.
For the Linked Open Data / RDF, which is in its pre-alpha days, you can go to http://pingerlod.slac.stanford.edu. As can be seen, this page is not ready for prime time. However, the demos work as long as one carefully selects what to look at:
- Click on Visualizations, there are two choices:
- Multiple Network Metrics: Click on the image: gives a form, choose from Node pinger.slac.stanford.edu pinging to www.ihep.ac.cn, time parameters yearly, 2006 2012, metrics throughput, Average RTT Packet loss and display format Plot graph, then click on submit. In a few seconds time series graph should come up. Mouse over to see details of values at each x value (year).
- A mashup of network metrics x university metrics Click on image: gives another form, pinging from pinger.slac.stanford.edu, School metric number of students, time metric years 2006 2012, display format plot graph, click on submit. Longer wait, after about 35 seconds a google map should show up. Click on "Click for help." Area of dots = number of students, darkness of dots = throughput (lighter is better), inscribing circle color gives university type (public, private etc.) Click on circle for information on university etc.
- Renan will be working on providing documentation on the programs, in particular the install guide for the repository and web site etc. This will assist the person who takes this over.
Renan is using OWLIM as the RDF repository. He is using an evaluation version right now. Renan looked into the price for OWLIM (that excellent RDF Database Management System he told us about). It would cost 1200 EUR minimum (~1620 USD, according to Google's rate for today) for a one-time perpetual license. It seems too expensive. No wonder it is so good. Anyhow, he heard about a different, free alternative. He is just not sure how good it would be for our PingER data. He will try it out and evaluate it. He will also get a new evaluation of the free OWLIM lite.
He has also made some modifications on the ontology of the project (under supervision of his professor in Rio) hence he will have to modify the code to load the data accordingly.
Renan has provided a 4-page Appendix on PingERLOD to the ICFA report. This is also available at PingER LOD Overview.
Raspberry Pi
A quick comparison of the performance of the two hosts (raspberry pi and regular UNIMAS host) without statistical quantification is available at https://confluence.slac.stanford.edu/display/IEPM/Comparison+of+PinGER+RTTs+from+UNIMAS+monitors+N4+and+RASPBERRY. A page has been created to compare the hardware spec between the pinger.unimas.my node (Intel architecture) and the pinger2.unimas.my node (Raspberry Pi ARM architecture), available from the unimas pinger website at http://pinger.unimas.my/pinger/hardware.php. There is a link to hardware.php in the Comparison+of+PinGER+RTTs+from+UNIMAS+monitors+N4+and+RASPBERRY web page.
NUST
At the Connect Asia Pacific Summit in Bangkok in January, Shahryar saw the project "Mapping the pan Asia Pacific information Superhighway and closing gaps in infrastructure connectivity" and found it closely related to the work in the PingER project. He therefore sent email to a UN agency about a possible collaboration on the PingER project. He has heard nothing, so he will write a detailed proposal and then contact them again. No update 2/5/2014, 3/5/2014.
Tulip
Follow up from workshop
- Hossein Javedani of UTM is interested in anomalous event detection with PingER data. Information on this is available at https://confluence.slac.stanford.edu/display/IEPM/Event+Detection. We have sent him a couple of papers and how to access the PingER data. Hossein and Badrul have been put in contact. Is there an update Badrul?
The next step in funding is to go for bigger research funding, such as LRGS or eScience. Such proposals must lead to publications in high quality journals. They will need an infrastructure such as the one we are building. We can use the upcoming workshop (1 specific session) to brainstorm and come up with such a proposal. We need to do some groundwork before that as well. Johari will take the lead in putting together 1/2 page descriptions of the potential research projects.
- Need to identify a few key areas of research related to the PingER Malaysia Initiative; these can be shared/publicized through the website. They might include using the infrastructure and data for: anomaly detection; correlation of performance across multiple routes; and geolocation. The future projects Les listed in Confluence at https://confluence.slac.stanford.edu/display/IEPM/Future+Projects can also be a good start, as can Bebo's suggestion.
- Need to synchronize and share research proposals so as not to duplicate research work. How to share? Maybe not through the public website; perhaps we can create a members-only section of the website to share sensitive material such as research proposals.
Anjum suggested Saqib, Badrul and Johari put together a paper on user experiences with using the Internet in Malaysia as seen from Malaysian universities. In particular round trip time, losses, jitter, reliability, routing/peering, in particular anomalies, and the impact on VoIP, throughput etc. It would be good to engage someone from MYREN.
Potential projects
Future meeting - Les
Next meeting Wednesday July 16th, 2014 9:00pm Pacific Daylight Time, Thursday July 17th, 2014 9:00am Pakistan time, Thursday July 17th, 2014 12:00 noon Malaysian time, Thursday July 17th, 2014 1:00am Rio Standard Time.