Time & date
Wednesday September 2nd 2015 9:00pm Pacific Standard Time, Thursday Sept 3rd 2015 9:00am Pakistan time, Thursday Sept 3rd 2015 noon Malaysian time, Thursday September 3rd, 2015 02:00am Rio Standard Time.
Coordinates of team members:
See: http://pinger.unimas.my/pinger/contact.php
Attendees
Invitees:
Hassaan Khaliq+, Kashif, Raja, Samad Riaz (SEECS); Johari, Nara, Adnan Khan+ (UNIMAS); Abdullah, Badrul, Anjum+, Ridzuan, Ibrahim (UM); Hanan, Saqib+ (UTM); Adib-, Fatima- (UUM); Fizi Jalil (MYREN); Thiago-, Les+, Bebo (SLAC)
+ Confirmed attendance
- Responded but unable to attend
? Individual emails sent
Actual attendees:
Hassaan, Johari, Adnan, Anjum, Saqib (we could see Saqib's icon and at first we could hear him, but after a few minutes we were unable to communicate by message or voice), Bebo, Les
Administration
- Membership of pinger-my in https://groups.google.com
We are working with Dr Zaidi, the new head of SEECS, to investigate the resources needed to continue support of PingER in Pakistan. We communicated in the first week of August but have had no word since; a reminder was sent 8/1/2015. Anjum, Hassaan and Les held a rump meeting to discuss the way forward.
- NETAPPS2015: Adib reports "We have received a good number of submissions and the reviewing process has already started. We will have a committee meeting tomorrow at the same time as the PingER group meeting. I will update you afterward."
Four papers related to PingER were submitted as of August 25th:
Anjum submitted a paper to NETAPPS2015 "Adaptive Geolocation of Internet Hosts"
Les, Thiago, Johari, Bebo and Topher White submitted a paper on "Worldwide Internet Performance Measurements Using Lightweight Measurement Platforms"
Saqib submitted a paper "PingER Malaysia-Internet Performance Measuring Project: A Case Study". This paper needs reformatting; Saqib is working on it. It also needs reviewing; Les will look at it.
Thiago and the Brazilian team submitted a paper on "Applying Data Warehousing and Big Data Techniques to Analyze Internet Performance". The authors are unlikely to be able to attend, so Adib has kindly come up with a way for somebody else to present.
Geolocation - Anjum
Anjum believes the TULIP geolocation application can be improved significantly; there are at least a few ideas we can try. For this, either a group of undergraduate students or an active masters student is required. The resulting work could easily form a masters-level thesis. Who is interested?
Saqib is going to see if he has a student interested. He will contact Anjum to learn more. Update: nothing heard yet from Saqib or Anjum.
Johari will contact Anjum to learn more of the requirements. Update Johari/Adnan
See http://www.slac.stanford.edu/comp/net/tulip/. Basically TULIP pings a target from landmarks at known locations and converts the minimum RTTs into estimates of the distances. It then uses these distances with multilateration to estimate the location of the target.
To improve TULIP one needs the right selection of landmarks, i.e. good (working) landmarks at the right locations (not too far from the target), straddling the target, and with a reasonable estimate of the indirectness (directivity or alpha) of the path from the landmark to the target (so the distance can be estimated reasonably accurately). One also needs a reasonable density of landmarks (e.g. number per 100,000 sq km).
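To make the RTT-to-distance-to-multilateration flow concrete, here is a minimal sketch (assuming numpy/scipy are available; the landmark coordinates, minimum RTTs and the alpha value are made up for illustration, and this is not TULIP's actual code):

```python
# Illustrative multilateration sketch (not TULIP's actual code): RTT-derived
# distances from a few landmarks are fitted to a position by least squares.
import numpy as np
from scipy.optimize import least_squares

# ~150 km of one-way distance per ms of RTT at the speed of light in vacuum;
# alpha folds in both the slower speed in fibre and the indirectness of the route.
KM_PER_MS_RTT = 299792.458 / 1000.0 / 2.0

def rtt_to_distance_km(min_rtt_ms, alpha=2.0):
    """Estimate landmark-to-target distance (km) from a minimum RTT (ms).
    alpha=2.0 is an illustrative default, not a measured value."""
    return min_rtt_ms * KM_PER_MS_RTT / alpha

# Made-up landmark positions (km, local planar approximation) and minimum RTTs (ms).
landmarks = np.array([[0.0, 0.0], [300.0, 0.0], [0.0, 400.0], [250.0, 350.0]])
min_rtts = np.array([2.1, 2.9, 3.4, 1.6])
est_dists = rtt_to_distance_km(min_rtts)

def residuals(p):
    # Mismatch between geometric distances to candidate point p and the RTT-derived ones.
    return np.linalg.norm(landmarks - p, axis=1) - est_dists

fit = least_squares(residuals, x0=landmarks.mean(axis=0))  # start from the landmarks' centroid
print("Estimated target position (km):", fit.x)
```

In practice the landmark selection, alpha estimation and conversion to latitude/longitude are where the real work (and the proposed improvements) lie.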
The landmarks come from PingER and perfSONAR sites. We have a reasonable density in the US, Pakistan and Europe. Currently Anjum is getting better than 20 km accuracy for Pakistani targets.
As the number of landmarks goes up so does the accuracy, but so does the time to make the measurements (pings). One needs to find the optimal density.
Anjum proposes to speed up the measurements by using a cluster for parallelization, and also to improve the adaptation of alpha based on region. He regards the adaptive geolocation and the parallelization as MS projects.
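As a rough illustration of the parallelization idea, here is a single-machine sketch using a thread pool and the system ping command (the target names and the output parsing are assumptions, not PingER's actual code; a cluster version would instead split the target list across worker nodes):

```python
# Minimal sketch of parallelizing pings on one machine with a thread pool.
# Assumes a Unix-style ping whose summary line contains "min/avg/max".
import subprocess
from concurrent.futures import ThreadPoolExecutor

targets = ["example.com", "example.org", "example.net"]  # hypothetical targets

def min_rtt_ms(host, count=5):
    """Run the system ping and return the minimum RTT in ms, or None on failure."""
    try:
        out = subprocess.run(["ping", "-c", str(count), host],
                             capture_output=True, text=True, timeout=30).stdout
    except subprocess.TimeoutExpired:
        return None
    for line in out.splitlines():
        if "min/avg/max" in line:               # e.g. "rtt min/avg/max/mdev = 1.2/..."
            return float(line.split("=")[1].split("/")[0])
    return None

# Pings run concurrently, so total time is set by the slowest target, not the sum.
with ThreadPoolExecutor(max_workers=20) as pool:
    for host, rtt in zip(targets, pool.map(min_rtt_ms, targets)):
        print(host, rtt)
```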
He is also interested in geolocation over small distances (e.g. indoors), e.g. using cell tower signals. This is a new area of research. It is possible that the port of PingER to Android could be related to this. This is a PhD project.
Android - Bebo
Bebo said Topher would be interested in getting a student to port PingER to Android. Les put a version of the PingER MA server on GitHub at https://github.com/iepm/. He put it there in 2013 as part of coordinating with Google (which did not get very far), and it has been dormant since. He is not a GitHub expert or user, so is unclear how complete it is or who can access it.
Unrest in Malaysia - Anjum, Johari, Les
- Anjum wondered if PingER could detect anomalies related to the recent unrest in Malaysia (protests against the government, and DoS attacks against universities such as UM and UNIMAS); there were several attacks from the Anonymous group between August 28th and 31st. Following the meeting Les put out a web page at: Malaysian unrest Aug-Sep 2015
Intern - Anjum
- We discussed the potential for an undergraduate from UM spending 6 months at SLAC as an intern. SLAC would be very interested; however, SLAC cannot sponsor a J-1 visa for anyone who does not have the equivalent of a US bachelor's degree. Anjum will pass this on and pursue it from the UM end.
UFRJ
UUM
We have been unable to gather data from pinger.uum.edu.my since July 12th, 2015. Adib is still waiting for computer center staff to check the firewall on their side. There is an idea to relocate Pinger.UUM to the computer center to avoid this hassle.
Fatima is currently occupied with putting together her report. Regarding her work, she has successfully implemented some MapReduce code on the data and is presently fixing some errors in the code.
UM
Ibrahim had downloaded the PingER data as zip files; however, when he stored them in the Hadoop Distributed File System (HDFS) and tried to process them, the files got corrupted, so he had to extract them. One zip file can contain more than 10,000 small zip files, so he is trying to create a MapReduce job that can accept zip input directly, which would save him a lot of time; currently his MapReduce jobs can only read from plain files such as .txt, document formats, or a database. He will meet with Dr. Anjum on the 11th of June for advice and to see how they can work on this together. No update 9/2/2015. Anjum will contact Ibrahim.
Renan reports: "I had a similar experience. HDFS works better with bigger files rather than many small files. What I did was to create a MapReduce job to reduce all those thousands of small files into only 17 big files, each of them containing all the data for a given year [1998-2014]. I didn't use Hadoop MapReduce for this, though; I used a different distributed dataflow engine that also implements map and reduce operators. I am working on providing Thiago these 17 big files. Once he gets the data, he can share them with you and explain how the data in each file are stored."
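The general pattern is a group-by-year reduce. Here is a minimal local sketch of the idea (not Renan's actual job; it assumes the year can be parsed from each small file's path, which is a guess about the layout, and the directory and file names are hypothetical):

```python
# Minimal sketch of collapsing many small files into one big file per year.
# Runs locally; assumes the year (1998-2014) appears somewhere in each file's path.
import glob
import re

year_re = re.compile(r"(19|20)\d{2}")
outputs = {}  # year -> open output file handle

for path in glob.glob("pinger_small_files/**/*.txt", recursive=True):
    m = year_re.search(path)
    if not m:
        continue                        # skip files whose year we cannot determine
    year = m.group(0)
    if year not in outputs:
        outputs[year] = open(f"pinger_by_year_{year}.txt", "a")
    with open(path) as f:
        outputs[year].write(f.read())   # "reduce": append the small file to its year's big file

for handle in outputs.values():
    handle.close()
```

The same grouping expressed as map (emit a year key per record) and reduce (concatenate per key) is what a distributed dataflow engine parallelizes across nodes.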
UNIMAS
UTM
Johari will contact Hanan to request someone to support PingER at UTM, now that Saqib has left (6/3/2015). Hanan is not replying to Johari, so Johari will try another route to get a replacement for Saqib at UTM (8/12/2015). 9/2/2015: still unable to contact Hanan; Johari will try to contact the Dean.
Saqib points out that a number of Malaysian routes are IPv6, which could cause problems for traceroute. Saqib is checking.
MYREN
No update 8/12/2015, no update 9/2/2015.
perfsonar-unimas.myren.net.my has been down since July 1st. Johari (9/2/2015) believes it may be due to a recent upgrade of the network SAN.
NUST
Working with Dr Zaidi to get support. Awaiting an update from Dr Zaidi (8/12/2015; sent a reminder 9/1/2015). Following this meeting, Anjum, Hassaan and Les met to discuss the way forward.
PingER at SLAC
Thiago completed setting up the PingER warehouse database at SLAC: an SQL (Impala) data warehouse running on a Nebula/Cloudera cluster using the Hadoop File System (HDFS). Unfortunately it is not currently accessible from outside SLAC; there have been several attempts but no success yet, and we need to engage the subject matter experts. Thiago is now a SLAC associate, so he still has an account at SLAC. There is a cyber security alert on the version of Java installed with Cloudera. Thiago is in New York and does not currently have WiFi.
- The Raspberry Pi at SLAC has been running smoothly since June 11th.
Working on the following hosts to be able to gather data:
Host | Emails sent | Last seen | Status |
---|---|---|---|
web.hepgrid.uerj.edu | emails 12/2/2014, 12/8/2014, 2/26/2015, 4/30/2015, 6/1/2015 | Oct 23, 2014 | traceroute.pl works but no response from ping_data.pl |
pinger.stanford.edu | email 3/14/2015 | Feb 18, 2015 | Works |
pinger.unesp.br | email 11/28/2014, 5/22/2015, 6/1/2015. | Nov 3, 2014 | Host is pingable from SLAC. |
Bebo arranged a meeting with the Colombia RENATA NREN folks and the Minister of IT to discuss the use of PingER in Colombia. There is a web page at: Colombia. Les sent an email asking them to install pinger2.pl at at least one site in Colombia, with a reminder email 2/27/2015. Bebo will send a gentle reminder to the RENATA people of Colombia to see whether they are still interested and need a meeting. They still seem interested. 9/2/2015: still no consistent answers from Colombia, so we will give up.
Next Meeting
Next meeting: Wednesday Oct 7th 2015 9:00pm Pacific Standard Time, Thursday Oct 8th 2015 9:00am Pakistan time, Thursday Oct 8th 2015 noon Malaysian time, Thursday Oct 8th 2015 02:00am Rio Standard Time.
Old Items
NUST/SEECS Pakistani PingER nodes status
Pink background indicates the host was bad last month, strike-through means it is fixed, and yellow is a new bad host.
Current status of Pakistani Hosts 7/1/2015
Is it time to start paring down the list of PingER monitor hosts in Pakistan, starting with those that have been down for a while and, despite your efforts, are not cooperating? One might also look at the coverage by region in Pakistan and try to keep good coverage for all regions.
Traceroute at UTM 5/9/2015
The traceroute problem regarding the maximum reachable hops (i.e. 11 hops) may be because the Unix/Linux/OSX traceroute uses UDP to send the requests. The first request is sent to a particular port (33434), with a TTL telling it how many hops to go. The TTL starts at 1 and is incremented as it tries the next hop; the port is also incremented (up to 33465). It looks like the first few UDP ports are enabled and then they are blocked. The Windows traceroute uses ICMP to send the probes and so does not see the problem.
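A minimal sketch of the probe side (send only; reading the ICMP replies needs a raw socket and root privileges, which is omitted, and the target is hypothetical) shows how the UDP destination port advances with the TTL, and hence why a firewall that only permits the first few ports above 33434 truncates the trace:

```python
# Sketch of how Unix traceroute's UDP probes map TTLs onto destination ports.
# Send side only: collecting the ICMP "time exceeded"/"port unreachable" replies
# needs a raw socket (root privileges) and is omitted here.
import socket

BASE_PORT = 33434            # traceroute's traditional starting port
PROBES_PER_HOP = 3
target = "example.com"       # hypothetical target

port = BASE_PORT
for ttl in range(1, 12):     # probe up to 11 hops, the limit seen at UTM
    for _ in range(PROBES_PER_HOP):
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        s.setsockopt(socket.IPPROTO_IP, socket.IP_TTL, ttl)   # how many hops the probe travels
        s.sendto(b"", (target, port))   # a firewall filtering on this port range decides
        s.close()                       # whether this hop's probes get through
        port += 1                       # classic traceroute bumps the port on every probe
print("UDP ports used:", BASE_PORT, "through", port - 1)
```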
Linked Open Data
Cristiane reports (7/1/2015): "I am trying to automate the triplification of PingER data in Kettle. For now, part of the transformation is done in Kettle and another part by Java code. Although this solution works for a data sample, it is important to have the entire process in Kettle because that makes the triplification process easier to understand, modify and control."
Feb 2015
The plan is still the one described before (see the project proposal): experimenting with those alternatives. Right now they have managed to triplify the data according to a new ontology that takes advantage of a combination of a current standard for multidimensional data (the Data Cube vocabulary) and a revised version of Renan's Moment ontology adaptation. With this we expect a better data organization than the previous solution.
They are now preparing a test plan (like a small benchmark) to be used on all alternatives so that we can compare the results accordingly.
Aug 2014
Renan finished the new pingerlod web site. The main change is that it should now be much easier to modify the info texts; Renan put the texts into a separate file. The new version has been loaded on the server and some text added to describe how to use the map. However, there is a bug that prevents the map from executing; Renan reports that the bugs should be easy to fix. He has talked to his professor, who suggested trying RDF Owlink, which should give faster responses to queries; Renan will research this. It will probably mean reloading the PingER data, so it is a lot of work, but hopefully it will improve performance. Before the rebuild he will make the fixes and provide a new WAR for us to load on pingerlod.slac.stanford.edu. He is also working on documentation (he has finished the ontology and has a nice interactive tool for visualizing it; since the ontology is the core of the data model of our semantic solution, this will be very helpful for anyone who uses our system, whether a developer or an end user) and on his thesis. Bebo pointed out that to get publicity and for people to know about the data, we will need to add pingerlod to lod.org.
Things he will soon do regarding documentation:
- A task/process flow describing all the Java classes involved in those batch jobs;
- A Javadoc <http://www.oracle.com/technetwork/java/javase/documentation/index-jsp-135444.html> which will explain all classes and how they are used.
For the Linked Open Data / RDF, which is in its pre-alpha days, you can go to http://pingerlod.slac.stanford.edu. As can be seen, this page is not ready for prime time. However, the demos work as long as one carefully selects what to look at:
- Click on Visualizations, there are two choices:
- Multiple Network Metrics: click on the image. This gives a form; choose From Node pinger.slac.stanford.edu pinging to www.ihep.ac.cn, time parameters yearly 2006-2012, metrics throughput, average RTT and packet loss, and display format Plot graph, then click on Submit. In a few seconds a time-series graph should come up. Mouse over to see the values at each x value (year).
- A mashup of network metrics x university metrics: click on the image. This gives another form; choose pinging from pinger.slac.stanford.edu, school metric number of students, time metric years 2006-2012, and display format Plot graph, then click on Submit. This is a longer wait; after about 35 seconds a Google map should show up. Click on "Click for help." The area of a dot = number of students, the darkness of a dot = throughput (lighter is better), and the color of the inscribing circle gives the university type (public, private, etc.). Click on a circle for information on the university.
- Renan will be working on providing documentation on the programs, in particular the install guide for the repository and web site etc. This will assist the person who takes this over.
Renan is using OWLIM as the RDF repository; he is currently using an evaluation version. Renan looked into the price for OWLIM (that excellent RDF database management system he told us about): it would cost a minimum of 1200 EUR (~1620 USD at Google's rate for today) for a one-time perpetual license. That seems too expensive; no wonder it is so good. He has heard about a different free alternative, but is not sure how good it would be for our PingER data; he will try it out and evaluate it. He will also do a new evaluation of the free OWLIM Lite.
He has also made some modifications to the ontology of the project (under the supervision of his professor in Rio), hence he will have to modify the code to load the data accordingly.
Maria and Renan are advancing on approaches to deal with PingER data, making it easier to analyze and integrate. In particular they have been busy studying and evaluating alternatives: analyzing results from the latest benchmarks on NoSQL database management (including RDF and graph-based stores), distributed processing, mediated solutions over relational databases, and other experiments with multidimensional analyses on Linked Data. The new students involved now understand the scenario better and have been interacting with Renan regularly.
Cristiane has studied the PingER data and how to cast it into Linked Open Data form. The size of the PingER hourly data for 1998-Sep 2014, archived via FTP in text form, amounts to ~5.12 GB, which corresponds to 15.66*10^9 (billion) triples. Then, using 5 triples for each measurement and Turtle without compression, this gives us 685 GBytes, or an inflation factor of ~200.
When Cristiane made the estimate of PingER triples she wrote two documents that explain the process, but they were in Portuguese. She has now written new versions in English:
- Counting PingER Measurements: https://www.dropbox.com/s/35itp7v6yasy3rb/Counting%20PingER%20Measurements%20_EnglishVersion.docx?dl=0
- PingER LOD Triples: https://www.dropbox.com/s/4oj5jqupwbujja5/PingERLOD%20Triples%20_EnglishVersion.docx?dl=0
Cristiane's report is at: Size Inflation of PingER Data for use in PingER LOD
UM
Moved here 3/4/2015:
Ibrahim has set up distributed Hadoop clusters. He has 2 TB of disk space. Les has provided information on getting a subset of PingER data by anonymous FTP via ftp://ftp.slac.stanford.edu/users/cottrell; it was put there last September. Information on how the data was put together is at https://confluence.slac.stanford.edu/display/IEPM/Archiving+PingER+data+by+tar+for+retrieval+by+anonymous+ftp. There is information on formatting etc. at http://www-iepm.slac.stanford.edu/pinger/tools/retrievedata.html and some on the dataflows at https://confluence.slac.stanford.edu/display/IEPM/PingER+data+flow+at+SLAC. Renan at UFRJ has successfully used this data, and he has also characterized the data in terms of bytes per metric per year etc.
Ibrahim has started downloading all the zip files to the local machines. Six weeks ago he downloaded 2 GB of weather data to test his cluster nodes; he wrote a simple Java program (Map, Reduce) to find the average and it was working fine.
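For reference, the same map/reduce averaging pattern looks roughly like this in a Hadoop Streaming style (a sketch, not Ibrahim's Java program; the "key<TAB>value" input layout is an assumption):

```python
# Sketch of the "find the average" pattern as Hadoop Streaming scripts.
# Not Ibrahim's Java job; assumes input lines of the form "key<TAB>value".
import sys

def mapper():
    for line in sys.stdin:
        parts = line.rstrip("\n").split("\t")
        if len(parts) >= 2:
            print(f"{parts[0]}\t{parts[1]}")       # emit key \t value

def reducer():
    # Hadoop delivers mapper output sorted by key, so per-key sums can be streamed.
    current, total, count = None, 0.0, 0
    for line in sys.stdin:
        key, value = line.rstrip("\n").split("\t")
        if key != current and current is not None:
            print(f"{current}\t{total / count}")   # emit the previous key's average
            total, count = 0.0, 0
        current = key
        total += float(value)
        count += 1
    if current is not None:
        print(f"{current}\t{total / count}")

if __name__ == "__main__":
    (mapper if sys.argv[1:] == ["map"] else reducer)()
```

With Hadoop Streaming the two functions would be supplied as the -mapper and -reducer commands, with Hadoop handling the shuffle and sort between them.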
Anjum reported that UM had experienced a TCP SYN DoS attack prior to Mar 12th (when an IDS was put in place). It occurred mainly for several days beforehand, between the hours of noon-2pm and 7-7 in the evening (Malaysia time). He suggested looking to see if PingER could spot the effect. Ibrahim, Les and Anjum will look at it. Les analyzed the data and sent it to Anjum.
NUST
The following is from Samad 2/24/2015.
- buitms.seecs.edu.pk # We have had to disable gathering data from this host because the contact person still does not want to continue with us; I tried once again to convince him but the answer is the same. Les has disabled it from SLAC.
- nukhimain.seecs.edu.pk # We had been unable to gather data since 20th November 2014; the node is now working fine and collecting data.
- pinger.uettaxila.edu.pk # The node has been working fine for the last two weeks.
- sau.seecs.edu.pk. #This Node is working fine now.
- pingerjms.pern.edu.pk #This node is working now.
- pinger.uet.edu.pk # This was also not working for many days; it is now working fine and collecting data as well.
- pinger.isra.edu.pk # This node is also working fine now.
- pingerlhr-pu.pern.edu.pk # This is also working fine now.
- pinger.kohat.edu.pk # Collecting data now.
The IP of "pingerqta.pern.edu.pk" has changed; Les has updated the database at SLAC with the following:
Old IP: 121.52.157.157
New IP: 121.52.157.148
Follow up from workshop
- Hossein Javedani of UTM is interested in anomalous event detection with PingER data. Information on this is available at https://confluence.slac.stanford.edu/display/IEPM/Event+Detection. We have sent him a couple of papers and information on how to access the PingER data. Hossein and Badrul have been put in contact. Is there an update, Badrul?
The next step in funding is to go for bigger research funding, such as LRGS or eScience. Such proposals must lead to publications in high-quality journals, and they will need an infrastructure such as the one we are building. We can use the upcoming workshop (one specific session) to brainstorm and come up with such a proposal. We need to do some groundwork before that as well. Johari will take the lead in putting together half-page descriptions of the potential research projects.
- Need to identify a few key areas of research related to the PingER Malaysia Initiative that can be shared/publicized through the website. These might include using the infrastructure and data for: anomaly detection; correlation of performance across multiple routes; and geolocation. The future projects Les listed in Confluence at https://confluence.slac.stanford.edu/display/IEPM/Future+Projects can also be a good start, as can Bebo's suggestion.
- Need to synchronize and share research proposals so as not to duplicate research work. How to share? Maybe not through the website; perhaps we can create a members-only section of the website to share sensitive material such as research proposals.
Anjum suggested Saqib, Badrul and Johari put together a paper on user experiences of the Internet in Malaysia as seen from Malaysian universities: in particular round trip time, losses, jitter, reliability, routing/peering (in particular anomalies), and the impact on VoIP, throughput, etc. It would be good to engage someone from MYREN.
Ibrahim
Ibrahim Abaker is planning to work on a topic initially entitled "Leveraging PingER big data with a modified pingtable for event-correlation and clustering". Ibrahim has a proposal; see https://confluence.slac.stanford.edu/download/attachments/17162/leveraging+pingER+big+data+with+a+modified+pingtable+for+event-correlation+and+clustering.docx. Ibrahim reports (7/15/2014): "I have spent the last few months trying to understand the concepts of big data storage and retrieval, as well as the traditional approach of storing RDF data. I have integrated a single Hadoop cluster in our cloud, but for this project we need multiple clusters, which I have already discussed with Dr. Badrul; he will provide me with big storage for the experiment." No update 8/20/2014.
"I have come up with initial proposed solution model. This model consists of several parts. The upper parts of the Figure below shows the data source, in which PingER data will be convert into RDF format. Then the data pre-processor will take care of converting RDF/XML into N-triples serialization formats using N-triples convertor module. This N-triple file of an RDF graph will be as an input and stores the triples in storage as a key value pair using MapReduce jobs"
Potential projects