1. Introduction

PingER has a huge amount of data and, until the conclusion of this project, the easiest way to retrieve it is through Pingtable [ref]. Pingtable provides a friendly web interface for retrieving PingER raw data (millions of text files) and loading it into a human-readable HTML page. However, this is not a web standard, and crossing PingER data to generate very specific information may be impossible or extremely difficult with the existing retrieval mechanism. This project attempts to provide a standard semantic web format for the data retrievable through Pingtable.

...

Finally, the existing APIs for handling RDF provide well-known publishing formats such as JSON [ref], CSV [ref], and XML [ref]. We can conveniently take the results of a query and feed them into visualization libraries to produce very interesting visualizations of the data.
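As an illustration, the sketch below uses Apache Jena (one common Java RDF API; the project's actual stack is not specified here) to run a SPARQL query against a hypothetical endpoint URL and serialize the result set as standard SPARQL JSON. Jena's ResultSetFormatter also offers outputAsCSV and outputAsXML for the other two formats.

import java.io.ByteArrayOutputStream;

import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.ResultSet;
import org.apache.jena.query.ResultSetFormatter;

public class QueryToJson {
    public static void main(String[] args) throws Exception {
        // Hypothetical endpoint URL; the project's real Sparql Endpoint may differ.
        String endpoint = "http://example.org/pinger/sparql";
        String sparql = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10";

        try (QueryExecution qe = QueryExecutionFactory.sparqlService(endpoint, sparql)) {
            ResultSet results = qe.execSelect();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            // Standard SPARQL JSON results; outputAsCSV/outputAsXML cover the other formats.
            ResultSetFormatter.outputAsJSON(out, results);
            System.out.println(out.toString("UTF-8"));
        }
    }
}

The JSON produced this way can be consumed directly by JavaScript visualization libraries such as those discussed in section 7.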

2. Ontology

a. Goal

To define the vocabulary used by PingER, including its terms, concepts, taxonomy, and the relations among them.

...

After building very complex _mashups_ [ref] using the ontology, we need to verify whether the ontology needs any adjustments.

3. RDF Repository

a. Goal

To establish a good environment for the RDF repository.

...

Run very complex queries to test performance.
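As a hedged example of such a query, the sketch below times a cross-node aggregation. The endpoint URL and every name under the pinger: prefix are placeholders, since the real vocabulary is defined by the ontology (section 2).

import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.ResultSetFormatter;

public class QueryBenchmark {
    public static void main(String[] args) {
        // Placeholder endpoint and vocabulary; only the query shape matters here.
        String endpoint = "http://example.org/pinger/sparql";
        String sparql =
            "PREFIX pinger: <http://example.org/pinger#> " +
            "SELECT ?from ?to (AVG(?loss) AS ?avgLoss) WHERE { " +
            "  ?m pinger:from ?from ; pinger:to ?to ; pinger:packetLoss ?loss . " +
            "} GROUP BY ?from ?to ORDER BY DESC(?avgLoss) LIMIT 20";

        long start = System.nanoTime();
        try (QueryExecution qe = QueryExecutionFactory.sparqlService(endpoint, sparql)) {
            int rows = ResultSetFormatter.consume(qe.execSelect()); // drain all rows
            System.out.printf("%d rows in %.1f ms%n", rows, (System.nanoTime() - start) / 1e6);
        }
    }
}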

4. Accessing the RDF Repository

a. Goal

Establish an easy way to access the RDF data.

...

Use CSS and JavaScript to improve the interface's appearance.

5. Loading the RDF Repository

a. Goal

Generate RDF data using external datasets and PingER data.

...

iii. Instantiate Countries

The program uses an HTTP GET to access the Geonames API [ref] to retrieve a JSON with data on all countries. Then, for each entry of the JSON, the program instantiates a country in RDF and loads it into the repository.

  • Frequency: Only once.
  • Time to load: Less than 3 minutes.
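A minimal sketch of this step, assuming Apache Jena and Gson; the namespace and property names are placeholders, and Geonames' countryInfoJSON service requires a registered username in place of demo.

import java.io.InputStreamReader;
import java.net.URL;

import com.google.gson.JsonElement;
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Resource;
import org.apache.jena.vocabulary.RDF;

public class InstantiateCountries {
    static final String NS = "http://example.org/pinger#"; // placeholder namespace

    public static void main(String[] args) throws Exception {
        // countryInfoJSON is a public Geonames service; "demo" is a placeholder
        // for a registered username.
        URL url = new URL("http://api.geonames.org/countryInfoJSON?username=demo");
        JsonObject body = JsonParser
                .parseReader(new InputStreamReader(url.openStream(), "UTF-8"))
                .getAsJsonObject();

        Model model = ModelFactory.createDefaultModel();
        for (JsonElement e : body.getAsJsonArray("geonames")) {
            JsonObject c = e.getAsJsonObject();
            Resource country = model.createResource(NS + "country/" + c.get("countryCode").getAsString());
            country.addProperty(RDF.type, model.createResource(NS + "Country"));
            country.addProperty(model.createProperty(NS, "name"), c.get("countryName").getAsString());
        }
        // The real program would push these triples into the repository
        // (e.g., via SPARQL Update); here we just print them.
        model.write(System.out, "TURTLE");
    }
}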

iv. Generate Node Details JSON

An HTTP GET is performed on PingER data to retrieve the nodes and their information (IP, Nickname, Site Name, Latitude, Longitude, etc.). A JSON is then generated (and written to a file) to be used by the program.

  • Frequency: Same as the generation of the %NODE_DETAILS file (about every 4 hours).
  • Time to generate JSON: Less than 4 seconds.
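A minimal sketch of this step. The URL and the tab-separated field order assumed here are placeholders; the real node details source is described in [3].

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Paths;

import com.google.gson.Gson;
import com.google.gson.JsonArray;
import com.google.gson.JsonObject;

public class GenerateNodeDetailsJson {
    public static void main(String[] args) throws Exception {
        // Placeholder URL; see [3] for the real node details source.
        URL url = new URL("http://example.org/pinger/nodedetails.txt");
        JsonArray nodes = new JsonArray();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream(), "UTF-8"))) {
            String line;
            while ((line = in.readLine()) != null) {
                // Assumed layout: tab-separated fields in a fixed order.
                String[] f = line.split("\t");
                if (f.length < 5) continue; // skip malformed lines
                JsonObject node = new JsonObject();
                node.addProperty("ip", f[0]);
                node.addProperty("nickname", f[1]);
                node.addProperty("siteName", f[2]);
                node.addProperty("latitude", f[3]);
                node.addProperty("longitude", f[4]);
                nodes.add(node);
            }
        }
        // Write the generated JSON to a file for the later steps to consume.
        Files.write(Paths.get("node-details.json"), new Gson().toJson(nodes).getBytes("UTF-8"));
    }
}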
v. Instantiate Towns and States

For each entry in the Node Details JSON, the program runs HTTP GETs on the Geonames API to find the nearest town (with at least 1,000 inhabitants) and city (with at least 15,000 inhabitants) based on the latitude and longitude of the site. The state where the town is located (if applicable) is also instantiated and linked to the town. The program finally inserts the instantiated data into the repository.

Note: The program also tries to link the found town to other well-known RDF datasets (DBPedia [ref] and Freebase [ref]).

  • Frequency: Probably the same as (iv).
  • Time to load: ~30 minutes.
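A sketch of the nearest-town lookup, using Geonames' findNearbyPlaceNameJSON service with the cities=cities1000 filter. The demo username is a placeholder, and the RDF instantiation and linking steps are omitted.

import java.io.InputStreamReader;
import java.net.URL;

import com.google.gson.JsonObject;
import com.google.gson.JsonParser;

public class NearestTown {
    // Nearest town with at least 1,000 inhabitants for a node's coordinates.
    // "demo" is a placeholder for a registered Geonames username.
    static JsonObject nearestTown(double lat, double lng) throws Exception {
        URL url = new URL("http://api.geonames.org/findNearbyPlaceNameJSON?lat=" + lat
                + "&lng=" + lng + "&cities=cities1000&maxRows=1&username=demo");
        JsonObject body = JsonParser
                .parseReader(new InputStreamReader(url.openStream(), "UTF-8"))
                .getAsJsonObject();
        // geonames[0].name is the town; adminName1 is the state/region, if any.
        return body.getAsJsonArray("geonames").get(0).getAsJsonObject();
    }

    public static void main(String[] args) throws Exception {
        JsonObject town = nearestTown(37.42, -122.20); // roughly SLAC's coordinates
        System.out.println(town.get("name").getAsString() + ", "
                + town.get("adminName1").getAsString());
    }
}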
vi. Instantiate Schools

For each entry in the Node Details JSON, the program runs HTTP GETs on the DBPedia Sparql Endpoint to find a school whose name is similar to the site's Full Name in the Node Details JSON.

If found, an instance of a school is inserted into the repository. Information about the school includes endowment, number of students (undergrad and postgrad), faculty size, etc.

Note: The program also tries to link the found school to Freebase.

Note 2: DBPedia (just like Wikipedia) is not complete. Thus, for example, information that is available for a very famous university may not be available for a less famous one.

  • Frequency: Probably the same as (iv).
  • Time to load: ~30 minutes.
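A hedged sketch of the name lookup against the public DBPedia Sparql Endpoint. The project's actual matching strategy is not specified, so a simple case-insensitive substring filter stands in for it here; the siteName value is an example.

import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.ResultSetFormatter;

public class FindSchool {
    public static void main(String[] args) {
        String siteName = "Stanford University"; // example Full Name from the Node Details JSON
        String sparql =
            "PREFIX dbo: <http://dbpedia.org/ontology/> " +
            "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> " +
            "SELECT ?school ?students WHERE { " +
            "  ?school a dbo:University ; rdfs:label ?label . " +
            "  OPTIONAL { ?school dbo:numberOfStudents ?students } " +
            "  FILTER(LANG(?label) = 'en' && CONTAINS(LCASE(STR(?label)), LCASE(\"" + siteName + "\"))) " +
            "} LIMIT 5";
        try (QueryExecution qe =
                 QueryExecutionFactory.sparqlService("http://dbpedia.org/sparql", sparql)) {
            ResultSetFormatter.out(qe.execSelect()); // print matches as a text table
        }
    }
}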
vii. Instantiate Nodes

After instantiating Towns, States, Schools, Continents, and Countries, we can finally instantiate the PingER nodes.

Again, for each entry of the Node Details JSON, the program instantiates a Node, linking it with its respective Town, School (if it is one), Country, State (if applicable), and Continent.

  • Frequency: Probably the same as (iv).
  • Time to load: Less than 3 minutes.
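A minimal sketch of the node instantiation with Apache Jena. Every namespace, property, and resource name below is a placeholder for the terms actually defined by the ontology (section 2).

import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Resource;
import org.apache.jena.vocabulary.RDF;

public class InstantiateNode {
    static final String NS = "http://example.org/pinger#"; // placeholder namespace

    public static void main(String[] args) {
        Model m = ModelFactory.createDefaultModel();
        // All property and resource names below are illustrative placeholders.
        Resource node = m.createResource(NS + "node/EDU.SLAC.STANFORD.N3");
        node.addProperty(RDF.type, m.createResource(NS + "Node"));
        node.addProperty(m.createProperty(NS, "locatedInTown"), m.createResource(NS + "town/Menlo_Park"));
        node.addProperty(m.createProperty(NS, "locatedInState"), m.createResource(NS + "state/California"));
        node.addProperty(m.createProperty(NS, "locatedInCountry"), m.createResource(NS + "country/US"));
        node.addProperty(m.createProperty(NS, "locatedInContinent"), m.createResource(NS + "continent/NA"));
        node.addProperty(m.createProperty(NS, "hostedBy"), m.createResource(NS + "school/Stanford_University"));
        m.write(System.out, "TURTLE"); // the real program inserts these triples into the repository
    }
}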

c. To do

Make better (more precise and informative) annotations of how long each step above takes to run. Maybe the space utilized should also be included.

Complex tests should be performed.

Study other forms of retrieving information about schools.

Study ways to optimize this process, perhaps by parallelizing it.

6. Loading the RDF Repository (PingER Measurements)

a. Goal

Although this step is along the same lines as the previous one (5), it can be separated into a totally different context – the measurements context – purely for better organization and understanding. The goal is to generate RDF data from the PingER dataset and load it into the repository.

b. Progress

After loading the repository with the data specified in the previous step (5), the program needs to load PingER measurement data.

The first step in this process is to generate the Monitoring-Monitored [ref] JSON. The program executes an HTTP GET on http://www-wanmon.slac.stanford.edu/cgi-wrap/dbprac.pl?monalias=all to retrieve all monitoring nodes. Then, for each monitoring node, another HTTP GET is executed on http://www-wanmon.slac.stanford.edu/cgi-wrap/dbprac.pl?monalias=MONITORING_NODE&find=1, where the value of monalias is a given monitoring node (e.g., EDU.SLAC.STANFORD.N3), to retrieve the nodes monitored by that monitoring node. The Monitoring-Monitored JSON is then generated and written to a file.

  • Frequency of generating this JSON: Probably the same as (5.iv).
  • Time to generate: Less than 2 minutes.
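A minimal sketch of the two-phase retrieval. The response format of dbprac.pl is assumed here to be one node name per line, which may not match the real format.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLEncoder;

public class MonitoringMonitored {
    static final String BASE = "http://www-wanmon.slac.stanford.edu/cgi-wrap/dbprac.pl";

    // Fetch the body of a dbprac.pl request as plain text.
    static String get(String query) throws Exception {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new URL(BASE + "?" + query).openStream(), "UTF-8"))) {
            String line;
            while ((line = in.readLine()) != null) sb.append(line).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        // First request: all monitoring nodes (assumed one name per line).
        for (String monitor : get("monalias=all").split("\n")) {
            if (monitor.trim().isEmpty()) continue;
            // Second request: the nodes monitored by this monitoring node.
            String monitored = get("monalias=" + URLEncoder.encode(monitor.trim(), "UTF-8") + "&find=1");
            // ... accumulate (monitor -> monitored nodes) into the JSON here ...
        }
    }
}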

Having the JSON, the instantiating process proceeds as follows: for each monitoring node (entry of the JSON), for each metric, for each packet size, and for each time parameter, the program executes an HTTP GET on the Pingtable [ref] Tab-Separated Values (TSV) file specified by crossing all these parameters (a sketch of this URL construction is given after the parameter list below). An example TSV URL is of the form:

http://www-wanmon.slac.stanford.edu/cgi-wrap/pingtable.pl?format=tsv&file=average_rtt&by=by-node&size=100&tick=allyearly&from=EDU.SLAC.STANFORD.N3&to=WORLD&ex=none&only=all&dataset=hep&percentage=any

Where the parameters are:

  • from – the monitoring node that pings the monitored nodes.
  • tick – the time aggregation. PingER has data from 1998 to 2013. At this moment, the project is considering only the following tick parameters:
    • allyearly
    • allmonthly
    • last365days
  • size – the packet size. At this moment, the project is considering only packets of size 100 bytes.
  • file – the network measurement metric. At this moment, the project is considering only the following metrics:
    • Mean Opinion Scores
    • Directivity
    • Average Round Trip Time
    • Conditional Loss Probability
    • Duplicate Packets
    • Inter Packet Delay Variation
    • Minimum Round Trip Delay
    • Packet Loss
    • TCP Throughput
    • Unreachability
    • Zero Packet Loss Frequency

Note: This process is totally independent of the previous step (5), so it can be parallelized independently. However, if this step is executed before the previous one, the measurement information regarding the nodes will point to broken links; this is not a big problem and does not prevent loading measurement data. The broken links are automatically repaired when the nodes are successfully instantiated (section 5.vii).
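A sketch of the URL construction loop. The tick and size values come from the lists above, and average_rtt comes from the example URL; the file keys for the remaining metrics are assumptions.

public class PingtableUrls {
    // tick and size values come from the lists above; file keys other than
    // average_rtt are assumptions.
    static final String[] TICKS = {"allyearly", "allmonthly", "last365days"};
    static final String[] METRICS = {"average_rtt", "packet_loss" /* , ... one key per metric */};
    static final int SIZE = 100; // only 100-byte packets for now

    static String tsvUrl(String monitor, String metric, String tick) {
        return "http://www-wanmon.slac.stanford.edu/cgi-wrap/pingtable.pl"
                + "?format=tsv&file=" + metric + "&by=by-node&size=" + SIZE
                + "&tick=" + tick + "&from=" + monitor
                + "&to=WORLD&ex=none&only=all&dataset=hep&percentage=any";
    }

    public static void main(String[] args) {
        // Cross one monitoring node with every metric and tick.
        for (String metric : METRICS)
            for (String tick : TICKS)
                System.out.println(tsvUrl("EDU.SLAC.STANFORD.N3", metric, tick));
    }
}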

c. Performance Evaluation

Last 365 days: For each monitoring node and each metric, it is taking around 1 hour to load the data into the repository. Hence, for 80 monitoring nodes and the 11 metrics, it will take approximately 880 hours (roughly 37 days), which is an impracticable amount of time.

d. To do

It is taking a huge amount of time to load the entire dataset; we must optimize this process (one possible direction is sketched below).

Measure the time and space taken to load the repository.

Perform complex tests.
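One possible direction for the optimization mentioned above is to parallelize the per-monitoring-node loads, as sketched below. Whether this helps depends on where the bottleneck is (the remote server, the network, or the repository's insert throughput), and the helper loadMeasurementsFor is hypothetical.

import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ParallelLoader {
    public static void main(String[] args) throws InterruptedException {
        List<String> monitors = Arrays.asList("EDU.SLAC.STANFORD.N3" /* , ... all ~80 */);

        // One worker per core; each task fetches and loads one monitoring node's TSVs.
        ExecutorService pool = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());
        for (final String monitor : monitors) {
            pool.submit(new Runnable() {
                public void run() { loadMeasurementsFor(monitor); }
            });
        }
        pool.shutdown();
        pool.awaitTermination(7, TimeUnit.DAYS);
    }

    // Hypothetical helper standing in for the fetch-and-insert logic of section 6.b.
    static void loadMeasurementsFor(String monitor) { /* ... */ }
}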

7. Rich visualization of the data

a. Goal

Provide smart and useful visualization of PingER data in RDF format.

b. Progress

We studied the possibility of using three APIs:

i. Google Maps JavaScript API v3 [ref]
ii. Google Geo Charts [ref]
iii. Google Public Data Explorer [ref]

All of them seem to be very useful and can provide rich visualizations. (i) and (iii) seem to be the most powerful of them.

c. To do

We need to think about good and useful mashups to build within this project and to show through these visualization APIs. One type of mashup being investigated is retrieving data from DBPedia to cross PingER data with information related to universities (such as endowment, number of students, and whether the university is public or private).

8. Documentation

a. Goal

Document the entire project.

b. Progress

This Project Progress document is being written. It is kept in both MS Word format and HTML (to be used in Confluence).

The Confluence To do-Doing-Done page is updated more frequently.

c. To do

The Java project should be documented; Javadocs should be generated for each class and method.

An interactive JavaScript document should be generated to graphically represent the ontology, in order to help users work with the RDF data.

An installation guide should be written, covering how to configure the environment and everything needed to compile and run the project. Both the RDF Repository (with Tomcat settings) and the Sparql Endpoint projects should have installation guides.

9. References

[1] Project MOMENT Ontologies. Retrieved from https://svn.fp7-moment.eu/svn/moment/public/Ontology/ on June 5, 2013.

...

[3] PingER Node Details. Retrieved from https://confluence.slac.stanford.edu/display/IEPM/PingER+NODEDETAILS on June 5, 2013.

[4] Food and Agriculture Organization of the United Nations Ontology. Retrieved from http://www.fao.org/countryprofiles/geoinfo/geopolitical/ on June 5, 2013.

...