...

  • Freebase [ref], an open large graph database.
  • DBPedia [6], whose ontology and resources will be used to provide more information about any geographic location, or about anything else in Wikipedia that can be connected to PingER. This should make very specific queries possible.

...

c. To do

...

After building very complex _mashups_ [ref] using the ontology, we need to verify whether the ontology needs any adjustments.

3. RDF Repository

...

a. Goal

...

To establish a good environment for the RDF repository.

...

b. Progress

...

We analyzed the existing technologies that could make this possible. There are well-known triple stores such as Jena, Sesame, and Virtuoso [7]. According to [7], Virtuoso struggles to load large datasets (>1M triples). Hence, we first decided to try Jena SDB (with MySQL) and then Sesame Native.

...

Therefore, we decided to migrate the project to use OpenRDF Sesame Native as the RDF repository.

...

c. To do

...

Run very complex queries to test the performance.
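As a sketch of the kind of "complex query" that could exercise the store, a SPARQL 1.1 aggregation forces a scan over a large portion of the data plus a GROUP BY. The `pinger` namespace and property names below are hypothetical placeholders, not the project's actual vocabulary:

```sparql
PREFIX pinger: <http://example.org/pinger#>   # placeholder namespace

# Average a metric per source node -- forces a full scan plus GROUP BY.
SELECT ?from (AVG(?value) AS ?avgValue)
WHERE {
  ?m pinger:from  ?from ;     # hypothetical properties
     pinger:value ?value .
}
GROUP BY ?from
ORDER BY DESC(?avgValue)
```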

4. Accessing the RDF Repository

...

a. Goal

...

Establish an easy way to access the RDF data.

...

b. Progress

...

We are using the Tomcat web server (Java) to host a SPARQL endpoint [ref]. The HTML page has a text area in which the user can write SPARQL 1.1 queries [ref] against the RDF repository. By default, the results are shown as HTML tables on a JSP page. However, there will be a combo box for choosing the format in which the results are shown; the available formats will be triples in CSV, JSON, and RDF/XML.
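For illustration, a user might type a SPARQL 1.1 query like the following into the text area. All `pinger` terms here are hypothetical placeholders, since the actual vocabulary is defined by the project's ontology:

```sparql
PREFIX pinger: <http://example.org/pinger#>   # placeholder namespace

# List ten monitoring nodes and, where known, their country.
SELECT ?node ?country
WHERE {
  ?node a pinger:MonitoringNode .                # hypothetical class
  OPTIONAL { ?node pinger:country ?country . }   # hypothetical property
}
LIMIT 10
```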

...

c. To do

...

Use CSS and JavaScript to make it prettier.

5. Loading the RDF repository

...

a. Goal

...

Generate RDF data using external datasets and PingER data.

...

b. Progress

...

The process of generating RDF data and populating the RDF repository is divided into subsections:

i. Set up the prefixes

Following the RDF standard, every resource is uniquely identified by a URI [ref]. To write less and to better organize the statements, it is common to use namespaces (prefixes) instead of writing absolute URIs. For example, it is common to use the namespace *rdfs* for the W3C RDF Schema (http://www.w3.org/TR/rdf-schema/).
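As a minimal sketch, prefix declarations at the top of a SPARQL query could look like the following. The W3C namespaces are standard; the `pinger` namespace URI is a placeholder, not the project's actual namespace:

```sparql
# Standard W3C namespaces plus a hypothetical project namespace.
PREFIX rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs:   <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd:    <http://www.w3.org/2001/XMLSchema#>
PREFIX pinger: <http://example.org/pinger#>    # placeholder URI

SELECT ?node ?label
WHERE {
  ?node rdf:type pinger:MonitoringNode .   # hypothetical class
  ?node rdfs:label ?label .
}
```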

...

  • Frequency: Probably the same as (iv).
  • Time to load: Less than 3 minutes.

...

c. To do

...

Make better (more precise and informative) annotations of how long each step above takes to run. The space used should perhaps also be recorded.

...

Study ways to optimize this process, perhaps by parallelizing it.

6. Loading the RDF Repository (PingER Measurements)

...

a. Goal

...

Although this step is in the same vein as the previous one (5), it is kept in a separate context – the measurements context – purely for better organization and understanding. The goal is to generate RDF data from the PingER dataset and load it into the repository.

...

b. Progress

...

After loading the repository with the data specified in the previous step (5), the program needs to load the PingER measurement data.

...

  • from – is the monitoring node that pings other monitored nodes.
  • tick – represents the time aggregation. PingER has data from 1998 to 2013. At this moment, the project is considering only the following tick parameters:
    • allyearly
    • allmonthly
    • last365days
  • size – is the packet size. At this moment, the project is considering only packets of size 100 bytes.
  • file – is the network measurement metric. At this moment, the project is considering only the following metrics:
    • Mean Opinion Scores
    • Directivity
    • Average Round Trip Time
    • Conditional Loss Probability
    • Duplicate Packets
    • Inter Packet Delay Variation
    • Minimum Round Trip Delay
    • Packet Loss
    • TCP Throughput
    • Unreachability
    • Zero Packet Loss Frequency 

Note: This process is totally independent of the previous step (5), so it can be parallelized independently. However, if this step is executed before the previous one, the measurement information regarding the nodes will point to broken links; this is not a big problem and does not prevent loading the measurement data. The broken links are automatically repaired when the nodes are successfully instantiated (section 5.vii).
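Under the parameters above (from, tick, size, file), a measurement might be queried with a pattern like the following. The property names are illustrative guesses, not the project's actual vocabulary:

```sparql
PREFIX pinger: <http://example.org/pinger#>   # placeholder namespace

# Yearly (tick = allyearly) packet-loss values for 100-byte packets.
SELECT ?from ?to ?value
WHERE {
  ?m pinger:from       ?from ;      # monitoring node
     pinger:to         ?to ;        # monitored node
     pinger:tick       "allyearly" ;
     pinger:packetSize 100 ;
     pinger:metric     "Packet Loss" ;
     pinger:value      ?value .
}
```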

...

c. Performance Evaluation

...

Last 365 days: for each monitoring node and each metric, it is taking around 1 hour to load the data into the repository. Hence, for the 80 monitoring nodes and the 11 metrics, it is going to take approximately 880 hours (about 36 days) – an impracticable amount of time.

...

d. To do

...

It is taking a huge amount of time to load the entire data. We must optimize this process.

Measure time and space taken to load the repository.

Complex tests.

7. Rich visualization of the data

...

a. Goal

...

Provide smart and useful visualization of PingER data in RDF format.

...

b. Progress

...

We studied the possibility of using three APIs:

...

All of them seem to be very useful and can provide rich visualizations. (i) and (iii) seem to be the most powerful of them.

...

c. To do

...

We need to think about good and useful mashups for this project and show them using these visualization APIs. One type of mashup being investigated retrieves data from DBPedia to cross PingER data with information related to universities (such as endowment, number of students, and whether the university is public or private).
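One way such a mashup could be expressed is a SPARQL 1.1 federated query against DBPedia's public endpoint. The `pinger` terms and the `owl:sameAs` links are assumptions about how nodes might be connected to DBPedia resources:

```sparql
PREFIX owl:    <http://www.w3.org/2002/07/owl#>
PREFIX dbo:    <http://dbpedia.org/ontology/>
PREFIX pinger: <http://example.org/pinger#>   # placeholder namespace

# Join local node data with university facts fetched from DBPedia.
SELECT ?node ?university ?students
WHERE {
  ?node a pinger:MonitoringNode ;     # hypothetical class
        owl:sameAs ?university .      # assumed linking property
  SERVICE <http://dbpedia.org/sparql> {
    ?university dbo:numberOfStudents ?students .
  }
}
```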

8. Documentation

...

a. Goal

...

Document the entire project.

...

b. Progress

...

This project progress report is being written. It is kept in both MS Word format and HTML (to be used in Confluence).

The Confluence page To do-Doing-Done is kept updated more frequently.

...

c. To do

...

The Java project should be documented; Javadoc should be generated for each class and method.

...

An installation guide should be written, covering how to configure the environment and everything needed to compile and run the project. Both the RDF Repository project (including Tomcat settings) and the SPARQL Endpoint project should have an installation guide.

9. References

[1] Project MOMENT Ontologies. Retrieved from https://svn.fp7-moment.eu/svn/moment/public/Ontology/ on June 5, 2013.

...