Maria Luiza has asked Raphaela and Christiane to give an update on what they have been working on in recent weeks, and possibly to talk with Thiago about how they have been using their local four-node cloud with Cloudera and a data cube version of the ontology.

Christiane has studied the PingER data and how to cast it into Linked Open Data form. The PingER hourly data for 1998 to September 2014, archived via FTP in text form, amounts to ~5.12 GB, and this corresponds to 15.66*10^9 (15.66 billion) triples. Using 5 triples for each measurement and Turtle without compression, this gives 685 GB, or an inflation factor of ~200.
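As a rough cross-check of the figures quoted above, the following Python sketch derives the implied number of measurements, the average Turtle bytes per triple, and the ratio of the two quoted sizes. It assumes 1 GB = 10^9 bytes and uses only the rounded numbers from these minutes; the function name is hypothetical. The ratio computed from the two rounded sizes comes out lower than the quoted ~200, so the report itself should be taken as authoritative for how the inflation factor was derived.

```python
# Figures quoted in the minutes (rounded values).
TEXT_SIZE_GB = 5.12          # hourly PingER text data, 1998 to Sep 2014
TOTAL_TRIPLES = 15.66e9      # estimated number of triples
TRIPLES_PER_MEASUREMENT = 5  # triples used for each measurement
TURTLE_SIZE_GB = 685.0       # uncompressed Turtle


def summarize(text_gb, triples, per_measurement, turtle_gb):
    """Derive simple per-unit figures from the quoted totals."""
    measurements = triples / per_measurement
    # Assumption: 1 GB = 10^9 bytes.
    bytes_per_triple = turtle_gb * 1e9 / triples
    size_ratio = turtle_gb / text_gb
    return measurements, bytes_per_triple, size_ratio


measurements, bytes_per_triple, size_ratio = summarize(
    TEXT_SIZE_GB, TOTAL_TRIPLES, TRIPLES_PER_MEASUREMENT, TURTLE_SIZE_GB
)
print(f"implied measurements:    {measurements:.3e}")
print(f"Turtle bytes per triple: {bytes_per_triple:.1f}")
print(f"Turtle size / text size: {size_ratio:.1f}")
```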

When Christiane estimated the number of PingER triples, she wrote two documents explaining the process, but they were in Portuguese. She has now written English versions.

Christiane's report is at: Size Inflation of PingER Data for use in PingER LOD

UUM

Adib reports 6/1/2015

The PingER UUM problem has been half solved.

reports (7/1/2015): "I am trying to automatize the triplification of PingER data on Kettle. For now, part of the transformation is made on Kettle and another is made by a Java code. Although this solution works for a data sample, is important to have the entire process on Kettle because it facilitates to understand, modify and control the triplification process."
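The report above concerns a Kettle transformation plus Java code, neither of which is reproduced in these minutes. Purely as an illustration of what triplifying one measurement into five Turtle triples could look like, here is a minimal Python sketch. The `ex:` namespace, the predicate names, the `Measurement` fields, and the sample values are all hypothetical; they are not the PingER LOD ontology and not the actual Kettle or Java logic.

```python
from dataclasses import dataclass

# Hypothetical prefixes for illustration only (NOT the PingER LOD ontology).
PREFIXES = (
    "@prefix ex: <http://example.org/pinger#> .\n"
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n"
)


@dataclass
class Measurement:
    source: str      # monitoring host
    target: str      # monitored host
    metric: str      # metric name
    timestamp: str   # ISO 8601 date-time
    value: float     # measured value


def escape(text: str) -> str:
    """Escape backslashes and double quotes for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def to_turtle(m: Measurement, index: int) -> str:
    """Render one measurement as five triples sharing one subject."""
    subject = f"ex:m{index}"
    pairs = [
        ("ex:sourceNode", f'"{escape(m.source)}"'),
        ("ex:targetNode", f'"{escape(m.target)}"'),
        ("ex:metric", f'"{escape(m.metric)}"'),
        ("ex:time", f'"{escape(m.timestamp)}"^^xsd:dateTime'),
        ("ex:value", f'"{m.value}"^^xsd:double'),
    ]
    body = " ;\n    ".join(f"{pred} {obj}" for pred, obj in pairs)
    return f"{subject} {body} ."


# Made-up sample values, not real PingER data.
sample = Measurement(
    source="node-a.example.org",
    target="node-b.example.org",
    metric="average_rtt",
    timestamp="2014-09-01T00:00:00Z",
    value=152.3,
)
print(PREFIXES)
print(to_turtle(sample, 0))
```

Grouping the five predicates under one subject with `;` keeps the Turtle compact; writing each triple out in full would only increase the file size.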

Fatima has installed Hadoop on all three machines. One will serve as the host machine from which the remote machines will be controlled. She is currently trying out some MapReduce examples and a Hive installation.

...

Maria and Renan are making progress on approaches for handling PingER data that make it easier to analyze and integrate. In particular, they have been studying and evaluating alternatives: analyzing results from the latest benchmarks on NoSQL database management (including RDF and graph-based stores), distributed processing, and mediated solutions over relational databases, as well as other experiments with multidimensional analyses on Linked Data. The new students involved now understand the scenario better and have been interacting with Renan regularly.

UM

Moved here 3/4/2015:

Ibrahim has set up distributed Hadoop clusters. He has 2 TB of disk space. Les has provided information on getting a subset of PingER data by anonymous FTP via ftp://ftp.slac.stanford.edu/users/cottrell.  It was put there last September. Information on how the data was put together is at https://confluence.slac.stanford.edu/display/IEPM/Archiving+PingER+data+by+tar+for+retrieval+by+anonymous+ftp. There is information on formatting etc. at http://www-iepm.slac.stanford.edu/pinger/tools/retrievedata.html and some on the data flows at https://confluence.slac.stanford.edu/display/IEPM/PingER+data+flow+at+SLAC. Renan at UFRJ has successfully used this data; he has also characterized it in terms of bytes/metric per year etc.
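A minimal Python sketch of retrieving files by anonymous FTP from the location given above, using only the standard library `ftplib`. The helper names are hypothetical, the file names in the directory are not listed in these minutes, and whether the server still serves this path is an assumption.

```python
from ftplib import FTP
from urllib.parse import urlparse

# URL quoted in the minutes.
ARCHIVE_URL = "ftp://ftp.slac.stanford.edu/users/cottrell"


def split_ftp_url(url):
    """Split an ftp:// URL into (host, directory)."""
    parts = urlparse(url)
    if parts.scheme != "ftp" or not parts.hostname:
        raise ValueError(f"not an FTP URL: {url!r}")
    return parts.hostname, parts.path or "/"


def connect_anonymous(host, timeout=30):
    """Open a connection and log in anonymously."""
    ftp = FTP(host, timeout=timeout)
    ftp.login()  # login() with no arguments uses the 'anonymous' user
    return ftp


def list_archive(ftp, directory):
    """Return the sorted names in `directory`.

    `ftp` is any object offering ftplib.FTP's cwd() and nlst() methods.
    """
    ftp.cwd(directory)
    return sorted(ftp.nlst())


def fetch(ftp, name, destination):
    """Download one file from the current directory in binary mode."""
    with open(destination, "wb") as out:
        ftp.retrbinary(f"RETR {name}", out.write)


# Example use (needs network access, so it is left commented out):
# host, directory = split_ftp_url(ARCHIVE_URL)
# ftp = connect_anonymous(host)
# try:
#     for name in list_archive(ftp, directory):
#         print(name)
# finally:
#     ftp.quit()
```

The archive layout and file formats are described on the pages linked above; the listing should be checked against them before downloading, given the disk space available.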

...