Background

Over the past 15 years, the PingER project has generated a tremendous amount of data, stored in flat CSV files in the form of Linked Open Data, which has been used to anticipate the performance of Internet links between laboratories, universities and research institutions (Cottrell, 2012). To access this data, the Pingtable application (Cottrell et al., 2013) retrieves raw data stored in the archive and loads it into a plain HTML page. However, this method is not suitable for storing and retrieving data effectively, because it does not provide adequate access to the PingER Linked Open Data.

Furthermore, the existing solutions include a relational database architecture for the PingER project, developed by Nabi et al. (2011) to overcome the problems of using flat files for searching, managing and performing data analysis. The system converts the entire archive from flat files to a relational database-driven architecture to improve the scalability of PingER. However, while the proposed solution works for smaller datasets, it does not provide the same efficiency and scalability for large datasets.

Apart from that, Souza (2013) has proposed a Semantic Web format based on RDF (Resource Description Framework) and SPARQL for ping data storage and retrieval. RDF is a framework for storing and representing data, and SPARQL is a W3C-recommended query language for retrieving data from an RDF store. While this is a widely used approach for publishing data, in line with community practice and W3C recommendations, the classical “triple-store” approaches are not good enough because most queries require a high number of self-joins on the triples table (Virgilio et al., 2011).
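The self-join cost can be illustrated with a minimal sketch in Python. It assumes a toy in-memory triples table and hypothetical predicate names (hasSourceNode, hasDestinationNode, averageRtt) that are not taken from the PingER ontology. Each triple pattern in a query becomes a separate scan of the same table, and the results are then joined on the shared subject.

```python
# Toy triples table: (subject, predicate, object).
# All identifiers and values are hypothetical, for illustration only.
TRIPLES = [
    ("m1", "hasSourceNode", "slac"),
    ("m1", "hasDestinationNode", "cern"),
    ("m1", "averageRtt", 172.4),
    ("m2", "hasSourceNode", "slac"),
    ("m2", "hasDestinationNode", "ufrj"),
    ("m2", "averageRtt", 205.1),
]


def match(predicate, obj=None):
    """Scan the whole table for one triple pattern; return {subject: object}."""
    return {
        s: o
        for s, p, o in TRIPLES
        if p == predicate and (obj is None or o == obj)
    }


def rtt_from(source):
    """Evaluate a query with three triple patterns sharing the subject ?m:
    ?m hasSourceNode <source> ; hasDestinationNode ?dst ; averageRtt ?rtt
    """
    by_source = match("hasSourceNode", source)  # scan 1
    by_dest = match("hasDestinationNode")       # scan 2 of the same table
    by_rtt = match("averageRtt")                # scan 3 of the same table
    # Self-join: keep only subjects present in all three scan results.
    return sorted(
        (by_dest[m], by_rtt[m])
        for m in by_source
        if m in by_dest and m in by_rtt
    )


print(rtt_from("slac"))
```

A query with n triple patterns needs n scans and n - 1 joins in this layout, which is the behaviour that becomes expensive as the table grows.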

Moreover, with the growth of PingER data collected from different sources, there is still room for significant improvement in data management and retrieval to drive complex analysis and visualize essential information in vast amounts of PingER data.

Research Objectives

The aim of this research is to propose a novel data storage model for the PingER project using a key-value store and a MapReduce strategy for SPARQL. This aim is achieved through the following research objectives:

    i.    To investigate and evaluate the performance of key-value stores for storing and retrieving Linked Open Data.

    ii.   To develop a model with a generic distributed storage system to store massive network monitoring data on top of the Hadoop Distributed File System (HDFS).

    iii.  To demonstrate and measure the accuracy of the proposed model by applying it to the existing ping data that are available on the web.

    iv.   To highlight the advantages of the proposed model by comparing it with existing data storage models against a set of evaluation criteria.
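The key-value and MapReduce ideas named in the aim can be sketched in a few lines of Python. This is only an in-memory illustration under stated assumptions: the record fields, the composite key layout (source|destination|day) and the function names are hypothetical, and the map, shuffle and reduce steps are plain Python functions rather than Hadoop jobs.

```python
from collections import defaultdict

# Hypothetical measurement records: (source, destination, day, rtt_ms).
RECORDS = [
    ("slac", "cern", "2013-01-01", 170.0),
    ("slac", "cern", "2013-01-02", 174.0),
    ("slac", "ufrj", "2013-01-01", 205.0),
]


def row_key(source, destination, day):
    """Build a composite key so one lookup fetches one measurement."""
    return f"{source}|{destination}|{day}"


def map_phase(records):
    """Emit (link, rtt) pairs, one per record."""
    for source, destination, _day, rtt in records:
        yield f"{source}|{destination}", rtt


def shuffle(pairs):
    """Group emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Compute the mean round-trip time per link."""
    return {key: sum(values) / len(values) for key, values in groups.items()}


# Key-value view: direct lookup by composite key.
store = {row_key(s, d, day): rtt for s, d, day, rtt in RECORDS}

# MapReduce view: aggregate over all records.
mean_rtt = reduce_phase(shuffle(map_phase(RECORDS)))

print(store["slac|cern|2013-01-02"])
print(mean_rtt)
```

In this layout a point lookup is a single key access, and an aggregate is computed by grouping on a key prefix instead of joining a triples table with itself.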

Scope of Research

This study is primarily concerned with the storage of data collected for monitoring Internet performance. The proposed data storage model is based on a distributed storage system that uses the Hadoop platform.

Significance of the Study

Actively measuring the worldwide Internet provides valuable information for understanding the state of the network and staying aware of its performance, which helps network engineers and scientists alike in making decisions about routing and resource allocation. With the rapid development of data storage, requirements from end users are growing, demanding more capacity, more reliability and the capability to access such information.

 
