...

1. There should be two main sets of tables, raw and analyzed data, kept separately. Raw will have node id, ping sequence number, RTTs, packets sent, and packets received, similar to what is currently in the raw flat files. It will be split across multiple tables, each storing either one year or six months of data. Similarly, for analyzed data there would be multiple hourly tables, multiple monthly tables, and so on. Each table will be similar to the analyzed data in the PingER flat files. This layout differs from perfSONAR, where everything is in one table. If we instead place everything from the raw table into the analyzed table, some data will be replicated (such as the 4 columns for min/max/avg RTTs and seqno), but the layout will be similar to perfSONAR.

Ghulam/Zafar > Having one table has a drawback on 32-bit systems. The size limit for a table is 4 GB. This can be overcome but the tables will need sharding in any case. The reason is the size of data collected each day:

  • 2.1 MB flat file per site * 65 monitoring sites = ~137 MB per day.
    • -bash-4.1$ ls -lh /nfs/slac/g/net/pinger/pingerdata/hep/data/pinger.slac.stanford.edu/ping-2012-06-08.txt.gz
      -rw-rw-r-- 1 pinger iepm 2.1M Jun  9 01:04 /nfs/slac/g/net/pinger/pingerdata/hep/data/pinger.slac.stanford.edu/ping-2012-06-08.txt.gz
  • 137 MB per day * 30 days = ~4.1 GB per month
  • This is a rough estimate of the size of the data table. Others, such as the host and meta-data tables, have not yet been considered.
  • A solution is to divide the MySQL tables by month, region, or week (to make it slightly more scalable in case the number of monitoring sites increases in the future).
  • Sharding is also better for future performance. As the data grows, queries will take longer. More tables mean data can be loaded in parallel.
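
The estimate above, and one possible way of naming month-based shards, can be sketched as follows. This is only an illustration: the 2.1 MB and 65-site figures come from the bullets above, while the function names and the `raw_YYYY_MM` naming scheme are assumptions and not part of the agreed schema.

```python
from datetime import date

# Figures taken from the estimate above.
MB_PER_SITE_PER_DAY = 2.1   # size of one daily flat file per monitoring site
MONITORING_SITES = 65
DAYS_PER_MONTH = 30


def daily_volume_mb(mb_per_site=MB_PER_SITE_PER_DAY, sites=MONITORING_SITES):
    """Approximate data collected per day across all monitoring sites, in MB."""
    return mb_per_site * sites


def monthly_volume_gb(days=DAYS_PER_MONTH):
    """Approximate data collected per month, in GB (1 GB taken as 1000 MB)."""
    return daily_volume_mb() * days / 1000.0


def shard_table_name(day, prefix="raw"):
    """Hypothetical naming scheme: one table per calendar month."""
    return f"{prefix}_{day.year:04d}_{day.month:02d}"


if __name__ == "__main__":
    print(f"{daily_volume_mb():.1f} MB per day")
    print(f"{monthly_volume_gb():.1f} GB per month")
    print(shard_table_name(date(2012, 6, 8)))
```

With a scheme like this, a loader would pick the destination table from the date of each measurement, so a single month's table stays close to the 4 GB estimate rather than growing without bound.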

2. The other query concerns the timestamp. The timestamp is a unique key in pingerDB, assigned by Ghulam. So

...

What I believe could be done is to make packet size part of the unique key as well.

Ghulam/Zafar> Making "packet_size" a joint Primary Key should work. Latest schema here (just a minor change in meta data table).
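
A minimal sketch of what a joint key on timestamp and "packet_size" would allow, assuming the concern is that two measurements with different packet sizes can share one timestamp. It uses Python's built-in sqlite3 module purely as a stand-in so that it runs anywhere; the actual database is MySQL, and the table name, the other column names, and the sample values are hypothetical.

```python
import sqlite3

# Hypothetical table: only packet_size is named in the discussion above.
SCHEMA = """
CREATE TABLE raw_example (
    ts          INTEGER NOT NULL,
    packet_size INTEGER NOT NULL,
    avg_rtt     REAL,
    PRIMARY KEY (ts, packet_size)
)
"""


def demo():
    """Insert rows sharing a timestamp and report what the joint key accepts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    insert = "INSERT INTO raw_example (ts, packet_size, avg_rtt) VALUES (?, ?, ?)"

    # Same timestamp, different packet sizes: both rows are accepted.
    conn.execute(insert, (1339113600, 100, 0.21))
    conn.execute(insert, (1339113600, 1000, 0.35))

    # Same timestamp AND same packet size: rejected by the joint primary key.
    try:
        conn.execute(insert, (1339113600, 100, 0.50))
        duplicate_rejected = False
    except sqlite3.IntegrityError:
        duplicate_rejected = True

    rows = conn.execute("SELECT COUNT(*) FROM raw_example").fetchone()[0]
    conn.close()
    return rows, duplicate_rejected


if __name__ == "__main__":
    print(demo())
```

The same `PRIMARY KEY (ts, packet_size)` clause is valid in MySQL, although the column types would need to follow the latest schema.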

What is left to be done

1. Getdata.pl is currently saving raw pings every hour rather than every half hour. It needs to be modified to store data every half hour, that is, for every ping (48 in total per pair per day).

...