Comment: Migrated to Confluence 4.0

...

1. There should be two main sets of tables, one for raw data and one for analyzed data. The raw tables will hold node id, ping sequence number, RTTs, packets sent and packets received, similar to what is currently in the raw flat files. The raw data will be split across multiple tables, each storing either one year or six months of data. Similarly, for analyzed data there would be multiple hourly tables, multiple monthly tables and so on. Each of these tables will be similar to the analyzed data in the PingER flat files. This design differs from perfSONAR, where everything is in one table. If we instead place everything that is in the raw table into the analyzed table, some data will be replicated (for example the four columns min/max/avg RTT and seqno), but the layout will be similar to perfSONAR.
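The six-month split of the raw tables could be sketched as below. This is only an illustration, not the agreed schema: the table naming scheme (`raw_YYYY_hN`), the column names, the `ts` timestamp column and the sample row are all assumptions, and SQLite is used in place of MySQL only so that the sketch runs on its own.

```python
import sqlite3
from datetime import date

# Hypothetical columns, based on the fields listed above; `ts` is an assumed
# timestamp column and is not part of that list.
RAW_COLUMNS = """
    node_id      INTEGER NOT NULL,
    ts           INTEGER NOT NULL,
    seq_no       INTEGER NOT NULL,
    rtt_ms       REAL,
    packets_sent INTEGER,
    packets_rcvd INTEGER
"""


def raw_table_name(day: date) -> str:
    """Return the name of the six-month raw table that covers `day`."""
    half = 1 if day.month <= 6 else 2
    return f"raw_{day.year}_h{half}"


def ensure_raw_table(conn: sqlite3.Connection, day: date) -> str:
    """Create the raw table for `day` if it is missing and return its name."""
    name = raw_table_name(day)
    conn.execute(f"CREATE TABLE IF NOT EXISTS {name} ({RAW_COLUMNS})")
    return name


conn = sqlite3.connect(":memory:")
table = ensure_raw_table(conn, date(2012, 6, 8))

# Made-up sample row, only to show that inserts are routed by date.
conn.execute(
    f"INSERT INTO {table} VALUES (?, ?, ?, ?, ?, ?)",
    (1, 1339113600, 0, 152.3, 10, 10),
)
row_count = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
```

The analyzed hourly and monthly tables could follow the same pattern with a different prefix and column set.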

Ghulam/Zafar/Sadia>

Having one table has a drawback on 32-bit systems, where the size limit for a table is 4 GB. This limit can be overcome, but doing so can create performance bottlenecks (for example during read operations that load data for the PingER web page). The size of the data will grow every day. Some approximations to support this claim:

  • 2.1 MB flat file per site * 65 monitoring sites = ~137 MB per day.
    • -bash-4.1$ ls -lh /nfs/slac/g/net/pinger/pingerdata/hep/data/pinger.slac.stanford.edu/ping-2012-06-08.txt.gz
    • -rw-rw-r-- 1 pinger iepm 2.1M Jun  9 01:04 /nfs/slac/g/net/pinger/pingerdata/hep/data/pinger.slac.stanford.edu/ping-2012-06-08.txt.gz
  • ~137 MB per day * 30 days = ~4.1 GB per month.
  • This is a rough estimate of the size of the data table. Other tables, such as the host and meta-data tables, have not yet been considered.
  • Possible solutions include dividing the MySQL tables by month, region or week (to make the design more scalable in case the number of monitoring sites increases in future).
  • Sharding is also better for future performance and makes the design sustainable.
    • As the data increases, queries will take longer (especially read operations that load data onto the PingER web page).
    • Sharded tables mean data can be loaded in parallel using Perl threads.
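The estimate and the sharding idea above can be sketched as follows. This is a rough illustration under stated assumptions: the monthly table names (`data_YYYY_MM`) and the `load_shard` function are hypothetical placeholders, and Python threads stand in for the Perl threads mentioned above.

```python
from concurrent.futures import ThreadPoolExecutor

MB_PER_SITE_PER_DAY = 2.1   # size of one daily flat file, from the listing above
MONITORING_SITES = 65


def daily_mb(mb_per_site: float = MB_PER_SITE_PER_DAY,
             sites: int = MONITORING_SITES) -> float:
    """Approximate volume of new data per day, in MB."""
    return mb_per_site * sites


def monthly_gb(days: int = 30) -> float:
    """Approximate volume of new data per month, in GB (1 GB = 1000 MB)."""
    return daily_mb() * days / 1000.0


def monthly_shards(year: int) -> list:
    """Hypothetical names for one table per month of `year`."""
    return [f"data_{year}_{month:02d}" for month in range(1, 13)]


def load_shard(table: str) -> tuple:
    """Placeholder loader: a real version would query `table` in MySQL."""
    rows_loaded = 0  # no database is contacted in this sketch
    return table, rows_loaded


def load_all(tables: list, workers: int = 4) -> dict:
    """Load several shards concurrently and collect the per-table results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(load_shard, tables))


results = load_all(monthly_shards(2012))
```

With one table per month, each shard stays close to the ~4.1 GB monthly estimate, so a finer split (for example by week or region) may be needed if a 4 GB per-table limit applies.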

2. The other query concerns the timestamp. The timestamp is a unique key in pingerDB, assigned by Ghulam. So 

...