...

Keeping everything in a single table has a drawback on 32-bit systems, where the size limit for a table is 4 GB. The limit can be overcome, but doing so can create performance bottlenecks (for example, during the read operations that load data on the PingER webpage). The data also grows every day. Some approximations to support this claim:

  • 2.1 MB flat file per site * 65 monitoring sites = ~137 MB per day.
    • -bash-4.1$ ls -lh /nfs/slac/g/net/pinger/pingerdata/hep/data/pinger.slac.stanford.edu/ping-2012-06-08.txt.gz
    • -rw-rw-r-- 1 pinger iepm 2.1M Jun  9 01:04 /nfs/slac/g/net/pinger/pingerdata/hep/data/pinger.slac.stanford.edu/ping-2012-06-08.txt.gz
  • 137 MB per day * 30 days = ~4.1 GB per month
  • This is a rough estimate of the size of the data table. Other tables, such as the host and meta-data tables, have not yet been considered.
  • Possible solutions include dividing the MySQL tables by month, region or week (to make the design more scalable in case the number of monitoring sites increases in the future).
  • Sharding is also better for future performance and ensures sustainability by design.
    • As the data grows, queries will take longer (especially the read operations that load data onto the PingER webpage).
    • Sharded tables mean data can be loaded in parallel using Perl threads.
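
The estimate above can be reproduced with a few lines of arithmetic. The following is a minimal sketch in Python, not part of the PingER code base: the constants are the figures quoted in the list (2.1 MB per site per day, 65 sites, 30 days, a 4 GB table limit), the function names are hypothetical, and GB is taken as 1000 MB to match the 4.1 GB figure.

```python
# Back-of-the-envelope growth estimate for a single data table.
# Constants are the figures quoted in the notes; function names are illustrative.

MB_PER_SITE_PER_DAY = 2.1   # size of one daily flat file as listed by ls
MONITORING_SITES = 65       # number of monitoring sites
DAYS_PER_MONTH = 30
TABLE_LIMIT_GB = 4.0        # table size limit cited for 32-bit systems


def daily_mb(sites=MONITORING_SITES, mb_per_site=MB_PER_SITE_PER_DAY):
    """Approximate data volume added per day, in MB."""
    return sites * mb_per_site


def monthly_gb(sites=MONITORING_SITES, days=DAYS_PER_MONTH):
    """Approximate data volume added per month, in GB (1 GB = 1000 MB)."""
    return daily_mb(sites) * days / 1000.0


def days_until_limit(limit_gb=TABLE_LIMIT_GB, sites=MONITORING_SITES):
    """Days of data a single table can hold before reaching the limit."""
    return limit_gb * 1000.0 / daily_mb(sites)


if __name__ == "__main__":
    print(f"per day:   {daily_mb():.1f} MB")
    print(f"per month: {monthly_gb():.2f} GB")
    print(f"days until {TABLE_LIMIT_GB:.0f} GB limit: {days_until_limit():.1f}")
```

Under these assumptions a single table would reach the 4 GB limit in a little under one month, which is why a monthly split is the coarsest of the options listed. Passing a larger `sites` value shows how quickly the margin shrinks if monitoring sites are added.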

...

2. The other query concerns the timestamp. The timestamp is a unique key in pingerDB, assigned by Ghulam. So 

...

To resolve the issue of how to save raw data, we discussed and agreed in our previous meeting that we shall continue storing the raw data in flat files, in addition to storing the analyzed data in a MySQL database. This has two advantages:

  • We won't need to modify the data collection part, which makes the current problem simpler.
  • We will have the raw data in its original form in case anything breaks down. This ensures a backup.

2. Once the schema is decided, data can be transferred from the old flat files into the table. But first we need to decide what goes in the raw data table and how to store the previous 10 years of raw and analyzed data.
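
Independently of the final schema, the old flat files can already be grouped by the table they would be loaded into. The sketch below assumes monthly tables and the `ping-YYYY-MM-DD.txt.gz` file naming seen in the listing earlier on this page. The table prefix `pinger_data` and both function names are hypothetical; nothing here reads the file contents or touches MySQL.

```python
import re
from collections import defaultdict
from datetime import date

# Matches daily flat files such as ping-2012-06-08.txt.gz
FILENAME_RE = re.compile(r"ping-(\d{4})-(\d{2})-(\d{2})\.txt\.gz$")


def shard_table_for(path, prefix="pinger_data"):
    """Return the (hypothetical) monthly table name for one flat file."""
    match = FILENAME_RE.search(path)
    if match is None:
        raise ValueError(f"not a daily ping file: {path}")
    year, month, day = (int(part) for part in match.groups())
    date(year, month, day)  # raises ValueError if the date is not valid
    return f"{prefix}_{year:04d}_{month:02d}"


def group_by_shard(paths):
    """Group flat-file paths by the monthly table they would be loaded into."""
    groups = defaultdict(list)
    for path in sorted(paths):
        groups[shard_table_for(path)].append(path)
    return dict(groups)


if __name__ == "__main__":
    files = [
        "ping-2012-06-08.txt.gz",
        "ping-2012-07-01.txt.gz",
        "ping-2012-06-09.txt.gz",
    ]
    for table, members in group_by_shard(files).items():
        print(table, members)
```

Because each group maps to exactly one table, the groups can be loaded independently of each other, which fits the parallel loading mentioned for sharded tables. Splitting by week or region instead would only change `shard_table_for`.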

...