Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: Migration of unmigrated content due to installation of a new plugin

...

In order to implement check-pointing, I save the state of the scoreboards to disk, while processing the scoreboards for each flow-file. When the program starts for the first time, it checks for checkpoints on disk before starting the scoreboading of every direction and if a checkpoint is present it unmarshalls the data from the checkpoint and starts updating the scoreboards from that point on. 

Saving the Checkpoints 

Wiki MarkupThe first issue was how to save the state to disk. I decided to go with marshalling and saving the part of the [nested hash structure |http://users.telenet.be/jurgen.kobierczynski/jkflow/mylist.pdf] containing the current scoreboard to disk. Specifically, mylist \{direction\}\{$direction\} \{scoreboard\} \{aggregate\} \{report\} points to an array of nested hashes , one for each type of report defined  (as shown highlighted in the thumbnail below).  ( let $ref = mylist \{direction\}\{$direction\} \{scoreboard\} \{aggregate\} \{report\}\[$i\] for the $ith report type ).  Then $ref->\{aggdata\}\{tuplevalues\} contains a nested hash containing the current counters for the flows/packets/bytes both inward and outward for a particular 'direction'(Such as ATLAS BNL to CERN' ) specified by the $direction variable in structure  containing the current scoreboard to disk. Specifically, mylist {direction}{$direction} {scoreboard} {aggregate} {report} points to an array of nested hashes , one for each type of report defined  (as shown highlighted in the thumbnail below).  ( let $ref = mylist {direction}{$direction} {scoreboard} {aggregate} {report}[$i] for the $ith report type ).  Then $ref->{aggdata}{tuplevalues} contains a nested hash containing the current counters for the flows/packets/bytes both inward and outward for a particular 'direction'(Such as ATLAS BNL to CERN' ) specified by the $direction variable in mylist.

I create a temporary data strcuture ,each time the scoreboarding function is called in JKFlow.pm for a particular flowflile and then for a particular direction , and store the values of $ref->{aggdata}{tuplevalues} as well as certain other important variables representing the state of scoreboard including the $ref-> {count}, $ref->{startperiod}and $ref->{counter} in that data strcuture. It is then marshalled and written to disk using a module written in C called Storable. The technique i explained on this page

...

 The scoreboarding functionality has been refactorised to take care of this problem. The following examples will illustrate what has been implemented:

  1. If the first flow-file is timestamped 12:12:00 21st March 2007 and one of the report types specified is hourly reports, then the first report will come out at 13:00:00. After that all reports will keep coming out at 14:00:00, 15:00:00 and so on for the regular clock hours.
  2. Similarly for the above example if 15 minute reports are also specified then the first report will come out at 12:15:00 i.e. after only two flow files. However after that reports will keep coming out after 15 minutes at 12:30:00 and 12:45:00 etc .
  3. Again taking the above example, if three-hourly reports are also specified then the first report will come out at 15:00:00 and then at 18:00:00
  4. Similarly if daily reports are specified then the first report will come out for as soon as the final flow-file for 21st March 2007 has been processed. ( Note that if the earlier system were in place, then in this case the first daily report would have come out at 12:12:00 22nd March 2007)
  5. Also if 15 day reports are specified then the first report will come out on the 15th day of the month regardless of whether the flow-files for all fifteen days are available.

I will ensure in the generation of the reports that the start and end time of the flow-files for which the report is generated is clearly specified so that one knows the duration in time that the report represents.

The algorithm has been implemented such that reports of any duration ranging from a minute to a year are automatically aligned with regular clock hours. Below is a snippet of the code that implements the alignment. The basic trick used is that I store the startperiod of the first report to represent the point in time which should have been the starting point if the report had been aligned with regular clock hours. Once the first report is generated at the proper alignment, all others are generated correctly

Code Block

if ( $report->{count} * $samplerate < 60 ){ # i.e. if size of report interval is less than an hour (e.g. 15 minute reports)

                        my $aligned_timestamp = Time::Local::timelocal (0,0,$hour, $mday, $mon, $year,$wday, $yday, $isdst );
                        my $current_timestamp = $self->{filetime};
                        my $mod = ($current_timestamp - $aligned_timestamp) % ($report->{count} * $samplerate * 60);
                        $report->{aggdata}{startperiod} = $current_timestamp - $mod;
                }
                elsif( $report->{count} * $samplerate < 1440 ){ # i.e. if size of report interval is less than a day (e.g. six-hourly reports)

                        my $aligned_timestamp  = Time::Local::timelocal (0,0,0, $mday, $mon, $year,$wday, $yday, $isdst );
                        my $current_timestamp = $self->{filetime};
                        my $mod = ($current_timestamp - $aligned_timestamp) % ($report->{count} * $samplerate * 60) ;
                        $report->{aggdata}{startperiod} = $current_timestamp - $mod;
                }
                elsif( $report->{count} * $samplerate < 40320 ) { # i.e. if the size of the report interval is less than a month (e.g. 15 day report)

                        my $aligned_timestamp  = Time::Local::timelocal (0,0,0,1, $mon, $year,0, 0, $isdst );
                        my $current_timestamp = $self->{filetime};
                        my $mod = ($current_timestamp - $aligned_timestamp) % ($report->{count} * $samplerate * 60) ;
                        $report->{aggdata}{startperiod} = $current_timestamp - $mod;
                }