The first features completed in the refactored scoreboarding code are checkpointing with crash recovery of the scoreboards, and alignment of the scoreboarding with regular clock hours and day periods. Both improvements are discussed below:

Checkpointing

To implement checkpointing, I save the state of the scoreboards to disk while processing the scoreboards for each flow file. When the program starts, it checks for checkpoints on disk before starting the scoreboarding of each direction. If a checkpoint is present, it unmarshals the data from the checkpoint and continues updating the scoreboards from that point on.

Saving the Checkpoints 

The first issue was how to save the state to disk. I decided to marshal and save the part of the nested hash structure (http://users.telenet.be/jurgen.kobierczynski/jkflow/mylist.pdf) that contains the current scoreboard. Specifically, mylist{direction}{$direction}{scoreboard}{aggregate}{report} points to an array of nested hashes, one for each type of report defined (as shown highlighted in the thumbnail below). Let $ref = mylist{direction}{$direction}{scoreboard}{aggregate}{report}[$i] for the $i-th report type. Then $ref->{aggdata}{tuplevalues} holds a nested hash containing the current counters for flows, packets, and bytes, both inward and outward, for the particular direction (such as 'ATLAS BNL to CERN') specified by the $direction variable in mylist.
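The path into the structure described above can be sketched as follows. This is only an illustration: the text does not show how mylist is declared, so a plain %mylist hash is assumed here, and every key below tuplevalues (the address, in/out, flows/pkts/bytes) is invented for the example.

```perl
use strict;
use warnings;

# Hypothetical miniature of the nested structure; real JKFlow data is larger
# and the keys below 'tuplevalues' are assumptions made for this sketch.
my %mylist = (
    direction => {
        'ATLAS BNL to CERN' => {
            scoreboard => {
                aggregate => {
                    report => [
                        {   # report type $i = 0
                            aggdata => {
                                tuplevalues => {
                                    '10.0.0.1' => {
                                        in  => { flows => 2, pkts => 10, bytes => 1500 },
                                        out => { flows => 1, pkts => 4,  bytes => 600 },
                                    },
                                },
                            },
                        },
                    ],
                },
            },
        },
    },
);

my $direction = 'ATLAS BNL to CERN';
my $i         = 0;

# $ref is the nested hash for the $i-th report type of this direction.
my $ref = $mylist{direction}{$direction}{scoreboard}{aggregate}{report}[$i];

# This sub-hash holds the current counters and is the part that gets saved.
my $counters = $ref->{aggdata}{tuplevalues};

for my $key (sort keys %$counters) {
    printf "%s: %d bytes in, %d bytes out\n",
        $key, $counters->{$key}{in}{bytes}, $counters->{$key}{out}{bytes};
}
```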

...

This is the top level of the score directory for the 'USATLAS Computing Farm - CERN' direction. There are four checkpoints in this directory. The number at the start of each checkpoint file name is the duration in minutes of the report type (as defined in JKFlow.xml) that the checkpoint belongs to. Thus 1440checkpoint.dat is the checkpoint for the daily aggregate report type, and 360checkpoint.dat is the checkpoint for the six-hourly report.
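A minimal sketch of writing such a file is given here. The text does not say which marshalling mechanism is used, so Perl's core Storable module is assumed. The helper name save_checkpoint and the sample counters are hypothetical, and writing to a temporary file before renaming is an addition of this sketch (so that a crash in the middle of a write does not leave a truncated checkpoint), not something the text describes.

```perl
use strict;
use warnings;
use Storable qw(nstore retrieve);
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper: save one report type's counters as <minutes>checkpoint.dat.
sub save_checkpoint {
    my ($dir, $minutes, $tuplevalues) = @_;
    my $file = File::Spec->catfile($dir, "${minutes}checkpoint.dat");
    my $tmp  = "$file.tmp";
    nstore($tuplevalues, $tmp) or die "could not write $tmp";  # portable byte order
    rename $tmp, $file or die "rename $tmp: $!";               # swap in the new file
    return $file;
}

# A temporary directory stands in for the direction's score directory.
my $dir  = tempdir(CLEANUP => 1);
my $file = save_checkpoint($dir, 1440, { '10.0.0.1' => { in => { bytes => 1500 } } });

# Read the checkpoint back to confirm the round trip.
my $back = retrieve($file);
print "restored bytes: $back->{'10.0.0.1'}{in}{bytes}\n";
```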

Loading the Checkpoints

The checkpoints only need to be loaded at the start of a program run. For this purpose I use a variable called FIRSTTIME, which has module-wide visibility for all JKFlow objects. Its value defaults to 0 and is set to 1 after the first flow file has been processed. Thus the checkpoint (if present) is loaded only while the first flow file of a run is being processed.
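The flag logic can be sketched roughly as below, again assuming Storable for the checkpoint files. Only the FIRSTTIME name comes from the text; the subroutine names, the total_bytes counter, and the way the flag is checked are assumptions made for illustration.

```perl
use strict;
use warnings;
use Storable qw(nstore retrieve);
use File::Spec;
use File::Temp qw(tempdir);

our $FIRSTTIME = 0;    # module-wide flag: stays 0 until the first flow file is done

# Hypothetical hook called per flow file: returns the counters to keep updating.
sub counters_for_flow_file {
    my ($dir, $minutes, $current) = @_;
    return $current if $FIRSTTIME;           # later files: keep the in-memory state
    my $file = File::Spec->catfile($dir, "${minutes}checkpoint.dat");
    return retrieve($file) if -e $file;      # first file: resume from the checkpoint
    return $current;                         # no checkpoint: start fresh
}

# Hypothetical hook called once a flow file has been fully processed.
sub flow_file_done { $FIRSTTIME = 1 }

# Simulate a restart with an existing daily checkpoint on disk.
my $dir = tempdir(CLEANUP => 1);
nstore({ total_bytes => 1500 }, File::Spec->catfile($dir, '1440checkpoint.dat'));

my $state = counters_for_flow_file($dir, 1440, {});    # first file: loads checkpoint
$state->{total_bytes} += 500;
flow_file_done();

$state = counters_for_flow_file($dir, 1440, $state);   # second file: checkpoint ignored
print "total_bytes: $state->{total_bytes}\n";
```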

...

In this way, if the machine crashes and is restarted, the scoreboarding resumes from the point where it left off.

Alignment with regular clock-hours/days