One of the issues I came across while implementing a checkpointing mechanism for the scoreboarding data in JKFlow was how to marshal and unmarshal the complex nested hash structures when writing checkpoints to disk and loading them back. I searched the web for modules suited to this purpose. One particularly useful resource was the chapter on object persistence in Perl in O'Reilly's Advanced Perl Programming: http://www.unix.org.ua/orelly/perl/advprog/ch10_02.htm.
This chapter discusses two types of techniques:
- Techniques using Streamed Data
- Record-Oriented Approach
In the techniques using Streamed Data there are three implementations:
- FreezeThaw, written by Ilya Zakharevich, is a pure Perl module (no C extensions) and encodes complex data structures into printable ASCII strings.
- Data::Dumper, written by Gurusamy Sarathy, is similar in spirit to FreezeThaw but takes a very different approach, and it is also implemented in Perl. It pretty-prints the data structure as Perl source text, which has to be run through eval to get the structure back, so it is less convenient for marshalling and unmarshalling.
- Storable is a C extension module for serializing data directly to files and is the fastest of the three approaches.
(For an in-depth discussion of the streamed approaches you may refer to http://www.unix.org.ua/orelly/perl/advprog/ch10_02.htm)
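To make the difference between the streamed modules concrete, here is a minimal sketch that round-trips a nested hash in memory, once through Data::Dumper plus eval and once through Storable's freeze/thaw. The scoreboard keys and the variable name `restored` are made up for illustration and are not JKFlow's actual data layout. FreezeThaw is left out because it is not bundled with Perl.

```perl
use strict;
use warnings;
use Data::Dumper;
use Storable qw(freeze thaw);

# Hypothetical nested scoreboard; keys and counters are illustrative only.
my %scoreboard = (
    subnet_a => { tcp => { bytes => 1200, pkts => 10 },
                  udp => { bytes => 300,  pkts => 4  } },
    subnet_b => { tcp => { bytes => 50,   pkts => 1  } },
);

# Data::Dumper: the structure becomes Perl source of the form
# "$restored = { ... };", which has to be eval'ed to rebuild it.
my $text = Data::Dumper->Dump([ \%scoreboard ], ['restored']);
my $restored;                 # lexical that the eval'ed text assigns to
eval $text;
die "eval of dumped text failed: $@" if $@;

# Storable: the structure becomes an opaque binary string and comes
# back through thaw, with no eval involved.
my $frozen = freeze(\%scoreboard);
my $thawed = thaw($frozen);

print "Data::Dumper text:\n$text";
print "Storable image is ", length($frozen), " bytes\n";
```

Both copies are deep copies, so changing `$restored` or `$thawed` afterwards does not touch the original `%scoreboard`.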
For the Record-Oriented Approaches, once again there are three implementations:
- DBM is a disk-based hash table, originally written by Ken Thompson for the Seventh Edition Unix system.
- MLDBM (multilevel DBM) stores complex values in a DBM file. It uses Data::Dumper to serialize any data structures, and uses a DBM module of your choice.
- Berkeley DB is a public-domain C library of database access methods, including B+Tree, Extended Linear Hashing, and fixed/variable length records. The latest release also supports concurrent updates, transactions, and recovery.
(For a more detailed discussion of the record-based approaches you may refer to http://www.unix.org.ua/orelly/perl/advprog/ch10_03.htm)
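The reason a plain DBM file is not enough for nested data, and the idea MLDBM builds on, can be sketched with SDBM_File, the DBM implementation bundled with Perl. This is only a hand-rolled imitation of the concept: it serializes the value with Storable by hand and is not MLDBM's own interface. The file name and hash keys are assumptions for illustration.

```perl
use strict;
use warnings;
use Fcntl;                        # O_RDWR, O_CREAT
use SDBM_File;                    # DBM implementation shipped with Perl
use Storable qw(freeze thaw);
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $base = "$dir/scoreboard";     # hypothetical name; SDBM adds its own suffixes

tie(my %db, 'SDBM_File', $base, O_RDWR | O_CREAT, 0666)
    or die "Cannot tie SDBM file: $!";

# A DBM value is a flat string, so a reference is stored as its
# stringified form ("HASH(0x...)") and the nested data is lost.
$db{flat} = { bytes => 1200 };
my $flat_value = $db{flat};

# Serializing the nested value before storing it preserves the structure.
# (SDBM limits the size of each key/value pair, so this only suits small records.)
$db{nested} = freeze({ tcp => { bytes => 1200, pkts => 10 } });
my $nested = thaw($db{nested});

untie %db;

print "flat value read back as: $flat_value\n";
print "nested tcp bytes: $nested->{tcp}{bytes}\n";
```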
Choice of Technique:
Since I want a simple checkpointing mechanism that is both easy to implement and efficient, I am currently working with the streamed approaches. I do not require the elaborate transaction and recovery functionality present in the record-based approaches.
Within the stream-based approaches 'Storable' is an attractive option since:
- It is the only approach which allows you to directly serialize the nested hash structure to a file and back through a seamless interface.
- It is implemented in C which should give some performance advantage over the other two modules implemented in Perl.
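As a rough illustration of that interface, the sketch below writes a nested hash to a file with Storable's store and reads it back with retrieve. The checkpoint path and the scoreboard contents are assumptions for the example, not the code or data actually used in JKFlow.

```perl
use strict;
use warnings;
use Storable qw(store retrieve);
use File::Temp qw(tempdir);

my $dir        = tempdir(CLEANUP => 1);
my $checkpoint = "$dir/scoreboard.ckpt";   # hypothetical checkpoint path

# Hypothetical nested scoreboard; keys and counters are illustrative only.
my %scoreboard = (
    subnet_a => { tcp => { bytes => 1200, pkts => 10 } },
    subnet_b => { udp => { bytes => 300,  pkts => 4  } },
);

# Checkpoint: write the whole nested structure to disk in one call.
store(\%scoreboard, $checkpoint)
    or die "Could not write checkpoint $checkpoint";

# Restore: read the file back into a fresh hash reference.
my $loaded = retrieve($checkpoint);

print "restored subnets: ", join(', ', sort keys %$loaded), "\n";
```

If a checkpoint has to be readable on a machine with a different byte order, Storable's nstore can be used in place of store; retrieve reads either format.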
I have already tested this approach in JKFlow and it successfully marshals the