One of the issues I came across while implementing a checkpointing mechanism for the scoreboarding data in JKFlow was how to marshall and unmarshall the complex nested hash structures when checkpointing the data to disk and loading the checkpoints back. I searched the web for useful modules for this purpose. One particularly helpful resource was the chapter on object persistence in Perl in O'Reilly's Advanced Perl Programming book: http://www.unix.org.ua/orelly/perl/advprog/ch10_02.htm.
This chapter discusses two types of techniques:
- Techniques using Streamed Data
- Record-Oriented Approach
In the techniques using Streamed Data there are three implementations:
- FreezeThaw, written by Ilya Zakharevich, is a pure Perl module (no C extensions) and encodes complex data structures into printable ASCII strings.
- Data::Dumper, written by Gurusamy Sarathy, is similar in spirit to FreezeThaw but takes a very different approach. It is also implemented in pure Perl. However, it essentially pretty-prints the data structure, so it is not of much use for marshalling and unmarshalling.
- Storable is a C extension module for serializing data directly to files and is the fastest of the three approaches.
(For an in-depth discussion of the streamed approaches you may refer to http://www.unix.org.ua/orelly/perl/advprog/ch10_02.htm)
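To give a flavour of the streamed style, here is a minimal FreezeThaw sketch (assuming the FreezeThaw module is installed from CPAN; the data and variable names are arbitrary examples, not JKFlow code):

```perl
use strict;
use warnings;
use FreezeThaw qw(freeze thaw);

# Encode a nested structure into a printable ASCII string.
my $data   = { counters => [1, 2, 3], totals => { bytes => 4096 } };
my $frozen = freeze($data);

# thaw() returns a list of the structures that were frozen.
my ($copy) = thaw($frozen);
print "bytes = $copy->{totals}{bytes}\n";   # prints "bytes = 4096"
```

Because the frozen form is printable ASCII, it can be written to any text file or even embedded in another protocol, which is the main appeal of this approach.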
For the Record-Oriented Approaches, once again there are three implementations:
- DBM is a disk-based hash table, originally written by Ken Thompson for the Seventh Edition Unix system.
- MLDBM (multilevel DBM) stores complex values in a DBM file. It uses Data::Dumper to serialize any data structures, and uses a DBM module of your choice.
- Berkeley DB - is a public-domain C library of database access methods, including B+Tree, Extended Linear Hashing, and fixed/variable length records. The latest release also supports concurrent updates, transactions, and recovery.
(For a more detailed discussion of the record-based approaches you may refer to http://www.unix.org.ua/orelly/perl/advprog/ch10_03.htm)
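As a quick illustration of the record-oriented style, here is a minimal MLDBM sketch (assuming MLDBM and its default SDBM_File/Data::Dumper backends are installed; the file name and keys are arbitrary examples):

```perl
use strict;
use warnings;
use Fcntl;     # for the O_CREAT and O_RDWR flags
use MLDBM;     # defaults to SDBM_File with Data::Dumper

# Tie a hash to a DBM file; nested values are serialized transparently.
tie my %state, 'MLDBM', 'checkpoint', O_CREAT | O_RDWR, 0640
    or die "Cannot tie checkpoint file: $!";

$state{router1} = { in => 1024, out => 2048 };   # stored as one record

# Caveat: you must assign whole values. Modifying a nested element in
# place ($state{router1}{in}++) changes only an in-memory copy and is
# NOT written back to disk.
my $entry = $state{router1};
print "in = $entry->{in}\n";   # prints "in = 1024"

untie %state;
unlink 'checkpoint.pag', 'checkpoint.dir';   # clean up SDBM files
```

The per-key granularity is the attraction here: each top-level entry is a separate record, so you can update one entry without rewriting the whole checkpoint.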
Choice of Technique:
Since I am implementing a simple checkpointing mechanism that needs to be both easy to implement and efficient, I am currently working with the streamed approaches, as I do not require the elaborate transaction and recovery functionality present in the record-based approaches.
Within the stream based approaches 'Storable' is an attractive option since:
- It is the only one of the three that lets you serialize the nested hash structure directly to a file and back through a seamless interface.
- It is implemented in C, which should give it a performance advantage over the other two modules, which are implemented in Perl.
I have already tested this approach in JKFlow, and it successfully marshalls and unmarshalls JKFlow's complex nested hash-based data structures when a reference to the root of the nested structure is passed to its component subroutines. The following describes how to use the module.
Usage of the Storable Module
As noted above, Storable is a C extension module that serializes data directly to files. The store function takes a reference to a data structure (the root) and the name of a file. The retrieve function does the converse: given a filename, it returns the root:
```perl
use Storable;

$a = [100, 200, {'foo' => 'bar'}];
eval { store($a, 'test.dat'); };
print "Error writing to file: $@" if $@;
$a = retrieve('test.dat');
```
If you have more than one structure to stuff into a file, simply put all of them in an anonymous array and pass this array's reference to store.
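For example (a sketch; the file and variable names are arbitrary):

```perl
use strict;
use warnings;
use Storable qw(store retrieve);

my %config  = (interval => 300);
my @samples = (10, 20, 30);

# Bundle several structures into one anonymous array and store it.
store([\%config, \@samples], 'bundle.dat') or die "store failed";

# Unpack them in the same order on retrieval.
my ($cfg, $smp) = @{ retrieve('bundle.dat') };
print "interval = $cfg->{interval}\n";   # prints "interval = 300"

unlink 'bundle.dat';   # clean up the example file
```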
You can pass an open filehandle to store_fd instead of giving a filename to store. The functions nstore and nstore_fd store the data in "network" order, making it machine-independent. When you use retrieve or retrieve_fd (spelled fd_retrieve in current releases of Storable), the data is automatically converted back to the native machine format (while storing, the module stores a flag indicating whether it has stored the data in a machine-independent format or not).
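A sketch of both variants (nstore and fd_retrieve are not exported by default, so they must be requested; the file names are arbitrary examples):

```perl
use strict;
use warnings;
use Storable qw(store_fd fd_retrieve nstore retrieve);

my $data = { host => 'collector', port => 2055 };

# Write in "network" order so another architecture can read it back.
nstore($data, 'portable.dat') or die "nstore failed";
my $copy = retrieve('portable.dat');   # converted back automatically
print "port = $copy->{port}\n";        # prints "port = 2055"

# The _fd variants take an already open filehandle instead of a name.
open my $fh, '>', 'stream.dat' or die "open failed: $!";
store_fd($data, $fh);
close $fh;

open $fh, '<', 'stream.dat' or die "open failed: $!";
my $again = fd_retrieve($fh);
close $fh;

unlink 'portable.dat', 'stream.dat';   # clean up the example files
```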