One of the issues I came across while implementing a checkpointing mechanism for the scoreboarding data in JKFlow was how to marshal and unmarshal the complex nested hash structures when writing checkpoints to disk and loading them back. I searched the web for useful modules for this purpose. One particularly useful resource was the chapter on object persistence in O'Reilly's Advanced Perl Programming: http://www.unix.org.ua/orelly/perl/advprog/ch10_02.htm.

This chapter discusses two types of techniques:

  1. Techniques using Streamed Data
  2. Record-Oriented Approach

The chapter covers three implementations of the streamed-data technique:

  1. FreezeThaw, written by Ilya Zakharevich, is a pure Perl module (no C extensions) and encodes complex data structures into printable ASCII strings.
  2. Data::Dumper, written by Gurusamy Sarathy, is similar in spirit to FreezeThaw but takes a very different approach. It is also implemented in Perl. Its output is pretty-printed Perl code that has to be eval'd to rebuild the structure, so it is less convenient for marshalling and unmarshalling.
  3. Storable is a C extension module for serializing data directly to files and is the fastest of the three approaches.
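To make the Data::Dumper point concrete, here is a minimal round trip. The structure and its keys are made up for illustration; the point is that the "serialized" form is Perl source text, and getting the data back means running that text through a string eval.

```perl
use strict;
use warnings;
use Data::Dumper;

# A small nested structure (hypothetical layout, for illustration only).
my $data = { hosts => { 'a.example' => [1, 2, 3] }, total => 6 };

# Dumper() returns Perl source text such as "$VAR1 = { ... };".
# Indent=1 keeps it compact; Sortkeys makes the key order deterministic.
$Data::Dumper::Indent   = 1;
$Data::Dumper::Sortkeys = 1;
my $text = Dumper($data);

# Unmarshalling means evaluating that text as Perl code.
# The generated code assigns to $VAR1, so declare it under strict.
my $VAR1;
my $copy = eval $text;
die "could not rebuild structure: $@" if $@;
```

Because string eval executes arbitrary code, this round trip should only be used on files you wrote yourself.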

(For an in-depth discussion of the streamed approaches, refer to http://www.unix.org.ua/orelly/perl/advprog/ch10_02.htm)

The record-oriented approach also has three implementations:

  1. DBM is a disk-based hash table, originally written by Ken Thompson for the Seventh Edition Unix system.
  2. MLDBM (multilevel DBM) stores complex values in a DBM file. It uses Data::Dumper to serialize nested data structures and uses a DBM module of your choice.
  3. Berkeley DB is a public-domain C library of database access methods, including B+Tree, Extended Linear Hashing, and fixed/variable length records. The latest release also supports concurrent updates, transactions, and recovery.
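Why MLDBM is needed on top of plain DBM can be seen with a short sketch. This is not MLDBM's own code: it uses SDBM_File (the DBM implementation bundled with Perl) and, as an assumption for illustration, Storable's freeze/thaw in place of Data::Dumper as the serializer. A plain DBM value is a flat string, so a nested reference is stringified and lost; serializing the value before storing it is the idea MLDBM packages up.

```perl
use strict;
use warnings;
use Fcntl;                       # exports O_RDWR, O_CREAT
use SDBM_File;                   # DBM implementation shipped with Perl
use Storable qw(freeze thaw);
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/scoreboard";    # SDBM creates .pag and .dir files

tie(my %db, 'SDBM_File', $file, O_RDWR | O_CREAT, 0666)
    or die "cannot tie $file: $!";

# Plain DBM: the hash reference is stored as a string like "HASH(0x...)".
$db{raw} = { count => 1 };

# MLDBM-style: serialize the nested value on write, deserialize on read.
$db{packed} = freeze({ count => 1, ports => [22, 80] });

my $raw    = $db{raw};
my $packed = thaw($db{packed});

untie %db;
```

SDBM limits the size of a single key/value pair to roughly 1 KB, so larger values need one of the other DBM implementations.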

(For a more detailed discussion of the record-oriented approaches, refer to http://www.unix.org.ua/orelly/perl/advprog/ch10_03.htm)

Choice of Technique:

Because I want a checkpointing mechanism that is both simple to implement and efficient, and I do not need the elaborate transaction and recovery functionality of the record-oriented approaches, I am currently working with the streamed approaches.

Among the streamed approaches, Storable is an attractive option because:

  1. It is the only one of the three that serializes a nested hash structure directly to a file and back, through a simple store/retrieve interface.
  2. It is implemented in C which should give some performance advantage over the other two modules implemented in Perl.

I have already tested this approach in JKFlow, and it successfully marshals and unmarshals JKFlow's complex nested hash structures: you simply pass a reference to the root of the nested structure to Storable's subroutines. The following describes how to use the module.
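The JKFlow code itself is not reproduced here. The following is a minimal sketch of what a Storable-based checkpoint could look like; save_checkpoint, load_checkpoint, and the %scoreboard layout are hypothetical names invented for illustration. Writing to a temporary file and renaming it over the old checkpoint is an added assumption, not something the JKFlow implementation is known to do: on typical Unix filesystems a rename within one directory replaces the file atomically, so a crash mid-write leaves the previous checkpoint intact.

```perl
use strict;
use warnings;
use Storable qw(store retrieve);
use File::Temp qw(tempdir);

# Hypothetical helper: write the whole tree, rooted at $root, to $path.
sub save_checkpoint {
    my ($root, $path) = @_;
    my $tmp = "$path.tmp";
    # Write under a temporary name first ...
    store($root, $tmp) or die "cannot write $tmp\n";
    # ... then rename over the old checkpoint.
    rename($tmp, $path) or die "cannot rename $tmp to $path: $!\n";
    return 1;
}

# Hypothetical helper: return the stored root, or an empty hash if none.
sub load_checkpoint {
    my ($path) = @_;
    return {} unless -e $path;
    return retrieve($path);
}

# A nested scoreboard-like hash (hypothetical layout).
my %scoreboard = (
    routers => { r1 => { bytes => 1200, pkts => 10 } },
    total   => { bytes => 1200, pkts => 10 },
);

my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/scoreboard.chk";
save_checkpoint(\%scoreboard, $file);
my $restored = load_checkpoint($file);
```

Only the root reference is passed in; Storable walks the nested hashes itself.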

Usage of the Storable Module 

The store function takes a reference to a data structure (the root) and the name of a file. The retrieve function does the converse: given a filename, it returns a reference to the root:

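use strict;
use warnings;
use Storable qw(store retrieve);

my $data = [100, 200, {'foo' => 'bar'}];
# store() dies on serious errors, so trap them with eval
eval {
    store($data, 'test.dat');
};
print "Error writing to file: $@" if $@;
# retrieve() returns a reference to a fresh copy of the root
$data = retrieve('test.dat');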

If you have more than one structure to stuff into a file, simply put all of them in an anonymous array and pass this array's reference to store.
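A short sketch of that advice, with made-up structure names: the three roots are bundled into one anonymous array, stored with a single call, and unpacked in the same order after retrieval. Storing them in one call also means any references shared between the structures come back shared rather than duplicated.

```perl
use strict;
use warnings;
use Storable qw(store retrieve);
use File::Temp qw(tempdir);

# Three independent structures (names are illustrative only).
my %config   = (interval => 300);
my %counters = (in => 42, out => 17);
my @history  = (1, 2, 3);

my $file = tempdir(CLEANUP => 1) . '/state.dat';

# Bundle the roots into one anonymous array and store that.
store([\%config, \%counters, \@history], $file);

# retrieve() returns the array reference; unpack it in the same order.
my ($cfg, $cnt, $hist) = @{ retrieve($file) };
```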

You can pass an open filehandle to store_fd instead of giving a filename to store. The functions nstore and nstore_fd store the data in "network" order, making it machine-independent. When you use retrieve or fd_retrieve (also available under the older name retrieve_fd), the data is automatically converted back to the native machine format; while storing, the module records a flag indicating whether the data was written in machine-independent format.
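A minimal sketch of the filehandle variants, using a temporary file and a made-up structure: nstore_fd writes in network order through an already-open handle, and fd_retrieve reads the image back, detecting the network-order flag on its own so the caller does nothing special.

```perl
use strict;
use warnings;
use Storable qw(nstore_fd fd_retrieve);
use File::Temp qw(tempdir);

my $file = tempdir(CLEANUP => 1) . '/net.dat';
my $data = { host => 'r1', bytes => [100, 200] };   # illustrative data

# Write in network (machine-independent) order through an open handle.
open(my $out, '>', $file) or die "cannot open $file: $!";
binmode $out;
nstore_fd($data, $out) or die "nstore_fd failed\n";
close($out) or die "close failed: $!";

# Read it back; the stored network-order flag is handled automatically.
open(my $in, '<', $file) or die "cannot open $file: $!";
binmode $in;
my $copy = fd_retrieve($in);
close($in);
```

Using nstore or nstore_fd costs a little speed but makes checkpoint files portable between machines with different byte orders.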
