Frank has presented some ideas on enhancing LCIO, and Tony has an older document discussing "Adding support for Random Access to LCIO files".

The current LCIO file format is documented here. The SIO format is documented here.

The following list summarizes some of the requirements/desires from these documents.

  1. Support direct (random) access to events or other records in LCIO files. (A possible reader API is sketched after this list.)
    1. Ideally we could also (eventually) support access based on criteria, e.g. events with Emiss > 100.
  2. Efficient access to sub-parts of an event.
    1. I.e., we should be able to read an LCIO file and, if only interested in reconstructed particles, read only those blocks from the file.
      1. Ideally this would be totally transparent to the user, especially when reading the file.
  3. Ability to distribute data over several files. This could be useful to:
    1. Append multiple files together to form a single large "logical" file.
    2. Allow data to be added to an existing file, e.g. to:
      1. Do reconstruction and add the reconstruction output to the file without changing the input file.
      2. Allow users to add extra information to the file.
  4. A more efficient way to store user-defined data, e.g.:
    1. Using LCIO in a testbeam DAQ requires very fast I/O.
    2. LCGenericObject is not very user-friendly.
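
As a rough illustration of requirements 1 and 2, a direct-access reader might expose something like the following. This is a minimal sketch, not the existing LCIO API: the readEvent(run, event) overload and the setReadCollectionNames filter are assumptions about how such an interface could look.

    #include <string>
    #include <vector>

    class LCEvent ;   // forward declaration only -- sketch

    // Hypothetical reader interface -- names are illustrative,
    // not the current LCIO API.
    class LCReader {
    public:
      // Random access: jump directly to a given (run, event) pair,
      // using an index record stored in the file instead of scanning.
      virtual LCEvent* readEvent( int runNumber, int eventNumber ) = 0 ;

      // Partial reads: only the named collections are decoded;
      // all other blocks are skipped (or loaded lazily on demand).
      virtual void setReadCollectionNames( const std::vector<std::string>& names ) = 0 ;

      virtual ~LCReader() {}
    } ;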

Some consequences that follow from these requirements:

  • Currently the mapping between block name and block type is stored only in the EventHeader. This makes it hard to add additional information to the event. Frank proposes storing BlockName#BlockType where we currently store only the block name (see the sketch below).
  • Currently SIO pointers can only point within the same block. We need the ability for pointers to point between blocks, or even between files.
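
To make the BlockName#BlockType proposal concrete, here is a minimal sketch of how the combined string could be split when a block is read. The helper name and the handling of legacy blocks are assumptions based only on the proposal above.

    #include <string>
    #include <utility>

    // Hypothetical helper: split "BlockName#BlockType" into its two parts.
    // Blocks written by older files contain no '#', so the type comes back
    // empty and must still be looked up in the EventHeader as before.
    std::pair<std::string, std::string> splitBlockKey( const std::string& key ) {
      std::string::size_type pos = key.find( '#' ) ;
      if( pos == std::string::npos )
        return std::make_pair( key, std::string() ) ;   // legacy: name only
      return std::make_pair( key.substr( 0, pos ),      // BlockName
                             key.substr( pos + 1 ) ) ;  // BlockType
    }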

3 Comments

  1. Unknown User (gaede)

    To implement the 'on demand' reading of LCIO collections, we could store a fixed mapping of collection types and keys (names) to records in the first record. This information would then be used to read only the requested blocks: when LCEvent->getCollection("MyHits") is called by the user, the collection type and record are retrieved from that mapping. Later this could be extended to also automatically open another file that might hold the data (through an appropriate file directory created beforehand).
    The user would have to specify the mapping before opening a file with the LCWriter, e.g.:
    LCWriter* wrt = LCFactory::getInstance()->createLCWriter() ;
    wrt->register("montecarlo", LCIO::MCPARTICLE, "MCParticle" ) ;
    wrt->register("montecarlo", LCIO::SIMCALORIMETERHIT ) ;
    //..

    This would be similar to creating branches in ROOT.
    If we introduce such a free but fixed (for any one file) mapping, we would not need the additional "BlockName#BlockType" - so the only change to the SIO format would be the extra 64-bit record key.
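
    For illustration, the reader side of this scheme might work roughly as below. The map layout and the 64-bit record key field are assumptions, sketching only the lookup described above.

    #include <iostream>
    #include <map>
    #include <string>
    #include <utility>

    // Hypothetical per-file directory stored in the first record:
    // collection name -> (collection type, 64-bit record key).
    typedef std::pair<std::string, long long> CollEntry ;
    typedef std::map<std::string, CollEntry>  CollectionMap ;

    int main() {
      CollectionMap dir ;
      dir["MCParticle"] = CollEntry( "MCPARTICLE", 1 ) ;   // filled when the file is opened

      // On-demand lookup, as getCollection("MCParticle") would do internally:
      CollectionMap::const_iterator it = dir.find( "MCParticle" ) ;
      if( it != dir.end() )
        std::cout << "read record with key " << it->second.second
                  << " holding type " << it->second.first << std::endl ;
      return 0 ;
    }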

    1. I think this is a good idea, but there are a couple of issues:

      1. A single record at the beginning of the file makes it hard to append files. We would probably want to allow the special record to occur anywhere and apply to the records that follow it.
      2. If we want to cleanly handle on-demand loading of collections, we also need to worry about pointer dereferencing. If I read a collection of ReconstructedParticles and then follow a pointer to a Track, I want the collection containing that Track to be loaded automatically. Since pointers in LCIO are created on a collection-by-collection basis, this would be possible by having an additional list of indexes into the pointer list, which lets us figure out which collection a particular pointer refers to (a possible layout is sketched below).
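
      A possible layout for that index list, purely as a sketch (the struct and function names are invented for illustration):

      #include <cstddef>
      #include <string>
      #include <vector>

      // Hypothetical: map contiguous ranges of the per-record pointer list
      // back to the collection whose objects they reference, so that
      // dereferencing a pointer can trigger loading the right collection.
      struct PointerRange {
        std::size_t firstIndex ;   // first entry in the pointer list
        std::size_t lastIndex ;    // last entry (inclusive)
        std::string collection ;   // collection holding the pointed-to objects
      } ;

      // A linear scan (or a binary search over sorted ranges) finds the
      // owning collection for a given pointer-list index.
      std::string owningCollection( const std::vector<PointerRange>& ranges,
                                    std::size_t index ) {
        for( std::vector<PointerRange>::const_iterator it = ranges.begin() ;
             it != ranges.end() ; ++it ) {
          if( index >= it->firstIndex && index <= it->lastIndex )
            return it->collection ;
        }
        return std::string() ;     // unknown: fall back to eager loading
      }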
  2. SIO already contains a mechanism that could be used to handle DAQ records. Currently LCIO only allows 3 SIO record types, but we could easily allow user-defined handlers to be added to support other record types, and these could then contain raw DAQ information or calibration information. SIO allows compression to be turned off (on a block-by-block basis, I think), and nothing prevents users from writing a large array of bytes to circumvent the SIO byte ordering. (It would be good to actually do some tests to see how much overhead compression and byte ordering actually add for a typical DAQ record.) A sketch of such a handler follows.
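
    A rough sketch of what such a user-defined raw record could look like. The class below is invented for illustration; a real handler would derive from the appropriate SIO block base class, and whether compression can be disabled per block or only per record should be checked against the SIO documentation.

    #include <vector>

    // Hypothetical handler for a raw DAQ block: writing one opaque byte
    // array sidesteps per-word byte swapping, and compression could be
    // turned off for this record to keep I/O fast.
    class RawDAQBlock /* : would derive from the SIO block base class */ {
    public:
      explicit RawDAQBlock( const std::vector<unsigned char>& payload )
        : _payload( payload ) {}

      // Write side: a length word followed by the raw bytes.
      // 'buf' stands in for whatever stream interface SIO provides.
      void write( std::vector<unsigned char>& buf ) const {
        unsigned int len = static_cast<unsigned int>( _payload.size() ) ;
        const unsigned char* p = reinterpret_cast<const unsigned char*>( &len ) ;
        buf.insert( buf.end(), p, p + sizeof( len ) ) ;             // length (host byte order!)
        buf.insert( buf.end(), _payload.begin(), _payload.end() ) ; // opaque payload
      }

    private:
      std::vector<unsigned char> _payload ;
    } ;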