Adding random access to LCIO files.

Goals

Allow efficient access to specific events in LCIO files. Events should be selectable by

  • Run
  • Run+Event
  • Index (i.e. 10000th event in file)
  • Tag (e.g. EMISS>200)
  • Access must work for "chains" of files in addition to individual files.

The last criterion is optional, since there is no urgent requirement for it, but it would be useful to at least consider how it could be supported in future.

Proposed implementation

One way to implement this would be to (optionally) include two new types of records in LCIO files.

LCIORandomAccess.xml
  <record name="LCIORandomAccess">
     There are two types of LCIORandomAccess records
       file record -- one per file, always first record on file
       index record -- one or more per file, points to associated LCIOIndex record
     <block name="LCIORandomAccess" major="1" minor="0">
         <data type="int" name="runMin"/>
         <data type="int" name="eventMin"/>
         <data type="int" name="runMax"/>
         <data type="int" name="eventMax"/>
         <data type="int" name="nRunHeaders"/>
         <data type="int" name="nEvents"/>
         <data type="int" name="recordsAreInOrder"/>
         <data type="long" name="indexLocation">
            Location in file of associated index. Always null for file record.
         </data>
         <data type="long" name="prevLocation">
            For file record location of first index record in file
            For index record location of previous index record (or null if first)
         </data>
         <data type="long" name="nextLocation">
            For file record location of last index record in file
            For index record location of next index record (or null if last)
         </data>
     </block>
  </record>
LCIOIndex.xml
  <record name="LCIOIndex">
     <block name="LCIOIndex" major="1" minor="0">
         <data type="int" name="controlWord">
            Bit 0 = single Run
            Bit 1 = long offset required
         </data>
         <data type="int" name="runMin"/>
         <data type="long" name="baseOffset"/>
         <data type="int" name="size"/>
         <repeat count="size">
             <if condition="(controlWord&amp;1)==0">
                <data type="int" name="runOffset">
                    Relative to runMin
                </data>
             </if>
             <data type="int" name="eventNumber">
                Event number, or -1 for run header records
             </data>
             <if condition="(controlWord&amp;2)!=0">
                <data type="long" name="locationOffset">
                    Relative to baseOffset
                </data>
             <else/>
                <data type="int" name="locationOffset">
                    Relative to baseOffset
                </data>
             </if>
         </repeat>
     </block>
  </record>
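To make the effect of the two controlWord bits concrete, here is a minimal Python sketch that packs and unpacks the fields of an LCIOIndex block in the order listed above. It is an illustration only: the function names are hypothetical, big-endian byte order is an assumption, and SIO record/block framing and compression are not modelled.

```python
import struct

SINGLE_RUN = 0x1   # bit 0: every entry belongs to runMin, so no runOffset is stored
LONG_OFFSET = 0x2  # bit 1: locationOffset is stored as a 64-bit value
HEADER = ">iiqi"   # controlWord, runMin, baseOffset, size (big-endian is an assumption)


def encode_index(control_word, run_min, base_offset, entries):
    """entries is a list of (run, eventNumber, absolute file location)."""
    parts = [struct.pack(HEADER, control_word, run_min, base_offset, len(entries))]
    for run, event, location in entries:
        if not control_word & SINGLE_RUN:
            parts.append(struct.pack(">i", run - run_min))      # runOffset
        parts.append(struct.pack(">i", event))                  # -1 marks a run header
        fmt = ">q" if control_word & LONG_OFFSET else ">i"
        parts.append(struct.pack(fmt, location - base_offset))  # locationOffset
    return b"".join(parts)


def decode_index(payload):
    """Inverse of encode_index: returns (run, eventNumber, absolute location) tuples."""
    control_word, run_min, base_offset, size = struct.unpack_from(HEADER, payload, 0)
    pos = struct.calcsize(HEADER)
    entries = []
    for _ in range(size):
        run = run_min
        if not control_word & SINGLE_RUN:
            (run_offset,) = struct.unpack_from(">i", payload, pos)
            pos += 4
            run = run_min + run_offset
        (event,) = struct.unpack_from(">i", payload, pos)
        pos += 4
        if control_word & LONG_OFFSET:
            (offset,) = struct.unpack_from(">q", payload, pos)
            pos += 8
        else:
            (offset,) = struct.unpack_from(">i", payload, pos)
            pos += 4
        entries.append((run, event, base_offset + offset))
    return entries
```

In this layout a single-run index with short offsets costs 8 bytes per entry, while a multi-run index with long offsets costs 16 bytes per entry.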
LCIO Random Access

If the file is written with support for random access enabled, then the first record in the file will be an LCIORandomAccess record which describes the entire file. This file record will then point to one or more LCIORandomAccess index records elsewhere in the file, each of which will have an associated LCIOIndex record. The LCIOIndex record contains the location of each EventHeader/RunHeader record in the file.
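One possible way a reader could follow these pointers to locate a Run+Event is sketched below in Python. The records are modelled as in-memory objects keyed by file location rather than read from disk. The names (RandomAccess, find_event), the use of 0 for "null", and the lexicographic interpretation of the (run, event) min/max bounds are all assumptions made for illustration, not part of the proposal.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class RandomAccess:
    run_min: int
    event_min: int
    run_max: int
    event_max: int
    index_location: int  # 0 stands in for "null" (assumption)
    prev_location: int   # file record: location of first index record
    next_location: int   # file record: location of last index record


def covers(rec: RandomAccess, run: int, event: int) -> bool:
    # Assumption: the bounds are compared lexicographically as (run, event) pairs.
    return (rec.run_min, rec.event_min) <= (run, event) <= (rec.run_max, rec.event_max)


def find_event(
    records: Dict[int, RandomAccess],
    indexes: Dict[int, List[Tuple[int, int, int]]],
    file_record_location: int,
    run: int,
    event: int,
) -> Optional[int]:
    """Return the file location of (run, event), or None if it is not indexed."""
    file_rec = records[file_record_location]
    if not covers(file_rec, run, event):
        return None  # outside the range of the whole file
    location = file_rec.prev_location  # first index record in the file
    while location:
        rec = records[location]
        if covers(rec, run, event):
            # Only now would the (larger, compressed) LCIOIndex record be read.
            for r, e, where in indexes[rec.index_location]:
                if (r, e) == (run, event):
                    return where
        location = rec.next_location
    return None
```

The point of the sketch is that only the small LCIORandomAccess records are visited while walking the chain; an LCIOIndex record is consulted only when its range can contain the requested event.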

Reading files with random access records.

TBD

Implementation notes

  • The purpose of having two types of records is that the (small) LCIORandomAccess records can be written uncompressed. This allows them to be read quickly, and updated after they are written using the SIO random access mechanism described below. The larger LCIOIndex records can be written with compression turned on since they never need to be updated once they are written.
  • The reason for having multiple LCIOIndex records in a file is to avoid having to hold a potentially unbounded index block in memory. By storing something like 10-100k entries per index block, access to records remains fast without huge memory usage, even for very large files (or chains of files). This also makes the task of appending to an existing file much easier.
  • Older LCIO readers should ignore/skip the LCIOIndex/LCIORandomAccess records.
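As a rough illustration of the block-size idea in the notes above, the following Python sketch splits a list of index entries into bounded blocks and computes the summary fields that an LCIORandomAccess index record would carry for each block. The function name, the default block size, and the choice to include run header entries (eventNumber -1) when computing the min/max bounds are assumptions.

```python
def split_into_index_blocks(entries, max_entries=50_000):
    """entries: list of (run, eventNumber, location); eventNumber -1 marks a run header.

    Returns one summary dict per block, holding the LCIORandomAccess fields
    plus the entries that would go into the matching LCIOIndex record.
    """
    blocks = []
    for start in range(0, len(entries), max_entries):
        chunk = entries[start:start + max_entries]
        keys = [(run, event) for run, event, _ in chunk]
        lowest, highest = min(keys), max(keys)  # lexicographic (run, event) order
        n_run_headers = sum(1 for _, event in keys if event == -1)
        blocks.append({
            "runMin": lowest[0],
            "eventMin": lowest[1],
            "runMax": highest[0],
            "eventMax": highest[1],
            "nRunHeaders": n_run_headers,
            "nEvents": len(chunk) - n_run_headers,
            "recordsAreInOrder": int(keys == sorted(keys)),
            "entries": chunk,
        })
    return blocks
```

Because each block is summarised independently, appending to a file only requires writing new blocks and updating the file record, never rewriting earlier index blocks.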

Changes needed in SIO

Small changes are needed in the SIO library to support random access.

  • Changes for writing files
    • When creating a record return the location in the file of the record
    • Allow existing records to be overwritten. The new record must be exactly the same size as the existing record (or possibly smaller). If smaller records are not allowed, the file format on disk is completely unchanged. If smaller records are allowed, then some care is needed in interpreting record lengths, which may break older readers. Because the size of a compressed record cannot be predicted, replacing compressed records is not recommended (or allowed?)
    • Optionally allow space to be reserved for a future record (not required for LCIO).
  • Changes for reading
    • Allow record at a given location to be read.
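The three core operations listed above (return the location on write, overwrite in place with a same-size check, read at a given location) can be sketched in Python as follows. This is not the SIO API: the function names are hypothetical and the 4-byte length prefix is a stand-in framing chosen for the example, not the real SIO record header.

```python
import io
import struct


def write_record(f, payload: bytes) -> int:
    """Append a record and return the file location where it starts."""
    f.seek(0, io.SEEK_END)
    location = f.tell()
    f.write(struct.pack(">i", len(payload)) + payload)
    return location


def read_record_at(f, location: int) -> bytes:
    """Read the record that starts at the given location."""
    f.seek(location)
    (length,) = struct.unpack(">i", f.read(4))
    return f.read(length)


def overwrite_record(f, location: int, payload: bytes) -> None:
    """Replace an existing record in place; the size must match exactly."""
    f.seek(location)
    (length,) = struct.unpack(">i", f.read(4))
    if len(payload) != length:
        raise ValueError(f"record is {length} bytes, replacement is {len(payload)}")
    f.seek(location + 4)  # reposition explicitly before switching to writing
    f.write(payload)
```

A short usage example with an in-memory file:

```python
f = io.BytesIO()
first = write_record(f, b"AAAA")
second = write_record(f, b"BBBBBB")
overwrite_record(f, first, b"CCCC")  # same size, so the second record is untouched
```

Insisting on an exact size match is what keeps the on-disk format unchanged for existing readers, which is why it suits the small, uncompressed LCIORandomAccess records.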

A prototype updated version of the Java SIO library is available which incorporates these modifications. The documentation is here. Newly added methods are marked since 2.1.
