
Adding random access to LCIO files.

Goals

Allow efficient access to specific events in LCIO files. Events should be selectable by

  • Run
  • Run+Event
  • Index (e.g. the 10000th event in the file)
  • Tag (e.g. EMISS>200)
  • Access must work for "chains" of files in addition to individual files.

The last criterion is optional, since there is no urgent requirement for it, but it would be useful to at least consider how it could be supported in future.

Proposed implementation

One way to implement this would be to (optionally) include two new types of records in LCIO files.

LCIORandomAccess.xml
  <record name="LCIORandomAccess">
     <block name="LCIORandomAccess" major="1" minor="0">
         <data type="int" name="runMin"/>
         <data type="int" name="eventMin"/>
         <data type="int" name="runMax"/>
         <data type="int" name="eventMax"/>
         <data type="int" name="nRunHeaders"/>
         <data type="int" name="nEvents"/>
         <data type="int" name="recordsAreInOrder"/>
         <data type="long" name="indexLocation"/>
         <data type="long" name="prevLocation"/>
         <data type="long" name="nextLocation"/>
     </block>
  </record>
LCIOIndex.xml
  <record name="LCIOIndex">
     <block name="LCIOIndex" major="1" minor="0">
         <data type="int" name="controlWord"/>
         <data type="int" name="runMin"/>
         <data type="long" name="baseOffset"/>
         <data type="int" name="size"/>
         <repeat count="size">
             <if condition="(controlWord&amp;1)==0">
                <data type="int" name="runOffset"/>
             </if>
             <data type="int" name="eventNumber"/>
             <if condition="(controlWord&amp;2)==2">
                <data type="long" name="locationOffset"/>
             <else/>
                <data type="int" name="locationOffset"/>
             </if>
         </repeat>
     </block>
  </record>
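
To make the conditional layout of the LCIOIndex block concrete, here is a minimal Python sketch that decodes the payload of one such block. It is only an illustration under stated assumptions, not the LCIO implementation: it assumes big-endian encoding, that bit 1 of controlWord (when set) selects 64-bit location offsets, that a run number is recovered as runMin + runOffset, and that a file position is recovered as baseOffset + locationOffset. The function name decode_index_block is hypothetical, and SIO record and block headers are not handled.

```python
import struct
from typing import List, Tuple

HEADER = ">iiqi"  # controlWord, runMin, baseOffset, size (big-endian is an assumption)


def decode_index_block(payload: bytes) -> List[Tuple[int, int, int]]:
    """Decode an LCIOIndex payload into (run, event, file_position) tuples."""
    control_word, run_min, base_offset, size = struct.unpack_from(HEADER, payload, 0)
    pos = struct.calcsize(HEADER)

    # Bit 0 clear: each entry carries its own runOffset (as in the XML condition).
    has_run_offset = (control_word & 1) == 0
    # Bit 1 set: locationOffset is a 64-bit long, otherwise a 32-bit int (assumed).
    offset_format = ">q" if (control_word & 2) != 0 else ">i"

    entries = []
    for _ in range(size):
        run = run_min
        if has_run_offset:
            (run_offset,) = struct.unpack_from(">i", payload, pos)
            pos += 4
            run = run_min + run_offset  # assumed meaning of runOffset
        (event,) = struct.unpack_from(">i", payload, pos)
        pos += 4
        (location_offset,) = struct.unpack_from(offset_format, payload, pos)
        pos += struct.calcsize(offset_format)
        entries.append((run, event, base_offset + location_offset))
    return entries
```

With bit 0 set and bit 1 clear, each entry shrinks to two 32-bit integers, which is the compact case for an index block that covers a single run.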
LCIO Random Access

If the file is written with support for random access enabled, then the first record in the file will be an LCIORandomAccess record which summarizes the entire file. This record will then point to one or more LCIORandomAccess records elsewhere in the file, each of which describes one section of the file and has an associated LCIOIndex record.
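
The record chain described above can be sketched as a linked list. The following Python model is an assumption-laden illustration rather than the proposed implementation: it assumes the file-level record reaches the first per-index record through nextLocation, that a location of 0 terminates the chain, and it replaces real file I/O with a dictionary keyed by file position. The names RandomAccess and walk_chain are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterator


@dataclass
class RandomAccess:
    """In-memory mirror of the fields of one LCIORandomAccess block."""
    run_min: int
    event_min: int
    run_max: int
    event_max: int
    n_run_headers: int
    n_events: int
    records_are_in_order: int
    index_location: int
    prev_location: int
    next_location: int


def walk_chain(records: Dict[int, RandomAccess], first_location: int = 0) -> Iterator[RandomAccess]:
    """Yield the per-index records reachable from the file-level record."""
    location = records[first_location].next_location  # assumed link field
    seen = set()
    while location != 0:  # 0 as end-of-chain marker is an assumption
        if location in seen:
            raise ValueError(f"cycle in LCIORandomAccess chain at {location}")
        seen.add(location)
        record = records[location]
        yield record
        location = record.next_location
```

Because each record in the chain is small and uncompressed, a reader can walk the whole chain cheaply and defer loading any LCIOIndex record until a lookup actually needs it.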

Implementation notes

  • The purpose of having two types of records is that the (small) LCIORandomAccess can be written uncompressed. This allows it to be read quickly, and updated after it is written using the SIO random access mechanism described below. The larger LCIOIndex record can be written with compression turned on since it never needs to be updated once it is written.
  • The reason for having multiple LCIOIndex records in a file is to avoid having to hold a potentially unbounded index block in memory. With something like 10-100k records per index block, access to records remains fast without huge memory usage, even for very large files or chains of files. This also makes the task of appending to an existing file much easier.
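
The memory argument in the notes above can be illustrated with a small Python sketch of a Run+Event lookup that loads at most one index block at a time. It is a sketch under assumptions, not the proposed implementation: it assumes (runMin, eventMin) and (runMax, eventMax) bound each block lexicographically, that the entries of a block are sorted (recordsAreInOrder set), and that a caller-supplied load_index function returns the decoded entries for a block. The names Summary, find_event and load_index are hypothetical.

```python
from bisect import bisect_left
from typing import Callable, List, NamedTuple, Optional, Tuple

Entry = Tuple[int, int, int]  # (run, event, file position)


class Summary(NamedTuple):
    """Subset of LCIORandomAccess fields needed for a lookup."""
    run_min: int
    event_min: int
    run_max: int
    event_max: int
    n_events: int
    index_location: int


def find_event(
    summaries: List[Summary],
    load_index: Callable[[int], List[Entry]],
    run: int,
    event: int,
) -> Optional[int]:
    """Return the file position of (run, event), or None if it is not indexed."""
    key = (run, event)
    for summary in summaries:
        low = (summary.run_min, summary.event_min)
        high = (summary.run_max, summary.event_max)
        if not (low <= key <= high):
            continue  # this block cannot contain the event, so skip loading it
        entries = load_index(summary.index_location)  # one block in memory at a time
        # A 2-tuple sorts just before any 3-tuple sharing the same (run, event) prefix.
        i = bisect_left(entries, key)
        if i < len(entries) and entries[i][:2] == key:
            return entries[i][2]
    return None
```

The cost of a lookup is then a scan over the small summaries plus a binary search in a single block of bounded size, regardless of how large the file or chain of files is.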