...

A skimming job is organized as a sequence of steps. All but the last two are preparation steps: they analyze the shell variables defined by the user and the textual parameter files he provides, and establish what to skim. Currently, the steps are :

  1. MAKE_FILE_LIST : establish the list of the input ROOT data files to be skimmed.
  2. MAKE_LIBRARY_LIST : if needed, determine the release of the corresponding C++ code, and search for the associated shared libraries.
  3. MAKE_BRANCH_LIST : establish the list of branches to be duplicated.
  4. MAKE_EVENT_LIST : establish the list of events to be duplicated.
  5. SKIM : the actual skimming.
  6. CHECK : optional check of output data, which could take a long time to perform.

How to control the skimming job

...

All kinds of parameter files can contain any number of empty lines and comments starting with "#". Lines starting with "#!" are called special comments. The first special comment in any parameter file should express the global file format release, currently "CEL TXT 0.1". The second special comment should be of the form "SECTION <name>", where <name> depends on the kind of information in the rest of the file. For example, if the file contains the list of input data files, <name> will be "Files". Several examples will be given below.
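
For illustration, here is what the beginning of an input file list could look like, assuming the release string and the SECTION keyword simply follow the "#!" prefix ; the body of the file is only sketched here, and its exact format is described below :

    # a free comment, ignored by the skimmer
    #! CEL TXT 0.1
    #! SECTION Files
    # ... the list of data files follows, in the format described below ...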

...

  1. From a CompositeEventList : if a CEL file is given as input to the skimmer, its path being defined with the variable SK_INPUT_CEL.
  2. From a textual parameter file made by the user : the format is given below, and the path of the parameter file is the value of the variable SK_INPUT_FILE_LIST.
  3. From the Pipeline I Oracle Database : if the data to be skimmed has been generated with the Pipeline I, one can define SK_INPUT_TASK.

One and only one of those three variables must be non-null.
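
For instance, to take the input files from a user-made textual list, one could leave the two other variables empty and set only SK_INPUT_FILE_LIST. This is a minimal sketch, with a hypothetical path, assuming the variables are exported from the shell :

    export SK_INPUT_CEL=
    export SK_INPUT_TASK=
    export SK_INPUT_FILE_LIST=/some/hypothetical/path/my_input_files.txt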

In the case of Pipeline I products, SK_INPUT_TASK is enough and should be any of the tasks recognized by the Pipeline I Oracle Database. On top of that, one can select a subset of the task runs through the shell variables SK_RUN_MIN and SK_RUN_MAX. If the value of SK_RUN_MAX is 0, all the runs will be taken into consideration.
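
Here is a sketch of the Pipeline I case, with a made-up task name and run range :

    export SK_INPUT_TASK=someHypotheticalTask   # any task known to the Pipeline I Oracle Database
    export SK_RUN_MIN=1000                      # first run to consider
    export SK_RUN_MAX=2000                      # last run to consider ; 0 would mean all runs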

...

Whatever the source for the list of input data files, one can obtain a copy of this list by giving a value to SK_OUTPUT_FILE_LIST. This list is restricted to the files from which at least one entry has been kept after skimming. The output format is the same as the input format above. After a skimming job, one can copy the output file defined by SK_OUTPUT_FILE_LIST, edit it freely and reuse the copy later as a SK_INPUT_FILE_LIST. It is not recommended to use the same single file for both input and output, because the file defined by SK_OUTPUT_FILE_LIST is always overwritten, and you could easily lose your modifications.
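
A possible workflow, sketched below with hypothetical paths : let a first job write the list of surviving files, copy it, edit the copy, and reuse the copy as input for a later job :

    # first job : record which input files still contribute entries
    export SK_OUTPUT_FILE_LIST=/some/hypothetical/path/skimmed_files.txt
    # ... run the skimming job ...
    # keep a private copy, since SK_OUTPUT_FILE_LIST is always overwritten
    cp /some/hypothetical/path/skimmed_files.txt /some/hypothetical/path/my_files.txt
    # edit the copy freely, then reuse it for a later job
    export SK_INPUT_FILE_LIST=/some/hypothetical/path/my_files.txt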

...

Instead of the file above, if the user already knows it, he can provide the data code release with the variable SK_EXPECTED_RELEASE, and a set of directories in which to search for the shared libraries, defined by SK_LIBRARY_DIRS (which has a default value relevant for the SLAC site). The latter is a ':'-separated list of directory paths. SK_EXPECTED_RELEASE should have the form <main_package>/<main_package>-<release>, as one can see in the example above. The exact names of the libraries for a given data type are currently hardcoded, and described in the guide Skimmer at SLAC. For example, for each <dir> element in SK_LIBRARY_DIRS, and for a given <main_package> and <release>, the skimmer will look for <dir>/<main_package>/<main_package>-<release>/lib/libcommonRootData.so.
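
Putting it together, with a made-up package name, release and directories (only the search pattern is taken from the text above) :

    export SK_LIBRARY_DIRS=/some/hypothetical/releases:/other/hypothetical/releases
    export SK_EXPECTED_RELEASE=myPackage/myPackage-v1r2
    # with these values, the skimmer would look for, among others :
    #   /some/hypothetical/releases/myPackage/myPackage-v1r2/lib/libcommonRootData.so
    #   /other/hypothetical/releases/myPackage/myPackage-v1r2/lib/libcommonRootData.so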

...

The list of selected events can be obtained from different sources :

  1. From a CompositeEventList : if a CEL file is given as input to the skimmer, its path being defined with the variable SK_INPUT_CEL.
  2. From a textual file made by the user : the format is given below, and the file path is given through the variable SK_INPUT_EVENT_LIST.
  3. Indirectly with a cut : the skimmer can generate an event list, based on the values of SK_TCUT and SK_TCUT_DATA_TYPE. The syntax of SK_TCUT should be the ROOT TCut one. Currently, the only valid value for SK_TCUT_DATA_TYPE is merit (a sketch is given just below).
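
Here is a sketch of the cut-based selection ; the cut expression itself is purely illustrative, since the actual merit variable names are not described here :

    export SK_INPUT_CEL=                        # no CEL input
    export SK_INPUT_EVENT_LIST=                 # no textual event list
    export SK_TCUT_DATA_TYPE=merit              # currently the only supported value
    export SK_TCUT='SomeMeritVariable>100'      # hypothetical ROOT TCut expression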

...

In case you do want to keep all the entries, i.e. merge the input data files, you should not give any kind of cut to the skimmer : SK_INPUT_CEL, SK_INPUT_EVENT_LIST and SK_TCUT should all be empty.
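
In other words, a pure merge job would leave all three selection variables empty, something like :

    export SK_INPUT_CEL=
    export SK_INPUT_EVENT_LIST=
    export SK_TCUT=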

In theory, we should also be able to combine any number of the cuts above, but this is not yet implemented. Currently, you must define either an input CEL, or a textual file, or a cut. Also, it is not yet possible to define a TCut which spans several data types, but this will be studied as soon as we have several possible values for SK_TCUT_DATA_TYPE.

As usual, one can obtain a file containing the final list of events by giving a value to SK_OUTPUT_EVENT_LIST; its output format is the same as the input format above.

...

The skimmer can also take into account a list of branches to be activated or deactivated. This list is given through a file, whose full path is given by the variable SK_INPUT_BRANCH_LIST. Each line should contain a data type prefix, the name of the tree, a + or a - (to activate or deactivate respectively), and the specification of one or several branches (with the ROOT syntax). The lines are applied one after the other : you can deactivate all the branches of a given type with -*, then activate only the ones of interest. There is an initial implicit +* for all the data types used in the skimming job (see SK_DATA_TYPES in the next section). So, all the data types which are not explicitly in the branch list will have all their branches activated. Here is an example of such a file :

...