...

The skimmer is only usable on Linux. As for the external tools, skimmer v6r0 depends on :

Perl 5, which should be found with "/usr/bin/env perl".
ROOT 5.10.00 to 5.18.00 : the user can set $ROOTSYS to any ROOT release, and it will be used as is by the skimmer, but the only validated releases are 5.10.00, 5.14.00g, 5.16.00-gl1 and 5.18/00 ; if $ROOTSYS is not defined, the skimmer will search for $GLAST_EXT/ROOT/v5.10.00/root ; if $GLAST_EXT is not defined, it will be set to /afs/slac/g/glast/ground/GLAST_EXT/$CMTCONFIG ; if $CMTCONFIG is not defined, it will be set to rh9_gcc32.
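
The lookup order described above can be sketched in shell as follows. This is only an illustration of the documented fallback chain, not the skimmer's actual code : the function name resolve_rootsys is hypothetical, and treating an empty variable like an undefined one is an assumption.

```shell
# Hypothetical helper: mirrors the documented lookup order for ROOT.
# Assumption: an empty variable is handled like an undefined one.
resolve_rootsys() {
    # $ROOTSYS is used as is when the user has defined it.
    if [ -n "${ROOTSYS:-}" ]; then
        printf '%s\n' "$ROOTSYS"
        return 0
    fi
    # Otherwise apply the documented defaults, innermost first.
    cmtconfig="${CMTCONFIG:-rh9_gcc32}"
    glast_ext="${GLAST_EXT:-/afs/slac/g/glast/ground/GLAST_EXT/$cmtconfig}"
    printf '%s\n' "$glast_ext/ROOT/v5.10.00/root"
}
```

Calling resolve_rootsys with none of the three variables defined prints the path built from the rh9_gcc32 default.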

...

GET_FILE_LIST : establish the list of the input ROOT data files to be skimmed.
GET_LIBRARY_LIST : if needed, find out the release of the corresponding C++ code, and search for the associated shared libraries.
GET_BRANCH_LIST : establish the list of branches to be duplicated.
GET_EVENT_LIST : establish the list of events to be duplicated.
SKIM : the actual skimming.

The Skimmer is also known to GLAST people as the Data Server Back End. It has a command-line interface which can be used directly, or it can be used through a web interface, also known as the Data Server Front End, or Skimmer Web Application. Here, one will only find the documentation of the command-line interface, but this can also help to understand the corresponding front-end.

The behavior of the skimmer is entirely controlled by predefined shell variables. For a complete list of those variables, one can type "skimmer help", but the explanations will hardly make sense if you have not read this guide first.

...

Data files mining parameters

The list of the input data files to be processed can be obtained by the skimmer from different sources :

From a CompositeEventList : a CEL file can be given as input to the skimmer through the variable SK_INPUT_CEL.
From a textual file made by the user : the format is given below, and the file path is given through SK_INPUT_FILE_LIST.
From the Pipeline I Oracle Database : if the data to be skimmed has been generated by the old Pipeline I, one must define SK_INPUT_TASK.
One and only one of those three variables must be non-null.
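
Before launching a job, one can check this rule with a few lines of shell. The sketch below is not part of the skimmer : the function name check_input_source is hypothetical, and it assumes that "non-null" means "defined and not an empty string".

```shell
# Hypothetical check: succeeds only if exactly one input source is set.
check_input_source() {
    count=0
    for value in "${SK_INPUT_CEL:-}" "${SK_INPUT_FILE_LIST:-}" "${SK_INPUT_TASK:-}"; do
        # Count the variables which are neither undefined nor empty.
        if [ -n "$value" ]; then
            count=$((count + 1))
        fi
    done
    [ "$count" -eq 1 ]
}
```

For instance, check_input_source fails when both SK_INPUT_CEL and SK_INPUT_TASK have a value, and when none of the three variables has one.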

In the case of Pipeline I products, SK_INPUT_TASK is enough and should be any of the tasks recognized by the Pipeline I Oracle Database. On top of that, one can select a subset of the task runs through the shell variables SK_RUN_MIN and SK_RUN_MAX. If SK_RUN_MAX is set to 0, all the runs will be taken into consideration.

In the case of a textual file made by the user, it must conform to the usual rules for the skimmer parameter files : the header is made of a first special comment which recalls the global file format release (CEL TXT 0.1), and a special comment which declares that what follows is a list of files (SECTION Files). Then, each line of the file is the full path of an input ROOT file, optionally prefixed by the data types of the trees within the file. For example :

#
#! CEL TXT 0.1
#

#! SECTION Files
(recon)/nfs/farm/g/glast/u35/MC-tasks/BeamTest-0100/output/000001/BeamTest-0100_000001_recon_RECON.root
(recon)/nfs/farm/g/glast/u35/MC-tasks/BeamTest-0100/output/000002/BeamTest-0100_000002_recon_RECON.root
(mc)/nfs/farm/g/glast/u35/MC-tasks/BeamTest-0100/output/000001/BeamTest-0100_000001_mc_MC.root
(mc)/nfs/farm/g/glast/u35/MC-tasks/BeamTest-0100/output/000002/BeamTest-0100_000002_mc_MC.root
(merit:pointing:jobinfo)/nfs/farm/g/glast/u35/MC-tasks/BeamTest-0100/output/000001/BeamTest-0100_000001_merit_merit.root
(merit:pointing:jobinfo)/nfs/farm/g/glast/u35/MC-tasks/BeamTest-0100/output/000002/BeamTest-0100_000002_merit_merit.root
(digi)/nfs/farm/g/glast/u35/MC-tasks/BeamTest-0100/output/000001/BeamTest-0100_000001_digi_DIGI.root
(digi)/nfs/farm/g/glast/u35/MC-tasks/BeamTest-0100/output/000002/BeamTest-0100_000002_digi_DIGI.root
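
To make the format concrete, here is a minimal sketch of a reader for such a file. It is an illustration only, not the skimmer's parser : the function name list_input_files is hypothetical, and the sketch simply skips every comment line (including the "#!" headers) instead of checking them.

```shell
# Hypothetical reader: prints "<data types><TAB><path>" for each entry
# of the file list given as first argument.
list_input_files() {
    while IFS= read -r line || [ -n "$line" ]; do
        case "$line" in
            ''|'#'*)
                # Blank lines and comments, including the "#!" headers.
                continue ;;
            '('*')'*)
                # Entry prefixed by its data types, e.g. "(merit:pointing)/path".
                types=${line#"("}
                types=${types%%")"*}
                path=${line#*")"} ;;
            *)
                # Entry without any data type prefix.
                types=""
                path=$line ;;
        esac
        printf '%s\t%s\n' "$types" "$path"
    done < "$1"
}
```

Applied to the example above, the sketch would print one line per ROOT file, with the data types (such as "merit:pointing:jobinfo") in the first column and the full path in the second.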

...

Whatever the source of the list of input data files, one can obtain a copy of this list by giving a value to SK_OUTPUT_FILE_LIST. This list is restricted to the files from which at least one entry has been used. The output format is the same as the input format above. After a skimmer execution, one can copy the resulting SK_OUTPUT_FILE_LIST, edit it freely and reuse it later as a SK_INPUT_FILE_LIST. It is not recommended to use the same file for both, because the SK_OUTPUT_FILE_LIST is overwritten if it already exists, and you could lose your modifications.
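
The copy-then-reuse workflow can be sketched as follows. The function name prepare_input_list is hypothetical and the sketch does not run the skimmer itself ; it only copies the list to a separate path and selects that copy as the single input source.

```shell
# Hypothetical helper: copy the list written by a previous run to a new
# path, so that later runs cannot overwrite manual edits.
# $1: path previously given as SK_OUTPUT_FILE_LIST
# $2: path of the editable copy, to be used as SK_INPUT_FILE_LIST
prepare_input_list() {
    if [ "$1" = "$2" ]; then
        echo "refusing to reuse the output list path as input" >&2
        return 1
    fi
    cp "$1" "$2" || return 1
    SK_INPUT_FILE_LIST="$2"
    # Only one of the three input sources may be non-null.
    SK_INPUT_CEL=""
    SK_INPUT_TASK=""
    export SK_INPUT_FILE_LIST SK_INPUT_CEL SK_INPUT_TASK
}
```

After the call, one can edit the copy freely, since SK_OUTPUT_FILE_LIST still points to the original path.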

Here are the default values of the shell variables for this section (i.e. the values which are used if the variables are undefined or empty strings) :

SK_INPUT_CEL=""
SK_INPUT_FILE_LIST=""
SK_INPUT_TASK=""
SK_DATA_TYPES="merit:mc:digi:recon"
SK_RUN_MIN=0
SK_RUN_MAX=0
SK_OUTPUT_FILE_LIST=""
SK_FORCE_FILE_LIST="false"
SK_DEBUG_FILE_LIST="false"

Release libraries determination parameters

...

As usual, you can edit this generated file, or write one from scratch. Each line should contain a data type prefix, the name of the tree, a + or a - (to activate or deactivate respectively), and the specification of one or several branches (with the ROOT syntax). The lines are applied one after the other : you can deactivate all the branches of a given type with *, then activate only the ones of interest. For example :

...

We have now reached the point of saying which types of data we want to skim. This is done with the shell variable SK_DATA_TYPES, which should be a ":" separated list of data types. The currently recognized types can be found in the guide /Skimmer at SLAC/. If SK_DATA_TYPES is empty, a default value of "merit:mc:digi:recon" will be used.
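
As an illustration of how such a value can be read, the sketch below splits the list and applies the documented default. The function name list_data_types is hypothetical, and the sketch does not check the types against the recognized ones.

```shell
# Hypothetical helper: prints one data type per line.
# The default is used when SK_DATA_TYPES is undefined or empty.
list_data_types() {
    printf '%s\n' "${SK_DATA_TYPES:-merit:mc:digi:recon}" | tr ':' '\n'
}
```

With SK_DATA_TYPES="merit:digi", the sketch prints "merit" and "digi" on two separate lines.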

The skimmed files will be stored in the directory defined by shell variable SK_OUT_DIR, in files called SK_OUT_FILE_BODY_<datatype>.root. Yet, if they turn out to be very big files, ROOT could automatically close the first file and open new ones, appending a rank number to the file name. The maximum size of each output ROOT file can be changed with shell variable SK_MAX_FILE_SIZE. If the value of 0 is given to this variable (this is the default), ROOT will use its own default value. Also, if the value is 0 and the job is merging all the events, the ROOT fast merging method will be used.
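
The naming rule can be sketched as follows, under the assumption that SK_OUT_FILE_BODY is itself a shell variable whose value replaces "SK_OUT_FILE_BODY" in the file name. The function name list_output_files is hypothetical, and the files with a rank number that ROOT may open afterwards are not covered.

```shell
# Hypothetical helper: prints the expected path of the first output file
# for each requested data type.
# Assumption: the value of SK_OUT_FILE_BODY replaces "SK_OUT_FILE_BODY"
# in the documented file name pattern.
list_output_files() {
    printf '%s\n' "${SK_DATA_TYPES:-merit:mc:digi:recon}" | tr ':' '\n' |
    while IFS= read -r type; do
        printf '%s/%s_%s.root\n' "$SK_OUT_DIR" "$SK_OUT_FILE_BODY" "$type"
    done
}
```

This can be handy to check, before a long job, that none of the expected output files would collide with an existing one.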

...
