Warning

The skimming tool has been generalized and externalized. The information below is obsolete and related to old releases. If possible, use a more recent release and get the corresponding documentation from the new TRAC server.

v7r0 vs v6r3

  • Rename the skimmer into tskim.
  • Rename SK_* variables into TS_* variables (yet keeping backward compatibility).
  • Enable applying a new cut on an input CEL.
  • Enable fast merge whatever the maximum size, if the ROOT release is >= 5.18.
  • Introduce TS_META_DATA, whose default value is $TS_DIR/share/GlastMetaData.txt.
  • The TS_DEBUG_* variables are now integers: the larger the value, the more debug output.
  • Fix a bug in the interpretation of large values for TS_* variables whose type is Long64_t, for example TS_MAX_FILE_SIZE.

v6r3 vs v6r2p2

  • Enable cuts on non-tuple data type (for example digi).
  • Better check the types and values of RunId/EventId.

v6r2p2 vs v6r2p1

  • Check the types and values of RunId/EventId.

v6r2p1 vs v6r1

  • Restore support of SK_OUTPUT_FILE_LIST.
  • Remove the last use of ACLiC, which indirectly solves the issue with /tmp/Skimmer temporary areas.
  • Fix a bug, and print a single warning when the chained input data files come from several different ROOT versions (and do not warn if the versions differ only by 1000000).
  • New variable SK_REBUILD_INDEX. If true (the default), the skimmer does not trust the TTree indexes already present in the input data files, and rebuilds them all on the fly.
  • New variable SK_OUTPUT_FILE_PATH, used only when a single kind of data is skimmed.
  • For consistency, rename SK_OUT_DIR to SK_OUTPUT_DIR, and SK_OUT_FILE_BODY to SK_OUTPUT_FILE_BODY.
  • Exit with an error code if one of the input data files could not be opened.

v6r1 vs v6r0

First support for CEL files

  • When SK_OUTPUT_CEL is defined, output data is written into an output CompositeEventList instead of real output ROOT files, for all the components which are a sequence of events (as indicated by the presence of branches for run id and event id).
  • When SK_INPUT_CEL is defined and neither SK_INPUT_FILE_LIST nor SK_TASK is, the list of input data files is taken from the CEL.
  • When SK_INPUT_CEL is defined and neither SK_INPUT_EVENT_LIST nor SK_TCUT is, the list of events is taken from the CEL.
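The two fallback rules for SK_INPUT_CEL can be sketched as follows. This is only an illustration in Python, not the skimmer's own code: the function name resolve_sources and the labels "explicit" and "cel" are hypothetical, and treating an empty value as "not defined" is an assumption.

```python
def resolve_sources(env):
    """Return (file_source, event_source) for a dict of SK_* variables.

    Each source is "explicit" (a list, task or cut was given),
    "cel" (taken from SK_INPUT_CEL) or None (nothing available).
    """
    has_cel = bool(env.get("SK_INPUT_CEL"))
    explicit_files = bool(env.get("SK_INPUT_FILE_LIST") or env.get("SK_TASK"))
    explicit_events = bool(env.get("SK_INPUT_EVENT_LIST") or env.get("SK_TCUT"))

    def pick(explicit):
        # An explicit definition wins; the CEL is only a fallback.
        if explicit:
            return "explicit"
        if has_cel:
            return "cel"
        return None

    return pick(explicit_files), pick(explicit_events)


if __name__ == "__main__":
    # Files come from the CEL, events from the cut.
    print(resolve_sources({"SK_INPUT_CEL": "in.cel", "SK_TCUT": "x > 1"}))
```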

v6r0 vs v5r0

Interface reviewed

Due to the introduction of CEL support, the list of SK_* variables has been partly modified. Typically, any variable defining the name of a textual parameter file has been replaced by two variables, a SK_INPUT_* one and a SK_OUTPUT_* one. The corresponding SK_FORCE_* and SK_SKIP_* variables have disappeared. Please read the new Confluence User Guide.

Note: the definition of SK_TASK was previously mandatory, because it was used in the default value of other variables. From now on, it needs to be defined only for pipeline I data, when one wants to get the list of files from the corresponding catalog.

Note: from now on, to get a merge job, it should be enough to leave SK_TCUT and SK_INPUT_EVENT_LIST undefined.

Implementation reviewed

Almost all of the C++ executables have been merged into a single one. Many redundant operations are now done only once, or will be soon.

v5r0 vs v4r0

Pointing history and Jobinfo

The former auxiliary trees, pointing history and job info, are now considered as full data types. Actually, what we have always called a "data type" was previously a "kind of file", and it is now rather a "kind of tree".

As a consequence, when one wants to merge those trees, the merit data file paths must be preceded by (merit:pointing:jobinfo) instead of (merit), and ":pointing:jobinfo" must be appended to SK_DATA_TYPES.
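To illustrate the prefix convention, here is a minimal Python sketch that splits one line of a file list into its data types and its path. The function name split_prefixed_line and the sample path are hypothetical, and the handling of whitespace after the closing parenthesis is an assumption; the skimmer's real parser may differ.

```python
import re

# "(type1:type2) path": the types are ':'-separated inside the parentheses.
_PREFIX = re.compile(r"^\(([^)]*)\)\s*(.*)$")


def split_prefixed_line(line):
    """Split '(merit:pointing:jobinfo) /some/path' into (types, path).

    A line without a prefix yields an empty list of types.
    """
    stripped = line.strip()
    match = _PREFIX.match(stripped)
    if match is None:
        return [], stripped
    types = [t for t in match.group(1).split(":") if t]
    return types, match.group(2)


if __name__ == "__main__":
    # Hypothetical path, for illustration only.
    print(split_prefixed_line("(merit:pointing:jobinfo) /data/sample_merit.root"))
```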

No More Root Scripts

All the former ROOT scripts are now compiled applications. This should spare us the bugs we regularly faced when using the interpreter. Also, at least on small data sets, execution is much faster!

The natural drawback is that the installation area is considerably larger than before, and the installation process is a little more complex: for every different ROOT release which has been used for the generation of the non-tuple data we want to skim, the administrator must compile the skimmer executables.

Oh, yes! And now it seems to work with ROOT 5.18.

v4r0 vs v3r10

  • The format of the parameter files has been reviewed so that they are more flexible and closer to the expected CEL TXT format. The old parameter files are still readable, EXCEPT the list of events, where the first two special lines should be commented with "#!". Here are the details of the changes:
    • empty lines are allowed everywhere
    • comments starting with # are allowed everywhere
    • the prefix #! refers to "special comments"
    • the first special comment expected in a file is "#! CEL TXT 0.1"
    • the second special comment expected in a file is of the form "#! SECTION <name>", where <name> depends on the kind of file:
      • "Branches" for the lists of branches
      • "Events" for the lists of events
      • "Files" for the lists of input data files
      • "Libraries" for the lists of libraries
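The rules above can be sketched with a small reader written in Python. It is a hedged illustration, not the skimmer's parser: the function name parse_parameter_file is hypothetical, only whole-line comments are handled, and rejecting files that lack the two special comments is an assumption (the skimmer itself still accepts old-format files).

```python
def parse_parameter_file(text):
    """Return (format_line, section_name, entries) for a CEL TXT-like file."""
    specials = []
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # empty lines are allowed everywhere
        if line.startswith("#!"):
            # Special comment: keep its content, with whitespace normalized.
            specials.append(" ".join(line[2:].split()))
            continue
        if line.startswith("#"):
            continue  # ordinary comment
        entries.append(line)

    if len(specials) < 2 or specials[0] != "CEL TXT 0.1":
        raise ValueError("expected '#! CEL TXT 0.1' as first special comment")
    words = specials[1].split()
    if len(words) != 2 or words[0] != "SECTION":
        raise ValueError("expected '#! SECTION <name>' as second special comment")
    return specials[0], words[1], entries


if __name__ == "__main__":
    sample = "\n".join([
        "#! CEL TXT 0.1",
        "#! SECTION Files",
        "",
        "# hypothetical input file",
        "(merit) /data/a.root",
    ])
    print(parse_parameter_file(sample))
```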

v3r10 vs v3r9

  • When skimming merit input files, the skimmer now also merges jobinfo into a separate output file, as it does for pointing history.

v3r9 vs v3r8

  • Fix a problem with ROOT 5.16 rootmaps.
  • New variable SK_DATA_DIRS, which gives the list of directories where to search for input data files when their paths in the file list are relative (not starting with /). Mainly added for the test suite, but who knows what else it can be good for...
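The lookup described for SK_DATA_DIRS can be sketched as follows in Python. The function name resolve_input_path is hypothetical, and two points are assumptions rather than documented behavior: that the directories are separated by ":" like the other list-like variables, and that the first directory containing the file wins.

```python
import os
import posixpath


def resolve_input_path(path, data_dirs, exists=os.path.exists):
    """Resolve one file-list entry against a ':'-separated directory list.

    `exists` is injectable so the logic can be tried without real files.
    """
    if path.startswith("/"):
        return path  # absolute paths are used as they are
    for directory in data_dirs.split(":"):
        if not directory:
            continue
        candidate = posixpath.join(directory, path)
        if exists(candidate):
            return candidate  # assumption: first match wins
    return None


if __name__ == "__main__":
    # Hypothetical directories; pretend the file exists only in /y.
    fake_exists = lambda p: p == "/y/a.root"
    print(resolve_input_path("a.root", "/x:/y", exists=fake_exists))
```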

v3r8 vs v3r7p2

  • Introduce the data types: cal, svac, gcr.

v3r7p2 vs v3r7p1

  • INTERNAL: restrict the usage of TSystem::ExpandPathName() to the parameters which are paths.

v3r7p1 vs v3r7

  • Accept old format for branch file.
  • Accept comments (#) and empty lines in the branch/file/library list files.
  • Also remove any non-printable characters at the end of the lines of branch/file/library list files, typically exotic end-of-line characters.

v3r7 vs v3r6

  • When SK_ENFORCE_OUTPUT_FILES is true, always generate output files, even empty.
  • When skimming merit input files, the skimmer now also merges pointing history into a separate output file.
  • Use fast merge only if SK_MAX_FILE_SIZE is 0.
  • INTERNAL: do not check read/written bytes if one and only one of the files has a size >= 2GB.

v3r6 vs v3r5

  • Better check file opening.
  • Accept that a cut selects 0 events.

v3r5 vs v3r4

  • Validated with ROOT 5.14/00d and Glast Release v11.
  • Create all TFiles with TFile::Open() so as to better support xrootd files.
  • BUG FIX: the external value of ROOTSYS is now better taken into account.
  • BUG FIX: when generating the branch list, there was an error for branches which are arrays (their name ends with "[something]").
  • BUG FIX: when generating the event list, the cut was not written, although it was expected to be.
  • Add Skimmer::displaySizes() for testing purpose.

v3r4 vs v3r3

First draft of merge use case.

v3r3 vs v3r2

List-like shell variables

Bug fixes in the way SK_DATA_TYPES is expanded, and automatic removal of duplicates.

Prefixes in list-like parameter files

Change the prefix from "datatype:" to "(datatype)" within the file-list parameter file.

Former release file

Replace the release file with a file containing an explicit list of shared libraries.

  • All RELEASE shell variables are replaced with LIBRARY_LIST equivalent ones.
  • The content of the file now designated by SK_LIBRARY_LIST_FILE is no longer the path of the release directory, but the list of the libraries to be loaded, with their complete paths. As in SK_FILE_LIST_FILE, each library path can be prefixed with a datatype between parentheses, which means that the library is needed only for the skimming of that datatype.
  • Better survive the lack of library list when there is no need (do not require SK_SKIP_GET_LIBRARY_LIST=true).

v3r2 vs v3r0p1

Management of parameter files

When a given SK_FORCE_* is true, the skimmer removes the corresponding parameter file before running the corresponding execution step. This way, if the execution step fails, the skimming will not carry on normally, as it did before.

Large review of the GET_RELEASE execution step

The previous skimmer release has shown several weaknesses:

  • It loads an arbitrary release of libcommonRootData.so so as to be able to interpret the instance of FileHeader, which could prove irrelevant and prevent the later loading of the correct release of libcommonRootData.so.
  • It looks for the FileHeader in the first input file of a given data type, which can be customized by the user. There is no universally relevant value, so the user has to be aware of the relevant type for each task.
  • After constructing the expected path of the libraries directory, it does not check that this directory exists, nor that it includes the expected libraries.

The new implementation in v3r2 provides a precompiled FileHeader library, searches for some header in all the input data types, checks the existence of libcommonRootData.so, and looks for it in multiple directories. For a detailed description, see the Skimmer FAQ.

From the user's point of view, here are the changes in the interface:

  • SK_GET_RELEASE_HEADER_RELEASE is deprecated.
  • SK_GET_DEFAULT_DATA_TYPE is deprecated.
  • SK_GET_RELEASE_RELEASES_DIR becomes SK_LIBRARY_DIRS, and it is now expected to be a list of ":" separated paths. Its default value is "/nfs/farm/g/glast/u09/builds/rh9_gcc32:/nfs/farm/g/glast/u30/builds/rh9_gcc32:/afs/slac.stanford.edu/g/glast/ground/releases/rh9_gcc32opt".

Note: before searching for libcommonRootData.so in the directories from SK_LIBRARY_DIRS, the skimmer tries the original directory which is given in the FileHeader.
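The search order described in this note can be sketched as follows in Python. This shows only the ordering (FileHeader directory first, then each entry of SK_LIBRARY_DIRS); the function name find_library and its `locate` parameter are hypothetical. How the library is actually located inside a given directory is not described here, so the default below, which looks for the file directly in the directory, is purely an assumption.

```python
import os
import posixpath

LIBRARY_NAME = "libcommonRootData.so"


def _direct_child(directory):
    # Assumption: the library sits directly in the directory.
    path = posixpath.join(directory, LIBRARY_NAME)
    return path if os.path.exists(path) else None


def find_library(header_dir, library_dirs, locate=_direct_child):
    """Try the FileHeader directory first, then each SK_LIBRARY_DIRS entry.

    `library_dirs` is a ':'-separated string; `locate` maps a directory
    to the library path found there, or None.
    """
    candidates = [header_dir] + library_dirs.split(":")
    for directory in candidates:
        if not directory:
            continue
        found = locate(directory)
        if found is not None:
            return found
    return None


if __name__ == "__main__":
    # Hypothetical directories; pretend only /b holds the library.
    fake = lambda d: d + "/" + LIBRARY_NAME if d == "/b" else None
    print(find_library("/orig", "/a:/b", locate=fake))
```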

Shell variables renamed

SK_GET_RELEASE_DEFAULT_RELEASE is renamed SK_EXPECTED_RELEASE. It is always used as a default. In addition, when the release found in some FileHeader differs from the expected one (if defined), a warning is issued.

The variables which define the names of the parameter files are renamed with _FILE instead of _PATH:

  • SK_FILE_LIST_PATH becomes SK_FILE_LIST_FILE.
  • SK_EVENT_LIST_PATH becomes SK_EVENT_LIST_FILE.
  • SK_BRANCH_LIST_PATH becomes SK_BRANCH_LIST_FILE.
  • SK_RELEASE_PATH becomes SK_RELEASE_FILE.

The old names will still be recognized for a few future releases.

Improvement in the treatment of list-like shell variables

For list-like shell variables such as SK_DATA_TYPES and SK_LIBRARY_DIRS, any empty slot in the value is filled with the default value. An empty slot is a ":" as first or last character, and/or a "::" within the value.
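As an illustration of the empty-slot rule, here is a minimal Python sketch. The function name expand_list_variable and the sample values are hypothetical. The removal of duplicates follows the v3r3 note about SK_DATA_TYPES; keeping the first occurrence of each item is an assumption.

```python
def expand_list_variable(value, default):
    """Expand a ':'-separated value, filling empty slots with the default."""
    default_items = [item for item in default.split(":") if item]

    expanded = []
    for slot in value.split(":"):
        if slot:
            expanded.append(slot)
        else:
            # A leading ':', a trailing ':' or a '::' gives an empty slot.
            expanded.extend(default_items)

    # Remove duplicates while keeping the first occurrence of each item.
    seen = set()
    result = []
    for item in expanded:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


if __name__ == "__main__":
    # Hypothetical directories: the trailing ':' appends the defaults.
    print(expand_list_variable("/my/dir:", "/a:/b"))
```

With these sample values the result is the user's directory followed by the two default ones.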
