Highest Priority Items

  1. Support deletion of some files in the input data file list.
  2. The output data file list should include only the files from which at least one event has been kept.
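
A minimal sketch of item 2, assuming the skimmer can report how many events it kept per input file. The SkimmedFile record and the outputFileList function are hypothetical names used only for illustration; they are not part of the existing package.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record: one input file and the number of its events
// that survived the skim.
struct SkimmedFile {
    std::string path;
    std::size_t keptEvents;
};

// Return the paths of the files that kept at least one event,
// preserving the order of the input list.
std::vector<std::string> outputFileList(const std::vector<SkimmedFile>& files) {
    std::vector<std::string> kept;
    for (const SkimmedFile& f : files) {
        if (f.keptEvents > 0) {
            kept.push_back(f.path);
        }
    }
    return kept;
}
```

Files with zero kept events are simply dropped from the list; whether their (empty) output files should also be deleted from disk is left open here.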

Documentation & Test suite

  1. Review all the inline documentation of the DataServer package, most of which is obsolete.
  2. Discuss with the ROOT team the best way to characterize a ROOT file (currently based on leaves; see the check step).
  3. We should integrate large-scale tests and track the size of the parameter files (file and event lists), to check that they stay within reasonable limits.

Interface & Features

  1. Generically scan, display, and copy auxiliary objects found in ROOT files; study hadd for this.
  2. That same configuration file, or a separate one, should contain the parameters, replacing the environment variables.
  3. For the communication with the Web front-end, we should favor early detection of problems, and set up a standard way to feed such early information back to the front-end (which could include an estimate of the job's end). Perhaps a "fake" run mode, which would be started by typing "skimmer try"?
  4. Enable tcuts across multiple data kinds.
  5. Idea: ensure that when a skim fails, the incomplete output data files are erased. Then change the skimming step so that it checks which output files already exist and only produces the missing ones? (An SK_FORCE_SKIP setting would force the skimming whatever files exist, and would perhaps be "true" by default.)
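
The resume logic of item 5 could look roughly like the sketch below. It assumes that failed skims have already erased their incomplete outputs, so that any file still present is complete. The outputsToSkim function is a hypothetical name, the `existing` set stands in for a real filesystem check, and the `force` flag plays the role of the proposed SK_FORCE_SKIP setting.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: pick the output files that still have to be produced.
// `expected` lists every output file the skim should end up with.
// `existing` lists the output files already on disk (assumed complete).
// `force`    redoes everything regardless of what already exists.
std::vector<std::string> outputsToSkim(const std::vector<std::string>& expected,
                                       const std::set<std::string>& existing,
                                       bool force) {
    std::vector<std::string> todo;
    for (const std::string& path : expected) {
        if (force || existing.count(path) == 0) {
            todo.push_back(path);
        }
    }
    return todo;
}
```

With `force` set to true by default, the current behaviour (skim everything) is preserved, and skipping existing files becomes an explicit opt-in.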

Bugs

  1. From time to time, the content of the produced ROOT files seems to change a little. Before investigating this bug further, we should wait for the test suite to be improved and for everything to be compiled, so that we can avoid CINT problems.

Implementation

  1. I think we should attach a list of PipelineConfig instances to the FileListManager.
    When a FileListManager needs information about a given task, it would ask all
    instances of PipelineConfig and see which one recognizes this task and can provide
    information about where to find the files. Such a structure should greatly help to
    manage the many kinds of pipelines currently in use.
  2. Study the new TEntryList class. Replace TEventList with TEntryList?
  3. To be explored: the use of ROOT MakeClass/MakeProject to avoid loading libraries. The first experiments were not very successful, especially with mc/digi/recon files, but it could perhaps be of use with tuple-like files and FileHeader.
  4. To be explored: hadd.
  5. The different datakinds are skimmed one by one. We could investigate if we can/should skim them in parallel.
  6. Study Riostream.h: it contains a "using namespace std" directive, so it is NOT TO BE USED IN HEADER FILES.
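
The structure proposed in item 1 could be sketched as follows. Only the class names FileListManager and PipelineConfig come from the notes above; the methods (recognizes, fileLocation, addConfig, locate) and the PrefixPipelineConfig class are assumptions made up for illustration, and the real classes may look quite different.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical interface: one implementation per kind of pipeline.
class PipelineConfig {
public:
    virtual ~PipelineConfig() {}
    // True if this configuration knows the given task.
    virtual bool recognizes(const std::string& task) const = 0;
    // Where the files of a recognized task can be found.
    virtual std::string fileLocation(const std::string& task) const = 0;
};

// Toy configuration that recognizes tasks by a name prefix.
class PrefixPipelineConfig : public PipelineConfig {
public:
    PrefixPipelineConfig(std::string prefix, std::string root)
        : prefix_(std::move(prefix)), root_(std::move(root)) {}
    bool recognizes(const std::string& task) const override {
        return task.compare(0, prefix_.size(), prefix_) == 0;
    }
    std::string fileLocation(const std::string& task) const override {
        return root_ + "/" + task;
    }
private:
    std::string prefix_;
    std::string root_;
};

class FileListManager {
public:
    void addConfig(std::unique_ptr<PipelineConfig> config) {
        configs_.push_back(std::move(config));
    }
    // Ask each configuration in turn; the first one that recognizes the
    // task answers. Returns false if no configuration recognizes it.
    bool locate(const std::string& task, std::string& location) const {
        for (const auto& config : configs_) {
            if (config->recognizes(task)) {
                location = config->fileLocation(task);
                return true;
            }
        }
        return false;
    }
private:
    std::vector<std::unique_ptr<PipelineConfig>> configs_;
};
```

Supporting a new kind of pipeline then only requires a new PipelineConfig subclass registered with addConfig; the FileListManager itself stays unchanged.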