...

TASK FUNCTION

Task Overview

Each Fermi data run corresponds to a single top-level stream of this task. The task structure is fairly simple, consisting of several short bookkeeping steps and two 'heavy lifters': processClump.py runs Gleam, and mergeClumps.py merges the run fragments and produces any final data products. Depending on the task configuration, these two job steps may be substantial (xlong queue) or trivial (short queue).

To date, general reprocessing has been done with two independent, sequential pipeline tasks. The first task runs Gleam and produces all desired ROOT files; the second reads in the MERIT files and produces all desired FITS files. This separation better matches the needs of the reprocessing life-cycle: development, validation, and resource management while the tasks are running. It also avoids a certain amount of unnecessary job duplication when rollbacks are necessary.

top-level stream | sub-stream   | type | primary function
-----------------|--------------|------|-------------------------------------------------------------------
setupRun         |              | py   | discover run number, input files; calculate # parallel substreams
createClumps     |              | jy   | create substreams
                 | processClump | py   | Gleam reprocessing of run fragment (clump)
                 | clumpDone    | jy   | no-op
setupMerge       |              | jy   | collect data from Pipeline II DB and write to file
mergeClumps      |              | py   | merge output from processClump; create post-merge data products
runDone          |              | jy   | register datasets; update processing history DB
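The parallelism of a run is set in setupRun.py, which discovers the run's input files and decides how many substreams (clumps) to create. The real logic lives in config.py and setupRun.py; the following is only a rough sketch of the idea, with a hypothetical chunk size and helper name that are not taken from the task code.

# Rough sketch only: FILES_PER_CLUMP and make_clumps are illustrative,
# not the task's real configuration parameters or functions.
FILES_PER_CLUMP = 10

def make_clumps(input_files, files_per_clump=FILES_PER_CLUMP):
    """Group a run's input files into clumps, one clump per parallel substream."""
    return [input_files[i:i + files_per_clump]
            for i in range(0, len(input_files), files_per_clump)]

# setupRun would then report len(make_clumps(...)) to the pipeline so that
# createClumps.jy can create one substream per clump, each running processClump.py.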

...

  • Unpack task-level pipeline vars
  • Register (merged) output file in dataCat
  • Make entry in HISTORYRUNS DB table
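For orientation, the runDone.jy bookkeeping amounts to something like the sketch below. The pipeline-variable names and both helpers are placeholders; the real code goes through the dataCat registration interface and the HISTORYRUNS table, which are not reproduced here.

# Placeholder sketch; variable names and helpers are illustrative only.
def register_in_datacat(file_path, run_number):
    # real task: register the file through the dataCat interface
    print('dataCat: register %s for run %s' % (file_path, run_number))

def insert_history_run(run_number, status):
    # real task: INSERT a row into the HISTORYRUNS DB table
    print('HISTORYRUNS: run %s -> %s' % (run_number, status))

def run_done(pipeline_vars):
    """Final bookkeeping once all clumps of a run have been merged."""
    run_number = pipeline_vars['runNumber']         # unpack task-level pipeline vars
    merged_file = pipeline_vars['mergedOutput']
    register_in_datacat(merged_file, run_number)    # register (merged) output file in dataCat
    insert_history_run(run_number, 'DONE')          # make entry in HISTORYRUNS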

CODE

Directories

...

Primary task-specific reprocessing code

/nfs/farm/g/glast/u38/Reprocess-tasks

Subdirectories

...

  • /P201-ROOT - specific task producing (primarily) ROOT files, e.g., runs Gleam
  • /P201-FITS - specific task producing (primarily) FITS files, e.g., creates FT1, LS1, etc.

Common reprocessing code

/nfs/farm/g/glast/u38/Reprocess-tasks

  • /commonTools - common files (both code and parameters) available to all tasks

Other code dependencies

  • /afs/slac.stanford.edu/g/glast/ground/PipelineConfig/GPLtools/GPLtools-02-00-00/ - common pipeline tools
  • /afs/slac/g/glast/ground/PipelineConfig/python/@sys/bin/python - Fermi installation of python

commonTools

/nfs/farm/g/glast/u38/Reprocess-tasks/commonTools/00-01-00

...

category                     | file                    | description
-----------------------------|-------------------------|----------------------------------------------------------------
Task preparation             | taskConfig.xml          | task definition
                             | genRunFile.csh*         | generate list of input files for reprocessing
Pipeline code                | envSetup.sh*            | set up environment to run GR/ST/FT/etc (called by pipeline)
                             | config.py               | task configuration (imported by all .py)
                             | setupRun.py*            | setup for reprocessing a single run
                             | createClumps.jy         | create subprocess for processing a "clump" (part of a run)
                             | processClump.py*        | process a clump of data
                             | clumpDone.jy            | cleanup after clump processing
                             | setupMerge.jy           | setup for merging clumps
                             | mergeClumps.py*         | merge all clumps for single run
                             | runFT1skim.sh*          | skim FT1 events
                             | runDone.jy              | final bookkeeping after run reprocessed (dataCat and runHistory)
                             | commonTools@            | link to commonTools
Input data to pipeline code  | doRecon.txt             | Gleam job options
                             | fullList.txt            | list of reprocessing input data files
                             | removeMeritColumns.txt  | list of columns to remove from MERIT files
                             | runFile.txt@            | sym link to fullList.txt
Pipeline control code        | trickleStream.py*       | task-specific config for trickle.py
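As an aside, config.py simply collects the task-level settings that the other .py scripts import. A minimal hypothetical sketch follows; the variable names are illustrative, not the real contents of config.py.

# Hypothetical config.py contents; names are illustrative only.
TASK_NAME   = 'P201-ROOT'     # which reprocessing task this instance serves
COMMON      = '/nfs/farm/g/glast/u38/Reprocess-tasks/commonTools'  # shared code and parameters
JOB_OPTIONS = 'doRecon.txt'   # Gleam job options file
RUN_LIST    = 'runFile.txt'   # input data file list (sym link to fullList.txt)
HEAVY_QUEUE = 'xlong'         # queue for processClump/mergeClumps ('xlong' or 'short', per configuration)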

Running environment

Setting up a proper running environment for the many and varied applications and utilities needed for data reprocessing must be treated with care, so a short discussion of the topic is given here.

A given release of Fermi code is built for a finite number of operating systems, compiler versions, hardware address sizes, compiler options (e.g., optimized or debug), and build systems (CMT or SCons). Over time, the standard location for these builds at SLAC can move about. Closely related, but independent of this, SLAC and SCCS support a slowly evolving set of hardware and software architectures (RHEL5-32, RHEL5-64, RHEL6-64, etc.) and compiler (gcc) versions. A system was developed to automate matching the available hardware with the best Fermi software build: e.g., if no RHEL6-64 build is available, a RHEL4-32 build of GlastRelease will still run on a RHEL5 or RHEL6 machine.
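The matching logic is illustrated below in sketch form only; the compatibility table, architecture labels, and function are hypothetical stand-ins for the actual mechanism.

# Illustrative sketch of build/architecture matching; the compatibility map and
# names are hypothetical, not the actual SLAC/Fermi configuration.
CAN_RUN = {
    # host architecture : build architectures it can execute, in preference order
    'rhel6-64': ['rhel6-64', 'rhel5-64', 'rhel5-32', 'rhel4-32'],
    'rhel5-64': ['rhel5-64', 'rhel5-32', 'rhel4-32'],
    'rhel5-32': ['rhel5-32', 'rhel4-32'],
}

def best_build(host_arch, available_builds):
    """Pick the most preferred build architecture that both exists and runs on this host."""
    for arch in CAN_RUN.get(host_arch, []):
        if arch in available_builds:
            return arch
    raise RuntimeError('no compatible build for %s' % host_arch)

# e.g. no RHEL6-64 build of GlastRelease exists, so an older build is chosen:
print(best_build('rhel6-64', ['rhel4-32']))   # -> 'rhel4-32'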

The first step is setting up an environment to run a particular application or utility (e.g., GlastRelease, ScienceTools, Ftools, Xroot, Oracle). This typically involves defining one or more environment variables and then running a shell script prior to running the compiled executable (in '/exe'). In the case of an SCons build, one can optionally skip explicitly running the setup shell script by invoking the '/bin' version of the application, which is really a shell-script wrapper that then calls the compiled executable. One problem with this implicit wrapper is that one cannot then override selected variables that it sets. At this point in time, all setup scripts do nothing more than define environment variables.

The approach taken here is to run each setup script in a sub-process, capture all of the environment variables it defines, and store them along with the original set. When a particular application is invoked, the stored environment variables are defined, the application is executed, and the environment is then restored to its previous condition. In this way one can easily run, say, Gleam, ScienceTools, Ftools, and anything else requiring a specific (and possibly conflicting) setup within the same Python script.
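A minimal sketch of this technique is shown below. This is not the GPLtools implementation; the setup-script paths and commands in the usage comments are placeholders.

import os
import subprocess

def capture_env(setup_script):
    """Source a setup script in a child shell and return the environment it leaves behind.
    Only environment variables are expected to change, since the setup scripts do
    nothing more than define them."""
    cmd = ['/bin/bash', '-c', 'source "%s" >/dev/null 2>&1 && env' % setup_script]
    out = subprocess.check_output(cmd, universal_newlines=True)
    env = {}
    for line in out.splitlines():      # note: multi-line values are not handled in this sketch
        if '=' in line:
            name, _, value = line.partition('=')
            env[name] = value
    return env

def run_in_env(stored_env, command):
    """Define the stored variables, execute the application, then restore the
    environment to its previous condition."""
    saved = dict(os.environ)
    try:
        os.environ.clear()
        os.environ.update(stored_env)
        return subprocess.call(command)
    finally:
        os.environ.clear()
        os.environ.update(saved)

# Hypothetical usage -- the setup-script paths and executables are placeholders:
# gleam_env  = capture_env('/path/to/GlastRelease/setup.sh')
# ftools_env = capture_env('/path/to/ftools/setup.sh')
# run_in_env(gleam_env,  ['Gleam', 'doRecon.txt'])
# run_in_env(ftools_env, ['ftlist', 'merit.fits'])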

The current system works only for SCons release builds.
