Name:

Automated Processing Pipeline (aka Fermi Pipeline)

What:

A system for automatically managing arbitrary graphs of processing jobs. Major features:

  • Automated submission and monitoring of batch jobs with high reliability
  • Maintains full history of all data processing
  • Ability to parallelize processing of subgraphs of jobs
  • Ability to embed Python scripts to perform simple computations between job steps
  • Ability to rerun jobs (whether successful or not)

The pipeline consists of the core pipeline server, a line-mode client, a web interface, and job control daemons. Currently supported batch systems: LSF, BQS, GridEngine, Condor.
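The embedded-script feature above can be illustrated with a minimal sketch of the kind of simple computation a script might perform between job steps, for example deciding how many parallel substreams to launch for the next step. The function name and batching policy here are hypothetical illustrations, not part of the pipeline's actual scripting API:

```python
def plan_substreams(num_files, files_per_job=10):
    """Return how many parallel substreams are needed so that each
    substream processes at most files_per_job input files.

    Hypothetical helper: the real pipeline exposes its own scripting
    interface; this only shows the flavor of an inter-step computation.
    """
    if num_files <= 0:
        return 0
    # Ceiling division: 95 files at 10 per job -> 10 substreams
    return (num_files + files_per_job - 1) // files_per_job

if __name__ == "__main__":
    print(plan_substreams(95))  # -> 10
```

A script like this would run on the pipeline server itself, so it is kept deliberately lightweight; anything computationally heavy belongs in a batch job step.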

Related Projects:

Data Catalog

Who:

Current SCA Developers: Tony Johnson, Brian Van Klaveren
Past SCA Developers: Dan Flath, Charlotte Hee, Karen Heidenreich
Key SCA Users: Tom Glanzman, Warren Focke

Used by:

  • Fermi Gamma-ray Space Telescope for Prompt (L1) Processing, Monte Carlo simulation, Science Processing, and reprocessing
  • EXO for MC simulation, data processing, reprocessing
  • CDMS for MC simulation at SLAC and SMU
  • CTA for MC simulation at SLAC

Status:

Stable/Supported. The pipeline core is designed for use by experiments and projects at SLAC; use beyond SLAC could be considered if a strong use case is identified. It is supported for use by current and new SLAC projects. Job control daemons can be installed at remote sites to allow jobs to be submitted from SLAC to other sites. No major new features are currently planned, but incremental improvements, bug fixes, and minor feature requests will be supported.

Planned Work:

  • Extension to support submission of jobs to the Grid via DIRAC. This work is being performed mainly by Fermi and CTA collaborators in Europe.
  • Performance improvements
  • Completion of job cancellation features.
  • Support for "split mode" (jobs within a task being submitted to different sites)

Possible Future Work:

  • Remove dependency on Oracle.
  • More performance improvements.
  • More interactive web interface.
  • JSON interface for improved integration with languages like Python.
  • Better job throttling (limit max number of running jobs, based on resource usage).
  • Support for more batch systems, perhaps including the Open Science Grid.

Last Updated:

May 2012 by tonyj