Introduction

This Pipeline facility has six major functions:

  • automatically process Level 0 data through reconstruction (Level 1)
  • provide near real-time feedback to IOC
  • facilitate the verification and generation of new calibration constants
  • re-process existing data
  • produce bulk Monte Carlo simulations
  • back up all data that passes through

The Pipeline can be expressed as five components:

  1. database access layer
  2. execution layer
  3. scheduler
  4. user interface
  5. relational database (management system): RDBMS

The scheduler is the main loop of the Pipeline. This long-running process polls the database for new tasks and dispatches processes to the execution layer.
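
As a minimal sketch of the polling loop described above (not the actual Pipeline code): the names `run_scheduler`, `fetch_new_tasks`, `dispatch`, `poll_interval` and `max_polls` are hypothetical, and the two callables stand in for the database access layer and the execution layer.

```python
import time
from typing import Callable, Iterable, Optional


def run_scheduler(
    fetch_new_tasks: Callable[[], Iterable[str]],
    dispatch: Callable[[str], None],
    poll_interval: float = 5.0,
    max_polls: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll for new tasks and hand each one to the execution layer.

    Returns the number of tasks dispatched. `max_polls` exists only so
    this sketch can terminate; a long-running scheduler would pass None
    and loop until it is shut down.
    """
    dispatched = 0
    polls = 0
    while max_polls is None or polls < max_polls:
        # Ask the database layer for work that has not been started yet.
        for task in fetch_new_tasks():
            dispatch(task)
            dispatched += 1
        polls += 1
        # Wait before polling again so the database is not hammered.
        sleep(poll_interval)
    return dispatched
```

Passing the two layers in as callables keeps the loop free of SQL and of batch-system details, which matches the separation of components listed above.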

The execution layer abstracts site-specific details about how computing resources are invoked. It handles launching jobs and collecting output. At SLAC, this will be a thin wrapper around the LSF batch system toolchain. Other implementations will support simple clusters of machines using SSH for remote invocation, and single-machine use, where jobs are launched on the same machine as the scheduler.
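
One way to picture this abstraction is a small interface with one implementation per site. The sketch below is an assumption, not the Pipeline's real API: `Executor`, `JobResult`, `LocalExecutor` and `SshExecutor` are hypothetical names, and the LSF wrapper is left out because its details are not given here.

```python
import shlex
import subprocess
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List


@dataclass
class JobResult:
    """Output collected from one finished job."""
    returncode: int
    stdout: str
    stderr: str


class Executor(ABC):
    """Site-independent interface the scheduler would program against."""

    @abstractmethod
    def run(self, command: List[str]) -> JobResult:
        """Launch the command, wait for it, and collect its output."""


class LocalExecutor(Executor):
    """Single-machine case: launch the job on the scheduler's own host."""

    def run(self, command: List[str]) -> JobResult:
        proc = subprocess.run(command, capture_output=True, text=True)
        return JobResult(proc.returncode, proc.stdout, proc.stderr)


class SshExecutor(Executor):
    """Simple-cluster case: run the command on a remote host via ssh."""

    def __init__(self, host: str) -> None:
        self.host = host

    def build_command(self, command: List[str]) -> List[str]:
        # Quote the remote command so arguments with spaces survive.
        return ["ssh", self.host, shlex.join(command)]

    def run(self, command: List[str]) -> JobResult:
        proc = subprocess.run(
            self.build_command(command), capture_output=True, text=True
        )
        return JobResult(proc.returncode, proc.stdout, proc.stderr)
```

The scheduler would hold a reference to an `Executor` and never need to know which implementation is behind it.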

The database access layer contains all SQL queries and statements required by other parts of the system. By keeping the rest of the system from knowing anything about the database, we isolate it from changes to both the schema and the database engine.
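
The idea can be illustrated with a small class that owns every SQL statement. This is only a sketch: the `PipelineDB` class, the `task` table and its columns are hypothetical, and SQLite is used here purely so the example is self-contained, not because the Pipeline uses it.

```python
import sqlite3
from typing import List, Tuple


class PipelineDB:
    """All SQL lives here; callers only see Python methods and values."""

    def __init__(self, path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(path)
        # Hypothetical schema, invented for this illustration.
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS task ("
            " id INTEGER PRIMARY KEY,"
            " name TEXT NOT NULL,"
            " status TEXT NOT NULL DEFAULT 'pending')"
        )

    def add_task(self, name: str) -> int:
        """Insert a new pending task and return its id."""
        with self._conn:  # commits on success, rolls back on error
            cur = self._conn.execute(
                "INSERT INTO task (name) VALUES (?)", (name,)
            )
        return cur.lastrowid

    def pending_tasks(self) -> List[Tuple[int, str]]:
        """Return (id, name) for every task not yet processed."""
        cur = self._conn.execute(
            "SELECT id, name FROM task WHERE status = 'pending' ORDER BY id"
        )
        return cur.fetchall()

    def mark_done(self, task_id: int) -> None:
        """Record that a task has finished."""
        with self._conn:
            self._conn.execute(
                "UPDATE task SET status = 'done' WHERE id = ?", (task_id,)
            )
```

If the schema or the database engine changed, only the bodies of these methods would need to change; the scheduler would keep calling `pending_tasks()` and `mark_done()` as before.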

Inferred Functionality

  • operate in an automated fashion: after initial configuration, all processing should run without human intervention
  • maintain at least a linear chain of processes ("process elements") per run identifier that will be executed for a given task type
  • process elements should know what their input and output datasets are
  • maintain the state of processing: status of completed and pending elements. The state of processing should be recoverable from the database.
  • keep track of all datasets involved in the processing, including metadata about file properties and paths
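
The requirements listed above (a linear chain of process elements per run, known input and output datasets, and recoverable state) could be modelled roughly as follows. All names here (`ProcessElement`, `RunState`, the `"done"` status string, and the element and file names in the usage example) are hypothetical; the status dictionary stands in for what would be stored in the database.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ProcessElement:
    """One step in the chain, with the datasets it reads and writes."""
    name: str
    inputs: List[str]
    outputs: List[str]


@dataclass
class RunState:
    """Processing state for one run identifier."""
    run_id: str
    chain: List[ProcessElement]
    # Maps element name to status; anything not "done" counts as pending.
    status: Dict[str, str] = field(default_factory=dict)

    def next_pending(self) -> Optional[ProcessElement]:
        """Return the first element in the chain that has not completed."""
        for element in self.chain:
            if self.status.get(element.name) != "done":
                return element
        return None

    def mark_done(self, name: str) -> None:
        self.status[name] = "done"
```

A short usage example, in which a saved status dictionary is enough to resume a run where it left off:

```python
chain = [
    ProcessElement("digi", ["raw.dat"], ["digi.root"]),
    ProcessElement("recon", ["digi.root"], ["recon.root"]),
]
state = RunState("run-001", chain)
state.mark_done("digi")

# Rebuild the state from the stored status, as after a restart.
restored = RunState("run-001", chain, dict(state.status))
print(restored.next_pending().name)
```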