Pipeline II

Pipeline II is an upgrade of the existing pipeline server. It is likely to be a somewhat scaled-back version of our earlier plans for Pipeline NG. It should use a database schema similar to that of the current pipeline, but extended to handle the new requirements.

Documentation

Requirements

  • Task scheduling should be more flexible than the current linear chain
    • Should support parallel execution of tasks
    • Should allow the dependency chain to be more general than the input-file requirements
    • Should support parallel sub-tasks, with the number of sub-tasks determined at runtime
    • Perhaps support conditions based on external dependencies
  • Should allow for remote submission of jobs
    • Perhaps using the GRID batch submission component, or a Glast-specific batch submission system
    • Will need to generalize the current system (e.g. remove hard-coded absolute paths)
  • Support reprocessing of data without redefining the task
    • Need a way to mark a Done task as "ReRunnable"
    • Need to support multiple versions of output files
  • Ability to prioritize tasks
  • Ability to work with "disk space allocator"
  • Would be nice to set parameters (env vars) in the task description
  • Would be nice to be able to pass parameters to "createJob"
  • Ability to suspend tasks
  • Ability to kill tasks
  • Ability to throttle job submission (i.e. cap the number of jobs in the queue)
  • Ability to map absolute path names to FTP path names (site specific)
  • Would be nice to remove need for "wrapper scripts"
  • Ability to specify batch options (though this raises portability problems)
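The scheduling and throttling requirements above can be illustrated with a small sketch. It is not a design for Pipeline II. It assumes a task is described only by a hypothetical mapping from each task name to the set of task names it depends on, and it uses Kahn's topological-sort algorithm to group tasks into rounds that may run in parallel, capping each round at a hypothetical max_in_queue limit. As a simplification, it assumes every task in a round finishes before the next round is submitted; a real throttle would track actual queue occupancy instead.

```python
from collections import deque


def plan_submission_rounds(deps, max_in_queue):
    """Hypothetical helper: group tasks into parallel submission rounds.

    deps maps each task name to the set of task names it depends on.
    Tasks in one round have all dependencies satisfied by earlier rounds,
    and no round holds more than max_in_queue tasks (the throttle).
    """
    if max_in_queue < 1:
        raise ValueError("max_in_queue must be at least 1")

    # Count unmet dependencies per task and record reverse edges.
    remaining = {task: len(parents) for task, parents in deps.items()}
    children = {task: [] for task in deps}
    for task, parents in deps.items():
        for parent in parents:
            if parent not in deps:
                raise KeyError(f"{task} depends on unknown task {parent}")
            children[parent].append(task)

    # Tasks with no dependencies are ready immediately (sorted for determinism).
    ready = deque(sorted(t for t, n in remaining.items() if n == 0))
    rounds = []
    scheduled = 0
    while ready:
        # Submit up to max_in_queue ready tasks; the rest wait for a later round.
        batch = [ready.popleft() for _ in range(min(max_in_queue, len(ready)))]
        rounds.append(batch)
        scheduled += len(batch)
        # Simplification: assume the whole batch completes before the next round.
        newly_ready = []
        for task in batch:
            for child in children[task]:
                remaining[child] -= 1
                if remaining[child] == 0:
                    newly_ready.append(child)
        ready.extend(sorted(newly_ready))

    # Any task never scheduled must sit on a dependency cycle.
    if scheduled != len(deps):
        raise ValueError("dependency cycle detected")
    return rounds


if __name__ == "__main__":
    # Hypothetical task graph: two independent simulations feed one merge step.
    deps = {
        "prepare": set(),
        "sim_a": {"prepare"},
        "sim_b": {"prepare"},
        "merge": {"sim_a", "sim_b"},
        "report": {"merge"},
    }
    print(plan_submission_rounds(deps, max_in_queue=2))
    print(plan_submission_rounds(deps, max_in_queue=1))
```

With max_in_queue=2 the two simulations share a round and run in parallel; with max_in_queue=1 the throttle serializes them. Because dependencies are arbitrary sets rather than a linear chain, the same structure accommodates dependencies that are not tied to input files.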

For more details, see the talks at the Developers Workshop.
