
Introduction

Pipeline II has been designed to allow batch jobs to be submitted at SLAC or at other sites. Tasks can even be designed to run at multiple sites, allowing jobs to flow to different sites as machines become available to run them. We expect this distributed job submission to be used mainly for Monte Carlo (MC) processing.

Even when jobs are submitted to remote sites, the Pipeline II server still runs as a single instance at SLAC. Batch jobs are submitted (at SLAC or elsewhere) through a simple batch submission daemon, which is designed to hide the details of the batch submission process from the Pipeline II server. When a job starts it sends an initial e-mail back to the pipeline server, and when it completes it sends a second e-mail stating that it has finished (successfully or not). A data catalog is maintained at SLAC which can contain information on data available at SLAC or elsewhere. Batch jobs do not communicate directly with the database or with the Pipeline II server.
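The start/finish notifications can be pictured with a small sketch. The actual message contents and transport details (e-mail headers, how the server parses them) are not specified on this page, so every field name below is an assumption made for illustration:

```java
// Hypothetical sketch of the two status messages a batch job sends back to
// the pipeline server. The real message format is not documented here; the
// field names (job, host, event, status) are assumptions for illustration.
public class JobNotification {

    // Message sent when the job starts running on a batch node.
    static String startMessage(String jobId, String host) {
        return "PIPELINE-STATUS job=" + jobId + " host=" + host + " event=STARTED";
    }

    // Message sent when the job finishes, successfully or not.
    static String endMessage(String jobId, boolean success) {
        return "PIPELINE-STATUS job=" + jobId
             + " event=FINISHED status=" + (success ? "SUCCESS" : "FAILURE");
    }

    public static void main(String[] args) {
        System.out.println(startMessage("12345", "examplehost"));
        System.out.println(endMessage("12345", true));
    }
}
```

Because the jobs only send these one-way notifications and never talk to the database or the server directly, the batch nodes need nothing more than outbound mail access.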

Initially the batch submission daemon has been set up to work with LSF, the batch system in use at SLAC, but it has been designed to make few assumptions about the specifics of the batch submission system, so it should be reasonably easy to port to other sites.

Porting the Job Submission System to a New Site

The Job Submission daemon is a separate project in GLAST CVS, with some documentation available here.
Strictly speaking, the pipeline server can work with anything that extends the JobControlClient class, which allows jobs to be submitted, queried for their status, and canceled. In practice, only a couple of classes in the implementation of the job control client depend on LSF, so we expect remote sites to run a modified version of the standard JobControlClient. The classes which depend on LSF are:
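As an illustration, a site port might look like the sketch below. Only the class name JobControlClient comes from this page; the method names, signatures, and the in-memory stand-in batch system are assumptions made for the example:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical shape of the interface the pipeline server talks to; the real
// JobControlClient API (method names and signatures) is not shown on this page.
abstract class JobControlClient {
    abstract String submit(String command);   // submit a job, return its id
    abstract String status(String jobId);     // e.g. PENDING / RUNNING / DONE
    abstract void cancel(String jobId);       // cancel a queued or running job
}

// A minimal in-memory stand-in for a site-specific port: this is where a
// remote site would substitute calls to its own batch system for the LSF ones.
class InMemoryJobControlClient extends JobControlClient {
    private final Map<String, String> jobs = new HashMap<>();
    private int nextId = 1;

    String submit(String command) {
        String id = Integer.toString(nextId++);
        jobs.put(id, "PENDING");
        return id;
    }

    String status(String jobId) {
        return jobs.getOrDefault(jobId, "UNKNOWN");
    }

    void cancel(String jobId) {
        jobs.put(jobId, "CANCELLED");
    }
}

public class Demo {
    public static void main(String[] args) {
        JobControlClient client = new InMemoryJobControlClient();
        String id = client.submit("echo hello");
        System.out.println(client.status(id));  // PENDING
        client.cancel(id);
        System.out.println(client.status(id));  // CANCELLED
    }
}
```

A real port would replace the in-memory map with calls to the local batch system (the LSF version presumably wraps commands such as bsub, bjobs, and bkill), leaving the rest of the daemon and the pipeline server unchanged.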
