
Overview

Like the CMT Release Manager, the SCons Release Manager is split into several components: the batch submission system, the workflow system, and the release manager system. Unlike the CMT Release Manager, the SCons Release Manager is not written in Perl. It is written in C++ and relies mostly on the Qt library, which handles most platform-independent interactions. It also depends on the Boost Spirit and stx-exparser libraries, which are used only by the Linux components.

Batch Submission System

Like its CMT counterpart, the Batch Submission System accepts jobs from the upper layers of the system, submits them to lsf, monitors them, and notifies the upper layers of state changes such as finished jobs. Its information is stored in a MySQL database on mysql-node03.slac.stanford.edu. The system consists of a C++ library and two applications, which obtain their information from the database.

Database

The MySQL database for the Batch Submission System can be accessed by the rd_lsf_u user and the glastreader user. The database name is rd_lsf. It contains the following tables:

  • arguments – This table contains the arguments to be passed to the program when executing on lsf.
  • callback – This table contains the details on which state changes the upper layer should be notified.
  • callbackArgs – This table contains arguments to pass to the callback program when executing it.
  • command – This table contains the command to execute on lsf.
  • job – This table contains metadata, such as the registration time, on commands to be executed on lsf.
  • lsfOptions – This table contains any options to be passed to lsf when submitting.
  • output – This table contains the output of the lsf job.
  • run – This table contains the details of a job's attempt at executing on lsf.
  • settings – This table contains global settings for the Batch Submission System.

Applications

The Batch Submission System consists of two applications: the lsf daemon and the lsf callback application. The daemon is set up to run on glastlnx20 via trscron. It checks the database for new jobs and, for each job, submits three lsf tasks. The first task is the actual command to be executed. The second task is a call to the lsf callback application, with the condition that it be executed as soon as the first task starts execution. The third task is a call to the lsf callback application, with the condition that it be executed as soon as the first task finishes execution. All three tasks are initially submitted in suspend mode. If all three are submitted successfully, they are changed from suspend to pending mode so that they actually start execution. If anything goes wrong at any point, the lsf tasks are killed and the run is marked as failed in the run table. The job is then put back into the queue to be attempted later as a new run.
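
The submit-suspended, then release-or-kill sequence described above can be sketched as follows. This is only an illustration in Python, not the daemon's actual C++ code: FakeScheduler, SubmitError, submit_job, and the lsfCallback command name are all hypothetical stand-ins. On a real cluster the three scheduler operations would presumably correspond to lsf's bsub -H, bresume, and bkill, but the source does not say which commands the daemon uses.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


class SubmitError(Exception):
    """Raised by the stand-in scheduler when a submission fails."""


@dataclass
class FakeScheduler:
    """In-memory stand-in for lsf; every name here is hypothetical."""
    fail_on: Optional[int] = None            # make the Nth submission fail (1-based)
    states: Dict[int, str] = field(default_factory=dict)
    next_id: int = 1

    def submit_suspended(self, command: str,
                         condition: Optional[Tuple[str, int]] = None) -> int:
        # 'condition' is accepted for illustration only; it is not enforced here.
        if self.fail_on == self.next_id:
            raise SubmitError(f"could not submit: {command}")
        task_id = self.next_id
        self.next_id += 1
        self.states[task_id] = "suspended"
        return task_id

    def release(self, task_id: int) -> None:
        self.states[task_id] = "pending"

    def kill(self, task_id: int) -> None:
        self.states[task_id] = "killed"


def submit_job(scheduler: FakeScheduler, command: str,
               callback: str = "lsfCallback") -> Tuple[str, List[int]]:
    """Submit the command plus its start and finish callbacks as one unit."""
    submitted: List[int] = []
    try:
        main = scheduler.submit_suspended(command)
        submitted.append(main)
        submitted.append(scheduler.submit_suspended(
            f"{callback} started {main}", condition=("started", main)))
        submitted.append(scheduler.submit_suspended(
            f"{callback} finished {main}", condition=("ended", main)))
    except SubmitError:
        # Roll back: kill whatever was submitted and report the run as failed.
        for task_id in submitted:
            scheduler.kill(task_id)
        return "failed", submitted
    # All three tasks exist, so it is safe to let them run.
    for task_id in submitted:
        scheduler.release(task_id)
    return "submitted", submitted
```

Submitting everything in suspend mode first means a partially submitted job never starts running without its callbacks; a failed attempt leaves only killed tasks behind, and the caller can queue the job again as a new run.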

The lsf callback application is called once when the lsf task starts and once when it finishes. When called for a task that has started, it updates the run table with the start time. It also checks the callback table to determine whether the higher layers wish to be notified that the job has started; if so, it calls the callback command to notify them. When called for a task that has finished, it checks the lsf output, parses some of the information, such as the return code, and stores that information in the run, job, and output tables. It also checks for any callback information for the higher layers.
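
As an illustration of the parsing step, the sketch below extracts a return code from an lsf job report. It assumes the report contains either a "Successfully completed." line or an "Exited with exit code N." line, as standard lsf reports do; the function name parse_return_code is hypothetical, and the real callback application is written in C++ and may parse the output differently.

```python
import re
from typing import Optional

# Patterns for the two status lines assumed to appear in an lsf job report.
_SUCCESS_RE = re.compile(r"^Successfully completed\.\s*$", re.MULTILINE)
_EXIT_RE = re.compile(r"^Exited with exit code (\d+)\.\s*$", re.MULTILINE)


def parse_return_code(report: str) -> Optional[int]:
    """Return the job's exit code, or None if the report states neither outcome."""
    if _SUCCESS_RE.search(report):
        return 0
    match = _EXIT_RE.search(report)
    if match:
        return int(match.group(1))
    return None
```

Returning None rather than guessing lets the caller distinguish "the job failed" from "the report could not be understood", which matters when deciding whether to mark a run as failed.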

Workflow

The Workflow system is a rule-based script execution system. Each script is considered a stage in the workflow. The workflow moves from one stage to the next by evaluating the rules set for each stage. The rules and stages for the workflow are stored in a database on mysql-node03. The workflow system consists of a static library and a single executable. The library is used by other systems that wish to use the workflow for initial submission and for controlling various settings of the workflow runs. The executable is run after each stage of the workflow finishes execution in the batch submission system described above. It computes which new stages need to be executed based on the rules defined for the finished stage.

Database

The database is stored on mysql-node03 and its name is rd_workflow. It contains the following tables:

+-----------------------+
| Tables_in_rd_workflow |
+-----------------------+
| batchOpts             |
| batchOptsOverride     |
| conditions            |
| run                   |
| runArgs               |
| runScripts            |
| runScriptsOverride    |
| runSettings           |
| settings              |
| workflowScripts       |
| workflows             |
+-----------------------+

Each table performs a unique function, as follows:

  • settings – This table contains name/value pairs of settings for the workflow system.
  • workflows – This table contains a list of workflows registered.
  • workflowScripts – This table contains a list of all the scripts (aka stages) for a particular registered workflow.
  • conditions – This table lists all the conditions for a particular stage and which stage to execute next if a condition evaluates to true.
  • batchOpts – This table contains name/value pairs of batch options to pass to the batch submission system for a particular stage.
  • run – This table contains the information for an actual execution of a workflow.
  • runScripts – This table contains which stage(s) are currently executing for a particular run.
  • runScriptsOverride – This table contains a few overrides of default stage settings, such as which batch queue to execute on.
  • runSettings – This table contains name/value pairs of settings to override for a particular run.
  • runArgs – This table contains the arguments to pass to the stages for a particular run.
  • batchOptsOverride – This table contains name/value pairs of batch options to override for a particular stage in a particular run.
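
The relationship between a defaults table and its override table (for example batchOpts and batchOptsOverride) can be sketched as a simple layered lookup. This is a Python illustration under assumptions: the row layouts, the option names, and the queue names below are made up, since the source does not give the column definitions.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical row shapes (the real column names are not given in the source):
#   batchOpts row:         (stage, name, value)
#   batchOptsOverride row: (run_id, stage, name, value)
BatchOpt = Tuple[str, str, str]
BatchOptOverride = Tuple[int, str, str, str]


def effective_batch_opts(batch_opts: Iterable[BatchOpt],
                         overrides: Iterable[BatchOptOverride],
                         run_id: int, stage: str) -> Dict[str, str]:
    """Start from the stage defaults, then apply this run's overrides on top."""
    opts = {name: value for (s, name, value) in batch_opts if s == stage}
    for (r, s, name, value) in overrides:
        if r == run_id and s == stage:
            opts[name] = value
    return opts


defaults = [("compile", "queue", "short"),
            ("compile", "memory", "2G"),
            ("test", "queue", "long")]
run_overrides = [(7, "compile", "queue", "xlong")]

print(effective_batch_opts(defaults, run_overrides, run_id=7, stage="compile"))
print(effective_batch_opts(defaults, run_overrides, run_id=8, stage="compile"))
```

Keeping overrides in a separate per-run table means the registered workflow stays unchanged while an individual run can still be tuned.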

Library

The static library for the workflow system contains a set of functions that allow C++ programs to create new workflow runs and to specify settings for these runs. The source code and the available functions can be viewed in CVS; a compiled copy is also available at ~glastrm/grits-cpp/src. The library makes use of the Qt libraries and uses qmake as its build system. The Qt library currently used is version 4, installed in ~glastrm/externals, which is a set of symlinks that use the AFS @sys variable to point to the appropriate platform installation in nfs.

Executable

The workflow system has a single executable called workflowCallback. It is executed by the batch submission system when each stage starts executing and again when it stops executing. When called for the start of a stage, it simply updates the MySQL tables to indicate the start time. When called for the end of a stage, it also checks the conditions table to determine which other stages, if any, need to be executed. Any new stages found are submitted to the batch submission system, and the cycle repeats until the run has no more stages to execute.
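
The cycle described above can be sketched as a small loop over condition rows. This Python illustration makes several assumptions: conditions are modeled as (stage, predicate, next stage) tuples with plain callables standing in for whatever rule expressions the real conditions table stores, stages run one at a time rather than on lsf, and the names next_stages and run_workflow are hypothetical.

```python
from collections import deque
from typing import Callable, List, Tuple

# Hypothetical condition row: (stage, predicate on the return code, next stage).
Condition = Tuple[str, Callable[[int], bool], str]


def next_stages(conditions: List[Condition], finished: str, rc: int) -> List[str]:
    """Return every stage whose condition on the finished stage holds."""
    return [nxt for (stage, predicate, nxt) in conditions
            if stage == finished and predicate(rc)]


def run_workflow(conditions: List[Condition], first_stage: str,
                 execute: Callable[[str], int]) -> List[Tuple[str, int]]:
    """Execute stages until no condition yields a new stage.

    No cycle protection is included; the rules are assumed to terminate.
    """
    history: List[Tuple[str, int]] = []
    queue = deque([first_stage])
    while queue:
        stage = queue.popleft()
        rc = execute(stage)          # stand-in for a batch submission
        history.append((stage, rc))
        queue.extend(next_stages(conditions, stage, rc))
    return history


def succeeded(rc: int) -> bool:
    return rc == 0


def failed(rc: int) -> bool:
    return rc != 0


rules = [("checkout", succeeded, "compile"),
         ("compile", succeeded, "test"),
         ("compile", failed, "notify"),
         ("test", succeeded, "notify")]
```

In the real system each stage finishes asynchronously and workflowCallback is invoked per stage, so the loop is spread across separate process invocations with the state held in the database rather than in a local queue.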
