Overview

The CMT Release Manager is split into three components: the Batch submission system, the Workflow manager, and the Release Manager. These components work together to perform the automated builds in CMT. The Batch submission system is at the lowest level. It accepts input from the layers above, such as the Workflow manager, and submits the jobs to lsf. Optionally, it notifies the layer above when jobs start and when they finish, passing along information such as the return code or output. The Workflow manager is a simple rule engine. It submits jobs to the Batch submission system and, based on its rules and the output of a job submitted to lsf, determines the next job to execute. Finally, the Release Manager is a set of scripts that are registered in the Workflow system. They are executed based on the rules set in the Workflow.
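The rule-engine idea can be illustrated with a minimal Python sketch. The rule format, the job names (checkout, build, test, notify_failure), and the function name next_job are all hypothetical. This page does not describe how the Workflow manager actually stores or evaluates its rules.

```python
# Hypothetical rule table: each rule says "after job X finishes with a return
# code satisfying `when`, run job `then`". None of these names come from the
# actual Workflow manager.
RULES = [
    {"after": "checkout", "when": lambda rc: rc == 0, "then": "build"},
    {"after": "build", "when": lambda rc: rc == 0, "then": "test"},
    {"after": "build", "when": lambda rc: rc != 0, "then": "notify_failure"},
]


def next_job(rules, finished_job, return_code):
    """Return the name of the job to run next, or None if no rule applies."""
    for rule in rules:
        if rule["after"] == finished_job and rule["when"](return_code):
            return rule["then"]
    return None
```

In this sketch the first matching rule wins, so the order of the rules matters. A real rule engine could equally evaluate the job's output rather than only its return code.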

Cron system

The cron system is a simple script designed to execute a script at a specified time on a specified host. It supplements trscron by letting us install the same cron job on several computers while only one of them actually executes the script. The goal is to allow for failover when a host crashes and becomes unavailable for a prolonged period. The cron system's configuration is stored in a MySQL database on glastDB.slac.stanford.edu (aka glastlnx01.slac.stanford.edu). The script reads this information and performs the actions it describes.

Database

The cron system's database contains two tables. The first, crontab, contains the same information that would be stored in a cron configuration file. The second, settings, contains information such as which host is the primary host for executing the cron jobs. Hosts other than the primary one do not execute the job.

The crontab table is organized as follows:

+------------+--------------+------+-----+---------+-------+
| Field      | Type         | Null | Key | Default | Extra |
+------------+--------------+------+-----+---------+-------+
| user       | varchar(255) |      |     | glast   |       |
| minute     | varchar(255) |      |     | *       |       |
| hour       | varchar(255) |      |     | *       |       |
| dayOfMonth | varchar(255) |      |     | *       |       |
| month      | varchar(255) |      |     | *       |       |
| dayOfWeek  | varchar(255) |      |     | *       |       |
| command    | varchar(255) |      |     |         |       |
+------------+--------------+------+-----+---------+-------+

The user field contains the user under which the cron job should be executed. All other fields have the same meaning as the corresponding fields of a crontab entry and are not explained here.
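As an illustration of how a row of the crontab table could be evaluated, here is a minimal Python sketch that checks whether a row is due at a given time. It assumes the fields follow common crontab syntax (`*`, single numbers, ranges, comma lists, and `/step`) and that a row is available as a dictionary keyed by column name. The function names are hypothetical, and the actual script may parse the fields differently.

```python
from datetime import datetime


def field_matches(spec, value, low, high):
    """Check one crontab field (e.g. '*', '5', '1-5', '*/15', '0,30') against value."""
    for part in spec.split(","):
        rng, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if rng == "*":
            start, end = low, high
        elif "-" in rng:
            first, last = rng.split("-", 1)
            start, end = int(first), int(last)
        else:
            start = end = int(rng)
        if start <= value <= end and (value - start) % step == 0:
            return True
    return False


def row_is_due(row, when):
    """Return True if the crontab row (a dict keyed by column name) is due at `when`."""
    # Python counts Monday as 0; cron counts Sunday as 0 (and also accepts 7).
    dow = (when.weekday() + 1) % 7
    time_ok = (
        field_matches(row["minute"], when.minute, 0, 59)
        and field_matches(row["hour"], when.hour, 0, 23)
        and field_matches(row["month"], when.month, 1, 12)
    )
    dom_ok = field_matches(row["dayOfMonth"], when.day, 1, 31)
    dow_ok = field_matches(row["dayOfWeek"], dow, 0, 7) or (
        dow == 0 and field_matches(row["dayOfWeek"], 7, 0, 7)
    )
    # Usual cron convention: if both day fields are restricted, either may match.
    if row["dayOfMonth"] != "*" and row["dayOfWeek"] != "*":
        day_ok = dom_ok or dow_ok
    else:
        day_ok = dom_ok and dow_ok
    return time_ok and day_ok
```

For example, a row with minute `*/15`, hour `2`, and dayOfWeek `1-5` is due at 02:30 on a Monday but not on a Saturday.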

The settings table is organized as follows:

+-------+--------------+------+-----+---------+-------+
| Field | Type         | Null | Key | Default | Extra |
+-------+--------------+------+-----+---------+-------+
| name  | varchar(255) |      |     |         |       |
| value | varchar(255) |      |     |         |       |
+-------+--------------+------+-----+---------+-------+

The settings table contains only name/value pairs to describe the settings. Currently supported name/value pairs are:

  • master – The value for this name is the host that is considered the master. Cron jobs only execute on the master.
  • runTime – The value for this name controls how long the cron script remains running before it shuts down.

The runTime setting is a workaround for afs token expiry. Since afs tokens expire, the cron script shuts itself down after runTime has elapsed. Shortly afterwards, trscron starts a new instance of the script with a new afs token.
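The interplay of the master and runTime settings can be sketched in Python as follows. This is only an illustration under several assumptions: the settings table has been loaded into a dictionary, runTime is expressed in seconds (this page does not state the unit), and the master value is compared against the host's fully qualified name. The function names and the poll_seconds parameter are hypothetical.

```python
import socket
import time


def is_master(settings, hostname=None):
    """Return True if this host is the one named by the 'master' setting."""
    hostname = hostname or socket.getfqdn()
    return settings.get("master") == hostname


def run_until_deadline(settings, tick, hostname=None, poll_seconds=60,
                       clock=time.monotonic, sleep=time.sleep):
    """Call tick() once per poll while this host is master, then exit after runTime.

    Assumes settings["runTime"] is a number of seconds (an assumption).
    Returns how many times tick() was called.
    """
    deadline = clock() + float(settings["runTime"])
    executed = 0
    while clock() < deadline:
        if is_master(settings, hostname):
            tick()  # e.g. run whichever crontab rows are currently due
            executed += 1
        sleep(poll_seconds)
    return executed
```

Non-master hosts keep looping without executing anything in this sketch. Failover would then only require changing the master value in the settings table, provided the settings are re-read periodically, which the sketch does not do.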

Batch Submission

The Batch submission system consists of a few simple scripts and a simple database. Its only purpose is to submit jobs to lsf and notify the caller when a job has finished. Additionally, it provides the caller with the return code as well as any output produced. Due to technical difficulties, the Batch submission system does not guarantee that it can provide the output or the return code. The central information about the Batch submission system is stored in a MySQL database. A set of scripts reads this database and performs the tasks it describes. The information in the database is also displayed on a webpage.

Database

The database for the Batch Submission system is stored on glastDB.slac.stanford.edu (aka glastlnx01.slac.stanford.edu). The database name is bsub and it contains a single table named jobs.

The jobs table is structured as follows:

+----------------+--------------------------------------------------+------+-----+---------+----------------+
| Field          | Type                                             | Null | Key | Default | Extra          |
+----------------+--------------------------------------------------+------+-----+---------+----------------+
| jobId          | bigint(20) unsigned                              |      | PRI | NULL    | auto_increment |
| lsfId          | bigint(20) unsigned                              |      |     | 0       |                |
| command        | varchar(255)                                     |      |     |         |                |
| args           | varchar(255)                                     | YES  |     | NULL    |                |
| queue          | varchar(255)                                     |      |     | short   |                |
| batchOpts      | varchar(255)                                     | YES  |     | NULL    |                |
| status         | enum('waiting','submitting','pending','running') |      |     | waiting |                |
| tries          | int(11)                                          |      |     | 0       |                |
| onSuccess      | varchar(255)                                     | YES  |     | NULL    |                |
| workingDir     | varchar(128)                                     |      |     |         |                |
| outputLocation | varchar(255)                                     | YES  |     | NULL    |                |
| user           | varchar(128)                                     | YES  |     | NULL    |                |
+----------------+--------------------------------------------------+------+-----+---------+----------------+

The fields in the table are used as follows:

  • jobId – A unique ID that is automatically created when a new job is registered with the batch submission.
  • lsfId – Initially contains a value of 0 and will be filled with the ID provided by lsf once the job is submitted.
  • command – The full path of the executable to be submitted to lsf.
  • args – The arguments to be passed to the executable.
  • queue – The queue the job is submitted to.
  • batchOpts – A string containing the options to pass to bsub while submitting.
  • status – The current status of the job. Possible values are:
    • waiting – The job is waiting to be submitted.
    • submitting – The bsub command is in process of executing.
    • pending – The job has been submitted to lsf but hasn't started executing yet.
    • running – The job has started execution.
  • tries – A counter indicating how many times the bsub command has been attempted for this job and failed.
  • onSuccess – A string indicating what to do when the job is submitted, starts executing, or finishes.
    • The format of the string is either script:/path/to/script or email:valid@email.address.
  • workingDir – Path to the working directory to use when running the job.
  • outputLocation – Full path to the file where the output of the job should be saved.
  • user – The user to run this job under. Currently only the glast user accounts (glastrm, glast, etc.) are supported.
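To make the field descriptions concrete, the following Python sketch shows how a row of the jobs table might be turned into a bsub command line and how the script:/email: notification string could be parsed. It is a sketch under assumptions, not the actual submission script: the row is assumed to be a dictionary keyed by column name, batchOpts and args are assumed to be shell-style strings, and the function names are hypothetical. Only the standard bsub options -q (queue) and -o (output file) are used.

```python
import shlex


def parse_notification(target):
    """Split 'script:/path/to/script' or 'email:valid@email.address' into (kind, value)."""
    kind, sep, value = target.partition(":")
    if not sep or kind not in ("script", "email") or not value:
        raise ValueError("expected script:/path/to/script or email:address, got %r" % target)
    return kind, value


def build_bsub_argv(job):
    """Build the bsub argument list for one row of the jobs table (a dict)."""
    argv = ["bsub", "-q", job["queue"]]
    if job.get("outputLocation"):
        argv += ["-o", job["outputLocation"]]
    if job.get("batchOpts"):
        # Assumption: batchOpts holds extra bsub options as a shell-style string.
        argv += shlex.split(job["batchOpts"])
    argv.append(job["command"])
    if job.get("args"):
        # Assumption: args is a shell-style string of arguments for the command.
        argv += shlex.split(job["args"])
    return argv
```

The workingDir field is not part of the argument list here. One option would be to pass it as the working directory of the process that invokes bsub, for example via the cwd parameter of Python's subprocess.run.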

Scripts

The Release Manager is controlled by a set of scripts located in /u/gl/glastrm/ReleaseManager/, /u/gl/glast/perl-modules, /u/gl/glast/ReleaseManager, and /u/gl/glast/infraCron. The scripts that require explanation are:

  • trigger.pl – This script is executed by the Workflow system to determine when a new job should be started.
  • rmTodo.pl – This script is executed to perform user initiated functions such as erasing builds, triggering builds, etc.
  • All other scripts are fairly self-explanatory and require arguments of the form packageversiontag.

Web interface

The web page for the Release Manager is https://www.slac.stanford.edu/www-glast-dev/cgi/ReleaseManager. It is controlled by the SLAC web server. The information is displayed by a set of Perl scripts located in /afs/slac/g/www/cgi-wrap-bin/glast/ground/cg/ReleaseManager.
