Overview
Just like the CMT Release Manager, the SCons Release Manager is split into several components: the batch submission system, the workflow system, and the release manager system. Unlike the CMT Release Manager, the SCons Release Manager is not written in Perl. It is written in C++ and relies mostly on the Qt library, which is used for most platform-independent interactions. It also depends on the Boost Spirit library and the stx-exparser library, which are used only by the Linux components.
Batch Submission System
Just like its CMT counterpart, the Batch Submission System accepts jobs to execute from the upper layers of the system, submits them to lsf, monitors them, and notifies the upper layers of state changes such as a job finishing. The information for the Batch Submission System is stored in a MySQL database on mysql-node03.slac.stanford.edu. The Batch Submission System itself consists of a C++ library and two applications, which obtain their information from the database.
Database
The MySQL database for the Batch Submission System can be accessed by the rd_lsf_u user and the glastreader user. The database name is rd_lsf. The database contains the following tables:
- arguments – This table contains the arguments to be passed to the program when executing on lsf.
- callback – This table contains the details of which state changes the upper layer should be notified about.
- callbackArgs – This table contains arguments to pass to the callback program when executing it.
- command – This table contains the command to execute on lsf.
- job – This table contains metadata, such as the registration time, about commands to be executed on lsf.
- lsfOptions – This table contains any options to be passed to lsf when submitting.
- output – This table contains the output of the lsf job.
- run – This table contains the details of a job's attempt at executing on lsf.
- settings – This table contains global settings for the Batch Submission System.
Applications
The Batch Submission System consists of two applications: the lsf daemon and the lsf callback application. The daemon is set up to run on glastlnx20 via trscron. It checks the database for new jobs and submits them to lsf. For each job, three lsf tasks are submitted. The first task is the actual command to be executed. The second task is a call to the lsf callback application with the condition that it be executed as soon as the first task starts execution. The third task is a call to the lsf callback application with the condition that it be executed as soon as the first task finishes execution. All three tasks are initially submitted in suspend mode. If all three are submitted successfully, they are changed from suspend to pending mode so that they can actually start execution. If anything goes wrong at any point, the lsf tasks are killed and the run is marked as failed in the run table. The job is then put back into the queue to be attempted later as a new run.
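The all-or-nothing submission described above can be sketched as follows. This is only an illustration, not the daemon's actual code: the BatchBackend type, its three operations, and the RunState values are hypothetical names introduced here, and the start and finish conditions attached to the callback tasks are left out.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for the calls the daemon makes to lsf.
// None of these names come from the Batch Submission System itself.
struct BatchBackend {
    // Submits a task in suspend mode; returns its id, or -1 on failure.
    std::function<int(const std::string& command)> submitSuspended;
    // Moves a suspended task to pending so that it can start.
    std::function<void(int taskId)> release;
    // Removes a task from lsf.
    std::function<void(int taskId)> kill;
};

enum class RunState { Submitted, Failed };

// Submits the command and its two callback tasks. The tasks are released
// only if all three submissions succeed; otherwise every task submitted
// so far is killed and the run is reported as failed.
RunState submitRun(const BatchBackend& lsf,
                   const std::string& command,
                   const std::string& startCallback,
                   const std::string& finishCallback) {
    const std::vector<std::string> tasks = {command, startCallback, finishCallback};
    std::vector<int> submitted;
    for (const std::string& task : tasks) {
        const int id = lsf.submitSuspended(task);
        if (id < 0) {
            for (int earlier : submitted) {
                lsf.kill(earlier);
            }
            return RunState::Failed;  // the caller would requeue the job as a new run
        }
        submitted.push_back(id);
    }
    for (int id : submitted) {
        lsf.release(id);
    }
    return RunState::Submitted;
}
```

Submitting everything in suspend mode first means that a failure part-way through never leaves a command running without its callback tasks.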
The lsf callback application is called once when the lsf task starts and once when it finishes. When called for a task that has started, it updates the run table with the start time. It also checks the callback table to determine whether the higher layers wish to be notified when the job has started and, if so, calls the callback command to notify them. When called for a task that has finished, it checks the lsf output, parses some of the information, such as the return code, and stores that information in the run, job, and output tables. It then checks the callback table for any notifications requested by the higher layers.
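As an illustration of the return code parsing, the sketch below extracts the exit code from the text of a job report. It assumes the report contains either the line "Successfully completed." or a line of the form "Exited with exit code N.", which is the wording lsf job reports commonly use. The function name and the -1 fallback are hypothetical, and the real callback application may parse the output differently.

```cpp
#include <cstdlib>
#include <string>

// Extracts the return code from the text of an lsf job report.
// Returns the code N for "Exited with exit code N.", 0 for
// "Successfully completed.", and -1 if neither marker is found.
// Both marker strings are assumptions about the report wording.
int parseReturnCode(const std::string& report) {
    const std::string exitMarker = "Exited with exit code ";
    const std::string::size_type pos = report.find(exitMarker);
    if (pos != std::string::npos) {
        // atoi stops at the first non-digit, here the trailing period.
        return std::atoi(report.c_str() + pos + exitMarker.size());
    }
    if (report.find("Successfully completed.") != std::string::npos) {
        return 0;
    }
    return -1;
}
```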
Workflow
The Workflow system is a rule-based script execution system. Each script is considered a stage in the workflow. The workflow moves from one stage to the next by evaluating the rules set for each stage. The rules and stages for the workflow are stored in a database on mysql-node03. The Workflow system consists of a static library and a single executable. The library is used by other systems that wish to use the workflow, both for initial submission and for controlling various settings of the workflow runs. The executable is run after each stage of the workflow finishes execution in the batch submission system described above. It computes which new stages need to be executed based on the rules defined for the finished stage.
Database
The database is stored on mysql-node03 and the database name is rd_workflow. The database contains the following tables:
+-----------------------+
| Tables_in_rd_workflow |
+-----------------------+
| batchOpts             |
| batchOptsOverride     |
| conditions            |
| run                   |
| runArgs               |
| runScripts            |
| runScriptsOverride    |
| runSettings           |
| settings              |
| workflowScripts       |
| workflows             |
+-----------------------+
Each table performs a unique function, as follows:
- settings – This table contains name/value pairs of settings for the workflow system.
- workflows – This table contains a list of workflows registered.
- workflowScripts – This table contains a list of all the scripts (aka stages) for a particular registered workflow.
- conditions – This table lists all the conditions for a particular stage and which stage to execute next if a condition evaluates to true.
- batchOpts – This table contains name/value pairs of batch options to pass to the batch submission system for a particular stage.
- run – This table contains the information for an actual execution of a workflow.
- runScripts – This table contains which stage(s) are currently executing for a particular run.
- runScriptsOverride – This table contains a few overrides of the default stage settings, such as which batch queue to execute on.
- runSettings – This table contains name/value pairs of settings to override for a particular run.
- runArgs – This table contains the arguments to pass to the stages for a particular run.
- batchOptsOverride – This table contains name/value pairs of batch options to override for a particular stage in a particular run.
Library
The static library for the workflow system contains a set of functions that allow C++ programs to create new workflow runs and to specify settings for these runs. The source code and the available functions can be viewed in CVS; a compiled copy is available at ~glastrm/grits-cpp/src. The library makes use of the Qt libraries and uses qmake as its build system. The Qt library currently used is version 4. It is installed in ~glastrm/externals, which is a set of symlinks that use the AFS @sys variable to point to the appropriate platform installation on NFS.
Executable
The workflow system has a single executable called workflowCallback. It is executed by the batch submission system when each stage starts and when it stops executing. When called for the start of a stage, it simply updates the MySQL tables to record the start time. When called for the end of a stage, it also checks the conditions table to determine which other stages, if any, need to be executed. Any new stages found are submitted to the batch submission system, and the cycle repeats until the run has no more stages to execute.
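The step that picks the next stages can be sketched as below. The Condition type and the nextStages function are hypothetical, and representing a rule as a predicate over the finished stage's return code is an assumption made for illustration. The actual conditions are stored in the conditions table, and their expression format is not described here.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical in-memory form of one row of the conditions table.
struct Condition {
    std::string stage;               // stage the rule belongs to
    std::function<bool(int)> holds;  // rule, evaluated on the return code (assumption)
    std::string nextStage;           // stage to submit when the rule holds
};

// Returns every stage whose condition holds for the finished stage.
// An empty result means the run has no more stages to execute.
std::vector<std::string> nextStages(const std::vector<Condition>& conditions,
                                    const std::string& finishedStage,
                                    int returnCode) {
    std::vector<std::string> result;
    for (const Condition& c : conditions) {
        if (c.stage == finishedStage && c.holds(returnCode)) {
            result.push_back(c.nextStage);
        }
    }
    return result;
}
```

Because every matching condition contributes a stage, a single finished stage can fan out into several new stages, which is consistent with runScripts tracking more than one executing stage per run.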
Release Manager
Just like in the CMT version, the Release Manager consists of a few scripts that are registered in the workflow system as stages. Conditions set up on the workflow stages determine which stages to execute next. Additionally, the Release Manager uses several MySQL tables.
Database
The Release Manager database is stored on mysql-node03. The database name is rd_releasemgr. The database contains the following tables:
+---------------------------+
| Tables_in_rd_releasemgr   |
+---------------------------+
| build                     |
| buildPackage              |
| extLib                    |
| os                        |
| outputMessage             |
| package                   |
| settings                  |
| subPackage                |
| subPackageCompileFailures |
| variant                   |
| versionType               |
+---------------------------+
The tables are used as follows:
- build – This contains the platform dependent information for a particular build.
- buildPackage – This contains the platform independent information for a particular build.
- extLib – This contains the external libraries used for a particular build.
- os – This contains a list of operating systems currently supported.
- outputMessage – This contains the checkout or compile output for a particular build.
- package – This contains a list of checkout packages currently supported.
- settings – This contains both global and build specific settings for the Release Manager.
- subPackage – This contains a list of subPackages and their compile status for a particular build.
- subPackageCompileFailures – This contains the location of specific compile failures in the compile output for a particular subPackage.
- variant – This contains a list of supported variants.
- versionType – This contains a list of supported versionTypes.
The os, variant, versionType, and package tables define the possible combinations that the Release Manager can compile or keep track of. When new packages, operating systems, and so on are created, they need to be inserted into these tables. Each of these tables automatically generates unique IDs that are referenced by the other tables. The settings table holds both platform-independent and platform-dependent settings. The platform-independent settings have NULL values in the columns that reference the os and variant tables and only include values for the package and versionType tables. The platform-dependent settings have values that refer to all four tables: os, variant, versionType, and package.
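The distinction between the two kinds of settings can be sketched as a matching rule. The struct layouts and column names below are hypothetical, and the value 0 stands in for SQL NULL on the assumption that the generated IDs start at 1. The sketch only decides which rows apply to a given combination and does not say which row wins when both kinds define the same setting.

```cpp
#include <string>
#include <vector>

// Stand-in for SQL NULL in an ID column (assumption: real IDs start at 1).
const int kNull = 0;

// Hypothetical in-memory form of one row of the settings table.
// The column names are illustrative, not taken from the real schema.
struct Setting {
    int packageId;
    int versionTypeId;
    int osId;       // kNull for platform-independent settings
    int variantId;  // kNull for platform-independent settings
    std::string name;
    std::string value;
};

// The combination a build is made for.
struct BuildTarget {
    int packageId;
    int versionTypeId;
    int osId;
    int variantId;
};

bool isPlatformIndependent(const Setting& s) {
    return s.osId == kNull && s.variantId == kNull;
}

// A platform-independent row applies to every os and variant of its
// package and versionType; a platform-dependent row must match all four.
bool appliesTo(const Setting& s, const BuildTarget& t) {
    if (s.packageId != t.packageId || s.versionTypeId != t.versionTypeId) {
        return false;
    }
    if (isPlatformIndependent(s)) {
        return true;
    }
    return s.osId == t.osId && s.variantId == t.variantId;
}

// Collects the rows that apply to one combination, in table order.
std::vector<Setting> settingsFor(const std::vector<Setting>& table,
                                 const BuildTarget& target) {
    std::vector<Setting> result;
    for (const Setting& s : table) {
        if (appliesTo(s, target)) {
            result.push_back(s);
        }
    }
    return result;
}
```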
Executables
The executables, with a short description of what each one does, are as follows:
- checkoutBuild – This executable checks out code from CVS in preparation for building it.
- cleanBuild – This executable erases parts of a build that are not needed for execution (source code and temporary files).
- compileBuild – This executable builds the code checked out with checkoutBuild.
- createDoxygen – This executable generates doxygen output for a build.
- deleteBuild – This executable is an interactive program to trigger the RM to erase a build.
- eraseBuild – This executable erases everything belonging to a build and marks the build as hidden in the database.
- finishBuild – This executable marks the build as having finished.
- releaseManagerDaemon – This executable runs in the background submitting new builds to the workflow.
- testBuild – This executable runs the unit tests for a particular build.
- triggerBuild – This executable can be used to trigger a new build.
The releaseManagerDaemon runs for 24 hours on glastlnx20. During that period it checks for new builds to submit to the workflow based on the rules specified in the settings table. It can submit builds for every combination allowed by the os, variant, versionType, and package tables, as long as appropriate settings exist in the settings table. The daemon is started by trscrontab as the glastrm user on glastlnx20 every 24 hours, shortly after the previous instance has quit.
All executables take an argument of the form --buildId buildId, where buildId is the unique ID stored in the build table.
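A minimal sketch of reading that argument is shown below. It uses only the standard library rather than Qt, the function name parseBuildId is hypothetical, and treating only positive integers as valid IDs is an assumption. The real executables may handle their arguments differently.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Returns the value following "--buildId", or -1 if the option is
// missing, has no value, or the value is not a positive integer.
long parseBuildId(const std::vector<std::string>& args) {
    for (std::size_t i = 0; i + 1 < args.size(); ++i) {
        if (args[i] == "--buildId") {
            const std::string& text = args[i + 1];
            char* end = nullptr;
            const long id = std::strtol(text.c_str(), &end, 10);
            // Reject empty values and values with trailing non-digits.
            const bool valid = !text.empty() && *end == '\0' && id > 0;
            return valid ? id : -1;
        }
    }
    return -1;
}
```

A program would typically build the vector from argv, for example with std::vector<std::string>(argv, argv + argc), and stop with an error message when -1 is returned.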