Overview

Like the CMT Release Manager, the SCons Release Manager is split into several components: the batch submission system, the workflow system, and the release manager system. Unlike the CMT Release Manager, it is not written in Perl. It is written in C++ and relies mostly on the Qt library, which handles most platform-independent interactions. It also depends on the Boost Spirit and stx-exparser libraries, which are used only by the Linux components.

Batch Submission System

Like its CMT counterpart, the Batch Submission System accepts jobs from the upper layers of the system, submits them to LSF, monitors them, and notifies the upper layers of state changes such as finished jobs. Its information is stored in a MySQL database on mysql-node03.slac.stanford.edu. The Batch Submission System itself consists of a C++ library and two applications, which obtain their information from that database.

Database

The MySQL database for the Batch Submission System is named rd_lsf and can be accessed by the rd_lsf_u and glastreader users. It contains the following tables:

  • arguments – The arguments to pass to the program when it executes on LSF.
  • callback – The state changes about which the upper layer should be notified.
  • callbackArgs – The arguments to pass to the callback program when it is executed.
  • command – The command to execute on LSF.
  • job – Metadata about commands to be executed on LSF, such as when they were registered.
  • lsfOptions – Any options to pass to LSF when submitting.
  • output – The output of the LSF job.
  • run – The details of one attempt to execute a job on LSF.
  • settings – Global settings for the Batch Submission System.
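
The relationship between the job and run tables can be pictured with a small in-memory model. This is only a sketch: the page does not list the columns of any table, so every field name below (registered, started, finished, return_code, and so on) is a hypothetical placeholder, not the real schema.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Run:
    """One attempt at executing a job on LSF (one row in `run`)."""
    # All fields are hypothetical; the real columns are not documented here.
    started: Optional[datetime] = None
    finished: Optional[datetime] = None
    return_code: Optional[int] = None
    status: str = "new"


@dataclass
class Job:
    """A registered command plus its metadata (`job`, `command`, `arguments`, `lsfOptions`)."""
    command: str
    arguments: List[str] = field(default_factory=list)
    lsf_options: List[str] = field(default_factory=list)
    registered: datetime = field(default_factory=datetime.now)
    runs: List[Run] = field(default_factory=list)

    def new_run(self) -> Run:
        """Record a fresh attempt; a job that failed earlier simply gets another run."""
        run = Run()
        self.runs.append(run)
        return run
```

In this model a job keeps its identity across attempts, while each retry adds another Run, which mirrors the description of run as "one attempt".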

Applications

The Batch Submission System consists of two applications: the LSF daemon and the LSF callback application. The daemon is set up to run on glastlnx20 via trscron and checks the database for new jobs. For each job, it submits three LSF tasks. The first task is the actual command to be executed. The second task is a call to the LSF callback application, with the condition that it run as soon as the first task starts executing. The third task is a call to the LSF callback application, with the condition that it run as soon as the first task finishes executing.

All three tasks are submitted in suspend mode. Once all three have been submitted successfully, they are changed from suspend to pending mode so that they can actually start executing. If anything goes wrong at any point, the LSF tasks are killed and the run is marked as failed in the run table. The job is then put back into the queue to be attempted later as a new run.
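
The submit-suspended, then release-or-kill sequence described above can be sketched as follows. This is an illustration, not the daemon's actual code, which is written in C++. FakeScheduler, submit_run, and the "lsf_callback" command name are hypothetical. The comments note how each step could map onto the LSF command line (bsub -H, bsub -w, bresume, bkill), but the page does not say whether the daemon uses those commands or the LSF API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


class SubmitError(Exception):
    """Raised when the scheduler rejects a submission."""


@dataclass
class FakeScheduler:
    """In-memory stand-in for LSF, used only to make the sketch runnable."""
    fail_on: Optional[int] = None  # reject the Nth submission (1-based), for demos
    states: Dict[int, str] = field(default_factory=dict)
    next_id: int = 1
    calls: int = 0

    def submit_held(self, command: List[str], condition: Optional[str] = None) -> int:
        # On the LSF command line this could be `bsub -H [-w condition] command`.
        self.calls += 1
        if self.calls == self.fail_on:
            raise SubmitError(f"submission {self.calls} rejected")
        task_id = self.next_id
        self.next_id += 1
        self.states[task_id] = "SUSPENDED"
        return task_id

    def resume(self, task_id: int) -> None:  # compare `bresume`
        self.states[task_id] = "PENDING"

    def kill(self, task_id: int) -> None:  # compare `bkill`
        self.states[task_id] = "KILLED"


def submit_run(sched: FakeScheduler, command: List[str],
               callback: str = "lsf_callback") -> Tuple[str, List[int]]:
    """Submit the command and its two callback tasks; release them only if all succeed."""
    ids: List[int] = []
    try:
        main_id = sched.submit_held(command)
        ids.append(main_id)
        # Callback tasks are conditioned on the main task starting and ending.
        ids.append(sched.submit_held([callback, "started", str(main_id)],
                                     f"started({main_id})"))
        ids.append(sched.submit_held([callback, "finished", str(main_id)],
                                     f"ended({main_id})"))
    except SubmitError:
        for task_id in ids:  # roll back whatever was already submitted
            sched.kill(task_id)
        return "failed", ids
    for task_id in ids:  # all three exist, so let them run
        sched.resume(task_id)
    return "submitted", ids
```

Holding every task until all three exist avoids the case where the command starts before its callbacks are registered. On a "failed" result, the caller would mark the run as failed and leave the job queued for a later attempt.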

The LSF callback application is called once when the LSF task starts and once when it finishes. When called for a task that has started, it updates the run table with the start time. It also checks the callback table to determine whether the higher layers wish to be notified that the job has started and, if so, calls the callback command to notify them. When called for a task that has finished, it checks the LSF output, parses some of the information, such as the return code, and stores that information in the run, job, and output tables. It also checks for any callback information registered for the higher layers.
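
A minimal sketch of the two callback paths follows. It assumes that the LSF job report contains a line such as "Successfully completed." or "Exited with exit code N."; check those strings against the output your LSF version produces. The run dictionary, the callbacks mapping, and the function names are hypothetical stand-ins for the database tables and the real C++ application. The page says the parsed information goes into the run, job, and output tables, whereas the sketch writes to a single dictionary.

```python
import re
from datetime import datetime
from typing import Dict, List, Optional

# Assumed status lines in the LSF job report.
EXIT_RE = re.compile(r"^Exited with exit code (\d+)\.?[ \t]*$", re.MULTILINE)
OK_RE = re.compile(r"^Successfully completed\.?[ \t]*$", re.MULTILINE)


def parse_return_code(lsf_output: str) -> Optional[int]:
    """Return the job's exit code, or None if no status line is found."""
    match = EXIT_RE.search(lsf_output)
    if match:
        return int(match.group(1))
    if OK_RE.search(lsf_output):
        return 0
    return None


def handle_event(run: dict, event: str, callbacks: Dict[str, List[str]],
                 lsf_output: str = "",
                 now: Optional[datetime] = None) -> Optional[List[str]]:
    """Update `run` for a 'started' or 'finished' event.

    Returns the callback command registered for that event, or None if the
    higher layers did not ask to be notified.
    """
    now = now or datetime.now()
    if event == "started":
        run["started"] = now
    elif event == "finished":
        run["finished"] = now
        run["return_code"] = parse_return_code(lsf_output)
        run["output"] = lsf_output
    else:
        raise ValueError(f"unknown event: {event}")
    return callbacks.get(event)
```

The function only returns the command to run. Executing it (for example with subprocess) is left to the caller, so the bookkeeping can be tested without side effects.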
