This software simplifies the submission of batch jobs through a web interface. All the system needs is a script that submits the batch job; keeping the submission in a user-supplied script allows the submission command to be customized.

1) Webpage

To use this system, choose an experiment from https://pswww-dev.slac.stanford.edu/apps-dev/portal/select_experiment.php.

1.1) Batch Job Definitions

Under the Experiment tab, the Batch defs tab is where scripts are registered. Below is an example that uses the example script, which is described in detail in section 2. Each hash is a name given to the absolute path of a script, followed by an arbitrary number of space-separated parameters. Any number of hashes can be created; as a hash is entered, a new row appears.
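As a rough illustration of how a Batch defs entry breaks down, the sketch below splits one entry into the script path and its parameters. The function name parse_batch_def, the hash name and the path are hypothetical, and the real system may parse entries differently.

```python
def parse_batch_def(definition):
    """Split a Batch defs entry into (script_path, parameters).

    Assumption: the entry is the absolute script path followed by
    space-separated parameters, as described above.
    """
    parts = definition.split()
    if not parts:
        raise ValueError("empty batch definition")
    script_path, params = parts[0], parts[1:]
    if not script_path.startswith("/"):
        raise ValueError("script path must be absolute: {!r}".format(script_path))
    return script_path, params


# Hypothetical table of hash names mapped to their definitions.
batch_defs = {
    "example": "/path/to/submit.sh 10 fast",
}

script_path, params = parse_batch_def(batch_defs["example"])
print(script_path)
print(params)
```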

1.2) Batch Control

On the Run Tables tab, the Batch control tab is where a hash may be applied to experiment runs. In the drop-down menu of the Action column, any hash defined in Batch defs may be selected. There are plans to add more options, such as the ability to apply a hash to every run or to apply a hash to a run automatically in real time.

...

Once a hash is applied to a run, it will appear as shown below. In this case, the example job has finished, as shown by the DONE status (other statuses are described below). The last two columns also warrant some explanation.

1.2.1) Status

A job can have several statuses. Some come from LSF: PEND, RUN, DONE, and EXIT. Others describe situations specific to this system:

  • Request Sent - The first status of a job. If the job has not yet been submitted to LSF, Request Sent is shown so the user knows the request was received.
  • Persisted - The batch client (i.e. the system that submits the jobs for the user) is down.
  • LSF_Slow - A timeout has occurred (currently set at 5 seconds) while posting the job back to the batch manager. This is usually caused by the system hanging while submitting the job to LSF.

1.2.2) Actions

There are four different actions which can be applied to a script. They do the following if pressed:

...

 - Returns details for the current job by invoking the "bjobs -l" command on the LSF ID.
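For reference, the same lookup can be run by hand. The sketch below builds the bjobs -l command for a given LSF ID; the function names and the example ID are hypothetical, and job_details only works on a host where LSF is installed.

```python
import subprocess


def bjobs_command(lsf_id):
    """Build the argument list for 'bjobs -l <lsf_id>'."""
    # int() rejects anything that is not a plain job number.
    return ["bjobs", "-l", str(int(lsf_id))]


def job_details(lsf_id):
    """Run bjobs -l and return its text output (requires an LSF host)."""
    return subprocess.check_output(bjobs_command(lsf_id),
                                   universal_newlines=True)


# Only show the command here; 123456 is a made-up job ID.
print(" ".join(bjobs_command(123456)))
```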

1.2.3) Report

This is a customizable column that the submitted script can update by posting to the correct URL, which is stored in the environment variable BATCH_UPDATE_URL. The counters shown in the screenshot above were produced with the following syntax, posted from a for loop in a Python script:

...

As shown, the color of the output can also be customized. Whenever a POST is made for a submitted job (via the hash), only the posted fields of the JSON stored for that job are updated. One field of this JSON is counters; others include lsf_id, job_database_id, status and so on.
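The partial-update behavior can be pictured with the following minimal sketch. It assumes a shallow merge, so a posted counters value replaces the stored one as a whole; the server's actual merge logic is not documented here, and apply_post and the stored values are hypothetical. The payload has the same shape as the one posted by submit.py in section 2.2.

```python
def apply_post(stored, posted):
    """Return a copy of the stored job JSON with the posted keys overwritten.

    Assumption: the merge is shallow. Keys that are not posted keep
    their stored values.
    """
    updated = dict(stored)
    updated.update(posted)
    return updated


# Hypothetical stored state for one job.
stored = {"lsf_id": 123456, "status": "RUN", "counters": {}}

# A counter is either a bare value or a [value, color] pair.
posted = {"counters": {"Example Counter": [3, "red"], "Random Char": "Q"}}

updated = apply_post(stored, posted)
print(updated["status"])
print(updated["counters"])
```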

2) Hash Script

The following example scripts live at /reg/g/psdm/web/ws/test/apps/release/logbk_batch_client/test/submit.sh and /reg/g/psdm/web/ws/test/apps/release/logbk_batch_client/test/submit.py.

2.1) submit.sh

The script that the hash corresponds to is the one that submits the job via the bsub command. This script is shown below.

...

This script runs the batch job on psdebugq and stores the log files in /reg/g/psdm/web/ws/test/apps/release/logbk_batch_client/test/logs/<lsf_id>. It also forwards all of the arguments it receives to the Python script, submit.py (these are the parameters entered in the Batch defs tab).
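To make the shape of such a submission concrete, here is a hedged Python sketch that assembles a comparable bsub command line. It is not the contents of submit.sh: the function name, the use of one log file per job named with LSF's %J job-ID placeholder, and the way submit.py is invoked are all assumptions.

```python
TEST_DIR = "/reg/g/psdm/web/ws/test/apps/release/logbk_batch_client/test"


def bsub_command(script_args, queue="psdebugq", test_dir=TEST_DIR):
    """Build a bsub argument list that runs submit.py with script_args.

    Assumptions: -q selects the queue, -o names the log file, and LSF
    substitutes the job ID for %J in the log file name.
    """
    log_file = test_dir + "/logs/%J"
    python_script = test_dir + "/submit.py"
    return (["bsub", "-q", queue, "-o", log_file, "python", python_script]
            + list(script_args))


# Hypothetical arguments forwarded from the Batch defs parameters.
command = bsub_command(["param_a", "param_b"])
print(" ".join(command))
```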

2.2) submit.py

The Python script does the analysis and any other processing needed on the run data. Since this is only an example, submit.py is kept simple. It is shown below.

Code Block
from time import sleep
from requests import post
from sys import argv
from os import environ
from numpy import random
from string import ascii_uppercase

print 'This is a test function for the batch submitting.\n'

## Fetch the URL to POST to
update_url = environ.get('BATCH_UPDATE_URL')
print 'The update_url is:', update_url, '\n'

## Fetch the passed arguments as passed by submit.sh
params = argv
print 'The parameters passed are:'
for n, param in enumerate(params):
    print 'Param %d:' % (n + 1), param
print '\n'

## Run a loop, sleep a second, then POST
for i in range(10):
    sleep(1)
    rand_char = random.choice(list(ascii_uppercase))
 
    print 'Step: %d, %s' % (i + 1, rand_char)
    post(update_url, json={'counters' : {'Example Counter' : [i + 1, 'red'],
                                         'Random Char' : rand_char}})

2.3) Log File Output

The print statements write to the run's log file. The output of submit.py is shown below. The first parameter is the path to the Python script, the second is the experiment name, the third is the run number, and the rest are the parameters passed to the script.

...
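A script that needs these values by name could unpack them as in the sketch below, which follows the parameter order described above. The function name unpack_args and the example values are hypothetical, and treating the run number as an integer is an assumption.

```python
def unpack_args(argv):
    """Map the argument list onto named fields.

    Order, as described above: script path, experiment name,
    run number, then the Batch defs parameters.
    """
    if len(argv) < 3:
        raise ValueError("expected at least script path, experiment and run")
    return {
        "script_path": argv[0],
        "experiment": argv[1],
        "run": int(argv[2]),  # assumption: run numbers are integers
        "params": list(argv[3:]),
    }


# Hypothetical argument list; a real script would pass sys.argv instead.
example_argv = ["/path/to/submit.py", "example_experiment", "12", "a", "b"]
args = unpack_args(example_argv)
print(args["experiment"])
print(args["run"])
print(args["params"])
```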