
The Automatic Run Processor (or ARP, for short, because I want that to catch on) is a web service that simplifies the submission of batch jobs through a web interface. All the system needs is a script that submits the batch job; keeping the submission command in a script lets the user customize it freely.

1) Webpage

To use this system, choose an experiment from https://pswww.slac.stanford.edu/apps-dev/portal/select_experiment.php.

1.1) Batch Job Definitions

Under the Experiment tab, the Batch defs tab is where scripts are registered. Below is an example entry that uses the example script described in detail in the Hash Script section. Any number of hashes can be created; as a hash is entered, a new row appears.

1.1.1) Id

The Id is a unique identifier for each hash. It is used in the backend of the software to keep track of each hashtag/executable pair.

1.1.2) Hash

The name given to the executable to be run. As shown below, each hash can be selected under the Action column to apply the script that the hash represents to that run.

1.1.3) Executable

The absolute path to the batch script; an example is walked through in the Hash Script section below. This script must contain the batch job submission command (bsub), since keeping the command in a script gives the user the ability to customize the batch submission. Overall, it acts as a wrapper around the code that will do the analysis on the data, along with submitting the job.

1.1.4) Parameters

The parameters that will be passed to the executable. They must be space-separated. For example, entering param1 param2 passes those two strings to the script after the experiment name and run number, as the log output in section 2.3 shows.

1.1.5) Autorun

Used for experiments that are currently running. If checked, every new run that finishes (i.e. every run that ends after the box is checked) will have that hash executed on it, as the user who checked the box. Hovering over the checkbox shows who checked it.

1.1.6) Delete

Will remove the hash from the list and the database.

1.2) Batch Control

Under the Run Tables tab, the Batch control tab is where these hashes may be applied to experiment runs. In the drop-down menu of the Action column, any hash defined in Batch defs may be selected. There are plans to add more options, such as the ability to apply a hash to every run or to apply a hash to each new run automatically in real time.

Once a hash is applied to a run, it appears as shown below. Here, the example job has finished, as indicated by the DONE status (the other statuses are described below). The last two columns also warrant some explanation.

1.2.1) Status

There are multiple statuses that a job can have. A few come straight from LSF: PEND, RUN, DONE, and EXIT. There are also a few others that describe situations which can arise outside of LSF:

  • Request Sent - The first status of a job. If the job has yet to be submitted to LSF, Request Sent is shown to give the user feedback that the submission was received.
  • Persisted - The batch client (i.e. the system that submits the jobs for the user) is down.
  • LSF_Slow - A timeout occurred (currently set at 5 seconds) while posting the job back to the batch manager. It is most likely caused by the system hanging on the submission of the job to LSF.

1.2.2) Actions

There are four different actions that can be applied to a submitted job. Pressing each does the following:

 - Attempt to kill the job (via the bkill command, as sketched below). A green success message appears near the top-right of the page if the job is killed successfully, and a red failure message appears if it is not.

 - Delete the hash from the run. Note: this does not kill the job; it only removes it from the webpage.

 - Return the log file for the job. If no log file can be found, the result is blank.

 - Return details for the current job by invoking the "bjobs -l" command on the LSF ID.
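
For reference, here is a sketch of the LSF commands the kill and details actions correspond to; the job ID 123456 is a placeholder, and the exact invocation used by the service may differ:

bkill 123456      ## kill action: ask LSF to terminate the job
bjobs -l 123456   ## details action: long-format information on the job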

1.2.3) Report

This is a customizable column that can be updated by the user's script by posting to the correct URL, which is stored in the environment variable BATCH_UPDATE_URL. The counters shown in the screenshot above were generated by POSTing the following JSON in a for loop in a Python script:

{'counters' : {'Example Counter' : [i + 1, 'red'], 'Random Char' : rand_char}}

As shown, the color of the output can also be customized. Whenever a POST is done for a submitted job (via the hash), the stored JSON for that job is updated only with what is posted. One key of this JSON is counters; others include lsf_id, job_database_id, status, and so on.
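
As a minimal sketch of this partial-update behavior (the counter names, values, and color below are arbitrary, and the merge is assumed to apply only to the keys that are posted):

from os import environ
from requests import post

## The service exports the per-job update URL into the job's environment
update_url = environ['BATCH_UPDATE_URL']

## This POST stores two counters for the job...
post(update_url, json={'counters' : {'Example Counter' : [1, 'red'],
                                     'Random Char' : 'A'}})

## ...and a later POST that mentions only one counter updates just that
## entry; everything else stored for the job is left as it was.
post(update_url, json={'counters' : {'Example Counter' : [2, 'red']}})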

2) Hash Script

The following example scripts live at /reg/g/psdm/web/ws/test/apps/release/logbk_batch_client/test/submit.sh and /reg/g/psdm/web/ws/test/apps/release/logbk_batch_client/test/submit.py.

2.1) submit.sh

The script that the hash corresponds to is the one that submits the job via the bsub command. This script is shown below.

#!/bin/bash

## Location of the analysis code and logs
ABS_PATH=/reg/g/psdm/web/ws/test/apps/logbk_batch_client/test

## Submit to LSF on the psdebugq queue, write the log to logs/<LSF job id>.log
## (%J expands to the job ID) and forward all arguments to submit.py
bsub -q psdebugq -o $ABS_PATH/logs/%J.log python $ABS_PATH/submit.py "$@"

This script runs the batch job on the psdebugq queue and stores the log file as $ABS_PATH/logs/<lsf_id>.log. It also passes all of its arguments on to the Python script, submit.py; these would be the parameters entered in the Batch defs tab, preceded by the experiment name and run number.
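
Since the wrapper owns the bsub call, the submission can be customized freely. As a hypothetical variant (the queue name, slot count, and job name below are placeholders, not recommendations), the script could request a different queue and multiple slots:

#!/bin/bash

## Hypothetical variant: another queue, 8 slots, and an explicit job name
ABS_PATH=/reg/g/psdm/web/ws/test/apps/logbk_batch_client/test
bsub -q psanaq -n 8 -J arp_example -o $ABS_PATH/logs/%J.log \
    python $ABS_PATH/submit.py "$@"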

2.2) submit.py

The Python script is the code that does the analysis (and whatever else is necessary) on the run data. Since this is just an example, submit.py doesn't do anything that involved. It is shown below.

from time import sleep
from requests import post
from sys import argv
from os import environ
from numpy import random
from string import ascii_uppercase

print 'This is a test function for the batch submitting.\n'

## Fetch the URL to POST to
update_url = environ.get('BATCH_UPDATE_URL')
print 'The update_url is:', update_url, '\n'

## Fetch the passed arguments as passed by submit.sh
params = argv
print 'The parameters passed are:'
for n, param in enumerate(params):
    print 'Param %d:' % (n + 1), param
print '\n'

## Run a loop, sleep a second, then POST
for i in range(10):
    sleep(1)
    rand_char = random.choice(list(ascii_uppercase))
 
    print 'Step: %d, %s' % (i + 1, rand_char)
    post(update_url, json={'counters' : {'Example Counter' : [i + 1, 'red'],
                                         'Random Char' : rand_char}})

2.3) Log File Output

The print statements are written to the run's log file. The output of submit.py is shown below. The first parameter is the path to the Python script, the second is the experiment name, the third is the run number, and the rest are the parameters entered in the Batch defs tab.

This is a test function for the batch submitting.

The update_url is: http://psanaphi110:9843//ws/logbook/client_status/450 

The parameters passed are:
Param 1: /reg/g/psdm/web/ws/test/apps/logbk_batch_client/test/submit.py
Param 2: xppi0915
Param 3: 134261
Param 4: param1
Param 5: param2


Step: 1, R
Step: 2, J
Step: 3, T
Step: 4, P
Step: 5, S
Step: 6, B
Step: 7, E
Step: 8, K
Step: 9, X
Step: 10, V

3) Frequently Asked Questions (FAQ)

Is it possible to submit more than one job per run?

  • Yes, each run can accept multiple hashtags.

Can a submitted job submit other subjobs?

  • Yes, in a standard LSF fashion, BUT the ARP will not know about the subjobs.  Only jobs submitted through the ARP webpage are known to the ARP.

When using the 'kill' option, how does ARP know which jobs to kill?

  • The ARP keeps track of the hashtags for each run and the associated LSF jobid.  That information allows the ARP to kill jobs.

As far as I understand there is a json entry for each line which stores info, can one access this json entry somehow?

  • The JSON values are displayed in the ARP webpage automatically. To access them programmatically, use the kerberos endpoint ws-kerb/batch_manager/ws/logbook/batches_status/<experiment_id>. For example, the following gives the batch processing status for experiment id 302:

    import requests
    from krtc import KerberosTicket
    from urllib.parse import urlparse

    ## Batch processing status endpoint for experiment id 302
    ws_url = "https://pswww.slac.stanford.edu/ws-kerb/batch_manager/ws/logbook/batches_status/302"

    ## Build Kerberos authentication headers for the target host
    krbheaders = KerberosTicket("HTTP@" + urlparse(ws_url).hostname).getAuthHeaders()
    r = requests.get(ws_url, headers=krbheaders)
    print(r.json())

