Reason for change

Over the last week we have experienced a fairly serious network problem on the connection between the MOC and SLAC. The problem was exacerbated by a few missed contacts due to ARRs and (probably) by a suboptimal TDRSS schedule during the Space Shuttle mission. The result was several instances (at least one per day) of 'bunched-up' deliveries: 3 or 4 deliveries arriving for processing at the same time, which had to be manually throttled into L1 processing (the pipeline required 24/7 monitoring for the whole week).

We have implemented an automatic throttling mechanism for L1, which allows only a limited number of runs to start processing at the same time. It is implemented with a fixed number (3, for the time being) of global locks: for each run, the findChunks pre-empt script tries to acquire one of the available locks; if it succeeds, the run starts processing, and the lock is later released by mergeMeritChunks (at approximately 70% of the processing). If no throttling lock is available, the pre-empt script waits 10 minutes and tries again.
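For illustration, the locking logic amounts to the Python sketch below. This is a minimal sketch assuming a pool of lock files; the directory, constants, and function names are all hypothetical, and the actual L1Proc scripts may use a different locking back end.

    import os
    import time

    LOCK_DIR = "/tmp/l1throttle"   # hypothetical location of the lock pool
    N_LOCKS = 3                    # runs allowed to process concurrently
    RETRY_SECONDS = 600            # wait 10 minutes between attempts

    def try_acquire_lock(run_id):
        """Try to grab one of the N_LOCKS slots; return its path, or None."""
        os.makedirs(LOCK_DIR, exist_ok=True)
        for slot in range(N_LOCKS):
            path = os.path.join(LOCK_DIR, "slot%d.lock" % slot)
            try:
                # O_CREAT | O_EXCL makes the creation atomic: it fails
                # if the slot is already held by another run.
                fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
                os.write(fd, str(run_id).encode())
                os.close(fd)
                return path
            except FileExistsError:
                continue  # slot taken, try the next one
        return None

    def preempt(run_id):
        """findChunks pre-empt step: block until a throttling lock is free."""
        while True:
            lock = try_acquire_lock(run_id)
            if lock is not None:
                return lock  # the run may start processing
            time.sleep(RETRY_SECONDS)

    def release_lock(lock_path):
        """Called from mergeMeritChunks, at roughly 70% of the processing."""
        os.remove(lock_path)

The atomic create-or-fail step is what guarantees that no more than the configured number of runs can hold a lock, and hence be in the early stages of processing, at any given time.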

Test procedure

We have processed data runs through the DEV pipeline with this mechanism in place, and it worked as expected.
The overall performance of the system with throttling has not been tested: we do not expect the mechanism to introduce a general slowdown under normal operating conditions, but we will only know for sure once it is deployed.
The two main throttling parameters (the number of available locks and the point in the processing at which the throttling lock is released) can be adjusted by modifying a configuration file; this does not require uploading a new L1Proc.
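For illustration only, the relevant part of the configuration file might look like the snippet below; the key names and layout are hypothetical, not the actual ones used by L1Proc:

    # Throttling parameters (illustrative key names)
    nThrottleLocks   = 3     # number of runs allowed to process concurrently
    lockReleasePoint = 0.70  # fraction of processing at which mergeMeritChunks
                             # releases the throttling lock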

Rollback procedure

We can switch back to the previous version of L1Proc.

CCB Jira

SSC-200@JIRA

Details

L1Pipeline v1r73:
- Added an automatic throttling mechanism for L1Proc.