...

Only one delivery can process a run at a time. This is enforced by a lock file in the run directory on u41/L1. If findChunks fails, or there are permanent failures in the run, while another part of the run is waiting, the lock file has to be removed by hand. It should never be removed unless the only failures in the run are findChunks or permanent ones, or there's a deadlock. Even then you have to wear a helmet and sign a waiver.

HalfPipe locks

The halfpipe uses several locks that can get left behind when jobs die; they then cause trouble with retries, rollbacks, and other deliveries.

Halfpipe.mergEvt has a run lock. The lock files have names like /nfs/farm/g/glast/u42/ISOC-flight/Downlinks/lock/135d293d where the last component is the run number in hex. If one of these jobs fails, it may leave its lock behind, and the retry or rollback will then wait until the lock is removed by hand. Before removing it, make sure that there isn't actually another instance running.
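Going from a run number to its lock file name (and back) is just a hex conversion. A minimal sketch, assuming the lock name is the run number in lowercase hex with no padding, as in the example above; the function names are made up for illustration and are not part of the pipeline:

```shell
# Hypothetical helpers, not part of the halfpipe itself.
# The lock directory is the one quoted in the text above.
LOCK_DIR=/nfs/farm/g/glast/u42/ISOC-flight/Downlinks/lock

# Decimal run number -> lock file path (name = run number in lowercase hex).
run_lock_path() {
    printf '%s/%x\n' "$LOCK_DIR" "$1"
}

# Lock file name or full path -> decimal run number.
lock_to_run() {
    printf '%d\n' "0x$(basename "$1")"
}
```

For example, `run_lock_path 324872509` prints the path ending in 135d293d, and `lock_to_run 135d293d` prints 324872509. This only finds the file; whether it is safe to remove still depends on no other instance running.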

If cleanup dies, it leaves behind
/nfs/farm/g/glast/u42/ISOC-flight/Downlinks/lock/cleanup
and then all cleanup jobs just pend.

The same goes for launchOnline and
/nfs/farm/g/glast/u42/ISOC-flight/Downlinks/lock/launchOnline
In addition, if
/nfs/farm/g/glast/u42/ISOC-flight/Downlinks/stage/.decode
is there, launchOnline will fail.
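A quick way to see whether any of these three leftovers is present is to test for each path. A minimal sketch; the function name is hypothetical, and the optional argument for the Downlinks directory is an assumption added so it can be tried against a scratch directory:

```shell
# Hypothetical helper: print whichever of the cleanup lock, launchOnline lock,
# and stage/.decode marker currently exist.
# $1 (optional) = Downlinks directory; defaults to the path from the text.
list_halfpipe_locks() {
    base=${1:-/nfs/farm/g/glast/u42/ISOC-flight/Downlinks}
    for f in "$base/lock/cleanup" "$base/lock/launchOnline" "$base/stage/.decode"; do
        if [ -e "$f" ]; then
            echo "$f"
        fi
    done
}
```

No output means none of the three is there. It only reports; it does not tell you whether the job that created a lock is still alive.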

findChunks

This process is now automatically retried like most of the others. When it fails, it attempts to remove the run lock (see above) and the throttle lock (next section) itself, and usually succeeds. But if it fails harder than usual, you might still have to remove them by hand. Also, you'll probably have to:
mv /nfs/farm/g/glast/u41/L1/${runId}/${runId}${deliveryid}_chunkList.txt /nfs/farm/g/glast/u41/L1/${runId}/${runId}${deliveryid}_chunkList.txt.tmp
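The same move can be wrapped so that it refuses to run when the chunk list isn't there. A minimal sketch, assuming the file name pattern is exactly as in the command above; the function name and the L1_DIR override are hypothetical and exist only to make it easy to try on a scratch directory:

```shell
# Hypothetical wrapper around the mv above.
# $1 = runId, $2 = deliveryid; L1_DIR overrides the default L1 directory.
set_aside_chunklist() {
    runId=$1
    deliveryid=$2
    list="${L1_DIR:-/nfs/farm/g/glast/u41/L1}/${runId}/${runId}${deliveryid}_chunkList.txt"
    if [ -e "$list" ]; then
        # Same effect as the manual command: rename to <name>.tmp
        mv "$list" "$list.tmp"
    else
        echo "no chunk list at $list" >&2
        return 1
    fi
}
```

Call it as `set_aside_chunklist "$runId" "$deliveryid"`; a non-zero return means nothing was moved.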

...