...

Remember to escape the square brackets if you are in tcsh.

When NOT to roll back

As of 1.80, there is a bug that causes duplicated events during crumb merges in some circumstances.

Do not roll back doCrumb streams, or digitization, fakeFT2, or setupCrumbs process instances that completed successfully. It should be safe to roll back recon process instances; digitization, fakeFT2, or setupCrumbs process instances that failed or were terminated; and doChunk streams.

When in doubt, roll back the whole doChunk stream.
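
The rule of thumb above can be written down as a small lookup, which may be handy as a checklist before issuing a rollback. This is only a sketch in POSIX sh (not tcsh), not part of the pipeline tooling: the function name can_roll_back, its arguments, and the status labels success, failed, and terminated are all made up for illustration and may not match what the pipeline interface shows.

```shell
# Sketch only: encodes the rollback rule of thumb from the text above.
# Hypothetical usage: can_roll_back KIND NAME [STATUS]
#   KIND   = stream | process
#   NAME   = doCrumb, doChunk, recon, digitization, fakeFT2, setupCrumbs
#   STATUS = success | failed | terminated (assumed labels; ignored for streams)
can_roll_back() {
    kind=${1:-}
    name=${2:-}
    status=${3:-}
    case "$kind:$name" in
        stream:doCrumb)
            echo no ;;      # never roll back doCrumb streams
        stream:doChunk)
            echo yes ;;     # doChunk streams are safe to roll back
        process:recon)
            echo yes ;;     # recon process instances are safe
        process:digitization|process:fakeFT2|process:setupCrumbs)
            case "$status" in
                failed|terminated) echo yes ;;
                *) echo no ;;   # successful instances must not be rolled back
            esac ;;
        *)
            # Anything not covered by the rule: when in doubt,
            # roll back the whole doChunk stream instead.
            echo unsure ;;
    esac
}
```

For example, `can_roll_back process fakeFT2 failed` prints `yes`, while `can_roll_back process fakeFT2 success` prints `no`.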

When to roll back

Wait for the "setStatus" stream to have run.

...

  • Look at the dontCleanUp file. It should contain a list of all merge processes that missed files, and which files were missing. It's possible that it will get garbled by multiple jobs writing to it at once, so if it doesn't seem to make sense, you can still get the information by following the steps below.

...

Only one delivery can process a run at a time. This is enforced by a lock file in the run directory on u52/L1. If findChunks fails, or there are permanent failures in the run, and another part of the run is waiting, the lock file has to be removed by hand. It should never be removed unless the only failures in the run are findChunks failures or permanent ones, or there's a deadlock. Even then you have to wear a helmet and sign a waiver.

findChunks

HOPEFULLY MOSTLY OBSOLETE, see next paragraph: This process is not automatically retried like most of the others. If it fails, you have to roll it back by hand, and also remove the run lock (see above) and the throttle lock (next section) by hand. You'll probably also have to run:
mv /nfs/farm/g/glast/u52/L1/${runId}/${runId}_${deliveryid}_chunkList.txt /nfs/farm/g/glast/u52/L1/${runId}/${runId}_${deliveryid}_chunkList.txt.tmp

It does get retried now. When it fails, it attempts to perform the steps in the paragraph above (and usually succeeds). But if it fails harder than usual, you might still have to do that by hand.
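
If the rename above does have to be done by hand, a slightly more defensive version might look like the sketch below. It is written for POSIX sh rather than tcsh. The function name set_aside_chunk_list and the L1_ROOT override are hypothetical, added only so the snippet can be tried against a scratch directory, and the refusal to overwrite an existing .tmp file is an extra precaution assumed here, not something the pipeline requires.

```shell
# Sketch only: performs the same mv as above, with two sanity checks.
# Hypothetical usage: set_aside_chunk_list RUNID DELIVERYID
# L1_ROOT is an assumed override; by default the real u52/L1 path is used.
set_aside_chunk_list() {
    runId=$1
    deliveryid=$2
    root=${L1_ROOT:-/nfs/farm/g/glast/u52/L1}
    list="${root}/${runId}/${runId}_${deliveryid}_chunkList.txt"

    # Nothing to do if the chunk list is not there.
    if [ ! -f "$list" ]; then
        echo "no chunk list at $list" >&2
        return 1
    fi

    # Assumed precaution: do not clobber an earlier .tmp copy.
    if [ -e "${list}.tmp" ]; then
        echo "${list}.tmp already exists; not overwriting" >&2
        return 1
    fi

    mv "$list" "${list}.tmp"
}
```

Call it with the run and delivery identifiers of the failed findChunks instance; it returns non-zero and changes nothing if either check fails.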

#WBF I think we should have a section on chunkLists. If anyone beats me to writing it, go ahead!

...