These instructions were provided by Maria Elena after the data processing issues on 11/29-11/30 2017. While the instructions are specific to runs on those days, the process should be general. I've annotated them with some comments and questions as well.
Some commentary from Maria Elena on applicability of the following:
The recipe is also valid for the old runs (from 11/29). I understand that they were repiped, but it'd be nice to also clean up what was run on the DataProcessing page, so that all the alarms etc. are in sync.
BTW, the recipe below also works after repiping. One can still put the repiped chunks in the staging location (/nfs/farm/g/glast/u28/stage/Delivery/Run/), make/deploy the new chunklists, and roll back the failed L1 streams, so that everything stays clean and tidy.
$ cd /nfs/farm/g/glast/u42/ISOC-flight/Downlinks/171130020 # 171130020 is the downlink ID
$ touch haltOnline
$ touch haltL1 # this will prevent the error on launching a new L1 stream
Roll back halfPipe from launchChunks. Do the same for both downlink IDs, 171130019 and 171130020. This will produce new chunkLists for run 533759413 and clean up the halfPipe failures.
Obtain a release of svac/L1Pipeline. Wait for halfPipe to finish for run 533759413.
(What exactly does "obtain a release" mean? Are we just getting a tagged copy of the code from the repository? The answer seems to be yes.)
Go to your release of svac/L1Pipeline and execute:
$ ./newChunkList.py /nfs/farm/g/glast/u28/stage/171130019/r0533759413
$ ./newChunkList.py /nfs/farm/g/glast/u28/stage/171130020/r0533759413
$ cp r0533759413*.txt.new /nfs/farm/g/glast/u41/L1/runs/533/r0533759413/
(you will need glastraw access to write to the folder on u41)
$ cd /nfs/farm/g/glast/u41/L1/runs/533/r0533759413/
$ mv r0533759413_171130020_chunkList.txt r0533759413_171130020_chunkList.txt.old
$ mv r0533759413_171130020_chunkList.txt.new r0533759413_171130020_chunkList.txt
$ mv r0533759413_171130019_chunkList.txt r0533759413_171130019_chunkList.txt.old
$ mv r0533759413_171130019_chunkList.txt.new r0533759413_171130019_chunkList.txt
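The four mv commands above follow the same pattern for each downlink ID, so they could be wrapped in a small loop. This is only a sketch: the function name swap_chunklists is hypothetical (it is not part of svac/L1Pipeline), and it assumes the .txt.new files have already been copied into the run directory as in the cp step.

```shell
# Hypothetical helper (not part of svac/L1Pipeline): for each downlink ID,
# keep the current chunkList as .old and move the .new file into its place.
# Usage: swap_chunklists RUN_DIR RUN DOWNLINK_ID [DOWNLINK_ID ...]
swap_chunklists() {
    run_dir=$1; run=$2; shift 2
    for dl in "$@"; do
        list="$run_dir/${run}_${dl}_chunkList.txt"
        # Stop if the replacement is missing, so that this list is never
        # moved aside without a new one to take its place.
        if [ ! -f "$list.new" ]; then
            echo "missing $list.new" >&2
            return 1
        fi
        if [ -f "$list" ]; then
            mv "$list" "$list.old" || return 1
        fi
        mv "$list.new" "$list" || return 1
    done
    return 0
}
```

For the run in this example, the call would be: swap_chunklists /nfs/farm/g/glast/u41/L1/runs/533/r0533759413 r0533759413 171130019 171130020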
Roll back 171130020/r0533759413 and 171130019/r0533759413 from findChunks.
WARNING!!! DO NOT REMOVE THE RUN LOCK YET!!!
Suspend the findChunks job for the piece that you want to be executed last (the order shouldn't matter, but I would do it in the same order as the halfPipe).
Remove the lock:
$ cd /nfs/farm/g/glast/u41/L1/runs/533/r0533759413/
$ rm r0533759413.lock
Resume the suspended job once the first findChunks has started.
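Since removing the lock too early is the main risk in this recipe, it may be worth running a quick sanity check on the run directory just before the rm step. The sketch below is an assumption about what is useful to check, not something from Maria Elena's instructions, and the function name ready_to_unlock is hypothetical. It cannot tell whether the findChunks job has been suspended; that still has to be confirmed by hand.

```shell
# Hypothetical pre-check (not part of the original recipe): returns 0 only if
# the lock file is still present and every chunkList has been swapped in.
# Usage: ready_to_unlock RUN_DIR RUN DOWNLINK_ID [DOWNLINK_ID ...]
ready_to_unlock() {
    run_dir=$1; run=$2; shift 2
    # If the lock is already gone, something unexpected happened; stop here.
    [ -f "$run_dir/$run.lock" ] || { echo "no lock file for $run" >&2; return 1; }
    for dl in "$@"; do
        list="$run_dir/${run}_${dl}_chunkList.txt"
        [ -f "$list" ] || { echo "missing $list" >&2; return 1; }
        # A leftover .new file means the swap for this downlink was not completed.
        [ ! -e "$list.new" ] || { echo "leftover $list.new" >&2; return 1; }
    done
    return 0
}
```

For the run in this example, the call would be: ready_to_unlock /nfs/farm/g/glast/u41/L1/runs/533/r0533759413 r0533759413 171130019 171130020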
That's it. Hopefully this fixes everything.
Here are the full e-mail threads, with details of some specific problems that were (possibly) unique to this situation but may be instructive in the future: