These instructions were provided by Maria Elena after the data processing issues on 11/29-11/30 2017. While the instructions are specific to runs on those days, the process should be general. I've annotated them with some comments and questions as well.

 

Some commentary from Maria Elena on applicability of the following:

The recipe is also valid for the old runs (from 11/29). I understand that they were repiped, but it'd be nice to also clean up what was run on the DataProcessing page, so that all the alarms etc. are in sync.

BTW, the recipe below also works after repiping. One can still put the repiped chunks in the staging location (/nfs/farm/g/glast/u28/stage/Delivery/Run/), make and deploy the new chunk lists, and roll back the failed L1 streams, so that everything stays clean and tidy.

 

Before Rolling Back the HalfPipe

$ cd /nfs/farm/g/glast/u42/ISOC-flight/Downlinks/171130020 # downlink ID

$ touch haltOnline

$ touch haltL1 # this will prevent the error on launching a new L1 stream
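
The two touch commands above can be wrapped in a small function so that a mistyped or unmounted downlink directory is caught before anything is created. This is only a sketch: the function name halt_downlink is hypothetical and is not part of the ISOC or L1Pipeline tooling.

```shell
#!/bin/sh
# Hypothetical helper (not part of any ISOC tool): halt_downlink DOWNLINK_DIR
# Creates the haltOnline and haltL1 marker files in one downlink directory.
halt_downlink() {
    dir=$1
    # Stop early if the downlink directory is mistyped or not mounted.
    if [ ! -d "$dir" ]; then
        echo "no such downlink directory: $dir" >&2
        return 1
    fi
    touch "$dir/haltOnline" "$dir/haltL1"
}
```

For the downlink in this example it would be called as: halt_downlink /nfs/farm/g/glast/u42/ISOC-flight/Downlinks/171130020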

 

How to Roll Back the HalfPipe

Roll back the halfPipe from launchChunks for both downlinks, 171130019 and 171130020. This produces new chunk lists for run 533759413 and cleans up the halfPipe failures.

Fix the Chunk Lists

Obtain a release of svac/L1Pipeline. Wait for the halfPipe to finish for run 533759413 before continuing.

(What exactly does this mean? Are we just getting a tagged copy of the code from the repository? The answer seems to be yes.)

A copy is installed here: /afs/slac.stanford.edu/g/glast/ground/PipelineConfig/Level1/5.7

 

Go to your release of svac/L1Pipeline and execute:

$ ./newChunkList.py /nfs/farm/g/glast/u28/stage/171130019/r0533759413

$ ./newChunkList.py /nfs/farm/g/glast/u28/stage/171130020/r0533759413

 

$ cp r0533759413*.txt.new /nfs/farm/g/glast/u41/L1/runs/533/r0533759413/

(you will need glastraw access to write to the folder on u41)

 

$ cd /nfs/farm/g/glast/u41/L1/runs/533/r0533759413/

$ mv r0533759413_171130020_chunkList.txt r0533759413_171130020_chunkList.txt.old

$ mv r0533759413_171130020_chunkList.txt.new r0533759413_171130020_chunkList.txt

$ mv r0533759413_171130019_chunkList.txt r0533759413_171130019_chunkList.txt.old

$ mv r0533759413_171130019_chunkList.txt.new r0533759413_171130019_chunkList.txt
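
The four mv commands above follow one pattern per downlink: keep the current chunk list as .old, then promote the .new file. A minimal sketch of that pattern as a loop is shown here. The function name swap_chunklists and its argument order are hypothetical. It assumes the file naming seen above (RUNID_DOWNLINK_chunkList.txt) and that the .new files have already been copied into the run directory.

```shell
#!/bin/sh
# Hypothetical helper: swap_chunklists RUN_DIR RUN_ID DOWNLINK_ID...
# Assumes files are named RUN_ID_DOWNLINK_chunkList.txt, as in the steps above.
swap_chunklists() {
    run_dir=$1
    run_id=$2
    shift 2
    # First pass: refuse to change anything unless every new list is present.
    for downlink in "$@"; do
        if [ ! -f "$run_dir/${run_id}_${downlink}_chunkList.txt.new" ]; then
            echo "missing new chunk list for $downlink, nothing changed" >&2
            return 1
        fi
    done
    # Second pass: keep the current list as .old, then promote .new.
    for downlink in "$@"; do
        list="$run_dir/${run_id}_${downlink}_chunkList.txt"
        if [ -f "$list" ]; then
            mv "$list" "$list.old"
        fi
        mv "$list.new" "$list"
    done
}
```

For this run it would be called as: swap_chunklists /nfs/farm/g/glast/u41/L1/runs/533/r0533759413 r0533759413 171130019 171130020

Checking for every .new file before moving anything avoids leaving the run directory with one downlink swapped and the other not.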

Fix the Failed L1Proc Streams

Roll back 171130020/r0533759413 and 171130019/r0533759413 from findChunks.

WARNING!!! DO NOT REMOVE THE RUN LOCK YET!!!

 

Suspend the findChunks job for the piece that you want to be executed last (the order shouldn't matter, but I would do it in the same order as the halfPipe).

 

Remove the lock:

$ cd /nfs/farm/g/glast/u41/L1/runs/533/r0533759413/

$ rm r0533759413.lock

 

Once the first findChunks has started, resume the suspended job.

 

That's it. Hopefully this fixes everything.

 

Full e-mail threads

Here are the full e-mail threads, with details of some of the specific problems that were (possibly) unique to this situation but may be instructive in the future.

 
