The other parameter, ReaperDelayMinutes, is set on a per submission source basis and is found in each of the later sections on that page.  It controls how long a process must be dead before the reaper kills it, and is typically set to 60 or 120 minutes.  NOTE (from Warren): I'm not convinced that ReaperDelayMinutes actually does anything. Restarting the pipeline set both parameters back to their defaults.

 

HalfPipe marked as Failed but L1 started successfully

This came up with delivery 180409011.  In this case the launchL1 task was started simultaneously on two different hosts in the batch queue.  The first one ran successfully, but because the second one failed with an error saying the L1 stream already existed, the process was marked as failed.  To clean up the data display, do the following:

1) Create a lock file in the appropriate downlink directory: /nfs/farm/g/glast/u42/ISOC-flight/Downlinks/<downlink id>/haltL1

2) Check whether the input directory has moved. It is normally at /nfs/farm/g/glast/u28/stage/<downlink id>, but L1 may have moved it to /nfs/farm/g/glast/u41/L1/deliveries/<YYMM>/<downlink id>, where YYMM is the first 4 digits of the downlink id.  If it has moved, create a symbolic link at the old location, e.g. ln -s /nfs/farm/g/glast/u41/L1/deliveries/1804/180409011 /nfs/farm/g/glast/u28/stage/180409011

3) Roll back the launchL1 process so it runs properly.

4) Remove the lock file and, if you created one, the symbolic link.
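
The file-system parts of these steps (1, 2 and 4) could be scripted roughly as below. This is only a sketch, not an existing ISOC tool: the function names prepare_rollback and finish_rollback and the *_ROOT override variables are hypothetical, and the default paths are simply the ones quoted in the steps. Step 3 (rolling back launchL1) is deliberately left as a manual action between the two calls.

```shell
#!/bin/sh
# Sketch only. Default paths are taken from the steps above; the function
# names and the *_ROOT override variables are hypothetical.

DOWNLINKS_ROOT="${DOWNLINKS_ROOT:-/nfs/farm/g/glast/u42/ISOC-flight/Downlinks}"
STAGE_ROOT="${STAGE_ROOT:-/nfs/farm/g/glast/u28/stage}"
DELIVERIES_ROOT="${DELIVERIES_ROOT:-/nfs/farm/g/glast/u41/L1/deliveries}"

# Steps 1 and 2: create the lock file, then link the input directory back
# to its old location if L1 has already moved it.
prepare_rollback() {
    id="$1"
    yymm=$(printf '%s' "$id" | cut -c1-4)   # first 4 digits of the downlink id
    touch "$DOWNLINKS_ROOT/$id/haltL1" || return 1
    if [ ! -e "$STAGE_ROOT/$id" ] && [ -d "$DELIVERIES_ROOT/$yymm/$id" ]; then
        ln -s "$DELIVERIES_ROOT/$yymm/$id" "$STAGE_ROOT/$id"
    fi
}

# Step 4: remove the lock file, and the symbolic link if one exists.
# A real (non-link) stage directory is left untouched.
finish_rollback() {
    id="$1"
    rm -f "$DOWNLINKS_ROOT/$id/haltL1"
    if [ -L "$STAGE_ROOT/$id" ]; then
        rm "$STAGE_ROOT/$id"
    fi
}
```

Typical use would be to run prepare_rollback 180409011, roll back launchL1 by hand, and then run finish_rollback 180409011. Checking for a symbolic link with -L before removing anything means the cleanup cannot delete a real input directory that was never moved.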