...

1) How do we know that? Is it just because the logs complain that they are not there? How do we check for their existence?
2) Where do they live to copy them from?
3) I'm assuming they need to get copied to the directory that it can't find them in (root://glast-rdr.slac.stanford.edu//glast/Scratch/l1Stage/runs/542/r0542185591/e00000000000002729588/event/ in this case). Or do they go somewhere else?
4) How do we perform the copy?

See #10 below

5) I'm guessing we'll need to move/rename the existing chunkList file so a new one can be created at this point? Is this correct? (BTW the notes say we should have a section on chunkList files that no one has written yet)
6) Where do we roll back to get everything going again? Just a general rollback on the command line? Or is there a specific task that can be rolled back to kick everything off properly again?

...

Each has a directory corresponding to the chunk number, which is the second-to-last component in the path you gave above.
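
A quick way to confirm whether the chunk files actually exist out on xrootd (question 1 above) is to list them with the xrdfs client. This is a sketch only: it assumes xrdfs is installed alongside the xrdcp binary used elsewhere on this page, and it reuses the run and chunk numbers from the path in question 3.

# list the per-chunk directories for the run
/afs/slac.stanford.edu/g/glast/applications/xrootd/PROD/bin/xrdfs glast-rdr.slac.stanford.edu ls /glast/Scratch/l1Stage/runs/542/r0542185591
# long-list the event directory for one chunk; an error or empty listing means the files are missing
/afs/slac.stanford.edu/g/glast/applications/xrootd/PROD/bin/xrdfs glast-rdr.slac.stanford.edu ls -l /glast/Scratch/l1Stage/runs/542/r0542185591/e00000000000002729588/event/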

I use a command like this to generate a script to move them back to NFS (printing $5 before $4 swaps the source and destination of each xrdcp line found in the log, so the generated commands copy the files from xrootd back to NFS):

"awk '/^\ /afs.*xrdcp/ {print  $1, $2, $3, $5, $4}' /nfs/farm/g/glast/u41/L1/logs/PROD/L1Proc/5.5/doRun/findChunks/160xxxxxx/515xxx/011/485xxxxxx/001xxx/694/archive/198092024/logFile.txt

Final thoughts from Warren on this particular issue:

There are some things here I don't understand; LSF is definitely screwing up, and maybe the pipeline too, but:

...

Here's an example of the output for the issue in https://www-glast.stanford.edu/protected/mail/opsprob/15272.html

Example:
% awk '/^\/afs.*xrdcp/{print  $1, $2, $3, $5, $4}' /nfs/farm/g/glast/u41/L1/logs/PROD/L1Proc/5.9/doRun/findChunks/220xxxxxx/805xxx/013/681xxxxxx/413xxx/156/archive/342431233/logFile.txt | tee cp.sh
/afs/slac.stanford.edu/g/glast/applications/xrootd/PROD/bin/xrdcp -np -f root://glast-rdr.slac.stanford.edu//glast/Scratch/l1Stage/runs/681/r0681413156/e00000000000009129671/event/r0681413156_e00000000000009129671_v342431233_event.evt /nfs/farm/g/glast/u28/stage/220805013/r0681413156/r0681413156-e00000000000009129671.evt
/afs/slac.stanford.edu/g/glast/applications/xrootd/PROD/bin/xrdcp -np -f root://glast-rdr.slac.stanford.edu//glast/Scratch/l1Stage/runs/681/r0681413156/e00000000000008510966/event/r0681413156_e00000000000008510966_v342431233_event.evt /nfs/farm/g/glast/u28/stage/220805013/r0681413156/r0681413156-e00000000000008510966.evt
% cat cp.sh
/afs/slac.stanford.edu/g/glast/applications/xrootd/PROD/bin/xrdcp -np -f root://glast-rdr.slac.stanford.edu//glast/Scratch/l1Stage/runs/681/r0681413156/e00000000000009129671/event/r0681413156_e00000000000009129671_v342431233_event.evt /nfs/farm/g/glast/u28/stage/220805013/r0681413156/r0681413156-e00000000000009129671.evt
/afs/slac.stanford.edu/g/glast/applications/xrootd/PROD/bin/xrdcp -np -f root://glast-rdr.slac.stanford.edu//glast/Scratch/l1Stage/runs/681/r0681413156/e00000000000008510966/event/r0681413156_e00000000000008510966_v342431233_event.evt /nfs/farm/g/glast/u28/stage/220805013/r0681413156/r0681413156-e00000000000008510966.evt

From Michael:

The log file is from the findChunks instance that copied the evt files but didn't finish for whatever reason.

It's prudent to check that the script looks reasonable. I execute it as myself; it has never been necessary to become glastraw. After that, remove the run lock and roll back findChunks.
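
As a minimal sketch of that sequence, assuming the awk output was saved as cp.sh in the current directory (the lock removal and the findChunks rollback depend on the run, so they are only noted as a comment here):

# sanity-check: every line should be an xrdcp from root://... back to /nfs/...
cat cp.sh
# run it as yourself; becoming glastraw has not been necessary
sh cp.sh
# then remove the run lock and roll back findChunks as described above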

Throttling the Pipeline

If you ever need to limit the amount of work being done on the pipeline (as we wanted to do for the LAT restart in April 2018), you can manually create throttle locks to limit the number of runs that can be worked on simultaneously. Right now the pipeline is set to allow up to six runs to be worked on at once. To lower that limit, create lock files in the /nfs/farm/g/glast/u41/L1/throttle directory of the form 0.lock, 1.lock, ..., up to 5.lock. The contents can be anything you want; it is just the presence of the file that stops things from running. Each lock file created reduces the number of simultaneous runs by one, and creating all six will stop the pipeline from processing anything.
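
For example, to cut the limit from six simultaneous runs down to two, create four of the lock files; removing them restores the full limit. The directory and file names are the ones given above, and the contents are arbitrary (a sketch, not an exact recorded procedure):

cd /nfs/farm/g/glast/u41/L1/throttle
# 6 allowed runs minus 4 locks = 2 simultaneous runs
touch 0.lock 1.lock 2.lock 3.lock
# later, restore the full limit of six runs
rm 0.lock 1.lock 2.lock 3.lock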

...