...

cp /nfs/farm/g/glast/u41/L1/runs/`echo $rn | cut -c1-3`/r0$rn/r0${rn}_${dn1}_v000_magic7Hp.txt $dn1/magic7_$dn1.txt

cp /nfs/farm/g/glast/u41/L1/runs/`echo $rn | cut -c1-3`/r0$rn/r0${rn}_${dn2}_v001_magic7Hp.txt $dn2/magic7_$dn2.txt

Note that the version number might be larger than v000 or v001, and there might be multiple versions for any given delivery if the run had to be rolled back, so you should check the run directory to see what the largest version number is for each delivery.
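A minimal sketch of that check, assuming a POSIX shell (sh or bash) and assuming the version field is always zero-padded to three digits, as in v000 and v001. The function name latest_magic7 is hypothetical and is not part of the L1 tools.

```shell
# Hypothetical helper: print the highest-versioned magic7Hp file for one delivery.
# Arguments: run directory, run number ($rn), delivery number ($dn1 or $dn2).
latest_magic7() {
    run_dir=$1; rn=$2; dn=$3
    # Assumes zero-padded versions (v000, v001, ...), so a plain sort orders them.
    # Prints nothing if no file matches.
    ls "$run_dir"/r0"${rn}"_"${dn}"_v*_magic7Hp.txt 2>/dev/null | sort | tail -n 1
}
```

For example, latest_magic7 /nfs/farm/g/glast/u41/L1/runs/`echo $rn | cut -c1-3`/r0$rn $rn $dn1 prints the file to use as the source of the first cp command above.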

...

8) Then follow the instructions on this page - Instructions for rolling back a halfPipe stream and subsequent L1 stream - starting at the "Fix the Chunk Lists" section, with the following variations in the "Fix the Failed L1Proc Streams" section.

a) There may not be any run locks.  If so, just roll back the stream you want to start first.  (If there is a run lock, you don't need to make any changes to the instructions on the linked page.)

...

c) Roll back all the other streams.

Note: if you roll back from the findChunks process, all the old messages for the run will be there, including the error messages.  Be sure to roll back from findChunks only.  Do not roll back the entire stream from the button at the top of the page.

9) Once the streams are rolled back, you need to create a dummy chunktoken, as the RePipe process doesn't create one.

touch /nfs/farm/g/glast/u28/stage/chunktokens/r0$rn/pippo

This file and directory also need to be removed manually once the last of the deliveries has started processing.  If you don't, the cleanupCompleteRun process will fail.  If you forget, remove the chunk token and roll back cleanupCompleteRun.
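The manual cleanup of the dummy token can be sketched as below, assuming a POSIX shell. The function name remove_chunktoken is hypothetical. Using rmdir rather than rm -r is a deliberate choice here, so the directory is only removed if nothing else is left in it.

```shell
# Hypothetical helper: remove the dummy chunk token and its run directory.
# Arguments: chunktokens base directory, run number ($rn).
remove_chunktoken() {
    token_dir=$1/r0$2
    # Remove the dummy token created with touch in step 9.
    rm -f "$token_dir/pippo"
    # rmdir fails (and leaves the directory alone) if any other file remains.
    rmdir "$token_dir"
}
```

It would be called as remove_chunktoken /nfs/farm/g/glast/u28/stage/chunktokens $rn.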

Verify that the cleanupCompleteRun task actually ran for the last delivery.  There have been cases where it hasn't run even when following these instructions.  Waiting on full details from Maria Elena on how to properly launch this job if it doesn't start automatically and what errors to look for.

If the log of checkRun shows hpFinal=False and there are no other errors, you need to set the halfpipe status to "Complete" per the last step of #10 below and roll back the checkRun task.

10) Once all the L1Proc streams have finished, run the following command:

...

to set the run completion status to "Complete".

Also follow the instructions on the HTF Run completion status page, if necessary, to set the status in the Runs section to complete as well.

11) Notify the data quality shifter that the run is ready to be reviewed.

 

Some Issues to Watch Out For

Overlapping runs in the same delivery

I've seen this happen exactly once, but it caused some problems in the repipe.  If you have two runs that need to be reprocessed, and they both have parts in the same delivery, completely process the first one before starting the second one.  I had repiped both of the runs and copied all the event files over.  The processing of the first run then removed the staging directory, so the data for the second run was gone and I had to repipe it again.

Issues with Magic7 data

Sometimes, especially with runs that span three deliveries, an arbitrary assignment of event files to each delivery as indicated by step 7 above results in the corresponding magic7 data files not covering the time range covered by the event files.  This causes the doChunk substreams either at the beginning or end of the set to fail depending on where the data gap lies.  The magic7 data typically extends quite a ways before and after the time of the delivery so this usually isn't a problem but it occasionally pops up. There are two solutions.

One is to move event files around so that they are covered by the magic7 data for each delivery.  You could do this in advance by looking at roughly the chunk ranges in the original processing and moving the appropriate files.  Or, if you don't discover the problem until after the fact, you can move the files then.  However, in the latter case, many of the event files may have already been moved off the staging disk, and that may require another repipe to set everything back up.

The other option is to merge in the magic7 data from the previous or later delivery (depending on where the problem is).  This is what I typically do, as it then just requires rolling back the failed doChunk runs.  Simply copy the other magic7 data file from the run directory (the ones copied in step 6) into a temporary file, remove the overlapping data, concatenate the two files in the right order, and save the resultant file under the proper name (i.e. magic7_<download number>.txt) in the proper directory.  Then you can just roll back the failed processes and processing will continue.
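One way to do the merge is sketched below, assuming a POSIX shell and assuming the overlapping records are byte-identical lines in both files. That assumption is not confirmed by anything on this page, so compare the result against the inputs by hand. The function name merge_magic7 is hypothetical.

```shell
# Hypothetical helper: merge two magic7 files, earlier delivery first.
# Arguments: earlier file, later file, output file (must differ from both inputs).
merge_magic7() {
    # awk prints a line only the first time it is seen, so lines repeated
    # in the second file (the overlap) are dropped and the order is preserved.
    awk '!seen[$0]++' "$1" "$2" > "$3"
}
```

Note that this also drops any line repeated within a single file.  If the overlap is not line-for-line identical, trim the temporary file by hand as described above instead.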

 

 

Original e-mail

This is the text of the original email from Maria Elena that started the process of creating this page:


Ok, here's how to create empty halfPipe folders after Re-Piping:

...