...

  • G4 propagator error. If recon dies complaining about g4propagator, we can't fix it. If this happens, email Heather (heather625{at}gmail) and Anders (borgland{at}slac) (and possibly Warren (focke{at}slac) and Maria Elena (monzani{at}slac)). Include a link to the log file, which will tell them where the core file is.
  • Overlapping chunks. If findChunks complains about overlapping chunks, tell Bryson; if trending merges complain, tell Bryson and datamonlist@glast2.Stanford.EDU. Rollback won't help.
  • Deliveries arriving too close together will mangle the display. The underlying processes are fine, but email obsproblist{at}glast2.stanford.edu, Jim (jchiang{at}slac), or Bryson (blee{at}slac) to get the display fixed.
  • Digitization crashes. Sometimes we get a bad event and have to skip it. Instructions for that are in the rollback section below.

...

What to do in case of permanent failures: contact the appropriate people above, if you are sure you know what happened. Otherwise, page Warren and/or Maria Elena (see L1 shift schedule). If there is another part of the run waiting, the run lock (see below) will have to be removed by hand; page unless you're really sure of what you're doing.

Other failures

This is a list of failures that don't really fit into the other three major categories.

  • Too few events in the run, or gaps in it, can leave too few events in the magic7 file and cause fakeFT2 to fail. Try copying /nfs/farm/g/glast/u28/stage/XXX/magic7_XXX.txt (where XXX is the delivery number) to /nfs/farm/g/glast/u28/stage/YYY/magic7_YYY.txt (where YYY is the delivery where fakeFT2 failed; remember to change XXX to YYY in the magic7 file name), then roll back fakeFT2. If this fails, email Andrea (tramacer{at}slac) for additional information regarding problems with FT2.
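The copy-and-rename step above can be sketched in Python as follows. This is only an illustration: the helper names `magic7_path` and `copy_magic7` are hypothetical, the stage root is taken from the paths quoted above, and the sketch assumes that overwriting the magic7 file in the failed delivery's directory is what you want. Rolling back fakeFT2 is still a separate, manual step.

```python
import shutil
from pathlib import Path

# Assumption: stage root as quoted in the paths above.
STAGE_ROOT = Path("/nfs/farm/g/glast/u28/stage")


def magic7_path(delivery: str, stage_root: Path = STAGE_ROOT) -> Path:
    """Build <stage_root>/<delivery>/magic7_<delivery>.txt (hypothetical helper)."""
    return stage_root / delivery / f"magic7_{delivery}.txt"


def copy_magic7(src_delivery: str, dst_delivery: str,
                stage_root: Path = STAGE_ROOT) -> Path:
    """Copy the magic7 file of src_delivery (XXX) to dst_delivery (YYY).

    The delivery number is changed in both the directory and the file name.
    Any existing destination file is overwritten.
    """
    src = magic7_path(src_delivery, stage_root)
    dst = magic7_path(dst_delivery, stage_root)
    if not src.is_file():
        raise FileNotFoundError(f"no magic7 file at {src}")
    shutil.copy2(src, dst)
    return dst
```

Calling `copy_magic7("XXX", "YYY")` with the two real delivery numbers substituted would perform the copy against the default stage root; passing `stage_root` lets you try it on a scratch directory first.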

...

You will then need to re-enter the run into L1Proc manually. To do this, bkill any findChunk processes associated with the RunID, remove the run lock from /nfs/farm/g/glast/u41/rXXX (where XXX is the run number), move all *chunkList*.txt files (leave the .txt.tmp ones alone) to some other name (just suffixing them with ".ignore" should work)

#WBF I think we should have a section on chunkLists. If anyone beats me to writing it, go ahead!

and issue the following command:

...
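The chunkList renaming step described above could be scripted along these lines. This is a minimal sketch, not an official tool: the function name `sideline_chunk_lists` is hypothetical, and the directory that holds the chunkList files is passed in by the caller because the text does not spell it out. The pattern `*chunkList*.txt` only matches names ending in `.txt`, so the `.txt.tmp` files are left alone.

```python
from pathlib import Path


def sideline_chunk_lists(run_dir: Path, suffix: str = ".ignore",
                         dry_run: bool = True) -> list:
    """Append `suffix` to every *chunkList*.txt file in run_dir.

    Returns a list of (old_path, new_path) pairs. With dry_run=True
    nothing is renamed; the pairs only show what would happen.
    """
    renamed = []
    for path in sorted(run_dir.glob("*chunkList*.txt")):
        target = path.with_name(path.name + suffix)
        if not dry_run:
            path.rename(target)
        renamed.append((path, target))
    return renamed
```

Running it once with the default `dry_run=True` and inspecting the returned pairs before repeating with `dry_run=False` keeps the rename easy to check.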

How to fix common FASTCopy problems

From Steve Tether: I have a prototype script that corrects a test problem I created in the NIGHTLY database. If you see an ingestion failure for one or more Level 0 (L0) files of a delivery, first look at the FASTCopy logs available through the Data Processing web app: click on the progress bar for FASTCopy, then on the links for those files under Input Products that show a status of INGESTFAIL. If it looks like an NFS glitch, i.e., a message says that such-and-such a directory or file does not exist, run the following command in a terminal window that is logged into the SLAC AFS cell and has the ISOC PROD environment set up:

...

The script will check whether the files for that delivery are still on disk. If any are not, or if the script reports a failure, you'll have to refer the problem to me or Jim P. Normally the files received on a given day (UTC) are archived and removed from disk at about noon (Pacific time) the next day. If all goes well, the last output from the script will be a listing of the L0 files whose statuses were reset to NEW. After that, the L0 file states should go through SUBMITTED and stop at INGESTDONE. If ingestion fails again, Jim or I will have to handle it.
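The expected status progression after a reset can be written down as a small check. This is a sketch for illustration only: the status names come from the text above, but the function `classify` and the idea of collecting a per-file status history by hand (for example from the Data Processing web app) are assumptions, not part of the actual script.

```python
# Expected order of L0 file states after the script resets them.
EXPECTED_FLOW = ["NEW", "SUBMITTED", "INGESTDONE"]


def classify(history):
    """Classify the observed status history of one L0 file.

    history is a list of status strings in the order they were seen,
    starting from the reset.
    """
    if "INGESTFAIL" in history:
        return "escalate"      # ingestion failed again: refer to Steve or Jim
    if history == EXPECTED_FLOW:
        return "done"          # reached INGESTDONE through the normal path
    if history and history == EXPECTED_FLOW[:len(history)]:
        return "in progress"   # on the normal path, not finished yet
    return "unexpected"        # anything else deserves a closer look
```

For instance, a file seen as NEW and then SUBMITTED is classified as in progress, while any history containing INGESTFAIL is flagged for escalation.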

...