...
What to do in case of transient failures: rollback the affected process(es) when possible (see below for the rollback procedure). Look for the dontCleanUp file and check the Log Watcher (see below). If recon segfaults for no apparent reason, email Heather and Anders before the rollback, including a link to the log file, which will tell them where the core file is. For pipeline deadlocks, email Dan and include a link to the process instance.
Transient failures are rare lately. For the last couple of months, most failed processes are automatically retried once. This usually fixes transient issues, so usually when there's a failure it indicates an actual problem.
...