...

  • single batch host is killing several jobs ('rogue' LSF host)
  • group of batch hosts crashed or went offline
  • AFS crashed on one of the host machines
  • scratch disk is full on one or more of the host machines - see what to do here
  • staging disk is full
  • /nfs/farm/g/glast/u52 or /nfs/farm/g/glast/u15 is full

How to recognize infrastructure failures: they usually affect a large number of jobs, either on the same LSF host or on different LSF hosts.
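As a rough illustration of that rule of thumb, the sketch below counts failed jobs per LSF host and flags hosts with an unusually high count. It is only a sketch: the function name `summarize_failures`, the `host_threshold` parameter, the host names, and the input format (a list of job ID and host pairs collected by hand or from your own tooling) are all assumptions, not part of the pipeline or of LSF.

```python
from collections import Counter


def summarize_failures(failed_jobs, host_threshold=5):
    """Count failed jobs per host and flag hosts at or above the threshold.

    failed_jobs: iterable of (job_id, host) pairs (hypothetical input format).
    host_threshold: assumed cutoff; tune it to the size of the task.
    """
    per_host = Counter(host for _job_id, host in failed_jobs)
    suspect_hosts = sorted(h for h, n in per_host.items() if n >= host_threshold)
    return per_host, suspect_hosts


# Hypothetical data: three failures on one host, one on another.
failed = [(101, "hostA"), (102, "hostA"), (103, "hostA"), (104, "hostB")]
per_host, suspects = summarize_failures(failed, host_threshold=3)
print(per_host.most_common())
print(suspects)
```

Many failures concentrated on one host suggest a 'rogue' host, while many failures spread across several hosts point to a shared resource such as a full disk or an AFS problem.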

...