Here we'll go over specific cases and how to identify them by looking at the log files. In general log

The job seems stuck

  1. Check that the filters do not filter out all the pulses. Currently, if all shots are filtered out, the job hangs forever. This is easily spotted by looking for the following section in the log file, where one of the getFilter cuts passes 0 events:


    did not select any event, quit now!
    getFilter: Cut 0.500000 < evr/code_94 < 1.500000 passes 0 events of 11252, total passes up to now: 0 
    getFilter: Cut 24.464119 < scan/diag_x < 24.496000 passes 11252 events of 11252, total passes up to now: 0
    getFilter: Cut 0.500000 < damage/jungfrau1M < 1.500000 passes 11252 events of 11252, total passes up to now: 0


  2. The MPI error handling is not yet very robust. If one of the ranks fails (often rank 0), the other ranks keep waiting for further tasks, so the job hangs forever. In this case, a traceback of the error is available towards the end of the log file (not always at the very end). It can typically be identified by a line like:

    Traceback (most recent call last):
    ...
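
Both checks can be automated with grep, roughly as sketched below. This is a minimal sketch, not part of the existing tooling. The function name diagnose_stuck_job and the 20 lines of traceback context are hypothetical choices. The search strings are the log messages quoted above.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the existing tooling):
#   diagnose_stuck_job LOGFILE
# Scans a job log for the two hang causes described above.
# Returns 0 if at least one known cause was found, 1 otherwise.

diagnose_stuck_job() {
    local log="$1"
    local found=1   # 1 = nothing found yet (shell "false")

    # Case 1: the event filters rejected every shot.
    if grep -qF 'did not select any event' "$log"; then
        echo "All events filtered out. Cuts passing 0 events:"
        # The leading space keeps e.g. "passes 10 events" from matching.
        grep 'getFilter:.* passes 0 events' "$log"
        found=0
    fi

    # Case 2: one MPI rank crashed while the others keep waiting.
    # The traceback is not always at the very end of the log, so search all of it.
    if grep -qF 'Traceback (most recent call last):' "$log"; then
        echo "Python traceback found (line number: text):"
        # Print each traceback with up to 20 following lines of context (arbitrary choice).
        grep -n -A 20 -F 'Traceback (most recent call last):' "$log"
        found=0
    fi

    return "$found"
}
```

Usage: `diagnose_stuck_job path/to/job.log`, where the log path is a placeholder. A nonzero return means neither pattern was found, so the hang has some other cause.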

