Debug the cube production

The cube job can sometimes fail or get stuck. Unfortunately, the code does not yet always exit on failure; in some cases the job simply hangs.

Here we'll go over specific cases and how to identify them by looking at the log files.

The job seems stuck


Job is done but the report does not show "Cube: Done"

This means the job failed somewhere. Open the log files and check whether one of the following cases applies:

  1. Check that the filters do not reject all the pulses. Currently, if every shot is filtered out, the job hangs forever. This is easy to spot by looking for the following section in the log file:

    did not select any event, quit now!
    getFilter: Cut 0.500000 < evr/code_94 < 1.500000 passes 0 events of 11252, total passes up to now: 0 
    getFilter: Cut 24.464119 < scan/diag_x < 24.496000 passes 11252 events of 11252, total passes up to now: 0
    getFilter: Cut 0.500000 < damage/jungfrau1M < 1.500000 passes 11252 events of 11252, total passes up to now: 0

  2. The MPI error handling is not yet very robust. If one of the ranks fails (often rank 0), the other ranks keep waiting for further tasks and the job hangs forever. In this case, a traceback of the error is printed towards the end of the log file (not always at the very end). It can typically be identified by a line looking like:

    Traceback (most recent call last):
    ...
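
The two checks above can be automated with a small script. The following is only a minimal sketch, assuming the log lines have exactly the format shown in the excerpts above; the function names (scan_cube_log, scan_cube_log_file) and the returned dictionary are hypothetical helpers and not part of smalldata_tools.

```python
import re
from pathlib import Path

# Pattern for the "getFilter" lines, based on the log excerpt above.
# (Assumption: the real log always uses this exact wording.)
CUT_RE = re.compile(
    r"getFilter: Cut (?P<low>\S+) < (?P<name>\S+) < (?P<high>\S+) "
    r"passes (?P<passed>\d+) events of (?P<total>\d+)"
)
NO_EVENT_MARKER = "did not select any event"
TRACEBACK_MARKER = "Traceback (most recent call last):"


def scan_cube_log(text):
    """Summarize the two failure signatures found in a cube log text."""
    summary = {
        "no_events": False,     # True if the "did not select any event" line is present
        "empty_cuts": [],       # names of cuts that individually pass 0 events
        "traceback_lines": [],  # 1-based line numbers where a traceback starts
    }
    for lineno, line in enumerate(text.splitlines(), start=1):
        if NO_EVENT_MARKER in line:
            summary["no_events"] = True
        if TRACEBACK_MARKER in line:
            summary["traceback_lines"].append(lineno)
        match = CUT_RE.search(line)
        if match and int(match["passed"]) == 0:
            summary["empty_cuts"].append(match["name"])
    return summary


def scan_cube_log_file(path):
    """Hypothetical convenience wrapper: read a log file and scan it."""
    return scan_cube_log(Path(path).read_text(errors="replace"))
```

Calling scan_cube_log_file on a job's log file (the path depends on your setup) returns the names of the cuts that reject every event and the line numbers where a traceback begins, which narrows down where to look in a long log.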

Analyze cube results

An example notebook of a cube file analysis can be found in /cds/group/psdm/sw/tools/smalldata_tools/example_notebooks/cube.ipynb.