Debug the cube production
The cube job can sometimes fail or get stuck. At present, a failure does not always cause the code to exit; the job may simply hang instead.
Here we'll go over specific cases and how to identify them by looking at the log files.
The job seems stuck
Job is done but the report does not show "Cube: Done"
This means the job failed somewhere. Open the log files and check whether one of the following cases applies:
Check that the filters do not filter out all the pulses. In the current state, if all shots are filtered out, the job will hang forever. This can easily be spotted by looking for the following section in the log file:
    did not select any event, quit now!
    getFilter: Cut 0.500000 < evr/code_94 < 1.500000 passes 0 events of 11252, total passes up to now: 0
    getFilter: Cut 24.464119 < scan/diag_x < 24.496000 passes 11252 events of 11252, total passes up to now: 0
    getFilter: Cut 0.500000 < damage/jungfrau1M < 1.500000 passes 11252 events of 11252, total passes up to now: 0
The MPI error handling is not yet robust. If one of the ranks fails (often rank 0), the other ranks keep waiting for further tasks, and the job hangs forever. In this case, a traceback of the error is printed towards the end of the log file (not always at the very end). It can typically be identified by a line looking like:
    Traceback (most recent call last): ...
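The two checks above can also be scripted rather than done by eye. The following is a minimal sketch, not part of smalldata_tools: the function names `diagnose_log` and `diagnose_log_file` are hypothetical, and the regular expression assumes the `getFilter` lines keep the exact layout shown in the log excerpt above.

```python
import re
from pathlib import Path

# Log signatures copied from the excerpts above.
NO_EVENT_MARKER = "did not select any event, quit now!"
TRACEBACK_MARKER = "Traceback (most recent call last):"

# Assumed layout of a filter line (taken from the excerpt above):
# getFilter: Cut <low> < <variable> < <high> passes <n> events of <total>, ...
CUT_RE = re.compile(
    r"getFilter: Cut (?P<low>\S+) < (?P<var>\S+) < (?P<high>\S+) "
    r"passes (?P<passed>\d+) events of (?P<total>\d+)"
)


def diagnose_log(text):
    """Summarize the two known failure signatures found in a log text."""
    lines = text.splitlines()

    # Variables whose cut let zero events through.
    empty_cuts = []
    for line in lines:
        match = CUT_RE.search(line)
        if match and int(match.group("passed")) == 0:
            empty_cuts.append(match.group("var"))

    # 1-based line numbers where a Python traceback starts.
    traceback_lines = [
        number
        for number, line in enumerate(lines, start=1)
        if TRACEBACK_MARKER in line
    ]

    return {
        "no_event_selected": any(NO_EVENT_MARKER in line for line in lines),
        "empty_cuts": empty_cuts,
        "traceback_lines": traceback_lines,
    }


def diagnose_log_file(path):
    """Hypothetical convenience wrapper: read a log file and diagnose it."""
    return diagnose_log(Path(path).read_text(errors="replace"))
```

Calling `diagnose_log_file` on a job log returns a dictionary: a non-empty `empty_cuts` list names the filter variables that rejected every event, and `traceback_lines` gives the line numbers to jump to when looking for a failed rank.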
Analyze cube results
An example notebook of a cube file analysis can be found at /cds/group/psdm/sw/tools/smalldata_tools/example_notebooks/cube.ipynb