...

We have some evidence that running this can fix the problem with the teb-to-drp DrpEbReceiver process (the "383" event problem when counting from zero, or "384" when counting from one), but eblf_pingpong may have to be run in the "right direction" (perhaps "-S" has to be on the broken node?). The number comes from libfabric having 384 buffers by default in the "completion queue"; somehow the completion queue is getting stuck.
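To illustrate where the 383/384 count comes from, here is a toy Python model of a fixed-depth queue that is never drained. It is only a sketch of the arithmetic, not libfabric code: the function name `events_before_stall` and the `drain` flag are hypothetical, and the only number taken from the notes above is the default depth of 384.

```python
from collections import deque

CQ_DEPTH = 384  # default completion-queue depth quoted in the notes above


def events_before_stall(num_events, cq_depth=CQ_DEPTH, drain=False):
    """Count how many events are accepted before the toy queue fills up.

    Hypothetical model: each event needs one free queue slot. With
    drain=False nothing ever frees a slot, which mimics a stuck queue.
    """
    cq = deque()
    accepted = 0
    for event in range(num_events):
        if drain and cq:
            cq.popleft()          # a healthy consumer frees one slot per event
        if len(cq) >= cq_depth:
            break                 # no free slot: the stream stalls here
        cq.append(event)
        accepted += 1
    return accepted


if __name__ == "__main__":
    n = events_before_stall(1000)
    print("accepted:", n, "last zero-based index:", n - 1)
```

With a stuck queue the model accepts exactly 384 events, the last of which has zero-based index 383, which matches the two ways the problem is named. With `drain=True` all events go through.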

Ric theorizes that the IB driver may be holding onto resources that get flushed out by this.

Run this on two different nodes:

...