To-Do List

NVIDIA GTC questions
  • what intermittent errors does cuda-memcheck not detect?
    • hardware errors
    • errors in cudaMemcpy
    • others?
  • cuda-gdb generates output for every kernel launch. does this slow the code down dramatically? it becomes unusable.
    • set the flag "set cuda kernel_events 0"
    • submit a bug report if that does not solve it
  • how does cuda deal with memory fragmentation?
  • nvvp error: "102 metrics have invalid values due to inconsistencies in the required event values"
    • the profiler tries to match up the counters in time; if they are not well synchronized, it gives the above error. Larger pages.
    • handled differently by Nsight (replays previous profiler)
  • double complex math: really fp64 instructions?
    • only double-precision instructions
  • talk to Gernot Ziegler about instruction-limited kernels?
  • is our zherk kernel latency-limited?
    • use a multiple of 8 for k (the loop handles 8 rows at a time)
    • may be limited by the pieces at the beginning and end (beginning: loading the shared memory; end: scaling by alpha and beta), with the loop over k in the middle
    • Kepler: k up to 1000 for top performance
  • cufftPlanMany memory leak
  • trigger a crash on NaN? how do NaNs get produced?
    • not possible to trigger a crash on NaN
  • why do they use bytes-per-instruction?
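
On the NaN question above: since the answer was that a NaN cannot trigger a crash, one possible workaround is to scan the results on the host after copying them back. The following is only a minimal C++ sketch, assuming the data is already in host memory. The helper names `first_nan` and `nan_examples` are hypothetical and not part of any CUDA API. The second helper lists a few standard IEEE 754 operations that produce NaN, as a partial answer to how NaNs arise.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper: index of the first NaN in data,
// or -1 if no element is NaN.
std::ptrdiff_t first_nan(const std::vector<double>& data) {
    for (std::size_t i = 0; i < data.size(); ++i) {
        if (std::isnan(data[i])) {
            return static_cast<std::ptrdiff_t>(i);
        }
    }
    return -1;
}

// Hypothetical helper: a few IEEE 754 operations whose result is NaN.
std::vector<double> nan_examples() {
    const double inf = std::numeric_limits<double>::infinity();
    // volatile keeps the compiler from folding the expressions away.
    const volatile double zero = 0.0;
    return {
        zero / zero,      // 0 / 0
        inf - inf,        // infinity minus infinity
        zero * inf,       // 0 times infinity
        std::sqrt(-1.0),  // square root of a negative number
    };
}
```

A caller could run `first_nan` on each result buffer and abort when it returns something other than -1. Note that `std::isnan` may be unreliable when compiling with options such as `-ffast-math`.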

...