...

  • cuda-gdb prints output for every kernel launch, which slows the code down dramatically, to the point of being unusable.
    • set the flag "set cuda kernel_events 0"
    • submit a bug report if that does not solve it
  • how does CUDA deal with memory fragmentation?
  • nvvp error: "102 metrics have invalid values due to inconsistencies in the required event values"
  • double-complex math: does it really compile to FP64 instructions?
  • talk to Gernot Ziegler about instruction-limited kernels?
  • is our zherk kernel latency-limited?
    • k is a multiple of 8 (the loop handles 8 rows at a time)
    • may be limited by the pieces at the beginning and end, around the loop over k in the middle (beginning: loading shared memory; end: scaling by alpha and beta)
    • Kepler: k up to 1000 for top performance
  • cufftPlanMany memory leak
  • trigger a crash on NaN? how do NaNs get produced?
    • not possible to trigger a crash on NaN
  • why do they use bytes-per-instruction?
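On the NaN item above: since a crash cannot be triggered on NaN, one workaround is to scan result buffers for NaN after copying them back to the host. The following is a minimal host-side C++ sketch, not something taken from these notes. The helper names `first_nan_index` and `make_nan_samples` are hypothetical, and the samples only illustrate the standard IEEE 754 operations that yield NaN (0/0, inf - inf, sqrt of a negative number, 0 * inf).

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper: returns the index of the first NaN in the buffer,
// or -1 if the buffer contains no NaN.
long first_nan_index(const std::vector<double>& values) {
    for (std::size_t i = 0; i < values.size(); ++i) {
        if (std::isnan(values[i])) {
            return static_cast<long>(i);
        }
    }
    return -1;
}

// Hypothetical helper: builds a few NaNs using invalid IEEE 754 operations.
// The volatile zero keeps the compiler from folding 0/0 at compile time.
std::vector<double> make_nan_samples() {
    const double inf = std::numeric_limits<double>::infinity();
    volatile double zero = 0.0;
    return {
        zero / zero,      // 0 / 0
        inf - inf,        // infinity minus infinity
        std::sqrt(-1.0),  // square root of a negative number
        zero * inf        // 0 times infinity
    };
}
```

Calling `first_nan_index` on a buffer copied back from the device gives the first offending element, which can help narrow down the kernel and input that produced it. This assumes the code is not built with fast-math style flags, which may change NaN behavior.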

...