...
- cuda-gdb prints output for every kernel launch; this slows the code down dramatically (to the point of being unusable?)
- fix: disable the launch messages with "set cuda kernel_events 0"
- submit a bug report if that does not solve it
- how does CUDA deal with memory fragmentation?
- nvvp error: "102 metrics have invalid values due to inconsistencies in the required event values"
- double complex math: does it really compile to fp64 instructions?
- talk to Gernot Ziegler about instruction-limited kernels?
- is our zherk kernel latency-limited?
- k should be a multiple of 8 (the loop handles 8 rows at a time)
- may be limited by the parts before and after the main loop over k: loading shared memory at the beginning, scaling by alpha and beta at the end
- Kepler: k has to go up to 1000 before top performance is reached
- cufftPlanMany memory leak
- trigger a crash on NaN? how do NaNs get produced?
- not possible to trigger a crash on NaN
- why do they use bytes per instruction?
...