...
- we were not 16-byte aligning cuDoubleComplex variables. The error only showed up "much later", in cuGetVector (error 11) and cudaDeviceSynchronize (error 4). Did a binary search to find the source of the error. How do we program error checks so that run-time errors show up "immediately"? cuda_safe_call?
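- if we have 1 number used by many threads, should it go into shared memory or constant memory? we would think constant memory is the right answer, since it is cached and a single address is broadcast to all threads in a warp. shared memory should not give a bank conflict in this case either (threads reading the same address get a broadcast), but it has to be filled by the kernel first.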
- cuda-gdb generates output for every kernel launch. Does this slow down the code dramatically? It becomes unusable.
- understand crash with rpa-gpu-expt running rpa_only_Na_cuda.py with nvprof
- how does cuda deal with memory fragmentation?
- nvvp error: "102 metrics have invalid values due to inconsistencies in the required event values"
- double complex math: really fp64 instructions?
- talk to Gernot Ziegler about instruction limited kernels?
- is our zherk kernel latency limited?
- cufftPlanMany memory leak
- trigger a crash on NaN? how do NaNs get produced?
- cuda valgrind?
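
On the NaN question above, a minimal host-side sketch in plain Python (standard library only). It lists a few IEEE-754 operations that yield NaN and shows one way to fail right after a buffer has been copied back to the host. The names nan_sources and check_finite are hypothetical helpers made up for illustration; they are not part of our code or of any CUDA library. Note that on the GPU 0.0/0.0 also gives NaN, whereas in Python it raises ZeroDivisionError, so it is left out here.

```python
import cmath
import math


def nan_sources():
    """Return a few IEEE-754 operations whose result is NaN."""
    inf = float("inf")
    return {
        "inf - inf": inf - inf,
        "0 * inf": 0.0 * inf,
        "inf / inf": inf / inf,
    }


def check_finite(values, label):
    """Raise as soon as a NaN is found in a host-side buffer.

    Hypothetical helper: 'values' is any iterable of real or complex
    numbers, 'label' is a free-form name used in the error message.
    """
    for i, v in enumerate(values):
        # cmath.isnan is True if either the real or imaginary part is NaN
        if cmath.isnan(complex(v)):
            raise FloatingPointError(f"NaN in {label} at index {i}")
    return values


if __name__ == "__main__":
    for expr, result in nan_sources().items():
        print(expr, "is NaN:", math.isnan(result))
    check_finite([1.0, 2 + 3j], "example buffer")
```

Calling something like check_finite after each copy back from the device would turn a silent NaN into an exception near the kernel that produced it, at the cost of an extra pass over the data, so it is probably only worth enabling in debug runs.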
3/12/2013
- look at profiling on RPA (lin)
- ask about error handling at GTC (lin)
- base.py get_phi_agp kernel
- rpa manuscript (jun)
- k-point parallelization (cpo)
...