...

  • we were not 16-byte aligning cuDoubleComplex variables. The error showed up "much later" in cuGetVector (error 11) and cudaDeviceSynchronize (error 4). Did a binary search to find the source of the error. How do we program error checks so that run-time errors show up "immediately"? cuda_safe_call?
    • pattern for error checking: issue different kernels in different streams, then call cudaStreamSynchronize and cudaGetLastError
  • if one number is used by many threads, should it go into shared memory or constant memory? We would think constant memory is the right answer; shared memory would give a bank conflict.
  • cuda-gdb prints output for every kernel launch. Does this slow the code down dramatically? It becomes unusable.
    • set flag "set kernel notification none"
    • submit bug report if not solved
  • understand the crash in rpa-gpu-expt when running rpa_only_Na_cuda.py under nvprof
    • should file a bug report
  • how does CUDA deal with memory fragmentation?
  • nvvp error: "102 metrics have invalid values due to inconsistencies in the required event values"
  • double complex math: are these really fp64 instructions?
  • talk to Gernot Ziegler about instruction-limited kernels?
  • is our zherk kernel latency-limited?
  • cufftPlanMany memory leak
  • trigger a crash on NaN? how do NaNs get produced?
  • is cuda-memcheck the same as valgrind?
  • get many errors from cuBLAS with the race check
    • if really errors: submit bug report
  • what memory access errors can memcheck detect? cudaMemcpy? array-out-of-bounds?
    • doesn't detect cudaMemcpy errors (or any errors made by the host) but does detect array-out-of-bounds accesses within the GPU
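
A minimal sketch of the error-checking pattern from the first item above (one kernel per stream, then cudaStreamSynchronize and cudaGetLastError), together with a host-side 16-byte alignment check for cuDoubleComplex buffers. The names CUDA_CHECK, is_aligned16, scale and run_checked_streams are hypothetical and not taken from our code; this is one possible shape for a "cuda_safe_call" wrapper, not an established one. Note that some failures (for example an illegal address) poison the whole context, so they may be reported on every stream rather than only the one that caused them.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cstdint>
#include <cuda_runtime.h>
#include <cuComplex.h>

// Hypothetical wrapper: report file/line and abort as soon as a call fails.
#define CUDA_CHECK(call)                                                 \
    do {                                                                 \
        cudaError_t err_ = (call);                                       \
        if (err_ != cudaSuccess) {                                       \
            std::fprintf(stderr, "%s:%d: %s failed: %s\n", __FILE__,     \
                         __LINE__, #call, cudaGetErrorString(err_));     \
            std::exit(EXIT_FAILURE);                                     \
        }                                                                \
    } while (0)

// Host-side test that a pointer is 16-byte aligned, as cuDoubleComplex
// (a double2) requires.
inline bool is_aligned16(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Toy kernel: scale every complex element by a real factor.
__global__ void scale(cuDoubleComplex* z, double s, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        z[i] = make_cuDoubleComplex(s * cuCreal(z[i]), s * cuCimag(z[i]));
    }
}

// Launch one kernel per stream and check each stream separately.
// Returns 0 on success; any failure aborts inside CUDA_CHECK.
int run_checked_streams(int n) {
    const int kStreams = 2;
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    cudaStream_t streams[kStreams];
    cuDoubleComplex* buf[kStreams];

    for (int s = 0; s < kStreams; ++s) {
        CUDA_CHECK(cudaStreamCreate(&streams[s]));
        CUDA_CHECK(cudaMalloc(&buf[s], n * sizeof(cuDoubleComplex)));
        CUDA_CHECK(cudaMemset(buf[s], 0, n * sizeof(cuDoubleComplex)));
        if (!is_aligned16(buf[s])) {
            std::fprintf(stderr, "buffer %d is not 16-byte aligned\n", s);
            std::exit(EXIT_FAILURE);
        }
    }

    for (int s = 0; s < kStreams; ++s) {
        scale<<<blocks, threads, 0, streams[s]>>>(buf[s], 2.0, n);
        // Catches launch-time problems such as a bad launch configuration.
        CUDA_CHECK(cudaGetLastError());
    }

    for (int s = 0; s < kStreams; ++s) {
        // Blocks until the stream drains; reports errors raised while
        // the kernel was executing.
        CUDA_CHECK(cudaStreamSynchronize(streams[s]));
        CUDA_CHECK(cudaGetLastError());
    }

    for (int s = 0; s < kStreams; ++s) {
        CUDA_CHECK(cudaFree(buf[s]));
        CUDA_CHECK(cudaStreamDestroy(streams[s]));
    }
    return 0;
}
```

Synchronizing after every launch serializes the work, so a wrapper like this is probably best enabled only in debug builds; in production runs the per-launch cudaGetLastError check alone is cheap.
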
3/12/2013
  • look at profiling on RPA (lin)
  • ask about error handling at GTC (lin)
  • base.py get_phi_agp kernel
  • rpa manuscript (jun)
  • k-point parallelization (cpo)

...