...
- constant memory
- texture memory
- optimization tricks: pre-fetch etc.
- what does a queued warp do? (does it pre-fetch the memory)
- reducing number of registers in kernel (does compiler typically do this optimally?)
- how to tell with nvvp whether we're memory- or FLOP-limited
- understanding the nvvp columns
- ask about zher speedup numbers
12/
...
11/2012
- understand NVIDIA zher speedup plot (including CUDA5) (jun/cpo)
- libxc on GPU (lin)
- use CUDA5
- use common functional file for CPU/GPU
- use common work file for CPU/GPU
- read Samuli's old talk
- run 3x4x3 pt system
- digest RPA timing measurements (lin)
- multi-alpha zher at a lower priority
- think about moving lambda calc to GPU (jun)
- reduce registers? prefetch?
- explore the parameter space: tile-size
- try multiple surfaces with jacapo/gpaw-pw (aj)
- paper (jun)
- try calling dacapo density mixing from GPAW (cpo)
- install GPAW on Keeneland (cpo)
- make sure all libxc self-tests run
- move suncatgpu01 to CUDA5 (cpo)
- can the alphas for the nt_G really be used for the D's?
12/4/2012
- understand NVIDIA zher speedup plot (jun/cpo)
- libxc on GPU (lin)
- use CUDA5
- use common functional file for CPU/GPU
- use common work file for CPU/GPU
- read Samuli's old talk
- run 3x4x3 pt system
- RPA timing measurements (lin)
- multi-alpha zher at a lower priority (jun)
- reduce registers? prefetch?
- explore the parameter space: tile-size
- try multiple surfaces with jacapo/gpaw-pw (aj)
- paper (jun)
- try calling dacapo density mixing from GPAW (cpo)
- install GPAW on Keeneland (cpo)
- make sure all libxc self-tests run
- move suncatgpu01 to CUDA5 (cpo)
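Several items above turn on zher speedups. When comparing GPU and CPU runs, a plain reference of what zher computes may help as a correctness baseline: the BLAS Hermitian rank-1 update A ← αxx^H + A with real α, where only one triangle of A is read and written. This is a minimal sketch assuming NumPy is available. `zher_reference` is a hypothetical helper, not part of GPAW, libxc, or any BLAS binding, and it does not attempt the multi-alpha variant.

```python
import numpy as np


def zher_reference(alpha, x, A, uplo="U"):
    """Return alpha * x * x^H + A with BLAS zher semantics (hypothetical helper).

    alpha must be real. Only the triangle of A selected by uplo ("U" or "L")
    is updated; the other triangle is copied through unchanged. As in the
    reference BLAS, the diagonal of the result is forced to be real.
    The input A is not modified.
    """
    if np.imag(alpha) != 0:
        raise ValueError("zher requires a real alpha")
    if uplo not in ("U", "L"):
        raise ValueError("uplo must be 'U' or 'L'")
    alpha = float(np.real(alpha))
    x = np.asarray(x, dtype=np.complex128)
    result = np.array(A, dtype=np.complex128)  # working copy of A
    n = result.shape[0]
    # Full rank-1 Hermitian term alpha * x x^H
    update = alpha * np.outer(x, x.conj())
    # Select the triangle BLAS would touch (diagonal included)
    if uplo == "U":
        mask = np.triu(np.ones((n, n), dtype=bool))
    else:
        mask = np.tril(np.ones((n, n), dtype=bool))
    result[mask] += update[mask]
    # Imaginary parts of the diagonal are set to zero on exit
    diag = np.arange(n)
    result[diag, diag] = result[diag, diag].real
    return result
```

A GPU result can then be checked with `np.allclose` on the updated triangle only (e.g. comparing `np.triu` of both matrices for `uplo="U"`), since the other triangle is not meant to be touched.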
...